Statistics 3701 (Geyer, Fall 2022) Homework 5

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Canvas should be an Rmarkdown file with text and code chunks that can be run to produce what you did. We do not take your word for what the output is. We may run it ourselves. But we also want the output.

Homeworks must be uploaded before midnight on the day they are due. Here is the link for uploading this homework. https://canvas.umn.edu/courses/330843/assignments/2864751.

Each homework includes the preceding quiz. You may either redo the quiz questions for homework or not redo them if you are satisfied with your quiz answers. In either case the quiz questions also count as homework questions (so quiz questions count twice, once on the quiz and once on the homework, whether redone or not). If you don't submit anything for problems 1–3 (the quiz questions), then we assume you liked the answers you already submitted.

Quiz 5

Problem 1

The following R command


foo <- read.table(url("https://www.stat.umn.edu/geyer/3701/data/2022/q5p1.txt"), header = TRUE)

assigns one R object foo, which is a dataframe containing one variable x.

We assume these data are independent and identically distributed from the logistic location family. See the help for R function dlogis for a description of this family. We assume the default value (1) for the scale parameter.

Find the MLE for these data. Good starting points (root-n-consistent estimators) for this model include both the sample mean and the sample median (also any trimmed mean).
Produce a large-sample approximate 95% confidence interval for the true unknown location parameter from these data using the usual theory of asymptotics of maximum likelihood.
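
The two steps above (find the MLE, then use its asymptotic normal distribution) can be sketched as follows. This is a minimal illustration, not a model answer: it uses simulated stand-in data (an assumption, so the chunk runs without a network connection) rather than the data frame foo, and the names mlogl, nout, mu.hat, se, and wald are made up for the example.

```r
# simulated stand-in for foo$x (assumption: the real data come from the URL above)
set.seed(42)
x <- rlogis(50, location = 3)

# minus the log likelihood for the logistic location family, scale fixed at 1
mlogl <- function(mu) -sum(dlogis(x, location = mu, log = TRUE))

# start at the sample median, a root-n-consistent estimator
nout <- nlm(mlogl, median(x), hessian = TRUE)
# nout$code should be 1 or 2 if nlm thinks it converged

mu.hat <- nout$estimate
# the Hessian of minus the log likelihood is the observed Fisher information
se <- sqrt(1 / nout$hessian[1, 1])
wald <- mu.hat + c(-1, 1) * qnorm(0.975) * se
wald
```

With the real data, x would be foo$x; the sample mean or a trimmed mean would serve equally well as the starting point.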

Problem 2

This problem continues where the preceding problem left off. We use the same data and the same statistical model.

Also produce a large-sample approximate 95% confidence interval for the parameter that is a level set of the log likelihood, following Section 5.4.4 of the course notes on models, part II.
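
A level-set interval of this kind can be sketched with R function uniroot. The sketch below again uses simulated stand-in data (an assumption), and it assumes the critical value is the 0.95 quantile of the chi-square distribution with one degree of freedom applied to twice the drop in log likelihood; check the course notes for the convention they actually use. The names logl, g, lower, and upper are hypothetical.

```r
# simulated stand-in for foo$x (assumption)
set.seed(42)
x <- rlogis(50, location = 3)

logl <- function(mu) sum(dlogis(x, location = mu, log = TRUE))

# the logistic density is log-concave, so the MLE lies between min(x) and max(x)
mu.hat <- optimize(logl, interval = range(x), maximum = TRUE, tol = 1e-8)$maximum
crit <- qchisq(0.95, df = 1)

# the interval is the set of mu where g(mu) <= 0; its endpoints are the roots of g
g <- function(mu) 2 * (logl(mu.hat) - logl(mu)) - crit

lower <- uniroot(g, lower = min(x), upper = mu.hat, tol = 1e-8)$root
upper <- uniroot(g, lower = mu.hat, upper = max(x), tol = 1e-8)$root
c(lower, upper)
```

Because g is negative at mu.hat and positive far from it, each call to uniroot brackets exactly one endpoint of the interval.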

Problem 3

The following R command


load(url("https://www.stat.umn.edu/geyer/3701/data/2022/q5p3.rda"))

loads one R object f, which is a function having one argument, which is a four-dimensional numeric vector. The value of this function is a numeric scalar.

The problem is to minimize this function.

This function has been deliberately constructed to have multiple local minima. So use R function optim with method = "SANN" to try to minimize it (Section 7.1.2.3 of the notes on optimization).

Since this method uses random search, use R function set.seed to get repeatability as you work on this problem.

Use R function optim with method = "SANN" ten different times, with default values for the control parameters, starting at zero (in four-dimensional space). Do you always get the same answer? What does this tell you about method SANN?
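
The mechanics of repeated SANN runs can be sketched as below. The function toy is a hypothetical stand-in with many local minima (an assumption; it is not the function f loaded above), chosen so that the chunk runs without a network connection.

```r
# hypothetical multimodal stand-in for f: global minimum 0 at (2, 2, 2, 2)
toy <- function(x) sum((x - 2)^2) + 10 * sum(1 - cos(2 * pi * (x - 2)))
start <- rep(0, 4)

# ten runs with default control parameters, each continuing the same random number stream
set.seed(42)
vals <- replicate(10, optim(start, toy, method = "SANN")$value)
vals

# resetting the seed reproduces a run exactly
set.seed(1)
a <- optim(start, toy, method = "SANN")$value
set.seed(1)
b <- optim(start, toy, method = "SANN")$value
identical(a, b)
```

Comparing the ten entries of vals with each other is what answers the question about whether SANN always finds the same solution.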

Homework 5

Problem 4

This problem continues where Problem 3 left off. Minimize the same function f used in Problem 3. Again use R function optim with method = "SANN". But this time read the help for R function optim, especially

the part of the Details section about method SANN that describes how the temperature changes over time and what that temperature does in the algorithm (controls the step size) and
the part of the Details section about the control argument that says how the variables in the temperature change function can be specified, the components maxit, temp, and tmax being relevant to method SANN.

Note that the temperature at iteration t is a function of all of these control components together, so adjusting any one without adjusting some of the others makes no sense. The idea is to start at a high (but not too high) temperature and end (when t = maxit) at a low (but not too low) temperature, but how high is too high and how low is too low is problem specific. In any case, you want the final temperature (when t = maxit) to be many times lower than the temperature when t = 1.
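
To see how maxit, temp, and tmax interact, it can help to compute the cooling schedule directly. The sketch below transcribes the formula given in the Details section of the help for optim; the function name sann.temp is hypothetical, and its default arguments are the documented defaults for method SANN.

```r
# temperature at iteration t, following the formula in help(optim)
sann.temp <- function(t, temp = 10, tmax = 10)
    temp / log(((t - 1) %/% tmax) * tmax + exp(1))

maxit <- 10000   # documented default for method SANN

first <- sann.temp(1)
last <- sann.temp(maxit)
c(first = first, last = last, ratio = first / last)
```

Trying other values of temp, tmax, and maxit in this function shows how much (or how little) each one changes the ratio of starting to final temperature before committing to a long run.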

Try to get one run that starts at zero (in 4-dimensional space) and finds a solution at least as low as any found in problem 3. Do a run that lasts at least 10 minutes. You can use R function system.time to tell how long it takes. To get full credit, you do not have to actually find the global optimum (we don't even know what that is). The point is to use the control argument in a reasonable way.

Problem 5

minimize: x² + y⁴ + z⁶ + sin(x + y + z)
subject to: x² + y² + z² ≥ 4

Problem 6

This is the same problem as problem 5 except that you are required to supply functions that calculate the derivatives of the objective function (the function to minimize) and the constraint function (the function required to be ≥ 0) to the R function doing the minimization.

Also these derivatives must be analytic derivatives (not calculated by the R package numDeriv or any other R code that does derivatives by finite differences). You may use the R functions D or deriv to calculate these derivatives, but I just did them by hand.
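
As an illustration of the deriv route, the sketch below builds functions that return a value together with its analytic gradient and checks the result against a hand calculation at one arbitrary point. The names obj, con, and grad.by.hand are hypothetical, and how the gradient must be packaged (argument order, vector versus matrix) depends on the optimizer you use, which is not assumed here.

```r
# objective and constraint (written so that the constraint is con >= 0)
obj <- deriv(~ x^2 + y^4 + z^6 + sin(x + y + z), c("x", "y", "z"),
             function.arg = TRUE)
con <- deriv(~ x^2 + y^2 + z^2 - 4, c("x", "y", "z"), function.arg = TRUE)

# gradient of the objective done by hand, for comparison
grad.by.hand <- function(x, y, z) {
    s <- cos(x + y + z)
    c(x = 2 * x + s, y = 4 * y^3 + s, z = 6 * z^5 + s)
}

# evaluate at an arbitrary point; the gradient comes back as an attribute
val <- obj(1, 2, 3)
grad <- drop(attr(val, "gradient"))
all.equal(grad, grad.by.hand(1, 2, 3))

cval <- con(1, 2, 3)
cgrad <- drop(attr(cval, "gradient"))
```

A check like this at one or two points is cheap insurance against sign or exponent mistakes in hand-derived gradients.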

Problem 7

This problem continues where problem 1 left off. We use the same statistical model: logistic location family.

But now we are going to do a simulation study comparing the following four estimators:

the maximum likelihood estimator, what was used in problem 1,
the sample mean,
the sample median,
a 10% trimmed mean, what R function mean with optional argument trim = 0.1 evaluates.

Do a simulation study comparing these four estimators like the simulation study in Section 5 of the notes on simulation. You do not have to make any plots, just find

the mean square error (estimated from the simulations) of each of the four estimators and
the Monte Carlo standard error of each of the quantities in the previous item.

Use data sample size and simulation sample size


n <- 10
nsim <- 1e5

(the former the same as in the course notes but the latter 10 times larger).

Note: since this is a location family, any location parameter will give the same mean square errors as any other. You can use location parameter equal to zero (the default) for R function rlogis.

Note: use the principle of common random numbers as the example in the course notes does.

Note: you do not have to make a nice R markdown table like the course notes do. Just putting the numbers in a matrix with labels and showing the matrix will do.
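
The overall shape of such a study can be sketched as below. This is an illustration under stated assumptions, not the layout used in the course notes: nsim is set to 1000 here only so the chunk runs quickly (the problem asks for 1e5), the MLE is computed with R function optimize rather than whatever optimizer you used in problem 1, and the names mle, estimators, sq.err, and result are hypothetical.

```r
set.seed(42)
n <- 10
nsim <- 1000   # assumption: reduced from 1e5 so the sketch runs quickly
mu <- 0        # true location parameter

# MLE of location; it lies in range(x) because the logistic density is log-concave
mle <- function(x) {
    mlogl <- function(mu) -sum(dlogis(x, location = mu, log = TRUE))
    optimize(mlogl, interval = range(x), tol = 1e-8)$minimum
}

estimators <- list(mle = mle, mean = mean, median = median,
                   trimmed = function(x) mean(x, trim = 0.1))

# common random numbers: every estimator is applied to the same simulated data sets
sq.err <- t(replicate(nsim, {
    x <- rlogis(n, location = mu)
    sapply(estimators, function(est) (est(x) - mu)^2)
}))

# mean square errors and their Monte Carlo standard errors
result <- rbind(mse = colMeans(sq.err),
                mcse = apply(sq.err, 2, sd) / sqrt(nsim))
result
```

Each row of sq.err holds the squared errors of all four estimators on one simulated data set, so the MSE is a column mean and its Monte Carlo standard error is the column standard deviation divided by the square root of nsim.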