Statistics 3701 (Geyer, Fall 2022) Quiz 6

Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed in to Canvas should be an Rmarkdown file with text and code chunks that can be run to produce what you did. We do not take your word for what the output is. We may run it ourselves. But we also want the output.

You may ask questions during the quiz, especially if the wording of a question is confusing or there seems to be an issue with the question, but the instructor will not be giving hints.

You must be in the classroom, Molecular and Cellular Biology 2-120, to take the quiz.

Quizzes must be uploaded by the end of class (1:10). Canvas should actually allow a few minutes after that, but quizzes not uploaded by 1:10 will be marked late. Here is the link for uploading this quiz: https://canvas.umn.edu/courses/330843/assignments/2880927.

Quiz 6

Problem 1

This is a simulation study (see the course notes about simulation).

We are going to study estimators for the Cauchy location family, for which we did maximum likelihood in Section 5.3.3 of the second course notes on models.

R function rcauchy simulates random variates from this distribution. R function dcauchy evaluates the probability density function of this distribution (illustrated in the section of the notes linked just above).

In all cases, we are going to assume the scale parameter is known and equal to the default value (1.0). The unknown parameter to be estimated is the location parameter (called location by the R functions mentioned above).

The estimators we want to compare are

the maximum likelihood estimator (MLE) calculated as in the section of the notes linked above.
the sample median, calculated by R function median.
the sample mean, calculated by R function mean.
the 10% trimmed sample mean, calculated by R function mean with optional argument trim = 0.1.

Note: Theory says the sample mean is a very bad estimator. But that's OK. That's one thing a simulation can show.
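As a minimal sketch of how the four estimators might be computed on one simulated data set: the function name mlogl, the seed, and the choice of nlm started at the sample median are assumptions made for illustration, not necessarily what the course notes do.

```r
set.seed(1)
x <- rcauchy(15)

# Minus log likelihood for the Cauchy location family, scale fixed at 1
# (mlogl is a name chosen for this sketch)
mlogl <- function(mu, x) sum(- dcauchy(x, location = mu, log = TRUE))

# Start the optimizer at the sample median, a reasonable preliminary estimate
mle <- nlm(mlogl, median(x), x = x)$estimate

estimates <- c(mle = mle, median = median(x), mean = mean(x),
    trimmed = mean(x, trim = 0.1))
print(estimates)
```

Starting at the sample median matters because the Cauchy likelihood can have more than one local maximum.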

Use sample size n = 15.

Use simulation truth parameter value θ = 0. (In any location family, and in this one in particular, the variance of an equivariant estimator, which all of these are, does not depend on the true unknown parameter value, so the results should be the same regardless of what θ we choose.)
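Equivariance means that shifting every data point by a constant shifts the estimate by the same constant. A small check of this, with an arbitrary shift and seed chosen for illustration (the MLE is left out only to keep the sketch short):

```r
set.seed(2)
x <- rcauchy(15)
shift <- 3.7    # arbitrary constant chosen for this sketch

estimators <- list(median = median, mean = mean,
    trimmed = function(x) mean(x, trim = 0.1))

at_x <- sapply(estimators, function(f) f(x))
at_shifted <- sapply(estimators, function(f) f(x + shift))

# Each difference should equal shift, up to rounding error
print(at_shifted - at_x)
```

Because the estimation error is unchanged by the shift, its distribution is the same for every θ.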

Use at least 10⁴ simulations.

As in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format these nicely in a table using Rmarkdown. As long as you print each number (one estimate of MSE and one estimate of MCSE of MSE for each of the four estimators, which is eight numbers) with a clear indication (maybe a comment, maybe a row or column label in a matrix) of what it is, that's enough.

The R code that makes the tables at the ends of Section 5.5 of those course notes and Section 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.

Be sure to use the principle of common random numbers.
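One way the whole study might be organized is sketched below. Common random numbers here means that every estimator is applied to the same simulated data sets. The seed, the object names, and the use of nlm for the MLE are assumptions of this sketch, and the hidden code in the course notes may differ.

```r
set.seed(42)
n <- 15
nsim <- 1e4
theta0 <- 0

# Minus log likelihood for Cauchy location, scale fixed at 1
mlogl <- function(mu, x) sum(- dcauchy(x, location = mu, log = TRUE))

estimators <- list(
    mle = function(x) nlm(mlogl, median(x), x = x)$estimate,
    median = median,
    mean = mean,
    trimmed = function(x) mean(x, trim = 0.1)
)

# Common random numbers: each row is one simulated data set, and
# every estimator sees exactly the same rows
xsim <- matrix(rcauchy(n * nsim, location = theta0), nrow = nsim)
est <- sapply(estimators, function(f) apply(xsim, 1, f))

# MSE is the average squared error; its MCSE is the standard
# deviation of the squared errors divided by sqrt(nsim)
sqerr <- (est - theta0)^2
result <- rbind(MSE = colMeans(sqerr),
    MCSE = apply(sqerr, 2, sd) / sqrt(nsim))
print(result)
```

The row and column labels of the printed matrix say what each number is.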

Problem 2

The R command


foo <- read.table(url("http://www.stat.umn.edu/geyer/3701/data/2022/q6p2.txt"),
    header = TRUE)

assigns one R object foo, which is a data frame containing one variable x, which is the data for this problem.

The estimator we are going to use in this problem is the 25% trimmed mean (calculated by mean(x, trim = 0.25) if the data are x). What parameter does this estimate? It estimates the 25% trimmed mean of the true unknown distribution of the data. Call that unknown parameter θ.
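For a concrete picture of what the 25% trimmed mean does, here is a toy calculation on made-up numbers (not the quiz data): a fraction 0.25 of the observations is dropped from each end of the sorted data and the rest are averaged.

```r
x <- c(-50, 1, 2, 3, 4, 5, 6, 90)    # made-up data for illustration
n <- length(x)
k <- floor(n * 0.25)                 # number dropped from each end

by_hand <- mean(sort(x)[(k + 1):(n - k)])
built_in <- mean(x, trim = 0.25)
print(c(by_hand = by_hand, built_in = built_in))
```

Here the two smallest and two largest values are dropped, so both calculations average 2, 3, 4, 5.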

Calculate a 95% nonparametric bootstrap t confidence interval for θ using these data and R function boott in R package bootstrap, following the example in the course notes on the bootstrap. Do not supply argument sdfun (because we don't know how to make one).

Use at least 10³ bootstrap samples.

Use at least 200 bootstrap samples for the inner bootstrap that determines the SD function (argument nbootsd).
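A call to boott along these lines might look like the following sketch. The data here are simulated stand-ins (an assumption, so the sketch runs without the download); for the quiz the data would be foo$x. The seed and the object names are also choices made for illustration.

```r
if (requireNamespace("bootstrap", quietly = TRUE)) {
    set.seed(42)
    x <- rexp(40)    # stand-in data, NOT the quiz data

    # The estimator: 25% trimmed mean
    theta <- function(x) mean(x, trim = 0.25)

    # No sdfun supplied, so boott uses its default inner bootstrap
    # with nbootsd resamples to estimate the standard deviation
    out <- bootstrap::boott(x, theta, nbootsd = 200, nboott = 1000,
        perc = c(0.025, 0.975))

    # Lower and upper endpoints of the 95% interval
    ci <- as.numeric(out$confpoints)
    print(ci)
}
```

Argument perc = c(0.025, 0.975) asks only for the two endpoints of a 95% interval rather than the default longer list of percentiles.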

Problem 3

This problem is about parallelization.

We are going to parallelize one of the simulations in problem 1, the one for the sample median.

Parallelize this simulation using R function mclapply in R package parallel following the example in Section 7.1 of the course notes on parallel computing. If you are on Microsoft Windows, you will have to use mc.cores = 1 as explained in those notes.

Do everything as in problem 1 except parallelize using mclapply and use only the sample median, not the other estimators.
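A rough sketch of one way to split the simulation across cores follows. The number of blocks, the number of cores, the seed, and the helper name do_block are assumptions of this sketch, and the example in the course notes may be organized differently.

```r
library(parallel)

# mclapply relies on forking, which is unavailable on Windows
ncores <- if (.Platform$OS.type == "windows") 1 else 2

# Random number generator that gives separate streams to the workers
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

n <- 15
nsim <- 1e4
theta0 <- 0
nblock <- 10    # hypothetical choice: split the work into 10 blocks

# Each task simulates nsim / nblock data sets (one per row)
# and returns their sample medians
do_block <- function(i) {
    xsim <- matrix(rcauchy(n * nsim / nblock, location = theta0), ncol = n)
    apply(xsim, 1, median)
}

est <- unlist(mclapply(seq_len(nblock), do_block, mc.cores = ncores))

sqerr <- (est - theta0)^2
result <- c(MSE = mean(sqerr), MCSE = sd(sqerr) / sqrt(nsim))
print(result)
```

The MSE and MCSE are computed from the combined results exactly as in the serial version.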