Rules
See the Section about Rules for Quizzes and Homeworks on the General Info page.
Your work handed in to Canvas should be an Rmarkdown file with text and code chunks that can be run to produce what you did. We do not take your word for what the output is; we may run it ourselves. But we also want the output.
You may ask questions during the quiz, especially if the wording of a question is confusing or there seems to be an issue with the question, but the instructor will not be giving hints.
You must be in the classroom, Molecular and Cellular Biology 2-120, to take the quiz.
Quizzes must be uploaded by the end of class (1:10). Canvas should actually accept uploads for a few minutes after that, but quizzes not uploaded by 1:10 will be marked late. Here is the link for uploading this quiz https://canvas.umn.edu/courses/330843/assignments/2880927.
Quiz 6
Problem 1
This is a simulation study (course notes about simulation).
We are going to study estimators for the Cauchy location family, for which we did maximum likelihood in Section 5.3.3 of the second course notes on models.
R function rcauchy simulates random variates from this distribution. R function dcauchy evaluates the probability density function of this distribution (illustrated in the section of the notes linked just above).
In all cases, we are going to assume the scale parameter is known and equal to the default value (1.0). The unknown parameter to be estimated is the location parameter (called location by the R functions mentioned above).
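A minimal sketch of the two R functions just mentioned, leaving the scale at its default of 1. The seed is arbitrary and only there for reproducibility; the last line shows the log = TRUE form, which is convenient for writing log likelihoods.

```r
# Simulate a sample and evaluate the Cauchy density (scale = 1 by default)
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- rcauchy(15, location = 0)     # n = 15 draws with location theta = 0
x

# The standard Cauchy density at its center is 1 / pi, about 0.3183099
d0 <- dcauchy(0, location = 0)
d0

# log = TRUE returns log densities; summing them gives the log likelihood
sum(dcauchy(x, location = 0, log = TRUE))
```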
The estimators we want to compare are
- the maximum likelihood estimator (MLE), calculated as in the section of the notes linked above.
- the sample median, calculated by R function median.
- the sample mean, calculated by R function mean.
- the 10% trimmed sample mean, calculated by R function mean with optional argument trim = 0.1.
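A sketch of computing all four estimators on one simulated sample. The helper name cauchy_mle is hypothetical, and minimizing minus the log likelihood with nlm starting at the sample median is an assumed approach that may differ in details from the one in the notes; check it against Section 5.3.3 before relying on it.

```r
# Hypothetical helper: MLE of the Cauchy location parameter (scale known = 1),
# found by minimizing minus the log likelihood with nlm, started at the median
cauchy_mle <- function(x) {
    mlogl <- function(mu) sum(- dcauchy(x, location = mu, log = TRUE))
    nlm(mlogl, median(x))$estimate
}

set.seed(42)          # arbitrary seed
x <- rcauchy(15)      # one sample: n = 15, location 0, scale 1

# The four estimators applied to the same data
est <- c(MLE = cauchy_mle(x),
    median = median(x),
    mean = mean(x),
    trimmed.mean = mean(x, trim = 0.1))
est
```

Starting nlm at the median matters because the Cauchy likelihood can have several local maxima, and the median is already a good estimate.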
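Note: Theory says the sample mean is a very bad estimator here: the Cauchy distribution has no mean, and the sample mean of n independent Cauchy random variables has the same Cauchy distribution as a single observation, so it does not improve as n grows. But that's OK. That's one thing a simulation can show.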
Use sample size n = 15.
Use simulation truth parameter value θ = 0. (In any location family, and in this one in particular, the sampling distribution of θ̂ − θ for an equivariant estimator θ̂, which all of these are, does not depend on the true unknown parameter value θ. So its variance and MSE, and hence the results, should be the same regardless of which θ we choose.)
Use at least 10⁴ simulations.
Like in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format this nicely as a table using Rmarkdown. As long as you print each number (one estimate of MSE and one estimate of MCSE of MSE for each estimator, which is eight numbers for the four estimators) with a clear indication of what it is (maybe a comment, maybe a row or column label in a matrix), that's enough.
The R code that makes the tables at the ends of Sections 5.5 and 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.
Be sure to use the principle of common random numbers.
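A sketch of the whole Problem 1 simulation, under the assumptions stated in the comments. Common random numbers here means that each simulated sample x is used by all four estimators, so differences in MSE are not blurred by different data. The MSE estimate is the average squared error, and its MCSE is the standard deviation of the squared errors divided by the square root of the number of simulations. The helper cauchy_mle is hypothetical and its nlm approach is an assumption; the hidden code in the notes may be organized differently.

```r
set.seed(42)     # arbitrary seed
nsim <- 1e4      # number of simulations
n <- 15          # sample size
theta <- 0       # simulation truth

# Hypothetical helper: MLE via nlm on minus the log likelihood, started at median
cauchy_mle <- function(x) {
    mlogl <- function(mu) sum(- dcauchy(x, location = mu, log = TRUE))
    nlm(mlogl, median(x))$estimate
}

est <- matrix(NA_real_, nsim, 4,
    dimnames = list(NULL, c("MLE", "median", "mean", "trimmed.mean")))
for (i in 1:nsim) {
    # common random numbers: one sample shared by all four estimators
    x <- rcauchy(n, location = theta)
    est[i, ] <- c(cauchy_mle(x), median(x), mean(x), mean(x, trim = 0.1))
}

# MSE = mean squared error; MCSE of MSE = sd of squared errors / sqrt(nsim)
sqerr <- (est - theta)^2
mse <- colMeans(sqerr)
mcse <- apply(sqerr, 2, sd) / sqrt(nsim)
tab <- rbind(MSE = mse, MCSE = mcse)
print(tab)
```

Expect the column for the sample mean to be enormous and unstable from seed to seed: since the true MSE of the sample mean is infinite, its "MCSE" is not meaningful either.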
Problem 2
The R command
foo <- read.table(url("http://www.stat.umn.edu/geyer/3701/data/2022/q6p2.txt"), header = TRUE)
assigns one R object foo, which is a data frame containing one variable x, which is the data for this problem.
The estimator we are going to use in this problem is the 25% trimmed mean (calculated by mean(x, trim = 0.25) if the data are x). What parameter does this estimate?
It estimates the 25% trimmed mean of the true unknown distribution of the data. Call that unknown parameter θ.
Calculate a 95% nonparametric bootstrap t confidence interval for θ using these data and R function boott in R package bootstrap, without supplying argument sdfun (because we don't know how to make one), following the example in the course notes on the bootstrap.
Use at least 10³ bootstrap samples (argument nboott).
Use at least 200 bootstrap samples for the inner bootstrap that determines the SD function (argument nbootsd).
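A sketch of the boott call, assuming CRAN package bootstrap is installed. When sdfun is not supplied, boott falls back on its default, which estimates the standard deviation by an inner bootstrap of size nbootsd. The data below are stand-in Cauchy draws for illustration only, not the quiz data; for the quiz, set x <- foo$x after running the read.table command above.

```r
library(bootstrap)

# Stand-in data for illustration only (NOT the quiz data)
set.seed(42)
x <- rcauchy(30)

# The estimator: 25% trimmed mean
theta <- function(x) mean(x, trim = 0.25)

# Bootstrap t interval with the default (inner bootstrap) sdfun:
# nboott = outer bootstrap samples, nbootsd = inner bootstrap samples
bout <- boott(x, theta, nboott = 1e3, nbootsd = 200,
    perc = c(0.025, 0.975))
ci <- bout$confpoints
ci
```

The nested bootstrap costs about nboott × nbootsd evaluations of theta (here 200,000), which is why the inner bootstrap size is kept modest.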
Problem 3
This problem is about parallelization.
We are going to parallelize one of the simulations in Problem 1, the one for the sample median.
Parallelize this simulation using R function mclapply in R package parallel, following the example in Section 7.1 of the course notes on parallel computing. If you are on Microsoft Windows, you will have to use mc.cores = 1, as explained in those notes.
Do everything as in Problem 1, except parallelize using mclapply and use only the sample median, not the other estimators.
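A sketch of the parallel median simulation. The "L'Ecuyer-CMRG" generator gives each forked worker its own random number stream, so results are reproducible for a fixed seed and a fixed number of cores. Running one simulation per list element is an assumed layout (mclapply prescheduling groups them into one chunk per core); the notes may batch the work differently.

```r
library(parallel)

# Parallel-safe RNG so the simulation is reproducible
RNGkind("L'Ecuyer-CMRG")
set.seed(42)     # arbitrary seed

nsim <- 1e4
n <- 15
theta <- 0

# Windows cannot fork, so mc.cores must be 1 there
ncores <- if (.Platform$OS.type == "windows") 1L else
    max(1L, detectCores(), na.rm = TRUE)

# Each list element simulates one sample and returns its sample median
meds <- mclapply(seq_len(nsim),
    function(i) median(rcauchy(n, location = theta)),
    mc.cores = ncores)
meds <- unlist(meds)

# MSE estimate and its MCSE, as in Problem 1
sqerr <- (meds - theta)^2
res <- c(MSE = mean(sqerr), MCSE = sd(sqerr) / sqrt(nsim))
res
```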