Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.

Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then save as and choose the format to be Text (.txt) or something like that. Then upload the saved plain text file.

Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.

If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that https://ay16.moodle.umn.edu/mod/forum/view.php?id=1368653.

You must be in the classroom, Armory 202, while taking the quiz.

Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1368662.

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1368665.

Quiz 6

Problem 1

This is a simulation study (course notes about simulation).

We are going to study almost but not exactly the same model as in (Section 5 of those course notes). The model for this problem is normal with mean θ and variance θ⁴ (not θ² as in the example in the notes).

The estimators we want to compare are the sample mean (calculated by mean(x) if the data are x) and the signed square root of the sample standard deviation (calculated by sign(mean(x)) * sqrt(sd(x)) if the data are x). Just those two estimators.

Use sample size n = 15.

Use simulation truth parameter value θ = e (that is, theta <- exp(1)).

Use at least 10⁴ simulations.

As in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format these nicely in a table using Rmarkdown. As long as you print each number (one estimate of MSE and one estimate of MCSE of MSE for each estimator, which is four numbers) with a clear indication (maybe a comment, maybe a row or column label in a matrix) of what it is, that's enough.

The R code that makes the tables at the ends of Section 5.5 of those course notes and Section 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.

Be sure to use the principle of common random numbers.
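A minimal sketch of one way to organize this simulation (the seed and the loop structure are choices, not requirements; any equivalent organization is fine):

```r
set.seed(42)       # any seed; just makes your run reproducible
n <- 15
theta <- exp(1)
nsim <- 1e4

thetahat.mean <- double(nsim)
thetahat.sd <- double(nsim)
for (i in 1:nsim) {
    x <- rnorm(n, mean = theta, sd = theta^2)   # variance theta^4
    # common random numbers: both estimators applied to the SAME data x
    thetahat.mean[i] <- mean(x)
    thetahat.sd[i] <- sign(mean(x)) * sqrt(sd(x))
}

# MSE and its Monte Carlo standard error for each estimator
mse <- function(thetahat) mean((thetahat - theta)^2)
mcse <- function(thetahat) sd((thetahat - theta)^2) / sqrt(nsim)
rbind(mean = c(mse = mse(thetahat.mean), mcse = mcse(thetahat.mean)),
      sd   = c(mse = mse(thetahat.sd),  mcse = mcse(thetahat.sd)))
```

The principle of common random numbers is satisfied here because each simulated data vector x is used for both estimators before the next one is generated.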

Problem 2

The R command


x <- read.csv(url("http://www.stat.umn.edu/geyer/5102/data/prob7-1.txt"))$x

assigns one R object x, which is the data vector for this problem.

These same data were also used in Section 5 of the course notes on models, Part II and in Section 9 of those course notes, but unlike in those notes we are going to investigate different estimators.

The estimators of location and scale we are going to use are the 25% trimmed mean (calculated by mean(x, trim = 0.25) if the data are x) and the interquartile range (calculated by IQR(x) if the data are x), respectively. Both are highly robust, tolerating up to 25% gross outliers or errors in the data.

Calculate a 95% bootstrap t confidence interval using these data and these estimators (the trimmed mean is the estimator of the parameter of interest and the IQR is the sdfun).

Use the method of the first example in Section 6.1.4 of the course notes on the bootstrap.

Do not use the method of the second example in that section where the sdfun argument is omitted.

You could also use the method of Section 6.1.3 of those notes, but why? It is a lot more code to fuss with.

Use at least 10⁴ bootstrap samples.
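A minimal sketch of the bootstrap t computation done by hand (the code in the course notes may be organized differently; this just shows the shape of the calculation):

```r
x <- read.csv(url("http://www.stat.umn.edu/geyer/5102/data/prob7-1.txt"))$x

set.seed(42)       # any seed; just makes your run reproducible
nboot <- 1e4
theta.hat <- mean(x, trim = 0.25)   # point estimate
sd.hat <- IQR(x)                    # the sdfun evaluated at the data

z.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x, replace = TRUE)
    z.star[i] <- (mean(x.star, trim = 0.25) - theta.hat) / IQR(x.star)
}

# bootstrap t interval: note the quantiles swap ends
crit <- quantile(z.star, c(0.975, 0.025))
theta.hat - crit * sd.hat
```

The upper quantile of z.star gives the lower endpoint of the interval and vice versa, which is why the quantiles appear in that order.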

Problem 3

This problem is about two-parameter maximum likelihood, which was covered in Sections 6–9 of the course notes on models, Part II.

In particular, it is about Wald-type confidence intervals, exemplified in Section 9.3.1 of those course notes and Section 9.3.2 of those course notes.

However, in order to not use a distribution already used in the course notes, we are going to use the logistic distribution (which gives logistic regression its name).

The R function dlogis calculates the PDF of this distribution and works like all other d functions for distributions R knows about. The help page help("dlogis") gives a mathematical equation for the PDF but you shouldn't need it for this problem (you can use it if you think you need to).

This is a symmetric distribution. The population median is the center of symmetry, which is the location parameter. The interquartile range (IQR) of the standard logistic distribution is


> qlogis(0.75) - qlogis(0.25)
[1] 2.197225

times the scale parameter (the number above is for the default value of the scale parameter, which is one). We can see this from

> foo <- rexp(5)
> foo
[1] 1.3009717 0.1212703 0.2030711 0.6550454 0.9039590
> (qlogis(0.75, scale = foo) - qlogis(0.25, scale = foo)) / foo
[1] 2.197225 2.197225 2.197225 2.197225 2.197225

Hence the sample median is a good estimator of location and the IQR divided by qlogis(0.75) - qlogis(0.25) is a good estimator of scale.

The R command


x <- read.csv(url("http://www.stat.umn.edu/geyer/s17/3701/data/q6p3.csv"))$x

assigns one R object x, which is the data vector for this problem.

  1. Use the estimators described above (the sample median for location and the IQR divided by qlogis(0.75) - qlogis(0.25) for scale) as starting values to find the MLE of both parameters (location and scale) for these data, assuming they are IID logistic distributed.
  2. Provide Wald-type 95% confidence intervals for both parameters (location and scale), using observed Fisher information to compute standard errors for the point estimates.
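A minimal sketch of both parts (using optim with Nelder-Mead, which is one reasonable choice here, not the only one; guarding against nonpositive scale is a detail of this sketch):

```r
x <- read.csv(url("http://www.stat.umn.edu/geyer/s17/3701/data/q6p3.csv"))$x

# minus the log likelihood for IID logistic data,
# theta = c(location, scale)
mlogl <- function(theta, x) {
    if (theta[2] <= 0) return(Inf)   # keep optim away from bad scale values
    - sum(dlogis(x, location = theta[1], scale = theta[2], log = TRUE))
}

# starting values from the robust estimators described above
theta.start <- c(median(x), IQR(x) / (qlogis(0.75) - qlogis(0.25)))

oout <- optim(theta.start, mlogl, x = x, hessian = TRUE)
theta.hat <- oout$par                # MLE of (location, scale)

# observed Fisher information is the hessian of minus the log likelihood;
# its inverse estimates asymptotic variance, so SEs are sqrt of diagonal
se <- sqrt(diag(solve(oout$hessian)))

# Wald-type 95% confidence intervals
cbind(lower = theta.hat - qnorm(0.975) * se,
      upper = theta.hat + qnorm(0.975) * se)
```

The hessian returned by optim is for the function being minimized, which is minus the log likelihood, so it is the observed Fisher information with no sign change needed.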