Statistics 3701 (Geyer, Spring 2017) Practice Problems 6

Rules

No rules. This is practice.

Grades

No grades. This is practice.

Disclaimer

These practice problems are supplied without any guarantee that they will help you do the quiz problems. However, they were written after the quiz problems were written and with the intention that they would help.

These practice problems are also supplied without any guarantee that they are exactly or even nearly like the quiz problems. However, they are like at least some quiz problems in at least some respects.

Problem 1

This problem has no data. It is a simulation study.

We are going to compare the MLE for the two-parameter gamma distribution, described in Section 6.2 of the course notes on the bootstrap, with the so-called method of moments estimators, which those notes describe as good starting estimates.

The R code for the method of moments estimators is


alpha.start <- mean(x)^2 / var(x)
beta.start <- var(x) / mean(x)

(copied from the course notes). You have to figure out how to calculate the MLE (although this section of the course notes also illustrates that).

Use sample size n = 50.

Use simulation truth parameter values α = 2.5 and β = 3.5, where α is the shape parameter and β is the scale parameter (the scale argument of the R function dgamma, not the rate argument).
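
A minimal sketch of one simulated data set and its method of moments estimates, assuming that rgamma with the shape and scale arguments is how the simulation truth is generated. The seed and the names alpha.true and beta.true are illustrative choices, not taken from the course notes.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
n <- 50
alpha.true <- 2.5                  # shape (illustrative name)
beta.true <- 3.5                   # scale, not rate (illustrative name)
x <- rgamma(n, shape = alpha.true, scale = beta.true)

# method of moments: solve mean = alpha * beta and variance = alpha * beta^2
alpha.start <- mean(x)^2 / var(x)
beta.start <- var(x) / mean(x)
c(alpha = alpha.start, beta = beta.start)
```

By construction the product alpha.start * beta.start reproduces the sample mean, which is a quick check that the scale (rather than rate) parameterization is the one in use.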

Use at least 10⁴ simulations.

As in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format these nicely in a table using Rmarkdown. As long as you print each number (one estimate of MSE and one estimate of MCSE of MSE for each estimator and each parameter, which is eight numbers: two estimators for each of two parameters, and MSE and MCSE for each) with a clear indication (maybe a comment, maybe a row or column label in a matrix) of what it is, that's enough.
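
One way the MSE and its MCSE might be computed from a vector of simulated estimates is sketched below. It assumes the usual recipe: the MSE estimate is the average squared error over simulations, and its MCSE is the standard deviation of the squared errors divided by the square root of the number of simulations. The function name mse.mcse and the toy normal "estimates" are made up for illustration.

```r
# est: vector of simulated estimates; truth: the simulation truth value
mse.mcse <- function(est, truth) {
    sqerr <- (est - truth)^2
    c(MSE = mean(sqerr), MCSE = sd(sqerr) / sqrt(length(sqerr)))
}

# toy check with made-up estimates (not gamma estimators)
set.seed(42)
toy <- rnorm(1e4, mean = 2.5, sd = 0.5)
mse.mcse(toy, truth = 2.5)
```

For the toy data the true MSE is 0.5^2 = 0.25, so the printed MSE should be within a few MCSE of that.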

The R code that makes the tables at the ends of Section 5.5 of those course notes and Section 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.

Be sure to use the principle of common random numbers.
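
A sketch of what common random numbers means here: each simulated data set is generated once and then handed to both estimators, so the two are compared on exactly the same data. The sketch makes several assumptions that are not from the course notes: the MLE is found by nlm applied to minus the log likelihood, the optimization is done in terms of the logs of the parameters (to keep them positive), and nsim is kept small so the sketch runs quickly (the problem asks for at least 10⁴).

```r
set.seed(42)
n <- 50
nsim <- 200          # small for speed; the problem asks for at least 1e4
alpha.true <- 2.5
beta.true <- 3.5

# minus log likelihood in terms of eta = log(c(alpha, beta))
mlogl <- function(eta, x)
    - sum(dgamma(x, shape = exp(eta[1]), scale = exp(eta[2]), log = TRUE))

est <- matrix(NA_real_, nsim, 4,
    dimnames = list(NULL, c("alpha.mom", "beta.mom", "alpha.mle", "beta.mle")))
for (i in 1:nsim) {
    # one simulated data set, used by BOTH estimators (common random numbers)
    x <- rgamma(n, shape = alpha.true, scale = beta.true)
    mom <- c(mean(x)^2 / var(x), var(x) / mean(x))
    nout <- nlm(mlogl, log(mom), x = x)
    est[i, ] <- c(mom, exp(nout$estimate))
}

# squared errors, then MSE and MCSE of MSE for each column
truth <- rep(c(alpha.true, beta.true), times = 2)
sqerr <- sweep(est, 2, truth)^2
result <- rbind(MSE = colMeans(sqerr),
    MCSE = apply(sqerr, 2, sd) / sqrt(nsim))
result
```

The row and column labels of the matrix serve as the clear indication of what each number is. A careful version would also check the convergence code returned by nlm, which this sketch does not do.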

Problem 2

The R command


x <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/boot1.csv")$x

assigns one R object x, which is the data vector for this problem. These data were also used for Section 6.1 of the course notes on the bootstrap, but unlike in those notes we are going to investigate different estimators.

We are going to estimate the population median θ.
We are going to estimate it using the sample median, which is calculated by the R function median.
We are going to use the sample median absolute deviation from the median (MAD), which is calculated by the R function mad, as the estimator of scale, that is, as what the R function boott in the CRAN package bootstrap calls the sdfun.

These estimators are highly robust. Both tolerate up to 50% outliers or gross errors in the data without becoming useless.

Calculate a 95% bootstrap t confidence interval using these data and these estimators.

Use the method of the first example in Section 6.1.4 of the course notes on the bootstrap.

Do not use the method of the second example in that section where the sdfun argument is omitted.

You could also use the method of Section 6.1.3 of those notes, but why? It is a lot more code to fuss with.

Use at least 10⁴ bootstrap samples.
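
A hedged sketch of how such a call to boott might look, assuming the documented interface of the CRAN package bootstrap, in which sdfun is called as sdfun(x, nbootsd, theta, ...) and the result has a confpoints component. The wrapper name boot.t.median and the stand-in data are made up for illustration, and this is not necessarily the code used in the course notes.

```r
# x: data vector; nboot: number of bootstrap samples; conf.level: coverage
boot.t.median <- function(x, nboot = 1e4, conf.level = 0.95) {
    alpha <- 1 - conf.level
    bout <- bootstrap::boott(x, theta = median,
        # boott calls sdfun(x, nbootsd, theta, ...); only x is needed here
        sdfun = function(x, nbootsd, theta, ...) mad(x),
        nboott = nboot, perc = c(alpha / 2, 1 - alpha / 2))
    drop(bout$confpoints)
}

# usage on made-up stand-in data (replace with the x read from boot1.csv)
if (requireNamespace("bootstrap", quietly = TRUE)) {
    set.seed(42)
    x.fake <- rexp(30)
    print(boot.t.median(x.fake))
}
```

Because sdfun is supplied, boott does not need an inner bootstrap to estimate the scale, so 10⁴ bootstrap samples run quickly. If a bootstrap sample had MAD equal to zero the corresponding t statistic would be undefined, but that should be very unlikely for continuous data.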

Problem 3

This problem is about two-parameter maximum likelihood, which was covered in Sections 6–9 of the course notes on models, Part II.

In particular, it is about likelihood confidence intervals, exemplified in Section 9.3.3 of those course notes.

However, in order to not use a distribution already used in the course notes, we are going to use the so-called hyperbolic secant distribution described in the Wikipedia article on that distribution (which you should not need to read to do this problem).

The PDF of the standard hyperbolic secant distribution is given by

f(x) = 1 ⁄ [2 * cosh(π x ⁄ 2)]

where cosh is the so-called hyperbolic cosine that is calculated by the R function cosh and is described in the Wikipedia article on hyperbolic functions (which you should not need to read to do this problem). It is given by

cosh(x) = [exp(x) + exp(− x)] ⁄ 2

R does not have a function to carefully calculate the log of this function but we can define one as follows.


lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)
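
A short illustration of why the careful version matters. Since cosh(x) = exp(|x|) [1 + exp(− 2 |x|)] ⁄ 2, taking logs gives the formula in lcosh, which never exponentiates a large positive number. The naive log(cosh(x)) overflows for large |x|. The test values below are arbitrary.

```r
lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)

x <- c(0, 1, 10, 1000)
log(cosh(x))    # last element is Inf because cosh(1000) overflows
lcosh(x)        # last element is finite, approximately 1000 - log(2)
```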

The statistical model we are interested in is the location-scale family that is generated by this standard distribution. It has PDF given by

f_{μ, σ}(x) = f([x − μ] ⁄ σ) ⁄ |σ|
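
Taking logs of this PDF gives log f_{μ, σ}(x) = − log(2) − lcosh(π [x − μ] ⁄ [2 σ]) − log |σ|. A minimal sketch of that log density and of minus the log likelihood follows. The names ldhsec and mlogl, and the values of μ and σ in the check, are hypothetical choices for illustration.

```r
lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)

# log of f_{mu, sigma}(x); the name ldhsec is hypothetical
ldhsec <- function(x, mu = 0, sigma = 1)
    - log(2) - lcosh(pi * (x - mu) / (2 * sigma)) - log(abs(sigma))

# minus log likelihood with theta = c(mu, sigma), in a form nlm can minimize
mlogl <- function(theta, x) - sum(ldhsec(x, theta[1], theta[2]))

# sanity check: the density integrates to one
integrate(function(x) exp(ldhsec(x, mu = 3, sigma = 2)), -Inf, Inf)
```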

where f is the PDF of the standard hyperbolic secant distribution defined above.

This is a symmetric distribution. The population median is the center of symmetry, which is the location parameter. The interquartile range (IQR) of the standard hyperbolic secant distribution is 1.1222 (by calculations I won't show because you do not need to see them to do this problem).

Hence the sample median is a good estimator of location and the sample IQR divided by 1.1222 is a good estimator of scale.
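
For the curious, the constant 1.1222 can be checked in a few lines. The sketch below uses the CDF of the standard hyperbolic secant distribution, F(x) = (2 ⁄ π) atan(exp(π x ⁄ 2)), which is not stated above but follows from integrating the PDF. The function name qhsec and the simulated data (with made-up μ = 3 and σ = 2) are illustrative only.

```r
# quantile function of the standard distribution, from inverting
# its CDF F(x) = (2 / pi) * atan(exp(pi * x / 2))
qhsec <- function(p) (2 / pi) * log(tan(pi * p / 2))
iqr.std <- qhsec(0.75) - qhsec(0.25)
iqr.std

# simulated data by inverse-CDF sampling (made-up mu and sigma)
set.seed(42)
x.sim <- 3 + 2 * qhsec(runif(1000))
mu.start <- median(x.sim)
sigma.start <- IQR(x.sim) / 1.1222
c(mu = mu.start, sigma = sigma.start)
```

Starting values computed this way should land near the parameter values used to simulate the data, which is what makes them reasonable starting points for an optimizer.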