Rules
No rules. This is practice.
Grades
No grades. This is practice.
Disclaimer
These practice problems are supplied without any guarantee that they will help you do the quiz problems. However, they were written after the quiz problems were written and with the intention that they would help.
These practice problems are also supplied without any guarantee that they are exactly or even nearly like the quiz problems. However, they are like at least some quiz problems in at least some respects.
Problem 1
This problem has no data. It is a simulation study.
We are going to compare the MLE for the two-parameter gamma distribution described in Section 6.2 of the course notes on the bootstrap versus the so-called method of moments estimators, which are described as good starting estimates.
The R code for the method of moments estimators is
    alpha.start <- mean(x)^2 / var(x)
    beta.start <- var(x) / mean(x)
(copied from the course notes). You have to figure out how to calculate the MLE (although this section of the course notes also illustrates that).
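One way to get the MLE from these starting values is sketched below. This is not copied from the course notes: the data are simulated stand-ins, and the function name mlogl and the choice to optimize over the logs of the parameters (so the optimizer cannot step to negative values) are assumptions made for this illustration.

```r
set.seed(42)
# Made-up data for illustration only
x <- rgamma(50, shape = 2.5, scale = 3.5)

# Method of moments starting values, as given in the problem
alpha.start <- mean(x)^2 / var(x)
beta.start <- var(x) / mean(x)

# Minus the log likelihood; eta = log(c(shape, scale))
mlogl <- function(eta, x)
    - sum(dgamma(x, shape = exp(eta[1]), scale = exp(eta[2]), log = TRUE))

nout <- nlm(mlogl, log(c(alpha.start, beta.start)), x = x)

# Back-transform to the original parameterization
mle <- exp(nout$estimate)
names(mle) <- c("alpha", "beta")
print(nout$code)   # 1 or 2 indicates probable convergence
print(mle)
```

It is worth checking nout$code before trusting the result.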
Use sample size n = 50.
Use simulation truth parameter values α = 2.5 and β = 3.5 (where α is the shape parameter and β is the scale parameter, which is also what the R function dgamma calls the scale parameter).
Use at least 10^4 simulations.
As in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format these nicely in a table using Rmarkdown. It is enough to print each number with a clear indication (maybe a comment, maybe a row or column label in a matrix) of what it is. There are eight numbers: one estimate of MSE and one estimate of the MCSE of that MSE for each of the two estimators and each of the two parameters.
The R code that makes the tables at the ends of Section 5.5 of those course notes and Section 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.
Be sure to use the principle of common random numbers.
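The whole simulation study might be organized as in the following sketch. It is one possible layout, not the hidden code from the course notes. The names mlogl, estimators, and result are made up for this illustration, and the MLE is computed on the log scale as an implementation choice. Common random numbers here means that both estimators are applied to the same simulated data set in every iteration.

```r
set.seed(42)
n <- 50
nsim <- 1e4
alpha <- 2.5
beta <- 3.5

# Minus the log likelihood; eta = log(c(shape, scale))
mlogl <- function(eta, x)
    - sum(dgamma(x, shape = exp(eta[1]), scale = exp(eta[2]), log = TRUE))

# Both estimators applied to the SAME data x (common random numbers)
estimators <- function(x) {
    mom <- c(mean(x)^2 / var(x), var(x) / mean(x))
    nout <- nlm(mlogl, log(mom), x = x)
    c(mom, exp(nout$estimate))
}

# 4 x nsim matrix: rows are alpha (MoM), beta (MoM), alpha (MLE), beta (MLE)
est <- replicate(nsim, estimators(rgamma(n, shape = alpha, scale = beta)))

# Squared errors; truth recycles down the columns of est
truth <- c(alpha, beta, alpha, beta)
sqerr <- (est - truth)^2

# MSE is the mean squared error; its MCSE is sd / sqrt(nsim)
result <- rbind(MSE = rowMeans(sqerr),
    MCSE = apply(sqerr, 1, sd) / sqrt(nsim))
colnames(result) <- c("alpha.mom", "beta.mom", "alpha.mle", "beta.mle")
print(result)
```

The labeled matrix printed at the end contains all of the MSE and MCSE values the problem asks for.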
Problem 2
The R command
    x <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/boot1.csv")$x
assigns one R object x, which is the data vector for this problem. These data were also used for Section 6.1 of the course notes on the bootstrap, but unlike in those notes we are going to investigate different estimators.
- We are going to estimate the population median θ.
- We are going to estimate it using the sample median, which is calculated by the R function median.
- We are going to use the sample median absolute deviation from the median (MAD), which is calculated by the R function mad, as an estimator of scale, that is, as what the R function boott in the CRAN package bootstrap calls the sdfun.
Calculate a 95% bootstrap t confidence interval using these data and these estimators.
Use the method of the first example in Section 6.1.4 of the course notes on the bootstrap.
Do not use the method of the second example in that section where the sdfun argument is omitted.
You could also use the method of Section 6.1.3 of those notes, but why? It is a lot more code to fuss with.
Use at least 10^4 bootstrap samples.
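A call to boott along these lines might look like the sketch below. It is not the code from the course notes. The data are simulated stand-ins rather than the data for this problem, and the names x.demo, mad.sdfun, and ci are made up for this illustration. It assumes the CRAN package bootstrap is installed, and it relies on the documented convention that boott calls sdfun with arguments (x, nbootsd, theta, ...), of which only x is used here.

```r
set.seed(42)
# Made-up stand-in data; replace with the data for this problem
x.demo <- rexp(40)

nboot <- 1e4
conf.level <- 0.95
alpha <- 1 - conf.level

# boott calls sdfun(x, nbootsd, theta, ...); the extra arguments are ignored
mad.sdfun <- function(x, nbootsd, theta, ...) mad(x)

if (requireNamespace("bootstrap", quietly = TRUE)) {
    bout <- bootstrap::boott(x.demo, theta = median, sdfun = mad.sdfun,
        nboott = nboot, perc = c(alpha / 2, 1 - alpha / 2))
    # confpoints holds the interval endpoints for the requested perc
    ci <- as.vector(bout$confpoints)
    print(ci)
}
```

Because the bootstrap t procedure simulates the distribution of the studentized quantity, the scale estimator only has to track the variability of the sample median up to a constant factor. That is why MAD can serve as the sdfun even though it is not itself a standard error.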
Problem 3
This problem is about two-parameter maximum likelihood, which was covered in Sections 6–9 of the course notes on models, Part II.
In particular, it is about likelihood confidence intervals, exemplified in Section 9.3.3 of those course notes.
However, in order to not use a distribution already used in the course notes, we are going to use the so-called hyperbolic secant distribution described in this Wikipedia article (which you should not need to read to do this problem).
The PDF of the standard hyperbolic secant distribution is given by
$$ f(x) = \frac{1}{2 \cosh(\pi x / 2)} $$
where cosh is the so-called hyperbolic cosine that is calculated by the R function cosh and is described in this Wikipedia article (which you should not need to read to do this problem).
It is given by
$$ \cosh(x) = \frac{e^x + e^{-x}}{2}. $$
R does not have a function to carefully calculate the log of this function but we can define one as follows.
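    lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)
The statistical model we are interested in is the location-scale family that is generated by this standard distribution. It has PDF given by
$$ f_{\mu, \sigma}(x) = \frac{1}{\sigma} f\left( \frac{x - \mu}{\sigma} \right) $$
where f is the PDF of the standard hyperbolic secant distribution defined above, μ is the location parameter, and σ is the scale parameter.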
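To see why the careful version of log cosh matters, and to check the location-scale density, here is a small sketch. It assumes the standard PDF is 1 / (2 cosh(πx / 2)), the usual form of the hyperbolic secant distribution. The function name ldhsec and the argument names mu and sigma are made up for this illustration.

```r
lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)

# Agrees with the naive formula where the naive formula works
xs <- seq(-5, 5, length.out = 101)
agree <- isTRUE(all.equal(lcosh(xs), log(cosh(xs))))
print(agree)

# For large arguments cosh overflows, but lcosh stays finite
naive <- log(cosh(1000))
careful <- lcosh(1000)
print(c(naive = naive, careful = careful))

# Log density of the location-scale family
# (mu is the location, sigma is the scale; names assumed here)
ldhsec <- function(x, mu, sigma)
    - log(2) - log(sigma) - lcosh(pi * (x - mu) / (2 * sigma))

# A PDF should integrate to one, whatever the parameter values
total <- integrate(function(x) exp(ldhsec(x, mu = 3, sigma = 2)),
    -Inf, Inf)$value
print(total)
```

Working on the log scale throughout means the log likelihood never has to take the log of an overflowed or underflowed number.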
This is a symmetric distribution. The population median is the center of symmetry, which is the location parameter. The interquartile range (IQR) of the standard hyperbolic secant distribution is 1.1222 (by calculations I won't show because you do not need to see them to do this problem).
Hence the sample median is a good estimator of location and the IQR divided by 1.1222 is a good estimator of scale.
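The constant 1.1222 can be checked with a short calculation. The sketch below assumes the standard PDF is 1 / (2 cosh(πx / 2)), for which the distribution function is (2 / π) arctan(exp(πx / 2)) and the quantile function is (2 / π) log(tan(πp / 2)). The names dhsec and qhsec are made up here (R has no such functions), and the data are simulated stand-ins.

```r
lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)

# Standard hyperbolic secant PDF and quantile function (names assumed)
dhsec <- function(x) exp(- log(2) - lcosh(pi * x / 2))
qhsec <- function(p) (2 / pi) * log(tan(pi * p / 2))

# IQR of the standard distribution
iqr.std <- qhsec(0.75) - qhsec(0.25)
print(round(iqr.std, 4))

# Cross-check the upper quartile by numerical integration of the PDF
check <- integrate(dhsec, -Inf, qhsec(0.75))$value
print(check)

# Starting values from made-up data with location 10 and scale 2,
# simulated by the inverse-CDF method
set.seed(42)
x.demo <- 10 + 2 * qhsec(runif(100))
theta.start <- c(median(x.demo), IQR(x.demo) / iqr.std)
print(theta.start)
```

The starting values should land near the simulation truth of 10 and 2, which is all that is asked of starting values.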
The R command
    x <- read.csv(url("http://www.stat.umn.edu/geyer/3701/data/p6p3.csv"))$x
assigns one R object x, which is the data vector for this problem.
- Use these starting values (the sample median for location and the IQR divided by 1.1222 for scale) to find the MLE of both parameters (location and scale) for these data assuming they are IID hyperbolic secant distributed.
- Provide likelihood-based 95% confidence intervals for both parameters.
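Both steps could be sketched as follows, using simulated stand-in data. The intervals below are profile likelihood intervals: for each parameter, the set of values at which twice the drop in the profile log likelihood from its maximum is at most the chi-square critical value with one degree of freedom. Whether this matches the construction in Section 9.3.3 of the course notes in every detail is an assumption. All function names other than lcosh are made up for this illustration, as are the bracketing intervals passed to optimize and uniroot.

```r
lcosh <- function(x) abs(x) + log1p(exp(- 2 * abs(x))) - log(2)
qhsec <- function(p) (2 / pi) * log(tan(pi * p / 2))

# Made-up data (location 10, scale 2); replace with the problem data
set.seed(42)
x <- 10 + 2 * qhsec(runif(100))

# Minus the log likelihood; theta = c(location, scale)
mlogl <- function(theta, x) {
    if (theta[2] <= 0) return(Inf)
    sum(log(2) + log(theta[2]) + lcosh(pi * (x - theta[1]) / (2 * theta[2])))
}

theta.start <- c(median(x), IQR(x) / 1.1222)
nout <- nlm(mlogl, theta.start, x = x)
theta.hat <- nout$estimate

# Minus the profile log likelihood: fix parameter number `which` at
# `value` and minimize over the other parameter
profile.mlogl <- function(value, which) {
    if (which == 1) {
        optimize(function(s) mlogl(c(value, s), x),
            interval = theta.hat[2] * c(0.1, 10))$objective
    } else {
        optimize(function(m) mlogl(c(m, value), x),
            interval = theta.hat[1] + theta.hat[2] * c(-5, 5))$objective
    }
}

# Zero at the endpoints of the likelihood interval
crit <- qchisq(0.95, df = 1)
g <- function(value, which)
    2 * (profile.mlogl(value, which) - nout$minimum) - crit

# One root on each side of the MLE for each parameter
mu.ci <- c(
    uniroot(g, c(theta.hat[1] - theta.hat[2], theta.hat[1]), which = 1)$root,
    uniroot(g, c(theta.hat[1], theta.hat[1] + theta.hat[2]), which = 1)$root)
sigma.ci <- c(
    uniroot(g, c(theta.hat[2] / 3, theta.hat[2]), which = 2)$root,
    uniroot(g, c(theta.hat[2], theta.hat[2] * 3), which = 2)$root)

print(rbind(location = mu.ci, scale = sigma.ci))
```

The brackets given to uniroot are generous guesses scaled by the estimated scale parameter. If uniroot complains that the function values at the end points are not of opposite sign, widen them.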