Rules
See the Section about Rules for Quizzes and Homeworks on the General Info page.
Your work handed into Moodle should be a plain text file with R commands and comments that can be run to produce what you did. We do not take your word for what the output is. We run it ourselves.
Note: Plain text specifically excludes Microsoft Word native format (extension .docx). If you have to use Word as your text editor, then save as and choose the format to be Text (.txt) or something like that. Then upload the saved plain text file.
Note: Plain text specifically excludes PDF (Adobe Portable Document Format) (extension .pdf). If you use Sweave, knitr, or Rmarkdown, upload the source (extension .Rnw or .Rmd) not PDF or any other kind of output.
If you have questions about the quiz, ask them in the Moodle forum for this quiz. Here is the link for that https://ay16.moodle.umn.edu/mod/forum/view.php?id=1368653.
You must be in the classroom, Armory 202, while taking the quiz.
Quizzes must be uploaded by the end of class (1:10). Moodle actually allows a few minutes after that. Here is the link for uploading the quiz: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1368662.
Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading the homework: https://ay16.moodle.umn.edu/mod/assign/view.php?id=1368665.
Quiz 6
Problem 1
This is a simulation study (course notes about simulation).
We are going to study almost but not exactly the same model as in Section 5 of those course notes. The model for this problem is normal with mean θ and variance θ^4 (not θ^2 as in the example in the notes).
The estimators we want to compare are the sample mean (calculated by mean(x) if the data are x) and the signed square root of the sample sd (calculated by sign(mean(x)) * sqrt(sd(x)) if the data are x). Just those two estimators.
Use sample size n = 15.
Use simulation truth parameter value θ = e (that is, theta <- exp(1)).
Use at least 10^4 simulations.
Like in the course notes, make a table comparing the MSE of these estimators (with MCSE as appropriate). You do not have to format these nicely in a table using Rmarkdown. As long as you print each number (one estimate of MSE and one estimate of MCSE of MSE for each estimator, which is four numbers) with a clear indication (maybe a comment, maybe a row or column label in a matrix) of what it is, that's enough.
The R code that makes the tables at the ends of Section 5.5 of those course notes and Section 5.6 of those course notes is hidden (not in the HTML) but can be seen in the Rmarkdown source of those course notes.
Be sure to use the principle of common random numbers.
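The layout of such a simulation can be sketched as follows. This is only one possible arrangement, not the code from the course notes: the seed, the matrix layout (one simulated data set per column), and all variable names other than theta are assumptions made for illustration. Common random numbers here means that both estimators are applied to the very same simulated data sets.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n <- 15          # sample size
nsim <- 1e4      # number of simulations
theta <- exp(1)  # simulation truth

# each column is one simulated data set;
# the variance is theta^4, so the standard deviation is theta^2
xsim <- matrix(rnorm(n * nsim, mean = theta, sd = theta^2), nrow = n)

# common random numbers: both estimators see the same columns
theta.hat.mean <- apply(xsim, 2, mean)
theta.hat.sd <- apply(xsim, 2, function(x) sign(mean(x)) * sqrt(sd(x)))

# squared errors, one column per estimator
sqerr <- cbind(mean = (theta.hat.mean - theta)^2,
    sqrt.sd = (theta.hat.sd - theta)^2)

# MSE is the average squared error; its MCSE is the
# standard deviation of the squared errors over sqrt(nsim)
result <- rbind(MSE = colMeans(sqerr),
    MCSE = apply(sqerr, 2, sd) / sqrt(nsim))
print(result)
```

The row and column labels of the printed matrix serve as the clear indication of what each of the four numbers is.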
Problem 2
The R command
x <- read.csv(url("http://www.stat.umn.edu/geyer/5102/data/prob7-1.txt"))$x
assigns one R object x, which is the data vector for this problem.
These same data were also used in Section 5 of the course notes on models, Part II and Section 9 of those course notes, but unlike in those notes we are going to investigate different estimators.
The estimators of location and scale we are going to use are the 25% trimmed mean (calculated by mean(x, trim = 0.25) if the data are x) and the interquartile range (calculated by IQR(x) if the data are x), respectively. Both are highly robust, tolerating up to 25% gross outliers or errors in the data.
Calculate a 95% bootstrap t confidence interval using these data and these estimators (the trimmed mean is the estimator of the parameter of interest and the IQR is the sdfun).
Use the method of the first example in Section 6.1.4 of the course notes on the bootstrap.
Do not use the method of the second example in that section where the sdfun argument is omitted.
You could also use the method of Section 6.1.3 of those notes, but why? It is a lot more code to fuss with.
Use at least 10^4 bootstrap samples.
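For orientation, the idea behind a bootstrap t interval can be written out by hand in base R. This is a sketch of the general technique, not the method of Section 6.1.4 that the problem asks for. The stand-in data vector (simulated here so the sketch runs without network access), the seed, and the names nboot, t.star, and crit are assumptions; replace the stand-in x with the data read from the URL above.

```r
set.seed(42)      # arbitrary seed, only so the sketch is reproducible
x <- rnorm(30)    # STAND-IN data, not the data for this problem

nboot <- 1e4      # number of bootstrap samples

theta.hat <- mean(x, trim = 0.25)  # estimator of the parameter of interest
sd.hat <- IQR(x)                   # scale estimator used to studentize

# bootstrap distribution of the studentized quantity
t.star <- replicate(nboot, {
    x.star <- sample(x, replace = TRUE)
    (mean(x.star, trim = 0.25) - theta.hat) / IQR(x.star)
})

# the upper quantile gives the lower endpoint and vice versa
crit <- unname(quantile(t.star, probs = c(0.975, 0.025)))
ci <- c(lower = theta.hat - crit[1] * sd.hat,
    upper = theta.hat - crit[2] * sd.hat)
print(ci)
```

The interval is not symmetric about the point estimate in general, because the two quantiles of the studentized bootstrap distribution need not be equal in magnitude.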
Problem 3
This problem is about two-parameter maximum likelihood, which was covered in Sections 6–9 of the course notes on models, Part II.
In particular, it is about Wald-type confidence intervals, exemplified in Section 9.3.1 of those course notes and Section 9.3.2 of those course notes.
However, in order to not use a distribution already used in the course notes, we are going to use the logistic distribution (which gives logistic regression its name).
The R function dlogis calculates the PDF of this distribution and works like all other functions for distributions R knows about. The help page help("dlogis") gives a mathematical equation for the PDF but you shouldn't need it for this problem (you can use it if you think you need to).
This is a symmetric distribution. The population median is the center of symmetry, which is the location parameter. The interquartile range (IQR) of the standard logistic distribution is
> qlogis(0.75) - qlogis(0.25)
[1] 2.197225
times the scale parameter (the number above is for the default value of the scale parameter, which is one). We can see this from
> foo <- rexp(5)
> foo
[1] 1.3009717 0.1212703 0.2030711 0.6550454 0.9039590
> (qlogis(0.75, scale = foo) - qlogis(0.25, scale = foo)) / foo
[1] 2.197225 2.197225 2.197225 2.197225 2.197225
Hence the sample median is a good estimator of location and the IQR divided by qlogis(0.75) - qlogis(0.25) is a good estimator of scale.
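A quick check of these two estimators on simulated data might look like the following sketch. The true parameter values (location 3, scale 2), the sample size, the seed, and the names y, loc.start, and scale.start are all hypothetical choices made for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# simulated logistic data with known (hypothetical) parameters
y <- rlogis(1e5, location = 3, scale = 2)

# sample median estimates location
loc.start <- median(y)
# sample IQR divided by the IQR of the standard logistic estimates scale
scale.start <- IQR(y) / (qlogis(0.75) - qlogis(0.25))

print(c(location = loc.start, scale = scale.start))
```

With a sample this large, both estimates should land close to the values used in the simulation.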
The R command
x <- read.csv(url("http://www.stat.umn.edu/geyer/s17/3701/data/q6p3.csv"))$x
assigns one R object x, which is the data vector for this problem.
- Use these estimates as starting values to find the MLE of both parameters (location and scale) for these data assuming they are IID logistic distributed.
- Provide Wald-type 95% confidence intervals of both parameters (location and scale) using observed Fisher information to compute standard errors for the point estimates.
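The two steps above can be sketched as follows, assuming nlm as the optimizer and its numerically computed Hessian of minus the log likelihood as observed Fisher information. The stand-in data (simulated so the sketch runs without network access), the seed, and the names mlogl, theta.start, nout, and ci are assumptions; replace the stand-in x with the data read from the URL above.

```r
set.seed(42)                               # arbitrary seed for reproducibility
x <- rlogis(100, location = 3, scale = 2)  # STAND-IN data, not the quiz data

# minus the log likelihood; theta[1] is location, theta[2] is scale
mlogl <- function(theta)
    -sum(dlogis(x, location = theta[1], scale = theta[2], log = TRUE))

# starting values: sample median and rescaled sample IQR
theta.start <- c(median(x), IQR(x) / (qlogis(0.75) - qlogis(0.25)))

nout <- nlm(mlogl, theta.start, hessian = TRUE)
print(nout$code)  # 1 or 2 indicates probable convergence

theta.hat <- nout$estimate
# inverse observed Fisher information estimates the asymptotic variance
se <- sqrt(diag(solve(nout$hessian)))

crit <- qnorm(0.975)  # critical value for 95% intervals
ci <- cbind(estimate = theta.hat,
    lower = theta.hat - crit * se,
    upper = theta.hat + crit * se)
rownames(ci) <- c("location", "scale")
print(ci)
```

Because mlogl is minus the log likelihood, the Hessian returned by nlm is already observed Fisher information and needs no sign change before inverting.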