General Instructions
The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor (Geyer).
You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:
- For simple computer commands, you may just write down the command you used and the result it gave on your exam solution.
- For complicated commands or plots, make a printout and attach the printout to your exam solution.
In every case show commands and results of commands (not just the commands).
No credit for numbers with no indication of where they came from!
Question 1 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
These are simulated data from a symmetric distribution. This problem is about comparing three different estimators of location:
- the sample mean (calculated by the R function mean (on-line help)),
- the sample 25%-trimmed mean (calculated by the R function mean with optional arguments (on-line help)), and
- the sample median (calculated by the R function median (on-line help)).
- Calculate these three estimators for these data.
- Estimate the standard error of each of these estimators (considered as a point estimate of the population center of symmetry) using Efron's nonparametric bootstrap. Use at least 1000 bootstrap samples to calculate your bootstrap standard errors.
- Which estimator do your bootstrap calculations say is the best?
- For the best estimator (by your calculations), make a histogram showing the bootstrap distribution of this estimator and also showing the point which is the corresponding estimate for the original data.
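The bootstrap steps asked for in this question can be sketched as follows. This is a minimal sketch, not a model solution. The data x are simulated stand-ins (an assumption, since the real data come from the exam URL), and the names estimators, theta.star, se.star, and best are made up for illustration. Each bootstrap sample is drawn from x with replacement, and all three estimators are evaluated on the same resample.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
x <- rt(50, df = 3)                 # ASSUMPTION: stand-in data; use the exam's x instead

# the three location estimators under comparison
estimators <- list(
    mean    = function(x) mean(x),
    trimmed = function(x) mean(x, trim = 0.25),
    median  = function(x) median(x)
)

# point estimates for the original data
theta.hat <- sapply(estimators, function(f) f(x))

nboot <- 1000
n <- length(x)
theta.star <- matrix(NA_real_, nboot, length(estimators),
    dimnames = list(NULL, names(estimators)))
for (i in 1:nboot) {
    # Efron's nonparametric bootstrap: resample the data with replacement
    x.star <- sample(x, n, replace = TRUE)
    theta.star[i, ] <- sapply(estimators, function(f) f(x.star))
}

# bootstrap standard errors: standard deviation of each column
se.star <- apply(theta.star, 2, sd)
print(se.star)

# here "best" is taken to mean smallest bootstrap standard error
best <- names(which.min(se.star))

# bootstrap distribution of the best estimator, with the estimate
# for the original data marked by a dashed vertical line
hist(theta.star[ , best], main = paste("bootstrap distribution:", best),
    xlab = "estimate")
abline(v = theta.hat[best], lty = 2)
```

Comparing standard errors directly is reasonable here only because the distribution is symmetric, so all three estimators estimate the same center of symmetry.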
Question 2 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, two variables x and y are loaded. This is regression data: y is the response variable and x is the predictor variable. (If you are using R at home, see the footnote about reading this data into R.)
This problem is a regression problem. In an example and a homework problem we used the R functions lmsreg and ltsreg. Another robust regression function in R is the function rlm in the MASS library (on-line help). We want to try that out.
- Make a scatterplot of these data. Fit the model that says y is a cubic polynomial in x (with all terms up to degree three) plus IID mean zero error. Use the function rlm. You may use the defaults for all the arguments to this function. For comparison, fit the same model using ordinary least squares regression done by the R function lm (on-line help). For each of these fits obtain the estimated regression function (the estimated conditional expected value of y given x; for short, call these regression curves). Put both regression curves on the same plot. Make it clear which is which.
- Do a nonparametric bootstrap of the robust regression done by the rlm function. Bootstrap residuals, not cases. Use at least 200 bootstrap samples to calculate your estimate. Make a plot showing
  - all of the bootstrap regression curves,
  - the regression curve for the actual data, and
  - the scatterplot of the actual data points.
- Estimate the standard errors of the four regression coefficients for the robust regression done by rlm, considered as point estimates of the population regression coefficients, using the same bootstrap as in part (b).
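Bootstrapping residuals (rather than cases) for the rlm fit can be sketched roughly as below. The data are simulated stand-ins with a couple of artificial outliers (an assumption; the exam data come from the URL), and the helper cubic and the names beta.star and se.star are hypothetical, introduced only for this illustration. The predictor values stay fixed; only the residuals are resampled and added back to the fitted values.

```r
library(MASS)                       # provides rlm

set.seed(42)                        # arbitrary seed, only for reproducibility
n <- 60
x <- runif(n, -2, 2)                # ASSUMPTION: stand-in data; use the exam's x and y
y <- x^3 - x + rnorm(n)
y[c(5, 30)] <- y[c(5, 30)] + 15     # two artificial outliers

rout <- rlm(y ~ x + I(x^2) + I(x^3))
y.hat <- fitted(rout)
e.hat <- residuals(rout)

# hypothetical helper: evaluate a cubic with coefficient vector beta at xx
cubic <- function(beta, xx) drop(cbind(1, xx, xx^2, xx^3) %*% beta)
xx <- seq(min(x), max(x), length = 101)

nboot <- 200
beta.star <- matrix(NA_real_, nboot, length(coef(rout)),
    dimnames = list(NULL, names(coef(rout))))

plot(x, y, type = "n")
for (i in 1:nboot) {
    # bootstrap residuals: x is held fixed, residuals are resampled
    y.star <- y.hat + sample(e.hat, replace = TRUE)
    rout.star <- rlm(y.star ~ x + I(x^2) + I(x^3))
    beta.star[i, ] <- coef(rout.star)
    lines(xx, cubic(coef(rout.star), xx), col = "gray")
}
lines(xx, cubic(coef(rout), xx), lwd = 2)   # curve for the actual data
points(x, y)                                # actual data points

# bootstrap standard errors of the four regression coefficients
se.star <- apply(beta.star, 2, sd)
print(se.star)
```

Drawing the actual curve and the data points after the loop keeps them visible on top of the gray bootstrap curves. The rlm function may warn that it failed to converge on some resamples; the sketch does not treat that specially.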
Question 3 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
These data are actually a simulated stationary time series. The model is a so-called MA(2) model specified as follows
where all of the Z_t variables are independent and identically distributed mean-zero normal random variables, and each of the Z_t variables is also independent of all of the X_s variables for s < t.
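For reference, the R function arima documents its moving-average parameterization with plus signs and, by default, fits the model to the series minus its mean. Under that convention an MA(2) model can be written as below. The mean parameter μ is an assumption here (it corresponds to the "intercept" that arima reports); the exam's own display may have written the model without it.

```latex
X_t - \mu = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}
```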
We are interested in the estimates of θ1 and θ2 that are given by the R function arima (on-line help). Specifically, the R statement
coef(arima(x.star, order = c(0, 0, 2)))
provides a vector of length three with the three parameter estimates named "ma1", "ma2", and "intercept". We are interested in the first two, which are estimates of θ1 and θ2, respectively.
- Estimate θ1 and θ2 for these data.
- Do a subsampling bootstrap to obtain bootstrap distributions of these two estimators. Use subsampling bootstrap sample size 30, and assume a root n rate of convergence. Calculate approximate standard errors of these two estimators using this subsampling bootstrap.
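One way to set up the subsampling bootstrap for a time series is sketched below, under stated assumptions. The series is simulated with arima.sim as a stand-in (the exam data come from the URL), the subsamples are taken to be all blocks of b consecutive observations so that the dependence structure is preserved, and the names theta.star and se.star are made up for illustration. With a root n rate, the spread of the subsample estimates is rescaled by sqrt(b / n) to approximate the standard error at the full sample size.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
n <- 300
# ASSUMPTION: stand-in MA(2) series; use the exam's x instead
x <- arima.sim(model = list(ma = c(0.5, 0.3)), n = n)

# estimates for the full data
theta.hat <- coef(arima(x, order = c(0, 0, 2)))[c("ma1", "ma2")]
print(theta.hat)

b <- 30                             # subsample size given in the problem
nblock <- n - b + 1                 # number of blocks of consecutive observations
theta.star <- matrix(NA_real_, nblock, 2,
    dimnames = list(NULL, c("ma1", "ma2")))
for (i in 1:nblock) {
    x.star <- x[seq(i, i + b - 1)]  # block of b consecutive observations
    # defensive choice: a fit on so short a series can fail, so skip failures
    fit <- try(arima(x.star, order = c(0, 0, 2)), silent = TRUE)
    if (!inherits(fit, "try-error"))
        theta.star[i, ] <- coef(fit)[c("ma1", "ma2")]
}

# root n rate: the sd at subsample size b is rescaled to sample size n
se.star <- sqrt(b / n) * apply(theta.star, 2, sd, na.rm = TRUE)
print(se.star)
```

Histograms of the two columns of theta.star would show the subsampling bootstrap distributions themselves. Skipping failed fits is a convenience of this sketch, not something the problem prescribes.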
Question 4 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
These data are simulated IID (independent and identically distributed) data from a distribution that has support (0, ∞), which means the maximum of a sample M_n goes to infinity as the sample size goes to infinity. In fact, according to theory we won't tell you any more about in this course (it not being a theory course),
converges in distribution (to some distribution; the job of this problem is to say something about that distribution).
Since there is no n^α in the formula (*), the rate of convergence for the quantity (*) is n^0 = 1. The shape and scale of the distribution do not depend on the sample size for sufficiently large sample sizes.
- Do a subsampling bootstrap to obtain the bootstrap distribution of the quantity (*) (we do not call it an estimator because there is no parameter it estimates). Use subsampling bootstrap sample size 35. Make a histogram of the bootstrap distribution of the quantity (*).
- There is no part (b).
Reading Data into R
If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does when it reads in the URL at the beginning. So all together, you must do, for problem 1 for example,
X <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/t2q1.txt"), header = TRUE)
names(X)
attach(X)
to produce the variable x needed for your analysis.
Of course, you read different data files for different problems that use external data entry. Everything else stays the same.