General Instructions

The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor (Geyer).

You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:

No credit for numbers with no indication of where they came from!

Question 1 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t2q1.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

These are simulated data from a symmetric distribution. This problem is about comparing three different estimators of location,

  1. Calculate these three estimators for these data.

  2. Estimate the standard error of each of these estimators (considered as a point estimate of the population center of symmetry) using Efron's nonparametric bootstrap. Use at least 1000 bootstrap samples to calculate your bootstrap standard errors.

  3. Which estimator do your bootstrap calculations say is the best?

  4. For the best estimator (by your calculations), make a histogram showing the bootstrap distribution of this estimator and also showing the point which is the corresponding estimate for the original data.
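The steps in this question can be sketched in R as follows. This is only a minimal illustration under stated assumptions, not a model answer: the data are a simulated stand-in rather than the contents of t2q1.txt, and the three estimators used here (mean, median, 25% trimmed mean) are hypothetical choices, since the list of estimators is not reproduced above.

```r
# Simulated stand-in for the exam data (assumption: NOT the real t2q1.txt)
set.seed(42)
x <- rt(50, df = 3)

# Hypothetical estimator choices (assumption: the exam's own list is not shown)
estimators <- list(
    mean = mean,
    median = median,
    trimmed = function(x) mean(x, trim = 0.25)
)

# Part 1: point estimates for the original data
theta.hat <- sapply(estimators, function(f) f(x))

# Part 2: Efron's nonparametric bootstrap, resampling with replacement
nboot <- 1000
n <- length(x)
theta.star <- matrix(NA_real_, nboot, length(estimators),
    dimnames = list(NULL, names(estimators)))
for (i in 1:nboot) {
    x.star <- sample(x, n, replace = TRUE)
    theta.star[i, ] <- sapply(estimators, function(f) f(x.star))
}
boot.se <- apply(theta.star, 2, sd)

# Part 3: smallest bootstrap standard error
best <- names(which.min(boot.se))

# Part 4: bootstrap distribution of the best estimator,
# with the estimate for the original data marked by a dashed line
hist(theta.star[, best], main = paste("Bootstrap distribution of", best),
    xlab = best)
abline(v = theta.hat[best], lty = 2)
```

Comparing standard errors is a fair way to rank the estimators here because the distribution is symmetric, so all of them estimate the same center of symmetry.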

Question 2 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t2q2.txt

With that URL given to Rweb, two variables x and y are loaded. This is regression data: y is the response variable and x is the predictor variable. (If you are using R at home, see the footnote about reading this data into R).

This problem is a regression problem. In an example and a homework problem we used the R functions lmsreg and ltsreg.

Another robust regression function in R is the function rlm in the MASS library (on-line help). We want to try that out.

  1. Make a scatterplot of these data. Fit the model that says y is a cubic polynomial in x (with all terms up to degree three) plus IID mean zero error. Use the function rlm. You may use the defaults for all the arguments to this function.

    For comparison, fit the same model using ordinary least squares regression done by the R function lm (on-line help).

    For each of these fits obtain the estimated regression function (estimated conditional expected value of y given x, for short call these regression curves). Put both regression curves on the same plot. Make it clear which is which.

  2. Do a nonparametric bootstrap of the robust regression done by the rlm function. Bootstrap residuals, not cases. Use at least 200 bootstrap samples to calculate your estimate. Make a plot showing
    • all of the bootstrap regression curves,
    • the regression curve for the actual data, and
    • the scatterplot of the actual data points.

  3. Estimate the standard errors of the four regression coefficients for the robust regression done by rlm, considered as point estimates of the population regression coefficients, using the same bootstrap as in part 2.
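One way the parts of this question might fit together is sketched below. It is a hedged illustration, not the instructor's solution: the data are simulated stand-ins rather than the contents of t2q2.txt, the helper cubic and the grid xx are names introduced here, and the residuals are resampled as they come (without recentering), which is one of several reasonable conventions.

```r
library(MASS)  # provides rlm

# Simulated stand-in for the exam data (assumption: NOT the real t2q2.txt)
set.seed(42)
n <- 60
x <- sort(runif(n, -2, 2))
y <- 1 + x - 0.5 * x^2 + 0.3 * x^3 + rt(n, df = 2)

# Part 1: cubic polynomial fits, robust (rlm defaults) and least squares
rout <- rlm(y ~ x + I(x^2) + I(x^3))
lout <- lm(y ~ x + I(x^2) + I(x^3))

# Hypothetical helper: evaluate a cubic with coefficient vector beta
cubic <- function(beta, x)
    beta[1] + beta[2] * x + beta[3] * x^2 + beta[4] * x^3
xx <- seq(min(x), max(x), length = 101)

plot(x, y)
lines(xx, cubic(coef(rout), xx), lty = 1)
lines(xx, cubic(coef(lout), xx), lty = 2)
legend("topleft", legend = c("rlm", "lm"), lty = c(1, 2))

# Part 2: bootstrap residuals, not cases. The predictor values stay fixed;
# new responses are fitted values plus resampled residuals.
fit <- fitted(rout)
res <- residuals(rout)
nboot <- 200
beta.star <- matrix(NA_real_, nboot, 4)
for (i in 1:nboot) {
    y.star <- fit + sample(res, n, replace = TRUE)
    beta.star[i, ] <- coef(rlm(y.star ~ x + I(x^2) + I(x^3)))
}

plot(x, y, type = "n")
for (i in 1:nboot)
    lines(xx, cubic(beta.star[i, ], xx), col = "gray")
lines(xx, cubic(coef(rout), xx), lwd = 2)
points(x, y)

# Part 3: bootstrap standard errors of the four coefficients
boot.se <- apply(beta.star, 2, sd)
names(boot.se) <- names(coef(rout))
```

Drawing the bootstrap curves first in gray and the actual fit and data points afterwards keeps the latter visible on top.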

Question 3 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t2q3.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

These data are actually a simulated stationary time series. The model is a so-called MA(2) model specified as follows

X_t = μ + Z_t + θ_1 Z_{t-1} + θ_2 Z_{t-2}

where all of the Z_t variables are independent and identically distributed mean-zero normal random variables, and each of the Z_t variables is also independent of all of the X_s variables for s < t. We are interested in the estimates of θ1 and θ2 that are given by the R function arima (on-line help). Specifically, the R statement

coef(arima(x.star, order = c(0, 0, 2)))
provides a vector of length three with the three parameter estimates named "ma1", "ma2", and "intercept". We are interested in the first two, which are estimates of θ1 and θ2, respectively.

  1. Estimate θ1 and θ2 for these data.

  2. Do a subsampling bootstrap to obtain bootstrap distributions of these two estimators. Use subsampling bootstrap sample size 30, and assume a root n rate of convergence. Calculate approximate standard errors of these two estimators using this subsampling bootstrap.
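A subsampling bootstrap for a time series along these lines could be sketched as follows. This is an illustration under assumptions, not a definitive solution: the series is simulated with made-up parameter values and length rather than read from t2q3.txt, every block of 30 consecutive observations is used as a subsample, and fits that fail on a short block are simply skipped.

```r
# Simulated stand-in for the exam data (assumption: NOT the real t2q3.txt;
# the MA coefficients, mean, and series length are made up)
set.seed(42)
n <- 500
x <- arima.sim(model = list(ma = c(0.5, 0.3)), n = n) + 10

# Part 1: estimates for the full series
theta.hat <- coef(arima(x, order = c(0, 0, 2)))[c("ma1", "ma2")]

# Part 2: subsamples are blocks of b consecutive observations,
# which preserves the dependence structure within each block
b <- 30
nsub <- n - b + 1
theta.star <- matrix(NA_real_, nsub, 2,
    dimnames = list(NULL, c("ma1", "ma2")))
for (i in 1:nsub) {
    x.star <- x[seq(i, i + b - 1)]
    # arima can fail on a short block, so failures are left as NA
    fit <- try(arima(x.star, order = c(0, 0, 2)), silent = TRUE)
    if (!inherits(fit, "try-error"))
        theta.star[i, ] <- coef(fit)[c("ma1", "ma2")]
}

# Root n rate: sqrt(b) * (theta.star - theta.hat) approximates the
# distribution of sqrt(n) * (theta.hat - theta), so divide by sqrt(n)
z.star <- sqrt(b) * sweep(theta.star, 2, theta.hat)
boot.se <- apply(z.star, 2, sd, na.rm = TRUE) / sqrt(n)
```

The rescaling matters because estimates from a block of 30 observations vary much more than the estimate from the full series; without the factor sqrt(b / n) the standard errors would be far too large.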

Question 4 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t2q4.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

These data are simulated IID (independent and identically distributed) data from a distribution that has support (0, ∞), which means the maximum of a sample M_n goes to infinity as the sample size goes to infinity. In fact, according to theory that we won't say any more about in this course (it not being a theory course),

M_n - log(n)               (*)

converges in distribution (to some distribution, the job of this problem is to say something about that distribution).

Since there is no n^α in the formula (*), the rate of convergence for the quantity (*) is n^0 = 1. The shape and scale of the distribution do not depend on the sample size for sufficiently large sample sizes.

  1. Do a subsampling bootstrap to obtain bootstrap distribution of the quantity (*) (we do not call it an estimator because there is no parameter it estimates). Use subsampling bootstrap sample size 35.

    Make a histogram of the bootstrap distribution of the quantity (*).

  2. There is no part 2.
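For IID data, the subsampling bootstrap of the quantity (*) might look like the sketch below. It rests on assumptions: the data are a simulated standard exponential sample used as a stand-in for t2q4.txt (a distribution with support (0, ∞) for which M_n - log(n) is known to converge in distribution), and the number of subsamples, 1000, is an arbitrary choice.

```r
# Simulated stand-in for the exam data (assumption: NOT the real t2q4.txt)
set.seed(42)
n <- 1000
x <- rexp(n)

b <- 35        # subsample size given in the question
nboot <- 1000  # assumption: number of subsamples is not specified

# IID data: subsamples are drawn WITHOUT replacement.
# The rate is n^0 = 1, so no rescaling is needed; the only change from
# the full-sample quantity is that log(n) becomes log(b).
q.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x, b, replace = FALSE)
    q.star[i] <- max(x.star) - log(b)
}

hist(q.star, main = "Subsampling bootstrap distribution of (*)",
    xlab = "max(x.star) - log(b)")
```

Sampling without replacement is what distinguishes this from Efron's bootstrap: each subsample is a genuine sample of size b from the true distribution, which is why the method still works for the sample maximum.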

If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does when it reads in the URL at the beginning. So altogether, for problem 1, for example, you must do

X <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/t2q1.txt"),
    header = TRUE)
names(X)
attach(X)

to produce the variable x needed for your analysis.

Of course, you read different data files for different problems that use external data entry. Everything else stays the same.