General Instructions

This exam is due by 3:30 pm on Wednesday, December 18, 2013. Hand in the exam in person to the instructor (in his office, Ford 356) or to one of the staff (in the department office, Ford 313).

The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor (Geyer).

You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:

No credit for numbers with no indication of where they came from!

Question 1 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t3q1.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

These data are to be taken as a random sample from some population. We want a confidence interval for the population median.

  1. Calculate a 95% bootstrap t confidence interval for the population median using the sample median as your point estimator and the sample interquartile range, computed by the R function IQR, as an estimate of the standard deviation of this point estimate (the sdfun argument to the boott function). Use bootstrap sample size at least 1000.

  2. Calculate a 95% bootstrap percentile confidence interval for the population median using the sample median as your point estimator. Use bootstrap sample size at least 1000.

  3. Calculate a 95% BCa confidence interval for the population median using the sample median as your point estimator. Use bootstrap sample size at least 1000.
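
As a rough illustration of item 2 only, the percentile interval can be sketched in base R. The data here are simulated stand-ins (an assumption, not the exam data), and the names nboot and theta.star are illustrative. The bootstrap t and BCa intervals need more machinery than this sketch shows.

```r
set.seed(42)

# Hypothetical stand-in data; the real x comes from the exam URL.
x <- rexp(50)

nboot <- 1000  # bootstrap sample size (at least 1000, as the question asks)

# Resample the data with replacement and recompute the median each time.
theta.star <- replicate(nboot, median(sample(x, replace = TRUE)))

# Percentile interval: the 0.025 and 0.975 quantiles of the bootstrap medians.
ci <- quantile(theta.star, c(0.025, 0.975), names = FALSE)
print(ci)
```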

Question 2 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t3q2.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

The parameter of interest in this problem is the so-called coefficient of variation θ = σ ⁄ μ, where μ is the population mean and σ is the population standard deviation. This is calculated by the R function

foo <- function(x) {
    n <- length(x)
    sd(x) * sqrt((n - 1) / n) / mean(x)
}

We put in the sqrt((n - 1) / n) to make it use the standard deviation of the empirical distribution rather than the so-called sample standard deviation, which divides by n − 1 rather than n in its formula. We have been using that estimator and its square (variance of the empirical distribution), following Efron and Tibshirani, ever since we started with the bootstrap.
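
A quick check of this adjustment, using simulated data as a stand-in (an assumption, not the exam data): the adjusted sd should agree with the standard deviation computed directly from the empirical distribution, which divides by n.

```r
set.seed(42)

# Hypothetical data for illustration only.
x <- rnorm(20, mean = 10, sd = 2)
n <- length(x)

# Standard deviation of the empirical distribution (divides by n).
sd.emp <- sqrt(mean((x - mean(x))^2))

# Sample standard deviation (divides by n - 1), rescaled as in foo.
sd.adj <- sd(x) * sqrt((n - 1) / n)

# The two agree up to floating-point rounding.
print(all.equal(sd.emp, sd.adj))
```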

Produce a 95% ABC bootstrap confidence interval for θ using the estimator coded above (which has to be recoded into resampling form to do the ABC interval).
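
One way the recoding into resampling form might look is sketched below. The name foo.resamp and the simulated data are illustrative assumptions. The function takes a vector of probability weights p in addition to the data, and with equal weights 1 ⁄ n it should reproduce foo. Functions of this (p, x) shape are what ABC routines such as abcnon in the bootstrap package expect; whether that is the routine intended here is an assumption, so check its help page.

```r
# The estimator from the problem statement.
foo <- function(x) {
    n <- length(x)
    sd(x) * sqrt((n - 1) / n) / mean(x)
}

# Resampling form (hypothetical name): p is a vector of weights summing to one.
foo.resamp <- function(p, x) {
    mu <- sum(p * x)                     # weighted mean
    sigma <- sqrt(sum(p * (x - mu)^2))   # weighted standard deviation
    sigma / mu
}

set.seed(42)
x <- rgamma(30, shape = 4)  # hypothetical positive data, not the exam data
n <- length(x)
p0 <- rep(1 / n, n)         # equal weights correspond to the original sample

# With equal weights the two forms agree up to rounding.
print(all.equal(foo(x), foo.resamp(p0, x)))
```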

Question 3 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t3q3.txt

With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R).

These data are a stationary time series. The parameter θ we want to estimate is the upper quartile (0.75 quantile) of the marginal distribution of the data.

Produce a 95% subsampling bootstrap confidence interval for θ. Use a subsample size b = 50. Assume the square root law holds for this estimator (the rate is n^(1/2)).
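
A minimal sketch of one common form of the subsampling interval for a time series, under stated assumptions: the series below is a simulated AR(1) stand-in (not the exam data), the subsamples are taken to be all overlapping blocks of b consecutive observations, and the names theta.star, z.star, and crit are illustrative.

```r
set.seed(42)

# Hypothetical stationary AR(1) series standing in for the exam data.
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))
n <- length(x)
b <- 50  # subsample size given in the question

# Point estimate: upper quartile of the full series.
theta.hat <- quantile(x, 0.75, names = FALSE)

# Estimator recomputed on every block of b consecutive observations.
theta.star <- sapply(1:(n - b + 1), function(i) {
    quantile(x[i:(i + b - 1)], 0.75, names = FALSE)
})

# Rescale by sqrt(b), the assumed square root rate at the subsample size.
z.star <- sqrt(b) * (theta.star - theta.hat)

# Upper and lower critical values, then shrink by sqrt(n) for the full sample.
crit <- quantile(z.star, c(0.975, 0.025), names = FALSE)
ci <- theta.hat - crit / sqrt(n)
print(ci)
```

Blocks of consecutive observations are used rather than random subsets so that each subsample keeps the dependence structure of the series.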

Question 4 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/t3q4.txt

With that URL given to Rweb, two variables x and y are loaded. (If you are using R at home, see the footnote about reading this data into R).

These data are regression data: x is the predictor variable and y is the response variable. We are interested in a nonparametric estimate of the regression function g(x) = E(y | x).

  1. Estimate the regression function using a kernel smoother with optimal bandwidth chosen automatically. Say what method of bandwidth selection was used. You may need to refer to the on-line help for the R function you are using to accomplish this. Show the R or Rweb output needed to accomplish this, and turn in a scatter plot of the data with the estimated regression function shown.

  2. Estimate the regression function using a smoothing spline with optimal smoothing parameter chosen automatically. Say what method of smoothing parameter selection was used. You may need to refer to the on-line help for the R function you are using to accomplish this. Show the R or Rweb output needed to accomplish this, and turn in a scatter plot of the data with the estimated regression function.
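
One possible approach to both items is sketched below, not necessarily the one the course intends. It assumes the KernSmooth package (a recommended package normally shipped with R) is installed, and it uses simulated data in place of the exam data. Here dpill chooses the bandwidth by a direct plug-in rule for local linear regression, and smooth.spline with its default cv = FALSE chooses the smoothing parameter by generalized cross-validation.

```r
library(KernSmooth)

set.seed(42)

# Hypothetical regression data standing in for the exam data.
x <- sort(runif(200))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# Kernel-type smoother: plug-in bandwidth, then local polynomial fit.
h <- dpill(x, y)
fit.k <- locpoly(x, y, bandwidth = h)

# Smoothing spline: smoothing parameter chosen by GCV (the default).
fit.s <- smooth.spline(x, y)

# Scatter plot with both estimated regression functions.
plot(x, y)
lines(fit.k$x, fit.k$y, lty = 1)
lines(fit.s, lty = 2)

print(h)         # selected bandwidth
print(fit.s$df)  # equivalent degrees of freedom of the spline fit
```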

If you are doing these problems in R rather than Rweb, you will have to duplicate what Rweb does when it reads in the URL at the beginning. So, all together, for problem 1, for example, you must do

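# Read the data file from the course URL; its first line holds the variable names
X <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/t3q1.txt"),
    header = TRUE)
names(X)     # list the variables in the data frame
attach(X)    # make those variables (here x) visible by name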

to produce the variable x needed for your analysis.

Of course, you read different data files for different problems that use external data entry. Everything else stays the same.