General Instructions
This exam is due by 3:30 pm on Wednesday, December 18, 2013. Hand in the exam in person to the instructor (in his office, Ford 356) or to one of the staff (in the department office, Ford 313).
The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor (Geyer).
You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:
- For simple computer commands, you may just write down the command you used and the result it gave on your exam solution.
- For complicated commands or plots, make a printout and attach the printout to your exam solution.
No credit for numbers with no indication of where they came from!
Question 1 [25 pts.]
The data for this problem are at the URL http://www.stat.umn.edu/geyer/5601/mydata/t3q1.txt
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
These data are to be taken as a random sample from some population. We want a confidence interval for the population median.
- Calculate a 95% bootstrap t confidence interval for the population median using the sample median as your point estimator and the sample interquartile range, computed by the R function IQR, as an estimate of the standard deviation of this point estimate (the sdfun argument to the boott function). Use bootstrap sample size at least 1000.
- Calculate a 95% bootstrap percentile confidence interval for the population median using the sample median as your point estimator. Use bootstrap sample size at least 1000.
- Calculate a 95% BCa confidence interval for the population median using the sample median as your point estimator. Use bootstrap sample size at least 1000.
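To make the first two interval types concrete, here is a hand-rolled sketch in base R. It is only an illustration under stated assumptions: the data are simulated rather than the exam data, and the names x.sim, nboot, theta.star, and t.star are made up for this example. The boott function named above packages the bootstrap t calculation; the BCa interval is not sketched here.

```r
# Sketch only: simulated data stand in for the exam data (assumption).
set.seed(42)
x.sim <- rexp(50)
nboot <- 1000

theta.hat <- median(x.sim)
sd.hat <- IQR(x.sim)   # crude scale estimate used to studentize

theta.star <- double(nboot)
t.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x.sim, replace = TRUE)
    theta.star[i] <- median(x.star)
    # studentized deviation, with the scale re-estimated on each resample
    t.star[i] <- (theta.star[i] - theta.hat) / IQR(x.star)
}

# bootstrap t interval: quantiles of t.star reflected about theta.hat
crit <- unname(quantile(t.star, c(0.975, 0.025)))
ci.boott <- theta.hat - crit * sd.hat

# percentile interval: quantiles of the bootstrap medians themselves
ci.perc <- unname(quantile(theta.star, c(0.025, 0.975)))

print(ci.boott)
print(ci.perc)
```

The two intervals differ in what is resampled: the percentile interval uses the bootstrap medians directly, while the bootstrap t interval uses studentized deviations and then rescales by the IQR of the original sample.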
Question 2 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
The parameter of interest in this problem is the so-called coefficient of variation θ = σ ⁄ μ, where μ is the population mean and σ is the population standard deviation. This is calculated by the R function
foo <- function(x) {
    n <- length(x)
    sd(x) * sqrt((n - 1) / n) / mean(x)
}
We put in the factor sqrt((n - 1) / n) to make it use the standard deviation of the empirical distribution rather than the so-called sample standard deviation, which divides by n − 1 rather than n in its formula. We have been using that estimator and its square (variance of the empirical distribution), following Efron and Tibshirani, ever since we started with the bootstrap.
Produce a 95% ABC bootstrap confidence interval for θ
using the estimator coded above (which has to be recoded into
resampling form
to do the ABC interval).
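As an illustration of what "resampling form" means, the sketch below rewrites the estimator as a function of a probability vector p on the data points and the data x, so that equal weights reproduce foo. The name foo.resamp and the simulated data are assumptions made for this example. As far as its documentation describes, the abcnon function in the bootstrap package expects a function of this (p, x) form as its tt argument; the call itself is left out of the sketch.

```r
# The estimator from the question, for comparison.
foo <- function(x) {
    n <- length(x)
    sd(x) * sqrt((n - 1) / n) / mean(x)
}

# Resampling form (hypothetical name): p is a probability vector
# giving the weight placed on each data point in x.
foo.resamp <- function(p, x) {
    mu <- sum(p * x)                     # weighted mean
    sigma <- sqrt(sum(p * (x - mu)^2))   # weighted sd, divisor "n" version
    sigma / mu
}

# Sketch only: simulated positive data stand in for the exam data.
set.seed(42)
x.sim <- rgamma(30, shape = 4)
p0 <- rep(1 / length(x.sim), length(x.sim))

# With equal weights the two versions should agree.
print(all.equal(foo.resamp(p0, x.sim), foo(x.sim)))
```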
Question 3 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, one variable x is loaded. (If you are using R at home, see the footnote about reading this data into R.)
These data are a stationary time series. The parameter θ we want to estimate is the upper quartile (0.75 quantile) of the marginal distribution of the data.
Produce a 95% subsampling bootstrap confidence interval for θ.
Use a subsample size b = 50.
Assume the square root law holds for this estimator (the rate is n^(1/2)).
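The mechanics of a subsampling interval can be sketched in base R as follows. This is a minimal illustration, assuming a simulated AR(1) series in place of the exam data and overlapping blocks of b consecutive observations as the subsamples; the names x.sim, theta.star, and z.star are made up for the example.

```r
# Sketch only: a simulated stationary AR(1) series (assumption).
set.seed(42)
n <- 500
x.sim <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
b <- 50

theta <- function(x) unname(quantile(x, 0.75))
theta.hat <- theta(x.sim)

# estimator evaluated on every block of b consecutive observations
nblock <- n - b + 1
theta.star <- double(nblock)
for (i in 1:nblock) {
    theta.star[i] <- theta(x.sim[i:(i + b - 1)])
}

# under the square root law, sqrt(b) * (theta.star - theta.hat)
# approximates the distribution of sqrt(n) * (theta.hat - theta)
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- unname(quantile(z.star, c(0.975, 0.025)))
ci.sub <- theta.hat - crit / sqrt(n)
print(ci.sub)
```

Using consecutive blocks rather than random draws keeps the dependence structure of the time series inside each subsample.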
Question 4 [25 pts.]
The data for this problem are at the URL
With that URL given to Rweb, two variables x and y are loaded. (If you are using R at home, see the footnote about reading this data into R.) These data are regression data: x is the predictor variable and y is the response variable.
We are interested in a nonparametric estimate of the regression function
g(x) = E(y | x).
- Estimate the regression function using a kernel smoother with optimal bandwidth chosen automatically. Say what method of bandwidth selection was used. You may need to refer to the on-line help for the R function you are using to accomplish this. Show the R or Rweb output needed to accomplish this, and turn in a scatter plot of the data with the estimated regression function shown.
- Estimate the regression function using a smoothing spline with optimal smoothing parameter chosen automatically. Say what method of smoothing parameter selection was used. You may need to refer to the on-line help for the R function you are using to accomplish this. Show the R or Rweb output needed to accomplish this, and turn in a scatter plot of the data with the estimated regression function.
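One possible way to produce both fits is sketched below on simulated data. The choice of functions is an assumption, since the exam does not say which ones to use: here the kernel smoother is a local linear fit from locpoly with a direct plug-in bandwidth from dpill (both in the KernSmooth package that ships with R), and the smoothing spline is smooth.spline, which by default picks its smoothing parameter by generalized cross-validation.

```r
library(KernSmooth)   # recommended package distributed with R

# Sketch only: simulated regression data stand in for the exam data.
set.seed(42)
x.sim <- sort(runif(200, 0, 10))
y.sim <- sin(x.sim) + rnorm(200, sd = 0.3)

# kernel (local linear) smoother, bandwidth chosen by direct plug-in
h <- dpill(x.sim, y.sim)
fit.kern <- locpoly(x.sim, y.sim, bandwidth = h)

# smoothing spline; the default cv = FALSE selects the smoothing
# parameter by generalized cross-validation (GCV)
fit.spl <- smooth.spline(x.sim, y.sim)

# scatter plot with both estimated regression functions
plot(x.sim, y.sim)
lines(fit.kern, lty = 1)
lines(fit.spl, lty = 2)

print(h)
print(fit.spl$lambda)
```

Other selection methods are available; for example, smooth.spline with cv = TRUE uses ordinary leave-one-out cross-validation instead of GCV, so the on-line help is the place to confirm which method a given call used.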
Reading Data into R
If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does when it reads in the URL at the beginning. So altogether, for problem 1, for example, you must do
X <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/t3q1.txt"), header = TRUE)
names(X)
attach(X)
to produce the variable x needed for your analysis.
Of course, you read different data files for different problems that use external data entry. Everything else stays the same.