General Instructions

The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor.

You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:

No credit for numbers with no indication of where they came from!

Question 1 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/f06/5601/mydata/camel.txt

With that URL given to Rweb, two variables x and y are loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)

This is regression data. We assume the standard model that is nonparametric about the regression function

$$Y_i = g(X_i) + \sigma Z_i$$

where g is an unknown smooth function (infinite-dimensional parameter), σ is an unknown constant (scalar parameter), and the $Z_i$ are IID standard normal.

Use the R function sm.regression (on-line help) to fit a regression function (g hat) to these data. Use optimal smoothing, where optimal is defined by this package's method (ordinary cross-validation).

Hand in a scatterplot with the kernel regression estimate shown. Also report the value of the bandwidth used (the one chosen by cross-validation).
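As an illustration of the kind of call involved, here is a minimal sketch, not a model answer. It assumes the sm package is installed and uses simulated data as a hypothetical stand-in for camel.txt. It uses h.select with method = "cv" to pick the bandwidth by cross-validation and passes the result to sm.regression, whose default display draws the scatterplot with the fitted curve.

```r
library(sm)  # assumption: the sm package is installed

set.seed(1)
# Hypothetical stand-in for the exam data (not camel.txt)
x <- sort(runif(100, 0, 4))
y <- sin(2 * x) + rnorm(100, sd = 0.3)

# Bandwidth chosen by cross-validation
h <- h.select(x, y, method = "cv")

# Scatterplot with the fitted regression curve overlaid
sm.regression(x, y, h = h)

# The bandwidth to report
h
```

With the real data, x and y come from the data file and the simulation lines are dropped.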

Question 2 [25 pts.]

This problem continues the analysis started in Question 1 and uses the same data and the same model assumptions.

Suppose in the plot that is the answer to Question 1 we want a confidence interval for the value of the population regression function g(x) at x = 2.0.

Run an (Efron) nonparametric bootstrap to estimate the sampling distribution of this estimator of g(2.0) and calculate the 95% bootstrap percentile confidence interval obtained from the bootstrap sampling distribution.

The following detailed instructions are necessary. They were not covered in class or on the class web page about sm.regression (which was here).

Hand in a histogram of your bootstrap estimate of the sampling distribution of the estimator showing the endpoints of the confidence interval on the histogram. Also report the numbers that are the endpoints of the confidence interval.
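The general shape of such a bootstrap can be sketched as follows. This is only a sketch under stated assumptions: the data are simulated (a hypothetical stand-in for camel.txt), cases (x, y pairs) are resampled with replacement, the cross-validated bandwidth from the original data is held fixed across bootstrap samples, and the estimate at x = 2.0 is extracted with the eval.points and display = "none" options of sm.regression. The helper g.hat is a name introduced here for illustration. Any detailed instructions given for the exam take precedence over these choices.

```r
library(sm)  # assumption: the sm package is installed

set.seed(2)
# Hypothetical stand-in for the exam data (not camel.txt)
n <- 100
x <- runif(n, 0, 4)
y <- sin(2 * x) + rnorm(n, sd = 0.3)

# Assumption: bandwidth chosen once on the original data and then held fixed
h <- h.select(x, y, method = "cv")

# Hypothetical helper: estimate of g(2.0) for a given data set
g.hat <- function(x, y) {
  sm.regression(x, y, h = h, eval.points = 2.0, display = "none")$estimate
}

theta.hat <- g.hat(x, y)

# Efron nonparametric bootstrap, resampling (x, y) pairs
nboot <- 1000
theta.star <- numeric(nboot)
for (i in seq_len(nboot)) {
  k <- sample(n, replace = TRUE)
  theta.star[i] <- g.hat(x[k], y[k])
}

# 95% bootstrap percentile interval
ci <- quantile(theta.star, c(0.025, 0.975), names = FALSE)

# Histogram with the interval endpoints marked
hist(theta.star)
abline(v = ci, lty = 2)
ci
```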

Question 3 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/f06/5601/mydata/gibbs.txt

With that URL given to Rweb, one variable x is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)

These data are a stationary time series for which we want to estimate the 0.9 quantile (also called the 90th percentile) of the marginal distribution of each x value (the marginal distribution is the same for all times by stationarity). The quantile function (on-line help) estimates quantiles.

Calculate the 0.9 quantile of x.

Calculate a 95% confidence interval for the parameter this estimator estimates (the population 0.9 quantile), following the procedure in our careful example for bootstrapping time series. Also hand in a histogram with the relevant quantiles marked, as done in that example.

Use subsampling bootstrap sample size b = 50.

You may assume this estimator obeys the square root law (has $n^{1/2}$ rate of convergence).
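One common form of the subsampling bootstrap under the square root law can be sketched as below. The class example is not reproduced here, so treat the details as assumptions: the series is simulated (an AR(1) process as a hypothetical stand-in for gibbs.txt), every block of b consecutive observations is used as a subsample, and the interval is formed from the quantiles of sqrt(b) * (theta.star - theta.hat) rescaled by sqrt(n).

```r
set.seed(42)
# Hypothetical stand-in for the exam data (not gibbs.txt)
n <- 1000
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))

b <- 50                                      # subsample (block) length
theta.hat <- quantile(x, 0.9, names = FALSE)

# Estimator applied to every block of b consecutive observations
nblock <- n - b + 1
theta.star <- numeric(nblock)
for (i in seq_len(nblock)) {
  theta.star[i] <- quantile(x[i:(i + b - 1)], 0.9, names = FALSE)
}

# Rescale by the assumed square root rate
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- quantile(z.star, c(0.975, 0.025), names = FALSE)

# 95% confidence interval for the population 0.9 quantile
ci <- theta.hat - crit / sqrt(n)

# Histogram of the rescaled subsample estimates with the quantiles marked
hist(z.star)
abline(v = crit, lty = 2)
ci
```

The upper quantile of z.star gives the lower endpoint and the lower quantile gives the upper endpoint, which is why crit is taken in the order 0.975, 0.025.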

Question 4 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/f06/5601/mydata/gamma.txt

With that URL given to Rweb, one variable x is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading this data into R.)

This is independent and identically distributed data. It comes from a gamma distribution, which is a skewed distribution having density curve

$$f(x) = c(\alpha, \lambda)\, x^{\alpha - 1} \exp(-\lambda x), \qquad x > 0$$

where c(α, λ) is a constant that depends on the parameters α and λ (whose exact expression doesn't matter).

In this problem we wish to estimate the shape parameter α. It is explained in theory books that the so-called method of moments estimator of α is

alpha.hat <- mean(x)^2 / (((n - 1) / n) * var(x))

(as usual, n is the sample size length(x) and the (n - 1) / n converts the so-called sample variance into the empirical variance).
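The reason this works is that a gamma distribution with shape α and rate λ has mean α / λ and variance α / λ², so the squared mean divided by the variance equals α. A quick check on simulated data (hypothetical, not gamma.txt; the shape 3 and rate 2 are arbitrary choices for illustration) shows the estimator landing near the true shape when the sample is large:

```r
set.seed(3)
# Hypothetical simulated data with known shape parameter 3
x <- rgamma(10000, shape = 3, rate = 2)
n <- length(x)

# Method of moments estimator of the shape parameter
alpha.hat <- mean(x)^2 / (((n - 1) / n) * var(x))
alpha.hat
```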

Calculate a 95% bootstrap t confidence interval for the true unknown parameter α, using the method of moments point estimator and a double bootstrap variance estimate. Use bootstrap sample size 1000 and the default double bootstrap sample size (for the inner bootstrap that determines the variance estimate).
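The structure of a bootstrap t interval with an inner bootstrap for the standard error can be sketched as follows. The data are simulated (a hypothetical stand-in for gamma.txt), the helper names mom and boot.sd are introduced here for illustration, and the inner sample size of 30 is an assumption made only for this sketch, not the default referred to above.

```r
set.seed(4)
# Hypothetical stand-in for the exam data (not gamma.txt)
x <- rgamma(50, shape = 3, rate = 2)

# Method of moments estimator of the shape parameter
mom <- function(x) {
  n <- length(x)
  mean(x)^2 / (((n - 1) / n) * var(x))
}

# Inner bootstrap: standard deviation of the estimator for a given sample
boot.sd <- function(x, nboot.inner) {
  sd(replicate(nboot.inner, mom(sample(x, replace = TRUE))))
}

nboot <- 1000      # outer bootstrap sample size
nboot.inner <- 30  # assumption: inner bootstrap sample size for this sketch

theta.hat <- mom(x)
sd.hat <- boot.sd(x, nboot.inner)

# Outer bootstrap: studentized statistic for each resample
t.star <- numeric(nboot)
for (i in seq_len(nboot)) {
  x.star <- sample(x, replace = TRUE)
  t.star[i] <- (mom(x.star) - theta.hat) / boot.sd(x.star, nboot.inner)
}

# 95% bootstrap t confidence interval
crit <- quantile(t.star, c(0.975, 0.025), names = FALSE)
ci <- theta.hat - crit * sd.hat
ci
```

Each outer resample needs its own inner bootstrap, so the total work is roughly nboot times nboot.inner evaluations of the estimator.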

Footnote about Reading Data for Problem 1 into R

If you are working in R rather than Rweb, you must duplicate what Rweb does when it reads in a URL at the start. For Problem 1, for example, do

# read the data file at the URL into a data frame
X <- read.table(url("http://www.stat.umn.edu/geyer/f06/5601/mydata/camel.txt"),
    header = TRUE)
# show the names of the variables in the file
names(X)
# make those variables available by name
attach(X)

This produces the variables x and y needed for your analysis.

Of course, you read different data files for different problems that use external data entry, and the variables in those files may have names other than x and y. Everything else stays the same.