## General Instructions

The exam is open book, open notes, open web pages. Do not discuss this exam with anyone except the instructor.

You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer. Show all your work:

• For simple computer commands, you may just write down the command you used and the result it gave on your exam solution.

• For complicated commands or plots, make a printout and attach the printout to your exam solution.

No credit for numbers with no indication of where they came from!

## Question 1 [25 pts.]

The data for this problem are at the URL

With that URL given to Rweb, two variables `x` and `y` are loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading these data into R.)

This is regression data. We assume the standard model that is nonparametric about the regression function

$$Y_i = g(X_i) + \sigma Z_i$$

where $g$ is an unknown smooth function (infinite-dimensional parameter), $\sigma$ is an unknown constant (scalar parameter), and the $Z_i$ are IID standard normal.

Use the R function `sm.regression` (on-line help) to fit a regression function (g hat) to these data. Use optimal smoothing, where optimal is defined by this package's method (ordinary cross-validation).

Hand in a scatterplot with the kernel regression estimate from `sm.regression` shown. Also report the value of the bandwidth used (the one chosen by cross-validation).
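A minimal sketch of the kind of fit requested above, run on made-up data rather than the exam data. The simulated `x` and `y` are hypothetical stand-ins, the `sm` package is assumed to be installed, and `h.select` with `method = "cv"` is used as the cross-validation bandwidth selector.

```r
library(sm)  # assumes the sm package is installed

set.seed(42)
# Hypothetical stand-in data; on the exam, x and y come from the data URL.
x <- (0:40) / 10
y <- sin(2 * x) + rnorm(length(x), sd = 0.3)

# Bandwidth chosen by ordinary cross-validation.
h <- h.select(x, y, method = "cv")
print(h)

# Draws the scatterplot with the fitted curve overlaid.
out <- sm.regression(x, y, h = h)
```

The printed `h` is the bandwidth to report. The plot produced by the last line is the one to hand in.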

## Question 2 [25 pts.]

This problem continues the analysis started in Question 1 and uses the same data and the same model assumptions.

Suppose in the plot that is the answer to Question 1 we want a confidence interval for the value of the population regression function g(x) at x = 2.0.

Run an (Efron) nonparametric bootstrap to estimate the sampling distribution of this estimator of g(2.0) and calculate the 95% bootstrap percentile confidence interval obtained from the bootstrap sampling distribution.

The following detailed instructions are necessary. They were not covered in class or on the class web page about `sm.regression` (which was here).

• Just after the `library(sm)` command, before doing anything else, issue the command
```
sm.options(eval.points = x)
```

This statement makes the `sm.regression` function evaluate the smooth at the given `x` points (which is not its default behavior). This command need be given only once. It stays in effect throughout the Rweb submission.

• It turns out that the function `hcv` is now obsolete, although it is still in the package and can be used in problem 1. For this problem, however, the bootstrap causes it to fail and the new function `h.select` (on-line help) should be used. Instead of
```
h <- hcv(x, y)
```

in this problem do

```
h <- h.select(x, y, method = "cv")
```

• Be sure to recalculate the optimal bandwidth for each bootstrap sample (that way the bootstrap approximates the error in both parts of the algorithm: bandwidth calculation and smoothing). Of course, the arguments of `h.select` must change so that it is evaluated on the bootstrap data.

• If `out` is the output of the `sm.regression` function (on-line help), then
```
out$estimate
```

gives the vector of predicted values at all `x` points and

```
out$estimate[out$eval.points == 2.0]
```

gives the predicted value corresponding to `x` = 2.0.

• Bootstrap residuals, not cases. (The required predicted values are discussed in the preceding item.)

• Use bootstrap sample size 400 (anything larger takes too long on `rweb.stat.umn.edu`).

• In order to not make `nboot` plots, the option `display = "none"` must be given to the function `sm.regression` inside the bootstrap loop (and also outside if you prefer).

Hand in a histogram of your bootstrap estimate of the sampling distribution of the estimator showing the endpoints of the confidence interval on the histogram. Also report the numbers that are the endpoints of the confidence interval.
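The instructions above can be sketched roughly as follows, on made-up data rather than the exam data. The simulated `x` and `y`, the helper `fit`, and the variable names are hypothetical. The sketch assumes the `sm` package is installed and that 2.0 is one of the observed `x` values.

```r
library(sm)  # assumes the sm package is installed

set.seed(42)
# Hypothetical stand-in data; on the exam, x and y come from the data URL.
x <- (0:40) / 10
y <- sin(2 * x) + rnorm(length(x), sd = 0.3)

sm.options(eval.points = x)  # evaluate the smooth at the observed x values

# Hypothetical helper: smooth with a cross-validated bandwidth and
# return the fitted values at the x points.
fit <- function(x, y) {
    h <- h.select(x, y, method = "cv")
    sm.regression(x, y, h = h, display = "none")$estimate
}

y.hat <- fit(x, y)
res <- y - y.hat
theta.hat <- y.hat[x == 2.0]

nboot <- 400
theta.star <- double(nboot)
for (i in 1:nboot) {
    # Resample residuals (not cases) and rebuild the responses.
    y.star <- y.hat + sample(res, replace = TRUE)
    # The bandwidth is reselected inside fit() for every bootstrap sample.
    theta.star[i] <- fit(x, y.star)[x == 2.0]
}

# Percentile interval: the 0.025 and 0.975 quantiles of the bootstrap values.
ci <- quantile(theta.star, c(0.025, 0.975))
hist(theta.star)
abline(v = ci, lty = 2)
print(ci)
```

Because the residual bootstrap keeps `x` fixed, the evaluation points set once by `sm.options` remain valid for every bootstrap sample.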

## Question 3 [25 pts.]

The data for this problem are at the URL

With that URL given to Rweb, one variable `x` is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading these data into R.)

These data are a stationary time series for which we want to estimate the 0.9 quantile (also called the 90th percentile) of the marginal distribution of each `x` value (the marginal distribution is the same for all times by stationarity). The `quantile` function (on-line help) estimates quantiles.

Calculate the 0.9 quantile of `x`.

Calculate a 95% confidence interval for the parameter this estimator estimates (the population 0.9 quantile), as described in our careful example for bootstrapping time series. Also hand in a histogram with the relevant quantiles marked, as done in that example.

Use subsampling bootstrap sample size b = 50.

You may assume this estimator obeys the square root law (has an $n^{1/2}$ rate of convergence).
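One common form of the subsampling bootstrap interval is sketched below on a made-up series. The simulated AR(1) data and all variable names are hypothetical, and the course example may differ in details such as how the blocks are chosen, so treat this as an illustration of the idea and not as the required procedure.

```r
set.seed(42)
# Hypothetical stand-in series: a stationary AR(1) process.
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))

theta.hat <- as.numeric(quantile(x, 0.9))

b <- 50                  # subsample (block) length
nblock <- n - b + 1      # number of overlapping blocks of length b
theta.star <- double(nblock)
for (i in 1:nblock) {
    theta.star[i] <- quantile(x[i:(i + b - 1)], 0.9)
}

# Square root law: sqrt(b) * (theta.star - theta.hat) approximates the
# sampling distribution of sqrt(n) * (theta.hat - theta).
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- quantile(z.star, c(0.975, 0.025), names = FALSE)
ci <- theta.hat - crit / sqrt(n)

hist(z.star)
abline(v = crit, lty = 2)
print(ci)
```

The rescaling by `sqrt(b)` and `sqrt(n)` is where the assumed rate of convergence enters. A different rate would change both factors.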

## Question 4 [25 pts.]

The data for this problem are at the URL

With that URL given to Rweb, one variable `x` is loaded. (If you are doing this problem in R rather than Rweb, see the footnote about reading these data into R.)

This is independent and identically distributed data. It comes from a gamma distribution, which is a skewed distribution having density curve

$$f(x) = c(\alpha, \lambda) \, x^{\alpha - 1} \exp(- \lambda x), \qquad x > 0$$

where $c(\alpha, \lambda)$ is a constant that depends on the parameters $\alpha$ and $\lambda$ (whose exact expression doesn't matter).

In this problem we wish to estimate the shape parameter α. It is explained in theory books that the so-called method of moments estimator of α is

```
alpha.hat <- mean(x)^2 / (((n - 1) / n) * var(x))
```

(as usual, `n` is the sample size `length(x)` and the `(n - 1) / n` converts the so-called sample variance into the empirical variance).
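As a quick check of this formula, the sketch below applies it to simulated gamma data. The sample is a hypothetical stand-in with true shape 3, and `alpha.check` is an illustrative name for the same quantity computed from the empirical variance directly.

```r
set.seed(42)
# Hypothetical stand-in data: a gamma sample with true shape alpha = 3.
x <- rgamma(200, shape = 3, rate = 2)
n <- length(x)

# Method of moments: mean^2 / variance, because the gamma distribution
# has mean alpha / lambda and variance alpha / lambda^2.
alpha.hat <- mean(x)^2 / (((n - 1) / n) * var(x))
print(alpha.hat)

# The same quantity written with the empirical variance directly.
alpha.check <- mean(x)^2 / mean((x - mean(x))^2)
```

The two expressions agree up to rounding, which confirms that the `(n - 1) / n` factor only converts `var(x)` to the empirical variance.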

Calculate a 95% bootstrap t confidence interval for the true unknown parameter α, using the method of moments point estimator and a double bootstrap variance estimate. Use bootstrap sample size 1000 and the default double bootstrap sample size (for the inner bootstrap that determines the variance estimate).
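A hand-rolled sketch of a bootstrap t interval with a double bootstrap variance estimate is given below, on simulated data. The data, the function name `alpha.mom`, and the inner sample size `nbootsd <- 25` are assumptions made for illustration. In particular, the "default" inner sample size in the problem refers to whatever software the course uses, which this sketch does not call.

```r
set.seed(42)
# Hypothetical stand-in data; on the exam, x comes from the data URL.
x <- rgamma(200, shape = 3, rate = 2)

# Method of moments estimator of the shape parameter.
alpha.mom <- function(x) {
    n <- length(x)
    mean(x)^2 / (((n - 1) / n) * var(x))
}

nboot <- 1000   # outer bootstrap sample size
nbootsd <- 25   # inner (double bootstrap) sample size; an assumed value

theta.hat <- alpha.mom(x)
theta.star <- double(nboot)
t.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x, replace = TRUE)
    theta.star[i] <- alpha.mom(x.star)
    # Inner bootstrap: standard error of the estimator at x.star.
    inner <- replicate(nbootsd, alpha.mom(sample(x.star, replace = TRUE)))
    t.star[i] <- (theta.star[i] - theta.hat) / sd(inner)
}

# Standard error of theta.hat estimated from the outer bootstrap.
sd.hat <- sd(theta.star)
crit <- quantile(t.star, c(0.975, 0.025), names = FALSE)
ci <- theta.hat - crit * sd.hat
print(ci)
```

Note that the upper quantile of `t.star` gives the lower endpoint and vice versa, which is what lets the interval reflect the skewness of the sampling distribution.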

## Footnote about Reading Data for Problem 1 into R

If you are doing this problem in R rather than Rweb, you will have to duplicate what Rweb does when it reads in a URL at the beginning. Altogether, for problem 1, for example, you must do

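```
# Read the data file from the course URL, as Rweb does
X <- read.table(url("http://www.stat.umn.edu/geyer/f06/5601/mydata/camel.txt"),
    header = TRUE)
names(X)   # show the variable names in the file
attach(X)  # make the columns available by name
```

This produces the variables `x` and `y` needed for your analysis.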

Of course, you read different data files for different problems that use external data entry, and the variables in those files may have names other than `x` and `y`. Everything else stays the same.