
Stat 5601 (Geyer) Final Exam, Due Wednesday, December 18, 3:30 PM (or earlier)

Question 1 [25 pts.]

The data for this problem are at the URL

http://rweb.stat.umn.edu/WSdata/Ch08data/passtime.txt
With that URL given to Rweb, one variable passtime is loaded.

Data taken from Wild and Seber, Chance Encounters: A First Course in Data Analysis and Inference (Wiley, 2000), page 328.

These data are measurements of the passage time in microseconds for a beam of light to pass from one mirror to another 3721 meters away and return. The experiment was performed by the American physicist Simon Newcomb in 1882. The vector

   speed <- 2 * 3721 / passtime * 1e6
gives the values of the velocity of light (in meters per second) corresponding to the passage time measurements.

  1. Calculate the Hodges-Lehmann estimator associated with the Wilcoxon signed rank test for these data (speed, not passtime). (A sketch of one way to do parts 1 and 2 follows this list.)

  2. Calculate the confidence interval associated with the Wilcoxon signed rank test for the parameter this estimator estimates. Give the smallest interval that has achieved confidence level above 95% (note this is a bit different from the instructions on the other tests). Report the interval and the achieved confidence level.

  3. What parameter does this point estimator estimate (and the confidence interval try to cover)? What assumptions are required about the distribution of speed of light measurements in order for this parameter to make sense and the point estimate and confidence interval to be nonparametric distribution-free?
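Here is a minimal sketch, assuming speed has been computed from passtime as shown above. It enumerates the Walsh averages directly rather than relying entirely on wilcox.test, because part 2 asks for the achieved confidence level, which wilcox.test does not report.

n <- length(speed)
w <- outer(speed, speed, "+") / 2
walsh <- sort(w[row(w) <= col(w)])    # the n * (n + 1) / 2 Walsh averages
hl <- median(walsh)                   # Hodges-Lehmann estimator
m <- n * (n + 1) / 2
# the interval (walsh[k], walsh[m + 1 - k]) has achieved confidence level
# 1 - 2 * psignrank(k - 1, n); take the largest k that keeps this above 95%
k <- max(which(1 - 2 * psignrank(seq_len(m %/% 2) - 1, n) > 0.95))
level <- 1 - 2 * psignrank(k - 1, n)  # achieved confidence level
interval <- c(walsh[k], walsh[m + 1 - k])
wilcox.test(speed, conf.int = TRUE)   # sanity check only

The last line is just a check: the (pseudo)median reported by wilcox.test should agree with hl.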

Question 2 [25 pts.]

The data for this problem are at the URL

http://www.stat.umn.edu/geyer/5601/mydata/w.txt
With that URL given to Rweb, one variable x is loaded.

The one-sample Kolmogorov-Smirnov test can be modified to allow estimated parameters. To test whether a data set is normal, just do

ks.test(x, pnorm, mean = mean(x), sd = sd(x))
But (a very big but), when the mean and sd arguments are estimates, as we have here, rather than known constants specified without reference to the data, the test statistic computed by the R function
foo <- function(x) ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic
no longer has the same distribution as when the parameters are known (the theoretical distribution of the Kolmogorov-Smirnov test statistic).

The distribution of this test statistic with parameters estimated has no nice theory, is not distribution-free, and must be approximated by simulation. When the distribution assumed by the null hypothesis is the normal distribution, this test is known as the Lilliefors test. Since a change of location or scale does not change the distribution of the test statistic, the simulation distribution does not depend on the unknown mean and standard deviation; hence the Lilliefors test is exact and distribution-free (within the family of normal distributions), and, strictly speaking, this is not a bootstrap. However, it proceeds just like a parametric bootstrap. Generate (parametric) bootstrap data using the R statement

x.star <- rnorm(n)
where n is the length of the data x.

Do a parametric bootstrap using at least 10,000 (1e4) bootstrap samples. (A sketch of one way to organize the whole simulation follows the list below.)

  1. Draw a histogram showing the bootstrap distribution of the test statistic calculated by the function foo defined above. Show the value of the test statistic for the observed data as a vertical line on the histogram.

  2. Calculate the P-value of the Lilliefors test, which is the probability that the bootstrap test statistic is larger than the observed test statistic.

  3. For comparison, get the so-called P-value calculated by the ks.test function, just to see how wrong it is.
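Here is a minimal sketch of how the whole simulation might be organized, assuming x has been loaded from the URL above. The choice of 1e4 bootstrap samples and the dashed line type are just illustrative.

foo <- function(x) ks.test(x, pnorm, mean = mean(x), sd = sd(x))$statistic
n <- length(x)
nboot <- 1e4
d.hat <- foo(x)                 # test statistic for the observed data
d.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- rnorm(n)          # parametric bootstrap data
    d.star[i] <- foo(x.star)
}
hist(d.star)
abline(v = d.hat, lty = 2)      # observed statistic shown on the histogram
mean(d.star > d.hat)            # P-value of the Lilliefors test
ks.test(x, pnorm, mean = mean(x), sd = sd(x))$p.value   # the wrong P-value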

Question 3 [25 pts.]

The data for this problem are the cholost data set in the bootstrap library of R used for all the examples on the more on regression page, in particular, for the smoothing spline example.

When the smoothing spline is calculated as in that example, the predicted value at the point x = 43 is calculated by the code

as.numeric(predict(out, newdata = data.frame(x = 43)))
(the as.numeric is required to keep the bootstrap below from crashing).

Note: this originally said 42 instead of 43 (my mistake); you can use 42 for your answer if you've already done that.

  1. Calculate the predicted value at the point x = 43 described above.

  2. Calculate a 95% BCa confidence interval for the predicted value at the point x = 43.

Hint: The on-line help for the bcanon function describes how to use bcanon to bootstrap regression (or any complex data structure) using the k.star trick. Write your theta function to take one argument, k.star:

foo <- function(k.star) {
    # some code that calculates the estimate as a function of k.star
}
and give the sequence 1:n as the first argument to bcanon:
bcanon(1:n, theta = foo, ...)   # fill in nboot and any other arguments you need
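For concreteness, here is a hedged sketch of what foo might look like, assuming the spline is fit to the cholost variables z and y with smooth.spline; the exact fitting call and any tuning parameters should be copied from the smoothing spline example, not from here.

library(bootstrap)
data(cholost)
n <- nrow(cholost)
foo <- function(k.star) {
    # refit the smoothing spline to the resampled cases; z and y are
    # the variable names in the cholost data frame
    out <- smooth.spline(cholost$z[k.star], cholost$y[k.star])
    as.numeric(predict(out, x = 43)$y)
}
foo(1:n)    # part 1: predicted value at x = 43 for the real data
out.boot <- bcanon(1:n, nboot = 1000, theta = foo, alpha = c(0.025, 0.975))
out.boot$confpoints    # part 2: endpoints of the 95% BCa interval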

Question 4 [25 pts.]

The data for this problem are the lutenhorm data set in the bootstrap library of R that was used as the example of the subsampling bootstrap for time series, in both the simple standard error calculation and the more complicated confidence interval calculation. What you are to do is change the latter (the more complicated confidence interval calculation) to use a sensible estimator of the AR(1) parameter ρ.

The R code

library(ts)
ar.mle(x, order.max = 1)$ar
is the estimator of the AR(1) parameter for the time series x that we want to use for this problem. This function calculates the maximum likelihood estimator (see the on-line help), but you don't need to know what that is to use it. Note that you do need the library statement to get access to this function.

Change the example to use this estimator, and produce the point estimate, the histogram from which the critical values are derived, and the confidence interval using those critical values, as in the example (but with the new estimator). (You do not need to produce the last confidence interval, the one based on the standard error and an assumption of normality and unbiasedness of the estimator.)
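Here is a minimal sketch of the confidence interval calculation with the new estimator. The column of lutenhorm used and the subsample length b are placeholder assumptions here; use whatever series and b the subsampling bootstrap example uses.

library(bootstrap)
library(ts)    # in current versions of R, ar.mle is in the standard stats
               # package, which is loaded by default
data(lutenhorm)
x <- lutenhorm[ , 4]    # placeholder: the column the example actually uses
n <- length(x)
b <- 10                 # placeholder subsample length
theta <- function(x) ar.mle(x, order.max = 1)$ar    # new estimator of rho
theta.hat <- theta(x)                               # point estimate
nsub <- n - b + 1
theta.star <- double(nsub)
for (i in 1:nsub) theta.star[i] <- theta(x[i:(i + b - 1)])
z.star <- sqrt(b) * (theta.star - theta.hat)    # subsampling distribution
hist(z.star)                                    # histogram for critical values
crit <- quantile(z.star, c(0.025, 0.975))       # critical values
theta.hat - rev(crit) / sqrt(n)                 # 95% confidence interval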