University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb

Stat 5601 (Geyer) Final Exam


General Instructions

The exam is take-home. It is due in my office at 6:00pm on Friday, December 21, but can be handed in any time before then. If I am not in my office, it can be handed to the staff in the School of Statistics office (313 Ford Hall) when the office is open (8:00 to 4:30, Monday through Friday).

Do not work with anyone else or even discuss the test with anyone but the instructor.

The exam is open book, open notes, open web pages. You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer.

Show all your work:

For simple computer commands, you may just write the command you used and the result it gave.
For complicated commands or plots, make a printout and attach the printout to your test paper.

No credit for numbers with no indication of where they came from!

Question 1 [20 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/darwin.txt

contains data on two variables x and y, which are paired observations obtained by Charles Darwin in an experiment designed to study whether seedlings from cross-fertilized plants tend to be superior to those from self-fertilized plants. Each pair consists of the heights of two plants grown from two seeds from the same parent, one (x) cross-fertilized and one (y) self-fertilized (data and description taken from The Statistical Sleuth by Ramsey and Schafer, Duxbury, 1997).

In this problem we are interested in two estimators of the difference (in height) between the two groups, the median and the median of Walsh averages (the point estimates that go with the sign test and the Wilcoxon signed rank test, respectively).

For each of these estimators (or for each of the corresponding tests, however you wish to think of it) calculate the following three things (that's six things in all: three for each of two estimators).

the point estimate
the corresponding confidence interval with the smallest possible confidence level that is at least 95% (report the actual confidence level you use)
the P-value for the two-tailed test with null hypothesis of no difference between the groups
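
As a rough illustration of the quantities listed above, the following sketch computes them for made-up paired data (the vectors x and y here are simulated placeholders, not the contents of darwin.txt, and all variable names are hypothetical). It assumes no ties and no zero differences, and it uses the standard order-statistic form of both intervals; the course handouts may organize the calculation differently.

```r
# Hypothetical paired data; NOT the Darwin data from the exam file.
set.seed(42)
x <- rnorm(15, mean = 21, sd = 3)
y <- rnorm(15, mean = 18, sd = 3)
z <- x - y                       # paired differences
n <- length(z)

## Sign test procedures
est_sign <- median(z)            # point estimate: median of the differences

# The interval (z_(k), z_(n + 1 - k)) has confidence level
# 1 - 2 * P(B <= k - 1), where B ~ Binomial(n, 1/2).
k <- seq_len(floor(n / 2))
level <- 1 - 2 * pbinom(k - 1, n, 1 / 2)
k_best <- max(k[level >= 0.95])  # largest k whose level is still >= 0.95
zs <- sort(z)
ci_sign <- c(zs[k_best], zs[n + 1 - k_best])
level_sign <- level[k_best]      # actual confidence level achieved

# Two-tailed sign test P-value (zero differences dropped, if any).
nz <- z[z != 0]
p_sign <- binom.test(sum(nz > 0), length(nz), p = 1 / 2)$p.value

## Wilcoxon signed rank procedures
w <- outer(z, z, "+") / 2
walsh <- sort(w[lower.tri(w, diag = TRUE)])   # n (n + 1) / 2 Walsh averages
est_wilcox <- median(walsh)      # point estimate: median of Walsh averages

# The interval (W_(k), W_(N + 1 - k)) has confidence level
# 1 - 2 * P(V <= k - 1), where V has the signed rank null distribution.
qu <- max(qsignrank(0.025, n), 1)
ci_wilcox <- c(walsh[qu], walsh[length(walsh) + 1 - qu])
level_wilcox <- 1 - 2 * psignrank(qu - 1, n)  # actual confidence level achieved

# Two-tailed signed rank test P-value.
p_wilcox <- wilcox.test(z)$p.value
```

Because both null distributions are discrete, the achieved levels (level_sign and level_wilcox) generally exceed 95%, which is why the actual level has to be reported alongside each interval.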

Question 2 [20 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/e.txt

contains data on two variables x and y which are paired. We wish to examine the dependence of y on x nonparametrically.

Using the procedures associated with Kendall's tau find the following three things

The point estimate of tau
A 95% large-sample confidence interval for tau
The P-value for the two-tailed test of whether tau is zero.

Also interpret the P-value. Is there statistically significant dependence of y on x?
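
A minimal sketch of the point estimate and test, using simulated placeholder data (not the contents of e.txt) and assuming no ties: it computes Kendall's tau directly from its definition and compares the result with cor.test. The large-sample confidence interval is not covered here, since it depends on the variance estimate given in the course materials.

```r
# Hypothetical paired data; NOT the data from the exam file.
set.seed(1)
x <- rnorm(20)
y <- x + rnorm(20)
n <- length(x)

# Kendall's tau from its definition:
# (number of concordant pairs - number of discordant pairs) / choose(n, 2).
s <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    s <- s + sign(x[j] - x[i]) * sign(y[j] - y[i])
  }
}
tau_hat <- s / choose(n, 2)

# The same estimate, plus the two-tailed P-value, from cor.test.
kt <- cor.test(x, y, method = "kendall")
tau_from_test <- unname(kt$estimate)
p_value <- kt$p.value
```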

Extra Credit

Notice that the parametric procedures (done by the cor.test function with no method specified) produce quite a different story. A glance at the plot (done by plot(x, y)) may give you an idea why. What is the reason?

Question 3 [30 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/sally.txt

contains data on two variables x and y. In this problem we are only interested in y, which is a random sample from a Cauchy population.

In this problem we are interested in two estimators of location, the median and the 25% trimmed mean, calculated by the code

median(y)
mean(y, trim = 0.25)

For each of these estimators calculate the following four 95% confidence intervals (that's eight intervals in all: four intervals for each of two estimators).

point estimate plus or minus 1.96 bootstrap standard errors
the bootstrap t interval using the double bootstrap variance estimate
Note: For this interval, with these estimators, the bootstrap sample sizes suggested in the web page take too long (more than 5 minutes, and Rweb usually loses the results -- this wouldn't happen if you were running R directly rather than through Rweb). So use nboott = 1000 and the default for nbootsd (that is, don't specify it and let the function use its default).
Efron's percentile method
the BC_a interval
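
The two simplest interval types in the list above (the standard error interval and Efron's percentile interval) can be sketched in base R as follows. The data are simulated placeholders (not the contents of sally.txt), and boot_intervals, trimmed, and nboot = 999 are hypothetical names and settings chosen for illustration. The bootstrap t and BC_a intervals are not shown; they need the functions from the course web pages.

```r
# Hypothetical data standing in for y; NOT the data from the exam file.
set.seed(3)
y <- rcauchy(50)

trimmed <- function(v) mean(v, trim = 0.25)   # the 25% trimmed mean
nboot <- 999                                  # assumed bootstrap sample size

boot_intervals <- function(v, estimator, nboot) {
  theta_hat <- estimator(v)
  # Nonparametric bootstrap: resample the data with replacement.
  theta_star <- replicate(nboot, estimator(sample(v, replace = TRUE)))
  se <- sd(theta_star)                        # bootstrap standard error
  list(
    estimate = theta_hat,
    se_interval = theta_hat + c(-1, 1) * 1.96 * se,
    percentile = unname(quantile(theta_star, c(0.025, 0.975)))
  )
}

res_median <- boot_intervals(y, median, nboot)
res_trim <- boot_intervals(y, trimmed, nboot)
```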

Two of these four types of intervals have so-called second order accuracy. Which are they?

Question 4 [30 pts.]

The lynx data set included in the R time series package is made available and plotted by the commands

library(ts)
data(lynx)
plot(lynx)

A description of this time series is given by the on-line help for this data set.

We may consider this a stationary time series (although it has also been considered an example of chaotic dynamics in the biology literature). We are interested in three estimators

quantile(lynx, 0.25)
median(lynx)
quantile(lynx, 0.75)

which we may consider to have asymptotics with a square root of n rate.

Find bootstrap 95% confidence intervals for these three estimators using the subsampling bootstrap and the confidence interval method explained in the handout and on the subsampling bootstrap confidence intervals web page (that's three intervals, one for each estimator).

Use a subsample size of b = 30 so that you get the same results as everyone else.
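
A minimal sketch of one common form of the subsampling interval, shown for the median only. It assumes that the subsamples are the n - b + 1 blocks of b consecutive observations and that the distribution of sqrt(b) * (theta_star - theta_hat) is used to approximate that of sqrt(n) * (theta_hat - theta); the handout and web page remain the authority on the exact method. The function name subsample_ci is hypothetical, and the code relies on lynx being available without a library call, as it is in current versions of R.

```r
# lynx ships with current R in the datasets package, so no library() call
# is made here (an assumption about the reader's R version).
x <- as.numeric(lynx)
n <- length(x)
b <- 30

subsample_ci <- function(x, estimator, b, conf = 0.95) {
  n <- length(x)
  theta_hat <- estimator(x)
  # For a time series, each subsample is a block of b consecutive observations.
  starts <- seq_len(n - b + 1)
  theta_star <- sapply(starts, function(i) estimator(x[i:(i + b - 1)]))
  # Quantiles of sqrt(b) * (theta_star - theta_hat) stand in for those of
  # sqrt(n) * (theta_hat - theta), assuming a square root of n rate.
  alpha <- 1 - conf
  q <- quantile(sqrt(b) * (theta_star - theta_hat), c(1 - alpha / 2, alpha / 2))
  unname(theta_hat - q / sqrt(n))
}

ci_median <- subsample_ci(x, median, b)
```

The same function can be applied to the other two estimators by passing, for example, function(v) quantile(v, 0.25) as the estimator argument.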