University of Minnesota, Twin Cities     School of Statistics     Stat 5601

Stat 5601 (Geyer) Final Exam


General Instructions

The exam is take-home. It is due in my office at 6:00 pm on Friday, December 21, but may be handed in any time before then. If I am not in my office, it can be given to the staff in the School of Statistics office (313 Ford Hall) while that office is open (8:00 to 4:30, Monday through Friday).

Do not work with anyone else or even discuss the test with anyone but the instructor.

The exam is open book, open notes, open web pages. You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer.

Show all your work:

No credit for numbers with no indication of where they came from!

Question 1 [20 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/darwin.txt
contains data on two variables x and y, which are paired observations obtained by Charles Darwin in an experiment designed to study whether seedlings from cross-fertilized plants tend to be superior to those from self-fertilized plants. Each pair consists of the heights of two plants grown from two seeds from the same parent, one (x) cross-fertilized and one (y) self-fertilized (data and description taken from The Statistical Sleuth by Ramsey and Schafer, Duxbury, 1997).

In this problem we are interested in two estimators of the difference in height between the two groups, both computed from the paired differences x - y: the median of the differences and the median of their Walsh averages (the point estimates that go with the sign test and the Wilcoxon signed rank test, respectively).
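As an illustration of the second estimator, here is a minimal R sketch of the median of Walsh averages (the Hodges-Lehmann estimator). It assumes the differences are formed as z = x - y; the helper name walsh_averages and the toy vector are hypothetical and are not the Darwin data. The commented lines show how the file might be read, assuming it has a header line naming the columns x and y.

```r
# Walsh averages of z: (z[i] + z[j]) / 2 for all pairs with i <= j,
# including each observation paired with itself.
walsh_averages <- function(z) {
  w <- outer(z, z, "+") / 2
  w[lower.tri(w, diag = TRUE)]
}

# Hypothetical toy differences (not the Darwin data).
z <- c(1, 2, 4)
median(z)                  # point estimate that goes with the sign test
median(walsh_averages(z))  # point estimate that goes with the signed rank test

# Applying it to the exam data (assumes a header line naming x and y):
# darwin <- read.table("http://www.stat.umn.edu/geyer/5601/mydata/darwin.txt",
#                      header = TRUE)
# z <- darwin$x - darwin$y
```

For z = c(1, 2, 4) there are n(n + 1)/2 = 6 Walsh averages (1, 1.5, 2, 2.5, 3, 4), so their median is 2.25, while median(z) is 2.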

For each of these estimators (or for each of the corresponding tests, however you wish to think of it) calculate the following three things (that's six things in all: three for each of two estimators).

Question 2 [20 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/e.txt
contains data on two variables x and y which are paired. We wish to examine the dependence of y on x nonparametrically.

Using the procedures associated with Kendall's tau, find the following three things

Also interpret the P-value. Is there statistically significant dependence of y on x?
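Kendall's tau is built from counts of concordant and discordant pairs. The following minimal R sketch, on hypothetical toy data, computes it directly from that definition and compares it with the built-in cor. With no ties the two agree; with ties the direct count below gives tau-a, while cor with method = "kendall" gives tau-b. The function name kendall_tau is hypothetical.

```r
# Kendall's tau (tau-a) from its definition: the average over all pairs
# i < j of sign(x[j] - x[i]) * sign(y[j] - y[i]). Assumes length(x) >= 2.
kendall_tau <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      s <- s + sign(x[j] - x[i]) * sign(y[j] - y[i])
    }
  }
  s / choose(n, 2)
}

# Hypothetical toy data: 5 concordant pairs and 1 discordant pair.
x <- 1:4
y <- c(1, 3, 2, 4)
kendall_tau(x, y)                   # (5 - 1) / 6
cor(x, y, method = "kendall")       # built-in estimate, same here (no ties)
cor.test(x, y, method = "kendall")  # test of zero tau, reports a P-value
```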

Extra Credit

Notice that the parametric procedures (done by the cor.test function with no method specified, which uses Pearson's correlation) tell quite a different story. A glance at the plot (made by plot(x, y)) may suggest why. What is the reason?

Question 3 [30 pts.]

The file

http://www.stat.umn.edu/geyer/5601/mydata/sally.txt
contains data on two variables x and y. In this problem we are only interested in y, which is a random sample from a Cauchy population.

In this problem we are interested in two estimators of location, the median and the 25% trimmed mean, calculated by the code

median(y)
mean(y, trim = 0.25)
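To make the second estimator concrete: with trim = 0.25, R's mean drops a fraction 0.25 of the observations (rounded down) from each end of the sorted data and averages the rest, so the extreme values a Cauchy sample tends to produce have no influence. Below is a minimal R sketch with a hypothetical helper trimmed_mean and hypothetical toy data.

```r
# Trimmed mean by hand: sort, drop floor(n * trim) values from each end,
# and average what remains (mirrors mean(y, trim = ) for 0 <= trim < 0.5).
trimmed_mean <- function(y, trim) {
  n <- length(y)
  k <- floor(n * trim)
  ys <- sort(y)
  mean(ys[(k + 1):(n - k)])
}

# Hypothetical toy data with two wild values, as a Cauchy sample might have.
y <- c(-50, 1, 2, 3, 4, 6, 9, 100)
trimmed_mean(y, 0.25)  # averages 2, 3, 4, 6
mean(y, trim = 0.25)   # built-in version, same value
median(y)              # average of the middle two values, 3 and 4
mean(y)                # ordinary mean, pulled around by -50 and 100
```

Here n = 8, so two values are dropped from each end and both trimmed means equal 3.75, while the median is 3.5.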

For each of these estimators calculate the following four 95% confidence intervals (that's eight intervals in all: four intervals for each of two estimators).

Two of these four types of intervals have so-called second order accuracy. Which are they?
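As a sketch of one standard bootstrap interval, assuming ordinary nonparametric (iid) resampling of y, the following R code computes a bootstrap percentile interval for both estimators. It is not claimed to be one of the four required types. The helper name boot_percentile_ci, the number of resamples, and the simulated Cauchy data standing in for the sally.txt file are assumptions for illustration.

```r
# Bootstrap percentile interval: resample y with replacement, recompute the
# estimator each time, and take the alpha/2 and 1 - alpha/2 quantiles.
boot_percentile_ci <- function(y, estimator, nboot = 1000, level = 0.95) {
  theta_star <- replicate(nboot, estimator(sample(y, replace = TRUE)))
  alpha <- 1 - level
  quantile(theta_star, c(alpha / 2, 1 - alpha / 2))
}

# Simulated Cauchy data standing in for y from sally.txt (hypothetical).
set.seed(42)
y <- rcauchy(50)
boot_percentile_ci(y, median)
boot_percentile_ci(y, function(v) mean(v, trim = 0.25))
```

Other interval types can reuse the same bootstrap replicates; for example, the boot package's boot.ci function computes several types (including "perc" and "bca") from one call to boot.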

Question 4 [30 pts.]

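The lynx data set included with R (in the ts time series package in older versions of R, in the datasets package in current versions) is made available and plotted by the commands

# library(ts)   # needed only in old versions of R
data(lynx)
plot(lynx)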
A description of this time series is given by the on-line help for this data set.

We may consider this a stationary time series (although it has also been considered an example of chaotic dynamics in the biology literature). We are interested in three estimators

quantile(lynx, 0.25)
median(lynx)
quantile(lynx, 0.75)
which we may consider to have asymptotics with a square root of n rate.

Find bootstrap 95% confidence intervals for these three estimators using the subsampling bootstrap and the confidence interval method explained in the handout and on the subsampling bootstrap confidence intervals web page (that's three intervals, one for each estimator).

Use a subsample size of b = 30 so that you get the same results as everyone else.
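Below is a minimal R sketch of a subsampling interval, assuming the usual subsampling construction for a square-root-n-rate estimator of a stationary series: the subsamples are the n - b + 1 blocks of b consecutive observations, and the quantiles of sqrt(b) * (theta* - theta-hat) over those blocks stand in for the quantiles of sqrt(n) * (theta-hat - theta). The handout's method may differ in details; the function name subsample_ci is hypothetical.

```r
# Subsampling bootstrap interval for a root-n rate estimator of a
# stationary time series.
subsample_ci <- function(x, estimator, b, level = 0.95) {
  x <- as.numeric(x)
  n <- length(x)
  theta_hat <- estimator(x)
  # Estimates from all n - b + 1 blocks of b consecutive observations;
  # blocks (rather than iid resamples) keep the time-series dependence.
  theta_star <- sapply(1:(n - b + 1), function(i) estimator(x[i:(i + b - 1)]))
  alpha <- 1 - level
  # Quantiles of sqrt(b) * (theta_star - theta_hat) approximate those of
  # sqrt(n) * (theta_hat - theta); invert them to get the interval.
  q <- quantile(sqrt(b) * (theta_star - theta_hat),
                c(1 - alpha / 2, alpha / 2), names = FALSE)
  c(lower = theta_hat - q[1] / sqrt(n), upper = theta_hat - q[2] / sqrt(n))
}

data(lynx)
estimators <- list(
  q25    = function(v) quantile(v, 0.25, names = FALSE),
  median = median,
  q75    = function(v) quantile(v, 0.75, names = FALSE)
)
sapply(estimators, function(f) subsample_ci(lynx, f, b = 30))
```

The result is a 2 by 3 matrix with one column (lower and upper endpoint) per estimator.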