University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb

The exam is take-home. It is due in my office 6:00pm, Friday December 21, but can be handed in any time before. If I am not in my office, it can be handed to the staff in the School of Statistics office (313 Ford Hall) when they are open (8:00 to 4:30 Monday through Friday).

Do not work with anyone else or even discuss the test with anyone but the instructor.

The exam is open book, open notes, open web pages. You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer.

Show all your work:

- For simple computer commands, you may just write the command
you used and the result it gave.
- For complicated commands or plots, make a printout and attach
the printout to your test paper.

The file http://www.stat.umn.edu/geyer/5601/mydata/darwin.txt contains data on two variables `x` and `y` which are paired observations obtained by Charles Darwin in an experiment designed to study whether seedlings from cross-fertilized plants tend to be superior to those from self-fertilized plants. Each pair consists of the plant height of two plants grown from two seeds from the same parent, one (`x`) cross-fertilized and one (`y`) self-fertilized (data and description taken from

In this problem we are interested in two estimators of the difference (in height) between the two groups, the median and the median of Walsh averages (the point estimates that go with the sign test and the Wilcoxon signed rank test, respectively).

For each of these estimators (or for each of the corresponding tests, however you wish to think of it) calculate the following three things (that's six things in all: three for each of two estimators).

- the point estimate
- the corresponding confidence interval with the smallest possible confidence level that is at least 95% (report the actual confidence level you use)
- the `P`-value for the two-tailed test with null hypothesis of no difference between the groups
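The three quantities above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated (they are not the Darwin data), the variable names are made up for the example, and the course web pages may organize the calculation differently. The confidence intervals are built from order statistics, and the achieved confidence level is read off the binomial distribution (sign test) or the signed rank distribution (Walsh averages).

```r
## Simulated paired data, a hypothetical stand-in for x and y (NOT darwin.txt)
set.seed(42)
n <- 15
x <- rnorm(n, mean = 21, sd = 3)
y <- rnorm(n, mean = 19, sd = 3)
z <- x - y                                # paired differences

## Sign test procedures: the point estimate is the median of the differences
est.sign <- median(z)
zs <- sort(z)
ks <- 1:floor(n / 2)
lev <- 1 - 2 * pbinom(ks - 1, n, 0.5)     # level of (zs[k], zs[n + 1 - k])
k.sign <- max(ks[lev >= 0.95])            # smallest level that is at least 95%
ci.sign <- c(zs[k.sign], zs[n + 1 - k.sign])
level.sign <- lev[k.sign]
p.sign <- binom.test(sum(z > 0), sum(z != 0))$p.value

## Signed rank procedures: the point estimate is the median of Walsh averages
w <- outer(z, z, "+") / 2
w <- sort(w[lower.tri(w, diag = TRUE)])   # all n (n + 1) / 2 Walsh averages
m <- length(w)
ks <- 1:floor(m / 2)
lev <- 1 - 2 * psignrank(ks - 1, n)       # level of (w[k], w[m + 1 - k])
k.walsh <- max(ks[lev >= 0.95])
est.walsh <- median(w)
ci.walsh <- c(w[k.walsh], w[m + 1 - k.walsh])
level.walsh <- lev[k.walsh]
p.walsh <- wilcox.test(z)$p.value         # exact when n < 50 and no ties
```

Because both reference distributions are discrete, the achieved levels (`level.sign`, `level.walsh`) generally exceed 95%, which is why the problem asks you to report the level actually used.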

The file http://www.stat.umn.edu/geyer/5601/mydata/e.txt contains data on two variables `x` and `y` which are paired. We wish to examine the dependence of `y` on `x` nonparametrically. Using the procedures associated with Kendall's tau find the following three things

- The point estimate of tau
- A 95% large-sample confidence interval for tau
- The `P`-value for the two-tailed test of whether tau is zero.

Also interpret the `P`-value. Is there statistically significant dependence of `y` on `x`?

Notice that the parametric procedures (done by the `cor.test` function with no `method` specified) produce quite a different story. A glance at the plot (done by `plot(x, y)`) may give you an idea why. What is the reason?
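As an illustration of how the Kendall and Pearson procedures can disagree, here is a minimal sketch on simulated data (not e.txt). Everything in it is an assumption made for the example: the data, the single extreme point, and in particular the variance estimate used for the large-sample interval, which is written from memory of the estimator given in Hollander and Wolfe. If the course handout states a different formula, the handout takes precedence. The sketch does not claim to reproduce the reason asked for in the problem.

```r
## Simulated paired data with one extreme point (hypothetical, NOT e.txt)
set.seed(1)
n <- 30
x <- rnorm(n)
y <- -x + rnorm(n, sd = 0.5)      # strong negative dependence
x[n] <- 20
y[n] <- 20                        # a single point far from the rest

kout <- cor.test(x, y, method = "kendall")
pout <- cor.test(x, y)            # default method is Pearson

tau.hat <- unname(kout$estimate)  # point estimate of tau
p.tau <- kout$p.value             # two-tailed P-value for tau = 0
r.hat <- unname(pout$estimate)    # Pearson correlation, for comparison

## Large-sample interval for tau (ASSUMED variance estimate, see text above).
## C[i] is the number of concordant minus discordant pairs involving point i.
s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
C <- rowSums(s)
sigma2 <- 2 / (n * (n - 1)) *
    (2 * (n - 2) / (n * (n - 1)^2) * sum((C - mean(C))^2) + 1 - tau.hat^2)
ci.tau <- tau.hat + c(-1, 1) * qnorm(0.975) * sqrt(sigma2)
```

In this simulation the one extreme point dominates the sums of squares and cross products that Pearson's correlation is built from, while it enters Kendall's tau only through the n - 1 pairs it belongs to, so the two estimates have opposite signs.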

The file http://www.stat.umn.edu/geyer/5601/mydata/sally.txt contains data on two variables `x` and `y`. In this problem we are only interested in `y` which is a random sample from a Cauchy population.
In this problem we are interested in two estimators of location, the median and the 25% trimmed mean, calculated by the code

    median(y)
    mean(y, trim = 0.25)

For each of these estimators calculate the following four 95% confidence intervals (that's eight intervals in all: four intervals for each of two estimators).

- point estimate plus or minus 1.96 bootstrap standard errors
- the bootstrap t interval using the double bootstrap variance estimate
  **Note:** For this interval, with these estimators, the bootstrap sample sizes suggested in the web page take too long (more than 5 minutes, and Rweb usually loses the results; this wouldn't happen if you were running R not through Rweb). So use `nboott = 1000` and the default for `nbootsd` (that is, don't specify it and let the function use its default).
- Efron's percentile method
- the BC_a interval
Two of these four types of intervals have so-called second order accuracy. Which are they?
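Two of the four interval types need nothing beyond base R, and a minimal sketch of them follows. It covers only the standard error interval and Efron's percentile interval; the bootstrap t and BC_a intervals are left out because they rely on the functions described on the course web pages. The data are simulated Cauchy values (not sally.txt), and `nboot`, `theta`, and the other names are hypothetical choices for the example.

```r
## Simulated Cauchy sample, a hypothetical stand-in for y (NOT sally.txt)
set.seed(5601)
y <- rcauchy(50)
nboot <- 2000                                # assumed bootstrap sample size

theta <- function(v) mean(v, trim = 0.25)    # or: theta <- median
theta.hat <- theta(y)

## Nonparametric bootstrap: resample y with replacement, recompute estimator
theta.star <- replicate(nboot, theta(sample(y, replace = TRUE)))

## Interval 1: point estimate plus or minus 1.96 bootstrap standard errors
se.boot <- sd(theta.star)
ci.se <- theta.hat + c(-1, 1) * 1.96 * se.boot

## Interval 3: Efron's percentile method
ci.perc <- unname(quantile(theta.star, c(0.025, 0.975)))
```

Swapping in `median` for `theta` gives the corresponding pair of intervals for the other estimator.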

The `lynx` data set included in the R time series package is made available and plotted by the commands

    library(ts)
    data(lynx)
    plot(lynx)

A description of this time series is given by the on-line help for this data set.

We may consider this a stationary time series (although it has also been considered an example of chaotic dynamics in the biology literature). We are interested in three estimators

    quantile(lynx, 0.25)
    median(lynx)
    quantile(lynx, 0.75)

which we may consider to have asymptotics with a square root of

Find bootstrap 95% confidence intervals for these three estimators using the subsampling bootstrap and the confidence interval method explained in the handout and on the subsampling bootstrap confidence intervals web page (that's three intervals, one for each estimator).

Use a subsample size of `b` = 30 so that you get the same results
as everyone else.
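The general shape of a subsampling bootstrap interval for a stationary series can be sketched as below. This is an illustration under assumptions, not the handout's code: the series is a simulated AR(1) process (not `lynx`), the estimator is assumed to converge at a root-n rate, and the interval recipe shown (rescaling the subsample deviations by the square roots of `b` and `n`) is the standard one for subsampling, which may differ in detail from the method on the course web page.

```r
## Simulated stationary series, a hypothetical stand-in (NOT the lynx data)
set.seed(30)
series <- as.numeric(arima.sim(model = list(ar = 0.5), n = 200))
n <- length(series)
b <- 30                                   # subsample size

theta <- median                           # or: function(v) quantile(v, 0.25)
theta.hat <- theta(series)

## Subsamples are the n - b + 1 blocks of b consecutive observations,
## which preserves the dependence structure within each block
starts <- 1:(n - b + 1)
theta.star <- sapply(starts, function(i) theta(series[i:(i + b - 1)]))

## ASSUMPTION: root-n rate, so sqrt(b) * (theta.star - theta.hat) mimics
## the sampling distribution of sqrt(n) * (theta.hat - theta)
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- unname(quantile(z.star, c(0.975, 0.025)))
ci <- theta.hat - crit / sqrt(n)          # 95% interval, lower end first
```

Note that the subsamples are taken without replacement and without shuffling, unlike the ordinary bootstrap for independent data.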