University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb
The exam is take-home. It is due in my office at 6:00 pm, Friday December 21, but can be handed in any time before. If I am not in my office, it can be handed to the staff in the School of Statistics office (313 Ford Hall) when the office is open (8:00 to 4:30 Monday through Friday).
Do not work with anyone else or even discuss the test with anyone but the instructor.
The exam is open book, open notes, open web pages. You may use the computer, a calculator, or pencil and paper to get answers, but it is expected that you will use the computer.
Show all your work:
The file http://www.stat.umn.edu/geyer/5601/mydata/darwin.txt contains data on two variables x and y which are paired observations obtained by Charles Darwin in an experiment designed to study whether seedlings from cross-fertilized plants tend to be superior to those from self-fertilized plants. Each pair consists of the plant height of two plants grown from two seeds from the same parent, one (x) cross-fertilized and one (y) self-fertilized (data and description taken from The Statistical Sleuth by Ramsey and Schafer, Duxbury, 1997).
In this problem we are interested in two estimators of the difference (in height) between the two groups, the median and the median of Walsh averages (the point estimates that go with the sign test and the Wilcoxon signed rank test, respectively).
For each of these estimators (or for each of the corresponding tests, however you wish to think of it) calculate the following three things (that's six things in all: three for each of two estimators).
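The two point estimates named above can be sketched in R as follows. This is a minimal illustration, not the exam solution: the vector z holds made-up paired differences (it is not the Darwin data), and the variable names est.sign, walsh, and est.wilcox are hypothetical.

```r
# Hypothetical paired differences (NOT the Darwin data)
z <- c(1.2, -0.5, 3.1, 0.8, 2.4)

# Point estimate that goes with the sign test: the sample median
est.sign <- median(z)

# Walsh averages: (z[i] + z[j]) / 2 for all pairs with i <= j
w <- outer(z, z, "+") / 2
walsh <- w[lower.tri(w, diag = TRUE)]

# Point estimate that goes with the Wilcoxon signed rank test:
# the median of the Walsh averages
est.wilcox <- median(walsh)

print(est.sign)
print(est.wilcox)
```

With n differences there are n(n + 1)/2 Walsh averages, so the five made-up values here give fifteen of them.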
The file http://www.stat.umn.edu/geyer/5601/mydata/e.txt contains data on two variables x and y which are paired. We wish to examine the dependence of y on x nonparametrically.
Using the procedures associated with Kendall's tau find the following three things
Also interpret the P-value. Is there statistically significant dependence of y on x?
Notice that the parametric procedures (done by the cor.test function with no method specified) produce quite a different story. A glance at the plot (done by plot(x, y)) may give you an idea why. What is the reason?
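As a rough sketch of the two kinds of call this problem contrasts, the following runs cor.test with method = "kendall" and with no method specified. The x and y vectors are made up for illustration (they are not the data in e.txt), and the names kt and pt are hypothetical.

```r
# Hypothetical paired data (NOT the data in e.txt)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 1.9, 3.5, 3.0, 4.8, 5.1, 4.9, 6.3)

# Nonparametric procedure: test based on Kendall's tau
kt <- cor.test(x, y, method = "kendall")

# Parametric procedure: no method specified, so Pearson correlation
pt <- cor.test(x, y)

print(kt$estimate)   # point estimate of tau
print(kt$p.value)    # P-value for the Kendall test
print(pt$estimate)   # Pearson correlation
print(pt$p.value)    # P-value for the Pearson test

# Scatterplot of the pairs
plot(x, y)
```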
The file http://www.stat.umn.edu/geyer/5601/mydata/sally.txt contains data on two variables x and y. In this problem we are only interested in y which is a random sample from a Cauchy population.
In this problem we are interested in two estimators of location, the median and the 25% trimmed mean, calculated by the code
median(y)
mean(y, trim = 0.25)
For each of these estimators calculate the following four 95% confidence intervals (that's eight intervals in all: four intervals for each of two estimators).
Note: For this interval, with these estimators, the bootstrap sample sizes suggested in the web page take too long (more than 5 minutes and Rweb usually loses the results -- this wouldn't happen if you were running R not through Rweb). So use nboott = 1000 and the default for nbootsd (that is, don't specify it and let the function use its default).
Two of these four types of intervals have so-called second order accuracy. Which are they?
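For orientation only, here is a minimal sketch of one simple kind of bootstrap interval, the percentile interval, applied to the 25% trimmed mean. It is an assumption that this type is relevant at all: it may or may not be one of the four interval types the problem asks for, and it does not use the nboott or nbootsd arguments mentioned in the note. The data are simulated with rcauchy (not the data in sally.txt), and the names nboot, theta, theta.star, and ci are hypothetical.

```r
set.seed(42)

# Hypothetical Cauchy sample (NOT the data in sally.txt)
y <- rcauchy(50)

# Estimator of location: the 25% trimmed mean
theta <- function(y) mean(y, trim = 0.25)

# Nonparametric bootstrap: resample y with replacement and
# recompute the estimator on each resample
nboot <- 999
theta.star <- replicate(nboot, theta(sample(y, replace = TRUE)))

# 95% percentile interval: the 0.025 and 0.975 quantiles of the
# bootstrap distribution of the estimator
ci <- quantile(theta.star, c(0.025, 0.975))
print(ci)
```

Replacing the body of theta with median(y) gives the corresponding interval for the median.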
The lynx data set included in the R time series package is made available and plotted by the commands
library(ts)
data(lynx)
plot(lynx)
A description of this time series is given by the on-line help for this data set.
We may consider this a stationary time series (although it has also been considered an example of chaotic dynamics in the biology literature). We are interested in three estimators
quantile(lynx, 0.25)
median(lynx)
quantile(lynx, 0.75)
which we may consider to have asymptotics with a square root of n rate.
Find bootstrap 95% confidence intervals for these three estimators using the subsampling bootstrap and the confidence interval method explained in the handout and on the subsampling bootstrap confidence intervals web page (that's three intervals, one for each estimator).
Use a subsample size of b = 30 so that you get the same results as everyone else.
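The following is a minimal sketch of a subsampling bootstrap interval for a time series, under the assumption that the method meant is the usual one: compute the estimator on every block of b consecutive observations, and use the quantiles of sqrt(b) * (theta.star - theta.hat) as critical values at the square root of n rate. The handout may differ in details. The series here is simulated with arima.sim (it is not the lynx data), and the names nsub, theta.star, z.star, crit, and ci are hypothetical.

```r
set.seed(1)

# Hypothetical stationary series (NOT the lynx data): an AR(1) process
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 114))
n <- length(x)
b <- 30                      # subsample size

# Estimator on the full series
theta.hat <- median(x)

# Estimator on each block of b consecutive observations
nsub <- n - b + 1
theta.star <- double(nsub)
for (i in 1:nsub) {
    theta.star[i] <- median(x[i:(i + b - 1)])
}

# Subsampling approximation to the distribution of
# sqrt(n) * (theta.hat - theta), assuming a sqrt(n) rate
z.star <- sqrt(b) * (theta.star - theta.hat)

# 95% interval: subtract the upper and lower critical values,
# rescaled by sqrt(n), from the full-sample estimate
crit <- unname(quantile(z.star, c(0.975, 0.025)))
ci <- theta.hat - crit / sqrt(n)
print(ci)
```

Blocks of consecutive observations are used, rather than random subsets, so that each subsample keeps the dependence structure of the series.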