University of Minnesota, Twin Cities, School of Statistics, Stat 5601
For the data in Table 3.3 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t3-3.txt, (a) do the test described in Problem 3.1 and (b) give the corresponding point estimate and confidence interval.
The problem just above is Problems 3.1, 3.19, 3.27, 3.63, and more in (H & W).
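A hedged sketch of the R mechanics for this problem, assuming the paired-replicates signed rank analysis of Chapter 3 is what Problem 3.1 calls for (the paired columns x and y are an assumption about the data file's header):

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t3-3.txt"),
    header = TRUE)
# signed rank test with the associated Hodges-Lehmann estimate and
# confidence interval
wilcox.test(foo$x, foo$y, paired = TRUE, conf.int = TRUE)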
For the data in Table 3.6 in (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t3-6.txt, repeat parts (a) and (b) above, reading 3.43 in place of 3.1 in part (a).
The problem just above is Problem 3.12 and more in (H & W).
Problem 3.54 in (H & W).
Problem 3.87 in (H & W). The data are in Table 3.9 in (H & W) also at http://www.stat.umn.edu/geyer/5601/hwdata/t3-9.txt.
Note: answers are now here.
For the data in Table 4.3 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t4-3.txt, do the two-sample test, point estimate, and confidence interval described in the problems listed below.
This is Problems 4.1, 4.15, 4.27, and more.
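A similar hedged sketch for the two-sample problem (assuming the data file has columns x and y, one per sample; adjust to the file's actual layout):

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t4-3.txt"),
    header = TRUE)
# rank sum test with the associated Hodges-Lehmann estimate and
# confidence interval
wilcox.test(foo$x, foo$y, conf.int = TRUE)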
Problem 11.29. The data are at http://www.stat.umn.edu/geyer/5601/hwdata/t11-15.txt
For the data in Table 4.4 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t4-4.txt, do the same analyses as for Table 4.3.
This is Problems 4.5, 4.19, 4.34, and more.
Note: answers are now here.
For the data in Table 6.4 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t6-4.txt, do a Kruskal-Wallis test. Describe the result of the test in terms of "statistical significance".
For the data in Table 6.7 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t6-7.txt, do a Jonckheere-Terpstra test of the hypotheses described in Problem 6.19 in Hollander and Wolfe. Describe the result of the test in terms of "statistical significance".
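Base R has no Jonckheere-Terpstra function, so here is a hedged permutation-test sketch; the column names x and g are assumptions about the file's header, and the direction of the one-sided P-value should be checked against the alternative in Problem 6.19:

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t6-7.txt"),
    header = TRUE)

# J-T statistic: sum over ordered pairs of groups of the Mann-Whitney
# counts, with tied pairs counted 1/2
jt.stat <- function(x, g) {
    lev <- sort(unique(g))
    s <- 0
    for (i in 1:(length(lev) - 1))
        for (j in (i + 1):length(lev)) {
            below <- outer(x[g == lev[i]], x[g == lev[j]], "<")
            ties <- outer(x[g == lev[i]], x[g == lev[j]], "==")
            s <- s + sum(below) + sum(ties) / 2
        }
    s
}

obs <- jt.stat(foo$x, foo$g)
sim <- replicate(9999, jt.stat(sample(foo$x), foo$g))
# one-sided P-value for an alternative increasing in g
mean(c(sim, obs) >= obs)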
For the data in Table 7.2 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t7-2.txt, do a Friedman test. Describe the result of the test in terms of "statistical significance".
For the data in Table 8.3 in Hollander and Wolfe (H & W), also at http://www.stat.umn.edu/geyer/5601/hwdata/t8-3.txt, do the indicated test, point estimate, and confidence interval.
This is Problems 8.1, 8.20, and, except for a change of confidence level, 8.27.
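A hedged sketch of the Kendall's tau test and point estimate (the column names x and y are assumptions about the file's header; the confidence interval part of the problem needs the method in H & W, which cor.test does not provide):

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/hwdata/t8-3.txt"),
    header = TRUE)
# Kendall's tau test of independence and the sample tau
cor.test(foo$x, foo$y, method = "kendall")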
Note: answers are here.
Problems 4.1, 4.2, 4.3 (a), 5.2, and 5.4 in Efron and Tibshirani.
(a) For the data x, compute three point estimates: (i) the sample mean, (ii) the sample median, and (iii) the 25% trimmed mean,

mean(x, trim=0.25)

And for each of these three point estimates calculate a bootstrap standard error (use bootstrap sample size 1000 at least); a sketch of this computation appears after part (d) below.
(b) Repeat part (a) for the data y.
(c) Given that x is actually a random sample from a normal distribution, we actually know the asymptotic relative efficiency (ARE) of estimators (i) and (ii) in part (a). What is it, and how close is the ratio of bootstrap standard errors?
(d) Given that y is actually a random sample from a Cauchy distribution, those who have had a theory course should know the asymptotic relative efficiency (ARE) of estimators (i) and (ii) in part (b). What is it, and how close is the ratio of bootstrap standard errors?
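A minimal sketch of the part (a) bootstrap, assuming the data vector has already been read into R as x:

nboot <- 1000  # the problem asks for bootstrap sample size 1000 at least

# each bootstrap replicate recomputes all three point estimates
theta.star <- replicate(nboot, {
    xstar <- sample(x, replace = TRUE)
    c(mean(xstar), median(xstar), mean(xstar, trim = 0.25))
})

# one bootstrap standard error per estimator: mean, median, trimmed mean
apply(theta.star, 1, sd)

Part (b) is the same with y in place of x.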
Note: answers to the additional question are here.
The data set LakeHuron in the ts package gives annual measurements of the level, in feet, of Lake Huron, 1875-1972. As with all data sets included in R, the usage is

library(ts)
data(LakeHuron)

Then the name of the time series is LakeHuron; for example,

plot(LakeHuron)

does a time series plot. The average water level over this period was

> mean(LakeHuron)
[1] 579.0041

Obtain a standard error for this estimate using the subsampling bootstrap with subsample length b = 10. Assume that the sample mean obeys the square root law (that is, the rate is the square root of n).
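A minimal sketch of the subsampling calculation, using all overlapping subsamples of length b (the variable names are mine, not part of the problem):

data(LakeHuron)  # library(ts) is only needed in old versions of R
x <- as.vector(LakeHuron)
n <- length(x)
b <- 10

# the mean of each of the n - b + 1 overlapping subsamples of length b
theta.star <- sapply(1:(n - b + 1), function(i) mean(x[i:(i + b - 1)]))

# square root law: the sd of the mean scales like 1 / sqrt(n), so the
# subsample sd rescales by sqrt(b / n) to give the full-sample standard error
sqrt(b / n) * sd(theta.star)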
The documentation for the R function lmsreg says

    There seems no reason other than historical to use the lms and lqs options. LMS estimation is of low efficiency (converging at rate n^{-1/3}) whereas LTS has the same asymptotic efficiency as an M estimator with trimming at the quartiles (Marazzi, 1993, p. 201). LQS and LTS have the same maximal breakdown value of (floor((n-p)/2) + 1)/n, attained if floor((n+p)/2) <= quantile <= floor((n+p+1)/2). The only drawback mentioned of LTS is greater computation, as a sort was thought to be required (Marazzi, 1993, p. 201) but this is not true as a partial sort can be used (and is used in this implementation).
Thus it seems that LMS regression is the
Wrong Thing (with a capital W and a capital T), something that survives
only for historico-sociological reasons: it was invented first and most
people who have heard of robust regression at all have only heard of it.
To be fair to Efron and Tibshirani, the literature cited in the documentation
for lmsreg
is the same vintage as their book. So maybe they
thought they were using the latest and greatest.
Anyway, the problem is to redo the LMS examples using LTS.
What changes? Does LTS really work better here than LMS? Describe the differences you see and why you think these differences indicate LTS is better (or worse, if that is what you think) than LMS.
Note: LTS does take longer, so a smaller nboot might be advisable. Also I got some warning messages, which perhaps (???) can be ignored.
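A minimal sketch of the substitution, assuming the setup of the LMS examples (the data frame foo and the formula y ~ x are placeholders for whatever the examples actually use):

library(MASS)  # lmsreg and ltsreg are both wrappers for lqs

fit.lms <- lqs(y ~ x, data = foo, method = "lms")
fit.lts <- lqs(y ~ x, data = foo, method = "lts")
coef(fit.lms)
coef(fit.lts)

The rest of the examples, the bootstrap included, should go through with ltsreg wherever lmsreg appeared.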
For data used in Question 1 on the second midterm, at http://www.stat.umn.edu/geyer/5601/mydata/gamma.txt, we calculated the sample coefficient of skewness, computed by the function

skew <- function(x) {
    xbar <- mean(x)
    mu2.hat <- mean((x - xbar)^2)
    mu3.hat <- mean((x - xbar)^3)
    mu3.hat / sqrt(mu2.hat)^3
}
Produce a confidence interval for the population coefficient of skewness that has the second order accuracy property, using the boott function.
Note: Since you have no idea how to write an sdfun that will variance stabilize the coefficient of skewness, you will have to use one of the other two methods described on the bootstrap t page.
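For example, a hedged sketch of the double bootstrap route via boott's nbootsd argument (the replication counts are placeholders, and the column name x is an assumption about the data file's header; the other method is boott's variance-stabilizing option VS = TRUE):

library(bootstrap)  # provides boott

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/gamma.txt"),
    header = TRUE)
x <- foo$x

# bootstrap t interval; the sd of skew applied to each bootstrap sample
# is itself estimated by an inner bootstrap of size nbootsd
boott(x, skew, nbootsd = 100, nboott = 1000, perc = c(0.025, 0.975))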
Note: this will require that you write a quite different skew function, starting

skew <- function(p, x) {

(and you have to fill in the rest of the details, which should, I hope, be clear enough from the discussion of our ABC example).
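A hedged guess at how that function might continue, assuming the weighted-moments convention of the ABC example (p is a vector of probabilities, one per data point); the details are yours to check, not the official solution:

skew <- function(p, x) {
    xbar <- sum(p * x) / sum(p)
    mu2.hat <- sum(p * (x - xbar)^2) / sum(p)
    mu3.hat <- sum(p * (x - xbar)^3) / sum(p)
    mu3.hat / sqrt(mu2.hat)^3
}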
For data used in Question 2 on the second midterm, at http://www.stat.umn.edu/geyer/5601/mydata/ar1.txt, we calculated the sample mean (and the sample standard deviation, but we'll ignore the latter for this problem).
Note: The test question forgot to say this explicitly, but this time series does obey the square root law. The proper rate to use is the square root of n.
As on the test, use subsample size 50 for both parts (you can use the same samples).
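The Lake Huron subsampling sketch above adapts directly; a hedged version for the sample-mean part (the column name x is an assumption about the file's header):

foo <- read.table(url("http://www.stat.umn.edu/geyer/5601/mydata/ar1.txt"),
    header = TRUE)
x <- foo$x
n <- length(x)
b <- 50  # subsample size 50, as on the test

theta.star <- sapply(1:(n - b + 1), function(i) mean(x[i:(i + b - 1)]))
sqrt(b / n) * sd(theta.star)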
Note: answers are here.