## Due Date

Due Mon Nov 11, 2013.

## First Problem

The data set `LakeHuron` included in R (on-line help) gives
annual measurements of the level, in feet, of Lake Huron 1875-1972.

The name of the time series is `LakeHuron`; for example,
`plot(LakeHuron)` does a time series plot.

The average water level over this period was

    > mean(LakeHuron)
    [1] 579.0041

Obtain a standard error for this estimate
(i. e., `mean(LakeHuron)`) using the subsampling bootstrap
with subsample length `b` = 10. Assume that the sample mean
obeys the square root law (that is, the rate is square root of `n`),
and assume the time series is stationary.
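
The mechanics described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not an official solution: it assumes that for a stationary time series the subsamples are the `n - b + 1` blocks of `b` consecutive observations, and that the standard error is obtained by rescaling the spread of the subsample means by the ratio of rates, the square root of `b / n`. The names `starts` and `se.hat` are made up for this sketch.

```r
# Subsampling bootstrap standard error for the mean of a stationary series.
# Assumption: subsamples are all blocks of b consecutive observations.
x <- as.numeric(LakeHuron)
n <- length(x)
b <- 10
theta.hat <- mean(x)

# one subsample mean per block start (n - b + 1 blocks in all)
starts <- seq_len(n - b + 1)
theta.star <- sapply(starts, function(i) mean(x[i:(i + b - 1)]))

# sqrt(b) * (theta.star - theta.hat) stands in for sqrt(n) * (theta.hat - theta),
# so the spread of theta.star is rescaled by sqrt(b / n)
se.hat <- sqrt(b / n) * sqrt(mean((theta.star - theta.hat)^2))
print(se.hat)
```

Centering the squared deviations at `theta.hat` rather than at `mean(theta.star)` is a design choice in this sketch; using `sd(theta.star)` instead would be a reasonable variant.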

## Second Problem

The documentation for the R function `lmsreg` says

> There seems no reason other than historical to use the `lms` and `lqs` options. LMS estimation is of low efficiency (converging at rate $n^{-1/3}$) whereas LTS has the same asymptotic efficiency as an M estimator with trimming at the quartiles (Marazzi, 1993, p. 201). LQS and LTS have the same maximal breakdown value of `(floor((n-p)/2) + 1)/n` attained if `floor((n+p)/2) <= quantile <= floor((n+p+1)/2)`. The only drawback mentioned of LTS is greater computation, as a sort was thought to be required (Marazzi, 1993, p. 201) but this is not true as a partial sort can be used (and is used in this implementation).

Thus it seems that LMS regression is the
Wrong Thing (with a capital W and a capital T), something that survives
only for historico-sociological reasons: it was invented first and most
people that have heard of robust regression at all have only heard of it.
To be fair to Efron and Tibshirani, the literature cited in the documentation
for `lmsreg` is the same vintage as their book. So maybe they
thought they were using the latest and greatest.

Anyway, the problem is to redo the LMS examples using LTS.

What changes? Does LTS really work better here than LMS? Describe the differences you see and why you think these differences indicate LTS is better (or worse, if that is what you think) than LMS.
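
As a starting point, switching an analysis from LMS to LTS may come down to changing one argument. The sketch below assumes the `lqs` function in the `MASS` package with its `method` argument, and uses the built-in `stackloss` data purely as a stand-in; it is not the data from the LMS examples this problem refers to. The names `fit.lms` and `fit.lts` are made up for this sketch.

```r
# LMS versus LTS fits on a stand-in data set (not the course examples).
library(MASS)

# lqs uses random resampling of points, so fix the seed for reproducibility
set.seed(42)

fit.lms <- lqs(stack.loss ~ ., data = stackloss, method = "lms")
fit.lts <- lqs(stack.loss ~ ., data = stackloss, method = "lts")

# compare the estimated regression coefficients side by side
print(rbind(LMS = coef(fit.lms), LTS = coef(fit.lts)))
```

A single pair of fits says little about efficiency; repeating each fit inside the bootstrap and comparing the variability of the resulting coefficients is one way to see whether the two estimators behave differently.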

## Third Problem

The file contains a vector `x` of
data from a heavy tailed distribution such that the
sample mean has rate of convergence $n^{1/3}$, that is,
`n^(1 / 3) * (theta.hat - theta)`
has nontrivial asymptotics ("nontrivial" here meaning it doesn't
converge to zero in probability and also is bounded in probability,
so $n^{1/3}$ is the right rate) where
`theta.hat` is the sample mean and `theta`
is the true unknown population mean.

When the sample mean behaves as badly as this, the sample variance
behaves even worse (it converges in probability to infinity), but
robust measures of scale make sense, for example, the interquartile
range (calculated by the `IQR` function in R; note that
the capital letters are not a mistake).

- Using subsample size `b` = 20, do a subsampling bootstrap to estimate the distribution of the sample mean.
- Make a histogram of `theta.star` marking the point `theta.hat`.
- Make a histogram of `b^(1 / 3) * (theta.star - theta.hat)`, which is the analog in the "bootstrap world" of the distribution of the quantity having nontrivial asymptotics displayed above.
- Calculate the subsampling bootstrap estimate of the IQR of `theta.hat`, rescaling by the ratio of rates in the appropriate fashion.
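
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not an official solution. The data here are simulated stand-ins (a symmetric distribution with tail index 1.5, for which the sample mean has rate $n^{1/3}$), because the course data file is not reproduced here. The sketch also assumes the observations are independent, so that subsamples are drawn without replacement, and `nboot` and `iqr.hat` are names made up for this sketch.

```r
# Subsampling bootstrap for a sample mean with rate n^(1/3).
set.seed(42)

# stand-in data: symmetric, P(|X| > t) = t^(-1.5) for t > 1 (assumption)
n <- 1000
x <- sample(c(-1, 1), n, replace = TRUE) * runif(n)^(-1 / 1.5)

b <- 20
nboot <- 1000
theta.hat <- mean(x)

# subsamples of size b drawn WITHOUT replacement (independent data assumed)
theta.star <- replicate(nboot, mean(sample(x, b, replace = FALSE)))

hist(theta.star)
abline(v = theta.hat, lty = 2)   # mark the full-sample estimate

# bootstrap-world analog of n^(1/3) * (theta.hat - theta)
z.star <- b^(1 / 3) * (theta.star - theta.hat)
hist(z.star)

# IQR of z.star estimates the IQR of n^(1/3) * (theta.hat - theta),
# so divide by n^(1/3) to return to the scale of theta.hat
iqr.hat <- IQR(z.star) / n^(1 / 3)
print(iqr.hat)
```

Equivalently, `iqr.hat` is `(b / n)^(1 / 3) * IQR(theta.star)`, which makes the ratio-of-rates rescaling explicit.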

## Answers

Answers in the back of the book are here.