# Stat 5601 (Geyer) Examples (Subsampling Bootstrap)

## General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

## Time Series

### Sections 8.5 and 8.6 in Efron and Tibshirani.

• As usual, `library(bootstrap)` says we are going to use code in the `bootstrap` library, which is not available without this command.

• The vector `z[-1]` is all the elements of `z` except the first and the vector `z[-n]` is all the elements of `z` except the last.

Thus the statement

```
out <- lm(z[-1] ~ z[-n] + 0)
```

regresses z<sub>t</sub> on z<sub>t−1</sub> with no intercept (the `+ 0` means no intercept).
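To see this lag-one regression in action, here is a minimal sketch (the simulated series and the seed are illustrative, not the course's actual data): we simulate an AR(1) series and fit the no-intercept regression of each value on the previous one.

```r
# Simulate an AR(1) series with true coefficient 0.6 (illustrative)
set.seed(42)
n <- 200
z <- as.vector(arima.sim(model = list(ar = 0.6), n = n))

# Regress z_t on z_{t-1} with no intercept (the "+ 0")
out <- lm(z[-1] ~ z[-n] + 0)
beta.hat <- coef(out)[1]   # estimate of the AR coefficient
```

The estimate `beta.hat` should land reasonably near the true coefficient 0.6 for a series of this length.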

• For time series, the subsampling bootstrap uses only blocks of contiguous variables. For a series of length `n` and blocks of length `blen` there are exactly `n - blen + 1` such blocks. Generally, we use them all; there is no need for random samples.

• Note that the bootstrap samples are correlated, as the time series plot for `beta.star` shows. However, this does not matter, so long as `blen` is long enough so the samples are representative of the behavior of the whole series.

As usual, Efron and Tibshirani are using a ridiculously small sample size in this toy problem. There is no reason to believe the subsampling bootstrap here. But it is reasonable for (much) larger data sets.

• The `sqrt(blen / n)` in the last line adjusts for the relative sample sizes of the subsample and the whole series. Note that the `sqrt` here is only valid for estimators obeying the square root law. If the rate is not root n, then a different function of `blen / n` is needed, as in the following example.
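The whole scheme can be sketched as follows (a sketch under assumed names, not the course's actual script): loop over all `n - blen + 1` contiguous blocks, compute the statistic on each, and rescale the subsample standard deviation by `sqrt(blen / n)` to get a standard error at the full sample size.

```r
# Simulated AR(1) data standing in for the real series (illustrative)
set.seed(17)
n <- 500
blen <- 50
z <- as.vector(arima.sim(model = list(ar = 0.6), n = n))

# All n - blen + 1 contiguous blocks, no random sampling needed
nblock <- n - blen + 1
beta.star <- numeric(nblock)
for (i in 1:nblock) {
    zsub <- z[i:(i + blen - 1)]
    beta.star[i] <- coef(lm(zsub[-1] ~ zsub[-blen] + 0))[1]
}

# sd(beta.star) estimates the sampling SD at sample size blen;
# sqrt(blen / n) rescales it to sample size n (square root law)
se.beta <- sd(beta.star) * sqrt(blen / n)
```

Note the `sqrt(blen / n)` factor is exactly the adjustment discussed above; if the estimator did not obey the square root law, that factor would be replaced by the appropriate power of `blen / n`.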

## Extreme Values

### Section 7.4 in Efron and Tibshirani.

#### Theory

Suppose X<sub>1</sub>, X<sub>2</sub>, …, X<sub>n</sub> are independent and identically distributed Uniform(0, θ) random variables. Since the larger the sample, the more the largest values crowd up against θ, the natural estimator of θ is the maximum data value X<sub>(n)</sub>. This is in fact the maximum likelihood estimate.

The main statistical interest in this estimator is that it is a counter example to both the square root law and the usual asymptotics of maximum likelihood.

• The rate is n rather than root n.

• The asymptotic distribution is not normal.

More precisely (this was proved for homework in my theory class, Problem 10-4 in my lecture notes),

n (θ − X<sub>(n)</sub>)

converges in distribution to the Exp(1 / θ) distribution.

But to use the subsampling bootstrap, we need only know that the square root law fails and that the actual rate is n.
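The limit theorem is easy to check by simulation. The following sketch (with an illustrative θ and sample size) draws many Uniform(0, θ) samples and looks at the rescaled error n (θ − X<sub>(n)</sub>), which should behave like an Exp(1 / θ) random variable, that is, an exponential with mean θ.

```r
# Check: n * (theta - max) is approximately Exp(rate = 1 / theta)
set.seed(1)
theta <- 3
n <- 1000
nsim <- 2000
z <- replicate(nsim, n * (theta - max(runif(n, 0, theta))))
mean(z)   # should be near theta, the mean of the Exp(1 / theta) limit
```

Note the rate here is n, not root n: multiplying the error by n, rather than sqrt(n), is what produces a nondegenerate limit.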


• The vector `xmax.star` stores `max(x)` for samples from the subsampling bootstrap.

• The vector `xmax.bogo` stores `max(x)` for samples from the ordinary (Efron) bootstrap.

• Note that the `sample` statement is quite different for the regular (Efron) bootstrap and the (Politis and Romano) subsampling bootstrap.

For the Efron bootstrap, we sample with replacement at the original sample size with something like

```
x.star <- sample(x, replace = TRUE)
```

The subsampling bootstrap samples without replacement at the much smaller sample size `nsub` with something like

```
x.star <- sample(x, nsub, replace = FALSE)
```

Both the `size` and the `replace` arguments of `sample` differ. (For the Efron bootstrap the `size` argument is missing so the default `length(x)` is used.)
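The difference is easy to see side by side. In this sketch (the data vector and `nsub` are illustrative), the Efron call returns a vector the same length as `x`, possibly with repeats, while the subsampling call returns `nsub` distinct values.

```r
# Any numeric vector works; this one is illustrative
set.seed(99)
x <- runif(100, 0, 3)
nsub <- 10

x.efron <- sample(x, replace = TRUE)          # size defaults to length(x)
x.sub   <- sample(x, nsub, replace = FALSE)   # much smaller, no repeats

length(x.efron)   # 100, same as the original sample
length(x.sub)     # 10, the subsample size
```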

• Since the asymptotic distribution is non-normal, it makes no sense to be calculating standard errors. What does make sense is a bootstrap percentile interval, but we will have to wait until we learn about that and revisit this issue.

• For now, we just show that the subsampling bootstrap has done the Right Thing (with a capital R and a capital T). The plot is a so-called Q-Q plot. The sorted values of the variable
```
z.star <- nsub * (xmax - xmax.star)
```
which is supposed to have an Exp(1 / θ) distribution according to the theory, are plotted against the appropriate quantiles of this distribution. If the points lie near the line y = x, then `z.star` does indeed have the claimed distribution.
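The construction behind the Q-Q plot can be sketched as follows (simulated data and all names are illustrative): compute `z.star` from the subsampling bootstrap and sort it against the corresponding Exp(1 / θ) quantiles.

```r
# Simulated Uniform(0, theta) data standing in for the real data
set.seed(7)
theta <- 3
n <- 1000
nsub <- 50
nboot <- 500
x <- runif(n, 0, theta)
xmax <- max(x)

# Subsampling bootstrap of the maximum
xmax.star <- replicate(nboot, max(sample(x, nsub, replace = FALSE)))
z.star <- nsub * (xmax - xmax.star)

# Sorted z.star against Exp(1 / theta) quantiles; for a Q-Q plot,
# plot(q.theory, q.star) and look for points near the line y = x
q.theory <- qexp(ppoints(nboot), rate = 1 / theta)
q.star <- sort(z.star)
```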

We emphasize that we don't need to know the asymptotic distribution to use the bootstrap samples `z.star` to construct a confidence interval for θ. We can't do it yet because we haven't covered Chapters 12, 13, and 14 in Efron and Tibshirani. When we've done them, we can return to this example and finish it.

• For comparison, we also put the `xmax.bogo` samples on the Q-Q plot, so it can be clearly seen that they do the Wrong Thing (with a capital W and a capital T).
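A sketch of why the Efron bootstrap does the Wrong Thing here (simulated data, illustrative names): resampling with replacement at the full sample size reproduces the sample maximum itself in a large fraction of bootstrap samples, so the rescaled errors pile up at zero instead of looking exponential.

```r
set.seed(7)
theta <- 3
n <- 1000
nboot <- 500
x <- runif(n, 0, theta)
xmax <- max(x)

# Ordinary (Efron) bootstrap of the maximum
xmax.bogo <- replicate(nboot, max(sample(x, replace = TRUE)))
z.bogo <- n * (xmax - xmax.bogo)

# Point mass at zero, roughly 1 - (1 - 1/n)^n, about 0.63:
# nothing like the Exp(1 / theta) limit
mean(z.bogo == 0)
```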