To do each example, just click the Submit button.
You do not have to type in any R instructions or specify a dataset.
That's already done for you.
Now that we have some idea of how (Efron style) bootstrap confidence intervals work, we can return to the subsampling bootstrap and look at the kind of confidence intervals Politis and Romano (1994) recommend for the subsampling bootstrap.
Somewhat surprisingly, they recommend the kind of intervals that Efron and
Tibshirani disparage in their Section 13.4, titled
Is the Percentile Interval
Backwards?. That section is a response to the recommendations of Hall
(The Bootstrap and Edgeworth Expansion, Springer, 1992, and earlier
papers), where he characterizes bootstrap percentile intervals as looking
in the wrong statistical tables backwards.
We take no sides on the argument between Efron and Hall. There are arguments on both sides, and either method performs well in some examples and badly in others.
However, for the subsampling bootstrap, there seems to be no dispute. The standard method for the subsampling bootstrap is very much like what Hall recommends for the ordinary bootstrap.
This method is explained in a handout (which will be handed out in class). For those who didn't get a copy in dead tree format, here is a PDF.
- Everything down to the bottom of the for loop should be familiar,
just like what we did when we previously
visited this example, except that some plot commands have been removed.
The four lines below the bootstrap loop calculate the subsampling
bootstrap confidence interval. The example could have stopped there.
The histogram shows the distribution which the subsampling bootstrap uses
to calculate critical values. And the confidence interval below it gives
what one would get if one assumed the asymptotic distribution is normal.
Note that the histogram is quite skewed. Hence we get a confidence interval
that is not symmetric about
theta.hat. Thus we do much better by not assuming normality: we get a shorter confidence interval with better coverage.
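The interval computation can be sketched as follows. This is a minimal illustration, not the example's actual code: the data, the statistic (here the mean), the subsample size, and the rate sqrt(n) are all stand-ins for whatever the example uses.

```r
## hedged sketch of a subsampling bootstrap confidence interval
set.seed(42)
x <- rexp(100)                 # toy data, for illustration only
n <- length(x)
b <- 20                        # subsample size (assumed value)
nboot <- 999
theta.hat <- mean(x)
theta.star <- replicate(nboot, mean(sample(x, b, replace = FALSE)))
## the distribution the histogram shows, centered and scaled
z.star <- sqrt(b) * (theta.star - theta.hat)
crit <- quantile(z.star, c(0.025, 0.975))
## Hall-style ("backwards") interval: the upper critical value gives
## the lower endpoint and vice versa
theta.hat - rev(crit) / sqrt(n)
```

Note how the skewness of z.star carries over into an asymmetric interval, which is the whole point of not assuming normality.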
- As usual, Efron and Tibshirani are using a ridiculously small sample size in this toy problem. There is no reason to believe the subsampling bootstrap here. But it is reasonable for (much) larger data sets. (This comment is repeated from the previous discussion of the subsampling bootstrap.)
Time Series Revisited
In this section we look at a long AR(1) time series, so long that it is silly to use all blocks of length b as subsamples. We take a sample of the subsamples.
Also we use a sound method of estimating the AR coefficient.
The R function
ar (see its on-line help) estimates AR(k) time
series models for arbitrary k and also determines k.
Here we want the case k = 1, which makes the call a bit complicated
(the default is to estimate k). Another complication is that
this function returns a list, the
ar element of which is
the estimate. That's why there is
$ar at the end of the function
call: it picks out that element of the list.
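The call, under these constraints, looks like the following sketch (the simulated series is an assumption for illustration only).

```r
## fit an AR(1) model and extract the coefficient estimate
set.seed(42)
x <- arima.sim(model = list(ar = 0.5), n = 1000)  # toy AR(1) series
## aic = FALSE keeps ar() from choosing the order itself, and
## order.max = 1 forces k = 1; $ar picks the coefficient estimate
## out of the list that ar() returns
rho.hat <- ar(x, aic = FALSE, order.max = 1)$ar
```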
The assignment spc <- 100 says we will use every 100-th block
of length b. This complicates the specification of the subsamples.
The starting indices of the blocks we use are given by the vector
ii, and the i-th time through the loop
the subsample will be the block starting at
ii[i] and ending at
ii[i] + b - 1. This is just like the code where we use
all the subsamples, except that only every spc-th starting index
appears in ii.
Everything else is the same as the preceding example.
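The thinning described above can be sketched like this. The series, the block length, and the use of ar() as the statistic are stand-ins; only the construction of ii and the block extraction are the point.

```r
## hedged sketch: subsampling every spc-th block of a long series
set.seed(42)
x <- arima.sim(model = list(ar = 0.5), n = 20000)  # stand-in long series
n <- length(x)
b <- 200                              # block length (assumed value)
spc <- 100                            # use every 100-th block
ii <- seq(1, n - b + 1, by = spc)     # starting indices of the blocks used
theta.star <- double(length(ii))
for (i in seq_along(ii)) {
    x.star <- x[ii[i]:(ii[i] + b - 1)]   # block starting at ii[i]
    theta.star[i] <- ar(x.star, aic = FALSE, order.max = 1)$ar
}
```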
Extreme Values and IID in General
- Everything down to the bottom of the bootstrap loop should be familiar,
just like what we did when we previously
visited this example, except that the commands that also did the Efron
bootstrap have been removed.
The code immediately following the bootstrap loop calculates
a two-sided confidence interval as described in the handout and in
Politis and Romano (1994). That is what you would usually do.
- In this problem we actually want a one-sided interval because we know the estimator always misses the parameter on the low side. That's done by the lines below the comment.
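A one-sided interval of this kind can be sketched as below. This is an illustration under assumed names: the model Uniform(0, theta) with theta.hat = max(x), which always underestimates, and the rate n rather than sqrt(n), which is the correct rate for the maximum.

```r
## hedged sketch of a one-sided subsampling interval for the maximum
set.seed(42)
n <- 1000
x <- runif(n)                  # true theta = 1, for illustration
theta.hat <- max(x)
b <- 50                        # subsample size (assumed value)
nboot <- 999
theta.star <- replicate(nboot, max(sample(x, b, replace = FALSE)))
## for the max the rate is the sample size itself, not sqrt(n)
z.star <- b * (theta.star - theta.hat)      # all <= 0
## 95% one-sided interval: lower endpoint is theta.hat itself, upper
## endpoint comes from the lower tail of the subsampling distribution
c(theta.hat, theta.hat - quantile(z.star, 0.05) / n)
```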
Estimating the Rate
- Now we do subsampling at four different sample sizes defined by
b <- c(40, 60, 90, 135). The number of sizes (here 4) and the sizes themselves are arbitrary. You may change them as you see fit.
- Now the bootstrap loop becomes a loop within a loop. In a minor
innovation of our own (not from Politis and Romano) we have used the
principle of common random numbers from Monte Carlo theory.
Our estimation will be as accurate as possible if we make the subsamples
of different sizes as similar as possible given the requirement that
they be samples without replacement from the original sample. The
obvious way to do that is to subsample the subsamples. For the sizes
we have here
- we take a subsample of size 135 from the original sample,
- then we take a subsample of size 90 from the subsample of size 135,
- then we take a subsample of size 60 from the subsample of size 90,
- then we take a subsample of size 40 from the subsample of size 60.
Warning: Don't do
x.star <- sample(x.star, b[j], replace = FALSE) (or anything like it with
x.star appearing twice) anywhere else in bootstrapping. The only reason this works is because of the index
j running backwards and because of the assignment
x.star <- x done just before the inner loop. This is a clever trick for this situation only. Don't generalize it out of context.
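The loop within a loop, with the common-random-numbers trick, can be sketched like this. The data and the statistic (the maximum) are toy stand-ins for the example's.

```r
## hedged sketch: nested subsampling with common random numbers
set.seed(42)
x <- runif(1000)                       # toy data
theta.hat <- max(x)
b <- c(40, 60, 90, 135)
nboot <- 999
theta.star <- matrix(NA, nboot, length(b))
for (i in 1:nboot) {
    x.star <- x                        # reset just before the inner loop
    for (j in rev(seq_along(b))) {     # j runs backwards: sizes 135, 90, 60, 40
        ## each subsample is drawn from the preceding, larger one,
        ## making the subsamples of different sizes as similar as possible
        x.star <- sample(x.star, b[j], replace = FALSE)
        theta.star[i, j] <- max(x.star)
    }
}
```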
After the bootstrap loop within a loop,
theta.star is a matrix whose
i-th column contains
nboot bootstrap estimates for subsamples of size
b[i]. Then
theta.hat is subtracted from each
theta.star, and the results are stuffed into a list (
zlist) and shown as parallel boxplots.
It is clear from the boxplots how the scale changes with subsample size.
Now we make another arbitrary definition:
k <- (nboot + 1) * seq(0.05, 0.45, 0.05) and l <- (nboot + 1) * seq(0.55, 0.95, 0.05). The vectors
k and l can be any integer sequences of the same length such that
0 <= k < l <= nboot.
The next step is to find quantile differences,
sort(z.star)[l] - sort(z.star)[k], although the code uses a partial sort to save time (whether the time saved is worth the extra code complexity is unknown). These are stuffed into another list,
qlist. Then we take logs and plot the results as what some call a
dot plot but R calls a
stripchart. This shows how these log quantile differences vary with b.
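The quantile-difference step can be sketched as follows. The setup (data, statistic, subsample sizes) is a toy stand-in, and a full sort is used rather than the partial sort the real code uses; round() guards against floating-point fuzz in the indices.

```r
## hedged sketch: log quantile differences at each subsample size
set.seed(42)
x <- runif(1000)
theta.hat <- max(x)
b <- c(40, 60, 90, 135)
nboot <- 999
theta.star <- matrix(NA, nboot, length(b))
for (i in 1:nboot) {
    x.star <- x
    for (j in rev(seq_along(b))) {
        x.star <- sample(x.star, b[j], replace = FALSE)
        theta.star[i, j] <- max(x.star)
    }
}
k <- round((nboot + 1) * seq(0.05, 0.45, 0.05))
l <- round((nboot + 1) * seq(0.55, 0.95, 0.05))
qlist <- list()
for (i in seq_along(b)) {
    z.star <- sort(theta.star[, i] - theta.hat)
    qlist[[i]] <- log(z.star[l] - z.star[k])   # log quantile differences
}
names(qlist) <- b
stripchart(qlist, vertical = TRUE, xlab = "subsample size b")
```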
Finally we set
y[i] to be the average of the log quantile differences for subsample size
b[i]. Then we find the estimate of the rate, beta.hat.
That was the hard part. Now that the rate is estimated, everything is
just like the preceding example; we just use
b^beta.hat where rates are needed.
The only tricky bit is that we don't bother to go back and get another bootstrap sample (we already have four!). We just use the ones we already have.
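The rate estimate itself can be sketched as below, under the assumption that the averages y fall near a straight line in log(b) with slope -beta (which is what the b^(-beta) scaling of the quantile differences implies). The y here is a synthetic stand-in for the one computed above, with true beta equal to 1.

```r
## hedged sketch: estimating the rate by least squares
set.seed(42)
b <- c(40, 60, 90, 135)
## synthetic stand-in for the averaged log quantile differences,
## generated with true beta = 1 plus a little noise
y <- 2 - log(b) + rnorm(4, sd = 0.01)
beta.hat <- - coef(lm(y ~ log(b)))[2]   # minus the slope
## wherever a rate is needed, use b^beta.hat, for example
## z.star <- b[i]^beta.hat * (theta.star[, i] - theta.hat)
```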