General Instructions
To do each example, just click the Submit button.
You do not have to type in any R instructions or specify a dataset.
That's already done for you.
Theory
All of the bootstrap t confidence intervals on this page are second order correct (Efron and Tibshirani, Chapter 22, p. 325).
Bootstrap T with Variance Estimate Supplied
Try 1
Section 12.5 in Efron and Tibshirani.
Comments
- As usual, `library(bootstrap)` says we are going to use code in the `bootstrap` library, which is not available without this command. Here `library(bootstrap)` is necessary for two reasons: without it we can't get the data `mouse.c`, and we can't get the function `boott` (on-line help) that we use to construct bootstrap t intervals.
- The argument `theta` is a function that calculates the point estimate on which the interval is based. Here the point estimate is the sample mean, calculated by the function `mean`.
- The function `sdfun` calculates an estimate of the standard error of the estimator calculated by `theta`. In this example the sample mean (calculated by the `mean` function) has standard error calculated by `sdfun`.
- The call signature for `sdfun` is specified in the documentation for `boott`. It must have exactly this form, with three specified arguments and the "dot, dot, dot" argument.
- Generally, we only use the first argument `x` in our definition of `sdfun`. We can also use some of the others, but they are not so useful.
  - `nbootsd` is a dummy argument that is not used.
  - `theta` is the function that we passed in via the `theta` argument of `boott` (in this example, `mean`). We generally do not need to use the `theta` argument though, because we know the function: we can say `mean(x)` instead of `theta(x)`.
  - `...` indicates that other named arguments to the function are allowed. But we can also use global variables rather than arguments.
- The `nboott = 1000` is because of the comment in the documentation for `boott`, and the similar comment near the top of p. 161 in Efron and Tibshirani: "200 is a bare minimum and 1000 or more is needed for reliable α % confidence points, α > .95 say."
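The code that this example runs is not reproduced in the text above, so the following is only a minimal sketch of the call the comments describe. It assumes the `bootstrap` package is installed. The body of `sdfun` (the usual standard error of the mean) and the seed are assumptions made for illustration.

```r
library(bootstrap)   # provides boott() and the mouse.c data
data(mouse.c)

set.seed(42)         # arbitrary seed, only for reproducibility

# Assumed standard error estimate for the sample mean.
# Only x is used; nbootsd, theta, and ... are required by the
# call signature but ignored here.
sdfun <- function(x, nbootsd, theta, ...) sqrt(var(x) / length(x))

out <- boott(mouse.c, theta = mean, sdfun = sdfun, nboott = 1000)

# Confidence points at the default percentiles.
out$confpoints
```

Because `sdfun` is a closed-form formula here, no inner bootstrap is run, so the whole computation costs only the `nboott` outer resamples.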
Try 2
This is exactly the same as the preceding section but we make the output be only the confidence interval we want.
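One way to do this, sketched under the assumption that `boott` is called as described in the comments above: pass the `perc` argument to say which percentiles are wanted, then print only the `confpoints` component of the result. The 95% level, the body of `sdfun`, and the seed are assumptions made for illustration.

```r
library(bootstrap)
data(mouse.c)
set.seed(42)         # arbitrary seed, only for reproducibility

# Assumed standard error estimate for the sample mean.
sdfun <- function(x, nbootsd, theta, ...) sqrt(var(x) / length(x))

# perc selects which confidence points are computed;
# 0.025 and 0.975 give the endpoints of a 95% interval.
boott.out <- boott(mouse.c, theta = mean, sdfun = sdfun,
    nboott = 1000, perc = c(0.025, 0.975))

# Print only the interval, not the whole list returned by boott.
boott.out$confpoints
```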
Bootstrap T with Double Bootstrap Variance Estimate
Section 12.5 in Efron and Tibshirani.
Comments
- If we omit the `sdfun` argument, then the standardization is done another way: via the bootstrap. In each "outer" bootstrap iteration a sample "x star" with replacement from the data is formed, the function `theta` is applied to it to get "theta star", then the standard error is estimated from "x star" by an "inner" bootstrap that forms samples "x star star" with replacement from "x star" and applies `theta` to them to get a bootstrap standard error.

  This function with `sdfun` omitted behaves as if we had supplied an `sdfun` of the following form

  ```r
  sdfun <- function(x, nbootsd, theta, ...) {
      theta.star <- double(nbootsd)
      for (i in 1:nbootsd) {
          x.star <- sample(x, replace = TRUE)
          theta.star[i] <- theta(x.star)
      }
      return(sd(theta.star))
  }
  ```

  Note that when this `sdfun` is called with the original data `x` as an argument, it just calculates the bootstrap standard error, just like we did on our web page on that subject. But when this `sdfun` is called with bootstrap data `x.star` as its argument `x`, then every `x.star` inside the function is a random sample with replacement from another random sample with replacement. Hence the name "double bootstrap".
- As everywhere else, the default bootstrap sample sizes for these functions are too low. But cranking them up, as we do here, makes things really slow. In each of the `nboott = 1000` iterations of the outer loop, there are `nbootsd = 200` iterations of the inner loop, for `1000 * 200 = 2e5` iterations in all.

  If you get bored, you can use the defaults (omit `nboott` and `nbootsd`), but you won't get as accurate an answer.
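A minimal sketch of the double bootstrap call described in these comments. It assumes the `bootstrap` package is installed and, since the example's own code is not shown in the text, it also assumes the `mouse.c` data and the `mean` estimator; the 95% level and the seed are likewise illustrative choices.

```r
library(bootstrap)
data(mouse.c)
set.seed(42)         # arbitrary seed, only for reproducibility

# No sdfun is supplied, so boott estimates each standard error
# with an inner bootstrap: 1000 outer iterations, each with
# 200 inner iterations.
boott.out <- boott(mouse.c, theta = mean, nbootsd = 200,
    nboott = 1000, perc = c(0.025, 0.975))

boott.out$confpoints
```

Dropping `nbootsd` and `nboott` from the call falls back to the package defaults, which run faster but give a less accurate interval.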
Variance Stabilized Bootstrap T
Try 1
Section 12.6 in Efron and Tibshirani.
Another method of automagic variance stabilization estimates a variance stabilizing transformation using the double bootstrap.
Comments
- The function `sally` here calculates the estimator. It uses a trick from the documentation for `boott`, a trick we have been using all along: for complicated data structures, sample the indices 1, ..., n rather than the data vectors. So we just supply `1:n` as the argument to `boott` and write the estimation function (here `sally`) to take the (resampled) indices as an argument.
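The index trick can be sketched as follows. The text does not say which data or estimator `sally` uses, so the `law` data from the `bootstrap` package and the correlation coefficient are assumptions made for illustration, as are the seed and the 95% level. The `VS` and `v.nboott` arguments are taken from the `boott` documentation.

```r
library(bootstrap)
data(law)            # assumed example data with two columns
set.seed(42)         # arbitrary seed, only for reproducibility

n <- nrow(law)

# Estimator written in terms of (resampled) row indices i.
# The data law is found as a global variable rather than
# being passed as an argument.
sally <- function(i) cor(law[i, 1], law[i, 2])

# VS = TRUE requests the variance stabilized bootstrap t.
# v.nboott is the number of bootstrap samples used for the
# bootstrap t distribution when VS = TRUE.
boott.out <- boott(1:n, sally, VS = TRUE, v.nboott = 1000,
    perc = c(0.025, 0.975))

boott.out$confpoints

# Plot the estimated variance stabilizing transformation.
plot(boott.out$theta, boott.out$g)
```

Resampling indices keeps the rows of the data together, so each bootstrap sample is a sample of whole cases rather than of individual numbers.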
Try 2
This is exactly the same as the preceding section but we make the output be only the confidence interval we want, no plot showing the variance-stabilizing transformation.