Theory
All of the bootstrap t confidence intervals on this page are second order correct (Efron and Tibshirani, Chapter 22, p. 325).
Bootstrap T with Variance Estimate Supplied
Try 1
Section 12.5 in Efron and Tibshirani.
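For reference, here is the bootstrap t interval in the notation we understand Section 12.5 to use. B bootstrap samples x*(b) each give a studentized value Z*(b). Here se*(b) is the standard error estimated from the bth bootstrap sample (the job of sdfun), se is the one estimated from the original data, and t^(α) is the αth percentile of the Z*(b) values. Treat this as a summary, not a quotation of the book.

```latex
Z^{*}(b) = \frac{\hat\theta^{*}(b) - \hat\theta}{\widehat{\mathrm{se}}^{*}(b)},
  \qquad b = 1, \ldots, B

\Bigl( \hat\theta - \hat t^{(1-\alpha)}\,\widehat{\mathrm{se}},\;
       \hat\theta - \hat t^{(\alpha)}\,\widehat{\mathrm{se}} \Bigr)
```

The upper percentile of Z* sets the lower endpoint and vice versa. This is why the interval need not be symmetric about the point estimate.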
Comments
- As usual, library(bootstrap) says we are going to use code in the bootstrap library, which is not available without this command. Here library(bootstrap) is necessary for two reasons. Without it we can get neither the data mouse.c nor the function boott (see its on-line help) that we use to construct bootstrap t intervals.
- The argument theta is a function that calculates the point estimate on which the interval is based. Here the point estimate is the sample mean, calculated by the function mean.
- The function sdfun calculates an estimate of the standard error of the estimator calculated by theta. In this example the estimator is the sample mean (calculated by the mean function), and sdfun calculates its standard error.
- The call signature for sdfun is specified in the documentation for boott. It must have exactly this form: three named arguments (x, nbootsd, and theta) followed by the ... (dot, dot, dot) argument.
- Generally, we only use the first argument, x, in our definition of sdfun. We can also use some of the others, but they are not so useful.
  - nbootsd is a dummy argument that is not used.
  - theta is the function that we passed in via the theta argument of boott (in this example, mean). We generally do not need to use the theta argument, though, because we know the function: we can say mean(x) instead of theta(x).
  - ... indicates that other named arguments to the function are allowed. But we can also use global variables rather than arguments.
- We set nboott = 1000 because of this comment in the documentation for boott: "200 is a bare minimum and 1000 or more is needed for reliable α% confidence points, α > .95 say."
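To make the roles of theta, sdfun, and nboott concrete, here is a minimal base-R sketch of a bootstrap t interval with the standard error supplied by a formula. It is not boott's source. The names boott_sketch and sdmean are hypothetical. The mouse.c values are typed in on the assumption that they are the nine control-group survival times of Efron and Tibshirani's Table 2.1; with the bootstrap package loaded, use its mouse.c instead.

```r
# Sketch only: a bare-bones bootstrap t interval in base R, mirroring the roles
# of the theta, sdfun and nboott arguments of boott. boott_sketch is a
# hypothetical helper, not a function from the bootstrap package.
boott_sketch <- function(x, theta, sdfun, nboott = 1000, nbootsd = 200,
                         level = 0.95) {
  n <- length(x)
  theta.hat <- theta(x)
  se.hat <- sdfun(x, nbootsd, theta)            # standard error from the data
  t.star <- double(nboott)
  for (i in 1:nboott) {
    x.star <- sample(x, n, replace = TRUE)      # bootstrap sample
    # studentize with the standard error computed from x.star itself
    t.star[i] <- (theta(x.star) - theta.hat) / sdfun(x.star, nbootsd, theta)
  }
  alpha <- (1 - level) / 2
  t.hat <- quantile(t.star, c(1 - alpha, alpha), names = FALSE)
  # the upper percentile of t.star gives the lower endpoint, and vice versa
  theta.hat - t.hat * se.hat
}

# Hypothetical sdfun for the sample mean, in the (x, nbootsd, theta, ...) form.
# Only x is used; nbootsd and theta are accepted but ignored.
sdmean <- function(x, nbootsd, theta, ...) sqrt(var(x) / length(x))

# Assumed to match mouse.c from the bootstrap package
mouse.c <- c(52, 104, 146, 10, 51, 30, 40, 27, 46)

set.seed(42)
ci <- boott_sketch(mouse.c, mean, sdmean, nboott = 1000)
ci   # lower and upper endpoints of a 95% bootstrap t interval for the mean
```

Because sdmean never looks at nbootsd, passing it costs nothing. This is the "dummy argument" situation described in the comments.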
Try 2
This is exactly the same as the preceding section, but the output is restricted to the confidence interval we want.
Bootstrap T with Double Bootstrap Variance Estimate
Section 12.5 in Efron and Tibshirani.
Comments
- If we omit the sdfun argument, then the standardization is done another way: via the bootstrap. In each outer bootstrap iteration, a sample x star is formed by sampling with replacement from the data, and the function theta is applied to it to get theta star. Then the standard error is estimated from x star by an inner bootstrap. This inner bootstrap forms samples x star star with replacement from x star and applies theta to them to get a bootstrap standard error.
  With sdfun omitted, boott behaves as if we had supplied an sdfun of the following form

      sdfun <- function(x, nbootsd, theta, ...) {
          theta.star <- double(nbootsd)
          for (i in 1:nbootsd) {
              x.star <- sample(x, replace = TRUE)
              theta.star[i] <- theta(x.star)
          }
          return(sd(theta.star))
      }

  Note that when this sdfun is called with the original data x as its argument, it just calculates the bootstrap standard error, just like we did on our web page on that subject. But when this sdfun is called with bootstrap data x.star as its argument x, then every x.star inside the function is a random sample with replacement from another random sample with replacement. Hence the name double bootstrap.
- As everywhere else, the default bootstrap sample sizes for these functions are too low. But cranking them up, as we do here, makes things really slow. In each of the nboott = 1000 iterations of the outer loop, there are nbootsd = 200 iterations of the inner loop, for 1000 * 200 = 2e5 iterations in all. If you get bored, you can use the defaults (omit nboott and nbootsd), but you won't get as accurate an answer.
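A small experiment makes the cost of the double bootstrap concrete. The sketch below puts the inner-bootstrap sdfun described above (renamed sdfun.boot) inside an outer bootstrap t loop and counts how often theta is evaluated. The names boott_dbl, sdfun.boot, and counting.mean are hypothetical, and the code is not boott's source. The mouse.c values are assumed to match the package data. The loop sizes are cut down to nboott = 100 and nbootsd = 20 so the demonstration runs quickly.

```r
# Inner bootstrap: standard error of theta estimated by resampling x,
# in the same (x, nbootsd, theta, ...) form as the sdfun shown above.
sdfun.boot <- function(x, nbootsd, theta, ...) {
  theta.star <- double(nbootsd)
  for (j in 1:nbootsd) {
    theta.star[j] <- theta(sample(x, replace = TRUE))
  }
  sd(theta.star)
}

# Hypothetical helper (not from the bootstrap package): outer bootstrap loop
# that studentizes each theta(x.star) by an inner-bootstrap SE of x.star.
boott_dbl <- function(x, theta, nboott = 1000, nbootsd = 200, level = 0.95) {
  theta.hat <- theta(x)
  se.hat <- sdfun.boot(x, nbootsd, theta)        # ordinary bootstrap SE
  t.star <- double(nboott)
  for (i in 1:nboott) {
    x.star <- sample(x, replace = TRUE)          # outer resample
    # inner resamples are drawn from x.star: the double bootstrap
    t.star[i] <- (theta(x.star) - theta.hat) / sdfun.boot(x.star, nbootsd, theta)
  }
  alpha <- (1 - level) / 2
  theta.hat - quantile(t.star, c(1 - alpha, alpha), names = FALSE) * se.hat
}

# Wrap mean so that every evaluation of theta is counted.
calls <- 0
counting.mean <- function(x) {
  calls <<- calls + 1
  mean(x)
}

mouse.c <- c(52, 104, 146, 10, 51, 30, 40, 27, 46)   # assumed package values
set.seed(1)
ci <- boott_dbl(mouse.c, counting.mean, nboott = 100, nbootsd = 20)
ci
calls   # 1 + 20 + 100 * (1 + 20) = 2121
```

The count has two parts:
- 1 + nbootsd evaluations for the standard error on the original data.
- nboott * (1 + nbootsd) evaluations in the outer loop.

With the settings used on this page (nboott = 1000, nbootsd = 200), that is 1 + 200 + 1000 * 201 = 201201 evaluations of theta, in line with the 2e5 figure above.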
Variance Stabilized Bootstrap T
Try 1
Section 12.6 in Efron and Tibshirani.
Another method of automagic variance stabilization estimates a variance stabilizing transformation using the double bootstrap.
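As we read Section 12.6, the recipe has four steps:
1. A double bootstrap gives pairs (theta star, inner-bootstrap standard error).
2. A smooth curve s is fitted to those pairs.
3. The transformation g(t) is the integral of 1/s(u) du, computed numerically.
4. A bootstrap t interval is formed on the g scale, with the standard error taken to be 1, and its endpoints are mapped back through the inverse of g.

Below is a base-R sketch of those steps under stated assumptions. It is not boott's code. The function vs_boott_sketch is hypothetical. The lowess smoother is our own choice. The sizes nbootg = 100 and nbootsd = 25 are illustrative. The mean of mouse.c (values assumed to match the package data) is only a stand-in estimator for sally, whose definition is not shown here.

```r
# Sketch of a variance-stabilized bootstrap t interval (after our reading of
# Efron and Tibshirani, Section 12.6). vs_boott_sketch is hypothetical, and
# using lowess as the smoother is an assumption.
vs_boott_sketch <- function(x, theta, nbootg = 100, nbootsd = 25,
                            nboott = 1000, level = 0.95) {
  n <- length(x)
  resample <- function(v) v[sample(n, replace = TRUE)]

  # Step 1: double bootstrap. For each first-level sample x.star, record
  # theta(x.star) and an inner-bootstrap standard error computed from x.star.
  theta.g <- double(nbootg)
  se.g <- double(nbootg)
  for (b in 1:nbootg) {
    x.star <- resample(x)
    theta.g[b] <- theta(x.star)
    inner <- vapply(1:nbootsd, function(j) theta(resample(x.star)), numeric(1))
    se.g[b] <- sd(inner)
  }

  # Step 2: smooth the standard error as a function of theta.
  fit <- lowess(theta.g, se.g)

  # Step 3: g(t) = integral of 1 / s(u) du by the trapezoid rule on a grid
  # that extends past the observed theta values (s held constant beyond them).
  r <- range(theta.g)
  u <- seq(r[1] - diff(r), r[2] + diff(r), length.out = 301)
  s <- approx(fit$x, fit$y, xout = u, rule = 2, ties = mean)$y
  s <- pmax(s, 1e-8 * max(se.g))               # guard against s <= 0
  g.u <- c(0, cumsum(diff(u) * (1 / s[-1] + 1 / s[-length(s)]) / 2))
  g <- function(t) approx(u, g.u, xout = t, rule = 2)$y
  g.inv <- function(v) approx(g.u, u, xout = v, rule = 2)$y

  # Step 4: bootstrap t on the g scale, where the standard error is about 1,
  # so no further studentizing is done; then map the endpoints back.
  g.hat <- g(theta(x))
  theta.star <- vapply(1:nboott, function(i) theta(resample(x)), numeric(1))
  z.star <- g(theta.star) - g.hat
  alpha <- (1 - level) / 2
  q <- quantile(z.star, c(1 - alpha, alpha), names = FALSE)
  g.inv(g.hat - q)
}

mouse.c <- c(52, 104, 146, 10, 51, 30, 40, 27, 46)   # assumed package values
set.seed(7)
ci.vs <- vs_boott_sketch(mouse.c, mean)
ci.vs
```

Because g is increasing, the back-transformed endpoints stay in order, and the interval need not be symmetric about theta(x).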
Comments
- The function sally here calculates the estimator. It uses a trick from the documentation for boott, one we have been using all along: for complicated data structures, sample the indices 1, ..., n rather than the data vectors. So we just supply 1:n as the argument to boott and write the estimation function (here sally) to take the (resampled) indices as an argument.
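The index trick can be seen in a few lines of base R. The bivariate data below are simulated stand-ins, not the data used on this page. The estimator theta.idx (a correlation) is hypothetical; since sally's definition is not shown here, this only illustrates the same idea. Because the estimator looks rows up by index, resampling 1:n keeps each (y, z) pair together, even though the resampler only ever sees a plain vector.

```r
# Simulated stand-in data: n pairs (y, z); not the data used on this page.
set.seed(3)
n <- 15
dat <- data.frame(y = rnorm(n))
dat$z <- dat$y + rnorm(n)

# Hypothetical estimator written in terms of indices: it looks up rows of the
# global data frame dat, so each resampled index brings its whole (y, z) pair.
theta.idx <- function(i) cor(dat$y[i], dat$z[i])

idx <- 1:n
theta.hat <- theta.idx(idx)            # the estimate on the original data

# Resample the indices, not the data vectors.
theta.star <- replicate(1000, theta.idx(sample(idx, replace = TRUE)))
se.boot <- sd(theta.star)
c(estimate = theta.hat, bootstrap.se = se.boot)
```

Any routine that resamples a single vector can use theta.idx in this way, by being handed 1:n as its data, as described above for boott.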
Try 2
This is exactly the same as the preceding section, but the output is restricted to the confidence interval we want, with no plot showing the variance-stabilizing transformation.