General Instructions
To do each example, just click the Submit button.
You do not have to type in any R instructions or specify a dataset. That's already done for you.
Theory
The most important bit of theory about nonparametric bootstrap hypothesis tests is that, in general, there ain't any!
The reason is that the empirical distribution F hat is close to the true (population) distribution, which may or may not satisfy the null hypothesis. Thus the bootstrap samples are generally not simulations from a distribution satisfying the null hypothesis. Hence they are, in general, completely useless for doing a hypothesis test because, whether parametric or nonparametric and whether done analytically or by simulation, the reference distribution for a hypothesis test must be a distribution satisfying the null hypothesis. (How else can it be relevant to rejecting the null hypothesis?)
A naive person attempting to do a bootstrap test just calculates a P-value as something like
mean(tstat.star > tstat.hat)
where tstat.hat is the value of the test statistic calculated for the actual data and tstat.star is a vector of values of the test statistic calculated for bootstrap samples.
The resulting test is a valid hypothesis test in the sense that a test with nominal significance level α actually has that significance level.
But (a big but!) this test typically has no power. It rejects the null
hypothesis with probability α regardless of what the alternative is.
No matter how large the deviation of the true parameter value from the
null hypothesis, the naive bootstrap test typically doesn't find
any statistical significance.
Hence the naive bootstrap test proves nothing except that a little knowledge is a dangerous thing.
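The lack of power is easy to see in a small simulation. The following is a minimal sketch, not part of the original examples: it assumes simulated data, uses the sample mean as the test statistic, and picks an arbitrary sample size and nboot. The null hypothesis (mean zero) is badly false here, yet the naive P-value comes nowhere near significance.

```r
# Assumption: simulated data whose true mean is 1, so H0: mean = 0 is badly false.
set.seed(3)
n <- 50
x <- rnorm(n, mean = 1)
nboot <- 2000

tstat.hat <- mean(x)           # test statistic for the actual data
tstat.star <- double(nboot)    # test statistic for each bootstrap sample
for (i in 1:nboot) {
    x.star <- sample(x, replace = TRUE)
    tstat.star[i] <- mean(x.star)
}

# The naive "P-value": tstat.star is centered near tstat.hat, not near the
# null value, so this lands near one half however false H0 is.
naive.pv <- mean(tstat.star > tstat.hat)

# For comparison, an ordinary one-sample t test of H0: mean = 0.
t.pv <- t.test(x, mu = 0, alternative = "greater")$p.value

print(c(naive = naive.pv, t.test = t.pv))
```

The t test is shown only as a reference point. The bootstrap samples are drawn from the data, which are centered at the observed statistic rather than at the null value.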
Theory, Part Two
There are a variety of special situations in which something that makes sense as a nonparametric bootstrap hypothesis test can be done. Efron and Tibshirani, Chapter 16, describe a few. But if you don't see a general principle in their explanation, don't worry. There isn't any.
There is one limited general principle mentioned in passing in their Section 16.4, which we would like to amplify.
Any confidence interval has a hypothesis test dual to it.
Thus if you know how to do a bootstrap confidence interval for some parameter, then you also know how to do hypothesis tests concerning that single parameter.
Inverting Intervals: Decisions
The decision theoretic view of inverting confidence intervals to get tests is the simplest. Just reject H_{0} if the interval doesn't contain the hypothesized value of the parameter. More precisely, reject H_{0} at level α if the interval with coverage probability 1 − α doesn't cover the hypothesized value of the parameter.
The only tricky part is doing one-tailed tests, which involves the use of one-tailed intervals. But once one sees that, the rest is obvious.
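As an illustration of the decision rule, here is a minimal sketch using bootstrap percentile intervals for a correlation. The simulated data, the level alpha = 0.05, the value of nboot, and the names theta0, ci.two, and lower are all assumptions made for this example.

```r
# Assumption: simulated, strongly correlated data.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- x + rnorm(n)

theta0 <- 0      # hypothesized value of the correlation under H0
alpha <- 0.05    # level of the test
nboot <- 2000

theta.star <- double(nboot)
for (i in 1:nboot) {
    k <- sample(n, replace = TRUE)    # resample indices, not x and y separately
    theta.star[i] <- cor(x[k], y[k])
}

# Two-tailed test: reject H0 if the equal-tailed interval with coverage
# 1 - alpha does not contain theta0.
ci.two <- unname(quantile(theta.star, c(alpha / 2, 1 - alpha / 2)))
reject.two <- theta0 < ci.two[1] || theta0 > ci.two[2]

# One-tailed test against theta > theta0: use the one-sided interval
# (lower, Inf) with coverage 1 - alpha and reject if theta0 lies below it.
lower <- unname(quantile(theta.star, alpha))
reject.upper <- theta0 < lower

print(c(two.tailed = reject.two, upper.tailed = reject.upper))
```

The one-tailed test puts all of alpha in a single tail of the interval, which is why its endpoint differs from the corresponding endpoint of the two-sided interval.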
Inverting Intervals: P-values
P-values are a bit trickier. Here is one way to describe the P-value corresponding to a confidence interval or, more precisely, to a recipe for creating confidence intervals of any specified coverage probability. The P-value that goes with a confidence interval recipe is the α such that the confidence interval with coverage probability 1 − α has the value of the parameter hypothesized under H_{0} exactly on top of one of the endpoints of the interval.
It is actually a little easier to see this with one-sided confidence intervals and one-tailed tests, because there is only one endpoint of such an interval (the other endpoint is infinity or minus infinity) that we need to adjust to get the endpoint exactly on top of the hypothesized parameter value.
If one likes equal-tailed two-sided intervals, as Efron and Tibshirani do, then the usual rule holds:
The two-tailed P-value is twice the lower of the two one-tailed P-values.
So just do both one-tailed tests and double the P-value of the one that is less than one-half.
A side note for the curious, which will not be mentioned again: if one does not have any particular fondness for equal-tailed two-sided intervals, then there is no unambiguous way to calculate P-values for two-tailed tests. One gets a different P-value for each different confidence interval recipe.
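The description of the P-value as "the α that puts an endpoint exactly on the hypothesized value" can be checked numerically on a familiar interval. The sketch below uses the ordinary one-sample t interval rather than a bootstrap interval, purely because its P-value is known and can be compared. The simulated data and the names z, mu0, lower.endpoint, and alpha.hat are assumptions for this illustration.

```r
# Assumption: simulated data; H0: mu = 0 against the alternative mu > 0.
set.seed(42)
z <- rnorm(25, mean = 0.4)
mu0 <- 0

# Lower endpoint of the one-sided interval (lower, Inf) with coverage 1 - alpha.
lower.endpoint <- function(alpha)
    t.test(z, alternative = "greater", conf.level = 1 - alpha)$conf.int[1]

# Find the alpha at which that endpoint sits exactly on mu0.
alpha.hat <- uniroot(function(alpha) lower.endpoint(alpha) - mu0,
    interval = c(1e-8, 1 - 1e-8), tol = 1e-10)$root

# The P-value reported directly by the dual test.
p.direct <- t.test(z, mu = mu0, alternative = "greater")$p.value

print(c(by.inversion = alpha.hat, direct = p.direct))
```

The two numbers agree up to the tolerance of the root finder. The same search over α works for any interval recipe whose endpoints move monotonically with the coverage probability.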
Percentile Intervals
The data here are just two variables x and y which may or may not be correlated. That's what we test.
Comments

The code down through the for loop should be familiar by now. Because there are two variables to sample, we use the usual trick of sampling indices k rather than the data x and y.

The lower-tailed P-value is the α such that the αth quantile of the distribution of the bootstrap samples theta.star is equal to the hypothesized value of the parameter under the null hypothesis (here zero). The α such that F^{−1}(α) = 0 is just F(0), that is, we calculate the probability that theta.star is less than or equal to zero, which is how ltpv is defined.

The upper-tailed and two-tailed P-values should be obvious from the general discussion above.

Note the humongous sample size. The bootstrap P-value we calculate cannot be smaller than 1 / nboot (unless it is exactly zero). So if the P-value should be really, really small (highly, highly statistically significant) we need a really, really big nboot.
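The example code itself is not reproduced in this text. The following is a minimal sketch of the calculation the comments describe, not the original code. It assumes simulated x and y in place of the course dataset and a more modest nboot than the comments suggest. The names theta.star and ltpv follow the comments; utpv and tpv are names chosen here for the upper-tailed and two-tailed P-values.

```r
# Assumption: simulated data standing in for the example's x and y.
set.seed(2)
n <- 30
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)

nboot <- 1e4    # increase this if very small P-values need to be resolved
theta.hat <- cor(x, y)

theta.star <- double(nboot)
for (i in 1:nboot) {
    k <- sample(n, replace = TRUE)    # sample indices k, keeping (x, y) pairs together
    theta.star[i] <- cor(x[k], y[k])
}

# Lower-tailed P-value: the alpha whose quantile of theta.star is zero,
# that is, F(0), the fraction of theta.star at or below zero.
ltpv <- mean(theta.star <= 0)
# Upper-tailed P-value: the same calculation in the other tail.
utpv <- mean(theta.star >= 0)
# Two-tailed P-value: twice the smaller of the two one-tailed P-values.
tpv <- 2 * min(ltpv, utpv)

print(c(theta.hat = theta.hat, ltpv = ltpv, utpv = utpv, tpv = tpv))
```

Each of these P-values is a multiple of 1 / nboot, which is the resolution limit mentioned in the last comment.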
BC_{a} Intervals
The bcanon function doesn't produce a theta.star vector for us. Nor, for that matter, are its intervals based on percentiles of the theta.star distribution, so we can't do the test the same way as in the preceding example.
However, bcanon does purport to produce for any alpha the corresponding confpoint. So we just ask for a whole lot of confpoints corresponding to the whole range of possible alpha values and interpolate to find the alpha whose confpoint equals the hypothesized value.
Comments

The code down to the invocation of bcanon should be more or less obvious. The kcor function is defined the way it has to be for bootstrapping two variables. The only example I find of this isn't for BC_{a} but for the variance-stabilized bootstrap t, but the principle is the same.

The seq(along = x) is just a short way to say 1:n when n has previously been assigned the value length(x). This just avoids the assignment.

The alpha argument to bcanon asks for all confpoints on a grid of values ranging from zero to one.

For future use we save the results in bca.out. The component we want is bca.out$confpoints, which is a matrix with two columns:
    bca.out$confpoints[ , 1] is the supplied alpha values.
    bca.out$confpoints[ , 2] is the corresponding BC_{a} confpoints.

The plot shows how the BC_{a} confpoint varies as a function of alpha. We want to find the alpha where the curve crosses zero, which is the horizontal line in the plot.

The approx command defining ltpv does this (approximately).

The rest is the same as in the preceding example.
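The example code is not reproduced in this text, so here is a minimal sketch of the steps the comments describe, not the original code. It assumes the CRAN package bootstrap is installed, simulated data in place of the course dataset, and an alpha grid strictly inside (0, 1). The names kcor, bca.out, and ltpv follow the comments; conf, utpv, and tpv are names chosen here.

```r
# Assumption: simulated data standing in for the example's x and y.
set.seed(4)
n <- 30
x <- rnorm(n)
y <- 0.4 * x + rnorm(n)
nboot <- 2000

if (requireNamespace("bootstrap", quietly = TRUE)) {
    # The statistic as a function of resampled indices k, with the data
    # passed along as extra arguments.
    kcor <- function(k, x, y) cor(x[k], y[k])

    # A grid of alpha values strictly between zero and one.
    alpha <- seq(0.005, 0.995, by = 0.005)
    bca.out <- bootstrap::bcanon(seq(along = x), nboot, kcor, x, y,
        alpha = alpha)
    conf <- bca.out$confpoints    # column 1: alpha, column 2: BCa confpoint

    # Interpolate to find the alpha whose confpoint is zero. rule = 2 returns
    # the nearest grid endpoint if zero lies outside the range of confpoints,
    # in which case the P-value is only bounded by the grid.
    ltpv <- approx(conf[ , 2], conf[ , 1], xout = 0, rule = 2, ties = mean)$y
    utpv <- 1 - ltpv
    tpv <- 2 * min(ltpv, utpv)

    print(c(ltpv = ltpv, utpv = utpv, tpv = tpv))
}
```

Plotting conf[ , 2] against conf[ , 1] and adding a horizontal line at zero gives the curve described in the comments. A finer alpha grid needs a correspondingly larger nboot.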
Other Intervals
The same general principle of inverting confidence intervals to find tests applies to any confidence interval whatsoever (nonparametric bootstrap or not).
Annoyingly, if you really want to know the details of inverting some other interval (say ABC), we leave these as an exercise for the reader.
(Of course, change "annoyingly" to "mercifully" if you have had enough of this subject.)