Theory

The most important bit of theory about nonparametric bootstrap hypothesis tests is that, in general, there ain't any!

The reason is that the empirical distribution F hat is close to the true (population) distribution, which may or may not satisfy the null hypothesis. Thus the bootstrap samples are generally not simulations from a distribution satisfying the null hypothesis. Hence they are, in general, completely useless for doing a hypothesis test, because the reference distribution for a hypothesis test, whether parametric or nonparametric and whether derived analytically or by simulation, must be a distribution satisfying the null hypothesis. (How else can it be relevant to rejecting the null hypothesis?)

A naive person attempting to do a bootstrap test just calculates a P-value as something like

mean(tstat.star > tstat.hat)

where tstat.hat is the value of the test statistic calculated for the actual data and tstat.star is a vector of values of the test statistic calculated for bootstrap samples.

The resulting test is a valid hypothesis test in the sense that a test with nominal significance level α actually has that significance level.

But (a big but!) this test typically has no power. It rejects the null hypothesis with probability α regardless of what the alternative is. No matter how large the deviation of the true parameter value from the null hypothesis, the naive bootstrap test typically doesn't find any statistical significance.

Hence the naive bootstrap test proves nothing except that a little knowledge is a dangerous thing.
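
The lack of power is easy to see in a small simulation. The following is a minimal sketch, not part of the original examples: the simulated data, the sample size, the seed, and the choice of the one-sample t statistic are all assumptions made for illustration. The null hypothesis (mean zero) is badly false here, yet the naive P-value should stay near one-half, because the bootstrap distribution of the statistic is centered near the observed value rather than near its null value.

```r
set.seed(42)                  # arbitrary seed, for reproducibility only
n <- 50
x <- rnorm(n, mean = 1)       # simulated data (assumption); H0: mu = 0 is false

# test statistic for H0: mu = 0 (the usual one-sample t statistic)
tstat <- function(x) mean(x) / (sd(x) / sqrt(length(x)))

tstat.hat <- tstat(x)
nboot <- 2000
# resample from the empirical distribution, which does NOT satisfy H0
tstat.star <- replicate(nboot, tstat(sample(x, replace = TRUE)))

naive.pval <- mean(tstat.star > tstat.hat)
print(tstat.hat)     # far from zero: the data clearly contradict H0
print(naive.pval)    # the naive P-value nevertheless shows no significance
```

Changing the true mean in the call to rnorm moves tstat.hat but should leave naive.pval essentially where it was.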

Theory, Part Two

There are a variety of special situations in which something that makes sense as a nonparametric bootstrap hypothesis test can be done. Efron and Tibshirani (Chapter 16) describe a few. But if you don't see a general principle in their explanation, don't worry. There isn't any.

There is one limited general principle mentioned in passing in their Section 16.4, which we would like to amplify.

Any confidence interval has a hypothesis test dual to it.

Thus if you know how to do a bootstrap confidence interval for some parameter, then you also know how to do hypothesis tests concerning that single parameter.

Inverting Intervals: Decisions

The decision theoretic view of inverting confidence intervals to get tests is the simplest. Just reject H0 if the interval doesn't contain the hypothesized value of the parameter. More precisely, reject H0 at level α if the interval with coverage probability 1 − α doesn't cover the hypothesized value of the parameter.

The only tricky part is doing one-tailed tests, which involves the use of one-tailed intervals. But once one sees that, the rest is obvious.
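
As a rough sketch of this decision rule, the function below inverts a percentile interval computed from a vector of bootstrap replicates. The function name reject.h0 and its arguments are hypothetical, invented for this illustration, and the theta.star vector is a simulated stand-in rather than output from a real bootstrap.

```r
# Hypothetical helper: invert a percentile interval built from
# bootstrap replicates theta.star to test H0: theta = theta0.
reject.h0 <- function(theta.star, theta0, alpha = 0.05,
                      alternative = c("two.sided", "greater", "less")) {
    alternative <- match.arg(alternative)
    if (alternative == "two.sided") {
        # equal-tailed interval with coverage 1 - alpha
        ci <- quantile(theta.star, c(alpha / 2, 1 - alpha / 2))
    } else if (alternative == "greater") {
        # H1: theta > theta0, so use a lower confidence bound
        ci <- c(quantile(theta.star, alpha), Inf)
    } else {
        # H1: theta < theta0, so use an upper confidence bound
        ci <- c(-Inf, quantile(theta.star, 1 - alpha))
    }
    # reject exactly when the interval fails to cover theta0
    unname(theta0 < ci[1] || theta0 > ci[2])
}

set.seed(42)
theta.star <- rnorm(1000, mean = 0.5, sd = 0.2)   # stand-in replicates
two.sided <- reject.h0(theta.star, theta0 = 0)
greater <- reject.h0(theta.star, theta0 = 0, alternative = "greater")
less <- reject.h0(theta.star, theta0 = 0, alternative = "less")
```

Note that the one-tailed test against the alternative "greater" uses an interval that is unbounded above, and vice versa.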

Inverting Intervals: P-values

P-values are a bit trickier. Here is one way to describe the P-value corresponding to a confidence interval or, more precisely, to a recipe for creating confidence intervals of any specified coverage probability. The P-value that goes with a confidence interval recipe is the α such that the confidence interval with coverage probability 1 − α has the value of the parameter hypothesized under H0 exactly on top of one of the endpoints of the interval.

It is actually a little easier to see this with one-sided confidence intervals and one-tailed tests, because there is only one endpoint of such an interval (the other endpoint is infinity or minus infinity) that we need to adjust to get the endpoint exactly on top of the hypothesized parameter value.
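
For percentile intervals the adjustment can be done in closed form, as the sketch below tries to show. It assumes a one-tailed test with alternative "parameter greater than the hypothesized value" and uses a simulated stand-in for theta.star. The percentile lower confidence bound with coverage 1 − α is the α quantile of theta.star, so the α that puts that bound on top of the hypothesized value is just the fraction of theta.star at or below it.

```r
set.seed(42)
theta.star <- rnorm(1000, mean = 0.3, sd = 0.2)   # stand-in replicates
theta0 <- 0                                       # hypothesized value

# one-tailed P-value for H1: theta > theta0
pval <- mean(theta.star <= theta0)

# check: the lower bound of the interval with coverage 1 - pval lands
# on theta0, up to the discreteness of the bootstrap sample
lower.bound <- quantile(theta.star, pval, type = 1)
print(pval)
print(lower.bound)
```

The other tail works the same way with the inequality reversed.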

If one likes equal-tailed two-sided intervals, as Efron and Tibshirani do, then the usual rule holds:

The two-tailed P-value is twice the lower of the two one-tailed P-values.

So just do both one-tailed tests and double the P-value of the one that is less than one-half.

A side note for the curious, which will not be mentioned again: if one does not have any particular fondness for equal-tailed two-sided intervals, then there is no unambiguous way to calculate P-values for two-tailed tests. One gets a different P-value for each different confidence interval recipe.

Percentile Intervals

The data here are just two variables x and y which may or may not be correlated. That's what we test.
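
A test of this kind might look roughly like the following sketch. The data here are simulated stand-ins (an assumption; the data set used by the original example is not reproduced), and the variable names other than x, y, and theta.star are made up for illustration. The one-tailed P-values come from inverting one-sided percentile intervals, and the two-tailed P-value is twice the smaller of them.

```r
set.seed(42)
n <- 30
x <- rnorm(n)               # simulated stand-in data, NOT the data
y <- 0.5 * x + rnorm(n)     # set used by the original example

nboot <- 2000
theta.hat <- cor(x, y)
theta.star <- replicate(nboot, {
    i <- sample(n, replace = TRUE)   # resample (x, y) pairs together
    cor(x[i], y[i])
})

theta0 <- 0                                   # H0: zero correlation
pval.greater <- mean(theta.star <= theta0)    # H1: correlation > 0
pval.less <- mean(theta.star >= theta0)       # H1: correlation < 0
pval.two <- min(1, 2 * min(pval.greater, pval.less))
print(c(greater = pval.greater, less = pval.less, two.sided = pval.two))
```

A P-value of exactly zero here only means that no bootstrap replicate fell on the far side of zero; it is better reported as less than 1 / nboot.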

BCa Intervals

The bcanon function doesn't produce a theta.star vector for us. Nor, for that matter, are its intervals based directly on percentiles of the theta.star distribution, so we can't do the test the same way as in the preceding example.

However, bcanon does purport to produce for any alpha the corresponding confpoint. So we just ask for a whole lot of confpoints corresponding to the whole range of possible alpha values and interpolate as before.
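
The idea can be sketched as follows, assuming the bootstrap package is installed. The data are simulated stand-ins (an assumption, not the data of the original example), and the grid of alpha values and the use of approx for the interpolation are illustrative choices rather than the only way to do it. The confpoints component of the bcanon result is a matrix whose second column holds the BCa endpoint for each requested alpha.

```r
library(bootstrap)   # assumed installed; provides bcanon

set.seed(42)
n <- 30
# simulated stand-in data, NOT the data set of the original example
xdata <- cbind(rnorm(n), rnorm(n))
xdata[, 2] <- 0.5 * xdata[, 1] + xdata[, 2]

# bcanon resamples the indices 1:n; theta computes the correlation
# of the resampled (x, y) pairs
theta <- function(i, xdata) cor(xdata[i, 1], xdata[i, 2])

alpha <- seq(0.005, 0.995, by = 0.005)   # grid of one-sided levels
bca.out <- bcanon(1:n, 2000, theta, xdata, alpha = alpha)
confpoint <- bca.out$confpoints[, 2]     # BCa endpoint for each alpha

# Interpolate to find the alpha whose endpoint sits on theta0.
# rule = 2 clamps to the ends of the grid, so the result is never
# below min(alpha) or above max(alpha).
theta0 <- 0
pval.greater <- approx(confpoint, alpha, xout = theta0,
                       rule = 2, ties = mean)$y   # H1: correlation > 0
pval.less <- 1 - pval.greater                     # H1: correlation < 0
pval.two <- 2 * min(pval.greater, pval.less)
print(c(greater = pval.greater, less = pval.less, two.sided = pval.two))
```

If the interpolated value comes out equal to the smallest alpha in the grid, all that can be said is that the P-value is no larger than that; a finer grid needs a correspondingly larger nboot.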

Other Intervals

The same general principle of inverting confidence intervals to find tests applies to any confidence interval whatsoever (nonparametric bootstrap or not).

Annoyingly, if you really want to know the details of inverting some other interval (say ABC), we leave these as an exercise for the reader.

(Of course, change annoyingly to mercifully if you have had enough of this subject.)