To do each example, just click the
You do not have to type in any R instructions or specify a dataset.
That's already done for you.
TheoryAll of the confidence intervals on this page are second order correct (Efron and Tibshirani, Chapter 22, p. 325 and Section 22.6).
Section 14.3 in Efron and Tibshirani.
BCa stands for
bias corrected and accelerated.
It is an example of really horrible
alphabet soup terminology.
Really trendy, though. Used to be that scientists used terminology
that involved real English (or Latin) words. Nowadays, it is trendy
to just use letters. It's molecular biology envy (a la
DNA, RNA, G6PD, and so forth). If you can actually express yourself
and be understood, then you must not be a real scientist, because as
everyone knows science is hard to understand. Hence the modern trend
for scientists to speak and write as illiterately as possible.
To parody this trend, we call these
alphabet soup, type 1 intervals
type 2 see below).
- As usual,
library(bootstrap)says we are going to use code in the
bootstraplibrary, which is not available without this command. Here
library(bootstrap)is necessary for two reasons. Without it we can't get the data
spatial(on-line help) and we also can't get the function
bcanon(on-line help) that we use to construct BCa intervals.
- The argument
thetais a function that calculates the point estimate on which the interval is based. Here the point estimate is the variance of the empirical distribution, calculated by the function
nboot = 1000is because of the notion that in general we need a large bootstrap sample size for confidence intervals. Also we have to supply this argument. There is no default.
These are the
alphabet soup, type 2 intervals.
ABC stands for
approximate bootstrap confidence, whatever that means. It doesn't
actually bootstrap, but just approximates the bootstrap. Chapter 22 of
Efron and Tibshirani explains, but we won't get into that.
Section 14.4 in Efron and Tibshirani.
The rather strange form of
is an estimator written in
resampling form, which we saw
before in the improved bootstrap bias correction
As the example shows and the
help documents, the
tt argument to the
function must have the signature
xis the data
pis a probability vector the same length as the data.
The idea is that the relationship of a bootstrap sample
to the original data
x can be expressed as a probability
p.star such that
p.star[i] is the fraction
x[i] occurs in
We have to write a function that calculates the estimator given
And this function must work for any probability vector
p.star, not just ones with elements that are multiples of
1 / n, because that's what the ABC method requires.
Unfortunately, this is, in general, hard.
Fortunately, this is, for moments, quite straightforward.
For any function
g, any data vector
and any probability vector
p, the expression
sum(g(x) * p)
calculates the expectation of the random variable g(X)
in the probability model that assigns probability
p[i] to the
x[i] for each
i (and probability zero to
sum(x * p)
calculates the mean
sum((x - a)^2 * p)
calculates the second moment about the point
a, and so forth.
Resampling Form Estimators
The improved bootstrap bias correction
web page explains how to write a function that calculates a sample
resampling form. Unfortunately, it won't work with
abcnon function because of the requirement that it work
for any probability vector
p (not only those whose elements
are multiples of 1 ⁄ n).
So, try 2.
rmedian calculates the median of a bootstrap
sample given in
The definition of the median of a discrete distribution is a bit tricky. There are two cases.
Case I. If F is the cumulative distribution function, then any point x satisfying
amedian (we cannot say
Case II. There is no x satisfying equation (*) above. In this case there is a unique x such that
and that unique x is the (unique in this case) median.
The code is a bit tricky. Inside our definition of the
cumsum function turns the probability vector
(a probability mass function) into a cumulative distribution function).
Then we define
ihig to be two indices
ihig. In case I we actually
ihig and any point between
In case II we actually have
x[ihig] are the same and are
the value returned by the function.
As the comment in the code says, it is important that the vector
is sorted. Otherwise these indices would not be picking out the right points.
We try it out, and indeed do get the same answers either way.
Clearly, this function is very specific to medians. With appropriate changes, it would do any quantile. But it looks nothing like a function to calculate anything besides quantiles.
Confession: This function was not easy to write.
Even staring at the definition, it took three tries for your humble
author to get this right. Coding up estimators for the convenience
abcnon function can be arbitrarily tricky (meaning
there is no upper bound to the amount of trickiness that can be