University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb (Better Bootstrap)
BCa stands for "bias corrected and accelerated".
It is an example of really horrible alphabet soup
terminology.
Really trendy, though. Used to be that scientists used terminology
that involved real English (or Latin) words. Nowadays, it is trendy
to just use letters. It's molecular biology envy (a la
DNA, RNA, G6PD, and so forth). If you can actually express yourself
and be understood, then you must not be a real scientist, because as
everyone knows science is hard to understand. Hence the modern trend
for scientists to speak and write as illiterately as possible.
To parody this trend, we call these alphabet soup, type 1 intervals (for type 2, see below).
library(bootstrap) says we are going to use code in the bootstrap package, which is not available without this command. Here library(bootstrap) is necessary for two reasons: without it we can't get the data spatial, and we also can't get the function bcanon that we use to construct BCa intervals.
a <- as.numeric(spatial[1, ]) is needed because, without it, the example doesn't work. The object spatial is an R data frame, and Efron and Tibshirani have put the data in sideways (each variable is a row rather than a column). So spatial[1, ] is still a data frame, with one row, and the var function doesn't do the right thing to it.
The as.numeric function says to just treat this as a vector of numbers! Forget all this nonsense (data frames, etc.) that is supposed to be helpful but is actually making my life difficult! If you know about stuff like as.numeric, you qualify as a knowledgeable user of R (or S-Plus).
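Here is a small sketch of the data frame problem, using a made-up one-row data frame in place of spatial[1, ]. The names df and row and the three numbers are invented for illustration:

```r
# Made-up stand-in for spatial[1, ]: one variable laid out sideways,
# one observation per column.
df <- data.frame(V1 = 1, V2 = 2, V3 = 4)

row <- df[1, ]          # still a data frame, not a vector
is.data.frame(row)      # TRUE

# var() on a data frame treats each column as a separate variable and
# computes a covariance matrix, which is not what we want here.
x <- as.numeric(row)    # plain numeric vector: 1 2 4
var(x)                  # sample variance of the three numbers, 7/3
```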
theta is a function that calculates the point estimate on which the interval is based. Here the point estimate is the variance of the empirical distribution, calculated by the function my.var.
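The definition of my.var isn't reproduced on this page, so the following is only a guess at what it could look like. The point is that the variance of the empirical distribution divides by n, whereas R's var divides by n - 1:

```r
# Hypothetical my.var: variance of the empirical distribution, which puts
# mass 1 / n on each data point (divisor n, not n - 1).
my.var <- function(x) mean((x - mean(x))^2)

# theta just hands the data to my.var; bcanon evaluates theta on
# resampled versions of the data.
theta <- function(x) my.var(x)

x <- c(1, 2, 4)
theta(x)                                  # 14/9
var(x) * (length(x) - 1) / length(x)      # same number, rescaled from var()
```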
If I had defined a function like this in the second problem on the midterm, I wouldn't have made the mistake I did. In this case, we have to define the function because bcanon must be given a function that calculates the estimate.
nboot = 1000 is there because, in general, confidence intervals need a large number of bootstrap samples. Also, we have to supply this argument; there is no default.
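To see roughly what bcanon does with theta and nboot, here is a base R sketch of the BCa recipe as Efron and Tibshirani describe it: a bias correction z0 computed from the bootstrap replicates, and an acceleration computed from jackknife (leave-one-out) values. It is a sketch, not the bootstrap library's code; the name bca.sketch, the default alpha, and the simulated data standing in for spatial[1, ] are all made up for illustration.

```r
# Hypothetical base R sketch of a BCa interval (not bcanon itself,
# though it follows the same recipe).
bca.sketch <- function(x, nboot, theta, alpha = c(0.025, 0.975)) {
    n <- length(x)
    thetahat <- theta(x)

    # Bootstrap replicates of the estimator
    thetastar <- replicate(nboot, theta(sample(x, n, replace = TRUE)))

    # Bias correction: fraction of replicates below the estimate,
    # mapped to the standard normal scale
    z0 <- qnorm(mean(thetastar < thetahat))
    if (!is.finite(z0))
        stop("all bootstrap replicates fall on one side of the estimate")

    # Acceleration from the jackknife (leave-one-out) values
    u <- sapply(seq_len(n), function(i) theta(x[-i]))
    uu <- mean(u) - u
    acc <- sum(uu^3) / (6 * sum(uu^2)^1.5)

    # Adjusted percentile levels, read off the bootstrap replicates
    zalpha <- qnorm(alpha)
    tt <- pnorm(z0 + (z0 + zalpha) / (1 - acc * (z0 + zalpha)))
    quantile(thetastar, probs = tt, type = 1, names = FALSE)
}

# Variance of the empirical distribution (divisor n)
my.var <- function(x) mean((x - mean(x))^2)

set.seed(42)
x <- rnorm(25, mean = 10, sd = 3)   # simulated stand-in for spatial[1, ]
ci <- bca.sketch(x, nboot = 1000, theta = my.var)
ci                                  # lower and upper endpoints
```

With alpha = c(0.025, 0.975) the two numbers are the endpoints of a nominal 95% BCa interval. For real work, use bcanon, which does this with more care.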
These are the alphabet soup, type 2 intervals. ABC stands for "approximate bootstrap confidence", whatever that means. It doesn't actually bootstrap (no resampling is done); it just approximates what the bootstrap would give. Chapter 22 of Efron and Tibshirani explains, but we won't get into that.
The remark about as.numeric applies here too.
The function that calculates the estimate, however, can't be my.var. As the example shows, it must have the signature function(p, x), where x is the data and p is a probability vector the same length as the data. We saw this probability vector stuff before in Section 10.4 about improved bootstrap bias estimation.
The idea is that the relationship of a bootstrap sample x.star to the original data x can be expressed as a probability vector p.star such that p.star[i] is the fraction of times x[i] occurs in x.star. Since x and x.star both have length n, all of the p.star[i] will be multiples of 1 / n, and all of the n * p.star[i] will be integers. We can construct x.star from x and p.star by repeating each x[i] n * p.star[i] times in the bootstrap sample we are constructing (this may not reproduce the order of x.star, but order doesn't matter to estimators like the mean or variance). Thus we only need x and p.star to construct bootstrap samples; we don't need x.star.
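The bookkeeping between x.star and p.star is easy to check in R. In this sketch the data are made up, and the bootstrap sample is drawn by index so that we can count how often each x[i] was picked:

```r
set.seed(1)
x <- c(2.1, 3.5, 4.0, 5.3, 7.2)   # made-up data, all values distinct
n <- length(x)

# One bootstrap sample, remembering which indices were drawn
idx <- sample(n, n, replace = TRUE)
x.star <- x[idx]

# p.star[i] is the fraction of the n draws that picked x[i]
p.star <- tabulate(idx, nbins = n) / n

# Rebuild a bootstrap sample from x and p.star alone by repeating x[i]
# n * p.star[i] times; round() guards against floating-point fuzz.
x.rebuilt <- rep(x, times = round(n * p.star))

# Same values as x.star, possibly in a different order
all.equal(sort(x.rebuilt), sort(x.star))   # TRUE
```

Only the order is lost in the round trip, which is harmless for estimators like the mean and variance that ignore the order of the data.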
We have to write a function that calculates the estimator given x and p.star, rather than given x.star, which is what we have done up till now (or we didn't bother with a function and just wrote expressions). Worse, the function has to work for any probability vector p.star, not just ones whose elements are multiples of 1 / n, because that's what the ABC method requires. Unfortunately, this is hard in general. Fortunately, for moments it is quite straightforward.
For any function g, any data vector x, and any probability vector p, the expression sum(g(x) * p) calculates the expectation of g(X) when X has the distribution that gives probability p[i] to the point x[i] for each i (and probability zero everywhere else). Thus sum(x * p) calculates the mean, sum((x - a)^2 * p) calculates the second moment about the point a, and so forth.
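A quick numerical check of these formulas, with made-up data and a probability vector whose entries are not multiples of 1 / n. The helper name expect.p is hypothetical, just a name for sum(g(x) * p):

```r
# Hypothetical helper: expectation of g(X) when X takes the value x[i]
# with probability p[i]
expect.p <- function(g, x, p) sum(g(x) * p)

x <- c(1, 2, 4)           # made-up data
p <- c(0.5, 0.3, 0.2)     # nonnegative, sums to 1, not multiples of 1/3
a <- 2                    # arbitrary point

mean.p <- expect.p(function(t) t, x, p)           # sum(x * p) = 1.9
m2.p   <- expect.p(function(t) (t - a)^2, x, p)   # sum((x - a)^2 * p) = 1.3

# With equal weights 1 / n we get the ordinary sample mean back
p.unif <- rep(1 / length(x), length(x))
expect.p(function(t) t, x, p.unif)                # same as mean(x)
```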
The stop commands for various error situations are, of course, not required. If the function is called properly, they don't do anything. But it will save you endless hours of head scratching someday if you get in the habit of putting error checks in the functions you write.
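Since the ABC example's own code isn't reproduced on this page, here is a hedged sketch of a function(p, x) for the variance with the kind of stop checks described above. The particular checks and the tolerance 1e-8 are my choices, not necessarily those of the original example:

```r
# Hypothetical ABC-style estimator: the variance of the distribution that
# puts probability p[i] on x[i]. The checks do nothing on a proper call.
theta <- function(p, x) {
    if (length(p) != length(x)) stop("p and x must have the same length")
    if (any(p < 0)) stop("p must be nonnegative")
    if (abs(sum(p) - 1) > 1e-8) stop("p must sum to 1")
    mu <- sum(x * p)            # mean under p
    sum((x - mu)^2 * p)         # second moment about that mean
}

x <- c(1, 2, 4)                 # made-up data
n <- length(x)
theta(rep(1 / n, n), x)         # equal weights: empirical variance, 14/9

# A bad call stops with a clear message instead of a wrong answer
msg <- tryCatch(theta(c(0.5, 0.5), x), error = conditionMessage)
msg                             # "p and x must have the same length"
```

In real use you would pass a function like this to the ABC routine (abcnon in the bootstrap library) instead of calling it by hand.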