University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb

Stat 5601 (Geyer) Examples (Better Bootstrap)

General Instructions
BC_a Intervals
ABC Intervals

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

BC_a Intervals

Section 14.3 in Efron and Tibshirani.

BC_a stands for bias corrected and accelerated. It is an example of really horrible alphabet soup terminology. Really trendy, though. Used to be that scientists used terminology that involved real English (or Latin) words. Nowadays, it is trendy to just use letters. It's molecular biology envy (à la DNA, RNA, G6PD, and so forth). If you can actually express yourself and be understood, then you must not be a real scientist, because as everyone knows science is hard to understand. Hence the modern trend for scientists to speak and write as illiterately as possible.

To parody this trend, we call these alphabet soup, type 1 intervals (for type 2 see below).

Comments

As usual, library(bootstrap) says we are going to use code in the bootstrap library, which is not available without this command. Here library(bootstrap) is necessary for two reasons: without it we can't get the data spatial, and we also can't get the function bcanon that we use to construct BC_a intervals.
The folderol
```
a <- as.numeric(spatial[1, ])
```
is because, without it, the example doesn't work. The object spatial is an R data frame, and Efron and Tibshirani have put the data in sideways. So spatial[1, ] is still a data frame, and the var function doesn't do the right thing to it.
The as.numeric says to just treat this as a vector of numbers! Forget all this nonsense (data frames, etc.) that is supposed to be helpful but is actually making my life difficult! If you know about stuff like as.numeric you qualify as a knowledgeable user of R (or S-Plus).
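To see the data frame issue in isolation, here is a minimal sketch in base R. The data frame sideways and its numbers are made up for illustration (they are not the spatial data); the point is only that taking one row of a data frame gives back a data frame, and as.numeric turns it into a plain vector.

```r
# Hypothetical "sideways" data frame: each row is a variable,
# each column is a case. The numbers are made up.
sideways <- data.frame(matrix(c(1.5, 2.0, 3.5, 4.0, 5.5,
                                0.5, 1.0, 1.5, 2.0, 2.5),
                              nrow = 2, byrow = TRUE))

# One row of a data frame is still a data frame (one row, five columns).
row1 <- sideways[1, ]
is.data.frame(row1)    # TRUE

# as.numeric drops the data frame structure and leaves a numeric vector.
a <- as.numeric(row1)
is.data.frame(a)       # FALSE
length(a)              # 5
```

After the conversion, a can be handed to any function that expects an ordinary vector of numbers.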
The argument theta is a function that calculates the point estimate on which the interval is based. Here the point estimate is the variance of the empirical distribution, calculated by the function my.var.
If I had defined a function like this in the second problem on the midterm, I wouldn't have made the mistake I did. In this case, we have to define the function because we need to supply a function that calculates the estimate to bcanon.
The nboot = 1000 is because of the notion that in general we need a large bootstrap sample size for confidence intervals. Also we have to supply this argument. There is no default.
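Putting the comments above together, the call might look roughly like the following. This is a sketch, not necessarily the code behind the "Submit" button: the data a here is simulated stand-in data rather than the spatial data, the body of my.var is my assumption of what "variance of the empirical distribution" means (divisor n rather than n - 1), and the bcanon call only runs if the bootstrap package is installed.

```r
# Variance of the empirical distribution: divisor n, not n - 1.
# (Assumed definition; the name my.var comes from the comments above.)
my.var <- function(x) mean((x - mean(x))^2)

set.seed(42)
a <- rnorm(26)            # made-up stand-in for the real data

theta.hat <- my.var(a)    # the point estimate the interval is based on

# bcanon needs the data, the bootstrap sample size, and the estimator.
# Guarded so the sketch still runs where the package is not installed.
if (requireNamespace("bootstrap", quietly = TRUE)) {
    out <- bootstrap::bcanon(a, nboot = 1000, theta = my.var)
    print(out$confpoints)    # interval endpoints for a range of alpha levels
}
```

Note that my.var differs from the built-in var by the factor (n - 1) / n, which is why we cannot simply pass var as theta if we want the variance of the empirical distribution.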

ABC Intervals

These are the alphabet soup, type 2 intervals.

ABC stands for approximate bootstrap confidence, whatever that means. It doesn't actually bootstrap, but just approximates the bootstrap. Chapter 22 of Efron and Tibshirani explains, but we won't get into that.

Section 14.4 in Efron and Tibshirani.

Comments

Of course, the comment in the other part about as.numeric applies here too.
The main comment is about the rather strange form of my.var.
As the example shows, it must have the signature function(p, x) where
- x is the data
- p is a probability vector the same length as the data.
We saw this probability vector stuff before in Section 10.4 about improved bootstrap bias estimation.
The idea is that the relationship of a bootstrap sample x.star to the original data x can be expressed as a probability vector p.star such that p.star[i] is the fraction of times x[i] occurs in x.star.
Since x and x.star both have length n, all of the p.star[i] will be multiples of 1 / n, and the n * p.star[i] will be integers. We can construct x.star (up to the order of its elements) from x and p.star by repeating each x[i] exactly n * p.star[i] times in the bootstrap sample we are constructing. Thus we only need x and p.star to construct bootstrap samples; we don't need x.star.
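The correspondence between x.star and p.star can be checked directly. The following is a small base R sketch with made-up data; the names idx, counts, and x.rebuilt are mine, introduced only for this illustration.

```r
set.seed(42)
x <- c(2.1, 3.4, 5.6, 7.8, 9.0)    # made-up data
n <- length(x)

# An ordinary bootstrap sample, drawn via resampled indices.
idx <- sample(n, replace = TRUE)
x.star <- x[idx]

# p.star[i] is the fraction of times x[i] occurs in x.star.
p.star <- tabulate(idx, nbins = n) / n

# Going back: repeat each x[i] exactly n * p.star[i] times.
# round() guards against floating point fuzz in n * p.star.
counts <- round(n * p.star)
x.rebuilt <- rep(x, times = counts)

# Same sample, up to the order of the elements.
all.equal(sort(x.star), sort(x.rebuilt))
```

Only the order of the elements is lost, which does not matter for estimators like the mean or variance that ignore the order of the data.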
We have to write a function that calculates the estimator given x and p.star rather than given x.star which is what we have done up till now (or not bothered with a function, just written expressions).
Worse, we have to provide a function that works for any probability vector p.star, not just ones with elements that are multiples of 1 / n, because that's what the ABC method requires.
Unfortunately, this is, in general, hard.
Fortunately, this is, for moments, quite straightforward.
For any function g, any data vector x, and any probability vector p, the expression
sum(g(x) * p)
calculates the expectation of the random variable g(X) in the probability model that assigns probability p[i] to the point x[i] for each i (and probability zero to everywhere else).
Thus
```
sum(x * p)
```
calculates the mean, and
```
sum((x - a)^2 * p)
```
calculates the second moment about the point a, and so forth.
The stop commands for various error situations are, of course, not required. If the function call is done properly they don't do anything. But it will save you endless hours of head scratching sometime if you get in the habit of putting error checks in the functions you write.
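A function with the required signature might look like the sketch below. This is my reconstruction from the comments above, not necessarily the function used in the example: the particular error checks, their messages, and the tolerance 1e-8 are my own choices, and the data are made up.

```r
# Variance of the distribution that puts probability p[i] on x[i].
# The checks and the tolerance below are choices made for this sketch.
my.var <- function(p, x) {
    if (!is.numeric(p) || !is.numeric(x))
        stop("p and x must be numeric")
    if (length(p) != length(x))
        stop("p and x must have the same length")
    if (abs(sum(p) - 1) > 1e-8)
        stop("p must sum to one")
    mu <- sum(x * p)         # mean under p
    sum((x - mu)^2 * p)      # second moment about that mean
}

x <- c(2.1, 3.4, 5.6, 7.8, 9.0)    # made-up data
n <- length(x)

# Equal weights 1 / n give the variance of the empirical distribution.
p0 <- rep(1 / n, n)
v0 <- my.var(p0, x)

# The function also accepts weights that are not multiples of 1 / n.
p1 <- c(0.1, 0.2, 0.4, 0.2, 0.1)
v1 <- my.var(p1, x)
```

A function of this form is what gets passed as the second argument of abcnon in the bootstrap package, along the lines of abcnon(a, my.var).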
