Statistics 5601 (Geyer, Fall 2013) Better Bootstrap Confidence Intervals

General Instructions

To do each example, just click the Submit button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

Theory

All of the confidence intervals on this page are second order correct (Efron and Tibshirani, Chapter 22, p. 325 and Section 22.6).

BC_a Intervals

Section 14.3 in Efron and Tibshirani.

BC_a stands for bias corrected and accelerated. It is an example of really horrible alphabet soup terminology. Really trendy, though. Used to be that scientists used terminology that involved real English (or Latin) words. Nowadays, it is trendy to just use letters. It's molecular biology envy (a la DNA, RNA, G6PD, and so forth). If you can actually express yourself and be understood, then you must not be a real scientist, because as everyone knows science is hard to understand. Hence the modern trend for scientists to speak and write as illiterately as possible.

To parody this trend, we call these alphabet soup, type 1 intervals (for type 2 see below).

Comments

As usual, library(bootstrap) says we are going to use code in the bootstrap library, which is not available without this command. Here library(bootstrap) is necessary for two reasons: without it we can't get the data spatial, and we also can't get the function bcanon that we use to construct BC_a intervals.
The argument theta is a function that calculates the point estimate on which the interval is based. Here the point estimate is the variance of the empirical distribution, calculated by the function evar.
We use nboot = 1000 because, in general, confidence intervals need a large bootstrap sample size. We also have to supply this argument explicitly; there is no default.
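
The R form these comments describe is not reproduced in this text, so here is a minimal sketch of what such a call might look like. It assumes the bootstrap package is installed, and it assumes the variable of interest is column A of spatial; the page does not say which variable the original example used. The definition of evar is our guess at "the variance of the empirical distribution" (divisor n rather than n - 1).

```r
# Plug-in variance: variance of the empirical distribution (divisor n).
# This definition is an assumption; the original evar is not shown.
evar <- function(x) mean((x - mean(x))^2)

if (requireNamespace("bootstrap", quietly = TRUE)) {
    library(bootstrap)      # provides the spatial data and the bcanon function
    data(spatial)           # harmless if the data are already lazy-loaded
    set.seed(42)            # only so the sketch is reproducible
    # spatial$A is an assumed choice of variable
    out <- bcanon(spatial$A, nboot = 1000, theta = evar)
    print(out$confpoints)   # alpha levels and the corresponding BC_a endpoints
}
```

The endpoints change a little from run to run unless the seed is fixed, since they depend on the 1000 bootstrap replicates.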

ABC Intervals

These are the alphabet soup, type 2 intervals.

ABC stands for approximate bootstrap confidence, whatever that means. It doesn't actually bootstrap, but just approximates the bootstrap. Chapter 22 of Efron and Tibshirani explains, but we won't get into that.

Section 14.4 in Efron and Tibshirani.

Comments

The rather strange form of rvar comes from writing the estimator in resampling form, which we saw before in the improved bootstrap bias correction procedure.

As the example shows and the on-line help documents, the tt argument to the abcnon function must have the signature function(p, x) where

x is the data
p is a probability vector the same length as the data.

The idea is that the relationship of a bootstrap sample x.star to the original data x can be expressed as a probability vector p.star such that p.star[i] is the fraction of times x[i] occurs in x.star.
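
As a small illustration of this idea, the following sketch builds p.star for one bootstrap sample. The data values are hypothetical, and the index vector k.star is our own device: resampling indices rather than values keeps the bookkeeping unambiguous when x contains ties.

```r
set.seed(1)                                 # only for reproducibility
x <- c(2.1, 3.4, 1.7, 5.0, 4.2)             # hypothetical data
n <- length(x)

k.star <- sample(n, replace = TRUE)         # resampled indices
x.star <- x[k.star]                         # the bootstrap sample itself
p.star <- tabulate(k.star, nbins = n) / n   # fraction of times x[i] occurs in x.star

sum(p.star)                                 # a probability vector sums to one
all.equal(sum(x * p.star), mean(x.star))    # same mean either way
```

Every element of p.star produced this way is a multiple of 1 / n, which is exactly the restriction the ABC method does not respect.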

We have to write a function that calculates the estimator given x and p.star.

And this function must work for any probability vector p.star, not just ones with elements that are multiples of 1 / n, because that's what the ABC method requires.

Unfortunately, this is, in general, hard.

Fortunately, this is, for moments, quite straightforward.

For any function g, any data vector x, and any probability vector p, the expression

sum(g(x) * p)

calculates the expectation of the random variable g(X) in the probability model that assigns probability p[i] to the point x[i] for each i (and probability zero to everywhere else).

Thus

sum(x * p)

calculates the mean

sum((x - a)^2 * p)

calculates the second moment about the point a, and so forth.
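
Putting these pieces together, a resampling form variance estimator can be sketched as follows. The original rvar is not shown in this text, so this definition is an assumption consistent with the formulas above, and the data are hypothetical. The call to abcnon assumes the bootstrap package is installed.

```r
# Resampling form variance with the signature function(p, x).
rvar <- function(p, x) {
    m <- sum(x * p)          # mean under the weights p
    sum((x - m)^2 * p)       # second moment about that mean
}

x <- c(2.1, 3.4, 1.7, 5.0, 4.2)   # hypothetical data
n <- length(x)
p0 <- rep(1 / n, n)               # weights corresponding to the original sample

rvar(p0, x)                       # agrees with mean((x - mean(x))^2)
rvar(c(0.3, 0.1, 0.25, 0.15, 0.2), x)   # weights need not be multiples of 1 / n

if (requireNamespace("bootstrap", quietly = TRUE)) {
    out <- bootstrap::abcnon(x, rvar)
    print(out$limits)             # ABC endpoints alongside the standard ones
}
```

Because rvar is built only from weighted sums, it is defined for every probability vector p, which is what abcnon needs.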

More on Resampling Form Estimators

The improved bootstrap bias correction web page explains how to write a function that calculates a sample median in resampling form. Unfortunately, it won't work with the abcnon function because of the requirement that it work for any probability vector p (not only those whose elements are multiples of 1 / n).

So, try 2.

Comments

The function rmedian calculates the median of a bootstrap sample given in resampling form.

The definition of the median of a discrete distribution is a bit tricky. There are two cases.

Case I. If F is the cumulative distribution function, then any point x satisfying

F(x) = 0.5 (*)

is a median. If there is one such x, then there is an interval of them, because F is a step function, hence the median is nonunique. Any point in the interval is a median (we cannot say the median).

Case II. There is no x satisfying equation (*) above. In this case there is a unique x such that

F(y) < 0.5 < F(x), for every y < x

and that unique x is the (unique in this case) median.

The code is a bit tricky. Inside our definition of the rmedian function, the cumsum function turns the probability vector (a probability mass function) into a cumulative distribution function. Then we define ilow and ihig to be two indices with ilow ≤ ihig. In case I we actually have ilow < ihig, and any point between x[ilow] and x[ihig] is a median. In case II we actually have ilow = ihig, so x[ilow] and x[ihig] are the same and are the value returned by the function.

As the comment in the code says, it is important that the vector x is sorted. Otherwise these indices would not be picking out the right points.

We try it out, and indeed do get the same answers either way.
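
The original code is not reproduced in this text, so the following is only a sketch of a function along the lines described, not the author's version. The names ilow and ihig follow the description above; the tolerance tol, the choice to return the midpoint of the interval in case I, and the data are our assumptions.

```r
rmedian <- function(p, x) {
    # Assumes x is sorted in increasing order and p[i] is the weight on x[i].
    stopifnot(length(p) == length(x), !is.unsorted(x))
    cdf <- cumsum(p)                      # probability mass function -> CDF
    tol <- sqrt(.Machine$double.eps)      # guard against rounding in cumsum
    ilow <- min(which(cdf >= 0.5 - tol))  # first index with F(x[i]) >= 1/2
    ihig <- min(which(cdf > 0.5 + tol))   # first index with F(x[i]) >  1/2
    (x[ilow] + x[ihig]) / 2               # midpoint; equals x[ilow] in case II
}

set.seed(7)                                   # only for reproducibility
x <- sort(c(2.1, 3.4, 1.7, 5.0, 4.2, 6.3))    # hypothetical data, sorted
n <- length(x)
k.star <- sample(n, replace = TRUE)           # resampled indices
p.star <- tabulate(k.star, nbins = n) / n     # bootstrap sample in resampling form

rmedian(p.star, x)                            # resampling form
median(x[k.star])                             # ordinary form, same bootstrap sample
```

Returning the midpoint in case I matches the usual convention for the sample median of an even number of points, which is why the two calls agree.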

Clearly, this function is very specific to medians. With appropriate changes, it would do any quantile. But it looks nothing like a function to calculate anything besides quantiles.

Confession: This function was not easy to write. Even staring at the definition, it took three tries for your humble author to get this right. Coding up estimators for the convenience of the abcnon function can be arbitrarily tricky (meaning there is no upper bound to the amount of trickiness that can be encountered).
