Simulations are not all alike. Some are more relevant than others.
Suppose someone has done a simulation that suggests some procedure (a point estimator, for example) works well for some particular data model (with a specific sample size and a specific probability distribution). And suppose you are now analyzing a somewhat different kind of data (with a different sample size or a different probability distribution). What is the relevance of those old simulations to your data? Perhaps none.
What you need are some relevant simulations concerning the actual probability model for the actual data you are now analyzing. Irrelevant simulations concerning data with some other structure and some other assumed probability model are, in a word, irrelevant, or, if you prefer academic weasel wording, "of questionable relevance".
So the principle of relevant versus irrelevant simulations says that each data analyst should do a new relevant simulation for each data analysis.
One slight problem: a truly relevant simulation would simulate data from the unknown true distribution of the actual data. And, of course, the unknown true distribution is unknown. If it were known, we wouldn't be doing statistical estimation about it.
So the closest one can come to doing a truly relevant simulation is to simulate data from the best available estimate of the unknown true distribution of the actual data. Of course, "best available" is a matter of opinion. It might be the distribution in a parametric model given by a good estimate of the parameter. It might be the empirical distribution of the data. It might be something else.
A cutesy name for as-close-as-possible-to-truly-relevant simulation is the bootstrap, coined by B. Efron (a native of St. Paul, Minnesota). The name refers to a cliché about pulling oneself up by one's bootstraps, bootstraps being little loops at the top of riding boots used to pull them onto your feet. So if you could pull yourself up by your bootstraps, you could just pick yourself up in the air, a physical impossibility. But the cliché (like a lot of language) doesn't mean what it literally says; it actually means to succeed by your own efforts, without help.
But the metaphor isn't completely dead. There is a hint of impossibility implied. So presumably Efron chose the name to indicate that the bootstrap does something that seems impossible to the naive but actually is just hard work.
Anyway, whatever aura the cutesy name implies, the bootstrap actually is simulation using the best available estimate of the unknown true distribution of the actual data. There are two types, the nonparametric bootstrap and the parametric bootstrap.
Not liking any of the examples in DeGroot and Schervish, we do something else. The data in the URL are from a symmetric heavy-tailed distribution. We use a 20% trimmed mean as our estimator of location (the center of symmetry).
The R statements below calculate the point estimate theta.hat and simulate nboot independent and identically distributed realizations theta.star from the sampling distribution of the point estimate when the population is the empirical distribution of the actual data.
Each time through the for loop the object x.star is an i.i.d. sample of size n from the empirical distribution of the data, which can also be described as a sample with replacement from the data vector x, which is what the statement

x.star <- sample(x, replace = TRUE)

does. Efron's nonparametric bootstrap always works this way: the bootstrap data are always samples with replacement from the original data.
The histogram shows the bootstrap estimate of the sampling distribution of the estimator. The vertical dashed line shows theta.hat.
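The loop just described might look like the following in R. This is a sketch, not the course's actual script: the data vector x here is simulated stand-in data (a t distribution with 3 degrees of freedom, which is symmetric and heavy-tailed), since the actual data file at the URL is not reproduced here.

```r
# Nonparametric bootstrap of the 20% trimmed mean (sketch).
set.seed(42)
x <- rt(30, df = 3)               # stand-in for the actual data at the URL
n <- length(x)

theta.hat <- mean(x, trim = 0.20) # the point estimate

nboot <- 999
theta.star <- double(nboot)
for (i in 1:nboot) {
    # sample with replacement from the data, i.e. an i.i.d. sample of
    # size n from the empirical distribution of the data
    x.star <- sample(x, replace = TRUE)
    theta.star[i] <- mean(x.star, trim = 0.20)
}

hist(theta.star)                  # bootstrap estimate of the sampling distribution
abline(v = theta.hat, lty = 2)    # vertical dashed line at theta.hat
```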
Continuing with the example above, we do the bootstrap simulation exactly the same way, but for reasons to be explained later, we pick a bootstrap sample size nboot one less than a round number.
The sorted values of theta.star are empirical quantiles of the bootstrap approximation to the sampling distribution of theta.hat. The k-th value in sorted order is like the k / (nboot + 1) quantile. So when we pick nboot + 1 to be a round number, the sorted values estimate round-number quantiles. In our example, when nboot is 999, the 25th and 975th sorted values estimate the 0.025 and 0.975 quantiles of this sampling distribution.
And these quantiles are the endpoints of the bootstrap confidence interval for the parameter of interest calculated by the so-called percentile method. This method is by far the simplest method of obtaining bootstrap confidence intervals. It is also arguably the worst method. But we won't explain the others in any detail; there isn't enough time for the complicated ones.
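Putting the quantile arithmetic together, here is a self-contained sketch of the percentile-method interval (again with simulated stand-in data, since the actual data file is not reproduced here).

```r
# Percentile-method 95% bootstrap confidence interval (sketch).
set.seed(42)
x <- rt(30, df = 3)               # stand-in for the actual data
theta.hat <- mean(x, trim = 0.20)

nboot <- 999                      # one less than a round number: nboot + 1 = 1000
theta.star <- replicate(nboot,
    mean(sample(x, replace = TRUE), trim = 0.20))

theta.star <- sort(theta.star)
# the k-th sorted value estimates the k / (nboot + 1) quantile, so the
# 25th and 975th values estimate the 0.025 and 0.975 quantiles
ci <- theta.star[c(25, 975)]
ci
```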
The bootstrap works much the same way for complicated data. Say the data are (X, Y) pairs and the parameter of interest is the correlation coefficient.
In order to do the bootstrap correctly we have to sample pairs with replacement from the original data. The way to do that is to sample indices rather than data. So we make k the vector of possible indices (running from 1 to length(x)), sample k with replacement giving k.star, and use k.star as an index vector for both x and y.
This procedure keeps the (X, Y) pairs together. Other than this k.star trick, this example works just like the others.
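A sketch of the index trick in R, with simulated stand-in (x, y) data rather than the course data:

```r
# Bootstrapping pairs via index resampling (sketch).
set.seed(17)
n <- 25
x <- rnorm(n)
y <- x + rnorm(n)                 # correlated stand-in data
rho.hat <- cor(x, y)              # the point estimate

k <- seq_along(x)                 # the vector of possible indices 1, ..., length(x)
nboot <- 999
rho.star <- double(nboot)
for (i in 1:nboot) {
    k.star <- sample(k, replace = TRUE)        # resample indices, not data
    rho.star[i] <- cor(x[k.star], y[k.star])   # keeps (X, Y) pairs together
}

sort(rho.star)[c(25, 975)]        # percentile-method interval
```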
The parametric bootstrap is quite similar, except we don't usually use the percentile method of constructing confidence intervals.
Suppose we want to use the MLE as an estimator of the shape parameter α for data assumed to be independent and identically Gamma(α, 1) distributed. Minus the log likelihood is coded up just as we did in the likelihood section of the course.
mlogl <- function(alpha, x) {
    if (length(alpha) > 1) stop("alpha must be scalar")
    if (alpha <= 0) stop("alpha must be positive")
    return(- sum(dgamma(x, shape = alpha, log = TRUE)))
}
but we will use a different optimizer (on-line help) to calculate the MLE and observed Fisher information

out <- optim(mean(x), mlogl, method = "L-BFGS-B",
    lower = 0, x = x, hessian = TRUE,
    control = list(fnscale = length(x)))

after which the MLE is out$par and observed Fisher information is out$hessian.
The data in the URL are gamma, but the sample size 10 is too small for us to trust the asymptotics based on Fisher information. Hence we study the sampling distribution of the MLE.
Actually we are interested in the asymptotically pivotal quantity

(alpha.hat - alpha) / se(alpha.hat)

so we simulate that distribution, with the standard error here being the inverse square root of observed Fisher information.
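The whole parametric bootstrap might be sketched as follows, using simulated stand-in gamma data (with sample size 10, as in the notes) since the actual data file is not reproduced here. For simplicity this sketch omits the fnscale control option, so out$hessian is directly the observed Fisher information for the whole sample and the standard error is its inverse square root.

```r
# Parametric bootstrap of the pivot (alpha.hat - alpha) / se (sketch).
set.seed(42)
n <- 10
x <- rgamma(n, shape = 2.5)       # stand-in for the actual data

mlogl <- function(alpha, x) {
    if (length(alpha) > 1) stop("alpha must be scalar")
    if (alpha <= 0) stop("alpha must be positive")
    return(- sum(dgamma(x, shape = alpha, log = TRUE)))
}

# fit the gamma shape by maximum likelihood; se from observed Fisher info
fit <- function(x) {
    out <- optim(mean(x), mlogl, method = "L-BFGS-B",
        lower = 1e-6, x = x, hessian = TRUE)
    list(alpha.hat = out$par, se = 1 / sqrt(out$hessian[1, 1]))
}

est <- fit(x)

nboot <- 999
z.star <- double(nboot)
for (i in 1:nboot) {
    # parametric bootstrap: simulate from the *fitted* gamma distribution
    x.star <- rgamma(n, shape = est$alpha.hat)
    est.star <- fit(x.star)
    # bootstrap realization of the asymptotically pivotal quantity
    z.star[i] <- (est.star$alpha.hat - est$alpha.hat) / est.star$se
}

# invert the pivot: alpha.hat - se * (0.975 and 0.025 quantiles of z.star)
crit <- sort(z.star)[c(25, 975)]
ci <- est$alpha.hat - est$se * rev(crit)
ci
```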
There are many ways of doing bootstrap confidence intervals. In particular, nonparametric bootstrap confidence intervals can also use asymptotically pivotal quantities, just like our parametric bootstrap example. This is the so-called percentile-t method explained in Example 11.5.6 in DeGroot and Schervish. But there are lots more ways. You really need to get a bootstrap book, such as Efron and Tibshirani, An Introduction to the Bootstrap, or Davison and Hinkley, Bootstrap Methods and their Application.