Statistics 5102 (Geyer, Fall 2016) Examples: Bootstrap

Nonparametric Bootstrap

The data set

http://www.stat.umn.edu/geyer/5102/data/ex8-1.txt

consists of (X_i, Y_i) pairs, from which we wish to estimate the correlation coefficient and get some idea of its sampling distribution.

The following R statements do a nonparametric bootstrap estimate of the sampling distribution of the correlation coefficient.

The R functions sample and sample.int (on-line help) sample with or without replacement from a finite population. Here we use sample.int which samples from the integers from one to n (the data sample size).

Applying the resulting indices to the original data, we get bootstrap data x.star and y.star, which we use to calculate one random realization of the estimator, theta.star[i], each time through the loop.

The histogram shows the sampling distribution of theta.star, which is assumed to be close to the sampling distribution of the actual estimator. More precisely, the distribution of theta.hat − theta is assumed to be close to the distribution of theta.star − theta.hat.
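The steps just described can be sketched as follows. This is only a minimal sketch, not the original code for this page: the data are simulated as a hypothetical stand-in for ex8-1.txt (so the example runs without network access), and the names n, nboot, k.star, and theta.hat, as well as the choice of 1000 replicates, are assumptions.

```r
set.seed(42)                      # only so the sketch is reproducible

# Hypothetical stand-in for the ex8-1 data: n correlated (x, y) pairs.
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

theta.hat <- cor(x, y)            # estimate from the original data

nboot <- 1000                     # number of bootstrap replicates (assumed)
theta.star <- double(nboot)
for (i in 1:nboot) {
    # sample the integers 1, ..., n with replacement
    k.star <- sample.int(n, replace = TRUE)
    # apply the sampled indices to the original data, keeping pairs together
    x.star <- x[k.star]
    y.star <- y[k.star]
    theta.star[i] <- cor(x.star, y.star)
}

# plot only in an interactive session
if (interactive()) {
    hist(theta.star)
    abline(v = theta.hat, lty = 2)
}
```

Note that the same index vector is applied to both x and y, so each resampled pair is one of the original pairs. Resampling x and y separately would destroy the correlation being estimated.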

Bootstrap Percentile Intervals

The simplest method of making confidence intervals for the unknown parameter is to take the α ⁄ 2 and 1 − α ⁄ 2 quantiles of the bootstrap distribution of theta.star as the endpoints of the interval.
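A minimal sketch of the percentile interval, under the same assumptions as before: the data are simulated rather than read from ex8-1.txt, and the names alpha, nboot, and ci are illustrative choices, not taken from the original code.

```r
set.seed(42)

# Hypothetical stand-in data: n correlated (x, y) pairs.
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

# Bootstrap distribution of the correlation coefficient.
nboot <- 1000
theta.star <- replicate(nboot, {
    k.star <- sample.int(n, replace = TRUE)
    cor(x[k.star], y[k.star])
})

# Percentile interval with nominal coverage 1 - alpha.
alpha <- 0.05
ci <- quantile(theta.star, probs = c(alpha / 2, 1 - alpha / 2))
print(ci)
```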

Other Bootstrap Confidence Intervals

Many different methods of making bootstrap confidence intervals have been proposed, far too many to cover in this course. The course on nonparametric inference (Stat 5601) usually covers them.

Here are some web pages from the last time your instructor taught that course. These cover some but by no means all the methods.

Bootstrap Hypothesis Tests

The bootstrap doesn't do hypothesis tests in general. The reason is that the bootstrap has no general way to sample from (an analog of) the null hypothesis when the null hypothesis is not true. The bootstrap simulates from (an analog of) the true unknown distribution. Hence when the alternative hypothesis is true, the bootstrap samples from (an analog of) the alternative hypothesis, which is not what is wanted.

In special situations, one can cook up a bootstrap-like procedure that can be claimed to simulate from (an analog of) the null hypothesis. But there is no general procedure for that.

One can always invert bootstrap confidence intervals to perform a hypothesis test about the parameter the confidence interval is for. This is a simple application of the duality of tests and confidence intervals (slide 206, deck 2).
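As a rough illustration of this duality, the sketch below rejects the null hypothesis that the parameter equals theta0 at level α exactly when theta0 falls outside the bootstrap percentile interval with coverage 1 − α. The helper percentile.test is hypothetical (it is not an R function and not from the original page), and the data are simulated.

```r
# Hypothetical helper: TRUE means reject H0: theta = theta0 at level alpha,
# i.e., theta0 lies outside the 1 - alpha percentile interval.
percentile.test <- function(theta.star, theta0, alpha = 0.05) {
    ci <- quantile(theta.star, probs = c(alpha / 2, 1 - alpha / 2))
    theta0 < ci[[1]] || theta0 > ci[[2]]
}

set.seed(42)

# Hypothetical stand-in data: n correlated (x, y) pairs.
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

theta.star <- replicate(1000, {
    k.star <- sample.int(n, replace = TRUE)
    cor(x[k.star], y[k.star])
})

# Test H0: correlation equals zero.
reject <- percentile.test(theta.star, theta0 = 0)
print(reject)
```

Any other bootstrap confidence interval could be substituted for the percentile interval here; the test inherits whatever accuracy the interval has.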

Here is a web page from the last time your instructor taught Stat 5601 covering that: Nonparametric Bootstrap Hypothesis Tests.

Parametric Bootstrap

Here is our example of confidence intervals for mean values for a generalized linear model redone using the parametric bootstrap.

The data set

http://www.stat.umn.edu/geyer/5102/data/ex6-1.txt

contains two variables: the response y, which is Bernoulli, and the predictor x, which is quantitative and the distribution of which doesn't matter, since we condition on it.

The following R statements fit the model and do a parametric bootstrap of the mean value for an individual whose x value is 25.

From the histogram of the parametric bootstrap distribution of the estimator, we see we are a long way from asymptopia.
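The parametric bootstrap described above can be sketched as follows. This is an illustration under stated assumptions, not the original code: the data are simulated in place of ex6-1.txt, the model is assumed to be a logistic regression with x as the only predictor (the model actually fit on the original page may differ), and the names out, mu.hat, newdata, and nboot are illustrative.

```r
set.seed(42)

# Hypothetical stand-in for the ex6-1 data: Bernoulli y, quantitative x.
n <- 100
x <- rep(1:50, 2)
y <- rbinom(n, 1, plogis(-3 + 0.1 * x))

# Fit the model (assumed here: logistic regression on x).
out <- glm(y ~ x, family = binomial)

# Estimated mean value for an individual with x = 25.
newdata <- data.frame(x = 25)
theta.hat <- predict(out, newdata = newdata, type = "response")

# Estimated success probabilities at the observed x values.
mu.hat <- fitted(out)

nboot <- 1000
theta.star <- double(nboot)
for (i in 1:nboot) {
    # simulate new responses from the fitted model, holding x fixed
    y.star <- rbinom(n, 1, mu.hat)
    out.star <- glm(y.star ~ x, family = binomial)
    theta.star[i] <- predict(out.star, newdata = newdata, type = "response")
}

# plot only in an interactive session
if (interactive()) {
    hist(theta.star)
    abline(v = theta.hat, lty = 2)
}
```

Unlike the nonparametric bootstrap, nothing is resampled here. The predictor values stay fixed, which matches conditioning on x, and only the response is simulated from the fitted model.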

Bootstrap T Intervals

The generally accepted way to make parametric bootstrap confidence intervals is via bootstrap t procedures, which are analogous to t confidence intervals when the data are assumed normal.
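One way a bootstrap t interval might look for this example is sketched below. Several things here are assumptions rather than the original method: the data are simulated, the model is assumed to be a simple logistic regression, and the studentized quantity is formed on the linear predictor scale using the standard error from predict with se.fit = TRUE, with the endpoints mapped to the mean value scale at the end.

```r
set.seed(42)

# Hypothetical stand-in for the ex6-1 data: Bernoulli y, quantitative x.
n <- 100
x <- rep(1:50, 2)
y <- rbinom(n, 1, plogis(-3 + 0.1 * x))

out <- glm(y ~ x, family = binomial)
newdata <- data.frame(x = 25)

# Estimate and standard error on the linear predictor scale.
pred <- predict(out, newdata = newdata, se.fit = TRUE)
eta.hat <- as.numeric(pred$fit)
se.hat <- as.numeric(pred$se.fit)
mu.hat <- fitted(out)

nboot <- 1000
t.star <- double(nboot)
for (i in 1:nboot) {
    y.star <- rbinom(n, 1, mu.hat)
    out.star <- glm(y.star ~ x, family = binomial)
    pred.star <- predict(out.star, newdata = newdata, se.fit = TRUE)
    # studentized quantity: analog of (eta.hat - eta) / se.hat
    t.star[i] <- (pred.star$fit - eta.hat) / pred.star$se.fit
}

# Bootstrap quantiles replace the t critical values.
alpha <- 0.05
crit <- as.numeric(quantile(t.star, probs = c(1 - alpha / 2, alpha / 2)))

# The upper quantile gives the lower endpoint and vice versa.
ci.eta <- eta.hat - crit * se.hat

# Map the interval to the mean value (probability) scale.
ci.mu <- plogis(ci.eta)
names(ci.mu) <- c("lower", "upper")
print(ci.mu)
```

The interval has the same form as a t interval, estimate minus critical value times standard error, except that the critical values come from the simulated distribution of t.star instead of a t table, so the interval need not be symmetric about the estimate.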