General Instructions

To do each example, just click the Submit button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

Theory

The theory of the parametric bootstrap is quite similar to that of the nonparametric bootstrap. The only difference is that instead of simulating bootstrap samples that are IID from the empirical distribution (the nonparametric estimate of the distribution of the data), we simulate bootstrap samples that are IID from the estimated parametric model.

All the same considerations arise.

And so forth.

Simulating from a parametric model is not as easy as simulating from the empirical distribution. In fact, it can be arbitrarily complicated, so hard that how to do it is an open research problem. For some parametric models sampling is easy; for others it is not. In general, it bears no relation to sampling from the empirical distribution.

If the observed data are in the vector x, then

x.star <- sample(x, replace = TRUE)

makes a nonparametric bootstrap sample.

In contrast, if the observed data are assumed to be IID normal, then

x.star <- rnorm(length(x), mean = mean(x), sd = sd(x))

makes a parametric bootstrap sample. This does not do quite the right thing: ideally we would specify mean to be the true population mean and sd to be the true population standard deviation, but since we don't know the population values we must plug in estimates.

For more contrast, if the observed data are assumed to be IID Cauchy, then

x.star <- rcauchy(length(x), location = median(x), scale = IQR(x) / 2)

makes a parametric bootstrap sample. We can't use mean(x) and sd(x) as estimators of location and scale because the Cauchy distribution doesn't have moments and hence these aren't consistent estimators (of anything, much less location and scale). Why median(x) and IQR(x) / 2 are consistent (even asymptotically normal) estimators of location and scale would be more theory than we want to go into here. The only point we wanted to make is that the three examples look a lot different from each other.

The Normal Location Problem

Comments

  1. We are using a trimmed mean mean(x, trim = 0.25) for the location estimator (on-line help). The untrimmed mean is optimal for the normal distribution (assuming we really believe the data are normal), but we can use whatever estimator we want and the bootstrap is still valid. (A sketch of the whole procedure appears after these comments.)
  2. We are using the median absolute deviation mad(x) for the scale estimator (on-line help). The standard deviation is optimal for the normal, but . . . .
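
The code for this example lives in the form, but the following minimal sketch shows what a parametric bootstrap of the trimmed mean under the normal model might look like. The data vector x, the number of bootstrap samples nboot, and the use of the bootstrap standard error at the end are illustrative assumptions, not necessarily what the form does.

theta.hat <- mean(x, trim = 0.25)        # location estimate (trimmed mean)
sigma.hat <- mad(x)                      # scale estimate (MAD, normal-consistent by default)
nboot <- 1000                            # number of bootstrap samples (our choice)
theta.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- rnorm(length(x), mean = theta.hat, sd = sigma.hat)
    theta.star[i] <- mean(x.star, trim = 0.25)
}
sd(theta.star)                           # bootstrap standard error of the trimmed mean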

The Cauchy Location Problem

  1. We are using a trimmed mean mean(x, trim = 0.25) for the location estimator (on-line help). No simple estimator is optimal.
  2. We are using the median absolute deviation mad(x) for the scale estimator (on-line help). No simple estimator is optimal. We change the constant argument so it is right for the Cauchy distribution: the population MAD is qcauchy(0.75), which is 1.0, so we use mad(x, constant = 1). (See the sketch after these comments.)
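
Again the actual code is in the form; the sketch below shows roughly how it differs from the normal case (only the scale estimator and the simulation distribution change). The data vector x, nboot, and the choice to report a percentile interval are assumptions.

theta.hat <- mean(x, trim = 0.25)            # location estimate (trimmed mean)
sigma.hat <- mad(x, constant = 1)            # scale estimate, constant adjusted for the Cauchy
nboot <- 1000
theta.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- rcauchy(length(x), location = theta.hat, scale = sigma.hat)
    theta.star[i] <- mean(x.star, trim = 0.25)
}
quantile(theta.star, c(0.025, 0.975))        # percentile interval (one thing we might report)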

Theory II: Multinomial Sampling

To get to some examples without wading through a tremendous amount of theory, we will stick to one parametric model for which the sampling looks fairly similar to the nonparametric bootstrap: the multinomial distribution.

The multinomial distribution is the distribution of categorical measurements on IID individuals. The numbers of individuals in each category make up the data vector x, and the probabilities of an individual being in each category make up a probability vector p (where probability vector means all(p >= 0) and sum(p) == 1).

Given a probability vector p of length k and a sample size n one creates a multinomial sample with the R statements

c.star <- sample(1:k, n, prob = p, replace = TRUE)
x.star <- tabulate(c.star, k)

(The first statement creates an IID sample of category numbers with the specified probabilities. The second counts the number of individuals in each category. So x.star is a vector of length k.)
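
For concreteness, here is the same recipe with made-up values of k, n, and p (the names and numbers are just for illustration):

p <- c(0.1, 0.2, 0.3, 0.4)                   # hypothetical probability vector
k <- length(p)
n <- 100
c.star <- sample(1:k, n, prob = p, replace = TRUE)
x.star <- tabulate(c.star, k)
x.star                                       # k counts summing to n
# rmultinom(1, n, p) produces the same kind of sample in one call (as a k x 1 matrix)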

A Chi-Square Test of Goodness of Fit

For example, suppose we observe the multinomial data defined to be x in the form below, and we want to test the null hypothesis that the true category probabilities are all equal (to 1 / 6 because there are 6 categories). The R function chisq.test does the usual chi-square test that uses the large-sample approximation (that the chi-square test statistic has a chi-square distribution). The remainder of the code does the parametric bootstrap test.

Actually, since the null hypothesis is completely specified here, this is, strictly speaking, a Monte Carlo test rather than a parametric bootstrap, and the test is exact. (We only say we are doing a parametric bootstrap when the null distribution of the test statistic depends on the true unknown parameter value and we are plugging in an estimate for the unknown truth.)
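
Since the form is not reproduced here, the following sketch shows one way such a Monte Carlo test might be set up. The data vector x and the number of simulations nboot are made-up placeholders; the test statistic is the usual Pearson chi-square statistic.

x <- c(5, 8, 12, 10, 7, 18)                  # hypothetical counts in 6 categories
k <- length(x)
n <- sum(x)
p0 <- rep(1 / k, k)                          # null hypothesis: all probabilities equal

chisq.test(x, p = p0)                        # large-sample approximation

pearson <- function(x) sum((x - n * p0)^2 / (n * p0))
t.hat <- pearson(x)
nboot <- 999
t.star <- double(nboot)
for (i in 1:nboot) {
    c.star <- sample(1:k, n, prob = p0, replace = TRUE)   # simulate under the null
    x.star <- tabulate(c.star, k)
    t.star[i] <- pearson(x.star)
}
mean(c(t.star, t.hat) >= t.hat)              # Monte Carlo P-value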

Goodness of Fit with Estimated Parameters

For another example, suppose we observe the Poisson data defined to be x in the form below, and we want to test the null hypothesis that the data are Poisson against the alternative hypothesis that they aren't.

The usual way to do this is to truncate the data, making a category 8 or more, for example. The reason for the truncation is that the usual asymptotics of the chi-square test requires it. But this introduces needless complication when we are not using those asymptotics (the only approximation we rely on is that theta hat is near theta).
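
Again the actual data and code are in the form; the sketch below is one way the parametric bootstrap of such a test might look. The data vector x, the choice to give every observed count its own category (no truncation), and nboot are assumptions for illustration. The Poisson mean is estimated from the data, plugged into the simulation, and re-estimated from each bootstrap sample; this dependence on the estimated parameter is what makes it a parametric bootstrap rather than an exact Monte Carlo test.

pearson <- function(y, lambda) {
    k <- max(y)                              # every count from 0 to max(y) is its own category
    o <- tabulate(y + 1, nbins = k + 1)      # observed counts for the values 0, 1, ..., k
    e <- length(y) * dpois(0:k, lambda)      # expected counts under Poisson(lambda)
    sum((o - e)^2 / e)
}

x <- c(0, 1, 1, 2, 0, 3, 1, 4, 2, 2, 7, 1, 0, 2, 3)   # hypothetical Poisson-looking data
n <- length(x)
lambda.hat <- mean(x)                        # maximum likelihood estimate of the Poisson mean
t.hat <- pearson(x, lambda.hat)

nboot <- 999
t.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- rpois(n, lambda.hat)           # simulate under the fitted null model
    t.star[i] <- pearson(x.star, mean(x.star))   # re-estimate the mean each time
}
mean(c(t.star, t.hat) >= t.hat)              # parametric bootstrap P-value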

A Chi-Square Test of Independence

For another example, suppose we observe the contingency table defined to be x in the form below, and we want to test the null hypothesis of independence (that the row category labels and column category labels are independent random variables).

For some reason, the R chisq.test function does this test for 2-way tables by simulation, but doesn't do the analogous test for 1-way tables (go figure).
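
The data are in the form; as a placeholder, the sketch below uses a made-up 2 by 3 table and simulates from the fitted independence model (cell probabilities equal to the products of the estimated marginal probabilities). The table, nboot, and the decision to compute the Pearson statistic by hand are assumptions. R's built-in simulation, chisq.test(x, simulate.p.value = TRUE), is similar in spirit but conditions on the margins of the table.

x <- matrix(c(12,  7,  9,
               5, 18, 11), nrow = 2, byrow = TRUE)   # hypothetical 2 x 3 contingency table
n <- sum(x)
pearson <- function(tab) {
    e <- outer(rowSums(tab), colSums(tab)) / sum(tab)    # expected counts under independence
    sum((tab - e)^2 / e)
}
p.hat <- outer(rowSums(x), colSums(x)) / n^2         # estimated cell probabilities under the null

t.hat <- pearson(x)
nboot <- 999
t.star <- double(nboot)
for (i in 1:nboot) {
    c.star <- sample(seq_along(p.hat), n, prob = p.hat, replace = TRUE)
    x.star <- matrix(tabulate(c.star, length(p.hat)), nrow = nrow(x))
    t.star[i] <- pearson(x.star)
}
mean(c(t.star, t.hat) >= t.hat)                      # parametric bootstrap P-value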

Logistic Regression

Theory

The glm function in R (on-line help) does so-called generalized linear models. The theory of these is usually covered in the categorical data course (like Stat 5421), so we won't cover it here.

For those interested, I wrote up some notes about generalized linear models for another class (Stat 5931), but there's no real need to look at them to understand this example.

The response variable in this problem, kyphosis, is categorical with values present or absent, which we model as independent but not identically distributed Bernoulli random variables

Yi ∼ Bernoulli(pi)

where the so-called mean value parameter vector

p = (p1, . . ., pn)

is related to the so-called linear predictor vector

η = (η1, . . ., ηn)

by the link function

ηi = logit(pi) = log(pi / (1 − pi))

and the linear predictor vector has the usual form of a mean function in ordinary linear regression

η = X β

where X is the so-called design matrix or model matrix for the problem (the rows are cases and the columns are predictor variables) and β is a vector of regression coefficients.

Kyphosis is a misalignment of the spine. The data are on 83 laminectomy (a surgical procedure involving the spine) patients. The predictor variables are age and age^2 (that is, a quadratic function of age), number (the number of vertebrae involved in the surgery), and start (the vertebra number of the first vertebra involved). The response is presence or absence of kyphosis after the surgery (and perhaps caused by it).

Practice
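
The data and code are in the form; the sketch below reconstructs, from the comments that follow, roughly what the parametric bootstrap of the likelihood ratio (analysis of deviance) test might look like. The data frame name kyph, the variable coding, and the particular pair of models (the big model adds age and age^2 to the small model, giving the 2 degrees of freedom mentioned below) are assumptions for illustration, not necessarily the models in the form.

# assumed: a data frame kyph with variables kyphosis (coded 0/1), age, number, start
n <- nrow(kyph)

out1 <- glm(kyphosis ~ age + I(age^2) + number + start,
    family = binomial, data = kyph)              # big model
out2 <- glm(kyphosis ~ number + start,
    family = binomial, data = kyph)              # small model (null hypothesis)

dev <- deviance(out2) - deviance(out1)           # likelihood ratio test statistic
pev <- pchisq(dev, df = 2, lower.tail = FALSE)   # P-value believing the asymptotics

pred <- predict(out2, type = "response")         # estimated mean value parameter vector

nboot <- 999
dev.star <- double(nboot)
pev.star <- double(nboot)
for (i in 1:nboot) {
    kyphosis.star <- rbinom(n, 1, pred)          # simulate new responses under the null fit
    kyph.star <- kyph
    kyph.star$kyphosis <- kyphosis.star
    out1.star <- glm(kyphosis ~ age + I(age^2) + number + start,
        family = binomial, data = kyph.star)
    out2.star <- glm(kyphosis ~ number + start,
        family = binomial, data = kyph.star)
    dev.star[i] <- deviance(out2.star) - deviance(out1.star)
    pev.star[i] <- pchisq(dev.star[i], df = 2, lower.tail = FALSE)
}

mean(dev.star >= dev)                            # bootstrap-corrected P-value

hist(dev.star)                                   # should look roughly chi-square on 2 df
hist(pev.star)                                   # should look roughly uniform on (0, 1)
qqplot(qunif(ppoints(nboot)), pev.star)          # Q-Q plot against uniform quantiles
abline(0, 1)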

Comments

  1. The command pred <- predict(out2, type = "response") estimates the mean value parameter vector p. If type = "link" (the default) it produces the linear predictor η, which is not what we need for the next item.
  2. We use predicted values for out2 because that is the fit for the small model (null hypothesis) and you always simulate under the null hypothesis when doing a test.
  3. The command kyphosis.star <- rbinom(n, 1, pred) simulates new Bernoulli data with mean value parameter vector p. That's what the parametric bootstrap requires.
  4. We save both the bootstrap values of the test statistic dev.star and the P-value pev.star (which can also be thought of as a test statistic with the proviso that we reject the null for low values of pev as opposed to high values of dev and other usual test statistics).
  5. The plots serve as a diagnostic of whether the parametric bootstrap is necessary and also show what it does.
    1. In the first plot hist(dev.star) the histogram should approximate a chi-square distribution with 2 degrees of freedom, but this is a bit hard to see.
    2. In the second plot hist(pev.star) the histogram should approximate a uniform distribution on the interval (0, 1), and it is clear that, although the approximation is not too bad, it could be better.
    3. In the third plot, the theoretical Q-Q plot of pev.star against the quantiles of the uniform(0, 1) distribution, again the approximation is not bad but could be better. In this last plot, the operation of the parametric bootstrap is to use the curve rather than the straight line in computing the (bootstrap-corrected) P-value. Hence the correction is the difference between the curve (which is the empirical CDF of the bootstrap P-values) and the straight line (which is the CDF of the uniform(0, 1) distribution) at the abscissa corresponding to the ordinary P-value derived from believing the asymptotics.