General Instructions

To do each example, just click the Submit button. You do not have to type in any R instructions or specify a dataset; that's already done for you.
Theory

The parametric bootstrap, as its name says, simulates from the parametric model. We say bootstrap rather than simulation because the former term recognizes that we are doing the wrong thing: simulating under the distribution indicated by our parameter estimate rather than under the true unknown distribution.
As with the nonparametric bootstrap, there are presumably many different ways to construct parametric bootstrap confidence intervals, although the literature on the subject is thin (the nonparametric bootstrap gets most of the attention). We illustrate just one method: the parametric analog of nonparametric bootstrap t confidence intervals.
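Before turning to the real example, the idea can be illustrated on a hypothetical toy problem (not the kyphosis analysis; the model, sample size, and seed here are all made up for illustration): a parametric bootstrap t confidence interval for the mean of an exponential model.

```r
# Toy illustration: parametric bootstrap t interval for an exponential mean.
set.seed(42)
x <- rexp(30, rate = 1 / 5)            # "observed" data (true mean 5)
n <- length(x)
theta.hat <- mean(x)                   # MLE of the mean
se.theta.hat <- theta.hat / sqrt(n)    # estimated standard error (exponential model)

nboot <- 999
z.star <- double(nboot)
for (i in 1:nboot) {
    # simulate from the FITTED model -- that is what makes it parametric
    x.star <- rexp(n, rate = 1 / theta.hat)
    theta.star <- mean(x.star)
    se.star <- theta.star / sqrt(n)
    # bootstrap analog of the asymptotically pivotal quantity
    z.star[i] <- (theta.star - theta.hat) / se.star
}
crit <- quantile(z.star, c(0.025, 0.975))
# bootstrap t interval: note the critical values swap ends
ci <- theta.hat - rev(crit) * se.theta.hat
print(ci)
```

The only difference from the nonparametric bootstrap t interval is the simulation step: new data come from the fitted parametric distribution rather than from resampling the observed data.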
Practice

We are going to do a logistic regression example. The response variable in this problem, kyphosis, is categorical with values present or absent, which we model as independent but not identically distributed Bernoulli random variables.

Kyphosis is a misalignment of the spine. The data are on 83 laminectomy (a surgical procedure involving the spine) patients. The predictor variables are age and age^2 (that is, a quadratic function of age), number, the number of vertebrae involved in the surgery, and start, the vertebra number of the first vertebra involved. The response is presence or absence of kyphosis after the surgery (and perhaps caused by it).
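A sketch of the model fit, under some assumptions: the data frame name (kyphosis, as shipped in the rpart package) and the object name out2 are taken from the commentary that follows, and the actual example on this page may differ in detail.

```r
# Sketch of the logistic regression fit (assumes the kyphosis data frame
# from the rpart package; variable names there are Kyphosis, Age, Number, Start).
library(rpart)
out2 <- glm(Kyphosis ~ Age + I(Age^2) + Number + Start,
    family = binomial, data = kyphosis)
summary(out2)
pred <- predict(out2, type = "response")  # estimated mean value parameter vector
```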
Comments

- The command
  pred <- predict(out2, type = "response")
  estimates the mean value parameter vector.
- The command
  kyphosis.star <- rbinom(n, 1, pred)
  simulates new Bernoulli data with that mean value parameter vector. That's what the parametric bootstrap requires.
- We save both the bootstrap values of the estimator and its standard error.
- Then we make the bootstrap analog of the asymptotically pivotal (asymptotically standard normal) quantity
  (theta.hat - theta) / se.theta.hat
- The warnings show that we are actually far from asymptopia. In some bootstrap samples the MLE is (perhaps) at infinity, and in any case we are in trouble with the asymptotics. But we do not throw these samples out of the bootstrap distribution. They reflect the actual performance of logistic regression for these data and this model.
- The histogram shows that the sampling distribution of this asymptotically pivotal quantity is not only not standard normal, it is both skewed and biased.
- It just turns out that the bootstrap critical values do not reflect the skewness, but this is only because of the confidence level we chose. They would be very different from plus or minus the same quantity if we chose a 99% confidence level.
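Putting the comments above together, the whole procedure might be sketched as follows. This is a hedged reconstruction, not the exact code of the example: it assumes out2 is the fitted logistic regression, the kyphosis data frame comes from the rpart package, and the coefficient of interest is (arbitrarily, for illustration) the one for start.

```r
# Sketch of the parametric bootstrap t interval for one logistic regression
# coefficient (assumptions: rpart's kyphosis data, the "Start" coefficient).
library(rpart)
out2 <- glm(Kyphosis ~ Age + I(Age^2) + Number + Start,
    family = binomial, data = kyphosis)
pred <- predict(out2, type = "response")   # estimated mean value parameters
n <- length(pred)

set.seed(17)
nboot <- 999
theta.star <- double(nboot)
se.theta.star <- double(nboot)
for (i in 1:nboot) {
    # simulate new Bernoulli responses from the FITTED model
    kyphosis.star <- rbinom(n, 1, pred)
    out.star <- glm(kyphosis.star ~ Age + I(Age^2) + Number + Start,
        family = binomial, data = kyphosis)
    sout <- summary(out.star)
    # save both the estimator and its standard error
    theta.star[i] <- sout$coefficients["Start", "Estimate"]
    se.theta.star[i] <- sout$coefficients["Start", "Std. Error"]
}

sout2 <- summary(out2)
theta.hat <- sout2$coefficients["Start", "Estimate"]
se.theta.hat <- sout2$coefficients["Start", "Std. Error"]

# bootstrap analog of the asymptotically pivotal quantity
z.star <- (theta.star - theta.hat) / se.theta.star
hist(z.star)   # skewed and biased, not standard normal

crit <- quantile(z.star, c(0.025, 0.975))
ci <- theta.hat - rev(crit) * se.theta.hat   # 95% bootstrap t interval
print(ci)
```

Expect warnings from glm (fitted probabilities numerically 0 or 1) in some bootstrap samples; as the comments say, those samples stay in the bootstrap distribution.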