
Stat 5601 (Geyer) Examples (More on Bootstrapping Regression)

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

Nonparametric Regression: Loess

Section 7.3 in Efron and Tibshirani.
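
The Rweb form isn't reproduced here, but here is a sketch of the sort of thing it does: fit loess and overplot refits to bootstrap data as a color smear. The data are made up so the sketch is self-contained; the example dataset and its variable names may differ, and resampling residuals instead of cases is the other standard choice.

    # a sketch, not the actual Rweb code
    set.seed(42)
    x <- sort(runif(100, 0, 10))              # stand-in for the real data
    y <- sin(x) + rnorm(100, sd = 0.3)
    xx <- seq(min(x), max(x), length = 101)   # grid for plotting curves
    nboot <- 200
    plot(x, y, type = "n")
    for (i in 1:nboot) {
        k <- sample(length(x), replace = TRUE)        # resample cases
        star <- data.frame(x = x[k], y = y[k])
        fit.star <- loess(y ~ x, data = star)
        lines(xx, predict(fit.star, data.frame(x = xx)), col = "pink")
    }
    points(x, y)
    lines(xx, predict(loess(y ~ x), data.frame(x = xx)), lwd = 2)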


Nonparametric Regression: Smoothing Splines

Section 7.3 in Efron and Tibshirani.
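
The corresponding sketch for smoothing splines, with the same made-up data and the same caveats as in the loess sketch. By default smooth.spline chooses its smoothing parameter automatically by generalized cross-validation.

    # a sketch, not the actual Rweb code
    set.seed(42)
    x <- sort(runif(100, 0, 10))
    y <- sin(x) + rnorm(100, sd = 0.3)
    xx <- seq(min(x), max(x), length = 101)
    nboot <- 200
    plot(x, y, type = "n")
    for (i in 1:nboot) {
        k <- sample(length(x), replace = TRUE)     # resample cases
        fit.star <- smooth.spline(x[k], y[k])
        lines(predict(fit.star, xx), col = "pink")
    }
    points(x, y)
    lines(predict(smooth.spline(x, y), xx), lwd = 2)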


Bootstrap Estimate of Prediction Error

So which does a better job? From looking at the color smears, it seems that LOESS is better. It certainly makes a narrower smear.

So if (that's a very big if) those color smears could be thought of as confidence bands (as many people do), that would settle the issue.

But in this section we do something that is actually correct. We calculate a bootstrap estimate of the mean square prediction error.

Sections 7.3 and 17.6 in Efron and Tibshirani.
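
Again the actual Rweb code isn't reproduced here. One simple bootstrap scheme for MSPE (a sketch, not necessarily the exact scheme the example uses, which follows Sections 7.3 and 17.6 more closely): take a fitted curve as a stand-in truth, generate bootstrap training responses and fresh test responses by resampling residuals, refit each procedure to the training responses, and score its predictions on the fresh responses.

    # a sketch under the assumptions stated above
    set.seed(42)
    x <- sort(runif(100, 0, 10))
    y <- sin(x) + rnorm(100, sd = 0.3)
    fit.lo <- loess(y ~ x)
    mu.hat <- fitted(fit.lo)            # stand-in truth
    res <- residuals(fit.lo)
    nboot <- 200
    mspe.lo <- mspe.sp <- double(nboot)
    for (i in 1:nboot) {
        y.train <- mu.hat + sample(res, replace = TRUE)
        y.test  <- mu.hat + sample(res, replace = TRUE)
        lo.star <- loess(y.train ~ x)
        sp.star <- smooth.spline(x, y.train)
        mspe.lo[i] <- mean((y.test - fitted(lo.star))^2)
        mspe.sp[i] <- mean((y.test - predict(sp.star, x)$y)^2)
    }
    mean(mspe.lo)   # bootstrap MSPE for loess
    mean(mspe.sp)   # bootstrap MSPE for the smoothing spline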


The Moral of the Story

Bootstrap color smears aren't much like confidence intervals, although many people treat them as if they were.

MSPE is the sum of three terms

  1. error variance
  2. prediction variance
  3. prediction bias squared

The color smears reflect only the second term. They completely ignore the bias term. (The first term is inherent in the problem and cannot be reduced by better estimation.) In fact, the more biased a procedure is, the better it looks in the color smears, because there is usually a bias-variance trade-off: procedure parameters can be adjusted to lower term 2 or to lower term 3, but not both at the same time. Term 2 can be made nearly zero if one doesn't mind term 3 skyrocketing. Looking at color smears, which reflect only term 2, is a good way to make exactly this mistake.

As far as I know, producing good bootstrap confidence bands for nonparametric regression is an open research problem (meaning nobody really knows how to do it).

Bootstrap Estimate of Bias

Bias in regression is a question about conditional expectation: is the conditional expectation of the response variable given the predictor variable equal to the conditional expectation of the predictions given by the nonparametric regression routine?

Since this is a question about the conditional model, we should bootstrap residuals, not cases.
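
A sketch of a residual bootstrap estimate of bias (the mention of gam.check below suggests the example uses gam from the mgcv package, so that is the smoother used here, with made-up data as before; any smoother would do).

    # a sketch, not the actual Rweb code
    library(mgcv)
    set.seed(42)
    x <- sort(runif(100, 0, 10))
    y <- sin(x) + rnorm(100, sd = 0.3)
    fit <- gam(y ~ s(x))
    mu.hat <- fitted(fit)
    res <- residuals(fit)
    nboot <- 200
    pred.star <- matrix(NA, nboot, length(x))
    for (i in 1:nboot) {
        y.star <- mu.hat + sample(res, replace = TRUE)   # bootstrap residuals
        pred.star[i, ] <- fitted(gam(y.star ~ s(x)))
    }
    bias.hat <- colMeans(pred.star) - mu.hat   # bootstrap estimate of bias
    plot(x, y)
    lines(x, mu.hat)                           # black: raw fit
    lines(x, mu.hat - bias.hat, col = "blue")  # blue: bias corrected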


Looking at the plot, we see the black line (the regression prediction from the nonparametric routine) is biased. The blue line (the bootstrap bias-corrected estimate) is more wiggly. This is typical of nonparametric regression: the estimates are always biased and always less wiggly than the truth. As the catchphrase has it, they erode the peaks and fill in the valleys.

So why don't we use this bias-corrected estimate? Isn't bias bad and bias correction good? No. If we compared mean square prediction error for the procedure implemented by the routine and for our supposedly better bias-corrected predictions, we would find that decreasing the bias increases mean square prediction error! There is a bias-variance trade-off, and the routine R provides chooses the right spot on the bias-variance trade-off curve (this is shown by one of the plots made by the gam.check function).

So some bias is good! Attempting to be unbiased is the stupidest thing you can do in nonparametric regression.

Bootstrap Estimate of Bias, Take 2

Because of the way nonparametric regression works, the more wiggly the true regression curve, the larger the bias. So once we decide the true regression curve is more wiggly than we previously thought, we must also conclude that the bias is larger than we previously estimated. Thus we should iterate the bias estimation.
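
A sketch of the iteration, continuing with the made-up data and gam smoother from the preceding sketch: each pass treats the current bias-corrected curve as the truth, re-estimates the bias of the smoother at that truth, and re-corrects the original fit.

    # a sketch; mu.hat, res, nboot, x, y as in the preceding sketch
    mu <- mu.hat                               # current guess at the truth
    for (iter in 1:5) {
        pred.star <- matrix(NA, nboot, length(x))
        for (i in 1:nboot) {
            y.star <- mu + sample(res, replace = TRUE)
            pred.star[i, ] <- fitted(gam(y.star ~ s(x)))
        }
        bias.hat <- colMeans(pred.star) - mu   # bias at the current truth
        mu <- mu.hat - bias.hat                # re-correct the original fit
    }
    plot(x, y)
    lines(x, mu.hat)            # black: raw fit
    lines(x, mu, col = "blue")  # blue: iterated bias correction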


Nonparametric Regression: Kernel Smoothing

Section 7.3 in Efron and Tibshirani.
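
A sketch using ksmooth from base R, with the same made-up data and color smear as in the earlier sketches. The bandwidth here is a guess; the actual example may choose the kernel and bandwidth differently.

    # a sketch, not the actual Rweb code
    set.seed(42)
    x <- sort(runif(100, 0, 10))
    y <- sin(x) + rnorm(100, sd = 0.3)
    xx <- seq(min(x), max(x), length = 101)
    nboot <- 200
    plot(x, y, type = "n")
    for (i in 1:nboot) {
        k <- sample(length(x), replace = TRUE)     # resample cases
        lines(ksmooth(x[k], y[k], kernel = "normal", bandwidth = 1,
            x.points = xx), col = "pink")
    }
    points(x, y)
    lines(ksmooth(x, y, kernel = "normal", bandwidth = 1,
        x.points = xx), lwd = 2)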


Bootstrap Estimate of Bias, Take 3

The final story. The plot shows the bias-corrected (with iteration) regression curve calculated in take 2, with color smears from bootstrapping this curve plus random residuals.
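
A sketch, continuing with mu, res, nboot, x, and y from the take 2 sketch (so the same caveats apply): each bootstrap dataset is the bias-corrected curve plus resampled residuals, and the smear shows the refits around that curve.

    # a sketch; mu is the iterated bias-corrected curve from take 2
    pred.star <- matrix(NA, nboot, length(x))
    for (i in 1:nboot) {
        y.star <- mu + sample(res, replace = TRUE)
        pred.star[i, ] <- fitted(gam(y.star ~ s(x)))
    }
    plot(x, y, type = "n")
    for (i in 1:nboot)
        lines(x, pred.star[i, ], col = "pink")     # the color smear
    points(x, y)
    lines(x, mu, col = "blue", lwd = 2)            # bias-corrected curve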
