LOESS (also called LOWESS) is a very complicated nonparametric
regression procedure done by the R function
loess
(on-line help)
or by an older function lowess
(on-line help) that does not work like most other R regression
functions.
We would really need a whole semester to understand how nonparametric
regression works. Efron and Tibshirani (Section 7.3) make an attempt
to explain this, but it is very sketchy. We'll just treat
the loess function as a black box that does regression prediction.
When the fit was made using surface = "interpolate" (the
default), predict.loess will not extrapolate,
so points outside
an axis-aligned hypercube enclosing the original data will have
missing (NA) predictions and standard errors.
So that means we have to change the setting of surface,
wherever that is specified. Looking back in the
on-line help for the function loess
we don't see anything
about surface, but there is a link to the
on-line help for the function loess.control,
from which we see that setting the control
argument of loess to loess.control(surface = "direct")
is what's needed.
If that seems clear as mud, it is. If you want to become an R expert,
this kind of grovelling in the documentation is part of the process.
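Here is a minimal sketch of what that looks like (the data vectors x
and y and the prediction points x.new are hypothetical, not from these
notes):

    out <- loess(y ~ x, control = loess.control(surface = "direct"))
    pred <- predict(out, newdata = data.frame(x = x.new), se = TRUE)
    pred$fit     # predictions, now defined even outside the range of the data
    pred$se.fit  # standard errors

With surface = "direct" the fit is computed exactly at each prediction
point rather than interpolated from a grid, which is slower but allows
extrapolation.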
Cubic smoothing splines with smoothing parameter chosen by
generalized cross-validation are another very complicated nonparametric
regression procedure, this one done by the R function
gam
(on-line help).
This procedure is perhaps even more complicated than LOESS.
At least its theory is more complicated.
So, as with LOESS, we'll just treat it as a black box that does
regression prediction.
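A minimal sketch of the corresponding fit, assuming the gam function is
the one from the mgcv package and using the same hypothetical x, y, and
x.new as above (bs = "cr" asks for a cubic regression spline basis;
mgcv chooses the smoothing parameter by GCV by default):

    library(mgcv)
    out <- gam(y ~ s(x, bs = "cr"))
    pred <- predict(out, newdata = data.frame(x = x.new), se.fit = TRUE)
    pred$fit     # predictions
    pred$se.fit  # standard errors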
Note that the curves gam draws are generally more wiggly
than the curves loess draws.
Note that pred.star is the predicted y values
corresponding to the original x values, even though
the fit out.star corresponds to the bootstrap data
(x.star, y.star). Then y - pred.star
gives the predicted residuals at the original x values.
The reason for this is explained in Section 17.6 in Efron and Tibshirani.
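A minimal sketch of how that calculation might go (x, y, and the number
of bootstrap iterations nboot are hypothetical; the variable names
follow the text):

    n <- length(x)
    mspe.loess <- double(nboot)
    mspe.gam <- double(nboot)
    for (i in 1:nboot) {
        k.star <- sample(n, replace = TRUE)   # bootstrap the cases
        x.star <- x[k.star]
        y.star <- y[k.star]
        # fit to the bootstrap data, but predict at the original x values
        out.star <- loess(y.star ~ x.star,
            control = loess.control(surface = "direct"))
        pred.star <- predict(out.star, newdata = data.frame(x.star = x))
        mspe.loess[i] <- mean((y - pred.star)^2)
        out.star <- gam(y.star ~ s(x.star, bs = "cr"))
        pred.star <- predict(out.star, newdata = data.frame(x.star = x))
        mspe.gam[i] <- mean((y - pred.star)^2)
    }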
Having calculated MSPE for both procedures, we plot it two ways.
First, we look at histograms, one on top of the other, with scales
aligned so they are comparable.
Second, we look at a scatter plot. The line is where the two
MSPE's are equal. Note that there are only a few points above the
line. Those correspond to the bootstrap samples where LOESS does
better than the smoothing spline (as measured by MSPE). So LOESS
is usually worse.
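A minimal sketch of the two plots (assuming the mspe.loess and mspe.gam
vectors from the loop above):

    par(mfrow = c(2, 1))   # histograms one above the other
    breaks <- hist(c(mspe.loess, mspe.gam), plot = FALSE)$breaks
    hist(mspe.loess, breaks = breaks, main = "LOESS")
    hist(mspe.gam, breaks = breaks, main = "smoothing spline")
    par(mfrow = c(1, 1))
    plot(mspe.loess, mspe.gam)   # scatter plot
    abline(0, 1)                 # line where the two MSPE's are equal

Points above the line have mspe.gam greater than mspe.loess, that is,
LOESS doing better.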
Bootstrap color smears aren't very much like confidence intervals, although
many people treat them as confidence intervals.
MSPE is the sum of three terms:
   1. error variance,
   2. prediction variance,
   3. prediction bias squared.
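In symbols, this is the usual decomposition (not spelled out in these
notes), writing g for the true regression function and g-hat for the
estimator:

    \mathrm{MSPE}(x) = \underbrace{\sigma^2}_{\text{error variance}}
        + \underbrace{\operatorname{var}\{\hat{g}(x)\}}_{\text{prediction variance}}
        + \underbrace{\bigl[E\{\hat{g}(x)\} - g(x)\bigr]^2}_{\text{prediction bias squared}}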
The color smears only reflect the second term. They completely ignore the
bias term. (The first term is inherent in the problem and cannot be reduced
by better estimation.) In fact, the more biased a procedure is,
the better it looks on the color smears, because there is usually
a bias-variance trade-off. Procedure parameters can be adjusted to lower
term 2 or to lower term 3, but not both at the same time. Term 2 can be
made nearly zero if one doesn't mind term 3 skyrocketing. Color smears,
reflecting only term 2, are a good way to make this mistake.
As far as I know, constructing good bootstrap confidence bands for
nonparametric regression is an open research problem (meaning nobody
really knows how to do it).
Bias in regression is an issue about conditional expectation.
Is the conditional expectation of the response variable given
the predictor variable equal to the expectation of the prediction
the nonparametric regression routine gives at that value of the
predictor?
Since it is a question about the conditional model, we should bootstrap
residuals, not cases.
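A minimal sketch of the residual bootstrap for this (assuming the gam
fit and the hypothetical x, y, and nboot from above):

    out <- gam(y ~ s(x, bs = "cr"))
    pred <- predict(out)
    resid <- y - pred
    pred.star <- matrix(NA, nboot, length(x))
    for (i in 1:nboot) {
        y.star <- pred + sample(resid, replace = TRUE)   # bootstrap residuals
        out.star <- gam(y.star ~ s(x, bs = "cr"))
        pred.star[i, ] <- predict(out.star)
    }
    bias.hat <- colMeans(pred.star) - pred   # bootstrap estimate of the bias
    pred.corrected <- pred - bias.hat        # the bias-corrected estimate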
Looking at the plot we see the black line (the regression prediction
from the nonparametric routine) is biased. The blue line (the
bootstrap bias-corrected estimate) is more wiggly. This is typical
of nonparametric regression. The estimates are always biased
and always less wiggly than the truth. "They erode the peaks and
fill in the valleys," as the catchphrase about this goes.
So why don't we use this bias-corrected estimate? Isn't bias bad
and bias correction good? No. If we compared mean square prediction
error for the procedure implemented by the routine and our supposedly
better bias-corrected predictions, we would find that decreasing the bias
increases mean square prediction error! There is a bias-variance
trade-off, and the routine R provides chooses the right spot on the
bias-variance trade-off curve (this is shown by one of the plots made
by the gam.check function).
So some bias is good! Attempting to be unbiased is the stupidest thing
you can do in nonparametric regression.
Because of the way nonparametric regression works, the more wiggly
the true regression curve, the more the bias. So once we decide the
true regression curve is more wiggly than we previously thought,
we have to conclude that the bias is also bigger than we previously
estimated. Thus we should iterate the bias estimation.
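A minimal sketch of the iteration (boot.bias here is a hypothetical
helper, not an R function: it reruns the residual bootstrap above with
its argument playing the role of the true curve and returns the
estimated bias):

    bias.hat <- boot.bias(pred)
    for (iter in 1:5) {   # a few rounds of re-estimation
        pred.corrected <- pred - bias.hat
        bias.hat <- boot.bias(pred.corrected)
    }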