# Stat 5601 (Geyer) Examples (Bootstrapping Regression)

## General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

## Bootstrapping Cases

### Sections 9.6 and 9.7 in Efron and Tibshirani.

• The on-line help for the function `lmsreg` is clear as mud. The name seems to suggest that it does least median of squares regression, but the documentation doesn't say that anywhere.

However, if we decipher the jargon, it says:

• The function `lmsreg` is a compatibility wrapper for the function `lqs`.

• Using `lmsreg` forces `method = "lms"` in `lqs`.

• In the Details section it says the function minimizes some function of the sorted squared residuals; for `method = "lms"` that is the `quantile`-th smallest squared residual. And the default for `quantile` (for this method) is `floor((n + 1) / 2)`. In short, a median.

• Woof!
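Put together, the decoding above amounts to the following (a sketch, assuming the data are in vectors `V1` and `V2` and the fit is the quadratic used in this example; `fred` matches the name used above):

```
library(MASS)   # lmsreg and lqs both live here

## these two calls do the same thing: lmsreg just forces
## method = "lms" in lqs
fred <- lmsreg(V2 ~ V1 + I(V1^2))
fred <- lqs(V2 ~ V1 + I(V1^2), method = "lms")

## the criterion minimized is the k-th smallest squared residual,
## where k is the default quantile for method = "lms"
n <- length(V2)
k <- floor((n + 1) / 2)   # in short, a median
```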

• Our LMS regression coefficients
```
Rweb:> coefficients(fred)
(Intercept)          V1     I(V1^2)
 0.64916667 -1.20893617  0.04478648
```
don't agree with the ones reported in Efron and Tibshirani.

LMS regression is notoriously computationally difficult. The algorithm must look at all `choose(n, p + 1)` subsets of the cases, fit the regression that exactly interpolates each subset of `p + 1` points, and take the fit that minimizes the median squared residual (where `p` is the number of nonconstant predictors).

Of course, in this toy data set, `choose(14, 3) = 364` is not very large. So the algorithm does find the exact solution (according to the documentation).

But many older LMS regression algorithms only sample subsets of size `p + 1` and hence do not find the exact solution (the `lqs` function also does this when `choose(n, p + 1)` is greater than 5000). Then LMS presumably stands for low (somewhat) rather than least median of squares.
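The exhaustive search is simple enough to sketch directly. This is not the code inside `lqs`, just an illustration of the search its documentation describes; the quadratic design matrix and the function name `lms.exact` are assumptions made here:

```
## brute-force exact LMS for the quadratic model y ~ x + I(x^2):
## try every subset of p + 1 = 3 cases, fit exactly, and keep the
## fit with the smallest median squared residual
lms.exact <- function(x, y) {
    X <- cbind(1, x, x^2)             # p = 2 nonconstant predictors
    n <- length(y)
    k <- floor((n + 1) / 2)           # the "quantile" for method "lms"
    best <- Inf
    best.beta <- NULL
    for (s in combn(n, ncol(X), simplify = FALSE)) {
        ## exact fit through these p + 1 points (skip singular subsets)
        beta <- try(solve(X[s, ], y[s]), silent = TRUE)
        if (inherits(beta, "try-error")) next
        crit <- sort((y - X %*% beta)^2)[k]
        if (crit < best) {
            best <- crit
            best.beta <- beta
        }
    }
    best.beta
}
```

With `n = 14` this loop visits all `choose(14, 3) = 364` subsets, which is why the exact answer is feasible here.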

Maybe Efron and Tibshirani used a buggy LMS regression routine (or maybe we are), or maybe theirs just used sampling and didn't find the right answer. Or maybe I'm not doing this right. Our picture looks sensible, though.

• The code
```
points(V1, V2, pch = 16)
curve(predict(fred, data.frame(V1 = x)),
      add = TRUE, lwd = 2)
```
puts the original data and LMS regression curve back on the plot after they have been painted over in red by all of the bootstrap LMS regression lines drawn by the `curve` statement in the loop. The `pch = 16` and `lwd = 2` make heavier points and lines.

If you don't want this to take so long, making `nboot` smaller draws the picture faster (though, of course, it makes the bootstrap standard errors less accurate).
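For reference, the case-resampling loop being described has roughly this shape (a sketch: `nboot`, `x.star`, and `y.star` are the names used in the example, while `fred.star` is a name made up here):

```
library(MASS)   # for lmsreg

nboot <- 200    # smaller nboot: faster picture, less accurate SEs
n <- length(V1)
for (iboot in 1:nboot) {
    ## resample cases (x, y pairs) with replacement
    k <- sample(1:n, replace = TRUE)
    x.star <- V1[k]
    y.star <- V2[k]
    fred.star <- lmsreg(y.star ~ x.star + I(x.star^2))
    ## paint this bootstrap LMS regression curve on the plot in red
    curve(predict(fred.star, data.frame(x.star = x)),
          add = TRUE, col = "red")
}
```

It is these red curves that the `points` and `curve` statements above then draw over.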

• Efron and Tibshirani don't plot bootstrap regression lines in this section, but they should. The bizarreness of the picture casts doubt on the sensibleness of the whole procedure (how's that for academic weasel wording?).

## Bootstrapping Residuals

### Sections 9.6 and 9.7 in Efron and Tibshirani.

• The residuals and predicted values for the original data are assigned to the vectors `resid` and `pred` for use in the bootstrap.
• The bootstrap data, which were `x.star` and `y.star` above (when bootstrapping cases), are now `V1` and `y.star`, with
```
k <- sample(1:n, replace = TRUE)