University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb
lmsreg is clear as mud.
The name seems to suggest that it does least median of
squares regression, but the documentation doesn't say that anywhere.
However, if we decipher the jargon, it says:
lmsreg is a compatibility wrapperfor the function
lqs.
lmsreg forces method = "lms" in
lqs.
Detailssection it says the function minimizes some function of the sorted squared residuals, for
method = "lms" the
quantile squared residual. And the default
for quantile (for this method) is floor((n+1)/2).
In short, a median.
Rweb:> coefficients(fred) (Intercept) V1 I(V1^2) 0.64916667 -1.20893617 0.04478648don't agree with the ones reported in Efron and Tibshirani.
LMS regression is notoriously computationally difficult. The algorithm
must look at all choose(n, p + 1) subsets of the cases to
find the regression that exactly fits p + 1 points which
minimizes the least median of squares (where p is the number
of nonconstant predictors).
Of course, in this toy data set,
choose(14, 3) = 364 is not very large. So the algorithm
does find the exact solution (according to the documentation).
But many older LMS regression algorithms only sample subsets of size
p + 1 and hence do not find the exact solution (as does
the lqs function when choose(n, p + 1) is
greater than 5000). Then LMS presumably stands for low (somewhat)
rather than least
median of squares.
Maybe Efron and Tibshirani used a buggy LMS regression routine (or maybe we are), or maybe theirs just used sampling and didn't find the right answer. Or maybe I'm not doing this right. Our picture looks sensible, though.
points(V1, V2, pch = 16) curve(predict(fred, data.frame(V1 = x)), add = TRUE, lwd = 2)puts the original data and LMS regression curve back on the plot after they have been painted over in red by all of the bootstrap LMS regression lines drawn by the
curve statement
in the loop. The pch = 16 and lwd = 2
make heavier points and lines.
If you don't want this to take so long, making nboot
smaller draws the picture faster (though, of course, it makes the
bootstrap standard errors less accurate).
resid and pred
for use in the bootstrap.
x.star and y.star
above (when bootstrapping cases) are now V1 and y.star
with
k <- sample(1:n, replace = TRUE) y.star <- pred + resid[k]Note that the X values don't change. This bootstrap attempts to simulate the conditional distribution of Y given X, so, as always with conditional probability, the variables (here X) we are conditioning on are treated as if constant. Hence the bootstrap leaves them alone. We construct the new Y by attaching random residuals (a bootstrap sample of residuals) to the original predicted values.
engineering details, also called
nuts and bolts, of the procedures.