University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb
The library(bootstrap) statement says we are going to use code in the bootstrap library, which is not available without this command.
In this example, it is a fair guess that the name of the data set may have something to do with mice. The only ones on the list that look like that are mouse.c and mouse.t.
Since the actual help pages for the data sets are completely useless, saying only "See Efron and Tibshirani (1993) for details on these datasets" but not giving a page number or anything else of the slightest help in identifying which data set it is, the only way to find out whether these really are the data sets we want is to look at them. For example,
data(mouse.t)
mouse.t

prints mouse.t and, sure enough, these are the 7 numbers in the treatment group for the mouse data.
The data statement reads in the data set, as in the example.
The stripchart statement makes the dot plot showing the data (as points on the number line).
The for loop makes nboot bootstrap samples for the sample means of the treatment and control groups (mean.t and mean.c, respectively).
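A minimal sketch of such a loop follows. The actual code and data are on the Rweb page; here x.t and x.c are placeholder samples (random draws), not the mouse data, and nboot is chosen arbitrarily.

```r
# sketch of the bootstrap loop described above, with placeholder data
set.seed(42)
x.t <- rexp(7, rate = 1 / 80)   # stand-in for the treatment group
x.c <- rexp(9, rate = 1 / 50)   # stand-in for the control group
nboot <- 1000

mean.t <- double(nboot)
mean.c <- double(nboot)
for (i in 1:nboot) {
    # resample each group with replacement, same size as the original
    star.t <- sample(x.t, replace = TRUE)
    star.c <- sample(x.c, replace = TRUE)
    mean.t[i] <- mean(star.t)
    mean.c[i] <- mean(star.c)
}
# mean.t and mean.c now hold the bootstrap sampling distributions
```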
If the mean function were replaced by the median function both places it is used in the for loop, then we would be bootstrapping the sample median.
If trim = 0.25 were added to the arguments of the mean function both places it is used in the for loop, then we would be bootstrapping the 25% trimmed mean.
In general, if a function foo calculates a point estimate, then replacing mean by foo has us bootstrapping the estimate foo calculates.
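A sketch of this plug-in idea, again with placeholder data rather than the mouse data; foo here happens to be the 25% trimmed mean, but any point estimator works.

```r
set.seed(7)
x <- rexp(20)          # placeholder data, not the mouse data
nboot <- 500

# any point estimator can be plugged in;
# foo <- median would bootstrap the sample median instead
foo <- function(x) mean(x, trim = 0.25)

theta.star <- double(nboot)
for (i in 1:nboot) {
    x.star <- sample(x, replace = TRUE)
    theta.star[i] <- foo(x.star)
}
sd(theta.star)   # bootstrap standard error of whatever foo estimates
```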
As in the other examples, we need the library and data statements.
La.eigen is preferred to eigen for new projects, so that's what we used here.
Again there is a for loop, and we save all the junk we want to know about in five data structures: eigenval.1, eigenval.2, eigenval.sum, eigenvec.1, and eigenvec.2. Because the data set, scor, is a matrix, we do the sampling in two steps.
k <- sample(1:n, replace = TRUE)
scor.star <- scor[k, ]

First we generate the vector k, which contains the indices (subscripts) of the data vectors that go into the bootstrap sample.
Then we use the R subscripting operations to generate the corresponding
data structure.
scor is a matrix and so is scor.star. Every row of scor.star is a row of scor. The i-th row of scor.star is the k[i]-th row of scor. And that's exactly what we want, since k[i] is a random integer in the range of allowed row indices of scor.
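The two-step scheme can be checked on a toy matrix standing in for scor (the real scor data are in the bootstrap library):

```r
# toy illustration of resampling rows of a data matrix
set.seed(1)
scor <- matrix(rnorm(30), nrow = 10, ncol = 3)  # placeholder: 10 cases, 3 variables
n <- nrow(scor)

k <- sample(1:n, replace = TRUE)   # bootstrap the case indices
scor.star <- scor[k, ]             # every row of scor.star is a row of scor

# check the property claimed above: row i of scor.star is row k[i] of scor
identical(scor.star[3, ], scor[k[3], ])
```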
We replace var(...) with var(...) * (n - 1) / n because Efron and Tibshirani use the variance of the empirical distribution (divide by n) rather than the so-called sample variance (divide by n - 1), which is what the R var function calculates.
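The correction is easy to verify on any data vector:

```r
# the variance of the empirical distribution vs. R's var()
x <- c(1.2, 3.4, 2.2, 5.1, 0.7)        # arbitrary illustrative data
n <- length(x)
v.sample    <- var(x)                  # divides by n - 1
v.empirical <- var(x) * (n - 1) / n    # divides by n

# the corrected value is exactly the mean squared deviation
all.equal(v.empirical, mean((x - mean(x))^2))
```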
Everything on a line after a number sign (#) is an R comment (ignored by the computer).
The code following the comment # patch up signs solves the problem that Efron and Tibshirani moan about on the bottom half of p. 69 the right way rather than the wrong way. If the sign is arbitrary, fix the signs to be consistent rather than just throwing out the ones with the wrong signs, which may bias the bootstrap calculation. Here we calculate the inner product with the corresponding eigenvector for the original data and adjust the signs of the bootstrap eigenvectors so the inner products are all positive.
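The sign patch-up amounts to one comparison per bootstrap eigenvector. A sketch with made-up vectors (the real ones come out of the loop above):

```r
# flip a bootstrap eigenvector when its inner product with the
# original-data eigenvector is negative (signs are arbitrary)
v      <- c(0.6, 0.8)      # placeholder: eigenvector from the original data
v.star <- c(-0.58, -0.81)  # placeholder: bootstrap eigenvector, wrong sign

if (sum(v.star * v) < 0)   # negative inner product: signs inconsistent
    v.star <- -v.star

sum(v.star * v) > 0        # TRUE after the fix
```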
The boxplot function wants a list of vectors rather than a matrix. The data.frame function converts a matrix into a list of vectors which are the columns of the matrix. That's why we use it.
You may be wondering how anyone is supposed to come up with that. Simple. The on-line help for the boxplot function has an example illustrating this trick. I just copied the example.
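The trick in miniature, with an arbitrary matrix:

```r
# data.frame turns a matrix into a list of its columns,
# which is the form boxplot wants
m <- matrix(rnorm(40), nrow = 10, ncol = 4)
d <- data.frame(m)
length(d)        # 4: one component per column of the matrix
boxplot(d)       # one box per column
```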
var(eigenvec.1) and var(eigenvec.2) are those variance matrices. But it's not easy to interpret a five-dimensional variance matrix. So we take a hint from the nature of the problem: if eigenvalues and eigenvectors are in general a good way to look at (symmetric) matrices, then they are in particular a good way to look at var(eigenvec.1) and var(eigenvec.2).
Hence we look at the eigenvalues and eigenvectors (of the variance matrix of the bootstrap sampling distribution of an eigenvector of the data). (eigenvectors of . . . an eigenvector! Woof!)
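A sketch of this step, with a placeholder matrix of bootstrap eigenvectors (rows indexing bootstrap iterations) standing in for eigenvec.1:

```r
# eigen-decompose the variance matrix of a matrix of bootstrap
# eigenvectors: rows = bootstrap iterations, columns = components
set.seed(3)
eigenvec.1 <- matrix(rnorm(500 * 5, sd = 0.05), ncol = 5)  # placeholder

v <- var(eigenvec.1)            # 5 x 5 variance matrix
e <- eigen(v, symmetric = TRUE)
e$values                        # small eigenvalues = directions of little variability
```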
The eigenvalues are a bit hard to interpret because of the scientific notation. Here they are in fixed notation:

first eigenvector:  0.005101354 0.002037788 0.001290034 0.000786616 0.000026688
second eigenvector: 0.035153496 0.017123143 0.004842430 0.002551065 0.001749651
It is clear that all of the eigenvalues for the variance of the first eigenvector are much smaller than the corresponding eigenvalues for the second eigenvector. Hence the first eigenvector is much less variable than the second.