To do each example, just click the
You do not have to type in any R instructions or specify a dataset.
That's already done for you.
Section 10.3 in Efron and Tibshirani.
Everything pretty obvious here.
- The function
ratiocalculates the estimator we are investigating.
- The bootstrap is just like other bootstraps we have done that
- The bootstrap estimate of bias is the mean of the
theta.hat. This is the obvious analog in the
bootstrap worldof the actual bias, which is the mean of
theta.hatminus the true unknown parameter value
Improved Bias Estimation
Section 10.4 in Efron and Tibshirani.
The main comment is about the rather strange form of the functions
rratiothat calculate the ratio estimator.
These functions use what Efron and Tibshirani call the resampling vector (pp. 130–132) and the resampling form (pp. 189–190) of the estimator.
The resampling vector is the vector of weights given to the original data points in a resample X1*, . . ., Xn*. The weight pi* given to the original data point Xi is the fraction of times Xi appears in the resample. This is calculated by the statement
p.star <- tabulate(k.star, n) / nin the bootstrap loop, where
k.staris the by now familiar resample of indices. The analogous vector for the original sample is calculated by the statement
p.star <- rep(1 / n, n)
We now have to write a function that calculates the estimator given the data
zthe resampling vector
Unfortunately, this is, in general, hard.
Fortunately, this is, for moments, quite straightforward.
For any function
g, any data vector
x, and any probability vector
p, the expression
sum(g(x) * p)
p[i]to the point
i(and probability zero to everywhere else).
sum(x * p)calculates the mean
sum((x - a)^2 * p)calculates the second moment about the point
a, and so forth.
rmean(p, x)calculates the sample mean of the data vector
xin resampling form.
stopcommands for various error situations are, of course, not required. If the function call is done properly they don't do anything. But it will save you endless hours of head scratching sometime if you get in the habit of putting error checks in the functions you write.
rratio(p, x, y)calculates the ratio estimator for data
rmeanfunction in the obvious fashion.
In the bootstrap loop the vector
p.baraccumulates the sum of the
p.starvectors. After the bootstrap loop terminates, it is divided by
nbootto give the average of the
p.barwould be the same as
nbootis considerably less than infinity,
p.baris different from
sd(theta.star)is based on resamples that yielded the
p.barvector, it makes sense to subtract off
rratio(p.bar, y, z)rather than
theta.hatto estimate bias.
The logic is that that the Monte Carlo errors in
rratio(p.bar, y, z)tend to be in the same direction and cancel to some degree, giving an
This method of expressing estimators in
resampling formis an important bootstrap technique, which will be used again for the ABC
betterbootstrap confidence interval technique.
Resampling Form Estimators
rmedian calculates the median of a bootstrap
sample given in
The code is a bit tricky. The statement
n.star <- round(p * n)converts
pback to counts, the
roundfunction being there to make sure the result is exactly integer-valued (not just close). Then the statement
k.star <- rep(1:n, n.star)converts these back to the index values that were counted: each element of the sequence
1:nis repeated as many times as the corresponding count in
n.star. The resulting
k.starinside the function definition is just like the k.star outside the function definition except for order, which doesn't matter. Then we can use
x.starin the usual way, and apply the function that computes the estimator to
x.starin the usual way.
We try it out, and indeed do get the same answers either way.
Clearly, this function has nothing particular to do with medians. Changing the last line lets it calculate any other function.