To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.
Everything pretty obvious here.
ratio
calculates the estimator we
are investigating.
k.star
trick.
theta.star
minus theta.hat
. This is the obvious analog in the bootstrap worldof the actual bias, which is the mean of
theta.hat
minus the true unknown parameter value theta
.
ImprovedBias Estimation
rmean
and rratio
that calculate the
ratio estimator.
These functions use what Efron and Tibshirani call the resampling vector (pp. 130-132) and the resampling form (pp. 189-190) of the estimator.
The resampling vector is the vector of weights given to the original data points in a resample X1*, . . ., Xn*. The weight pi* given to the original data point Xi is the fraction of times Xi appears in the resample. This is calculated by the statement
p.star <- tabulate(k.star, n) / nin the bootstrap loop, where
k.star
is the by now familiar resample of indices.
The analogous vector for the original sample is calculated by the statement
p.star <- rep(1 / n, n)
We now have to write a function that calculates the estimator given
the data y
and z
the resampling vector
p.star
.
Unfortunately, this is, in general, hard.
Fortunately, this is, for moments, quite straightforward.
For any function g
, any data vector x
,
and any probability vector p
, the expression
sum(g(x) * p)
p[i]
to the
point x[i]
for each i
(and probability zero to
everywhere else).
Thus
sum(x * p)calculates the mean
sum((x - a)^2 * p)calculates the second moment about the point
a
, and so forth.
rmean(p, x)
calculates the sample mean of the data
vector x
in resampling form.
The stop
commands for various error situations are, of course,
not required. If the function call is done properly they don't do anything.
But it will save you endless hours of head scratching sometime if you get
in the habit of putting error checks in the functions you write.
The function rratio(p, x, y)
calculates the ratio estimator
for data x
and y
using the rmean
function in the obvious fashion.
p.bar
accumulates the
sum of the p.star
vectors. After the bootstrap loop terminates,
it is divided by nboot
to give the average of
the p.star
vectors.
nboot
were infinity, p.bar
would be
the same as p.hat
. Since nboot
is considerably
less than infinity, p.bar
is different from p.hat
Since sd(theta.star)
is based on resamples that yielded
the p.bar
vector, it makes sense to subtract off
rratio(p.bar, y, z)
rather than theta.hat
to estimate bias.
The logic is that that the Monte Carlo errors in sd(theta.star)
and rratio(p.bar, y, z)
tend to be in the same direction
and cancel to some degree, giving an improved
estimator.
resampling formis an important bootstrap technique, which will be used again for the ABC
betterbootstrap confidence interval technique.