# Statistics 5601 (Geyer, Fall 2003) Examples: Bootstrap Bias Correction

## General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

## Bias Estimation

### Section 10.3 in Efron and Tibshirani.


Everything pretty obvious here.

• The function `ratio` calculates the estimator we are investigating.

• The bootstrap is just like other bootstraps we have done that use the `k.star` trick.

• The bootstrap estimate of bias is the mean of the `theta.star` minus `theta.hat`. This is the obvious analog in the bootstrap world of the actual bias, which is the mean of `theta.hat` minus the true unknown parameter value `theta`.
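The steps above can be sketched as follows. The data vectors and the exact form of `ratio` here are made up for illustration (the form page supplies the real data and function); only the overall pattern matches the description above.

```
# A minimal sketch of the bootstrap bias estimate for a ratio estimator.
# The data y and z and the body of ratio are assumptions, not the course data.
set.seed(42)
y <- rexp(30)
z <- rexp(30) + y

ratio <- function(y, z) mean(y) / mean(z)  # assumed form of the estimator

n <- length(y)
nboot <- 1000
theta.hat <- ratio(y, z)
theta.star <- double(nboot)
for (i in 1:nboot) {
    # the "k.star trick": resample indices, not data values
    k.star <- sample(n, n, replace = TRUE)
    theta.star[i] <- ratio(y[k.star], z[k.star])
}
bias.hat <- mean(theta.star) - theta.hat   # bootstrap estimate of bias
```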

## Improved Bias Estimation

### Section 10.3 in Efron and Tibshirani.


• The main comment is about the rather strange form of the functions `rmean` and `rratio` that calculate the ratio estimator.

These functions use what Efron and Tibshirani call the resampling vector (pp. 130-132) and the resampling form (pp. 189-190) of the estimator.

The resampling vector is the vector of weights given to the original data points in a resample X1*, . . ., Xn*. The weight pi* given to the original data point Xi is the fraction of times Xi appears in the resample. This is calculated by the statement

```
p.star <- tabulate(k.star, n) / n
```
in the bootstrap loop, where `k.star` is the by now familiar resample of indices. The analogous vector for the original sample is calculated by the statement
```
p.hat <- rep(1 / n, n)
```

We now have to write a function that calculates the estimator given the data `y` and `z` and the resampling vector `p.star`.

Unfortunately, this is, in general, hard.

Fortunately, this is, for moments, quite straightforward.

For any function `g`, any data vector `x`, and any probability vector `p`, the expression

`sum(g(x) * p)`
calculates the expectation of the random variable g(X) in the probability model that assigns probability `p[i]` to the point `x[i]` for each `i` (and probability zero to everywhere else).

Thus

```
sum(x * p)
```

calculates the mean,

```
sum((x - a)^2 * p)
```

calculates the second moment about the point `a`, and so forth.
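As a concrete check of this recipe (with a made-up data vector), note that the uniform resampling vector recovers the ordinary sample moments exactly:

```
x <- c(2, 4, 6, 8)        # illustrative data, not the course data
n <- length(x)
p.hat <- rep(1 / n, n)    # uniform resampling vector for the original sample
sum(x * p.hat)            # same as mean(x), namely 5
a <- mean(x)
sum((x - a)^2 * p.hat)    # second moment about the mean (divisor n, not n - 1)
```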

• The function `rmean(p, x)` calculates the sample mean of the data vector `x` in resampling form.

The `stop` commands for various error situations are, of course, not required. If the function call is done properly they don't do anything. But it will save you endless hours of head scratching sometime if you get in the habit of putting error checks in the functions you write.

The function `rratio(p, x, y)` calculates the ratio estimator for data `x` and `y` using the `rmean` function in the obvious fashion.
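The form page supplies its own versions of these functions; the sketch below is only a guess at their shape, written from the description above, with the recommended error checks included.

```
# A sketch of resampling-form estimators (assumed, not the course code).
rmean <- function(p, x) {
    # error checks: cost nothing when the call is correct,
    # save much head scratching when it is not
    if (length(p) != length(x)) stop("p and x have different lengths")
    if (any(p < 0) || abs(sum(p) - 1) > sqrt(.Machine$double.eps))
        stop("p is not a probability vector")
    sum(x * p)   # mean of x under the weights p
}

rratio <- function(p, x, y) {
    rmean(p, x) / rmean(p, y)   # ratio estimator in resampling form
}
```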

• In the bootstrap loop the vector `p.bar` accumulates the sum of the `p.star` vectors. After the bootstrap loop terminates, it is divided by `nboot` to give the average of the `p.star` vectors.

• Ideally, if `nboot` were infinity, `p.bar` would be the same as `p.hat`. Since `nboot` is considerably less than infinity, `p.bar` is different from `p.hat`.

Since `mean(theta.star)` is based on the resamples that yielded the `p.bar` vector, it makes sense to subtract off `rratio(p.bar, y, z)` rather than `theta.hat` to estimate bias.

The logic is that the Monte Carlo errors in `mean(theta.star)` and `rratio(p.bar, y, z)` tend to be in the same direction and cancel to some degree, giving an improved estimator.
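Putting the pieces together, the improved estimate can be sketched as follows. The data vectors and the bodies of `rmean` and `rratio` are assumptions (the form page supplies the real ones); the loop structure and the final subtraction follow the description above.

```
# A sketch of the improved bootstrap bias estimate (assumed data).
set.seed(42)
y <- rexp(30)
z <- rexp(30) + y

rmean <- function(p, x) sum(x * p)                  # resampling-form mean
rratio <- function(p, y, z) rmean(p, y) / rmean(p, z)

n <- length(y)
nboot <- 1000
p.hat <- rep(1 / n, n)
theta.hat <- rratio(p.hat, y, z)
theta.star <- double(nboot)
p.bar <- rep(0, n)
for (i in 1:nboot) {
    k.star <- sample(n, n, replace = TRUE)
    p.star <- tabulate(k.star, n) / n
    p.bar <- p.bar + p.star          # accumulate the sum of p.star vectors
    theta.star[i] <- rratio(p.star, y, z)
}
p.bar <- p.bar / nboot               # average of the p.star vectors

bias.naive <- mean(theta.star) - theta.hat              # ordinary estimate
bias.better <- mean(theta.star) - rratio(p.bar, y, z)   # improved estimate
```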

• This method of expressing estimators in resampling form is an important bootstrap technique, which will be used again for the ABC better bootstrap confidence interval technique.