\documentclass{article} \usepackage{indentfirst} \usepackage{amstext} \begin{document} \title{A Bootstrap Example} \author{Charles J. Geyer} \maketitle \section{The Problem} Suppose $X_1$, $X_2$, $\ldots$ are independent and identically distributed $\text{Gam}(\alpha, \lambda)$ and we want to estimate the shape parameter $\alpha$. The usual method of moments estimator is $$ \hat{\alpha}_n = \frac{\bar{x}_n^2}{s^2_n} $$ where $\bar{x}_n$ and $s^2_n$ are the sample mean and variance, respectively. The delta method gives the asymptotic distribution $$ \hat{\alpha}_n \approx \mathcal{N}\left(\alpha, \frac{2 \alpha (1 + \alpha)}{n} \right). $$ The variance stabilizing transformation defined by $$ g(\alpha) = \sqrt{2} \cdot \sinh^{-1}(\sqrt{\alpha}), $$ makes $g(\hat{\alpha}_n)$ asymptotically standard normal. We can make some test data as follows <>= set.seed(42) alpha.true <- 2.2 n <- 100 x <- rgamma(n, shape = alpha.true) @ \section{Asymptotic Theory Intervals} A 95\% confidence interval for $\alpha$ using the delta method is given by <>= conf.level <- 0.95 zcrit <- qnorm((1 + conf.level) / 2) est <- function(x) mean(x)^2 / var(x) sdfun <- function(x) { alpha.hat <- est(x) n <- length(x) return(sqrt(2 * alpha.hat * (1 + alpha.hat) / n)) } foo <- est(x) + c(-1,1) * zcrit * sdfun(x) foo bar <- foo @ This can be improved using variance stabilization as follows <>= varstab <- function(alpha) sqrt(2) * asinh(sqrt(alpha)) varstabinv <- function(beta) sinh(beta / sqrt(2))^2 foo <- varstabinv(varstab(est(x)) + c(-1,1) * zcrit / sqrt(n)) foo bar <- rbind(bar, foo) @ \section{Bootstrap Intervals} Bootstrap intervals are presented in (I think) the order of treatment in Efron and Tibshirani. \subsection{\protect{Bootstrap-$t$} Intervals} \subsubsection{Asymptotic Variance Supplied} Using the function \verb@boott@ with the asymptotic variance of the estimator supplied as a function \verb@sdfun@ so it can bootstrap an asymptotically pivotal quantity <>= library(bootstrap) perc <- c(1 - conf.level, 1 + conf.level) / 2 sdfun <- function(x, nbootsd, theta, ...) { alpha.hat <- est(x) n <- length(x) return(sqrt(2 * alpha.hat * (1 + alpha.hat) / n)) } nboot <- 1000 out.piv <- boott(x, est, perc = perc, sdfun = sdfun, nboott = nboot) out.piv$confpoints bar <- rbind(bar, out.piv$confpoints) @ The reason for the \verb@nboott@ argument is given in the on-line help for the \verb@boott@ command, which says about this argument \begin{quotation} The number of bootstrap samples used to estimate the distribution of the bootstrap T statistic. 200 is a bare minimum and 1000 or more is needed for reliable $\alpha \%$ confidence points, $\alpha > .95$ say. \end{quotation} The reason for the odd call signature \verb@function(x, nbootsd, theta, ...)@ for \verb@sdfun@ that adds some arguments we don't use is that's what the on-line help for the \verb@boott@ command specifies. \subsubsection{Asymptotic Variance Estimated by Double Bootstrap} Just like the preceding except that when \verb@sdfun@ is omitted it is calculated by double bootstrap <>= out <- boott(x, est, perc = perc, nboott = nboot) out$confpoints bar <- rbind(bar, out$confpoints) @ \subsubsection{Variance-Stabilizing Transformation by Double Bootstrap} An alternative use of the double bootstrap is to estimate an approximate variance-stabilizing transformation <>= out.vs <- boott(x, est, perc = perc, VS = TRUE, nboott = nboot) out.vs$confpoints bar <- rbind(bar, out.vs$confpoints) @ If we want to see the approximate variance-stabilizing transformation, it's in \verb@out.vs@. The R statements <>= plot(out.vs$theta, out.vs$g, xlab = expression(alpha), ylab = expression(g(alpha))) @ plot the graph of the function. \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Approximate Variance Stabilizing Transformation Calculated by the Double Bootstrap} \label{fig:boott} \end{figure} \subsection{Percentile Intervals} So-called ``percentile intervals'' are by far the simplest to use. Their endpoints are just quantiles of the distribution of $\theta^*$ the bootstrap analog of $\hat{\theta}$. Unlike all the other bootstrap intervals described here, these are \emph{not} second order accurate. <>= out.boot <- bootstrap(x, nboot, est) foo <- quantile(out.boot$thetastar, perc) foo bar <- rbind(bar, foo) @ \subsection{Alphabet Soup Intervals} These are very complicated. See Efron and Tibshirani (1994) for explanations. \subsubsection{\protect{$\text{BC}_a$}} <>= out.bca <- bcanon(x, nboot, est, alpha = perc) out.bca$confpoints bar <- rbind(bar, out.bca$confpoints[ , "bca point"]) @ \subsubsection{ABC} This one actually doesn't bootstrap. It calculates an analytic approximation to what the bootstrap should do to be second order accurate. It requires the estimating function be written in ``resampling form'' as a function with signature \verb@function(p, x)@ where \verb@x@ is the data and \verb@p@ is a probability vector the same length as the data. The idea is that the relationship of a bootstrap sample \verb@x.star@ to the original data \verb@x@ can be expressed as a probability vector \verb@p.star@ such that \verb@p.star[i]@ is the fraction of times \verb@x[i]@ occurs in \verb@x.star@. In general this is hard, for moments it is straightforward. <>= estfun <- function(p, x) { mu <- sum(p * x) sigsq <- sum(p * (x - mu)^2) * n / (n - 1) return(mu^2 / sigsq) } out.abc <- abcnon(x, estfun, alpha = perc) out.abc$limits bar <- rbind(bar, out.abc$limits[ , "abc"]) @ \section{Summary} Having saved all this junk, we can now make a table <>= dimnames(bar)[[1]] <- c("ordinary asymptotic", "variance-stabilized asymptotic", "bootstrap-$t$ (pivoting)", "bootstrap-$t$ (default)", "bootstrap-$t$ (variance-stabilized)", "percentile", "$\\text{BC}_a$", "ABC") bar @ The reason for the funny labels is to use them in \LaTeX\ with the R \verb@xtable@ command \begin{table}[h] \begin{center} <>= library(xtable) print(xtable(bar), floating = FALSE) @ \end{center} \end{table} \end{document}