Statistics 5102 (Geyer, Fall 2016) Examples: Coverage of Confidence Intervals

The course slides (Deck 2, Slide 93 and Slides 112–121) derive three different asymptotic confidence intervals for binomial data and mention an exact one (which involves heavy computing and is not derived). This web page examines the exact performance of all four by plotting coverage as a function of the parameter value.

As discussed in the course slides (Deck 2, Slide 92), the coverage probability considered as a function of the parameter can never be a constant function when the data have a discrete distribution. So even the exact confidence interval does not achieve exactly the nominal coverage for all parameter values. It guarantees only at least the nominal coverage, and so is perhaps better termed conservative-exact.

Usual Binomial Confidence Intervals

The usual confidence interval is the one shown on Slide 113, Deck 2 of the course slides. It is the one taught in all intro stats courses.

The following R code plots the coverage as a function of the parameter. The solid line is the graph of this function (the vertical sections are not, strictly speaking, part of the graph, but it is easiest to plot it this way). The dashed horizontal line is the nominal coverage level.
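A minimal sketch of such code, assuming the usual interval has the textbook form p̂ ± z √(p̂(1 − p̂) ⁄ n) with z the standard normal quantile, and borrowing n = 20 and conf.level = 0.95 from the likelihood-based section further down this page. The names phat, half, lower, upper, and cover are choices made for this sketch.

```r
n <- 20
conf.level <- 0.95
x <- 0:n                      # every possible number of successes
phat <- x / n
z <- qnorm((1 + conf.level) / 2)

# assumed form of the usual (Wald) interval: phat +/- z * standard error
half <- z * sqrt(phat * (1 - phat) / n)
lower <- phat - half
upper <- phat + half

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```

When x = 0 or x = n the estimated standard error is zero, so the interval collapses to a single point. That is why the coverage falls toward zero at both ends of the plot.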

This confidence interval is also called the Wald interval because it is a special case of a general method proposed by Abraham Wald.

Score Binomial Confidence Intervals

The score confidence interval is the one shown on Slide 116, Deck 2 of the course slides. It is the one computed by the R function prop.test (on-line help). This one is beginning to be taught in some intro stats courses.

This confidence interval is also called the Rao interval because it is a special case of a general method proposed by C. R. Rao.

The name score comes from R. A. Fisher, who gave that name to the first derivative of the log likelihood function.
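A sketch of the corresponding coverage plot for the score interval, assuming the standard closed form (often called the Wilson interval) is what the slide shows. As far as the formula goes, prop.test should reproduce these endpoints when called with correct = FALSE, since its default applies a continuity correction. The variable names and n = 20 are choices made for this sketch.

```r
n <- 20
conf.level <- 0.95
x <- 0:n
phat <- x / n
z <- qnorm((1 + conf.level) / 2)

# assumed closed form of the score (Wilson) interval
center <- (phat + z^2 / (2 * n)) / (1 + z^2 / n)
half <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
lower <- pmax(center - half, 0)   # guard against rounding just below 0
upper <- pmin(center + half, 1)   # guard against rounding just above 1

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```

Unlike the usual interval, the score interval for x = 0 has positive length, so small values of p are still covered.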

Variance Stabilized Binomial Confidence Intervals

The confidence interval using the variance stabilizing transformation is the one shown on Slide 120, Deck 2 of the course slides.
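A sketch of the coverage plot for this interval. It assumes the variance stabilizing transformation is g(p) = asin(√p), whose asymptotic variance is 1 ⁄ (4n), and that endpoints are clipped to the range where the back-transformation is monotone. The slide may use an equivalent parameterization or treat the boundary differently, so take the details (and all variable names) as assumptions.

```r
n <- 20
conf.level <- 0.95
x <- 0:n
phat <- x / n
z <- qnorm((1 + conf.level) / 2)

# assumed transformation: theta = asin(sqrt(p)), standard error 1 / (2 sqrt(n))
theta <- asin(sqrt(phat))
half <- z / (2 * sqrt(n))

# clip to [0, pi / 2], where sin(theta)^2 is increasing, then transform back
lower <- sin(pmax(theta - half, 0))^2
upper <- sin(pmin(theta + half, pi / 2))^2

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```

Under these assumptions the interval for x = 0 is short, so coverage dips well below the nominal level for p just above its upper endpoint.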

Clopper-Pearson Binomial Confidence Intervals

The Clopper-Pearson confidence interval is not derived on the course slides. It is the one computed by the R function binom.test (on-line help).
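A sketch of the coverage plot for the Clopper-Pearson interval, taking the endpoints directly from binom.test rather than deriving them. The choice n = 20 and the names endpoints, lower, upper, and cover are assumptions of this sketch.

```r
n <- 20
conf.level <- 0.95
x <- 0:n

# one row per possible x: the interval reported by binom.test
endpoints <- t(vapply(x, function(xi)
    as.numeric(binom.test(xi, n, conf.level = conf.level)$conf.int),
    numeric(2)))
lower <- endpoints[, 1]
upper <- endpoints[, 2]

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```

The plotted curve should never fall below the dashed line, which is the sense in which this interval is conservative.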

Modified Usual Binomial Confidence Intervals

Geyer (2009, Electronic Journal of Statistics 3, 259–289) proposed a simple modification that fixes the behavior of binomial confidence intervals near zero and one (and also applies to other distributions). For the binomial distribution it says that when we observe zero successes, the confidence interval should be from zero to 1 − α^{1 ⁄ n}, where coverage 1 − α is wanted and n is the sample size. And it says that when we observe all successes (n out of n), the confidence interval should be from α^{1 ⁄ n} to one.
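One way to see where these endpoints come from (a sketch of the reasoning, not a quotation from the paper): when zero successes are observed, keep every p under which that outcome has probability at least α.

```latex
\Pr\nolimits_p(X = 0) = (1 - p)^n \ge \alpha
\iff 1 - p \ge \alpha^{1/n}
\iff p \le 1 - \alpha^{1/n}
```

By the same argument, when n successes are observed, p^n ≥ α holds exactly when p ≥ α^{1 ⁄ n}. For n = 20 and α = 0.05 the upper endpoint at zero successes is about 0.139.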

In this section we use this modification when zero or n successes are observed and use the usual confidence interval (section Usual Binomial Confidence Intervals above) for other data.
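A sketch of the modified usual interval, assuming the usual interval has the textbook form p̂ ± z √(p̂(1 − p̂) ⁄ n) and overwriting only the endpoints for x = 0 and x = n as described above. The value n = 20 and all variable names are assumptions of this sketch.

```r
n <- 20
conf.level <- 0.95
alpha <- 1 - conf.level
x <- 0:n
phat <- x / n
z <- qnorm((1 + conf.level) / 2)

# assumed form of the usual (Wald) interval for all x
half <- z * sqrt(phat * (1 - phat) / n)
lower <- phat - half
upper <- phat + half

# the modification: replace the intervals for x = 0 and x = n
lower[x == 0] <- 0
upper[x == 0] <- 1 - alpha^(1 / n)
lower[x == n] <- alpha^(1 / n)
upper[x == n] <- 1

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```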

Modified Variance Stabilized Binomial Confidence Intervals

We use the same modification used in the preceding section but now apply it to the Variance Stabilized Binomial Confidence Intervals described above.
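A sketch of the modified variance stabilized interval. It assumes the transformation g(p) = asin(√p) with standard error 1 ⁄ (2√n) and clipped endpoints, which may differ in detail from the slide, and then overwrites the intervals for x = 0 and x = n. The value n = 20 and all variable names are assumptions of this sketch.

```r
n <- 20
conf.level <- 0.95
alpha <- 1 - conf.level
x <- 0:n
phat <- x / n
z <- qnorm((1 + conf.level) / 2)

# assumed variance stabilized interval for all x
theta <- asin(sqrt(phat))
half <- z / (2 * sqrt(n))
lower <- sin(pmax(theta - half, 0))^2
upper <- sin(pmin(theta + half, pi / 2))^2

# the modification: replace the intervals for x = 0 and x = n
lower[x == 0] <- 0
upper[x == 0] <- 1 - alpha^(1 / n)
lower[x == n] <- alpha^(1 / n)
upper[x == n] <- 1

# exact coverage at p: total probability of the x whose interval contains p
cover <- function(p) sum(dbinom(x, n, p) * (lower <= p & p <= upper))

p <- seq(0.001, 0.999, 0.001)
plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
abline(h = conf.level, lty = 2)
```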

Likelihood-Based Confidence Intervals

This section is way out of order. This web page was originally designed to go with deck 2 of the course notes, and did not include this section (which is not covered in deck 2). In fact, the procedure illustrated in this section is not covered at all in this course, although it almost is.

This section covers confidence intervals obtained by inverting the likelihood ratio test (deck 6, slides 32–39 of the course slides). If the likelihood ratio test compares models differing in dimension by one (so one has one more parameter than the other), the confidence interval obtained by inverting the test is what is illustrated here. The R function confint computes these intervals, at least when applied to generalized linear models. Because confint is a generic function, authors of R packages can make it do what they please for models fit by their packages.

Unfortunately, the R function confint.glm is broken when the data are x = 0 or x = n. So we have to write our own code. Annoying.

    n <- 20
    conf.level <- 0.95
    crit <- qchisq(conf.level, df = 1)

    # make endpoints once to speed up computing
    endpoints <- matrix(NA, nrow = n + 1, ncol = 2)
    for (i in 1:nrow(endpoints)) {
        x <- i - 1
        phat <- x / n
        logl <- function(p) ifelse(x == 0, n * log(1 - p),
            ifelse(x == n, n * log(p), x * log(p) + (n - x) * log(1 - p)))
        tol <- sqrt(.Machine$double.eps)
        fred <- function(p) 2 * (logl(phat) - logl(p)) - crit
        if (phat == 0) {
            low <- 0
        } else {
            low <- uniroot(fred, lower = 0, upper = phat, tol = tol)$root
        }
        if (phat == 1) {
            hig <- 1
        } else {
            hig <- uniroot(fred, lower = phat, upper = 1, tol = tol)$root
        }
        endpoints[i, 1] <- low
        endpoints[i, 2] <- hig
    }
    colnames(endpoints) <- c("lower", "upper")
    rownames(endpoints) <- 0:n
    endpoints <- as.data.frame(endpoints)
    endpoints

    cover <- function(p) {
        stopifnot(is.numeric(p))
        stopifnot(length(p) == 1)
        stopifnot(0 <= p & p <= 1)
        x <- 0:n
        inies <- as.numeric(endpoints$lower <= p & p <= endpoints$upper)
        fpx <- dbinom(x, n, p)
        sum(inies * fpx)
    }

    p <- seq(0.001, 0.999, 0.001)
    plot(p, vapply(p, cover, 0.5), type = "l", ylab = "coverage probability")
    abline(h = conf.level, lty = 2)

The likelihood ratio test is also called the Wilks test because its asymptotic distribution was first derived by S. S. Wilks. So this interval could also be called a Wilks interval.

Summary

Of the three confidence intervals with simple definitions, only the score intervals behave well for all values of the unknown parameter. But the usual intervals and the variance-stabilized intervals can be modified by patching up what they do when x = 0 or x = n is observed, as described in the sections Modified Usual Binomial Confidence Intervals and Modified Variance Stabilized Binomial Confidence Intervals above.

The Clopper-Pearson Binomial Confidence Intervals, of course, also have good performance, because they are defined to be that way (guaranteed conservative).

Similar sorts of plots and similar sorts of modifications can be cooked up for other confidence intervals, but we do not belabor the subject and leave it here.