Statistics 3011 (Geyer and Jones, Spring 2006) Examples:
Performance of Confidence Intervals for Proportions

Performance
Large Sample
Plus Four
Wilson
prop.test
Clopper-Pearson

Performance

By performance we mean the actual confidence level as opposed to the nominal or advertised confidence level.

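Unlike confidence intervals involving continuous data, confidence intervals involving discrete data are not exact under any assumptions. This is a simple consequence of discreteness. The data can take only finitely many values, so there are only finitely many possible intervals, and the actual confidence level must be a discontinuous function of the true unknown parameter value: it jumps every time the parameter crosses an endpoint of one of the possible intervals.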
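A small sketch of the jump, not part of the original examples. The actual confidence level at a given p is the total binomial probability of the outcomes x whose interval contains p. The helper name `coverage` is made up for this illustration, and the intervals used are the usual large-sample ones treated in the next section (any other recipe would show the same kind of jump).

```r
# Actual coverage at true success probability p: add up the binomial
# probabilities of every outcome x whose interval contains p.
# (coverage is a hypothetical helper name, not from the course notes.)
coverage <- function(p, n, lower, upper) {
    x <- 0:n
    sum(dbinom(x, n, p)[lower <= p & p <= upper])
}

n <- 10
z <- qnorm(0.975)
phat <- (0:n) / n
lower <- phat - z * sqrt(phat * (1 - phat) / n)
upper <- phat + z * sqrt(phat * (1 - phat) / n)

# Evaluate just below and just above the upper endpoint of the interval
# for x = 1 (stored in upper[2]).  Crossing the endpoint drops the
# outcome x = 1 from the sum, so the coverage jumps down.
eps <- 1e-8
below <- coverage(upper[2] - eps, n, lower, upper)
above <- coverage(upper[2] + eps, n, lower, upper)
print(c(below = below, above = above))
```

The size of the jump is the probability of the outcome that was dropped, here dbinom(1, n, upper[2]).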

This has the unfortunate consequence that there is (and can be) no universally agreed upon recipe for making confidence intervals in such situations.

The textbook considers two recipes for a one-sample confidence interval for the population proportion. R implements two others. We look at all four plus one more here.

Large Sample

These are the intervals

phat ± z* √( phat (1 − phat) ⁄ n )

where phat is the sample proportion (usually a p with an actual hat on it, but you can't do that on the web).

These are the traditional intervals, based on math that goes back to 1750. They are discussed in all textbooks (pp. 475 ff. in ours) and are perfectly good for very large n, up in the thousands, and true (unknown) success probability p not near zero or one.

Their actual performance is abysmal for small n or for p sufficiently near zero or one (no matter how large n is, there are p small enough or large enough to make the performance abysmal).
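To see one reason for the trouble near zero and one, it may help to compute a few of these intervals by hand. The following is a minimal sketch; the function name `large_sample_ci` is an assumption made for illustration and is not in the textbook or in R.

```r
# Large-sample interval for x successes in n trials.
# (large_sample_ci is a hypothetical helper name.)
large_sample_ci <- function(x, n, conf.level = 0.95) {
    phat <- x / n
    zstar <- qnorm(1 - (1 - conf.level) / 2)
    phat + c(-1, 1) * zstar * sqrt(phat * (1 - phat) / n)
}

print(large_sample_ci(3, 10))   # an ordinary looking interval
print(large_sample_ci(1, 10))   # lower endpoint is negative
print(large_sample_ci(0, 10))   # zero-width interval at 0
```

When x is 0 (or n) the estimated standard error is zero, so the interval collapses to a single point and cannot contain any true p strictly between zero and one. When p is near zero, x = 0 is a likely outcome, and every time it happens the interval misses.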

R:

    n <- 10
    conf.level <- 0.95
    alpha <- 1 - conf.level

    # one interval for each possible number of successes 0, ..., n
    ci <- matrix(NA, n + 1, 2)
    for (i in 1:(n + 1)) {
        phat <- (i - 1) / n
        conf.int <- phat + c(-1, 1) * qnorm(1 - alpha / 2) *
            sqrt(phat * (1 - phat) / n)
        ci[i, ] <- conf.int
    }

    # grid of true success probabilities, with extra points just on
    # either side of each interval endpoint, where the coverage jumps
    eps <- 1e-8
    theta.ci <- sort(as.numeric(ci))
    theta <- seq(0, 1, length.out = 1001)
    theta <- sort(c(theta, theta.ci - eps, theta.ci + eps))
    theta <- theta[0 < theta & theta < 1]

    # actual coverage: total probability of the outcomes whose
    # interval contains theta[i]
    cover <- double(length(theta))
    for (i in seq_along(cover)) {
        inies <- ci[ , 1] <= theta[i] & theta[i] <= ci[ , 2]
        cover[i] <- sum(as.numeric(inies) * dbinom(0:n, n, theta[i]))
    }

    plot(theta, 100 * cover, xlab = "true success probability",
        ylab = "actual confidence level", type = "l", cex = 1.5,
        cex.axis = 1.5, cex.lab = 1.5, lwd = 1.5, ylim = c(0, 100))
    abline(h = 100 * (1 - alpha), lty = 2, lwd = 1.5)
    title(paste("Usual large sample (", 100 * (1 - alpha), "%, n = ", n,
        ")", sep = ""))

Try varying the sample size n and the confidence level conf.level.

Plus Four

These are the intervals

ptwiddle ± z* √( ptwiddle (1 − ptwiddle) ⁄ (n + 4) )

where ptwiddle is (x + 2) ⁄ (n + 4), x is the number of successes, and n is the sample size (ptwiddle is usually written as a p with an actual tilde on it, but you can't do that on the web).

These are new, proposed only in 1998 as an approximation to the intervals treated in the following section, one that is suitable for hand calculation by people befuddled by the quadratic equation.

They are discussed only in recent textbooks (pp. 478 ff. in ours) and are much better than the traditional intervals treated in the preceding section.
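A minimal sketch of the recipe for a single sample (the function name `plus_four_ci` is an assumption made for illustration). It just adds two successes and two failures to the data before applying the large-sample formula.

```r
# Plus four interval for x successes in n trials.
# (plus_four_ci is a hypothetical helper name.)
plus_four_ci <- function(x, n, conf.level = 0.95) {
    ptwiddle <- (x + 2) / (n + 4)
    zstar <- qnorm(1 - (1 - conf.level) / 2)
    ptwiddle + c(-1, 1) * zstar * sqrt(ptwiddle * (1 - ptwiddle) / (n + 4))
}

print(plus_four_ci(0, 10))   # positive width even with no successes
print(plus_four_ci(3, 10))
```

Because ptwiddle is never exactly zero or one, the interval never collapses to a single point, although its endpoints can still fall outside the range from zero to one.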

R:

    n <- 10
    conf.level <- 0.95
    alpha <- 1 - conf.level

    # one interval for each possible number of successes 0, ..., n
    ci <- matrix(NA, n + 1, 2)
    for (i in 1:(n + 1)) {
        ptwiddle <- ((i - 1) + 2) / (n + 4)
        conf.int <- ptwiddle + c(-1, 1) * qnorm(1 - alpha / 2) *
            sqrt(ptwiddle * (1 - ptwiddle) / (n + 4))
        ci[i, ] <- conf.int
    }

    # grid of true success probabilities (endpoints 0 and 1 dropped),
    # with extra points just on either side of each interval endpoint
    eps <- 1e-8
    theta.ci <- sort(as.numeric(ci))
    theta <- seq(0, 1, length.out = 1001)
    theta <- theta[-1]
    theta <- theta[-length(theta)]
    theta <- sort(c(theta, theta.ci - eps, theta.ci + eps))
    theta <- theta[0 <= theta & theta <= 1]

    # actual coverage at each theta[i]
    cover <- double(length(theta))
    for (i in seq_along(cover)) {
        inies <- ci[ , 1] <= theta[i] & theta[i] <= ci[ , 2]
        cover[i] <- sum(as.numeric(inies) * dbinom(0:n, n, theta[i]))
    }

    plot(theta, 100 * cover, xlab = "true success probability",
        ylab = "actual confidence level", type = "l", cex = 1.5,
        cex.axis = 1.5, cex.lab = 1.5, lwd = 1.5)
    abline(h = 100 * (1 - alpha), lty = 2, lwd = 1.5)
    title(paste("Plus Four (", 100 * (1 - alpha), "%, n = ", n, ")",
        sep = ""))

Wilson

These are the intervals consisting of all p that satisfy the inequality

|phat − p| ⁄ √( p (1 − p) ⁄ n ) < z*

where phat is the sample proportion (usually a p with an actual hat on it, but you can't do that on the web).

These are new (?), only proposed in 1927. Finding the p satisfying the inequality above involves solving a quadratic equation.
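Here is a sketch of the quadratic, under the assumption that we square both sides of the inequality and solve for equality. That gives (1 + z*² ⁄ n) p² − (2 phat + z*² ⁄ n) p + phat² = 0, and the two roots are the endpoints of the interval. The function name `wilson_ci` is made up for this illustration; the comparison with prop.test uses correct = FALSE, the same call used in the performance code for this section.

```r
# Wilson interval from the quadratic formula.
# (wilson_ci is a hypothetical helper name.)
wilson_ci <- function(x, n, conf.level = 0.95) {
    phat <- x / n
    z <- qnorm(1 - (1 - conf.level) / 2)
    # coefficients of a p^2 + b p + c0 = 0
    a <- 1 + z^2 / n
    b <- -(2 * phat + z^2 / n)
    c0 <- phat^2
    disc <- sqrt(b^2 - 4 * a * c0)
    (-b + c(-1, 1) * disc) / (2 * a)
}

print(wilson_ci(3, 10))
print(as.numeric(prop.test(3, 10, correct = FALSE)$conf.int))
```

The two printed intervals should agree up to rounding error.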

R:

    n <- 10
    conf.level <- 0.95
    alpha <- 1 - conf.level

    # one interval for each possible number of successes 0, ..., n
    # (correct = FALSE turns off the continuity correction)
    ci <- matrix(NA, n + 1, 2)
    for (i in 1:(n + 1)) {
        ci[i, ] <- prop.test(i - 1, n, conf.level = conf.level,
            correct = FALSE)$conf.int
    }

    # grid of true success probabilities, with extra points just on
    # either side of each interval endpoint
    eps <- 1e-8
    theta.ci <- sort(as.numeric(ci))
    theta <- seq(0, 1, length.out = 1001)
    theta <- sort(c(theta, theta.ci - eps, theta.ci + eps))
    theta <- theta[0 <= theta & theta <= 1]

    # actual coverage at each theta[i]
    cover <- double(length(theta))
    for (i in seq_along(cover)) {
        inies <- ci[ , 1] <= theta[i] & theta[i] <= ci[ , 2]
        cover[i] <- sum(as.numeric(inies) * dbinom(0:n, n, theta[i]))
    }

    plot(theta, 100 * cover, xlab = "true success probability",
        ylab = "actual confidence level", type = "l", cex = 1.5,
        cex.axis = 1.5, cex.lab = 1.5, lwd = 1.5)
    abline(h = 100 * (1 - alpha), lty = 2, lwd = 1.5)
    title(paste("Wilson (", 100 * (1 - alpha), "%, n = ", n, ")",
        sep = ""))

`prop.test`

These intervals, calculated by the R function prop.test (on-line help), are like the intervals treated in the preceding section except that they add a continuity correction.
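A quick side-by-side look at what the correction does for a single sample (a sketch; the variable names `plain` and `corrected` are just for this illustration). The continuity correction is the default in prop.test, and correct = FALSE turns it off.

```r
x <- 3
n <- 10

# without and with the continuity correction
plain <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)
corrected <- as.numeric(prop.test(x, n)$conf.int)   # correct = TRUE is the default

print(rbind(plain, corrected))

# how much wider the corrected interval is
print(diff(corrected) - diff(plain))
```

For this sample the corrected interval extends farther in both directions, so it covers at least as often as the uncorrected one.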

R:

    n <- 10
    conf.level <- 0.95
    alpha <- 1 - conf.level

    # one interval for each possible number of successes 0, ..., n
    # (continuity correction is the default in prop.test)
    ci <- matrix(NA, n + 1, 2)
    for (i in 1:(n + 1)) {
        ci[i, ] <- prop.test(i - 1, n, conf.level = conf.level)$conf.int
    }

    # grid of true success probabilities, with extra points just on
    # either side of each interval endpoint
    eps <- 1e-8
    theta.ci <- sort(as.numeric(ci))
    theta <- seq(0, 1, length.out = 1001)
    theta <- sort(c(theta, theta.ci - eps, theta.ci + eps))
    theta <- theta[0 <= theta & theta <= 1]

    # actual coverage at each theta[i]
    cover <- double(length(theta))
    for (i in seq_along(cover)) {
        inies <- ci[ , 1] <= theta[i] & theta[i] <= ci[ , 2]
        cover[i] <- sum(as.numeric(inies) * dbinom(0:n, n, theta[i]))
    }

    plot(theta, 100 * cover, xlab = "true success probability",
        ylab = "actual confidence level", type = "l", cex = 1.5,
        cex.axis = 1.5, cex.lab = 1.5, lwd = 1.5)
    abline(h = 100 * (1 - alpha), lty = 2, lwd = 1.5)
    title(paste("prop.test (", 100 * (1 - alpha), "%, n = ", n, ")",
        sep = ""))

Clopper-Pearson

These intervals, unlike the four preceding ones, are conservative. Their actual confidence level is always at least their nominal confidence level.

They are calculated by the R function binom.test (on-line help).
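For the curious, the endpoints can also be written as quantiles of beta distributions. The sketch below assumes the standard beta-quantile form of the Clopper-Pearson interval, which is not given in the text above; the function name `clopper_pearson_ci` is made up for this illustration, and the result is compared with binom.test.

```r
# Clopper-Pearson interval from beta quantiles.
# (clopper_pearson_ci is a hypothetical helper name; the beta-quantile
# form is an assumption not stated in the course notes.)
clopper_pearson_ci <- function(x, n, conf.level = 0.95) {
    alpha <- 1 - conf.level
    lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
    upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
    c(lower, upper)
}

print(clopper_pearson_ci(3, 10))
print(as.numeric(binom.test(3, 10)$conf.int))
```

The two printed intervals should agree up to rounding error.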

R:

    n <- 10
    conf.level <- 0.95
    alpha <- 1 - conf.level

    # one interval for each possible number of successes 0, ..., n
    ci <- matrix(NA, n + 1, 2)
    for (i in 1:(n + 1)) {
        ci[i, ] <- binom.test(i - 1, n, conf.level = conf.level)$conf.int
    }

    # grid of true success probabilities, with extra points just on
    # either side of each interval endpoint
    eps <- 1e-8
    theta.ci <- sort(as.numeric(ci))
    theta <- seq(0, 1, length.out = 1001)
    theta <- sort(c(theta, theta.ci - eps, theta.ci + eps))
    theta <- theta[0 <= theta & theta <= 1]

    # actual coverage at each theta[i]
    cover <- double(length(theta))
    for (i in seq_along(cover)) {
        inies <- ci[ , 1] <= theta[i] & theta[i] <= ci[ , 2]
        cover[i] <- sum(as.numeric(inies) * dbinom(0:n, n, theta[i]))
    }

    # vertical axis starts at the nominal level, since the coverage
    # never drops below it
    plot(theta, 100 * cover, xlab = "true success probability",
        ylab = "actual confidence level", type = "l", cex = 1.5,
        cex.axis = 1.5, cex.lab = 1.5, lwd = 1.5,
        ylim = c(100 * (1 - alpha), 100))
    abline(h = 100 * (1 - alpha), lty = 2, lwd = 1.5)
    title(paste("Clopper-Pearson (", 100 * (1 - alpha), "%, n = ", n, ")",
        sep = ""))
