By performance we mean the actual confidence
level as opposed to the nominal or
advertised confidence level.
Unlike confidence intervals involving continuous data,
confidence intervals involving discrete data are not exact under any
assumptions. This is a simple consequence of the discreteness.
The actual confidence level must be a discontinuous function of the true
unknown parameter value.
This has the unfortunate consequence that there is (and can be) no
universally agreed upon recipe for making confidence intervals in such
situations.
The textbook considers two recipes for a one-sample confidence interval
for the population proportion. R implements two others.
We look at all four plus one more here.
phat ± z* √(phat (1 − phat) ⁄ n),
where phat is the sample proportion (usually a p with an
actual hat on it, but you can't do that on the web) and
z* is the appropriate normal critical value.
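The calculation just described can be sketched as follows. This is a
Python sketch (the page itself works in R); the function name
wald_interval and the example numbers x = 21, n = 50 are ours, chosen
only for illustration.

```python
# Traditional (Wald) interval: phat +/- z* sqrt(phat (1 - phat) / n).
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, conf_level=0.95):
    """Large-sample interval for a population proportion.

    x is the number of successes, n the sample size.
    """
    phat = x / n
    # z* is the upper (1 - conf_level)/2 normal critical value,
    # e.g. about 1.96 for a 95% interval.
    zstar = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = zstar * sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin

lo, hi = wald_interval(x=21, n=50)
print(round(lo, 4), round(hi, 4))
```

For 21 successes in 50 trials this gives an interval of roughly
(0.283, 0.557) at 95% confidence.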
These are the traditional intervals, based on math that goes back to 1750.
They are discussed in all textbooks (pp. 475 ff. in ours) and
are perfectly good for very large n, up in the thousands,
and true (unknown) success probability p not near zero or one.
Their actual performance is abysmal for small n or for
p sufficiently near zero or one (no matter how large n
is, there are values of p close enough to zero or one to make the
performance abysmal).
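Because the data are discrete, the actual confidence level can be
computed exactly: sum the binomial probabilities of the outcomes x
whose interval covers the true p. Here is a Python sketch; the
particular values n = 10 and p = 0.05 are our own illustration of the
"small n, p near zero" regime.

```python
# Actual coverage probability of the traditional (Wald) interval:
# sum P(X = x) over the outcomes x whose interval covers the true p.
from math import comb, sqrt

def wald_covers(x, n, p, zstar=1.959964):
    """Does the 95% Wald interval for x successes in n trials cover p?"""
    phat = x / n
    margin = zstar * sqrt(phat * (1 - phat) / n)
    return phat - margin <= p <= phat + margin

def actual_coverage(n, p):
    """Exact coverage probability, using the binomial distribution."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1)
               if wald_covers(x, n, p))

# For n = 10 and p = 0.05 the actual coverage is far below
# the nominal 0.95 (observing x = 0 is likely, and then the
# interval degenerates to the single point zero).
print(actual_coverage(10, 0.05))
```

Varying n and p in this sketch shows both the discontinuity of the
actual confidence level and how badly it can fall short of the
advertised level.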
Try varying the sample size n and the confidence level
conf.level.
ptwiddle ± z* √(ptwiddle (1 − ptwiddle) ⁄ (n + 4)),
where ptwiddle is (x + 2) ⁄ (n + 4),
where x is the number of successes and n is the
sample size (usually a p with an
actual tilde on it, but you can't do that on the web).
These intervals are new, proposed only in 1998 as an approximation
to the interval treated in the following section
that is suitable for hand calculation by people befuddled by the
quadratic equation.
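The recipe just described can be sketched in Python as follows (the
page itself works in R); the function name plus_four_interval and the
example numbers x = 21, n = 50 are ours.

```python
# "Add two successes and two failures" interval:
# ptwiddle = (x + 2) / (n + 4),
# ptwiddle +/- z* sqrt(ptwiddle (1 - ptwiddle) / (n + 4)).
from math import sqrt
from statistics import NormalDist

def plus_four_interval(x, n, conf_level=0.95):
    """1998 interval for a population proportion.

    x is the number of successes, n the sample size.
    """
    ptwiddle = (x + 2) / (n + 4)
    zstar = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = zstar * sqrt(ptwiddle * (1 - ptwiddle) / (n + 4))
    return ptwiddle - margin, ptwiddle + margin

lo, hi = plus_four_interval(x=21, n=50)
print(round(lo, 4), round(hi, 4))
```

For 21 successes in 50 trials this gives roughly (0.294, 0.558) at
95% confidence: slightly shifted toward 1/2 and slightly narrower
than the traditional interval, as the adjusted counts suggest.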