Independent and Identically Distributed Gamma

This web page gives a few examples of the central limit theorem (CLT) in action.

We know the sum of gamma random variables is gamma. The CLT says when the number of terms in the sum is large, this gamma distribution should be approximately normal.
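In symbols: writing the gamma distribution in the shape-rate parameterization, if X1, …, Xn are IID Gamma(α, λ), then

\[
X_1 + \cdots + X_n \sim \operatorname{Gamma}(n\alpha, \lambda),
\]

a distribution with mean nα / λ and variance nα / λ², so the relevant normal approximation is the normal distribution with that same mean and variance.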

But how large the number of terms in the sum has to be depends on the actual gamma distribution chosen for the terms.

Change the assignments of alpha and n in the first two lines to experiment.
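Here is one way this experiment could look in R (a sketch only; the simulation size nsim and the choice of rate 1 are illustrative, with alpha and n assigned in the first two lines):

```r
alpha <- 0.5   # shape parameter of the individual gamma terms
n <- 10        # number of terms in the sum

nsim <- 1e4    # number of simulated sums (illustrative choice)
sums <- replicate(nsim, sum(rgamma(n, shape = alpha)))   # rate fixed at 1

hist(sums, freq = FALSE, breaks = 50,
    main = "sum of n IID Gamma(alpha) random variables")
curve(dgamma(x, shape = n * alpha), add = TRUE)          # exact: Gamma(n * alpha)
curve(dnorm(x, mean = n * alpha, sd = sqrt(n * alpha)),
    add = TRUE, lty = 2)                                 # CLT normal approximation
```

Overlaying both the exact Gamma(n alpha) density and the normal density with matching mean and variance makes the quality of the approximation directly visible; with these starting values the right skew of the exact density is still apparent.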

The larger alpha is, the less skewed the distribution of the individual terms is, and the smaller n has to be to get a good normal approximation. You should be able to see this for yourself if you experiment.

Independent and Identically Distributed Bernoulli

We know the sum of Bernoulli random variables is binomial. The CLT says when the number of terms in the sum is large, this binomial distribution should be approximately normal. This shows there is no problem with discrete distributions converging in distribution to the (continuous) normal distribution.

But how large the number of terms in the sum (the binomial sample size) has to be depends on the actual Bernoulli distribution chosen for the terms.

There is a rule of thumb taught in many intro statistics courses that the normal approximation is OK if n p ≥ 5 and n (1 − p) ≥ 5. But, of course, this is, strictly speaking, nonsense. There is no sharp borderline between bad and good normal approximation. It all depends on what probabilities you are calculating and how much accuracy you want.

Change the assignments of p and n in the first two lines to experiment.
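Here is one way this experiment could look in R (a sketch only; the exact binomial probabilities are available, so no simulation is needed, and p and n are assigned in the first two lines):

```r
p <- 0.1   # Bernoulli success probability
n <- 20    # number of terms in the sum (the binomial sample size)

# exact Binomial(n, p) probabilities for the sum
k <- 0:n
plot(k, dbinom(k, size = n, prob = p), type = "h",
    xlab = "sum of n IID Bernoulli(p) random variables", ylab = "probability")
# CLT normal approximation with matching mean and variance
curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))), add = TRUE, lty = 2)
```

With these starting values n p = 2, so the rule of thumb quoted above is not satisfied and the skewness of the binomial is plainly visible; try p closer to one-half or a larger n.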

The closer p is to one-half, the less skewed the distribution of the individual terms is, and the smaller n has to be to get a good normal approximation. You should be able to see this for yourself if you experiment.

Independent and Identically Distributed Bernoulli Mixture of Normal

In this example, we use a bimodal distribution for the individual terms. Since there are no brand-name bimodal distributions, we make one up: the distribution of X + Y when X has the Bernoulli distribution with success probability p and Y has the normal distribution with mean zero and variance σ².

The distribution of the sum of n IID random variables having this distribution is the distribution of X + Y when X has the Binomial(n, p) distribution and Y has the Normal(0, n σ²) distribution.

Change the assignments of p, sigma, and n in the first three lines to experiment.
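Here is one way this experiment could look in R (a sketch only; the helper function dsum, which computes the exact density of the sum by summing over the binomial outcomes, is a name introduced here for illustration, and p, sigma, and n are assigned in the first three lines):

```r
p <- 0.5      # Bernoulli success probability
sigma <- 0.2  # standard deviation of the normal part of each term
n <- 10       # number of terms in the sum

# exact density of the sum: Binomial(n, p) plus an independent Normal(0, n * sigma^2)
dsum <- function(s)
    sapply(s, function(s1)
        sum(dbinom(0:n, size = n, prob = p) * dnorm(s1, mean = 0:n, sd = sqrt(n) * sigma)))

curve(dsum(x), from = -1, to = n + 1, ylab = "density",
    main = "sum of n IID Bernoulli-plus-normal random variables")
# CLT normal approximation: each term has mean p and variance p * (1 - p) + sigma^2
curve(dnorm(x, mean = n * p, sd = sqrt(n * (p * (1 - p) + sigma^2))), add = TRUE, lty = 2)
```

Try making sigma smaller: each term becomes more sharply bimodal, and when sqrt(n) * sigma is small the exact density of the sum shows bumps near the integers that the normal approximation ignores.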

The closer p is to one-half, the less skewed the distribution of the individual terms is, and the smaller n has to be to get a good normal approximation. You should be able to see this for yourself if you experiment.

The wiggliness that a small sigma puts into the exact density of the sum is less important for the quality of the normal approximation than the skewness of the individual terms.

Skewness

The Berry-Esseen Theorem says the rate of convergence in the central limit theorem is controlled by skewness (more precisely, by the third absolute moment of the individual terms). Every other aspect of the distribution of the individual terms is less important. This is why the very skewed examples above have the worst normal approximation.
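For reference, one standard statement of the theorem: if X1, …, Xn are IID with mean μ, variance σ², and finite third absolute moment ρ = E|X1 − μ|³, and Fn is the CDF of the standardized sum, then

\[
\sup_x \bigl| F_n(x) - \Phi(x) \bigr| \le \frac{C \rho}{\sigma^{3} \sqrt{n}} ,
\]

where Φ is the standard normal CDF and C is a universal constant. The bound shrinks at rate 1 / √n, and it is large when ρ is large relative to σ³, as it is for strongly skewed distributions.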