University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)

Stat 3011 (Geyer) In-Class Examples (The Bogosity of Student's t-Distribution)

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions (that's already done for you). You do not have to select a dataset (that's already done for you).

Bogosity

From the distribution theory section of the t-distribution page:

Question: What condition is required for T to have a t-distribution?
Bad Answer: Small n. (Completely irrelevant, as explained above.)
Correct Answer: The population distribution is exactly normal.

from which it follows:

Unless the population distribution is exactly normal, the t-distribution is bogus.

Of course, for large n (large sample size), T is almost the same as Z, and both are approximately normal regardless of the population distribution. So this is a small-n issue. But as the "bad answer" above makes clear, small n is not enough:

Use of the t-distribution assumes a normal population (implicitly if not explicitly).
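The claim that T is almost the same as Z for large n can be checked by comparing critical values. The following is a minimal Python sketch, not the R code behind this page. The helper names (t_pdf, t_cdf, t_crit) are hypothetical, and the t quantile is found by numerical integration and bisection so that only the standard library is needed.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(df, level=0.95):
    """Two-sided critical value of the t-distribution, found by bisection."""
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96

# Sample sizes chosen for illustration only.
for n in (5, 10, 30, 100, 1000):
    print(f"n = {n:4d}  t = {t_crit(n - 1):.3f}  z = {z_crit:.3f}")
```

The t critical value is always larger than the z critical value, and the gap shrinks as n grows. This only shows that the two nominal intervals agree for large n. It says nothing about whether either one is valid for a non-normal population at small n.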

Bimodal Skewed Population Distribution

The simulation below draws a random sample of size n from the bimodal skewed distribution used on the CLT page and calculates the sample mean and variance. It repeats this nsim times, each time calculating the t and z confidence intervals with nominal 95% coverage.

Of course, neither interval has the nominal coverage, even approximately. The t interval needs the population to be exactly normal, which it is not. The z interval needs the sample size to be large, which it is not.

Averaging over the simulations gives an estimate of the true coverage probability of each interval.
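The procedure just described can be sketched in Python as follows. This is not the R code behind this page, and the population is an assumption: the CLT page's distribution is not reproduced here, so the sketch substitutes a hypothetical mixture of two unit-variance normals, taking the first component with probability p. The names draw, coverage, and sep are illustrative, and the numbers it prints need not match the page's simulation.

```python
import math
import random
from statistics import NormalDist

# Standard two-sided 95% t critical values, keyed by sample size n (df = n - 1).
T_CRIT_95 = {5: 2.776, 10: 2.262, 20: 2.093, 30: 2.045}
Z_CRIT_95 = NormalDist().inv_cdf(0.975)

def draw(rng, p, sep):
    """One draw from the assumed mixture: N(0, 1) w.p. p, else N(sep, 1)."""
    center = 0.0 if rng.random() < p else sep
    return rng.gauss(center, 1.0)

def coverage(n=10, p=0.1, nsim=10000, sep=4.0, seed=42):
    """Estimate the actual coverage of the nominal 95% t and z intervals."""
    rng = random.Random(seed)
    mu = (1 - p) * sep  # true population mean of the assumed mixture
    t_hits = z_hits = 0
    for _ in range(nsim):
        x = [draw(rng, p, sep) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance
        se = math.sqrt(s2 / n)
        err = abs(xbar - mu)
        # Each interval is xbar +/- crit * se; it covers mu when err <= crit * se.
        t_hits += err <= T_CRIT_95[n] * se
        z_hits += err <= Z_CRIT_95 * se
    return t_hits / nsim, z_hits / nsim

if __name__ == "__main__":
    for p in (0.5, 0.3, 0.1, 0.05):
        t_cov, z_cov = coverage(p=p)
        print(f"p = {p:.2f}  t coverage = {t_cov:.3f}  z coverage = {z_cov:.3f}")
```

Both intervals share the same center and standard error, so the t interval always contains the z interval and its estimated coverage can never be lower. Setting sep=0.0 makes the population exactly normal, in which case the t coverage should come out close to 0.95.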

What you can learn from the simulation

If you change p to various values between 0.5 (which gives a symmetric bimodal distribution) and 0.0 (the closer p is to zero, the more skewed the bimodal distribution is), the coverage gets worse and worse as p decreases. For p between 0.45 and 0.5 the t interval actually has more than its nominal coverage (and the z interval doesn't), but as the distribution gets more and more skewed, the actual coverage falls well below the nominal coverage.
If you increase the sample size n, everything gets closer to the normal distribution (both the t-distribution and the actual sampling distribution of T), and so the actual coverage of both intervals gets closer to the nominal coverage.
If you increase the simulation size nsim, this decreases the simulation variability, but nsim = 10000 should be large enough. You don't really need to mess with this.
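The remark that nsim = 10000 is large enough can be made concrete. Each simulated interval either covers the mean or does not, so the estimated coverage is a binomial proportion with standard error sqrt(c(1 - c) / nsim), where c is the true coverage. A small Python sketch of that arithmetic follows; the function name mc_standard_error is hypothetical, and c = 0.95 is used only as a representative value.

```python
import math

def mc_standard_error(coverage, nsim):
    """Standard error of a coverage estimate based on nsim independent runs."""
    return math.sqrt(coverage * (1 - coverage) / nsim)

for nsim in (100, 1000, 10000, 100000):
    print(nsim, round(mc_standard_error(0.95, nsim), 4))
```

At nsim = 10000 the standard error is about 0.002, so a coverage estimate is reliable to roughly two decimal places. Multiplying nsim by ten shrinks the standard error only by a factor of about 3.2.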