University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)
This distribution was discovered by W. S. Gosset, the chief statistician of the Guinness brewery in Dublin, Ireland. He discovered the t-distribution in order to deal with small samples arising in statistical quality control.
The brewery had a policy against employees publishing under their own names, so he published his results about the t-distribution under the pen name "Student", and that name has become attached to the distribution.
If X-bar is the sample mean and S the sample standard deviation for a random sample of size n from a population with mean mu and standard deviation sigma, define

    T = (X-bar - mu) / (S / sqrt(n))
From comparing item 1 and item 3, it is clear that the t-distribution is close to the Normal(0, 1) distribution when n is large. Hence the difference only matters when n is small. For this reason the t-distribution is sometimes called a "small sample" distribution, but that name is misleading in two ways: the t-distribution is the exact sampling distribution of T for every sample size n, not just small n, and small sample size is not the condition required for T to have a t-distribution.
Question: What condition is required for T to have a t-distribution?

Bad Answer: Small n. (Completely irrelevant, as explained above.)
Correct Answer: The population distribution is exactly normal.
Since we can only tell the difference between the t-distribution and the normal distribution when the sample size is small, we will use a small sample size.
The R code below simulates many (nsim) random samples from a normal distribution, calculating both the z-statistic and the t-statistic for each sample. The histogram of each statistic is plotted along with the standard normal density curve (black) and the Student t-distribution density curve for the appropriate (n - 1) degrees of freedom (red). You should be able to see that the histogram for the z-statistic is closer to the normal density curve and the histogram for the t-statistic is closer to the Student t-distribution density curve.
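A minimal sketch of such a simulation follows. Only the name nsim comes from the text above; the particular values of nsim, the sample size n, and the population mean and standard deviation are assumptions chosen for illustration.

```r
set.seed(42)    # for reproducibility (not part of the original description)
nsim <- 1000    # number of simulated samples (value is an assumption)
n <- 5          # small sample size, so the difference is visible (assumption)
mu <- 10        # population mean (arbitrary)
sigma <- 2      # population standard deviation (arbitrary)

zstat <- double(nsim)
tstat <- double(nsim)
for (i in 1:nsim) {
    x <- rnorm(n, mean = mu, sd = sigma)
    zstat[i] <- (mean(x) - mu) / (sigma / sqrt(n))  # z: known sigma
    tstat[i] <- (mean(x) - mu) / (sd(x) / sqrt(n))  # t: sigma estimated by S
}

# histogram of each statistic with both theoretical density curves
par(mfrow = c(2, 1))
hist(zstat, freq = FALSE, breaks = 30, main = "z-statistic")
curve(dnorm(x), add = TRUE, col = "black")
curve(dt(x, df = n - 1), add = TRUE, col = "red")
hist(tstat, freq = FALSE, breaks = 30, main = "t-statistic")
curve(dnorm(x), add = TRUE, col = "black")
curve(dt(x, df = n - 1), add = TRUE, col = "red")
```

With n = 5 the red t(4) curve has visibly heavier tails than the black normal curve, and the histogram of tstat follows the red curve.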
One way to compare Student's t-distribution and the standard normal distribution is just to run the simulation in the preceding section with different sample sizes n.
Another way is just to plot the theoretical density curves for various t-distributions and the standard normal distribution (no simulation).
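For example, such a plot can be drawn with the curve function; the particular degrees of freedom shown here are arbitrary choices, not taken from the text.

```r
# standard normal density (black) for reference
curve(dnorm(x), from = -4, to = 4, ylab = "density", lwd = 2)
# Student t densities for a few (arbitrary) degrees of freedom
curve(dt(x, df = 2), add = TRUE, col = "red")
curve(dt(x, df = 5), add = TRUE, col = "blue")
curve(dt(x, df = 20), add = TRUE, col = "green")
```

The t densities are lower at the center and heavier in the tails, and they approach the normal curve as the degrees of freedom increase.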
Doing probability and quantile look-up for Student's t-distribution is exactly like the corresponding problems for the normal distribution, except for the following differences (duh!).

- The functions are pt and qt rather than pnorm and qnorm.
- The pt and qt functions have no mean and sd arguments.
- The pt and qt functions do have a df (degrees of freedom) argument, which must be supplied.
The function

    F(x) = pr(X <= x)

is called the cumulative distribution function (CDF) of a probability model. It gives lower-tail probabilities.
The R function that gives the CDF of Student's t-distribution is pt.
There is, of course, a different t-distribution for each different degrees of freedom, so you have to specify the degrees of freedom as well as the endpoint of the interval.
For example,

    pt(1.35, df = 6)

calculates pr(T < 1.35), where T has the Student(6) distribution (Student's t-distribution with 6 degrees of freedom).

Since

    pr(a < X < b) = F(b) - F(a)

the probability of an interval is the difference of the values of the CDF at the endpoints.
For example,

    pt(2, df = 7) - pt(-2, df = 7)

calculates pr(-2 < T < 2), where T has the Student(7) distribution (also calculated in the middle of p. 309 in Wild and Seber).

Again, as with the normal distribution, the quantile function looks up quantiles (it is also the inverse CDF function). For example,

    qt(0.05, df = 7)

calculates the 0.05 quantile (also called the 5th percentile) of the Student(7) distribution.

A so-called "critical value" of the t-distribution (or any other distribution, for that matter) is the point x such that the upper tail to the right of the point x has a specified probability, say p.
The lower tail to the left of x has, by the complement rule, probability 1 - p. So we can find the critical value by looking up the (1 - p)-th quantile.
For example, the first several values in the first line of Table 7.6.1 in Wild and Seber (p. 311) are looked up by
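A look-up of this kind can be sketched as follows; the particular upper-tail probabilities and the degrees of freedom here are assumptions for illustration, not values read from Table 7.6.1.

```r
# hypothetical upper-tail probabilities p; a table of "percentage points"
# tabulates the (1 - p)-th quantiles for each degrees of freedom
p <- c(0.10, 0.05, 0.025, 0.01)
qt(1 - p, df = 1)   # critical values for 1 degree of freedom
```

Note that qt is vectorized, so one call looks up a whole row of the table at once.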
Note: Wild and Seber don't seem to use the term "critical value". They call them "percentage points". They must think that's clear, but I don't see how you keep from confusing them with percentiles. Lots of other books call them "critical values" and we'll follow the herd on this.
Of course, there isn't a lot of difference between quantiles and critical values. By the symmetry of the t-distribution about zero, one is just the negative of the other. That is, qt(p, df) and qt(1 - p, df) have the same absolute value, just different signs.
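This symmetry is easy to check directly; the choices p = 0.05 and df = 7 below are arbitrary.

```r
qt(0.05, df = 7)   # lower 0.05 quantile, a negative number
qt(0.95, df = 7)   # upper 0.05 critical value, the same number but positive
```

So a table of critical values doubles as a table of lower-tail quantiles, with the sign flipped.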