Chapter 5 t-based Procedures
The standard methods for making inference about the mean of a population are based on the t-distribution, giving us t-tests and t-confidence intervals. There are also t-tests and t-confidence intervals for the difference of the means of two populations. All of these procedures are based on the assumption that the population, or populations, follow a normal distribution. One version of the two-sample procedures also assumes that the variances of the two populations are the same. Fortunately, these procedures also work pretty well for some non-normally distributed populations, with the accuracy of the results being better for larger sample sizes and/or populations that are not too non-normal (the more non-normal the population is, the larger the sample size needs to be).
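As a quick illustration of this robustness (a sketch with simulated data, not part of the chapter), we can estimate the coverage of the 95% t-interval when sampling from a heavily skewed exponential population:

```r
# Sketch: estimated coverage of the 95% t-interval when sampling
# from a skewed (exponential) population whose true mean is 1.
set.seed(42)
coverage <- function(n, reps = 2000) {
  mean(replicate(reps, {
    x <- rexp(n)               # heavily right-skewed, true mean 1
    ci <- t.test(x)$conf.int
    ci[1] < 1 && 1 < ci[2]     # does the interval cover the true mean?
  }))
}
coverage(10)    # somewhat below the nominal .95 for small samples
coverage(100)   # close to .95 once the sample is larger
```

Even for a population this skewed, the interval's actual coverage approaches the nominal 95% as the sample size grows.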
All of these are addressed using the R function t.test().
5.1 One-sample procedures
Consider the cfcdae::Thermocouples
data set, which contains the difference in temperature at two places in an oven over 64
consecutive time periods.
One concern might be whether the temperature in the oven is consistent. If it is, the mean difference would be zero, and we might want to test whether the data are consistent with the null hypothesis of a zero mean. A second approach might be to assume that there is non-uniformity in the temperatures and, rather than testing, estimate the difference with a margin of error. These lead us to the t-test and the t-confidence interval.
> temp.diff <- cfcdae::Thermocouples$temp.diff
> hist(temp.diff,main="Temperature Differences")

Figure 5.1: Histogram of Temperature Differences
A normal probability plot, made with the qqnorm() function, lets us check the normality assumption. The plot should look roughly like a straight line if the data follow the normal distribution. The griddiness in the plot reflects that the values have been rounded to the nearest hundredth, and the lowest three values are a bit more extreme than what we would expect from normally distributed data.
> qqnorm(temp.diff,main="NPP of Differences")

Figure 5.2: Normal Probability Plot of Temperature Differences
The t.test() function yields both the test and the confidence interval. The basic usage assumes a null mean of zero and a 95% confidence interval.
> t.test(temp.diff)
One Sample t-test
data: temp.diff
t = 1001.1, df = 63, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
3.134979 3.147521
sample estimates:
mean of x
3.14125
The t-statistic is over 1000, and the p-value is infinitesimal! Thus there is extremely strong evidence against the null hypothesis of 0 mean (this should be no surprise given that the data ranged from 3.05 to 3.20, with no data close to, or on either side of, 0). The confidence interval is from 3.135 to 3.148.
You can change the mean value in the null hypothesis as well as the coverage of the interval. Suppose that we want more coverage and that a great deal of historical data indicates that the mean difference should be 3.125; are these data consistent with the historical value?
> t.test(temp.diff,mu=3.125,conf.level=.999)
One Sample t-test
data: temp.diff
t = 5.1787, df = 63, p-value = 2.493e-06
alternative hypothesis: true mean is not equal to 3.125
99.9 percent confidence interval:
3.130419 3.152081
sample estimates:
mean of x
3.14125
The p-value is less than \(10^{-5}\), which is pretty strong evidence that the mean of these data differs from the historical value.
5.2 Two-sample procedures
Two-sample procedures refer to making inference about two populations using samples from the two populations. Typically, we are making inference about the group means: Is there evidence that they are not equal? Is there evidence one is greater? What is a range of values for the difference of means that is consistent with the data?
Consider the data on breaking strength for notched and unnotched
boards, data set cfcdae::NotchedBoards
. We would like to
investigate the null hypothesis that unnotched boards of thickness .625 inch
have the same strength as boards of thickness .75 inch with a 1 inch
wide notch cut in the center to thickness .625 inch.
Use the data() function to load the NotchedBoards data from cfcdae into your current workspace.
> data(NotchedBoards) # creates variable NotchedBoards
> unnotched <- NotchedBoards$strength[NotchedBoards$shape == "uniform"]
> notched <- NotchedBoards$strength[NotchedBoards$shape == "notched"]
> unnotched
[1] 243 229 305 395 210 311 289 269 282 399 222 331 369
> notched
[1] 215 202 273 292 253 247 350 246 352 398 267 331 342
> # boxplot(notched, unnotched) # most obvious version
> # boxplot(list(Notched=notched, Unnotched=unnotched)) # provides better labels
> boxplot(strength ~ shape, data=NotchedBoards) # formula version

Figure 5.3: Boxplots of board strengths
You can simply give several data vectors as arguments to boxplot(), you can (and should) give them better labels, or you can use a formula of the form response ~ groupings. The formula version is easiest if you have many groups or your data come in a data frame.
The two-sample t-test is the typical method used to do tests regarding
the means of two groups. In R, this is the t.test(x,y)
function. This command does a two-sample t-test
between the sets of data in x
and y
.
The confidence interval it generates is for the mean of x
minus the mean of y
.
There is also a “formula” version of t.test()
. The formula takes
the form of response ~ predictor
, where in our case the
predictor is a factor (grouping variable) with two levels. You get the same
results, and it’s a little less fuss if your data come from a data frame.
> t.test(unnotched, notched)
Welch Two Sample t-test
data: unnotched and notched
t = 0.27353, df = 23.911, p-value = 0.7868
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-43.31049 56.54126
sample estimates:
mean of x mean of y
296.4615 289.8462
> t.test(strength ~ shape, data=NotchedBoards)
Welch Two Sample t-test
data: strength by shape
t = -0.27353, df = 23.911, p-value = 0.7868
alternative hypothesis: true difference in means between group notched and group uniform is not equal to 0
95 percent confidence interval:
-56.54126 43.31049
sample estimates:
mean in group notched mean in group uniform
289.8462 296.4615
Note that by default R uses an unpooled estimate of variance (the
Welch version with fractional degrees of freedom), a two-sided
alternative, and produces a confidence interval
with 95% coverage. You can also get a pooled estimate of variance and/or upper or
lower alternatives (i.e., x
has greater mean or lesser mean)
and/or change the confidence level
by using the appropriate optional arguments.
The unpooled (Welch) version is generally the better option in typical two-sample use, because it is almost as good as the pooled version when the population variances are equal and much better when the population variances differ. However, Analysis of Variance, which generalizes the t-test to multiple groups and to more complicated settings, is a generalization of the pooled version.
Here we do the test with the option that forces the group variances to be treated as equal and a 99.5% confidence level. Then we jump ahead to an Analysis of Variance approach just to show that for two groups its p-value agrees with the equal-variances t-test (the F value is the square of the t). The p-values in both cases are large, providing no evidence against the null of equal means.
> # pooled is nearly identical to unpooled for these data
> t.test(unnotched, notched, var.equal=TRUE, conf.level=.995)
Two Sample t-test
data: unnotched and notched
t = 0.27353, df = 24, p-value = 0.7868
alternative hypothesis: true difference in means is not equal to 0
99.5 percent confidence interval:
-68.12963 81.36040
sample estimates:
mean of x mean of y
296.4615 289.8462
> anova(lm(strength~shape, data=NotchedBoards)) # preview
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
shape 1 284 284.5 0.0748 0.7868
Residuals 24 91249 3802.0
> t.test(unnotched, notched, alternative="greater")
Welch Two Sample t-test
data: unnotched and notched
t = 0.27353, df = 23.911, p-value = 0.3934
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-34.76902 Inf
sample estimates:
mean of x mean of y
296.4615 289.8462
5.3 Paired t-procedures
Sometimes we get two measurements on a subject under different circumstances, or perhaps we treat every experimental unit with two different treatments and get the responses for the two treatments for each unit. These data are paired, because we expect the two responses for the same subject or unit to both be high or both be low. They are correlated with each other because they share some aspect of the subject or unit that causes the responses to be similar.
When working with paired data you must take that correlation into account. (If you treat the data as unpaired, you get a very inefficient analysis. For example, your confidence intervals will typically be much wider than necessary.) Generally speaking, this can be done by taking the difference between the two measurements for each subject or unit and then doing your analysis on the differences. Because the mean of the differences is the same as the difference of the means, you get inference about the same thing: the difference of means under the two conditions.
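The gain from pairing can be seen in a small simulation (a sketch with made-up data, not the chapter's data set): when the two responses share a subject effect, the interval from the paired analysis is much narrower than the unpaired one.

```r
# Sketch: paired vs. unpaired intervals when responses share a subject effect.
set.seed(1)
subject <- rnorm(30)                 # effect shared by both measurements
x <- subject + rnorm(30, sd = 0.5)   # response under condition 1
y <- subject + rnorm(30, sd = 0.5)   # response under condition 2
unpaired <- t.test(x, y)$conf.int            # ignores the pairing
paired   <- t.test(x, y, paired = TRUE)$conf.int
diff(unpaired)                       # wide interval
diff(paired)                         # much narrower interval
```

The shared subject effect cancels when we take differences, which is exactly why the paired analysis is more efficient.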
We will demonstrate using the data cfcdae::RunStitch
. Each worker run-stitches collars using two different
setups: the standard setup and an ergonomic setup. The two
runs are made in random order for each worker, and the interest
is in any difference in average speed between the two setups.
> data(RunStitch)
> head(RunStitch)
Standard Ergonomic
1 4.90 3.87
2 4.50 4.54
3 4.86 4.60
4 5.57 5.27
5 4.62 5.59
6 4.65 4.61
> differences <- RunStitch[,"Standard"] - RunStitch[,"Ergonomic"]
> differences
[1] 1.03 -0.04 0.26 0.30 -0.97 0.04 -0.57 1.75 0.01 0.42 0.45 -0.80
[13] 0.39 0.25 0.18 0.95 -0.18 0.71 0.42 0.43 -0.48 -1.08 -0.57 1.10
[25] 0.27 -0.45 0.62 0.21 -0.21 0.82
> summary(differences)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.0800 -0.2025 0.2550 0.1753 0.4450 1.7500
> hist(differences, freq=FALSE)

Figure 5.4: Histogram of runstitch differences
> t.test(differences)
One Sample t-test
data: differences
t = 1.49, df = 29, p-value = 0.147
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.06532811 0.41599478
sample estimates:
mean of x
0.1753333
Should this be a one- or two-sided alternative? That is a good question without an obvious answer. I could hypothesize that the employees will be faster with the standard setup because they are used to it, or I could hypothesize that they will be faster with the ergonomic setup, because it’s just better. I don’t know which is reasonable, so it is probably best to use a two-sided alternative that will check for either.
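One useful fact when weighing the choice (illustrated here with a few made-up values, not the RunStitch data): when the sample mean falls on the side that the one-sided alternative specifies, the one-sided p-value is exactly half of the two-sided p-value.

```r
# Sketch: one-sided p-value is half the two-sided p-value when the
# estimate lies on the alternative's side of the null value.
x <- c(1.03, -0.04, 0.26, 0.30, 0.45)   # hypothetical small sample, mean > 0
two.sided <- t.test(x)$p.value
one.sided <- t.test(x, alternative = "greater")$p.value
all.equal(one.sided, two.sided / 2)     # TRUE
```

So choosing the one-sided alternative after looking at the data effectively halves the p-value, which is why the direction should be chosen before the data are seen.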
You
can also do the paired test using the two sets of responses
as for a two-sample setting but using the
paired=TRUE
argument. Results are the same as above.
> t.test(RunStitch$Standard, RunStitch$Ergonomic, paired=TRUE)
Paired t-test
data: RunStitch$Standard and RunStitch$Ergonomic
t = 1.49, df = 29, p-value = 0.147
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-0.06532811 0.41599478
sample estimates:
mean difference
0.1753333
As in the one-sample case, you can also test against a nonzero null mean; here we test whether the mean difference is 0.5.
> t.test(differences, mu=.5)
One Sample t-test
data: differences
t = -2.7591, df = 29, p-value = 0.009934
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
-0.06532811 0.41599478
sample estimates:
mean of x
0.1753333