Chapter 22 Standard Analysis of Linear Contrasts
Linear contrasts are extremely important in the analysis of experimental data, sufficiently important that they have been implemented in several packages. Here we will investigate:
- linear.contrast() from the cfcdae package
- contrast() from the emmeans package
- glht() from the multcomp package
Note that several other R packages implement contrasts in various ways.
22.1 Preliminaries
All of the functions to do contrasts work on a linear model with one or more factor variables as predictors. So let's set up a couple of model fits using the rat liver weight data and the resin lifetime data.
> library(cfcdae)
> data(RatLiverWeight)
> fit.RLW <- lm(weight~diet,data=RatLiverWeight)
> coef(fit.RLW)
(Intercept) diet1 diet2 diet3
3.71163690 0.03407738 -0.13163690 -0.11330357
> data(ResinLifetimes)
> fit.RL <- lm(logTime ~ temp, data=ResinLifetimes)
> coef(fit.RL)
(Intercept) temp1 temp2 temp3 temp4
1.43794048 0.49455952 0.19080952 -0.06044048 -0.24365476
We will investigate the following contrasts for the rat liver weights:
- (1/3,1/3,1/3,-1) This compares the average response of the first three treatments (manufacturer 1) to the average response of the fourth treatment (manufacturer 2).
- (1,-.5,-.5,0) Within manufacturer 1, this compares the average response of the standard rations to the response of the premium ration.
- (1,0,0,-1), (0,1,0,-1), (0,0,1,-1) These individually compare each ration from manufacturer 1 to the ration from manufacturer 2.
We will investigate the following contrasts for the resin lifetime data:
- (-1,1,0,0,0) Compare the lifetimes for the first two levels of temperature. This is an example of a pairwise comparison, which simply compares two treatments.
- (2,-1,-2,-1,2) If there is a quadratic effect of temperature, this contrast should be non-zero.
- (1,-4,6,-4,1) If there is a quartic effect of temperature, this contrast should be non-zero.
Note that the quadratic and quartic contrast coefficients would yield sums of squares equal to those of the corresponding orthogonal polynomial contrasts if the treatment levels were equally spaced and all the \(n_i\) were equal. Those conditions are not met here, so these contrasts are only approximations of the true quadratic and quartic contrasts.
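For equally spaced levels, base R can generate exact orthogonal polynomial contrast coefficients with contr.poly(); the quadratic and quartic columns are simply the vectors above rescaled to unit length. A quick check in base R:

```r
# contr.poly(g) gives orthonormal polynomial contrast coefficients for g
# equally spaced levels; columns are linear (.L), quadratic (.Q), cubic (.C),
# quartic (^4), and so on.
cp <- contr.poly(5)
# The quadratic and quartic columns are our contrast vectors rescaled to
# unit length:
stopifnot(all.equal(cp[, ".Q"], c(2, -1, -2, -1, 2) / sqrt(14)))
stopifnot(all.equal(cp[, "^4"], c(1, -4, 6, -4, 1) / sqrt(70)))
```

Because a contrast's t-value is unchanged by rescaling its coefficients, using (2,-1,-2,-1,2) or the unit-length version gives the same test.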
22.2 Using linear.contrast()
linear.contrast() (from package cfcdae) computes linear contrasts of treatment effects. Its first argument is the linear model object, the second is the variable (factor) for which you want to make a contrast (you must specify it even if there is only one factor in the model), and the third is the contrast coefficients. There must be one coefficient for every level of the factor, and the coefficients must add to 0. The value of linear.contrast() is an estimate of the contrast, its standard error, t-value, p-value for testing the null hypothesis that the contrast is zero against a two-sided alternative, and a confidence interval.
> linear.contrast(fit.RLW,diet,c(1/3,1/3,1/3,-1))
estimates se t-value p-value lower-ci upper-ci
1 -0.2811508 0.08467399 -3.320392 0.00276231 -0.4555401 -0.1067615
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(1,-.5,-.5,0))
estimates se t-value p-value lower-ci upper-ci
1 0.1565476 0.09448756 1.656807 0.1100575 -0.03805315 0.3511484
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(1,0,0,-1))
estimates se t-value p-value lower-ci upper-ci
1 -0.1767857 0.1052754 -1.679269 0.1055566 -0.3936044 0.04003301
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(0,1,0,-1))
estimates se t-value p-value lower-ci upper-ci
1 -0.3425 0.1017057 -3.36756 0.002457339 -0.5519668 -0.1330332
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,c(0,0,1,-1))
estimates se t-value p-value lower-ci upper-ci
1 -0.3241667 0.1098547 -2.950867 0.00679258 -0.5504167 -0.09791667
attr(,"class")
[1] "linear.contrast"
You can compute several contrasts at once with linear.contrast() if you put the coefficients in a matrix, with each column of the matrix corresponding to a contrast. Here cbind() joins vectors (as columns) into a matrix.
> mat.cfs <- cbind(c(1,0,0,-1), c(0,1,0,-1), c(0,0,1,-1))
> mat.cfs
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
[4,] -1 -1 -1
> linear.contrast(fit.RLW,diet,mat.cfs)
estimates se t-value p-value lower-ci upper-ci
1 -0.1767857 0.1052754 -1.679269 0.105556638 -0.3936044 0.04003301
2 -0.3425000 0.1017057 -3.367560 0.002457339 -0.5519668 -0.13303321
3 -0.3241667 0.1098547 -2.950867 0.006792580 -0.5504167 -0.09791667
attr(,"class")
[1] "linear.contrast"
You can also use linear.contrast() to test the joint null hypothesis that all of the contrasts are simultaneously zero against the alternative that one or more of them is non-zero by using the optional joint= argument. For the set of contrasts in mat.cfs, all of the contrasts being zero implies all the means are the same (single mean model); allowing them to be non-zero implies that the means can all be different (separate means model). Thus testing these three contrasts jointly is equivalent to comparing the single mean and separate means models.
> linear.contrast(fit.RLW,diet,mat.cfs,joint=TRUE)
$estimates
estimates se t-value p-value lower-ci upper-ci
1 -0.1767857 0.1052754 -1.679269 0.105556638 -0.3936044 0.04003301
2 -0.3425000 0.1017057 -3.367560 0.002457339 -0.5519668 -0.13303321
3 -0.3241667 0.1098547 -2.950867 0.006792580 -0.5504167 -0.09791667
$Ftest
F df1 df2 p-value
4.658146 3 25 0.01015794
attr(,"class")
[1] "linear.contrast"
> anova(fit.RLW)
Analysis of Variance Table
Response: weight
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 0.57821 0.192736 4.6581 0.01016 *
Residuals 25 1.03440 0.041376
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If some contrasts in the matrix are linear combinations of the others, the joint test should only use the degrees of freedom for the linearly independent contrasts; linear.contrast() will figure that out for any joint test. Here the last contrast is the average of the first three, so the joint F-test is unchanged.
> mat2.cfs <- cbind(mat.cfs,c(1/3,1/3,1/3,-1))
> mat2.cfs
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0.3333333
[2,] 0 1 0 0.3333333
[3,] 0 0 1 0.3333333
[4,] -1 -1 -1 -1.0000000
> linear.contrast(fit.RLW,diet,mat2.cfs,joint=TRUE)$Ftest
F df1 df2 p-value
4.658146 3 25 0.01015794
> rl.cfs <- cbind(c(-1,1,0,0,0), c(2,-1,-2,-1,2), c(1,-4,6,-4,1))
> colnames(rl.cfs) <- c("1 vs 2", "quad", "quartic")
> rl.cfs
1 vs 2 quad quartic
[1,] -1 2 1
[2,] 1 -1 -4
[3,] 0 -2 6
[4,] 0 -1 -4
[5,] 0 2 1
> linear.contrast(fit.RL,temp,rl.cfs)
estimates se t-value p-value lower-ci upper-ci
1 -0.30375000 0.04790063 -6.3412521 4.056402e-07 -0.4013204 -0.2061796
2 0.40029762 0.13324726 3.0041714 5.139664e-03 0.1288818 0.6717134
3 -0.03797619 0.28863670 -0.1315709 8.961475e-01 -0.6259099 0.5499575
attr(,"class")
[1] "linear.contrast"
For comparison, recall that R can use the parameterization where the first treatment effect is zero. That means that the second treatment effect would be the level 2 mean minus the level 1 mean, which should be the same as the contrast we just did. Here we force the "first effect 0" parameterization via the contrasts= argument, and then see the level 2 treatment effect in the summary matching what we saw in our contrast above.
> fit.RL.trt1 <- lm(logTime~temp,data=ResinLifetimes,
+ contrasts=list(temp=contr.treatment))
> summary(fit.RL.trt1)
Call:
lm(formula = logTime ~ temp, data = ResinLifetimes, contrasts = list(temp = contr.treatment))
Residuals:
Min 1Q Median 3Q Max
-0.22667 -0.03667 0.00250 0.03125 0.20333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.93250 0.03387 57.055 < 2e-16 ***
temp2 -0.30375 0.04790 -6.341 4.06e-07 ***
temp3 -0.55500 0.04790 -11.586 5.49e-13 ***
temp4 -0.73821 0.04958 -14.889 6.13e-16 ***
temp5 -0.87583 0.05174 -16.928 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0958 on 32 degrees of freedom
Multiple R-squared: 0.9233, Adjusted R-squared: 0.9138
F-statistic: 96.36 on 4 and 32 DF, p-value: < 2.2e-16
linear.contrast() has a number of other arguments that you might find useful. For example, if you give the optional argument allpairs=TRUE, then you don't need to give your own coefficients, and the function will take differences of all pairs. If you want to compare all treatments to a particular treatment, say treatment k, then you can use the optional argument controlpairs=k, and again you don't need to specify your own contrasts. You can also change the confidence level on the intervals with the confidence=x optional argument.
> linear.contrast(fit.RLW,diet,allpairs=TRUE,confidence=.99)
estimates se t-value p-value lower-ci upper-ci
1 - 2 0.16571429 0.1052754 1.5741028 0.128035240 -0.1277341 0.45916268
1 - 3 0.14738095 0.1131676 1.3023241 0.204676468 -0.1680666 0.46282850
1 - 4 -0.17678571 0.1052754 -1.6792691 0.105556638 -0.4702341 0.11666268
2 - 3 -0.01833333 0.1098547 -0.1668871 0.868801395 -0.3245463 0.28787960
2 - 4 -0.34250000 0.1017057 -3.3675598 0.002457339 -0.6259981 -0.05900191
3 - 4 -0.32416667 0.1098547 -2.9508675 0.006792580 -0.6303796 -0.01795374
attr(,"class")
[1] "linear.contrast"
> linear.contrast(fit.RLW,diet,controlpairs=4,confidence=.99)
estimates se t-value p-value lower-ci upper-ci
1 - 4 -0.1767857 0.1052754 -1.679269 0.105556638 -0.4702341 0.11666268
2 - 4 -0.3425000 0.1017057 -3.367560 0.002457339 -0.6259981 -0.05900191
3 - 4 -0.3241667 0.1098547 -2.950867 0.006792580 -0.6303796 -0.01795374
attr(,"class")
[1] "linear.contrast"
22.3 Using emmeans::contrast()
emmeans is short for estimated marginal means, sometimes called least squares means. You first fit the linear model, then you fit the estimated marginal means (emm) object, then you do the contrast(s). Needing an extra step is a bit of a bother, but the compensation is that the emm object is useful in its own right, and contrast() applied to the emm object is more capable than linear.contrast().
You use emmeans()
to create the emm object. The first argument is the
fitted linear model, and the second argument is the right hand side of a model
giving the factor (margin) over which to compute means. This will make
much more sense when we have more than one factor. Simply printing
the emm object gives you the estimated treatment means along with
the SE and a confidence interval.
> library(emmeans) # you may also need to install emmeans
> RLW.emm <- emmeans(fit.RLW,~diet)
> RLW.emm
diet emmean SE df lower.CL upper.CL
1 3.75 0.0769 25 3.59 3.90
2 3.58 0.0719 25 3.43 3.73
3 3.60 0.0830 25 3.43 3.77
4 3.92 0.0719 25 3.77 4.07
Confidence level used: 0.95
> RL.emm <- emmeans(fit.RL,~temp)
> RL.emm
temp emmean SE df lower.CL upper.CL
175 1.93 0.0339 32 1.864 2.00
194 1.63 0.0339 32 1.560 1.70
213 1.38 0.0339 32 1.309 1.45
231 1.19 0.0362 32 1.121 1.27
250 1.06 0.0391 32 0.977 1.14
Confidence level used: 0.95
To compute contrasts, give contrast() the emm object and a named list containing the contrast coefficients; if the coefficient matrix has column names, contrast() will include the column labels when labeling the contrasts.
> contrast(RLW.emm,list(diet=mat.cfs))
contrast estimate SE df t.ratio p.value
diet.1 -0.177 0.105 25 -1.679 0.1056
diet.2 -0.343 0.102 25 -3.368 0.0025
diet.3 -0.324 0.110 25 -2.951 0.0068
> contrast(RL.emm,list(temp=rl.cfs))
contrast estimate SE df t.ratio p.value
temp.1 vs 2 -0.304 0.0479 32 -6.341 <.0001
temp.quad 0.400 0.1332 32 3.004 0.0051
temp.quartic -0.038 0.2886 32 -0.132 0.8961
> contrast(RLW.emm,method="pairwise",adjust="none")
contrast estimate SE df t.ratio p.value
diet1 - diet2 0.1657 0.105 25 1.574 0.1280
diet1 - diet3 0.1474 0.113 25 1.302 0.2047
diet1 - diet4 -0.1768 0.105 25 -1.679 0.1056
diet2 - diet3 -0.0183 0.110 25 -0.167 0.8688
diet2 - diet4 -0.3425 0.102 25 -3.368 0.0025
diet3 - diet4 -0.3242 0.110 25 -2.951 0.0068
> contrast(RLW.emm,method="trt.vs.ctrl1",ref=4,adjust="none")
contrast estimate SE df t.ratio p.value
diet1 - diet4 -0.177 0.105 25 -1.679 0.1056
diet2 - diet4 -0.343 0.102 25 -3.368 0.0025
diet3 - diet4 -0.324 0.110 25 -2.951 0.0068
The method trt.vs.ctrl1 with ref=4 says to compare each treatment to the fourth treatment; adjust="none" says to make no adjustment to the p-values for multiple comparisons.
22.4 Using multcomp::glht()
glht()
stands for general linear hypothesis test. Basically it
does inference for linear combinations of parameters. Contrasts are
one form of linear combination, so we can use glht()
.
To use glht()
you must specify a linear combination of parameters,
a null value, and an alternative. The defaults for the null and
alternative are 0 and two-sided, and we’ll just use those.
The linear combinations to use with glht()
depend on the parameterization.
In our rat liver weight example, the coefficients are the overall mean and the
effects for treatments 1, 2, and 3. We are using the “sum to 0” parameterization,
so the effect for treatment 4 is minus the sum of the effects for treatments 1, 2, and 3.
As a linear combination of coefficients, \(\alpha_1\) uses (0,1,0,0), and \(\alpha_4\)
uses (0,-1,-1,-1). To get the difference of treatment 1 and treatment 4 (1 minus 4)
we would need to use (0,1,0,0)-(0,-1,-1,-1)=(0,2,1,1). Similarly, 2 versus 4 and
3 versus 4 are (0,1,2,1) and (0,1,1,2).
If you use a different parameterization in your model, you need to use
different linear combinations.
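That translation from contrast coefficients on the means to coefficients on the sum-to-zero parameters can be automated. The sketch below assumes sum-to-zero coding with \(g\) levels and parameters (intercept, \(\alpha_1, \ldots, \alpha_{g-1}\)); the helper to.glht.coefs() is ours, not part of any package. It works because \(\sum_i w_i = 0\) and \(\alpha_g = -(\alpha_1 + \cdots + \alpha_{g-1})\), so \(\sum_i w_i \mu_i = \sum_{i<g} (w_i - w_g)\alpha_i\).

```r
# Sketch: convert contrast coefficients w on the g treatment means into
# coefficients on (intercept, alpha_1, ..., alpha_{g-1}) under the
# sum-to-zero parameterization. (to.glht.coefs is our own helper,
# not part of multcomp or cfcdae.)
to.glht.coefs <- function(w) {
  g <- length(w)
  c(0, w[-g] - w[g])  # drop w_g; alpha_g = -(alpha_1 + ... + alpha_{g-1})
}
to.glht.coefs(c(1, 0, 0, -1))  # (0, 2, 1, 1), the "1 versus 4" combination
to.glht.coefs(c(0, 1, 0, -1))  # (0, 1, 2, 1), "2 versus 4"
```

This reproduces the hand-derived combinations above and extends to any contrast on the means.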
To use glht(), the linear combination coefficients must be the rows of a matrix. This output ought to look pretty familiar by now.
> library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS
Attaching package: 'TH.data'
The following object is masked from 'package:MASS':
geyser
> glht.out1 <- glht(fit.RLW,t(c(0,2,1,1)))
> summary(glht.out1)
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = weight ~ diet, data = RatLiverWeight)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -0.1768 0.1053 -1.679 0.106
(Adjusted p values reported -- single-step method)
> glht.out2 <- glht(fit.RLW,t(c(0,1,2,1)))
> summary(glht.out2)
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = weight ~ diet, data = RatLiverWeight)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
1 == 0 -0.3425 0.1017 -3.368 0.00246 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
What linear.contrast()
does is essentially figure out the modified coefficients
for you, and then do something very much like what glht()
does.
You can use a matrix with more than one row, but glht()
adjusts p-values
in a way that we have not studied. You can also have glht()
do all pairwise
comparisons, but it also wants to make p-value adjustments we have not yet
discussed.
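As a sketch of that capability, using simulated data so the example stands alone (mcp() and adjusted() are multcomp functions; check help(glht) for the details):

```r
library(multcomp)

# Simulated one-factor data so this sketch does not depend on cfcdae.
set.seed(42)
sim <- data.frame(y = rnorm(24), grp = factor(rep(1:4, each = 6)))
fit.sim <- lm(y ~ grp, data = sim)

# mcp() builds the full set of pairwise comparisons for us; summary()
# then applies a simultaneous (single-step) p-value adjustment by default.
pairs.glht <- glht(fit.sim, linfct = mcp(grp = "Tukey"))
summary(pairs.glht)                           # adjusted p-values
summary(pairs.glht, test = adjusted("none"))  # unadjusted, like linear.contrast()
```

With four treatments there are six pairwise comparisons, so pairs.glht contains a six-row coefficient matrix.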