Chapter 47 Covariates
47.1 Thread data
These are data from Montgomery. We want to assess the strength of threads made by three different machines. Each thread is made from a batch of cotton, and some batches tend to form thicker thread than other batches. There’s no way to know how thick it will be until you make it. Regardless of how the machines may affect thread strength, thicker threads are stronger. Thus we record diameter as well as strength, with diameter as a covariate.
> library(cfcdae)
> library(car)
> strength<-c(36,41,39,42,49,40,48,39,45,44,35,37,42,34,32)
> diameter<-c(20,25,24,25,32,22,28,22,30,28,21,23,26,21,15)
> machine<-factor(rep(1:3,each=5))
47.2 Comparing machines
From the plot, it is fairly clear that thicker threads are stronger. It is not so clear whether one machine makes stronger thread than another; machine 3 seems a little low, and machine 2 may be a bit high.
> plot(diameter,strength,pch=as.character(machine))

Figure 47.1: Plot of strength against diameter, labeled by machine
Ignoring the covariate, machine 2 seems to have greater strength, and machine 3 seems low. The means verify this.
> boxplot(strength ~ machine)

Figure 47.2: Boxplots of strength by machine
> tapply(strength,machine,mean)
1 2 3
41.4 43.2 36.0
But machine 2 also seems to have larger diameters, and machine 3 smaller diameters. Means verify this.
> boxplot(diameter ~ machine)

Figure 47.3: Boxplots of diameter by machine
> tapply(diameter,machine,mean)
1 2 3
25.2 26.0 21.2
47.3 Basic Analysis of Covariance
Let’s begin with the basic, standard Analysis of Covariance. Then we will go back and situate that in a larger context.
ANCOVA asks whether adding treatment effects improves a model that already includes the covariate. This is treatments after covariates.
> cov.then.trt <- lm(strength~diameter+machine)
> anova(cov.then.trt)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
diameter 1 305.130 305.130 119.9330 2.96e-07 ***
machine 2 13.284 6.642 2.6106 0.1181
Residuals 11 27.986 2.544
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is little evidence that machine affects strength after accounting for diameter effects. This is the analysis of covariance.
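As a check on the arithmetic, the same sequential (Type I) decomposition can be reproduced outside R by fitting nested models with ordinary least squares and differencing their residual sums of squares. This numpy sketch is not part of the original R session:

```python
# Sequential (Type I) sums of squares for the thread data, via nested
# least-squares fits: intercept only, then + diameter, then + machine.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(len(strength))
m2 = (machine == 2).astype(float)  # dummy for machine 2
m3 = (machine == 3).astype(float)  # dummy for machine 3

rss_null = rss(ones[:, None], strength)                              # intercept only
rss_diam = rss(np.column_stack([ones, diameter]), strength)          # + diameter
rss_both = rss(np.column_stack([ones, diameter, m2, m3]), strength)  # + machine

ss_diameter = rss_null - rss_diam  # SS for diameter, fitted first
ss_machine = rss_diam - rss_both   # SS for machine, after diameter
print(ss_diameter, ss_machine, rss_both)  # approx 305.130, 13.284, 27.986
```

The three pieces add up to the total (corrected) sum of squares, 346.4, which is the defining property of a sequential decomposition.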
However, we really need to check assumptions. The residual plot looks a bit like the flopping fish.
> plot(cov.then.trt,which=1)

Figure 47.4: Residual plot for strength data
Box-Cox suggests a reciprocal (reciprocal of strength is weakness?), but the power 1 is just within the interval.
> boxCox(cov.then.trt)

Figure 47.5: Box-Cox plot for strength data
Residuals on the reciprocal scale do look a little better, but not a lot better. To keep things simple, I’ll continue on the original scale, but further exploration of a transformation might be warranted. (Although I will point out that the range of strength is from 32 to 49, a factor of 1.5; typical power transformations are not going to be able to do much over that range.)
To summarize, after accounting for the covariate, there is little evidence that the treatment makes any difference.
47.4 Variance Reduction
What if we had not used our covariate? Then we would have just run a model using machine alone. The MSE in the model with the covariate is 2.54, but the MSE in the model without it is 17.2. The covariate has modeled away a huge amount of variance.
> mach.only <- lm(strength~machine)
> anova(mach.only)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
machine 2 140.4 70.200 4.0893 0.04423 *
Residuals 12 206.0 17.167
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(cov.then.trt)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
diameter 1 305.130 305.130 119.9330 2.96e-07 ***
machine 2 13.284 6.642 2.6106 0.1181
Residuals 11 27.986 2.544
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the same way, the standard errors of the treatment effects in the covariate model are smaller by a factor of about 2.5.
> summary(mach.only)
Call:
lm(formula = strength ~ machine)
Residuals:
Min 1Q Median 3Q Max
-5.4 -2.8 -0.4 1.4 7.6
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.200 1.070 37.578 8.11e-14 ***
machine1 1.200 1.513 0.793 0.4431
machine2 3.000 1.513 1.983 0.0707 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.143 on 12 degrees of freedom
Multiple R-squared: 0.4053, Adjusted R-squared: 0.3062
F-statistic: 4.089 on 2 and 12 DF, p-value: 0.04423
> summary(cov.then.trt)
Call:
lm(formula = strength ~ diameter + machine)
Residuals:
Min 1Q Median 3Q Max
-2.0160 -0.9586 -0.3841 0.9518 2.8920
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.1771 2.7830 6.172 6.99e-05 ***
diameter 0.9540 0.1140 8.365 4.26e-06 ***
machine1 0.1824 0.5950 0.307 0.765
machine2 1.2192 0.6201 1.966 0.075 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.595 on 11 degrees of freedom
Multiple R-squared: 0.9192, Adjusted R-squared: 0.8972
F-statistic: 41.72 on 3 and 11 DF, p-value: 2.665e-06
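That factor-of-2.5 shrinkage can be checked directly by computing the standard error of the machine 1 versus machine 2 mean difference in both models, using the usual formula sqrt(MSE * c'(X'X)^{-1}c) for the SE of a contrast. This numpy sketch is mine, not part of the R session:

```python
# SE of the machine 1 vs machine 2 difference, with and without the covariate,
# using cell-means design matrices and the contrast SE formula.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

def contrast_se(X, y, c):
    """SE of the contrast c'beta from least squares of y on X."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    mse = np.sum((y - X @ beta) ** 2) / (len(y) - X.shape[1])
    return np.sqrt(mse * c @ XtX_inv @ c)

# Cell-means coding: one indicator column per machine (no intercept).
M = np.column_stack([(machine == j).astype(float) for j in (1, 2, 3)])
se_raw = contrast_se(M, strength, np.array([1.0, -1.0, 0.0]))
se_cov = contrast_se(np.column_stack([M, diameter]), strength,
                     np.array([1.0, -1.0, 0.0, 0.0]))
print(se_raw, se_cov, se_raw / se_cov)  # approx 2.620, 1.013, 2.6
```

These match the standard errors that linear.contrast() reports below.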
47.5 Covariate adjustment
Without the covariate, the estimated difference between machines 1 and 2 is 1.8; with the covariate the estimated difference is 1.04.
> linear.contrast(mach.only,machine,c(1,-1,0))
estimates se t-value p-value lower-ci upper-ci
1 -1.8 2.620433 -0.6869095 0.5051981 -7.509432 3.909432
attr(,"class")
[1] "linear.contrast"
> linear.contrast(cov.then.trt,machine,c(1,-1,0))
estimates se t-value p-value lower-ci upper-ci
1 -1.03681 1.012913 -1.023592 0.328012 -3.266217 1.192597
attr(,"class")
[1] "linear.contrast"
Make a data frame for prediction. We will predict at all three levels of machine; for the first three predictions we use a common value of the covariate, and for the last three we use the mean covariate for each treatment.
> tapply(diameter,machine,mean)
1 2 3
25.2 26.0 21.2
> pred.frame <- data.frame(machine=factor(c(1,2,3,1,2,3)),diameter=c(25,25,25,25.2,26,21.2))
If we predict using the model that ignores the covariate, then the covariate has no effect on the prediction (duh!), and the predictions are the means by each treatment. Treatments 1 and 2 are predicted 1.8 units apart.
> predict(mach.only,pred.frame)
1 2 3 4 5 6
41.4 43.2 36.0 41.4 43.2 36.0
> tapply(strength,machine,mean)
1 2 3
41.4 43.2 36.0
On the other hand, consider the model that includes the covariate. When you predict at each treatment’s mean covariate, the predictions are just the treatment means. But if you predict at a common value of the covariate, then the predictions differ by amounts in agreement with the coefficients in the covariate model.
> predict(cov.then.trt,pred.frame)
1 2 3 4 5 6
41.20920 42.24601 39.62515 41.40000 43.20000 36.00000
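The adjusted predictions can also be reproduced by hand: each treatment mean is slid along the common slope from its own covariate mean to the common diameter of 25, giving ybar_i + slope * (25 - xbar_i). A numpy check, outside the R session:

```python
# Covariate-adjusted means at a common diameter of 25:
# adjusted_i = ybar_i + slope * (25 - xbar_i), with the pooled within-group slope.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

# The pooled within-group slope is the diameter coefficient when machine
# dummies are also in the model.
X = np.column_stack([np.ones(15), diameter,
                     (machine == 2).astype(float), (machine == 3).astype(float)])
slope = np.linalg.lstsq(X, strength, rcond=None)[0][1]

ybar = np.array([strength[machine == j].mean() for j in (1, 2, 3)])
xbar = np.array([diameter[machine == j].mean() for j in (1, 2, 3)])
adjusted = ybar + slope * (25 - xbar)
print(slope, adjusted)  # approx 0.954 and [41.209, 42.246, 39.625]
```

This agrees with the first three predictions above; the last three are just the raw treatment means.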
You can see the effect in this plot. The black lines are the covariate model fits; the vertical distance between them represents the treatment effect in the covariate model. These differences are adjusted back to the same covariate value.
The red circles and horizontal lines show what you get if you predict at each treatment’s average covariate: the vertical positions are then the unadjusted treatment means.
> plot(diameter,strength,pch=as.character(machine))
> cfs <- coefficients(cov.then.trt)
> abline(cfs[1]+cfs[3],cfs[2])
> abline(cfs[1]+cfs[4],cfs[2])
> abline(cfs[1]-cfs[3]-cfs[4],cfs[2])
> points(25.2,41.4,col="red")
> points(26,43.2,col="red")
> points(21.2,36,col="red")
> abline(h=36,col="red")
> abline(h=43.2,col="red")
> abline(h=41.4,col="red")

Figure 47.6: Strength data with model fits
47.6 What if treatments affect the covariates?
Ordinary ANCOVA does variance reduction and compares treatments at a common level of the covariate. What if the treatment changes the covariate? Then we might not want to compare at a common value of the covariate.
Some of the variability in the data can be explained by the treatment, some by the covariate, and some by either. Ordinary ANCOVA attributes the “either” portion to the covariate. We can change this so the “either” portion gets attributed to the treatment.
Adjusted covariates are the residuals from modeling the covariate by the treatment. If you use adjusted covariates, then the “either” portion of the variation goes to the treatment.
Begin by getting the adjusted covariates, then fit the model using the adjusted covariate.
> diam.on.mach <- lm(diameter ~ machine)
> diam.adj <- residuals(diam.on.mach)
> cov.adj.trt <- lm(strength~diam.adj+machine)
The SS explained by this model is the same as when using the original covariate, but the SS for treatment is what you would get if treatment came first in a sequential (Type I) ANOVA.
In these data, using adjusted covariates leads to a highly significant treatment effect.
> anova(cov.adj.trt)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
diam.adj 1 178.014 178.014 69.969 4.264e-06 ***
machine 2 140.400 70.200 27.593 5.170e-05 ***
Residuals 11 27.986 2.544
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(lm(strength~machine+diameter))
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
machine 2 140.400 70.200 27.593 5.170e-05 ***
diameter 1 178.014 178.014 69.969 4.264e-06 ***
Residuals 11 27.986 2.544
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The treatment coefficients are different, but the slope is the same.
> coefficients(cov.adj.trt)
(Intercept) diam.adj machine1 machine2
40.2000000 0.9539877 1.2000000 3.0000000
> coefficients(cov.then.trt)
(Intercept) diameter machine1 machine2
17.1770961 0.9539877 0.1824131 1.2192229
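The pattern in the adjusted-covariate coefficients can be verified directly: because the adjusted covariate is orthogonal to machine, fitting with sum-to-zero machine coding returns the grand mean, the same common slope, and the raw machine effects ybar_i - ybar. A numpy sketch, outside the R session:

```python
# Fit strength on the adjusted covariate plus machine (sum-to-zero coding,
# matching R's contr.sum) and inspect the coefficients.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

# Adjusted covariate: residuals of diameter on machine, i.e. subtract each
# machine's mean diameter.
diam_adj = diameter.copy()
for j in (1, 2, 3):
    diam_adj[machine == j] -= diameter[machine == j].mean()

# Sum-to-zero (effects) coding for machine.
e1 = np.where(machine == 1, 1.0, np.where(machine == 3, -1.0, 0.0))
e2 = np.where(machine == 2, 1.0, np.where(machine == 3, -1.0, 0.0))
X = np.column_stack([np.ones(15), diam_adj, e1, e2])
coef = np.linalg.lstsq(X, strength, rcond=None)[0]
print(coef)  # approx [40.2, 0.954, 1.2, 3.0]
```

The intercept 40.2 is the grand mean, and 1.2 and 3.0 are exactly the unadjusted machine effects from the machine-only model.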
Now we plot using the adjusted covariates. The differences between the lines are the same as the differences between the raw treatment means.
> plot(diam.adj,strength,pch=as.character(machine))
> cfs <- coefficients(cov.adj.trt)
> abline(cfs[1]+cfs[3],cfs[2])
> abline(cfs[1]+cfs[4],cfs[2])
> abline(cfs[1]-cfs[3]-cfs[4],cfs[2])
> points(0,41.4,col="red")
> points(0,43.2,col="red")
> points(0,36,col="red")
> abline(h=36,col="red")
> abline(h=43.2,col="red")
> abline(h=41.4,col="red")

Figure 47.7: Strength data with adjusted covariate model fits
Moral: If you use the original covariate, then ANCOVA gives you covariate adjusted treatment means.
If you use the adjusted covariate, then ANCOVA gives you the unadjusted treatment means.
How do you tell if treatments affect covariates? Just look at the ANOVA for the model you used to get adjusted covariates.
Here there is no evidence that treatments affect covariates.
> anova(diam.on.mach)
Analysis of Variance Table
Response: diameter
Df Sum Sq Mean Sq F value Pr(>F)
machine 2 66.133 33.067 2.0286 0.1742
Residuals 12 195.600 16.300
47.7 Other covariate models
The standard covariate model we have been looking at is the “parallel lines” model, also called the “separate intercepts” model. The treatments have an additive effect, but the slope on the covariate is the same in all treatments.
Less common covariate models include:
Separate slopes: there is a common intercept, but each treatment group has its own slope. This model is highly dependent on where 0 is on the covariate.
Separate lines: each treatment has its own slope and its own intercept.
You can compare single line to parallel lines to separate lines via ANOVA, because those models form a nested set. Likewise, you can compare single line to separate slopes to separate lines via ANOVA.
Parallel lines and separate slopes cannot be compared using ANOVA (not nested), but you can compare via AIC.
Fit the models. Note that we also fit a separate slopes model where we shift the diameter by 25.
> single.line <- lm(strength~diameter)
> parallel.lines <- lm(strength~diameter+machine)
> separate.lines <- lm(strength~diameter*machine)
> separate.slopes <- lm(strength~1+diameter:machine)
> diameter.m25 <- lm(strength~1+I(diameter-25):machine)
The first comparison here is the normal analysis of covariance. The SS matches what we saw before, but the p-value printed is different, because it is using the MSE from the separate lines model and not the parallel lines model.
> anova(single.line,parallel.lines,separate.lines)
Analysis of Variance Table
Model 1: strength ~ diameter
Model 2: strength ~ diameter + machine
Model 3: strength ~ diameter * machine
Res.Df RSS Df Sum of Sq F Pr(>F)
1 13 41.270
2 11 27.986 2 13.2839 2.3675 0.1492
3 9 25.249 2 2.7372 0.4878 0.6293
Here is the comparison down the separate slopes side of the chain.
> anova(single.line,separate.slopes,separate.lines)
Analysis of Variance Table
Model 1: strength ~ diameter
Model 2: strength ~ 1 + diameter:machine
Model 3: strength ~ diameter * machine
Res.Df RSS Df Sum of Sq F Pr(>F)
1 13 41.270
2 11 27.913 2 13.3569 2.3806 0.1480
3 9 25.249 2 2.6642 0.4748 0.6367
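Parallel lines and separate slopes cannot be compared by ANOVA, but note that here both use four regression parameters, so a Gaussian AIC comparison comes down to comparing residual sums of squares. A numpy sketch (not part of the R session), using AIC = n log(RSS/n) + 2k up to an additive constant:

```python
# AIC comparison of the non-nested parallel-lines and separate-slopes models.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

def fit_rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(15)
m = [(machine == j).astype(float) for j in (1, 2, 3)]

# Parallel lines: intercept + common slope + machine effects (4 free parameters).
rss_par = fit_rss(np.column_stack([ones, diameter, m[1], m[2]]), strength)
# Separate slopes: common intercept + one slope per machine (4 free parameters).
rss_sep = fit_rss(np.column_stack([ones, diameter * m[0],
                                   diameter * m[1], diameter * m[2]]), strength)

def aic(rss, k, n=15):
    # Gaussian profile-likelihood AIC, dropping constants; k counts the
    # regression parameters plus the error variance.
    return n * np.log(rss / n) + 2 * k

print(rss_par, rss_sep)                  # approx 27.986 vs 27.913
print(aic(rss_par, 5), aic(rss_sep, 5))  # separate slopes is slightly lower
```

With equal parameter counts the AIC difference is tiny; neither model is meaningfully better here.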
Finally, note that changing the zero on the covariate changes the fit of the separate slopes model. This is not true of any of the other models.
> anova(separate.slopes)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
diameter:machine 3 318.49 106.162 41.837 2.627e-06 ***
Residuals 11 27.91 2.538
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> anova(diameter.m25)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
I(diameter - 25):machine 3 306.503 102.168 28.168 1.844e-05 ***
Residuals 11 39.897 3.627
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
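The shift sensitivity is easy to confirm numerically: refit the separate-slopes and parallel-lines models with diameter and with diameter minus 25, and compare residual sums of squares. A final numpy sketch, outside the R session:

```python
# Effect of shifting the covariate zero: the separate-slopes RSS changes,
# the parallel-lines RSS does not.
import numpy as np

strength = np.array([36, 41, 39, 42, 49, 40, 48, 39, 45, 44, 35, 37, 42, 34, 32], float)
diameter = np.array([20, 25, 24, 25, 32, 22, 28, 22, 30, 28, 21, 23, 26, 21, 15], float)
machine = np.repeat([1, 2, 3], 5)

def fit_rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(15)
m = [(machine == j).astype(float) for j in (1, 2, 3)]

def separate_slopes_rss(x):
    # Common intercept, one slope per machine.
    return fit_rss(np.column_stack([ones, x * m[0], x * m[1], x * m[2]]), strength)

def parallel_lines_rss(x):
    # Separate intercepts, common slope.
    return fit_rss(np.column_stack([ones, x, m[1], m[2]]), strength)

print(separate_slopes_rss(diameter), separate_slopes_rss(diameter - 25))  # ~27.91 vs ~39.90
print(parallel_lines_rss(diameter), parallel_lines_rss(diameter - 25))    # both ~27.99
```

Forcing all three lines through a common intercept at x = 0 means something different than forcing them through a common intercept at x = 25, which is why the separate-slopes fit depends on the covariate's origin.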