Chapter 33 Factorials

33.1 Barley data

Back to the barley sprouting data.

> library(cfcdae)
> data(SproutingBarley)
> head(SproutingBarley)
  weeks water sprouting weeks.z water.z
1     1     4        11       1       4
2     1     8         8       1       8
3     3     4         7       3       4
4     3     8         1       3       8
5     6     4         9       6       4
6     6     8         5       6       8

This is a two factor model. We want to fit both main effects and the two factor interaction. In R, interactions are indicated by using a colon, as in factor1:factor2.

> barleyfit1 <- lm(sprouting~water+weeks+water:weeks,data=SproutingBarley)
> anova(barleyfit1)
Analysis of Variance Table

Response: sprouting
            Df  Sum Sq Mean Sq F value   Pr(>F)    
water        1 1178.13 1178.13 19.7232 0.000251 ***
weeks        4 1321.13  330.28  5.5293 0.003645 ** 
water:weeks  4  208.87   52.22  0.8742 0.496726    
Residuals   20 1194.67   59.73                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sometimes you want to have a single factor that encompasses all of the combined leves. There is a join() function in conf.design that you can use to do that.

Note that in the anova the SS and df for ww are the sum of those for water, weeks, and water:weeks in the preceding anova.

> ww <- with(SproutingBarley,conf.design::join(water,weeks))
> ww
 [1] 4:1  8:1  4:3  8:3  4:6  8:6  4:9  8:9  4:12 8:12 4:1  8:1  4:3  8:3  4:6  8:6  4:9  8:9  4:12
[20] 8:12 4:1  8:1  4:3  8:3  4:6  8:6  4:9  8:9  4:12 8:12
Levels: 4:1  4:12 4:3  4:6  4:9  8:1  8:12 8:3  8:6  8:9 
> barleyfit2 <- lm(sprouting~ww,data=SproutingBarley)
> anova(barleyfit2)
Analysis of Variance Table

Response: sprouting
          Df Sum Sq Mean Sq F value   Pr(>F)   
ww         9 2708.1 300.904  5.0375 0.001269 **
Residuals 20 1194.7  59.733                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There are multiple short cuts for factorial models that can save you a lot of typing.

  1. A*B*C expands out into all main effects and interactions: A + B + C + A:B + A:C + B:C + A:B:C.
  2. (A+B+C)^k expands out into all of the main effects and interactions up to order k. Thuse (A+B+C)^2 is a short cut for A + B + C + A:B + A:C + B:C.
  3. You can subtract out terms. Thus A*B*C - A:B:C - A:C is the same as A + B + C + A:B + B:C.

Note: I have put in spaces in these models for clarity; the spaces are not actually needed.

Before we do any inference, we need to check our assumptions. We have non-constant variance.

> plot(barleyfit1,which=1)
Residual plot for barley data

Figure 33.1: Residual plot for barley data

Box-Cox suggests a square root, which should surprise no one as these are binomial data with smallish success probability.

> car::boxCox(barleyfit1)
Box-Cox plot for barley data

Figure 33.2: Box-Cox plot for barley data

After transforming, we see that the main effects are both pretty significant, but there is no evidence for interaction.

> barleyfit3 <- lm(sqrt(sprouting)~water*weeks,data=SproutingBarley)
> anova(barleyfit3)
Analysis of Variance Table

Response: sqrt(sprouting)
            Df  Sum Sq Mean Sq F value    Pr(>F)    
water        1 21.8930 21.8930 23.7606 9.177e-05 ***
weeks        4 21.8949  5.4737  5.9406  0.002555 ** 
water:weeks  4  2.2485  0.5621  0.6101  0.660139    
Residuals   20 18.4280  0.9214                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A side by side plot emphasizes that the main effects are large relative to residuals, but not so much the interaction.

> sidebyside(barleyfit3)
Side by side plot for transformed barley data

Figure 33.3: Side by side plot for transformed barley data

We can see the fitted effects using summary(). Note that there are many other effects that can be determined by the zero sum constraints we use.

> summary(barleyfit3)

Call:
lm(formula = sqrt(sprouting) ~ water * weeks, data = SproutingBarley)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4250 -0.5001  0.1663  0.5928  1.4911 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    3.47148    0.17525  19.808 1.30e-14 ***
water1         0.85426    0.17525   4.874 9.18e-05 ***
weeks1        -0.96171    0.35050  -2.744   0.0125 *  
weeks2        -0.78037    0.35050  -2.226   0.0376 *  
weeks3         0.11369    0.35050   0.324   0.7490    
weeks4         0.19109    0.35050   0.545   0.5917    
water1:weeks1 -0.44200    0.35050  -1.261   0.2218    
water1:weeks2  0.04425    0.35050   0.126   0.9008    
water1:weeks3 -0.01444    0.35050  -0.041   0.9675    
water1:weeks4  0.42088    0.35050   1.201   0.2439    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9599 on 20 degrees of freedom
Multiple R-squared:  0.7141,    Adjusted R-squared:  0.5855 
F-statistic: 5.551 on 9 and 20 DF,  p-value: 0.0006986

If you want to see the full set of coefficients, it’s easier to use model.effects().

You need to enter the term exactly as it appears in the model, ie, you have to have the variables in the correct order, because model.effects() is too dumb to find the term otherwise.

> model.effects(barleyfit3,"weeks")
         1          3          6          9         12 
-0.9617085 -0.7803725  0.1136921  0.1910862  1.4373027 
> model.effects(barleyfit3,"water")
         4          8 
 0.8542634 -0.8542634 
> model.effects(barleyfit3,"water:weeks")
           1           3           6          9           12
4 -0.4419991  0.04424575 -0.01444493  0.4208793 -0.008681009
8  0.4419991 -0.04424575  0.01444493 -0.4208793  0.008681009
> model.effects(barleyfit3,"weeks:water")
Error in model.effects.default(barleyfit3, "weeks:water"): weeks:water is not an explicit term in the model

We can do contrasts, pairwise comparisons and so on. Because we have more than one factor in the model, the second argument identifying the factor to use makes more sense than it did when there was only one term in the model.

> linear.contrast(barleyfit3,weeks,c(-2,-1,0,1,2))
  estimates      se  t-value      p-value lower-ci upper-ci
1  5.769481 1.23922 4.655735 0.0001522892 3.184513 8.354449
attr(,"class")
[1] "linear.contrast"
> pairwise(barleyfit3,weeks,type="hsd")
                                      
Pairwise comparisons ( hsd ) of weeks 
            estimate signif diff     lower      upper
  1 - 3  -0.18133596    1.658362 -1.839698  1.4770265
  1 - 6  -1.07540057    1.658362 -2.733763  0.5829618
  1 - 9  -1.15279468    1.658362 -2.811157  0.5055677
* 1 - 12 -2.39901121    1.658362 -4.057374 -0.7406488
  3 - 6  -0.89406461    1.658362 -2.552427  0.7642978
  3 - 9  -0.97145871    1.658362 -2.629821  0.6869037
* 3 - 12 -2.21767525    1.658362 -3.876038 -0.5593128
  6 - 9  -0.07739411    1.658362 -1.735757  1.5809683
  6 - 12 -1.32361064    1.658362 -2.981973  0.3347518
  9 - 12 -1.24621654    1.658362 -2.904579  0.4121459

You can also get the same contrast in one variable computed separately for every level of a second variable by using the byvar argument. This is another way to look at interaction beyond interaction plots.

> with(SproutingBarley,linear.contrast(barleyfit3,weeks,c(-2,-1,0,1,2),
+                                      byvar=water))
  estimates       se  t-value      p-value  lower-ci  upper-ci
1  7.012751 1.752522 4.001519 0.0007010286 3.3570539 10.668448
2  4.526211 1.752522 2.582684 0.0177798490 0.8705145  8.181908
attr(,"class")
[1] "linear.contrast"

We can use emmeans() in several useful ways.

  1. We can get the fitted values.
  2. We can get marginal fitted values. Note that these can be misleading if you look at marginal means in the presence of an interaction.
  3. We can get a table of means even if the model includes just main effects.

Note that the first and third (full and additive) results are different.

> library(emmeans)
> emmeans(barleyfit3,~weeks:water)
 weeks water emmean    SE df lower.CL upper.CL
 1     4       2.92 0.554 20    1.766     4.08
 3     4       3.59 0.554 20    2.434     4.75
 6     4       4.42 0.554 20    3.269     5.58
 9     4       4.94 0.554 20    3.782     6.09
 12    4       5.75 0.554 20    4.598     6.91
 1     8       2.10 0.554 20    0.941     3.25
 3     8       1.79 0.554 20    0.637     2.95
 6     8       2.75 0.554 20    1.589     3.90
 9     8       2.39 0.554 20    1.231     3.54
 12    8       4.06 0.554 20    2.907     5.22

Results are given on the sqrt (not the response) scale. 
Confidence level used: 0.95 
> emmeans(barleyfit3,~weeks)
NOTE: Results may be misleading due to involvement in interactions
 weeks emmean    SE df lower.CL upper.CL
 1       2.51 0.392 20     1.69     3.33
 3       2.69 0.392 20     1.87     3.51
 6       3.59 0.392 20     2.77     4.40
 9       3.66 0.392 20     2.85     4.48
 12      4.91 0.392 20     4.09     5.73

Results are averaged over the levels of: water 
Results are given on the sqrt (not the response) scale. 
Confidence level used: 0.95 
> additive.fit <- lm(sqrt(sprouting)~weeks+water,data=SproutingBarley)
> emmeans(additive.fit,~weeks:water)
 weeks water emmean    SE df lower.CL upper.CL
 1     4       3.36 0.415 24    2.507     4.22
 3     4       3.55 0.415 24    2.689     4.40
 6     4       4.44 0.415 24    3.583     5.30
 9     4       4.52 0.415 24    3.660     5.37
 12    4       5.76 0.415 24    4.906     6.62
 1     8       1.66 0.415 24    0.799     2.51
 3     8       1.84 0.415 24    0.980     2.69
 6     8       2.73 0.415 24    1.874     3.59
 9     8       2.81 0.415 24    1.952     3.67
 12    8       4.05 0.415 24    3.198     4.91

Results are given on the sqrt (not the response) scale. 
Confidence level used: 0.95 

The effects package also has some utility. allEffects returns the table of means for any term in the model that is not contained in another term. In barleyfit3, that is the full interaction water:weeks. In additive.fit, that is the weeks margin and the water margin. Plotting effects contained in another term can be misleading.

> library(effects)
Loading required package: carData
lattice theme set by effectsTheme()
See ?effectsTheme for details.
> allEffects(barleyfit3)
 model: sqrt(sprouting) ~ water * weeks

 water*weeks effect
     weeks
water        1        3        6        9       12
    4 2.922038 3.589619 4.424993 4.937711 5.754367
    8 2.097510 1.792601 2.745356 2.387426 4.063203
> allEffects(additive.fit)
 model: sqrt(sprouting) ~ weeks + water

 weeks effect
weeks
       1        3        6        9       12 
2.509774 2.691110 3.585174 3.662569 4.908785 

 water effect
water
       4        8 
4.325746 2.617219 

You can plot the results of predictorEffects(). In the interactive model, we get two sets of plots: how water affects the response separately for every level of weeks, and how weeks affects the response separately for every level of water.

> plot(predictorEffects(barleyfit3,~water:weeks))
Plot of predictor effects for barley data with interaction

Figure 33.4: Plot of predictor effects for barley data with interaction

If you do the same thing with the additive model, it notices there is no interaction to handle and just gives you plots of each margin.

> plot(predictorEffects(additive.fit,~water:weeks))
Plot of predictor effects for barley data with additive fit

Figure 33.5: Plot of predictor effects for barley data with additive fit

33.2 Carbonwire data

These data are not in the book, so I read them in from a file using read.table() and then make the predictors into factors. The header=TRUE says that the column names are on the first row of the file.

> wire <- read.table("carbonwire.txt",header=TRUE)
> names(wire)
[1] "resist" "degas"  "temp"   "diff"  
> wire <- within(wire, {degas <- factor(degas); temp <- factor(temp);
+    diff <- factor(diff)})

Let’s begin by fitting the full model. We’ll need to check residuals, but it looks like anything involving degas is not significant, and the other three terms look highly significant.

> wire.fit <- lm(resist~degas*temp*diff,data=wire)
> anova(wire.fit)
Analysis of Variance Table

Response: resist
                Df Sum Sq Mean Sq  F value    Pr(>F)    
degas            1   0.47   0.467   1.5918    0.2125    
temp             2 410.69 205.346 699.6024 < 2.2e-16 ***
diff             2  80.54  40.270 137.1989 < 2.2e-16 ***
degas:temp       2   0.31   0.157   0.5342    0.5892    
degas:diff       2   0.70   0.351   1.1957    0.3104    
temp:diff        4  14.81   3.704  12.6177 2.566e-07 ***
degas:temp:diff  4   0.87   0.219   0.7450    0.5656    
Residuals       54  15.85   0.294                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

But we need to check those residuals. Constant variance looks fine.

> plot(wire.fit,which=1)
Residual plot of carbon wire data

Figure 33.6: Residual plot of carbon wire data

And normality is great. So no transformations!

> plot(wire.fit,which=2)
NPP of carbon wire data

Figure 33.7: NPP of carbon wire data

Side by side reinforces what we see in the anova.

> sidebyside(wire.fit)
Side by side plot of carbon wire data

Figure 33.8: Side by side plot of carbon wire data

Let’s look at that interaction. This plot makes temp-3/diff-2 look a little high.

> interactplot(temp,diff,resist,data=wire)
Carbonwire interaction plot; temp by diff

Figure 33.9: Carbonwire interaction plot; temp by diff

But with this view it looks more like temp-3/diff-1 is too low. It’s likely some mix of the two.

> interactplot(diff,temp,resist,data=wire)
Carbonwire interaction plot; diff by temp

Figure 33.10: Carbonwire interaction plot; diff by temp

33.3 Order of terms

Note that R will, by default, reorder your terms so that main effects come first, then 2fi, then 3fi, etc.

> anova(lm(resist~temp+diff+temp:diff+degas,data=wire))
Analysis of Variance Table

Response: resist
          Df Sum Sq Mean Sq  F value    Pr(>F)    
temp       2 410.69 205.346 717.6589 < 2.2e-16 ***
diff       2  80.54  40.270 140.7400 < 2.2e-16 ***
degas      1   0.47   0.467   1.6329    0.2061    
temp:diff  4  14.81   3.704  12.9434 1.014e-07 ***
Residuals 62  17.74   0.286                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can force R to accept your order of terms, but it’s a little tedious. You use terms(,keep.order=TRUE) to set your order, then pass that to lm() instead of the model.

> anova(lm(terms(resist~temp+diff+temp:diff+degas,keep.order=TRUE),data=wire))
Analysis of Variance Table

Response: resist
          Df Sum Sq Mean Sq  F value    Pr(>F)    
temp       2 410.69 205.346 717.6589 < 2.2e-16 ***
diff       2  80.54  40.270 140.7400 < 2.2e-16 ***
temp:diff  4  14.81   3.704  12.9434 1.014e-07 ***
degas      1   0.47   0.467   1.6329    0.2061    
Residuals 62  17.74   0.286                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: you can’t just choose any order you like. If you put an interaction in before terms that it includes, then the interaction will such up the SS and df of the lower order terms.

> anova(lm(terms(resist~temp+temp:diff+diff+degas,keep.order=TRUE),data=wire))
Analysis of Variance Table

Response: resist
          Df Sum Sq Mean Sq  F value Pr(>F)    
temp       2 410.69 205.346 717.6589 <2e-16 ***
temp:diff  6  95.35  15.892  55.5423 <2e-16 ***
degas      1   0.47   0.467   1.6329 0.2061    
Residuals 62  17.74   0.286                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1