Chapter 34 Factorial Topics

34.1 Hierarchy

Parameters are difficult to understand in non-hierarchical models, so I almost universally recommend hierarchical models. R helps you with that as well (assuming you think of it as help): when you put a term in a model before some of its subterms, R will have the term swallow up the subterms as well, enforcing hierarchy.

34.1.1 Barley sprouting

Let’s look at our old friend the barley sprouting data and try a couple of models.

  • If you put the weeks:water interaction in the model after water but before weeks, then the SS and df for weeks:water are the sums of those for weeks and weeks:water.
  • Note that the term that R fits is actually water:weeks instead of weeks:water. R names interactions with the elements in the order they first appeared in the model.
  • In the enforced hierarchy situation, not only do the SS and df add up, but the effects for the interaction are the sum of the effects for weeks and water:weeks.
> library(cfcdae)
> data(SproutingBarley)
> standard.model <- lm(sprouting~weeks*water,data=SproutingBarley)
> anova(standard.model)
Analysis of Variance Table

Response: sprouting
            Df  Sum Sq Mean Sq F value   Pr(>F)    
weeks        4 1321.13  330.28  5.5293 0.003645 ** 
water        1 1178.13 1178.13 19.7232 0.000251 ***
weeks:water  4  208.87   52.22  0.8742 0.496726    
Residuals   20 1194.67   59.73                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(standard.model,"weeks")
         1          3          6          9         12 
-7.5333333 -5.7000000  0.1333333  1.8000000 11.3000000 
> model.effects(standard.model,"weeks:water")
           4         8
1  -4.266667  4.266667
3  -1.433333  1.433333
6   0.400000 -0.400000
9   3.066667 -3.066667
12  2.233333 -2.233333
> skipped.model <- lm(sprouting~water+weeks:water,data=SproutingBarley)
> anova(skipped.model)
Analysis of Variance Table

Response: sprouting
            Df Sum Sq Mean Sq F value   Pr(>F)    
water        1 1178.1 1178.13 19.7232 0.000251 ***
water:weeks  8 1530.0  191.25  3.2017 0.016522 *  
Residuals   20 1194.7   59.73                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(skipped.model,"water:weeks")
           1         3          6         9        12
4 -11.800000 -7.133333  0.5333333  4.866667 13.533333
8  -3.266667 -4.266667 -0.2666667 -1.266667  9.066667

34.1.2 Amylase activity

Now we look at the amylase activity data. There are three factors: variety of corn, growth temperature, and analysis temperature. Read in the data; fit the preliminary model; check residuals. We see increasing variance.

> data(AmylaseActivity)
> amylase.fit <- lm(amylase ~ aTemp*gTemp*variety,data=AmylaseActivity)
> plot(amylase.fit,which=1)
Residual plot for amylase activity

Figure 34.1: Residual plot for amylase activity

Box-Cox suggests a log, but power 1 (leave the data alone) is surprisingly close to the interval.

> car::boxCox(amylase.fit)
Box-Cox plot for amylase activity

Figure 34.2: Box-Cox plot for amylase activity

Fit the model with the transformed data and do the anova. The highly significant terms are aTemp, variety, and gTemp:variety. Note that the main effect gTemp is not (even close to) significant, but gTemp interacts with variety. We still keep gTemp in the model. With a weighting other than the standard equal weighting, gTemp might be significant.

(If you look at residuals for the logged data, constant variance is better, but not perfect, and the residuals are a bit short tailed.)

> amylase.fitlog <- lm(log(amylase)~aTemp*gTemp*variety,data=AmylaseActivity)
> anova(amylase.fitlog)
Analysis of Variance Table

Response: log(amylase)
                    Df  Sum Sq Mean Sq  F value    Pr(>F)    
aTemp                7 3.01613 0.43088  78.8628 < 2.2e-16 ***
gTemp                1 0.00438 0.00438   0.8016 0.3739757    
variety              1 0.58957 0.58957 107.9085 2.305e-15 ***
aTemp:gTemp          7 0.08106 0.01158   2.1195 0.0539203 .  
aTemp:variety        7 0.02758 0.00394   0.7212 0.6543993    
gTemp:variety        1 0.08599 0.08599  15.7392 0.0001863 ***
aTemp:gTemp:variety  7 0.04764 0.00681   1.2457 0.2916176    
Residuals           64 0.34967 0.00546                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So what is going on? Look at the interaction plot. For variety B73, the response increases in growth temperature. For variety Oh43, the response decreases in growth temperature, by an almost equal amount. So simply averaging over the two, it does not appear that growth temperature has a main effect.

If variety B73 were five times as important (perhaps it is five times as prevalant in Minnesota), then you might want to weight B73 five times Oh43. With that weighting, the response would increase with growth temperature.

> interactplot(gTemp,variety,log(amylase),data=AmylaseActivity,
+              conf=.95,sigma2=.00546,df=64)
Interaction plot gTemp:variety for amylase activity

Figure 34.3: Interaction plot gTemp:variety for amylase activity

The growth temperature by analysis temperature interaction is marginally significant. If we look at it we find the activity rises more slowly with analysis temperature but drops more quickly when the growth temperature is low.

> interactplot(aTemp,gTemp,log(amylase),data=AmylaseActivity,
+             sigma2=.00546,df=64,conf=.95,jitter=.2)
Interactin plot for aTemp:gTemp for amylase data

Figure 34.4: Interactin plot for aTemp:gTemp for amylase data

34.1.3 What happens with unequal weights?

This next bit is heavy going; it is OK to just remember

  1. Parameterizations are abritrary.
  2. Parameters for terms that are included in other terms in the model can be affected by weightings of other factors.
  3. Stick with hierarchical models.

and just skip to the next section.

Still here? OK, now we’ll look at the data in Table 8.12. We can see that the estimated effects for b are zero, but these are zero in the presence of the interaction. (n=1 so no df for error so no SE or confidence intervals.)

> library(cfcdae)
> y <- c(120,144,192,168,168,120)
> a <- factor(c(1,2,3,1,2,3))
> b <- factor(c(1,1,1,2,2,2))
> fit.standard <- lm(y~a*b)
> summary(fit.standard)

Call:
lm(formula = y ~ a * b)

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.52e+02        NaN     NaN      NaN
a1          -8.00e+00        NaN     NaN      NaN
a2           4.00e+00        NaN     NaN      NaN
b1          -3.78e-15        NaN     NaN      NaN
a1:b1       -2.40e+01        NaN     NaN      NaN
a2:b1       -1.20e+01        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 5 and 0 DF,  p-value: NA

Don’t try this at home, kids, but now we’ll force R to weight the rows (1,2,1) instead of the usual (1,1,1). Ie, for row effects, the first, plus twice the second, plus the third add to zero.

The first thing we have to do is construct what R calls a “contrast” function.

> wts <- c(1,2,1)
> contr.weights.121 <-
+   function (n, contrasts = TRUE, sparse = FALSE) 
+ {
+   if (length(n) <= 1L) {
+     if (is.numeric(n) && length(n) == 1L && n > 1L) 
+       levels <- seq_len(n)
+     else stop("not enough degrees of freedom to define contrasts")
+   }
+   else levels <- n
+   levels <- as.character(levels)
+   nlevels <- length(levels)
+   if(!is.numeric(wts) || length(wts) != nlevels || min(wts) <= 0)
+       stop("wts must be vector of positive weights for each level of factor")
+ 
+   cont <- diag(rep(1,nlevels))
+                
+   if (contrasts) {
+     cont <- cont[, -nlevels, drop = FALSE]
+     colnames(cont) <- NULL
+   }
+   cont[length(levels),] <- -wts[1:(nlevels-1)]/wts[nlevels]
+   cont
+ }

Honestly, it took me some time to figure out how to do that; in the end it’s a modification of the contr.sum() contrasts that we normally use.

Now refit, forcing the rows to have those weights. Here, factor B has non-zero coefficients.

> fit.wts <- lm(y~a*b,contrasts=c(a=contr.weights.121,b=contr.sum))
> summary(fit.wts)

Call:
lm(formula = y ~ a * b, contrasts = c(a = contr.weights.121, 
    b = contr.sum))

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      153        NaN     NaN      NaN
a1                -9        NaN     NaN      NaN
a2                 3        NaN     NaN      NaN
b1                -3        NaN     NaN      NaN
a1:b1            -21        NaN     NaN      NaN
a2:b1             -9        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 5 and 0 DF,  p-value: NA

But both models have the same fitted values.

> library(emmeans)
> emmeans(fit.standard,~a:b)
Warning in qt((1 - level)/adiv, df): NaNs produced
 a b emmean  SE df lower.CL upper.CL
 1 1    120 NaN  0      NaN      NaN
 2 1    144 NaN  0      NaN      NaN
 3 1    192 NaN  0      NaN      NaN
 1 2    168 NaN  0      NaN      NaN
 2 2    168 NaN  0      NaN      NaN
 3 2    120 NaN  0      NaN      NaN

Confidence level used: 0.95 
> emmeans(fit.wts,~a:b)
Warning in qt((1 - level)/adiv, df): NaNs produced
 a b emmean  SE df lower.CL upper.CL
 1 1    120 NaN  0      NaN      NaN
 2 1    144 NaN  0      NaN      NaN
 3 1    192 NaN  0      NaN      NaN
 1 2    168 NaN  0      NaN      NaN
 2 2    168 NaN  0      NaN      NaN
 3 2    120 NaN  0      NaN      NaN

Confidence level used: 0.95 

If we don’t include the interaction term, then the coefficients for b are the same regardless of how a is weighted.

> summary(lm(y~a+b))

Call:
lm(formula = y ~ a + b)

Residuals:
  1   2   3   4   5   6 
-24 -12  36  24  12 -36 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.520e+02  1.833e+01   8.292   0.0142 *
a1          -8.000e+00  2.592e+01  -0.309   0.7868  
a2           4.000e+00  2.592e+01   0.154   0.8915  
b1          -6.020e-15  1.833e+01   0.000   1.0000  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44.9 on 2 degrees of freedom
Multiple R-squared:  0.04545,   Adjusted R-squared:  -1.386 
F-statistic: 0.03175 on 3 and 2 DF,  p-value: 0.9903
> summary(lm(y~a+b,contrasts = c(a = contr.weights.121, 
+     b = contr.sum)))

Call:
lm(formula = y ~ a + b, contrasts = c(a = contr.weights.121, 
    b = contr.sum))

Residuals:
  1   2   3   4   5   6 
-24 -12  36  24  12 -36 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.530e+02  1.944e+01   7.869   0.0158 *
a1          -9.000e+00  2.970e+01  -0.303   0.7905  
a2           3.000e+00  1.944e+01   0.154   0.8915  
b1          -6.020e-15  1.833e+01   0.000   1.0000  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 44.9 on 2 degrees of freedom
Multiple R-squared:  0.04545,   Adjusted R-squared:  -1.386 
F-statistic: 0.03175 on 3 and 2 DF,  p-value: 0.9903

34.2 Pooling

Pooling is the practice of dropping non-significant terms from a model. Their SS and df thus appear in (get added to) error. They are said to be pooled into error.

I only recommend doing this if

  1. You have very few degrees of freedom for error and the F ratio for the potentially pooled term is less than 2, or
  2. The data are unbalanced (or more generally, your model terms are not all orthogonal as might happen with polynomial models) and you want to look at coefficients.

34.2.1 Carbon wire

Let’s look at the carbon wire data again.

> wire <- read.table("carbonwire.txt",header=TRUE)
> names(wire)
[1] "resist" "degas"  "temp"   "diff"  
> wire <- within(wire, {degas <- factor(degas); temp <- factor(temp);
+    diff <- factor(diff)})
> wire.fit <- lm(resist~degas*temp*diff,data=wire)
> anova(wire.fit)
Analysis of Variance Table

Response: resist
                Df Sum Sq Mean Sq  F value    Pr(>F)    
degas            1   0.47   0.467   1.5918    0.2125    
temp             2 410.69 205.346 699.6024 < 2.2e-16 ***
diff             2  80.54  40.270 137.1989 < 2.2e-16 ***
degas:temp       2   0.31   0.157   0.5342    0.5892    
degas:diff       2   0.70   0.351   1.1957    0.3104    
temp:diff        4  14.81   3.704  12.6177 2.566e-07 ***
degas:temp:diff  4   0.87   0.219   0.7450    0.5656    
Residuals       54  15.85   0.294                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No term involving degas was significant. Some might argue to remove those terms and refit as below. I would say not to do that. The data are balanced, so removing terms will not change coefficient estimates, and we already have 54 df for error. We don’t need to risk biasing our MSE.

> wire.fit.pooled <- lm(resist~temp*diff,data=wire)
> anova(wire.fit.pooled)
Analysis of Variance Table

Response: resist
          Df Sum Sq Mean Sq F value    Pr(>F)    
temp       2 410.69 205.346 710.521 < 2.2e-16 ***
diff       2  80.54  40.270 139.340 < 2.2e-16 ***
temp:diff  4  14.81   3.704  12.815 1.085e-07 ***
Residuals 63  18.21   0.289                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

34.3 Single replications

Single replication experiments mean that we have no df for error, so we need to assemble some kind of surrogate estimate of error.

The most common approach is to drop one (or more) high-order interaction effects out of the model, thus pooling them into error. This is plausible based on the observation that interactions are typically smaller than main effects, and higher order interactions are typically the smallest.

Note that “typically smaller” does not mean always smaller. Whenever we construct a surrogate estimate of error, we run the risk of biasing our estimate of error.

34.3.1 Zinc retention

Let’s go back to the zinc retention data. That three-factor experiment had just one replication, so we could decide to use the three-factor interaction as error.

> data(ZincRetention)
> zinc.3out <- lm(retention~(m.protein+d.zinc+m.zinc)^2,data=ZincRetention)
> anova(zinc.3out)
Analysis of Variance Table

Response: retention
                 Df  Sum Sq Mean Sq  F value   Pr(>F)   
m.protein         3  827.25  275.75  27.1230 0.011261 * 
d.zinc            1  756.25  756.25  74.3852 0.003278 **
m.zinc            1 1225.00 1225.00 120.4918 0.001619 **
m.protein:d.zinc  3   28.25    9.42   0.9262 0.524375   
m.protein:m.zinc  3  298.50   99.50   9.7869 0.046563 * 
d.zinc:m.zinc     1    4.00    4.00   0.3934 0.574988   
Residuals         3   30.50   10.17                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA we see that there are only 3 df for error ( this is the three factor interaction). That is not many, so we look to see if we can pool. Both m.protein:d.zinc and d.zinc:m.zinc have F statistics less than 2 (and are non-significant), so we refit without them. We are now up to 7 df for error, which is better.

> zinc.reduced <- lm(retention~m.protein*m.zinc+d.zinc,data=ZincRetention)
> anova(zinc.reduced)
Analysis of Variance Table

Response: retention
                 Df  Sum Sq Mean Sq F value    Pr(>F)    
m.protein         3  827.25  275.75  30.761 0.0002106 ***
m.zinc            1 1225.00 1225.00 136.653 7.578e-06 ***
d.zinc            1  756.25  756.25  84.362 3.736e-05 ***
m.protein:m.zinc  3  298.50   99.50  11.100 0.0047289 ** 
Residuals         7   62.75    8.96                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What the pooling really was saying is that the two additional two-factor interactions we dropped are of roughly the same size (from an MS perspective) as the three-factor interaction. If the three-factor interaction was ignorable, then great. If the three-factor interaction was actually something beyond random noise, then all that as happened is that we are now more certain (more df) of a biased estimate of error.

Can we check this? In the full factorial model, the expected values of the residuals are zero, because we fit all the treatment means. In the reduced model, the expected values of the residuals follow the interaction(s) that were pooled into error.

Stipulating that residual plots with few df for error are difficult to interpret, we can look at our residuals. If they are not always centered around zero (ie, if there is a pattern in the mean), then our surrogate estimate of error is biased upwards. If it is biased upwards, then we are doing conservative tests; the reported p-values are likely larger than they should be.

Let’s plot the residuals from the two-way fit. That looks like a pattern to me, so the tests are probably conservative.

> plot(zinc.3out,which=1)
Residual plot for two-way model of zinc retention

Figure 34.5: Residual plot for two-way model of zinc retention

34.3.2 Page faults

The page fault data are a single replication of the number of page faults a computer has as a function of four factors: amount of RAM, size of the program, initialization sequence, and paging algorithm.

We note that the response faults is a non-negative variable that ranges over several orders of magnitude. The first quartile is 4.25 times the minimum; the median is 5.75 times the first quartile; the third quartile is 4.25 times the median; and the maximum is 5.13 times third quartile. These data are practically begging for a log transformation.

> data(PageFaults)
> head(PageFaults)
  alg init   size    ram faults
1   1    1  large  large     32
2   1    1  large medium     48
3   1    1  large  small    538
4   1    1 medium  large     53
5   1    1 medium medium     81
6   1    1 medium  small   1901
> summary(PageFaults$faults)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   32.0   136.0   782.5  2521.2  3333.5 17117.0 

To get started, we need to pool some terms into error. One reasonable place to start is to remove only the four-factor interaction. That is only going to be 8 df for error, so another possibility would be to pool all three and four-factor interactions. The latter has the advantage of more df for error and the risk of pooling an active term into error.

> faults.out4 <- lm(faults~(alg+init+size+ram)^3,data=PageFaults)
> faults.out34 <- lm(faults~(alg+init+size+ram)^2,data=PageFaults)

You can’t really see much in the residuals plot of faults.out4.

> plot(faults.out4,which=1)
Residuals plot of three-way model of page faults

Figure 34.6: Residuals plot of three-way model of page faults

However, Box-Cox is really convinced of the need to transform, with the optimum somewhere around power .1. Log is right at the edge of the interval, and 1 is so far away it’s funny. We really need to transform.

> car::boxCox(faults.out4,lambda=seq(-.5,.5,.01))
Box-Cox plot of three-way model of page faults

Figure 34.7: Box-Cox plot of three-way model of page faults

Now let’s look at the two-way model. Wow! When you do not include all terms in your model, some non-ignorable terms can drop down into error giving you systematic patterns in the residuals. I call this pattern the flopping fish.

This can often be fixed via transformation, but you sometimes need additional predictive terms.

> plot(faults.out34,which=1)
Residuals plot of two-way model of page faults

Figure 34.8: Residuals plot of two-way model of page faults

Box-Cox says that log is the right thing to do.

> car::boxCox(faults.out34)
Box-Cox plot of two-way model of page faults

Figure 34.9: Box-Cox plot of two-way model of page faults

OK, so log transformation it is. Let’s refit our two potential models with logged data.

> logfaults.out4 <- lm(log(faults)~(alg+init+size+ram)^3,data=PageFaults)
> logfaults.out34 <- lm(log(faults)~(alg+init+size+ram)^2,data=PageFaults)

Now we’ll check residuals for the transformed data. Looks OK when we just drop the four way.

> plot(logfaults.out4,which=1)
Residual plot of three way logged paged faults

Figure 34.10: Residual plot of three way logged paged faults

But the plot of the two-way model has a problem: there is a substantial pattern with the local average of the residuals moving up and down (the variability around that pattern is fairly constant, but that pattern …). This means that one or more of the three-way terms is large and inflating the MSE.

> plot(logfaults.out34,which=1)
Residual plot of two way logged paged faults

Figure 34.11: Residual plot of two way logged paged faults

OK, we can now start looking at ANOVA. We used the three-way model with logged data.

There are only 8 df for error, which isn’t very many. There are two non-significant terms with F statistics less than 2 that we can pool into error and still maintain hierarchy.

> anova(logfaults.out4)
Analysis of Variance Table

Response: log(faults)
              Df Sum Sq Mean Sq    F value    Pr(>F)    
alg            1  2.502   2.502   877.9857 1.824e-09 ***
init           2 24.639  12.320  4323.4054 7.300e-13 ***
size           2 41.692  20.846  7315.5596 8.919e-14 ***
ram            2 92.697  46.349 16265.4282 3.654e-15 ***
alg:init       2  0.018   0.009     3.0947  0.101042    
alg:size       2  0.022   0.011     3.8979  0.065794 .  
alg:ram        2  0.060   0.030    10.5350  0.005736 ** 
init:size      4  0.829   0.207    72.7278 2.511e-06 ***
init:ram       4  9.510   2.378   834.3927 1.632e-10 ***
size:ram       4  0.504   0.126    44.2447 1.689e-05 ***
alg:init:size  4  0.015   0.004     1.2778  0.354761    
alg:init:ram   4  0.026   0.007     2.2818  0.149073    
alg:size:ram   4  0.004   0.001     0.3511  0.836454    
init:size:ram  8  1.052   0.132    46.1535 6.726e-06 ***
Residuals      8  0.023   0.003                         
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s take those two terms out. Note the use of update() to change a model without needing to type everything in again.

This final model has some highly significant terms and a handful of marginally signficant terms. However, they have F ratios larger than 2 and we have 16 df for error, so we stay with this model.

> logfaults.final <- update(logfaults.out4,~.-alg:size:ram-alg:init:size)
> anova(logfaults.final)
Analysis of Variance Table

Response: log(faults)
              Df Sum Sq Mean Sq    F value    Pr(>F)    
alg            1  2.502   2.502   967.7896 9.683e-16 ***
init           2 24.639  12.320  4765.6206 < 2.2e-16 ***
size           2 41.692  20.846  8063.8244 < 2.2e-16 ***
ram            2 92.697  46.349 17929.1216 < 2.2e-16 ***
alg:init       2  0.018   0.009     3.4113 0.0583516 .  
alg:size       2  0.022   0.011     4.2966 0.0320944 *  
alg:ram        2  0.060   0.030    11.6126 0.0007664 ***
init:size      4  0.829   0.207    80.1667 2.243e-10 ***
init:ram       4  9.510   2.378   919.7377 < 2.2e-16 ***
size:ram       4  0.504   0.126    48.7702 9.148e-09 ***
alg:init:ram   4  0.026   0.007     2.5151 0.0825350 .  
init:size:ram  8  1.052   0.132    50.8743 6.241e-10 ***
Residuals     16  0.041   0.003                         
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We were primarily interested in the effects of algorithm. Most of the interactions involving algorithm are marginally significant, but alg:ram has a p-value less than .001. Use emmeans() to see what is happening.

Algorithm 1 produces fewer mean faults than algorithm 2 at all levels of RAM, but it’s advantage for small RAM (.34) is less than its advantage at medium or large (.48).

> margin <- emmeans::emmeans(logfaults.final,~alg:ram)
NOTE: Results may be misleading due to involvement in interactions
> margin
 alg ram    emmean      SE df lower.CL upper.CL
 1   large   4.710 0.01695 16    4.674    4.746
 2   large   5.190 0.01695 16    5.154    5.226
 1   medium  6.227 0.01695 16    6.191    6.263
 2   medium  6.702 0.01695 16    6.666    6.738
 1   small   7.990 0.01695 16    7.954    8.026
 2   small   8.326 0.01695 16    8.290    8.362

Results are averaged over the levels of: init, size 
Results are given on the log (not the response) scale. 
Confidence level used: 0.95 
> matrix(predict(margin),ncol=2,byrow=TRUE)
         [,1]     [,2]
[1,] 4.710091 5.190324
[2,] 6.227057 6.702067
[3,] 7.989746 8.325972
> linear.contrast(logfaults.final,alg,c(-1,1),byvar=ram)
  estimates         se  t-value      p-value  lower-ci  upper-ci
1 0.4802324 0.02396805 20.03635 9.304802e-13 0.4294224 0.5310424
2 0.4750106 0.02396805 19.81849 1.101281e-12 0.4242006 0.5258206
3 0.3362258 0.02396805 14.02808 2.077443e-10 0.2854158 0.3870358
attr(,"class")
[1] "linear.contrast"

Finally, we return the our old adage that interaction depends on scale. Look at the two side by side plots below. For the raw data, many interactions are the size of main effects. On the log scale, many interactions are small relative to main effects. The transformation has not only improved the behavior of the residuals, it has simplified the model.

> sidebyside(faults.out4)
Side by side plot of page faults on original scale

Figure 34.12: Side by side plot of page faults on original scale

> sidebyside(logfaults.out4)
Side by side plot fo page faults on log scale

Figure 34.13: Side by side plot fo page faults on log scale