Chapter 35 Unbalanced Factorials

35.1 Grades data

Forty-five students are in a class. Some are undergraduates, some are graduate students; some are in lab 1, some are in lab 2; some take exam form 1, some take exam form 2. (The exams and the labs are haphazard, though not really random. Status is certainly not random. Still, it’s an interesting data set.)

The question at hand is whether the two test forms are equivalent. Student status and lab (there were two different TAs) are of secondary interest.

Let’s check the counts. We see that they \(n_{ijk}\) values differ, so these data are unbalanced.

> library(cfcdae)
> load("ExamData.rda")
> replications(~status:exam:section,data=ExamData)
$`status:exam:section`
, , section = 1

           exam
status      1 2
  Grad      5 7
  UnderGrad 5 4

, , section = 2

           exam
status      1 2
  Grad      6 8
  UnderGrad 6 4

Fit the full model. The sums of squares from anova() are sequential or Type I.

The three factor interaction is entered last. It is adjusted for all other factors and is thus Type III. It is adjusted for the largest hierarchical submodel not including it (ie, the main effects and two factor interactions), so it is also Type II. Jackpot: this SS is simulataneously Types I, II, and III.

Consider exam:section. It is adjusted for the main effects and the other two factor interactions (the largest hierarchical model not including it). Therefore, it is Type II in addition to Type I. It is not however, Type III.

The other sums of squares are Type I, but none of them is Type II or Type III.

> with(ExamData,hist(grades))
Histogram of exam grades

Figure 35.1: Histogram of exam grades

> fit1 <- lm(grades~status*exam*section,data=ExamData)
> anova(fit1)
Analysis of Variance Table

Response: grades
                    Df Sum Sq Mean Sq F value    Pr(>F)    
status               1 5622.7  5622.7 26.4818 8.976e-06 ***
exam                 1    7.5     7.5  0.0352   0.85229    
section              1   53.9    53.9  0.2537   0.61746    
status:exam          1  143.1   143.1  0.6739   0.41694    
status:section       1  725.2   725.2  3.4158   0.07258 .  
exam:section         1  164.5   164.5  0.7748   0.38443    
status:exam:section  1   32.3    32.3  0.1522   0.69871    
Residuals           37 7855.9   212.3                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: I checked residuals, and everything seems OK (just a slight hint of decreasing variance, but nothing that Box-Cox thinks should be transformed). However, consider the following. Look at a new response, which is lost = 100 - grades (ie, the number of points lost). With that response, we see increasing rather than decreasing variance, and the square root looks better than the original scale, although it is not quite significant via Box-Cox.
Can you see why transformations work on lost but not on grades?

In order to get Type II for a main effect, we need to force a two-way interaction in the model before that main effect.

In this model we have status adjusted for section, exam, and their interaction (the largest hierarchical submodel not including status). Thus this is a Type II SS.

In this order, section:status is also Type II.

Note that the SS for the three-way has not changed. Type I sums of squares depend on what terms precede a term in a model, but not on the order of the preceding terms.

Likewise, the coefficients on the terms will be the same in the different orders, as long as you don’t force a non-hierarchical order.

> trms <- terms(grades~exam*section+exam*section*status,keep.order=TRUE)
> fit2 <- lm(trms, data=ExamData)
> anova(fit2)
Analysis of Variance Table

Response: grades
                    Df Sum Sq Mean Sq F value    Pr(>F)    
exam                 1  203.0   203.0  0.9561   0.33451    
section              1   72.4    72.4  0.3410   0.56282    
exam:section         1  239.0   239.0  1.1259   0.29553    
status               1 5445.2  5445.2 25.6458 1.157e-05 ***
exam:status          1  149.5   149.5  0.7042   0.40677    
section:status       1  607.7   607.7  2.8622   0.09909 .  
exam:section:status  1   32.3    32.3  0.1522   0.69871    
Residuals           37 7855.9   212.3                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(fit1,"status")
     Grad UnderGrad 
 11.56696 -11.56696 
> model.effects(fit2,"status")
     Grad UnderGrad 
 11.56696 -11.56696 

We can get Type II sums of squares for every term by fitting a lot of models and forcing different orders.

Or we could just use the car::Anova() function. Note that the Type II sums of squares are the same regardless of the order of the terms. Also note that the Type II SS computed above match with this table.

So Type II is a somewhat tedious concept, but it’s easy peasy with the Anova() function.

In terms of testing, we work from the top down. The three-way is not significant, so we can test the two-way interactions. None of them is significant, so we can test the main effects. Only the main effect of status is significant.

If we had found that the status:exam interaction was significant, we would not test either status or exam, because there are included in a term we kept. However, section is not included in a retained term, so we could test it.

> car::Anova(fit1,type=2)
Anova Table (Type II tests)

Response: grades
                    Sum Sq Df F value    Pr(>F)    
status              5445.2  1 25.6458 1.157e-05 ***
exam                  11.3  1  0.0533   0.81867    
section               51.0  1  0.2403   0.62685    
status:exam          129.6  1  0.6105   0.43956    
status:section       607.7  1  2.8622   0.09909 .  
exam:section         164.5  1  0.7748   0.38443    
status:exam:section   32.3  1  0.1522   0.69871    
Residuals           7855.9 37                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> car::Anova(fit2,type=2)
Anova Table (Type II tests)

Response: grades
                    Sum Sq Df F value    Pr(>F)    
exam                  11.3  1  0.0533   0.81867    
section               51.0  1  0.2403   0.62685    
exam:section         164.5  1  0.7748   0.38443    
status              5445.2  1 25.6458 1.157e-05 ***
exam:status          129.6  1  0.6105   0.43956    
section:status       607.7  1  2.8622   0.09909 .  
exam:section:status   32.3  1  0.1522   0.69871    
Residuals           7855.9 37                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Parameters in one term can depend on what other terms are in the model when the data are unbalanced. In this case, the coefficients for status only change a little, but it’s not just a little change in all cases.

> fit3 <- lm(grades~status,data=ExamData)
> model.effects(fit3,"status")
     Grad UnderGrad 
 11.31579 -11.31579 

We like contrasts, let’s do a contrast.

> linear.contrast(fit1,status,c(-1,1))
  estimates       se   t-value      p-value  lower-ci  upper-ci
1 -23.13393 4.463302 -5.183142 7.999982e-06 -32.17744 -14.09042
attr(,"class")
[1] "linear.contrast"

The F test is the square of the t test. The F you get for comparing the levels of status is not any of the F tests we have seen until now.

The SS for a contrast is the increase in error SS that we would obtain if we removed the contrast df from the model (meaning we forced the coefficients to fit as well as possible subject to the contrast value being zero), leaving all the other terms in the model. Note that since we are using a contrast in a main effect, we are taking the main effect of status out of the model, but leaving in all of the interactions that include status. This is OK if you are really interested in testing the corresponding equally weighted hypotheses about those parameters, but it does violate model hierarchy. This SS is a Type III SS.

> linear.contrast(fit1,status,c(-1,1))[,"t-value"]^2
[1] 26.86496

car::Anova() can compute the full Type III table. All methods are in agreement: status is very important and no other term is needed.

> car::Anova(fit1,type=3)
Anova Table (Type III tests)

Response: grades
                    Sum Sq Df  F value Pr(>F)    
(Intercept)         165406  1 779.0286 <2e-16 ***
status                5704  1  26.8650  8e-06 ***
exam                     5  1   0.0241 0.8776    
section                105  1   0.4926 0.4871    
status:exam            122  1   0.5748 0.4532    
status:section         601  1   2.8304 0.1009    
exam:section           184  1   0.8648 0.3584    
status:exam:section     32  1   0.1522 0.6987    
Residuals             7856 37                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Particularly in a large model, it can be tedious to do the backward testing starting with the highest order interaction and maintaining hierarchy.

We can automate this using step(), It finds the hierarchical model with the minimum AIC that can be found by eliminating terms. In general, AIC is a bit more willing to keep variables than are F-tests at the usual significance levels. In this case, the selected model has section*status, not just status.

Another thing to note: the SS for section:status in the last step is 741 (adusted only for section and status); that is substantially larger than the 601 or 603 we were getting in Types II and III. Unbalanced data, remember?

> step(fit1,direction="backward")
Start:  AIC=248.31
grades ~ status * exam * section

                      Df Sum of Sq    RSS    AIC
- status:exam:section  1    32.309 7888.3 246.49
<none>                             7855.9 248.31

Step:  AIC=246.49
grades ~ status + exam + section + status:exam + status:section + 
    exam:section

                 Df Sum of Sq    RSS    AIC
- status:exam     1    129.63 8017.9 245.22
- exam:section    1    164.50 8052.8 245.42
<none>                        7888.3 246.49
- status:section  1    607.70 8496.0 247.83

Step:  AIC=245.22
grades ~ status + exam + section + status:section + exam:section

                 Df Sum of Sq    RSS    AIC
- exam:section    1    158.52 8176.4 244.11
<none>                        8017.9 245.22
- status:section  1    627.58 8645.5 246.62

Step:  AIC=244.11
grades ~ status + exam + section + status:section

                 Df Sum of Sq    RSS    AIC
- exam            1     11.32 8187.7 242.17
<none>                        8176.4 244.11
- status:section  1    744.68 8921.1 246.03

Step:  AIC=242.17
grades ~ status + section + status:section

                 Df Sum of Sq    RSS    AIC
<none>                        8187.7 242.17
- status:section  1    741.89 8929.6 244.07

Call:
lm(formula = grades ~ status + section + status:section, data = ExamData)

Coefficients:
     (Intercept)           status1          section1  status1:section1  
          62.623            11.562            -1.720             4.119  

For these data, log(N) is about 3.8. If you wanted to select a model using BIC, you could do so by setting k. BIC almost gets rid of section:status, but not quite.

> step(fit1,direction="backward",k=3.8)
Start:  AIC=262.71
grades ~ status * exam * section

                      Df Sum of Sq    RSS    AIC
- status:exam:section  1    32.309 7888.3 259.09
<none>                             7855.9 262.71

Step:  AIC=259.09
grades ~ status + exam + section + status:exam + status:section + 
    exam:section

                 Df Sum of Sq    RSS    AIC
- status:exam     1    129.63 8017.9 256.02
- exam:section    1    164.50 8052.8 256.22
- status:section  1    607.70 8496.0 258.63
<none>                        7888.3 259.09

Step:  AIC=256.02
grades ~ status + exam + section + status:section + exam:section

                 Df Sum of Sq    RSS    AIC
- exam:section    1    158.52 8176.4 253.11
- status:section  1    627.58 8645.5 255.62
<none>                        8017.9 256.02

Step:  AIC=253.11
grades ~ status + exam + section + status:section

                 Df Sum of Sq    RSS    AIC
- exam            1     11.32 8187.7 249.37
<none>                        8176.4 253.11
- status:section  1    744.68 8921.1 253.23

Step:  AIC=249.37
grades ~ status + section + status:section

                 Df Sum of Sq    RSS    AIC
<none>                        8187.7 249.37
- status:section  1    741.89 8929.6 249.47

Call:
lm(formula = grades ~ status + section + status:section, data = ExamData)

Coefficients:
     (Intercept)           status1          section1  status1:section1  
          62.623            11.562            -1.720             4.119  

35.2 Highly unbalanced example

We are going to create a highly unbalanced 2x2 factorial; two of the treatments have only one replication, but the other two have 20 and 25. We will make up some responses that include main effects but no interactions. Then the fun really begins. (Note that we set the random number seed so that you can recreate this if you like.)

> muij<-rep(1:4,c(20,1,1,25))
> a<-factor(rep(c(1,2,1,2),c(20,1,1,25)))
> b<-factor(rep(c(1,1,2,2),c(20,1,1,25)))
> replications(~a:b,data.frame(a,b))
$`a:b`
   b
a    1  2
  1 20  1
  2  1 25
> set.seed(1234)
> y<-muij+1.5*rnorm(47)

Start by fitting the two factor model. Test with Type II; nothing significant. Test with Type III; nothing significant.

It seems like nothing is going on.

> twowayfit <- lm(y~a*b)
> car::Anova(twowayfit,type=2)
Anova Table (Type II tests)

Response: y
          Sum Sq Df F value Pr(>F)
a          2.777  1  1.4729 0.2315
b          3.074  1  1.6306 0.2085
a:b        0.263  1  0.1396 0.7105
Residuals 81.064 43               
> car::Anova(twowayfit,type=3)
Anova Table (Type III tests)

Response: y
            Sum Sq Df F value    Pr(>F)    
(Intercept) 32.082  1 17.0178 0.0001664 ***
a            2.785  1  1.4772 0.2308399    
b            3.083  1  1.6351 0.2078515    
a:b          0.263  1  0.1396 0.7104917    
Residuals   81.064 43                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now fit the model simply treating the design as four groups. It is highly significant!

> allinone <- lm(y~a:b)
> anova(allinone)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value   Pr(>F)    
a:b        3 68.190 22.7301  12.057 7.37e-06 ***
Residuals 43 81.064  1.8852                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What is happening is that the two factors explain essentially the same variation (row differences or column differences really boil down to treatment 1,1 verus treatment 2,2 in terms of sums of squares). Thus if you have a, you don’t need b. If you have b, you don’t need a. But either one of them can explain a lot of variability.

Moral: examine the overall fit of the model as well as Type II and Type III tests.

35.3 Proportional balance

There are some situations where sums of squares do not depend on the order of terms, even when the data do not have equal replications. One of these situations is called proportional balance.

In a two-way case, the counts in each row are constant multiples of each other, and the counts in each column are constant multiples of each other.

Here, the second column is three times the first column, and the third column is four times the first. Likewise, the second row is twice the first row, and the third row is three times the first row.

> nij <- c(1,2,3,3,6,9,4,8,12)
> a <- factor(rep(c(1,2,3,1,2,3,1,2,3),nij))
> b <- factor(rep(c(1,1,1,2,2,2,3,3,3),nij))
> replications(~a:b,data.frame(a,b))
$`a:b`
   b
a   1 2  3
  1 1 3  4
  2 2 6  8
  3 3 9 12

Add some data and observe that order does not matter. Well, Type III is different, but that’s another warning for non-hierarchical models.

> y <- rnorm(length(a))
> abfit <- lm(y~a*b)
> anova(abfit)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)  
a          2  0.141  0.0703  0.0807 0.92259  
b          2  8.138  4.0689  4.6712 0.01518 *
a:b        4  5.848  1.4619  1.6783 0.17456  
Residuals 39 33.971  0.8711                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> bafit <- lm(y~b*a)
> anova(bafit)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value  Pr(>F)  
b          2  8.138  4.0689  4.6712 0.01518 *
a          2  0.141  0.0703  0.0807 0.92259  
b:a        4  5.848  1.4619  1.6783 0.17456  
Residuals 39 33.971  0.8711                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> car::Anova(abfit,type=3)
Anova Table (Type III tests)

Response: y
            Sum Sq Df F value  Pr(>F)  
(Intercept)  1.202  1  1.3794 0.24732  
a            0.233  2  0.1337 0.87522  
b            6.001  2  3.4446 0.04192 *
a:b          5.848  4  1.6783 0.17456  
Residuals   33.971 39                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1