Chapter 35 Unbalanced Factorials
35.1 Grades data
Forty-five students are in a class. Some are undergraduates, some are graduate students; some are in lab 1, some are in lab 2; some take exam form 1, some take exam form 2. (The exams and the labs are haphazard, though not really random. Status is certainly not random. Still, it’s an interesting data set.)
The question at hand is whether the two test forms are equivalent. Student status and lab (there were two different TAs) are of secondary interest.
Let’s check the counts. We see that the \(n_{ijk}\) values differ, so these data are unbalanced.
> library(cfcdae)
> load("ExamData.rda")
> replications(~status:exam:section,data=ExamData)
$`status:exam:section`
, , section = 1
exam
status 1 2
Grad 5 7
UnderGrad 5 4
, , section = 2
exam
status 1 2
Grad 6 8
UnderGrad 6 4
Fit the full model. The sums of squares from anova() are sequential, or Type I.
The three-factor interaction is entered last. It is adjusted for all other factors and is thus Type III. It is also adjusted for the largest hierarchical submodel not including it (i.e., the main effects and two-factor interactions), so it is Type II as well. Jackpot: this SS is simultaneously Types I, II, and III.
Consider exam:section. It is adjusted for the main effects and the other two-factor interactions (the largest hierarchical model not including it). Therefore, it is Type II in addition to Type I. It is not, however, Type III. (A small sketch verifying this appears after the ANOVA table below.)
The other sums of squares are Type I, but none of them is Type II or Type III.
> with(ExamData,hist(grades))

Figure 35.1: Histogram of exam grades
> fit1 <- lm(grades~status*exam*section,data=ExamData)
> anova(fit1)
Analysis of Variance Table
Response: grades
Df Sum Sq Mean Sq F value Pr(>F)
status 1 5622.7 5622.7 26.4818 8.976e-06 ***
exam 1 7.5 7.5 0.0352 0.85229
section 1 53.9 53.9 0.2537 0.61746
status:exam 1 143.1 143.1 0.6739 0.41694
status:section 1 725.2 725.2 3.4158 0.07258 .
exam:section 1 164.5 164.5 0.7748 0.38443
status:exam:section 1 32.3 32.3 0.1522 0.69871
Residuals 37 7855.9 212.3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
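As a quick check of the claim that the sequential SS for exam:section is also its Type II SS, we can compare the largest hierarchical submodel without exam:section against the same model with it added back. This comparison is my own illustration; the names smaller and bigger are arbitrary.
> smaller <- lm(grades ~ status + exam + section + status:exam + status:section,
+               data=ExamData)
> bigger <- update(smaller, . ~ . + exam:section)
> anova(smaller, bigger)   # the sum of squares for this comparison should be 164.5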
Note: I checked residuals, and everything seems OK (just a slight hint of decreasing variance, but nothing that Box-Cox thinks should be transformed). However, consider the following. Look at a new response, lost = 100 - grades (i.e., the number of points lost). With that response, we see increasing rather than decreasing variance, and the square root looks better than the original scale, although it is not quite significant via Box-Cox. Can you see why transformations work on lost but not on grades?
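Here is a minimal sketch of that check. It assumes no one scored a perfect 100 (Box-Cox needs a strictly positive response), and it uses MASS::boxcox() as the Box-Cox tool, which may differ from what was used originally.
> ExamData$lost <- 100 - ExamData$grades
> fitlost <- lm(lost ~ status*exam*section, data=ExamData)
> plot(fitted(fitlost), residuals(fitlost))   # look for spread increasing with the fitted values
> MASS::boxcox(fitlost)                       # profile likelihood for the power transformation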
In order to get a Type II SS for a main effect, we need to force the interaction of the other two factors into the model before that main effect. In the model below we have status adjusted for section, exam, and their interaction (the largest hierarchical submodel not including status). Thus this is a Type II SS. In this order, section:status is also Type II.
Note that the SS for the three-way has not changed. Type I sums of squares depend on what terms precede a term in a model, but not on the order of the preceding terms. Likewise, the coefficients on the terms will be the same in the different orders, as long as you don’t force a non-hierarchical order.
> trms <- terms(grades~exam*section+exam*section*status,keep.order=TRUE)
> fit2 <- lm(trms, data=ExamData)
> anova(fit2)
Analysis of Variance Table
Response: grades
Df Sum Sq Mean Sq F value Pr(>F)
exam 1 203.0 203.0 0.9561 0.33451
section 1 72.4 72.4 0.3410 0.56282
exam:section 1 239.0 239.0 1.1259 0.29553
status 1 5445.2 5445.2 25.6458 1.157e-05 ***
exam:status 1 149.5 149.5 0.7042 0.40677
section:status 1 607.7 607.7 2.8622 0.09909 .
exam:section:status 1 32.3 32.3 0.1522 0.69871
Residuals 37 7855.9 212.3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(fit1,"status")
Grad UnderGrad
11.56696 -11.56696
> model.effects(fit2,"status")
Grad UnderGrad
11.56696 -11.56696
We can get Type II sums of squares for every term by fitting a lot of models and forcing different orders. Or we could just use the car::Anova() function. Note that the Type II sums of squares are the same regardless of the order of the terms. Also note that the Type II SS computed above match the tables below. So Type II is a somewhat tedious concept, but it’s easy peasy with the Anova() function.
In terms of testing, we work from the top down. The three-way is not significant, so we can test the two-way interactions. None of them is significant, so we can test the main effects. Only the main effect of status is significant. (A drop1() sketch of this top-down procedure appears after the Type II tables below.)
If we had found that the status:exam interaction was significant, we would not test either status or exam, because they are included in a term we kept. However, section is not included in a retained term, so we could still test it.
> car::Anova(fit1,type=2)
Anova Table (Type II tests)
Response: grades
Sum Sq Df F value Pr(>F)
status 5445.2 1 25.6458 1.157e-05 ***
exam 11.3 1 0.0533 0.81867
section 51.0 1 0.2403 0.62685
status:exam 129.6 1 0.6105 0.43956
status:section 607.7 1 2.8622 0.09909 .
exam:section 164.5 1 0.7748 0.38443
status:exam:section 32.3 1 0.1522 0.69871
Residuals 7855.9 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> car::Anova(fit2,type=2)
Anova Table (Type II tests)
Response: grades
Sum Sq Df F value Pr(>F)
exam 11.3 1 0.0533 0.81867
section 51.0 1 0.2403 0.62685
exam:section 164.5 1 0.7748 0.38443
status 5445.2 1 25.6458 1.157e-05 ***
exam:status 129.6 1 0.6105 0.43956
section:status 607.7 1 2.8622 0.09909 .
exam:section:status 32.3 1 0.1522 0.69871
Residuals 7855.9 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
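The same top-down testing can be carried out term by term with drop1(), which respects marginality (it only offers to drop terms not contained in a higher-order term). This sketch is my own addition, not part of the original analysis.
> drop1(fit1, test="F")                            # only the three-way is eligible here
> fit1a <- update(fit1, . ~ . - status:exam:section)
> drop1(fit1a, test="F")                           # now each two-way interaction gets an F test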
Parameters in one term can depend on what other terms are in the model when the data are unbalanced. In this case, the coefficients for status only change a little, but it’s not just a little change in all cases.
> fit3 <- lm(grades~status,data=ExamData)
> model.effects(fit3,"status")
Grad UnderGrad
11.31579 -11.31579
We like contrasts, so let’s do a contrast.
> linear.contrast(fit1,status,c(-1,1))
estimates se t-value p-value lower-ci upper-ci
1 -23.13393 4.463302 -5.183142 7.999982e-06 -32.17744 -14.09042
attr(,"class")
[1] "linear.contrast"
The F test is the square of the t test. The F you get for comparing the levels of status is not any of the F tests we have seen until now. The SS for a contrast is the increase in error SS that we would obtain if we removed the contrast df from the model (meaning we forced the coefficients to fit as well as possible subject to the contrast value being zero), leaving all the other terms in the model. Note that since we are using a contrast in a main effect, we are taking the main effect of status out of the model, but leaving in all of the interactions that include status. This is OK if you are really interested in testing the corresponding equally weighted hypotheses about those parameters, but it does violate model hierarchy. This SS is a Type III SS.
> linear.contrast(fit1,status,c(-1,1))[,"t-value"]^2
[1] 26.86496
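To see the connection to the Type III table numerically, multiply that F by the residual mean square (212.3 from the ANOVA above); this little check is my addition.
> linear.contrast(fit1,status,c(-1,1))[,"t-value"]^2 * 212.3   # about 5704, matching the Type III SS for status below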
car::Anova() can compute the full Type III table. All methods are in agreement: status is very important, and no other term is needed.
> car::Anova(fit1,type=3)
Anova Table (Type III tests)
Response: grades
Sum Sq Df F value Pr(>F)
(Intercept) 165406 1 779.0286 <2e-16 ***
status 5704 1 26.8650 8e-06 ***
exam 5 1 0.0241 0.8776
section 105 1 0.4926 0.4871
status:exam 122 1 0.5748 0.4532
status:section 601 1 2.8304 0.1009
exam:section 184 1 0.8648 0.3584
status:exam:section 32 1 0.1522 0.6987
Residuals 7856 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Particularly in a large model, it can be tedious to do the backward testing starting with the highest-order interaction and maintaining hierarchy. We can automate this using step(). It finds the hierarchical model with the minimum AIC that can be reached by eliminating terms. In general, AIC is a bit more willing to keep variables than are F-tests at the usual significance levels. In this case, the selected model has section*status, not just status.
Another thing to note: the SS for section:status in the last step is about 742 (adjusted only for section and status); that is substantially larger than the 608 or 601 we were getting with Types II and III. Unbalanced data, remember?
> step(fit1,direction="backward")
Start: AIC=248.31
grades ~ status * exam * section
Df Sum of Sq RSS AIC
- status:exam:section 1 32.309 7888.3 246.49
<none> 7855.9 248.31
Step: AIC=246.49
grades ~ status + exam + section + status:exam + status:section +
exam:section
Df Sum of Sq RSS AIC
- status:exam 1 129.63 8017.9 245.22
- exam:section 1 164.50 8052.8 245.42
<none> 7888.3 246.49
- status:section 1 607.70 8496.0 247.83
Step: AIC=245.22
grades ~ status + exam + section + status:section + exam:section
Df Sum of Sq RSS AIC
- exam:section 1 158.52 8176.4 244.11
<none> 8017.9 245.22
- status:section 1 627.58 8645.5 246.62
Step: AIC=244.11
grades ~ status + exam + section + status:section
Df Sum of Sq RSS AIC
- exam 1 11.32 8187.7 242.17
<none> 8176.4 244.11
- status:section 1 744.68 8921.1 246.03
Step: AIC=242.17
grades ~ status + section + status:section
Df Sum of Sq RSS AIC
<none> 8187.7 242.17
- status:section 1 741.89 8929.6 244.07
Call:
lm(formula = grades ~ status + section + status:section, data = ExamData)
Coefficients:
(Intercept) status1 section1 status1:section1
62.623 11.562 -1.720 4.119
For these data, log(N) is about 3.8. If you wanted to select a model using BIC, you could do so by setting k. BIC almost gets rid of section:status, but not quite.
> step(fit1,direction="backward",k=3.8)
Start: AIC=262.71
grades ~ status * exam * section
Df Sum of Sq RSS AIC
- status:exam:section 1 32.309 7888.3 259.09
<none> 7855.9 262.71
Step: AIC=259.09
grades ~ status + exam + section + status:exam + status:section +
exam:section
Df Sum of Sq RSS AIC
- status:exam 1 129.63 8017.9 256.02
- exam:section 1 164.50 8052.8 256.22
- status:section 1 607.70 8496.0 258.63
<none> 7888.3 259.09
Step: AIC=256.02
grades ~ status + exam + section + status:section + exam:section
Df Sum of Sq RSS AIC
- exam:section 1 158.52 8176.4 253.11
- status:section 1 627.58 8645.5 255.62
<none> 8017.9 256.02
Step: AIC=253.11
grades ~ status + exam + section + status:section
Df Sum of Sq RSS AIC
- exam 1 11.32 8187.7 249.37
<none> 8176.4 253.11
- status:section 1 744.68 8921.1 253.23
Step: AIC=249.37
grades ~ status + section + status:section
Df Sum of Sq RSS AIC
<none> 8187.7 249.37
- status:section 1 741.89 8929.6 249.47
Call:
lm(formula = grades ~ status + section + status:section, data = ExamData)
Coefficients:
(Intercept) status1 section1 status1:section1
62.623 11.562 -1.720 4.119
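If you prefer not to round the penalty, you can compute it in the call; this is just an equivalent way to write the command above, assuming ExamData has one row per student (so nrow() is 45).
> step(fit1, direction="backward", k=log(nrow(ExamData)))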
35.2 Highly unbalanced example
We are going to create a highly unbalanced 2x2 factorial; two of the treatments have only one replication, but the other two have 20 and 25. We will make up some responses that include main effects but no interactions. Then the fun really begins. (Note that we set the random number seed so that you can recreate this if you like.)
> muij<-rep(1:4,c(20,1,1,25))
> a<-factor(rep(c(1,2,1,2),c(20,1,1,25)))
> b<-factor(rep(c(1,1,2,2),c(20,1,1,25)))
> replications(~a:b,data.frame(a,b))
$`a:b`
b
a 1 2
1 20 1
2 1 25
> set.seed(1234)
> y<-muij+1.5*rnorm(47)
Start by fitting the two-factor model. Test with Type II; nothing significant. Test with Type III; nothing significant. It seems like nothing is going on.
> twowayfit <- lm(y~a*b)
> car::Anova(twowayfit,type=2)
Anova Table (Type II tests)
Response: y
Sum Sq Df F value Pr(>F)
a 2.777 1 1.4729 0.2315
b 3.074 1 1.6306 0.2085
a:b 0.263 1 0.1396 0.7105
Residuals 81.064 43
> car::Anova(twowayfit,type=3)
Anova Table (Type III tests)
Response: y
Sum Sq Df F value Pr(>F)
(Intercept) 32.082 1 17.0178 0.0001664 ***
a 2.785 1 1.4772 0.2308399
b 3.083 1 1.6351 0.2078515
a:b 0.263 1 0.1396 0.7104917
Residuals 81.064 43
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now fit the model simply treating the design as four groups. It is highly significant!
> allinone <- lm(y~a:b)
> anova(allinone)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
a:b 3 68.190 22.7301 12.057 7.37e-06 ***
Residuals 43 81.064 1.8852
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What is happening is that the two factors explain essentially the same variation (row differences or column differences really boil down to treatment 1,1 versus treatment 2,2 in terms of sums of squares). Thus if you have a, you don’t need b. If you have b, you don’t need a. But either one of them can explain a lot of variability.
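Here is a quick sketch of that point, using the factors and response simulated above. I have not shown the output; the comments describe the pattern you should see given how the data were constructed.
> anova(lm(y ~ a))       # a alone explains a lot
> anova(lm(y ~ b))       # b alone explains a lot
> anova(lm(y ~ a + b))   # but b adds very little once a is in (and vice versa)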
Moral: examine the overall fit of the model as well as Type II and Type III tests.
35.3 Proportional balance
There are some situations where sums of squares do not depend on the order of terms, even when the data do not have equal replications. One of these situations is called proportional balance. In a two-way case, the counts in each row are constant multiples of each other, and the counts in each column are constant multiples of each other; equivalently, \(n_{ij} = n_{i\cdot}\,n_{\cdot j}/n_{\cdot\cdot}\) for every cell.
Here, the second column is three times the first column, and the third column is four times the first. Likewise, the second row is twice the first row, and the third row is three times the first row.
> nij <- c(1,2,3,3,6,9,4,8,12)
> a <- factor(rep(c(1,2,3,1,2,3,1,2,3),nij))
> b <- factor(rep(c(1,1,1,2,2,2,3,3,3),nij))
> replications(~a:b,data.frame(a,b))
$`a:b`
b
a 1 2 3
1 1 3 4
2 2 6 8
3 3 9 12
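As a quick numerical check of the product rule above, we can rebuild the cell counts from the margins; table(), rowSums(), colSums(), and outer() are standard R, and the check itself is my addition.
> nab <- table(a, b)
> outer(rowSums(nab), colSums(nab)) / sum(nab)   # reproduces the cell counts exactly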
Add some data and observe that the order does not matter. Well, the Type III results are different, but that is another warning about non-hierarchical models.
> y <- rnorm(length(a))
> abfit <- lm(y~a*b)
> anova(abfit)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
a 2 0.141 0.0703 0.0807 0.92259
b 2 8.138 4.0689 4.6712 0.01518 *
a:b 4 5.848 1.4619 1.6783 0.17456
Residuals 39 33.971 0.8711
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> bafit <- lm(y~b*a)
> anova(bafit)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
b 2 8.138 4.0689 4.6712 0.01518 *
a 2 0.141 0.0703 0.0807 0.92259
b:a 4 5.848 1.4619 1.6783 0.17456
Residuals 39 33.971 0.8711
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> car::Anova(abfit,type=3)
Anova Table (Type III tests)
Response: y
Sum Sq Df F value Pr(>F)
(Intercept) 1.202 1 1.3794 0.24732
a 0.233 2 0.1337 0.87522
b 6.001 2 3.4446 0.04192 *
a:b 5.848 4 1.6783 0.17456
Residuals 33.971 39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1