Chapter 44 Split Plots
Set up the libraries.
> library(cfcdae)
> library(lme4)
> library(car)
44.1 Gums emulsion data
Emulsion data.
Eight samples of unprocessed gum from acacia senegal trees are obtained.
Four samples come from one batch of gum, and the other four come from
a second batch of gum. Within each batch, the four samples are
randomly assigned to the factor/level combinations of two factors.
The first factor is whether or not the gum is demineralized; the second
factor is whether or not the gum is pasteurized. An emulsion
is made using each sample of gum. Each emulsion is then
split into three smaller parts, with the parts randomly assigned
to be pH adjusted to 2.5, 4.5, or 5.5 using citric acid. The measured
response is the time (in hours) till the emulsion fails, called
the breakage time.
> gums <- read.table("gum.dat.txt",header=TRUE)
> gums <- within(gums,{past<-factor(past);demin<-factor(demin);
+ ph<-factor(ph);block<-factor(block)})
> gums
y past demin ph block
1 198.5 2 2 1 1
2 299.0 2 2 2 1
3 223.1 2 2 3 1
4 166.6 1 1 1 1
5 196.5 1 1 2 1
6 178.9 1 1 3 1
7 160.7 2 1 1 1
8 151.1 2 1 2 1
9 146.5 2 1 3 1
10 146.3 1 2 1 1
11 169.3 1 2 2 1
12 198.1 1 2 3 1
13 236.3 2 2 1 2
14 330.7 2 2 2 2
15 281.2 2 2 3 2
16 151.8 1 1 1 2
17 156.0 1 1 2 2
18 159.7 1 1 3 2
19 171.8 2 1 1 2
20 159.3 2 1 2 2
21 155.9 2 1 3 2
22 175.2 1 2 1 2
23 182.2 1 2 2 2
24 136.2 1 2 3 2
Whole plots are enumerated by the block by pasteurization by demineralization combinations. Alternatively, we can create a separate whole plots variable.
> wplots <- with(gums,conf.design::join(block,demin,past))
> wplots
[1] 1:2:2 1:2:2 1:2:2 1:1:1 1:1:1 1:1:1 1:1:2 1:1:2 1:1:2 1:2:1 1:2:1 1:2:1 2:2:2 2:2:2 2:2:2 2:1:1
[17] 2:1:1 2:1:1 2:1:2 2:1:2 2:1:2 2:2:1 2:2:1 2:2:1
Levels: 1:1:1 1:1:2 1:2:1 1:2:2 2:1:1 2:1:2 2:2:1 2:2:2
I am going to fit block as random (assuming batches of gum are random). Some would fit block as fixed. In balanced data it’s not going to make any difference. The two different versions of the model are equivalent.
> gums.lmer <- lmer(y~ph*past*demin+(1|block)+(1|wplots),data=gums)
boundary (singular) fit: see help('isSingular')
> gums.lmer.alt <- lmer(y~ph*past*demin+(1|block)+(1|block:demin:past),data=gums)
boundary (singular) fit: see help('isSingular')
Residuals do not look good. There appears to be decreasing variance and long tails.
> plot(gums.lmer)

Figure 44.1: Residual plot for gum data
> qqnorm(resid(gums.lmer,type="pearson"))

Figure 44.2: NPP of residuals for gum data
The ratio of largest to smallest for the response in these data is less than 2.5; it is going to take a pretty extreme transformation to make any difference.
After some trial and error, the fourth power seems to make residuals plots look better (the division by 100 just retains a more comprehensible magnitude). However, I am unconvinced by a fourth power transformation.
> gums4.lmer <- lmer((y/100)^4~ph*past*demin+(1|block)+(1|wplots),data=gums)
> plot(gums4.lmer,which=1)

Figure 44.3: Residual plot for fourth power of gum data
Normality is better but not great.
> qqnorm(resid(gums4.lmer,type="pearson"))

Figure 44.4: NPP of residuals for fourth power of gum data
For the moment, we will analyze these transformed data. In fact, we can do better, and we will do that later. But for now, we’re going to use the standard split plot model (albeit on an unusual transformation of the data).
Everything involving pH is very significant. Things not involving pH are borderline to not significant. However, they need to be retained for hierarchy.
> Anova(gums4.lmer,test="F")
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
Response: (y/100)^4
F Df Df.res Pr(>F)
ph 25.5774 2 8 0.0003345 ***
past 8.0653 1 3 0.0656495 .
demin 9.9068 1 3 0.0513640 .
ph:past 18.8775 2 8 0.0009346 ***
ph:demin 22.7051 2 8 0.0005033 ***
past:demin 9.7097 1 3 0.0526335 .
ph:past:demin 24.5609 2 8 0.0003847 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Look at the interaction plot. Nothing is happening except when Pasteurization and demineralization are both high, and then pH also has an effect. This is really the take away for the entire experiment; the rest of what we do is making refinements and adding statistical bounds to what we just said.
> with(gums,interactplot(ph,demin:past,(y/100)^4))

Figure 44.5: Interaction plot for gum data
What can we see from the summary?
- Block variability is tiny.
- Whole plot variance is nearly four times the size of residual variance.
- SE for whole plot effects is much larger than SE for split plot effects (those that involve pH).
> summary(gums4.lmer)
Linear mixed model fit by REML ['lmerMod']
Formula: (y/100)^4 ~ ph * past * demin + (1 | block) + (1 | wplots)
Data: gums
REML criterion at convergence: 119.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.0678 -0.3572 0.0000 0.3572 1.0678
Random effects:
Groups Name Variance Std.Dev.
wplots (Intercept) 1.126e+02 10.612087
block (Intercept) 1.763e-06 0.001328
Residual 3.432e+01 5.858720
Number of obs: 24, groups: wplots, 8; block, 2
Fixed effects:
Estimate Std. Error t value
(Intercept) 19.740 3.938 5.013
ph1 -8.602 1.691 -5.086
ph2 11.666 1.691 6.898
past1 -11.183 3.938 -2.840
demin1 -12.395 3.938 -3.148
ph1:past1 6.800 1.691 4.021
ph2:past1 -10.206 1.691 -6.034
ph1:demin1 8.356 1.691 4.940
ph2:demin1 -10.890 1.691 -6.439
past1:demin1 12.271 3.938 3.116
ph1:past1:demin1 -8.479 1.691 -5.013
ph2:past1:demin1 11.413 1.691 6.748
Correlation of Fixed Effects:
(Intr) ph1 ph2 past1 demin1 ph1:p1 ph2:p1 ph1:d1 ph2:d1 pst1:1 p1:1:1
ph1 0.000
ph2 0.000 -0.500
past1 0.000 0.000 0.000
demin1 0.000 0.000 0.000 0.000
ph1:past1 0.000 0.000 0.000 0.000 0.000
ph2:past1 0.000 0.000 0.000 0.000 0.000 -0.500
ph1:demin1 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ph2:demin1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.500
past1:demn1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ph1:pst1:d1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ph2:pst1:d1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -0.500
If you wanted to look at treatment means, you could fit a model without 1. The standard errors of the means are all the same and quite large (small sample sizes in each mean, no variability cancelling through taking contrasts).
> gums4.lmer.means <- lmer((y/100)^4~ph:past:demin-1+(1|block)+(1|wplots),data=gums)
> summary(gums4.lmer.means)
Linear mixed model fit by REML ['lmerMod']
Formula: (y/100)^4 ~ ph:past:demin - 1 + (1 | block) + (1 | wplots)
Data: gums
REML criterion at convergence: 94.3
Scaled residuals:
Min 1Q Median 3Q Max
-1.0678 -0.3572 0.0000 0.3572 1.0678
Random effects:
Groups Name Variance Std.Dev.
wplots (Intercept) 1.126e+02 10.612087
block (Intercept) 1.763e-06 0.001328
Residual 3.432e+01 5.858720
Number of obs: 24, groups: wplots, 8; block, 2
Fixed effects:
Estimate Std. Error t value
ph1:past1:demin1 6.507 8.571 0.759
ph2:past1:demin1 10.416 8.571 1.215
ph3:past1:demin1 8.374 8.571 0.977
ph1:past2:demin1 7.690 8.571 0.897
ph2:past2:demin1 5.826 8.571 0.680
ph3:past2:demin1 5.257 8.571 0.613
ph1:past1:demin2 7.002 8.571 0.817
ph2:past1:demin2 9.618 8.571 1.122
ph3:past1:demin2 9.421 8.571 1.099
ph1:past2:demin2 23.352 8.571 2.724
ph2:past2:demin2 99.763 8.571 11.639
ph3:past2:demin2 43.650 8.571 5.092
Correlation of Fixed Effects:
p1:1:1 p2:1:1 p3:1:1 p1:2:1 p2:2:1 p3:2:1 p1:1:2 p2:1:2 p3:1:2 p1:2:2 p2:2:2
ph2:pst1:d1 0.766
ph3:pst1:d1 0.766 0.766
ph1:pst2:d1 0.000 0.000 0.000
ph2:pst2:d1 0.000 0.000 0.000 0.766
ph3:pst2:d1 0.000 0.000 0.000 0.766 0.766
ph1:pst1:d2 0.000 0.000 0.000 0.000 0.000 0.000
ph2:pst1:d2 0.000 0.000 0.000 0.000 0.000 0.000 0.766
ph3:pst1:d2 0.000 0.000 0.000 0.000 0.000 0.000 0.766 0.766
ph1:pst2:d2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ph2:pst2:d2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.766
ph3:pst2:d2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.766 0.766
We can also use emmeans()
to get similar information on
treatment means and marginal means. Note that the variability
of the means by level of pH is considerably greater
than the variability of the contrasts between
levels in pH. This does not happen in fixed effects.
We see this because whole plot variability shows up
in the pH means, but it cancels out of pH contrasts.
> library(emmeans)
> emmeans(gums4.lmer,~ph:past:demin)
Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect
ph past demin emmean SE df lower.CL upper.CL
1 1 1 6.51 8.57 5.52 -14.92 27.9
2 1 1 10.42 8.57 5.52 -11.01 31.8
3 1 1 8.37 8.57 5.52 -13.05 29.8
1 2 1 7.69 8.57 5.52 -13.74 29.1
2 2 1 5.83 8.57 5.52 -15.60 27.3
3 2 1 5.26 8.57 5.52 -16.17 26.7
1 1 2 7.00 8.57 5.52 -14.42 28.4
2 1 2 9.62 8.57 5.52 -11.81 31.0
3 1 2 9.42 8.57 5.52 -12.01 30.8
1 2 2 23.35 8.57 5.52 1.93 44.8
2 2 2 99.76 8.57 5.52 78.34 121.2
3 2 2 43.65 8.57 5.52 22.22 65.1
Degrees-of-freedom method: kenward-roger
Results are given on the ( (not the response) scale.
Confidence level used: 0.95
> emmeans(gums4.lmer,~ph)
Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect
NOTE: Results may be misleading due to involvement in interactions
ph emmean SE df lower.CL upper.CL
1 11.1 4.29 1.4 -17.49 39.8
2 31.4 4.29 1.4 2.78 60.0
3 16.7 4.29 1.4 -11.95 45.3
Results are averaged over the levels of: past, demin
Degrees-of-freedom method: kenward-roger
Results are given on the ( (not the response) scale.
Confidence level used: 0.95
> contrast(emmeans(gums4.lmer,~ph),method="pairwise")
Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect
NOTE: Results may be misleading due to involvement in interactions
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
ph1 - ph2 -20.27 2.93 8 -6.919 0.0003
ph1 - ph3 -5.54 2.93 8 -1.890 0.2031
ph2 - ph3 14.73 2.93 8 5.029 0.0026
Results are averaged over the levels of: past, demin
Note: contrasts are still on the ( scale
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
Let’s get back to the idea that nothing happens unless pasteurization and demineralization are high. We’ll do this by first fitting the model to the other data and testing the effects.
Absolutely nothing is significant if we stay away from pasteurization and demineralization both high.
> gums4.lmer.subset <- lmer((y/100)^4~(1|block)+ph*(demin+past)+(1|block:past:demin),data=gums,
+ subset=!(past==2 & demin==2))
boundary (singular) fit: see help('isSingular')
> Anova(gums4.lmer.subset,test="F")
Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
Response: (y/100)^4
F Df Df.res Pr(>F)
ph 0.2342 2 6 0.7981
demin 0.0118 1 2 0.9235
past 0.9049 1 2 0.4419
ph:demin 0.0572 2 6 0.9449
ph:past 0.5740 2 6 0.5914
We also had the idea of it’s really only four means:
three levels of pH when pasteurization and demineralization
are both high, and one mean for everything else.
Let’s set that up. When this is done
phigh.dhigh.ph
is a factor with four levels, one
for when Pasteurization and demineralization are not
both high, and one for each of the pH levels when
the other two are both high.
> gums <- within(gums,phigh.dhigh <- (past==2 & demin==2)*1)
> gums$phigh.dhigh
[1] 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0
> gums <- within(gums,phigh.dhigh.ph <- phigh.dhigh*as.numeric(ph))
> gums <- within(gums, phigh.dhigh.ph <- factor(phigh.dhigh.ph))
> gums <- within(gums, phigh.dhigh <- factor(phigh.dhigh))
> gums$phigh.dhigh.ph
[1] 1 2 3 0 0 0 0 0 0 0 0 0 1 2 3 0 0 0 0 0 0 0 0 0
Levels: 0 1 2 3
Here we fit the model using just phigh.dhigh.ph
. Comparing
it to the full model, we see that there is no evidence
we need the larger model.
> gums4.lmer.pdph <- lmer((y/100)^4~(1|block)+phigh.dhigh.ph+(1|block:past:demin),data=gums)
> library(pbkrtest)
> KRmodcomp(gums4.lmer,gums4.lmer.pdph)
large : (y/100)^4 ~ ph * past * demin + (1 | block) + (1 | wplots)
small : (y/100)^4 ~ (1 | block) + phigh.dhigh.ph + (1 | block:past:demin)
stat ndf ddf F.scaling p.value
Ftest 0.1094 8.0000 6.5096 0.93226 0.997
Means and contrasts (in two ways). Comparisons between whole plots are less accurate than comparisons within whole plots (the pH comparisons).
> emmeans(gums4.lmer.pdph,~phigh.dhigh.ph)
Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect
phigh.dhigh.ph emmean SE df lower.CL upper.CL
0 7.79 3.76 1.73 -11.08 26.7
1 23.35 7.02 7.74 7.07 39.6
2 99.76 7.02 7.74 83.48 116.0
3 43.65 7.02 7.74 27.37 59.9
Degrees-of-freedom method: kenward-roger
Results are given on the ( (not the response) scale.
Confidence level used: 0.95
> pairs(emmeans(gums4.lmer.pdph,~phigh.dhigh.ph))
Warning in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : There are unevaluated constants in the response formula
Auto-detection of the response transformation may be incorrect
Note: Use 'contrast(regrid(object), ...)' to obtain contrasts of back-transformed estimates
contrast estimate SE df t.ratio p.value
phigh.dhigh.ph0 - phigh.dhigh.ph1 -15.6 7.94 6.35 -1.960 0.2941
phigh.dhigh.ph0 - phigh.dhigh.ph2 -92.0 7.94 6.35 -11.584 0.0001
phigh.dhigh.ph0 - phigh.dhigh.ph3 -35.9 7.94 6.35 -4.517 0.0140
phigh.dhigh.ph1 - phigh.dhigh.ph2 -76.4 4.67 14.00 -16.375 <.0001
phigh.dhigh.ph1 - phigh.dhigh.ph3 -20.3 4.67 14.00 -4.350 0.0033
phigh.dhigh.ph2 - phigh.dhigh.ph3 56.1 4.67 14.00 12.025 <.0001
Note: contrasts are still on the ( scale
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 4 estimates
> pairwise(gums4.lmer.pdph,phigh.dhigh.ph)
Pairwise comparisons ( hsd ) of phigh.dhigh.ph
estimate signif diff lower upper
0 - 1 -15.56193 23.65880 -39.22073 8.096864
* 0 - 2 -91.97347 23.65880 -115.63227 -68.314676
* 0 - 3 -35.86011 23.65880 -59.51891 -12.201317
* 1 - 2 -76.41154 13.90515 -90.31669 -62.506394
* 1 - 3 -20.29818 13.90515 -34.20333 -6.393034
* 2 - 3 56.11336 13.90515 42.20821 70.018506
44.2 Very nonstandard analysis
OK, now we’ll go back to an earlier point when we were thinking about nonconstant variance. Look at this boxplot of residuals by whole plot treatment. Recall that the 1.1, 2.1, and 1.2 treatments all have the same mean, and the 2.2 treatment has a higher mean. The variability is not constant, but it depends on the treatment combinations rather than the mean per se. It looks like demineralizing really bumps up the variance.
> boxplot(residuals(gums.lmer)~gums$demin+gums$past)

Figure 44.6: Boxplots of gums data residuals by whole plot treatments
We have nonconstant variance, but it isn’t really the kind that a power transformation can fix. No wonder we needed such a bizarre transformation and it didn’t fix all that much.
This is not a terribly unusual situation, but it is still a nuisance (or, as they say, it’s a bad situation, but a situation none the less).
What I am going to demonstrate here is using lme()
(from package nlme
) along with
a way to tell lme()
that the residual variances should be different in different groups. The varIdent(form=~1|past*demin)
says that errors are independent, but they have different
variances in the different combinations of past
and demin
.
You can see the different variance estimates in the summary, and you will find that the variances of the means, etc. differ from what we saw before. We also see that the residual plot and NPP look better.
> library(nlme)
> gums.lme.means <- lme(y~past:ph:demin-1,random=~1|block/wplots,data=gums)
> gums.lme.means.vars <- lme(y~past:ph:demin-1,random=~1|block/wplots,data=gums,
+ weights=varIdent(form=~1|past*demin))
> plot(gums.lme.means.vars)

Figure 44.7: Residual plot for gum data with modeled nonconstant variance
> qqnorm(residuals(gums.lme.means.vars,type="pearson"))

Figure 44.8: NPP for residuals of gum data with modeled nonconstant variance
> summary(gums.lme.means.vars)
Linear mixed-effects model fit by REML
Data: gums
AIC BIC logLik
138.8587 147.587 -51.42933
Random effects:
Formula: ~1 | block
(Intercept)
StdDev: 0.001148952
Formula: ~1 | wplots %in% block
(Intercept) Residual
StdDev: 17.73408 9.695371
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | past * demin
Parameter estimates:
1*1 2*1 1*2 2*2
1.0000000 0.1062495 3.1726664 1.0502877
Fixed effects: y ~ past:ph:demin - 1
Value Std.Error DF t-value p-value
past1:ph1:demin1 159.20 14.29157 5 11.139435 0.0001
past2:ph1:demin1 166.25 12.56103 5 13.235383 0.0000
past1:ph2:demin1 176.25 14.29157 5 12.332446 0.0001
past2:ph2:demin1 155.20 12.56103 5 12.355678 0.0001
past1:ph3:demin1 169.30 14.29157 5 11.846146 0.0001
past2:ph3:demin1 151.20 12.56103 5 12.037232 0.0001
past1:ph1:demin2 160.75 25.10664 5 6.402690 0.0014
past2:ph1:demin2 217.40 14.46011 5 15.034462 0.0000
past1:ph2:demin2 175.75 25.10664 5 7.000142 0.0009
past2:ph2:demin2 314.85 14.46011 5 21.773690 0.0000
past1:ph3:demin2 167.15 25.10664 5 6.657603 0.0012
past2:ph3:demin2 252.15 14.46011 5 17.437624 0.0000
Correlation:
p1:1:1 p2:1:1 p1:2:1 p2:2:1 p1:3:1 p2:3:1 p1:1:2 p2:1:2 p1:2:2 p2:2:2 p1:3:2
past2:ph1:demin1 0.000
past1:ph2:demin1 0.770 0.000
past2:ph2:demin1 0.000 0.997 0.000
past1:ph3:demin1 0.770 0.000 0.770 0.000
past2:ph3:demin1 0.000 0.997 0.000 0.997 0.000
past1:ph1:demin2 0.000 0.000 0.000 0.000 0.000 0.000
past2:ph1:demin2 0.000 0.000 0.000 0.000 0.000 0.000 0.000
past1:ph2:demin2 0.000 0.000 0.000 0.000 0.000 0.000 0.249 0.000
past2:ph2:demin2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.752 0.000
past1:ph3:demin2 0.000 0.000 0.000 0.000 0.000 0.000 0.249 0.000 0.249 0.000
past2:ph3:demin2 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.752 0.000 0.752 0.000
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-9.711503e-01 -4.320812e-01 1.256113e-14 4.320812e-01 9.711503e-01
Number of Observations: 24
Number of Groups:
block wplots %in% block
2 8
And we can use phigh.dhigh.ph
.
> gums.lme.means.vars.pdph <- lme(y~phigh.dhigh.ph-1,random=~1|block/wplots,data=gums,
+ weights=varIdent(form=~1|past*demin))
> emmeans(gums.lme.means.vars.pdph,~phigh.dhigh.ph)
phigh.dhigh.ph emmean SE df lower.CL upper.CL
0 164 6.88 6 147 181
1 217 12.98 14 190 245
2 315 12.98 14 287 343
3 252 12.98 14 224 280
Degrees-of-freedom method: containment
Confidence level used: 0.95
> pairs(emmeans(gums.lme.means.vars.pdph,~phigh.dhigh.ph))
contrast estimate SE df t.ratio p.value
phigh.dhigh.ph0 - phigh.dhigh.ph1 -53.5 14.7 6 -3.641 0.0407
phigh.dhigh.ph0 - phigh.dhigh.ph2 -150.9 14.7 6 -10.275 0.0002
phigh.dhigh.ph0 - phigh.dhigh.ph3 -88.2 14.7 6 -6.007 0.0039
phigh.dhigh.ph1 - phigh.dhigh.ph2 -97.5 10.9 14 -8.968 <.0001
phigh.dhigh.ph1 - phigh.dhigh.ph3 -34.8 10.9 14 -3.198 0.0291
phigh.dhigh.ph2 - phigh.dhigh.ph3 62.7 10.9 14 5.770 0.0003
Degrees-of-freedom method: containment
P value adjustment: tukey method for comparing a family of 4 estimates