Chapter 46 Repeated Measures

46.1 Emulsions

More emulsion data. Three emulsifiers: milk protein concentrate (MPC), modified milk protein concentrate (mMPC), and sodium caseinate (NC). An emulsion is made from each of the emulsifiers and the turbidity is measured at 0, 1, 2, 3, 4, 5, 6, 12, 24, 36, and 48 hours. A good emulsifier should keep the turbidity high. The experiment is then repeated two more times.

This is a repeated measures design, because we cannot randomize time (although we do randomize emulsifiers to emulsions). Note

  • We also create a quantitative version of hours as well as the factor version.
  • The “subject” in this repeated measures design (analogous to a whole plot) is formed by the block by protein combinations.
> library(cfcdae)
> library(lme4)
> library(nlme)
> library(car)
> library(conf.design)
> library(pbkrtest)
> emulsions <- read.table("emul2.dat.txt",header=TRUE)
> emulsions <- within(emulsions,hrs <- c(0,1,2,3,4,5,6,12,24,36,48)[time])
> emulsions <- within(emulsions,{protein<-factor(protein);
+    time<-factor(time);block<-factor(block)})
> emulsions <- within(emulsions,subject <- join(protein,block))
> emulsions
       y protein time block hrs subject
1  1.035       1    1     1   0     1:1
2  1.051       1    1     2   0     1:2
3  1.054       1    1     3   0     1:3
4  1.047       2    1     1   0     2:1
5  1.067       2    1     2   0     2:2
6  1.087       2    1     3   0     2:3
7  1.036       3    1     1   0     3:1
8  1.078       3    1     2   0     3:2
9  1.083       3    1     3   0     3:3
10 1.032       1    2     1   1     1:1
11 1.044       1    2     2   1     1:2
12 1.016       1    2     3   1     1:3
13 1.091       2    2     1   1     2:1
14 1.067       2    2     2   1     2:2
15 1.058       2    2     3   1     2:3
16 1.087       3    2     1   1     3:1
17 1.071       3    2     2   1     3:2
18 1.079       3    2     3   1     3:3
19 0.978       1    3     1   2     1:1
20 1.016       1    3     2   2     1:2
21 0.978       1    3     3   2     1:3
22 1.085       2    3     1   2     2:1
23 1.072       2    3     2   2     2:2
24 1.060       2    3     3   2     2:3
25 1.080       3    3     1   2     3:1
26 1.066       3    3     2   2     3:2
27 1.065       3    3     3   2     3:3
28 0.927       1    4     1   3     1:1
29 0.998       1    4     2   3     1:2
30 0.974       1    4     3   3     1:3
31 1.074       2    4     1   3     2:1
32 1.071       2    4     2   3     2:2
33 1.060       2    4     3   3     2:3
34 1.028       3    4     1   3     3:1
35 1.066       3    4     2   3     3:2
36 1.062       3    4     3   3     3:3
37 0.925       1    5     1   4     1:1
38 0.960       1    5     2   4     1:2
39 0.938       1    5     3   4     1:3
40 1.067       2    5     1   4     2:1
41 1.058       2    5     2   4     2:2
42 1.048       2    5     3   4     2:3
43 1.075       3    5     1   4     3:1
44 1.062       3    5     2   4     3:2
45 1.062       3    5     3   4     3:3
46 0.886       1    6     1   5     1:1
47 0.923       1    6     2   5     1:2
48 0.904       1    6     3   5     1:3
49 1.067       2    6     1   5     2:1
50 1.048       2    6     2   5     2:2
51 1.049       2    6     3   5     2:3
52 1.072       3    6     1   5     3:1
53 1.072       3    6     2   5     3:2
54 1.063       3    6     3   5     3:3
55 0.868       1    7     1   6     1:1
56 0.899       1    7     2   6     1:2
57 0.890       1    7     3   6     1:3
58 1.067       2    7     1   6     2:1
59 1.046       2    7     2   6     2:2
60 1.043       2    7     3   6     2:3
61 1.076       3    7     1   6     3:1
62 1.054       3    7     2   6     3:2
63 1.066       3    7     3   6     3:3
64 0.794       1    8     1  12     1:1
65 0.768       1    8     2  12     1:2
66 0.792       1    8     3  12     1:3
67 1.064       2    8     1  12     2:1
68 1.038       2    8     2  12     2:2
69 1.038       2    8     3  12     2:3
70 1.062       3    8     1  12     3:1
71 1.050       3    8     2  12     3:2
72 1.040       3    8     3  12     3:3
73 0.750       1    9     1  24     1:1
74 0.726       1    9     2  24     1:2
75 0.768       1    9     3  24     1:3
76 1.079       2    9     1  24     2:1
77 1.007       2    9     2  24     2:2
78 0.996       2    9     3  24     2:3
79 1.057       3    9     1  24     3:1
80 1.026       3    9     2  24     3:2
81 0.989       3    9     3  24     3:3
82 0.740       1   10     1  36     1:1
83 0.731       1   10     2  36     1:2
84 0.768       1   10     3  36     1:3
85 1.036       2   10     1  36     2:1
86 0.935       2   10     2  36     2:2
87 0.917       2   10     3  36     2:3
88 1.023       3   10     1  36     3:1
89 0.938       3   10     2  36     3:2
90 0.956       3   10     3  36     3:3
91 0.720       1   11     1  48     1:1
92 0.698       1   11     2  48     1:2
93 0.738       1   11     3  48     1:3
94 0.900       2   11     1  48     2:1
95 0.924       2   11     2  48     2:2
96 0.891       2   11     3  48     2:3
97 0.925       3   11     1  48     3:1
98 0.924       3   11     2  48     3:2
99 0.934       3   11     3  48     3:3

46.2 Univariate (classical) analysis

The basic univariate analysis looks just like a split plot. It works fine if nature is being kind. We could fit this model using either lme() or lmer(), but we’ll fit both because we’ll need both later.

> emulsions.lme <- lme(y~protein*time,random=~1|block/subject,data=emulsions)
> emulsions.lmer <- lmer(y~protein*time+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')

Constant variance doesn’t look too bad:

> plot(emulsions.lme)
Residual plot of emulsions data

Figure 46.1: Residual plot of emulsions data

Normality is a bit more suspect:

> qqnorm(residuals(emulsions.lme))
NPP of residuals in emulsions data

Figure 46.2: NPP of residuals in emulsions data

Both main effects and the interaction are highly significant. (Note that we can often use plain old anova() after lme().)

> anova(emulsions.lme)
             numDF denDF  F-value p-value
(Intercept)      1    60 74591.40  <.0001
protein          2     4   191.59   1e-04
time            10    60    99.09  <.0001
protein:time    20    60    16.97  <.0001

Let’s take a look at that interaction. Protein 1 is lousy, and there is little difference between 2 and 3, which fail much more slowly than 1. But fail they all surely do.

> with(emulsions,interactplot(time,protein,y))
Interaction plot for emulsions data

Figure 46.3: Interaction plot for emulsions data

time is a factor, but the actual quantities the levels of time represents are not equally spaced. Thus the first interaction plot does not reproduce the functional effect of time. To get that, we need the horizontal locations to reflect the actual values of time. Those values are found in the hrs variable, and we feed them into interactplot() via the at= argument.

In this new interaction plot, we see that turbidity is dropping more or less linearly in time for proteins 2 and 3. For protein 1, it drops very quickly, and then very slowly after 8 hours. (This is kind of the opposite of how Hemmingway said he went bankrupt, “Gradually, and then suddenly.”)

> with(emulsions,interactplot(time,protein,y,at=sort(unique(hrs))))
Interaction plot for emulsions data with quantitative levels of time

Figure 46.4: Interaction plot for emulsions data with quantitative levels of time

46.2.1 Functional modeling

Time is quantitative, and we might wish to create a functional representation of how the turbidity changes. Typically, we use polynomials. However, the behavior we see for protein 1 (dropping quickly, then changing very slowly) is not a shape that polynomials can fit very well, at least not without using many powers.

Let’s instead fit linear splines with knot at 8 hours. This means our function will be linear up to 8, and linear after 8, but the slope could be different before and after 8. We represent this with two variables. The first is the standard “linear in time” represented by hrs. The second subtracts 8 from hrs and then sets negative values to 0. This will be 0 before 8, and linearly increasing after 8.

> hrs8 <- emulsions$hrs-8
> hrs8[hrs8 < 0] <- 0

Now we just use hrs and hrs8 together in the model. Again, for illustrative purposes, we fit with both lme() and lmer().

> emulsions.lme.hrs <- lme(y~protein*(hrs+hrs8),random=~1|block/subject,data=emulsions)
> emulsions.lmer.hrs <- lmer(y~protein*(hrs+hrs8)+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')

If we try to compare emulsions.lme and emulsions.lme.hrs using anova(), it produces output based on AIC and the REML likelihood. We know these don’t work, because the two models differ in their fixed effects; anova() has the courtesy to warn us.

> anova(emulsions.lme.hrs,emulsions.lme)
Warning in anova.lme(emulsions.lme.hrs, emulsions.lme): fitted objects with different fixed effects.
REML comparisons are not meaningful.
                  Model df       AIC       BIC   logLik   Test  L.Ratio p-value
emulsions.lme.hrs     1 12 -356.6913 -326.6936 190.3456                        
emulsions.lme         2 36 -169.3336  -90.5060 120.6668 1 vs 2 139.3577  <.0001

If we use lmer() model fits we can use KRmodcomp() to do the comparison. We see that the linear spline model fits well, and the additional 24 fixed effects degrees of freedom in the full model are not needed.

> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
         stat     ndf     ddf F.scaling p.value
Ftest  0.7796 24.0000 60.0000         1  0.7459

Look at the summary, and we see something interesting: the protein effects are not significant. (Recall that protein was highly significant in the non-functional model.) In this model, protein effects represent different intercepts (values at time 0) for the different proteins. Evidently, all the emulsions begin with the same time 0 turbidity, so we don’t need separate intercepts.

> summary(emulsions.lmer.hrs)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
   Data: emulsions

REML criterion at convergence: -380.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.23531 -0.52813  0.00651  0.40146  3.15729 

Random effects:
 Groups   Name        Variance  Std.Dev. 
 subject  (Intercept) 8.006e-05 8.948e-03
 block    (Intercept) 3.501e-12 1.871e-06
 Residual             4.135e-04 2.034e-02
Number of obs: 99, groups:  subject, 9; block, 3

Fixed effects:
                Estimate Std. Error t value
(Intercept)    1.0647594  0.0051582 206.422
protein1      -0.0060711  0.0072947  -0.832
protein2       0.0030366  0.0072947   0.416
hrs           -0.0108661  0.0009448 -11.501
hrs8           0.0077663  0.0010912   7.117
protein1:hrs  -0.0208494  0.0013362 -15.604
protein2:hrs   0.0101060  0.0013362   7.564
protein1:hrs8  0.0216577  0.0015431  14.035
protein2:hrs8 -0.0106403  0.0015431  -6.895

Correlation of Fixed Effects:
            (Intr) protn1 protn2 hrs    hrs8   prtn1: prtn2: prt1:8
protein1     0.000                                                 
protein2     0.000 -0.500                                          
hrs         -0.672  0.000  0.000                                   
hrs8         0.632  0.000  0.000 -0.990                            
proten1:hrs  0.000 -0.672  0.336  0.000  0.000                     
proten2:hrs  0.000  0.336 -0.672  0.000  0.000 -0.500              
protn1:hrs8  0.000  0.632 -0.316  0.000  0.000 -0.990  0.495       
protn2:hrs8  0.000 -0.316  0.632  0.000  0.000  0.495 -0.990 -0.500
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Just to push this to its conclusion, let’s fit two more models. The first uses hrs and hrs8 separately for each protein but removes the individual intercepts for each protein.

Now the final model. This includes a common overall intercept, hrs and hrs8 terms for protein 1, and a common coefficient of hrs for proteins 2 and 3. This is the model that encapsulates what the interaction plot looks like. Is it good enough?

> emulsions.lmer.hrs.int <- lmer(y~protein:(hrs+hrs8)+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')
> prot1 <- with(emulsions,(protein==1)*1) # indicates protein 1
> prot23 <- 1-prot1 # indicates proteins 2 and 3
> emulsions.lmer.hrs.min <- lmer(y~prot1:hrs+prot1:hrs8+prot23:hrs+(1|block)+(1|subject),data=emulsions)
boundary (singular) fit: see help('isSingular')

Compare the model with the common intercept to the full model and the model with separate intercepts. No evidence we need a more complicated model.

> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs.int)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
         stat     ndf     ddf F.scaling p.value
Ftest  0.7428 26.0000 54.3461   0.99651  0.7938
> KRmodcomp(emulsions.lmer.hrs,emulsions.lmer.hrs.int)
large : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
         stat     ndf     ddf F.scaling p.value
Ftest  0.3463  2.0000 15.7643         1  0.7125

Now look at the model with common intercept, a single slope for proteins 2 and 3, and a linear spline for protein 1. There is little evidence to suggest that anything more complex than this minimal model is needed.

> KRmodcomp(emulsions.lmer,emulsions.lmer.hrs.min)
large : y ~ protein * time + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 | 
    subject)
         stat     ndf     ddf F.scaling p.value
Ftest  0.9526 29.0000 48.2109   0.98721  0.5466
> KRmodcomp(emulsions.lmer.hrs,emulsions.lmer.hrs.min)
large : y ~ protein * (hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 | 
    subject)
         stat     ndf     ddf F.scaling p.value
Ftest  1.8361  5.0000 21.2762   0.94116  0.1487
> KRmodcomp(emulsions.lmer.hrs.int,emulsions.lmer.hrs.min)
large : y ~ protein:(hrs + hrs8) + (1 | block) + (1 | subject)
small : y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 | 
    subject)
         stat     ndf     ddf F.scaling p.value  
Ftest  2.9197  3.0000 45.4460   0.98442 0.04407 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now look at the slopes. For proteins 2 and 3, the slope is -.003 with an SE of .00016. For protein 1, the initial slope is -.334, and after 8 hours the slope is -.033411+.031265 = -.00215 (the SE for this is about .00037).

> summary(emulsions.lmer.hrs.min)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ prot1:hrs + prot1:hrs8 + prot23:hrs + (1 | block) + (1 |      subject)
   Data: emulsions

REML criterion at convergence: -428.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.2032 -0.3199  0.0004  0.4237  3.1639 

Random effects:
 Groups   Name        Variance  Std.Dev.
 subject  (Intercept) 8.826e-05 0.009394
 block    (Intercept) 0.000e+00 0.000000
 Residual             4.330e-04 0.020808
Number of obs: 99, groups:  subject, 9; block, 3

Fixed effects:
             Estimate Std. Error t value
(Intercept)  1.072695   0.004451  241.01
prot1:hrs   -0.033411   0.001360  -24.57
prot1:hrs8   0.031265   0.001617   19.34
hrs:prot23  -0.003008   0.000163  -18.45

Correlation of Fixed Effects:
           (Intr) prt1:h prt1:8
prot1:hrs  -0.396              
prot1:hrs8  0.362 -0.987       
hrs:prot23 -0.376  0.149 -0.136
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

46.3 Modeling correlation

Given that we did not randomize, we could find that there is correlation among the responses for a given subject. In particular, there could be a correlation that does not match the simple one required for the univariate analysis to work (everything has the same variance with constant correlation). We may be able to model the correlation to come up with something that fits the data better.

One concept for this correlation is that it is largest for points near in time and then decays as time gets farther apart. The simplest model for this is the autoregressive model of order one (AR1).

What this command does is say to keep each subject independent, but to allow for AR1 correlations between responses for a given subject, with time taken as hrs.

> emulsions.lme.ar1 <- update(emulsions.lme,correlation=corAR1(form=~hrs|block/subject))

AIC and BIC prefer the model with the AR1 adjustment.

> anova(emulsions.lme,emulsions.lme.ar1)
                  Model df       AIC       BIC   logLik   Test  L.Ratio p-value
emulsions.lme         1 36 -169.3336 -90.50598 120.6668                        
emulsions.lme.ar1     2 37 -175.9648 -94.94762 124.9824 1 vs 2 8.631294  0.0033

Just to illustrate, let’s look at the first 10 estimated fixed effects with and without the AR1 correlation. Because we have balanced data with the orthogonal main effects and interactions model, the estimates are the same in the two models.

> fixed.effects(emulsions.lme)[1:10]
(Intercept)    protein1    protein2       time1       time2       time3       time4       time5 
 0.98760606 -0.10006061  0.04745455  0.07217172  0.07294949  0.05683838  0.04128283  0.03406061 
      time6       time7 
 0.02172727  0.01339394 
> fixed.effects(emulsions.lme.ar1)[1:10]
(Intercept)    protein1    protein2       time1       time2       time3       time4       time5 
 0.98760606 -0.10006061  0.04745455  0.07217172  0.07294949  0.05683838  0.04128283  0.03406061 
      time6       time7 
 0.02172727  0.01339394 

However, their standard errors differ.

> sqrt(diag(vcov(emulsions.lme)))[1:10]
(Intercept)    protein1    protein2       time1       time2       time3       time4       time5 
0.003616091 0.005113925 0.005113925 0.006676523 0.006676523 0.006676523 0.006676523 0.006676523 
      time6       time7 
0.006676523 0.006676523 
> sqrt(diag(vcov(emulsions.lme.ar1)))[1:10]
(Intercept)    protein1    protein2       time1       time2       time3       time4       time5 
0.005927617 0.007475466 0.007475466 0.006616551 0.006087686 0.005770966 0.005657149 0.005742990 
      time6       time7 
0.006030932 0.006529343 

The estimates will not always be the same. Here we fit the minimal spline regression models with and without the AR1 adjustment. The coefficients are slightly different (and could be more different in other settings).

> emulsions.lme.hrs.min <- lme(y~prot1:hrs+prot1:hrs8+prot23:hrs,random=~1|block/subject,data=emulsions)
> emulsions.lme.hrs.min.ar1 <- lme(y~prot1:hrs+prot1:hrs8+prot23:hrs,random=~1|block/subject,data=emulsions,correlation=corAR1(form=~hrs|block/subject))
> fixed.effects(emulsions.lme.hrs.min)
 (Intercept)    prot1:hrs   prot1:hrs8   hrs:prot23 
 1.072694767 -0.033410454  0.031264678 -0.003008316 
> fixed.effects(emulsions.lme.hrs.min.ar1)
 (Intercept)    prot1:hrs   prot1:hrs8   hrs:prot23 
 1.074217594 -0.033820109  0.031696788 -0.003039819