Chapter 25 Scheffe Correction for Data Snooping

When you have a contrast suggested by the data, you need to use the Scheffe adjustment to maintain control of the error rate. The Scheffe procedure is conservative: it controls the strong familywise error rate for an arbitrarily large number of contrasts, including contrasts suggested by the data.
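
To see what the adjustment does, suppose a contrast in a model with g treatments and N - g error degrees of freedom has t statistic t. The Scheffe procedure refers t^2/(g-1) to an F distribution with g - 1 and N - g degrees of freedom, and it builds confidence intervals using the multiplier sqrt((g-1)*F) in place of the usual t critical value. The two little functions below are a sketch of that arithmetic; they are not part of cfcdae (linear.contrast handles the adjustment for you when you set scheffe=TRUE), but they will let us check the adjusted output by hand.
> # Scheffe-adjusted p-value for a contrast: refer t^2/(g-1) to F(g-1, dfE)
> scheffe.pvalue <- function(tstat, g, dfE) {
+   pf(tstat^2 / (g - 1), g - 1, dfE, lower.tail = FALSE)
+ }
> # Scheffe multiplier for a simultaneous confidence interval: estimate +/- mult * se
> scheffe.mult <- function(g, dfE, conf = 0.95) {
+   sqrt((g - 1) * qf(conf, g - 1, dfE))
+ }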

We illustrate with the concrete strength data.
> library(cfcdae)
> data("ConcreteStrength")
> concrete.separate <- lm(strength~fiberPct,data=ConcreteStrength)
> anova(concrete.separate)
Analysis of Variance Table

Response: strength
          Df Sum Sq Mean Sq F value    Pr(>F)    
fiberPct   4 6.2627 1.56567  12.975 0.0005713 ***
Residuals 10 1.2067 0.12067                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(concrete.separate,fiberPct)
           0         0.25          0.5         0.75            1 
 0.593333333  0.693333333 -0.006666667 -0.173333333 -1.106666667 
The overall null hypothesis is rejected with a p-value of about .0006. When we look at the treatment effects, the last one (fiber percent equal to 1) is far from the others, so we look at a contrast comparing the fifth treatment to the average of the first four.
> linear.contrast(concrete.separate,fiberPct,c(.25,.25,.25,.25,-1))
  estimates        se  t-value      p-value  lower-ci upper-ci
1  1.383333 0.2242271 6.169341 0.0001056076 0.8837243 1.882942
attr(,"class")
[1] "linear.contrast"
We see that the unadjusted p-value is about .0001. However, this contrast was suggested by the data, so the correct way to draw inference is to use the Scheffe correction.
> linear.contrast(concrete.separate,fiberPct,c(.25,.25,.25,.25,-1),
+                 scheffe=TRUE)
  estimates        se  t-value     p-value  lower-ci upper-ci
1  1.383333 0.2242271 6.169341 0.001934182 0.5469874 2.219679
attr(,"class")
[1] "linear.contrast"

The p-value is still fairly small, about .002, but that is roughly 20 times the unadjusted p-value. The contrast is still significant after adjustment, just not nearly as significant as the naive p-value would suggest.
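
We can check this against the Scheffe formula by hand, using the t statistic and standard error from the output above (g = 5 treatments, 10 error degrees of freedom):
> scheffe.pvalue(6.169341, g = 5, dfE = 10)                # about .0019, matching the output
> 1.383333 + c(-1, 1) * scheffe.mult(5, 10) * 0.2242271    # about (0.55, 2.22), matching the interval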

We have a completely analogous story when we look at the birch seedlings treated with acid mist.
> data(BirchSeedlings)
> birch.separate <- lm(weights~pH,data=BirchSeedlings)
> anova(birch.separate)
Analysis of Variance Table

Response: weights
           Df Sum Sq Mean Sq F value    Pr(>F)    
pH          4 0.7627 0.19068  16.023 1.292e-11 ***
Residuals 235 2.7965 0.01190                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(birch.separate,pH)
        4.7           4         3.3           3         2.3 
 0.05139583  0.01037500  0.03445833  0.01239583 -0.10862500 
The treatment with pH 2.3 is far from the others, which are more tightly clustered. Thus we might consider comparing the 2.3 treatment to the average of the other four. Because this is a contrast suggested by the data, we need to use a Scheffe adjustment.
> linear.contrast(birch.separate,pH,c(.25,.25,.25,.25,-1),conf=.99) #unadjusted
  estimates         se t-value      p-value   lower-ci  upper-ci
1 0.1357812 0.01760377 7.71319 3.470545e-13 0.09006581 0.1814967
attr(,"class")
[1] "linear.contrast"
> linear.contrast(birch.separate,pH,c(.25,.25,.25,.25,-1),conf=.99,scheffe=TRUE) # Scheffe adjusted
  estimates         se t-value      p-value   lower-ci  upper-ci
1 0.1357812 0.01760377 7.71319 7.541588e-11 0.07086223 0.2007003
attr(,"class")
[1] "linear.contrast"

Again in this example, the adjusted p-value is still quite small, but it is more than 200 times the unadjusted p-value.
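
The same hand check works here; the only changes are the 235 error degrees of freedom and the 99% confidence level passed through conf:
> scheffe.pvalue(7.71319, g = 5, dfE = 235)                             # about 7.5e-11, matching the output
> 0.1357812 + c(-1, 1) * scheffe.mult(5, 235, conf = .99) * 0.01760377  # about (.071, .201), matching the interval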