Chapter 25 Scheffe Correction for Data Snooping
When you have a contrast suggested by the data, you need to use the Scheffe adjustment to get control of any error rates. Scheffe is conservative, controlling the strong familywise error rate for an arbitrarily large number of contrasts, including contrasts suggested by the data.
We illustrate with the concrete strength data.> library(cfcdae)
> data("ConcreteStrength")
> concrete.separate <- lm(strength~fiberPct,data=ConcreteStrength)
> anova(concrete.separate)
Analysis of Variance Table
Response: strength
Df Sum Sq Mean Sq F value Pr(>F)
fiberPct 4 6.2627 1.56567 12.975 0.0005713 ***
Residuals 10 1.2067 0.12067
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(concrete.separate,fiberPct)
0 0.25 0.5 0.75 1
0.593333333 0.693333333 -0.006666667 -0.173333333 -1.106666667
> linear.contrast(concrete.separate,fiberPct,c(.25,.25,.25,.25,-1))
estimates se t-value p-value lower-ci upper-ci
1 1.383333 0.2242271 6.169341 0.0001056076 0.8837243 1.882942
attr(,"class")
[1] "linear.contrast"
> linear.contrast(concrete.separate,fiberPct,c(.25,.25,.25,.25,-1),
+ scheffe=TRUE)
estimates se t-value p-value lower-ci upper-ci
1 1.383333 0.2242271 6.169341 0.001934182 0.5469874 2.219679
attr(,"class")
[1] "linear.contrast"
The p-value is still pretty small, about .002, but that is 20 times the size of the unadjusted p-value. So it’s still significant after adjustment, but not nearly as significant as the naive p-value would suggest.
We have a completely analogous story when we look at the birch seedlings treated with acid mist.> data(BirchSeedlings)
> birch.separate <- lm(weights~pH,data=BirchSeedlings)
> anova(birch.separate)
Analysis of Variance Table
Response: weights
Df Sum Sq Mean Sq F value Pr(>F)
pH 4 0.7627 0.19068 16.023 1.292e-11 ***
Residuals 235 2.7965 0.01190
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> model.effects(birch.separate,pH)
4.7 4 3.3 3 2.3
0.05139583 0.01037500 0.03445833 0.01239583 -0.10862500
> linear.contrast(birch.separate,pH,c(.25,.25,.25,.25,-1),conf=.99) #unadjusted
estimates se t-value p-value lower-ci upper-ci
1 0.1357812 0.01760377 7.71319 3.470545e-13 0.09006581 0.1814967
attr(,"class")
[1] "linear.contrast"
> linear.contrast(birch.separate,pH,c(.25,.25,.25,.25,-1),conf=.99,scheffe=TRUE) # Scheffe adjusted
estimates se t-value p-value lower-ci upper-ci
1 0.1357812 0.01760377 7.71319 7.541588e-11 0.07086223 0.2007003
attr(,"class")
[1] "linear.contrast"
Again in this example, the adjusted p-value is still quite small, but it is 200 times the unadjusted p-value.