Chapter 24 Bonferroni-Style Methods

We want methods that adjust the p-values for a set of K tests so that we control various error rates at some rate \(\alpha\). The methods discussed here work directly on the p-values; usually, you don’t need to know anything beyond K and the p-values. Some methods, however, require that the tests that generated the p-values be statistically independent; when the tests are independent, you can usually get a little more power.

The text presents Bonferroni-style methods as comparing p-values to the nominal rate E divided by an adjustment factor. In R, p.adjust() instead multiplies each p-value by the adjustment factor, and you then compare that to the nominal error rate.

The progenitor of all these methods is Bonferroni itself. Multiply each p-value by K. If the adjusted p-value is less than \(\alpha\), reject that null hypothesis. Bonferroni is widely applicable and dead easy. It controls the strong familywise error rate, or you can use it to get simultaneous confidence intervals. But unless you need simultaneous confidence intervals, the Holm method is always more powerful (more likely to find differences) while still controlling the strong familywise error rate.

Let \(p_{(1)}\) denote the smallest p-value, \(p_{(2)}\) the second smallest p-value and so on. All of the methods will compare ordered p-values to rescaled critical values. Alternatively, you can adjust (rescale) the ordered p-values and check to see if they are less than the original rate \(\alpha\). The Bonferroni, Hochberg, Benjamini-Hochberg, and Benjamini-Yekutieli methods say to reject hypothesis (j) if the comparison inequality holds for any (i) with \(i \geq j\). The Holm method says to reject hypothesis (j) if the comparison inequality holds for all \(i \leq j\).

Put the adjusted p-values in order so that the original p-values are in increasing order. For Holm, you can only reject (j) if the jth adjusted p-value and all those that come before it are less than \(\alpha\). So start from the left and keep rejecting as long as the adjusted p-values are less than \(\alpha\); any adjusted p-value bigger than \(\alpha\) prevents rejection from there on into the list. For the other methods, you can reject (j) if it or any adjusted p-value that comes after it is less than \(\alpha\). So start from the right. Once you find an adjusted p-value less than \(\alpha\), reject that hypothesis and all those that come before it in the list.

In this example, we have p-values for testing the equality of ratings of different sensory aspects of three cottage cheeses (Example 5.3). It’s probably reasonable to assume that these tests are all independent of each other, but you might be able to come up with a scenario where using the same raters for all attributes induces some kind of correlation between the results.

Enter the p-values and sort them into increasing order:
> Texture <- c(BD.rate=.001,Firm=.0001,Sticky=.41,Slippery=.07,Heavy=.15,
+              Part.size=.42,Runny=.002,Rubbery=.006)
> Texture <- sort(Texture) # smallest to largest
> Texture
     Firm   BD.rate     Runny   Rubbery  Slippery     Heavy    Sticky Part.size 
   0.0001    0.0010    0.0020    0.0060    0.0700    0.1500    0.4100    0.4200 

Note: if you need to sort the rows of a matrix so that the first column is in increasing order, or do a parallel rearrangement of the elements of several vectors so that one of them is in increasing order, then you should be using the order() function. order() gives us the indices so that when you subscript with those indices the result is in order. The following two do the same thing:

sort(foo)
foo[order(foo)]
Consider the Texture tests using the different approaches. Assuming that the tests are independent, we can use B-H to control the false discovery rate and Hochberg to control the strong familywise error rate.
> Texture # unadjusted
     Firm   BD.rate     Runny   Rubbery  Slippery     Heavy    Sticky Part.size 
   0.0001    0.0010    0.0020    0.0060    0.0700    0.1500    0.4100    0.4200 
> p.adjust(Texture,method="BH") # controls FDR if independent
       Firm     BD.rate       Runny     Rubbery    Slippery       Heavy      Sticky   Part.size 
0.000800000 0.004000000 0.005333333 0.012000000 0.112000000 0.200000000 0.420000000 0.420000000 
> p.adjust(Texture,method="hochberg") # controls SFER if independent
     Firm   BD.rate     Runny   Rubbery  Slippery     Heavy    Sticky Part.size 
   0.0008    0.0070    0.0120    0.0300    0.2800    0.4200    0.4200    0.4200 

Testing at the .01 level, Firm, BD.rate, Runny, and Rubbery were all signficant when we make no adjustments. Firm, BD.rate, and Runny are significant when we control the false discovery rate, and only Firm and BD.rate are significant when we control the strong familywise error rate.

Notice that the three largest Hochberg p-values are all .42. The adjusted p-values begin by multiplying the smallest by K (8 in this case), then by K-1, and so on. If we do that by hand, we get
> Texture*(8:1)
     Firm   BD.rate     Runny   Rubbery  Slippery     Heavy    Sticky Part.size 
   0.0008    0.0070    0.0120    0.0300    0.2800    0.4500    0.8200    0.4200 

The last three products are .45, .82, and .42 (not .42 three times). However, recall that Hochberg is a “right to left”; the sixth and seventh values will reject for any \(\alpha\) at or above .42, because the eighth value is .42 and it carries left. p.adjust() changes the .45 and .82 to .42, which is the value that is carried left.

If the tests were not independent, then we would need to use B-Y and Holm instead of B-H and Hochberg. Note that for this particular set of p-values, B-Y, which is only guaranteed to control the false discovery rate, has larger adjusted p-values than Holm, which controls the strong familywise error rate. For this particular set of p-values, we could reject more with Holm than with B-Y, so we should use Holm. This peculiar reversal will not always happen.
> p.adjust(Texture,method="BY")
       Firm     BD.rate       Runny     Rubbery    Slippery       Heavy      Sticky   Part.size 
0.002174286 0.010871429 0.014495238 0.032614286 0.304400000 0.543571429 1.000000000 1.000000000 
> p.adjust(Texture,method="holm")
     Firm   BD.rate     Runny   Rubbery  Slippery     Heavy    Sticky Part.size 
   0.0008    0.0070    0.0120    0.0300    0.2800    0.4500    0.8200    0.8200 

Holm is a “left to right” approach. It begins with the same basic products as Hochberg, but it modifies the products to ensure that they do not decrease. Because the seventh product is .82, the eighth value will not reject for anything below .82. Thus p.adjust() reports the eighth value as .82.