Chapter 9 Randomization-based Procedures

Standard frequentist statistics make inferences about parameters of distributions (populations). Randomization (permutation) tests do not make distributional assumptions. Instead, randomization tests assume that the data are fixed, but the labels indicating which response corresponds to which treatment are random. So standard methods assume known labels (treatment assignments) and random data arising as samples from populations, while randomization methods assume fixed/known data and random labels (treatment assignments). The logic of these two approaches is totally different, but the results are often quite similar.

The beauty of randomization methods is that if you have done a randomization, then the randomization inference is appropriate without any further assumptions (distributions, variances, etc.). The downsides of randomization methods are that they are exceedingly tedious to do by hand and do not generalize as easily to more complicated settings.

Randomization procedures work by first choosing a summary statistic that reflects the issue of interest. If we are concerned about comparing the means of two groups, our summary statistic would probably be the sample mean of the first group’s data minus the sample mean of the second group’s data. We can then observe this summary for our data.

Randomization procedures work on the basis of assuming that the treatments are null. That is, assignment of treatments to units does absolutely nothing other than change treatment labels; regardless of what treatment is assigned to a unit, the response would be the same. You can compute the summary statistic for each possible way that the random assignment of treatments to units could have turned out. Under most randomizations, each of these possibilities is equally likely, leading to a distribution of potential summary statistics under the null assumption.
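For a small two-group example, this enumeration can be done directly in base R: list every possible assignment of units to group 1 and compute the difference in means for each. This is a sketch with made-up toy data, not any dataset from the book.

```r
# Toy data: 6 units, with 3 assigned to group 1 (made-up numbers)
y <- c(12, 15, 9, 20, 18, 25)
n <- 3

# All choose(6, 3) = 20 equally likely assignments of units to group 1
idx <- combn(length(y), n)

# Summary statistic (mean of group 1 minus mean of group 2) for each assignment
perm.stats <- apply(idx, 2, function(g) mean(y[g]) - mean(y[-g]))

length(perm.stats)  # 20 values form the randomization distribution
```

By symmetry, every assignment has a complementary assignment with the groups swapped, so the randomization distribution of the difference in means is centered at zero.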

Inference is made by comparing the observed statistic to the randomization distribution. To test the null, get the tail area of the observed statistic in the randomization distribution. To get a confidence interval, find the range of shifts of the null distribution that would lead to not rejecting the null.
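Continuing the base-R sketch with the same made-up toy data, the two-sided p-value is simply the fraction of the randomization distribution at least as extreme as the observed statistic (the observed assignment itself is one of the enumerated possibilities, so the p-value can never be zero):

```r
# Same made-up data as before: first 3 values are group 1
y <- c(12, 15, 9, 20, 18, 25)
n <- 3
obs <- mean(y[1:n]) - mean(y[-(1:n)])  # observed difference in means

idx <- combn(length(y), n)
perm.stats <- apply(idx, 2, function(g) mean(y[g]) - mean(y[-g]))

# Two-sided p-value: proportion of assignments as or more extreme than observed
p.two <- mean(abs(perm.stats) >= abs(obs))
# For a confidence interval, repeat the test with group 1 shifted by delta
# and collect the deltas for which the shifted null is not rejected.
```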

9.1 Two-sample procedure

In a two-sample problem we have one set of \(n\) responses and a second set of \(m\) responses. We assume that the randomization that led to this structure was one where \(n\) units were chosen at random from \(n+m\) units and assigned to group 1, with the remaining \(m\) units going to group 2. All possible assignments of \(n+m\) units into groups of \(n\) and \(m\) are equally likely.

The randomization analog of the two-sample t-test can be done with the function permTS() from the perm package; you will need to install the perm package if you do not already have it (see Section 1.3).

The permTS() function does the two-sample randomization (permutation) t-test. Its summary statistic is the difference in the means for the two groups. By default it does a two-sided alternative. The main advantage of this procedure over the t-test is that the permutation test does not assume or depend on normality.

First get the NotchedBoards data from cfcdae into your current work space. Then create variables for the notched and unnotched boards.
> library(cfcdae)
> data(NotchedBoards) # creates variable NotchedBoards
> unnotched <- NotchedBoards$strength[NotchedBoards$shape == "uniform"]
> notched <- NotchedBoards$strength[NotchedBoards$shape == "notched"]
Now use permTS() much like t.test(). Setting exact=TRUE means that permTS() should try to determine the complete randomization distribution if it won’t take too long, or use a random sample of 999 from the distribution if it will take too long.
> library(perm)
> permTS(unnotched, notched, exact=TRUE)

    Exact Permutation Test Estimated by Monte Carlo

data:  unnotched and notched
p-value = 0.75
alternative hypothesis: true mean unnotched - mean notched is not equal to 0
sample estimates:
mean unnotched - mean notched 
                     6.615385 

p-value estimated from 999 Monte Carlo replications
99 percent confidence interval on p-value:
 0.6702309 0.8296954 

We see a p-value of .75, which compares with .79 from the t-test.

We may also specify different alternatives, for example,
> permTS(unnotched, notched, alternative="greater", exact=TRUE)

    Exact Permutation Test Estimated by Monte Carlo

data:  unnotched and notched
p-value = 0.375
alternative hypothesis: true mean unnotched - mean notched is greater than 0
sample estimates:
mean unnotched - mean notched 
                     6.615385 

p-value estimated from 999 Monte Carlo replications
99 percent confidence interval on p-value:
 0.3351154 0.4148477 

9.2 Paired procedures

Sometimes we get two measurements on a subject under different circumstances, or perhaps we treat every experimental unit with two different treatments and get the responses for the two treatments for each unit. These data are paired, because we expect the two responses for the same subject or unit to both be high or both be low. They are correlated with each other because they share some aspect of the subject or unit that causes the responses to be similar.

The randomization for the paired set up is that each pair of responses is equally likely to have been assigned AB or BA. Under the null assumption that treatment assignments are only labels, this implies that the sign of the difference in responses between the two has an equal probability of being positive or negative. The randomization in the test is then to randomly flip the signs of all of the differences.

We will demonstrate using the data cfcdae::RunStitch. Each worker run-stitches collars using two different setups: the standard (conventional) setup and an ergonomic setup. The two runs are made in random order for each worker, and the interest is in any difference in average speed between the two setups.

Load the RunStitch data from the package and compute the differences.
> data(RunStitch)
> head(RunStitch)
  Standard Ergonomic
1     4.90      3.87
2     4.50      4.54
3     4.86      4.60
4     5.57      5.27
5     4.62      5.59
6     4.65      4.61
> differences <- RunStitch[,"Standard"] - RunStitch[,"Ergonomic"]
> differences
 [1]  1.03 -0.04  0.26  0.30 -0.97  0.04 -0.57  1.75  0.01  0.42  0.45 -0.80
[13]  0.39  0.25  0.18  0.95 -0.18  0.71  0.42  0.43 -0.48 -1.08 -0.57  1.10
[25]  0.27 -0.45  0.62  0.21 -0.21  0.82
The permsign.test() function in cfcdae does what we need. Note that we set the random number seed so that the output is replicable, and we ask for a plot of the randomization distribution.
> set.seed(654321)
> permsign.test(differences,plot=TRUE)
Figure 9.1: Randomization distribution for runstitch differences

Permutation Sign Test for differences 

 Null hypothesis mean value: 0 

 Lower tail p-value 0.9273 
 Upper tail p-value 0.0728 
 Two tail p-value 0.1456 

95% confidence interval: -0.05867, 0.4093

The two-sided p-value for these data using a paired t-test is .147.

There is also a calling form of permsign.test() that will save you the toil of computing the differences and lets you take the variables from a data frame.

> permsign.test(Standard, Ergonomic, plot=FALSE, data=RunStitch)
Permutation Sign Test for Standard - Ergonomic 

 Null hypothesis mean value: 0 

 Lower tail p-value 0.9253 
 Upper tail p-value 0.0751 
 Two tail p-value 0.1502 

95% confidence interval: -0.05867, 0.4093

Note that the results are just slightly different due to different random samples used in the two runs.