University of Minnesota, Twin Cities     School of Statistics     Stat 5601     Rweb

Stat 5601 (Geyer) Examples (Permutation Tests and Related Procedures)

The Name of the Game

Permutation tests are also called randomization tests. Different name, same concept.

Contents

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

Theory

Permutation tests, despite being the oldest of all nonparametric procedures that are still used today, do not seem to be covered in Hollander and Wolfe. Thus we need a theory section.

However, both Wilcoxon tests are special cases of the general notion of permutation tests. Thus the subject should seem somewhat familiar.

As the plural permutation tests implies, this name refers to a class of tests. It is not a single procedure like the sign test or either of the Wilcoxon tests. Rather, it is a large class of procedures all derived from the same fundamental principle.

As with any test of statistical hypotheses we need the following elements: the data, a test statistic, a null hypothesis, and the sampling distribution of the test statistic under the null hypothesis.

The last of these is a complicated structure involving the other three elements. It is a probability distribution for the data which in turn induces a probability distribution for the test statistic (because it is a function of the data). The null hypothesis embodies the assumptions under which this probability distribution for the data holds. (Under alternatives, the data and test statistic would have a different sampling distribution.)

Permutations

For most hypothesis tests, we start with the assumptions and work forward to derive the sampling distribution of the test statistic under the null hypothesis. For permutation tests we will reverse the procedure, since the sampling distribution involves the permutations which give the procedure its name and are the key theoretical issue in understanding the test.

In mathematics, a permutation is a reordering of the numbers 1, ..., n. For example,

(1, 2, 3, 4, 5, 6, 7)
(1, 3, 2, 4, 5, 6, 7)
(4, 7, 5, 2, 6, 1, 3)
are all permutations of the numbers 1, ..., 7 (note that this includes the standard order in line one). As one learns at the beginning of a probability course, there are n! (n factorial) permutations of n objects. In this case, 7! = 5040, so you can see why they aren't all written out here.
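These counts are easy to check in base R (sample called with a single integer returns a random permutation of 1, ..., n):

```r
## n! counts the permutations of n objects
factorial(7)            # 5040
## sample(n) draws one permutation of 1, ..., n uniformly at random
p <- sample(7)
sort(p)                 # restores the standard order 1, ..., 7
```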

In the term permutation tests the notion of permutation is somewhat different. It refers to any of a specified class of rearrangements or modifications of the data.

The null hypothesis of the test specifies that the permutations are all equally likely. A concise way to say this is that the distribution of the data under the null hypothesis satisfies exchangeability.

The sampling distribution of the test statistic under the null hypothesis is computed by forming all of the permutations, calculating the test statistic for each and considering these values all equally likely.

Rather than give a general description of all the kinds of permutations that may arise in applications, we will just consider two specific cases.

Permutations for Unpaired Two-Sample Tests

Wilcoxon Rank Sum Tests

For unpaired two-sample tests, in particular, for Wilcoxon rank sum tests, the permutations are really combinations because order within the x's and y's doesn't matter.

Suppose, as in Comment 5 of Section 4.1 in Hollander and Wolfe, there are 3 x's and 2 y's. The combinations can be represented as follows

Rweb:> library(stat5601) 
Rweb:> foo <- perm2(5, 2) 
Rweb:> t(foo) 
      [,1] [,2] 
 [1,]    1    2 
 [2,]    1    3 
 [3,]    1    4 
 [4,]    1    5 
 [5,]    2    3 
 [6,]    2    4 
 [7,]    2    5 
 [8,]    3    4 
 [9,]    3    5 
[10,]    4    5 
Each row of the printed matrix gives one combination. The sampling distribution of the test statistic is formed by taking for each row of the matrix the corresponding elements of the combined data vector, calling those elements y's, and computing the test statistic. Each of the choose(n, k) combinations is equally likely under the null hypothesis.

The function perm2 (on-line help) used here produces all the combinations.
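perm2 comes from the course's stat5601 package. If that package is not available, base R's combn produces the same matrix, except that its columns, rather than its rows, list the combinations:

```r
## all choose(5, 2) = 10 ways to pick which 2 of the 5 elements are y's;
## transpose so each row is one combination, matching the printout above
foo <- t(combn(5, 2))
nrow(foo)     # 10
foo[1, ]      # 1 2
```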

Comment 5 in Section 4.1 in Hollander and Wolfe.

Continuing with this example, the following code calculates the exact sampling distribution of the Wilcoxon rank sum test statistic.

The function perm2sum (on-line help) used here produces the sum of each combination of data elements. Since the argument it is fed is the rank vector (note that there are ties, so we are calculating the exact sampling distribution of the test statistic when there are tied ranks), it calculates the sum of each possible pair of ranks (for this particular rank vector with its particular pattern of ties).

Thus the vector wsim is the vector of all possible values of the Wilcoxon rank sum test statistic for this particular pattern of ties.

The fraction of these values at or below the observed value of the test statistic is the P-value for an exact lower-tailed test.

For an upper-tailed test, reverse the inequality in the last statement.

For a two-tailed test, double the smaller of the one-tailed P-values.
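The whole calculation can be sketched in base R. The data below are hypothetical stand-ins (the real data set is loaded from a URL in the Rweb form), chosen to have a tie so that midranks come into play; combn plays the roles of perm2 and perm2sum:

```r
x <- c(1.2, 3.4, 3.4)     # hypothetical x's (note the tie)
y <- c(2.5, 4.1)          # hypothetical y's
r <- rank(c(x, y))        # midranks handle the tie
w <- sum(r[4:5])          # observed rank sum of the y's
## rank sum for every one of the choose(5, 2) ways to pick the y's
wsim <- combn(5, 2, function(i) sum(r[i]))
p.lower <- mean(wsim <= w)                    # exact lower-tailed P-value
p.upper <- mean(wsim >= w)                    # exact upper-tailed P-value
p.two <- min(1, 2 * min(p.lower, p.upper))    # two-tailed, by doubling
```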

Other Tests

There is no reason whatsoever why a permutation test has to use any particular test statistic. Any test statistic will do!

Consider, for example, the usual test statistic for a two-sample z test.

Data from Example 4.1 in Hollander and Wolfe, Different Test Statistic.


The function perm2fun (on-line help) used here evaluates an arbitrary function on each combination of data elements.

It takes arguments x and y, which are the x and y values of the permuted data for each permutation.
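perm2fun is also from the stat5601 package. Here is a base-R sketch of the same idea, on hypothetical data: with σ known, the two-sample z statistic has the same denominator for every permutation, so the difference of means is an equivalent (monotone) test statistic, and combn can evaluate it on every division of the data.

```r
z <- c(1.2, 3.4, 3.4, 2.5, 4.1)   # hypothetical combined data, y's last
k <- 2                            # number of y's
stat <- function(i) mean(z[i]) - mean(z[-i])    # "y" mean minus "x" mean
tsim <- combn(length(z), k, stat) # statistic for all choose(5, 2) divisions
tobs <- stat(4:5)                 # observed division: last two are the y's
p.upper <- mean(tsim >= tobs)     # exact upper-tailed P-value
```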

This example is really two examples.

Note the following very interesting observations.

Permutations for One-Sample or Paired Two-Sample Tests

Wilcoxon Signed Rank Tests

For one-sample or paired two-sample tests, in particular, for Wilcoxon signed rank tests, the permutations are really subsets. The permutation distribution chooses an arbitrary subset to mark + and marks the complementary subset -. Either subset can be empty.

Suppose, as in Comment 11 of Section 3.1 in Hollander and Wolfe, there are 4 z's. The subsets can be represented as follows

Rweb:> library(stat5601) 
Rweb:> foo <- perm1(4) 
Rweb:> t(foo) 
      [,1] [,2] [,3] [,4] 
 [1,]    0    0    0    0 
 [2,]    1    0    0    0 
 [3,]    0    1    0    0 
 [4,]    1    1    0    0 
 [5,]    0    0    1    0 
 [6,]    1    0    1    0 
 [7,]    0    1    1    0 
 [8,]    1    1    1    0 
 [9,]    0    0    0    1 
[10,]    1    0    0    1 
[11,]    0    1    0    1 
[12,]    1    1    0    1 
[13,]    0    0    1    1 
[14,]    1    0    1    1 
[15,]    0    1    1    1 
[16,]    1    1    1    1
Each row of the printed matrix gives (the indicator vector of) one subset. The sampling distribution of the test statistic is formed by, for each row of the matrix, summing the ranks of the data elements corresponding to the 1's. Each of the 2^n subsets is equally likely under the null hypothesis.

The function perm1 (on-line help) used here produces all the subsets.
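perm1 is from the stat5601 package. In base R, expand.grid builds the same matrix of indicator vectors, in the same order:

```r
## all 2^4 = 16 subsets of 4 elements, as rows of 0/1 indicators
foo <- as.matrix(expand.grid(rep(list(0:1), 4)))
nrow(foo)     # 16
```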

Comment 11 in Section 3.1 in Hollander and Wolfe.

Continuing with this example, the following code calculates the exact sampling distribution of the Wilcoxon signed rank test statistic.

The function perm1sum (on-line help) used here produces the sum of each subset of data elements. Since the argument it is fed is the absolute rank vector (note that there are ties, so we are calculating the exact sampling distribution of the test statistic when there are tied ranks), it calculates the sum of each possible subset of ranks (for this particular rank vector with its particular pattern of ties).

Thus the vector tsim is the vector of all possible values of the Wilcoxon signed rank test statistic for this particular pattern of ties.

The fraction of these values at or above the observed value of the test statistic is the P-value for an exact upper-tailed test.

For a lower-tailed test, reverse the inequality in the last statement.

For a two-tailed test, double the smaller of the one-tailed P-values.
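Again the whole calculation can be sketched in base R, with hypothetical data standing in for the real data set (a tie among the absolute values, so midranks matter):

```r
z <- c(1.3, -2.4, 3.0, 3.0)     # hypothetical differences (tied absolute values)
r <- rank(abs(z))               # absolute midranks
## every 0/1 indicator vector marks one subset as "+"
signs <- as.matrix(expand.grid(rep(list(0:1), length(z))))
tsim <- as.vector(signs %*% r)  # signed rank sum for each of the 2^4 subsets
tobs <- sum(r[z > 0])           # observed signed rank statistic
p.upper <- mean(tsim >= tobs)                 # exact upper-tailed P-value
p.lower <- mean(tsim <= tobs)                 # exact lower-tailed P-value
p.two <- min(1, 2 * min(p.lower, p.upper))    # two-tailed, by doubling
```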

Other Tests

There is no reason whatsoever why a permutation test has to use any particular test statistic. Any test statistic will do!

Consider, for example, the usual test statistic for a one-sample z test.

Data from Table 3.9 in Hollander and Wolfe, Different Test Statistic.


The function perm1fun (on-line help) used here evaluates an arbitrary function on each subset of data elements and its complement.

It takes arguments x and y, which are the + and - values for each permutation.
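perm1fun is from the stat5601 package. A base-R sketch on hypothetical (integer, for exact arithmetic) data: with σ known, the one-sample z statistic is a monotone function of the sample mean, so the mean of the re-signed data serves as an equivalent statistic, evaluated over all 2^n sign patterns.

```r
z <- c(1, -2, 3, 5)            # hypothetical data
a <- abs(z)
## all 2^4 patterns of +1/-1 signs
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(z))))
tsim <- as.vector(signs %*% a) / length(z)  # mean of each re-signed sample
tobs <- mean(z)                # observed sample mean
p.upper <- mean(tsim >= tobs)  # exact upper-tailed P-value
```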

This example is really two examples.

Note the following very interesting observations.

Monte Carlo Tests

The perm functions described above don't work for large sample sizes. The amount of memory required, proportional to choose(n, k) or 2^n, quickly grows larger than the computer can hold.

What then?

For the Wilcoxon tests, we can use the normal approximations. But for an arbitrary permutation test statistic there is, in general, no ready-made normal approximation.

The answer is to use so-called Monte Carlo tests which use a random sample of the permutations rather than all the permutations.

The consequence of this is that the result of the calculation, the P-value of the test, is itself a random number. However, we can calculate its precision using statistics, and so use enough random permutations to get whatever precision we want.
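Here is a base-R sketch of a Monte Carlo two-sample permutation test on simulated data (everything below is hypothetical). Counting the observed statistic among the simulated ones keeps the P-value strictly positive, and the binomial standard error quantifies the Monte Carlo precision:

```r
set.seed(42)                           # reproducible randomness
x <- rnorm(50)                         # hypothetical x's
y <- rnorm(30) + 0.5                   # hypothetical y's
z <- c(x, y); k <- length(y)
tobs <- mean(y) - mean(x)              # observed test statistic
nsim <- 9999
tsim <- replicate(nsim, {
    i <- sample(length(z), k)          # random choice of which are "y's"
    mean(z[i]) - mean(z[-i])
})
## Monte Carlo P-value: the observed value counts as one permutation
p.upper <- (sum(tsim >= tobs) + 1) / (nsim + 1)
## approximate standard error of the Monte Carlo P-value
se <- sqrt(p.upper * (1 - p.upper) / nsim)
```

Doubling nsim halves neither the bias (there is none to speak of) nor the standard error, which shrinks like 1/sqrt(nsim), so quadrupling nsim is needed to halve it.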

Two-Sample Tests

Example 4.1 in Hollander and Wolfe.


Comments

One-Sample Tests

Example 3.1 in Hollander and Wolfe.
