University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb
The vector r defined in line four contains the (absolute) ranks. The differences z and the ranks r are stuffed into a matrix so they line up, each difference over the corresponding rank.
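The calculation being described appears to be something like the following sketch. The paired samples y and x here are made-up example data, not the data from the example on the page.

```r
# Hypothetical reconstruction of the calculation described above.
# The paired samples y and x are made-up example data.
y <- c(1.8, 2.6, 3.1, 2.0, 2.9)
x <- c(1.1, 2.9, 2.0, 1.5, 2.3)
z <- y - x                # paired differences
r <- rank(abs(z))         # absolute ranks
rbind(z, r)               # differences lined up over the corresponding ranks
tplus <- sum(r[z > 0])    # signed rank statistic T+
```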
1 - psignrank(tplus - 1, n)
psignrank(tplus - 1, n, lower.tail=FALSE)
psignrank(n * (n + 1) / 2 - tplus, n)

The first line here does exactly the same as line seven in the example, but is less accurate for very small P-values. The second does exactly the same thing, computed directly as an upper tail. The third also does the same because of the symmetry of the distribution of the signed rank test statistic.
The Hodges-Lehmann estimator associated with the signed rank test is the median of the Walsh averages, which are the n (n + 1) / 2 averages of pairs of differences

(Zi + Zj) / 2,    i ≤ j

The following somewhat tricky code computes the Walsh averages and their median.
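One way to do this (a hedged sketch, not necessarily the original page's code) uses outer and lower.tri; z here is made-up example data.

```r
# A sketch of computing the Walsh averages and their median;
# z is made-up example data.
z <- c(0.7, -0.3, 1.1, 0.5, 0.6)
walsh <- outer(z, z, "+") / 2                   # all (z[i] + z[j]) / 2
walsh <- walsh[lower.tri(walsh, diag = TRUE)]   # one copy of each pair i <= j
length(walsh)                                   # n (n + 1) / 2 = 15 averages
median(walsh)                                   # Hodges-Lehmann estimator
```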
Very similar to the confidence interval associated with the sign test, the confidence interval has the form

(W(k), W(m + 1 - k))

where m = n (n + 1) / 2 is the number of Walsh averages, and, as always, parentheses on subscripts indicate order statistics, in this case, of the Walsh averages W(k). That is, one counts in k from each end in the list of sorted Walsh averages to find the confidence interval.
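A concrete sketch of forming this interval, with made-up data and an assumed target confidence level of 0.90:

```r
# Sketch of the interval (W(k), W(m + 1 - k)) from sorted Walsh averages,
# with made-up data and an assumed 90% target confidence level.
z <- c(0.7, -0.3, 1.1, 0.5, 0.6)
n <- length(z)
m <- n * (n + 1) / 2                       # number of Walsh averages
walsh <- outer(z, z, "+") / 2
w <- sort(walsh[lower.tri(walsh, diag = TRUE)])
kk <- 1:m
conf <- 1 - 2 * psignrank(kk - 1, n)       # achieved confidence levels
k <- max(kk[conf >= 0.90])                 # largest k with at least 90% coverage
c(w[k], w[m + 1 - k])                      # the interval (W(k), W(m + 1 - k))
```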
The achieved confidence levels are

1 - 2 * psignrank(k - 1, n)

for different values of k. The vectorwise operation of R functions can give them all at once

k <- seq(1, 100)
conf <- 1 - 2 * psignrank(k - 1, n)
conf[conf > 1 / 2]

If one adds these lines to the form above, one sees that the choice is fairly restricted. The nine possible achieved levels between 0.99 and 0.80 are

0.9883, 0.9805, 0.9727, 0.9609, 0.9453, 0.9258, 0.9023, 0.8711, 0.8359.
One can change k to be any integer between zero and n / 2 just before the second to last line in the form (the line with cat . . . ). A confidence interval with the corresponding achieved confidence level will be produced.
For a one-sided confidence interval, use alpha rather than alpha / 2 in the fifth line of the form. Then make either the lower limit minus infinity or the upper limit plus infinity, as desired.
wilcox.test

All of the above can be done in one shot with the R function wilcox.test (on-line help). This function comes with R. It was not written especially for this course.
wilcox.test does almost all of the above. It has a bit of programmer brain damage (PBD) in the way it calculates the point estimate. For its definition of the median it uses the average of the two middle values if there is an even number of values (which is the standard definition) and the average of the two values on either side of the middle value if there is an odd number (which I have never seen anywhere else). Of course, this definition is asymptotically equivalent to the standard definition. Quite as good really. So we could regard it as a harmless eccentricity. It is, however, a pain when trying to get the answer in the back of the book or to communicate with anyone familiar with the standard definition.
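The difference is easy to see on a small made-up example:

```r
# Made-up numbers showing the two definitions for an odd count of values.
x <- c(1, 2, 10, 20, 30)   # sorted, n = 5 (odd)
median(x)                  # standard definition: the middle value, 10
mean(x[c(2, 4)])           # values on either side of the middle: (2 + 20) / 2 = 11
```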
What if the continuity assumption is false and there are tied absolute Z values or zero Z values?
First, neither ties nor zeros should make any difference in calculating point estimators or confidence intervals.
This is another bit of PBD in the implementation of the wilcox.test function. It does change the way it calculates point estimates and confidence intervals when there are ties or zeros. But it shouldn't.
What we called the "zero fudge" in the context of the sign test (because it is fairly bogus there) makes much more sense in the context of the signed rank test. Zero values in the vector Z = Y - X of paired differences should get the smallest possible absolute rank because zero is smaller in absolute value than any other number. We might as well give them rank zero, starting counting at zero rather than at one. Then they make no contribution to the sum of ranks statistic.
Here's another way to see the same thing. Let's compare three procedures: Student t, signed rank, and sign.
There is another issue about the zero fudge. As explained in the section about the zero fudge for the sign test, there is an additional condition
Thus we won't argue with the zero fudge for the signed rank test. We will start our hypothesis test calculations (don't do this for point estimates or confidence intervals) with

z <- y - x
z <- z - mu
z <- z[z != 0]
The preceding section takes care of one kind of violation of the continuity assumption. But there's a second kind of violation that causes another problem.
If there are ties among the magnitudes of the Z values, then the sampling distribution computed by the psignrank and qsignrank functions in R is no longer correct.
So there are really two issues to be resolved.
The standard solution to the first problem is to use so-called "tied ranks" in which each of a set of tied magnitudes is assigned the average of the ranks they otherwise would have gotten if they had been slightly different (and untied). The R rank function automatically does this. So the ranks are done the same as before, and the test statistic is calculated from the ranks the same as before.

r <- rank(abs(z))
tplus <- sum(r[z > 0])
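A made-up illustration of how tied ranks work, showing that tplus can then be a half-integer:

```r
# Made-up illustration: tied magnitudes share the average of the ranks
# they would otherwise have gotten, so tplus can be a half-integer.
z <- c(0.5, -0.5, 1.2, 2.0)
r <- rank(abs(z))          # the two 0.5 magnitudes each get (1 + 2) / 2 = 1.5
tplus <- sum(r[z > 0])     # 1.5 + 3 + 4 = 8.5
```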
There are now three ways to proceed.
The Wrong Thing (with a capital "W" and a capital "T") is to just ignore the fact that tied ranks change the sampling distribution and just use tables in books or computer functions that are based on the assumption of no ties.
This is not quite as bad as it sounds, because tied ranks were thought up in the first place with the idea of not changing the sampling distribution much. So this does give wrong answers, but not horribly wrong.
The data for this example are in the vectors private and government. The lines

mu <- 0   # hypothesised value of median

and

z <- z - mu
z <- z[z != 0]

serve no purpose in this problem where μ = 0 and there are no zero differences. They are only there for the general case.
The floor function in the last line rounds tplus down to the nearest integer (in this case from 62.5 to 62) because only integer arguments make sense to psignrank, and rounding down is conservative for upper tails (makes them bigger). For a lower tail the conservative direction is to round up

psignrank(ceiling(tplus), n)
As long as you have a computer, why not use it? There are only 2^n points in the sample space of the sampling distribution of the test statistic under the null hypothesis, corresponding to all possible choices of signs for the ranks. In this case, 2^12 = 4096, not a particularly big number for a computer (although out of the question for hand calculation). So just do it.
Wrong    P = 0.04
Right    P = 0.03
(The %% operator is the remainder operator.) It is a lot faster than the naive implementation. For larger n, use the following section.
The vector tstat is the 2^n values of the test statistic that occur for the 2^n patterns of plus and minus signs for the ranks.

mean(tstat <= tplus)

The P-value of the two-tailed test is twice the P-value of either one-tailed test.
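A construction of tstat along these lines can be sketched as follows. This is a hedged reconstruction, not the page's original code; untied ranks for n = 12 are used here so the answer can be checked against psignrank.

```r
# Brute-force enumeration of all 2^n sign patterns; untied ranks for
# n = 12 are used so the result can be checked against psignrank.
n <- 12
r <- 1:n                          # for real data use rank(abs(z))
npat <- 2^n                       # 4096 sign patterns
tstat <- double(npat)
for (i in 0:(npat - 1)) {
    # the base-2 digits of i pick which ranks get a plus sign
    # (%% is the remainder operator, %/% is integer division)
    signs <- (i %/% 2^(0:(n - 1))) %% 2
    tstat[i + 1] <- sum(r[signs == 1])
}
tplus <- 62
mean(tstat >= tplus)              # upper-tailed P-value
```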
We have been ignoring up to now, large sample (also called "asymptotic") approximations to sampling distributions of test statistics. Why use an approximation when the computer can do it exactly?
However, the computer can't do it exactly for really large n.
The functions psignrank and qsignrank crash when n is larger than about 50. The code in the preceding section is worse. It will crash when n is larger than about 20. Thus the need for large sample approximation.
It is a fact, which Hollander and Wolfe derive on pp. 44-45, that the mean and variance of the sampling distribution of T+ are

E(T+) = n (n + 1) / 4
var(T+) = n (n + 1) (2 n + 1) / 24

under the assumption of continuity (hence no ties). When there are ties, the mean stays the same but the variance is reduced by a quantity that, because it has a summation sign in it, doesn't look good in web pages. Just see equation (3.13) in Hollander and Wolfe.
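These moments can be coded directly. The sketch below assumes the n = 12, tplus = 62.5 situation discussed above and applies no tie correction to the variance.

```r
# Normal approximation for a hypothetical n = 12, tplus = 62.5 example;
# no tie correction to the variance is applied here.
n <- 12
tplus <- 62.5
et <- n * (n + 1) / 4                   # E(T+) = 39
vt <- n * (n + 1) * (2 * n + 1) / 24    # var(T+) = 162.5 (no-ties formula)
pnorm(tplus, et, sqrt(vt), lower.tail = FALSE)   # upper-tailed P-value
```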
Here's how R uses this approximation, with the normal probabilities computed by pnorm.
Arrghh!!! The tied ranks may also have only integer values (when the number tied in each tied group is odd). In that case the correction for continuity should be 1 / 2.
pnorm(tplus, et, sqrt(vt))
pnorm(tplus + 1 / 4, et, sqrt(vt))
pnorm(tplus + 1 / 2, et, sqrt(vt))

whichever you think is better.
wilcox.test always uses 1 / 2 for the correction for continuity even though we can see that 1 / 4 works better in this case.