# Stat 5601 (Geyer) Examples (Sign Test and Related Procedures)

## General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions or specify a dataset. That's already done for you.

## External Data Entry

Enter a dataset URL:

### Summary

• Upper-tailed sign test
• Test statistic: B = 21
• Sample size: n = 25
• P-value: P = 0.00046

• Line one assigns the value of the parameter (population median) assumed under the null hypothesis. Usually zero.
• There is no need to sort the z values in line two. It just makes the data easier to look at.
• Alternatives to line five are
```
1 - pbinom(b - 1, n, 1 / 2)
pbinom(n - b, n, 1 / 2)
```
The first line here does exactly the same thing as line five in the example, but is less accurate for very small P-values. The second also does exactly the same thing as line five of the example, because of the symmetry of the binomial distribution with `p = 1 / 2`.
• For a lower-tailed test the fifth line would be replaced by
```
pbinom(b, n, 1 / 2)
```
• For handling zeros see Hollander and Wolfe and the zero fudge section.
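Putting the pieces together, here is a minimal sketch of the whole upper-tailed calculation. The data vector `z` is made up for illustration (the form reads its own dataset from the URL), so the numbers differ from the summary above:

```
mu <- 0                          # hypothesized population median (line one)
z <- sort(c(-1.2, 0.3, 0.5, 1.1, 1.7, 2.0, 2.4, 3.1))   # made-up data
n <- length(z)
b <- sum(z > mu)                 # test statistic: number of differences above mu
pbinom(b - 1, n, 1 / 2, lower.tail = FALSE)   # upper-tailed P-value, here 9/256
```

With these made-up data `b` is 7 out of `n = 8`, giving P = 9/256 ≈ 0.035.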

## External Data Entry

Enter a dataset URL:

### Summary

• Point Estimate (sample median): 17.6

## External Data Entry

Enter a dataset URL:

### Summary

• Achieved confidence level: 0.9567147
• Confidence interval for the population median: (7.1, 24.7)

• Some experimentation may be needed to achieve the confidence level you want. The possible confidence levels are shown by
```
1 - 2 * pbinom(k - 1, n, 1 / 2)
```
for different values of `k`. Because R functions operate elementwise on vectors, all the levels can be computed at once:
```
k <- seq(1, 100)
k <- k[1 - 2 * pbinom(k - 1, n, 1 / 2) > 0.5]
1 - 2 * pbinom(k - 1, n, 1 / 2)
```
If one adds these lines to the form above, one sees that there is not much choice: only three achieved levels (0.9854, 0.9567, and 0.8922) lie between 0.99 and 0.80.
• Alternatively, you can just assign `k` to be any integer between 1 and `n / 2` just before the second-to-last line in the form (`cat . . .`). A confidence interval with some achieved confidence level will be produced.
• For a one-tailed confidence interval (called upper and lower bounds by Hollander and Wolfe) just use `alpha` rather than `alpha / 2` in the fifth line of the form. Then make either the lower limit minus infinity or the upper limit plus infinity, as desired.
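The whole calculation can be sketched as follows. The data vector `z` is made up for illustration, and I assume the form's interval runs from the `k`-th smallest to the `k`-th largest order statistic (the usual sign-test interval):

```
z <- sort(c(3.2, 5.1, 7.1, 9.4, 12.0, 15.5, 17.6, 19.9, 21.3, 24.7, 26.8))
n <- length(z)
k <- seq(1, floor(n / 2))
round(1 - 2 * pbinom(k - 1, n, 1 / 2), 4)   # the achievable confidence levels
k <- 2                                      # pick the level closest to the target
c(z[k], z[n + 1 - k])                       # k-th smallest to k-th largest
1 - 2 * pbinom(k - 1, n, 1 / 2)             # achieved level, here 0.98828125
```

With these made-up data and `k = 2` the interval is (5.1, 24.7) with achieved confidence level about 0.988.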

## The R Function `sign.test`

All of the above can be done in one shot with the R function `sign.test` (on-line help).

## External Data Entry

Enter a dataset URL:

## The Zero Fudge

Many authorities recommend (at least lukewarmly) the following procedure for dealing with zero differences (that is, data values equal to the hypothesized value μ, when μ is not zero) in the sign test. After defining the vector `z` of differences, do

```
z <- z[z != mu]
n <- length(z)
```
which treats the zero differences as if they were not part of the data (and reduces the sample size accordingly).
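In full, the zero fudge followed by an upper-tailed sign test looks like this (the differences are made up for illustration, with three of them equal to `mu`):

```
mu <- 0
z <- c(-2, -1, 0, 0, 0, 1, 2, 3, 4, 5)   # made-up differences, three zeros
z <- z[z != mu]                           # the zero fudge: discard the zeros
n <- length(z)                            # n drops from 10 to 7
b <- sum(z > mu)
pbinom(b - 1, n, 1 / 2, lower.tail = FALSE)   # upper-tailed P-value, 29/128
```

Here `b` is 5 out of the reduced `n = 7`, giving P = 29/128 ≈ 0.227.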

The best that can be said for this is

• Most nonparametrics books recommend it, or at least describe it first.
• From a theoretical point of view, it is a valid test based on the distribution of the test statistic conditional on the number of zero differences and under the additional assumption
Pr(Z_i < μ) = Pr(Z_i > μ),
which generally is not true for discrete distributions (hence this additional assumption is generally false).

Thus the zero fudge can give results of arbitrary bogosity. See the counterexample.

But
• From an applied point of view, the zero fudge is completely bogus. How can anyone claim that differences of zero are completely irrelevant to decisions about whether or not the median difference is zero?
• It is obvious that zero differences are evidence in favor of the null hypothesis (of zero population difference) and should be counted as such.
• Hence the zero fudge is a form of "honest cheating". Widely accepted, but bogus.
• The reason everyone likes it, of course, is because it produces P < 0.05 more often than the conservative procedure, and everyone likes to get "statistically significant results", even if bogus.

The alternative to the zero fudge, what Hollander and Wolfe call the conservative approach, is to count the zero differences as evidence in favor of the null hypothesis. This is a bit tricky, since no matter how you do it, the recipe for the test must be altered. In my opinion, the easiest way is to change the definition of the test statistic.

For the lower-tailed test, assuming the vector of differences `z`, the sample size `n`, and the hypothesized value of the median `mu` have already been defined,

```
b <- sum(z >= mu)
pbinom(b, n, 1 / 2)
```
calculates the P-value for the lower-tailed test. Note that we need the weak inequality (`>=`) so that the zero differences are included in the count `b` and hence in the tail area calculated by the `pbinom` statement.

Reversing the inequality gives the conservative upper-tailed test.

```
blow <- sum(z <= mu)
pbinom(blow, n, 1 / 2)
```
Note that this isn't the test statistic described in the book, but obviously does the right thing by symmetry.
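For example, on made-up data with three zero differences, both conservative tests go like this (the zeros stay in the data, so `n` is not reduced):

```
mu <- 0
z <- c(-2, -1, 0, 0, 0, 1, 2, 3, 4, 5)   # made-up differences, three zeros
n <- length(z)                            # zeros kept, n = 10
b <- sum(z >= mu)                         # zeros counted toward the null
pbinom(b, n, 1 / 2)                       # conservative lower-tailed P-value
blow <- sum(z <= mu)                      # zeros counted toward the null
pbinom(blow, n, 1 / 2)                    # conservative upper-tailed P-value
```

Here `b` is 8 and `blow` is 5, giving lower- and upper-tailed P-values of 1013/1024 ≈ 0.989 and 638/1024 ≈ 0.623, noticeably larger than what the zero fudge would give on the same data.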

The `sign.test` function described in the preceding section has an optional argument that controls whether it does the zero fudge (`zero.fudge=TRUE`) or the conservative procedure (`zero.fudge=FALSE`). Perhaps inadvisably, `zero.fudge=TRUE` is the default.