General Instructions
To do each example, just click the Submit
button.
You do not have to type in any R instructions or specify a dataset.
That's already done for you.
The Sign Test
Example 3.5 in Hollander and Wolfe
Summary
- Upper-tailed sign test
- Test statistic: B = 21
- Sample size: n = 25
- P-value: P = 0.00046
Comments
- Line one assigns the value of the parameter (population median) assumed under the null hypothesis. Usually zero.
- There is no need to sort the z values in line two. It just makes the data easier to look at.
- Alternatives to line five are
1 - pbinom(b - 1, n, 1 / 2)
pbinom(n - b, n, 1 / 2)
The first line here computes the same quantity as line five in the example, but is less accurate for very small P-values. The second gives exactly the same result as line five of the example because of the symmetry of the binomial distribution with p = 1 / 2.
- For a lower-tailed test the fifth line would be replaced by
pbinom(b, n, 1 / 2)
- For a two-tailed test calculate both one-tailed P-values, take the smaller, and double it (because two tails is twice one tail). This only works for a test for which the null distribution of the test statistic is symmetric, which is the case for the sign test.
- For handling ties see Hollander and Wolfe, the zero fudge section, and the fuzzy section.
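As a rough sketch of the recipes in these comments, the following base R code plugs in the values reported in the summary (b = 21, n = 25) in place of the data, which is not reproduced here. The variable names p_upper, p_alt1, p_alt2, p_lower, and p_two are made up for this illustration, and capping the doubled P-value at one is an added safeguard, not something the comments state.

```r
# Values taken from the summary above; the raw data are not needed
# once the test statistic and sample size are known.
n <- 25   # sample size
b <- 21   # number of differences above the hypothesized median

# Upper-tailed P-value, Pr(B >= b), computed three ways
p_upper <- pbinom(b - 1, n, 1 / 2, lower.tail = FALSE)
p_alt1 <- 1 - pbinom(b - 1, n, 1 / 2)   # same quantity, less accurate when tiny
p_alt2 <- pbinom(n - b, n, 1 / 2)       # same by symmetry when p = 1 / 2

# Lower-tailed P-value, Pr(B <= b)
p_lower <- pbinom(b, n, 1 / 2)

# Two-tailed P-value: double the smaller one-tailed P-value
# (the cap at 1 is an assumption added for this sketch)
p_two <- min(1, 2 * min(p_upper, p_lower))

print(c(upper = p_upper, alt1 = p_alt1, alt2 = p_alt2,
        lower = p_lower, two.sided = p_two))
```

The first three numbers should agree to many digits, which is the point of the comment about alternatives to line five.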
The Associated Point Estimate (Median)
Example 3.6 in Hollander and Wolfe
Summary
- Point Estimate (sample median): 17.6
The Associated Confidence Interval
Example 3.6 in Hollander and Wolfe
Summary
- Achieved confidence level: 0.9567147
- Confidence interval for the population median: (7.1, 24.7)
Comments
- Some experimentation may be needed to achieve the confidence level you want. The possible confidence levels are shown by
1 - 2 * pbinom(k - 1, n, 1 / 2)
for different values of k. The vectorwise operation of R functions can give them all at once
k <- seq(1, 100)
k <- k[1 - 2 * pbinom(k - 1, n, 1 / 2) > 0.5]
1 - 2 * pbinom(k - 1, n, 1 / 2)
If one adds these lines to the form above, one sees that there's not much choice, only three achieved levels (0.9854, 0.9567, 0.8922) between 0.99 and 0.80.
- Alternatively, you can just assign k to be any integer between zero and n / 2 just before the second to last line in the form (cat . . .). A confidence interval with some achieved confidence level will be produced.
- For a one-tailed confidence interval (called upper and lower bounds by Hollander and Wolfe) just use alpha rather than alpha / 2 in the fifth line of the form. Then make either the lower limit minus infinity or the upper limit plus infinity, as desired.
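The following is a minimal sketch of how the achieved level and the interval end points fit together. It is not the code in the form. The helper sign_ci is hypothetical, the data x are simulated only so the sketch runs, and picking k with qbinom(alpha / 2, n, 1 / 2) is an assumption about one reasonable way to turn a nominal alpha into a count.

```r
# Hypothetical helper, not part of the form: sign-test confidence interval
# that counts in k order statistics from each end of the sorted data.
sign_ci <- function(x, k) {
    n <- length(x)
    stopifnot(k >= 1, k <= n / 2)
    xs <- sort(x)
    list(level = 1 - 2 * pbinom(k - 1, n, 1 / 2),
         lower = xs[k],
         upper = xs[n + 1 - k])
}

# Made-up data with the same sample size as the example
set.seed(42)
x <- rnorm(25, mean = 15, sd = 10)
n <- length(x)

# All achieved levels above 0.5, computed vectorwise
k <- seq(1, 100)
k <- k[1 - 2 * pbinom(k - 1, n, 1 / 2) > 0.5]
achieved <- 1 - 2 * pbinom(k - 1, n, 1 / 2)
print(round(achieved, 4))

# Assumed rule for choosing k from a nominal alpha
alpha <- 0.05
k_alpha <- qbinom(alpha / 2, n, 1 / 2)
ci <- sign_ci(x, k_alpha)
print(ci)
```

Because the achieved level depends only on n and k, the level printed for these simulated data is the same one reported in the summary for n = 25. Only the end points depend on the data.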
The Zero Fudge
Many authorities recommend (at least lukewarmly) the following procedure for dealing with zero differences (differences equal to the hypothesized value μ if not zero) in the sign test. After defining the vector z of differences, do
z <- z[z != mu]
n <- length(z)
which treats zero differences as if they were not part of the data (and the sample size is reduced accordingly).
The best that can be said for this is
- Most nonparametrics books recommend it, or at least describe it first.
- From a theoretical point of view, it is a valid test of the hypotheses
H0: pr(Z_i < μ) = pr(Z_i > μ)
H1: pr(Z_i < μ) ≠ pr(Z_i > μ)
(or the analogous one-sided alternatives). But these hypotheses are not what you want to test! What you want to test is the hypotheses described on p. 60 in Hollander and Wolfe: that the medians are the same or different.
Modify the data for our example above for the sign test adding a million zero differences to the data set. The zero fudge says we throw out those zeros and do exactly the same analysis getting P = 0.00046, a highly statistically significant result.
But the whole data set says we get exactly the same response in the treatment and control situations in 1,000,000 cases and a different response in only 25 cases. This is overwhelming evidence in favor of the null hypothesis (3.37) in Hollander and Wolfe. It is highly significant evidence against the tricky null hypothesis of the zero fudge test.
The moral of the story: In interpreting a test of significance it's not enough that P < 0.05. It's even more important what the null hypothesis is. Rejecting a null hypothesis of no scientific interest whatsoever is worthless.
- Hence the zero fudge is a form of "honest cheating". Widely accepted, but bogus. The reason everyone likes it, of course, is because it produces P < 0.05 more often than the conservative procedure, and everyone likes to get "statistically significant results", even if bogus.
The alternative to the zero fudge, what Hollander and Wolfe call the conservative approach, is to count the zero differences as evidence in favor of the null hypothesis. This is a bit tricky, since no matter how you do it, the recipe for the test must be altered.
For the upper-tailed test, assuming the vector of differences z, the sample size n, and the hypothesized value of the median mu have already been defined,
b <- sum(z > mu)
pbinom(b - 1, n, 1 / 2, lower.tail=FALSE)
calculates the P-value for the upper-tailed test. (This is the same code we used for the upper-tailed test when there were no ties.) Using the strict inequality (>) excludes the zero differences from the tail area calculated by these statements.
Reversing the inequality gives the conservative lower-tailed test.
blow <- sum(z < mu)
pbinom(blow - 1, n, 1 / 2, lower.tail=FALSE)
Note that this isn't the test statistic described in the book, but obviously does the right thing by symmetry.
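To make the contrast concrete, here is a sketch of the million-zeros thought experiment run both ways. The data are made up: 21 positive and 4 negative differences stand in for the 25 nonzero differences (the signs match B = 21 and n = 25 from the example, the magnitudes are arbitrary), and the names z_fudge, n_fudge, b_fudge, and p_fudge are invented for this illustration.

```r
# Made-up differences: 21 positive, 4 negative, and a million zeros
mu <- 0
z <- c(rep(1, 21), rep(-1, 4), rep(0, 1e6))

# Zero fudge: drop the zero differences and shrink the sample size
z_fudge <- z[z != mu]
n_fudge <- length(z_fudge)
b_fudge <- sum(z_fudge > mu)
p_fudge <- pbinom(b_fudge - 1, n_fudge, 1 / 2, lower.tail = FALSE)

# Conservative approach: keep every observation in the sample size
n <- length(z)
b <- sum(z > mu)
p_upper <- pbinom(b - 1, n, 1 / 2, lower.tail = FALSE)

# Conservative lower-tailed test, by symmetry
blow <- sum(z < mu)
p_lower <- pbinom(blow - 1, n, 1 / 2, lower.tail = FALSE)

print(c(zero.fudge = p_fudge, conservative.upper = p_upper,
        conservative.lower = p_lower))
```

The zero fudge reproduces the small P-value from the example, while both conservative P-values are essentially one, which is what overwhelming evidence in favor of the null hypothesis should look like.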
Fuzzy Procedures
Fuzzy P-Values
A recent paper by your humble instructor and another member of this department resurrected an old idea, randomized tests, and gave it a new spin as fuzzy tests, taking some terminology from fuzzy set theory.
In a simple situation where there are no ties, a fuzzy P-value for a sign test (or other rank tests we will meet in a few weeks) can be thought of as a P-value smeared out over an interval. If t is the observed value of the test statistic and T is a random variable having the distribution of the test statistic assuming the null hypothesis is true, then
- The conventional P-value is Pr(T ≥ t).
- The fuzzy P-value is smeared out over the interval from Pr(T > t) to Pr(T ≥ t).
Note that the fuzzy P-value gives more information. The upper end point of the fuzzy P-value interval is the conventional P-value. Hence fans of conventional P-values cannot object to fuzzy P-values. The fuzzy P-value tells you more than a conventional P-value, but it does not tell you less.
The interpretation of a fuzzy P-value is just like the interpretation of a conventional P-value.
- Low P-values are good (if you are on the side that wants to reject the null hypothesis). The lower the better.
- High P-values are bad (for that side).
- Intermediate P-values are equivocal.
The only difference is that what is now low, high, or intermediate is a range of values. So long as the range isn't too wide, this makes little difference to the interpretation. Anyone who thinks there is an important difference between P = 0.051 and P = 0.049 understands neither science nor statistics. A fuzzy P-value smeared out over the interval (0.049, 0.051) isn't different in any practical sense.
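In the no-ties case the two end points of the fuzzy P-value can be computed with base R alone. The sketch below uses the test statistic and sample size reported for Example 3.5 (B = 21, n = 25). The names t_obs, p_lo, and p_hi are made up for this illustration.

```r
# Upper-tailed sign test, no ties: T is Binomial(n, 1 / 2) under the null
n <- 25       # sample size
t_obs <- 21   # observed value of the test statistic

# Lower end of the fuzzy P-value interval: Pr(T > t)
p_lo <- pbinom(t_obs, n, 1 / 2, lower.tail = FALSE)

# Upper end, which is also the conventional P-value: Pr(T >= t)
p_hi <- pbinom(t_obs - 1, n, 1 / 2, lower.tail = FALSE)

print(c(lower = p_lo, upper = p_hi))
```

The two numbers differ by Pr(T = t), so the interval is narrow whenever the observed value has small probability under the null hypothesis.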
Example 3.5 in Hollander and Wolfe Redone
Summary
- Upper-tailed sign test
- Fuzzy P-value: Uniform on the interval (0.000078, 0.000455)
Comments
- Line one assigns the value of the parameter (population median) assumed under the null hypothesis. Usually zero.
- Line two attaches the library that has the function we want to use.
- Line three does the test. In this data the result is highly statistically significant whether one uses conventional or fuzzy P-values. It only makes a difference when the decision is borderline.
- To do a lower-tailed test, change the value of alternative to "less". To do a two-tailed test, change the value of alternative to "two.sided". You may abbreviate: alt = "g" and alt = "two" work too. (You only need enough of the argument name or value to specify it unambiguously.)
Theoretically, a fuzzy P-value is a random variable whose randomness comes not from the sampling process that generated the data but is artificial, introduced by the theoretical statistician. We can say here that the fuzzy P-value is a random variable uniformly distributed on the interval (0.000078, 0.000455), which is what the summary says.
If there are ties, then the fuzzy.sign.test function automatically does the right thing (or at least a right thing). Then the probability distribution of the fuzzy P-value becomes non-uniform. Details are given in the paper cited above and also on the fuzzy P-values and confidence intervals page.
An Example With Ties
Summary
- Upper-tailed sign test
- Fuzzy P-value: Non-Uniform on an interval with upper end 0.0717. See plots for details.
- Interpretation: using the conventional 0.05 as the borderline of statistical significance (which we shouldn't), most of the distribution of the fuzzy P-value is below 0.05 and hence this is almost but not quite as strong evidence as a conventional P-value a little bit below 0.05.
Fuzzy Decisions
If one likes the decision theoretic view of hypothesis testing (pick an alpha level, say 0.05, and accept or reject the null hypothesis at that level, ignoring all other levels), then the fuzzy analog is to report the probability that the fuzzy test rejects (which is the same as the probability that the randomized test rejects), which is the probability that the fuzzy P-value is less than alpha.
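When there are no ties the fuzzy P-value is uniform on the interval from Pr(T > t) to Pr(T ≥ t), so this rejection probability is just a uniform distribution function evaluated at alpha. The sketch below covers only that no-ties case for the upper-tailed sign test. The helper reject_prob is hypothetical and is not the function used in the form, and the values of t_obs other than 21 are chosen only for illustration.

```r
# Hypothetical helper: probability that a fuzzy P-value, uniform on
# (Pr(T > t), Pr(T >= t)), falls below alpha. Valid only without ties.
reject_prob <- function(t_obs, n, alpha = 0.05) {
    p_lo <- pbinom(t_obs, n, 1 / 2, lower.tail = FALSE)      # Pr(T > t)
    p_hi <- pbinom(t_obs - 1, n, 1 / 2, lower.tail = FALSE)  # Pr(T >= t)
    # Uniform(p_lo, p_hi) distribution function evaluated at alpha
    punif(alpha, min = p_lo, max = p_hi)
}

r21 <- reject_prob(21, 25)   # whole interval lies below 0.05
r17 <- reject_prob(17, 25)   # interval straddles 0.05
r12 <- reject_prob(12, 25)   # whole interval lies above 0.05
print(c(t21 = r21, t17 = r17, t12 = r12))
```

Only the straddling case gives a probability strictly between zero and one, which is the borderline situation where fuzzy and conventional decisions differ.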
An Example With Ties Redone Decisively
Summary
- Upper-tailed sign test
- Randomized test rejects at level 0.05 with probability 0.9425.
Not much difference between rejecting the null hypothesis (that the true population median is zero) with probability 0.94 and with probability 1.00. In either case, moderately strong, but not extremely strong, evidence against the null hypothesis.
Fuzzy Confidence Intervals
Example 3.5 in Hollander and Wolfe Redone with Confidence
Comments
The fuzzy confidence interval is a function that gives a number between zero and one for each parameter value. The values for which it is one (the core of the interval) are the ones for which it gets full credit if the true unknown parameter value is among them. The values for which it is nonzero (the support of the interval) are the ones for which it gets partial credit if the true unknown parameter value is among them. The reported confidence level (here 95%) is the expected amount of credit it gets (full or partial) averaged over samples from the population.
In most cases (and here) fuzzy and conventional confidence intervals are not so different. The core of the fuzzy confidence interval, which is (7.5, 23.8) in the example, counts in 9 from each end, which would be an 89.22% confidence interval by itself, what
n <- length(x)
k <- 9
1 - 2 * pbinom(k - 1, n, 1 / 2)
would calculate.
The support of the fuzzy confidence interval, which is (7.1, 24.7) in the example, counts in 8 from each end, which would be a 95.67% confidence interval by itself, what the above code would calculate if we set
k <- 8
instead. The partial credit is carefully arranged to give exactly 95% coverage regardless of what the true parameter value may be.
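The arithmetic behind "carefully arranged" can be sketched as follows, under an assumption that is not stated on this page: that the partial credit is a single constant on the part of the interval counting in 8 that lies outside the interval counting in 9. The names level_9, level_8, partial, and coverage are made up for this illustration, and n = 25 is taken from the example.

```r
n <- 25          # sample size from the example
nominal <- 0.95  # reported confidence level

# Coverage of the conventional intervals that count in 9 and 8 from each end
level_9 <- 1 - 2 * pbinom(9 - 1, n, 1 / 2)   # about 89.22%
level_8 <- 1 - 2 * pbinom(8 - 1, n, 1 / 2)   # about 95.67%

# Assumed constant partial credit on the region between the two intervals,
# chosen so that the expected credit equals the nominal level
partial <- (nominal - level_9) / (level_8 - level_9)

# Expected credit: full credit on the inner interval, partial on the rest
coverage <- level_9 + partial * (level_8 - level_9)
print(c(partial.credit = partial, coverage = coverage))
```

Under this assumption the partial credit is a number strictly between zero and one, which is consistent with the nominal 95% lying between the two achievable conventional levels.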
More details are given in the paper cited above and also on the fuzzy P-values and confidence intervals page.