University of Minnesota, Twin Cities School of Statistics Stat 5601 Rweb
The NA removal in lines two and three is needed because Rweb insists on reading variables of the same length, so whichever of x or y is shorter must be padded with NA (not available) values.
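A minimal sketch of the padding and its removal follows. The numbers are made up for illustration and are not the course data.

```r
# Hypothetical data: y is shorter than x, so it is padded with NA
# to give both columns the same length, as Rweb requires.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8, NA, NA)

# Strip the padding before doing any calculations.
x <- x[!is.na(x)]
y <- y[!is.na(y)]

# Sample sizes after the padding is removed.
nx <- length(x)
ny <- length(y)
```

Here nx is 5 and ny is 3 once the two NA values are dropped.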
w is the Wilcoxon form defined in equation (4.3) in Hollander and Wolfe.
u is the Mann-Whitney form defined in equation (4.15) in Hollander and Wolfe.
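The two forms can be sketched as follows. This assumes that equation (4.3) is the sum of the ranks of the y values in the combined sample and that equation (4.15) counts the pairs with Xi < Yj; check both against the book. The data are made up and have no ties.

```r
# Hypothetical data with no ties.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8)
nx <- length(x)
ny <- length(y)

# Ranks in the combined sample; the y ranks are the last ny entries.
r <- rank(c(x, y))

# Wilcoxon form (assumed): sum of the ranks of the y values.
w <- sum(r[nx + seq_len(ny)])

# Mann-Whitney form obtained from w by subtracting its minimum value.
u <- w - ny * (ny + 1) / 2

# The same u obtained directly by counting pairs with x[i] < y[j].
u_count <- sum(outer(y, x, ">"))
```

For these numbers w is 17 and both u and u_count are 11.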
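The upper-tail P-value, the probability of a value of u or larger, can be computed in any of three equivalent ways:
    1 - pwilcox(u - 1, nx, ny)
    pwilcox(u - 1, nx, ny, lower.tail=FALSE)
    pwilcox(nx * ny - u, nx, ny)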
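A quick check that the three expressions agree, using illustrative values of nx, ny, and u that are not taken from the course data. The third form relies on the symmetry of the null distribution about nx * ny / 2.

```r
# Illustrative values, not the course data.
nx <- 5
ny <- 3
u <- 11

# Complement of the lower tail.
p1 <- 1 - pwilcox(u - 1, nx, ny)

# Upper tail requested directly.
p2 <- pwilcox(u - 1, nx, ny, lower.tail = FALSE)

# Lower tail of the reflected statistic, by symmetry.
p3 <- pwilcox(nx * ny - u, nx, ny)

# Compare up to floating point tolerance.
agree <- isTRUE(all.equal(p1, p2)) && isTRUE(all.equal(p1, p3))
agree
```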
The Hodges-Lehmann estimator associated with the rank sum test is the median of the pairwise differences, which are the nx * ny differences Yj - Xi, for all i and j.
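As a sketch, the pairwise differences and their median can be computed with outer. The data are made up for illustration.

```r
# Hypothetical data.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8)

# All nx * ny differences y[j] - x[i], sorted for later use.
d <- sort(as.vector(outer(y, x, "-")))

# Hodges-Lehmann estimate: the ordinary median of the differences.
hl <- median(d)
hl
```

With 15 differences the median is the eighth smallest, which is 4.8 - 3.4 = 1.4 for these numbers.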
Very much like the confidence intervals associated with the sign test and signed rank test, the confidence interval has the form
    (D(k), D(m + 1 - k))
where m = nx * ny is the number of pairwise differences, the Di are the pairwise differences, and, as always, parentheses on subscripts indicate order statistics. That is, one counts in k from each end in the list of sorted pairwise differences to find the confidence interval.
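A minimal sketch of counting in k from each end, with made-up data and an arbitrarily chosen k = 2:

```r
# Hypothetical data.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8)
nx <- length(x)
ny <- length(y)

# Sorted pairwise differences y[j] - x[i].
d <- sort(as.vector(outer(y, x, "-")))
m <- length(d)

# Count in k from each end of the sorted list.
k <- 2
ci <- c(d[k], d[m + 1 - k])

# Achieved confidence level for this k.
level <- 1 - 2 * pwilcox(k - 1, nx, ny)
ci
level
```

For these numbers the interval is (-1.1, 4.1), and the achieved level is 1 - 4 / 56, about 0.93.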
The achieved confidence level is
    1 - 2 * pwilcox(k - 1, nx, ny)
for different values of k. The vectorwise operation of R functions can give them all at once:
    k <- seq(1, 100)
    conf <- 1 - 2 * pwilcox(k - 1, nx, ny)
    conf[conf > 1 / 2]
If one adds these lines to the form above, one sees that the choice is fairly restricted. The nine possible achieved levels between 0.99 and 0.80 are
0.9873, 0.9807, 0.9720, 0.9600, 0.9447, 0.9247, 0.9008, 0.8708, 0.8355
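To see which k goes with which level, the levels can be tabulated next to k. The sample sizes nx = 10 and ny = 5 are an assumption, since this section does not state them; they are used because they reproduce the nine levels listed.

```r
# Assumed sample sizes (not stated in this section).
nx <- 10
ny <- 5

# Achieved confidence level for each candidate k.
k <- seq(1, 100)
conf <- 1 - 2 * pwilcox(k - 1, nx, ny)

# Keep the levels between 0.80 and 0.99 and show them beside k.
keep <- conf > 0.80 & conf < 0.99
levels_table <- data.frame(k = k[keep], level = round(conf[keep], 4))
levels_table
```

Under this assumption the nine levels correspond to k = 6 through k = 14.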
To get a different achieved level, set k to be any integer between zero and n / 2 just before the second to last line in the form (cat ...). A confidence interval with some achieved confidence level will be produced.
For a one-sided confidence interval, use alpha rather than alpha / 2 in the fifth line of the form. Then make either the lower limit minus infinity or the upper limit plus infinity, as desired.
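A sketch of a one-sided interval with a finite lower limit. The form itself is not shown here, so the use of qwilcox to pick k is an assumption about how its fifth line works, and the data are made up.

```r
# Hypothetical data.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8)
nx <- length(x)
ny <- length(y)
d <- sort(as.vector(outer(y, x, "-")))

# One-sided: put all of alpha in one tail (assumed use of qwilcox).
alpha <- 0.10
k <- qwilcox(alpha, nx, ny)

# This sketch does not handle k = 0, where no finite limit exists.
stopifnot(k >= 1)

# Finite lower limit, infinite upper limit.
ci <- c(d[k], Inf)

# Achieved one-sided level, at least 1 - alpha by construction.
level <- 1 - pwilcox(k - 1, nx, ny)
ci
level
```

For an interval with a finite upper limit instead, use c(-Inf, d[length(d) + 1 - k]).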
wilcox.test
All of the above can be done in one shot with the R function wilcox.test (on-line help).
This function comes with R. It was not written especially for this course.
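A minimal usage sketch with made-up data. Passing y first is a choice made here so that the reported statistic is the count of pairs with x[i] < y[j] and the estimate refers to differences y[j] - x[i]; the confidence level of 0.90 is arbitrary.

```r
# Hypothetical data.
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
y <- c(2.9, 6.3, 4.8)

# Two-sided test with point estimate and confidence interval.
out <- wilcox.test(y, x, conf.int = TRUE, conf.level = 0.90)

out$statistic  # Mann-Whitney form of the test statistic
out$p.value    # two-sided P-value
out$estimate   # point estimate of the shift
out$conf.int   # confidence interval for the shift
```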
wilcox.test does almost all of the above. It has a bit of programmer brain damage (PBD) in the way it calculates the point estimate.
For its definition of the median, it uses the average of the two middle values when the number of values is even (which is the standard definition) and the average of the two values on either side of the middle value when the number is odd (which I have never seen anywhere else).
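The difference between the two definitions can be illustrated with a small sketch. The function pbd_median is hypothetical, written here only to mimic the definition as described; it is not the source code of wilcox.test, and whether a given version of R behaves this way is not checked here.

```r
# Hypothetical helper mimicking the nonstandard definition described above.
# For odd lengths it needs at least three values.
pbd_median <- function(v) {
  v <- sort(v)
  n <- length(v)
  if (n %% 2 == 0) {
    # Even length: average of the two middle values (standard).
    mean(v[n / 2 + 0:1])
  } else {
    # Odd length: average of the neighbors of the middle value.
    h <- (n + 1) / 2
    mean(v[c(h - 1, h + 1)])
  }
}

v <- c(1, 2, 4, 8, 16)
standard <- median(v)          # the middle value, 4
nonstandard <- pbd_median(v)   # (2 + 8) / 2 = 5
```

The two agree whenever the neighbors of the middle value are equally far from it, which is why the discrepancy is easy to miss.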
Of course, this definition is asymptotically equivalent to the standard definition. Quite as good really. So we could regard it as a harmless eccentricity. It is, however, a pain when trying to get the answer in the back of the book or to communicate with anyone familiar with the standard definition.