## General Instructions

To do each example, just click the Submit

button.
You do not have to type in any R instructions or specify a dataset.
That's already done for you.

## Kruskal-Wallis

### Example 6.1 in Hollander and Wolfe.

### Comments

The second analysis done by the `aov`

function is the
usual parametric procedure: one-way ANOVA. It produces
`P` = 0.5866 for comparison with the Kruskal-Wallis `P`-value.

R just knows

that the predictor variable `status`

is
categorical because it is not numeric. If the predictor variable is
numeric, then it has no way to know. The `kruskal.test`

function still assumes categorical. The `aov`

function
assumes numeric.

Suppose, for example, we were doing the example used for the two procedures for ordered alternatives below, which loads data from the URL

which has predictor variable `information`

and response variable
`number`

. Then

kruskal.test(number ~ information)

will do the right thing (a Kruskal-Wallis test in which the three values
of `information`

are treated as denoting treatments

.
But we need

information <- factor(information) out <- aov(number ~ information) summary(out)

to have `aov`

do the right thing.
The `factor`

function
(on-line help)
tells R that the variable is to be treated as categorical (R calls a categorical variable a factor

).

### More Exact Computation

The contributed package `SuppDists`

contains a better
approximation to the distribution of the Kruskal-Wallis test statistic
under the null hypothesis. Here's how that works.

The `P`-value hardly changes, and in this example is so large
that the better calculation makes no difference. Either way the treatment
is clearly not statistically significant. But on different data,
the better calculation might be important.

## Jonckheere-Terpstra

Unfortunately, R doesn't have this procedure. So we'll have to do it
by hand

(in R).

### Example 6.2 in Hollander and Wolfe.

### Summary

- Upper-tailed Jonckheere-Terpstra test
- Test statistic:
`J`= 79 - Sample sizes:
`n`= 6,_{1}`n`= 6, and_{2}`n`= 6_{3} - Monte Carlo approximation to P-value: P = 0.0231
- Monte Carlo standard error of P-value: 0.0015

### Comment

Rather than use large sample approximation

on what are really
small sample sizes, we do a Monte Carlo

calculation of the P-value
(that is, we compute by simulation of null distribution of the test statistic).

The Monte Carlo calculation is the loop

for (i in 1:nsim) { datsim <- sample(dat, length(dat)) jsim[i] <- jkstat(datsim, grp) }

This does `nsim`

simulations of the null distribution of the
test statistic. The first line of the body of the loop generates a new
simulated data set `datsim`

which is a permutation of the
actual data (same numbers, just assigned to different groups). The second
line of the body of the loop calculates the value of the test statistic
for the simulated data and stores it for future use.

After the loop has finished `jsim`

is a vector of
length `nsim`

that consists of independent, identically distributed
random variables having the distribution of the test statistic `J`
under the null hypothesis. And

phat <- mean(jsim >= jstat)

approximates the P-value, which is Pr(`J` ≥ `j`).

The slightly more tricky code

(nsim * phat + 1) / (nsim + 1)

*includes the observed value* in the numerator and denominator.
As explained in class, this assures that if α is a multiple of
`1 / nsim`

, then Pr(`P` ≤ α) is indeed α,
despite the Monte Carlo.

Despite having an exact Monte Carlo test (exact

meaning level α
really means level α), there is some interest in the randomness in
the reported P-value. Hence the next to last line calculates its standard
error.

The last line reports the time the calculation takes: the first number is the elapsed time in seconds.

## Isotonic Regression

This is the normal-theory competitor to Jonckheere-Terpstra.
R has two different functions that do this procedure.
One is the function `isoreg`

that is part of the R base
and the other is the function `pava`

in the CRAN package
`Iso`

.

However, we wrote the original version of this web page before either
existed. So we use our original version that does this by hand

(in R).
(The `pava`

in the `Iso`

package could be dropped in
for the `pava`

function defined in the box below.)

### Example 6.2 in Hollander and Wolfe.

### Summary

- Upper-tailed isotonic regression test
- Test statistic:
`T`= 52.78 - Sample sizes:
`n`= 6,_{1}`n`= 6, and_{2}`n`= 6_{3} - Monte Carlo approximation to P-value: P = 0.036
- Monte Carlo standard error of P-value: 0.0019

### Comment

The result quoted in the summary uses 10 times the sample size entered in the example form above. It was run off-line (R rather than Rweb) with the following output.

The R function `pava`

implements the *pool adjacent violators
algorithm* which does isotonic regression. In the famous words of a UNIX
source code comment you are not expected to understand this.

The main lesson here is that the normal theory test (isotonic regression) does more or less the same as the nonparametric Jonckheere-Terpstra test. This is no surprise since the data are fairly normal looking.