University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)
This page expands on the very terse treatment of two-dimensional tables on the main page for chapter 11.
That page had the following example, which used the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt.
That page said, in reference to this example, that "unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure." You may think "a bit obscure" is an understatement. If so, here is more explanation.
To see what the example above does, let's look at X and X[ , -1]. The printout is shown below.
Rweb:> names(X)
[1] "Type"        "HeadnNeck"   "Trunk"       "Extremities"
This part of the printout, which we usually ignore, shows that the data as read in by Rweb consists of four variables with the names shown.
Then
Rweb:> X
           Type HeadnNeck Trunk Extremities
1  Hutchinson's        22     2          10
2   Superficial        16    54         115
3       Nodular        19    33          73
4 Indeterminant        11    17          28
shows what the four variables are. Rweb has not understood the way the textbook authors formatted their data. It has taken the row labels to be a variable named Type, which is wrong. The row labels aren't data.
Finally
Rweb:> X[ , -1]
  HeadnNeck Trunk Extremities
1        22     2          10
2        16    54         115
3        19    33          73
4        11    17          28
shows that X[ , -1] knocks off the first column (the row labels, which weren't data anyway). So this does give us the data we want, as we can see by comparing this output with the table in the textbook.
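The idea can be tried outside Rweb with a small sketch. It assumes the main-page example ran a chi-squared test on X[ , -1]. The data frame is typed in by hand from the printout above rather than read from melanoma.txt, and the names counts and out are made up for illustration.

```r
# Hand-built stand-in for the data frame Rweb makes from melanoma.txt
# (values copied from the printout of X; this is an assumption, not the file itself)
X <- data.frame(
  Type        = c("Hutchinson's", "Superficial", "Nodular", "Indeterminant"),
  HeadnNeck   = c(22, 16, 19, 11),
  Trunk       = c(2, 54, 33, 17),
  Extremities = c(10, 115, 73, 28)
)

# Drop the first column (the row labels) and keep only the counts
counts <- as.matrix(X[ , -1])

# Chi-squared test of independence on the 4 x 3 table of counts
out <- chisq.test(counts)
print(out)
```

The negative index -1 means "everything except column 1", which is why the Type column disappears.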
If the preceding section seems just too weird, here's a more straightforward way to do it.
We don't use the data file provided by the textbook authors at all. We just type the data into the web form.
Here we just read the data into a vector tmp and then stuff it into a matrix (what mathematicians call a two-dimensional array of numbers). The only tricky bit is that the result, the matrix fred, has rows and columns interchanged because of the way R stuffs vectors into matrices. We typed in the data reading across rows, but R reads down columns when putting numbers into matrices.
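A minimal sketch of this approach in plain R follows. It assumes the counts are typed in with c() instead of being read from the web form, and that the matrix is built with nrow = 3. The names tmp and fred come from the text above, but the exact calls are a guess.

```r
# Counts typed in reading across the rows of the textbook table
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)

# R fills matrices down columns, so each textbook row becomes a column of fred
fred <- matrix(tmp, nrow = 3)
print(fred)

# The chi-squared test runs on fred just as it is
print(chisq.test(fred))
```

The first column of fred holds 22, 2, 10, which is the first row (Hutchinson's) of the textbook table.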
But having the rows and columns switched does not matter to the chi-squared test. It does exactly the same thing either way. And we get exactly the same P-value.
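This claim is easy to check. The sketch below assumes the same hand-typed counts and uses t(), R's transpose function, to switch the rows and columns back. The names p_switched and p_textbook are made up for illustration.

```r
# Same counts as before, typed in reading across the textbook rows
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)
fred <- matrix(tmp, nrow = 3)   # rows and columns interchanged

# P-value from the switched table and from its transpose
p_switched <- chisq.test(fred)$p.value
p_textbook <- chisq.test(t(fred))$p.value

# TRUE if the two P-values agree
print(all.equal(p_switched, p_textbook))
```

The reason is that the test statistic sums (observed - expected)^2 / expected over all cells, and transposing the table only rearranges the cells without changing any of the terms.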
This bit is probably overkill. The example is done to death already. But . . .
If it bothers you having the rows and columns switched, the optional argument byrow = TRUE to the matrix function fixes that.
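A short sketch of the byrow = TRUE version, again assuming hand-typed counts. The name sally is hypothetical, chosen only to keep this matrix apart from fred.

```r
# Counts typed in reading across the rows of the textbook table
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)

# byrow = TRUE makes R fill across rows, matching the way the data were typed
sally <- matrix(tmp, ncol = 3, byrow = TRUE)
print(sally)

print(chisq.test(sally))
```

Now the matrix has four rows and three columns, laid out like the table in the textbook.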