University of Minnesota, Twin Cities     School of Statistics     Stat 3011     Rweb     Textbook (Wild and Seber)

Stat 3011 (Geyer) In-Class Examples (Matrices)

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions (that's already done for you). You do not have to select a dataset (that's already done for you).

Chi-Square Tests for Two-Dimensional Tables (Section 11.2 in Wild and Seber)

This page expands on the very terse treatment of two-dimensional tables on the main page for chapter 11.

That page had the following example, which used the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt.
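
The form holding that example is not reproduced here. As a rough sketch only, the R code below rebuilds by hand the data frame X shown in the printout further down (an assumption standing in for Rweb reading melanoma.txt) and then runs the chi-square test on X[ , -1], the expression this page goes on to explain. The exact statements on the main page may differ.

```r
# Hand-built stand-in for the data Rweb reads from melanoma.txt
# (values copied from the printout of X on this page).
X <- data.frame(
    Type = c("Hutchinson's", "Superficial", "Nodular", "Indeterminant"),
    HeadnNeck = c(22, 16, 19, 11),
    Trunk = c(2, 54, 33, 17),
    Extremities = c(10, 115, 73, 28)
)

# Drop the first column (the row labels) and test for independence
# of rows and columns in the remaining 4 by 3 table of counts.
out <- chisq.test(X[ , -1])
print(out)
```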

In reference to this example, that page said "unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure." You may think "a bit obscure" is an understatement. If so, here is more explanation.

What the example above does

In order to see what the example above does, let's look at X and X[ , -1].

The printout is shown below.

Rweb:> names(X) 
[1] "Type"        "HeadnNeck"   "Trunk"       "Extremities" 

This part of the printout, which we usually ignore, shows that the data as read in by Rweb consists of four variables with the names shown.

Then

Rweb:> X 
           Type HeadnNeck Trunk Extremities 
1  Hutchinson's        22     2          10 
2   Superficial        16    54         115 
3       Nodular        19    33          73 
4 Indeterminant        11    17          28 

shows what the four variables are. Rweb has not understood the way the textbook authors formatted their data. It has taken the row labels to be a variable named Type, which is wrong. The row labels aren't data.

Finally

Rweb:> X[ , -1] 
  HeadnNeck Trunk Extremities 
1        22     2          10 
2        16    54         115 
3        19    33          73 
4        11    17          28

shows that X[ , -1] knocks off the first column (the row labels, which weren't data anyway). So this does give us the data we want, as we can see by comparing this output with the table in the textbook.
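
If you would rather keep the row labels as labels instead of throwing them away, one possibility (not what the example on the main page does, just a sketch) is to turn X[ , -1] into a matrix and attach the Type column as row names. The data frame X is again typed in by hand here, and the name counts is made up for illustration.

```r
# Hand-built copy of the data frame shown in the printout above
X <- data.frame(
    Type = c("Hutchinson's", "Superficial", "Nodular", "Indeterminant"),
    HeadnNeck = c(22, 16, 19, 11),
    Trunk = c(2, 54, 33, 17),
    Extremities = c(10, 115, 73, 28)
)

# Keep only the counts, as a numeric matrix
counts <- as.matrix(X[ , -1])

# Use the former first column as row labels rather than as data
rownames(counts) <- X$Type
print(counts)
```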

Another way to do the example

If the preceding section seems just too weird, here's a more straightforward way to do it.

We don't use the data file provided by the textbook authors at all. We just type the data into the web form.

Here we just read the data into a vector tmp and then stuff it into a matrix (what mathematicians call a two-dimensional array of numbers).

The only tricky bit is that the result, the matrix fred, has rows and columns interchanged because of the way R stuffs vectors into matrices. We typed in the data reading across rows, but R reads down columns when putting numbers into matrices.
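
A sketch of what this looks like follows. The names tmp and fred come from the text above, but the exact statements in the web form are not reproduced here, so the use of c() to type in the numbers and of nrow = 3 is an assumption.

```r
# The counts typed in reading across the rows of the textbook table
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)

# R fills the matrix down columns, so each row of the textbook table
# becomes a column of fred: the result is 3 by 4, not 4 by 3.
fred <- matrix(tmp, nrow = 3)
print(fred)
```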

But having the rows and columns switched does not matter to the chi-square test. It does exactly the same thing either way, and we get exactly the same P-value.
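
One way to check this claim yourself, sketched under the same assumptions as before (data typed in with c(), matrix built with nrow = 3), is to run the test on fred and on its transpose t(fred) and compare the two P-values. The names p.fred and p.flip are made up for this illustration.

```r
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)
fred <- matrix(tmp, nrow = 3)

# P-value with rows and columns interchanged (3 by 4)
p.fred <- chisq.test(fred)$p.value

# P-value with the textbook layout (4 by 3), obtained by transposing
p.flip <- chisq.test(t(fred))$p.value

print(c(p.fred, p.flip))
print(all.equal(p.fred, p.flip))
```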

Yet another way to do the example

This bit is probably overkill. The example is done to death already. But . . .

If it bothers you having the rows and columns switched, the optional argument byrow = TRUE to the matrix function fixes that.
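
A minimal sketch, again assuming the data are typed in with c() (the web form itself is not reproduced here): with byrow = TRUE and ncol = 3, R fills the matrix across rows, so fred has the same layout as the table in the textbook.

```r
tmp <- c(22,  2,  10,
         16, 54, 115,
         19, 33,  73,
         11, 17,  28)

# Fill across rows instead of down columns: 4 rows, 3 columns
fred <- matrix(tmp, ncol = 3, byrow = TRUE)
print(fred)

# Same test as before, now on the table in the textbook's layout
print(chisq.test(fred))
```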