University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)

- General Instructions
- Chi-Square Tests for One-Dimensional Tables
- Chi-Square Tests for Two-Dimensional Tables
- Chi-Square Tests for Two-by-Two Tables

**Chi-Square Tests for One-Dimensional Tables**

This test involves a vector of category counts (assumed to be a random
sample of individuals classified into several categories). The null hypothesis
specifies the probability vector for the categories.
Say the category counts are an R dataset named `fred` and
the hypothesized category probabilities are an R dataset named `p`.
Then `chisq.test(fred, p = p)` does the test.

For an example we will use the data for Example 11.1.1 in Wild and Seber, which is in the file rolldie.txt. Unfortunately, this file is formatted wrong to be read into Rweb. So we will just enter the data directly.
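A sketch of entering the data directly (the counts below are illustrative stand-ins, not the rolls from Example 11.1.1, so their *P*-value will not match the one quoted from the textbook data):

```r
# Counts of the six faces in 120 rolls of a die
# (illustrative numbers, not the data from Example 11.1.1)
fred <- c(20, 22, 17, 18, 19, 24)

# Null hypothesis: the die is fair, so each face has probability 1/6
p <- rep(1 / 6, 6)

chisq.test(fred, p = p)
```

The printout reports the chi-square statistic, the degrees of freedom (here 5, one less than the number of categories), and the *P*-value.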

From the printout we get the result of the test:
*P* = 0.1833. Since this is not below 0.10, the data provide no evidence
against the null hypothesis (we conclude that the die is fair or, at least,
that these data provide no evidence it isn't).

**Chi-Square Tests for Two-Dimensional Tables**

This test involves a table of category counts (assumed to be a random sample of individuals classified into the categories of the table). The test commonly done on such tables is called either a chi-square test of *homogeneity* of proportions or a chi-square test of *independence* of categorical variables.

Both kinds of test do exactly the same calculation, give exactly the same
test statistic and exactly the same *P*-value. The only difference
is the name of the test and the conclusion. We say it is a test
of *independence* when both categorical variables are random.
We say it is a test of *homogeneity* when one is random and the
other fixed.
Say the table of category counts is an R dataset named `fred`
(which is a matrix); then `chisq.test(fred)` does the test.
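A sketch with made-up counts (a hypothetical 2 by 3 table, not data from the textbook):

```r
# Hypothetical counts: two groups (rows) classified into three outcomes (columns)
fred <- matrix(c(20, 30, 25,
                 15, 35, 10),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("group A", "group B"),
                               c("worse", "same", "better")))

chisq.test(fred)
```

The degrees of freedom are (rows − 1)(columns − 1), here (2 − 1)(3 − 1) = 2.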

For an example we will use the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt. Unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure.

From the printout we get the result of the test:
*P* = 0.000 (zero to three decimal places). This is extremely clear evidence that the null hypothesis is false.

Using `X` says to use the whole data frame as a matrix.
The notation `X[ , -1]` says to use the data frame with the
first column removed (the first column was row labels, so that does the
right thing).
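For example, with a small made-up data frame whose first column holds row labels (hypothetical numbers, not the melanoma data):

```r
# First column is row labels; the remaining columns are counts
X <- data.frame(type  = c("a", "b", "c"),
                site1 = c(10, 5, 8),
                site2 = c(4, 12, 6))

counts <- X[ , -1]   # drop the label column, keeping only the counts
chisq.test(counts)   # chisq.test coerces the data frame to a matrix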

For those who are not satisfied with this explanation, there is a fuller explanation but it isn't anything you really need to know.

**Chi-Square Tests for Two-by-Two Tables**

In one sense, we shouldn't need to say anything special about the 2 by 2
case of the preceding section. But it is
*also* a special case of "difference of two proportions",
the test that goes with the confidence interval done long ago in
Chapter 8. Thus we can do this
test in *two completely different ways*, using either `prop.test`
or `chisq.test`.

For our example, we will use the data for Review Exercise 8(a) of Chapter 8 in Wild and Seber, which is in the 2 by 2 table

| sex | prisoners with tuberculosis | total number of prisoners |
|---|---|---|
| male | 556 | 984 |
| female | 36 | 90 |

A confidence interval for the difference of population proportions for the
two categories (male and female) and a test of the hypothesis that the two
population proportions are equal are done by `prop.test`, just like
we did in Chapter 8.

The first argument to `prop.test` is the vector of counts
of individuals with the specified characteristic (in this case tuberculosis)
in the two samples.
The second argument to `prop.test` is the vector of sample
sizes.

The output, copied below, gives the test statistic and *P*-value
for the test

```
Rweb:> prop.test(c(556, 36), c(984, 90))
2-sample test for equality of proportions with continuity correction
data: c(556, 36) out of c(984, 90)
X-squared = 8.4244, df = 1, p-value = 0.003702
alternative hypothesis: two.sided
95 percent confidence interval:
0.05313106 0.27695024
sample estimates:
prop 1 prop 2
0.5650407 0.4000000
```

Although this is usually thought of as a "*z* test" (one with a test
statistic that is approximately standard normal for large *n*), this R command
treats it as a chi-square test. The `X-squared = 8.4244` reported
for the test statistic is approximately chi-square on one degree of freedom
for large sample sizes (the square of a standard normal random variable
has the chi-square distribution on one degree of freedom).
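The connection can be checked directly: without the continuity correction, the chi-square statistic from `prop.test` is exactly the square of the usual two-proportion *z* statistic (the one with the pooled standard error). A sketch using the counts from the table above:

```r
x <- c(556, 36)   # counts with tuberculosis in each group
n <- c(984, 90)   # sample sizes

phat <- x / n                # sample proportions
pool <- sum(x) / sum(n)      # pooled proportion under the null hypothesis

# two-proportion z statistic with pooled standard error
z <- (phat[1] - phat[2]) / sqrt(pool * (1 - pool) * (1 / n[1] + 1 / n[2]))

# chi-square statistic without the continuity correction
chisq <- prop.test(x, n, correct = FALSE)$statistic

all.equal(unname(chisq), z^2)   # TRUE: z^2 equals the chi-square statistic
```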

We can also do this as a chi-square test using `chisq.test`.
To do that we need to combine the "successes" and "failures" into a 2 by 2
table as follows.
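One way to build the table from the counts in the example (556 of 984 males and 36 of 90 females had tuberculosis, so the "failure" counts are 984 − 556 = 428 and 90 − 36 = 54):

```r
# 2 by 2 table: rows are sex, columns are tuberculosis status
data <- rbind(male   = c(tb = 556, no.tb = 984 - 556),
              female = c(tb = 36,  no.tb = 90 - 36))

chisq.test(data)
```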

The printout of the `chisq.test` command is

```
Rweb:> chisq.test(data)
Pearson's Chi-square test with Yates' continuity correction
data: data
X-squared = 8.4244, df = 1, p-value = 0.003702
```

Note that the test statistic and *P*-value are exactly the same in
both cases. They are really both the same test.