# Stat 3011 (Geyer) In-Class Examples (Chapter 11)

## General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions (that's already done for you). You do not have to select a dataset (that's already done for you).

## Chi-Square Tests for One-Dimensional Tables (Section 11.1 in Wild and Seber)

This test involves a vector of category counts (assumed to be a random sample of individuals classified into several categories). The null hypothesis specifies the probability vector for the categories. Say the category counts are an R dataset named `fred` and the hypothesized category probabilities are an R dataset named `p`. Then `chisq.test(fred, p=p)` does the test.
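As a minimal sketch of the call described above, here is the test run on made-up counts. The counts in `fred` are hypothetical (they are not the Wild and Seber data), and `expected` and `x2` are names introduced only for this illustration.

```r
# Hypothetical counts for 60 rolls of a die (NOT the Wild and Seber data).
fred <- c(8, 12, 9, 11, 13, 7)

# Null hypothesis: all six faces are equally likely.
p <- rep(1/6, 6)

# The chi-square test for a one-dimensional table.
result <- chisq.test(fred, p = p)
print(result)

# The same test statistic computed by hand:
# the sum over categories of (observed - expected)^2 / expected.
expected <- sum(fred) * p
x2 <- sum((fred - expected)^2 / expected)
print(x2)
```

The hand-computed `x2` should agree with the `X-squared` value in the printout, and the degrees of freedom are the number of categories minus one.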

For an example we will use the data for Example 11.1.1 in Wild and Seber, which is in the file rolldie.txt. Unfortunately, this file is not formatted in a way that Rweb can read, so we will just enter the data directly.

From the printout we get the result of the test: P = 0.1833. Since this is not below 0.10, it is no evidence at all against the null hypothesis. We do not reject the hypothesis that the die is fair (more precisely, these data provide no evidence that it isn't).

## Chi-Square Tests for Two-Dimensional Tables (Section 11.2 in Wild and Seber)

This test involves a table of category counts (assumed to be a random sample of individuals classified into the categories of the table). The test commonly done on such tables is called either a chi-square test of

• homogeneity of proportions, or
• independence of categorical variables.

The two categorical variables in question give the row and column labels.

Both kinds of test do exactly the same calculation and give exactly the same test statistic and exactly the same P-value. The only difference is the name of the test and the wording of the conclusion. We say it is a test of independence when both categorical variables are random. We say it is a test of homogeneity when one is random and the other fixed. Say the table of category counts is an R dataset named `fred` (which is a matrix). Then `chisq.test(fred)` does the test.

For an example we will use the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt. Unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure.

From the printout we get the result of the test: P = 0.000 (to three decimal places). This is extremely clear evidence that the null hypothesis is false.

Using `X` says to use the whole data frame as a matrix. The notation `X[ , -1]` says to use the data frame with the first column removed (the first column was row labels, so that does the right thing).

For those who are not satisfied with this explanation, there is a fuller explanation but it isn't anything you really need to know.
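The `X[ , -1]` indexing described above can be tried on a small made-up data frame. This is only a sketch: the counts and the column names `type`, `site1`, and `site2` are hypothetical (they are not the melanoma data), and `counts` is a name introduced for the illustration.

```r
# Hypothetical data frame (NOT the melanoma data): the first column holds
# row labels and the remaining columns hold category counts.
X <- data.frame(
  type  = c("A", "B", "C"),
  site1 = c(20, 15, 30),
  site2 = c(10, 25, 20)
)

# Drop the first column (the labels), keeping every row.
counts <- X[ , -1]
print(counts)

# The chi-square test for a two-dimensional table.
result <- chisq.test(counts)
print(result)
```

For a table with r rows and c columns the degrees of freedom are (r - 1)(c - 1), so this 3 by 2 table gives 2 degrees of freedom.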

## Chi-Square Tests for Two-by-Two Tables

In one sense, we shouldn't need to say anything special about the 2 by 2 case of the preceding section. But it is also a special case of "difference of two proportions", the test that goes with the confidence interval done long ago in Chapter 8. Thus we can do this test in two completely different ways, using either of

• `prop.test`
• `chisq.test`

For our example, we will use the data for Review Exercise 8(a) of Chapter 8 in Wild and Seber, which is in the 2 by 2 table

| sex | prisoners with tuberculosis | total number of prisoners |
|---|---|---|
| male | 556 | 984 |
| female | 36 | 90 |

A confidence interval for the difference of population proportions for the two categories (male and female) and a test of the hypothesis that the two population proportions are equal are both done by `prop.test`, just as we did in Chapter 8.

The first argument to `prop.test` is the vector of counts of individuals with the specified characteristic (in this case tuberculosis) in the two samples. The second argument to `prop.test` is the vector of sample sizes.

The output, copied below, gives the test statistic and P-value for the test.

```
Rweb:> prop.test(c(556, 36), c(984, 90))

2-sample test for equality of proportions with continuity correction

data:  c(556, 36) out of c(984, 90)
X-squared = 8.4244, df = 1, p-value = 0.003702
alternative hypothesis: two.sided
95 percent confidence interval:
0.05313106 0.27695024
sample estimates:
prop 1    prop 2
0.5650407 0.4000000
```
The P-value is 0.003702.

Although this is usually thought of as a "z test" (one with a test statistic that is approximately normal for large n), this R command treats it as a chi-square test. The `X-squared = 8.4244` reported for the test statistic is approximately chi-squared on one degree of freedom for large sample sizes.
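The link between the two views can be checked numerically. The sketch below computes the usual pooled two-sample z statistic by hand and compares its square with the statistic from `prop.test`. Note the assumption: the identity is exact only when the continuity correction is turned off (`correct = FALSE`), so the value here differs slightly from the 8.4244 printed above. The names `x`, `n`, `p.hat`, `p.pool`, and `z` are introduced only for this illustration.

```r
x <- c(556, 36)   # prisoners with tuberculosis (male, female)
n <- c(984, 90)   # total number of prisoners (male, female)

p.hat  <- x / n             # the two sample proportions
p.pool <- sum(x) / sum(n)   # pooled proportion under the null hypothesis

# Pooled two-sample z statistic for equality of proportions.
z <- (p.hat[1] - p.hat[2]) / sqrt(p.pool * (1 - p.pool) * sum(1 / n))

# The same test without the continuity correction.
result <- prop.test(x, n, correct = FALSE)

print(z^2)                 # square of the z statistic
print(result$statistic)    # X-squared from prop.test

# The two-sided normal P-value matches the chi-square P-value.
print(2 * pnorm(-abs(z)))
print(result$p.value)
```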

We can also do this as a chi-square test using `chisq.test`. To do that we need to combine the "successes" and "failures" into a 2 by 2 table as follows.
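One way to build such a table is sketched here. The object name `data` matches the printout of the `chisq.test` command, but the construction itself (and the helper names `tb` and `total`) is an assumption, not necessarily the code used on the original page.

```r
tb    <- c(556, 36)   # "successes": prisoners with tuberculosis (male, female)
total <- c(984, 90)   # sample sizes (male, female)

# Columns are successes and failures; failures are total minus successes.
data <- cbind(tb = tb, no.tb = total - tb)
rownames(data) <- c("male", "female")
print(data)

# Chi-square test on the 2 by 2 table (Yates' correction is the default).
result <- chisq.test(data)
print(result)
```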

The printout of the `chisq.test` command is

```
Rweb:> chisq.test(data)

Pearson's Chi-square test with Yates' continuity correction

data:  data
X-squared = 8.4244, df = 1, p-value = 0.003702
```

Note that the test statistic and P-value are exactly the same in both cases. They are really both the same test.
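The agreement can also be confirmed directly rather than by comparing printouts. This is a small sketch; the names `by.prop` and `by.chisq` are introduced only for the illustration, and both calls use the default continuity correction.

```r
x <- c(556, 36)   # prisoners with tuberculosis (male, female)
n <- c(984, 90)   # total number of prisoners (male, female)

# The same data analyzed two ways.
by.prop  <- prop.test(x, n)
by.chisq <- chisq.test(cbind(x, n - x))

# Compare the test statistics and the P-values.
print(all.equal(unname(by.prop$statistic), unname(by.chisq$statistic)))
print(all.equal(by.prop$p.value, by.chisq$p.value))
```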