University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)

Stat 3011 (Geyer) In-Class Examples (Chapter 11)

General Instructions
Chi-Square Tests for One-Dimensional Tables
Chi-Square Tests for Two-Dimensional Tables
Chi-Square Tests for Two-by-Two Tables

General Instructions

To do each example, just click the "Submit" button. You do not have to type in any R instructions (that's already done for you). You do not have to select a dataset (that's already done for you).

Chi-Square Tests for One-Dimensional Tables (Section 11.1 in Wild and Seber)

This test involves a vector of category counts (assumed to be a random sample of individuals classified into several categories). The null hypothesis specifies the probability vector for the categories. Say the category counts are an R dataset named fred and the hypothesized category probabilities are an R dataset named p. Then chisq.test(fred, p=p) does the test.
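As a minimal sketch of that call, the following uses made-up counts for 120 rolls of a die. The counts are purely illustrative (they are not the data of any example in Wild and Seber); only the names fred and p and the call chisq.test(fred, p=p) come from the text above.

```r
# Hypothetical category counts for 120 rolls of a die
# (illustrative only, NOT the data from Wild and Seber)
fred <- c(18, 22, 25, 15, 21, 19)

# Null hypothesis: all six faces are equally likely
p <- rep(1/6, 6)

# Chi-square test for a one-dimensional table
result <- chisq.test(fred, p = p)
print(result)

# Expected counts under the null hypothesis (n times p)
result$expected
```

The printout reports the test statistic (X-squared), the degrees of freedom (number of categories minus one) and the P-value.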

For an example we will use the data for Example 11.1.1 in Wild and Seber, which is in the file rolldie.txt. Unfortunately, this file is not formatted in a way that Rweb can read. So we will just enter the data directly.

From the printout we get the result of the test: P = 0.1833. Since this is not below 0.10, it is no evidence at all against the null hypothesis (these data provide no evidence that the die is unfair, which is not the same as proof that it is fair).

Chi-Square Tests for Two-Dimensional Tables (Section 11.2 in Wild and Seber)

This test involves a table of category counts (assumed to be a random sample of individuals classified into the categories of the table). The test commonly done on such tables is called either a chi-square test of

homogeneity of proportions, or
independence of categorical variables.

The two categorical variables in question give the row and column labels.

Both kinds of test do exactly the same calculation, give exactly the same test statistic and exactly the same P-value. The only difference is the name of the test and the conclusion. We say it is a test of independence when both categorical variables are random. We say it is a test of homogeneity when one is random and the other fixed. Say the table of category counts is an R dataset named fred (which is a matrix). Then chisq.test(fred) does the test.

For an example we will use the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt. Unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure.

From the printout we get the result of the test: P = 0.000 (to three decimal places). This is extremely clear evidence that the null hypothesis is false.

Using X says to use the whole data frame as a matrix. The notation X[ , -1] says to use the data frame with the first column removed (the first column was row labels, so that does the right thing).
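The X[ , -1] idiom can be sketched with a small made-up data frame. Everything here is an assumption for illustration: the column names, the labels and the counts are hypothetical and are not the melanoma data. Only the name X and the notation X[ , -1] come from the text above.

```r
# Hypothetical data frame shaped like one read from a file:
# the first column holds row labels, the rest hold counts
# (illustrative values only, NOT the melanoma data)
X <- data.frame(
  type  = c("A", "B", "C"),
  site1 = c(20, 15, 30),
  site2 = c(10, 25, 20),
  site3 = c(15, 10, 35)
)

# Drop the label column, leaving only the table of counts
counts <- as.matrix(X[ , -1])
rownames(counts) <- X$type
counts

# Chi-square test of independence / homogeneity
result <- chisq.test(counts)
print(result)
```

Passing X[ , -1] straight to chisq.test gives the same test; converting to a matrix first just makes the row labels show up when the table is printed.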

For those who are not satisfied with this explanation, there is a fuller explanation but it isn't anything you really need to know.

Chi-Square Tests for Two-by-Two Tables

In one sense, we shouldn't need to say anything special about the 2 by 2 case of the preceding section. But it is also a special case of "difference of two proportions", the test that goes with the confidence interval done long ago in Chapter 8. Thus we can do this test in two completely different ways, using either of

prop.test
chisq.test

For our example, we will use the data for Review Exercise 8(a) of Chapter 8 in Wild and Seber, which is in the 2 by 2 table

sex      prisoners with tuberculosis   total number of prisoners
male     556                           984
female   36                            90

A confidence interval for the difference of population proportions for the two categories (male and female) and a test of the hypothesis that the two population proportions are equal are both done by prop.test, just as we did in Chapter 8.

The first argument to prop.test is the vector of counts of individuals with the specified characteristic (in this case tuberculosis) in the two samples. The second argument to prop.test is the vector of sample sizes.

The output, copied below, gives the test statistic and P-value for the test.

Rweb:> prop.test(c(556, 36), c(984, 90)) 
 
         2-sample test for equality of proportions with continuity correction  
 
data:  c(556, 36) out of c(984, 90)  
X-squared = 8.4244, df = 1, p-value = 0.003702
alternative hypothesis: two.sided  
95 percent confidence interval: 
 0.05313106 0.27695024  
sample estimates: 
   prop 1    prop 2  
0.5650407 0.4000000

The P-value is 0.003702.

Although this is usually thought of as a "z test" (one with a test statistic that is approximately normal for large n), this R command treats it as a chi-square test. The X-squared = 8.4244 reported for the test statistic is approximately chi-squared on one degree of freedom for large sample sizes.
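The link between the two views can be checked numerically. The sketch below computes the usual pooled two-sample z statistic by hand and compares its square with the X-squared from prop.test. Because the printout above uses a continuity correction, the comparison here turns it off with correct = FALSE, so the value differs from 8.4244. The variable names (x, n, p_hat, p_pool, se, z) are just illustrative choices.

```r
x <- c(556, 36)   # prisoners with tuberculosis (male, female)
n <- c(984, 90)   # total number of prisoners (male, female)

# Sample proportions and the pooled proportion under the null hypothesis
p_hat  <- x / n
p_pool <- sum(x) / sum(n)

# Pooled standard error and the z statistic
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z  <- (p_hat[1] - p_hat[2]) / se

# The same test without the continuity correction
uncorrected <- prop.test(x, n, correct = FALSE)

# These two numbers agree: X-squared is z squared
z^2
unname(uncorrected$statistic)
```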

We can also do this as a chi-square test using chisq.test. To do that we need to combine the "successes" and "failures" into a 2 by 2 table as follows.
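One way to build such a table is sketched here. The name data matches the printout of the chisq.test command; the names successes and failures are illustrative, and the failures are simply the totals minus the successes.

```r
# "Successes": prisoners with tuberculosis
successes <- c(male = 556, female = 36)

# "Failures": prisoners without tuberculosis (total minus successes)
failures <- c(male = 984, female = 90) - successes

# Combine into a 2 by 2 table: rows are sexes, columns are outcomes
data <- cbind(successes, failures)
data

# Chi-square test on the 2 by 2 table
chisq.test(data)
```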

The printout of the chisq.test command is

Rweb:> chisq.test(data) 
 
         Pearson's Chi-square test with Yates' continuity correction  
 
data:  data  
X-squared = 8.4244, df = 1, p-value = 0.003702

Note that the test statistic and P-value are exactly the same in both cases. They are really both the same test.
