Statistics 5102 (Geyer, Fall 2016) Examples: Categorical Data Analysis

One-Way Contingency Table

The data set

http://www.stat.umn.edu/geyer/5102/data/ex6-5.txt

simulates 6000 rolls of a fair die (singular of dice). We test the null hypothesis that all six cells of the contingency table have the same probability against the alternative hypothesis that they do not.

The following R statements do this test two different ways.
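
The two tests are presumably the Pearson chi-square test and the likelihood ratio test. A minimal sketch of both follows. It uses made-up counts (not the contents of ex6-5.txt), so the numbers it prints will differ from those quoted in this section, and the variable names are only illustrative.

```r
# Hypothetical counts for faces 1..6 (NOT the contents of ex6-5.txt)
y <- c(1038, 964, 975, 1015, 983, 1025)
n <- sum(y)

# Null hypothesis: every face has probability 1/6
p0 <- rep(1 / 6, 6)
expected <- n * p0

# Pearson chi-square statistic
pearson <- sum((y - expected)^2 / expected)

# Likelihood ratio statistic (assumes no zero counts)
lrt <- 2 * sum(y * log(y / expected))

# Both are referred to a chi-square distribution with 6 - 1 = 5 df
df <- length(y) - 1
p.pearson <- pchisq(pearson, df, lower.tail = FALSE)
p.lrt <- pchisq(lrt, df, lower.tail = FALSE)

# chisq.test computes the same Pearson statistic
out <- chisq.test(y, p = p0)

print(c(pearson = pearson, lrt = lrt))
print(c(p.pearson = p.pearson, p.lrt = p.lrt))
```

The two statistics differ only in how they measure the distance between observed and expected counts, which is why they agree closely when the sample is large.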

The tests are asymptotically equivalent, so it is no surprise that the test statistics are very similar (4.904 and 4.895), as are the P-values (0.4277 and 0.4289).

Since the P-values are not small, the null hypothesis is accepted. The die seems fair.

One-Way Contingency Table with Parameters Estimated

The data set

http://www.stat.umn.edu/geyer/5102/data/ex6-6.txt

simulates 6000 rolls of an unfair die of the type known as six-ace flats. The one and six faces are shaved slightly so the other faces have smaller area and smaller probability. We start by testing the hypothesis that all six cells of the contingency table have the same probability.

The following R statements do this test two different ways.

Since the P-values are above 0.1, the null hypothesis is accepted. The die seems fair.

However, it is a bad idea to reject a hypothesis we haven't even fit yet. Suppose instead we do a likelihood ratio test of model comparison, comparing the six-ace flats hypothesis (one and six have the same probability; two, three, four, and five have the same probability) to the hypothesis that all six cells have the same probability.
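
A minimal sketch of this model comparison under multinomial sampling follows. The counts are made up (not the contents of ex6-6.txt), so the result will not match the P-value quoted in this section, and the variable names are only illustrative.

```r
# Hypothetical counts for faces 1..6 (NOT the contents of ex6-6.txt)
y <- c(1040, 980, 975, 990, 985, 1030)
n <- sum(y)

# Null model: all six faces have probability 1/6
p0 <- rep(1 / 6, 6)

# Six-ace flats model: faces 1 and 6 share one probability and
# faces 2-5 share another; the MLE pools the counts within each group
flat <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
p1 <- ifelse(flat, sum(y[flat]) / (2 * n), sum(y[!flat]) / (4 * n))

# Multinomial log likelihoods, up to the same additive constant
logl0 <- sum(y * log(p0))
logl1 <- sum(y * log(p1))

# Likelihood ratio statistic; the larger model has one free
# parameter and the null model has none, so df = 1
lrt <- 2 * (logl1 - logl0)
df <- 1
p.value <- pchisq(lrt, df, lower.tail = FALSE)

print(c(lrt = lrt, df = df, p.value = p.value))
```

Because the alternative here has only one extra parameter, the test concentrates its power on the specific departure from fairness that six-ace flats produce, rather than spreading it over five degrees of freedom.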

We find that the null hypothesis of equal probabilities is rejected and the six-ace flats hypothesis accepted (P = 0.038).

The same test can also be done assuming Poisson sampling rather than multinomial sampling. The likelihood ratio test statistic is the same and the degrees of freedom for the chi-square approximation are the same.
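
The Poisson version can be sketched with glm. The counts below are again made up (not the contents of ex6-6.txt), and the factor name and its levels are assumptions chosen for illustration. The last lines check numerically that the Poisson deviance difference agrees with the multinomial likelihood ratio statistic.

```r
# Hypothetical counts for faces 1..6 (NOT the contents of ex6-6.txt)
y <- c(1040, 980, 975, 990, 985, 1030)

# Two-level factor marking the shaved faces (1 and 6) versus the rest
flat <- factor(c("ace-six", "other", "other", "other", "other", "ace-six"))

# Poisson regression: equal means versus one mean per group
fit0 <- glm(y ~ 1, family = poisson)
fit1 <- glm(y ~ flat, family = poisson)
print(anova(fit0, fit1, test = "Chisq"))

# Likelihood ratio statistic and degrees of freedom from the two fits
lrt <- deviance(fit0) - deviance(fit1)
df <- df.residual(fit0) - df.residual(fit1)
p.value <- pchisq(lrt, df, lower.tail = FALSE)

# Multinomial likelihood ratio statistic from the same fitted means;
# both fits reproduce the total count, so the two statistics agree
lrt.multinomial <- 2 * sum(y * log(fitted(fit1) / fitted(fit0)))
print(all.equal(lrt, lrt.multinomial, tolerance = 1e-6))
```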

Two-Way Contingency Table

When there are two categorical predictors, we can also think of the contingency table as a two-dimensional array, one categorical predictor giving the row labels and the other giving the column labels.

Rweb does not like to read data as contingency tables, so we read it as usual, as a data frame.

The data set

http://www.stat.umn.edu/geyer/5102/data/ex6-7.txt

has three variables: the response y and two categorical predictors, color and opinion.

The following R code does the likelihood ratio test.
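
A minimal sketch of such a test follows, comparing the independence model to the saturated model by Poisson regression. The data frame is made up (not the contents of ex6-7.txt). It reuses the variable names y, color, and opinion, but the factor levels and counts are invented for illustration.

```r
# Hypothetical 2 x 3 table in data frame form, one row per cell
# (NOT the contents of ex6-7.txt; levels and counts are invented)
dat <- data.frame(
    y = c(30, 20, 10, 15, 25, 20),
    color = factor(rep(c("red", "blue"), each = 3)),
    opinion = factor(rep(c("like", "neutral", "dislike"), times = 2))
)

# Independence model (main effects only) versus saturated model
fit.indep <- glm(y ~ color + opinion, family = poisson, data = dat)
fit.sat <- glm(y ~ color * opinion, family = poisson, data = dat)
print(anova(fit.indep, fit.sat, test = "Chisq"))

# The same statistic by hand: difference of deviances,
# with (rows - 1) * (columns - 1) degrees of freedom
lrt <- deviance(fit.indep) - deviance(fit.sat)
df <- df.residual(fit.indep) - df.residual(fit.sat)
p.value <- pchisq(lrt, df, lower.tail = FALSE)
print(c(lrt = lrt, df = df, p.value = p.value))
```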

The test rejects the null hypothesis that the two categorical predictors have independent effects (P = 0.0159).

The analogous chi-square test requires us to put the data in a two-way array. The following R code does this test.

The R function xtabs converts data from the data frame format read by Rweb and wanted by the lm and glm functions to the contingency table (matrix) format wanted by the chisq.test function.
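
The conversion and the chi-square test can be sketched as follows. The data frame is made up (not the contents of ex6-7.txt). Only the variable names y, color, and opinion come from the description above; the levels and counts are invented.

```r
# Hypothetical 2 x 3 table in data frame form, one row per cell
# (NOT the contents of ex6-7.txt; levels and counts are invented)
dat <- data.frame(
    y = c(30, 20, 10, 15, 25, 20),
    color = factor(rep(c("red", "blue"), each = 3)),
    opinion = factor(rep(c("like", "neutral", "dislike"), times = 2))
)

# Counts on the left of the formula, classifying factors on the right
tab <- xtabs(y ~ color + opinion, data = dat)
print(tab)

# Pearson chi-square test of independence on the two-way table
out <- chisq.test(tab)
print(out)
```

With the counts on the left-hand side of the formula, xtabs sums y within each combination of color and opinion, so the result has one cell per row of the data frame.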
