Statistics 3011 (Geyer and Jones, Spring 2006) Examples: Two-Way Tables


Matrix Stuff

For our example we will use the data from Example 6.1 in the Textbook (Moore).

Rweb does not like to input two-way tables, so we just type the table in directly rather than reading it from a file. (We should apologise for this inadequacy of Rweb. It is not an inadequacy of R: most of R's input methods are disabled in Rweb as a security measure. In ordinary R, the Introduction to R book explains how to read a table from a file.)

Comments

The R function c (on-line help) combines a list of numbers into a single object, which we name fred for no particular reason (the computer doesn't care what we call it).

The R function matrix (on-line help) creates a matrix (a two-way table), in this case with 4 rows (specified by nrow = 4), filled with the numbers in fred taken in row order (specified by byrow = TRUE; the default is column order).

The R function dimnames (on-line help) adds row and column labels.

The R function margin.table (on-line help) calculates the margins of the table. The second argument says which margin: 1 gives the row totals and 2 gives the column totals.

The R function sum (on-line help) calculates the sum (grand total) for the table.

The R function addmargins (on-line help) calculates the margins and tacks them onto the table.
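The Rweb form for this example did not survive on this page. Here is a sketch of R code that does what the comments above describe. It is a reconstruction, not necessarily the original form, and it assumes the counts (in thousands of people) are the ones shown in the table in the Hand Calculation section below.

```r
# Counts in thousands, typed row by row (education by age).
# Assumed to be the Example 6.1 counts shown in the Hand Calculation table.
fred <- c( 4459,  9174, 14226,
          11562, 26455, 20060,
          10693, 22647, 11125,
          11071, 23160, 10597)

# 4 rows, filled in row order because the numbers were typed row by row.
fred <- matrix(fred, nrow = 4, byrow = TRUE)

# Row labels (education) and column labels (age).
dimnames(fred) <- list(c("DNF", "High School", "1-3 College", "4+ College"),
                       c("25-34", "35-54", "55+"))

margin.table(fred, 1)   # row totals: the education margin
margin.table(fred, 2)   # column totals: the age margin
sum(fred)               # grand total
addmargins(fred)        # the table with a Sum row and a Sum column tacked on
```

Reusing the name fred for both the vector and the matrix is harmless; the matrix simply replaces the vector.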

It is a bit annoying to have all of this code in each R form, but that's the way it's going to be on this web page.

Our Totals Don't Agree with the Book

Note that the book's column totals are different from ours. This is explained on p. 138 (look for the boldface words roundoff error).

Joint and Marginal Distributions

The joint distribution of the data is the table of raw numbers (fred in the example above) divided by the grand total (sum(fred) in the example above).

Then every number in the new table is between zero and one and the numbers add to one. This makes the numbers in the table probabilities.
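A minimal sketch of this step in R, assuming the same count table fred as in the matrix example (rebuilt here so the code runs by itself). The last two lines check the two properties just stated. Because of floating-point arithmetic the sum may differ from one in the last few digits, so all.equal is used rather than ==.

```r
# Count table (thousands), assumed to be the Example 6.1 counts.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("DNF", "High School", "1-3 College", "4+ College"),
                               c("25-34", "35-54", "55+")))

# Joint distribution: every cell divided by the grand total.
sally <- fred / sum(fred)

all(sally >= 0 & sally <= 1)    # every entry is between zero and one
all.equal(sum(sally), 1)        # entries add to one (up to roundoff)
```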

Summary

The output

             25-34  35-54    55+    Sum 
DNF         0.0254 0.0524 0.0812 0.1590 
High School 0.0660 0.1510 0.1145 0.3314 
1-3 College 0.0610 0.1292 0.0635 0.2538 
4+ College  0.0632 0.1322 0.0605 0.2558 
Sum         0.2156 0.4647 0.3196 1.0000

gives the joint distribution of the data and the marginal distributions.

Note that there are two marginals, one for each variable.

Note that the grand total for the joint distribution is exactly one.

Comments

The only computational difference between this and the example above is that we define sally to be fred divided by the grand total. Then we do to sally what we previously did to fred.

The R function round (on-line help) does just what the name suggests: round (in this case to 4 decimal places).
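Putting the pieces together, here is a sketch that should reproduce the Summary output above, again assuming the counts from the Hand Calculation table: divide by the grand total, add both margins, and round to 4 decimal places.

```r
# Count table (thousands), assumed to be the Example 6.1 counts.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("DNF", "High School", "1-3 College", "4+ College"),
                               c("25-34", "35-54", "55+")))

sally <- fred / sum(fred)       # joint distribution
round(addmargins(sally), 4)     # joint distribution plus both marginals
```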

Conditional Distributions

First the calculation. Then the explanation.

Theory

Conditional probability is probability calculated relative to a restricted subset of the data.

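In theory there is a different conditional distribution for every possible subset. In practice, for a two-way table there are two kinds of conditional distributions, just as there are two marginals: the distribution of age given education, and the distribution of education given age.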
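In symbols (notation introduced here for illustration, not taken from the book): write $n_{ij}$ for the count in row $i$ and column $j$, $n_{i+}$ for the row total, $n_{+j}$ for the column total, $n_{++}$ for the grand total, and $p_{ij} = n_{ij}/n_{++}$ for the joint probabilities. Then the two conditional distributions are

```latex
P(\text{age} = j \mid \text{education} = i)
  = \frac{n_{ij}}{n_{i+}} = \frac{p_{ij}}{p_{i+}},
\qquad
P(\text{education} = i \mid \text{age} = j)
  = \frac{n_{ij}}{n_{+j}} = \frac{p_{ij}}{p_{+j}}.
```

The grand total $n_{++}$ cancels in each ratio, which is why it does not matter whether we start from counts or from probabilities.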

Hand Calculation

Consider the joint distribution for either the raw data (fred) or the data converted to probabilities (sally). It doesn't matter which.

            25-34 35-54   55+    Sum 
DNF          4459  9174 14226  27859 
High School 11562 26455 20060  58077
1-3 College 10693 22647 11125  44465 
4+ College  11071 23160 10597  44828 
Sum         37785 81436 56008 175229 

and suppose we are only interested in the High School row.

If the whole data set were just this one row, there would be only one variable, age, since everybody in this row has the same education.

When we convert to probabilities, the grand total for this row is 58077 (thousand people). Dividing each number in this row by that total (the row total for this row) gives

 25-34  35-54    55+    Sum 
0.1991 0.4555 0.3454 1.0000

These numbers are the conditional distribution of age given that education is high school.
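The hand calculation can be sketched in R as follows, assuming the count table fred built as in the matrix example (rebuilt here); hs is a hypothetical name for the extracted row.

```r
# Count table (thousands), assumed to be the Example 6.1 counts.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("DNF", "High School", "1-3 College", "4+ College"),
                               c("25-34", "35-54", "55+")))

hs <- fred["High School", ]     # the one row we are interested in
sum(hs)                         # its row total
round(hs / sum(hs), 4)          # conditional distribution of age given high school
```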

Computer Calculation

The R function prop.table (on-line help) calculates all the conditional distributions at once. This is illustrated in the computer example at the beginning of this section.

Rweb:> round(prop.table(fred, 1), 4) 
             25-34  35-54    55+ 
DNF         0.1601 0.3293 0.5106 
High School 0.1991 0.4555 0.3454
1-3 College 0.2405 0.5093 0.2502 
4+ College  0.2470 0.5166 0.2364 

gives all the conditional distributions for age given education.

Observe that the High School row is the same as the one computed in the hand calculation section (the conditional distribution of age given education = high school).
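One way to see what prop.table does with margin 1: it divides each row by its own row total. Here is a sketch checking this against base R's sweep and rowSums, with fred rebuilt as before (cond_age and by_hand are hypothetical names).

```r
# Count table (thousands), assumed to be the Example 6.1 counts.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("DNF", "High School", "1-3 College", "4+ College"),
                               c("25-34", "35-54", "55+")))

cond_age <- prop.table(fred, 1)                  # age given education
by_hand  <- sweep(fred, 1, rowSums(fred), "/")   # divide each row by its total
all.equal(cond_age, by_hand)                     # same table
rowSums(cond_age)                                # each row adds to one
```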

The other conditional (for education given age) is given by

Rweb:> round(prop.table(fred, 2), 4) 
            25-34  35-54    55+ 
DNF         0.118 0.1127 0.2540 
High School 0.306 0.3249 0.3582 
1-3 College 0.283 0.2781 0.1986 
4+ College  0.293 0.2844 0.1892
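A sketch checking two facts about this other conditional, with fred rebuilt as before and sally its joint distribution. First, each column of prop.table(fred, 2) adds to one. Second, starting from probabilities instead of counts gives the same table, as claimed in the hand calculation section (cond_edu is a hypothetical name).

```r
# Count table (thousands), assumed to be the Example 6.1 counts.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("DNF", "High School", "1-3 College", "4+ College"),
                               c("25-34", "35-54", "55+")))
sally <- fred / sum(fred)                   # joint distribution

cond_edu <- prop.table(fred, 2)             # education given age
colSums(cond_edu)                           # each column adds to one
all.equal(cond_edu, prop.table(sally, 2))   # counts or probabilities: same table
```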