For our example we will use the data from Example 6.1 in the Textbook (Moore).
Rweb does not handle two-way tables well as input, so we will type the table in directly rather than read it from a file. (We should apologise for the inadequacy of Rweb here. This is not an inadequacy of R: most of R's input methods are disabled in Rweb in an attempt at security. The Introduction to R book says how to use them.)
The R function c (on-line help) combines a list of numbers into a single object, which we name fred for no particular reason (the computer doesn't care what we call it).
The R function matrix (on-line help) creates a matrix (a two-way table), in this case with 4 rows (specified by nrow = 4) and with the numbers in fred filled in row order (specified by byrow = TRUE, the default being column order).
The R function dimnames (on-line help) adds row and column labels.
The R function margin.table (on-line help) calculates margins of the table. The second argument says which margin: 1 for row totals, 2 for column totals.
The R function sum (on-line help) calculates the sum (grand total) for the table.
The R function addmargins (on-line help) calculates the margins and tacks them onto the table.
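The functions just described can be put together as in the following sketch. It is an illustration, not necessarily the exact contents of the Rweb form: the counts and the row and column labels are copied from the tables printed later on this page, and the layout of the code is our assumption.

```r
# Counts (in thousands of people), typed in one table row at a time.
fred <- c( 4459,  9174, 14226,
          11562, 26455, 20060,
          10693, 22647, 11125,
          11071, 23160, 10597)

# Four rows, filled in row order rather than the default column order.
fred <- matrix(fred, nrow = 4, byrow = TRUE)

# Row labels (education) and column labels (age group).
dimnames(fred) <- list(c("DNF", "High School", "1-3 College", "4+ College"),
                       c("25-34", "35-54", "55+"))

margin.table(fred, 1)  # row totals, one per education level
margin.table(fred, 2)  # column totals, one per age group
sum(fred)              # grand total
addmargins(fred)       # the table with both margins tacked on
```

The last line should print a table with an extra row and an extra column, both labelled Sum.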
It is a bit annoying to have all of this code in each R form, but that's the way it's going to be on this web page.
Note that the book's column totals are different from ours. This is explained on p. 138 (look for the boldface term roundoff error).
The joint distribution of the data is the table of raw numbers (fred in the example above) divided by the grand total (sum(fred) in the example above).
Then every number in the new table is between zero and one and the numbers add to one. This makes the numbers in the table probabilities.
The output

             25-34  35-54    55+    Sum
DNF         0.0254 0.0524 0.0812 0.1590
High School 0.0660 0.1510 0.1145 0.3314
1-3 College 0.0610 0.1292 0.0635 0.2538
4+ College  0.0632 0.1322 0.0605 0.2558
Sum         0.2156 0.4647 0.3196 1.0000

gives the joint distribution of the data and the marginal distributions.
Note that there are two marginals, one for each variable.
Note that the grand total for the joint distribution is exactly one.
The only computational difference between this and the example above is that we define sally to be fred divided by the grand total. Then we do to sally what we previously did to fred.
The R function round (on-line help) does just what the name suggests: it rounds (in this case to 4 decimal places).
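As a sketch of the calculation just described (again our reconstruction, not necessarily the exact Rweb form, with the counts and labels taken from the tables on this page):

```r
# Raw counts (thousands of people), entered in row order.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE)
dimnames(fred) <- list(c("DNF", "High School", "1-3 College", "4+ College"),
                       c("25-34", "35-54", "55+"))

# Joint distribution: every count divided by the grand total.
sally <- fred / sum(fred)

# Joint distribution with both marginal distributions, to 4 decimal places.
round(addmargins(sally), 4)

# The probabilities add to one.
sum(sally)
```

Rounding is applied last, after the margins are computed, so that the margins are calculated from unrounded probabilities.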
First the calculation. Then the explanation.
Conditional probability is probability calculated relative to a restricted subset of the data.
In theory there is a different conditional distribution for every possible subset. In practice, we look at two kinds of conditional distribution (one variable given the other, and vice versa), just as there are two marginals.
Consider the joint distribution for either the raw data (fred) or the data converted to probabilities (sally). It doesn't matter which.
             25-34 35-54   55+    Sum
DNF           4459  9174 14226  27859
High School  11562 26455 20060  58077
1-3 College  10693 22647 11125  44465
4+ College   11071 23160 10597  44828
Sum          37785 81436 56008 175229
and suppose we are only interested in the highlighted row (High School).
When the whole data are this one (highlighted) row, there is only one variable: age. Everybody in this row has the same education.
When we convert to probabilities, the grand total for this row is 58077 (thousand people). Dividing this row by that total (the row total for this row) gives
 25-34  35-54    55+    Sum
0.1991 0.4555 0.3454 1.0000
The highlighted numbers (the three probabilities, not the Sum) are the conditional distribution of age given that education is high school.
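The hand calculation for this one row can be checked with a short sketch like the one below. The object name hs is ours, chosen only for illustration; the counts and labels are those in the table above.

```r
# Raw counts (thousands of people), entered in row order.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE)
dimnames(fred) <- list(c("DNF", "High School", "1-3 College", "4+ College"),
                       c("25-34", "35-54", "55+"))

# Restrict attention to the High School row.
hs <- fred["High School", ]

sum(hs)                  # the row total, which plays the role of grand total
round(hs / sum(hs), 4)   # conditional distribution of age given high school
```

The rounded result should agree with the three probabilities worked out by hand.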
The R function prop.table (on-line help) calculates all the conditional distributions at once.
This is illustrated in the computer example in this section (just above).
Rweb:> round(prop.table(fred, 1), 4)
             25-34  35-54    55+
DNF         0.1601 0.3293 0.5106
High School 0.1991 0.4555 0.3454
1-3 College 0.2405 0.5093 0.2502
4+ College  0.2470 0.5166 0.2364
gives all the conditional distributions for age given education.
Observe that the highlighted row (High School) is the same as computed in the hand calculation section (the conditional distribution of age given education = high school).
The other conditional (for education given age) is given by
Rweb:> round(prop.table(fred, 2), 4)
            25-34  35-54    55+
DNF         0.118 0.1127 0.2540
High School 0.306 0.3249 0.3582
1-3 College 0.283 0.2781 0.1986
4+ College  0.293 0.2844 0.1892
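The following sketch checks a few of the claims made in this section: each conditional distribution adds to one, it does not matter whether we start from the raw counts or from the probabilities, and prop.table with margin 1 is the same as dividing each row by its row total. The names age_given_edu and edu_given_age are ours, used only for illustration.

```r
# Raw counts (thousands of people), entered in row order.
fred <- matrix(c( 4459,  9174, 14226,
                 11562, 26455, 20060,
                 10693, 22647, 11125,
                 11071, 23160, 10597),
               nrow = 4, byrow = TRUE)
dimnames(fred) <- list(c("DNF", "High School", "1-3 College", "4+ College"),
                       c("25-34", "35-54", "55+"))
sally <- fred / sum(fred)   # joint distribution

age_given_edu <- prop.table(fred, 1)   # each row is a distribution over age
edu_given_age <- prop.table(fred, 2)   # each column is a distribution over education

rowSums(age_given_edu)   # every row adds to one
colSums(edu_given_age)   # every column adds to one

# Counts and probabilities give the same conditional distributions.
all.equal(prop.table(fred, 1), prop.table(sally, 1))

# prop.table(fred, 1) is each row divided by its row total.
all.equal(age_given_edu, fred / rowSums(fred))
```

Both all.equal calls should report TRUE, up to floating-point tolerance.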