University of Minnesota, Twin Cities School of Statistics Stat 3011 Rweb Textbook (Wild and Seber)
This test involves a vector of category counts (assumed to be a random
sample of individuals classified into several categories). The null hypothesis
specifies the probability vector for the categories.
Say the category counts are an R dataset named fred and
the hypothesized category probabilities are an R dataset named p; then
chisq.test(fred, p=p) does the test.
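As a sketch (with made-up counts, not the textbook's data), entering counts for a six-sided die and testing the "fair die" hypothesis looks like this:

```r
# Hypothetical counts of 60 rolls of a six-sided die (illustrative values only)
fred <- c(12, 8, 11, 9, 10, 10)
# Null hypothesis: the die is fair, so each face has probability 1/6
p <- rep(1 / 6, 6)
chisq.test(fred, p = p)
```

The second argument must be a probability vector of the same length as the counts, summing to one; if it is omitted, chisq.test assumes equal probabilities for all categories.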
For an example we will use the data for Example 11.1.1 in Wild and Seber, which is in the file rolldie.txt. Unfortunately, this file is in a format that Rweb cannot read, so we will just enter the data directly.
From the printout we get the result of the test: P = 0.1833. Since this is not below 0.10, there is no evidence at all against the null hypothesis (we conclude that the die is fair or, at least, that these data provide no evidence it isn't).
This test involves a table of category counts (assumed to be a random sample of individuals classified into the categories of the table). The null hypothesis for the test commonly done on such tables is called either a chi-square test of independence or a chi-square test of homogeneity.
Both kinds of test do exactly the same calculation, give exactly the same
test statistic and exactly the same P-value. The only difference
is the name of the test and the conclusion. We say it is a test
of independence when both categorical variables are random.
We say it is a test of homogeneity when one is random and the
other is fixed by the sampling design.
Say the table of category counts is an R dataset named fred
(which is a matrix), then
chisq.test(fred) does the test.
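As a sketch (with a hypothetical 2 by 3 table of counts, not the textbook's data):

```r
# Hypothetical 2 x 3 table of category counts (illustrative values only)
fred <- matrix(c(20, 30, 25,
                 15, 35, 40),
               nrow = 2, byrow = TRUE)
# With a matrix argument and no p, chisq.test does the test of
# independence/homogeneity, with (rows - 1) * (cols - 1) degrees of freedom
chisq.test(fred)
```

For a 2 by 3 table the test has (2 - 1)(3 - 1) = 2 degrees of freedom.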
For an example we will use the data for Example 11.2.1 in Wild and Seber, which is in the file melanoma.txt. Unfortunately, because of the way Rweb reads in data, the syntax here is a bit obscure.
From the printout we get the result of the test: P = 0.000 (the P-value is so small it rounds to zero). This is extremely clear evidence that the null hypothesis is false.
X says to use the whole data frame as a matrix.
X[ , -1] says to use the data frame with the
first column removed (the first column was row labels, so that does the
right thing).
For those who are not satisfied with this explanation, there is a fuller explanation, but it isn't anything you really need to know.
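As a sketch (assuming, as described above, a data frame X whose first column holds row labels and whose remaining columns hold counts; the values here are made up, not the melanoma data):

```r
# Hypothetical data frame mimicking the structure described above:
# first column is row labels, remaining columns are counts
X <- data.frame(label = c("a", "b"),
                col1  = c(10, 20),
                col2  = c(30, 40))
# Drop the label column; chisq.test coerces the rest to a matrix of counts
chisq.test(X[ , -1])
```

If the label column were left in, chisq.test would fail (or give nonsense), since row labels are not counts.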
In one sense, we shouldn't need to say anything special about the 2 by 2 case of the preceding section. But it is also a special case of "difference of two proportions", the test that goes with the confidence interval done long ago in Chapter 8. Thus we can do this test in two completely different ways, using either prop.test or chisq.test.
For our example, we will use the data for Review Exercise 8(a) of Chapter 8 in Wild and Seber, which is in the 2 by 2 table
|sex||prisoners with tuberculosis||Total number of prisoners|
A confidence interval for the difference of population proportions for the
two categories (male and female) and a test of the hypothesis that the two
population proportions are equal are both done by
prop.test, just as
we did in Chapter 8.
The first argument to
prop.test is the vector of counts
of individuals with the specified characteristic (in this case tuberculosis)
in the two samples.
The second argument to
prop.test is the vector of sample sizes.
The output, copied below, gives the test statistic and P-value for the test
Rweb:> prop.test(c(556, 36), c(984, 90))

        2-sample test for equality of proportions with continuity correction

data:  c(556, 36) out of c(984, 90)
X-squared = 8.4244, df = 1, p-value = 0.003702
alternative hypothesis: two.sided
95 percent confidence interval:
 0.05313106 0.27695024
sample estimates:
   prop 1    prop 2
0.5650407 0.4000000

The P-value appears on the line reading p-value = 0.003702.
Although this is usually thought of as a "z test" (one with a test
statistic that is approximately normal for large n), this R command
treats it as a chi-square test. The
X-squared = 8.4244 reported
for the test statistic is approximately chi-squared on one degree of freedom
for large sample sizes.
We can also do this as a chi-square test using chisq.test.
To do that we need to combine the "successes" and "failures" into a 2 by 2
table as follows.
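Using the counts from the prop.test call above, each row of the 2 by 2 table pairs the "successes" (prisoners with tuberculosis) with the "failures" (prisoners without); the failure counts are just the sample sizes minus the success counts:

```r
# Successes (tuberculosis cases) and failures in each sample;
# failures = total prisoners minus cases: 984 - 556 = 428 and 90 - 36 = 54
data <- rbind(c(556, 984 - 556),
              c(36,  90 - 36))
chisq.test(data)
```

This should reproduce exactly the test statistic and P-value that prop.test gave above.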
The printout of the
chisq.test command is
Rweb:> chisq.test(data)

        Pearson's Chi-square test with Yates' continuity correction

data:  data
X-squared = 8.4244, df = 1, p-value = 0.003702
Note that the test statistic and P-value are exactly the same in both cases. They are really both the same test.