Some two-sample problems are really one-sample problems in disguise.
In this section we consider paired observations (Xi, Yi) on n individuals.
Depending on the question of interest, such data might be suitable for calculating a correlation or doing regression. Here we are only interested in comparing the population means of the X and Y values.
This is covered in the textbook (pp. 209–210 and 421–424).
The trick is to convert the two-sample data into one-sample data, usually by subtraction: Zi = Xi − Yi.

Having produced one-sample data Z1, …, Zn, we consider this a sample from a population (of Z values) and produce either a test or a confidence interval about the location parameter (mean or median).

Since we already know how to do such confidence intervals and tests of significance, there is nothing new we need to cover. In fact, our example for tests that started out by computing differences
diff15 <- run1 - run5
was already an example.
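For instance (with made-up numbers standing in for the run1 and run5 data from that example), the one-sample analysis of the differences agrees exactly with what t.test reports when asked for a paired analysis:

```r
# made-up paired data standing in for the run1 / run5 example
run1 <- c(301, 287, 295, 311, 302, 290)
run5 <- c(293, 281, 294, 305, 296, 284)

# one-sample analysis of the differences ...
diff15 <- run1 - run5
out1 <- t.test(diff15)

# ... gives exactly the same results as the built-in paired analysis
out2 <- t.test(run1, run5, paired = TRUE)

all.equal(out1$statistic, out2$statistic)  # TRUE
all.equal(out1$conf.int, out2$conf.int)    # TRUE
```

So paired = TRUE is merely a convenience: it does the subtraction for us.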
So there is really nothing to see here. The only reason we say this much about the issue is that textbooks traditionally make a big deal of it, hence knowing something about the issue is part of being knowledgeable about statistics.
The situation is very different when we have two independent samples, one of X values and one of Y values. This gets a whole chapter in the textbook (Chapter 17) and is not doable using the one-sample procedures we already know.
We have two samples from two populations. The populations may be the same or different (that is generally the issue for a test of significance). We make the following assumptions.

Number the two samples and two populations 1 and 2. For each index i = 1 or 2 we have a sample of size ni that is independent and identically distributed from population i, and the two samples are independent of each other.

The first assumption is about independence (and identical distribution) within each sample. The second assumption is about independence (not identical distribution) between the two samples.

There are further assumptions about the population distributions that may or may not be required (see below).
In this section we describe the sampling distribution of x̄1 − x̄2, the difference of the two sample means.

In Chapter 10 of the textbook and on our sampling distributions web page we learned about the sampling distribution of the sample mean. Each of x̄1 and x̄2 is a random variable of the sort described there (a sample mean). Any function of random variables is a random variable. Hence x̄1 − x̄2 is a random variable. Hence it has a sampling distribution.

The mean of the sampling distribution of x̄1 − x̄2 is the difference of population means μ1 − μ2.

The standard deviation of the sampling distribution of x̄1 − x̄2 is

√(σ1²/n1 + σ2²/n2)

If both populations are exactly normally distributed, then the sampling distribution of x̄1 − x̄2 is also exactly normal.

Regardless of the population distributions, if n1 and n2 are both sufficiently large, then the sampling distribution of x̄1 − x̄2 is approximately normal (by the central limit theorem).

In order to do a test, we need to standardize x̄1 − x̄2:

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

Although this standardized quantity does not literally appear in the formula for the confidence interval, its distribution produces the critical value.
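These facts about the sampling distribution can be checked by simulation. The following sketch (with made-up population parameters) draws many pairs of samples from two normal populations and compares the mean and standard deviation of the simulated differences of sample means with the theoretical values:

```r
# simulation check of the sampling distribution of xbar1 - xbar2
# (made-up population parameters)
set.seed(42)
n1 <- 10; n2 <- 15
mu1 <- 5; mu2 <- 3
sigma1 <- 2; sigma2 <- 1
nsim <- 1e5
diffs <- replicate(nsim,
    mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))
mean(diffs)                          # close to mu1 - mu2 = 2
sd(diffs)                            # close to the theoretical value below
sqrt(sigma1^2 / n1 + sigma2^2 / n2)  # about 0.683
```

The agreement gets better as nsim increases.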
If both populations are exactly normal, then the sampling distribution of z is exactly standard normal.
However, this result is of little use in practice. We cannot use it unless we know the population standard deviations (which we almost never do).
As in the one-sample case, the natural thing to do is to plug in sample standard deviations for population standard deviations.
That is, we need to know the distribution of

t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

Unfortunately, this random variable does not have a Student t distribution. More precisely, it has a sampling distribution, but that distribution depends on the unknown population standard deviations even when the population distributions are normal.
Fortunately, there is a good approximation to the sampling distribution of t. It is called Welch's approximation. If both population distributions are exactly normal, then this sampling distribution is well approximated by a Student t distribution with noninteger degrees of freedom given by the formula on p. 452 in the textbook,

df = (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1)]

(If we use the computer for confidence intervals and tests of significance, then we never need to use this formula. The computer uses it automatically.)
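For the curious, here is that degrees-of-freedom formula as an R function (the summary statistics in the example call are made up):

```r
# Welch-Satterthwaite approximate degrees of freedom
welch.df <- function(s1, n1, s2, n2) {
    v1 <- s1^2 / n1   # estimated variance of xbar1
    v2 <- s2^2 / n2   # estimated variance of xbar2
    (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}
# made-up summary statistics for illustration
welch.df(s1 = 4.5, n1 = 5, s2 = 16.1, n2 = 5)
```

The answer always lies between min(n1, n2) − 1 and n1 + n2 − 2.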
Regardless of the population distributions, if both sample sizes n1 and n2 are sufficiently large, the sampling distribution of the statistic t given above is approximately standard normal (by the central limit theorem).
If n1 and n2 are both large, then the population distributions don't matter and we use z critical values.
If n1 and n2 are not both large, then the population distributions do matter and we use t critical values with degrees of freedom given by Welch's approximation.
The confidence interval for μ1 − μ2 is

(x̄1 − x̄2) ± t* √(s1²/n1 + s2²/n2)
where t* is the appropriate t critical value with degrees of freedom from Welch's approximation (a standard normal z critical value may be used if n1 and n2 are both large).
The calculations for this are very obnoxious if not done by computer.
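Still, the hand calculation can be sketched from summary statistics (made-up numbers here, not the textbook's data), using qt to get the t critical value:

```r
# Welch confidence interval computed by hand from summary statistics
# (made-up numbers, not the textbook's data)
xbar1 <- 120; s1 <- 12; n1 <- 8
xbar2 <- 105; s2 <- 20; n2 <- 10
conf <- 0.90

se <- sqrt(s1^2 / n1 + s2^2 / n2)
df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
tstar <- qt(1 - (1 - conf) / 2, df)
ci <- (xbar1 - xbar2) + c(-1, 1) * tstar * se
ci
```

The computer (t.test) does exactly this, but starting from the raw data rather than summary statistics.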
We do Examples 17.2 and 17.3 in the textbook (same data for both, but the discussion is split into two separate boxes).
Example 17.3 does a 90% confidence interval.
The expression Strength ~ Weeks that is the first argument to the t.test function (see the on-line help) is an R formula. It says that the one variable Strength contains the data for both samples, the samples being distinguished by the value of the variable Weeks.
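Here is a small sketch (with a made-up data frame laid out like the textbook example, not the textbook's actual numbers) showing the formula interface in action:

```r
# made-up data frame laid out like the textbook example:
# one response (Strength), one grouping variable (Weeks)
polyester <- data.frame(
    Strength = c(55, 60, 57, 62, 59, 48, 52, 50, 47, 51),
    Weeks = rep(c(2, 16), each = 5)
)
t.test(Strength ~ Weeks, data = polyester)
```

This is equivalent to splitting Strength into two vectors by the value of Weeks and handing the two vectors to t.test as separate arguments.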
The t.test interval does not agree with the textbook because the book has not yet introduced Welch's approximation. As it reports, t.test is using 4.651 degrees of freedom.
The null hypothesis specifies a value for the difference of population means (usually zero).
The alternative hypothesis specifies a range of values for the same parameter.
tail type | P-value | H0 | Ha |
---|---|---|---|
lower tail | P(T ≤ t) | μ1 − μ2 = δ0 | μ1 − μ2 < δ0 |
upper tail | P(T ≥ t) | μ1 − μ2 = δ0 | μ1 − μ2 > δ0 |
two tail | P(|T| ≥ |t|) | μ1 − μ2 = δ0 | μ1 − μ2 ≠ δ0 |
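The three rows of the table correspond to the alternative argument of the t.test function. A quick sketch with made-up data shows how the three P-values are related:

```r
# the three rows of the table correspond to the `alternative`
# argument of t.test (made-up data for illustration)
set.seed(7)
x <- rnorm(12, mean = 10, sd = 2)   # sample 1
y <- rnorm(15, mean = 9, sd = 3)    # sample 2

p.lower <- t.test(x, y, alternative = "less")$p.value
p.upper <- t.test(x, y, alternative = "greater")$p.value
p.two   <- t.test(x, y, alternative = "two.sided")$p.value

p.lower + p.upper   # always 1 (continuous distribution)
p.two               # always 2 * min(p.lower, p.upper)
```

The default is two.sided, so the one-tailed tests must be asked for explicitly.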
The test statistic is

t = (x̄1 − x̄2 − δ0) / √(s1²/n1 + s2²/n2)
Again, the calculations for this are very obnoxious if not done by computer.
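For those who want to see what the computer is doing, here is the Welch test computed by hand from made-up data and cross-checked against t.test:

```r
# Welch test statistic and upper-tailed P-value by hand
# (made-up data, not the textbook's)
set.seed(1)
x <- rnorm(6, mean = 120, sd = 5)    # sample 1
y <- rnorm(6, mean = 110, sd = 15)   # sample 2

v1 <- var(x) / length(x)
v2 <- var(y) / length(y)
tstat <- (mean(x) - mean(y)) / sqrt(v1 + v2)   # delta0 = 0
df <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))
p.upper <- pt(tstat, df, lower.tail = FALSE)

# agrees with the built-in Welch test
out <- t.test(x, y, alternative = "greater")
c(tstat, out$statistic)   # same number twice
c(p.upper, out$p.value)   # same number twice
```

As promised, t.test saves us from the obnoxious arithmetic.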
We again do Examples 17.2 and 17.3 in the textbook, which have data given in the URL linked above. We want to do an upper-tailed test because of the language in Example 17.2: "Is this good evidence that polyester decays more in 16 weeks than in 2 weeks?"
The P-value given in the textbook does not agree with the P-value given by t.test because, again, the book is not yet using Welch's approximation. However, the P-value we get leads to the same interpretation.
There is no statistically significant difference in polyester decay measured by population mean breaking strength at 2 weeks and at 16 weeks (P = 0.186, upper-tailed Welch's approximate t test).