Statistics 3011 (Geyer and Jones, Spring 2006) Examples:
Tests and Confidence Intervals for Two-Sample Location Problems

Matched Pairs

Some two-sample problems are really one-sample problems in disguise.

In this section we consider paired observations (Xi, Yi) on n individuals.

Depending on the question of interest, such data might be suitable for calculating a correlation or doing regression. Here we are only interested in comparing the population means of the X and Y values.

This is covered in the textbook (pp. 209–210 and 421–424).

The trick is to convert the two-sample data into one-sample data, usually by subtraction

Zi = Xi − Yi

Having produced one-sample data Z1, …, Zn, we consider this a sample from a population (of Z values) and produce either a test or a confidence interval for the location parameter (mean or median).

Since we already know how to do such confidence intervals and tests of significance, there is nothing new we need to cover. In fact, our example for tests that started out by computing differences

diff15 <- run1 - run5

was already an example.
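In R the whole procedure is a line or two. A sketch, assuming the vectors run1 and run5 from that example are still in the R session:

diff15 <- run1 - run5               # the differences, as above
t.test(diff15)                      # one-sample t test on the differences
t.test(run1, run5, paired = TRUE)   # same result without forming the differences explicitly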

So there is really nothing to see here. The only reason we say this much about the issue is that textbooks traditionally make a big deal of it, hence knowing something about the issue is part of being knowledgeable about statistics.

Two Independent Samples

The situation is very different when we have two independent samples, one of X values and one of Y values. This gets a whole chapter in the textbook (Chapter 17) and is not doable using the one-sample procedures we already know.

Assumptions

We have two samples from two populations. The populations may be the same or different (that is generally the issue for a test of significance). We make the following assumptions.

Number the two samples and two populations 1 and 2. For each index i = 1 or 2, sample i consists of ni observations that are independent and identically distributed (IID), each having the distribution of population i. In addition, the two samples are independent of each other.

The first assumption is about independence and identical distribution within each sample. The second assumption is about independence, not identical distribution, between the two samples.

There are further assumptions about the population distributions that may or may not be required (see below).

The Sampling Distribution of the Difference of Sample Means

In this section we describe the sampling distribution of x̄1 − x̄2.

In Chapter 10 of the textbook and on our sampling distributions web page we learned about the sampling distribution of the sample mean.

Each of x̄1 and x̄2 is a random variable of the sort described there (a sample mean). Any function of random variables is a random variable. Hence x̄1 − x̄2 is a random variable. Hence it has a sampling distribution.

Mean

The mean of the sampling distribution of x̄1 − x̄2 is the difference of population means μ1 − μ2.

Standard Deviation

The standard deviation of the sampling distribution of x̄1 − x̄2 is

√(σ1²/n1 + σ2²/n2)

Shape

If both populations are exactly normally distributed, then the sampling distribution of x̄1 − x̄2 is also exactly normal.

Regardless of the population distributions, if n1 and n2 are both sufficiently large, then the sampling distribution of x̄1 − x̄2 is approximately normal (by the central limit theorem).
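All three facts are easy to check by simulation. A sketch in R (the population means, standard deviations, and sample sizes here are made up purely for illustration):

# simulate 10000 values of xbar1 - xbar2 from normal(5, 2) and normal(3, 1) populations
n1 <- 10
n2 <- 15
sim <- replicate(10000,
    mean(rnorm(n1, mean = 5, sd = 2)) - mean(rnorm(n2, mean = 3, sd = 1)))
mean(sim)   # close to mu1 - mu2 = 5 - 3 = 2
sd(sim)     # close to sqrt(2^2 / n1 + 1^2 / n2)
hist(sim)   # bell-shaped (here exactly normal, up to simulation error)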

Welch's Approximation

In order to do a test, we need to standardize x̄1 − x̄2

z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

Although this standardized quantity does not literally appear in the formula for the confidence interval, its distribution produces the critical value.

If both populations are exactly normal, then the sampling distribution of z is exactly standard normal.

However, this result is of little use in practice. We cannot use it unless we know the population standard deviations (which we almost never do).

As in the one-sample case, the natural thing to do is to plug in sample standard deviations for population standard deviations.

That is, we need to know the distribution of

t = [(x̄1 − x̄2) − (μ1 − μ2)] / √(s1²/n1 + s2²/n2)

Unfortunately, this random variable does not have a distribution we can use. More precisely, it has a sampling distribution, but that distribution depends on the unknown population standard deviations even when the population distributions are normal.

Fortunately, there is a good approximation to the sampling distribution of t. It is called Welch's approximation. If both population distributions are exactly normal, then this sampling distribution is well approximated by a Student t distribution with noninteger degrees of freedom given by the formula on p. 452 of the textbook

df = (s1²/n1 + s2²/n2)² / [(s1²/n1)² / (n1 − 1) + (s2²/n2)² / (n2 − 1)]

(If we use the computer for confidence intervals and tests of significance, then we never need to use this formula. The computer uses it automatically.)
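For the curious, the formula is simple to code. A sketch (welch.df is a made-up name for this page, not a built-in R function):

welch.df <- function(s1, s2, n1, n2) {
    v1 <- s1^2 / n1
    v2 <- s2^2 / n2
    (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}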

Regardless of the population distributions, if both sample sizes n1 and n2 are sufficiently large, the sampling distribution of the statistic t given above is approximately standard normal (by the central limit theorem).

Summary

If n1 and n2 are both large, then the population distributions don't matter and we use z critical values.

If n1 and n2 are not both large, then the population distributions do matter and we use t critical values with degrees of freedom given by Welch's approximation.

Confidence Intervals

The confidence interval for μ1 − μ2 is

(x̄1 − x̄2) ± t* × √(s1²/n1 + s2²/n2)

where t* is the appropriate t critical value with degrees of freedom from Welch's approximation (a standard normal z critical value may be used if n1 and n2 are both large).

The calculations for this are very obnoxious if not done by computer.
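To see what the computer is doing, here is the interval coded directly in R. All the summary statistics below are made up purely for illustration:

xbar1 <- 25.0; s1 <- 3.0; n1 <- 10   # made-up summary statistics
xbar2 <- 20.0; s2 <- 5.0; n2 <- 12
se <- sqrt(s1^2 / n1 + s2^2 / n2)
df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
tstar <- qt(0.95, df)                # t critical value for 90% confidence
(xbar1 - xbar2) + c(-1, 1) * tstar * se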

We do Examples 17.2 and 17.3 in the textbook (same data for both, but the discussion is split into two separate boxes). The data are at

http://www.stat.umn.edu/geyer/3011/mdata/chap17/eg17-02.dat

Example 17.3 does a 90% confidence interval.

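The interactive form from the web page is not reproduced here, but the commands it runs amount to something like the following sketch (assuming, as the discussion below indicates, that the data file has a header line with variables Strength and Weeks):

foo <- read.table("http://www.stat.umn.edu/geyer/3011/mdata/chap17/eg17-02.dat",
    header = TRUE)
t.test(Strength ~ Weeks, data = foo, conf.level = 0.90)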

The expression Strength ~ Weeks that is the first argument to the t.test function (on-line help) is an R formula. It says that the one variable Strength contains the data for both samples, the samples being distinguished by the value of the variable Weeks.

The t.test interval does not agree with the textbook because the book has not yet introduced Welch's approximation. As it reports, t.test is using 4.651 degrees of freedom.

Tests of Significance

The null hypothesis specifies a value for the difference of population means (usually zero).

The alternative hypothesis specifies a range of values for the same parameter.

Kinds of Tests and Hypotheses
tail type     P-value          H0               Ha
lower tail    P(T ≤ t)         μ1 − μ2 = δ0     μ1 − μ2 < δ0
upper tail    P(T ≥ t)         μ1 − μ2 = δ0     μ1 − μ2 > δ0
two tail      P(|T| ≥ |t|)     μ1 − μ2 = δ0     μ1 − μ2 ≠ δ0

The test statistic is

t = [(x̄1 − x̄2) − δ0] / √(s1²/n1 + s2²/n2)

Again, the calculations for this are very obnoxious if not done by computer.
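As with the confidence interval, here is a sketch of the computation in R with made-up summary statistics, taking the usual null hypothesis value δ0 = 0:

xbar1 <- 25.0; s1 <- 3.0; n1 <- 10   # made-up summary statistics
xbar2 <- 20.0; s2 <- 5.0; n2 <- 12
se <- sqrt(s1^2 / n1 + s2^2 / n2)
df <- (s1^2 / n1 + s2^2 / n2)^2 /
    ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
tstat <- (xbar1 - xbar2 - 0) / se    # delta0 = 0
pt(tstat, df)                        # lower-tailed P-value
pt(tstat, df, lower.tail = FALSE)    # upper-tailed P-value
2 * pt(-abs(tstat), df)              # two-tailed P-value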

We again do Examples 17.2 and 17.3 in the textbook, which have data given at the URL linked above. We want to do an upper-tailed test because of the language in Example 17.2: "Is this good evidence that polyester decays more in 16 weeks than in 2 weeks?"

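Again, the interactive form is not reproduced here; a sketch of the commands it runs (the choice alternative = "greater" assumes the 2-week group sorts first in the variable Weeks, so a positive difference in means corresponds to more decay at 16 weeks):

foo <- read.table("http://www.stat.umn.edu/geyer/3011/mdata/chap17/eg17-02.dat",
    header = TRUE)
t.test(Strength ~ Weeks, data = foo, alternative = "greater")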

The P-value given in the textbook does not agree with the P-value given by t.test because, again, the book is not yet using Welch's approximation.

However, the P-value we get leads to the same interpretation.

There is no statistically significant difference in polyester decay, as measured by population mean breaking strength at 2 weeks and at 16 weeks (P = 0.186, upper-tailed Welch approximate t test).