# Statistics 3011, Fall 2001, Prof. Geyer, Homework Assignments

You can look at last year's homework assignments to get an idea of what future assignments will be like. This year's will be more or less the same, though we may cover a little less material than last year. And of course last year's dates don't match this year's.

Note: The problems assigned will all be "Exercises" or "Review Exercises" not "Quiz" questions.

| No. | Due Date | Ch. or Sec. | Exercises | Comments |
|---|---|---|---|---|
| 1 | Fri Sep 14 | 1.1 | 2, 4 | |
| | | 1.2 | 1 | |
| | | 1.3 | 2 | |
| | | 1 (Review) | 2, 3, 6, 12, 16 | |
| 2 | Fri Sep 21 | 2.3.1 | 1 | |
| | | 2.3.2 | 4 | |
| | | 2.3.3 | 2 | histogram only, use computer and use `right=FALSE` for comparison with Figure 2.3.8 in the textbook and the histogram example from class. |
| | | 2.3.4 | 2 | |
| | | 2.4.1 | 4 | use computer to find median |
| | | 2.4.2 | 1(b), 4 | see data entry example from class. |
| | | 2.4.3 | 2 | |
| | | 2 (Review) | 4, 6 | |
| 3 | Fri Sep 28 | 2 (Review) | 9abde | |
| | | 3.1.2 | 1, 3 | |
| | | 3 (Review) | 3 | You don't have to do anything special to avoid plotting the 24th point. Its y value is `NA` (no value) so it will be ignored. |
| | | 4.3 | 1, 3 | |
| | | 4.4.2 | 3 | |
| | | 4.4.3 | 2 | |
| | | 4.4.4 | 2 | |
| | | 4.5 | 1, 4 | |
| | | 4 (Review) | 2, 3 | |
| 4 | Fri Oct 5 | 4.7.1 | 2 | |
| | | 4.7.3 | 1 | |
| | | 4 (Review) | 5, 18 | |
| | | 5.2 | 1, 5b-i | |
| | | 5.4.1 | 1 | |
| | | 5.4.2 | 2 | |
| | | 5.4.3 | a-i | no number on question, do all parts |
| | | 5 (Review) | 11 | |
| | | A | 1, 2 | "additional problems" see below. |
| 5 | Fri Oct 12 | 6.2.2 | 2, 3 | You should also do 1 in the sense of visiting the examples page but needn't hand anything in. |
| | | 6.2.3 | 2, 3 | You should also do 1 in the sense of visiting the examples page but needn't hand anything in. |
| | | 6.2.4 | 1, 2 | |
| | | 6.4.3 | 1abcf, 2 | |
| | | 6 (Review) | 1, 2, 4, 10 | |
| 6 | Fri Oct 26 | 7.2.1 | 1 | Note: The answers in the back of the book are correct. The previous comment that they were wrong was itself wrong. |
| | | 7.2.2 | 1, 2 | |
| | | 7.2.3 | 1, 3 | Recall that the R command `fred <- c(0.513, 0.524, 0.529)` creates a data vector of those three numbers, and similarly for longer data vectors. |
| | | 7.3.1 | 1, 3 | |
| | | 7.5 | 1, 3 | |
| | | 7 (Review) | 4, 10, 12, 20 | |
| 7 | Fri Nov 2 | A | 3, 4, 5, 6 | "additional problems" see below. |
| | | 7 (Review) | 15 | |
| | | 8.2 | 2 | Recall that the R command `fred <- c(0.513, 0.524, 0.529)` creates a data vector of those three numbers, and similarly for longer data vectors. |
| | | 8.3 | 1, 2 | Do each of these problems twice, once using the methods described in Wild and Seber getting the answer in the back of the book, then again using the R function `prop.test` described in the Lecture Examples for Chapter 8. |
| | | 8 (Review) | 1 | |
| 8 | Fri Nov 9 | 7.5 | 2 | This problem and the next are paired. The two samples can be obtained in Rweb by the statements `x <- density[1:6]` and `y <- density[7:29]` when you are on the Rweb for 3011 page with the dataset `Table 7.2.1 (p. 291) cavend.txt` selected in the "Datasets from Wild and Seber" chooser. |
| | | 8.4 | 1 | For (a) use the R function `t.test`, which does the Right Thing, not the answer in the back of the book. |
| | | 8.5 | 1, 2 | |
| | | 8.6 | 1, 2 | |
| | | 8 (Review) | 4, 12 | |
| | | A | 7, 8, 9 | "additional problems" see below (problem 9 was added Monday). |
| 9 | Mon Nov 26 | 9.2 | 1, 3, 4 | |
| | | 9.3 | 2, 3, 7 | |
| | | 9 (Review) | 2, 12, 18 | |
| 10 | Wed Dec 5 | 10.1.2 | 1 | |
| | | 10.3 | 1, 2, 3, 4, 5 | |
| | | 10 (Review) | 4, 6abdef, 12abcefg | omit part (c) of 6 and part (d) of 12 |
| 11 | Fri Dec 14 | 11.1 | 1 | |
| | | 11.2.1 to 11.2.3 | 2 | |
| | | 11 (Review) | 2, 5, 6 | |
| | | A | 10 | "additional problems" see below. Also see Chi-Square Tests for 2 by 2 Tables for help with additional problem 10 |

## Additional Problems

1. Suppose the probability of a widget being defective is 0.02. Suppose widgets come in boxes of 12. Assume widget defects are statistically independent.

   1. What is the probability that a box of widgets contains no defects?
   2. What is the probability that a box of widgets contains at least one defective widget?
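A minimal sketch of how the two parts relate, assuming the number of defective widgets in a box is modeled as Binomial(12, 0.02), which is what independence and a common defect probability suggest. The variable names are illustrative and not from the course materials.

```r
# Assumption: defects per box ~ Binomial(size = 12, prob = 0.02)
n <- 12
p <- 0.02

p_none <- dbinom(0, size = n, prob = p)  # P(no defective widgets in the box)
p_some <- 1 - p_none                     # complement rule: P(at least one defective)

print(c(none = p_none, at_least_one = p_some))
```

The second part follows from the first by the complement rule, so only one binomial probability needs to be computed.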

2. For the probability model for the random variable X defined by the following table

   | x | 0 | 1 |
   |---|---|---|
   | pr(x) | 1 - p | p |

   1. Find E(X).
   2. Find sd(X).
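The general recipe for the mean and standard deviation of a discrete random variable can be checked numerically. The sketch below plugs in a hypothetical value `p = 0.3`, chosen only for illustration. The problem itself asks for answers in terms of p.

```r
p  <- 0.3              # hypothetical value, only for illustration
x  <- c(0, 1)          # possible values of X
pr <- c(1 - p, p)      # their probabilities, from the table

mu    <- sum(x * pr)                   # E(X) = sum of x * pr(x)
sigma <- sqrt(sum((x - mu)^2 * pr))    # sd(X) = sqrt of sum of (x - mu)^2 * pr(x)

print(c(mean = mu, sd = sigma))
```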

3. Suppose the random variable T has Student(10) distribution (Student's t-distribution with 10 degrees of freedom).

   1. Find P(T < 1.234). Use R or Rweb. (Answer: 0.8772914).
   2. Find P(T > 1.234). Use R or Rweb.
   3. Find P(|T| > 1.234). Use R or Rweb.

      Note: the vertical bars are absolute value signs. The question is the same as: Find P(T < -1.234 or 1.234 < T).
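One way to set up these three probabilities in R is with the base function `pt`, the cumulative distribution function of Student's t. This is a sketch. The variable names are illustrative, and the two-tailed line relies on the symmetry of the t-distribution about zero.

```r
df <- 10
t0 <- 1.234

p_less <- pt(t0, df = df)                      # P(T < 1.234), lower tail
p_more <- pt(t0, df = df, lower.tail = FALSE)  # P(T > 1.234), upper tail
p_two  <- 2 * pt(-t0, df = df)                 # P(|T| > 1.234), both tails by symmetry

print(p_less)  # should agree with the answer given for the first part
```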

4. Suppose the random variable T has Student(7) distribution.

   1. Find the t such that P(T > t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber. (Answer: 1.894579).
   2. Find the t such that P(T < t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber or part (a).
   3. Find the t such that P(|T| > t) = 0.05. Use R or Rweb or Appendix A6 in Wild and Seber.

      Note: the vertical bars are absolute value signs. The question is the same as: Find the t such that P(T < -t or t < T) = 0.05.
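These are quantile questions, the inverse of the probability lookups in the previous problem. A sketch using the base R function `qt` follows. The variable names are illustrative, and the two-tailed case assumes the 0.05 is split equally between the two tails.

```r
df <- 7

t_upper <- qt(0.05, df = df, lower.tail = FALSE)   # t with P(T > t) = 0.05
t_lower <- qt(0.05, df = df)                       # t with P(T < t) = 0.05
t_two   <- qt(0.025, df = df, lower.tail = FALSE)  # t with P(|T| > t) = 0.05

print(t_upper)  # should agree with the answer given for the first part
```

By symmetry `t_lower` is just `-t_upper`, which is why the second part can be answered from part (a).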

5. Widgets produced at Acme Widget Works are specified to have 7.00 mm frammis diameter. A random sample of 5 widgets is taken from the production line and their diameters are accurately measured. The sample mean was 6.9123 mm and the sample standard deviation 0.0884 mm. Assume that the distribution of frammis diameters is normal, and give an interval, based on Student's t-distribution, that has 95% coverage probability for the true mean frammis diameter of widgets being produced. Answer: (6.80, 7.02).
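The stated answer can be reproduced from the summary statistics alone. The sketch below uses the usual one-sample t interval, sample mean plus or minus a t critical value times s / sqrt(n), with n - 1 degrees of freedom. The variable names are illustrative.

```r
xbar <- 6.9123   # sample mean (mm)
s    <- 0.0884   # sample standard deviation (mm)
n    <- 5        # sample size

se   <- s / sqrt(n)              # standard error of the sample mean
crit <- qt(0.975, df = n - 1)    # critical value for 95% coverage, 4 degrees of freedom
ci   <- xbar + c(-1, 1) * crit * se

print(round(ci, 2))
```

`t.test` cannot be used directly here because only the summary statistics are given, not the five measurements.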

6. Jones and Smith are running for Mayor of the town of Outer Boondock. Two polls taken one month apart by the local paper, both with sample sizes of 500, had the results shown below.

   | candidate | first poll | second poll |
   |---|---|---|
   | Jones | 37.2% | 42.6% |
   | Smith | 45.4% | 42.8% |
   | Undecided | 17.4% | 14.6% |

   It appears from the polls that Jones is gaining. But appearances may be deceiving.

   1. Calculate a 2 standard error interval for the difference of the true proportions of the population that would have indicated a preference for Jones on the dates of the polls if the whole population had been asked. Assume that the polls did take a random sample of the population.
   2. Interpret your interval. Does it indicate that Jones is really gaining? Or is no real change a possibility?
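A sketch of the hand calculation for the first part, assuming the two polls are independent samples so that the variances of the two sample proportions add. The variable names are illustrative, and the interpretation is left to the reader.

```r
n1 <- 500; n2 <- 500   # sample sizes of the first and second polls
p1 <- 0.372            # proportion for Jones, first poll
p2 <- 0.426            # proportion for Jones, second poll

# Standard error of a difference of proportions from two independent samples
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Two-standard-error interval for the change (second poll minus first poll)
interval <- (p2 - p1) + c(-2, 2) * se

print(interval)
```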

7. Redo part (a) of Additional Problem 6 using the R function `prop.test` rather than hand calculation.
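A sketch of the corresponding `prop.test` call. It needs counts rather than percentages, so the counts are recovered here from the reported percentages and the sample size of 500. Note that by default `prop.test` reports a 95% interval with a continuity correction, so its interval will not match a hand-computed two-standard-error interval exactly. Whether the Lecture Examples turn the correction off is not assumed here.

```r
n <- c(500, 500)                  # sample sizes: second poll, first poll
x <- round(c(0.426, 0.372) * n)   # counts favoring Jones: second poll, first poll

result <- prop.test(x, n)         # two-sample test and interval for the difference

print(result$estimate)            # the two sample proportions
print(result$conf.int)            # 95% interval for the difference (second minus first)
```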

8. Using the data for the first poll in Additional Problem 6 calculate an approximate confidence interval for the difference of proportions of voters favoring Jones and favoring Smith. (Hint: Which of Wild and Seber's three cases is this?)

9. In two polls taken a month apart, each poll sampling 600 likely voters, the preferences expressed for the candidates were

   | | First Poll | Second Poll |
   |---|---|---|
   | Shrub | 45% | 50% |
   | Pierce | 35% | 36% |
   | Bottom | 12% | 8% |
   | Undecided | 8% | 6% |

   Both polls gave their margin of error as 4%.

   In the second poll the following results were reported for suburban college educated women (67 were in the sample, about 1/9 of the sample).

   | | Second Poll |
   |---|---|
   | Shrub | 62% |
   | Pierce | 24% |
   | Bottom | 9% |
   | Undecided | 5% |

   The large difference between these results and the results for the whole sample caused much woofing among the pundits.

   Answer each of the questions below two ways.

   - Do an exact two-standard-error interval using the appropriate formula from Table 8.5.6 in Wild and Seber for the standard error or appropriate web form on our web page on the same material.
   - Also do a quick and dirty or mental adjustment calculation described in Section 8.5.3 in Wild and Seber or on our web page on the same material.

   1. What is the margin of error of the 50% reported for Shrub in the second poll?
   2. What is the margin of error of the 5% increase in the support for Shrub from the first to the second poll?
   3. What is the margin of error of the 14% difference (50% - 36%) in the support for Shrub and for Pierce in the second poll?
   4. What is the margin of error of the 62% reported for Shrub in this subgroup?
   5. What is the margin of error of the 38% difference (62% - 24%) in the support for Shrub and for Pierce in this subgroup?
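As a starting point, the 4% margin of error quoted by the polls can be checked against the sample size. The sketch below assumes the pollsters mean two standard errors of a single proportion evaluated at p = 0.5, the worst case. That is an assumption, since the problem does not say how the polls computed it.

```r
n <- 600                              # sample size of each poll

# Two standard errors of a single proportion at the worst case p = 0.5
moe_max <- 2 * sqrt(0.5 * 0.5 / n)

print(round(100 * moe_max, 1))        # as a percentage
```

The same pattern, two times a standard error, applies to each question above. What changes is which standard error formula is appropriate and which sample size goes into it.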

10. For the two polls in Additional Problem 9 the table below gives the actual counts (how many actual people correspond to each cell of the table), which you need for this problem.

    | | First Poll | Second Poll |
    |---|---|---|
    | Shrub | 270 | 302 |
    | Pierce | 210 | 215 |
    | Bottom | 72 | 48 |
    | Undecided | 48 | 35 |

    The sample size for both polls was 600.

    1. Perform a test of whether there is any difference in the true population proportions of people favoring Shrub at the times of the two polls. Obtain a P-value and interpret the P-value, saying what it implies about support for Shrub. Clearly state whether you did a one-tailed or two-tailed test and why.
    2. Perform a test of whether there is any difference in the true population proportions of any of the categories at the times of the two polls. For this you will need the matrix read into Rweb. The Rweb box on the course page does this for you. You only need to supply the correct analysis.

       Obtain a P-value and interpret the P-value, saying what it implies about support for the various candidates.
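A minimal sketch of reading the counts from the table above into an R matrix, for use outside Rweb. The object name `polls` and the row and column labels are assumptions, not necessarily what the Rweb box uses. The analysis itself is left to the reader.

```r
# Counts from the table, entered row by row (candidates in rows, polls in columns)
polls <- matrix(c(270, 302,
                  210, 215,
                   72,  48,
                   48,  35),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("Shrub", "Pierce", "Bottom", "Undecided"),
                                c("First Poll", "Second Poll")))

print(polls)
print(colSums(polls))   # sanity check: each poll should total 600
```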