Statistics 5102 (Geyer, Spring 2003) Homework Assignments

No.  Due Date    Sec.  Exercises           Comments
1    Wed Jan 29  6.2   6, 7, 10
                 6.3   5, 7, 12, 14
2    Wed Feb 05  6.4   2, 6, 7, 8
                 6.5   2, 6, 9, 10
3    Wed Feb 12  6.6   2, 6
                 A     1                   additional problem number one (see below).
                 7.1   2, 4, 6, 8
                 7.2   6, 10, 11
4    Wed Feb 17  7.3   4, 6, 8
                 7.4   1, 2, 6             Alternatively, use Problem 6 on the 5101 final exam.
                 7.5   2, 4, 6, 11
5    Wed Mar 05  7.6   4, 8, 10, 11
                 7.7   6, 11
                 A     2, 3, 4             additional problems (see below).
6    Wed Mar 12  7.8   2, 4, 6, 14
                 A     5, 6, 7, 8, 9, 10   additional problems (see below).
7    Wed Mar 26  8.1   1, 4, 15
                 8.5   2, 12, 14
                 8.6   2, 3, 7
                 A     11                  additional problems (see below).
8    Wed Apr 02  8.7   2, 4, 7             For 7 also find the P-value of the test. See the page about F tests.
                 9.1   4, 7, 8
9    Wed Apr 16  9.2   2, 6
                 9.3   5
                 9.4   2
                 9.6   4, 9                data are in http://www.stat.umn.edu/geyer/old03/5102/examp/ds9-7.4.txt and http://www.stat.umn.edu/geyer/old03/5102/examp/ds9-7.9.txt. The answer in the back of the book uses the large sample approximation. R doesn't. So R doesn't give the same answer unless you say ks.test(x, y, exact = FALSE).
                 A     12                  additional problems (see below).
10   Wed Apr 23  A     13, 14              additional problems (see below).
                 10.1  4, 6, 7             the data for 7 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-1-3.txt
                 10.2  12, 16              for 16 give the 95% prediction interval rather than M.S.E.
11   Wed Apr 30  10.3  10, 11              the data for 10 and 11 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt
                 A     15, 16, 17, 18, 19  additional problems (see below).
12   Fri May 09  10.6  10                  the data for 10 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-18.txt
                 10.7  14, 15              the data for 14 and 15 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-24.txt
                 10.8  11, 12, 13          the data for 11, 12, and 13 are in the file http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-29.txt
                 A     20, 21              additional problems (see below).
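
The comment on the Section 9.6 exercises can be illustrated with a small sketch. The two samples here are simulated stand-ins (an assumption, not the textbook data); the point is only to show the two ways of calling ks.test side by side.

```r
# Stand-in samples (hypothetical); replace with the data for the exercise.
set.seed(1)
x <- rnorm(12)
y <- rnorm(15, mean = 0.5)

# Default call: for small samples without ties R uses the exact null distribution.
exact.p <- ks.test(x, y)$p.value

# Large sample (asymptotic) approximation, which is what the book's answer uses.
approx.p <- ks.test(x, y, exact = FALSE)$p.value

c(default = exact.p, asymptotic = approx.p)
```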

Additional Problems

1. Like the example of maximum likelihood done by computer except instead of the gamma scale model, we will use the Cauchy location model. The likelihood is given by (6.6.7) on p. 366 of DeGroot and Schervish. For data, use the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/cauchy.txt

and for a starting point use the sample median rather than the sample mean, that is, median(x) instead of mean(x). The reason for this will become clear later. The sample mean is a very bad estimate of location for the Cauchy distribution.
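
A minimal sketch of the computation might look like the following. It assumes the Cauchy location model with scale fixed at one, uses nlm as the optimizer (the optimizer in the course example is not shown here, so that is an assumption), and uses simulated stand-in data rather than cauchy.txt.

```r
# Stand-in data (hypothetical); the assignment uses the data in cauchy.txt.
set.seed(42)
x <- rcauchy(30, location = 5)

# Minus the log likelihood for the Cauchy location model (scale fixed at 1).
mlogl <- function(theta) -sum(dcauchy(x, location = theta, log = TRUE))

# nlm minimizes, so we minimize minus the log likelihood,
# starting at the sample median rather than the sample mean.
out <- nlm(mlogl, median(x))
out$estimate
```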

2. Solve the quadratic equation to prove that the interval (2.18) in the handout does indeed have endpoints (2.19) in the handout.

3. Calculate the three kinds of intervals given by equations (2.20), (2.19), and (2.22) in the handout for binomial data with n = 50 and x = 4. Use 95% for the confidence coefficient.

4. Calculate the second and fourth central moments μ2 and μ4 in the notation of the handout for the so-called double exponential distribution with density

f(x) = (1 / 2) e^(−|x|),           −∞ < x < ∞

(note this distribution is symmetric about zero, so the mean is zero and all odd central moments are zero).

Compare the correct asymptotic variance of the sample variance, μ4 − (μ2)^2, with the incorrect asymptotic variance of the sample variance, 2 (μ2)^2, that we would get if we incorrectly assumed the data were normal (Section 2.10 of the handout).
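
The moments are meant to be calculated by hand, but a numerical check is easy to sketch. The helper name central.moment is made up for this illustration; it relies on the symmetry noted above, so it is only valid for even moments.

```r
# Double exponential density from the problem statement.
f <- function(x) 0.5 * exp(-abs(x))

# k-th central moment for even k. The mean is zero, so central and ordinary
# moments agree, and by symmetry we integrate over (0, Inf) and double.
central.moment <- function(k) 2 * integrate(function(x) x^k * f(x), 0, Inf)$value

mu2 <- central.moment(2)
mu4 <- central.moment(4)

correct <- mu4 - mu2^2      # asymptotic variance of the sample variance
if.normal <- 2 * mu2^2      # what we would get assuming normality
c(correct = correct, assuming.normal = if.normal)
```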

5. Starting with the asymptotic distribution for Sn^2 given on p. 16 of the more on confidence intervals handout use the delta method to give the asymptotic distribution of Sn.

6. Using the method of Section 1.2 of the more on confidence intervals handout, find an exact 95% confidence interval for the mean (not the rate) parameter of an exponential distribution from which it is assumed we have independent and identically distributed data with sample size 15 and sample mean 103.49.
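
Section 1.2 of the handout is not reproduced here, so the following is only a sketch under an assumption: that the method is the usual pivotal one, in which 2 n mean(x) / μ has a chi-square distribution with 2 n degrees of freedom for exponential data with mean μ, and that an equal-tailed interval is wanted.

```r
n <- 15
xbar <- 103.49
alpha <- 0.05

# Assumed pivot: 2 * n * xbar / mu is chi-square with 2 * n degrees of freedom
# when the data are i.i.d. exponential with mean mu.
lower <- 2 * n * xbar / qchisq(1 - alpha / 2, df = 2 * n)
upper <- 2 * n * xbar / qchisq(alpha / 2, df = 2 * n)
c(lower = lower, upper = upper)
```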

7. Using the method of Section 2.9.2 of the more on confidence intervals handout, find an asymptotic (approximate, large sample) 95% confidence interval for the mean parameter of a Poisson distribution from which it is assumed we have independent and identically distributed data with sample size 50 and sample mean 2.9.

Hint: In order to use plug-in you need a consistent estimator of the standard deviation of the Poisson distribution. What is the standard deviation and what is its relation to the mean? The sample mean consistently estimates the mean parameter. What does that suggest for a consistent estimator of standard deviation?

8. Suppose we have an independent and identically distributed sample from a Geometric(p) distribution with sample size 30 and sample mean 7.8. Find the maximum likelihood estimate of p and a 95% confidence interval for p based on the MLE and either observed or expected Fisher information.

9. Like the example of multiparameter maximum likelihood done by computer except instead of the gamma shape-rate model, we will use the Cauchy location-scale model. The likelihood is given by

f(x | θ, σ) = g([x - θ] / σ) / σ

where

g(z) = 1 / [π (1 + z^2)]

The R function

dcauchy(x, location = theta, scale = sigma)

calculates f(x | θ, σ), returning a vector of values if x is a vector.

For data, use the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/cauchy.txt

Method of moments estimators make no sense for the Cauchy distribution because the Cauchy distribution doesn't have any moments. We have to use estimators based on quantiles instead.

For a starting point for theta use the sample median (as we did in additional problem 1). This makes sense because θ is the theoretical median. And for a starting point for the scale parameter sigma use half the sample interquartile range, that is, 0.5 * IQR(x). This makes sense because the theoretical interquartile range is 2 σ.

Report the values you obtain for

  1. the MLEs for θ and σ.
  2. the observed Fisher information matrix.
  3. 95% confidence intervals for θ and σ.
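
The steps of this problem could be sketched as follows. The use of nlm with hessian = TRUE is an assumption (the optimizer in the course example is not shown here), the data are simulated stand-ins rather than cauchy.txt, and the intervals are the usual asymptotic ones based on inverse observed Fisher information.

```r
# Stand-in data (hypothetical); the assignment uses the data in cauchy.txt.
set.seed(42)
x <- rcauchy(50, location = 5, scale = 2)

# Minus the log likelihood; par = c(theta, sigma).
mlogl <- function(par) {
    -sum(dcauchy(x, location = par[1], scale = par[2], log = TRUE))
}

# Starting point from quantiles: sample median and half the sample IQR.
start <- c(median(x), 0.5 * IQR(x))
out <- nlm(mlogl, start, hessian = TRUE)

mle <- out$estimate
obs.info <- out$hessian              # numerical observed Fisher information
se <- sqrt(diag(solve(obs.info)))    # asymptotic standard errors
ci <- cbind(lower = mle - qnorm(0.975) * se,
            upper = mle + qnorm(0.975) * se)
rownames(ci) <- c("theta", "sigma")
ci
```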

10. Suppose the variables X1, X2, ..., Xn, Y1, Y2, ..., Yn are independent, and suppose the Xi are identically Exponential(θ) distributed and the Yi are identically Exponential(1 / θ) distributed.

  1. Find the maximum likelihood estimate when the sample size is n = 25 and the sample means are 3.12 for the mean of the Xi and 0.432 for the mean of the Yi. Give the MLE both as a formula (a function of mean(x) and mean(y)) and numerically.
  2. Calculate both observed and expected Fisher information.
  3. Show that even after the MLE is plugged in for the parameter, observed and expected Fisher information are different, both as formulas (functions of mean(x) and mean(y)) and numerically.
  4. Calculate 95% asymptotic (approximate, large sample) confidence intervals for the parameter θ, one using observed Fisher information, one using expected Fisher information.

11. Basically this is Problem 8.6.10 in DeGroot and Schervish. Use the data in their Table 8.1, which can be read into R with the statements

calcium <- c( 7, -4, 18, 17, -3, -5, 1, 10,  11, -2)
placebo <- c(-1, 12, -1, -3,  3, -5, 5,  2, -11, -1, -3)
  1. Perform a test of the hypotheses stated in Problem 8.6.10 using Welch's approximate test, giving the P-value.
  2. Perform a test of the same hypotheses using the exact t-test based on the assumption of equal variances, giving the P-value.
  3. Interpret these P-values.
  4. Calculate a 95% two-sided confidence interval for the difference of the means of the two groups.

The web page on doing t-tests in R may help.
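
As a sketch of the two calls involved: t.test does Welch's approximate test by default and the equal-variance test when var.equal = TRUE. The hypotheses of Problem 8.6.10 are not restated here, so the calls below use the default two-sided alternative; if the problem is one-sided, the alternative argument would need to be set accordingly.

```r
calcium <- c( 7, -4, 18, 17, -3, -5, 1, 10,  11, -2)
placebo <- c(-1, 12, -1, -3,  3, -5, 5,  2, -11, -1, -3)

# Welch's approximate test (the default, variances not assumed equal).
welch <- t.test(calcium, placebo)

# Exact t test assuming equal variances (pooled variance estimate).
pooled <- t.test(calcium, placebo, var.equal = TRUE)

c(welch = welch$p.value, pooled = pooled$p.value)

# Each result also carries a 95% two-sided confidence interval
# for the difference of means.
welch$conf.int
pooled$conf.int
```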

12. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/rob.txt

calculate the following point estimators

  1. the sample mean
  2. the sample median
  3. the sample 10% trimmed mean
  4. the sample 20% trimmed mean
  5. the median of the Walsh averages (Hodges-Lehmann estimator associated with the Wilcoxon signed rank test)
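
A sketch of these five estimators follows. It assumes that a 10% trimmed mean trims 10% from each end, which is what the trim argument of mean does. The function name hodges.lehmann is made up for this illustration, and the data are simulated stand-ins rather than rob.txt.

```r
# Median of the Walsh averages (x[i] + x[j]) / 2 for i <= j.
hodges.lehmann <- function(x) {
    w <- outer(x, x, "+") / 2
    # lower.tri with diag = TRUE keeps each unordered pair exactly once
    median(w[lower.tri(w, diag = TRUE)])
}

# Stand-in data (hypothetical); the assignment uses the data in rob.txt.
set.seed(42)
x <- rt(25, df = 3)

c(mean = mean(x),
  median = median(x),
  trim10 = mean(x, trim = 0.1),
  trim20 = mean(x, trim = 0.2),
  hodges.lehmann = hodges.lehmann(x))
```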

13. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/a13.txt

calculate confidence intervals for the center of symmetry (we assume the population distribution is symmetric about some point θ which is the unknown parameter of interest) associated with

  1. the sign test
  2. the Wilcoxon signed rank test
  3. the Student t test

having confidence level above 95% and as close to 95% as you can get (this is what the wilcox.test function does by default).
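
The sign test interval has no ready-made function in base R, so here is one possible sketch. The function name sign.ci is made up for this illustration. It uses the fact that, for continuous data, the interval between the k-th smallest and k-th largest order statistics covers the median with probability 1 - 2 * pbinom(k - 1, n, 0.5), and it picks the largest k whose coverage is still at least the nominal level. The data are simulated stand-ins rather than a13.txt.

```r
sign.ci <- function(x, conf.level = 0.95) {
    n <- length(x)
    k <- seq_len(floor(n / 2))
    # Coverage of [x_(k), x_(n + 1 - k)] is P(k <= B <= n - k), B ~ Binomial(n, 1/2).
    coverage <- 1 - 2 * pbinom(k - 1, n, 0.5)
    ok <- coverage >= conf.level
    if (!any(ok)) stop("sample too small for this confidence level")
    kbest <- max(k[ok])
    xs <- sort(x)
    list(interval = c(xs[kbest], xs[n + 1 - kbest]),
         conf.level = coverage[kbest], k = kbest)
}

# Stand-in data (hypothetical); the assignment uses the data in a13.txt.
set.seed(42)
x <- rnorm(20, mean = 0.3)

sign.ci(x)
wilcox.test(x, conf.int = TRUE)$conf.int   # signed rank interval
t.test(x)$conf.int                         # Student t interval
```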

14. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/a13.txt

calculate P-values for an upper tailed test about the center of symmetry (we assume the population distribution is symmetric about some point θ which is the unknown parameter of interest) with null and alternative hypotheses

H0: θ = 0
H1: θ > 0

for each of the following types of test

  1. the sign test
  2. the Wilcoxon signed rank test
  3. the Student t test

(note: the t.test and wilcox.test functions do two-tailed tests by default so you must use the optional argument alternative = "greater" to do an upper-tailed test).
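
The three upper-tailed P-values might be computed along these lines. The sign test is done directly from the binomial distribution (binom.test gives the same number), which assumes no observation is exactly zero. The data are simulated stand-ins rather than a13.txt.

```r
# Stand-in data (hypothetical); the assignment uses the data in a13.txt.
set.seed(42)
x <- rnorm(20, mean = 0.3)
n <- length(x)

# Sign test: B = number of positive observations is Binomial(n, 1/2) under H0.
b <- sum(x > 0)
p.sign <- 1 - pbinom(b - 1, n, 0.5)    # P(B >= b)
p.sign.check <- binom.test(b, n, p = 0.5, alternative = "greater")$p.value

# Signed rank and t tests need alternative = "greater" for an upper-tailed test.
p.wilcox <- wilcox.test(x, alternative = "greater")$p.value
p.t <- t.test(x, alternative = "greater")$p.value

c(sign = p.sign, wilcoxon = p.wilcox, t = p.t)
```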

15. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt

which contains two variables x and y, assume the data follow the simple linear regression model

y = β0 + β1 x + error
  1. Calculate the P-value for a test with null and alternative hypotheses
    H0: β1 = 0
    H1: β1 ≠ 0
  2. Interpret the P-value. Does the test say the value of the true population regression coefficient β1 is statistically significantly different from zero at the 0.05 level?
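
As a sketch, the P-value for this test is the one R prints in the row for x in the regression summary. The data here are simulated stand-ins (an assumption); the assignment uses the variables x and y in ds10-9.txt.

```r
# Stand-in data (hypothetical); the assignment uses the data in ds10-9.txt.
set.seed(42)
x <- 1:20
y <- 1 + 0.1 * x + rnorm(20)

out <- lm(y ~ x)
coefs <- summary(out)$coefficients
coefs

# Two-sided P-value for H0: beta1 = 0 versus H1: beta1 != 0.
pval <- coefs["x", "Pr(>|t|)"]
pval
```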

16. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt

which contains two variables x and y, assume the pairs (Xi, Yi) are independent and identically bivariate normal distributed with correlation

ρ = cor(Xi, Yi)
  1. Calculate the P-value for a test with null and alternative hypotheses
    H0: ρ = 0
    H1: ρ ≠ 0
  2. Interpret the P-value. Does the test say the value of the true correlation coefficient ρ is statistically significantly different from zero at the 0.05 level?
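
One way to do this test in R is cor.test, sketched here with simulated stand-in data (an assumption). The t statistic it reports for the default Pearson test is algebraically the same as the t statistic for the slope in the simple linear regression of y on x, which the last lines compute for comparison.

```r
# Stand-in data (hypothetical); the assignment uses the data in ds10-9.txt.
set.seed(42)
x <- 1:20
y <- 1 + 0.1 * x + rnorm(20)

# Test of H0: rho = 0 versus H1: rho != 0 (Pearson correlation, the default).
ct <- cor.test(x, y)
ct$p.value

# For comparison: the t statistic for the slope in the regression of y on x.
slope.t <- summary(lm(y ~ x))$coefficients["x", "t value"]
c(cor.test = unname(ct$statistic), regression = slope.t)
```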

17. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt

which contains two variables x and y, assume the data follow the simple linear regression model

y = β0 + β1 x + error
  1. Calculate the P-value for a test with null and alternative hypotheses
    H0: β1 = 0.6
    H1: β1 ≠ 0.6
  2. Interpret the P-value. Does the test say the value of the true population regression coefficient β1 is statistically significantly different from 0.6 at the 0.05 level?

Note: This is exactly the same as Additional Problem 15 (word for word) except that the hypothesized value of the regression coefficient is 0.6 rather than zero.
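
The regression summary only tests a hypothesized value of zero, so a nonzero hypothesized value has to be done by hand from the estimate and its standard error. A sketch follows; the function name slope.test is made up for this illustration and the data are simulated stand-ins.

```r
# Two-sided P-value for H0: beta1 = beta.null in the regression of y on x.
slope.test <- function(out, beta.null) {
    coefs <- summary(out)$coefficients
    tstat <- (coefs["x", "Estimate"] - beta.null) / coefs["x", "Std. Error"]
    2 * pt(- abs(tstat), df = out$df.residual)
}

# Stand-in data (hypothetical); the assignment uses the data in ds10-9.txt.
set.seed(42)
x <- 1:20
y <- 1 + 0.5 * x + rnorm(20)

out <- lm(y ~ x)
slope.test(out, 0.6)
```

With beta.null = 0 the function reproduces the P-value in the summary table, which is a useful check.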

18. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/ds10-9.txt

which contains two variables x and y, assume the data follow the quadratic regression model

y = β0 + β1 x + β2 x^2 + error
  1. Calculate the P-value for a test with null and alternative hypotheses
    H0: β2 = 0
    H1: β2 ≠ 0
  2. Interpret the P-value. Does the test say the value of the true population regression coefficient β2 is statistically significantly different from zero at the 0.05 level?

Note: This is exactly the same as Additional Problem 15 except that it is about the quadratic regression model rather than the simple linear model and the test is about β2 rather than about β1.

19. For the data in the URL

http://www.stat.umn.edu/geyer/old03/5102/examp/sally.txt

which contains two variables x and y, it is clear from the scatter plot produced by plot(x, y) that a simple linear regression will not fit the data (no statistics needed, the points are obviously nowhere near a straight line).

Because the scatter plot curves up at both ends, it is clear that a polynomial of even degree is needed for the regression function (assuming we restrict our consideration to polynomials), because a polynomial of odd degree would go up at one end and down at the other.

  1. Fit the following three regression models: the quadratic (degree 2), quartic (degree 4), and sixth degree polynomial models.

    Report the regression coefficients for each model.

  2. Perform a test in which the quadratic model is the little model and the quartic model is the big model. Report the F statistic and the P-value for the F test for model comparison. Interpret the P-value. Which model does this test tell you to use?
  3. Perform a test in which the fourth degree model is the little model and the sixth degree model is the big model. Report the F statistic and the P-value for the F test for model comparison. Interpret the P-value. Which model does this test tell you to use?
  4. Make a scatter plot of the data points, with the estimated regression function plotted for all three models on the same plot (use lty = 2, lty = 3, and so forth to distinguish the lines). Hand in the plot. Comment on the differences between the curves and the relation to the results of the F tests.
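
The model fitting and the F tests for model comparison might be sketched as follows. This assumes each model is a full polynomial of the stated degree (all powers up to the degree, odd ones included) and uses simulated stand-in data rather than sally.txt. The anova function with two nested fits does the little model versus big model F test.

```r
# Stand-in data (hypothetical); the assignment uses the data in sally.txt.
set.seed(42)
x <- seq(-2, 2, length = 40)
y <- x^4 - x^2 + rnorm(40, sd = 0.5)

# Full polynomial models of degree 2, 4, and 6 (an assumed form).
out2 <- lm(y ~ x + I(x^2))
out4 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))
out6 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) + I(x^6))

coef(out2)

# F tests for model comparison: little model first, big model second.
cmp24 <- anova(out2, out4)
cmp46 <- anova(out4, out6)
cmp24
cmp46
```

The F statistic and P-value are in the second row of each table, in the columns labeled F and Pr(>F).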

20. Modify the example calculating the MSE of an estimator by simulation making two changes. Use the t distribution with 2.5 degrees of freedom for the distribution of the data (instead of the standard Cauchy distribution in the example) and use the 20% trimmed mean for the point estimator, which is calculated by the mean function in R using the trim optional argument (on-line help). Provide both a point estimate and a confidence interval for the actual true MSE.
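
The course example is not reproduced here, so the following is only a sketch of the general idea. The sample size n and the number of simulations nsim are assumptions and should be taken from the example. The interval for the true MSE treats the simulated squared errors as an i.i.d. sample and uses the central limit theorem.

```r
set.seed(42)
n <- 30        # sample size (assumed, take the value from the example)
nsim <- 1000   # number of simulated data sets (assumed)
theta <- 0     # true center of symmetry of the t distribution

# One 20% trimmed mean per simulated data set.
theta.hat <- replicate(nsim, mean(rt(n, df = 2.5), trim = 0.2))

# Point estimate of the MSE and a 95% confidence interval for it.
sq.err <- (theta.hat - theta)^2
mse.hat <- mean(sq.err)
mse.se <- sd(sq.err) / sqrt(nsim)
mse.ci <- mse.hat + c(-1, 1) * qnorm(0.975) * mse.se

mse.hat
mse.ci
```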

21. Modify the percentile bootstrap confidence interval example making two changes. Make the parameter to be estimated the interquartile range of the population and the point estimator of this parameter the interquartile range of the data, which is calculated by the IQR function in R (on-line help).
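
A sketch of a percentile bootstrap interval for the interquartile range follows. The data, the bootstrap sample size nboot, and the 95% level are all assumptions, since the course example is not reproduced here.

```r
# Stand-in data (hypothetical); use the data from the example instead.
set.seed(42)
x <- rnorm(50)

nboot <- 1000   # number of bootstrap samples (assumed)

# Resample the data with replacement and recompute the estimator each time.
iqr.star <- replicate(nboot, IQR(sample(x, replace = TRUE)))

# Percentile interval: quantiles of the bootstrap distribution of the estimator.
iqr.hat <- IQR(x)
iqr.ci <- quantile(iqr.star, c(0.025, 0.975))

iqr.hat
iqr.ci
```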

Note on Math on the Web

Some web browsers don't display the math formulas above correctly. In this case you have two options.

  1. Get a non-sucky web browser that actually implements (rather than disdains) internet standards.
  2. Read the additional problems in PDF (Adobe Palatable Dog Food) format.