
Stat 5102 (Geyer) Final Exam

Problem 1

First define
\begin{align*}\mu & = \frac{\alpha}{\lambda} \\
\sigma^2 & = \frac{\alpha}{\lambda^2}
\end{align*}
so
\begin{align*}E(X_i) & = \mu \\
\var(X_i) & = \sigma^2
\end{align*}
and the CLT says

\begin{displaymath}\sqrt{n} \left(\overline{X}_n - \mu\right)
\stackrel{\mathcal{D}}{\longrightarrow}
\NormalDis\left(0, \sigma^2\right).
\end{displaymath}

Now we apply the delta method to the transformation

\begin{displaymath}g(x) = \log(x)
\end{displaymath}

because the random variable whose asymptotic distribution we want is $g(\overline{X}_n)$. This has derivative

\begin{displaymath}g'(x) = \frac{1}{x}
\end{displaymath}

Hence the delta method says

\begin{displaymath}\sqrt{n} \left[ g(\overline{X}_n) - g(\mu) \right]
\stackrel{\mathcal{D}}{\longrightarrow}
\NormalDis\left(0, g'(\mu)^2 \sigma^2 \right)
\end{displaymath}

Now

\begin{displaymath}g'(\mu)^2 \sigma^2
=
\left(\frac{\lambda}{\alpha}\right)^2 \frac{\alpha}{\lambda^2}
=
\frac{1}{\alpha}
\end{displaymath}

and

\begin{displaymath}g(\mu) = \log(\alpha) - \log(\lambda)
\end{displaymath}

Hence

\begin{displaymath}\sqrt{n} \left[ \log(\overline{X}_n) - \log\left(\frac{\alpha}{\lambda}\right) \right]
\stackrel{\mathcal{D}}{\longrightarrow}
\NormalDis(0, 1 / \alpha)
\end{displaymath}

or

\begin{displaymath}\log(\overline{X}_n) \approx \NormalDis\left( \log\left(\frac{\alpha}{\lambda}\right), \frac{1}{n \alpha} \right)
\end{displaymath}
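A quick simulation check one could do in R (the values of alpha, lambda, and n below are arbitrary choices for illustration, not from the exam):

  set.seed(42)
  alpha <- 2; lambda <- 3; n <- 100    # hypothetical shape, rate, and sample size
  nsim <- 10000
  # simulate log(xbar) for nsim samples of size n from Gamma(alpha, rate = lambda)
  logxbar <- replicate(nsim, log(mean(rgamma(n, shape = alpha, rate = lambda))))
  mean(logxbar)            # should be near log(alpha / lambda)
  var(logxbar)             # should be near 1 / (n * alpha)
  log(alpha / lambda); 1 / (n * alpha)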

Problem 2

Let's look at the simplest moment first

\begin{displaymath}E(X_i) = \frac{s}{s + t} = \theta
\end{displaymath}

Thus

\begin{displaymath}\hat{\theta}_n = \overline{X}_n
\end{displaymath}

is a perfectly good method of moments estimator of $\theta$. The CLT gives its asymptotic distribution

\begin{displaymath}\hat{\theta}_n \approx \NormalDis(\theta, \sigma^2 / n)
\end{displaymath}

where

\begin{displaymath}\sigma^2 = \var(X_i) = \frac{s t}{(s + t + 1) (s + t)^2}
=
\frac{\theta (1 - \theta)}{2}
\end{displaymath}

the last equality using the fact that $s + t = 1$ in this model (so $s = \theta$ and $t = 1 - \theta$).
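A minimal simulation sketch in R, assuming the one-parameter $\BetaDis(\theta, 1 - \theta)$ family just described; the values of theta and n are arbitrary choices for illustration:

  set.seed(42)
  theta <- 0.3; n <- 200; nsim <- 10000   # hypothetical parameter, sample size, replications
  # method of moments estimate is the sample mean
  theta.hat <- replicate(nsim, mean(rbeta(n, theta, 1 - theta)))
  mean(theta.hat)                   # should be near theta
  var(theta.hat)                    # should be near theta * (1 - theta) / (2 * n)
  theta * (1 - theta) / (2 * n)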

Problem 3

An asymptotic test can be based on the asymptotically pivotal quantity

\begin{displaymath}Z = \frac{\overline{X}_n - \mu}{S_n / \sqrt{n}}
\end{displaymath}

which is approximately standard normal for large $n$, where $S_n$ is the sample standard deviation (the ``plug-in'' theorem allows any consistent estimator of the population standard deviation $\sigma$, but the convenient one here is $S_n$). To do the test in this particular problem we need the mean of the beta distribution (from p. 176 in Lindgren)

\begin{displaymath}\mu = \frac{s}{s + t}
\end{displaymath}

So under $H_0$ (that is, $s = t$)

\begin{displaymath}\mu = \frac{1}{2}
\end{displaymath}

That's the value of $\mu$ that we plug in to compute $Z$

\begin{displaymath}Z = \frac{\overline{X}_n - \mu}{S_n / \sqrt{n}}
= \frac{0.57 - 0.50}{\sqrt{0.036 / 100}}
= 3.7
\end{displaymath}

The first row of Table I in the appendix of Lindgren gives P = 0.0001 for the one-tailed P-value. Hence the two-tailed P-value is P = 0.0002. (R says P = 0.000225, so the table is right to one significant figure.)
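The same computation can be checked in R, using the numbers from the display above (0.036 / 100 is $S_n^2 / n$):

  xbar <- 0.57; mu0 <- 0.50; n <- 100
  z <- (xbar - mu0) / sqrt(0.036 / n)    # test statistic, about 3.69
  2 * pnorm(-abs(z))                     # two-tailed P-value, about 0.000225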

Since P < 0.05, reject $H_0$.

Problem 4

First we have to find the posterior. The relevant formulas are given in Example 5.2.4 in the notes, equations (5.17a) and (5.17b). The precision of the data distribution is 1 / 25 = 0.04 and the prior precision is 1 / 10 = 0.10. Hence from (5.17a) the posterior precision is

\begin{displaymath}16 \cdot 0.04 + 0.10 = 0.74
\end{displaymath}

and from (5.17b) the posterior mean is

\begin{displaymath}\frac{16 \cdot 0.04 \cdot 31.2 + 0.10 \cdot 20}{16 \cdot 0.04 + 0.10}
=
29.6865
\end{displaymath}

The posterior standard deviation is

\begin{displaymath}\sqrt{\frac{1}{0.74}} = 1.162476
\end{displaymath}

The HPD region is

\begin{displaymath}29.6865 \pm 1.96 \cdot 1.162476
\end{displaymath}

or

\begin{displaymath}29.6865 \pm 2.2784
\end{displaymath}

or

(27.4081, 31.9649)
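The whole calculation can be reproduced in R with the numbers quoted above (qnorm(0.975) is 1.96 to two decimal places, so the endpoints agree to the accuracy shown):

  prior.mean <- 20; prior.prec <- 1 / 10     # prior precision 0.10
  data.prec <- 1 / 25                        # precision of one observation, 0.04
  n <- 16; xbar <- 31.2
  post.prec <- n * data.prec + prior.prec                                     # 0.74
  post.mean <- (n * data.prec * xbar + prior.prec * prior.mean) / post.prec   # 29.6865
  post.sd <- sqrt(1 / post.prec)                                              # 1.162476
  post.mean + c(-1, 1) * qnorm(0.975) * post.sd                               # (27.41, 31.96)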

Problem 5

The model called ``Model 1'' in the ANOVA table has regression function

\begin{displaymath}h(x) = \gamma + \beta x = \alpha + \beta (x - 11)
\end{displaymath}

where $\alpha = \gamma + 11 \beta$. This is obtained by setting $\beta_1 = \beta_2 = \beta$ in the larger model (called ``Model 2'' in the ANOVA table). Hence ``Model 1'' is a submodel of ``Model 2.''

The table gives a P-value P = 0.004277 for the test of model comparison. Since the P-value is very small, this is strong evidence against the small model. Thus we conclude that the piecewise linear model fits and the simple linear model (``Model 1'') doesn't.
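A hedged sketch of how such a comparison could be set up in R (the data frame mydata and the variable names x and y are hypothetical, not given in the exam):

  ## Model 1: simple linear regression
  ## Model 2: piecewise linear ("broken stick") with the knot at x = 11
  out1 <- lm(y ~ x, data = mydata)
  out2 <- lm(y ~ I(pmin(x - 11, 0)) + I(pmax(x - 11, 0)), data = mydata)
  anova(out1, out2)    # F test for the nested model comparison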

Comment

This problem was a learning experience for the teacher. We needed some homework problems like this. It was harder than I thought it would be. Most people had no clear idea of how to show the models were nested. Only two people got full credit.

In order to show the models are nested, you need to show one of two things: either that every regression function of the little model is also a regression function of the big model (for some values of the big-model parameters), or that the range (column space) of the design matrix of the little model is a subspace of the range of the design matrix of the big model.

These two conditions come to much the same thing (as we shall see below). We can write the regression function for the little model

\begin{displaymath}h(x) = \alpha + \beta x
\end{displaymath}

but this is a bad idea given the form of the regression function for the big model

 \begin{displaymath}
h(x)
=
\begin{cases}
\alpha + \beta_1 (x - 11), & x \le 11 \\
\alpha + \beta_2 (x - 11), & x \ge 11
\end{cases}\end{displaymath} (1)

because the two $\alpha$'s are not the same. Better to choose another letter (or at least an embellished alpha) for the intercept in the little model

 \begin{displaymath}
h(x) = \gamma + \beta x
\end{displaymath} (2)

Now the question to be answered is what values of $\alpha$, $\beta_1$ and $\beta_2$ in (1) give (2)? It is not enough to just assert that there are some such values. You have to find them.

A little thought suggests $\beta_1 = \beta_2 = \beta$, which collapses the two cases in (1) to one

 \begin{displaymath}
h(x) = \alpha + \beta (x - 11) = (\alpha - 11 \beta) + \beta x,
\end{displaymath} (3)

but the result still doesn't exactly match (2). You still have to remark that choosing $\alpha - 11 \beta = \gamma$, or what is the same, $\alpha = \gamma + 11 \beta$, makes (3) agree with (2).

The design matrices for the two models are

\begin{displaymath}\mathbf{X}_{\text{little}}
=
\begin{pmatrix}
1 & x_1 \\
\vdots & \vdots \\
1 & x_n
\end{pmatrix}
\qquad
\mathbf{X}_{\text{big}}
=
\begin{pmatrix}
1 & x_1 - 11 & 0 \\
\vdots & \vdots & \vdots \\
1 & x_k - 11 & 0 \\
1 & 0 & x_{k + 1} - 11 \\
\vdots & \vdots & \vdots \\
1 & 0 & x_n - 11
\end{pmatrix}
\end{displaymath}

(taking the $x_i$ to be ordered so that $x_1, \ldots, x_k \le 11 \le x_{k + 1}, \ldots, x_n$).

How does one show that the range of $\mathbf{X}_{\text{little}}$ is a subspace of the range of $\mathbf{X}_{\text{big}}$? One needs to show that for any two-vector $\boldsymbol{\beta}_{\text{little}}$, there exists a three-vector $\boldsymbol{\beta}_{\text{big}}$ such that

\begin{displaymath}\mathbf{X}_{\text{little}}
\boldsymbol{\beta}_{\text{little}}
=
\mathbf{X}_{\text{big}}
\boldsymbol{\beta}_{\text{big}}
\end{displaymath}

But this is exactly the same question as asked and answered above, because $\mathbf{X} \boldsymbol{\beta}$ is the regression function described in matrix language. Exactly the same argument shows that

\begin{displaymath}\boldsymbol{\beta}_{\text{little}}
=
(\gamma, \beta)
\end{displaymath}

and

\begin{displaymath}\boldsymbol{\beta}_{\text{big}}
=
(\gamma + 11 \beta, \beta, \beta)
=
(\alpha, \beta, \beta)
\end{displaymath}

does the job.
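A small numerical check of this identity in R (the x values are made up just for illustration; any values with some points on each side of 11 will do):

  x <- c(3, 7, 10, 12, 15, 20)          # hypothetical design points
  X.little <- cbind(1, x)
  X.big <- cbind(1, pmin(x - 11, 0), pmax(x - 11, 0))
  gamma <- 2; beta <- 0.5               # arbitrary little-model coefficients
  beta.little <- c(gamma, beta)
  beta.big <- c(gamma + 11 * beta, beta, beta)
  all.equal(drop(X.little %*% beta.little), drop(X.big %*% beta.big))   # TRUE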

Problem 6

The density of the data is

\begin{displaymath}f(x \mid \theta)
=
\frac{\Gamma(\theta + 1)}{\Gamma(\theta) \Gamma(1)} x^{\theta - 1}
=
\theta x^{\theta - 1},
\qquad 0 < x < 1
\end{displaymath}

The prior density is

\begin{displaymath}g(\theta \mid \alpha, \lambda)
=
\frac{\lambda^\alpha}{\Gamma(\alpha)}
\theta^{\alpha - 1} e^{- \lambda \theta},
\qquad \theta > 0
\end{displaymath}

(the normalizing constant doesn't matter).

The likelihood for a sample of size n is

\begin{displaymath}L_n(\theta)
=
\prod_{i = 1}^n \theta x_i^{\theta - 1}
=
\theta^n \exp\left( (\theta - 1) \sum_{i = 1}^n \log(x_i) \right)
\end{displaymath}

or, if we introduce the variables $y_i = \log(x_i)$,

\begin{displaymath}L_n(\theta)
=
\theta^n e^{(\theta - 1) n \bar{y}_n}
=
\theta^n e^{\theta n \bar{y}_n} e^{- n \bar{y}_n}
\end{displaymath}

The last term can be dropped, since it does not contain the parameter, giving

\begin{displaymath}L_n(\theta)
=
\theta^n e^{\theta n \bar{y}_n}
\end{displaymath}

The unnormalized posterior (likelihood times prior) is

\begin{displaymath}\theta^n e^{\theta n \bar{y}_n}
\theta^{\alpha - 1} e^{- \lambda \theta}
=
\theta^{n + \alpha - 1} e^{- (\lambda - n \bar{y}_n) \theta}
\end{displaymath}

This is clearly proportional to a $\GammaDis(\alpha + n, \lambda - n \bar{y}_n)$ density. So that is the posterior.

Sanity Check: Does this make sense? Are both parameters of the posterior positive? Clearly $\alpha + n$ is positive, because we need $\alpha > 0$ for the prior to make sense. How about $\lambda - n \bar{y}_n$? At first sight this doesn't look positive. We need $\lambda > 0$ for the prior to make sense, but how do we know that the other bit doesn't make it negative? We have to think a bit. Since $0 < x_i < 1$, we have $y_i = \log(x_i) < 0$ (logs of numbers less than one are negative), so $- \bar{y}_n$ is actually positive despite its appearance, and everything is o.k.
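A short simulation sketch in R checking the conjugate updating (the values of theta, alpha, lambda, and n are arbitrary choices for illustration):

  set.seed(1)
  theta <- 2; alpha <- 3; lambda <- 1.5; n <- 50   # hypothetical true parameter, prior, sample size
  x <- rbeta(n, theta, 1)            # f(x | theta) = theta * x^(theta - 1) is the Beta(theta, 1) density
  ybar <- mean(log(x))               # note ybar < 0, so lambda - n * ybar > 0
  alpha.post <- alpha + n            # posterior shape
  lambda.post <- lambda - n * ybar   # posterior rate
  alpha.post / lambda.post           # posterior mean, should be near the true theta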

Problem 7

The regression coefficient in question is -0.011464, and R gives its standard error as 0.007784 and the degrees of freedom for error as 17. We only need to look up the t critical value from Table IIIb in Lindgren, which for 90% confidence is 1.74 (note: not in the column headed 90, but in the next one over, which has 1.645 as the appropriate z critical value at the bottom).

Thus the interval is

\begin{displaymath}-0.011464 \pm 1.74 \cdot 0.007784
\end{displaymath}

or

\begin{displaymath}-0.011464 \pm 0.01354416
\end{displaymath}

or

(-0.02500816, 0.00208016)
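The same interval in R, using the exact t critical value instead of the table value 1.74 (so the endpoints differ slightly in the last digits):

  est <- -0.011464; se <- 0.007784; df <- 17
  est + c(-1, 1) * qt(0.95, df) * se    # 90% confidence interval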

Problem 8

The sample median $\widetilde{X}_n$ is asymptotically normal, centered at the population median $m$, with variance $1 / [4 n f(m)^2]$ (Corollary 2.28 in the notes). Here the population median is zero by symmetry and $f(0) = 2 / \pi$. Hence

\begin{displaymath}\widetilde{X}_n \approx \NormalDis\left(0, \frac{\pi^2}{16 n} \right)
\end{displaymath}
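Spelling out the plug-in arithmetic behind the variance:

\begin{displaymath}\frac{1}{4 n f(0)^2}
=
\frac{1}{4 n (2 / \pi)^2}
=
\frac{\pi^2}{16 n}
\end{displaymath}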


Charles Geyer
2000-05-13