Rules

See the Section about Rules for Quizzes and Homeworks on the General Info page.

Your work handed in to Canvas should be an Rmarkdown file with text and code chunks that can be run to reproduce what you did. We do not take your word for what the output is. We may run it ourselves. But we also want the output.

You may ask questions if the wording of the questions is confusing. But the instructor will not be giving hints.

You must be in the classroom, Molecular and Cellular Biology 2-120, to take the quiz.

Quizzes must be uploaded by the end of class (1:10). Canvas should actually allow a few minutes after that, but quizzes not uploaded by 1:10 will be marked late. Here is the link for uploading this quiz: https://canvas.umn.edu/courses/330843/assignments/2823441.

Homeworks must be uploaded before midnight the day they are due. Here is the link for uploading this homework: https://canvas.umn.edu/courses/330843/assignments/2823447.

Quiz 3

Problem 1

Write an R function like the example in Section 8 of the course notes about computer arithmetic, except that we want it to be for the negative binomial distribution rather than the binomial distribution.

The negative binomial distribution has parameter θ and data x and log likelihood

l(θ) = x θ + r log(1 − exp(θ))

where, as always, exp denotes the exponential function and log the natural logarithmic function (as in R) and r is a known positive constant (not an unknown parameter). The data x is nonnegative-integer-valued (0, 1, 2, …). The parameter θ is negative-real-valued (− ∞ < θ < 0).

Your function should have signature

    logl(theta, x, r)
and return a list having components giving the value of the log likelihood and its first and second derivatives.

Like the example in the course notes, the point is to be careful about computer arithmetic, avoiding overflow and catastrophic cancellation as much as possible.

Your function cannot use R function dnbinom.

For this problem, you do not have to check for invalid argument values.

The formulas for the derivatives are

l′(θ) = x − r exp(θ) ⁄ (1 − exp(θ))
and
l″(θ) = − r exp(θ) ⁄ (1 − exp(θ))²
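To fix ideas, here is one possible shape for such a function. This is a hedged sketch, not the official solution: the component names `value`, `gradient`, and `hessian` are assumptions (use whatever names the problem statement requires), and the cutoff for switching between the two ways of computing log(1 − exp(θ)) is an arbitrary choice.

```r
# Sketch only: component names (value, gradient, hessian) and the
# cutoff log(0.5) are assumptions, not requirements.
logl <- function(theta, x, r) {
    # log(1 - exp(theta)) without catastrophic cancellation:
    # near zero use log(- expm1(theta)),
    # far from zero use log1p(- exp(theta))
    logq <- ifelse(theta > log(0.5),
        log(- expm1(theta)), log1p(- exp(theta)))
    # exp(theta) / (1 - exp(theta)) = 1 / (exp(- theta) - 1)
    odds <- 1 / expm1(- theta)
    list(value = x * theta + r * logq,
        gradient = x - r * odds,
        hessian = r * odds / expm1(theta))
}
```

Note that rewriting exp(θ) ⁄ (1 − exp(θ)) as 1 ⁄ expm1(− θ) avoids overflow for θ near zero and harmlessly underflows to zero for very negative θ.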

Show your function working for parameter values


thetas <- (- 10^seq(3, - 3))

and for data values both

r <- 2.7
x <- 0

and

r <- 10.7
x <- 5

Problem 2

Test your function for the preceding problem like the example in Section 8 of the course notes about computer arithmetic.

To test the function value, compare with the values of the function

logl.too <- function(theta) dnbinom(x, r, 1 - exp(theta), log = TRUE)
and to test the derivatives, use numerical derivatives (you can also apply the other method, using derivatives calculated by the R function D, if you like, but that does not count; we are only going to grade the comparison to numerical derivatives).

Note that numerical differentiation does not give perfectly accurate derivatives. So there may be small discrepancies between the two methods of calculation.

Also note that log likelihood is only defined up to an additive constant that does not depend on the parameter. Hence your function will give different answers for the values than R function logl.too, but the answers must differ by a constant that does not depend on the value of θ.

Since the derivative of a constant is zero, this does not affect the derivatives.

You may find R function dnbinom giving bad answers. In this case, just check that its answers and the answers from your function agree when θ is far from minus infinity and far from zero.
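A sketch of how the comparison might be organized. The `logl` here is only a stand-in for your Problem 1 function (assumed to return components named `value` and `gradient`), and the step size `eps` for the central-difference numerical derivative is an arbitrary choice.

```r
# Sketch only: logl is a stand-in for your Problem 1 function;
# eps is an arbitrary choice of step size.
logl <- function(theta, x, r)
    list(value = x * theta + r * log1p(- exp(theta)),
        gradient = x - r / expm1(- theta))
r <- 2.7
x <- 0
logl.too <- function(theta) dnbinom(x, r, 1 - exp(theta), log = TRUE)
thetas <- (- 10^seq(3, - 3))
# values should differ by a constant not depending on theta
mine <- sapply(thetas, function(theta) logl(theta, x, r)$value)
print(mine - sapply(thetas, logl.too))
# central-difference numerical first derivative
eps <- 1e-6
numder <- sapply(thetas, function(theta)
    (logl(theta + eps, x, r)$value -
     logl(theta - eps, x, r)$value) / (2 * eps))
print(all.equal(numder,
    sapply(thetas, function(theta) logl(theta, x, r)$gradient),
    tolerance = 1e-4))
```

The loose tolerance reflects the point made above: numerical differentiation is not perfectly accurate, so small discrepancies are expected.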

Problem 3

Rewrite your function for problem 1 to add error checks for all three arguments theta, x, and r.

For this problem we don't even care if the function works (that was problems 1 and 2). We just care that all possible user errors are caught. You do not have to test the errors for this problem.
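One way the checks might look, sketched with R function stopifnot. Which arguments may be vectors and the exact conditions enforced are interpretations of the problem statement, not requirements (this sketch allows theta to be a vector but requires x and r to be scalars).

```r
# Sketch only: which arguments may be vectors and the exact error
# conditions are interpretations of the problem statement.
logl <- function(theta, x, r) {
    stopifnot(is.numeric(theta), length(theta) >= 1,
        all(is.finite(theta)), all(theta < 0))
    stopifnot(is.numeric(x), length(x) == 1, is.finite(x),
        x == round(x), x >= 0)
    stopifnot(is.numeric(r), length(r) == 1, is.finite(r), r > 0)
    # ... calculations from Problem 1 go here ...
    invisible(NULL)
}
```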

Homework 3

Homework problems start with problem number 4 because, if you don't like your solutions to the problems on the quiz, you are allowed to redo them (better) for homework. If you don't submit anything for problems 1–3, then we assume you liked the answers you already submitted.

Problem 4

Write a function like the function in Problem 1 except for the zero-truncated Poisson distribution rather than the negative binomial distribution used in Problem 1.

This distribution is complicated enough that it has its own Notes on the Zero-Truncated Poisson Distribution. We copy the essential math here.

m = exp(θ)
μ = m ⁄ (1 − exp(− m))
l(θ) = x θ − m − log(1 − exp(− m))
l'(θ) = x − μ
l''(θ) = − μ (1 − μ + m) = − μ (1 − μ exp(− m))

These are the same formulas as in practice problem 3.1 except for the last. We now have two formulas (the one in the practice problem and a new one).

Unless you were extremely clever, your solutions to practice problem 3.1 did not work for large negative θ or large positive θ. My solution did not work for these cases either.

As discussed in the supplementary notes linked above, there are three cases to look out for

For the function itself, for large negative θ use

l(θ) = (x − 1) θ − m − log(1 − m ⁄ 2 + m² ⁄ 3! − m³ ⁄ 4! + … + (−1)ᵏ⁻¹ mᵏ⁻¹ ⁄ k! + …)

For all other θ except for very large positive θ the formula

l(θ) = x θ − m − log(1 − exp(− m))

can be used.

For θ so large that x ⋅ θ overflows (is Inf), the correct value for l(θ) is -Inf, but neither of the formulas above gives that, so this has to be done as a special case.

The mean μ is not part of what your function is supposed to return, but is useful in intermediate calculations. For all but large negative θ use the formula

μ = m ⁄ (1 − exp(− m))

but for very large negative θ use

μ = 1 ⁄ a

where

a = 1 − m ⁄ 2 + m² ⁄ 3! − m³ ⁄ 4! + … + (−1)ᵏ⁻¹ mᵏ⁻¹ ⁄ k! + …

(the reason for this is that the other formula could give NaN or Inf depending on how it is calculated, but this formula works).

For the first derivative, the hard case is when θ is large negative and x = 1. In that case use the formula

l'(θ) = m b ⁄ a

where

b = − 1 ⁄ 2 + m ⁄ 3! − m² ⁄ 4! + … + (−1)ᵏ⁻¹ mᵏ⁻² ⁄ k! + …

and a is the same infinite series as defined above.

In all other cases the formula

l'(θ) = x − μ
gives correct results (when μ is calculated correctly, as described above).

For the second derivative, no single formula works for all θ, but

l''(θ) = − μ (1 − μ + m)

works except for large positive θ (where it gives catastrophic cancellation because m and μ are nearly equal and very large), and

l''(θ) = − μ (1 − μ exp(− m))

works except for large negative θ (where it gives catastrophic cancellation because μ exp(− m) is nearly equal to 1) and except for when μ overflows (is Inf) in which case the formula gives NaN but the correct answer is -Inf.
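Putting those cases together, the selection logic might look something like the following sketch. The helper name is made up, and the crossover point θ ≤ 0 is an arbitrary choice, since both formulas are accurate for moderate θ.

```r
# Sketch only: ldd and the crossover theta <= 0 are made-up choices.
# Assumes mu and m have already been computed carefully, as above.
ldd <- function(theta, mu, m) {
    if (is.infinite(mu)) return(- Inf)   # mu overflow case
    if (theta <= 0)
        - mu * (1 - mu + m)              # safe for negative theta
    else
        - mu * (1 - mu * exp(- m))       # safe for positive theta
}
```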

With those formulas and the usual care about using log1p and expm1, the log likelihood and two derivatives can be coded correctly for all θ, including infinite values.

The correct values for θ = ∞ are

l(θ) = − ∞
l'(θ) = − ∞
l''(θ) = − ∞

The correct values for θ = − ∞ and x = 1 are

l(θ) = 0
l'(θ) = 0
l''(θ) = 0

The correct values for θ = − ∞ and x > 1 are

l(θ) = − ∞
l'(θ) = x - 1
l''(θ) = 0

So the problem is to implement this function getting good answers for all θ, including even infinite values (computer infinite).

Apply your function to the following values of θ


thetas <- c(1, 3, 10, 30, 100, 300, 1000, .Machine$double.xmax, Inf)
thetas <- c(- rev(thetas), 0, thetas)

and for x = 1 and for x = 5.

It is your decision how many terms of each infinite series to use and for what values of θ to use the formulas with infinite series.

These are rapidly converging series and converge for all values of m, but the smaller m is, particularly when m is much less than 1, the faster the series converges. When m is much less than one, the index k does not have to be very large before the k-th term is less than the machine epsilon.
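For example, the series for a might be summed like the following sketch; the safety cap of 100 terms is an arbitrary choice, and the same loop pattern works for the series b.

```r
# Sketch only: the cap of 100 terms is arbitrary; in practice the
# loop exits after a handful of terms when m is small.
aseries <- function(m) {
    a <- 1
    term <- 1
    for (k in 2:100) {
        term <- - term * m / k   # now (-1)^(k-1) m^(k-1) / k!
        a <- a + term
        if (abs(term) < .Machine$double.eps) break
    }
    a
}
```

For moderate m this agrees with (1 − exp(− m)) ⁄ m, which is a useful check.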

Problem 5

This is a modification of Problem 4. In order to make the solution not exactly the same as the Problem 4 solution, this time we are requiring that you use the following formula for the log likelihood (rather than all of the careful calculation in Problem 4)

l(θ) = x θ − exp(θ) − log(1 − exp(− exp(θ)))
and use the derivatives of the above formula (calculated either by yourself or by R function D) for the first and second derivatives of the log likelihood.
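A sketch of one way to get the required derivatives with R function D; the object names here are arbitrary.

```r
# Sketch only: object names are arbitrary; D differentiates an
# unevaluated R expression symbolically.
e <- quote(x * theta - exp(theta) - log(1 - exp(- exp(theta))))
d1 <- D(e, "theta")   # first derivative expression
d2 <- D(d1, "theta")  # second derivative expression
# evaluate at particular values of theta and x
theta <- 1
x <- 5
c(eval(e), eval(d1), eval(d2))
```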

Compare this function with the function you wrote for Problem 4, for the values of θ and x required in the preceding problem.

Comment on any discrepancies between your function and evaluating the formula in the obvious way.

Problem 6

This problem is about hypothesis tests for generalized linear models. It does not require you to write a function.

The R command


foo <- read.table(url("https://www.stat.umn.edu/geyer/3701/data/2022/q3p6.txt"), header = TRUE)

creates a data frame foo containing two variables x, which is quantitative, and y, which is zero-or-one-valued.

Fit all polynomial logistic regression models of degree 1 (linear) through degree 7. You can either use R function I, as illustrated in Section 3.3.3 of the first notes on statistical models, or you can use R function poly.

First do all hypothesis tests comparing models that you fit differing by one parameter (one degree of polynomial).
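A sketch of one way to organize the fits and tests; using poly rather than I is just one of the two options mentioned above, and R function anova with test = "Chisq" does likelihood ratio tests.

```r
# Sketch only: uses poly (the I formulation would work too).
foo <- read.table(url("https://www.stat.umn.edu/geyer/3701/data/2022/q3p6.txt"),
    header = TRUE)
fits <- lapply(1:7, function(d)
    glm(y ~ poly(x, d), family = binomial, data = foo))
# tests of adjacent models differing by one degree of polynomial
for (d in 2:7)
    print(anova(fits[[d - 1]], fits[[d]], test = "Chisq"))
```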

Notice that either the odd degree polynomials all fit better than even degree or vice versa. That's just the way polynomials work.

For the rest of this problem we ignore the odd degree polynomials or the even degree polynomials (whichever is no good). Do hypothesis tests of adjacent models you keep. These now differ by two parameters (degrees 1, 3, 5, 7 if you keep odd, degrees 2, 4, 6 if you keep even).

Interpret the P-values for these latter tests.