Statistics 3701 (Geyer, Spring 2017) Practice Problems 3

Rules

No rules. This is practice.

Grades

No grades. This is practice.

Disclaimer

These practice problems are supplied without any guarantee that they will help you do the quiz problems. However, they were written after the quiz problems were written and with the intention that they would help.

These practice problems are also supplied without any guarantee that they are exactly or even nearly like the quiz problems. However, they are like at least some quiz problems in at least some respects.

Problem 1

Write an R function like the example in Section 8 of the course notes about computer arithmetic, except that it should be for the zero-truncated Poisson distribution rather than the binomial distribution.

The zero-truncated Poisson distribution has parameter θ and data x. Its log likelihood and derivatives are given by

m = exp(θ)

(this is the mean of the corresponding untruncated Poisson distribution)

μ = m ⁄ (1 − exp(− m))

(this is the mean of the corresponding zero-truncated Poisson distribution)

l(θ) = x θ − m − log(1 − exp(− m))

l'(θ) = x − μ

l''(θ) = − μ (1 − μ exp(− m))

where, as always, exp denotes the exponential function and log the natural logarithm function (as in R). The data x is positive-integer-valued (1, 2, 3, …). The parameter θ is real-valued (− ∞ < θ < ∞).

Your function should have signature

    logl(theta, x)

and return a list having components

value, the value of the function,
gradient, the first derivative of the function, and
hessian, the second derivative of the function.

Like the example in the course notes, the point is to be careful about computer arithmetic, avoiding overflow and catastrophic cancellation as much as possible.

For this problem, you do not have to check for invalid argument values.

Show your function working for parameter values


thetas <- seq(-10, 10)

and for data values both


x <- 1

and


x <- 5
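One possible sketch of such a function follows. It is not the reference solution from the course. It assumes that the base R functions expm1 and log1p are the right tools for computing 1 − exp(− m) and its logarithm without cancellation when m is small. The helper name log1mexp is made up for this illustration. The hessian still subtracts two nearly equal numbers when m is small, and values of theta so extreme that exp(theta) overflows or underflows are not handled, so treat this as a starting point only.

```r
# Hypothetical helper (name is an assumption): log(1 - exp(-a)) for a > 0.
# For small a, 1 - exp(-a) is computed as -expm1(-a) to avoid cancellation.
# For large a, exp(-a) is tiny and log1p keeps the result accurate.
log1mexp <- function(a) {
    ifelse(a <= log(2), log(-expm1(-a)), log1p(-exp(-a)))
}

# Sketch of the log likelihood and its first two derivatives for the
# zero-truncated Poisson distribution, vectorized over theta.
logl <- function(theta, x) {
    m <- exp(theta)
    # mean of the zero-truncated distribution, m / (1 - exp(-m))
    mu <- m / (-expm1(-m))
    value <- x * theta - m - log1mexp(m)
    gradient <- x - mu
    # still subject to cancellation in 1 - mu * exp(-m) when m is small
    hessian <- -mu * (1 - mu * exp(-m))
    list(value = value, gradient = gradient, hessian = hessian)
}

thetas <- seq(-10, 10)
logl(thetas, 1)
logl(thetas, 5)
```

Because the function is vectorized over theta, each component of the returned list has one entry per element of thetas.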

Problem 2

Test your function for the preceding problem like the example in Section 8 of the course notes about computer arithmetic.

To test the function value, compare it with the values of the function

logl.too <- function(theta) dpois(x, exp(theta), log = TRUE) -
    ppois(0, exp(theta), lower.tail = FALSE, log = TRUE) +
    lfactorial(x)

and to test the derivatives use numerical derivatives (you can also apply the other method using derivatives calculated by the R function D if you like, but that does not count).

Note that numerical differentiation does not give perfectly accurate derivatives. So there may be small discrepancies between the two methods of calculation.
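The checks described above might be sketched as follows. The logl defined here is a compact stand-in for whatever you wrote for Problem 1, repeated only so that the block runs on its own. The helper central.diff is a hypothetical hand-rolled central difference used so that no packages are needed; the course notes may use a different tool for numerical derivatives, and the step size 1e-4 is an arbitrary choice.

```r
# Stand-in implementation under test (an assumption, not the course's solution).
logl <- function(theta, x) {
    m <- exp(theta)
    p <- -expm1(-m)            # 1 - exp(-m) without cancellation
    mu <- m / p
    list(value = x * theta - m - log(p),
         gradient = x - mu,
         hessian = -mu * (1 - mu * exp(-m)))
}

# Reference function copied from the problem statement. It reads x from the
# global environment, and log = TRUE partially matches ppois's log.p argument.
logl.too <- function(theta) dpois(x, exp(theta), log = TRUE) -
    ppois(0, exp(theta), lower.tail = FALSE, log = TRUE) +
    lfactorial(x)

# Hypothetical helper: central finite difference of f at theta with step h.
central.diff <- function(f, theta, h = 1e-4) {
    (f(theta + h) - f(theta - h)) / (2 * h)
}

thetas <- seq(-10, 10)
x <- 5
out <- logl(thetas, x)

# function value against the reference function
all.equal(out$value, logl.too(thetas))

# first derivative against the numerical derivative of the value
num.grad <- central.diff(function(theta) logl(theta, x)$value, thetas)
all.equal(out$gradient, num.grad, tolerance = 1e-6)

# second derivative against the numerical derivative of the gradient
num.hess <- central.diff(function(theta) logl(theta, x)$gradient, thetas)
all.equal(out$hessian, num.hess, tolerance = 1e-6)
```

The looser tolerance on the derivative checks allows for the inexactness of finite differences. The same checks can be repeated after setting x <- 1.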

Problem 3

(This problem is about finding errors in data, but that is too hard for a quiz, so we are just going to test some skills that are useful for that.)

We will use the data


foo <- read.csv("http://www.stat.umn.edu/geyer/s17/3701/data/p3p3.csv", stringsAsFactors = FALSE)

This reads in a data frame having variables x, y, and z.

y is zero-or-one-valued and we will call one success and zero failure.

z is categorical.

In this problem we consider the answer better if it is done without using a loop.

What is the success rate when x is less than or equal to 30? Greater than 30 and less than or equal to 50? Greater than 50?
Which level of z had the highest success rate?
Get the subset of the data for which x is greater than 40.
Which level of z had the highest success rate when x was greater than 40?
Order the levels of z by success rate when x was greater than 40, from highest to lowest.
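The loop-free skills these questions exercise can be sketched on made-up data. The data frame below is simulated and is not the course data set, so its numbers mean nothing; only the idioms (cut, tapply, subset, which.max, order) are the point, and other approaches would work just as well.

```r
# Simulated stand-in with the same variable names as the course data
# (an assumption for illustration; NOT the contents of p3p3.csv).
set.seed(42)
foo <- data.frame(x = sample(1:80, 200, replace = TRUE),
                  y = rbinom(200, 1, 0.5),
                  z = sample(c("a", "b", "c"), 200, replace = TRUE),
                  stringsAsFactors = FALSE)

# Success rate within the ranges x <= 30, 30 < x <= 50, and x > 50.
# cut() makes intervals closed on the right by default.
x.group <- cut(foo$x, breaks = c(-Inf, 30, 50, Inf))
tapply(foo$y, x.group, mean)

# Success rate for each level of z, and the level with the highest rate.
rate.z <- tapply(foo$y, foo$z, mean)
names(which.max(rate.z))

# Subset of the data for which x is greater than 40.
bar <- subset(foo, x > 40)

# Same summary within the subset, then the levels from highest to lowest rate.
rate.z.bar <- tapply(bar$y, bar$z, mean)
names(which.max(rate.z.bar))
ranked <- names(rate.z.bar)[order(rate.z.bar, decreasing = TRUE)]
ranked
```

Since y is zero-or-one-valued, the mean of y within a group is the success rate for that group, which is why tapply with mean answers each question.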