The log likelihood for the binomial distribution is
l(θ) = xθ − n log(1 + e^θ)
where θ is the natural parameter.
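As a quick sanity check, here is a minimal sketch of this log likelihood in Python. The function name and the log1p-based stabilization are my own choices, not anything prescribed by the text.

```python
import math

def log_likelihood(theta, x, n):
    """Binomial log likelihood in the natural parameter theta:
    l(theta) = x*theta - n*log(1 + e^theta), dropping constants."""
    # A naive exp(theta) overflows for large theta, so use the identity
    # log(1 + e^t) = max(t, 0) + log1p(e^(-|t|)) for numerical stability.
    log1p_exp = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return x * theta - n * log1p_exp

# Illustrative values: at theta = 0 this is 0 - 10*log(2), about -6.93.
print(log_likelihood(0.0, 3, 10))
```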
The first derivative is
l'(θ) = x − n p(θ)
and the second derivative is
l''(θ) = −n p(θ) q(θ)
where
p(θ) = 1 / (1 + e^{−θ})
q(θ) = 1 / (1 + e^θ)
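Continuing the sketch, the two derivatives translate directly into Python. Using `scipy.special.expit` for p(θ) = 1 / (1 + e^{−θ}) is my choice for numerical stability; the text does not prescribe any particular library.

```python
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def score(theta, x, n):
    """First derivative: l'(theta) = x - n * p(theta)."""
    return x - n * expit(theta)

def curvature(theta, n):
    """Second derivative: l''(theta) = -n * p(theta) * q(theta)."""
    p = expit(theta)    # p(theta) = 1 / (1 + e^(-theta))
    q = expit(-theta)   # q(theta) = 1 / (1 + e^theta) = 1 - p(theta)
    return -n * p * q
```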
Finally, the Newton update is
θ_{k+1} = θ_k − l'(θ_k) / l''(θ_k)
Let's do it.
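Here is one way the iteration might look in Python, written to stand alone. The function name, starting value, tolerance, and iteration cap are illustrative defaults of mine, not values from the text.

```python
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def newton_binomial(x, n, theta0=0.0, tol=1e-10, max_iter=50):
    """Newton's method for the binomial log likelihood in the natural
    parameter: theta_{k+1} = theta_k - l'(theta_k) / l''(theta_k)."""
    theta = theta0
    for _ in range(max_iter):
        p, q = expit(theta), expit(-theta)
        step = (x - n * p) / (-n * p * q)   # l'(theta) / l''(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton iteration did not converge")

# Illustrative data: x = 3 successes in n = 10 trials.
# The MLE is logit(3/10) = log(3/7), about -0.8473.
print(newton_binomial(3, 10))
```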
This problem is about as well behaved as an optimization gets.
- The log likelihood is strictly concave.
- Hence the maximum likelihood estimate is unique if it exists.
- It exists when 0 < x < n.
- It is found by any method that goes uphill on the log likelihood and makes progress if there is progress to be made.
- Newton is not such a method, as the sketch below illustrates.
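To see how Newton can fail to make progress here, consider a step taken far from the solution, where l''(θ) is nearly zero and the step l'(θ)/l''(θ) is enormous. The sketch below uses illustrative values of my choosing (x = 3, n = 10, and a deliberately bad start θ = 5): the first step overshoots the maximum and lowers the log likelihood, and the second flings the iterate off to astronomical values.

```python
import math
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def log_likelihood(theta, x, n):
    """l(theta) = x*theta - n*log(1 + e^theta), computed stably."""
    return x * theta - n * (max(theta, 0.0) + math.log1p(math.exp(-abs(theta))))

x, n = 3, 10    # illustrative data; the MLE is log(3/7), about -0.85
theta = 5.0     # a deliberately poor starting value
print("start:", theta, log_likelihood(theta, x, n))
for k in range(2):
    p, q = expit(theta), expit(-theta)
    theta -= (x - n * p) / (-n * p * q)   # Newton step
    print("step", k + 1, ":", theta, log_likelihood(theta, x, n))
# Expected behavior: theta jumps from 5 to about -99 (the log likelihood
# falls from about -35 to about -298), then to roughly +4e42.
```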