The log likelihood for the binomial distribution is
l(θ) = xθ − n log(1 + e^θ)
where θ is the natural parameter.
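As a quick sanity check, here is a minimal sketch of this log likelihood in Python. The function name and the log1p-based stabilization are my own choices, not anything prescribed by the text.

```python
import math

def log_likelihood(theta, x, n):
    """Binomial log likelihood in the natural parameter theta:
    l(theta) = x*theta - n*log(1 + e^theta), dropping constants."""
    # A naive exp(theta) overflows for large theta, so use the identity
    # log(1 + e^t) = max(t, 0) + log1p(e^(-|t|)) for numerical stability.
    log1p_exp = max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))
    return x * theta - n * log1p_exp

# Illustrative values: at theta = 0 this is 0 - 10*log(2), about -6.93.
print(log_likelihood(0.0, 3, 10))
```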
The first derivative is
l'(θ) = x − n p(θ)
and the second derivative is
l''(θ) = −n p(θ) q(θ)
where
p(θ) = 1 / (1 + e^{−θ})
q(θ) = 1 / (1 + e^θ)
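Continuing the sketch, the two derivatives translate directly into Python. Using `scipy.special.expit` for p(θ) = 1 / (1 + e^{−θ}) is my choice for numerical stability; the text does not prescribe any particular library.

```python
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def score(theta, x, n):
    """First derivative: l'(theta) = x - n * p(theta)."""
    return x - n * expit(theta)

def curvature(theta, n):
    """Second derivative: l''(theta) = -n * p(theta) * q(theta)."""
    p = expit(theta)    # p(theta) = 1 / (1 + e^(-theta))
    q = expit(-theta)   # q(theta) = 1 / (1 + e^theta) = 1 - p(theta)
    return -n * p * q
```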
Finally, the Newton update is
θ_{k+1} = θ_k − l'(θ_k) / l''(θ_k)
Let's do it.
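Here is one way the iteration might look in Python, written to stand alone. The function name, starting value, tolerance, and iteration cap are illustrative defaults of mine, not values from the text.

```python
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def newton_binomial(x, n, theta0=0.0, tol=1e-10, max_iter=50):
    """Newton's method for the binomial log likelihood in the natural
    parameter: theta_{k+1} = theta_k - l'(theta_k) / l''(theta_k)."""
    theta = theta0
    for _ in range(max_iter):
        p, q = expit(theta), expit(-theta)
        step = (x - n * p) / (-n * p * q)   # l'(theta) / l''(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton iteration did not converge")

# Illustrative data: x = 3 successes in n = 10 trials.
# The MLE is logit(3/10) = log(3/7), about -0.8473.
print(newton_binomial(3, 10))
```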
This problem is about as well behaved as an optimization gets.
- The log likelihood is strictly concave.
- Hence the maximum likelihood estimate is unique if it exists.
- It exists when 0 < x < n.
- It is found by any method that goes uphill on the log likelihood and makes progress if there is progress to be made.
- Newton is not such a method, as the sketch below illustrates.
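To see how Newton can fail to make progress here, consider a step taken far from the solution, where l''(θ) is nearly zero and the step l'(θ)/l''(θ) is enormous. The sketch below uses illustrative values of my choosing (x = 3, n = 10, and a deliberately bad start θ = 5): the first step overshoots the maximum and lowers the log likelihood, and the second flings the iterate off to astronomical values.

```python
import math
from scipy.special import expit  # expit(t) = 1 / (1 + exp(-t))

def log_likelihood(theta, x, n):
    """l(theta) = x*theta - n*log(1 + e^theta), computed stably."""
    return x * theta - n * (max(theta, 0.0) + math.log1p(math.exp(-abs(theta))))

x, n = 3, 10    # illustrative data; the MLE is log(3/7), about -0.85
theta = 5.0     # a deliberately poor starting value
print("start:", theta, log_likelihood(theta, x, n))
for k in range(2):
    p, q = expit(theta), expit(-theta)
    theta -= (x - n * p) / (-n * p * q)   # Newton step
    print("step", k + 1, ":", theta, log_likelihood(theta, x, n))
# Expected behavior: theta jumps from 5 to about -99 (the log likelihood
# falls from about -35 to about -298), then to roughly +4e42.
```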