Fitting Generalized Linear Models (Bernoulli Response)
Natural Parameter is Linear in Data
When the GLM is specified by a formula like that for simple linear regression, the linear predictor for the ith individual is
Here is how this model is fit in R.
Quadratic Regression
In socalled quadratic logistic regression the linear predictor for the ith individual is
Here is one way to fit this model in R to the same data as in the preceding section.
The dashed curve is the fit from the model in the preceding section.
Multiple Regression
In the GLM analog of multiple regression the linear predictor for the ith individual is
Here's how R fits this model.
Many more kinds of models can be fit. Any formula that makes sense
for linear models fit by the R function lm
also makes sense
for generalized linear models fit by the R function glm
(online
help).
Hypothesis Tests
Tests about Regression Coefficients
Done by R
A hypothesis test about whether a regression coefficient is zero
is automatically done for each regression coefficient by
the R function summary
. These tests are not corrected
for multiple testing and hence must be used with extreme caution.
We'll use the example for multiple regression done above as an example. The relevant part of the output for that example was
Coefficients: Estimate Std. Error z value Pr(>z) (Intercept) 11.4778 2.8968 3.962 7.42e05 *** x1 0.9995 0.4193 2.384 0.0171 * x2 0.3852 0.4239 0.909 0.3636 x3 1.0456 0.4161 2.513 0.0120 * x4 0.1184 0.3495 0.339 0.7348 x5 0.1637 0.4322 0.379 0.7048  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The columns labeled z value
and Pr(>t)
of the Coefficients:
table give, respectively,

z value
, the test statistic for a test of the hypothesis that the specified regression coefficient (the one for that line of the table) is actually zero. Asymptotically standard normal. 
Pr(<z)
the Pvalue for the twotailed test about that regression coefficient.
If you want onetailed rather than a twotailed tests, you divide the given Pvalue by 2.
Caution: It violates the do only one test
dogma
to take all of these Pvalues seriously. Each Pvalue
makes sense if that is the only test being done, not otherwise.
Model Comparison Tests
In this section we consider testing a big model
versus a little model
. The null hypothesis is some generalized linear
model, and the alternative hypothesis is some other generalized linear model,
and the little model is a submodel of the big model
(the little model is obtained by setting some of the regression coefficients
of the big model to zero).
As an example, let us consider the
multiple regression example done above.
We use the model from that fit as the big model
.
We use the model containing only the predictors
x1
and x3
(which seemed statistically significant).
as the little model
. This is suggested if one ignores the caution
in that section and decides that only the regression coefficients with
stars in the printout are statistically significant.
To compare the fits of the two models we first do the fits,
which we save in the R objects out.little
and out.big
. Then we compare the models using
the anova
function
(online
help).
Unlike when it works on an object of class "lm"
,
when it works on an object of class "glm"
, the optional argument
test = "Chisq"
must be specified to get Pvalues.
The interpretation of the test is that the fit of the big model is not statistically significantly better than the fit of the little model (P = 0.792).
Omnibus Tests
Also of interest is a test of whether there are any
regression coefficients that are significantly nonzero
except for the intercept
coefficient. Unlike the
case for linear models the Pvalue
for this test is not automatically printed by the summary function.
Since this is just a test with the little model specified by the
formula y ~ 1
, it can be done as in the preceding section.
Confidence Intervals
Confidence Intervals for Regression Coefficients
We again use the example for linear regression done above.
R doesn't do confidence intervals for regression coefficients for you. It does most of the work but leaves a bit left for you. Let's again look at the regression printout for that example.
Estimate Std. Error z value Pr(>z) (Intercept) 6.7025 2.4554 2.730 0.00634 ** x 0.3617 0.1295 2.792 0.00524 **  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The printout gives us two numbers (highlighted in yellow) needed to construct the confidence interval.
 The first number is the estimate, also called the sample regression coefficient.
 The second number is the standard error of this estimate (the estimated standard deviation of the sampling distribution of the estimate).
Thus one makes up a confidence interval in the usual way
where the critical value here comes from the standard normal distribution.
If we want a 95% confidence interval, the critical value is given by
Rweb:> qnorm(0.975) [1] 1.959964
and then the confidence interval for the parameter β_{3} for these data is given by
Rweb:> 0.3617 + c(1,1) * qnorm(0.975) * 0.1295 [1] 0.1078847 0.6155153
Of course, this whole calculation can be done by hand with a calculator and a table of the normal distribution starting with the regression output. No further use of R is necessary after getting the regression output.
Confidence Intervals for the Linear Predictor
The R function predict
(online
help) makes confidence intervals for the linear predictor and for the
means, either for old data or for new data. We illustrate only new data here.
We again use the data for the linear regression example.
Confidence Intervals for Means
We again use the data for the linear regression example.
Confidence intervals for meanvalue parameters (μ) are obtained
by one of two methods. The obvious method uses the optional argument
type = "response"
to obtain the estimate and standard error
of the mean, in which case a confidence interval is obtained in the usual
way. The notsoobvious method transforms the interval for the natural
parameter to the mean value parameter scale using the inverse link function
(which you must know to use this method).
The two methods are asymptotically equivalent, so they both work equally well for large sample sizes. Here, where the sample size is small, the alternative method seems to work better.