Fitting Generalized Linear Models (Bernoulli Response)

Natural Parameter is Linear in Data

When the GLM is specified by a formula like that for simple linear regression, the linear predictor for the i-th individual is

θ_i = β_1 + x_i β_2
The i-th row of the model matrix is thus
( 1   x_i )
The model matrix is n × 2, the first column is all ones, and the second column is the covariate values for all individuals.

Here is how this model is fit in R.
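For instance, a minimal sketch with simulated data might look like the following (the names x, y, and out and the numbers are made up for illustration, not the data used in this example).

# simulated Bernoulli data whose success probability increases with x
set.seed(42)
n <- 50
x <- seq(1, 30, length = n)
y <- rbinom(n, 1, prob = 1 / (1 + exp(-(-3 + 0.2 * x))))

# fit the GLM with Bernoulli response and (default) logit link
out <- glm(y ~ x, family = binomial)
summary(out)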

Quadratic Regression

In so-called quadratic logistic regression the linear predictor for the i-th individual is

θ_i = β_1 + x_i β_2 + x_i^2 β_3
The i-th row of the model matrix is thus
( 1   x_i   x_i^2 )
The model matrix is n × 3, the first column is all ones, the second column is the covariate values for all individuals, and the third column is the squares of the covariate values for all individuals.

Here is one way to fit this model in R to the same data as in the preceding section.
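Continuing the simulated-data sketch above (x, y, and out as before; out2 is another assumed name), one way is

# add the quadratic term; I() makes ^ arithmetic rather than formula syntax
out2 <- glm(y ~ x + I(x^2), family = binomial)
summary(out2)

# plot the data and both fitted mean curves,
# solid for the quadratic fit, dashed for the fit from the preceding section
plot(x, y)
curve(predict(out2, data.frame(x = x), type = "response"), add = TRUE)
curve(predict(out, data.frame(x = x), type = "response"), add = TRUE, lty = 2)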

The dashed curve is the fit from the model in the preceding section.

Multiple Regression

In the GLM analog of multiple regression the linear predictor for the i-th individual is

θ_i = β_1 + x_{i1} β_2 + x_{i2} β_3 + x_{i3} β_4 + x_{i4} β_5 + x_{i5} β_6
if there are five covariates (hence six regression coefficients, counting the intercept), where x_{ij} denotes the value of the j-th covariate for the i-th individual.

Here's how R fits this model.
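A sketch with simulated covariates follows (the names x1 through x5 and out.big are assumptions; the numbers in the printout quoted in the hypothesis tests section below come from the actual data for this example, so they will differ).

# simulated covariates and Bernoulli response (made-up data for illustration)
set.seed(42)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n); x5 <- rnorm(n)
y <- rbinom(n, 1, prob = 1 / (1 + exp(-(-1 + x1 + x3))))

# fit the GLM with all five covariates
out.big <- glm(y ~ x1 + x2 + x3 + x4 + x5, family = binomial)
summary(out.big)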

Many more kinds of models can be fit. Any formula that makes sense for linear models fit by the R function lm also makes sense for generalized linear models fit by the R function glm (on-line help).

Hypothesis Tests

Tests about Regression Coefficients

Done by R

A hypothesis test about whether a regression coefficient is zero is automatically done for each regression coefficient by the R function summary. These tests are not corrected for multiple testing and hence must be used with extreme caution.

We'll use the multiple regression example done above. The relevant part of the output for that example was

Coefficients: 
            Estimate Std. Error z value Pr(>|z|)     
(Intercept) -11.4778     2.8968  -3.962 7.42e-05 *** 
x1            0.9995     0.4193   2.384   0.0171 *   
x2            0.3852     0.4239   0.909   0.3636     
x3            1.0456     0.4161   2.513   0.0120 *   
x4            0.1184     0.3495   0.339   0.7348     
x5           -0.1637     0.4322  -0.379   0.7048     
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

The columns labeled z value and Pr(>|z|) of the Coefficients: table give, respectively, the test statistic and the two-tailed P-value for the test of the null hypothesis that the corresponding regression coefficient is zero.

If you want a one-tailed rather than a two-tailed test, you divide the given P-value by 2.

Caution: It violates the "do only one test" dogma to take all of these P-values seriously. Each P-value makes sense if it is the only test being done, not otherwise.

Model Comparison Tests

In this section we consider testing a big model versus a little model. The null hypothesis is one generalized linear model (the little model), the alternative hypothesis is another generalized linear model (the big model), and the little model is a submodel of the big model: it is obtained by setting some of the regression coefficients of the big model to zero.

As an example, let us consider the multiple regression example done above. We use the model from that fit as the big model and the model containing only the predictors x1 and x3 (which seemed statistically significant) as the little model. This choice is suggested if one ignores the caution in that section and decides that only the regression coefficients with stars in the printout are statistically significant.

To compare the fits of the two models we first do the fits, which we save in the R objects out.little and out.big. Then we compare the models using the anova function (on-line help). Unlike when it works on an object of class "lm", when it works on an object of class "glm", the optional argument test = "Chisq" must be specified to get P-values.
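Sketched with the simulated multiple regression data from above (out.little and out.big as named in the text; the P-value quoted below refers to the actual data for this example, not to this sketch):

# little model: only x1 and x3;  big model: all five covariates
out.little <- glm(y ~ x1 + x3, family = binomial)
out.big <- glm(y ~ x1 + x2 + x3 + x4 + x5, family = binomial)

# likelihood ratio (analysis of deviance) test, little versus big
anova(out.little, out.big, test = "Chisq")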

The interpretation of the test is that the fit of the big model is not statistically significantly better than the fit of the little model (P = 0.792).

Omnibus Tests

Also of interest is a test of whether there are any regression coefficients that are significantly nonzero except for the intercept coefficient. Unlike the case for linear models the P-value for this test is not automatically printed by the summary function.

Since this is just a test with the little model specified by the formula y ~ 1, it can be done as in the preceding section.
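Continuing the same sketch (out.0 is an assumed name for the intercept-only fit):

# little model with intercept only, big model with all five covariates
out.0 <- glm(y ~ 1, family = binomial)
anova(out.0, out.big, test = "Chisq")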

Confidence Intervals

Confidence Intervals for Regression Coefficients

We again use the example for linear regression done above.

R doesn't do confidence intervals for regression coefficients for you. It does most of the work but leaves a bit for you to do. Let's again look at the regression printout for that example.

            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -6.7025     2.4554  -2.730  0.00634 ** 
x             0.3617     0.1295   2.792  0.00524 ** 
--- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1  

The printout gives us the two numbers needed to construct the confidence interval: the estimate (0.3617) and its standard error (0.1295) for the coefficient of x.

Thus one makes up a confidence interval in the usual way

point estimate ± critical value × standard error

where the critical value here comes from the standard normal distribution.

If we want a 95% confidence interval, the critical value is given by

Rweb:> qnorm(0.975) 
[1] 1.959964

and then the confidence interval for the parameter β_2 (the coefficient of x) for these data is given by

Rweb:> 0.3617 + c(-1,1) * qnorm(0.975) * 0.1295
[1] 0.1078847 0.6155153

Of course, this whole calculation can be done by hand with a calculator and a table of the normal distribution starting with the regression output. No further use of R is necessary after getting the regression output.

Confidence Intervals for the Linear Predictor

The R function predict (on-line help) provides the estimates and standard errors needed to make confidence intervals for the linear predictor and for the means, either for old data or for new data. We illustrate only new data here.

We again use the data for the linear regression example.
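A sketch of the calculation (using the simulated data and fit out from the first sketch above; the new covariate values in xnew are made up):

# new covariate values at which to make confidence intervals
xnew <- data.frame(x = c(5, 15, 25))

# estimates and standard errors on the linear predictor (link) scale,
# which is what predict gives by default for a GLM
pred <- predict(out, newdata = xnew, se.fit = TRUE)

# 95% confidence intervals for the linear predictor
crit <- qnorm(0.975)
cbind(lower = pred$fit - crit * pred$se.fit,
      upper = pred$fit + crit * pred$se.fit)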

Confidence Intervals for Means

We again use the data for the linear regression example.

Confidence intervals for mean-value parameters (μ) are obtained by one of two methods. The obvious method uses the optional argument type = "response" to obtain the estimate and standard error of the mean, in which case a confidence interval is obtained in the usual way. The not-so-obvious method transforms the interval for the natural parameter to the mean value parameter scale using the inverse link function (which you must know to use this method).

The two methods are asymptotically equivalent, so they both work equally well for large sample sizes. Here, where the sample size is small, the not-so-obvious method seems to work better.
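A sketch of both methods (same simulated data, fit out, and new data xnew as in the sketches above; for the logit link the inverse link function is plogis):

crit <- qnorm(0.975)

# obvious method: estimate and standard error on the mean (response) scale
pred.mu <- predict(out, newdata = xnew, type = "response", se.fit = TRUE)
cbind(lower = pred.mu$fit - crit * pred.mu$se.fit,
      upper = pred.mu$fit + crit * pred.mu$se.fit)

# not-so-obvious method: interval for the linear predictor,
# then map its endpoints through the inverse link function
pred.eta <- predict(out, newdata = xnew, se.fit = TRUE)
cbind(lower = plogis(pred.eta$fit - crit * pred.eta$se.fit),
      upper = plogis(pred.eta$fit + crit * pred.eta$se.fit))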