glmfit()

Usage:
glmfit([Model] [,dist:distName,link:linkName, n:denom, incr:T,\
  print:F or silent:T, maxiter:m, epsilon:eps, coefs:F, offsets:OffVec,\
  scale:sigma]), distName and linkName CHARACTER scalars, denom > 0
  REAL scalar or vector, integer m > 0, REAL eps > 0, REAL vector OffVec



Keywords: glm, anova, regression, categorical data

glmfit(Model,dist:DistName,link:LinkName,...) does a generalized linear
model analysis with assumed response distribution DistName and link
function LinkName, somewhat in the manner of program GLIM.  The response
variable y must be a vector (isvector(y) is True).

See topic 'models' for information on and examples of quoted string or
CHARACTER scalar Model.

Current legal values for DistName are "binomial", "poisson", and
"normal" (or "gaussian").  If DistName is "binomial" or "poisson", you
must have y[i] >= 0.

Current legal values for LinkName are "logit", "probit", "log", and
"identity".

If dist:DistName is omitted, the default DistName is "normal".

If link:LinkName is omitted the default LinkName depends on DistName --
"logit" for "binomial", "log" for "poisson", and "identity" for
"normal".

Because of these defaults, glmfit(Model), with no distribution or link
specified, is equivalent to anova(Model, unbalanced:T).
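
As a minimal sketch of these defaults, assuming a hypothetical count
vector y and a hypothetical predictor x (neither is part of this topic),
each pair of calls below should request the same fit:

  y <- vector(2,3,6,7,8,9,10,12,15,20)  # hypothetical counts
  x <- run(10)                          # hypothetical predictor 1,...,10

  # the default link for "poisson" is "log"
  glmfit("y=x", dist:"poisson")
  glmfit("y=x", dist:"poisson", link:"log")

  # the default dist is "normal" and its default link is "identity"
  glmfit("y=x")
  anova("y=x", unbalanced:T)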

If DistName is "binomial" you must specify the number of trials using
keyword 'n' as for logistic() or probit().  The value Denom for 'n' must
either be a REAL scalar >= max(y) or a REAL vector of the same length as
y with Denom[i] >= y[i].
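
For illustration only, with hypothetical data, the two permitted forms
of 'n' might be used as follows:

  y <- vector(1,2,4,6,9)       # hypothetical numbers of successes
  dose <- run(5)               # hypothetical predictor

  # REAL scalar: every y[i] is out of 10 trials, and 10 >= max(y)
  glmfit("y=dose", dist:"binomial", n:10)

  # REAL vector: numbers of trials differ, with Denom[i] >= y[i]
  Denom <- vector(10,10,12,12,15)
  glmfit("y=dose", dist:"binomial", n:Denom)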

Except when DistName is "normal" and LinkName is "identity", an
iterative algorithm is used to model link(E[y]) or link(E[y/Denom]) as a
linear function of X-variables associated with the right hand side of
Model.  Normally a two line Analysis of Deviance table is printed.  Line
1 is the difference 2*L(1) - 2*L(0), where L(0) is the log likelihood
for a model with all coefficients 0 and L(1) is the maximized log
likelihood for the model fit.  Line 2 is 2*L(2) - 2*L(1) where L(2) is
the maximized log likelihood under a model fitting one parameter for
every y[i].  Under certain conditions, the latter can be used to test
the goodness of fit of the model using a chi-squared test.  When
DistName is "normal" and LinkName is "identity", an Analysis of Variance
table is printed including all terms.

glmfit() sets the side effect variables RESIDUALS, WTDRESIDUALS, SS, DF,
HII, DEPVNAME, TERMNAMES, and STRMODEL.  See topic 'glm'.  When DistName
is "normal" and LinkName is "identity", SS contains the ANOVA sums of
squares; otherwise SS contains deviances.  After an iterative fit
without keyword phrase 'inc:T' (see below), TERMNAMES has value
vector("","", ...,"Overall model","ERROR1"), DF has value vector(0,0,
...,ModelDF,ErrorDF) and SS has value vector(0,0,...,ModelDeviance,
ErrorDeviance).
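
The goodness of fit test mentioned above can be sketched from these
side effect variables.  This is an illustration, not a prescribed
procedure: y and x are hypothetical, and it assumes cumchi(x,df) returns
the chi-squared cumulative probability, so check topic cumchi() before
relying on it.

  glmfit("y=x", dist:"poisson")  # iterative fit without inc:T
  m <- length(SS)
  dev <- SS[m]                   # ErrorDeviance, 2*L(2) - 2*L(1)
  edf <- DF[m]                   # ErrorDF
  1 - cumchi(dev, edf)           # assumed form of the upper tail P-value

A small value would suggest lack of fit, subject to the "certain
conditions" under which the chi-squared approximation is reasonable.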

glmfit(Model,dist:DistName,link:LinkName,inc:T,...) computes the full
fitted model and all partial models -- only a constant term, the
constant and the first term, and so on.  It prints an Analysis of
Deviance table, with one line for each term, representing a difference
2*L(i) - 2*L(i-1) where L(i) is the maximized log likelihood for a model
including terms 1 through i, plus the deviance of the complete model
labeled as "ERROR1".  Each line except the last can be used in a
chi-squared test to test the significance of the term on the assumption
that the true model includes no later terms.  The value of 'inc' is
ignored when DistName is "normal" and LinkName is "identity".
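
A hedged sketch of such a sequential analysis, using hypothetical
factors a and b and a hypothetical binomial response y out of 10 trials
each:

  a <- factor(vector(1,1,2,2,3,3))   # hypothetical factor
  b <- factor(vector(1,2,1,2,1,2))   # hypothetical factor
  y <- vector(3,5,4,7,6,9)           # hypothetical successes

  # the line for a is tested assuming b is not in the true model;
  # the line for b is adjusted for a
  glmfit("y=a+b", dist:"binomial", n:10, inc:T)

Because each line assumes no later terms, reordering the model as
"y=b+a" can change the deviances attributed to a and b.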

The use of glmfit() provides an alternative method to specify a logistic
or probit analysis of binomial responses, or a log linear analysis of
Poisson responses.

  Function                DistName      LinkName
  logistic()             "binomial"    "logit"
  probit()               "binomial"    "probit"
  poisson()              "poisson"     "log"
  anova()                "normal"      "identity"
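
For example, assuming a hypothetical response y, predictor x and vector
of trial counts Denom, each pair of calls below should specify the same
analysis (for the Poisson pair, y would be counts and no 'n' is used):

  logistic("y=x", n:Denom)
  glmfit("y=x", dist:"binomial", link:"logit", n:Denom)

  probit("y=x", n:Denom)
  glmfit("y=x", dist:"binomial", link:"probit", n:Denom)

  poisson("y=x")
  glmfit("y=x", dist:"poisson", link:"log")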

In the future additional distributions such as "gamma" will be
implemented, as well as additional links such as "sqrt", "recip", or
"power".  If you specify an unimplemented combination of LinkName and
DistName, an informative error message is printed.

When fitting a model with a binomial dependent variable, a warning
message similar to the following
   WARNING: problimit = 1e-08 was hit by glmfit() at least once
usually indicates either the presence of an extreme outlier or a best
fitting model in which many of the probabilities are almost exactly 0 or
1.  The latter case may not represent any problem, since the fitted
probabilities at these points will be 1e-8 or 1 - 1e-8.  You can try
reducing the threshold using keyword 'problimit' (see below), but you
will probably just get the message again.

                     Other Keyword Phrases
Keyword phrase  Default  Meaning
  maxiter:m       50     Positive integer m is the maximum number of
                         iterations that will be allowed in fitting

  epsilon:eps    1e-6    Small positive REAL specifying relative error
                         in objective function (2*log likelihood)
                         required to end iteration

  problimit:small 1e-8   With dist:"binomial", iteration is restricted
                         so that no fitted probabilities are < small
                         or > 1 - small.  Value of small must be between
                         1e-15 and 0.0001.

  offsets:OffVec  none   Causes the model fit on the link scale to be
                         1*OffVec + Model, where OffVec is a REAL vector
                         the same length as response y.  OffVec must be
                         in the same units as the link function, say,
                         logits, logs, or probits.  See topic 'glm_keys'
                         for more information and poisson(), logistic()
                         and probit() for examples.

  scale:sigma      1     sigma must be a positive REAL scalar or ?
                         (MISSING).  Its value will replace a default
                         multiplier used by secoefs() and contrast() to
                         compute standard errors.  If the value is
                         MISSING, sigma will be computed as sqrt(SS[m]/
                         DF[m]), where m = length(SS).  The default is 1
                         unless dist is "normal", when it is sqrt(SS[m]/
                         DF[m]).  In secoefs(), scale multiplies the
                         square roots of the diagonal values of the
                         inverse of X'WX, where X is the matrix of
                         X-variables, and W is a diagonal matrix of
                         weights computed using the converged fit.
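
As a rough sketch of how these keyword phrases combine, assume
hypothetical counts y, a predictor x, and positive exposure times expos
of the same length as y.  The call to secoefs() with no arguments is
also an assumption; see topic secoefs() for its actual usage.

  # offsets in log units, to match the "log" link; allow more iterations
  glmfit("y=x", dist:"poisson", offsets:log(expos), maxiter:100)

  # scale:? asks for sigma = sqrt(SS[m]/DF[m]) in place of the default 1
  glmfit("y=x", dist:"poisson", scale:?)
  secoefs()     # standard errors now use the estimated scale

  # tighter limit on fitted probabilities; must be in [1e-15, 0.0001]
  glmfit("y=x", dist:"binomial", n:Denom, problimit:1e-10)

In the last call y and Denom would be hypothetical binomial data, not
the counts used in the Poisson fits.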

See topic 'glm_keys' for details on keyword phrases print:F, silent:T,
coefs:F.

See also topics logistic(), poisson(), probit(), 'glm'.


Gary Oehlert 2003-01-15