Usage:
glmfit([Model] [,dist:distName,link:linkName, n:denom, incr:T,\
print:F or silent:T, maxiter:m, epsilon:eps, problimit:small, coefs:F,\
offsets:OffVec, scale:sigma]), distName and linkName CHARACTER scalars,
denom > 0 REAL scalar or vector, integer m > 0, REAL eps > 0, REAL
small, REAL vector OffVec
Keywords:
glm, anova, regression, categorical data
glmfit(Model,dist:DistName,link:LinkName,...) does a generalized linear
model analysis with assumed response distribution DistName and link
function LinkName, somewhat in the manner of program GLIM. The response
variable y must be a vector (isvector(y) is True).
See topic 'models' for information on and examples of quoted string or
CHARACTER scalar Model.
Current legal values for DistName are "binomial", "poisson", and
"normal" (or "gaussian"). If DistName is "binomial" or "poisson", you
must have y[i] >= 0.
Current legal values for LinkName are "logit", "probit", "log", and
"identity".
If dist:DistName is omitted, the default DistName is "normal".
If link:LinkName is omitted the default LinkName depends on DistName --
"logit" for "binomial", "log" for "poisson", and "identity" for
"normal".
Because of these defaults, glmfit(Model), with no distribution or link
specified, is equivalent to anova(Model, unbalanced:T).
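The default rules above amount to a simple lookup. The following Python
sketch only illustrates them; the names DEFAULT_LINK and
resolve_dist_link are hypothetical and are not part of MacAnova. It
checks only that the names are known and does not model which
combinations MacAnova rejects as unimplemented.

```python
# Hypothetical illustration of glmfit's default dist/link rules (not MacAnova code).
DEFAULT_LINK = {
    "binomial": "logit",
    "poisson": "log",
    "normal": "identity",
    "gaussian": "identity",  # "gaussian" is accepted as another name for "normal"
}
LEGAL_LINKS = ("logit", "probit", "log", "identity")


def resolve_dist_link(dist=None, link=None):
    """Return (dist, link) after applying the defaults described above."""
    if dist is None:
        dist = "normal"          # default distribution
    if dist not in DEFAULT_LINK:
        raise ValueError(f"unknown distribution {dist!r}")
    if link is None:
        link = DEFAULT_LINK[dist]  # default link depends on the distribution
    if link not in LEGAL_LINKS:
        raise ValueError(f"unknown link {link!r}")
    return dist, link


print(resolve_dist_link())           # ('normal', 'identity')
print(resolve_dist_link("poisson"))  # ('poisson', 'log')
```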
If DistName is "binomial" you must specify the number of trials using
keyword 'n', as for logistic() or probit(). The value Denom for 'n' must
be either a REAL scalar >= max(y) or a REAL vector of the same length as
y with Denom[i] >= y[i].
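The rule for 'n' can be expressed as a small validation step. This Python
sketch is a hypothetical helper (check_denom is not a MacAnova function);
it also applies the "denom > 0" condition from the usage line and the
requirement that binomial responses be nonnegative.

```python
def check_denom(y, denom):
    """Hypothetical check of a binomial 'n' value against the response y.

    denom may be a scalar (must be > 0 and >= max(y)) or a sequence of the
    same length as y with denom[i] >= y[i]. Returns per-case trial counts.
    """
    if any(v < 0 for v in y):
        raise ValueError("binomial responses must be >= 0")
    if isinstance(denom, (int, float)):
        if denom <= 0 or denom < max(y):
            raise ValueError("scalar n must be > 0 and >= max(y)")
        return [float(denom)] * len(y)
    denom = [float(d) for d in denom]
    if len(denom) != len(y):
        raise ValueError("vector n must have the same length as y")
    if any(d <= 0 or d < v for d, v in zip(denom, y)):
        raise ValueError("need n[i] > 0 and n[i] >= y[i] for every i")
    return denom


print(check_denom([1, 2, 3], 5))          # [5.0, 5.0, 5.0]
print(check_denom([1, 2, 3], [2, 2, 4]))  # [2.0, 2.0, 4.0]
```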
Except when DistName is "normal" and LinkName is "identity", an
iterative algorithm is used to model link(E[y]) or link(E[y/Denom]) as a
linear function of X-variables associated with the right hand side of
Model. Normally a two-line Analysis of Deviance table is printed. Line
1 is the difference 2*L(1) - 2*L(0), where L(0) is the log likelihood
for a model with all coefficients 0 and L(1) is the maximized log
likelihood for the model fit. Line 2 is 2*L(2) - 2*L(1), where L(2) is
the maximized log likelihood under a model fitting one parameter for
every y[i]. Under certain conditions, line 2 can be used in a
chi-squared test of the goodness of fit of the model. When DistName is
"normal" and LinkName is "identity", an Analysis of Variance table
including all terms is printed instead.
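The two lines of the table can be illustrated for a Poisson response with
a log link and one X-variable. This Python sketch assumes iteratively
reweighted least squares (IRLS), a standard fitting method for such
models; the help text does not say which iterative algorithm glmfit()
uses. The functions poisson_loglik, fit_poisson_log, and deviance_lines
are hypothetical. With all coefficients 0 the linear predictor is 0, so
mu[i] = 1; the saturated model has mu[i] = y[i].

```python
import math


def poisson_loglik(y, mu):
    """Poisson log likelihood; the term y*log(mu) is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(mi)
        total -= mi + math.lgamma(yi + 1)
    return total


def fit_poisson_log(y, x, maxiter=50, epsilon=1e-6):
    """Hypothetical IRLS fit of log(E[y]) = b0 + b1*x (assumes sum(y) > 0)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the constant-only fit
    mu = [math.exp(b0)] * len(y)
    old = poisson_loglik(y, mu)
    for _ in range(maxiter):
        eta = [b0 + b1 * xi for xi in x]
        # working response for the log link
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]
        # weighted normal equations X'WX b = X'Wz with W = diag(mu), X = [1, x]
        s00 = sum(mu)
        s01 = sum(mi * xi for mi, xi in zip(mu, x))
        s11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        t0 = sum(mi * zi for mi, zi in zip(mu, z))
        t1 = sum(mi * xi * zi for mi, xi, zi in zip(mu, x, z))
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det
        b1 = (s00 * t1 - s01 * t0) / det
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        new = poisson_loglik(y, mu)
        # stop when the relative change in the log likelihood is below epsilon
        if abs(new - old) <= epsilon * abs(new):
            break
        old = new
    return b0, b1, mu


def deviance_lines(y, x):
    """Return (2*L(1) - 2*L(0), 2*L(2) - 2*L(1)) for the Poisson/log fit."""
    _, _, mu = fit_poisson_log(y, x)
    L0 = poisson_loglik(y, [1.0] * len(y))  # all coefficients 0: mu = 1
    L1 = poisson_loglik(y, mu)              # maximized for the fitted model
    L2 = poisson_loglik(y, y)               # saturated: mu[i] = y[i]
    return 2 * L1 - 2 * L0, 2 * L2 - 2 * L1


y = [2, 4, 3, 9, 7, 8]
x = [0, 0, 0, 1, 1, 1]
print(deviance_lines(y, x))
```

With a 0/1 X-variable, the fitted means should equal the two group means
(3 and 8 here), which makes the sketch easy to check by hand.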
glmfit() sets the side effect variables RESIDUALS, WTDRESIDUALS, SS, DF,
HII, DEPVNAME, TERMNAMES, and STRMODEL. See topic 'glm'. When DistName
is "normal" and LinkName is "identity", SS contains the ANOVA sums of
squares; otherwise SS contains deviances. After an iterative fit
without keyword phrase 'inc:T' (see below), TERMNAMES has value
vector("","", ...,"Overall model","ERROR1"), DF has value vector(0,0,
...,ModelDF,ErrorDF) and SS has value vector(0,0,...,ModelDeviance,
ErrorDeviance).
glmfit(Model,dist:DistName,link:LinkName,inc:T,...) computes the full
fitted model and all partial models -- only a constant term, the
constant and the first term, and so on. It prints an Analysis of
Deviance table with one line for each term, representing the difference
2*L(i) - 2*L(i-1), where L(i) is the maximized log likelihood for a
model including terms 1 through i, plus a final line labeled "ERROR1"
containing the deviance of the complete model. Each line except the
last can be used in a chi-squared test of the significance of its term,
on the assumption that the true model includes no later terms. The
value of 'inc' is ignored when DistName is "normal" and LinkName is
"identity".
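The incremental table is a sequence of differences between nested fits.
This Python sketch is hypothetical (incremental_table is not a MacAnova
function) and treats the constant as term 1, so its first line compares
the constant-only model with the all-zero model L(0); MacAnova's own
labeling may differ. It uses Poisson counts in two groups because, with a
log link, the maximum likelihood fits are known in closed form: the
overall mean for the constant-only model and the group means once the
group term is added.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log likelihood; the term y*log(mu) is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(mi)
        total -= mi + math.lgamma(yi + 1)
    return total


def incremental_table(term_names, logliks, loglik_saturated):
    """Build (label, 2*L(i) - 2*L(i-1)) rows plus a final "ERROR1" row.

    logliks[0] is L(0) for the model with all coefficients 0, and
    logliks[i] is the maximized log likelihood with terms 1 through i.
    """
    if len(logliks) != len(term_names) + 1:
        raise ValueError("need one log likelihood per term plus L(0)")
    rows = [(name, 2 * logliks[i + 1] - 2 * logliks[i])
            for i, name in enumerate(term_names)]
    rows.append(("ERROR1", 2 * loglik_saturated - 2 * logliks[-1]))
    return rows


y = [2, 4, 3, 9, 7, 8]
group = [0, 0, 0, 1, 1, 1]
overall = sum(y) / len(y)
means = {g: sum(v for v, h in zip(y, group) if h == g) / group.count(g)
         for g in set(group)}
L0 = poisson_loglik(y, [1.0] * len(y))              # all coefficients 0
L1 = poisson_loglik(y, [overall] * len(y))          # constant only
L2 = poisson_loglik(y, [means[g] for g in group])   # constant + group
Lsat = poisson_loglik(y, y)                         # one parameter per y[i]
for label, dev in incremental_table(["constant", "group"], [L0, L1, L2], Lsat):
    print(f"{label:10s} {dev:10.4f}")
```

Because the lines telescope, they add up to 2*L(saturated) - 2*L(0).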
glmfit() provides an alternative way to specify a logistic or probit
analysis of binomial responses, or a log linear analysis of Poisson
responses. Each function below corresponds to the DistName and LinkName
shown:
  Function     DistName     LinkName
  logistic()   "binomial"   "logit"
  probit()     "binomial"   "probit"
  poisson()    "poisson"    "log"
  anova()      "normal"     "identity"
In the future, additional distributions such as "gamma" will be
implemented, as well as additional links such as "sqrt", "recip", or
"power". If you specify an unimplemented combination of LinkName and
DistName, an informative error message is printed.
When fitting a model with a binomial dependent variable, a warning
message similar to the following
WARNING: problimit = 1e-08 was hit by glmfit() at least once
usually indicates either the presence of an extreme outlier or a
best-fitting model in which many of the probabilities are almost
exactly 0 or 1. The latter case need not indicate a problem, since the
fitted probabilities at these points will be 1e-8 or 1 - 1e-8. You can
try reducing the threshold using keyword 'problimit' (see below), but
you will probably just get the message again.
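One way to picture the restriction is to clamp each fitted probability to
the interval [small, 1 - small] and record whether the limit was used.
This Python sketch assumes simple clamping; the help text says only that
iteration is restricted, not how. The names restrict_prob and inv_logit
are hypothetical.

```python
import math


def restrict_prob(p, problimit=1e-8):
    """Hypothetical clamp of a fitted probability to [problimit, 1 - problimit].

    Returns (restricted_p, hit), where hit is True when the limit was used.
    """
    if not 1e-15 <= problimit <= 1e-4:
        raise ValueError("problimit must be between 1e-15 and 0.0001")
    if p < problimit:
        return problimit, True
    if p > 1 - problimit:
        return 1 - problimit, True
    return p, False


def inv_logit(eta):
    """Inverse of the logit link (fine for moderate eta)."""
    return 1 / (1 + math.exp(-eta))


# Very large or small linear predictors give probabilities near 0 or 1.
fitted = [restrict_prob(inv_logit(eta)) for eta in (-40.0, 0.0, 40.0)]
if any(hit for _, hit in fitted):
    print("probability limit was hit at least once")
```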
Other Keyword Phrases
Keyword phrase   Default  Meaning
maxiter:m        50       Positive integer m is the maximum number of
                          iterations allowed in fitting
epsilon:eps      1e-6     Small positive REAL specifying the relative
                          error in the objective function (2*log
                          likelihood) required to end iteration
problimit:small  1e-8     With dist:"binomial", iteration is restricted
                          so that no fitted probabilities are < small
                          or > 1 - small. The value of small must be
                          between 1e-15 and 0.0001.
offsets:OffVec   none     Causes the model fitted to the link to be
                          1*OffVec + Model, where OffVec is a REAL
                          vector of the same length as the response y.
                          OffVec must be in the same units as the link
                          function, say, logits, logs, or probits. See
                          topic 'glm_keys' for more information, and
                          poisson(), logistic(), and probit() for
                          examples.
scale:sigma      1        The value sigma must be a positive REAL scalar
                          or ? (MISSING). Its value replaces the default
                          multiplier used by secoefs() and contrast() to
                          compute standard errors. If the value is
                          MISSING, sigma is computed as
                          sqrt(SS[m]/DF[m]), where m = length(SS). The
                          default is 1 unless dist is "normal", when it
                          is sqrt(SS[m]/DF[m]). In secoefs(), scale
                          multiplies the square roots of the diagonal
                          values of the inverse of X'WX, where X is the
                          matrix of X-variables and W is a diagonal
                          matrix of weights computed from the converged
                          fit.
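The standard error rule for 'scale' can be written out directly: each
standard error is sigma times the square root of a diagonal element of
the inverse of X'WX. This Python sketch is hypothetical (invert and
std_errors are not MacAnova functions). Passing scale=None stands in for
scale:? and uses sqrt(SS[m]/DF[m]) with m = length(SS). The example
assumes a Poisson fit with a log link, for which the weights are taken to
be the fitted means mu[i].

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def std_errors(X, w, scale=1.0, ss=None, df=None):
    """Hypothetical: scale * sqrt(diag((X'WX)^-1)); scale=None acts like scale:?."""
    if scale is None:
        scale = math.sqrt(ss[-1] / df[-1])  # sigma = sqrt(SS[m]/DF[m])
    p = len(X[0])
    xtwx = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
            for j in range(p)]
    inv = invert(xtwx)
    return [scale * math.sqrt(inv[j][j]) for j in range(p)]


X = [[1, 0]] * 3 + [[1, 1]] * 3   # constant and a 0/1 X-variable
mu = [3, 3, 3, 8, 8, 8]           # converged fitted means (Poisson weights)
print(std_errors(X, mu))          # default scale 1 for a non-normal fit
```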
See topic 'glm_keys' for details on keyword phrases print:F, silent:T,
and coefs:F.
See also topics logistic(), poisson(), probit(), 'glm'.
Gary Oehlert
2003-01-15