
# models

Usage:
```
Regression: regress("y=x1+x2+...+xk")
One-way ANOVA: anova("y=A"), factor A
Randomized block ANOVA: anova("y=Repl+A"), factors Repl and A
Nested ANOVA: anova("y=A/B") or anova("y=A+A.B")
Two-way factorial: anova("y=A*B") or anova("y=A+B+A.B"), factors A and B
Completely randomized Split plot ANOVA: anova("y=A+E(Repl.A)+B+A.B"), factors A, B, and Repl
Analysis of covariance: anova("y=x+A"), factor A, variate x
Transform variables on the fly: regress("{log10(y)}={sqrt(x)}")
Polynomial regression: regress("y=P3(x)")
Periodic regression: regress("y=C2(2*PI*hour/24)")
```

Keywords: glm, anova, regression
```
All of the GLM (generalized linear or linear model) commands such as
regress(), anova(), or poisson() require you to specify a model in a
quoted string or CHARACTER variable.

A model can be specified as
"Response = Term"   or   "Response = Term1 + Term2 + ..."
where 'Response' is the name of the dependent or response variable
and each term is of the form Name or Name1.Name2. ... .Namek, where
Name, Name1, Name2 ... are the names of variables.  A period or dot (.)
between the variable names is interpreted as a product operator
indicating that all combinations of the variable values are included in
the model.

Model Variables
Variables in model terms, including those computed on the fly (see
below), may be either factors (vectors of positive integers created
using factor()) or variates.  No more than one variate may appear in a
single term.  Up to 95 variables may appear in a model, including no
more than 31 factors.

Factors and variates must be vectors or matrices with one column.  They
must all have the same number of rows as the response variable.

Any factors must have been created using function factor() or been
selected from such a variable using subscripts.  For balanced designs
with factor levels in a reasonable order a factor may often be computed
by factor(rep(run(r),s)), factor(rep(run(s), rep(r,s))), or something
similar.  See factor(), rep().

The constant term may be specified as 1, but is always included by
default, that is "y = Model" is equivalent to "y = 1 + Model".  You can
omit a constant term by "y = Model - 1" or move it to the end by
"y = Model - 1 + 1".

Computing Variables "on the fly"
You can transform or otherwise compute model variables "on the fly."  In
place of the name of a variable, including the response variable, you
can use {Expr}, where Expr is a MacAnova expression such as x^2 or
log10(y).  If the same expression, say {sqrt(x)}, appears more than once
in a model, it is evaluated only once and only one model variable is
introduced.  In comparing expressions, leading and trailing spaces are
ignored, so that { sqrt(x) } is considered the same as {sqrt(x)};
however, other differences in the presence or placement of spaces will
cause expressions to be considered different variables.  For example,
{sqrt( x)} is not recognized to be the same as {sqrt(x)}.  The only
limitation on Expr is that it may not directly or indirectly execute
another GLM command.

Since subscripted factors remain factors (see 'subscripts'), when groups
is a factor, anova("{y[-3]} = {groups[-3]}") computes a one factor
analysis of variance omitting case 3.

Examples
a and b are factors and y, x1, x2, and x3 are REAL vectors
Model                       Description
"y = a + b + a.b"           Two factor model with both main effects
                            and interaction
"y = a + a.b"               Two factor model with b nested in a
"y = x1 + x2 + x3"          Three variable multiple regression
"{sqrt(y)} = x1 + {x1^2}"   2nd order polynomial regression of square
                            root of y on x1

Shortcuts for Polynomial and Periodic Regression
You can use special shortcuts of the form Pn(expr) and Cn(expr) to
specify a polynomial term or a periodic term, respectively, where n is
an integer between 1 and 95 and expr is a MacAnova expression.  For
example, P4(x-10) expands to ({x-10}+{(x-10)^2}+{(x-10)^3}+{(x-10)^4})
and C2(2*PI*x/24) expands to ({cos(2*PI*x/24)}+{sin(2*PI*x/24)}+
{cos(2*(2*PI*x/24))}+{sin(2*(2*PI*x/24))}).

Pn(expr) and Cn(expr) can be used wherever a variable name can be used
on the right side of '=', but not inside a {...} expression.  Thus the
last example in the preceding list could have been written "{sqrt(y)} =
P2(x1)".  They can be "dotted" with a factor.  For example, P2(x).a
expands to {x}.a + {(x)^2}.a.

If you are doing a regression on a subset of cases using subscripts, the
subscripts must be applied to x, not Cn(x) or Pn(x).  For example,
Cmd> regress("{y[-run(3)]} = P3(x[-run(3)])")
fits a cubic polynomial omitting the first 3 rows of x and y.

See below for other shortcuts you can use to specify models.

Combining Variables
Parts of terms can be replaced by 'submodels', enclosed in
parentheses, for example,
Cmd> anova("y = a + b + c + d + (a + b).(c + d)")
is equivalent to
Cmd> anova("y = a + b + c + d + a.c + b.c + a.d + b.d")

The product of a factor or variate with itself (a.a) is equivalent to
the variate or factor itself.  For example,
Cmd> anova("y = (a + b).(a + c)")
is equivalent to
Cmd> anova("y = a + b.a + a.c + b.c")

The order of factors and variates in a term is immaterial.  That is, a.b
is equivalent to b.a.

The order of terms in a model is very important since fitting a model is
done sequentially, term by term.  For example, although "y = a + a.b" is
a model with b nested in a, "y = a.b + a" is computationally equivalent
to "y = a.b", since after fitting all combinations of a and b, there is
nothing left for 'a' to fit.

If a term in a model is duplicated, only the first occurrence is
retained.  For example, (a + b).(a + b) expands to a.a + b.a + a.b + b.b
which is equivalent to a + a.b + a.b + b, which is trimmed to a + a.b + b
(which is computationally equivalent to a + a.b).

Order of terms in expanded models
If M1, M2, ..., Mk, N1, N2, ..., Nl are terms or submodels,
(M1 + M2 + ... + Mk).(N1 + N2 + ... + Nl) is equivalent to
M1.N1 + M2.N1 + ... + Mk.N1 + M1.N2 + M2.N2 + ... + Mk.N2 + ... + Mk.Nl

If M1, M2, ..., Mk are terms or submodels, M1.M2.M3. ... .Mk is expanded
as (...((M1.M2).M3) ... ).Mk.

Shortcut formulas for combining terms or submodels
In the following, M1, M2, ... are terms or submodels.

M1*M2 is an abbreviation for M1 + M2 + M1.M2

M1*M2* ... *Mk is an abbreviation for (...((M1*M2)*M3) ... )*Mk.  In
particular, M1*M2*M3 is an abbreviation for (M1*M2)*M3, that is for
M1 + M2 + M1.M2 + M3 + M1.M3 + M2.M3 + M1.M2.M3

M1^N is an abbreviation for M1.(1+M1). ... .(1+M1), where there are N
factors.  N must be an integer between 1 and 31.  This contains the same
terms as M1*M1*...*M1 (N factors) but in a different order.  For
example, (a+b+c+d)^4 has main effects followed by 2-way interactions
followed by 3-way interactions followed by a.b.c.d.  Note that M1^N is
usually not equivalent to and does not contain the same terms as
M1. ... .M1 (N dot factors).

M1/M2 is an abbreviation for M1 + MM1.M2 where MM1 has the form
a.b. ... .z, where a, b, ..., z are all the factors and/or variates in
M1.  For example, (a+b+c)/(d+e) is equivalent to a+b+c+a.b.c.d+
a.b.c.e.  Note: Earlier versions of help and other documentation had a
different definition which was correct only in some common simple
cases.

M1 - M2 is an abbreviation for a model containing all the terms in M1,
omitting any term in M2.  In particular Model - 1 specifies a model
with no constant term or intercept and Model - 1 + 1 specifies a model
with a constant term that is fit after all other terms in Model.

M1 -* M2 is an abbreviation for a model containing all the terms in M1,
but omitting any terms containing all the variables in any term of M2.

Examples of use of shortcuts.  Note the order of the expanded terms.
"y = a*b" is equivalent to "y = a + b + a.b"
"y = a/b" is equivalent to "y = a + a.b"
"y = a*b*c" is equivalent to
    "y = a + b + a.b + c + a.c + b.c + a.b.c"
"y = (a+b+c)^2" is equivalent to "y = a + b + c + a.b + a.c + b.c"
"y = (a+b+c)^3" is equivalent to
    "y = a + b + c + a.b + a.c + b.c + a.b.c"
"y = a*b*c - a.b.c" is equivalent to
    "y = a + b + a.b + c + a.c + b.c"
"y = a*b*c -* (a.b + a.c)" is equivalent to "y = a + b + c + b.c"

Note that although "y=a*b*c" and "y=(a+b+c)^3" contain the same terms
when expanded, the terms are in a different order.

Error Terms
In the output from commands such as anova() or poisson() that produce an
analysis of variance or deviance table, there is always one line,
usually labeled "ERROR1", following all the terms explicitly or
implicitly specified in Model.  It consists of the sum of squares or
deviance associated with all the degrees of freedom not included in the
model.  If the model fitted uses up all the degrees of freedom this line
will still be present, but will have 0 degrees of freedom.

You can also label other terms as ERROR.  If a term is of the form
E(Term) (for example, E(a.b.c)), it will be labeled "ERRORn" in the
ANOVA table, where n is 1, 2, ....  The final error line will still be
printed but will be labeled "ERRORm", where m-1 is the number of error
terms you specified.  E(1) is not legal, nor is it legal to specify a
term as an error term more than once (E(a.b) + E(a.b)).  Moreover, once
a term is designated as an error term, it cannot be deleted by '-' or
'-*'.  The term in E(Term) must be a single factor or a pure product of
factors.  For example, E(a.b+a.b.c) is illegal.

A '#' in Model marks the end of the model, allowing models to be self-
documenting as in anova("y = a + b #additive model").

Any GLM command sets the CHARACTER variable STRMODEL to the specified
model as a "side effect" of the analysis.  If no model is specified on a
subsequent GLM command (for example anova()), it is taken from this
variable.  Alternatively, if you set STRMODEL directly, for example
Cmd> STRMODEL <- "y = x1 + x2 + x3"
then the value of STRMODEL will be used by the next GLM command if it
has no model as argument.  Note, however, when you assign a value to
STRMODEL, MacAnova discards the internal information saved by the most
recent GLM command that is used by functions such as secoefs() and
contrast().

Examples of GLM Models
Cmd> anova("y = a + b + a.b") # or anova("y = a*b")
will produce a two-way analysis of variance with interaction for the
response in y, provided vector y is defined and a and b are factors with
the same length as y.

Cmd> anova("y = a + a.b") # or anova("y = a/b")
where a and b are factors will produce a nested analysis of variance
with b nested within a.

Cmd> anova("y = blk + a + E(a.blk) + b + a.b")
would be appropriate for the analysis of a two factor split plot
experiment with the whole plot treatments in a randomized block design.
Do not attempt to use the name 'rep' for a blocking factor, since 'rep'
is the name of a built-in operation.
```
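
The Pn(expr) and Cn(expr) shortcuts described above are purely textual expansions, so they can be illustrated outside MacAnova. The following Python sketch reproduces the two expansions quoted in the help text. The function names `expand_poly` and `expand_periodic` are hypothetical, and the sketch only imitates the documented output format; it is not MacAnova's own parser and does not check the limit of 95 on n.

```python
def expand_poly(n, expr):
    """Expand Pn(expr) into the {...} terms shown in the help text."""
    # The first-degree term is {expr}; higher degrees are {(expr)^k}.
    terms = ["{%s}" % expr]
    terms += ["{(%s)^%d}" % (expr, k) for k in range(2, n + 1)]
    return "(" + "+".join(terms) + ")"


def expand_periodic(n, expr):
    """Expand Cn(expr) into cosine and sine terms for harmonics 1..n."""
    terms = []
    for k in range(1, n + 1):
        # The first harmonic uses expr as is; harmonic k uses k*(expr).
        arg = expr if k == 1 else "%d*(%s)" % (k, expr)
        terms.append("{cos(%s)}" % arg)
        terms.append("{sin(%s)}" % arg)
    return "(" + "+".join(terms) + ")"


print(expand_poly(4, "x-10"))
# ({x-10}+{(x-10)^2}+{(x-10)^3}+{(x-10)^4})
print(expand_periodic(2, "2*PI*x/24"))
# ({cos(2*PI*x/24)}+{sin(2*PI*x/24)}+{cos(2*(2*PI*x/24))}+{sin(2*(2*PI*x/24))})
```

Both printed strings match the expansions of P4(x-10) and C2(2*PI*x/24) given in the section on polynomial and periodic regression.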

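The rules above for the order of terms in expanded models and for the shortcuts '*', '^', '/', '-' and '-*' can be checked with a small Python sketch. It assumes a term can be represented as a set of variable names (so a.a collapses to a and a.b equals b.a) and a model as an ordered list of distinct terms. All function names are hypothetical, error terms E(...) are not handled, and the code is not taken from MacAnova itself.

```python
def add(m1, m2):
    """M1 + M2: concatenate, keeping only the first occurrence of a term."""
    out = []
    for t in list(m1) + list(m2):
        if t not in out:
            out.append(t)
    return out


def dot(m1, m2):
    """M1.M2 in the documented order: M1.N1 + M2.N1 + ... + Mk.Nl."""
    return add([], [m | n for n in m2 for m in m1])


def star(m1, m2):
    """M1*M2 is M1 + M2 + M1.M2."""
    return add(add(m1, m2), dot(m1, m2))


def power(m, n):
    """M^N is M.(1+M). ... .(1+M) with N factors, expanded left to right."""
    one_plus_m = add([frozenset()], m)   # frozenset() stands for the term 1
    out = list(m)
    for _ in range(n - 1):
        out = dot(out, one_plus_m)
    return out


def nest(m1, m2):
    """M1/M2 is M1 + MM1.M2, where MM1 is the product of all variables in M1."""
    all_vars = frozenset().union(*m1)
    return add(m1, dot([all_vars], m2))


def minus(m1, m2):
    """M1 - M2: drop the terms of M1 that also appear in M2."""
    return [t for t in m1 if t not in m2]


def minus_star(m1, m2):
    """M1 -* M2: drop terms containing all variables of any term of M2."""
    return [t for t in m1 if not any(u <= t for u in m2)]


def model(*names):
    """Build a model from term names such as 'a' or 'a.b' ('1' is the constant)."""
    return [frozenset(n.split(".")) - {"1"} for n in names]


def show(m):
    """Print terms with variable names sorted, since their order is immaterial."""
    return " + ".join(".".join(sorted(t)) or "1" for t in m)


a, b, c = model("a"), model("b"), model("c")
abc = add(add(a, b), c)
print(show(star(star(a, b), c)))       # a + b + a.b + c + a.c + b.c + a.b.c
print(show(power(abc, 3)))             # a + b + c + a.b + a.c + b.c + a.b.c
print(show(nest(abc, model("d", "e"))))  # a + b + c + a.b.c.d + a.b.c.e
print(show(minus_star(star(star(a, b), c), model("a.b", "a.c"))))  # a + b + c + b.c
```
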
Gary Oehlert 2003-01-15