Regression:                              regress("y=x1+x2+...+xk")
One-way ANOVA:                           anova("y=A"), factor A
Randomized block ANOVA:                  anova("y=Repl+A"), factors Repl and A
Nested ANOVA:                            anova("y=A/B") or anova("y=A+A.B")
Two-way factorial:                       anova("y=A*B") or anova("y=A+B+A.B"), factors A and B
Completely randomized split plot ANOVA:  anova("y=A+E(Repl.A)+B+A.B"), factors A, B, and Repl
Analysis of covariance:                  anova("y=x+A"), factor A, variate x
Transform variables on the fly:          regress("{log10(y)}={sqrt(x)}")
Polynomial regression:                   regress("y=P3(x)")
Periodic regression:                     regress("y=C2(2*PI*hour/24)")

All of the GLM (generalized linear or linear model) commands such as regress(), anova(), or poisson() require you to specify a model in a quoted string or CHARACTER variable.

A model can be specified as
  "Response = Term"
or
  "Response = Term1 + Term2 + ..."
where 'Response' is the name of the dependent or response variable and each term is of the form Name or Name1.Name2. ... .Namek, where Name, Name1, Name2 ... are the names of variables. A period or dot (.) between the variable names is interpreted as a product operator indicating that all combinations of the variable values are included in the model.

Model Variables

Variables in model terms, including those computed on the fly (see below), may be either factors (vectors of positive integers created using factor()) or variates. No more than one variate may appear in a single term. Up to 95 variables may appear in a model, including no more than 31 factors.

Factors and variates must be vectors or matrices with one column. They must all have the same number of rows as the response variable.

Any factors must have been created using function factor() or been selected from such a variable using subscripts. For balanced designs with factor levels in a reasonable order a factor may often be computed by factor(rep(run(r),s)), factor(rep(run(s), rep(r,s))), or something similar. See factor(), rep().

The constant term may be specified as 1, but is always included by default, that is "y = Model" is equivalent to "y = 1 + Model". You can omit a constant term by "y = Model - 1" or move it to the end by "y = Model - 1 + 1".

Computing Variables "on the fly"

You can transform or otherwise compute model variables "on the fly." In place of the name of a variable, including the response variable, you can use {Expr}, where Expr is a MacAnova expression such as x^2 or log10(y).

If the same expression, say {sqrt(x)}, appears more than once in a model, it is evaluated only once and only one model variable is introduced.
In comparing expressions, leading and trailing spaces are ignored, so that { sqrt(x) } is considered the same as {sqrt(x)}; however, other differences in the presence or placement of spaces will cause expressions to be considered different variables. For example, {sqrt( x)} is not recognized to be the same as {sqrt(x)}.

The only limitation on Expr is that it may not directly or indirectly execute another GLM command.

Since subscripted factors remain factors (see 'subscripts'), when groups is a factor,
  Cmd> anova("{y[-3]} = {groups[-3]}")
computes a one factor analysis of variance omitting case 3.

Examples

In these examples, a and b are factors and y, x1, x2, and x3 are REAL vectors.

  Model                        Description
  "y = a + b + a.b"            Two factor model with both main effects and interaction
  "y = a + a.b"                Two factor model with b nested in a
  "y = x1 + x2 + x3"           Three variable multiple regression
  "{sqrt(y)} = x1 + {x1^2}"    2nd order polynomial regression of square root of y on x1

Shortcuts for Polynomial and Periodic Regression

You can use special shortcuts of the form Pn(expr) and Cn(expr) to specify a polynomial term or a periodic term, respectively, where n is an integer between 1 and 95 and expr is a MacAnova expression.

For example, P4(x-10) expands to
  ({x-10}+{(x-10)^2}+{(x-10)^3}+{(x-10)^4})
and C2(2*PI*x/24) expands to
  ({cos(2*PI*x/24)}+{sin(2*PI*x/24)}+{cos(2*(2*PI*x/24))}+{sin(2*(2*PI*x/24))})

Pn(expr) and Cn(expr) can be used wherever a variable name can be used on the right side of '=', but not inside a {...} expression. Thus the last example in the preceding list could have been written "{sqrt(y)} = P2(x1)".

They can be "dotted" with a factor. For example, P2(x).a expands to {x}.a + {(x)^2}.a.

If you are doing a regression on a subset of cases using subscripts, the subscripts must be applied to x, not to Cn(x) or Pn(x). For example,
  Cmd> regress("{y[-run(3)]} = P3(x[-run(3)])")
fits a cubic polynomial omitting the first 3 rows of x and y.

See below for other shortcuts you can use to specify models.
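The Pn(expr) and Cn(expr) expansions described above can be sketched in a few lines of Python. This is an illustration of the documented expansion rule only, not MacAnova code and not a description of how MacAnova implements it; the function names expand_poly and expand_periodic are invented for this sketch.

```python
def expand_poly(n, expr):
    """Return the text that Pn(expr) stands for, per the rule in this help topic.

    Hypothetical helper: the first term is {expr}, later terms are {(expr)^k}.
    """
    if not 1 <= n <= 95:
        raise ValueError("n must be an integer between 1 and 95")
    terms = ["{%s}" % expr]
    terms += ["{(%s)^%d}" % (expr, k) for k in range(2, n + 1)]
    return "(" + "+".join(terms) + ")"


def expand_periodic(n, expr):
    """Return the text that Cn(expr) stands for: cos and sin of k*expr, k = 1..n."""
    if not 1 <= n <= 95:
        raise ValueError("n must be an integer between 1 and 95")
    terms = []
    for k in range(1, n + 1):
        # The first harmonic uses expr as is; higher ones use k*(expr).
        arg = expr if k == 1 else "%d*(%s)" % (k, expr)
        terms.append("{cos(%s)}" % arg)
        terms.append("{sin(%s)}" % arg)
    return "(" + "+".join(terms) + ")"


if __name__ == "__main__":
    print(expand_poly(4, "x-10"))
    # ({x-10}+{(x-10)^2}+{(x-10)^3}+{(x-10)^4})
    print(expand_periodic(2, "2*PI*x/24"))
    # ({cos(2*PI*x/24)}+{sin(2*PI*x/24)}+{cos(2*(2*PI*x/24))}+{sin(2*(2*PI*x/24))})
```

Because each generated piece is a {...} expression, a subscript has to go inside expr (as in P3(x[-run(3)])) for it to reach every term of the expansion.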
Combining Variables

Parts of terms can be replaced by 'submodels', enclosed in parentheses. For example,
  Cmd> anova("y = a + b + c + d + (a + b).(c + d)")
is equivalent to
  Cmd> anova("y = a + b + c + d + a.c + b.c + a.d + b.d")

The product of a factor or variate with itself (a.a) is equivalent to the variate or factor itself. For example,
  Cmd> anova("y = (a + b).(a + c)")
is equivalent to
  Cmd> anova("y = a + b.a + a.c + b.c")

The order of factors and variates in a term is immaterial. That is, a.b is equivalent to b.a.

The order of terms in a model is very important since fitting a model is done sequentially, term by term. For example, although "y = a + a.b" is a model with b nested in a, "y = a.b + a" is computationally equivalent to "y = a.b", since after fitting all combinations of a and b, there is nothing left for 'a' to fit.

If a term in a model is duplicated, only the first occurrence is retained. For example, (a + b).(a + b) expands to a.a + b.a + b.a + b.b, which is equivalent to a + a.b + a.b + b, which is trimmed to a + a.b + b (which is computationally equivalent to a + a.b).

Order of terms in expanded models

If M1, M2, ..., Mk, N1, N2, ..., Nl are terms or submodels,
  (M1 + M2 + ... + Mk).(N1 + N2 + ... + Nl)
is equivalent to
  M1.N1 + M2.N1 + ... + Mk.N1 + M1.N2 + M2.N2 + ... + Mk.N2 + ... + Mk.Nl

If M1, M2, ..., Mk are terms or submodels, M1.M2.M3. ... .Mk is expanded as (...((M1.M2).M3) ... ).Mk.

Short cut formulas for combining terms or submodels

In the following, M1, M2, ... are terms or submodels.

M1*M2 is an abbreviation for M1 + M2 + M1.M2

M1*M2* ... *Mk is an abbreviation for (...((M1*M2)*M3) ... )*Mk. In particular, M1*M2*M3 is an abbreviation for (M1*M2)*M3, that is, for
  M1 + M2 + M1.M2 + M3 + M1.M3 + M2.M3 + M1.M2.M3

M1^N is an abbreviation for M1.(1+M1). ... .(1+M1), where there are N factors. N must be an integer between 1 and 31. This contains the same terms as M1*M1*...*M1 (N factors) but in a different order.
For example, (a+b+c+d)^4 has main effects followed by 2-way interactions followed by 3-way interactions followed by a.b.c.d. Note that M1^N is usually not equivalent to and does not contain the same terms as M1. ... .M1 (N dot factors).

M1/M2 is an abbreviation for M1 + MM1.M2 where MM1 has the form a.b. ... .z, where a, b, ..., z are all the factors and/or variates in M1. For example, (a+b+c)/(d+e) is equivalent to a+b+c+a.b.c.d+a.b.c.e. Note: Earlier versions of help and other documentation had a different definition which was correct only in some common simple cases.

M1 - M2 is an abbreviation for a model containing all the terms in M1, omitting any term in M2. In particular, Model - 1 specifies a model with no constant term or intercept, and Model - 1 + 1 specifies a model with a constant term that is fit after all other terms in Model.

M1 -* M2 is an abbreviation for a model containing all the terms in M1, but omitting any terms containing all the variables in any term of M2.

Examples of use of shortcuts. Note the order of the expanded terms.

  "y = a*b"                    is equivalent to  "y = a + b + a.b"
  "y = a/b"                    is equivalent to  "y = a + a.b"
  "y = a*b*c"                  is equivalent to  "y = a + b + a.b + c + a.c + b.c + a.b.c"
  "y = (a+b+c)^2"              is equivalent to  "y = a + b + c + a.b + a.c + b.c"
  "y = (a+b+c)^3"              is equivalent to  "y = a + b + c + a.b + a.c + b.c + a.b.c"
  "y = a*b*c - a.b.c"          is equivalent to  "y = a + b + a.b + c + a.c + b.c"
  "y = a*b*c -* (a.b + a.c)"   is equivalent to  "y = a + b + c + b.c"

Note that although "y=a*b*c" and "y=(a+b+c)^3" contain the same terms when expanded, the terms are in a different order.

Error Terms

In the output from commands such as anova() or poisson() that produce an analysis of variance or deviance table, there is always one line, usually labeled "ERROR1", following all the terms explicitly or implicitly specified in Model. It consists of the sum of squares or deviance associated with all the degrees of freedom not included in the model.
If the model fitted uses up all the degrees of freedom this line will still be present, but will have 0 degrees of freedom.

You can also label other terms as ERROR. If a term is of the form E(Term) (for example, E(a.b.c)), it will be labeled "ERRORn" in the ANOVA table, where n is 1, 2, .... The final error line will still be printed but will be labeled "ERRORm", where m-1 is the number of error terms you specified.

E(1) is not legal, nor is it legal to specify a term as an error term more than once (E(a.b) + E(a.b)). Moreover, once a term is designated as an error term, it cannot be deleted by '-' or '-*'.

Term in E(Term) must be a single factor or a pure product of factors. For example, E(a.b+a.b.c) is illegal.

A '#' in Model marks the end of the model, allowing models to be self-documenting as in anova("y = a + b #additive model").

Any GLM command sets the CHARACTER variable STRMODEL to the specified model as a "side effect" of the analysis. If no model is specified on a subsequent GLM command (for example anova()), it is taken from this variable.

Alternatively, if you set STRMODEL directly, for example
  Cmd> STRMODEL <- "y = x1 + x2 + x3"
then the value of STRMODEL will be used by the next GLM command if it has no model as argument. Note, however, that when you assign a value to STRMODEL, MacAnova discards the internal information saved by the most recent GLM command that is used by functions such as secoefs() and contrast().

Examples of GLM Models

  Cmd> anova("y = a + b + a.b") # or anova("y = a*b")
will produce a two-way analysis of variance with interaction for the response in y, provided vector y is defined and a and b are factors with the same length as y.

  Cmd> anova("y = a + a.b") # or anova("y = a/b")
where a and b are factors, will produce a nested analysis of variance with b nested within a.
  Cmd> anova("y = blk + a + E(a.blk) + b + a.b")
would be appropriate for the analysis of a two factor split plot experiment with the whole plot treatments in a randomized block design. Do not attempt to use the name 'rep' for a blocking factor, since 'rep' is the name of a built-in operation.
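The expansion rules given under "Order of terms in expanded models" and "Short cut formulas for combining terms or submodels" can be checked against the listed examples with a small Python sketch. It is an illustration only, not MacAnova code and not MacAnova's implementation: a model is represented as an ordered list of terms, each term as a set of variable names, and all function names (model, dot, star, power, nest, minus, minus_star, show) are invented here. Names inside a printed term are sorted alphabetically, which is harmless since a.b is equivalent to b.a.

```python
def dedupe(terms):
    """Keep only the first occurrence of each term, preserving order."""
    out = []
    for t in terms:
        if t not in out:
            out.append(t)
    return out


def model(*terms):
    """Build a model from strings such as 'a' or 'a.b'; '1' is the constant."""
    return dedupe([frozenset() if t == "1" else frozenset(t.split("."))
                   for t in terms])


def dot(m, n):
    """M.N: M1.N1 + M2.N1 + ... + Mk.N1 + M1.N2 + ...; a.a collapses to a."""
    return dedupe([tm | tn for tn in n for tm in m])


def star(m, n):
    """M*N = M + N + M.N"""
    return dedupe(m + n + dot(m, n))


def power(m, k):
    """M^k = M.(1+M). ... .(1+M) with k factors in all."""
    one_plus_m = dedupe([frozenset()] + m)
    result = m
    for _ in range(k - 1):
        result = dot(result, one_plus_m)
    return result


def nest(m, n):
    """M/N = M + MM.N, where MM is the product of every variable in M."""
    all_vars = frozenset().union(*m)
    return dedupe(m + dot([all_vars], n))


def minus(m, n):
    """M - N: drop the terms of M that appear in N."""
    return [t for t in m if t not in n]


def minus_star(m, n):
    """M -* N: drop terms of M containing all variables of some term of N."""
    return [t for t in m if not any(d <= t for d in n)]


def show(m):
    return " + ".join(".".join(sorted(t)) if t else "1" for t in m)


if __name__ == "__main__":
    a, b, c = model("a"), model("b"), model("c")
    abc = star(star(a, b), c)
    print(show(abc))                                   # a + b + a.b + c + a.c + b.c + a.b.c
    print(show(power(model("a", "b", "c"), 3)))        # a + b + c + a.b + a.c + b.c + a.b.c
    print(show(nest(a, b)))                            # a + a.b
    print(show(minus_star(abc, model("a.b", "a.c"))))  # a + b + c + b.c
```

The first two printed lines contain the same terms in different orders, which is the point made in the note on "y=a*b*c" versus "y=(a+b+c)^3": because terms are fit sequentially, the two specifications are not interchangeable.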

Gary Oehlert 2003-01-15