factor(n1 [, n2, ...]), where n1, n2, ... are REAL scalars or vectors, all of whose elements are positive integers.

factor(A), where A is a vector of positive integers or MISSING, creates a vector with contents identical to A except that the new vector is marked as a "factor" with the number of factor levels equal to max(A). The non-MISSING elements of A must be positive integers <= 32767. Since the number of factor levels is the largest integer in A, both factor(vector(1,2,4)) and factor(vector(1,2,3,4)) produce factors marked as having four levels, although only three of the levels are present in factor(vector(1,2,4)).

Argument A can also be a matrix or array, provided isvector(A) is True, that is, provided all dimensions beyond the first are 1. In that case the result has the same dimensions as A. factor(a1, a2, ..., ak) is equivalent to factor(vector(a1, a2, ..., ak)), where a1, ..., ak are all scalars or vectors.

When A is a LOGICAL vector, factor(A) is equivalent to factor(A+1); that is, False and True are translated to levels 1 and 2, respectively. The result is always marked as having 2 factor levels, even if every element of A is False.

The purpose of marking a variable as a factor is to ensure that, when it is a variable in a model for a non-regression GLM (generalized linear or linear model) command such as anova() or poisson(), its values are interpreted as specifying levels of a categorical (non-quantitative) variable, that is, classes or categories. A vector in a model that has not been marked as a factor using factor() is called a "variate", and its values are taken to specify quantities, even if they are all positive integers. In regress() and screen(), factors are treated the same as variates; that is, their levels are viewed as quantitative. In a model that includes both factors and variates, the variates are often referred to as "covariates".

A common mistake in using GLM commands is to forget to use factor() to turn vectors of factor levels into factors. This error results in their being treated as variates, each with a single degree of freedom.
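The level-counting rules above can be modeled outside MacAnova. The following Python sketch is a hypothetical illustration, not MacAnova's implementation: the names Factor, make_factor and MAX_LEVEL are invented here, Python's None stands in for MISSING, and Python lists stand in for vectors. It shows that the level count is max(A) rather than the number of distinct levels present, that LOGICAL input maps False and True to levels 1 and 2 with exactly 2 levels, and that several arguments are concatenated the way vector() would concatenate them.

```python
from dataclasses import dataclass

MAX_LEVEL = 32767  # largest permitted level, per the help text


@dataclass
class Factor:
    """Hypothetical stand-in for a vector marked as a factor."""
    values: list   # level numbers; None plays the role of MISSING
    nlevels: int   # number of levels the vector is marked as having


def _as_level(v):
    # Levels must be whole numbers in 1..MAX_LEVEL (REALs such as 2.0 are allowed).
    if isinstance(v, bool) or not isinstance(v, (int, float)):
        raise ValueError(f"not a numeric level: {v!r}")
    # Range check first so inf and NaN are rejected before int() is called.
    if not 1 <= v <= MAX_LEVEL or v != int(v):
        raise ValueError(f"levels must be integers in 1..{MAX_LEVEL}, got {v!r}")
    return int(v)


def make_factor(*args):
    # factor(a1, ..., ak) acts like factor(vector(a1, ..., ak)):
    # scalars and lists are concatenated into one vector.
    values = []
    for a in args:
        values.extend(a if isinstance(a, (list, tuple)) else [a])
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: the help text does not say what an all-MISSING argument does.
        raise ValueError("no non-MISSING levels")
    if all(isinstance(v, bool) for v in present):
        # LOGICAL input: False -> 1, True -> 2, always marked with 2 levels.
        return Factor([None if v is None else int(v) + 1 for v in values], 2)
    levels = [None if v is None else _as_level(v) for v in values]
    # The level count is the largest level, not the number of distinct levels.
    return Factor(levels, max(v for v in levels if v is not None))
```

For example, make_factor([1, 2, 4]) and make_factor([1, 2, 3, 4]) are both marked with 4 levels, mirroring factor(vector(1,2,4)) and factor(vector(1,2,3,4)). Mixing LOGICAL and numeric values is not covered by the help text; this sketch simply rejects the LOGICAL elements in that case.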
When A is a factor with k levels and J is an appropriate subscript for A (for example, J might be A != 3, vector(1,run(3,length(A))) or -2), A[J] is also marked as a factor with k levels, even if max(A[J]) < k. When A is a factor with k levels, A[j] <- newvalue is legal only if newvalue is an integer between 1 and k. The number of levels associated with A does not change, even if max(A) < k after the replacement. In both of these situations, subscripting a factor or assigning to a subscripted factor, it is possible to create a factor whose actual maximum level is less than k. However, the actual maximum factor level will be used in any analysis. See also topics makefactor(), 'models'.
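The subscripting and assignment rules can be sketched the same way. This Python model is hypothetical and self-contained (the class Factor and its methods subset, assign and levels_used are invented for illustration). Positions are Python's 0-based indices; MacAnova's 1-based and negative (exclusion) subscripts such as -2 are not modeled, and a negative position here would follow Python's from-the-end indexing instead. The sketch shows that subsetting keeps k levels, that replacement is restricted to levels 1..k, and that the level actually used in analysis is the largest level still present.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """Hypothetical model of a factor: level numbers plus a fixed level count."""
    values: list   # level numbers; None stands for MISSING
    nlevels: int   # k, the number of levels the factor is marked as having

    def subset(self, keep):
        # keep is a list of 0-based positions or a same-length list of bools
        # (analogous to A[J] with a LOGICAL subscript such as A != 3).
        if keep and all(isinstance(k, bool) for k in keep):
            keep = [i for i, k in enumerate(keep) if k]
        # The result keeps k levels even if its largest level is now smaller.
        return Factor([self.values[i] for i in keep], self.nlevels)

    def assign(self, i, new):
        # Replacement is legal only with an integer level between 1 and k.
        if isinstance(new, bool) or not isinstance(new, int) or not 1 <= new <= self.nlevels:
            raise ValueError(f"new level must be an integer in 1..{self.nlevels}")
        self.values[i] = new  # nlevels is deliberately left unchanged

    def levels_used(self):
        # The help text says analyses use the actual maximum level present.
        return max(v for v in self.values if v is not None)
```

For example, selecting the elements of Factor([1, 2, 3, 3, 1], 3) whose value is not 3 gives values [1, 2, 1] still marked with 3 levels, while levels_used() reports 2.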

Gary Oehlert 2003-01-15