Usage:
screen([Model] [, method:"cp" or "rsq" or "adjrsq", mbest:m, forced:fn,\
  s2:mse, penalty:pen, keep:items, print:T]), m positive integer, fn vector
of positive integers or CHARACTER vector of variable names, mse and pen
positive REAL scalars, items CHARACTER scalar or vector with elements "p",
"cp", "rsq", "adjrsq", "model", "intmodel" or "all".
Keywords:
glm, regression
screen(Model) screens all the regression models based on one or more of
the X variables given in the CHARACTER argument Model. The best of
these models are printed, together with the values of Mallows' Cp =
RSS/MSE + 2*p - n, multiple R^2 (the coefficient of determination), and
adjusted R^2 = 1 - (n-1)*(1-R^2)/(n-p), where n is the number of cases
and p is the number of coefficients in the model, including the constant
term.
In the definition of Cp, RSS is the residual sum of squares for a model
and MSE is either the residual mean square from the model using all the
variables or the value of optional keyword 's2'. When optional keyword
'penalty' is used, its value replaces 2 as the multiplier of p in the
definition of Cp. See below for details on 's2' and 'penalty'.
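The three statistics are easy to check by hand. The following is a minimal Python sketch, not MacAnova code; the function name subset_stats is hypothetical, and it assumes R^2 = 1 - RSS/TSS, where TSS is the total sum of squares about the mean (appropriate because every screened model has a constant term). The mse argument plays the role of the full-model MSE or of 's2', and penalty defaults to 2.

```python
def subset_stats(rss, tss, mse, n, p, penalty=2.0):
    """Return (Cp, R^2, adjusted R^2) for one subset model.

    rss:     residual sum of squares of the subset model
    tss:     total sum of squares about the mean (assumed; model has a constant)
    mse:     full-model residual mean square, or the value given by s2
    n:       number of cases
    p:       number of coefficients, including the constant term
    penalty: multiplier of p in Cp (2 unless 'penalty' is used)
    """
    cp = rss / mse + penalty * p - n
    rsq = 1.0 - rss / tss
    adjrsq = 1.0 - (n - 1) * (1.0 - rsq) / (n - p)
    return cp, rsq, adjrsq


# A 3-coefficient subset model on n = 20 cases (made-up numbers):
# Cp = 40/2 + 2*3 - 20 = 6
print(subset_stats(rss=40.0, tss=100.0, mse=2.0, n=20, p=3))
```

With the default s2 and penalty, the full model always has Cp = p, because RSS/MSE = n - p for that model.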
By default, screen() finds the models with the 5 smallest values of
Mallows' Cp statistic. This default can be changed with keywords 'method'
and 'mbest'; see below.
Model must include a constant term; for example,
screen("y=x1+x2+x3-1") is illegal. See topic 'models' for information
on the form of Model.
screen(Model, keep:Items) does the same, except that nothing is printed.
Instead, the information specified by CHARACTER scalar or vector Items is
returned as the value of screen(). See below for permissible values.
screen(Model, keep:Items, print:T) both prints the results and returns
those results specified by Items.
Permissible values of elements of Items:
"p" Number of coefficients fit including constant (intercept)
"cp" Mallows' Cp statistic
"rsq" Multiple R^2
"adjrsq" Adjusted R^2
"model" Models selected in the CHARACTER form expected by regress()
"intmodel" Integer matrix with each column containing the indices of
the variables in one of the selected models
"all" All of the above
The values produced by the first 5 of these are vectors with one element
for every model selected ("model" gives a CHARACTER vector); "intmodel"
produces a matrix with one column for every model selected and nv rows,
where nv is the number of independent variables in Model. Unused
positions in a column are filled with 0.
When more than one item is specified, they are returned as components of
a structure with names as in this list.
vector("y=x1","y=x1+x3+x4","y=x2+x3+x4","y=x1+x2","y=x4") would be
an example of a CHARACTER vector that might be produced by keep:"model".
For this case, if Model contains the 4 variables x1, x2, x3, x4 in that
order, the corresponding "intmodel" value would be the matrix
[ 1 1 2 1 4]
[ 0 3 3 2 0]
[ 0 4 4 0 0]
[ 0 0 0 0 0]
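The correspondence between the "model" strings and the "intmodel" matrix can be sketched in Python. This is an illustrative reimplementation, not part of MacAnova; the function name intmodel_matrix is hypothetical, it handles only simple models of the form "y=x1+x3+x4" (no interactions or factors), and it lists indices in increasing order, as in the example above.

```python
def intmodel_matrix(models, variables):
    """Return the "intmodel" matrix as a list of rows.

    models:    model strings such as "y=x1+x3+x4" (simple sums of variables only)
    variables: independent variable names, in the order they appear in Model
    Column j lists the 1-based indices of the variables in models[j],
    in increasing order, padded with zeros to len(variables) rows.
    """
    nv = len(variables)
    index = {name: i + 1 for i, name in enumerate(variables)}
    columns = []
    for model in models:
        rhs = model.split("=", 1)[1]                  # drop the response "y="
        idx = sorted(index[term.strip()] for term in rhs.split("+"))
        columns.append(idx + [0] * (nv - len(idx)))
    # Transpose the columns into rows for display.
    return [[col[r] for col in columns] for r in range(nv)]


rows = intmodel_matrix(
    ["y=x1", "y=x1+x3+x4", "y=x2+x3+x4", "y=x1+x2", "y=x4"],
    ["x1", "x2", "x3", "x4"],
)
for row in rows:
    print(row)
# [1, 1, 2, 1, 4]
# [0, 3, 3, 2, 0]
# [0, 4, 4, 0, 0]
# [0, 0, 0, 0, 0]
```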
Other Keywords
Keyword  Type of value              Default         Meaning
method   CHARACTER scalar           "cp"            Criterion ("cp", "rsq", or
                                                    "adjrsq") for subset selection
mbest    Positive integer           5               Number of subsets to be found
forced   Vector of positive         none            Independent variables to be
         integers, or CHARACTER                     forced into all subsets
         vector of variable names
s2       Positive REAL number       full-model MSE  Replacement for the full-model
                                                    MSE in computing Cp
penalty  Positive REAL number       2               Multiplier of p in computing Cp
print    T or F                                     print:T forces printing when
                                                    'keep' is used
Keywords 'silent', 'fstats', and 'pvals' are illegal for screen(), and
'print:F' is illegal unless keyword 'keep' is used.
The value of keyword 'method' determines the criterion used to rank
regressions.
Value     Criterion used   Better values
"cp"      Mallows' Cp      Smaller
"rsq"     Multiple R^2     Larger
"adjrsq"  Adjusted R^2     Larger
The number of subset regressions printed or returned is determined by
the value of 'mbest'.
With method:"adjrsq" or method:"cp", exactly mbest regressions are
printed or returned.
With method:"rsq", for each possible number m = 1, 2, ..., nv of
variables, screen() selects the mbest models with the largest values of
R^2, where nv is the number of independent variables in Model. Since
there is only one model containing all nv variables, up to
(nv-1)*mbest + 1 models can be selected in this case.
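The selection rules for the three methods can be written out in Python, assuming the statistics for every candidate subset have already been computed. This is a hypothetical sketch of the selection step only (select_best is not a MacAnova function); screen() itself avoids fitting every subset, and the way ties are broken here (Python's stable sort) is an assumption.

```python
from collections import defaultdict


def select_best(results, method="cp", mbest=5):
    """Choose which subsets to report, given precomputed statistics.

    results: list of dicts, one per candidate subset, with keys
             "vars" (tuple of variable indices), "cp", "rsq", "adjrsq".
    """
    if method == "cp":       # smaller Cp is better
        return sorted(results, key=lambda r: r["cp"])[:mbest]
    if method == "adjrsq":   # larger adjusted R^2 is better
        return sorted(results, key=lambda r: r["adjrsq"], reverse=True)[:mbest]
    if method == "rsq":      # mbest largest R^2 within each subset size
        by_size = defaultdict(list)
        for r in results:
            by_size[len(r["vars"])].append(r)
        chosen = []
        for size in sorted(by_size):
            ranked = sorted(by_size[size], key=lambda r: r["rsq"], reverse=True)
            chosen.extend(ranked[:mbest])
        return chosen
    raise ValueError('method must be "cp", "rsq", or "adjrsq"')
```

For nv = 4 and mbest = 2, method "rsq" returns at most 2 + 2 + 2 + 1 = 7 models, matching (nv-1)*mbest + 1, while "cp" and "adjrsq" return exactly 2.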
The value of 'forced' should be a CHARACTER vector of the names of any
variables to be forced into the model. For instance, with
forced:vector("x1","x2"), all models examined would include x1 and x2.
By default, no variables are forced. The value of 'forced' can also be a
vector of the indices of the variables in Model, say forced:vector(1,2).
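Forcing variables shrinks the set of candidate subsets. The brute-force Python enumeration below (candidate_subsets is a hypothetical name) lists the subsets that satisfy a 'forced' constraint, using integer indices as in forced:vector(1,2). It only illustrates which models are eligible; screen() does not fit them one by one.

```python
from itertools import combinations


def candidate_subsets(nv, forced=()):
    """List every subset of variables 1..nv that contains all forced ones.

    Variables are identified by their 1-based position in Model, as with
    forced:vector(1,2).  The empty (constant-only) model is excluded.
    """
    forced = sorted(set(forced))
    free = [v for v in range(1, nv + 1) if v not in forced]
    subsets = []
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            subset = tuple(sorted(forced + list(extra)))
            if subset:
                subsets.append(subset)
    return subsets


print(len(candidate_subsets(7)))              # 127 = 2**7 - 1
print(len(candidate_subsets(7, forced=[3])))  # 64 = 2**6, all containing x3
```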
The value of 'penalty' is used to compute Cp = RSS/MSE + penalty*p - n.
Larger values increase the "penalty" for including additional variables
and tend to produce models with fewer variables. The default value is 2.
The value of 's2' is the MSE to be used in computing Cp. If not
specified, the mean square error from the complete model is used.
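To see how 'penalty' and 's2' shift the choice, here is a small Python comparison of two hypothetical models. The RSS values and model names are made up, and cp and best are illustrative helpers, not MacAnova functions. Raising the penalty, or using a larger s2 so that differences in RSS count for less, both favor the smaller model.

```python
def cp(rss, p, n, mse, penalty=2.0):
    """Mallows' Cp with an adjustable penalty: RSS/MSE + penalty*p - n."""
    return rss / mse + penalty * p - n


n = 30
# Two hypothetical subset models: name -> (RSS, p).  Made-up numbers.
models = {"A (p=3)": (70.0, 3), "B (p=5)": (60.0, 5)}


def best(mse, penalty):
    """Name of the model with the smaller Cp for the given s2 and penalty."""
    return min(models, key=lambda name: cp(*models[name], n, mse, penalty))


print(best(mse=2.0, penalty=2.0))  # B (p=5): Cp 10 versus 11 for A
print(best(mse=2.0, penalty=3.0))  # A (p=3): Cp 14 versus 15 for B
print(best(mse=4.0, penalty=2.0))  # A (p=3): Cp -6.5 versus -5 for B
```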
Examples, all assuming Model is "y=x1+x2+x3+x4+x5+x6+x7"
Screen with the defaults mbest:5, method:"cp", and penalty:2:
Cmd> screen(Model)
Screen for 10 best models using adjusted R^2 with variable x3 forced
into the model:
Cmd> screen(Model,mbest:10,forced:"x3",method:"adjrsq") # or forced:3
Screen using Cp over models with x6 forced in and penalty factor of 3:
Cmd> screen(Model,forced:"x6",penalty:3)
Screen using defaults, saving p, cp, and the models, and printing
results:
Cmd> result <- screen(Model,keep:vector("p","cp","model"),print:T)
Cmd> regress(result$model[1]) # compute regression with best model
screen() uses a branch and bound algorithm due to Furnival and Wilson;
see their paper, "Regressions by Leaps and Bounds", Technometrics 16
(1974), 499-511.
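The idea behind branch and bound can be illustrated with a much simpler search than the Furnival and Wilson algorithm. The Python sketch below (best_subset_bb and toy_rss are hypothetical names, and the RSS values are made up) finds the size-k subset with the smallest RSS, and hence the largest R^2, by pruning any branch whose largest reachable model already has RSS no smaller than the best found so far. It relies only on the fact that adding a variable to a least squares fit never increases RSS; it does not implement the leaps-and-bounds bookkeeping of the paper.

```python
def best_subset_bb(nv, k, rss):
    """Return (subset, RSS) for the size-k subset of 1..nv with smallest RSS.

    rss: function mapping a frozenset of variable indices to the residual
         sum of squares of that model.  Assumed nonincreasing when
         variables are added, as holds for least squares fits.
    """
    best = {"rss": float("inf"), "subset": None}

    def search(included, candidates):
        need = k - len(included)
        if need > len(candidates):
            return                                 # cannot reach size k
        # Bound: no model in this branch can beat the largest reachable one.
        if rss(included | frozenset(candidates)) >= best["rss"]:
            return
        if need == 0:
            value = rss(included)
            if value < best["rss"]:
                best["rss"], best["subset"] = value, included
            return
        first, rest = candidates[0], candidates[1:]
        search(included | {first}, rest)           # branch: include first
        search(included, rest)                     # branch: exclude first

    search(frozenset(), list(range(1, nv + 1)))
    return best["subset"], best["rss"]


# Made-up RSS values: each variable i reduces RSS according to weight w[i].
w = {1: 0.5, 2: 3.0, 3: 1.0, 4: 2.0, 5: 0.2}


def toy_rss(subset):
    return 100.0 / (1.0 + sum(w[i] for i in subset))


subset, value = best_subset_bb(5, 2, toy_rss)
print(sorted(subset))   # [2, 4]
```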
See also regress() and anova().
Gary Oehlert
2003-01-15