
screen()

Usage:
screen([Model] [, method:"cp" or "rsq" or "adjrsq", mbest:m, forced:fn,
  s2:mse, penalty:pen, keep:items]), m positive integer, fn vector of
  positive integers or CHARACTER vector of variable names, mse and pen
  positive REAL scalars, items CHARACTER scalar or vector with elements
  "p", "cp", "rsq", "adjrsq", "model", "intmodel" or "all".



Keywords: glm, regression

screen(Model) screens all the regression models based on one or more of
the X variables given in the CHARACTER argument Model.  The best of
these models are printed, together with the values of Mallows' Cp =
RSS/MSE + 2*p - n, multiple R^2 (the coefficient of determination), and
adjusted R^2 = 1 - (n-1)*(1-R^2)/(n-p), where n is the number of cases
and p is the number of coefficients in the model, including the constant
term.
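
As a rough illustration of the three statistics defined above, the
following Python sketch evaluates them from sums of squares.  It is not
MacAnova code; the function names, the argument tss (total sum of
squares about the mean), and the numbers are assumptions made up for
this example.

```python
# Illustrative only: not part of MacAnova.  All names are hypothetical.

def mallows_cp(rss, mse, n, p):
    """Cp = RSS/MSE + 2*p - n, with p counting the constant term."""
    return rss / mse + 2 * p - n

def r_squared(rss, tss):
    """Multiple R^2, with tss the total sum of squares about the mean."""
    return 1 - rss / tss

def adjusted_r_squared(rsq, n, p):
    """Adjusted R^2 = 1 - (n-1)*(1-R^2)/(n-p)."""
    return 1 - (n - 1) * (1 - rsq) / (n - p)

# Made-up numbers: n = 20 cases, p = 4 coefficients (constant included).
n, p = 20, 4
rss, tss, mse = 32.0, 128.0, 2.0

cp = mallows_cp(rss, mse, n, p)
rsq = r_squared(rss, tss)
adj = adjusted_r_squared(rsq, n, p)
print(cp, rsq, adj)
```

In these made-up numbers RSS/MSE = 16 = n - p, which is what happens
when MSE is the residual mean square of the model itself, so Cp equals
p.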

In the definition of Cp, RSS is the residual sum of squares for a model
and MSE is either the residual mean square from the model using all the
variables or the value of optional keyword 's2'.  When optional keyword
'penalty' is used, its value replaces 2 as multiplier of p in the
definition of Cp.  See below for details on 's2' and 'penalty'.

By default, screen() finds the 5 models with the smallest values of
Mallows' Cp statistic.  This default can be changed by keywords 'method'
and 'mbest'; see below.

Model must not specify a model with no constant term.  For example,
screen("y=x1+x2+x3-1") is illegal.  See topic 'models' for information
on the form of Model.

screen(Model, keep:Items) does the same, except that nothing is printed.
Instead, the information specified by CHARACTER scalar or vector Items
is returned as the value of screen().  See below for permissible values.

screen(Model, keep:Items, print:T) both prints the results and returns
those results specified by Items.

                Permissible values of elements of Items:
  "p"        Number of coefficients fit including constant (intercept)
  "cp"       Mallows' Cp statistic
  "rsq"      Multiple R^2
  "adjrsq"   Adjusted R^2
  "model"    Models selected in the CHARACTER form expected by regress()
  "intmodel" Integer matrix with each column containing the indices of
             the variables in one of the selected models
  "all"      All of the above

The values produced by the first 5 of these are vectors with one element
for every model selected; "intmodel" produces a matrix with one column
for every model selected and nv rows, where nv is the number of
independent variables in Model.

When more than one item is specified, they are returned as components of
a structure with names as in this list.

vector("y=x1","y=x1+x3+x4","y=x2+x3+x4","y=x1+x2","y=x4") would be
an example of a CHARACTER vector that might be produced by keep:"model".
For this case, if there are 4 independent variables in all, x1, x2, x3
and x4 in that order, the "intmodel" value would be the following
matrix, in which each column is padded with zeros to length 4:
                          [ 1   1   2   1   4]
                          [ 0   3   3   2   0]
                          [ 0   4   4   0   0]
                          [ 0   0   0   0   0]
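
The layout of the matrix above can be reproduced with a short Python
sketch.  This is not MacAnova code; the function intmodel() below is a
hypothetical helper that assumes purely additive models written with
"+" and lists the indices of each model in increasing order.

```python
# Illustrative only: rebuilds the "intmodel" layout from model strings.

def intmodel(models, variables):
    """Return nv rows, one column per model, zero-padded at the bottom."""
    nv = len(variables)
    index = {name: i + 1 for i, name in enumerate(variables)}  # 1-based
    columns = []
    for model in models:
        rhs = model.split("=", 1)[1]            # text after "y="
        terms = [t.strip() for t in rhs.split("+")]
        col = sorted(index[t] for t in terms)   # variable indices
        columns.append(col + [0] * (nv - len(col)))
    # Transpose the list of columns into a list of rows.
    return [[col[r] for col in columns] for r in range(nv)]

models = ["y=x1", "y=x1+x3+x4", "y=x2+x3+x4", "y=x1+x2", "y=x4"]
matrix = intmodel(models, ["x1", "x2", "x3", "x4"])
for row in matrix:
    print(row)
```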

                             Other Keywords
 Keyword  Type of value           Default            Meaning
  method  CHARACTER variable        "cp"  Criterion ("cp", "rsq", or
                                          "adjrsq") for subset selection

  mbest   Positive integer            5   Number of subsets to be found

  forced  REAL Vector of positive   none  List of independent variables
          integers or independent         to be forced into all subsets
          variable names

  s2      Positive REAL number       MSE  Replacement for full model
                                          MSE in computing Cp

  penalty Positive REAL number        2   Multiplier of p in computing
                                          Cp

  print   T or F                          print:T forces printing when
                                          'keep' is used.

Keywords 'silent', 'fstats', and 'pvals' are illegal for screen(), and
'print:F' is illegal unless keyword 'keep' is used.

The value of keyword 'method' determines the criterion to be used to
rank regressions.
    Value       Criterion Used   What is better
     "cp"        Mallows' Cp      Smaller
     "rsq"       Multiple R^2     Larger
     "adjrsq"    Adjusted R^2     Larger

The number of subset regressions computed is determined by the value of
'mbest'.

With method:"adjrsq" or method:"cp", exactly mbest regressions are
printed or returned.

With method:"rsq", screen() selects, for each possible number m of
variables (m = 1, 2, ..., nv, where nv is the number of independent
variables in Model), the mbest models with the largest values of R^2.
Because only one model contains all nv variables, up to
(nv-1)*mbest + 1 models are selected in this case.
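
The count above can be checked with a small Python sketch.  It is not
MacAnova code, and it rests on two assumptions that the text does not
state: no variables are forced, and when fewer than mbest subsets of a
given size exist, all of them are selected.

```python
# Illustrative only: counts models selected with method:"rsq".
from math import comb

def max_models_rsq(nv, mbest):
    """Sum over subset sizes m of min(mbest, number of size-m subsets)."""
    total = 0
    for m in range(1, nv + 1):
        total += min(mbest, comb(nv, m))
    return total

seven_vars = max_models_rsq(7, 5)   # (7-1)*5 + 1
three_vars = max_models_rsq(3, 5)   # only 3 + 3 + 1 subsets exist
print(seven_vars, three_vars)       # prints: 31 7
```

With 7 variables and mbest = 5 the bound (nv-1)*mbest + 1 = 31 is
reached; with 3 variables it is not, since only 7 non-empty subsets
exist.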

The value of 'forced' should be a list of names of any variables that
should be forced into the model.  For instance, with forced:vector("x1",
"x2"), all models examined would include x1 and x2.  By default, no
variables are forced.  The value of 'forced' can also be a vector of
integers, say forced:vector(1,2).
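
To make the effect of 'forced' concrete, here is a Python sketch that
lists the subsets that remain candidates when some variables are
forced.  It is plain enumeration, not the algorithm screen() uses, and
the helper names resolve_forced() and candidate_subsets() are
hypothetical.

```python
# Illustrative only: which subsets are examined when variables are forced.
from itertools import combinations

def resolve_forced(forced, variables):
    """Accept names or 1-based indices; return sorted 1-based indices."""
    out = set()
    for f in forced:
        out.add(variables.index(f) + 1 if isinstance(f, str) else int(f))
    return sorted(out)

def candidate_subsets(variables, forced=()):
    """All non-empty variable subsets that contain every forced variable."""
    nv = len(variables)
    keep = resolve_forced(forced, variables)
    free = [i for i in range(1, nv + 1) if i not in keep]
    subsets = []
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            subset = tuple(sorted(keep + list(extra)))
            if subset:                      # skip the empty subset
                subsets.append(subset)
    return subsets

variables = ["x1", "x2", "x3", "x4"]
by_name = candidate_subsets(variables, forced=["x1", "x2"])
by_index = candidate_subsets(variables, forced=[1, 2])
print(by_name)
```

Forcing by name and by index gives the same candidates, and every
candidate contains variables 1 and 2.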

The value of 'penalty' is used to compute Cp = RSS/MSE + penalty*p - n.
Larger values increase the "penalty" for including additional variables
and tend to produce models with fewer variables.  The default value is
2.

The value of 's2' is the MSE to be used in computing Cp.  If not
specified, the mean square error from the complete model is used.
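
The following Python sketch shows how the choice of 'penalty' can
change which of two models has the smaller Cp.  It is not MacAnova
code; the numbers are made up, and the argument mse plays the role of
either the full-model MSE or a value supplied through 's2'.

```python
# Illustrative only: effect of the penalty multiplier on Cp rankings.

def cp(rss, mse, n, p, penalty=2.0):
    """Cp = RSS/MSE + penalty*p - n."""
    return rss / mse + penalty * p - n

n = 20        # number of cases (made up)
mse = 2.0     # full-model MSE, or the value given as s2 (made up)
candidates = {
    "small": {"rss": 44.0, "p": 3},   # fewer coefficients, larger RSS
    "large": {"rss": 34.0, "p": 5},   # more coefficients, smaller RSS
}

best = {}
for pen in (2.0, 3.0):
    scores = {name: cp(m["rss"], mse, n, m["p"], pen)
              for name, m in candidates.items()}
    best[pen] = min(scores, key=scores.get)
    print(pen, scores, best[pen])
```

With penalty 2 the larger model has the smaller Cp (7 against 8); with
penalty 3 the smaller model does (11 against 12).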

        Examples, all assuming Model is "y=x1+x2+x3+x4+x5+x6+x7"

Screen with defaults mbest = 5, method = "cp", and penalty = 2:
  Cmd> screen(Model)

Screen for 10 best models using adjusted R^2 with variable x3 forced
into the model:
  Cmd> screen(Model,mbest:10,forced:"x3",method:"adjrsq") # or forced:3

Screen using Cp over models with x6 forced in and penalty factor of 3:
  Cmd> screen(Model,forced:"x6",penalty:3)

Screen using defaults, saving p, cp, and the models, and printing
results:
  Cmd> result <- screen(Model,keep:vector("p","cp","model"),print:T)

  Cmd> regress(result$model[1]) # compute regression with best model

screen() uses a branch and bound algorithm due to Furnival and Wilson.
See their paper, "Regression by Leaps and Bounds", Technometrics 16
(1974), 499-511.

See also regress() and anova().


Gary Oehlert 2003-01-15