screen([Model] [, method:"cp" or "rsq" or "adjrsq", mbest:m, forced:fn,
  s2:mse, penalty:pen, keep:items]), m positive integer, fn vector of
  positive integers, mse and pen positive REAL scalars, items CHARACTER
  scalar or vector with elements "p", "cp", "rsq", "adjrsq", "model",
  "intmodel" or "all".

screen(Model) screens all the regression models based on one or more of
the X variables given in the CHARACTER argument Model.

The best of these models are printed, together with the values of
  Mallows' Cp  = RSS/MSE + 2*p - n,
  multiple R^2 = coefficient of determination, and
  adjusted R^2 = 1 - (n-1)*(1-R^2)/(n-p),
where n is the number of cases and p is the number of coefficients in
the model, including the constant term.

In the definition of Cp, RSS is the residual sum of squares for a model
and MSE is either the residual mean square from the model using all the
variables or the value of optional keyword 's2'.  When optional keyword
'penalty' is used, its value replaces 2 as the multiplier of p in the
definition of Cp.  See below for details on 's2' and 'penalty'.

By default, screen() finds the models with the 5 smallest values of
Mallows' Cp statistic.  This default can be changed by keywords 'method'
and 'mbest'; see below.

Model must not specify a model with no constant term.  For example,
screen("y=x1+x2+x3-1") is illegal.  See topic 'models' for information
on the form of Model.

screen(Model, keep:Items) does the same, except that nothing is printed.
Instead, the information specified by CHARACTER scalar or vector Items
is returned as the value of screen().

screen(Model, keep:Items, print:T) both prints the results and returns
those results specified by Items.

Permissible values of elements of Items:
  "p"         Number of coefficients fit, including constant (intercept)
  "cp"        Mallows' Cp statistic
  "rsq"       Multiple R^2
  "adjrsq"    Adjusted R^2
  "model"     Models selected, in the CHARACTER form expected by regress()
  "intmodel"  Integer matrix with each column containing the indices of
              the variables in one of the selected models
  "all"       All of the above

The values produced by the first 5 of these are vectors with one element
for every model selected; "intmodel" produces a matrix with one column
for every model selected and nv rows, where nv is the number of
independent variables in Model.
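
The "intmodel" layout described above can be sketched outside MacAnova.
The Python fragment below is only an illustration, not screen()'s own
code.  The helper name intmodel_matrix is hypothetical, and the sketch
assumes that variable indices are listed in increasing order down each
column and that unused positions are filled with 0.

```python
def intmodel_matrix(models, nv):
    """Build an nv-row matrix (a list of rows) with one column per model.

    models: list of lists of 1-based variable indices, one list per model.
    nv:     number of independent variables in the full model.
    Assumption: unused positions in a column are padded with 0.
    """
    matrix = [[0] * len(models) for _ in range(nv)]
    for col, variables in enumerate(models):
        # Fill each column from the top with the model's variable indices.
        for row, index in enumerate(sorted(variables)):
            matrix[row][col] = index
    return matrix


# Five hypothetical selected models drawn from 4 candidate variables:
# y=x1, y=x1+x3+x4, y=x2+x3+x4, y=x1+x2, y=x4
selected = [[1], [1, 3, 4], [2, 3, 4], [1, 2], [4]]
for row in intmodel_matrix(selected, nv=4):
    print(row)
```

Each column of the result lists the variables of one model, so the
number of nonzero entries in a column is the number of independent
variables in that model.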
When more than one item is specified, they are returned as components of
a structure with the names given in the list of permissible values of
Items.

For example, keep:"model" might produce the CHARACTER vector
  vector("y=x1","y=x1+x3+x4","y=x2+x3+x4","y=x1+x2","y=x4")
For this case, if there are 4 variables in all in the obvious order,
the "intmodel" value would be the matrix
  [ 1  1  2  1  4]
  [ 0  3  3  2  0]
  [ 0  4  4  0  0]
  [ 0  0  0  0  0]

Other Keywords
  Keyword   Type of value          Default  Meaning
  method    CHARACTER variable     "cp"     Criterion ("cp", "rsq", or
                                            "adjrsq") for subset selection
  mbest     Positive integer       5        Number of subsets to be found
  forced    REAL vector of         none     List of independent variables
            positive integers or            to be forced into all subsets
            independent variable
            names
  s2        Positive REAL number   MSE      Replacement for full model MSE
                                            in computing Cp
  penalty   Positive REAL number   2        Multiplier of p in computing Cp
  print     T or F                          print:T forces printing when
                                            'keep' is used

Keywords 'silent', 'fstats', and 'pvals' are illegal for screen(), and
'print:F' is illegal unless keyword 'keep' is used.

The value of keyword 'method' determines the criterion used to rank
regressions.
  Value     Criterion used   What is better
  "cp"      Mallows' Cp      Smaller
  "rsq"     Multiple R^2     Larger
  "adjrsq"  Adjusted R^2     Larger

The number of subset regressions computed is determined by the value of
'mbest'.  With method:"adjrsq" or method:"cp", exactly mbest regressions
are printed or returned.  With method:"rsq", for each possible number m,
m = 1, 2, ..., nv, of variables, screen() selects the mbest models with
the largest values of R^2, where nv is the number of variables in the
model.  Since only one model uses all nv variables, up to
(nv-1)*mbest + 1 models would be selected in this case.

The value of 'forced' should be a list of names of any variables that
should be forced into the model.  For instance, with
forced:vector("x1", "x2"), all models examined would include x1 and x2.
By default, no variables are forced.
The value of 'forced' can also be a vector of integers, say
forced:vector(1,2).

The value of 'penalty' is used to compute Cp = RSS/MSE + penalty*p - n.
Larger values increase the "penalty" for including additional variables
and tend to produce models with fewer variables.  The default value is 2.

The value of 's2' is the MSE to be used in computing Cp.  If not
specified, the mean square error from the complete model is used.

Examples, all assuming Model is "y=x1+x2+x3+x4+x5+x6+x7"

Screen with defaults mbest = 5, method = "cp", and penalty = 2:
  Cmd> screen(Model)

Screen for 10 best models using adjusted R^2 with variable x3 forced
into the model:
  Cmd> screen(Model,mbest:10,forced:"x3",method:"adjrsq") # or forced:3

Screen using Cp over models with x6 forced in and penalty factor of 3:
  Cmd> screen(Model,forced:"x6",penalty:3)

Screen using defaults, saving p, cp, and the models, and printing
results:
  Cmd> result <- screen(Model,keep:vector("p","cp","model"),print:T)
  Cmd> regress(result$model[1]) # compute regression with best model

screen() uses a branch and bound algorithm due to Furnival and Wilson.
See their paper, Regression by Leaps and Bounds, Technometrics 16 (1974)
499-511.

See also regress() and anova().
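
To make the three criteria and the effect of 'penalty' concrete, here is
a minimal Python sketch of the formulas given in this topic.  It is an
illustration only and does not reproduce screen() itself.  The function
names and all numbers are made up, and R^2 is taken as 1 - RSS/TSS,
where TSS is the corrected total sum of squares (the usual definition
for a model with a constant term).

```python
def r_squared(rss, tss):
    """Multiple R^2 for a model with a constant term."""
    return 1.0 - rss / tss


def adjusted_r_squared(rsq, n, p):
    """Adjusted R^2 = 1 - (n-1)*(1-R^2)/(n-p)."""
    return 1.0 - (n - 1) * (1.0 - rsq) / (n - p)


def mallows_cp(rss, mse, p, n, penalty=2.0):
    """Cp = RSS/MSE + penalty*p - n; penalty=2 gives the usual Cp."""
    return rss / mse + penalty * p - n


# Hypothetical numbers: 20 cases, a full model with 4 coefficients and
# a candidate submodel with 2 coefficients (constant included in both).
n = 20
tss = 100.0
full_rss, full_p = 32.0, 4
sub_rss, sub_p = 40.0, 2

# Without 's2', MSE is the residual mean square of the full model.
mse = full_rss / (n - full_p)

for penalty in (2.0, 3.0):
    cp_sub = mallows_cp(sub_rss, mse, sub_p, n, penalty)
    cp_full = mallows_cp(full_rss, mse, full_p, n, penalty)
    print("penalty", penalty, "submodel Cp", cp_sub, "full model Cp", cp_full)

rsq_sub = r_squared(sub_rss, tss)
print("submodel R^2", rsq_sub, "adjusted", adjusted_r_squared(rsq_sub, n, sub_p))
```

With these made-up numbers the two models tie at Cp = 4 under the
default penalty of 2, while penalty 3 gives 6 for the submodel and 8 for
the full model, so the larger penalty favors the model with fewer
variables.  When MSE comes from the full model and penalty is 2, the
full model's Cp always equals its own p.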

Gary Oehlert 2003-01-15