info MACRO ) This file contains MacAnova macros used in multivariate analysis ) Copyright (C) 2004 by Christopher Bingham and Gary W. Oehlert ) ) Many use features available only in MacAnova4.10 or later ) Some use features available only in MacAnova 4.13 or later )) File version 041216 )) All macros are the "new" format (no line count, macro body terminated )) by %macroname%) ) It contains the following macros. ) Descriptive statistics ) covar distcomp groupcovar standardize ) Mean testing related macros ) hotellval hotell2val ) MANOVA hypothesis testing related macros ) cumpillai cumtrace cumwilks seqF ) Factor analysis related macros ) facanal glscrit * mlcrit * ulscrit * ) glsfactor glsresids + stepuls ** stepgls ** ) stepml ** ulsfactor ulsresids + ) Obsolete factor analysis related macro ) goodfit ++ ) Discriminant analysis related macros ) discrim jackknife discrimquad probsquad ) Subset discriminant analysis related macros ) dastepsetup dastepstatus dasteplook daentervar ) daremovevar davarid dastepselect dascreen ) Obsolete subset discriminant analysis related macros ) compf + forstep + backstep + ) Plotting macros ) chiqqplot ) Other miscellaneous macros ) rmvnorm mulvarhelp ) * Used by facanal() ) ** Obsolete except for use in getting starting values in facanal() ) + Used by glsfactor() or ulsfactor() ) ++ Obsolete )) 001210 added subtopics to help )) 030114 moved standardize() here from MacAnova.mac and wrote help )) added macros facanal() and supporting macros glscrit(), )) mlcrit() and ulscrit() and wrote help for them )) renamed mvngen() to rmvnorm() (left in a stub) )) added keys for most topics )) 030814 added subtopic titles )) 030922 minor correction to help )) 031224 exact match of names on dasteplook() is no longer required )) new macro davarid(), used by daentervar() and daremovevar() )) 031226 dastepsetup() uses pushmodel() before invoking manova() )) New macros dastepselect(), black box driver for dastepsetup(), )) daentervar() and daremovevar() and dascreen(), a macro )) for screening all subsets )) 040106 added distcomp() keywords 'center' and 'quadform' )) 041012 added keyword 'silent' and replaced 'verbose' by 'quiet' in )) dastepselect(); changed default for 'mbest', and shape )) and labelling of output in dascreen() )) 041018 Reorganized cumtrace() and changed default operation; added )) 4-th term and method to choose how many terms to use when )) not specified )) 041026 Fixed bug in cumtrace() introduced on 041018; removed extra ')' )) in distcomp() )) 041111 Added keyword phrase 'nsteps:n' to stepuls(), stepgls(), )) stepml(); use eigen() instead of releigen() in stepgls() )) and stepml() )) 041115 Added keyword phrase 'presteps:n' to facanal() specifying )) starting value obtained by n steps of stepuls(), stepgls() )) or stepml(), default is presteps:1; illegal with start:psi0 )) 041201 Added usage lines to dastepselect() help. )) 041202 Cosmetic change to dasteplook() )) 041216 Changed dastepselect() to use F-statistics to choose candidates )) for entering or removing a variable. Previously it used P-values )) which caused indeterminacy when there were several values of )) P = 0. %info% ===> standardize <=== standardize MACRO DOLLARS ) usage: ) ynew <- standardize(y [,locations [,scales]]) ) y REAL vector of length n or REAL n by p matrix ) locations REAL scalar or vector or row vector with ) length(locations)= ncols(y) (default describe(y,means:T)) ) scalar is equivalent to rep(locations, p) ) MISSING elements replaced by mean ) scales REAL scalar vector or row vector with length(scales) ) = ncols(y) (default describe(y,stddev:T)) ) scalar is equivalent to rep(scales, p) ) MISSING elements replaced by standard deviation ) ynew REAL vector or matrix the same size and shape as y ) )) Version 991219 removed $$ )) 030114 allow scalar locations and/or scales )) allow MISSING elements in locations and/or scales # usage: ynew <- $S(y [,locations [,scales]]) if($v > 3 || $v == 0){ error("usage: $S(y [,locs [,scales])",macroname:F) } @y <- argvalue($1,"$1","real matrix") @p <- ncols(@y) if($v < 3){ @stats <- describe(@y,mean:T, stddev:T) } @loc <- if($v >= 2){ matrix(argvalue($2 ,"argument 2","real matrix")) }else{ matrix(@stats$mean) } @scale <- if($v == 3){ matrix(argvalue($3,"argument 3","real matrix")) }else{ matrix(@stats$stddev) } if (min(dim(@scale)) > 1){ error("argument 2 not REAL vector or row vector") } if (min(dim(@loc)) > 1){ error("argument 2 not REAL vector or row vector") } if (isscalar(@loc)){ @loc <- rep(@loc,@p) } elseif(length(@loc) != @p){ error("length($3) != ncols($1)") } if (isscalar(@scale)){ @scale <- rep(@scale,@p) } elseif(length(@scale) != @p){ error("length($3) != ncols($1)") } if (anymissing(@loc)){ if (!isdefined(@stats)){ @stats <- describe(@y, mean:T, stddev:T) } @J <- ismissing(@loc) @loc[@J] <- @stats$mean[@J] delete(@J) } if (anymissing(@scale)){ if (!isdefined(@stats)){ @stats <- describe(@y, mean:T, stddev:T) } @J <- ismissing(@scale) @scale[@J] <- @stats$stddev[@J] delete(@J) } if(isdefined(@stats)){ delete(@stats) } (delete(@y,return:T) - vector(delete(@loc,return:T))')/\ vector(delete(@scale,return:T))' %standardize% ===> covar <=== covar MACRO DOLLARS ) Macro to compute sample covariance matrix and mean vector ) Usage: ) stats <- covar(x) ) x REAL N by p matrix with no MISSING values ) stats structure(n:N, mean:ybar, covariance:S)\ ) N sample size = nrows(x) ) ybar 1 by p REAL row vector of sample variable means ) S p by p REAL sample variance matrix ) )) Version of 000531, uses tabs(y,covar:T,n:T,mean:T) # usage: d <- $S(x), where x is a REAL matrix with no MISSING values @R <- tabs(argvalue($1,"argument to $S","real matrix nonmissing"),\ mean:T,count:T,covar:T) @R <- structure(n:@R$count[1], mean:@R$mean', covariance:@R$covar) delete(@R, return:T) %covar% ===> groupcovar <=== groupcovar MACRO DOLLARS ) macro to compute group means and pooled sample covariance matrix ) Usage: ) result <- groupcovar(groups, y) ) groups factor or vector of positive integers ) y REAL matrixwith nrows(y) = length(groups) ) ) result structure(n:N, mean:ybar, covariance:S) ) N integer vector of group sizes, length(N) = g = ) max(groups) ) ybar g by ncols(y) matrix of group mean vectors, with ) MISSING values for any empty groups ) S pooled sample variance-covariance matrix )) 000601 new version using tabs(y,means:T,count:T,covar:T) )) 011024 fixed problem with factor groups with max(groups) < nfactors @groups <- vector(argvalue($1, "group numbers", "positive integer vector")) @n <- tabs(,@groups) if (max(@n) < 2){ error("no groups have 2 or more cases") } @R <- tabs(argvalue($2,"argument to $S","real matrix nonmissing"),\ delete(@groups, return:T), mean:T, covar:T) @ng <- length(@n) @cov <- 0 for(@i,1,@ng){ if (@n[@i] > 1){ @cov <-+ (@n[@i]-1)*matrix(@R$covar[@i,,]) } } @cov <-/ sum(@n[@n > 0] - 1) delete(@ng,@i) @R <- structure(n:delete(@n,return:T),means:@R$mean, \ covariance:delete(@cov,return:T)) delete(@R,return:T) %groupcovar% ===> distcomp <=== distcomp MACRO DOLLARS ) Macro to compute vector of generalized distances ) Usage: ) d <- distcomp(y [,center:cent] [,quadform:Q]) ) y REAL N by p data matrix with no MISSING values ) cent REAL vector or row vector of length p with no ) MISSING values; default: cent = ybar = sum(y)/N ) Q REAL positive definite symmetric p by p matrix, ) default: Q = S = ((y-ybar)' %*% (y-ybar))/(N-1) ) d REAL vector with length N ) ) d[i] is (y[i,] - cent)' %*% solve(Q) %*% (y[i,] - cent), ) When needed, tabs() or describe() are used to compute ybar and S ) )) Version of 000826, uses tabs() instead of macro covar() )) 040105 added keywords 'center' and 'quadform' # usage: d <- $S(y) where y is a REAL matrix with no MISSING values @y <- matrix(argvalue($1, "data matrix", "real matrix nonmissing")) @cent <- keyvalue($K, "cent*","nonmissing real matrix") @Q <- keyvalue($K,"quad*","nonmissing real square") @p <- ncols(@y) if (!isnull(@cent)){ if (!isvector(@cent)) { if (nrows(@cent) != 1) { error("center not column or row vector") } @cent <- @cent' } if (length(@cent) != @p){ error("length(center) != ncols(y)") } } if (isnull(@Q)){ @Q <- if (isnull(@cent)){ @stats <- tabs(@y,covar:T,mean:T) @cent <- @stats$mean delete(@stats,return:T)$covar } else { tabs(@y,covar:T) } } else { if (nrows(@Q) != @p){ error("quadform matrix has wrong dimensions") } @testval <- max(abs(vector(@Q - @Q'))) if (@testval > 1e-10*max(abs(vector(@Q)))) { error("quadratic form matrix not symmetric") } if (delete(@testval, return:T) != 0){ @Q <- (@Q + @Q')/2 } @vals <- eigenvals(@Q) if (anytrue(@vals[1] <= 0, @vals[@p]/@vals[1] <= 1e-12)) { error("quadratic form matrix not positive definite") } delete(@vals) if (isnull(@cent)) { @cent <- describe(y, mean:T) } } @y <-- delete(@cent,return:T)' @dsq <- vector((@y * (@y %/% delete(@Q,return:T))) %*% rep (1,@p)) if (haslabels(@y)){ setlabels(@dsq, getlabels(@y,1)) } delete(@p, @y) delete(@dsq, return:T) %distcomp% ===> chiqqplot <=== chiqqplot MACRO DOLLARS ) Macro to make a chisquare or sqrt(chisquare) QQ plot of generalized ) distance of residuals from 0 ) usage: ) chiqqplot(dsq, df [,sqrt:T] [, symbol:sym]) ) dsq REAL vector or matrix of squared distances >= 0 ) df REAL scalar > 0 ) or ) chiqqplot(y [,sqrt:T] [, symbol:sym]) ) y REAL data matrix with no MISSING values from which ) a vector dsq of squared generalized distances from ) the mean vector and df = ncols(y). ) ) Ordered values of dsq are plotted against equally spaced quantiles ) of chi-squared on df degrees of freedom. When dsq is a matrix, each ) column is plotted ) ) With 'sqrt:T', sqrt(dsq) is plotted against sqrt(chi-squared quantiles) ) ) With symbol:0, when ncols(d) = 1, the symbols are the case numbers, and ) when ncols(d) > 1, the symbols are the column numbers. Otherwise, ) symbol:sym works as for plot(). ) ) The usual graphics keywords such as 'xlab', 'ylab' and 'title' may also ) be used ))version 000601 #$S(d, p [, sqrt:T] [,symbols:T] [,graphics keyword phrases]) if ($v > 2){ error("usage: $S(y [keyword phrases]) or $S(d, df [keyword phrases]") } if ($v == 1){ # need to compute vector of distances if (!ismacro(covar)){ getmacros(covar,quiet:T,printname:F) } if (!ismacro(distcomp)){ getmacros(distcomp,quiet:T,printname:F) } @d <- matrix(argvalue($1,"data","nonmissing real matrix")) @df <- ncols(@d) @d <- distcomp(@d) }else{ # arg 1 is matrix of distance; arg2 is d.f. @d <- matrix(argvalue($1,"distances","nonmissing real matrix")) @df <- argvalue($2,"degrees of freedom","positive number") } @sqrt <- keyvalue($K,"sqrt","TF",default:F) @n <- nrows(@d) @m <- ncols(@d) @J <- grade(@d) @ch <- vector("\1","\2","\3","\4","\5","\6","\7","\10")[1 + run(0,@m-1)%%8] @symbols <- keyvalue($K,"symbol*","matrix",default:@ch) if (!anytrue(ischar(@symbols),\ alltrue(isreal(@symbols), sum(vector(@symbols!=floor(@symbols))) == 0))){ error("Value of 'symbols' not CHARACTER or integer") } if (alltrue(isscalar(@symbols,real:T),@symbols == 0)){ @symbols <- if (@m == 1){ @J }else{ run(@m)' } } @d <- sort(@d) @vals <- invchi((run(@n) - .5)/@n,@df) if (@sqrt){ @d <- sqrt(@d) @vals <- sqrt(@vals) @what <- "Chi" @ylab <- "Distances" }else{ @what <- "Chi-squared" @ylab <- "Squared distances" } @xlab <- paste(@what,"(",@df,") probability points",sep:"") @title <- paste(@what, "Q-Q plot of $1") plot(delete(@vals,return:T), delete(@d,return:T),\ symbols:delete(@symbols,return:T), $K,\ xlab:delete(@xlab,return:T),ylab:delete(@ylab,return:T),\ title:delete(@title,return:T),xmin:0, ymin:0) %chiqqplot% ===> rmvnorm <=== rmvnorm macro dollars ) Macro to compute multivariate normal sample ) Usage: ) y = rmvnorm(n , p [,mu]) ) n REAL integer > 0 ) p REAL integer > 0 ) mu nonMISSING REAL row or column vector of length p, ) default is mu = rep(0,p) ) y random REAL n by p matrix with rows i.i.d ) MVN(mu, I_p) ) y = rmvnorm(n, sigma [,mu]) ) sigma positive p by p definite symmetrix matrix with p > 1 ) y random REAL n by p matrix with rows i.i.d ) MVN(mu, sigma) ) ) Use mu + sigma*rnorm(n) to generate a univariate sample ) )) Version 991120, C. Bingham, kb@stat.umn.edu )) 030406 fixed bug when mu not an argument #$S(n,sigma [,mu]) @n <- argvalue($1,"sample size","positive count") @sigma <- argvalue($2,"covariance matrix","nonmissing real square") if (isscalar(@sigma)){ @p <- argvalue(@sigma,"dimension","positive count") @sigma <- NULL }else{ if(max(abs(vector(@sigma-@sigma')/vector(@sigma))) > 1e-12){ error("Covariance matrix not symmetric") } @sigma <- cholesky(@sigma, nonposok:T) if (isnull(@sigma)){ error("covariance not positive definite") } @p <- nrows(@sigma) } @mu <- if ($v > 2){ matrix(argvalue($3,"mean vector","nonmissing real matrix")) }else{ rep(0,@p)' } if (min(dim(@mu)) > 1){ error("mu not row or column vector") } if(length(@mu) != @p){ error("mean vector length != nrows(sigma)") } @y <- matrix(rnorm(@n*@p),@n) if (!isnull(@sigma)){ @y <- @y %*% @sigma } delete(@n,@sigma,@p) delete(@y,return:T) + vector(delete(@mu, return:T))' %rmvnorm% ===> mvngen <=== mvngen MACRO ) alias for rmvnorm() to maintain backward compatibility if (!ismacro(rmvnorm)){ getmacros(rmvnorm, silent:T) } rmvnorm($0) %mvngen% ===> cumpillai <=== cumpillai macro ) alias for cumtrace(tr,fh,fe,p,pillai:T ...) to compute ) probabilities for Pillai's trace test )) 031230 written by C. Bingham, kb@umn.edu #$S(tr, fh, fe, p [,upper:T] [,nterms:m] [,all:T]) if (!ismacro(cumtrace)){ getmacros(cumtrace, silent:T) } cumtrace($0,pillai:T) %cumpillai% ===> cumtrace <=== cumtrace macro dollars ) Macro to compute cumulative distribution of MANOVA trace tests ) T_0^2/fe = trace(solve(E,H)) = sum(vals) (Hotelling) ) V/(fe+fh) = trace(solve(E+H,H)) = sum(vals/(1+vals)) (Pillai) ) where vals is the vector of eigenvalues of H relative to E. ) ) Usage: ) cumP <- cumtrace(tr, fh, fe, p [,upper:T] [,nterms:m or useF:T]) ) cumP <- cumtrace(tr, fh, fe, p, pillai:T [,upper:T] [,nterms:m or useF:T]) ) allF <- cumtrace(tr, fh, fe, p [,pillai:T] [,upper:T], useF:T, all:T) ) allP <- cumtrace(tr, fh, fe, p [,pillai:T] [,upper:T], all:T) ) tr Positive REAL scalar, vector, matrix or array with ) no MISSING values; see below. ) fh, fe, p Positive integer scalars, vectors, matrices or arrays ) pillai:T Argument 1 is trace(solve(H+E,H)) = V/(fe+fh) where V = ) Pillai statistic; otherwise it's trace(solve(E,H)) = ) T_0^2/fe, T_0^2 = Hotelling statistic ) m 1, 2, 3 or 4, number of terms of series used; ignored ) with all:T; illegal with useF:T ) useF:T Use approximate F-distribution; illegal with nterms:m ) cumP REAL scalar or vector, matrix or array of probabilities ) with dimensions matching those any non-scalar tr ) fh, fe, or p. ) allF structure(P:cumP, f:fstat, df1:dfnum, df2:dfdenom) ) allP structure(P:cumPStr, terms:termStr) ) cumPStr structure(P_1:cumP1, P_2:cumP2, P_3:cumP3, P_4:cumP4) ) termStr structure(Term_1:t1, Term_2:t2, Term_3:t3, Term_4:t4) ) ) All non scalar tr, fh, fe or p must have identical dimensions ) ) Without pillai:T (or with pillai:F), ) tr = T_0^2/fe = trace(solve(E) %*% H)) = sum(vals) ) where T_0^2 is Hotelling's generalized T^2 ) ) With pillai:T, ) tr = V/(fe+fh) = trace(solve(E+H) %*% H) = sum(vals/(1 + vals)) ) where V = Pillai's trace statistic ) ) H and E are p by p MANOVA hypothesis and error matrices with degrees ) of freedom fh and fe and vals are the eigenvalues of H relative to E ) (eigenvalues of solve(E,H)) ) ) With upper:T, upper tail probability is computed (usually what is ) wanted) ) ) With neither useF:T or nterms:m, 1 to 4 terms of Fujikoshi's series ) is used, the choice depending on the size of the terms, except ) when min(fh,p) = 1, when the exact value based on F is computed. ) ) The first term of Fujikoshi's series is the usual chi-squared ) approximation, cumchi(mj*tr, fh*p [,upper:T]), where ) mj = m2 = fe - p - 1, without pillai:T ) mj = m3 = fh + fe, with pillai:T ) The j-th term of the series is O(1/mj^(j-1)). ) ) With nterms:m, probabilities are computed using m terms of Fujikoshi's ) series, even when min(p,fh) = 1 ) ) With all:T, but without useF:T, the component P of structure ansP ) contains the probabilities computed with 1, 2, 3 and 4 terms of ) Fujikoshi's series; component terms contains the actual terms ) themselves. ) ) With useF:T, probabilities are computed from approximations based ) on the F-distribution which are used in SAS and SPSS which are ) exact when min(p,fh) = 1. With all:T, the returned structure ) contains the F-statistics and their degrees of freedom in addition ) to the probabilities. Except when min(p,fh) = 1, the probabilities ) appear to be less accurate than those from Fujikoshi's series. ) ) Examples ) Cmd> list(Y,groups) # response matrix and factor ) groups REAL 50 FACTOR with 4 levels ) Y REAL 50 4 ) ) Cmd> manova("Y = groups",silent:T) ) ) Cmd> h <- matrix(SS[2,,]);fh <- DF[2] ) ) Cmd> e <- matrix(SS[3,,]);fe <- DF[3] ) ) Cmd> p <- ncols(Y); vector(fh,fe,p) ) (1) 3 46 4 ) ) Cmd> trhotelling <- trace(solve(e,h)); trhotelling ) (1) 0.77 ) ) Cmd> trpillai <- trace(solve(h+e,h)); trpillai ) (1) 0.48115 ) ) Cmd> cumtrace(trhotelling,fh,fe,p,upper:T) # default ) (1) 0.0047263 ) ) Cmd> cumtrace(trpillai,fh,fe,p,pillai:T,upper:T) ) (1) 0.013864 ) ) Cmd> cumtrace(trhotelling,fh,fe,p,upper:T,all:T) ) component: P ) component: P_1 ) (1) 0.0016117 [usual chi-squared approximaition] ) component: P_2 ) (1) 0.003814 ) component: P_3 ) (1) 0.0047398 ) component: P_4 ) (1) 0.0047263 [value returned by default usage above] ) component: terms ) component: Term_1 ) (1) 0.0016117 ) component: Term_2 ) (1) 0.0022023 ) component: Term_3 ) (1) 0.00092578 ) component: Term_4 ) (1) -1.3487e-05 ) ) Cmd> cumtrace(trpillai,fh,fe,p,upper:T,pillai:T,nterms:1) ) (1) 0.023213 [usual chi-squared approximation] ) ) Cmd> cumtrace(trhotelling,fh,fe,p,upper:T,useF:T) # F-approximation ) (1) 0.0031321 [F-approximation probability] ) ) Cmd> cumtrace(trpillai,fh,fe,p,upper:T,useF:T,all:T) ) component: P ) (1) 0.081057 [F-approximation probability] ) component: f ) (1) 1.6707 ) component: df1 ) (1) 12 ) component: df2 ) (1) 125 ) ) From a simulation using 100,000 samples, the P-value for ) trhotelling is estimated as 0.00451 and for trpillai ) is estimated to be 0.01431. These are quite close to ) the values computed by the default usage of cumtrace() and ) less close to the value computed with useF:T ) ) See Fujikoshi, Yasunori (1973) Asymptotic formulas for the ) distributions of three statistics for multivariate linear ) hypothesis, Ann. Inst. Statist. Math. 25 423-437 )) 031015 written by C. Bingham, kb@umn.edu )) 031017 corrected bug )) 031229 added all:T, other minor changes )) 031230 added pillai:T to handle Pillai's trace )) 041018 added order m^-3 term )) always uses n terms with nterms:n )) returns terms and partial sums with all:T and without useF:T )) Otherwise, without useF:T, for items with s > 1, )) finds appropriate truncation of Fujikoshi's series )) 041026 Fixed bug that forced using just 1 term #$S(tr, fh, fe, p [,pillai:T] [,upper:T] [,nterms:m] [,all:T]) @trace <- argvalue($1, "$1", "positive") @fh <- argvalue($2, "fh", "positive integer") @fe <- argvalue($3, "fe", "positive integer") @p <- argvalue($4, "p", "positive integer") @upper <- keyvalue($K,"up*","TF",default:F) @nterms <- keyvalue($K,"nterm*","positive count") @useF <- keyvalue($K,"useF","TF",default:F) @pillai <- keyvalue($K,"pillai*","TF",default:F) @all <- keyvalue($K,"all","TF",default:F) if (@useF) { if (!isnull(@nterms)){ error("'useF:T' illegal with 'nterms:m'") } } else { if (isnull(@nterms)){ @nterms <- -4 } elseif (@all) { print("WARNING: 'nterms' ignored with 'all:T'",macroname:T) } elseif (@nterms > 4) { error("nterms > 4 not implemented") } } @names <- vector("trace","fh","fe","p") @args <- structure(@trace,@fh,@fe,@p) for (@i,1,3) { for(@j,@i+1,4) { @ai <- @args[@i] @aj <- @args[@j] if (!isscalar(@ai) && !isscalar(@aj) &&\ anytrue(ndims(@ai) != ndims(@aj),\ sum(dim(@ai) != dim(@aj)) != 0)) { error(paste(@names[@i],"and",@names[@j],"dimension mismatch")) } } } @dim <- 1 for (@i,1,4) { if (!isscalar(@args[@i])) { @dim <- dim(@args[@i]) break } } delete(@i,@j,@ai,@aj,@names,@args) @N <- prod(@dim) if (prod(@dim) > 1) { if (isscalar(@trace)) { @trace <- array(rep(@trace,@N),@dim) } if (isscalar(@fh)) { @fh <- array(rep(@fh,@N),@dim) } if (isscalar(@fh)) { @fe <- array(rep(@fe,@N),@dim) } if (isscalar(@p)) { @p <- array(rep(@p,@N),@dim) } } delete(@N,@dim) @df1 <- @p*@fh @s <- (@fh + @p - abs(@fh - @p))/2 # min(p,fh) @JF <- if (@useF){ rep(T,length(@trace)) } elseif (@all || @nterms < 0) { rep(F,length(@trace)) } else { vector(@s == 1) } @JG <- !@JF if (max(@JF) > 0) { # use F approximation like SAS and SPSS #@m <- (abs(@fh - @p) - 1)/2 @n <- (@fe - @p - 1)[@JF]/2 if (!@pillai) { # 2*(s*n+1)*trace(solve(e,h)) / (s^2*(2*m+s+1)) # ~= F(s*(2*m+s+1), 2*(s*n+1)) # and s*(2*m+s+1) = fh*p @df2 <- 2*(@s[@JF]*@n + 1) @f <- ((@df2/@df1[@JF])*(@trace/@s)[@JF]) } else { @df2 <- @s[@JF]*(2*@n + @s[@JF] + 1) @f <- (@df2/@df1[@JF])*(@trace/(@s - @trace))[@JF] } @probsF <- cumF(@f,@df1[@JF],@df2,upper:@upper) if (@all) { @result <- structure(P:@probsF, f:@f, df1:@df1, df2:@df2) } delete(@n,@s,@f,@df2) } if (max(@JG) > 0) { # use Fujikoshi series @mj <- if (!@pillai) { (@fe - @p - 1)[@JG] # m2 } else { (@fh + @fe)[@JG] # m3 } @x <- @mj * @trace[@JG] @f <- @df1[@JG] @P0 <- @probsG <- cumchi(@x, @f,upper:@upper) # term 1 if (@all || abs(@nterms) > 1) { if (@all) { @P <- structure(P_1:@P0, P_2:NULL, P_3:NULL, P_4:NULL) @terms <- structure(Term_1:@P0, Term_2:NULL, Term_3:NULL, \ Term_4:NULL) } @gam <- (@p + @fh + 1)[@JG] # Compute term 2 @P2 <- cumchi(@x, @f + 2, upper:@upper) @P4 <- cumchi(@x, @f + 4, upper:@upper) @term <- (@f*@gam)*(@P4 - 2*@P2 + @P0)/(4*@mj) if (@pillai) { @term <- -@term } if (@all) { @P[2] <- @P[1] + @term @terms[2] <- @term } elseif (@nterms > 1){ @probsG <-+ @term } elseif (@nterms < 0) { @term2 <- @term } if (@all || abs(@nterms) > 2) { # Compute term 3 #Fujikoshi has following in the form (p+fh+1) + p*fh + 2 @g <- ((@p+1)*(@fh+1))[@JG] + 2 @h0 <- (3*@f - 8)*@gam^2 + 4*@g @h1 <- 12*@f*@gam^2 @h2 <- 6*(3*@f + 8)*@gam^2 @h3 <- 4*((3*@f + 16)*@gam^2 + 4*@g) @h4 <- (3*@f + 24)*@gam^2 + 12*@g # h's should sum to 0 @P6 <- cumchi(@x, @f + 6,upper:@upper) @P8 <- cumchi(@x, @f + 8,upper:@upper) @term <- @f*(@h0*@P0 - @h1*@P2 + @h2*@P4 - \ @h3*@P6 + @h4*@P8)/(96*@mj^2) if (@all) { @P[3] <- @P[2] + @term @terms[3] <- @term } elseif (@nterms > 2) { @probsG <-+ @term } elseif (@nterms < 0) { @term3 <- @term } if (@all || abs(@nterms) > 3) { # Compute term 4 # order fe^-3 terms @g0 <- @gam*((@f-4)^2*@gam^2 + \ 4*(@f-4)*@gam + 4*(@f-4)*(@f+2)) @g1 <- 2*@f*@gam*@h0 @g2 <- @f*@gam*((15*@f+40)*@gam^2 + 4*@gam + 4*(@f+2)) @g3 <- 4*@gam*(5*(@f+4)^2*@gam^2 + \ 4*(@f+4)*@gam + \ 4*(@f+4)*(@f+2)) @g4 <- 5*rational(@f,vector(144,40,3))*@gam^3 + \ (44*@f+432)*@gam^2 + \ 4*rational(@f,vector(288,130,11))*@gam + 96*(@f + 2) @g5 <- 2*(rational(@f,vector(288,56,3))*@gam^3 + \ 4*(5*@f + 72)*@gam^2 + \ 4*rational(@f,vector(216,82,5))*@gam +\ 96*(@f+2)) @g6 <- rational(@f,vector(160,24,1))*@gam^3 + \ 4*(3*@f + 56)*@gam^2 + \ 4*rational(@f,vector(184,62,3))*@gam +\ 96*(@f+2) # @g0 - @g1 + @g2 - @g3 + @g4 - @g5 + @g6 = 0 @P10 <- cumchi(@x, @f + 10,upper:@upper) @P12 <- cumchi(@x, @f + 12,upper:@upper) @term <- @f*(@g0*@P0 - @g1*@P2 + @g2*@P4 - @g3*@P6 +\ @g4*@P8 - @g5*@P10 + @g6*@P12)/(384*@mj^3) if (@pillai) { @term <- -@term } if (@all) { @P[4] <- @P[3] + @term @terms[4] <- @term @result <- structure(P:delete(@P,return:T),\ terms:delete(@terms, return:T)) } elseif (@nterms > 3) { @probsG <-+ @term } elseif (@nterms < 0) { # Work backwards, adding terms and earlier terms # if a term is smaller than preceding term @L <- vector(abs(@term) <= abs(@term3)) if (max(@L) > 0){ # include term 4 @probsG[@L] <- @probsG[@L] + @term[@L] } @L <- @L || vector(abs(@term3) <= abs(@term2)) if (max(@L) > 0){ # include term 3 @probsG[@L] <- @probsG[@L] + @term3[@L] } @L <- @L || vector(abs(@term2) <= (abs(1-2*@probsG) + 1)/2) if (max(@L) > 0){ # include term 2 @probsG[@L] <- @probsG[@L] + @term2[@L] } @L <- vector(@probsG < 0 || @probsG > 1) if (max(@L) > 0) {#if use term 1 if out of range @probsG[@L] <- @P0[@L] } delete(@term2,@term3,@L) } delete(@g0,@g1,@g2,@g3,@g4,@g5,@g6,@P10,@P12) } delete(@h0,@h1,@h2,@h3,@h4,@g) delete(@P6,@P8) } delete(@gam, @term, @P2, @P4) } delete(@P0,@x,@mj) } if (!@all) { @result <- @trace # ensures dimensions are OK if (max(@JG) > 0) { @result[@JG] <- delete(@probsG, return:T) } if (max(@JF) > 0) { @result[@JF] <- delete(@probsF, return:T) } } delete(@trace,@fh,@fe,@upper,@nterms, @useF, @pillai, @df1) delete(@result,return:T) %cumtrace% ===> cumwilks <=== cumwilks macro dollars ) Macro to compute cumulative distribution of MANOVA LR ) statistic lambda* = det(E)/det(H+E) = 1/prod(1+vals) ) It uses either an F-approximation (exact for min(p,fh) <=2) or ) a series in powers of 1/m1^2, m1 = fe - (p - fh + 1)/2. ) Usage: ) cumP <- cumwilks(lambdastar, fh, fe, p [,upper:T] [,useF:F] [,nterms:m]) ) allF <- cumwilks(lambdastar, fh, fe, p, all:T, [,upper:T]) ) allP <- cumwilks(lambdastar, fh, fe, p, all:T, useF:F [,upper:T]) ) lambdastar Positive REAL scalar, vector, matrix or array with ) no MISSING values whose elements should be ) det(E)/det(H+E) = 1/prod(1+vals) = (LR)^(2/N), where ) vals are the relative eigenvalues of H relative to E ) and LR is the unadjusted likelihood ratio statistic ) -N*log(lambdastar) is the usual -2*log(lambda) form ) fh, fe, p Positive integer scalars, vectors, matrices or arrays, ) designating hypothesis and error degrees of freedom and ) the dimension ) useF:F Probabilities are computed from a series involving ) chi-square rather than from F ) nterms:m m = 1, 2, 3, 4 = number of terms of series involving ) chi-square; implies useF:F; not legal with useF:T ) all:T Output is structure with additional information ) cumP REAL scalar or vector, matrix or array of lower (default) ) pr upper tail probabilities with dimensions matching ) those of any non-scalar lambdastar, fh, fe, or p. ) allF structure(P:cumP, f:fstats, df1:numdf, df2:denomdf) ) allP structure(P:cumPStr, terms:termStr) ) cumPStr structure(P_1:cumP1, P_2:cumP2, P_3:cumP3, P_4:cumP4) ) termStr structure(Term_1:t1, Term_2:t2, Term_3:t3, Term_4:t4) ) ) All non scalar lambdastar, fh, fe or p must have identical dimensions ) ) With upper:T, upper tail probability is computed (usually not ) what's wanted since small lamdastar is evidence against H_0) ) ) With useF:T or without either useF:F or nterms:m ) P is computed from Rao's formula approximating the distribution ) of (1 - lambdastar^(1/t))/lambdastar^(1/t) by that of a multiple of F, ) where t is computed from fh, fe and p. This is exact when ) s = min(p,fh) <= 2. ) ) With all:T, ans = structure(P:P,f:fstat,df1:df1,df2:df2) ) ) With nterms:m ) P is computed from m terms of a series computing probabilities ) for x = -m1*log(lambdastar), m1 = fe - (p - fh + 1)/2. ) ) With m = 1, x is treated as chi-squared(p*fh), the standard ) approximation. With m = 2, 3 or 4, the chi-squared probability is ) adjusted by m-1 additional terms of magnitudes O(1/m1^2), O(1/m1^4), ) and O(1/m1^6). ) ) The series is always used with nterms:m, even when s = min(p,fh) <= 2 ) ) With useF:F but without nterms:m ) Probablilities are computed from F when s = min(p,fh) <= 2; otherwise ) they are computed from the series, using from 1 to 4 terms, the number ) chosen depending on the relative sizes of the terms ) ) With all:T, allP = structure(P:cumPStr, terms:termStr) where cumPstr = ) structure(P_1:cumP1, P_2:cumP2, P_3:cumP3, P_4:cumP4) and termStr = ) structure(Term_1:t1, Term_2:t2, Term_3:t3, Term_4:t4), where cumP1,..., ) cumP4 are the 1, ...,4 term approximations and t1,...,t4 are the ) terms themselves. ) ) Example, ) Cmd> manova("Y=groups",silent:T) ) ) Cmd> fh <- DF[2]; fe <- DF[3], p <- ncols(Y) ) ) Cmd> vector(fh,fe,p) ) (1) 3 46 4 ) ) Cmd> h <- matrix(SS[2,,]); e <- matrix(SS[3,,]) ) ) Cmd> lambdastar <- det(e)/det(e + h);lambdastar ) (1) 0.54836 ) ) Cmd> cumwilks(lambdastar,fh,fe,p) # Rao's F-approximation ) (1) 0.0077151 ) ) Cmd> cumwilks(1/prod(1+releigenvals(h,e)),fh,fe,p) # same ) (1) 0.0077151 ) ) Cmd> cumwilks(lambdastar,fh,fe,p,all:T) ) component: P ) (1) 0.0077151 ) component: f ) (1) 2.4232 ) component: df1 ) (1) 12 ) component: df2 ) (1) 114.06 ) ) Cmd> cumwilks(lambdastar,fh,fe,p,useF:F) # Rao/Box series ) (1) 0.0077152 ) ) Cmd> cumwilks(lambdastar,fh,fe,p,nterms:1) # use 1st term (chisq) ) (1) 0.0076322 ) ) Cmd> cumwilks(lambdastar,fh,fe,p,nterms:2) # use two terms ) (1) 0.0077147 ) ) Cmd> cumwilks(lambdastar,fh,fe,p,useF:F,all:T) ) component: P ) component: P_1 ) (1) 0.0076322 ) component: P_2 ) (1) 0.0077147 ) component: P_3 ) (1) 0.0077152 ) component: P_4 ) (1) 0.0077152 ) component: terms ) component: Term_1 ) (1) 0.0076322 ) component: Term_2 ) (1) 8.2563e-05 ) component: Term_3 ) (1) 4.444e-07 ) component: Term_4 ) (1) 2.1917e-09 ) ) See the following references ) Fujikoshi, Yasunori (1973) Asymptotic formulas for the ) distributions of three statistics for multivariate linear ) hypothesis, Ann. Inst. Statist. Math. 25 423-437 ) Box, G.E.P (1949) A general distribution theory for a class ) of likelihood criteria, Biometrika 36, 317-346 ) Rao, C.R. (1948) Tests of significance in multivariate analysis ) Biometrika 35, 467-480 )) 031015 written by C. Bingham, kb@umn.edu )) 031229 minor modifications and change of defaults; it no longer )) recognizes 'series:T' but does recognize 'useF:F' )) With nterms:m and all:T, result has all three terms )) 031230 Except with all:T, always uses exact distribution based on )) F when min(p,fh) = 1 )) 041019 Corrected formulas involved in O(1/m2^4) term from Box )) 041020 Added O(1/m2^3) term from Box and modified variable names )) to be closer to Box; @a2, @a4, @a6 are his alpha_2, alpha_4, )) alpha_6; @a2 was previously @b1 and @a4 + @a2^2/2 was @b2 )) With all:T and useF:F, result is structure with probabilities )) with O(1/m2^2), O(1/m2^4), O(1/m2^6), and O(1/m2^8) )) accuracy and the O(1), O(1/m2^2), O(1/m2^4), O(1/m2^6) terms )) With nterms:m, always uses series even for min(p.fh) <= 2 #$S(lambdastar, fh, fe, p [,upper:T] [,chisq:T [,nterms:m]]) @lamstar <- argvalue($1, "lambdaStar", "positive") @fh <- argvalue($2, "fh", "positive integer") @fe <- argvalue($3, "fe", "positive integer") @p <- argvalue($4, "p", "positive integer") @upper <- keyvalue($K,"upper","TF",default:F) @useF <- keyvalue($K,"useF","TF") @nterms <- keyvalue($K,"nterm*","positive count") @all <- keyvalue($K,"all","TF",default:F) if (!isnull(@nterms)){ if (@all) { print("WARNING: 'nterms' ignored with 'all:T'",macroname:T) } if (isnull(@useF)){ @useF <- F } elseif (@useF) { error("useF:T illegal with nterms:m") } if (@nterms > 4) { error("nterms > 4 not implemented") } } elseif (isnull(@useF)){ @useF <- T } elseif (!@useF) { @nterms <- (2*@all - 1)*4 } @names <- vector("lambda*","fh","fe","p") @args <- structure(@lamstar,@fh,@fe,@p) for (@i,1,3) { # Check dimensions of 1st 4 arguments for(@j,@i+1,4) { @ai <- @args[@i] @aj <- @args[@j] if (!isscalar(@ai) && !isscalar(@aj) &&\ anytrue(ndims(@ai) != ndims(@aj),\ sum(dim(@ai) != dim(@aj)) != 0)) { error(paste(@names[@i],"and",@names[@j],"dimension mismatch")) } } } @dim <- 1 for (@i,1,4) { if (!isscalar(@args[@i])) { @dim <- dim(@args[@i]) break } } delete(@i,@j,@ai,@aj,@names,@args) @N <- prod(@dim) if (prod(@dim) > 1) { # replicated any scalar args @lamstar <- if (isscalar(@lamstar)) { rep(@lamstar,@N) } else { vector(@lamstar) } @fh <- if (isscalar(@fh)) { rep(@fh,@N) } else { vector(@fh) } @fe <- if (isscalar(@fh)) { rep(@fe,@N) } else { vector(@fe) } @p <- if (isscalar(@p)) { rep(@p,@N) } else { vector(@p) } } @Jbad <- @fe < @p @N1 <- @N - sum(@Jbad) if (@N1 != @N) { if (@N1 == 0) { error("all values have fe < p") } print(paste("WARNING:",@N - @N1,"values have fe < p"),macroname:T) @lamstar <- @lamstar[!@Jbad] @fh <- @fh[!@Jbad] @fe <- @fe[!@Jbad] @p <- @p[!@Jbad] } @df1 <- @p*@fh @m1 <- @fe - (@p - @fh + 1)/2 @JF <- if (@useF) { rep(T,@N1) } elseif (@all || @nterms > 0) { rep(F,@N1) } else { # useF:F without nterms:m vector(@p <= 2 || @fh <= 2) # cases where F is exact } @JG <- !@JF # @JF is selector for values where F will be used # @JG is selector for values where chi-square or series will be used if (max(@JF) > 0) { # use Rao's formula @df1F <- @df1[@JF] @t <- rep(1,@N1) @J <- (@p != 1 && @fh != 1)[@JF] if (max(@J) > 0) { @t[@J] <- sqrt((@df1F^2 - 4)[@J]/(@p^2 + @fh^2 - 5)[@JF][@J]) } @df2F <- (@m1[@JF]*@t - @df1F/2 + 1) @fstats <- (@df2F/@df1F)*(@lamstar[@JF]^(-1/@t) - 1) @probsF <- cumF(@fstats,@df1F,@df2F,upper:!@upper) delete(@t) if (@all) { if (@N == @N1) { @fstats <- array(@fstats, @dim) @probsF <- array(@probsF, @dim) @df1F <- array(@df1F, @dim) @df2F <- array(@df2F, @dim) } else { @tmp <- @fstats @fstats <- array(rep(?,@N),@dim) @fstats[!@Jbad] <- @tmp @tmp <- @probsF @probsF <- @fstats @probsF[!@Jbad] <- @tmp @tmp <- @df1F @df1F <- @fstats @df1F[!@Jbad] <- @tmp @tmp <- @df2F @df2F <- @fstats @df2F[!@Jbad] <- delete(@tmp,return:T) } @result <- structure(P:@probsF,f:@fstats, df1:@df1F,df2:@df2F) } delete(@fstats,@df1F,@df2F,@J) } if (max(@JG) > 0) { # use chi-squared approximation or Fujikoshi series @f <- @df1[@JG] @m1 <- @m1[@JG] @fh <- @fh[@JG] @p <- @p[@JG] @x <- -@m1 * log(@lamstar[@JG]) @probsG <- cumchi(@x, @f,upper:!@upper) if (@all){ @tmp <- array(rep(?,@N),@dim) @tmp[!@Jbad] <- @probsG @P <- structure(P_1:@tmp, P_2:NULL, P_3:NULL, P_4:NULL) @terms <- structure(Term_1:@tmp, Term_2:NULL, Term_3:NULL, \ Term_4:NULL) } if (abs(@nterms) > 1 || @all) { #@Pk is cumchi(@x, @f + k, upper:!@upper),k=0,2,4,... @P0 <- @probsG @P4 <- cumchi(@x, @f + 4,upper:!@upper) @a2 <- @f*(@p^2 + @fh^2 - 5)/48 @term <- @a2*(@P4 - @P0)/@m1^2 if (@nterms < 0) { @term2 <- @term } else { @probsG <-+ @term } if (@all) { @tmp[!@Jbad] <- @probsG @P[2] <- @tmp @tmp[!@Jbad] <- @term @terms[2] <- @tmp } if (@all || abs(@nterms) > 2) { @P8 <- cumchi(@x, @f + 8,upper:!@upper) # Rao 1948 incorrectly had 150 in place of 159 @a4 <- @f*(3*(@p^4+@fh^4) + 10*@f^2 - 50*(@p^2+@fh^2) + 159)/1920 @term <- ((@a4 + @a2^2/2)*(@P8-@P0) - @a2^2*(@P4-@P0))/@m1^4 if (@nterms < 0) { @term3 <- @term } else { @probsG <-+ @term } if (@all) { @tmp[!@Jbad] <- @probsG @P[3] <- @tmp @tmp[!@Jbad] <- @term @terms[3] <- @tmp } if (@all || abs(@nterms) > 3) { @P12 <- cumchi(@x, @f + 12,upper:!@upper) @a6 <- @f*(3*(@p^6 + @fh^6) - \ 105*(@p^4 + @fh^4) + \ 1113*(@p^2 + @fh^2) + \ (21*@p^2 - 350 + 21*@fh^2)*@f^2 - \ 2995)/16128 @term <- (@a6*(@P12 - @P0) + \ 2*@a2*@a4*(@P12 - @P8 - @P4 + @P0) + \ @a2^3*(@P12 - 3*@P8 + 3*@P4 - @P0)/6)/@m1^6 if (@nterms < 0) { # use from 1 to 4 terms # Trim off terms that are larger than their predecessor @term4 <- @term @L <- abs(@term4) <= abs(@term3) if (max(@L) > 0) { @probsG[@L] <- @probsG[@L] + @term4[@L] } @L <- @L || abs(@term3) <= abs(@term2) if (max(@L) > 0) { @probsG[@L] <- @probsG[@L] + @term3[@L] } @L <- @L || abs(@term2) <= (abs(1-2*@P0) + 1)/2 if (max(@L) > 0) { @probsG[@L] <- @probsG[@L] + @term2[@L] } @L <- @probsG < 0 || @probsG > 1 if (max(@L) > 0) {#if use term 1 if out of range @probsG[@L] <- @P0[@L] } delete(@term2, @term3, @term4) } else { @probsG <-+ @term } if (@all) { @tmp[!@Jbad] <- @probsG @P[4] <- @tmp @tmp[!@Jbad] <- @term @terms[4] <- delete(@tmp, return:T) } delete(@P12, @a6) } delete(@P8, @a4) } delete(@P0, @P4, @a2, @term) } if (@all) { @result <- structure(P:delete(@P, return:T), \ terms:delete(@terms, return:T)) } delete(@x, @f) } delete(@fh,@fe,@p,@upper,@df1,@m1,@useF,@nterms) if (!delete(@all,return:T)) { @tmp <- rep(0,@N1) if (max(@JF) > 0) { @tmp[@JF] <- delete(@probsF, return:T) } if (max(@JG) > 0) { @tmp[@JG] <- delete(@probsG, return:T) } @result <- array(rep(?,@N),@dim) @result[!@Jbad] <- delete(@tmp, return:T) } delete(@lamstar,@JF,@JG,@N,@N1,@dim) delete(@result,return:T) %cumwilks% ===> seqF <=== seqF MACRO DOLLARS ) Macro to compute sequential F-statistics for MANOVA hypothesis ) Usage: ) result <- seqF(hypTerm [,errTerm ][,model:Model] [,order:Jorder]) ) hypTerm scalar CHARACTER term name or integer term number ) errTerm scalar CHARACTER term name or integer term number; ) default = number of last term in model ) Model CHARACTER scalar specifying a manova() model; ) default is STRMODEL ) Jorder Length p vector of positive integers containing a ) permutation of 1, 2, ..., p = dimension specifying ) variable order for computing sequential F-statistics; ) default is Jorder = run(p) ) result structure(f:fstats,fh:fh,fe:fe), f, fh and fe ) vectors of F-statistics, hypothesis DF and error ) DF,each of length ncols(y), where y is the matrix ) of response variables ) )) Written by C. Bingham (kb@umn.edu) )) 011022: modified so errTerm could be omitted )) 031021: added keyword 'order' )) 031022: use labels on response matrix )) 031228: minor changes )) 041011: more minor changes # $S(hypterm [, errterm] [,model:Model] [,order:Jorder]) if ($v == 0 || $v > 2) { error("Usage is $S(hypterm [, errterm] [,model:Model])") } @model <- keyvalue($K,"model","string") if (isnull(@model)){ if (!isscalar(STRMODEL,char:T)){ error("No model for $S and STRMODEL doesn't exist") } if (!isvector(DF, real:T) || !isreal(SS) || !isvector(TERMNAMES, char:T)){ @model <- STRMODEL } } @Jorder <- keyvalue($K,"order","positive integer vector") @hypterm <- argvalue($1,"hypothesis term") if (!isscalar(@hypterm,real:T) && !isscalar(@hypterm,char:T)){ error("argument 1 to $S must be positive integer or term name") } if ($v > 1){ @errterm <- argvalue($2,"error term") if (!isscalar(@errterm,real:T) && !isscalar(@errterm,char:T)){ error("argument 2 must be positive integer or error term name") } } else { @errterm <- 0 } @pushed <- !isnull(@model) && isfunction(pushmodel) if (@pushed) { @pushed <- pushmodel(canpush:T) } if (!isnull(@model)){ if (@pushed){ pushmodel() } manova(@model,silent:T) } @nterms <- length(TERMNAMES) if (ischar(@hypterm)) { @hypterm <- match(@hypterm,TERMNAMES,0) if (@hypterm == 0){ error(paste("$1 is not the name of a term in the model")) } } elseif (!isreal(@hypterm, positive:T, integer:T) || @hypterm > @nterms){ error(paste("$1 not positive integer between 1 and",@nterms)) } if (ischar(@errterm)){ @errterm <- match(@errterm,TERMNAMES,0) if (@errterm == 0){ error(paste("$2 is not the name of a term in the model")) } } elseif (@errterm < 0 || !isreal(@errterm,integer:T) || @errterm > @nterms){ error(paste("$2 not an integer between 1 and",@nterms)) } elseif (@errterm == 0){ @errterm <- @nterms } else { argvalue(@errterm, "positive count") } @e <- matrix(SS[@errterm,,]) # e @hpluse <- matrix(SS[@hypterm,,]) + @e # h + e @p <- nrows(@e) @fh <- rep(DF[@hypterm], @p) @fe <- DF[@errterm] - run(@p) + 1 if (@pushed) { popmodel() } delete(@model) @labs <- if (@p > 1 && haslabels(@e)) { getlabels(@e,1) } else { NULL } @fstats <- rep(0, @p) if (!isnull(@Jorder)) { if(anytrue(length(@Jorder) != @p,sum(sort(@Jorder) != run(@p)) != 0)) { error("value of 'order' not permutation of run(dimension)") } @hpluse <- @hpluse[@Jorder,@Jorder] @e <- @e[@Jorder,@Jorder] if (isnull(@labs)) { @labs <- getlabels(vector(rep(0,@p),labels:"y")) } @labs <- @labs[@Jorder] } if (!isnull(@labs)) { setlabels(@fstats, @labs) setlabels(@fh, @labs) setlabels(@fe, @labs) } delete(@hypterm,@errterm,@nterms,@labs) for(@i,1,@p){ if (@i > 1){ @hpluse <- swp(@hpluse, @i-1) @e <- swp(@e, @i-1) } @fstats[@i] <- (@fe[@i]/@fh[@i])*(@hpluse[@i,@i] - @e[@i,@i])/@e[@i,@i] } delete(@hpluse,@e,@p,@i) structure(f:delete(@fstats,return:T),fh:delete(@fh,return:T),\ fe:delete(@fe,return:T)) %seqF% ===> discrim <=== discrim MACRO DOLLARS ) Macro to compute coefficients and offsets for multi-group linear ) classification, assuming Multivariate normal populations with equal ) variances matrices. ) Usage: ) result <- discrim(groups, y) ) groups vector of positive integers defining membership ) in g non-empty groups ) y N by p REAL matrix with no MISSING values, with ) N = nrows(groups) ) result structure(coefs:L, addcon:C), L p by g matrix, ) C 1 by g vector where linear discriminant function for ) group j is y' %*% L[,j] + C[j] ) )) 000825 uses tabs() instead of macro covar # usage: $S(groups, y), N by 1 vector groups, N by p matrix y if($v != 2){ error("usage is $S(groups, y)",macroname:F) } @groups <- argvalue($1,"group numbers","positive integer vector") @y <- argvalue($2, "data", "nonmissing matrix") @g <- max(@groups) # number of groups @p <- ncols(@y) # number of variables @n <- nrows(@y) # number of cases if (length(@groups) != @n){ error("length of group index differs from number of cases") } if (@n <= @p){ error("number of cases not larger than dimension") } @varlabs <- if (haslabels(@y)){ getlabels(@y,2) }else{ "@" } @facname <- "$1" if (!isname(@facname) || match("@*",@facname,0,exact:F) != 0){ @facname <- "Group " } @ybar <- matrix(rep(0,@p*@g),@p,labels:structure(@varlabs,@facname)) @spooled <- matrix(rep(0,@p*@p),@p,labels:structure(@varlabs,@varlabs)) @stuff <- tabs(@y,@groups,covar:T,count:T,mean:T) @N <- vector(@stuff$count[,1]) @ybar <- matrix(@stuff$mean',labels:structure(@varlabs,@facname)) @spooled <-\ matrix(sum((@N-1)*matrix(@stuff$covar,length(@N))),@p,\ labels:structure(@varlabs,@varlabs))/sum(@N-1) delete(@stuff,@N,@y,@p) @coefs <- solve(delete(@spooled, return:T), @ybar) # p x g @addcon <- -0.5*sum(@coefs * delete(@ybar, return:T)) # 1 x g structure(coefs:delete(@coefs,return:T), addcon:delete(@addcon,return:T)) %discrim% ===> jackknife <=== jackknife MACRO dollars ) Macro to do jackknife validation of linear discrimination ) Usage ) probs <- jackknife(groups,y [,prior:P]) ) groups vector of positive integers defining membership ) in g = max(groups) groups, each with at least two ) cases ) y n by p REAL data matrix with n = length(groups) ) P REAL positive vector of g "prior probabilities", ) default = rep(1/g,g) ) ) probs n by g+1 matrix ) ) Column j of probs, 1 <= j <= g contains the estimated posterior ) probabilities of group j for each case ) Column g+1 of probs contains the leave-one-out classification of ) each case ) )) 991120 use down dating formula for computing coefficients )) Version of 991120 # usage: $S(groups, y [,prior:P]), factor groups, matrix y, positive vector P if ($v != 2 || $k > 1){ error("Usage: $S(groups, y [,prior:P]), factor groups, matrix y, positive vector P",\ macroname:F) } @groups <- argvalue($1, "group numbers","positive integer vector") @y <- argvalue($2, "data","nonmissing matrix") @g <- max(@groups) @P <- keyvalue($K,"prior","positive vector", default:rep(1,@g)/@g) if (length(@P) != @g){ error("number of prior probabilities != number of groups") } @n <- tabs(,@groups) if(min(@n) < 2){ error("min(group sizes) < 2") } @N <- nrows(@y) @p <- ncols(@y) @fe1 <- @N - 1 - @g #compute mean vectors and pooled error matrix @ybar <- matrix(rep(0,@p*@g),@p)#,labels:structure(@varlabs,@facname)) @E <- matrix(rep(0,@p*@p),@p)#,labels:structure(@varlabs,@varlabs)) for(@i,run(@g)){ @yi <- @y[@groups == @i,] @ybar1 <- sum(@yi)/@n[@i] @ybar[,@i] <- @ybar1 @yi <-- @ybar1 @E <-+ @yi %c% @yi } @n1 <- @n - 1 @nn1 <- (@n*@n1) # sqrt(n(n-1)) @probs <- matrix(rep(0,@N*@g),@g) #g by N @coefs <- solve(delete(@E, return:T),hconcat(@ybar,dmat(@p,1))) @Einv <- @coefs[,-run(@g)] # p by p @coefs <- @coefs[,run(@g)] # p by g for(@i,run(@N)){ @ybar1 <- @ybar @j <- @groups[@i] @yi <- @y[@i,] @d <- (@yi' - @ybar[,@j])/@n1[@j] @ybar1[,@j] <- @ybar[,@j] - @d @tmp <- @Einv %*% @d @c <- 1 - @nn1[@j] * @tmp %c% @d @coefs1 <- @coefs + @nn1[@j] * @tmp * (@d %c% @coefs)/@c @coefs1[,@j] <- @coefs1[,@j] - @tmp/@c @probs[,@i] <- @fe1*vector(@yi %*% @coefs1 - .5*sum(@ybar1 * @coefs1)) } @probs <- @P * exp(@probs - @probs[1,]) @probs <- vconcat(@probs/sum(@probs),grade(@probs)[@g,])' delete(@groups, @y, @n, @g, @N, @n1, @nn1, @fe1, @tmp, @i, @j,\ @Einv,@P,@ybar,@coefs,@ybar1,@coefs1) delete(@probs, return:T) %jackknife% discrimquad macro dollars ) Macro to compute coefficients for quadratic discrimination, ) assuming multivariate normal. ) ) Usage: ) quadfn <- discrimquad(groups, y) ) groups length N vector of positive integers defining ) samples from g populations of p-dimensional vectors. ) It is an error if the smallest sample size <= p ) y N by p REAL data matrix with no MISSING elements ) quadfn structure(Q:q, L:l, addcon:c, grandmean:ybar) ) q structure of g p by p REAL matrices, ) l structure of g length p REAL vectors, ) c length g REAL vector ) ybar length p REAL vector of grand means ) ) If x is a length p vector to be classified, the quadratic score for ) group j is ) qs[j] = (x-ybar)' %*% q[,j] %*% (x-ybar) + (x-ybar)' %*% l[,j] + c[j] ) If P is a length g vector of prior probabilities and qs is the vector ) of scores, the vector of posterior probabilities can be computed as ) P*exp(qs-K)/sum(P*exp(qs-K)), where K is a scalar such as max(qs) ) chosen to the exponentials are computable. ) )) Macro written 991119 by C. Bingham, kb@stat.umn.edu )) Version of 000826 uses tabs() instead of covar #$S(groups,y) @groups <- argvalue($1,"group index","positive integer vector") @y <- argvalue($2, "data", "nonmissing matrix") @p <- ncols(@y) @g <- max(@groups) @N <- nrows(@y) if (length(@groups) != @N){ error("length of group index differs from number of cases") } @stats <- tabs(@y, @groups,count:T,mean:T,covar:T) @n <- vector(@stats$count[,1]) # group sizes @means <- @stats$mean # row i is mean of group i @covars <- @stats$covar # @covars[i,,] is s for group i delete(@stats) if (min(@n) <= @p){ error("All groups must have at least p+1 cases") } @varlabs <- if (haslabels(@y)){ getlabels(@y,2) }else{ "@" } delete(@y) @facname <- "$1" if (!isname(@facname) || match("@*",@facname,0,exact:F) != 0){ @facname <- "Group" } @facname <- paste(@facname," ",sep:"") @grandmean <- vector(sum(@n*@means)/@N,labels:@varlabs) #make structures and row vector to fill @l <- @q <- strconcat(split(rep(0,@g)'),labels:@facname) @addcon <- matrix(rep(0,@g)',labels:structure("@",@facname)) for(@i,run(@g)){ @s <- matrix(@covars[@i,,],labels:structure(@varlabs,@varlabs)) @offset <- vector(@means[@i,] - @grandmean');#ll(@s,@ll,@q,@addcon,@offset) @q[@i] <- -.5*solve(@s) @l[@i] <- vector(solve(@s,@offset),labels:@varlabs) @addcon[@i] <- -.5*@l[@i] %c% @offset - .5*log(det(@s)) } delete(@groups,@p,@g,@i,@N,@varlabs,@facname,@means,@n) structure(Q:delete(@q,return:T),L:delete(@l,return:T),\ addcon:delete(@addcon,return:T),grandmean:delete(@grandmean,return:T)) %discrimquad% probsquad macro dollars ) Macro to compute posterior probabilities based on a quadratic ) discriminant functions with coefficients computed by discrimquad() ) ) Usage: ) probs <- probsquad(x,Str [, prior:P]) ) ) x REAL m by p matrix or length p vector with no missing ) values ) Str structure(Q:q,L:l,addcon:c,grandmean:ybar) as returned by ) macro discrimquad(). ) q structure of g square matrices ) l structure of g vectors of length p ) c 1 by g row vector ) ybar length p vector ) P length g positive REAL vector proportional to ) prior probabilities; default is rep(1,g) ) probs REAL m by g matrix of posterior probabilities for ) rows of x or length g vector of posterior ) probabilities for vector x ) If x is a vector and p = length(grandmean) > 1, x must have length p ) Otherwise x must be a matrix with p columns and probabilities are ) for each row of x. ) )) 991119, written by, C. Bingham, kb@stat.umn.edu )) 011121 modified method of computing probabilities to avoid overflow )) problem in computing exp(...) #$S(x,structure(Q,L,addcon,grandmean) [,prior:p]) @x <- argvalue($1,"data","real matrix nonmissing") @stuff <- argvalue($2,"quadratic discriminant info","structure") if (!isstruc(@stuff[1]) || !isstruc(@stuff[2]) ||\ !isvector(@stuff[3]',real:T) ||\ !isvector(@stuff[4],real:T)){ error("Argument 2 to $S has wrong form") } @p <- nrows(@stuff[1][1]) # dimension @g <- ncomps(@stuff[1]) # number of groups @P <- keyvalue($K,"prior","positive vector",default:rep(1,@g)) if (!isvector(@x) && ncols(@x) != @p ||\ isvector(@x) && @p > 1 && length(@x) != @p){ error("Wrong number of variables in data") } if(length(@P) != @g){ error("Wrong length prior probability vector") } @n <- if(!isvector(@x) || @p == 1){ nrows(@x) }else{ @x <- vector(@x)' 1 } @x <- @x' - @stuff$grandmean #transpose x and center it # The following is arcane, but it takes advantage of MacAnova's ability # to do matrix arithmetic with structures of matrices # @stuff$Q and @stuff$L are both structures with g components # @stuff$addcon is a 1 by g matrix (row vector) @probs <- (matrix(vector(sum(@x * (@stuff$Q %*% @x + @stuff$L))),@n) +\ @stuff$addcon)' if (haslabels(@stuff$Q)){ @probs <- matrix(@probs,labels:structure(getlabels(@stuff$Q),"@")) } delete(@stuff,@x,@p,@g,@n) @probs <- @P*exp(@probs - max(@probs)) @probs <- (@probs/sum(@probs))' delete(@P) delete(@probs,return:T) %probsquad% ===> hotellval <=== hotellval MACRO DOLLARS )Macro to compute a one-sample Hotelling's T^2 statistic )Usage: ) t_sq <- hotellval(x) ) x n by p REAL matrix with no MISSING elements ) t_sq REAL scalar Hotellings T^2 for testing H_0:mu = 0 ) ) result <- hotellval(x, pval:T) ) result structure(hotelling:t_sq, pvalue:P), where P = ) P value computed from F-distribution )) 030114. written by C. Bingham (kb@stat.umn.edu) )) 030926 P-value calculation corrected # $S(x [, pval:T]) @x <- matrix(argvalue($1,"x","real matrix nonmissing")) @pval <- keyvalue($K,"pval*","TF",default:F) @n <- nrows(@x) @p <- ncols(@x) if (@n <= @p){ error("nrows(x) <= ncols(x)") } @xbar <- sum(@x)/@n @x <-- @xbar @result <- @n * (@n - 1) * @xbar %*% solve(@x %c% @x,@xbar') if (@pval){ @fe <- @n - 1 @fe1 <- @fe - @p + 1 @P <- 1 - cumF(@fe1*@result/(@fe*@p),@p,@fe1) @result <- strconcat(hotelling:@result, pvalue:delete(@P,return:T)) delete(@fe, @fe1) } delete(@x,@n,@p,@xbar,@pval) delete(@result,return:T) %hotellval% ===> hotell2val <=== hotell2val MACRO DOLLARS )Macro to compute a one-sample Hotelling's T^2 statistic )Usage: ) t_sq <- hotell2val(x1, x2) ) x1 n1 by p REAL matrix with no MISSING elements ) x2 n2 by p REAL matrix with no MISSING elements ) t_sq REAL scalar Hotellings T^2 for testing ) H_0: mu1 = mu2 ) result <- hotellval(x1, x2, pval:T) ) result structure(hotelling:t_sq, pvalue:P), where P = ) P value computed from F-distribution )) 030114 Written by C. Bingham (kb@stat.umn.edu) )) 030926 P-value calculation corrected # $S(x1, x2 [, pval:T]) @x1 <- argvalue($1,"x1","real matrix nonmissing") @x2 <- argvalue($2,"x2","real matrix nonmissing") @pval <- keyvalue($K,"pval*","TF",default:F) @n1 <- nrows(@x1) @n2 <- nrows(@x2) @p <- ncols(@x1) if (ncols(@x2) != @p){ error("ncols(x1) != ncols(x2)") } @fe <- @n1 + @n2 - 2 if (@fe <= @p - 1){ error("n1 + n2 < ncols(x1) + 1") } @xbar1 <- describe(@x1,mean:T) @xbar2 <- describe(@x2,mean:T) @d <- @xbar1 - @xbar2 @x1 <-- @xbar1' @x2 <-- @xbar2' @vhat <- (1/@n1 + 1/@n2)*(@x1 %c% @x1 + @x2 %c% @x2)/@fe delete(@x1,@x2,@xbar1,@xbar2,@n1,@n2) @result <- @d %c% solve(@vhat, @d) if (@pval){ @P <- 1 - cumF((@fe - @p + 1)*@result/(@fe*@p),@p,@fe-@p+1) @result <- strconcat(hotelling:@result,pvalue:delete(@P,return:T)) } delete(@pval,@p,@vhat,@d) delete(@result,return:T) %hotell2val% ===> dastepsetup <=== dastepsetup MACRO DOLLARS ) Setup stepwise discriminant ) Usage: ) dastepsetup(Model [,allin:T or in:ins] [,silent:T]) ) Model glm model, usually of the form "y = groups" ) ins LOGICAL vector with length = ncols(y) or ) vector of unique positive integers <= ncols(y) ) ) dastepsetup([allin:T or in:ins] [,silent:T]), with no model, ) is equivalent to ) dastepsetup(STRMODEL [,allin:T or in:ins] [,silent:T]) ) ) dastepsetup() initializes an "invisible" structure _DASTEPSTATE and, ) except with silent:T, prints F-to-enter with P-values ) ) With allin:T, stepsetup() starts with all dependent variables "in". ) ) When ins is an integer vector, stepsetup() starts with the indicated ) variables "in". in:run(ncols(y)) has same effect as allin:T ) ) When ins is a LOGICAL vector, in:ins has the same effect as ) in:run(ncols(y))[ins], that is T indicates a variable is "in" ) in:rep(T,ncols(y)) has the same effect as allin:T ) )) 000530 Written by C. Bingham, kb@umn.edu )) 031226 pushmodel() and popmodel() invoked to preserve current )) model information and variables )) 041206 Value of 'in' can be vector of unique integers between 1 and p #$S(model [,silent:T] [,in:logVector or allin:T]) @model <- if ($v > 0){ $01 }else{ NULL } @model <- if (!isnull(@model)){ argvalue(@model,"model","string") }elseif(isscalar(STRMODEL,char:T)){ STRMODEL }else{ error("No model provided and STRMODEL does not exist") } if(match("*=*",@model, 0, exact:F) == 0){ error(paste("Illegal model \"",@model,"\"",sep:"")) } @allin <- keyvalue($K,"allin","TF",default:F) @instart <- keyvalue($K,"in","vector nonmissing") if (@allin && !isnull(@instart)){ error("keyword 'in' illegal with 'allin:T'") } @nocon <- match("*-1",@model,0, exact:F) != 0 ||\ match("*- 1",@model,0, exact:F) != 0 @pushed <- isfunction(pushmodel) if (@pushed) { @pushed <- pushmodel(canpush:T) if (@pushed){ pushmodel() } } manova(@model,silent:T) #compute side effect variables @nvars <- dim(SS)[2] if (!isnull(@instart)){ if (islogic(@instart)){ if(length(@instart) != @nvars){ error("Length of LOGICAL 'instart' != number of variables") } } else { if (!isvector(@instart,positive:T,integer:T) || \ length(unique(@instart)) != length(@instart) || \ max(@instart) > @nvars) { error("Value of 'in' not vector of positive integers <= ncols(y)") } @instart <- match(run(@nvars),@instart,0) != 0 } @allin <- (sum(@instart) == @nvars) }else{ @instart <- rep(@allin,@nvars) } @nterms <- length(DF) - 1 @fe <- DF[@nterms+1] @fh <- sum(DF[run(2,@nterms)]) @e <- matrix(SS[@nterms+1,,]) @hpluse <- matrix(sum(SS[-1,,])) if (@nocon){ @fh <-+ DF[1] @hpluse <-+ matrix(SS[1,,]) } if(haslabels(SS)){ @varnames <- getlabels(SS,2) if(!isname(@varnames[1])){ delete(@varnames) } } if (delete(@pushed,return:T)){ popmodel() } if (!isvector(@varnames)){ @varnames <- getlabels(vector(rep(0,@nvars),labels:"X")) } @Fs <- matrix(rep(0, 2*@nvars), 2, labels:structure(vector("F", "P"),\ @varnames)) if (!@allin){ if (sum(@instart) > 0){ @hpluse <- swp(@hpluse, run(@nvars)[@instart]) @e <- swp(@e, run(@nvars)[@instart]) @fe <-- sum(@instart) } }else{ @hpluse <- solve(@hpluse) @e <- solve(@e) @fe <-- @nvars } @history <- if(sum(@instart) > 0){ run(@nvars)[@instart] }else{ NULL } setlabels(@hpluse, structure(@varnames,@varnames)) setlabels(@e, structure(@varnames,@varnames)) @fe <- vector(@fe,@fe+1,labels:vector("enter", "remove")) _DASTEPSTATE <- structure(model:delete(@model,return:T),\ hpluse:delete(@hpluse,return:T),\ e:delete(@e,return:T),\ in:vector(delete(@instart,return:T),labels:@varnames),\ F:delete(@Fs,return:T), fh:delete(@fh,return:T), fe:delete(@fe,return:T),\ history:delete(@history,return:T) ) delete(@nocon,@nvars,@varnames,@allin) if(!ismacro(dastepstatus)){ getmacros(dastepstatus,quiet:T,printname:F) } dastepstatus($K) %dastepsetup% ===> dasteplook <=== dasteplook MACRO DOLLARS ) Macro to acess elements of _DASTEPSTATE ) Usage: ) answers <- dasteplook(name_1, name_2, ...) ) name_k one of 'model', 'hpluse', 'e', 'in', 'F', ) 'fh', 'fe', 'history' and 'varnames' ) answers scalar, vector or matrix (with one name) ) or k-component structure (with k names) ) answers <- dastelook(all) ) answers structure identical to _DASTEPSTATE; does ) not contain variable names ) ) Argument 'varnames' requests a vector of variable names, extracted as ) the labels of _DASTEPSTATE$in. ) ) Except for 'varnames', the names recognized are names of _DASTEPSTATE ) components ) )) Version 000826 )) 031224 exact match of names is no longer required; use enough to specify )) the name uniquely. Also, the variable names are returned when )) 'varnames' is an argument # $(name1, name2, ...) names model, hpluse, e, in, F, fh, fe, history, varnames if ($k > 0){ error("no keywords allowed as arguments") } @names <- $A for(@i,run($v)){ # strip off double quotes, if any if (match("\"*\"",@names[@i], 0,exact:F) != 0){ @names[@i] <- <<@names[@i]>> } } @all <- match("all",@names,0) != 0 if(delete(@all,return:T)){ if ($v > 1){ error("No other arguments are allowed with 'all'") } delete(@names,@i) @result <- _DASTEPSTATE } else { @compnames <- compnames(_DASTEPSTATE) @ncomps <- length(@names) @result <- split(rep(0,@ncomps)') for(@i,1,@ncomps){ if (match("var*",@names[@i],0,exact:F) > 0) { @names[@i] <- "varnames" @result[@i] <- getlabels(_DASTEPSTATE$in) } else { if (@names[@i] == "h" || @names[@i] == "f") { error(paste("\"",@names[@i],"\" doesn't fully specify name",sep:"")) } @k <- match(paste(@names[@i],"*",sep:""),@compnames,0,exact:F) if(@k == 0){ error(paste("\"",@names[@i],"\" not a legal name",sep:"")) } @names[@i] <- @compnames[@k] @result[@i] <- _DASTEPSTATE[@k] delete(@k) } } @result <- if (delete(@ncomps,return:T) == 1){ @result[1] } else { strconcat(@result,compnames:@names) } delete(@compnames,@names,@i) } delete(@result,return:T) %dasteplook% ===> dastepstatus <=== dastepstatus MACRO DOLLARS ) Macro to update the F-statistics and P-values in _DASTEPSTATE ) Usage: ) dastepstatus() ) Updates the F-statistics and P-values in _DASTEPSTATE and prints ) a summary of the updated information ) dastepstatus(silent:T) ) Updates _DASTEPSTATE but prints nothing ) ) In both usages, dastepstatus() returns as value an updated copy of ) _DASTEPSTATE as an invisible variable )) Version 000530 # $S([silent:T]) @silent <- keyvalue($K,"silent","TF",default:F) @hpluse <- _DASTEPSTATE$hpluse @e <- _DASTEPSTATE$e @nvars <- nrows(@e) @Fs <- _DASTEPSTATE$F @in <- _DASTEPSTATE$in @out <- !@in @u <- diag(@hpluse) @v <- diag(@e) @fh <- _DASTEPSTATE$fh @fe <- _DASTEPSTATE$fe if (sum(@out) > 0){ @Fs[1,@out] <- (@fe[1]/@fh)*(@u[@out]/@v[@out] - 1) @Fs[2,@out] <- 1 - cumF(@Fs[1,@out],@fh,@fe[1]) } if (sum(@in) > 0){ @Fs[1,@in] <- ((@fe[2])/@fh)*(@v[@in]/@u[@in] - 1) @Fs[2,@in] <- 1 - cumF(@Fs[1,@in],@fh,@fe[2]) } @ii <- match("F",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- delete(@Fs,return:T) delete(@ii) if (!delete(@silent,return:T)){ @nin <- sum(@in) print(paste("Model: \"",_DASTEPSTATE$model,"\"",sep:"")) if(@nin > 0) { print(paste("F(",@fh, ",",@fe[2],") to delete",sep:"")) print(_DASTEPSTATE$F[, @in], header:F) }else{ print("No variables are \"in\"") } print("") if(@nin < @nvars) { print(paste("F(",@fh,",",@fe[1],") to enter",sep:"")) print(_DASTEPSTATE$F[, !@in], header:F) }else{ print("All variables are \"in\"") } print("") delete(@nin) } delete(@fe,@fh,@in,@nvars) _DASTEPSTATE # invisible so it won't print but can be assigned %dastepstatus% ===> daentervar <=== daentervar MACRO DOLLARS ) Macro to take one or more steps forward, entering one or more ) response variables into the current stepwise discriminant ) analysis ) Usage: ) daentervar(var1 [,var2 ...] [,silent:T]) ) var1, var2 ... quoted or unquoted names or numbers of variables ) to be entered, or CHARACTER or integer scalars ) specifying the variables ) ) Without silent:T, a summary of the state of the stepwise process ) is printed )) Version 000530 )) 031224 Modified to use davarid() to decode arguments )) 041011 Load davarid() silently if not present0 # $S(var1 [,var2 ...] [,silent:T]), varj a variable name or number if ($v == 0 || $k > 1){ error("usage: $S(var1 [,var2 ...] [,silent:T])", macroname:F) } if (!isstruc(_DASTEPSTATE)){ error("You must run dastepsetup(Model) before using $S()",macroname:F) } @varnames <- getlabels(_DASTEPSTATE$in) @nvars <- length(@varnames) @vars <- $A if(!ismacro(dastepstatus)){ getmacros(dastepstatus,quiet:T,printname:F) } if(!ismacro(davarid)){ getmacros(davarid,quiet:T,printname:F) } for (@j, 1, length(@vars)){ @var <- @vars[@j] if(match("*:*",@var,0,exact:F) != 0){ next } # not a keyword phrase @newvar <- davarid(@var) if (@newvar[1] == 0) { error(@newvar[2]) } if (_DASTEPSTATE$in[@newvar[1]]){ error(paste("Variable \"",@newvar[2],"\" already 'in'",sep:"")) } @newvar <- @newvar[1] delete(@var) @ii <- match("hpluse",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$hpluse, @newvar) @ii <- match("e",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$e, @newvar) @in <- _DASTEPSTATE$in @in[@newvar] <- T @ii <- match("in",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- @in @ii <- match("fe",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- _DASTEPSTATE$fe - 1 @ii <- match("history",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- vector(_DASTEPSTATE$history,@newvar) delete(@ii,@newvar,@in) dastepstatus($K) } #for (@j, 1, length(@vars)) delete(@vars, @j, @varnames) _DASTEPSTATE %daentervar% davarid MACRO DOLLARS ) Utility macro used by daentervar() and daremovevar() to identify ) a variable (column) in a data matrix. It extracts information from ) variable _DASTEPSTATE which must be defined ) Usage: ) Id <- davarid(id) ) id CHARACTER scalar or quoted string identifying ) a column of the response matrix in stepwise ) variable selection. ) Id structure(varnumber:k, varname:Name) where k is ) an integer and Name is a CHARACTER scalar, the ) k-th label of IN = _DASTEPSTATE$in ) Forms for id: ) "name" "name" matches the label j of IN ) Id$varnumber = j, Id$varname = "name" ) "3" Id$varname = 3, Id$varname = label 3 of IN ) "J" J an integer scalarvariable ; Id$varnumber = J, ) Id$varname = label J of IN ) "x3" Id$varname = 3, Id$varname = label 3 of IN; if ) "x3" matches a label of IN, the label number takes ) precedence. In place of "x", any name not starting ) with '@' can be used. )) 031227 written by C. Bingham, kb@umn.edu #$S(id), id a CHARACTER scalar @var <- argvalue($1,"arg 1", "string") if (match("\"*\"", @var, 0, exact:F) != 0){ @var <- <<@var>> #strip off quotes if present } if (!isstruc(_DASTEPSTATE)){ error("You must run dastepsetup(Model) before using $S()",macroname:F) } @nvars <- nrows(_DASTEPSTATE$in) @varnames <- getlabels(_DASTEPSTATE$in) @outvar <- match(@var, @varnames, 0) if (@outvar == 0) { # not a direct match if (isnumber(@var)) { @outvar <- <<@var>> if (!isscalar(@outvar,integer:T)) { @outvar <- -1000 } elseif (@outvar <= 0) { @outvar <-- 1 } } else { if (isname(@var)) { if (match("@*",@var,0,exact:F) == 0 && \ max(match(vector("*0","*1","*2","*3","*4","*5","*6","*7","*8","*9"),\ @var,0,exact:F)) > 0) { # arg may have form "v3", "x1" or something similar @outvar <- vecread(string:@var,silent:T,byfields:F) if (!isscalar(@outvar,integer:T)){ @outvar <- 0 } } elseif (isscalar(<<@var>>,integer:T)) { @outvar <- <<@var>> if (@outvar <= 0) { @outvar <-- 1 } } } if (alltrue(@outvar == 0, isscalar(<<@var>>, character:T))) { @var <- <<@var>> @outvar <- match(@var, @varnames, 0) if (alltrue(@outvar == 0,\ max(match(vector("*0","*1","*2","*3","*4","*5","*6","*7",\ "*8","*9"),\ @var,0,exact:F)) > 0)) { @outvar <- vecread(string:@var,silent:T,byfields:F) if (!isscalar(@outvar,integer:T)){ @outvar <- 0 } } } } } @var <- if (@outvar == 0) { paste($1,"does not specify a variable in the model") } elseif (@outvar == -1000) { @outvar <- 0 paste($1,"does not specify integer") } elseif (@outvar < 0) { @outvar <- 0 "Variable number <= 0" } elseif (@outvar > @nvars) { @outvar <- 0 "Variable number > number of variables" } else { @varnames[@outvar] } delete(@nvars,@varnames) structure(varnumber:delete(@outvar,return:T),varname:delete(@var,return:T)) %davarid% ===> daremovevar <=== daremovevar MACRO DOLLARS ) Macro to take one or more steps backward, removing a variable or ) several variables from the current stepwise disciminant analysis ) Usage: ) daremovevar(var1 [,var2 ...] [,silent:T]) ) var1, var2 ... names or numbers of variables to be removed ) or scalar variables containing names or numbers ) silent:T Suppress output ) ) Adapted from a macro by Gary Oehlert )) Version 000530 )) 031219 modification so that a varj can be a integer or character )) scalar variable # $S(var1 [,var2 ...] [,silent:T]), varj a variable name or number if ($v == 0 || $k > 1){ error("usage is $S(var1 [,var2 ...] [,silent:T])") } if (!isstruc(_DASTEPSTATE)){ error("You must run dastepsetup(Model) before using $S()",macroname:F) } @nvars <- nrows(_DASTEPSTATE$e) @varnames <- getlabels(_DASTEPSTATE$in) @vars <- $A if(!ismacro(dastepstatus)){ getmacros(dastepstatus,quiet:T,printname:F) } if(!ismacro(davarid)){ getmacros(davarid,quiet:T,printname:F) } for (@j, 1, length(@vars)){ @var <- @vars[@j] if(match("*:*",@var,0,exact:F) != 0){ next } # not a keyword phrase @outvar <- davarid(@var) if (@outvar[1] == 0) { error(@outvar[2]) } delete(@var) if (!_DASTEPSTATE$in[@outvar[1]]){ error(paste("Variable \"", @outvar[2],"\" not currently 'in'",sep:"")) } @outvar <- @outvar[1] @ii <- match("hpluse",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$hpluse, @outvar) @ii <- match("e",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- swp(_DASTEPSTATE$e, @outvar) @in <- _DASTEPSTATE$in @in[@outvar] <- F @ii <- match("in",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- @in @ii <- match("fe",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- _DASTEPSTATE$fe + 1 @ii <- match("history",compnames(_DASTEPSTATE)) _DASTEPSTATE[@ii] <- vector(_DASTEPSTATE$history,-@outvar) dastepstatus($K) } delete(@ii,@outvar, @varnames, @vars, @j, @nvars) _DASTEPSTATE %daremovevar% ===> dastepselect <=== dastepselect macro dollars ) Macro to perform automatic forward or backward stepwise variable select ) for discriminant analysis ) Usage: ) ins <- dastepselect(Model, alpha [, allin:T] [,updown:T] [,bonferroni:F]\ ) [,verbose:T or silent:T]) ) Model quoted string or CHARACTER scalar specifying a ) manova() model, usually of the form "y = a", where y ) is a REAL matrix and a is a factor ) alpha positive REAL scalar 1 < alpha < 50 or 0 < alpha < .5, ) significance level (in percent when alpha > 1) ) allin:T Start with all variables in and step backwards; ) otherwise, start with all variables out and step ) forwards ) updown:T Before entering a variable, check to see whether one ) can be removed. ) bonferroni:F Do not Bonferronize P-values ) quiet:T Suppress printing of action taken at each step and ) summary and make value returned "visible" ) silent:T Same as 'quiet:T' except warning messages are also ) suppressed; incompatible with 'quiet:F' ) ins Integer vector of variable numbers selected or ) NULL if no variable selected; "invisible" unless ) quiet:T ) ) A decision to enter or remove a variable is based on a P-value for an ) F-to-enter (forward step) or F-to-remove (backward). ) ) Unless bonferroni:F is an argument, P-values are Bonferronized by ) by multiplication by an integer K. ) ) On a forward step, K = number of "outs". ) On a backward step, K = (number of "outs") + 1. ) ) This choice ensures a variable that has just entered will not immediately ) be a candidate for removal, and a variable that has just been removed ) will not immediately be a candidate to enter. ) )) 031226 Written by C. Bingham, kb@umn.edu )) 041011 Warning message when no variables selected unless suppressed )) by new keyword phrase silent:T; 'quiet' replaces 'verbose' )) 041206 Added keyword 'in' to specify a starting position )) 041215 Corrected bug. It now uses F-statistic to choose the candidate )) for entering or removal rather than P-values. The problem was )) having multiple P's = 0, only one of which went with the largest F #$S(Model, alpha [, allin:T or in:logvec] [,updown:T] [,verbose:T or silent:T] [,bonferroni:F]) @model <- $01 @model <- if (!isnull(@model)){ argvalue(@model,"model","string") }elseif(isscalar(STRMODEL,char:T)){ STRMODEL }else{ error("No model provided and STRMODEL does not exist") } if(match("*=*",@model, 0, exact:F) == 0){ error(paste("Illegal model \"",@model,"\"",sep:"")) } @alpha <- argvalue($2,"alpha","positive number") if (@alpha >= 50) { error("alpha >= 50%") } if (@alpha > 1) { # treat alpha between 1 and 50 as percent @alpha <-/ 100 } elseif (@alpha >= .5) { error("alpha >= .5") } @allin <- keyvalue($K, "allin", "TF",default:F) @updown <- keyvalue($K, "updown","TF",default:F) @silent <- keyvalue($K, "silent","TF",default:F) @quiet <- keyvalue($K, "quiet","TF",default:@silent) @instart <- keyvalue($K,"in","logical vector nonmissing") if (@allin && !isnull(@instart)){ error("keyword 'in' illegal with 'allin:T'") } if (!@quiet && @silent){ error("'silent:T' can't be used with 'quiet:F'") } @bonfer <- keyvalue($K,"bonfer*","TF",default:T) if (!ismacro(dastepsetup)) { getmacros(dastepsetup,silent:T) } if (!ismacro(dasteplook)) { getmacros(dasteplook, silent:T) } if (@allin || @updown) { if (!ismacro(daremovevar)) { getmacros(daremovevar, silent:T) } } if (!@allin || @updown) { if (!ismacro(daentervar)) { getmacros(daentervar, silent:T) } } if (!isnull(@instart)){ dastepsetup(@model,in:@instart,silent:T) } else { dastepsetup(@model,allin:@allin,silent:T) } @in <- dasteplook(in) @names <- getlabels(@in) @what <- if (@bonfer) { "Bonferronized P-value-to-" } else { "P-value-to-" } @direction <- if (@allin) { -1 } else { 1 } while (@direction != 0) { @out <- !@in if (@direction < 0) { # back stepping if (sum(@in) == 0) { # all out, so we are done @direction <- 0 } else { @f_info <- dasteplook(F) @i <- grade(@f_info[1,@in]',up:T)[1] # see if we should delete a variable @pval <- @f_info[2,@in][@i] if (@bonfer) { @pval <-* sum(@out) + 1 } @direction <- if (@pval <= @alpha) { if (!@updown || @direction < -1) { # can't delete, so we are done 0 } else { # Try forward next time 2 # indicates failure to delete } } else { # delete variable and continue backwards @var <- @names[@in][@i] if (!@quiet) { print(paste("Removing ",@var,". ", @what, "remove = ",\ @pval,sep:"")) } daremovevar(@var, silent:T) -1 # indicates success at deleting } } } else { # forward stepping if (sum(@out) == 0) { # all are in so stop @direction <- 0 } else { @f_info <- dasteplook(F) @i <- grade(@f_info[1,@out]',down:T)[1] @pval <- @f_info[2,@out][@i] if (@bonfer) { @pval <-* sum(@out) } @direction <- if (@pval > @alpha) { if (!@updown || @direction > 1) { # can't enter so we are done 0 } else { # try forward next time -2 # indicates failure to enter } } else { @var <- @names[@out][@i] if (!@quiet) { print(paste("Entering ",@var,". ",@what,"enter = ",@pval,sep:"")) } daentervar(@var, silent:T) if (!@updown) { 1 } else { # Next time try to delete -1 } } } } @in <- dasteplook(in) } delete(@f_info,@i,@pval,@var,silent:T) @out <- !@in if (!@quiet) { if ((@allin || @updown) && sum(@in) > 0) { @pval <- max(vector(dasteplook(F)[2,@in])) if (@bonfer) { @pval <-* sum(@out) + 1 } print(paste("Largest ",@what,"remove = ",@pval," < ",@alpha,sep:"")) delete(@pval) } if ((!@allin || @updown) && sum(@out) > 0) { @pval <- min(vector(dasteplook(F)[2,@out])) if (@bonfer) { @pval <-* sum(@out) } print(paste("Smallest ",@what,"enter = ",@pval," > ",@alpha,sep:"")) delete(@pval) } } @names <- if (sum(@in) > 0) { @names[@in] } else { NULL } if (!@quiet) { if (!isnull(@names)) { print(paste("Variables selected:",paste(@names, sep:", "))) } else { print("No variables selected") } } @result <- if (!isnull(@names)) { vector(run(length(@in))[@in],labels:@names) } else { if (!@silent){ print("WARNING: no variables selected; result is NULL") } NULL } delete(@in,@out,@names,@bonfer,@alpha,@what,@silent) delete(@result, return:T, invisible:!delete(@quiet,return:T)) %dastepselect% ===> dascreen <=== dascreen macro dollars ) Macro to examine all possible subsets for best linear ) discrimination ) Usage: ) result <- dascreen(model [,mbest:m] [, method:Method] [,penalty:pen]\ ) [, forced:varlist] [,maxvar:maxv] [,silent:T]) ) model CHARACTER scalar of the form "x=a", where a is a factor ) x is a real matrix, and nrows(x) = nrows(a) ) m positive integer <= 2^p-1, default = 5 ) pen REAL scalar >= 0, used as multiplier of number of ) parameters in computing AIC; default is pen = 2 ) varlist vector of distinct integers > 0, a list of variables ) to be included in every subset considered ) maxv positive integer, default:15, specifying the maximum ) number of variables allowed, not counting those ) specified by 'forced:varlist'. If it is exceeded a ) warning is printed, suggesting a value for maxv. ) Method "aic" or "penaic", default is "aic" ) silent:T Suppress any warning messages ) result structure(subsets:S, criterion:Crit) ) S length p1 vector or a p1 by m matrix, with leading ) elements of each column the variable numbers retained ) and p1 = size of largest set ) Crit m vector of values of the criterion for each subset ) ) This actually works with any model with only one error term ) If CONSTANT is the first term it is not considered part of the model ) in computing the criterion ) ) dascreen() works by brute force, examining all 2^p1 - 1 non empty ) subsets of variables on the LHS of the model, where p1 = p - number of ) variables forced in. The amount of time goes up by about a factor of ) 2 for every additional response variable ) ) The default criterion is AIC = ) -N Log(det(H + E)/det(E) + pen*nparams, nparams = q*fh + q*(q+1)/2, ) where q = size of subset and pen = 2 or value of keyword 'penalty' ) ) See Yasunori Fujikoshi, Selection of variables for discriminant ) analysis in a high-dimensional case, Sankhya Series A, 64, (2002) ) 256-267 ) ))031127 written by C. Bingham, kb@umn.edu ))031227 added keywords maxvar, penalty, forced and silent ))041011 When mbest > 1, subset matrix is mbest by p1 ))041012 Changed default for mbest to 5, made sets correspond to )) columns of S rather than rows; added labels to S and Crit ))041220 Corrected computation of number of parameters following )) Fujikoshi 2002 )) Criterion for empty set is now computed when no variables )) are forced in # $S(model [,mbest:m] [,method:Method] [,penalty:pen] \ # [,forced:varlist] [,silent:T]) @model <- argvalue($1, "model", "string") @keys <- if ($k > 0){ structure($K,notakey:NULL) } else { structure(notakey:NULL) } @mbest <- keyvalue(@keys,"mbest","positive count",default:5) @penalty <- keyvalue(@keys,"pen*","nonneg number",default:2) @maxvar <- keyvalue(@keys,"max*","positive count",default:15) @method <- keyvalue(@keys,"meth*","string",default:"aic") @forced <- keyvalue(@keys,"forc*","postive integer vector") @silent <- keyvalue(@keys,"silent","TF",default:F) delete(@keys) @Methods <- vector("aic","penaic") @jmeth <- match(paste(@method,"*",sep:""),@Methods,0,exact:F) if (@jmeth == 0){ error(paste("Unrecognized screening method\"",@method,"\"",sep:"")) } @method <- delete(@Methods,return:T)[@jmeth] if (match("*=*",@model,0,exact:F) == 0) { error(paste("\"",@model,"\" is not a legal model",sep:"")) } @nforced <- if (isnull(@forced)){ 0 } else { length(@forced) } if (alltrue(@nforced > 0, length(unique(@forced)) != @nforced)){ error("Duplicate values in value of 'forced'") } if (alltrue(isfunction(pushmodel),pushmodel(canpush:T))) { pushmodel() } manova(@model,silent:T) @p <- ncols(RESIDUALS) @N <- nrows(RESIDUALS) if (alltrue(@nforced > 0, max(@forced) > @p)){ error(paste("Maximum forced value > dimension =",@p)) } if (@maxvar + @nforced < @p) { @result <- NULL if (!@silent){ print(paste(sep:"","WARNING: # of non-forced variables > ", @maxvar,\ "; use 'maxvar:",@p - @nforced," if you've got lots of time")) } } else { @nterms <- length(DF) @jterms <- run(@nterms-1) if (TERMNAMES[1] == "CONSTANT") { if (@nterms == 2){ error(paste("No non-constant terms in \"",@model,"\"",sep:"")) } @jterms <- @jterms[-1] } @fh <- sum(DF[@jterms]) @fe <- DF[@nterms] if (@fh == 0) { error(paste("No non-constant terms in \"",@model,"\"",sep:"")) } if (@fe == 0) { error("No degrees of freedom for error") } @e <- matrix(SS[@nterms,,]) @hpluse <- matrix(sum(SS[@jterms,,])) + @e delete(@jterms,@nterms) if (@nforced == @p) { if (!@silent){ print("WARNING: all variables forced in; no selection done") } #if (@jmeth <= 2){ @crit <- -@N*log(det(@hpluse)/det(@e)) +\ @penalty*@p*(@p+1 + 2*@fh)/2 #} @result <- structure(subsets:run(@p),criterion:@crit) } else { @p1 <- @p - @nforced @nsets <- 2^@p1 if (@mbest > @nsets){ if (!@silent){ print(paste("WARNING: value for 'mbest' too big; reduced to",\ @nsets)) } @mbest <- @nsets } @eligible <- if (@nforced == 0){ run(@p) } else { run(@p)[-@forced] } @crit <- rep(0, @nsets) @nparamsig <- @p*(@p+1)/2 if (@nforced > 0){ # the criterion for forced-in set @loglik <-\ -@N*log(det(@hpluse[@forced,@forced])/det(@e[@forced,@forced])) @nparam <- @nparamsig + @nforced*@fh + @p } else { # the criterion for the empty set @loglik <- 0 @nparam <- @nparamsig + @p } @crit[@nsets] <- @loglik + @penalty*@nparam @powersof2 <- 2^run(0,@p1-1) for(@i,1,@nsets-1){ @set <- vector(@eligible[(@i %& @powersof2) != 0],@forced) @q <- length(@set) # size of subset #if (@jmeth <= 2){ @loglik <- \ -@N*log(det(@hpluse[@set,@set])/det(@e[@set,@set])) @nparam <- @nparamsig + @q*@fh + @p @crit[@i] <- @loglik + @penalty*@nparam #} } delete(@nparam,@nparamsig,@loglik) @jbest <- grade(@crit)[run(@mbest)] @crit <- @crit[@jbest] @maxp <- if (@nforced > 0 && @mbest == 1 && @jbest[1] == @nsets) { @nforced } else { max(nbits(@jbest)) + @nforced } @subsets <- matrix(rep(0,@mbest*@maxp),@maxp) for(@i,1,@mbest) { @set <- @jbest[@i] if (@set == @nsets) { if (@nforced > 0){ @subsets[run(@nforced), @i] <- sort(@forced) } } else { @subsets[run(nbits(@set) + @nforced), @i] <- \ sort(vector(@eligible[(@set %& @powersof2) != 0], @forced)) } } if (@mbest == 1) { @subsets <- vector(@subsets) } else { setlabels(@subsets,structure("@","Set ")) setlabels(@crit,"Set ") } delete(@i,@mbest,@set,@jbest,@p,@p1,@maxp,@powersof2, @nsets) @result <-structure(subsets:delete(@subsets,return:T),\ criterion:delete(@crit,return:T)) } delete(@fh,@fe,@e,@hpluse) } if (alltrue(isfunction(popmodel),popmodel(canpop:T))) { popmodel() } delete(@result,return:T) %dascreen% ===> mlcrit <=== mlcrit MACRO DOLLARS ) Macro to compute ML factor analysis criterion function and its gradient ) The parameters x are log(psi_sub_i), i = 1,...,p ) This is to be used with optimization macros dfp(), bfs() and broyden() ) Usage ) result <- mlcrit(logpsi, k, structure(s, nfact [, del])) ) logpsi REAL vector of log uniquenesses ) k integer scalar to control what is calculated ) k = 0, return criterion (scalar) (MISSING when s not P.D.) ) k = 1, return gradient (vector) (rep(MISSING,nrows(s)) ) when s not P.D.) ) k = -2, return structure(loadings, eigenvals) ) s REAL symmetric covariance or correlation matrix ) nfact > 0 integer number of factors ) del >= 0 Small REAL scalar [1e-5]. Gradient is computed as ) gradient[i] = (critfn(logpsi + del*(run(p)==i)) - critfn(logpsi))/del ) del = 0 results in analytic gradient being used ) When s not positive definite, MISSING (k = 0) or ) rep(MISSING,nrows(s)) (k = 1) is returned ) No argument checking is done )) 011120 First draft )) 011124 del = 0 signals use of analytic gradient )) Version 011124, C. Bingham, kb@stat.umn.edu @x <- $1 @n <- length(@x) _PSIML <- exp(@x) @param <- $3 @s <- @param[1] @J <- run(@param[2]) @result <- if (($2) < 0) { if (($2) == -2) { # return structure(loadings, eigenvals) @stuff <- releigen(@s, dmat(_PSIML)) @gamma <- @stuff$values[@J]-1 @crit <- NULL @loadings <- if(min(@gamma) < 0){ NULL } else { _PSIML * @stuff$vectors[,@J] * sqrt(@gamma') } structure(loadings:delete(@loadings,return:T),\ eigenvals:delete(@stuff,return:T)$values) } else { NULL } } else { @stuff <- releigen(@s, dmat(_PSIML)) @gamma <- @stuff$values[-@J] if (min(@gamma) <= 0) { delete(@stuff) if (($2) == 0) { ? } else { rep(?, @n) } } else { @crit <- sum(@gamma - 1 - log(@gamma)) if (($2) == 0){ @crit } else { @del <- if (ncomps(@param) > 2) { @param[3] } else { 1e-5 } if (@del > 0) { @gradient <- @x for (@i,1,@n){ @tmp <- @x @tmp[@i] <- @tmp[@i] + @del @gamma <- releigenvals(@s, dmat(exp(@tmp)))[-@J] @gradient[@i] <- if (min(@gamma) <= 0) { ? } else { (sum(@gamma - 1 - log(@gamma)) - @crit)/@del } } delete(@tmp,@i) } else { @omega <- sqrt(_PSIML)*@stuff$vectors[,-@J] @gradient <- vector(sum((1 - @gamma)*(@omega'^2))) } delete(@del,@stuff,@crit) delete(@gradient,return:T) } } } delete(@x,@s,@gamma,@crit,@J,@param,@n,silent:T) delete(@result,return:T) %mlcrit% ===> glscrit <=== glscrit MACRO DOLLARS ) Macro to compute GLS factor analysis criterion function and its gradient ) The parameters x are log(psi_sub_i), i = 1,...,p ) This is to be used with optimization macros dfp(), bfs() and broyden() ) Usage ) result <- glscrit(logpsi, k, structure(s, nfact [, del])) ) logpsi REAL vector of log uniquenesses ) k integer scalar to control what is calculated ) k = 0, return criterion (scalar) (MISSING when s not P.D.) ) k = 1, return gradient (vector) (rep(MISSING,nrows(s)) ) when s not P.D.) ) k = -2, return structure(loadings, eigenvals) ) s REAL symmetric covariance or correlation matrix ) nfact > 0 integer number of factors ) del >= 0 Small REAL scalar [1e-5]. Gradient is computed as ) gradient[i] = (critfn(logpsi + del*(run(p)==i)) - critfn(logpsi))/del ) del = 0 results in analytic gradient being used ) When s not positive definite, MISSING (k = 0) or ) rep(MISSING,nrows(s)) (k = 1) is returned ) No argument checking is done )) Version 011120, C. Bingham, kb@stat.umn.edu )) 011124 del = 0 signals use of analytic gradient @x <- $1 @n <- length(@x) _PSIGLS <- exp(@x) @param <- $3 @s <- @param[1] @J <- run(@param[2]) @result <- if (($2) < 0) { if (($2) == -2) { # return structure(loadings, eigenvalues) @stuff <- releigen(@s, dmat(_PSIGLS)) @gamma <- @stuff$values[@J]-1 @crit <- NULL @loadings <- if(min(@gamma) < 0){ NULL } else { _PSIGLS * @stuff$vectors[,@J] * sqrt(@gamma') } structure(loadings:delete(@loadings,return:T),\ eigenvals:delete(@stuff,return:T)$values) } else { NULL } } else { @stuff <- releigen(@s, dmat(_PSIGLS)) @gamma <- @stuff$values[-@J] if (min(@gamma) <= 0) { delete(@stuff) if (($2) == 0) { ? } else { rep(?, @n) } } else { @crit <- .5*sum((1 - 1/@gamma)^2) @result <- if (($2) == 0){ @crit } else { @del <- if (ncomps(@param) > 2) { @param[3] } else { 1e-5 } if (@del > 0) { @gradient <- @x for (@i,1,@n){ @tmp <- @x @tmp[@i] <- @tmp[@i] + @del @gamma <- releigenvals(@s, dmat(exp(@tmp)))[-@J] @gradient[@i] <- if (min(@gamma) <= 0) { ? } else { @gradient[@i] <- (.5*sum((1 - 1/@gamma)^2) - @crit)/@del } } delete(@tmp,@i) } else { @omega <- sqrt(_PSIGLS)*@stuff$vectors[,-@J] @gradient <- vector(sum((1 - @gamma)*(@omega'^2)/@gamma^2)) } delete(@del,@stuff) delete(@gradient,return:T) } } } delete(@x,@s,@gamma,@crit,@J,@n,silent:T) delete(@result,return:T) %glscrit% ===> ulscrit <=== ulscrit MACRO DOLLARS ) Macro to compute ULS factor analysis criterion function and its gradient ) The parameters x are log(psi_sub_i), i = 1,...,p ) This is to be used with optimization macros dfp(), bfs() and broyden() ) Usage ) result <- ulscrit(logpsi, k, structure(s, nfact [, del])) ) logpsi REAL vector of log uniquenesses ) k integer scalar to control what is calculated ) k = 0, return criterion (scalar) (MISSING when s not P.D.) ) k = 1, return gradient (vector) (rep(MISSING,nrows(s)) ) when s not P.D.) ) k = -2, return structure(loadings, eigenvals) ) s REAL symmetric covariance or correlation matrix ) nfact > 0 integer number of factors ) del >= 0 Small REAL scalar [1e-5]. Gradient is computed as ) gradient[i] = (critfn(logpsi + del*(run(p)==i)) - critfn(logpsi))/del ) Currently del = 0 gets the default. At some point del = 0 ) will signal the use of analytic derivatives ) When s not positive definite, MISSING (k = 0) or ) rep(MISSING,nrows(s)) (k = 1) is returned ) No argument checking is done )) Version 011120, C. Bingham, kb@stat.umn.edu @x <- $1 @n <- length(@x) _PSIULS <- exp(@x) @param <- $3 @s <- @param[1] @J <- run(@param[2]) @result <- if (($2) < 0) { if (($2) == -2) { # return structure(loadings, eigenvals) @crit <- NULL @stuff <- eigen(@s - dmat(_PSIULS)) @delta <- @stuff$values[@J] @loadings <- if(min(@delta) < 0){ NULL } else { @stuff$vectors[,@J] * sqrt(@delta') } structure(loadings:delete(@loadings,return:T),\ eigenvals:delete(@stuff,return:T)$values) } else { NULL } } else { @stuff <- eigen(@s - dmat(_PSIULS)) @delta <- @stuff$values[-@J] @crit <- .5*sum(@delta^2) if (($2) == 0){ @crit } else { @del <- if (ncomps(@param) > 2) { @param[3] } else { 1e-5 } if (@del > 0) { delete(@stuff) @gradient <- @x for (@i,1,@n){ @tmp <- @x @tmp[@i] <- @tmp[@i] + @del @delta <- eigenvals(@s - dmat(exp(@tmp)))[-@J] @gradient[@i] <- (.5*sum(@delta^2) - @crit)/@del } delete(@tmp,@i,@del) } else { @omega <- delete(@stuff,return:T)$vectors[,-@J] @gradient <-\ -_PSIULS*vector(sum(@delta*(delete(@omega,return:T)'^2))) } delete(@gradient,return:T) } } delete(@x,@s,@delta,@crit,@J,@n,silent:T) delete(@result,return:T) %ulscrit% ===> facanal <=== facanal MACRO DOLLARS ) Macro to use optimization macro dfp(), bfs() or broyden() ) to find ULS, GLS or ML estimates of uniquenesses and loadings. ) It uses macro dfp(), bfs() or broyden() to minimize the criterion as ) a function of log(psi), psi = uniquenesses. ) ) It requires macros ulscrit(), glscrit() or mlcrit() to compute ) the objective function, gradient vector and loading matrix. ) ) Usage: ) facanal(s, m [, method:Meth] [,rotate:Rotmeth] \ ) [,start:psi0 or presteps:nsteps]\ ) [,maxit:nmax] [,minit:nmin] [,crit:vector(nsig1,nsig2,dg)]\ ) [,minimizer:M] [,gold:ngold] [,quiet:T] [,silent:T] \ ) [,kaiser:F] [,recordwhen:d1] [,printwhen:d2]) ) s REAL p x p positive definite covariance or correlation ) matrix ) m 0 < integer < (2*p + 1 - sqrt(8*@p+1))/2 ) Meth one of "uls", "gls", "ml" or "mle" ["mle"] ) Rotmeth one of "varimax", "quartimax", "equimax" ["none"] ) psi0 Positive length p vector of starting values for psi ) nsteps Integer > 0, number of stepuls(), stepgls() or stepml() ) steps before using minimizer [1]; 'presteps' illegal with ) 'start:psi0' ) nmax integer > 0, maximum number of iterations permitted [30] ) nmin integer >= 0, minimum number of iterations [0] ) nsig1 target number of significant digits in log uniquenesses [5] ) nsig2 target number of significant digits in criterion [8] ) dg target upper limit for ||gradient|| [-1] ) M one of "dfp", "bfs", or "broyden" ["bfs"] ) ngold integer >= 0, # of steps in golden mean linear search [1] ) quiet:T suppresses printing of results ) silent:T suppresses printing of results and warning messages ) kaiser:F don't use Kaiser normalization ) d1 integer >= 0. When d1 > 0, partial results are printed at ) iterations d1, 2*d1, ... [0]; not affected by silent:T ) d2 integer >= 0. When d2 > 0, partial results are recorded at ) iterations d2, 2*d2, ... [0] ) ) nsig1, nsig2 and dg specify convergence critera; any <= 0 are ignored ) ngold is ignored if M is "broyden" ) The result is an "invisible" structure ) structure(psihat:psi,loadings:l,criterion:crit,eigenvals:vals,\ ) gradient:g, method:Meth, rotation:Rotmeth,iter:n, status:k) ) ) psi REAL vector of estimated uniquenesses ) l REAL p by m matrix of loadings ) crit Minimized criterion ) vals eigenvalues s relative to psi ) g REAL gradient vector at optimum ) n > 0 integer = number of iterations ) k >= 0 integer specifying which of the convergence criterion ) signalled convergence; k == 0 means not converged ) ) When d2 > 0, side effect structure DFPRECORD, BFSRECORD or ) BROYDNRECORD with components funvals, xvals and gradients is created. ) On each call, ulscrit(), glscrit() or mlcrit() saves a copy of psi ) in invisible variable _PSIULS, _PSIGLS or _PSIML. ) )) Version of 011117, written by C. Bingham, kb@stat.umn.edu )) 011125 added keyword 'golden' with default 1 )) added keyword 'rotation' with default "none" )) changed default for delta to 0 so as to use analytic gradient )) modified printed output )) result now includes method and rotation )) 030115 minor bug fixed so silent:T is recognized )) 031217 Fixed bug so it would recognize when macro minimizer was loaded )) 031222 Fixed minor bug so it gets labels right when dim(s) is 1,p,p )) 041111 Changed default starting value to result of one principal )) factor step from 1/diag(solve(s)); added default labels to )) PSI and LOADINGS; cosmetic changes to output )) 041115 Changed default starting value to the result of one step away ) from 1/diag(solve(s)) using stepuls(), stepgls() or stepml(); )) added keyword phrase 'presteps:nsteps' to allow more steps # $S(s or r, m [, start:psi0 or presteps:n] [, maxit:m, silent:T, quiet:T]) @s <- matrix(argvalue($1,"variance or correlation","square real nonmissing")) @m <- argvalue($2,"number of factors","positive count") @keys <- if ($k > 0) { structure($K,NULL) } else { structure(NoTaKey:NULL) } @method <- keyvalue(@keys,"method","string",default:"ml") @maxit <- keyvalue(@keys, "maxit*","positive count",default:30) @minit <- keyvalue(@keys, "minit*","count",default:0) @delta <- keyvalue(@keys,"delta","nonnegative number",default:0) @silent <- keyvalue(@keys, "silent", "TF", default:F) @quiet <- keyvalue(@keys, "quiet", "TF", default:@silent) @record <- keyvalue(@keys, "recordwhen", "count", default:0) @print <- keyvalue(@keys, "printwhen", "count", default:0) @minimizer <- keyvalue(@keys,"minimizer","string",default:"bfs") @crit <- keyvalue(@keys,"crit","real nonmissing vector",default:vector(5,8,-1)) @gold <- keyvalue(@keys,"gold*","count",default:1) @rotmethod <- keyvalue(@keys,"rot*","string",default:"none") @kaiser <- keyvalue(@keys,"kais*","TF",default:T) @presteps <- keyvalue(@keys,"prestep*","positive count") @x0 <- keyvalue($K,"start","positive vector") delete(@keys) @imeth <- min(match(@method,vector("uls","gls","ml","mle"),0),3) if (@imeth == 0) { error(paste("unrecognized method \"",@method,"\"",sep:"")) } @kmethod <- match(vector("var*","quart*","equi*","none"),\ @rotmethod,0,exact:F) if (sum(@kmethod) != 1) { error(paste(@rotmethod,\ "is not one of \"varimax\",\"quartimax\" or \"equimax\"")) } @kmethod <- run(4)[@kmethod != 0] @rotmethod <- \ vector("Varimax","Quartimax","Equimax","none")[delete(@kmethod,return:T)] if (@minimizer == "broyden") { @gold <- 0 } @p <- nrows(@s) if (isnull(@x0)) { if (isnull(@presteps)){ @presteps <- 1 } @stepper <- vector("stepuls","stepgls","stepml")[@imeth] if (!ismacro(<<@stepper>>)) { if (@imeth == 1){ getmacros(stepuls,silent:T) } elseif (@imeth == 2) { getmacros(stepgls,silent:T) } else { getmacros(stepml,silent:T) } } } else { if (!isnull(@presteps)) { error("You can't use both 'presteps:nsteps' and 'start:psi0'") } if (nrows(@x0) != @p) { error("Wrong number of starting uniquenesses") } } @kind <- if(sum(diag(@s) == 1) == @p){ @kind1 <- "R" "Correlation" }else{ @kind1 <- "S" "Covariance" } if (min(eigenvals(@s)) <= 0){ error(paste(@kind,"matrix is not positive definite")) } if (F && !ismacro(<<@minimizer>>)){ if (@minimizer == "dfp") { getmacros(dfp,quiet:T,printname:F) } elseif (@minimizer == "bfs") { getmacros(bfs,quiet:T,printname:F) } elseif (@minimizer == "broyden") { getmacros(broyden,quiet:T,printname:F) } else { error(paste("unrecognized 'minimizer' \"",@minimizer,"\"",sep:"")) } } elseif (!ismacro(minimizer)){ getmacros(minimizer,quiet:T,printname:F) } @critfn <- vector("ulscrit","glscrit","mlcrit")[@imeth] if (!ismacro(<<@critfn>>)) { if (@imeth == 1){ getmacros(ulscrit,quiet:T,printname:F) } elseif (@imeth == 2){ getmacros(glscrit,quiet:T,printname:F) } else { getmacros(mlcrit,quiet:T,printname:F) } } if (isnull(@x0)){ # no value; compute starting value # start with 1 or more steps of stepuls(), stepgls() or stepml() @x0 <- @psi0 <- 1/diag(solve(@s)) for (@i,1,@presteps) { @x0 <- <<@stepper>>(@s, @x0, @m) } @x0 <- @x0$psi if (min(@x0) <= 0) { @x0 <- @psi0 } delete(@psi0) } @x0 <- log(@x0) #@x0 <- log(keyvalue($K,"start","positive vector",\ # default:1/diag(solve(@s)))) # starting value @param <- structure(@s, @m, @delta) if (F) { @result <- <<@minimizer>>(@x0,<<@critfn>>,@param,\ maxit:@maxit, minit:@minit, crit:@crit,gold:@gold,\ recordwhen:@record, printwhen:@print) } else { @result <- minimizer(@x0,<<@critfn>>,@param,method:@minimizer,\ maxit:@maxit, minit:@minit, crit:@crit,gold:@gold,\ recordwhen:@record, printwhen:@print,silent:@silent) } # The result is structure(x:xmin, f:minVal, gradient:gradient,\ # h:invhessian, iterations:niter) @iter <- @result$iterations @status <- @result$status if (@status < 0) { print(paste("WARNING: criterion or gradient not defined on iteration", \ @iter)) } @x <- @result$x PSI <- vector(exp(@x)) CRITERION <- @result$f GRADIENT <- vector(@result$gradient) @result <- if (@imeth == 1) { ulscrit(@x, -2, @param) } elseif (@imeth == 2){ glscrit(@x, -2, @param) } else { mlcrit(@x, -2, @param) } LOADINGS <- if (@rotmethod == "none") { @result$loadings } else { @H <- if (!@kaiser) { 1 } else { vector(sqrt(sum(@result$loadings'^2))) } @H*rotation(@result$loadings/@H, method:@rotmethod, reorder:T) } EIGENVALUES <- delete(@result,return:T)$eigenvals delete(@x,@param,@kind) @labels <- if (haslabels(@s)) { getlabels(@s,1) } else { "V" } @labels <- structure(@labels,vector("F1","F")[1 + (@m > 1)]) setlabels(PSI, @labels[1]) setlabels(LOADINGS, delete(@labels,return:T)) if (!@silent) { if (2*@m >= 2*@p + 1 - sqrt(8*@p+1)){ print(paste("WARNING: With m = ",@m," and p = ",@p,\ ", m > (2*p + 1 - sqrt(8*p+1))/2",sep:"")) } if (isnull(LOADINGS)){ print(paste("WARNING: eigenvalue",@m,"of Psi^(-1) %*%",@kind1,"< 1")) } if (@status <= 0){ print(paste("WARNING: no convergence in",@iter,"iterations")) } if (!@quiet) { if(@status > 0){ print(paste("Convergence in",@iter,"iterations by criterion",@status)) } @loadname <- if (@rotmethod != "none") { paste(@rotmethod,"rotated estimated loadings") } else { "Unrotated estimated loadings" } print(PSI,name:"Estimated uniquenesses") print(LOADINGS, name:delete(@loadname,return:T)) print(CRITERION, name:paste("Minimized",@method,"criterion")) } } @silent <- @silent || @quiet delete(@quiet,@p,@m,@delta,@kind1) @result <- structure(psihat:PSI,loadings:LOADINGS,criterion:CRITERION,\ eigenvals:EIGENVALUES, gradient:GRADIENT,\ method:delete(@method,return:T),rotation:delete(@rotmethod,return:T),\ iter:delete(@iter,return:T),status:delete(@status,return:T)) delete(@result,return:T,invisible:!delete(@silent,return:T)) %facanal% ===> ulsfactor <=== ulsfactor MACRO DOLLARS ) Macro to use nonlinear least squares macro levmar() to find ) ULS estimates of uniquenesses and loadings. It uses macro ulsresids() ) to compute residuals used by levmar(). ) ) Usage: ) result <- ulsfactor(s, m [, quiet:T,silent:T,start:psi0,uselogs:T],\ ) maxit:nmax, minit:nmin, print:T,crit:crvec]) ) s positive definite covariance or correlation matrix ) m positive integer < (2*p + 1 - sqrt(8*@p+1))/2 ) nmax nonnegative integer, maximum number of iterations permitted ) default = 30 ) nmin nonnegative integer, minimum of iterations required, ) default=0 [0] ) psi0 positive REAL vector of length p, starting values ) crvec vector(numsig, nsigsq, delta), 3 criteria for convergence; ) criterion < means it is not used ) print:T something is printed on each interation ) uselogs:T parametrize in terms of log(psi) ) result "invisible" structure(psihat:psi,loadings:l,criterion:crit,\ ) eigenvals:vals,status:iconv,iter:n) ) psi vector of estimated uniquenesses ) l p by m matrix of loadings with orthogonal columns ) crit sum(vector((s-sigmahat)^2)), sigmahat = l%*%l' + dmat(psi) ) vals vector of eigenvalues of s - psi ) iconv convergence information ) = 0 or 4 not converged ) = 1, converged with relative change in coeffs < relcon = 10^-numsig ) = 2, converged with relative change in rss <= relssq = 10^-nsigsq ) = 3, converged with ||gradient|| < delta ) = 4, interation stopped; could not reduce rss. ) n number of iterations required ) ) There is no guarantee that min(psihat) > = 0. In that case you can try ) using uselogs:T to parametrize in terms of log(psi). This ensures ) min(psi) > 0 if it converges. ) However, if the minimum is actually outside the permissible ) range, you will probably get a 'argument to solve() singular' ) error message. ) ) Macro ulsresids saves a copy of psi on every call in invisible variable ) _PSIULS. Type 'print(_PSIULS) to see this value. )) Version of 991114, written by C. Bingham, kb@stat.umn.edu )) It uses macro levmar to minimize the sum of squared residuals. # $S(s or r, m [,start:psi0,maxit:m,print:T,quiet:T]) @s <- argvalue($1,"variance or correlation","square real nonmissing") @quiet <- keyvalue($K, "quiet","TF",default:F) @silent <- keyvalue($K, "silent","TF",default:F) @maxit <- keyvalue($K, "maxit*","positive integer",default:30) __USELOGSULS <- keyvalue($K, "uselogs","TF",default:F) @m <- argvalue($2,"number of factors","positive integer") @p <- nrows(@s) @kind <- if(sum(diag(@s) == 1) == @p){ "correlation" }else{ "covariance" } if (min(eigenvals(@s)) <= 0){ error(paste(@kind,"matrix is not positive definite")) } if (2*@m >= 2*@p + 1 - sqrt(8*@p+1)){ error(paste("With m = ",@m," and p = ",@p,\ ", m > (2*p + 1 - sqrt(8*p+1))/2",sep:"")) } if (!ismacro(levmar)){ getmacros(levmar,quiet:T,printname:F) } if (!ismacro(ulsresids)){ getmacros(ulsresids,quiet:T,printname:F) } @psi0 <- keyvalue($K,"start","positive vector",\ default:1/diag(solve(@s))) # starting value if (length(@psi0) != @p){ error("Wrong number of starting uniqueness") } if (__USELOGSULS){ @psi0 <- log(@psi0) } @result <- levmar(@psi0,0,vector(@s),@m,resid:ulsresids,$K) @psihat <- if (__USELOGSULS){ exp(@result$coefs) }else{ @result$coefs } @iconv <- @result$iconv @iter <- @result$iter if (@iconv == 0){ print(paste("WARNING: failed to converge in",@maxit,"iterations")) } @eigs <- eigen(@s - dmat(@psihat)) @vals <- @eigs$values @J <- run(@m) @lhat <- if (@vals[@m] < 0){ NULL }else{ @eigs$vectors[,@J] * sqrt(@vals[@J]') } delete(@eigs,@J) if (!@silent){ if (isnull(@lhat)){ print(paste("WARNING: eigenvalue",@m,"of (R or S) - Psi is negative")) } if (min(@psihat) < 0){ print("WARNING: psi not acceptable since min(psihat) < 0") } if (@iconv == 4){ print("WARNING: could not reduce RSS on iteration",@iter) }elseif (@iconv <= 0){ print(paste("WARNING: no convergence in",@iter,"iterations")) } } if (!@quiet && !@silent){ if(@iconv > 0 && @iconv < 4){ print(paste("Convergence in",@iter,"iterations by criterion",@iconv)) } print(psihat:@psihat,loadings:@lhat,criterion:@result$rss,eigenvals:@vals) } delete(@quiet,@silent,@p,@m,@psi0,__USELOGSULS) @result <- structure(psihat:delete(@psihat,return:T),\ loadings:delete(@lhat,return:T),criterion:delete(@result$rss,return:T),\ eigenvals:delete(@vals,return:T),status:delete(@iconv,return:T),\ iter:delete(@iter,return:T)) delete(@result,return:T,invisible:T) %ulsfactor% ===> ulsresids <=== ulsresids MACRO DOLLARS OUTLINE ) Macro to be used with levmar to do ULS factor extraction ) ) ulsresids() usage: ) ulsresids(psi,0,vector(r),m) ) psi vector of uniquenesses ) r correlation or covariance matrix ) m positive integer number of factors ) ) levmar usage: ) levmar(if(__USELOGSULS){log(psi)}else{psi},0,\ ) vector(r),m,resid:ulsresids) )) 0000828 # $S(psi,0,vector(r),m) _PSIULS <- if(__USELOGSULS){ exp($1) }else{ $1 } @psi <- dmat(_PSIULS) @r <- $3 @p <- sqrt(length(@r)) @J <- run($4) #run(m) @eigs <- eigen(matrix(@r,@p) - @psi) @l <- @eigs$vectors[,@J] @res <- delete(@r,return:T) - vector((@l * @eigs$values[@J]') %C% @l + @psi) delete(@psi,@p,@J,@eigs,@l) delete(@res, return:T) %ulsresids% ===> glsfactor <=== glsfactor MACRO DOLLARS ) Macro to use nonlinear least squares macro levmar to find ) GLS estimates of uniquenesses and loadings. It uses macro glsresids ) to compute residuals used by levmar. ) ) Usage: ) result <- glsfactor(s, m [, quiet:T,silent:T,start:psi0,uselogs:T],\ ) maxit:nmax, minit:nmin, print:T,crit:crvec]) ) s positive definite covariance or correlation matrix ) m positive integer < (2*p + 1 - sqrt(8*@p+1))/2 ) nmax nonnegative integer, maximum number of iterations permitted ) default = 30 ) nmin nonnegative integer, minimum of iterations required, ) default=0 [0] ) psi0 positive REAL vector of length p, starting values ) crvec vector(numsig, nsigsq, delta), 3 criteria for convergence; ) criterion < means it is not used ) print:T something is printed on each interation ) uselogs:T parametrize in terms of log(psi) ) result "invisible" structure(psihat:psi,loadings:l,criterion:crit,\ ) eigenvals:vals,status:iconv,iter:n) ) psi vector of estimated uniquenesses ) l p by m matrix of loadings with orthogonal columns satisfying ) l' %*% dmat(1/psi) %*% l is diagonal ) crit sum(vector((s-sigmahat)^2)), sigmahat = l%*%l' + dmat(psi) ) vals vector of eigenvalues of s - psi ) iconv convergence information ) = 0 or 4 not converged ) = 1, converged with relative change in coeffs < relcon = 10^-numsig ) = 2, converged with relative change in rss <= relssq = 10^-nsigsq ) = 3, converged with ||gradient|| < delta ) = 4, interation stopped; could not reduce rss. ) ) There is no guarantee that min(psihat) > = 0. In that case you can try ) using uselogs:T to parametrize in terms of log(psi). This ensures ) min(psi) > 0 if it converges. ) However, if the minimum is actually outside the permissible ) range, you will probably get a 'argument to solve() singular' ) error message. ) ) l is the p by m matrix of loadings satisfying ) crit is sum of squared differences of s from sigmathat ) vals are eigenvalues of s - psi ) ) Macro glsresids saves a copy of psi on every call in invisible variable ) _PSIGLS. Type 'print(_PSIGLS) to see this value. )) Version of 991115, written by C. Bingham, kb@stat.umn.edu )) It uses macro levmar to minimize the sum of squared residuals. # $S(s or r, m [,start:psi0,maxit:m,print:T,quiet:T]) @s <- argvalue($1,"variance or correlation","square real nonmissing") @quiet <- keyvalue($K, "quiet","TF",default:F) @silent <- keyvalue($K, "silent","TF",default:F) @maxit <- keyvalue($K, "maxit*","positive integer",default:30) __USELOGSGLS <- keyvalue($K, "uselog*","TF",default:F) @m <- argvalue($2,"number of factors","positive integer") @p <- nrows(@s) @kind <- if(sum(diag(@s) == 1) == @p){ @what <- "r" "correlation" }else{ @what <- "s" "covariance" } @kind <- if(sum(diag(@s) == 1) == @p){ "correlation" }else{ "covariance" } if (min(eigenvals(@s)) <= 0){ error(paste(@kind,"matrix is not positive definite")) } if (2*@m >= 2*@p + 1 - sqrt(8*@p+1)){ error(paste("With m = ",@m," and p = ",@p,\ ", m > (2*p + 1 - sqrt(8*p+1))/2",sep:"")) } if (!ismacro(levmar)){ getmacros(levmar,quiet:T,printname:F) } if (!ismacro(glsresids)){ getmacros(glsresids,quiet:T,printname:F) } @psi0 <- keyvalue($K,"start","positive vector",\ default:1/diag(solve(@s))) # starting value if (length(@psi0) != @p){ error("Wrong number of starting uniqueness") } if (__USELOGSGLS){ @psi0 <- log(@psi0) } @result <- levmar(@psi0,vector(@s),vector(@s),@m,resid:glsresids,$K) @psihat <- if (__USELOGSGLS){ exp(@result$coefs) }else{ @result$coefs } @iconv <- @result$iconv @iter <- @result$iter if (@iconv == 0){ print(paste("WARNING: failed to converge in",@maxit,"iterations")) } @eigs <- releigen(solve(@s), dmat(1/@psihat)) @J <- run(@m) @vals <- 1/reverse(@eigs$values) - 1 @lhat <- if (@vals[@m] < 0){ NULL }else{ @eigs$vectors[,@p + 1 - @J] * sqrt(@vals[@J]') } delete(@eigs,@J) if (!@silent){ if (isnull(@lhat)){ print(paste("WARNING: eigenvalue",@m,"is negative")) } if (min(@psihat) < 0){ print("WARNING: psi not acceptable since min(psihat) < 0") } elseif (min(eigenvals(@s - dmat(@psihat))) < 0) { print(paste("WARNING:",@what,\ "- dmat(psihat) not non-negative definite"),macroname:T) } if (@iconv == 4){ print("WARNING: could not reduce RSS on iteration",@iter) }elseif (@iconv <= 0){ print(paste("WARNING: no convergence in",@iter,"iterations")) } } if (!@quiet && !@silent){ if(@iconv > 0 && @iconv < 4){ print(paste("Convergence in",@iter,"iterations by criterion",@iconv)) } print(psihat:@psihat,loadings:@lhat,criterion:@result$rss,vals:@vals) } delete(@quiet,@silent,@p,@m,@psi0,__USELOGSGLS) @result <- structure(psihat:delete(@psihat,return:T),\ loadings:delete(@lhat,return:T),criterion:delete(@result$rss,return:T),\ vals:delete(@vals,return:T),status:delete(@iconv,return:T),\ iter:delete(@iter,return:T)) delete(@result,return:T,invisible:T) %glsfactor% ===> glsresids <=== glsresids MACRO DOLLARS ) Macro to be used with levmar to do GLS factor extraction ) This is very much a work in progress ) Usage: ) stuff <- levmar(if(__USELOGSGLS){log(psi)}else{psi},\ ) vector(r),vector(r),m,resid:glsresids), m = number of factors ) glsresids itself returns vector(Ip - solve(r,sigmahat)) ) It saves its most recent argument as _PSIGLS ) ) You need to set __USELOGSGLS to T or F before calling levmar ) and use log(psi) as an argument to levmar when __USELOGSGLS is True ) This is done automatically by macro glsfactor ) ) The result from levmar is the structure returned by levmar. ) stuff$coefs is psihat (or log(psihat)) ) stuff$rss is the sum(vector(Ip - solve(r,sigmahat))^2) ) stuff$iconv is convergence status -- 0 means not converged ) stuff$iter is the number of iterations ) Version 991115 # $S(psi,vector(r),vector(r),m) # residualstr((Ip - solve(s,sigmahat))^2) _PSIGLS <- if(__USELOGSGLS){ exp($1) }else{ $1 } @r <- $2 #x @p <- sqrt(length(@r)) @r <- matrix(@r,@p) @m <- $4 @eigs <- releigen(solve(@r), dmat(1/_PSIGLS)) @J <- run(@p, @p - @m + 1) @tmpl <- @eigs$vectors[,@J] @vals <- 1/@eigs$values[@J] - 1 @sighat <- (@tmpl * @vals') %C% @tmpl + dmat(_PSIGLS) @resids <- sqrt(abs(dmat(@p,1) - solve(@r, @sighat))) @resids <- @resids * @resids' delete(@m,@J,@eigs,@tmpl,@vals,@p,@r,@sighat) vector(delete(@resids, return:T)) %glsresids% ===> goodfit <=== goodfit MACRO DOLLARS ) Usage: ) goodfit(S, lhat, psihat) ) goodfit(R, lhat, psihat) ) S p by p sample variance matrix ) R p by p sample correlation matrix ) lhat p by m matrix of factor loadings ) psihat length p vector of factor uniquenesses ) ) Prints the maximum absolute deviation, the sum of squared ) deviations and the sum of absolution deviations from S or R of the ) fitted covariance or correlation matrix S_hat or R_hat computed ) as lhat %*% lhat' + dmat(psihat) ) ))Version 991114 # usage: $S(s,lhat,psihat), s a covar or correl matrix @lhat <- argvalue($2,"lhat","real matrix nonmissing") @dev <- vector(argvalue($1,"s","real square") -\ (@lhat %C% @lhat) -\ dmat(argvalue($3[[1]],"psi","nonmissing real vector"))) @tmp <- vector(max(abs(@dev)),sum(@dev^2), sum(abs(@dev)),\ labels:vector("max(|dev|)","sum(dev^2)","sum(|dev|)")) delete(@dev) delete(@tmp,return:T) %goodfit% ===> stepuls <=== stepuls MACRO DOLLARS ) Macro to do 1 step of crude iteration for ULS factor analysis ) Usage: ) result <- stepuls(s, psi, m [,nsteps:n] [,print:T]) ) result <- stepuls(s, structure(psi,...), m [,nsteps:n][,print:T]) ) s p by p variance or correlation matrix ) psi length p vector of current values for uniquenesses ) m positive integer = number of factors ) n positive integer = number of steps to take, default 1 ) print:T new values of uniqueness and loadings printed on ) each step ) result structure(psi:newPsi, loadings:oldL, crit:criterion) ) newPsi length p vector of updated uniquenesses ) oldL p by m matrices of loadings computed from psi (not ) from newPsi) ) criterion ULS criterion computed from newPsi ) ) stepuls() also creates side effect variabls PSI, LOADINGS, and ) CRITERION matching the components of result or values at last ) successful step when it can't complete a step ))041111 added keyword 'nsteps' ))Version 041111 # usage: $S(r,psi,m [,print:T]), s=covar or corr, psi=current value # m = order = number of factors if ($v != 3 || $k > 2){ error("usage: $S(s, psi, m [,nsteps:n] [,print:T])",macroname:F) } @s <- matrix(argvalue($1,"s","real square")) @psi <- argvalue($2[[1]],"psi","positive vector") @m <- argvalue($3,"number of factors","positive integer scalar") @Jm <- run(@m) @print <- keyvalue($K,"print","TF",default:F) @nsteps <- keyvalue($K,"nstep*","positive count",default:1) @sii <- diag(@s) @labels <- if (haslabels(@s)) { getlabels(@s,1) } else { "V" } @labels <- structure(@labels, vector("F1","F")[1 + (@m > 1)]) for (@istep, 1, @nsteps) { @eigs <- eigen(@s - dmat(@psi)) @vals <- @eigs$values[@Jm] if(min(@vals)<= 0){ error(paste("s - psi not positive definite on step",@istep)) } @l <- sqrt(@vals)' * @eigs$vectors[,@Jm] @psi <- @sii - vector(sum(@l'^2)) if(min(@psi) <= 0){ error(paste("min(psi) <= 0 on step", @istep)) } PSI <- @psi setlabels(PSI,@labels[1]) LOADINGS <- @l setlabels(LOADINGS,@labels) CRITERION <- sum(@eigs$values[-@Jm]^2) if(@print){ # print goodness of fit & psi print(psi:PSI ,criterion:CRITERION) } } delete(@sii,@s,@nsteps,@eigs,@vals,@psi,@l,@Jm,@m,@istep,@print, @labels) structure(psi:PSI, loadings:LOADINGS, crit:CRITERION) %stepuls% ===> stepml <=== stepml MACRO DOLLARS ) Macro to do 1 or more steps of crude iteration for MLE factor analysis ) Usage: ) result <- stepml(s, psi, m [,nsteps:n] [,print:T]) ) result <- stepml(s, structure(psi,...), m [,nsteps:n][,print:T]) ) s p by p variance or correlation matrix ) psi length p vector of current values for uniquenesses ) m positive integer = number of factors ) n positive integer = number of steps (default 1) ) print:T new values of uniqueness and loadings printed ) result structure(psi:newPsi, loadings:oldL, crit:criterion) ) newPsi length p vector of updated uniquenesses ) oldL p by m matrices of loadings computed from psi (not ) from newPsi) ) criterion ML criterion computed from newPsi ) ) stepml() also creates side effect variabls PSI, LOADINGS, and ) CRITERION matching the components of result. ))041111 added keyword nsteps; use eigen() instead of releigen() ))Version 041111 # usage: $S(s,psi,m [,print:T]), s=covar or corr, psi=current value # m=order, print=T or F if ($v != 3 || $k > 2){ error("usage: $S(s, psi, m [,nsteps:n] [,print:T])",macroname:F) } @s <- matrix(argvalue($1,"argument 1","real square")) @m <- argvalue($3,"number of factors","positive integer scalar") @Jm <- run(@m) @psi <- argvalue($2[[1]],"psi","positive vector") @print <- keyvalue($K,"print","TF",default:F) @nsteps <- keyvalue($K,"nstep*","positive count",default:1) @sii <- diag(@s) @labels <- if (haslabels(@s)) { getlabels(@s,1) } else { "V" } @labels <- structure(@labels, vector("F1","F")[1 + (@m > 1)]) for(@istep, 1, @nsteps) { @eigs <- eigen(@s/sqrt(@psi*@psi')) @vals <- @eigs$values[@Jm] - 1 if(min(@vals) <= 0){ error(paste("s - psi not positive definite on step",@istep)) } @l <- sqrt(@psi) * @eigs$vectors[,@Jm] * sqrt(@vals)' @psi <- @sii - vector(sum(@l'^2)) if(min(@psi) <= 0){ error(paste("non-positive psi on step", @istep)) } CRITERION <- sum((@eigs$values - log(@eigs$values)-1)[-@Jm]) PSI <- @psi setlabels(PSI,@labels[1]) LOADINGS <- @l setlabels(LOADINGS,@labels) if(@print){ # print goodness of fit & psi print(psi:PSI ,criterion:CRITERION) } } delete(@s,@m,@Jm,@psi,@print,@nsteps,@sii,@istep,@eigs,@vals,@l, @labels) structure(psi:PSI, loadings:LOADINGS, crit:CRITERION) %stepml% ===> stepgls <=== stepgls MACRO DOLLARS ) Macro to do 1 step of crude iteration for GLS factor analysis ) Usage: ) result <- stepgls(s, psi, m [,nsteps:n] [,print:T]) ) result <- stepgls(s, structure(psi,...), m [,nsteps:n][,print:T]) ) s p by p variance or correlation matrix ) psi length p vector of current values for uniquenesses ) m positive integer = number of factors ) n positive integer = number of steps to take, default 1 ) print:T new values of uniqueness and loadings printed ) result structure(psi:newPsi, loadings:oldL, crit:criterion) ) newPsi length p vector of updated uniquenesses ) oldL p by m matrices of loadings computed from psi (not ) from newPsi) ) criterion GLS criterion computed from newPsi ) ) stepgls() also creates side effect variabls PSI, LOADINGS, and ) CRITERION matching the components of result. ))041111 added keyword 'nsteps' and use eigen() instead of releigen() ))Version 041111 # usage: $S(r, psi, m [,print:T]), s=covar or corr, psi=current value # m = order = number of factors if ($v != 3 || $k > 2){ error("usage: $S(s, psi, m [,nsteps:n] [,print:T])",macroname:F) } @s <- matrix(argvalue($1,"argument 1","real square")) @psi <- argvalue($2[[1]],"psi","positive vector") @m <- argvalue($3,"number of factors","positive integer scalar") @Jm <- run(@m) @print <- keyvalue($K,"print","TF",default:F) @nsteps <- keyvalue($K,"nstep*","positive count",default:1) @labels <- if (haslabels(@s)) { getlabels(@s,1) } else { "V" } @labels <- structure(@labels, vector("F1","F")[1 + (@m > 1)]) for (@istep, 1, @nsteps) { @eigs <- eigen(@s/sqrt(@psi*@psi')) @vals <- @eigs$values[@Jm] - 1 if(min(@vals) <= 0){ error(paste("s - psi not positive definite on step", @istep)) } @l <- sqrt(@psi) * @eigs$vectors[,@Jm] * sqrt(@vals)' @sinv <- solve(@s) @psi <- vector(solve(@sinv^2,\ diag(@sinv) - vector(sum((@l %c% @sinv)^2)))) if(min(@psi) <= 0){ error(paste("non-positive psi; cannot continue on step",@istep)) } CRITERION <- sum((1-1/@eigs$values[-@Jm])^2) PSI <- @psi setlabels(PSI, @labels[1]) LOADINGS <- @l setlabels(LOADINGS, @labels) if(@print){ # print goodness of fit & psi print(psi:PSI ,criterion:CRITERION) } } delete(@eigs, @sinv, @s, @psi, @l, @nsteps, @print, @istep, @Jm, @m, @labels) structure(psi:PSI, loadings:LOADINGS, crit:CRITERION) %stepgls% ===> compf <=== compf MACRO DOLLARS ) Usage (only when integer vector INS is defined): ) f <- compf(h,e,fh,fe) ) h p by p MANOVA hypothesis matrix ) e p by p MANOVA error matrix ) fh integer scalar hypothesis degrees of freedom ) fe integer scalar error degrees of freedom ) f REAL vector of 'F-to-enter' statistics for each ) "out" variable in a forward stepwise discriminant ) analysis, when the variables listed in integer vector ) INS are "in". ) When INS[1] = 0, no variables are "in" ) ) compf() is obsolete. Use dastepsetup(), dastepstatus(), daentervar() ) daremovevar() and dasteplook(), or dastepselect() to do stepwise ) discriminant analysis )) Version 000529, use argvalue() # usage: f <- $S(h,e,fh,fe) ; INS must be defined @h <- matrix(argvalue($1,"hypothesis matrix", "real square nonmissing")) @e <- matrix(argvalue($2,"error matrix", "real square nonmissing")) @fh <- argvalue($3, "hypothesis DF", "positive count") @fe <- argvalue($4, "error DF", "positive count") @p <- nrows(@h) if (nrows(@e) != @p){ error("hypothesis and error matrices are different sizes") } if (alltrue(!isvector(INS,real:T), !isnull(INS))){ print("WARNING: INS not defined; initialized to 0") INS <- 0 } elseif (!isnull(INS)) { if(sum(INS != ceiling(INS)) != 0 || min(INS) < 0 || max(INS) > @p ||\ !isscalar(INS) && min(INS) == 0){ error("INS not 0 or vector of positive integers <= nrows(h)") } if(length(unique(INS)) != length(INS)){ error("duplicate elements in INS") } if(@p < length(INS)) { error("length(INS) > p = nrows(h); not possible") } } if (alltrue(!isnull(INS), @p == length(INS))){ print("WARNING: all variables IN", macroname:T) @out <- structure(f:NULL,df:vector(@fh,@fe - @p),ins:INS,OUTS:NULL) } else { if(anytrue(isnull(INS), INS[1]==0)){ # initialize @h <- diag(@h) @e <- diag(@e) OUTS <- run(@p) } else { OUTS <- run(@p)[-INS] @h <- diag(swp(@h + @e,INS))[OUTS] @e <- diag(swp(@e,INS))[OUTS] @h <-- @e @fe <-- length(INS) } @out <- structure(f:(@h/@fh)/(@e/@fe),df:vector(@fh,@fe),ins:INS,outs:OUTS) } delete(@p,@fh,@fe,@h,@e) delete(@out,return:T) %compf% ===> forstep <=== forstep MACRO DOLLARS ) forstep(i,h,e,fh,fe) ) Macro to add i-th variable to list of "in" variables in multigroup ) stepwise discriminant analysis. ) Integer vector INS must be defined and is a list of numbers of ) variables already "in". INS = 0 means no variables are "in" ) ) Returns structure(f, df, ins, outs) where f is the vector of ) F-to-enter values, df = vector(fh,fe) contains their degrees ) of freedom, ins and outs are vectors of variable numbers that ) are "in" and "out" respectively. INS = 0 means no variables are "in" ) ) forstep() is obsolete. Use dastepsetup(), dastepstatus(), ) daentervar(), daremovevar() and dasteplook(), or dastepselect() ) to do stepwise discriminant analysis )) Version 000826 use argvalue() # usage: $S(i,h,e,fh,fe) ; INS must be defined if ($v != 5){ error("usage is $S(ivar,h,e,fh,fe)") } @h <- matrix(argvalue($2,"hypothesis matrix","nonmissingreal square")) @e <- matrix(argvalue($3,"error matrix","nonmissingreal square")) if(nrows(@h) != nrows(@e)){ error("hypothesis and error matrices are different sizes") } @fh <- argvalue($4,"hypothesis DF","positive count") @fe <- argvalue($5,"error DF","positive count") @i <- argvalue($1,"entering variable number","positive count") if (!isvector(INS, real:T) && !isnull(INS)){ print("WARNING: initializing INS to 0") INS <- 0 } if (alltrue(!isnull(INS), match(@i, INS, 0) != 0)){ error("variable already IN") } INS <- if(alltrue(!isnull(INS), INS[1] != 0)){ vector(INS,@i) }else{ @i } OUTS <- if(isscalar(OUTS) == 1){ 0 }else{ OUTS[OUTS != @i] } if (!ismacro(compf)){ getmacros(compf,quiet:T,printname:F) } @out <- compf(@h, @e, @fh, @fe) delete(@h, @e, @fh, @fe, @i) delete(@out, return:T) %forstep% ===> backstep <=== backstep MACRO DOLLARS ) Macro for doing a back step in stepwise discriminant analysis. ) Usage: ) backstep(h,e,fh,fe) ) INS must be defined and contain indices of in variables ) h and e are hypothesis and error matrices with fh and fe degrees of ) freedom ) backstep deletes the variable with smallest F-to-delete, and returns ) structure(f:F_to_delete,df:vector(fh,fe),ins:INS,outs:OUTS) ) INS is updated and OUTS is created )) Version 000826 Use argvalue() # usage: $S(h,e,fh,fe) if ($v != 4){ error("usage is $S(h,e,fe,fh)") } @h <- matrix(argvalue($1,"hypothesis matrix","nonmissing square real")) @e <- matrix(argvalue($2,"error matrix","nonmissing square real")) @fh <- argvalue($3,"hypothesis DF","positive count") @fe <- argvalue($4,"error DF","positive count") @p <- nrows(@h) if (!isvector(INS,real:T)){ print(paste("WARNING: INS not defined; set to run(",@p,")",sep:"")) INS <- run(@p) } if(anytrue(isnull(INS), INS[1] == 0)){ error("no variables left IN") } # get submatrices of h & e for variables in INS @h <- @h[INS,INS] @e <- @e[INS,INS] @fe <-- length(INS) - 1 @u <- 1/diag(solve(@h + @e)) @v <- 1/diag(solve(@e)) @f <- ((@u - @v)/@fh)/(@v/@fe) delete(@u,@v,@h,@e) @deleted <- grade(@f)[1] INS <- if(isscalar(INS)){ 0 }else{ INS[-@deleted] } delete(@deleted) OUTS <- if(INS[1] == 0){ run(@p) }else{ run(@p)[-INS] } delete(@p) structure(f:delete(@f,return:T),df:vector(delete(@fh,return:T),\ delete(@fe,return:T)),ins:INS,outs:OUTS) %backstep% ===> mulvarhelp <=== mulvarhelp MACRO ) Macro to get help on macros in mulvar.mac # usage $S(topic1 [, topic2 ...] [help keywords]) if(!ismacro(_gethelp)){ getmacros(_gethelp,quiet:T,printname:F) } ___HELPFILE_ <- "mulvar.mac" ___INDEXTOP_ <- "mulvar_index" ___MACRO_ <- "$S" _gethelp($0) %mulvarhelp% _E_N_D_O_F_M_A_C_R_O_S_ Marker for end of macros and data Help file for mulvar.mac for MacAnova (C) 2004 by Gary W. Oehlert and Christopher Bingham Updated 041024 CB !!!! Starting marker for message of the day !!!! Ending marker for message of the day ???? Starting marker for list of up to 32 comma/newline separated keys Classification Covariance Descriptive statistics Diagnostics Distances Discrimination Factor Analysis Hotelling Tsq Hypothesis tests Iteration MANOVA Plotting Random numbers Stepwise Subset Selection Transformations ???? Ending marker for keys ====backstep()#classification,discrimination,stepwise,subset selection %%%% backstep(H,E,fh,fe), H and E symmetric REAL matrices of the same size with no MISSING values %%%% NOTE: This macro is OBSOLETE and is retained only for backward compatibility because it was in file MacAnova.mac in earlier versions of MacAnova. For doing stepwise variable selection in discriminant analysis you should use newer macros dastepsetup(), daentervar(), daremovevar(), dastepstatus() and dasteplook(). @@@@when_used#When backstep() is used You can use macro backstep() to perform a variable elimination step in backwards stepwise variable selection in linear discriminant analysis. Macro backstep() is intended to be used after you have used manova() to compute hypothesis and error matrices H and E, with fh and fe degrees of freedom respectively. @@@@status_information#Status information Status information about the variables currently "in" and "out" is maintained in integer vectors INS and OUTS containing numbers of variables currently included and currently excluded. When no variables are "in", INS = NULL (INS = 0 means the same thing); when all variables are "in", OUTS = NULL. INS must be initialized, usually to run(p), before backstep() can be used. @@@@usage#Usage backstep(H,E,fh,fe) computes F-to-delete for all variables currently included in vector INS. It then updates INS by removing the index of the variable with the smallest F-to-delete, setting INS to 0 if no variable is left "in". OUTS is computed as run(p) when there are no variables "in" and as run(p)[-INS] otherwise. @@@@value_returned#Value returned The value returned is structure(f:F_to_delete, df:vector(fh,fe-k+1), ins:INS,outs:OUTS), where F_to_delete is the vector of F-to-delete statistics and k = length(INS) before deletion. INS and OUTS are copies of the updated INS and OUTS vectors. The F-to-delete statistics have nominal degrees of freedom fh an fe - k + 1. The variable that is deleted is the one with the smallest F-to-delete statistic. If this is large enough, you may want to reenter the variable using forstep(). See forstep(). @@@@example#Example You can initialize things by Cmd> manova("y = groups",silent:T) # response matrix y, factor groups Cmd> H <- matrix(SS[2,,]); E <- matrix(SS[3,,])