
discrimquad()

Usage:
discrimquad(groups, y), factor or vector of positive integers groups,
  REAL matrix y with nrows(y) = length(groups)



Keywords: classification, discrimination
discrimquad(groups, y), where groups is a factor or an integer vector,
and y is a REAL data matrix, computes the coefficients of quadratic
discriminant functions that can be used to classify an observation into
one of the populations specified by argument groups.

It is an error if the smallest group has p or fewer members or if y has
any MISSING elements.
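
The two error conditions above can be illustrated with a short check.
This is a sketch in plain Python, not MacAnova code and not the macro's
own source. The name check_inputs is hypothetical, y is assumed to be a
list of rows, and MISSING is assumed to be represented by None or NaN.

```python
import math
from collections import Counter


def check_inputs(groups, y):
    """Raise ValueError under the error conditions described above."""
    if len(groups) != len(y):
        raise ValueError("nrows(y) must equal length(groups)")
    p = len(y[0])  # number of variables, ncols(y)
    # Assumption: MISSING is represented by None or NaN in this sketch
    for row in y:
        if any(v is None or math.isnan(v) for v in row):
            raise ValueError("y has MISSING elements")
    counts = Counter(groups)
    # Groups are labelled 1..max(groups); a label that never occurs has size 0
    smallest = min(counts.get(j, 0) for j in range(1, max(groups) + 1))
    if smallest <= p:
        raise ValueError("smallest group has p or fewer members")
```

With p = 2 variables, every group needs at least 3 cases to pass this
check.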

When there are g = max(groups) populations, and p = ncols(y) variables,
the value returned is structure(Q:q, L:l, addcon:c, grandmean:ybar),
where the components are as follows:
   q     structure(Q1,Q2,...Qg), each Qj a  REAL p by p matrix
   l     structure(L1,L2,...Lg), each Lj a REAL vector of length p
   c     vector(c1,c2,...cg), cj REAL scalars
   ybar  vector(ybar1,...ybarp), the vector of column means

When x is a vector of length p to be classified, the quadratic score
for group j is
    qs[j] = (x-ybar)' %*% q[j] %*% (x-ybar) + (x-ybar)' %*% l[j] + c[j]
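
The score formula can be sketched in plain Python as follows. This is
not MacAnova code. The name quad_scores and the list-based layout of q,
l, c and ybar are assumptions made for illustration, and the
coefficients are made up rather than output from discrimquad().

```python
def quad_scores(x, q, l, c, ybar):
    """Return the list of quadratic scores, one per group, for one case x.

    q: list of g p-by-p matrices (each a list of rows)
    l: list of g vectors of length p
    c: list of g scalars
    ybar: vector of length p (column means)
    """
    d = [xi - mi for xi, mi in zip(x, ybar)]  # x - ybar
    p = len(d)
    scores = []
    for Qj, Lj, cj in zip(q, l, c):
        # quadratic term (x-ybar)' Qj (x-ybar)
        quad = sum(d[r] * Qj[r][s] * d[s] for r in range(p) for s in range(p))
        # linear term (x-ybar)' Lj
        lin = sum(di * li for di, li in zip(d, Lj))
        scores.append(quad + lin + cj)
    return scores


# Made-up coefficients for g = 2 groups and p = 2 variables
q = [[[-0.5, 0.0], [0.0, -0.5]], [[-1.0, 0.0], [0.0, -0.25]]]
l = [[1.0, 0.0], [0.0, 2.0]]
c = [0.0, -1.0]
ybar = [1.0, 2.0]
qs = quad_scores([2.0, 4.0], q, l, c, ybar)  # [-1.5, 1.0]
```

Python lists are indexed from 0, so qs[0] here corresponds to qs[1] in
the notation of this help file.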

The functions are optimal in the case when the distribution in each
population is multivariate normal with no assumption that the variance-
covariance matrices are the same for all populations.

When P = vector(P1,P2,...,Pg) is a vector of prior probabilities that a
randomly selected case comes from the various populations, then the
posterior probabilities are the elements of the vector
   P*exp(qs)/sum(P*exp(qs)) = P*exp(qs - qs[1])/sum(P*exp(qs - qs[1]))

The latter form is usually preferable since it is possible for
exp(qs[1]) to be so large as to be uncomputable.  These probabilities
can be computed using macro probsquad().
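
The shifted form of the posterior probability formula can be sketched
in plain Python. This is not the source of probsquad(). The name
posterior_probs and the scores are made up for illustration. The shift
qs[0] here corresponds to qs[1] in the notation above, since Python
indexes from 0.

```python
import math


def posterior_probs(qs, prior):
    """Return P*exp(qs - shift)/sum(P*exp(qs - shift)) for one case."""
    shift = qs[0]  # first score, as in the formula above
    w = [p * math.exp(s - shift) for p, s in zip(prior, qs)]
    total = sum(w)
    return [wi / total for wi in w]


# Made-up scores: math.exp(800.0) raises OverflowError in Python,
# but the shifted differences 0.0 and 1.0 are harmless.
qs = [800.0, 801.0]
post = posterior_probs(qs, [0.5, 0.5])
```

Subtracting max(qs) instead of the first score is another common
choice, since it guarantees that no exponent is positive.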

NOTE: It is well known that posterior probabilities computed for a case
that is in the "training set", the data set from which a classification
method was estimated, are biased in an "optimistic" direction: The
estimated posterior probability for its actual population is biased
upward.  For this reason posterior probabilities should be estimated
only for cases that are not in the training set.

See also discrim() and probsquad().


Gary Oehlert 2003-01-15