Usage:
discrimquad(groups, y), factor or vector of positive integers groups,
REAL matrix y with nrows(y) = length(groups)
Keywords:
classification, discrimination
Usage
discrimquad(groups, y), where groups is a factor or an integer vector,
and y is a REAL data matrix, computes the coefficients of quadratic
discriminant functions that can be used to classify an observation into
one of the populations specified by argument groups.
It is an error if the smallest group has p or fewer members, where
p = ncols(y), or if y has any MISSING elements.
Value returned
When there are g = max(groups) populations, and p = ncols(y) variables,
the value returned is structure(Q:q, L:l, addcon:c, grandmean:ybar),
where the components are as follows:
  q     structure(Q1,Q2,...,Qg), each Qj a REAL p by p matrix
  l     structure(L1,L2,...,Lg), each Lj a REAL vector of length p
  c     vector(c1,c2,...,cg), each cj a REAL scalar
  ybar  vector(ybar1,...,ybarp), the vector of column means
Quadratic score
When x is a vector of length p to be classified, the quadratic score
for group j is
qs[j] = (x-ybar)' %*% q[j] %*% (x-ybar) + (x-ybar)' %*% l[j] + c[j]
The functions are optimal in the case when the distribution in each
population is multivariate normal with no assumption that the variance-
covariance matrices are the same for all populations.
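The score formula above can be sketched outside MacAnova as follows. This is only an illustration in plain Python, not the macro's own code: the function name quad_score and all numbers are made up, and Q, L, c and ybar stand in for the components q, l, c and ybar of the returned structure.

```python
def quad_score(x, ybar, Q, L, c):
    """Return the list of quadratic scores qs[j], one per group.

    x, ybar : sequences of length p
    Q       : list of g matrices (each p by p, as nested lists)
    L       : list of g vectors of length p
    c       : list of g scalars
    """
    d = [xi - mi for xi, mi in zip(x, ybar)]  # centered case, x - ybar
    p = len(d)
    scores = []
    for Qj, Lj, cj in zip(Q, L, c):
        # quadratic term (x-ybar)' Qj (x-ybar)
        quad = sum(d[r] * Qj[r][s] * d[s] for r in range(p) for s in range(p))
        # linear term (x-ybar)' Lj
        lin = sum(d[r] * Lj[r] for r in range(p))
        scores.append(quad + lin + cj)
    return scores


# Hypothetical coefficients for g = 2 groups and p = 2 variables.
ybar = [1.0, 2.0]
Q = [[[-0.5, 0.0], [0.0, -0.5]],
     [[-1.0, 0.25], [0.25, -1.0]]]
L = [[1.0, 0.0], [0.0, 1.0]]
c = [0.5, -1.0]

scores = quad_score([2.0, 4.0], ybar, Q, L, c)
print(scores)  # [-1.0, -3.0]
```

With these made-up values the case scores higher for group 1 than for group 2, so with equal priors it would be assigned to group 1.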
Prior and posterior probabilities
When P = vector(P1,P2,...,Pg) is a vector of prior probabilities that a
randomly selected case comes from the various populations, the
posterior probabilities are the elements of the vector
  P*exp(qs)/sum(P*exp(qs)) = P*exp(qs - qs[1])/sum(P*exp(qs - qs[1]))
The latter form is usually preferable since it is possible for
exp(qs[1]) to be so large as to be uncomputable. These probabilities
can be computed using macro probsquad().
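As a rough illustration of why the shifted form helps, here is a minimal sketch in plain Python. It is not the code of probsquad(); the function name posterior and the scores are hypothetical. It subtracts the first score (qs[1] in the 1-based notation above) before exponentiating.

```python
import math


def posterior(P, qs):
    """Posterior probabilities from priors P and quadratic scores qs.

    Uses the shifted form P*exp(qs - qs[1])/sum(P*exp(qs - qs[1]));
    Python lists are 0-based, so the shift is qs[0].
    """
    shift = qs[0]
    w = [Pj * math.exp(s - shift) for Pj, s in zip(P, qs)]
    total = sum(w)
    return [wj / total for wj in w]


# Made-up scores large enough that math.exp(qs[0]) would raise
# OverflowError if it were evaluated directly.
P = [0.5, 0.5]
qs = [1000.0, 998.0]
post = posterior(P, qs)
print(post)  # roughly 0.88 for group 1 and 0.12 for group 2
```

Subtracting the largest score instead of the first is a common variant of the same idea; it keeps every exponent at or below zero, which also covers the case where some other score is much larger than the first.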
Bias in estimating posterior probabilities
It is well known that posterior probabilities computed for a case that
is in the "training set", the data set from which a classification method
was estimated, are biased in an "optimistic" direction: The estimated
posterior probability for its actual population is biased upward. For
this reason posterior probabilities should be estimated only for cases
that are not in the training set.
Cross references
See also discrim() and probsquad().
Gary Oehlert
2006-01-30