
kmeans()

Usage:
kmeans(y [,means or classes] [,kmax:k1,kmin:k2,start:method,standard:F,
  weights:wts, quiet:T]), y a REAL matrix, means a REAL matrix with
  ncols(y) columns, classes a REAL vector with nrows(y) rows, k1 and k2
  positive integers, k1 >= k2, method one of "random", "optimal",
  "means", or "classes", wts a REAL vector with nrows(wts) = nrows(y)



Keywords: multivariate analysis

kmeans(y, kmax:k1 [, kmin:k2]) performs k-means clusterings of the rows
of REAL matrix y, starting with k1 clusters, and successively merging
clusters until there are k2 clusters.  By default the data are
standardized and the initial clusters are selected randomly.  At each
stage, cases are reallocated among clusters in an attempt to minimize
the sum of the within-cluster sums of squares.  If kmin:k2 is omitted,
k2 is taken to be k1.
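
The criterion described above can be illustrated outside MacAnova.  The
following is a minimal sketch in Python, not MacAnova code and not the
algorithm kmeans() uses internally; the function name within_cluster_ss
is hypothetical, and standardization and weights are left out.  It only
shows what "the sum of the within-cluster sums of squares" means for a
given assignment of rows to clusters.

```python
def within_cluster_ss(y, classes):
    """Sum, over clusters, of squared distances from each row to its cluster mean."""
    # Group the rows of y by their cluster label.
    groups = {}
    for row, c in zip(y, classes):
        groups.setdefault(c, []).append(row)

    total = 0.0
    for rows in groups.values():
        ncols = len(rows[0])
        # Column means of the rows in this cluster.
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(ncols)]
        # Add this cluster's sum of squared deviations from its mean.
        total += sum((r[j] - mean[j]) ** 2 for r in rows for j in range(ncols))
    return total


y = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]]
good = within_cluster_ss(y, [1, 1, 2, 2])  # nearby rows grouped together
poor = within_cluster_ss(y, [1, 2, 1, 2])  # distant rows grouped together
print(good, poor)  # 10.0 102.0
```

Reallocating cases between clusters is worthwhile exactly when it lowers
this total, as it does when moving from the second assignment to the
first.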

It is an error when k2 > k1.

kmeans() returns a structure with components 'classes' and 'criterion'.
Component classes is an nrows(y) by k1-k2+1 matrix (vector if k2 = k1)
containing the cluster membership at each stage.  Component criterion is
a REAL vector of length k1-k2+1 containing the minimized criterion at
each stage.

By default, a brief history of the merging process is printed, including
the values of the criterion being minimized.

kmeans(y, kmax:k1 [, kmin:k2], start:"random") is identical to kmeans(y,
kmax:k1 [, kmin:k2]).

kmeans(y, kmax:k1 [, kmin:k2], start:"optimal") attempts to select the
initial clusters so as to minimize the within-cluster sums of squares
for column 1 of y.

kmeans(y, Means [, kmin:k2], start:"means"), where Means is a k1 by
ncols(y) matrix, selects as initial cluster j those rows of y that are
closer to row j of Means than to any other row of Means (using Euclidean
distance).  If kmax:k1 is an argument with k1 != nrows(Means), a warning
message is given and nrows(Means) is used.  If there are duplicates
among the rows of Means, a warning message is printed.
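
The start:"means" rule can be sketched as follows.  This is an
illustrative Python sketch, not MacAnova code; the name
assign_to_nearest is hypothetical, and breaking ties in favor of the
lowest-numbered row of Means is an assumption, since the text above does
not say how ties are handled.

```python
def assign_to_nearest(y, means):
    """Return 1-based labels: for each row of y, the closest row of means."""
    labels = []
    for row in y:
        # Squared Euclidean distance from this row to every row of means.
        d2 = [sum((a - b) ** 2 for a, b in zip(row, m)) for m in means]
        # index() returns the first minimum, so ties go to the lowest label
        # (an assumption, not documented behavior).
        labels.append(d2.index(min(d2)) + 1)
    return labels


y = [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 8.0]]
means = [[0.0, 0.0], [10.0, 10.0]]
initial = assign_to_nearest(y, means)
print(initial)  # [1, 1, 2, 2]
```

Duplicate rows of Means would leave the later duplicate with no cases
under this rule, which may be why duplicates draw a warning.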

kmeans(y, Classes [, kmin:k2], start:"classes"), where Classes is a
vector of nrows(y) positive integers <= 255, uses Classes to specify
initial clusters.  If kmax:k1 is an argument with k1 != max(Classes), a
warning message is given and max(Classes) is used.  If there are empty
classes (not all integers between 1 and max(Classes) are present), the
empty classes are "squeezed out", and max(Classes) reduced accordingly.
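
The "squeezing out" of empty classes can be pictured with a short
Python sketch (not MacAnova code; squeeze_classes is a hypothetical
name).  It assumes the relabeling keeps the original ordering of the
class numbers, which the text above does not state explicitly.

```python
def squeeze_classes(classes):
    """Relabel classes as 1, 2, ..., m with no gaps, preserving their order."""
    present = sorted(set(classes))
    # Map each class number actually present to its rank among those present.
    relabel = {old: new for new, old in enumerate(present, start=1)}
    return [relabel[c] for c in classes]


classes = [1, 4, 4, 2, 1]            # class 3 is empty
squeezed = squeeze_classes(classes)
print(squeezed, max(squeezed))       # [1, 3, 3, 2, 1] 3
```

Here max(Classes) drops from 4 to 3, so clustering would start from 3
clusters rather than 4.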

                          Additional keywords
      standard:F                 Do not standardize before clustering
      weights:wts                Use weighted means and sums of squares
                                 with wts a REAL vector of length
                                 nrows(y) and every wts[i] > 0.
      quiet:T                    Suppress printing of clustering
                                 history.
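
For weights:wts, one common reading of "weighted means and sums of
squares" is sketched below in Python (not MacAnova code).  The formulas
are an assumption, not taken from the MacAnova source: the cluster mean
is sum(wts[i]*y[i])/sum(wts[i]) and each squared deviation is multiplied
by wts[i].  The name weighted_cluster_ss is hypothetical.

```python
def weighted_cluster_ss(rows, wts):
    """Weighted mean and weighted sum of squares for the rows of one cluster."""
    wsum = sum(wts)
    ncols = len(rows[0])
    # Weighted column means (assumed form: sum of w*y divided by sum of w).
    mean = [sum(w * r[j] for r, w in zip(rows, wts)) / wsum for j in range(ncols)]
    # Squared deviations from the weighted mean, each scaled by its weight.
    ss = sum(w * (r[j] - mean[j]) ** 2 for r, w in zip(rows, wts) for j in range(ncols))
    return mean, ss


mean, ss = weighted_cluster_ss([[0.0], [2.0]], [1.0, 3.0])
print(mean, ss)  # [1.5] 3.0
```

With equal weights of 1 this reduces to the ordinary mean and sum of
squares.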

See also cluster().


Gary Oehlert 2003-01-15