
cluster()

Usage:

cluster(x [, nclust:m, standard:F, method:name, distance:Dname,\
  keep:charVec, print:T, tree:T or F, classes:T or F, reorder:T]),
  x a REAL matrix, name a character scalar (one of "single",
  "complete", "average", "ward", "mcquitty", "centroid", or "median"),
  Dname "euclid" or "euclidsq", charVec a CHARACTER vector with
  elements "all", "classes", "criterion", or "distances"
cluster(dissim:d [, ...]), d a square REAL matrix
cluster(similar:s [, ...]), s a square REAL matrix

Keywords: multivariate analysis

cluster(x) performs a hierarchical cluster analysis of the cases (rows)
of the data matrix x.  The default method is average linkage, and by
default the output describes at most 9 clusters.  Unless suppressed, it
prints a table of cluster membership with one line per case and a
dendrogram whose join points are labeled with the value of the
clustering criterion.  There must be at least 2 rows in x.
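The agglomerative process can be illustrated in general terms with the
following Python sketch, which uses average linkage on an arbitrary
dissimilarity matrix.  It is not MacAnova's code.  The function names
average_linkage() and memberships(), the (a, b, height) merge format,
and the convention of numbering clusters 1..k in order of first
appearance are all assumptions made for illustration.  Each column of a
class membership table corresponds to stopping the merges when k
clusters remain, for k = 2, ..., nclust.

```python
def average_linkage(d):
    """Naive agglomerative clustering with average linkage.

    d is a symmetric n x n list of lists of dissimilarities.  Returns the
    merge history as (a, b, height) tuples: at each step the two active
    clusters with the smallest average between-cluster dissimilarity are
    joined, and the merged cluster keeps the smaller label a.
    """
    n = len(d)
    if n < 2:
        raise ValueError("need at least 2 cases")
    members = {i: [i] for i in range(n)}  # active cluster label -> cases
    merges = []
    while len(members) > 1:
        best = None
        labels = sorted(members)
        for ia, a in enumerate(labels):
            for b in labels[ia + 1:]:
                total = sum(d[i][j] for i in members[a] for j in members[b])
                avg = total / (len(members[a]) * len(members[b]))
                if best is None or avg < best[2]:  # first pair wins ties
                    best = (a, b, avg)
        a, b, height = best
        members[a] = members[a] + members.pop(b)
        merges.append((a, b, height))
    return merges


def memberships(n, merges, k):
    """Cluster number (1..k) of each case after the first n - k merges.

    Numbering clusters in order of first appearance is an assumed
    convention, not necessarily the one cluster() prints.
    """
    label = list(range(n))
    for a, b, _ in merges[: n - k]:
        label = [a if lab == b else lab for lab in label]
    ids = {}
    return [ids.setdefault(lab, len(ids) + 1) for lab in label]


if __name__ == "__main__":
    # Four cases on a line at positions 0, 1, 5 and 6.
    pos = [0.0, 1.0, 5.0, 6.0]
    d = [[abs(p - q) for q in pos] for p in pos]
    merges = average_linkage(d)
    print(merges)  # [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 5.0)]
    # One row per case; columns are k = 2 and k = 3 clusters.
    cols = [memberships(len(d), merges, k) for k in (2, 3)]
    print([list(row) for row in zip(*cols)])  # [[1, 1], [1, 1], [2, 2], [2, 3]]
```

The double loop recomputes every average at each step, which is simple
but slow; practical programs instead update dissimilarities after each
merge.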

Distances between cases in x are computed as squared Euclidean
distances after each column (variable) is divided by its standard
deviation.
Standardization can be suppressed by including 'standard:F' as an
argument.  NOTE: This is a change in behavior of cluster() from version
3.1 to version 3.3.
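The default distance computation can be sketched in Python as follows.
The function name standardized_sq_distances() is hypothetical, the
divisor n - 1 for the standard deviation is an assumption (using n
instead only rescales every distance by the same factor), and raising an
error for a constant column is a choice of this sketch, not documented
behavior of cluster().

```python
import math


def standardized_sq_distances(x, standard=True):
    """Squared Euclidean distances between the rows of x (a list of lists).

    With standard=True each column is first divided by its standard
    deviation (n - 1 divisor assumed); standard=False mimics 'standard:F'.
    """
    n, p = len(x), len(x[0])
    if n < 2:
        raise ValueError("need at least 2 rows")
    scale = [1.0] * p
    if standard:
        for j in range(p):
            col = [row[j] for row in x]
            mean = sum(col) / n
            sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
            if sd == 0.0:
                raise ValueError("column %d is constant" % j)
            scale[j] = sd
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            s = sum(((x[i][j] - x[k][j]) / scale[j]) ** 2 for j in range(p))
            d[i][k] = d[k][i] = s
    return d


x = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
print(standardized_sq_distances(x)[0][2])                  # 8.0
print(standardized_sq_distances(x, standard=False)[0][2])  # 20.0
```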

cluster(dissim:d) uses the upper triangle of the square matrix d as a
dissimilarity or distance measure.  Matrix d must have at least 2 rows
and is treated algorithmically as if it were unsquared Euclidean
distance.

cluster(similar:s) uses the upper triangle of the square matrix s as a
similarity matrix.  Matrix sqrt(2*(max(vector(s))-s)) is used as a
distance matrix.  Matrix s must have at least 2 rows.
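The similarity-to-distance transformation is easy to check by hand.  In
the Python sketch below (the function name similarity_to_distance() is
hypothetical), the maximum is taken over all elements of s, diagonal
included, as max(vector(s)) implies.  For a correlation matrix with 1s
on the diagonal this gives sqrt(2*(1 - r)), the Euclidean distance
between two variables after each has been centered and scaled to unit
length.

```python
import math


def similarity_to_distance(s):
    """Return sqrt(2*(smax - s)) elementwise, where smax is the largest
    element of the square similarity matrix s (diagonal included)."""
    n = len(s)
    if n < 2 or any(len(row) != n for row in s):
        raise ValueError("s must be square with at least 2 rows")
    smax = max(max(row) for row in s)
    return [[math.sqrt(2.0 * (smax - v)) for v in row] for row in s]


r = [[1.0, 0.5], [0.5, 1.0]]      # e.g. a 2 x 2 correlation matrix
print(similarity_to_distance(r))  # [[0.0, 1.0], [1.0, 0.0]]
```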

                          Other Keywords
Keyword phrase  Default  Meaning
  method:Name "average"  The clustering method used.  Legal values are
                         "ward", "single", "complete", "average",
                         "mcquitty", "median" and "centroid".
                         Name must be a quoted string or CHARACTER
                         variable.

  nclust:m     9         The number (>= 2) of clusters to be
                         described in the output.  When m > 25, the
                         class membership table requires more than 80
                         columns for printing, and when m > 22 the
                         dendrogram requires more than 80 columns.  m > 50 is
                         illegal when either the class membership table
                         or the dendrogram is to be printed.

  standard:F     T       suppresses the standardization of the data
                         matrix to unit standard deviations before
                         computing distances.  Not legal with 'dissim'
                         or 'similar'.

  distance:Dname "euclid" Specifies the distance measure used to label
                         the dendrogram.  Legal values are "euclid" and
                         "euclidsq". It has no effect on the clustering
                         produced.  Dname must be a CHARACTER variable
                         or quoted string.  Not legal with keywords
                         'dissim' or 'similar'.

  keep:charVec  none     Specifies which, if any, results should be
                         returned as the value of cluster().  charVec
                         must be a quoted string or a CHARACTER vector
                         or scalar.  Legal values for elements of
                         charVec are "distances" (the computed distances
                         are returned), "classes" (the computed n by
                         nclust-1 class membership matrix is returned),
                         "crit" (the criterion values at each of the
                         final nclust - 1 merges are returned), and "all"
                         (all three are returned).  When only one item
                         is to be returned, it is returned as a matrix
                         or vector.  Otherwise, items are components in
                         a structure with names 'distances', 'classes',
                         and 'criterion'.  The use of 'keep' suppresses
                         printing the table of class membership and the
                         dendrogram, unless print:T, tree:T, or
                         classes:T are arguments.  When 'keep' is not
                         used, cluster() has a NULL value.

  print:T                Forces printing of output, even when 'keep' is
                         used.  Default is F when 'keep' is used;
                         otherwise the default is T.

  tree:T                 Forces printing of dendrogram.
  tree:F                 Suppresses printing of dendrogram.
                         Default is F when 'keep' is used; otherwise T.
                         Must come later than 'keep' in the argument list.

  classes:T              Forces printing of table of class membership.
  classes:F              Suppresses printing of table of class
                         membership.  Default is F when 'keep' is used;
                         otherwise T.  Must come later than 'keep' in
                         the argument list.

  reorder:T     F        Directs that the rows of the printed table of
                         class membership be reordered so that cases in
                         the same clusters are adjacent.  It does not
                         affect the returned value if keep:"classes"
                         appears.  The reordering is the same as that
                         implied in the dendrogram.  A warning message
                         is printed if you use reorder:T together with
                         classes:F.
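All seven methods accepted by 'method' can be expressed through the
Lance-Williams recurrence, which computes the dissimilarity between an
existing cluster k and a newly merged cluster (i, j) from the old
dissimilarities.  This help file does not say that cluster() is
implemented this way; the Python sketch below (hypothetical function
name lance_williams()) only lists the standard coefficients.  For
"ward", "centroid" and "median" the update has its usual geometric
meaning when the inputs are squared Euclidean distances.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Dissimilarity between cluster k and the merger of clusters i and j.

    n_i, n_j, n_k are cluster sizes; d_* are current dissimilarities.
    """
    n_ij = n_i + n_j
    if method == "single":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "complete":
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "average":      # group average (UPGMA)
        a_i, a_j, beta, gamma = n_i / n_ij, n_j / n_ij, 0.0, 0.0
    elif method == "mcquitty":     # weighted average (WPGMA)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.0
    elif method == "median":       # weighted centroid (WPGMC)
        a_i, a_j, beta, gamma = 0.5, 0.5, -0.25, 0.0
    elif method == "centroid":     # unweighted centroid (UPGMC)
        a_i, a_j = n_i / n_ij, n_j / n_ij
        beta, gamma = -n_i * n_j / n_ij ** 2, 0.0
    elif method == "ward":
        tot = n_ij + n_k
        a_i, a_j = (n_i + n_k) / tot, (n_j + n_k) / tot
        beta, gamma = -n_k / tot, 0.0
    else:
        raise ValueError("unknown method: " + method)
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


# Cases i, j, k on a line at 0, 2 and 5.
print(lance_williams("single", 5.0, 3.0, 2.0, 1, 1, 1))      # 3.0 (unsquared)
print(lance_williams("centroid", 25.0, 9.0, 4.0, 1, 1, 1))   # 16.0 (squared)
```

In the centroid case the result 16 is the squared distance from k (at 5)
to the midpoint of i and j (at 1), as expected.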

Example:
  Cmd> results <- cluster(x,nclust:15,keep:vector("classes","crit"),\
        method:"median",classes:T, reorder:T)
computes the last 15 stages of clustering, using the so-called median
method, returns the class membership table and the criterion in a
structure, and prints the reordered class membership table.
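With reorder:T the printed rows follow the case order of the dendrogram.
One simple way to derive such an order from a merge history is sketched
below in Python.  The (a, b, height) merge format, the function names
dendrogram_order() and reorder_rows(), and the rule that cluster a's
cases are listed before cluster b's are assumptions for illustration;
MacAnova's own branch order may differ.

```python
def dendrogram_order(n, merges):
    """Case order implied by a merge history of (a, b, height) tuples in
    which each merge keeps label a and lists a's cases before b's."""
    order = {i: [i] for i in range(n)}
    for a, b, _ in merges:
        order[a] = order[a] + order.pop(b)
    (final,) = order.values()  # exactly one cluster remains
    return final


def reorder_rows(table, order):
    """Rows of a class membership table rearranged into dendrogram order."""
    return [table[i] for i in order]


merges = [(0, 2, 1.0), (1, 3, 1.0), (0, 1, 5.0)]
table = [[1, 1], [2, 2], [1, 1], [2, 3]]  # columns: k = 2 and k = 3
order = dendrogram_order(4, merges)
print(order)                       # [0, 2, 1, 3]
print(reorder_rows(table, order))  # [[1, 1], [1, 1], [2, 2], [2, 3]]
```

Because every merge joins two contiguous blocks of cases, each cluster at
every stage occupies adjacent rows in the reordered table.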

Gary Oehlert 2003-01-15