cluster(x [, nclust:n, standard:F, method:name, keep:charVec, print:T,\
  tree:T or F, classes:T or F, reorder:T]), x a REAL matrix, name a
  character scalar (one of "single", "complete", "average", "ward",
  "mcquitty", "centroid", or "median"), charVec a CHARACTER vector with
  elements "all", "classes", "criterion", or "distances"
cluster(dissim:d [, ...]), d a square REAL matrix
cluster(similar:s [, ...]), s a square REAL matrix
cluster(x) performs a hierarchical cluster analysis of cases (rows) of
the data matrix x. The default method is average linkage and the
default maximum number of clusters described in the output is 9. It
produces a table of cluster membership with one line per case and a
dendrogram, with the join points labeled with the value of the criterion
used. There must be at least 2 rows in x.
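
As a minimal sketch of the default call and of two of the keywords
described below, assuming x is a REAL data matrix with one row per case
(the name x and the value 4 are illustrative only):

  Cmd> cluster(x) # average linkage, up to 9 clusters described

  Cmd> cluster(x, method:"ward", nclust:4) # method "ward", 4 clusters

Both calls print the class membership table and the dendrogram and have
a NULL value, since 'keep' is not used.
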
Distances between cases in x are computed as squared Euclidean distance
after standardization by dividing by standard deviations.
Standardization can be suppressed by including 'standard:F' as an
argument. NOTE: This is a change in behavior of cluster() from version
3.1 to version 3.3.
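
For example, if the columns of x (an illustrative name) are already
measured on comparable scales, a call of the following form should
cluster on the raw values:

  Cmd> cluster(x, standard:F) # distances from unstandardized columns
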
cluster(dissim:d) uses the upper triangle of the square matrix d as
dissimilarity or distance measure. Matrix d must have at least 2 rows
and is treated algorithmically as if it were unsquared Euclidean
distance.
cluster(similar:s) uses the upper triangle of the square matrix s as a
similarity matrix. Matrix sqrt(2*(max(vector(s))-s)) is used as a
distance matrix. Matrix s must have at least 2 rows.
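
A sketch of both forms, assuming d and s are square REAL matrices of
dissimilarities and similarities between cases that have already been
computed (the names d, s and d2 and the choice of method are
illustrative):

  Cmd> cluster(dissim:d, method:"complete")

  Cmd> cluster(similar:s, method:"complete")

Taking the conversion described above at face value, the second call
should behave like supplying the converted matrix through 'dissim' by
hand:

  Cmd> d2 <- sqrt(2*(max(vector(s)) - s)) # similarity -> distance

  Cmd> cluster(dissim:d2, method:"complete")

This equivalence is inferred from the description rather than checked
against the implementation.
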
Other Keywords
Keyword phrase  Default    Meaning
method:Name     "average"  The clustering method used. Legal values are
                           "ward", "single", "complete", "average",
                           "mcquitty", "median" and "centroid".
                           Name must be a quoted string or CHARACTER
                           variable.
nclust:m        9          The number (>= 2) of clusters to be
                           described in the output. When m > 25, the
                           class membership table requires more than 80
                           columns for printing, and if m > 22 the
                           dendrogram requires more than 80. m > 50 is
                           illegal when either the class membership table
                           or the dendrogram is to be printed.
standard:F      T          Suppresses the standardization of the data
                           matrix to unit standard deviations before
                           computing distances. Not legal with 'dissim'
                           or 'similar'.
distance:Dname  "euclid"   Specifies the distance measure used to label
                           the dendrogram. Legal values are "euclid" and
                           "euclidsq". It has no effect on the clustering
                           produced. Dname must be a CHARACTER variable
                           or quoted string. Not legal with keywords
                           'dissim' or 'similar'.
keep:charVec    none       Specifies which, if any, results should be
                           returned as the value of cluster(). charVec
                           must be a quoted string or a CHARACTER vector
                           or scalar. Legal values for elements of
                           charVec are "distances" (the computed distances
                           are returned), "classes" (the computed n by
                           nclust-1 class membership matrix is returned),
                           "criterion" (the criterion values at each of
                           the final nclust - 1 merges are returned), and
                           "all" (all three are returned). When only one
                           item is to be returned, it is returned as a
                           matrix or vector. Otherwise, items are
                           components in a structure with names
                           'distances', 'classes', and 'criterion'. The
                           use of 'keep' suppresses printing the table of
                           class membership and the dendrogram, unless
                           print:T, tree:T, or classes:T is an argument.
                           When 'keep' is not used, cluster() has a NULL
                           value.
print:T                    Forces printing of output, even when 'keep' is
                           used. Default is F when 'keep' is used;
                           otherwise the default is T.
tree:T                     Forces printing of dendrogram.
tree:F                     Suppresses printing of dendrogram.
                           Default is F when 'keep' is used; otherwise T.
                           Must come later than 'keep' in argument list.
classes:T                  Forces printing of table of class membership.
classes:F                  Suppresses printing of table of class
                           membership. Default is F when 'keep' is used;
                           otherwise T. Must come later than 'keep' in
                           argument list.
reorder:T       F          Directs that the rows of the printed table of
                           class membership be reordered so that cases in
                           the same clusters are adjacent. It does not
                           affect the returned value if keep:"classes"
                           appears. The reordering is the same as that
                           implied in the dendrogram. A warning message
                           is printed if you use reorder:T together with
                           classes:F.
Example:
  Cmd> results <- cluster(x,nclust:15,keep:vector("classes","criterion"),\
    method:"median",classes:T, reorder:T)
computes the last 15 stages of clustering, using the so-called median
method, returns the class membership table and the criterion in a
structure, and prints the reordered class membership table.
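
Two further sketches of 'keep', assuming the same data matrix x (the
variable names memb and results are illustrative). When a single item is
requested it comes back directly, and by the description above nothing
is printed:

  Cmd> memb <- cluster(x, keep:"classes") # n by nclust-1 matrix

When several items are requested they are components of a structure;
here tree:T, placed after 'keep', also asks for the dendrogram:

  Cmd> results <- cluster(x, keep:"all", tree:T)

  Cmd> results$criterion # criterion values at the final merges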