cluster(x [, nclust:n, standard:F, method:name, keep:charVec, print:T,\
  tree:T or F, classes:T or F, reorder:T]), x a REAL matrix, name a
  character scalar (one of "single", "complete", "average", "ward",
  "mcquitty", "centroid", or "median"), charVec a CHARACTER vector with
  elements "all", "classes", "criterion", or "distances"
cluster(dissim:d [, ...]), d a square REAL matrix
cluster(similar:s [, ...]), s a square REAL matrix

cluster(x) performs a hierarchical cluster analysis of cases (rows) of
the data matrix x.  The default method is average linkage and the
default maximum number of clusters described in the output is 9.  It
produces a table of cluster membership with one line per case and a
dendrogram, with the join points labeled with the value of the criterion
used.  There must be at least 2 rows in x.

Distances between cases in x are computed as squared Euclidean distance
after standardization by dividing by standard deviations.
Standardization can be suppressed by including 'standard:F' as an
argument.  NOTE: This is a change in behavior of cluster() from version
3.1 to version 3.3.

cluster(dissim:d) uses the upper triangle of the square matrix d as
dissimilarity or distance measure.  Matrix d must have at least 2 rows
and is treated algorithmically as if it were unsquared Euclidean
distance.

cluster(similar:s) uses the upper triangle of the square matrix s as a
similarity matrix.  Matrix sqrt(2*(max(vector(s))-s)) is used as a
distance matrix.  Matrix s must have at least 2 rows.

Other Keywords
 Keyword phrase   Default    Meaning
 method:Name      "average"  The clustering method used.  Legal values
                             are "ward", "single", "complete",
                             "average", "mcquitty", "median" and
                             "centroid".  Name must be a quoted string
                             or CHARACTER variable.
 nclust:m         9          The number (>= 2) of clusters to be
                             described in the output.  When m > 25, the
                             class membership table requires more than
                             80 columns for printing, and if m > 22 the
                             dendrogram requires more than 80.  m > 50
                             is illegal when either the class membership
                             table or the dendrogram is to be printed.
 standard:F       T          Suppresses the standardization of the data
                             matrix to unit standard deviations before
                             computing distances.  Not legal with
                             'dissim' or 'similar'.
 distance:Dname   "euclid"   Specifies the distance measure used to
                             label the dendrogram.  Legal values are
                             "euclid" and "euclidsq".  It has no effect
                             on the clustering produced.  Dname must be
                             a CHARACTER variable or quoted string.  Not
                             legal with keywords 'dissim' or 'similar'.
 keep:charVec     none       Specifies which, if any, results should be
                             returned as the value of cluster().
                             charVec must be a quoted string or a
                             CHARACTER vector or scalar.  Legal values
                             for elements of charVec are "distances"
                             (the computed distances are returned),
                             "classes" (the computed n by nclust-1 class
                             membership matrix is returned), "crit" (the
                             criterion values at each of the final
                             nclust - 1 merges are saved), and "all"
                             (all three are returned).  When only one
                             item is to be returned, it is returned as a
                             matrix or vector.  Otherwise, items are
                             components in a structure with names
                             'distances', 'classes', and 'criterion'.
                             The use of 'keep' suppresses printing the
                             table of class membership and the
                             dendrogram, unless print:T, tree:T, or
                             classes:T are arguments.  When 'keep' is
                             not used, cluster() has a NULL value.
 print:T                     Forces printing output, even when 'keep' is
                             used.  Default is F when 'keep' is used;
                             otherwise the default is T.
 tree:T                      Forces printing of dendrogram.
 tree:F                      Suppresses printing of dendrogram.  Default
                             is F when 'keep' is used; otherwise T.
                             Must come later than 'keep' in argument
                             list.
 classes:T                   Forces printing of table of class
                             membership.
 classes:F                   Suppresses printing of table of class
                             membership.  Default is F when 'keep' is
                             used; otherwise T.  Must come later than
                             'keep' in argument list.
 reorder:T        F          Directs that the rows of the printed table
                             of class membership be reordered so that
                             cases in the same clusters are adjacent.
                             It does not affect the returned value if
                             keep:"classes" appears.  The reordering is
                             the same as that implied in the dendrogram.
                             A warning message is printed if you use
                             reorder:T together with classes:F.
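
The two distance computations described in this topic (squared Euclidean
distance after dividing each column by its standard deviation, and the
conversion sqrt(2*(max(s) - s)) applied to a similarity matrix) can be
illustrated outside of cluster() with a short Python sketch.  This is not
the code cluster() runs.  The function names are hypothetical, the use of
the sample standard deviation (divisor n - 1) is an assumption, and the
sketch fills in the whole matrix whereas cluster() reads only the upper
triangle.

```python
import math
from statistics import stdev


def standardized_sq_distances(x, standard=True):
    """Squared Euclidean distances between the rows (cases) of x.

    When standard is True, each column is first divided by its standard
    deviation.  Assumption: the sample standard deviation (n - 1 divisor)
    is used; a constant column would give a zero divisor and fail.
    """
    n_cols = len(x[0])
    if standard:
        sds = [stdev([row[j] for row in x]) for j in range(n_cols)]
        x = [[row[j] / sds[j] for j in range(n_cols)] for row in x]
    return [
        [sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in x]
        for r1 in x
    ]


def similarity_to_distance(s):
    """Apply sqrt(2 * (max(s) - s)) to every element of the square matrix s."""
    s_max = max(max(row) for row in s)  # maximum over all elements of s
    return [[math.sqrt(2 * (s_max - v)) for v in row] for row in s]
```

Calling standardized_sq_distances(x, standard=False) corresponds to the
effect described for 'standard:F': the columns are left on their original
scales, so variables with large variances dominate the distances.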

Example:
  Cmd> results <- cluster(x,nclust:15,keep:vector("classes","crit"),\
         method:"median",classes:T, reorder:T)
computes the last 15 stages of clustering, using the so-called median
method, returns the class membership table and the criterion in a
structure, and prints the reordered class membership table.
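
To make the terms "stage" and "criterion" concrete, here is a minimal
Python sketch of textbook average linkage, the default method named
above.  It is a naive illustration, not the algorithm cluster() uses
internally: the function name is hypothetical, distances between clusters
are recomputed from scratch at every stage, and ties are broken by taking
the first minimum found, which is an assumption.

```python
def average_linkage(d):
    """Naive agglomerative clustering on a symmetric distance matrix d.

    Returns one (members_a, members_b, criterion) tuple per merge, in the
    order the merges happen.  The criterion is the average of all pairwise
    distances between the two clusters being joined.
    """
    clusters = [[i] for i in range(len(d))]  # start with one case per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                crit = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or crit < best[0]:
                    best = (crit, a, b)
        crit, a, b = best
        merges.append((clusters[a], clusters[b], crit))
        merged = clusters[a] + clusters[b]
        # Drop the two joined clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges
```

In this sketch the last m - 1 entries of the returned list play the role
of the final merges whose criterion values are described under
keep:"crit" for nclust:m.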