Machine Learning for Mobile

Hierarchical agglomerative clustering methods

Agglomerative hierarchical clustering is a classical clustering algorithm from the statistics domain. It involves iterative merging of the two most similar groups, which, in the first instance, contain single elements. The name of the algorithm refers to its way of working, as it creates hierarchical results in an agglomerative or bottom-up way, that is, by merging smaller groups into larger ones.

Here is the high-level algorithm for this method, as it is used in document clustering.

  1. The generic agglomerative process (Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, 1989) produces nested clusters through repeated iterations.
  2. Compute all pairwise document-document similarity coefficients
  3. Place each of the n documents into a class of its own
  4. Merge the two most similar clusters into one:
    • Replace the two clusters with the new cluster
    • Recompute inter-cluster similarity scores with regard to the new cluster
    • If the new cluster's radius is greater than the maxsize threshold, block it from further merging
  5. Repeat the preceding step until there are only k clusters left (note: k could equal 1)
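
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and it rests on several assumptions that are not part of the original description: documents are represented as plain numeric vectors, similarity is cosine similarity, inter-cluster similarity is the average of the pairwise document similarities (average linkage), and the maxsize radius check is left out for brevity. The names `cosine_similarity` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerate(vectors, k=1):
    """Merge the two most similar clusters until only k clusters remain.

    Returns the final clusters (lists of document indices) and the
    merge history, which records the nested structure.
    """
    n = len(vectors)

    # Compute all pairwise document-document similarity coefficients.
    doc_sim = {
        (i, j): cosine_similarity(vectors[i], vectors[j])
        for i, j in combinations(range(n), 2)
    }

    # Place each of the n documents into a class of its own.
    clusters = [[i] for i in range(n)]
    merges = []

    def cluster_sim(c1, c2):
        # Average linkage (an assumption): mean similarity over all
        # cross-cluster document pairs. Deriving it from doc_sim on demand
        # plays the role of recomputing inter-cluster scores after a merge.
        total = sum(doc_sim[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the two most similar clusters.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[a], clusters[b]))

        # Replace the two clusters with the new, merged cluster.
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)

    return clusters, merges


# Four toy "documents": two point mostly along each axis.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters, merges = agglomerate(docs, k=2)
print(sorted(clusters))
```

With k=2, the two documents along each axis end up grouped together. Setting k=1 continues until a single cluster remains, and the merge history then describes the full hierarchy. The search for the best pair is written for readability, not speed, so it is only suitable for small inputs.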