Using Descriptor Counts in Clustering

Overview

  • Use descriptor-count information to avoid comparing every possible pair of compounds during clustering.


  • Clustering approach is agglomerative.


  • First phase: form "small," tight clusters.
    • BOOST algorithm.
    • Completes in O(n) time.
    • Substantially reduces dataset size.

  • Second phase: cluster the reduced dataset.
    • DiET algorithm.
    • Time complexity depends on the dataset.
    • Merge small clusters and single compounds to complete the clustering.
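
A minimal sketch of how descriptor counts can rule out pairs before any comparison is made. It is not the BOOST or DiET algorithm, whose details are not given on this slide. It assumes each compound is a set of descriptors, that similarity is the Tanimoto coefficient, and that clustering only needs pairs above a fixed threshold; the names tanimoto and similar_pairs are hypothetical. The pruning relies on the bound Tanimoto(a, b) <= min(|a|, |b|) / max(|a|, |b|), so two compounds whose descriptor counts differ too much can never reach the threshold.

```python
from typing import FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two descriptor sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similar_pairs(
    compounds: List[FrozenSet[int]], threshold: float
) -> Tuple[List[Tuple[int, int, float]], int]:
    """Find all pairs with similarity >= threshold (assumed 0 < threshold <= 1).

    Returns the matching pairs and the number of full comparisons performed.
    """
    # Visit compounds in order of increasing descriptor count.
    order = sorted(range(len(compounds)), key=lambda i: len(compounds[i]))
    pairs = []
    compared = 0
    for pos, i in enumerate(order):
        small = len(compounds[i])
        for j in order[pos + 1:]:
            large = len(compounds[j])
            # Count bound: similarity cannot exceed small / large.
            # Later compounds have even larger counts, so stop here.
            if small < threshold * large:
                break
            compared += 1
            s = tanimoto(compounds[i], compounds[j])
            if s >= threshold:
                pairs.append((min(i, j), max(i, j), s))
    return sorted(pairs), compared


if __name__ == "__main__":
    # Hypothetical toy data: integers stand in for descriptors.
    data = [
        frozenset({1, 2, 3, 4}),
        frozenset({1, 2, 3, 4, 5}),
        frozenset({1}),
        frozenset(range(1, 11)),
    ]
    found, n_compared = similar_pairs(data, 0.7)
    print(found, n_compared)
```

Sorting by count lets the inner loop stop at the first partner that is too large, so the number of full comparisons depends on how the descriptor counts are distributed in the dataset.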


Robin Hewitt (rhewitt@acm.org), Feb 2003