Using Descriptor Counts in Clustering

BOOST (Bettering-Our-Odds Sort Tree)

Building the Tree:

Theory: by holding the most variable bits constant,
  • We should get a fairly well balanced set of child nodes.


  • If a node still holds too many compounds, we build the tree another level at that node.


  • In each leaf node, we've raised the probability that an arbitrary pair of compounds will be highly similar, because the bits most likely to differ between compounds are now guaranteed to match.
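
The tree-building idea above can be sketched roughly as follows. This is only an illustration under several assumptions that the slide does not state: fingerprints are stored as Python integers, "most variable" is measured by the Bernoulli variance p(1 - p) of each bit within the node, and each level holds k bits constant. The names most_variable_bits, build_tree, leaves, k and max_leaf are hypothetical.

```python
from collections import defaultdict


def most_variable_bits(fps, n_bits, k):
    """Return up to k bit positions whose on/off split is closest to 50/50."""
    n = len(fps)
    scored = []
    for b in range(n_bits):
        p = sum((fp >> b) & 1 for fp in fps) / n
        scored.append((p * (1 - p), b))  # variance of the bit within this node
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest variance first
    # Bits already held constant higher up have zero variance and drop out.
    return [b for var, b in scored[:k] if var > 0]


def build_tree(fps, ids, n_bits, k=2, max_leaf=100):
    """Split compounds on the most variable bits until nodes are small."""
    if len(ids) <= max_leaf:
        return {"leaf": list(ids)}
    bits = most_variable_bits([fps[i] for i in ids], n_bits, k)
    if not bits:
        # Every remaining fingerprint is identical; no further split exists.
        return {"leaf": list(ids)}
    children = defaultdict(list)
    for i in ids:
        # Compounds sharing the same values on the chosen bits go together.
        key = tuple((fps[i] >> b) & 1 for b in bits)
        children[key].append(i)
    return {
        "bits": bits,
        "children": {
            key: build_tree(fps, group, n_bits, k, max_leaf)
            for key, group in children.items()
        },
    }


def leaves(node):
    """Collect the compound-id lists of all leaf nodes."""
    if "leaf" in node:
        return [node["leaf"]]
    return [leaf for child in node["children"].values() for leaf in leaves(child)]
```

With k bits held constant, a node gets at most 2^k children, and choosing bits near a 50/50 split is what keeps those children roughly equal in size.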

If we make our leaf nodes fairly small, say 10-100 compounds each, we can quickly do all-by-all comparisons within each one and pull out the groups of very similar compounds we find there.
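
A minimal sketch of the all-by-all step inside one leaf is shown here. The slide does not name a similarity measure, so the Tanimoto coefficient and the 0.85 cutoff are assumptions, as are the names tanimoto and similar_pairs. Fingerprints are again assumed to be Python integers.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two bit fingerprints stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def similar_pairs(fps, leaf_ids, threshold=0.85):
    """Compare every pair of compounds in one leaf; keep the similar ones."""
    return [
        (i, j)
        for i, j in combinations(leaf_ids, 2)
        if tanimoto(fps[i], fps[j]) >= threshold
    ]
```

A leaf of n compounds needs n(n - 1)/2 comparisons, so a 100-compound leaf costs 4,950 pairs, which is why keeping leaves small makes this step quick.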


Robin Hewitt (rhewitt@acm.org), Feb 2003