Clustering Package
Cluster analysis for structures and reactions

The Daylight Clustering Package is a suite of programs providing four clustering algorithms, each of which starts with a set of structures or reactions in SMILES™ and produces clustering results in formats suitable for further analysis. Three of the algorithms (Jarvis-Patrick, k-modes, and sphere exclusion) can handle very large sets of compounds (millions of structures) for applications such as database analysis or vendor compound selection. The fourth, the scaffold-directed algorithm, is intended for smaller sets in applications such as determining structure-activity relationships within lead optimization projects or document analysis. With the scaffold-directed algorithm, compounds can be clustered in a way that guarantees that each resulting cluster can be represented by a common structural scaffold.

Highlights
  • The modular nature of the package gives users control over the four stages of the clustering process: descriptor assignment, similarity measure, clustering algorithm, and post-clustering analysis and storage.

  • The clustering algorithms offered for large datasets are fast and scale well, so users can explore the effect of changing descriptors and similarity measures on the clustering outcome.

  • The scaffold-directed algorithm offered for small datasets guarantees that each cluster has a common substructure that meets a minimum coverage requirement.

  • The programs in the Clustering Package use dynamic memory allocation, so the computable problem size is limited only by available virtual memory and CPU speed, and large datasets can be clustered.

  • Most of the clustering algorithms offered allow the clustering to be updated dynamically as further compounds are added to, say, a corporate collection.

  • The analysis program output can be tailored for easy storage of the results in Thor/Merlin or in DayCart®.

  • Prototype compounds from each cluster can easily be chosen and combined to make up a diverse representative set from the whole collection.

  • Unusual ("outlier") compounds that form small or single-member clusters can easily be identified.

  • Because each compound is assigned to a uniquely named class, sets of similar compounds can be rapidly retrieved using standard database or spreadsheet tools.
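
The four stages named in the highlights (descriptor assignment, similarity measure, clustering algorithm, and post-clustering analysis) can be illustrated with a small Python sketch. This is not the Daylight implementation or its API. It assumes that each compound has already been reduced to a fingerprint, represented here as a set of "on" bit positions, uses the Tanimoto coefficient as the similarity measure, and applies one simple variant of sphere exclusion in which the first unassigned compound becomes the prototype of the next cluster. The compound names, fingerprints, and threshold are invented for illustration.

```python
from typing import Dict, FrozenSet, List

# Stage 1 (descriptor): a hypothetical fingerprint, stored as the set of "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Stage 2 (similarity): shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def sphere_exclusion(fps: Dict[str, Fingerprint], threshold: float) -> Dict[str, List[str]]:
    """Stage 3 (clustering): a simple sphere-exclusion variant.

    The first unassigned compound becomes a prototype and claims every
    unassigned compound whose similarity to it is at least `threshold`.
    Returns a mapping from prototype name to cluster members.
    """
    clusters: Dict[str, List[str]] = {}
    unassigned = list(fps)  # keeps the input order
    while unassigned:
        prototype = unassigned[0]
        members = [name for name in unassigned
                   if tanimoto(fps[prototype], fps[name]) >= threshold]
        clusters[prototype] = members
        claimed = set(members)
        unassigned = [name for name in unassigned if name not in claimed]
    return clusters


def singletons(clusters: Dict[str, List[str]]) -> List[str]:
    """Stage 4 (analysis): compounds that ended up alone in their cluster."""
    return [proto for proto, members in clusters.items() if len(members) == 1]


# Invented example data: six compounds with toy fingerprints.
fingerprints: Dict[str, Fingerprint] = {
    "cpd1": frozenset({1, 2, 3, 4}),
    "cpd2": frozenset({1, 2, 3, 5}),
    "cpd3": frozenset({1, 2, 3, 4, 6}),
    "cpd4": frozenset({10, 11, 12}),
    "cpd5": frozenset({10, 11, 13}),
    "cpd6": frozenset({20, 21}),
}

clusters = sphere_exclusion(fingerprints, threshold=0.5)
representatives = list(clusters)   # one prototype per cluster
outliers = singletons(clusters)    # single-member clusters

for prototype, members in clusters.items():
    print(prototype, members)
print("representative set:", representatives)
print("outliers:", outliers)
```

In this toy run the prototypes form the diverse representative set, and the single-member cluster marks the outlier compound. Because every compound carries the name of its cluster prototype, the mapping can be written out as a two-column table and filtered with ordinary database or spreadsheet tools.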