SMARTS® Toolkit
Supporting substructure recognition and the SMARTS® language

The SMARTS® language, an extension of SMILES™, is a powerful, flexible, and compact representation of structure and reaction queries or patterns. Structure and reaction queries represent subsets of chemical structure or reaction spaces, respectively. Given a structure or reaction and a query in the form of a substructure pattern, one can determine if the molecule or reaction belongs to the subset of chemistry or reaction space represented by the query. This is the widely used substructure search or pattern matching operation.

Conversely, given a set of structures or reactions, one can determine the query or pattern that most narrowly defines a subset of chemistry or reaction space containing every member of the given set. This is an abstraction operation that is useful for representing sets of structures or reactions by a shared scaffold or transformation.
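
As a rough illustration of the abstraction idea, the sketch below generalizes a set of molecules into the narrowest pattern that still covers all of them. It is not the Toolkit's algorithm: it assumes the inputs are unbranched chains of equal length that are already aligned position by position, written as lists of aliphatic element symbols, and the function name `abstract_chains` is hypothetical. Where all inputs agree at a position the symbol is kept; where they differ, a SMARTS atom list such as `[C,N]` is emitted.

```python
def abstract_chains(chains):
    """Return the narrowest chain pattern matching every input chain.

    Assumption (toy model): each chain is a list of aliphatic element
    symbols, all chains have the same length, and position i in one
    chain corresponds to position i in every other chain.
    """
    if not chains:
        raise ValueError("need at least one chain")
    length = len(chains[0])
    if any(len(chain) != length for chain in chains):
        raise ValueError("toy model requires chains of equal length")

    parts = []
    for position in zip(*chains):
        symbols = sorted(set(position))
        if len(symbols) == 1:
            # Every input agrees here: keep the specific atom.
            parts.append(symbols[0])
        else:
            # Inputs differ: allow exactly the atoms that were observed.
            parts.append("[" + ",".join(symbols) + "]")
    return "".join(parts)


# Two C-C-O chains and one C-N-O chain.
pattern = abstract_chains([
    ["C", "C", "O"],
    ["C", "N", "O"],
    ["C", "C", "O"],
])
print(pattern)  # C[C,N]O
```

The result admits only the atoms actually seen at each position, which is what "most narrowly" means in this simplified setting. A real abstraction also has to find the correspondence between atoms, which this sketch takes as given.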

The SMARTS® Toolkit is a programming library that provides the functions needed to search molecules and reactions for substructural patterns and to generate substructural patterns from sets of molecules. Patterns can be simple connections of atoms or sophisticated relationships based on complex atomic environments. Functions are provided to find any match quickly, to enumerate all possible isomorphic matches, and to enumerate only those isomorphs that represent unique sets of atoms. A function is also provided to generate a SMARTS® abstraction from a set of molecules.
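
The difference between the three matching modes can be sketched with a small, self-contained example. The code below does not use the Toolkit's API: the graph representation (a list of atom labels plus a list of bonded index pairs) and the names `iter_matches`, `first_match`, `all_matches`, and `unique_matches` are assumptions made for illustration, and the brute-force search over permutations is only practical for very small graphs. Bond orders and hydrogens are ignored.

```python
from itertools import permutations


def iter_matches(pattern, target):
    """Yield tuples m where pattern atom i is mapped to target atom m[i]."""
    p_atoms, p_bonds = pattern
    t_atoms, t_bonds = target
    t_bond_set = {frozenset(bond) for bond in t_bonds}
    for cand in permutations(range(len(t_atoms)), len(p_atoms)):
        labels_ok = all(p_atoms[i] == t_atoms[j] for i, j in enumerate(cand))
        bonds_ok = all(
            frozenset((cand[a], cand[b])) in t_bond_set for a, b in p_bonds
        )
        if labels_ok and bonds_ok:
            yield cand


def first_match(pattern, target):
    """Stop at the first match found (the 'any match quickly' case)."""
    return next(iter_matches(pattern, target), None)


def all_matches(pattern, target):
    """Every isomorphic match, including symmetry-equivalent mappings."""
    return list(iter_matches(pattern, target))


def unique_matches(pattern, target):
    """One match per distinct set of target atoms."""
    seen, unique = set(), []
    for match in iter_matches(pattern, target):
        key = frozenset(match)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Heavy-atom skeleton of acetic acid: C0-C1, C1-O2, C1-O3.
acid = (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
# Pattern O-C-O, which is symmetric about the central carbon.
oco = (["O", "C", "O"], [(0, 1), (1, 2)])

print(first_match(oco, acid))          # (2, 1, 3)
print(len(all_matches(oco, acid)))     # 2
print(len(unique_matches(oco, acid)))  # 1
```

Because the pattern is symmetric, it maps onto the same three atoms in two ways, so exhaustive enumeration reports two matches while the unique-atom-set enumeration reports one.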

Objects supported by this Toolkit include:
  • Path - object representing results of a structural search
  • Pathset - object representing a set of path objects
  • Pattern - object describing a structural pattern
  • Vbind - object providing faster evaluation of a match

This Toolkit can be used to:
  • Perform SMARTS® subgraph pattern matching
  • Extend the expressive power of SMARTS® with vector bindings and arbitrary pathsets
  • Generate a SMARTS® scaffold for a set of input molecules