SMARTS,  An Introduction


  1. Introduction
  2. Properties of Atoms
  3. Bonds
  4. Logical Operators
  5. Recursive SMARTS
  6. Component-Level Grouping

  1. Introduction:   SMARTS...
  2. Properties of Atoms

    Each entry below lists a SMARTS pattern, what it hits in the target SMILES, and a note.

    [+2]  

    Atoms that have a plus two charge.   All SMILES atomic properties are valid in SMARTS; this includes charge, hydrogen count, isotopic specifications, bond symbols, and chirality specification.  

    [a]  

    Atoms that are aromatic.   "a" is any aromatic atom.  

    [A]  

    Atoms that are aliphatic.   "A" is any aliphatic atom.  

    [#6]  

    Atoms that have an atomic number of 6 (c or C).   "#<number>" defines an atom that has an atomic number of <number>. Hits both aliphatic and aromatic atoms.  

    [R2]  

    Atoms that are in 2 SSSR rings (SSSR: smallest set of smallest rings).   "R<number>" defines an atom that is in <number> SSSR rings. Default (R) is any ring atom.

    [r5]  

    Atoms that are in an SSSR ring that has 5 members.   "r<number>" defines an atom that is in an SSSR ring that has <number> members. Default (r) is any ring atom.  

    [v4]  

    Atoms that have a total bond order of four.   "v<number>" defines an atom whose total bond order is <number>; a double bond (=) counts as 2 bonds and a triple bond (#) as 3.  

    [X2]  

    Atoms that are connected to two other atoms.   "X<number>" defines an atom that is connected to <number> other atoms.  

    [H1]

    Atoms that have one attached hydrogen.   "H<number>" defines an atom that has <number> attached hydrogens.  

    [H]  

    Hydrogen atoms.   A hydrogen atom has special properties ([H+], [2H], [H][H], etc.).  
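
The counting behind v, X and H can be illustrated without a cheminformatics toolkit. The following is a minimal Python sketch, not a SMARTS implementation: the `ATOMS` and `BONDS` tables and the helper names are hypothetical, the molecule (acetaldehyde) is built by hand, and the sketch assumes that attached hydrogens count toward both the total bond order (v) and the number of connections (X).

```python
# Hand-built acetaldehyde, CH3-CH=O. Hydrogens are stored as counts on the
# heavy atoms rather than as separate nodes (an assumption of this sketch).
ATOMS = [
    {"symbol": "C", "h_count": 3},  # atom 0: methyl carbon
    {"symbol": "C", "h_count": 1},  # atom 1: carbonyl carbon
    {"symbol": "O", "h_count": 0},  # atom 2: carbonyl oxygen
]
BONDS = [(0, 1, 1), (1, 2, 2)]  # (atom index, atom index, bond order)


def total_h(atom_index):
    """H<n>: number of attached hydrogens."""
    return ATOMS[atom_index]["h_count"]


def connectivity(atom_index):
    """X<n>: number of connections, counting attached hydrogens (assumed)."""
    heavy = sum(1 for a, b, _ in BONDS if atom_index in (a, b))
    return heavy + total_h(atom_index)


def valence(atom_index):
    """v<n>: total bond order, counting each attached hydrogen as 1 (assumed)."""
    heavy = sum(order for a, b, order in BONDS if atom_index in (a, b))
    return heavy + total_h(atom_index)


for i, atom in enumerate(ATOMS):
    print(i, atom["symbol"], "v", valence(i), "X", connectivity(i), "H", total_h(i))
```

Under these assumptions the carbonyl carbon comes out as v4, X3 and H1, and the carbonyl oxygen as v2, X1 and H0, so [v4] would hit both carbons while [H1] would hit only the carbonyl carbon.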

  3. Bonds

    CC  

    Molecules where an aliphatic carbon is SINGLE BONDED (implicitly) to another aliphatic carbon.   All SMILES bond properties are valid in SMARTS; this includes implicit single bonds, explicit single bonds (-), double bonds (=), triple bonds (#), and aromatic bonds (:). This pattern WON'T match double bonds or triple bonds (so it does not hit C=C, C#C, ...).  

    [#6]~[#6]  

    Molecules where two carbons are connected by any bond (includes single bonds, double bonds, triple bonds, and aromatic bonds).   "~" means any bond (wildcard bond).  

    [#6]@[#6]

    Molecules where two carbons are connected by a ring bond. "@" is a bond between two atoms that are within the same ring.  

    F/?[#6]=C/Cl  

    Molecules where a fluorine is connected to a carbon by a directional bond that is "up or unspecified", and that carbon is double bonded to another carbon which is connected to a chlorine by an "up" bond (e.g. F/C=C/Cl and FC=C/Cl). This excludes molecules where the fluorine is connected to its carbon by a "down" bond while the chlorine is connected to the other carbon by an "up" bond.   "?" means "OR unspecified". "?" may also be used with chirality specifications (@ and @@).  
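
As an illustration of how the non-directional bond symbols above discriminate between target bonds, here is a small Python sketch. The function `bond_matches` and its parameters are hypothetical names chosen for this example. It assumes that an implicit SMARTS bond means "single or aromatic", which reduces to a single bond when both atoms are aliphatic as in CC, and it does not cover the directional symbols (/, \ and ?).

```python
def bond_matches(symbol, order, aromatic=False, in_ring=False):
    """Return True if a target bond satisfies one SMARTS bond symbol.

    symbol   -- "" (implicit), "-", "=", "#", ":", "~" or "@"
    order    -- 1, 2 or 3 for the target bond (ignored if aromatic)
    aromatic -- True if the target bond is aromatic
    in_ring  -- True if the target bond is part of a ring
    """
    if symbol == "~":
        return True                  # wildcard: any bond
    if symbol == "@":
        return in_ring               # any ring bond
    if symbol == ":":
        return aromatic              # aromatic bond
    if symbol == "":
        # Assumption: an implicit SMARTS bond means "single or aromatic".
        return aromatic or order == 1
    explicit = {"-": 1, "=": 2, "#": 3}
    return not aromatic and order == explicit[symbol]


# The implicit bond in CC accepts C-C but rejects C=C and C#C.
print(bond_matches("", 1), bond_matches("", 2), bond_matches("", 3))
# The wildcard in [#6]~[#6] accepts all of them.
print(bond_matches("~", 1), bond_matches("~", 2), bond_matches("~", 3))
```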

  4. Logical Operators

    [!c]  

    Atoms that are NOT aromatic carbons.   "!" means "not".  

    [N,#8]  

    Atoms that are an aliphatic Nitrogen OR an Oxygen (aromatic or aliphatic).   "," means OR. OR has higher precedence than low-precedence "and" (;), but lower precedence than high-precedence "and" (&).  

    [N,#6&+1,+0]  

    Atoms that (are aliphatic Nitrogens) or (are positively charged carbons) or (are neutral)   "&" is "and" (high precedence).  

    [12*+0,H2] 

    Atoms that (have a mass of 12 AND are neutral) OR (have 2 hydrogens).   All SMILES atomic properties are valid in SMARTS. High precedence "and" is the default logical operator.  

    [N,#6;+0,+1]  

    Atoms that (are aliphatic Nitrogens or are carbons) AND (are neutral or positively charged)   ";" is "and" (low precedence).  
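
The precedence rules above ("!" binds tightest, then "&", then ",", then ";") can be sketched as nested splits. This is a simplified Python illustration, not a SMARTS parser: `primitive_matches` and `atom_matches` are hypothetical helpers, atoms are plain dicts, only a handful of primitives are recognized, and "and" must be written explicitly as "&" (the implicit "and" of [12*+0] and isotope masses are not handled).

```python
def primitive_matches(prim, atom):
    """Test one primitive against an atom dict (number, aromatic, charge)."""
    if prim == "*":
        return True
    if prim == "a":
        return atom["aromatic"]
    if prim == "A":
        return not atom["aromatic"]
    if prim.startswith("#"):
        return atom["number"] == int(prim[1:])
    if prim[0] in "+-":
        sign = 1 if prim[0] == "+" else -1
        magnitude = int(prim[1:]) if len(prim) > 1 else 1
        return atom["charge"] == sign * magnitude
    # Element symbols: upper case is aliphatic, lower case is aromatic.
    elements = {"c": 6, "n": 7, "o": 8}
    return (atom["number"] == elements[prim.lower()]
            and atom["aromatic"] == prim.islower())


def atom_matches(expr, atom):
    """Evaluate a bracket-free atom expression such as "N,#6&+1,+0"."""
    def term_matches(term):
        if term.startswith("!"):          # "!" binds tightest
            return not term_matches(term[1:])
        return primitive_matches(term, atom)

    return all(                            # ";" binds loosest
        any(                               # "," sits in between
            all(term_matches(t) for t in alt.split("&"))  # "&" binds tighter
            for alt in low.split(",")
        )
        for low in expr.split(";")
    )


oxygen = {"number": 8, "aromatic": False, "charge": 0}
print(atom_matches("N,#6&+1,+0", oxygen))  # neutral, so the "+0" branch holds
print(atom_matches("N,#6;+0,+1", oxygen))  # fails the "N,#6" half
```

The two example expressions differ only in one operator, yet a neutral oxygen satisfies the first (through the "+0" alternative) and fails the second (it is neither a nitrogen nor a carbon).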

  5. Recursive SMARTS

    [$(*O);$(*CC)]

    Atoms that are in an environment where (the atom is connected to an aliphatic oxygen) and where (the atom is connected to two sequential aliphatic carbons).   Any SMARTS expression may be used to define an atomic environment by writing a SMARTS that starts with the atom of interest, in this form: $(<SMARTS>).  

    [#6][$(aaO);$(aaaN)]  

    Molecules where a carbon is ortho to an O and meta to an N.    

    [$([CX3]=[OX1]),$([CX3+]-[OX1-])]  

    Carbon atoms of a carbonyl group (either resonance structure).    

  6. Component-Level Grouping

    [#8].[#8]  

    Molecules that contain two oxygens (e.g. O=O, OCCO and O.CCO).   "." (dot) in SMARTS means "not necessarily connected".  

    ([#8].[#8])  

    Molecules that contain two oxygens that are within the same component (e.g. O=O and OCCO but NOT O.CCO).   A single set of parentheses may surround any legal SMARTS expression. Here the parentheses indicate that the contents are within the same component of the target SMILES.  

    ([+].[-])  

    Zwitterions    

    ([#8]).([#8])  

    Molecules or mixtures that contain two oxygens that are within different components (e.g. O.CCO but NOT O=O or OCCO).   Separate Component-Level Groupings may be specified. Here the parentheses indicate that the respective contents are within different components of the target SMILES.  

    ([+]).([-])  

    Salts
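
The three oxygen patterns above differ only in what they require of the components that the two matched atoms belong to. A minimal Python sketch of that distinction follows. `TARGETS` and the `hits_*` helpers are hypothetical, and each target is written out by hand as (element symbol, component index) pairs rather than parsed from SMILES.

```python
from itertools import combinations

# Each target is a list of (element symbol, component index) pairs, where the
# component index identifies the dot-separated part the atom belongs to.
TARGETS = {
    "O=O":   [("O", 0), ("O", 0)],
    "OCCO":  [("O", 0), ("C", 0), ("C", 0), ("O", 0)],
    "O.CCO": [("O", 0), ("C", 1), ("C", 1), ("O", 1)],
}


def oxygen_pairs(atoms):
    """Component indices for every pair of distinct oxygen atoms."""
    comps = [comp for symbol, comp in atoms if symbol == "O"]
    return list(combinations(comps, 2))


def hits_any(atoms):          # [#8].[#8]: components are not constrained
    return len(oxygen_pairs(atoms)) > 0


def hits_same(atoms):         # ([#8].[#8]): both oxygens in one component
    return any(a == b for a, b in oxygen_pairs(atoms))


def hits_different(atoms):    # ([#8]).([#8]): oxygens in different components
    return any(a != b for a, b in oxygen_pairs(atoms))


for smiles, atoms in TARGETS.items():
    print(smiles, hits_any(atoms), hits_same(atoms), hits_different(atoms))
```

The same structure carries over to ([+].[-]) and ([+]).([-]): the first asks for opposite charges within one component, the second for opposite charges in separate components.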