SMARTS Tutorial
Table of Contents
1. Introduction
2. Properties of Atoms
3. Bonds
4. Logical Operators
5. Recursive SMARTS
6. Component-Level Grouping
7. Reaction SMARTS
1. Introduction
SMARTS...
...means SMiles ARbitrary Target Specification
...is a language used for describing molecular patterns and properties
...rules are straightforward extensions of SMILES
- All SMILES symbols and properties are legal in SMARTS.
- SMARTS includes logical operators and additional molecular descriptors
...can describe structural patterns with varying degrees of specificity
and generality:
- SMILES for methane: C or [CH4]
- High specificity SMARTS describing a pattern consistent with methane:
[CH4]
Only matches aliphatic carbon atoms that have 4 hydrogens.
Won't match ethane, ethene, or cyclopentane.
- Low specificity SMARTS describing a pattern consistent with methane:
C
Matches aliphatic carbon atoms that have any number of hydrogens.
Will match ethane, ethene, and cyclopentane.
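The difference in specificity can be sketched without a cheminformatics toolkit. The snippet below is a toy model, not a SMARTS parser. The `Atom` record and the hand-written atom lists are assumptions made for illustration, and the two patterns are written out as plain Python predicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str      # element symbol
    aromatic: bool   # True for aromatic atoms
    h_count: int     # number of attached hydrogens


# Hand-written heavy-atom lists for the molecules named above (illustrative only).
MOLECULES = {
    "methane": [Atom("C", False, 4)],
    "ethane": [Atom("C", False, 3)] * 2,
    "ethene": [Atom("C", False, 2)] * 2,
    "cyclopentane": [Atom("C", False, 2)] * 5,
}


def matches_C(atom):
    """Low specificity pattern: C (any aliphatic carbon)."""
    return atom.symbol == "C" and not atom.aromatic


def matches_CH4(atom):
    """High specificity pattern: [CH4] (aliphatic carbon with 4 hydrogens)."""
    return matches_C(atom) and atom.h_count == 4


def hits(predicate):
    """Names of the molecules in which at least one atom satisfies the predicate."""
    return sorted(
        name for name, atoms in MOLECULES.items()
        if any(predicate(atom) for atom in atoms)
    )


print(hits(matches_CH4))  # only methane
print(hits(matches_C))    # all four molecules
```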
2. Properties of Atoms
SMARTS | Hits SMILES: | Note |
[+1] | Atoms that have a plus one charge | All SMILES atomic properties are valid in SMARTS; this includes charge, hydrogen count, isotopic specifications, bond symbols, and chirality specification. + is +1, ++ is +2, etc. |
[a] | Atoms that are aromatic | "a" is any aromatic atom. |
[A] | Atoms that are aliphatic | "A" is any aliphatic atom. |
[#6] | Atoms that have an atomic number of 6 (c or C) | "#<number>" defines an atom that has an atomic number of <number>. Hits both aliphatic and aromatic atoms. |
[R2] | Atoms that are in 2 rings | "R<number>" defines an atom that is in <number> rings. Default (R) is any ring atom. |
[r5] | Atoms that are in a ring that has 5 members | "r<number>" defines an atom that is in a ring that has <number> members. Default (r) is any ring atom. |
[v4] | Atoms that are four-valent | "v<number>" defines an atom whose total bond order is <number> (= counts as 2 bonds, # as 3). |
[X2] | Atoms that are connected to two other atoms | "X<number>" defines an atom that is connected to <number> other atoms (including all hydrogens). |
[H] | Hydrogen atoms | A hydrogen atom (often called an "explicit hydrogen") has special properties ([H+], [2H], [H][H], etc.). [H+] and [2H] behave similarly. |
[H1] | Atoms that have one attached hydrogen | "H<number>" defines an atom that has <number> attached hydrogens ("implicit" or "explicit", i.e. H property or H atom). Default, [*H], is 1 for a non-hydrogen atom. |
* | Any atom | In SMARTS, the wildcard atom, "*", matches all atoms. It won't hit hydrogens that are merely properties of heavy atoms. |
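To make the primitives in the table concrete, here is a minimal sketch that evaluates several of them against hand-described atoms. It is a toy model under stated assumptions: the `Atom` fields are invented for this example, ring membership is supplied by hand instead of being perceived, and "[r5]" follows the wording of the table (the atom is in some ring of size 5). The example atoms show how "v" (total bond order) differs from "X" (number of connections).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    number: int               # atomic number
    aromatic: bool = False
    charge: int = 0
    h_count: int = 0          # attached hydrogens
    bond_orders: tuple = ()   # orders of the bonds to non-hydrogen neighbours
    ring_sizes: tuple = ()    # sizes of the rings the atom belongs to (hand-entered)


# Each SMARTS primitive from the table, written as a predicate on the toy Atom.
PRIMITIVES = {
    "[+1]": lambda a: a.charge == 1,
    "[a]": lambda a: a.aromatic,
    "[A]": lambda a: not a.aromatic,
    "[#6]": lambda a: a.number == 6,
    "[R2]": lambda a: len(a.ring_sizes) == 2,
    "[r5]": lambda a: 5 in a.ring_sizes,
    "[v4]": lambda a: sum(a.bond_orders) + a.h_count == 4,
    "[X2]": lambda a: len(a.bond_orders) + a.h_count == 2,
    "[H1]": lambda a: a.h_count == 1,
}


def hits(atom):
    """List the primitives (in table order) that the atom satisfies."""
    return [smarts for smarts, test in PRIMITIVES.items() if test(atom)]


co2_carbon = Atom(number=6, bond_orders=(2, 2))                   # C in O=C=O
methanol_oxygen = Atom(number=8, h_count=1, bond_orders=(1,))     # O in CO
spiro_carbon = Atom(number=6, bond_orders=(1, 1, 1, 1), ring_sizes=(5, 5))

print(hits(co2_carbon))       # four-valent but only two connections
print(hits(methanol_oxygen))  # two connections, one of them a hydrogen
print(hits(spiro_carbon))     # shared by two five-membered rings
```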
3. Bonds
SMARTS | Hits SMILES: | Note |
CC | Molecules where an aliphatic carbon is SINGLE BONDED to another aliphatic carbon | All SMILES bond properties are valid in SMARTS; this includes implicit single bonds, explicit single bonds (-), double bonds (=), triple bonds (#), and aromatic bonds (:). WON'T match double bonds or triple bonds (such as those in C=C and C#C). |
[#6]~[#6] | Molecules where two carbons are connected by any bond (single, double, triple, or aromatic) | "~" means any bond (wildcard bond). |
[#6]@[#6] | Molecules where two carbons are connected by a ring bond | "@" is a bond between two atoms that are within the same ring. |
F/?[#6]=C/Cl | Molecules where a carbon (which is connected to a fluorine by a directional "up or unspecified" bond) is double bonded to another carbon (which is connected by an "up" bond to a chlorine), e.g. F/C=C/Cl and FC=C/Cl. This excludes molecules where a carbon (which is connected to a fluorine by a "down" bond) is double bonded to another carbon (which is connected to a chlorine by an "up" bond). | "?" means "OR unspecified". "?" may also be used with chirality specification (@ and @@). |
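The non-directional bond symbols above can be read as predicates on a bond. The sketch below is a toy illustration only: the `Bond` record is an assumption of this example, ring membership is entered by hand, and the directional symbols (/, \ and ?) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    kind: str              # "single", "double", "triple" or "aromatic"
    in_ring: bool = False  # hand-entered ring membership


# Each explicit bond symbol from the table as a predicate on the toy Bond.
BOND_QUERIES = {
    "-": lambda b: b.kind == "single",
    "=": lambda b: b.kind == "double",
    "#": lambda b: b.kind == "triple",
    ":": lambda b: b.kind == "aromatic",
    "~": lambda b: True,        # wildcard: any bond
    "@": lambda b: b.in_ring,   # any ring bond
}


def hits(bond):
    """List the bond symbols (in the order above) that the bond satisfies."""
    return [symbol for symbol, test in BOND_QUERIES.items() if test(bond)]


print(hits(Bond("aromatic", in_ring=True)))  # a benzene ring bond
print(hits(Bond("double")))                  # the bond in C=C
print(hits(Bond("single", in_ring=True)))    # a cyclohexane ring bond
```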
4. Logical Operators
SMARTS | Hits SMILES: | Note |
[!c] | Atoms that are NOT aromatic carbons | "!" means "not". |
[N,#8] | Atoms that are an aliphatic Nitrogen OR an Oxygen (aromatic or aliphatic) | "," means OR. OR is higher precedence than low precedence "and" (;), but lower precedence than high precedence "and" (&). |
[#7,C&+0,+1] or [#7,C+0,+1] | Atoms that (are Nitrogens) or (are neutral aliphatic Carbons) or (have a +1 charge) | "&" is "and" (high precedence). High precedence "and" is the default logical operator and may be omitted. |
[#7,C;+0,+1] | Atoms that (are Nitrogens or are aliphatic Carbons) and (are neutral or have a +1 charge) | ";" is "and" (low precedence). |
5. Recursive SMARTS
SMARTS | Hits SMILES: | Note |
[$(*O);$(*CC)] | Atoms that are in an environment where (the atom is connected to an aliphatic oxygen) and where (the atom is connected to two sequential aliphatic carbons) | Any SMARTS expression may be used to define an atomic environment by writing a SMARTS starting with the atom of interest in this form: $(<SMARTS>) |
[$([CX3]=[OX1]),$([CX3+]-[OX1-])] | Carbon atoms of a Carbonyl group (either resonance structure) | |
[$([#6]aaO);$([#6]aaaN)] | Carbon that is ortho to an O and meta to an N ([#6] itself hits both aliphatic and aromatic carbon) | |
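The first pattern in the table, [$(*O);$(*CC)], can be imitated on a hand-built graph. The sketch below is a toy model under stated assumptions: molecules are given as a string of one-letter aliphatic element symbols plus a neighbor table, bond orders are ignored, and the helper names are invented for this example. Each $() environment is tested on its own, starting from the candidate atom, and only that first atom is reported as the hit.

```python
def has_path(symbols, neighbors, start, wanted, visited=frozenset()):
    """True if a simple path leaves `start` and its atoms spell `wanted`."""
    if not wanted:
        return True
    visited = visited | {start}
    return any(
        symbols[nbr] == wanted[0]
        and has_path(symbols, neighbors, nbr, wanted[1:], visited)
        for nbr in neighbors[start]
        if nbr not in visited
    )


def recursive_hits(symbols, neighbors):
    """Indices of atoms matching [$(*O);$(*CC)] in the toy model."""
    return [
        i for i in range(len(symbols))
        if has_path(symbols, neighbors, i, "O")      # $(*O)
        and has_path(symbols, neighbors, i, "CC")    # $(*CC)
    ]


# Hand-built graphs: atom symbols by index, then the neighbors of each atom.
propanol = ("OCCC", {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
ethanol = ("OCC", {0: [1], 1: [0, 2], 2: [1]})

print(recursive_hits(*propanol))  # the carbon bonded to O has a C-C chain beyond it
print(recursive_hits(*ethanol))   # no atom has both environments
```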
6. Component-Level Grouping
SMARTS | Hits SMILES: | Note |
[#8].[#8] | Molecules that contain two oxygens (e.g. O=O, OCCO and O.CCO) | "." (dot) in SMARTS means "not necessarily connected". |
([#8].[#8]) | Molecules that contain two oxygens that are within the same component (e.g. O=O and OCCO but NOT O.CCO) | A single set of parentheses may surround any legal SMARTS expression. Here the parentheses indicate that the contents are within the same component of the target SMILES. |
([#8]).([#8]) | Molecules or mixtures that contain two oxygens that are within different components (e.g. O.CCO but NOT O=O or OCCO) | Separate Component-Level Groupings may be specified. Here the parentheses indicate that the respective contents are within different components of the target SMILES. |
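The three groupings in the table differ only in what they require of the components that hold the two oxygens. Here is a minimal sketch of that distinction, assuming a toy target representation (a list of components, each a list of atomic numbers) and hypothetical function names. No SMILES or SMARTS parsing is attempted.

```python
from itertools import combinations


def oxygen_components(components):
    """Component index of every oxygen atom (atomic number 8) in the target."""
    return [idx for idx, atoms in enumerate(components) for z in atoms if z == 8]


def two_oxygens(components):
    """[#8].[#8]: two distinct oxygens, anywhere in the target."""
    return len(oxygen_components(components)) >= 2


def same_component(components):
    """([#8].[#8]): two distinct oxygens within one component."""
    return any(a == b for a, b in combinations(oxygen_components(components), 2))


def different_components(components):
    """([#8]).([#8]): two oxygens in two different components."""
    return any(a != b for a, b in combinations(oxygen_components(components), 2))


# The example targets from the table, entered by hand as atomic numbers.
TARGETS = {
    "O=O": [[8, 8]],
    "OCCO": [[8, 6, 6, 8]],
    "O.CCO": [[8], [6, 6, 8]],
}

for name, comps in TARGETS.items():
    print(name, two_oxygens(comps), same_component(comps), different_components(comps))
```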
7. Reaction SMARTS
SMARTS | Hits SMILES: | Note |
[#6]=,:[#6] | Carbons connected by a (double or aromatic) bond. | Molecule SMARTS (SMARTS without ">" characters) can match anywhere in a Reaction SMILES target (reactant, agent, or product). |
>>[#6]=,:[#6] | Product Carbons connected by a (double or aromatic) bond. | Reaction SMARTS (SMARTS with ">" characters) never match molecule targets. |
[C:1]>>[C:1] | Mapped reacting carbons. | Mapped SMARTS atomic queries never match unmapped target atoms. Mapped SMARTS reaction queries never hit unmapped reaction targets. |
[C:1]>>C | Reacting carbons. | Unpaired maps in the query are ignored. |
[C:1][C:1]>>[C:1] | Multiple mapped reacting carbons. | SMARTS map classes inter-relate reactants to products but don't intra-relate reactants or products. (Although query reactants have the same class, they can match target reactants of different classes.) |
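The distinction between molecule and reaction queries, and the notion of a paired map class, can be sketched with plain string handling. This is an illustrative helper rather than a SMARTS parser: the function names are hypothetical, a query is treated as a reaction query whenever it contains ">", and a map class is assumed to be written as ":<number>" immediately before the closing bracket of an atom. A class counts as paired here when it appears on both the reactant and the product side.

```python
import re
from typing import NamedTuple, Optional


class Roles(NamedTuple):
    reactants: str
    agents: str
    products: str


def split_reaction(smarts: str) -> Optional[Roles]:
    """Split a reaction SMARTS on ">"; return None for a molecule SMARTS."""
    if ">" not in smarts:
        return None
    parts = smarts.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction query needs exactly two '>' characters")
    return Roles(*parts)


def map_classes(part: str) -> set:
    """Map classes written as ':<number>]' at the end of a bracket atom."""
    return {int(n) for n in re.findall(r":(\d+)\]", part)}


def paired_classes(smarts: str) -> set:
    """Map classes that occur on both the reactant and the product side."""
    roles = split_reaction(smarts)
    if roles is None:
        return set()
    return map_classes(roles.reactants) & map_classes(roles.products)


print(split_reaction("[#6]=,:[#6]"))        # molecule query: no roles
print(split_reaction(">>[#6]=,:[#6]"))      # pattern restricted to products
print(paired_classes("[C:1]>>[C:1]"))       # class 1 is paired
print(paired_classes("[C:1]>>C"))           # class 1 is unpaired
```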
More Information
Theory Manual
SMARTS Examples
SMARTS Practice