SMARTS can be thought of as an extension of this mechanism that represents variability and choice in the node and edge properties of the graph, i.e. a way to specify alternatives for any of the atoms or bonds in the molecule. Using SMARTS one can specify a pattern to search for.
There are 4 parts to understanding and creating a SMARTS expression:
      X3   Y1       Z1
      |    |       /
      |    |      /
      |    |     /
X2----C----C----N
      |    |     \
      |    |      \
      |    |       \
      X1   Y2       Z2
The SMARTS CC[O,N] will match ethanol or ethylamine. If we examine how, we see

              anything in square brackets
              is a description of 1 atom only
              ___|____
              |      |
CH2----CH2----[OH,NH2]
               | ||
               | ||
          Oxygen ||
                OR|
                  Nitrogen
Expressions within [ and ] are therefore atom SMARTS expressions.
Atom SMARTS may appear wherever an atom may occur in a SMILES.
This SMARTS CC[O,N] is read as
    an aliphatic Carbon singly bonded to
    an aliphatic Carbon singly bonded to
    an atom that is (an Oxygen OR a Nitrogen)
',' is the logical operator representing OR.
Here's the complete list of logical operators in SMARTS, in their order of precedence (highest first):
Symbol | Expression | Meaning |
---|---|---|
! | !N | NOT Nitrogen |
& | N&a | Nitrogen AND aromatic (high precedence) |
, | N,a | Nitrogen OR aromatic |
; | N;a | Nitrogen AND aromatic (low precedence) |
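'a' signifies an aromatic atom in SMARTS. Strictly, an uppercase element symbol such as N denotes an aliphatic atom, so the expressions in this table are read loosely here, purely to show how the operators group.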
Using this information let us analyse a few atom SMARTS expressions

       an atom
         _|__
        |    |
        [!N&a]    : an atom that is ( (NOT a Nitrogen) AND is aromatic )
         ||||       i.e. an aromatic atom that is not a Nitrogen.
       NOT|||
          |||
   Nitrogen||
           ||
         AND|
            |
            is aromatic
[N,C&a] : an atom that is ( a Nitrogen OR ( a Carbon AND is aromatic ) )
          i.e. a Nitrogen or an aromatic Carbon
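The precedence rules above can be sketched with a few lines of Python. This is only a toy model, not a SMARTS parser: it treats each primitive (N, C, a, ...) as an independent true/false flag, exactly as the readings above do, and it does not handle implicit AND (as in NH1), recursive SMARTS, or real aromaticity perception. The function names and the idea of passing a set of "true primitives" are assumptions made for illustration.

```python
def primitive_holds(token, true_primitives):
    """Evaluate one primitive, honouring any leading '!' (NOT) signs."""
    negated = False
    while token.startswith("!"):
        negated = not negated
        token = token[1:]
    return (token in true_primitives) != negated


def atom_expression_matches(expression, true_primitives):
    """Evaluate the text between '[' and ']' against a set of true primitives.

    Splitting on ';' first, then ',', then '&' makes ';' bind loosest and
    '&' bind tightest, which reproduces the precedence table.
    """
    return all(
        any(
            all(primitive_holds(tok, true_primitives) for tok in conj.split("&"))
            for conj in alternatives.split(",")
        )
        for alternatives in expression.split(";")
    )


# Hypothetical atoms, described only by the primitives that hold for them.
aromatic_carbon = {"C", "a"}
aliphatic_carbon = {"C"}
nitrogen = {"N"}

print(atom_expression_matches("!N&a", aromatic_carbon))    # True
print(atom_expression_matches("!N&a", nitrogen))           # False
print(atom_expression_matches("N,C&a", nitrogen))          # True
print(atom_expression_matches("N,C&a", aliphatic_carbon))  # False
```

Because atom expressions contain no parentheses outside recursive SMARTS, nested splitting is enough to express the precedence; a full implementation would need a proper tokenizer.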
The number of Hydrogens attached to an atom is expressed in SMARTS as H<n>, where <n> is any number.
<n> is optional and defaults to 1 when not stated.
Therefore [H1] represents any atom with 1 attached hydrogen.
Using this construct let us try figuring out the atom SMARTS below:
[NH1]    = [N&H1]
         = atom that ( is a Nitrogen AND has 1 attached Hydrogen )
           e.g. a secondary amine
[nH1]    = [n&H1]
         = atom that ( is an aromatic Nitrogen AND has 1 attached Hydrogen )
         = a pyrrole Nitrogen
[C,n&H1] = atom that ( is a Carbon OR is an aromatic Nitrogen AND has 1 attached Hydrogen )   (prior to applying precedence)
         = atom that ( is a Carbon OR ( is an aromatic Nitrogen AND has 1 attached Hydrogen ) )
         = atom that ( is a Carbon OR is a pyrrole Nitrogen )
[C,n;H1] = atom that ( is a Carbon OR is an aromatic Nitrogen AND has 1 attached Hydrogen )   (prior to applying precedence)
         = atom that ( ( is a Carbon OR is an aromatic Nitrogen ) AND has 1 attached Hydrogen )
         = atom that ( is a Carbon with 1 attached Hydrogen, e.g. a tertiary Carbon, OR is a pyrrole Nitrogen )
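The difference between [C,n&H1] and [C,n;H1] is easiest to see as a truth table. The sketch below writes the two groupings as plain Python boolean expressions and applies them to four hand-described atoms. The `Atom` tuple and its flags are hypothetical stand-ins for what a real toolkit would compute from a molecule.

```python
from collections import namedtuple

# Hypothetical atom descriptions: the three properties used by the two SMARTS.
Atom = namedtuple("Atom", "name is_C is_n has_H1")

ATOMS = [
    Atom("methyl carbon (CH3)", True, False, False),
    Atom("methine carbon (CH)", True, False, True),
    Atom("pyrrole nitrogen (nH)", False, True, True),
    Atom("pyridine nitrogen (n)", False, True, False),
]


def high_precedence_and(atom):
    """[C,n&H1]: '&' binds tighter than ','."""
    return atom.is_C or (atom.is_n and atom.has_H1)


def low_precedence_and(atom):
    """[C,n;H1]: ';' binds looser than ','."""
    return (atom.is_C or atom.is_n) and atom.has_H1


for atom in ATOMS:
    print(f"{atom.name:24} [C,n&H1]={high_precedence_and(atom)!s:5} "
          f"[C,n;H1]={low_precedence_and(atom)}")
```

Only the methyl carbon is treated differently by the two expressions: it satisfies [C,n&H1] because any Carbon does, but fails [C,n;H1] because it does not have exactly one attached Hydrogen.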
Here's the complete list of atom property symbols in SMARTS
Symbol | Symbol name | Atomic property requirements | Default |
---|---|---|---|
* | wildcard | any atom | (no default) |
a | aromatic | aromatic | (no default) |
A | aliphatic | aliphatic | (no default) |
D<n> | degree | <n> explicit connections | (no default) |
H<n> | total-H-count | <n> attached hydrogens | exactly one |
h<n> | implicit-H-count | <n> implicit hydrogens | exactly one |
R<n> | ring membership | in <n> SSSR rings | any ring atom |
r<n> | ring size | in smallest SSSR ring of size <n> | any ring atom |
v<n> | valence | total bond order <n> | (no default) |
X<n> | connectivity | <n> total connections | (no default) |
-<n> | negative charge | -<n> formal charge | -1 charge (-- is -2, etc) |
+<n> | positive charge | +<n> formal charge | +1 charge (++ is +2, etc) |
#<n> | atomic number | atomic number <n> | (no default) |
@ | chirality | anticlockwise | anticlockwise, default class |
@@ | chirality | clockwise | clockwise, default class |
@<c><n> | chirality | chiral class <c> chirality <n> | (no default) |
@<c><n>? | chiral or unspec | chirality <c><n> or unspecified | (no default) |
<n> | atomic mass | explicit atomic mass | unspecified mass |
[#6]        = a carbon atom
[Ca]        = a calcium atom
[++]        = any atom with a +2 charge
[CH2]       = atom that is ( an aliphatic carbon AND has two hydrogens )
            = a methylene carbon
[35*]       = any atom of mass 35
[F,Cl,Br,I] = the first four halogens
[!C;R]      = atom that is ( ( NOT aliphatic carbon ) AND is in a ring )
cc  = any pair of attached aromatic carbons
      (when no bond symbol is written, the bond is taken as "single or aromatic")
c:c = 2 aromatic Carbons joined by an aromatic bond
      ':' is the symbol for an aromatic bond
CC  = C-C
    = any pair of attached aliphatic carbons
      '-' is the symbol for a single bond
c-c = 2 aromatic Carbons joined by a non-aromatic single bond
      e.g. as in biphenyl
As in atom SMARTS, bonds can be made variable with the logical operators, e.g.
C-,=,#N  = a Carbon bonded via a single or double or triple bond to a Nitrogen
C~N      = a Carbon bonded via any bond to a Nitrogen
           '~' is the symbol for a wildcard bond
C@N      = a Carbon bonded via a ring bond to a Nitrogen
           '@' is the symbol for any ring bond
C/?N=C/O = a C bonded, with trans or unspecified double-bond geometry, to a N
           double-bonded to a C that is singly bonded to an Oxygen
Here's a list of SMARTS primitives for bonds:
Symbol | Bond property requirements |
---|---|
- | single bond (aliphatic) |
/ | directional single bond "up" |
\ | directional single bond "down" |
/? | directional bond "up or unspecified" |
\? | directional bond "down or unspecified" |
= | double bond |
# | triple bond |
: | aromatic bond |
~ | any bond (wildcard) |
@ | any ring bond |
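The bond primitives in this table can be modelled as small predicates on a bond. The sketch below is a simplified illustration only: the `Bond` tuple is hypothetical, the directional primitives (/, \, /?, \?) are left out, and only the ',' (OR) operator is handled. An empty expression stands for an omitted bond symbol, which is taken as "single or aromatic".

```python
from collections import namedtuple

# Hypothetical bond description: order is 1, 2 or 3, or None for aromatic bonds.
Bond = namedtuple("Bond", "order aromatic in_ring")

BOND_PRIMITIVES = {
    "-": lambda b: b.order == 1 and not b.aromatic,  # single bond
    "=": lambda b: b.order == 2 and not b.aromatic,  # double bond
    "#": lambda b: b.order == 3,                     # triple bond
    ":": lambda b: b.aromatic,                       # aromatic bond
    "~": lambda b: True,                             # any bond (wildcard)
    "@": lambda b: b.in_ring,                        # any ring bond
}


def bond_matches(expression, bond):
    """Check a bond against a comma-separated list of bond primitives."""
    if expression == "":
        # Omitted bond symbol: single or aromatic.
        return BOND_PRIMITIVES["-"](bond) or BOND_PRIMITIVES[":"](bond)
    return any(BOND_PRIMITIVES[symbol](bond) for symbol in expression.split(","))


ring_aromatic = Bond(order=None, aromatic=True, in_ring=True)
chain_double = Bond(order=2, aromatic=False, in_ring=False)

print(bond_matches("-,=,#", chain_double))  # True
print(bond_matches("-", ring_aromatic))     # False
print(bond_matches("", ring_aromatic))      # True
print(bond_matches("@", chain_double))      # False
```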
Hence one might represent a C ortho to a N via the SMARTS
CaaN   = a Carbon singly bonded to an aromatic atom, bonded via an aromatic bond
         to an aromatic atom that is singly bonded to a Nitrogen
CaaaN  = C meta to an N
CaaaaN = C para to an N
     an atom that is
    ________|_________
    |                |
    [$(CaaN),$(CaaaN)]
     |_____|||______|
        |   |   |
        |   OR  |
        |       |
        |       a Carbon meta to a Nitrogen
        |
        a Carbon ortho to a Nitrogen
This is a recursive SMARTS: in place of an atom property in an atom SMARTS, one can use an entire SMARTS pattern, with the rule that a recursive SMARTS must always appear within the symbols '$(' and ')'. The atom being described is the first atom of the recursive pattern.
The above SMARTS would read
atom that is ( ( a Carbon ortho to a N ) OR ( a Carbon meta to a N ) )
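Since the ortho, meta and para patterns differ only in the number of intervening aromatic atoms, they can be generated mechanically. The helper names below (`ring_relation`, `any_of`) are made up for this sketch. It only builds pattern strings and does not check them against any molecule.

```python
# Number of aromatic atoms written between the two substituents.
AROMATIC_ATOMS_BETWEEN = {"ortho": 2, "meta": 3, "para": 4}


def ring_relation(first, second, position):
    """Build a path pattern such as 'CaaN' for 'C ortho to N'."""
    return first + "a" * AROMATIC_ATOMS_BETWEEN[position] + second


def any_of(*patterns):
    """Wrap each pattern in $( ) and join them with ',' (OR) in one atom SMARTS."""
    return "[" + ",".join("$(" + p + ")" for p in patterns) + "]"


ortho = ring_relation("C", "N", "ortho")
meta = ring_relation("C", "N", "meta")

print(ortho)                # CaaN
print(meta)                 # CaaaN
print(any_of(ortho, meta))  # [$(CaaN),$(CaaaN)]
```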
A few examples follow:
Caa(O)aN          = Carbon ortho to O and meta to N (but in a single path, i.e. 2O,3N only)
Ca(aO)aaN         = Carbon ortho to O and meta to N (but in differing paths, i.e. 2O,5N only)
C[$(aaO);$(aaaN)] = C ortho to O and C meta to N (all cases)
A SMILES like C.C would be 2 methanes.
However, its SMARTS meaning is a pattern that matches 2 aliphatic Carbons, which may be in the same molecule or in different molecules; it therefore also matches ethane (CH3CH3) or propane (CH3CH2CH3).
To restrict the Carbons to a single molecule, one can use parentheses to group dot-disconnected fragments, e.g.
(CC.C)  would not match ethane, but would match propane or butane
(C).(C) would not match ethane, propane or butane alone, since the 2 C's must come from differing components
(C).(C) would match the SMILES CCCC.COC (i.e. CH3CH2CH2CH3 + CH3OCH3)
(CC).C  would match butane and propane, since there is no restriction on the second Carbon
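The grouping rule can be stated as a small check. The sketch below assumes that each dot-separated fragment of the pattern has already been matched somewhere, and only verifies the component constraint: fragments sharing a parenthesis group must land in the same target component, fragments in different groups must land in different components, and ungrouped fragments are unconstrained. The function name and the list-based encoding are assumptions for illustration.

```python
def grouping_satisfied(fragment_groups, matched_components):
    """Check the component-level grouping constraint for one candidate match.

    fragment_groups[i]    : group label of pattern fragment i, or None if ungrouped
    matched_components[i] : index of the target component that fragment i matched
    """
    component_of_group = {}
    for group, component in zip(fragment_groups, matched_components):
        if group is None:
            continue  # ungrouped fragments may match anywhere
        if group in component_of_group:
            if component_of_group[group] != component:
                return False  # same group split across two components
        else:
            if component in component_of_group.values():
                return False  # two different groups share one component
            component_of_group[group] = component
    return True


# (C).(C): two groups, one fragment each.
print(grouping_satisfied([0, 1], [0, 0]))     # False: both Carbons in butane alone
print(grouping_satisfied([0, 1], [0, 1]))     # True: CCCC.COC, one Carbon from each
# (CC.C): one group holding both fragments.
print(grouping_satisfied([0, 0], [0, 0]))     # True
# (CC).C: the second fragment is ungrouped.
print(grouping_satisfied([0, None], [0, 0]))  # True
print(grouping_satisfied([0, None], [0, 1]))  # True
```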
In SMARTS one might represent (and thereby match reactions representing) general esterifications as
    C(=O)O.OCC>>C(=O)OCC.O
i.e. an acid and an alcohol in the reactants go to an ester and water in the products.

         O--H            Y2 Z1
         |               |  |
         |               |  |
    X2---C===O   +   Y1--C--C--O--H
         |               |  |
         |               |  |
         X1              Y3 Z2
To convert this to represent and match intermolecular esterifications only:

   in 1 mol.  in a separate mol.
     __|___   _|_
    |      | |   |
    (C(=O)O).(OCC)>>C(=O)OCC.O
    |____________|  |________|
          |              |
   in the reactant  in the product

Here the acid and the alcohol come from different molecules.
To represent and match intramolecular esterifications only:

      in 1 mol.
     ____|_____
    |          |
    (C(=O)O.OCC)>>C(=O)OCC.O
    |__________|  |________|
         |             |
  in the reactant  in the product

Here the acid and the alcohol come from the same molecule.
With the current state of SMARTS editors in order to use SMARTS powerfully a chemist needs to either
Examples:
An aliphatic polar atom such as Oxygen or Nitrogen or Sulfur, having at least 1 hydrogen and not in a ring
= (aliphatic O or N or S) AND (having at least 1 Hydrogen) AND (NOT in a Ring)

     an atom that is
    ______|_______
    |            |
    [O,N,S;!H0;R0]
      | | ||| ||_____ not in a ring
      | | ||| |______ AND (lower precedence than OR)
     OR | |||________ with 0 Hydrogens
        | ||_________ NOT
       OR |__________ AND (lower precedence than OR)
= (a F or Cl or Br or I) AND (connected to an atom that is ortho or meta or para in a phenol)

an atom that is
    [F,Cl,Br,I;$(*[$(c1c(O)cccc1),$(c1cc(O)ccc1),$(c1ccc(O)cc1)])]
     |_______||
         |    AND (lower precedence than OR)
         a F OR Cl OR Br OR I

where the recursive part, written on its own, is an atom that is
    [$(*[$(c1c(O)cccc1),$(c1cc(O)ccc1),$(c1ccc(O)cc1)])]
       | |____________| |____________| |____________|
       |       |       OR     |       OR     |
       |       |              |              para in a phenol
       |       |              meta in a phenol
       |       ortho in a phenol
       any atom, connected to an atom that is
This can be further simplified to
    [F,Cl,Br,I;$(*c1[$(c(O)cccc1),$(cc(O)ccc1),$(ccc(O)cc1)])]
Now why can't the above be further simplified to:
    [F,Cl,Br,I;$(*c1[$(c(O)cc),$(cc(O)c),$(ccc(O))]cc1)]
Think and let me know your answer later.