SMILES Tutorial
Table of Contents
1. Introduction
2. Atoms
3. Properties of Atoms
4. Bonds
5. Branching
6. Rings
7. Aromaticity
8. Stereo Isomerism
9. Reactions
1. Introduction
SMILES...
...means Simplified Molecular Input Line Entry System
...is a compact, machine- and human-readable chemical nomenclature:
- SMILES for ethane: CC
- Mol file representation for ethane:
SMI2MOL
2 1 0 0 0 0 0 0 0 0999 V2000
-0.5100 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5100 1.5300 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
M END
...is Canonicalizable
...is Comprehensive
...is Well Documented
2. Atoms
Depiction |
SMILES |
Name |
Note |
 |
[Li] |
Lithium |
Square brackets ( [ ] ) are used to delimit individual atoms. |
 |
O |
Water |
Elements in the "organic subset" may be written without brackets
if the number of attached hydrogens conforms to the lowest normal valence
consistent with explicit bonds:
B(3), C(4), N(3,5), O(2), P(3,5), S(2,4,6), F(1), Cl(1), Br(1), I(1) |
 |
*F |
Unknown atom bonded to Fluorine |
* is wildcard (any atom). The wildcard atom may also be written
without brackets. |
 |
C |
Methane |
|
 |
[H] |
Hydrogen Atom |
Hydrogen is NOT part of the "organic subset" and therefore needs brackets. |
 |
[C] |
Elemental Carbon (Graphite) |
If an atom is written within brackets, all of its attached Hydrogens must be specified;
otherwise it is assumed to have none. |
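The "lowest normal valence" rule for unbracketed atoms can be sketched in a few lines of Python. This is only an illustration: the helper name `implicit_hydrogens` is hypothetical, the valence table is copied from the note above, and aromatic (lower-case) atoms are not handled.

```python
# Normal valences of the "organic subset", copied from the table above.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, explicit_bond_order_sum):
    """Return the hydrogen count implied for an unbracketed atom.

    Hypothetical helper: picks the lowest normal valence that is at least
    the sum of the explicit bond orders and fills the rest with hydrogens.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= explicit_bond_order_sum:
            return valence - explicit_bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


print(implicit_hydrogens("O", 0))  # O (water): 2
print(implicit_hydrogens("C", 0))  # C (methane): 4
print(implicit_hydrogens("C", 1))  # each carbon of CC (ethane): 3
```

A bracketed atom such as [C] bypasses this rule entirely, which is why it carries no hydrogens.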
3. Properties of Atoms
 |
[Co+2] or [Co++] |
Cobalt(II) |
Brackets are required whenever atomic properties (including chirality)
are specified. Charge is specified by sign and numerical value or by the quantity
of signs. |
 |
[NH4+] |
Ammonium Ion |
Hydrogen count is an atomic property and may be specified by including
an H (and optionally an integer) after the Atomic Symbol. Hydrogens specified
this way are not treated as separate atoms and are often referred to as
"implicit" Hydrogens. |
 |
[H+] |
Proton |
Hydrogen IS considered an atom when it is charged or has a
specified mass (e.g. [2H], Deuterium). Hydrogen atoms are often referred
to as "explicit" Hydrogens. |
 |
[13C] |
Carbon-13 |
Atomic mass is specified by including an integer before the
Atomic Symbol. Default mass is "unspecified". |
 |
[2H]O[2H] |
Heavy Water |
|
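To make the bracket-atom properties above concrete, here is a minimal sketch that pulls mass, symbol, hydrogen count and charge out of a single bracket atom with a regular expression. The function name `parse_bracket_atom` and the returned dictionary layout are assumptions made for illustration; the pattern tolerates chirality marks and map classes but does not report them.

```python
import re

# One bracket atom: [mass? symbol chirality? H-count? charge? :map?]
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"                     # atomic mass, e.g. 13 in [13C]
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2}|\*)"   # element, aromatic atom or wildcard
    r"(?P<chirality>@{1,2})?"                  # accepted but not reported
    r"(?P<hcount>H\d*)?"                       # H, H2, H4, ...
    r"(?P<charge>[+-]\d+|\++|-+)?"             # +2, ++, -, ...
    r"(?::(?P<map_class>\d+))?\]"              # accepted but not reported
)


def parse_bracket_atom(text):
    """Return the properties of one bracket atom as a dictionary."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bracket atom this sketch understands: {text!r}")
    hcount = match["hcount"]
    charge = match["charge"]
    if charge is None:
        charge_value = 0
    elif charge[1:].isdigit():
        charge_value = int(charge)  # sign followed by a number, e.g. "+2"
    else:
        # charge given by the quantity of signs, e.g. "++"
        charge_value = len(charge) if charge[0] == "+" else -len(charge)
    return {
        "isotope": int(match["isotope"]) if match["isotope"] else None,
        "symbol": match["symbol"],
        "hydrogens": 0 if hcount is None else int(hcount[1:] or 1),
        "charge": charge_value,
    }


print(parse_bracket_atom("[NH4+]"))
print(parse_bracket_atom("[Co+2]") == parse_bracket_atom("[Co++]"))  # True
```

The last line illustrates that both charge notations describe the same atom.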
4. Bonds
 |
C-C-O or CCO |
Ethanol |
Single bonds are denoted by a dash, `-'. Single bond symbols and aromatic bond symbols (`:') may be omitted. |
 |
O=C=O |
Carbon Dioxide |
Double bonds are denoted by an equals sign `='. |
 |
C#N |
Hydrogen Cyanide |
Triple bonds are denoted by a pound sign `#'. |
 |
[Na+].[Cl-] |
Sodium Chloride |
The "dot" is a "non-bond", "disconnect", or "zero-order" bond. |
 |
CCO.O |
Ethanol and Water |
Dot can be used to delineate mixtures. Each dot-disconnected SMILES is considered a "component" of the overall molecule or mixture. |
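The bond symbols and the dot can be picked out of a SMILES string with very little code. The sketch below is deliberately naive: the helper names `components` and `explicit_bonds` are hypothetical, and splitting at dots assumes that no ring-closure number is shared between two dot-separated parts.

```python
import re

# Bond symbols from the table above.
BOND_NAMES = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def components(smiles):
    """Split a SMILES string at dots into its disconnected components."""
    return smiles.split(".")


def explicit_bonds(smiles):
    """List the explicit bond symbols, skipping anything inside brackets,
    where '-' is a charge and ':' introduces a map class."""
    outside = re.sub(r"\[[^\]]*\]", "", smiles)
    return [ch for ch in outside if ch in BOND_NAMES]


print(components("CCO.O"))                                # ['CCO', 'O']
print([BOND_NAMES[b] for b in explicit_bonds("O=C=O")])   # ['double', 'double']
print(explicit_bonds("CCO"))                              # []
```

The empty list for CCO reflects that omitted single bonds leave no symbol to find.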
5. Branching
 |
C(F)(F)F or FC(F)F |
Fluoroform |
Branches may be stacked. |
 |
CCCC(C(=O)O)CCC |
2-Propylpentanoic Acid (Valproic Acid) |
Branches may be nested. |
 |
CC(=O)O |
Acetic Acid |
Bonds may be specified within branches. |
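Branch handling maps naturally onto a stack: an opening parenthesis remembers the atom to return to, and a closing parenthesis goes back to it. The following is a minimal sketch under stated assumptions, not a full parser: `parse_acyclic` is a hypothetical name, and only acyclic, non-aromatic SMILES with single, double and triple bonds are accepted.

```python
import re

# Bracket atoms, two-letter halogens, one-letter organic-subset atoms,
# the wildcard, bond symbols and parentheses.
TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI*]|[-=#()]")


def parse_acyclic(smiles):
    """Return (atoms, bonds) for an acyclic, non-aromatic SMILES string.

    bonds holds (index_a, index_b, bond_symbol) tuples; '-' is the default.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    atoms, bonds = [], []
    branch_points = []  # stack of atoms to return to when a branch closes
    previous = None
    pending_bond = "-"
    for token in tokens:
        if token == "(":
            branch_points.append(previous)
        elif token == ")":
            previous = branch_points.pop()
        elif token in ("-", "=", "#"):
            pending_bond = token
        else:  # an atom
            atoms.append(token)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending_bond))
            previous = current
            pending_bond = "-"
    return atoms, bonds


atoms, bonds = parse_acyclic("CC(=O)O")
print(atoms)  # ['C', 'C', 'O', 'O']
print(bonds)  # [(0, 1, '-'), (1, 2, '='), (1, 3, '-')]
```

Stacked branches such as C(F)(F)F return to the same atom twice, while nested branches push a second entry before the first is popped.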
6. Rings
 |
C1CCCCC1 |
Cyclohexane |
Ring closure is specified by breaking bonds and numerically labeling (with the same number) the atoms that were connected to each other. |
 |
C12CCCCC1CCCC2 or C1CC2CCCCC2CC1 |
Decalin |
Atoms can have more than one ring closure. |
 |
C1CCCCC1C1CCCCC1 |
Bicyclohexane |
Closure numbers may be reused. |
 |
C1=CCCCC1 or C=1CCCCC1 or C1CCCCC=1 or C=1CCCCC=1 |
Cyclohexene |
The default bond order for the ring closure is single (or aromatic) but may be specified by including a bond symbol between the atom and the closure number. |
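Ring-closure numbers can be resolved by remembering where each number was opened and pairing it with the atom where it appears again. The sketch below assumes single-digit closure numbers only; `ring_closures` is a hypothetical helper, characters it does not recognise are skipped, and a bond symbol of `None` stands for the default (single or aromatic).

```python
import re

TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFIbcnops*]|\d|[-=#:()]")


def ring_closures(smiles):
    """Return (atom_a, atom_b, bond_symbol) for every ring-closure bond."""
    open_rings = {}   # closure number -> (atom index, bond symbol or None)
    closures = []
    atom_index = -1
    pending_bond = None
    for token in TOKEN.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                # Second occurrence: close the ring and free the number.
                start, start_bond = open_rings.pop(token)
                closures.append((start, atom_index, pending_bond or start_bond))
            else:
                open_rings[token] = (atom_index, pending_bond)
            pending_bond = None
        elif token in ("-", "=", "#", ":"):
            pending_bond = token
        elif token in ("(", ")"):
            pending_bond = None
        else:  # an atom
            atom_index += 1
            pending_bond = None
    if open_rings:
        raise ValueError(f"unclosed ring numbers: {sorted(open_rings)}")
    return closures


print(ring_closures("C1CCCCC1"))          # [(0, 5, None)]
print(ring_closures("C12CCCCC1CCCC2"))    # [(0, 5, None), (0, 9, None)]
print(ring_closures("C1CCCCC1C1CCCCC1"))  # [(0, 5, None), (6, 11, None)]
print(ring_closures("C=1CCCCC1"))         # [(0, 5, '=')]
```

Because a number is removed from `open_rings` as soon as it closes, it can be reused later in the same string, as in the third example.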
7. Aromaticity
 |
c1ccccc1 or c:1:c:c:c:c:c1 or C1=CC=CC=C1 |
Benzene |
Aromatic atoms may be specified by using lower case letters.
Only the following atoms may be interpreted as aromatic:
C, N, P, O, S, As,
Se, and * (wildcard atom) |
 |
[CH-]1C=CC=C1 or [cH-]1cccc1 |
Cyclopentadienyl Anion |
Aromaticity detection is accomplished by using an extended
version of Hueckel's rule. To qualify as aromatic, the number
of available "excess" p-electrons in the ring (or ring system)
must equal 4N+2. Here, the extra electron allows carbon to contribute
2 p electrons. |
 |
n1ccccc1 or N1=CC=CC=C1 |
Pyridine |
Pyridine nitrogen (5 valence electrons) has a lone pair of
electrons in an sp2 orbital and contributes 1 p electron. |
 |
[nH]1cccc1 or N1C=CC=C1 |
1H-Pyrrole |
Pyrrolyl nitrogen (5 valence electrons) contributes two p electrons. |
 |
o1cccc1 or O1C=CC=C1 |
Furan |
Oxygen (6 valence electrons) has an unbound pair of electrons in an sp2 orbital and contributes 2 p electrons. |
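The electron counting in the notes above can be checked with a small calculation. This sketch covers a single ring only and uses just the contributions stated in this section (one p electron per ordinary aromatic carbon is inferred from the cyclopentadienyl note); the table and the name `satisfies_4n_plus_2` are illustrative assumptions, not the full extended rule.

```python
# p electrons contributed per ring atom, following the notes above.
P_ELECTRONS = {
    "c": 1,      # aromatic carbon, as in benzene
    "[cH-]": 2,  # carbanion, as in the cyclopentadienyl anion
    "n": 1,      # pyridine-type nitrogen
    "[nH]": 2,   # pyrrole-type nitrogen
    "o": 2,      # furan-type oxygen
}


def satisfies_4n_plus_2(ring_atoms):
    """Return True if the ring's p-electron total equals 4N+2 for some N >= 0."""
    total = sum(P_ELECTRONS[atom] for atom in ring_atoms)
    return total >= 2 and (total - 2) % 4 == 0


print(satisfies_4n_plus_2(["c"] * 6))               # benzene: 6 electrons, True
print(satisfies_4n_plus_2(["[nH]"] + ["c"] * 4))    # pyrrole: 6 electrons, True
print(satisfies_4n_plus_2(["c"] * 4))               # 4 electrons, False
```

Each of the five-membered examples above reaches six electrons only because one atom contributes two.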
8. Stereo Isomerism
 |
C/C=C/C |
Trans-2-butene |
E/Z isomerism about a double bond can be specified using the `/' and `\' characters. Double bond orientation may also be left unspecified (e.g. CC=CC). |
 |
N[C@@H](C)C(=O)O |
L-alanine |
Tetrahedral chirality can be specified using the "visual mnemonic" `@' character (anticlockwise) or two `@' characters (clockwise).
Looking FROM the 1st neighbor listed in the SMILES TO the chiral atom, the other three neighbors appear anticlockwise or clockwise in the order listed. |
 |
N[CH](C)C(=O)O |
Alanine |
Tetrahedral orientation may be unspecified. |
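For the simplest double-bond case, the direction marks can be read off directly: in the linear form A/C=C/B, two identical marks place A and B on opposite sides (trans), and two different marks place them on the same side (cis). The sketch below handles only that form; `double_bond_geometry` is a hypothetical helper that assumes one unbranched double bond with the marks on the bonds directly attached to it.

```python
def double_bond_geometry(smiles):
    """Classify a SMILES of the linear form A/C=C/B.

    Returns 'trans', 'cis', or None when the orientation is unspecified.
    """
    if "(" in smiles or smiles.count("=") != 1:
        raise ValueError("this sketch only handles one unbranched double bond")
    left, right = smiles.split("=")
    left_marks = [ch for ch in left if ch in "/\\"]
    right_marks = [ch for ch in right if ch in "/\\"]
    if len(left_marks) != 1 or len(right_marks) != 1:
        return None  # at least one side carries no direction mark
    return "trans" if left_marks[0] == right_marks[0] else "cis"


print(double_bond_geometry("C/C=C/C"))   # trans
print(double_bond_geometry("C/C=C\\C"))  # cis
print(double_bond_geometry("CC=CC"))     # None
```

Tetrahedral chirality needs neighbor ordering and is not covered by this sketch.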
9. Reactions
 |
C.O=O>O=[O+]-[O-]>O=C=O.O |
Combustion of methane in the presence of ozone (non-stoichiometric) |
Reactions are delineated by using the greater-than ('>') sign. The format is Reactants>Agents>Products. Atoms may be created or destroyed. |
 |
[I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI |
Displacement Reaction |
Agents are optional. |
 |
CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2] |
Acid catalyzed esterification of acetic acid and ethanol |
Atom maps ([<atom>:<map class>]) allow specification of the correspondence between reactant and product atoms. Map class is an atomic property. |
 |
[CH2:1]=[CH:2][CH:3]=[CH:4][CH2:5][H:6]>>[H:6][CH2:1][CH:2]=[CH:3][CH:4]=[CH2:5] |
A 1,5-hydride shift. |
Atom-mapped hydrogens must be specified explicitly. |
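The Reactants>Agents>Products layout can be taken apart with plain string operations, and atom maps can then be compared across the arrow. This is a rough sketch: `parse_reaction` and `map_classes` are hypothetical helpers, and the map-class pattern simply looks for `:<number>]` at the end of a bracket atom.

```python
import re


def parse_reaction(reaction_smiles):
    """Split a reaction SMILES into (reactants, agents, products) lists."""
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected the form Reactants>Agents>Products")
    # An empty agents field, as in "A>>B", becomes an empty list.
    return tuple(part.split(".") if part else [] for part in parts)


def map_classes(molecules):
    """Collect the map classes, the N in [<atom>:N], from a list of SMILES."""
    return sorted(
        int(number)
        for smiles in molecules
        for number in re.findall(r":(\d+)\]", smiles)
    )


esterification = "CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2]"
reactants, agents, products = parse_reaction(esterification)
print(reactants)                                        # ['CC(=[O:1])[OH:2]', 'CC[OH:3]']
print(agents)                                           # ['[H+]']
print(map_classes(reactants) == map_classes(products))  # True
```

Matching map classes on both sides show that every mapped reactant atom has a counterpart in the products; unmapped atoms are free to appear or disappear.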