SMILES Tutorial

Table of Contents

1. Introduction
2. Atoms
3. Properties of Atoms
4. Bonds
5. Branching
6. Rings
7. Aromaticity
8. Stereo Isomerism
9. Reactions

1. Introduction

SMILES...

...means Simplified Molecular Input Line Entry System
...is a compact, machine- and human-readable chemical nomenclature:
     - SMILES for ethane:     CC
     - Mol file representation for ethane:
  
            SMI2MOL

            2  1  0  0  0  0  0  0  0  0999 V2000
             -0.5100    1.5300    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
              0.5100    1.5300    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
            1  2  1  0  0  0  0
          M  END
         
...is Canonicalizable
...is Comprehensive
...is Well Documented

2. Atoms

SMILES: [Li]
Name:   Lithium
Note:   Square brackets ( [ ] ) are used to delimit individual atoms.

SMILES: O
Name:   Water
Note:   Elements in the "organic subset" may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds: B(3), C(4), N(3,5), O(2), P(3,5), S(2,4,6), F(1), Cl(1), Br(1), I(1).

SMILES: *F
Name:   Unknown atom bonded to Fluorine
Note:   * is wildcard (any atom). The wildcard atom may also be written without brackets.

SMILES: C
Name:   Methane

SMILES: [H]
Name:   Hydrogen Atom
Note:   Hydrogen is NOT part of the "organic subset" and therefore needs brackets.

SMILES: [C]
Name:   Elemental Carbon (Graphite)
Note:   If atoms are used within brackets, all Hydrogens must be specified, otherwise it is assumed that there are none.
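The "lowest normal valence" rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `implicit_hydrogens` and the `bond_order_sum` parameter are hypothetical, and the table simply restates the valence list from the note on water. Aromatic atoms are not handled.

```python
# Normal valences for the "organic subset", copied from the list above.
ORGANIC_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Return the hydrogen count implied for an unbracketed atom.

    bond_order_sum is the total order of the atom's explicit bonds
    (single = 1, double = 2, triple = 3).
    """
    for valence in ORGANIC_VALENCES[symbol]:
        # Pick the lowest listed valence that accommodates the explicit bonds.
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already use up every listed valence


print(implicit_hydrogens("O", 0))  # "O", water: 2
print(implicit_hydrogens("C", 0))  # "C", methane: 4
print(implicit_hydrogens("C", 1))  # each carbon of "CC", ethane: 3
```

A nitrogen with four explicit single bonds falls through to the valence of 5 and therefore receives one hydrogen, which is why N is listed as N(3,5).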

3. Properties of Atoms

SMILES: [Co+2] or [Co++]
Name:   Cobalt(II)
Note:   Brackets are required whenever atomic properties (including chirality) are specified. Charge is specified by a sign followed by a numerical value, or by repeating the sign.

SMILES: [NH4+]
Name:   Ammonium Ion
Note:   Hydrogen count is an atomic property and may be specified by including an H (and optionally an integer) after the Atomic Symbol. Hydrogens specified this way are not normally considered atoms and are often referred to as "implicit" Hydrogens.

SMILES: [H+]
Name:   Proton
Note:   Hydrogen IS considered an atom when it is charged or has a specified mass (e.g. [2H], Deuterium). Such Hydrogen atoms are often referred to as "explicit" Hydrogens.

SMILES: [13C]
Name:   Carbon-13
Note:   Atomic mass is specified by including an integer before the Atomic Symbol. Default mass is "unspecified".

SMILES: [2H]O[2H]
Name:   Heavy Water
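To make the layout of a bracket atom concrete, here is a hedged sketch that reads the properties described in this section (mass, symbol, hydrogen count, charge) with a regular expression. The pattern and the names `BRACKET_ATOM` and `parse_bracket_atom` are assumptions made for illustration. The sketch deliberately ignores chirality, aromatic (lower-case) symbols, and atom maps.

```python
import re

# Hypothetical pattern: [<mass><symbol><H count><charge>]
BRACKET_ATOM = re.compile(
    r"^\[(?P<mass>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|\*)"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>\++\d*|-+\d*)?\]$"
)


def _charge_value(charge):
    """Turn '+2', '++', '-' and similar into a signed integer."""
    if not charge:
        return 0
    sign = 1 if charge[0] == "+" else -1
    digits = charge.lstrip("+-")
    # Either a numerical value follows the sign, or the signs are counted.
    return sign * (int(digits) if digits else len(charge))


def parse_bracket_atom(text):
    """Return the atomic properties of a single bracket atom as a dict."""
    m = BRACKET_ATOM.match(text)
    if m is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")
    hcount = m.group("hcount")
    return {
        "symbol": m.group("symbol"),
        "mass": int(m.group("mass")) if m.group("mass") else None,
        # A bare "H" means one hydrogen; no "H" means none.
        "hcount": 0 if hcount is None else int(hcount[1:] or 1),
        "charge": _charge_value(m.group("charge")),
    }


for atom in ("[Co+2]", "[Co++]", "[NH4+]", "[H+]", "[13C]", "[2H]"):
    print(atom, parse_bracket_atom(atom))
```

Both spellings of Cobalt(II) yield a charge of 2, and [2H] is read as the symbol H with mass 2 rather than as a hydrogen count, because the leading integer is consumed as the mass before the symbol is matched.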

4. Bonds

SMILES: C-C-O or CCO
Name:   Ethanol
Note:   Single bonds are denoted by a dash, '-'. Single bond symbols and aromatic bond symbols (':') may be omitted.

SMILES: O=C=O
Name:   Carbon Dioxide
Note:   Double bonds are denoted by an equals sign, '='.

SMILES: C#N
Name:   Hydrogen Cyanide
Note:   Triple bonds are denoted by a pound sign, '#'.

SMILES: [Na+].[Cl-]
Name:   Sodium Chloride
Note:   The "dot" is a "non-bond", "disconnect", or "zero-order" bond.

SMILES: CCO.O
Name:   Ethanol and Water
Note:   Dot can be used to delineate mixtures. Each dot-disconnected SMILES is considered a "component" of the overall molecule or mixture.
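The bond symbols and the dot can be illustrated with a small sketch. It assumes the simplest possible input: unbranched chains of one-letter atoms, with no brackets, rings, or aromatic atoms in the chain function. The names `components` and `chain_bond_orders` are hypothetical.

```python
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def components(smiles):
    """Split a SMILES string at dots into its disconnected components."""
    return smiles.split(".")


def chain_bond_orders(smiles):
    """Bond orders along an unbranched chain of one-letter atoms."""
    orders = []
    pending = None      # bond symbol seen since the previous atom, if any
    seen_atom = False
    for ch in smiles:
        if ch in BOND_ORDERS:
            pending = BOND_ORDERS[ch]
        elif ch.isalpha():
            if seen_atom:
                # An omitted bond symbol defaults to a single bond.
                orders.append(pending if pending is not None else 1)
            seen_atom = True
            pending = None
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return orders


print(components("CCO.O"))         # ['CCO', 'O']
print(chain_bond_orders("C-C-O"))  # [1, 1]
print(chain_bond_orders("CCO"))    # [1, 1]
print(chain_bond_orders("O=C=O"))  # [2, 2]
print(chain_bond_orders("C#N"))    # [3]
```

C-C-O and CCO produce the same list, which is the sense in which the single bond symbol "may be omitted".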

5. Branching

SMILES: C(F)(F)F or FC(F)F
Name:   Fluoroform
Note:   Branches may be stacked.

SMILES: CCCC(C(=O)O)CCC
Name:   2-Propylpentanoic Acid (Valproic Acid)
Note:   Branches may be nested.

SMILES: CC(=O)O
Name:   Acetic Acid
Note:   Bonds may be specified within branches.
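One common way to read branches is with a stack: an opening parenthesis remembers the atom to return to, and a closing parenthesis goes back to it. The sketch below assumes one-letter atoms only (so no Cl, Br, or bracket atoms) and no rings. `parse_branches` is a hypothetical name, not part of any toolkit.

```python
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_branches(smiles):
    """Return (atoms, bonds); bonds are (from_index, to_index, order)."""
    atoms, bonds = [], []
    stack = []      # atoms to return to when a branch closes
    prev = None     # index of the atom the next atom will bond to
    order = 1       # order of the next bond (single unless stated)
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        elif ch.isalpha():
            atoms.append(ch)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev = len(atoms) - 1
            order = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds


atoms, bonds = parse_branches("CC(=O)O")
print(atoms)  # ['C', 'C', 'O', 'O']
print(bonds)  # [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

Because each "(" pushes and each ")" pops, stacked branches such as C(F)(F)F and nested branches such as CCCC(C(=O)O)CCC are handled by the same loop.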

6. Rings

SMILES: C1CCCCC1
Name:   Cyclohexane
Note:   Ring closure is specified by breaking bonds and numerically labeling (with the same number) the atoms that were connected to each other.

SMILES: C12CCCCC1CCCC2 or C1CC2CCCCC2CC1
Name:   Decalin
Note:   Atoms can have more than one ring closure.

SMILES: C1CCCCC1C1CCCCC1
Name:   Bicyclohexane
Note:   Closure numbers may be reused.

SMILES: C1=CCCCC1 or C=1CCCCC1 or C1CCCCC=1 or C=1CCCCC=1
Name:   Cyclohexene
Note:   The default bond order for the ring closure is single (or aromatic) but may be specified by including a bond symbol between the atom and the closure number.
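The pairing of closure numbers can be sketched as a dictionary of "open" digits: the first time a digit appears it is recorded against the current atom, and the second time it is paired and released, which is why numbers may be reused. The tokenizer and the name `ring_closure_pairs` are assumptions for illustration, and bond symbols on the closure are ignored.

```python
import re

# Bracket atoms, two-letter halogens, one-letter atoms, single ring digits,
# and any other single character (bond symbols, parentheses, dots).
TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[A-Za-z*]|\d|.")


def ring_closure_pairs(smiles):
    """Return (atom_index, atom_index) pairs joined by ring closures."""
    pairs = []
    open_rings = {}   # closure digit -> index of the atom that opened it
    atom_index = -1
    for token in TOKEN.findall(smiles):
        if token.isdigit():
            if token in open_rings:
                pairs.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        elif token[0] == "[" or token.isalpha() or token == "*":
            atom_index += 1
    if open_rings:
        raise ValueError("unclosed ring closure: " + ", ".join(sorted(open_rings)))
    return pairs


print(ring_closure_pairs("C1CCCCC1"))          # [(0, 5)]
print(ring_closure_pairs("C12CCCCC1CCCC2"))    # [(0, 5), (0, 9)]
print(ring_closure_pairs("C1CCCCC1C1CCCCC1"))  # [(0, 5), (6, 11)]
```

In the decalin example atom 0 opens both closures, and in the bicyclohexane example the digit 1 is released after the first ring and opened again for the second.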

7. Aromaticity

SMILES: c1ccccc1 or c:1:c:c:c:c:c1 or C1=CC=CC=C1
Name:   Benzene
Note:   Aromatic atoms may be specified by using lower case letters. Only the following atoms may be interpreted as aromatic: C, N, P, O, S, As, Se, and * (wildcard atom).

SMILES: [CH-]1C=CC=C1 or [cH-]1cccc1
Name:   Cyclopentadienyl Anion
Note:   Aromaticity detection is accomplished by using an extended version of Hückel's rule. To qualify as aromatic, the number of available "excess" p-electrons in the ring (or ring system) must equal 4N+2. Here, the extra electron allows carbon to contribute 2 p electrons.

SMILES: n1ccccc1 or N1=CC=CC=C1
Name:   Pyridine
Note:   Pyridine nitrogen (5 valence electrons) has an unbound pair of electrons in an sp2 orbital and contributes 1 p electron.

SMILES: [nH]1cccc1 or N1C=CC=C1
Name:   1H-Pyrrole
Note:   Pyrrolyl nitrogen (5 valence electrons) contributes 2 p electrons.

SMILES: o1cccc1 or O1C=CC=C1
Name:   Furan
Note:   Oxygen (6 valence electrons) has an unbound pair of electrons in an sp2 orbital and contributes 2 p electrons.
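The 4N+2 count for the examples above can be checked with a few lines of arithmetic. The per-atom contributions below follow the notes in this section, plus one assumption that is only implicit there: each neutral ring carbon contributes 1 p electron. `is_huckel_count` is a hypothetical helper and does not perform aromaticity detection on a structure.

```python
def is_huckel_count(p_electrons):
    """True if p_electrons equals 4N+2 for some integer N >= 0."""
    return p_electrons >= 2 and (p_electrons - 2) % 4 == 0


# p electrons contributed by each ring atom, in SMILES order.
RINGS = {
    "benzene, c1ccccc1": [1, 1, 1, 1, 1, 1],
    "cyclopentadienyl anion, [cH-]1cccc1": [2, 1, 1, 1, 1],
    "pyridine, n1ccccc1": [1, 1, 1, 1, 1, 1],
    "pyrrole, [nH]1cccc1": [2, 1, 1, 1, 1],
    "furan, o1cccc1": [2, 1, 1, 1, 1],
}

for name, contributions in RINGS.items():
    total = sum(contributions)
    print(name, total, is_huckel_count(total))
```

Every ring listed sums to 6 (N = 1). A four-membered ring of four such carbons would sum to 4 and fail the test.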

8. Stereo Isomerism

SMILES: C/C=C/C
Name:   trans-2-Butene
Note:   E and Z type isomerism can be specified using the '/' and '\' characters. Double bond orientation may be left unspecified (e.g. CC=CC).

SMILES: N[C@@H](C)C(=O)O
Name:   L-Alanine
Note:   Tetrahedral chirality can be specified using the "visual mnemonic" '@' character (anticlockwise) or two '@' characters (clockwise). Looking FROM the 1st neighbor listed in the SMILES TO the chiral atom, the other three neighbors appear anticlockwise or clockwise in the order listed.

SMILES: N[CH](C)C(=O)O
Name:   Alanine
Note:   Tetrahedral orientation may be left unspecified.
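For the simplest linear form, A/C=C/B, the two directional symbols can be compared directly: matching symbols place the substituents on opposite sides of the double bond (as in C/C=C/C), and differing symbols place them on the same side. The sketch below covers only that form, with one double bond and one directional bond on each side written in chain order. It is not valid for substituents written in branches, and `double_bond_arrangement` is a hypothetical name.

```python
def double_bond_arrangement(smiles):
    """Classify a SMILES of the form A/C=C/B or A/C=C\\B."""
    left, right = smiles.split("=")  # assumes exactly one double bond
    before = [ch for ch in left if ch in "/\\"]
    after = [ch for ch in right if ch in "/\\"]
    if len(before) != 1 or len(after) != 1:
        return "unspecified"
    return "opposite sides" if before[0] == after[0] else "same side"


print(double_bond_arrangement(r"C/C=C/C"))  # opposite sides
print(double_bond_arrangement(r"C/C=C\C"))  # same side
print(double_bond_arrangement(r"CC=CC"))    # unspecified
```

Raw strings are used so that the backslash in the SMILES is not treated as a Python escape character.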

9. Reactions

SMILES: C.O=O>O=[O+]-[O-]>O=C=O.O
Name:   Combustion of methane in the presence of ozone (non-stoichiometric)
Note:   Reactions are delineated by using the greater-than ('>') sign. The format is Reactants>Agents>Products. Atoms may be created or destroyed.

SMILES: [I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI
Name:   Displacement Reaction
Note:   Agents are optional.

SMILES: CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2]
Name:   Acid catalyzed esterification of acetic acid and ethanol
Note:   Atom maps ([<atom>:<map class>]) allow specification of the correspondence between reactant and product atoms. Map class is an atomic property.

SMILES: [CH2:1]=[CH:2][CH:3]=[CH:4][CH2:5][H:6]>>[H:6][CH2:1][CH:2]=[CH:3][CH:4]=[CH2:5]
Name:   A 1,5-hydride shift
Note:   Atom-mapped hydrogens must be specified explicitly.
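The Reactants>Agents>Products layout and the atom maps lend themselves to a short sketch: split at '>' and then at '.', and collect the integers that follow ':' inside brackets. The names `split_reaction` and `map_classes` are hypothetical. The sketch assumes a well-formed reaction with exactly two '>' signs.

```python
import re


def split_reaction(reaction):
    """Split Reactants>Agents>Products into three lists of components."""
    reactants, agents, products = reaction.split(">")
    return tuple(
        part.split(".") if part else []
        for part in (reactants, agents, products)
    )


def map_classes(components):
    """Collect atom map classes (the integer after ':' inside brackets)."""
    return sorted(
        int(n) for c in components for n in re.findall(r":(\d+)\]", c)
    )


ester = "CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2]"
reactants, agents, products = split_reaction(ester)
print(reactants)  # ['CC(=[O:1])[OH:2]', 'CC[OH:3]']
print(agents)     # ['[H+]']
print(products)   # ['CC(=[O:1])[O:3]CC', '[OH2:2]']
print(map_classes(reactants) == map_classes(products))  # True
```

For the displacement reaction, which has no agents, the middle list comes back empty. Comparing the map classes on both sides is a quick consistency check on a mapped reaction, though unmapped atoms may still be created or destroyed.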


More Information

Theory Manual
SMILES Examples