SMILES, An Introduction


  1. Introduction
  2. Atoms
  3. Properties of Atoms
  4. Bonds
  5. Branching
  6. Rings
  7. Aromaticity
  8. Chirality
  9. Reactions

  1. Introduction:   SMILES...
  2. Atoms
    SMILES | Name | Note
    [Li] | Lithium | Square brackets ([ ]) are used to delimit individual atoms.
    O | Water | Elements in the "organic subset" may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds:
        B(3), C(4), N(3,5), O(2), P(3,5), S(2,4,6), F(1), Cl(1), Br(1), I(1)
    C | Methane
    [H] | Hydrogen Atom | Hydrogen is NOT part of the "organic subset" and therefore needs brackets.
    [C] | Elemental Carbon (Graphite) | If an atom is written within brackets, all of its hydrogens must be specified; otherwise it is assumed to have none.
    *F | Unknown atom bonded to Fluorine | * is a wildcard (any atom).
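    The following is a minimal sketch of the organic-subset hydrogen rule, written in Python using only the standard library. The table mirrors the valence list above. The names `NORMAL_VALENCES` and `implicit_hydrogens` are hypothetical and not taken from any particular toolkit. The sketch also ignores lower-case aromatic atoms, which follow the aromaticity rules described later.

```python
# Hypothetical helper illustrating the organic-subset rule: an unbracketed
# atom receives enough implicit hydrogens to reach the lowest normal valence
# that is at least the sum of its explicit bond orders.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol: str, explicit_bond_order_sum: int) -> int:
    """Number of implicit hydrogens on an unbracketed organic-subset atom."""
    if symbol not in NORMAL_VALENCES:
        # e.g. "H" or "Li": these must be written in brackets, where the
        # hydrogen count is whatever is written (zero if omitted).
        raise ValueError(f"{symbol!r} is not in the organic subset; use brackets")
    for valence in NORMAL_VALENCES[symbol]:  # tuples are in ascending order
        if valence >= explicit_bond_order_sum:
            return valence - explicit_bond_order_sum
    return 0  # more bonds than any normal valence: no hydrogens are added


print(implicit_hydrogens("C", 0))  # "C" (methane): → 4
print(implicit_hydrogens("C", 4))  # central C of "O=C=O" (2 + 2): → 0
print(implicit_hydrogens("N", 4))  # skips valence 3, uses 5: → 1
```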

  3. Properties of Atoms
    [Co+2] or [Co++] | Cobalt(II) | Brackets are required whenever atomic properties (including chirality) are specified. Charge is written as a sign followed by a number, or by repeating the sign once per unit of charge.
    [NH4+] | Ammonium Ion | Hydrogen count is an atomic property and is specified by writing H (optionally followed by an integer) after the atomic symbol. Hydrogens written this way are not normally considered separate atoms.
    [H+] | Proton | Hydrogen IS considered an atom when it is charged or has a specified mass (e.g. [2H], Deuterium).
    [13C] | Carbon-13 | Atomic mass is specified by writing an integer before the atomic symbol.
    [2H]O[2H] | Heavy Water
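    The bracket-atom properties above always appear in a fixed order: mass, symbol, chirality, hydrogen count, then charge. Because of this, a single regular expression can pick them apart. This is a simplified sketch that relies on that assumption. `BRACKET_ATOM` and `parse_bracket_atom` are illustrative names. The sketch does not validate element symbols and does not handle any syntax beyond the properties listed here.

```python
import re

# Hypothetical, simplified grammar for bracket atoms, in written order:
# isotope, element, chirality, hydrogen count, charge.
BRACKET_ATOM = re.compile(
    r"""
    \[
    (?P<isotope>\d+)?                  # atomic mass, e.g. 13 in [13C]
    (?P<symbol>[A-Z][a-z]?|[a-z]|\*)   # element (lower case = aromatic) or *
    (?P<chirality>@@?)?                # @ or @@
    (?P<hcount>H\d*)?                  # H, H2, H3, ...
    (?P<charge>[+-]\d+|\++|-+)?        # +2, ++, -, --, ...
    \]
    """,
    re.VERBOSE,
)


def parse_bracket_atom(text: str) -> dict:
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a bracket atom: {text!r}")
    charge_text = m.group("charge") or ""
    if charge_text[1:].isdigit():       # sign plus number: "+2", "-3"
        charge = int(charge_text)
    else:                               # repeated signs (or none): "", "+", "--"
        charge = charge_text.count("+") - charge_text.count("-")
    hcount_text = m.group("hcount")
    if hcount_text is None:
        hcount = 0                      # in brackets, no H unless written
    elif hcount_text == "H":
        hcount = 1
    else:
        hcount = int(hcount_text[1:])
    isotope = m.group("isotope")
    return {
        "isotope": int(isotope) if isotope else None,
        "symbol": m.group("symbol"),
        "chirality": m.group("chirality"),
        "hcount": hcount,
        "charge": charge,
    }


print(parse_bracket_atom("[NH4+]"))
# → {'isotope': None, 'symbol': 'N', 'chirality': None, 'hcount': 4, 'charge': 1}
```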

  4. Bonds
    C-C-O or CCO | Ethanol | Single bonds are denoted by a dash, '-'. Single bond symbols and aromatic bond symbols (':') may be omitted.
    O=C=O | Carbon Dioxide | Double bonds are denoted by an equals sign, '='.
    C#N | Hydrogen Cyanide | Triple bonds are denoted by a pound sign, '#'.
    [Na+].[Cl-] | Sodium Chloride | The "dot" is a "non-bond", "disconnect", or "zero-order" bond.
    CCO.O | Ethanol and Water | The dot can be used to delineate mixtures. Each dot-disconnected SMILES is considered a "component" of the overall molecule or mixture.
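    To make the bond symbols concrete, here is a sketch of a tokenizer for unbranched, ring-free SMILES. It turns a string into a list of atoms plus (atom, atom, order) bonds. The function names are hypothetical. Treating an aromatic bond as order 1.5 is a common convention assumed here, not part of the notation. An omitted bond is treated as single, or as aromatic when both neighbors are lower-case aromatic atoms.

```python
import re

# Hypothetical tokenizer for unbranched, ring-free SMILES: bracket atoms,
# Cl/Br, one-letter organic-subset atoms (upper or lower case), the *
# wildcard, bond symbols and the '.' disconnect.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]|\*|[-=#:.]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3, ":": 1.5}  # 1.5 for aromatic: a convention


def is_aromatic(atom: str) -> bool:
    # Lower-case element symbol, ignoring a leading bracket and isotope digits.
    return atom.lstrip("[0123456789")[:1].islower()


def chain_bonds(smiles: str):
    """Return (atoms, bonds) where bonds holds (i, j, order) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("branches, ring closures or unknown symbols not handled")
    atoms, bonds = [], []
    prev = pending = None  # previous atom index; explicit bond symbol since then
    for tok in tokens:
        if tok == ".":
            prev = pending = None  # disconnect: next atom starts a new component
        elif tok in BOND_ORDER:
            pending = tok
        else:
            atoms.append(tok)
            cur = len(atoms) - 1
            if prev is not None:
                if pending is not None:
                    order = BOND_ORDER[pending]
                elif is_aromatic(atoms[prev]) and is_aromatic(tok):
                    order = 1.5  # omitted bond between two aromatic atoms
                else:
                    order = 1    # omitted bond otherwise: single
                bonds.append((prev, cur, order))
            prev, pending = cur, None
    return atoms, bonds


print(chain_bonds("O=C=O"))  # → (['O', 'C', 'O'], [(0, 1, 2), (1, 2, 2)])
print(chain_bonds("CCO.O"))  # → (['C', 'C', 'O', 'O'], [(0, 1, 1), (1, 2, 1)])
```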

  5. Branching
    C(F)(F)F or FC(F)F | Fluoroform | Branches may be stacked.
    CCCC(C(=O)O)CCC | 2-Propylpentanoic Acid (Valproic Acid) | Branches may be nested.
    CC(=O)O | Acetic Acid | Bonds may be specified within branches.
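    Branches behave like a stack. An opening parenthesis remembers the atom the branch hangs from, and the closing parenthesis returns to it. The sketch below uses hypothetical names and handles no ring closures or aromatic atoms. It shows that both fluoroform strings give the same connectivity.

```python
import re

# Hypothetical sketch: atoms, bond symbols and parentheses only.
# '(' saves the current atom on a stack; ')' restores it, so the atom
# after a branch bonds back to the branch point.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI*]|[-=#()]")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def branch_bonds(smiles: str):
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported syntax for this sketch")
    atoms, bonds, stack = [], [], []
    prev = pending = None
    for tok in tokens:
        if tok == "(":
            stack.append(prev)          # remember the branch point
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            prev = stack.pop()          # return to the branch point
        elif tok in BOND_ORDER:
            pending = tok
        else:
            atoms.append(tok)
            cur = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, cur, BOND_ORDER.get(pending, 1)))
            prev, pending = cur, None
    if stack:
        raise ValueError("unbalanced '('")
    return atoms, bonds


def degrees(smiles: str):
    """Number of explicit bonds on each atom (implicit H not counted)."""
    atoms, bonds = branch_bonds(smiles)
    deg = [0] * len(atoms)
    for i, j, _ in bonds:
        deg[i] += 1
        deg[j] += 1
    return deg


print(degrees("C(F)(F)F"))  # → [3, 1, 1, 1]
print(degrees("FC(F)F"))    # → [1, 3, 1, 1]
```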

  6. Rings
    C1CCCCC1 | Cyclohexane | Ring closure is specified by breaking a bond and labeling both atoms it connected with the same number.
    C12CCCCC1CCCC2 or C1CC2CCCCC2CC1 | Decalin | Atoms can have more than one ring closure.
    C1CCCCC1C1CCCCC1 | Bicyclohexyl | Closure numbers may be reused once the ring they labeled has been closed.
    C1=CCCCC1 or C=1CCCCC1 or C1CCCCC=1 or C=1CCCCC=1 | Cyclohexene | The default bond order for a ring closure is single (or aromatic), but it may be specified by placing a bond symbol between the atom and the closure number.
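    Ring-closure numbers can be handled with a small table of open ring bonds. This sketch uses hypothetical names and handles single-digit closures only, with no branches. The first occurrence of a digit records the current atom and any bond symbol. The second occurrence adds the bond and frees the digit for reuse. A bond symbol may appear at either end of the closure.

```python
import re

# Hypothetical sketch: atoms, bond symbols and single-digit ring closures.
TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI*]|[-=#]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def ring_bonds(smiles: str):
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported syntax for this sketch")
    atoms, bonds = [], []
    open_rings = {}  # digit -> (atom index, bond symbol or None)
    prev = pending = None
    for tok in tokens:
        if tok in BOND_ORDER:
            pending = tok
        elif tok.isdigit():
            if tok in open_rings:
                start, first = open_rings.pop(tok)  # pop frees the digit
                if pending and first and pending != first:
                    raise ValueError(f"conflicting bond symbols on ring {tok}")
                symbol = pending or first           # either end may give it
                bonds.append((start, prev, BOND_ORDER.get(symbol, 1)))
            else:
                open_rings[tok] = (prev, pending)
            pending = None
        else:
            atoms.append(tok)
            cur = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, cur, BOND_ORDER.get(pending, 1)))
            prev, pending = cur, None
    if open_rings:
        raise ValueError(f"unclosed ring closure(s): {sorted(open_rings)}")
    return atoms, bonds


for smi in ["C1=CCCCC1", "C=1CCCCC1", "C1CCCCC=1", "C=1CCCCC=1"]:
    atoms, bonds = ring_bonds(smi)
    doubles = sum(1 for _, _, order in bonds if order == 2)
    print(smi, len(atoms), doubles)  # each line ends with: 6 1
```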

  7. Aromaticity
    c1ccccc1 or c:1:c:c:c:c:c1 or C1=CC=CC=C1 | Benzene | Aromatic atoms within the "organic subset" may be specified using lower-case letters.
    [CH-]1C=CC=C1 or [cH-]1cccc1 | Cyclopentadienyl Anion | Aromaticity is detected using an extended version of Hückel's rule. To qualify as aromatic, the number of available "excess" p-electrons in the ring (or ring system) must equal 4N+2, where N is a non-negative integer (0, 1, 2, ...).
    n1ccccc1 or N1=CC=CC=C1 | Pyridine | The pyridine nitrogen has a lone (non-bonding) pair of electrons in an sp2 orbital, outside the pi system.
    [nH]1cccc1 or N1C=CC=C1 | 1H-Pyrrole | The pyrrole nitrogen is written [nH] and contributes two pi electrons.
    o1cccc1 or O1C=CC=C1 | Furan | The oxygen contributes one lone pair (two pi electrons) to the ring, giving six in total, so furan is aromatic.
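    The electron counting behind the 4N+2 test can be checked by hand for the rings above. The per-atom contributions in this sketch are a simplified assumption that covers only these examples. They are not the full aromaticity model used by SMILES software, and the names are hypothetical.

```python
import re

# Simplified, hypothetical pi-electron counts for the ring atoms above.
PI_ELECTRONS = {
    "c": 1,       # ring carbon in a C=C double bond
    "[cH-]": 2,   # carbanion lone pair (cyclopentadienyl anion)
    "n": 1,       # pyridine-type N: its lone pair stays out of the pi system
    "[nH]": 2,    # pyrrole-type N: its lone pair joins the pi system
    "o": 2,       # furan-type O: one lone pair joins the pi system
}


def ring_atoms(smiles: str):
    """Aromatic atoms of a single-ring aromatic SMILES; ring digits skipped."""
    return re.findall(r"\[[^\]]*\]|[a-z]", smiles)


def satisfies_huckel(pi_electrons: int) -> bool:
    """True if pi_electrons == 4N + 2 for some integer N >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


for smi in ["c1ccccc1", "[cH-]1cccc1", "n1ccccc1", "[nH]1cccc1", "o1cccc1"]:
    count = sum(PI_ELECTRONS[a] for a in ring_atoms(smi))
    print(smi, count, satisfies_huckel(count))  # 6 pi electrons (N = 1): True
```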

  8. Chirality
    C/C=C/C | Trans-2-butene | E/Z (cis/trans) configuration around a double bond can be specified using the '/' and '\' characters. Double bond configuration may also be left unspecified (e.g. CC=CC).
    N[C@@H](C)C(=O)O | L-Alanine | Tetrahedral chirality can be specified using the "visual mnemonic" '@' (anticlockwise) or '@@' (clockwise). Looking FROM the first neighbor listed in the SMILES TO the chiral atom, the other three neighbors appear anticlockwise or clockwise in the order listed. A hydrogen written inside the brackets counts as the neighbor immediately following the chiral atom.
    N[CH](C)C(=O)O | Alanine | Tetrahedral configuration may be left unspecified.
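    The '@' and '@@' tags depend on the order in which neighbors are written, so the same center can be written either way. Swapping any two neighbors flips the tag. An even rearrangement therefore keeps the tag, and an odd one inverts it. The helper names and neighbor labels below are hypothetical. The sketch only compares orderings and does not parse SMILES.

```python
def permutation_parity(reference, reordered):
    """0 if reordered is an even permutation of reference, 1 if odd."""
    positions = [reference.index(item) for item in reordered]
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2


def equivalent_tag(reference_order, tag, new_order):
    """Tag ('@' or '@@') describing the same center with neighbors in new_order."""
    if sorted(reference_order) != sorted(new_order):
        raise ValueError("orders must list the same neighbors")
    if permutation_parity(reference_order, new_order) == 0:
        return tag
    return "@" if tag == "@@" else "@@"


# L-alanine as written above, N[C@@H](C)C(=O)O: neighbor order N, H, CH3, COOH.
written = ["N", "H", "CH3", "COOH"]
print(equivalent_tag(written, "@@", ["CH3", "H", "N", "COOH"]))  # C[C@H](N)C(=O)O: prints @
print(equivalent_tag(written, "@@", ["CH3", "H", "COOH", "N"]))  # C[C@@H](C(=O)O)N: prints @@
```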

  9. Reactions
    O>[Si](*)(*)(O*)O*>CCO | Water Into Wine (in the presence of Silica) | Reactions are delineated using the greater-than ('>') sign. The format is Reactants>Agents>Products. Atoms may be created or destroyed (the notation does not enforce conservation of atoms).
    [I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI | Displacement Reaction | Agents are optional.
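    A reaction separates its three parts with '>' and the components within each part with '.', so splitting a reaction SMILES needs no full parser. The following is a minimal sketch with a hypothetical function name. It assumes '>' never appears inside a component.

```python
def split_reaction(reaction_smiles: str):
    """Split 'Reactants>Agents>Products' into three lists of components.

    Each field is split on '.', and an empty field (such as the missing
    agents in 'A>>B') becomes an empty list.
    """
    fields = reaction_smiles.split(">")
    if len(fields) != 3:
        raise ValueError("expected exactly two '>' separators")
    return tuple(field.split(".") if field else [] for field in fields)


reactants, agents, products = split_reaction("[I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI")
print(reactants)  # ['[I-]', '[Na+]', 'C=CCBr']
print(agents)     # []
print(products)   # ['[Na+]', '[Br-]', 'C=CCI']
```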