SSMILES Tutorial

This is a one-page summary of SSMILES, an extremely simple subset of SMILES. The complete SMILES language is covered in the SMILES Tutorial. Links to other SMILES-related topics are available on the SMILES Home Page.

Introduction

SSMILES is an extremely simplified subset of SMILES used for expressing the molecular graph of "normal" organic chemicals, i.e., neutral molecules with atoms at their normal organic valences. In such cases there is no need for the brackets, charges, aromatic specifications, atom mapping, etc. used in the full SMILES language. If you only need to operate on such organic structures, simple forms of the six SMILES rules are pretty much all you need to know.
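"Normal organic valences" can be made concrete. The sketch below assumes the valence table of the full SMILES organic subset (B 3; C 4; N 3 or 5; O 2; P 3 or 5; S 2, 4 or 6; F, Cl, Br, I 1). An atom takes the lowest normal valence that accommodates its explicit bonds, and hydrogens make up the difference. The function name implicit_hydrogens is hypothetical and is not part of any Daylight toolkit.

```python
# Normal valences, assumed to match the organic subset of full SMILES.
NORMAL_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,),
    "Cl": (1,),
    "Br": (1,),
    "I": (1,),
}


def implicit_hydrogens(symbol: str, bond_order_sum: int) -> int:
    """Return the number of implied hydrogens on an SSMILES atom.

    The atom takes the lowest normal valence that is >= the sum of its
    explicit bond orders; hydrogens fill the difference.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # No normal valence fits: SSMILES only covers atoms at normal valences.
    raise ValueError(f"{symbol} cannot have bond order sum {bond_order_sum}")


# Carbon in methane (no bonds), the carbonyl carbon of formaldehyde
# (one double bond), and the nitrogen of hydrogen cyanide (one triple bond).
print(implicit_hydrogens("C", 0))  # 4
print(implicit_hydrogens("C", 2))  # 2
print(implicit_hydrogens("N", 3))  # 0
```

Raising an error for an over-bonded atom is a choice made for this sketch, since such an atom falls outside SSMILES.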

Rules

  1. Atoms are represented by atomic symbols: B, C, N, O, F, P, S, Cl, Br, and I.
  2. Double bonds are `=', triple bonds are `#'.
  3. Branching is indicated by parentheses.
  4. Ring closures are indicated by pairs of matching digits.
  5. Period `.' is used to represent disconnections (0-order bond).
  6. `>' delimits reaction components: reactants > agents > products.
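Rules 1 through 5 are enough to turn an SSMILES string into a molecular graph. The parser below is a minimal sketch, not Daylight's implementation; the name parse_ssmiles is hypothetical. It returns a list of atom symbols and a list of (atom, atom, bond order) tuples, leaving hydrogens implicit.

```python
# Atom symbols of rule 1; two-letter symbols come first so that "Cl" is not
# read as "C" followed by an unknown "l".
ATOM_SYMBOLS = ("Cl", "Br", "B", "C", "N", "O", "F", "P", "S", "I")
BOND_ORDERS = {"=": 2, "#": 3}  # rule 2; otherwise a single bond


def parse_ssmiles(text):
    """Parse one SSMILES molecule (no '>') into (atoms, bonds).

    atoms: element symbols in input order.
    bonds: (i, j, order) tuples, where i and j index into atoms.
    """
    atoms, bonds = [], []
    prev = None        # atom the next atom bonds to (None after '.')
    branch_stack = []  # rule 3: atoms to return to at each ')'
    open_rings = {}    # rule 4: digit -> (atom index, bond order)
    order = 1          # pending bond order for the next bond
    i = 0
    while i < len(text):
        symbol = next((s for s in ATOM_SYMBOLS if text.startswith(s, i)), None)
        if symbol is not None:  # rule 1: an atom
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
            i += len(symbol)
            continue
        ch = text[i]
        if ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            if not branch_stack:
                raise ValueError(f"unmatched ')' at position {i}")
            prev = branch_stack.pop()
        elif ch in "0123456789":  # rule 4: ring-closure digit
            if ch in open_rings:
                start, first_order = open_rings.pop(ch)
                # Simplification: use whichever end gave an explicit bond.
                bonds.append((start, prev, max(order, first_order)))
            else:
                open_rings[ch] = (prev, order)
            order = 1
        elif ch == ".":  # rule 5: disconnection
            prev = None
        else:
            raise ValueError(f"not SSMILES: {ch!r} at position {i}")
        i += 1
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds


atoms, bonds = parse_ssmiles("CC(=O)O")
print(atoms)  # ['C', 'C', 'O', 'O']
print(bonds)  # [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

A ring-closure digit is freed once it is closed, so the same digit can be reused later in the string. Reactions (rule 6) are not handled here; they would be split on `>' before parsing each part.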

Examples

SSMILES                    Name                     Remark
C                          methane                  Hydrogens fill normal valence.
CCO                        ethanol                  A single bond is assumed between adjacent atoms unless another bond is specified.
C=O                        formaldehyde             An "equals" sign represents a double bond.
C#N                        hydrogen cyanide         A "pound" sign represents a triple bond.
CC(=O)O                    acetic acid              Parentheses indicate branching.
CC(C)(C)C                  neopentane               Branches may be stacked.
C1CCCCC1                   cyclohexane              Ring-closure bonds are represented by pairs of matching digits.
N1=CC=CC=C1                pyridine                 Aromatic compounds are written as Kekulé structures.
S2C=CC=C2                  thiophene                Any ring-closure digit may be used.
CC(=O)O.CCO                acetic acid and ethanol  A period is a "non-bond" separating disconnected components.
CC(=O)O.CCO>>CC(=O)OCC.O   esterification           Acetic acid and ethanol give ethyl acetate and water. Both `>' delimiters are required, but components are optional (here there are no agents).
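Rules 5 and 6 are purely textual, so a reaction can be taken apart with string splitting alone. The sketch below uses the hypothetical names split_reaction and heavy_atom_count. It splits the esterification example into reactants, agents, and products, then checks that the non-hydrogen atoms balance. Hydrogens are implicit in SSMILES, so this is only a heavy-atom check; balancing hydrogens as well would also require the normal-valence rules.

```python
import re
from collections import Counter

# Element symbols of the SSMILES subset; two-letter symbols are tried first
# so that "Cl" is not read as "C".
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOFPSI]")


def split_reaction(text):
    """Split 'reactants>agents>products' into three lists of components."""
    parts = text.split(">")
    if len(parts) != 3:
        raise ValueError("an SSMILES reaction needs exactly two '>' characters")
    # Rule 5: '.' separates components; an empty part yields no components.
    return [[c for c in part.split(".") if c] for part in parts]


def heavy_atom_count(components):
    """Count non-hydrogen atoms by element over a list of components."""
    counts = Counter()
    for smiles in components:
        counts.update(ATOM_PATTERN.findall(smiles))
    return counts


reactants, agents, products = split_reaction("CC(=O)O.CCO>>CC(=O)OCC.O")
print(reactants)  # ['CC(=O)O', 'CCO']
print(agents)     # []
print(products)   # ['CC(=O)OCC', 'O']
print(heavy_atom_count(reactants) == heavy_atom_count(products))  # True
```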

Limitations

The SSMILES subset of SMILES is formalized to provide a truly simple chemical nomenclature, i.e., one that can be learned "on the spot" each time it is needed. However, it is not a comprehensive chemical nomenclature: one cannot specify inorganic elements, formal charges, unusual valences, isotopic or chiral specifications, or reaction mapping. See the description of the full SMILES language for more information.
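Because SSMILES has such a small alphabet, many out-of-scope features appear as characters that cannot occur in it. These include brackets, which full SMILES needs for charges, isotopes, atom-centered chirality, and most inorganic elements, along with lowercase aromatic atoms. The sketch below (first_non_ssmiles is a hypothetical name) reports the first such character. It checks only the token alphabet, not syntax, so unbalanced parentheses or unpaired ring digits pass. Unusual valences cannot be detected this way. Because the rules name no explicit single-bond symbol, the sketch also flags `-'.

```python
import re

# One SSMILES token: an atom symbol (rule 1), a bond symbol (rule 2),
# a parenthesis (rule 3), a ring-closure digit (rule 4),
# a period (rule 5), or a reaction delimiter (rule 6).
TOKEN = re.compile(r"Cl|Br|[BCNOFPSI]|[=#()0-9.>]")


def first_non_ssmiles(text):
    """Return (position, character) of the first character that does not
    start an SSMILES token, or None if the whole string tokenizes."""
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            return pos, text[pos]
        pos = match.end()
    return None


print(first_non_ssmiles("CC(=O)O"))     # None
print(first_non_ssmiles("c1ccccc1"))    # (0, 'c'): lowercase aromatic atom
print(first_non_ssmiles("C[C@H](N)O"))  # (1, '['): bracket atom
```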

Daylight Chemical Information Systems, Inc.
info@daylight.com