SSMILES Tutorial
This is a one-page summary of SSMILES,
an extremely simple subset of SMILES.
The complete SMILES language is covered in the
SMILES Tutorial.
Links to other SMILES-related topics are available in the
SMILES Home Page.
Introduction
SSMILES is an extremely simplified subset of SMILES used for expressing
the molecular graph of "normal" organic chemicals,
i.e., neutral molecules with atoms at their normal organic valences.
In such cases there is no need for the brackets, charges, aromatic
specifications, atom mapping, etc. used in the full SMILES language.
If you only need to operate on such organic structures,
simple forms of the six SMILES rules are pretty much all you need to know.
Rules
- Atoms are represented by atomic symbols:
B, C, N, O, F, P, S, Cl, Br, and I.
- Double bonds are '='; triple bonds are '#'.
- Branching is indicated by parentheses.
- Ring closures are indicated by pairs of matching digits.
- A period '.' represents a disconnection (0-order bond).
- '>' delimits reaction components:
reactants > agents > products.
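The six rules map naturally onto a small tokenizer. The following Python sketch is one possible reading of them, not part of any Daylight toolkit: the names `TOKEN_SPEC`, `tokenize`, and `split_reaction` are hypothetical and chosen for illustration. It only checks that a string uses SSMILES characters; it does not verify that parentheses or ring digits are balanced.

```python
import re

# One entry per SSMILES rule. Two-letter symbols are listed first so that
# "Cl" is not read as "C" followed by an unknown "l".
TOKEN_SPEC = [
    ("atom", r"Cl|Br|[BCNOFPSI]"),  # rule 1: atomic symbols
    ("bond", r"[=#]"),              # rule 2: double and triple bonds
    ("branch", r"[()]"),            # rule 3: branching
    ("ring", r"[0-9]"),             # rule 4: ring-closure digits
    ("dot", r"\."),                 # rule 5: disconnection
    ("arrow", r">"),                # rule 6: reaction delimiter
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(ssmiles):
    """Split an SSMILES string into (kind, text) pairs, or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(ssmiles):
        match = TOKEN_RE.match(ssmiles, pos)
        if match is None:
            raise ValueError(
                f"not SSMILES: unexpected {ssmiles[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def split_reaction(ssmiles):
    """Split 'reactants>agents>products' into three lists of molecules."""
    parts = ssmiles.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction needs exactly two '>' delimiters")
    # Empty components are allowed, so drop empty strings after splitting.
    return tuple([mol for mol in part.split(".") if mol] for part in parts)


print(tokenize("CC(=O)O"))
print(split_reaction("CC(=O)O.CCO>>CC(=O)OCC.O"))
```

Anything outside the subset, such as a lowercase aromatic symbol or a bracket, is rejected at the first offending character.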
Examples
| SSMILES | Name | Remark |
|---|---|---|
| C | methane | Hydrogens fill normal valence. |
| CCO | ethanol | A single bond is assumed to join adjacent atoms unless otherwise specified. |
| C=O | formaldehyde | An "equals" sign represents a double bond. |
| C#N | hydrogen cyanide | A "pound" sign (#) represents a triple bond. |
| CC(=O)O | acetic acid | Parentheses are used to indicate branching. |
| CC(C)(C)C | neopentane | Branches may be stacked. |
| C1CCCCC1 | cyclohexane | Bonds can also be represented by pairs of matching digits, e.g., ring closures. |
| N1=CC=CC=C1 | pyridine | Aromatic compounds are written as Kekulé structures. |
| S2C=CC=C2 | thiophene | Any ring closure digit may be used. |
| CC(=O)O.CCO | acetic acid and ethanol | A period is a "non-bond". |
| CC(=O)O.CCO>>CC(=O)OCC.O | esterification of acetic acid and ethanol to ethyl acetate and water | Both > delimiters are required, but each component is optional. |
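The remark "hydrogens fill normal valence" can be made concrete with a short Python sketch that builds the molecular graph and derives a formula. Everything here is illustrative: the `VALENCE` table uses only the lowest normal valence of each element (an assumption of this sketch, so higher valences of P and S are not handled), and the names `hydrogen_counts` and `formula` are hypothetical. Reaction strings containing '>' are not accepted.

```python
import re
from collections import Counter

# Assumed normal valences (lowest common valence only).
VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "F": 1,
           "P": 3, "S": 2, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"=": 2, "#": 3}
TOKEN_RE = re.compile(r"Cl|Br|[BCNOFPSI]|[=#()]|[0-9]|\.")


def hydrogen_counts(ssmiles):
    """Return (atoms, hydrogens): element symbols and implicit H per atom."""
    atoms, used = [], []  # symbols and summed bond orders, by atom index
    prev, order = None, 1  # atom to bond to next, and the pending bond order
    stack, open_rings = [], {}  # branch points; digit -> (atom, bond order)
    pos = 0
    while pos < len(ssmiles):
        match = TOKEN_RE.match(ssmiles, pos)
        if match is None:
            raise ValueError(f"unexpected {ssmiles[pos]!r} at position {pos}")
        tok, pos = match.group(), match.end()
        if tok in VALENCE:
            atoms.append(tok)
            used.append(0)
            idx = len(atoms) - 1
            if prev is not None:  # bond the new atom to the previous one
                used[prev] += order
                used[idx] += order
            prev, order = idx, 1
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            prev = stack.pop()
        elif tok == ".":
            prev, order = None, 1  # no bond across a period
        else:  # ring-closure digit
            if prev is None:
                raise ValueError("ring digit must follow an atom")
            if tok in open_rings:
                other, other_order = open_rings.pop(tok)
                ring_order = max(order, other_order)
                used[prev] += ring_order
                used[other] += ring_order
            else:
                open_rings[tok] = (prev, order)
            order = 1
    if stack or open_rings:
        raise ValueError("unclosed branch or ring")
    hydrogens = [VALENCE[a] - u for a, u in zip(atoms, used)]
    if any(h < 0 for h in hydrogens):
        raise ValueError("normal valence exceeded")
    return atoms, hydrogens


def formula(ssmiles):
    """Formula string: C first, then H, then other elements alphabetically."""
    atoms, hydrogens = hydrogen_counts(ssmiles)
    counts = Counter(atoms)
    counts["H"] += sum(hydrogens)
    keys = [k for k in ("C", "H") if counts[k]]
    keys += sorted(k for k in counts if k not in ("C", "H") and counts[k])
    return "".join(k + (str(counts[k]) if counts[k] > 1 else "") for k in keys)


print(formula("CCO"))          # C2H6O
print(formula("N1=CC=CC=C1"))  # C5H5N
```

Because each atom's hydrogen count is just its assumed valence minus the bond orders already used, the Kekulé form of pyridine yields one hydrogen per carbon and none on nitrogen without any aromaticity handling.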
Limitations
The SSMILES subset of SMILES is formalized to provide a truly simple
chemical nomenclature, i.e., one that can be learned "on the spot" each
time it is needed.
However, it is not a comprehensive chemical nomenclature:
one cannot specify
inorganic elements, formal charges, unusual valences,
isotopic or chiral specifications, or reaction mapping.
See the description of the full
SMILES language
for more information.
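To see whether a given SMILES string falls outside the subset, a rough check can look for the full-SMILES features listed above. The Python sketch below is a heuristic under stated assumptions, not a validator: the function name `outside_ssmiles` and the reason labels are hypothetical, and it relies on the full SMILES conventions that charges, isotopes, chirality, and atom maps appear inside brackets and that aromatic atoms are written in lowercase.

```python
import re

# Patterns applied to the text inside each [...] bracket atom.
BRACKET_CHECKS = [
    ("isotope", re.compile(r"^[0-9]")),      # leading mass number, e.g. [13CH4]
    ("chirality", re.compile(r"@")),         # e.g. [C@@H]
    ("formal charge", re.compile(r"[+-]")),  # e.g. [Na+]
    ("atom map", re.compile(r":[0-9]")),     # e.g. [CH3:1]
]


def outside_ssmiles(smiles):
    """List the features that keep a SMILES string out of the SSMILES subset."""
    found = []

    def note(reason):
        if reason not in found:
            found.append(reason)

    for inner in re.findall(r"\[([^\]]*)\]", smiles):
        note("bracket atom")
        for reason, pattern in BRACKET_CHECKS:
            if pattern.search(inner):
                note(reason)

    # Outside brackets, ignore Cl and Br, then look for lowercase
    # (aromatic) atom symbols.
    bare = re.sub(r"\[[^\]]*\]", "", smiles)
    bare = bare.replace("Cl", "").replace("Br", "")
    if re.search(r"[bcnops]", bare):
        note("aromatic atom")
    return found


print(outside_ssmiles("CC(=O)O"))      # []
print(outside_ssmiles("c1ccccc1"))     # ['aromatic atom']
print(outside_ssmiles("[Na+].[Cl-]"))  # ['bracket atom', 'formal charge']
```

An empty list means none of these features was found; it does not by itself guarantee that the string is well-formed SSMILES.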
References
"SMILES 1. Introduction and Encoding Rules",
Weininger, D., J. Chem. Inf. Comput. Sci., 1988, 28, 31.
Daylight Chemical Information Systems, Inc.
info@daylight.com