SMILES Tutorial: Related languages


The SMILES language family

SMILES is one of a family of languages maintained by Daylight CIS.

SSMILES is an extremely simplified subset of SMILES used for expressing "normal" organic chemicals and their reactions, i.e., neutral molecules with atoms at their normal organic valences. There is no need for brackets, charges, aromatic specifications, etc. The rules for the language are:

  1. Atoms are represented by atomic symbols
  2. Double bonds are `=', triple bonds are `#'
  3. Branching is indicated by parentheses
  4. Ring closures are indicated by pairs of matching digits.
  5. Period `.' is used to represent disconnections.
  6. `>' delimits reaction components: reactants > agents > products.
That's it. Although not comprehensive, there are some advantages to such extreme simplification, e.g., it's nearly trivial to learn and easy to implement (there's a 1-line SSMILES interpreter in APL and a hardware implementation that parses > 10e7 SSMILES per second).
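Rules 1 through 5 above can be sketched as a small parser. This is only an illustrative sketch, not Daylight's implementation: the function name `parse_ssmiles`, the restricted symbol table and the tuple layout of the result are all assumptions, and rule 6 (reactions) is left out.

```python
# Illustrative SSMILES reader covering rules 1-5 only.
# The symbol table below is an assumption, not the full set of elements.
SYMBOLS = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")  # two-letter symbols first
BOND_ORDER = {"=": 2, "#": 3}


def parse_ssmiles(text):
    """Return (atoms, bonds): a list of symbols and a list of (i, j, order)."""
    atoms, bonds = [], []
    prev, order = None, 1      # atom the next atom bonds to, pending bond order
    branch_stack = []          # atoms saved at each '('
    open_rings = {}            # ring digit -> (atom index, bond order)
    i = 0
    while i < len(text):
        ch = text[i]
        symbol = next((s for s in SYMBOLS if text.startswith(s, i)), None)
        if symbol:                                   # rule 1: atomic symbols
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
            i += len(symbol)
            continue
        if ch in BOND_ORDER:                         # rule 2: '=' and '#'
            order = BOND_ORDER[ch]
        elif ch == "(" and prev is not None:         # rule 3: open a branch
            branch_stack.append(prev)
        elif ch == ")" and branch_stack:             # rule 3: close a branch
            prev = branch_stack.pop()
        elif ch in "0123456789" and prev is not None:
            if ch in open_rings:                     # rule 4: matching digit closes the ring
                start, ring_order = open_rings.pop(ch)
                bonds.append((start, prev, max(order, ring_order)))
            else:
                open_rings[ch] = (prev, order)
            order = 1
        elif ch == ".":                              # rule 5: disconnection
            prev, order = None, 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
        i += 1
    if branch_stack or open_rings:
        raise ValueError("unbalanced parentheses or unclosed ring")
    return atoms, bonds


atoms, bonds = parse_ssmiles("CC(=O)O")   # acetic acid
```

Hydrogens are never stored here; under the SSMILES assumption of normal organic valences they could be filled in afterwards from each atom's bond orders.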

USMILES (unique SMILES) and ASMILES (absolute SMILES) aren't separate languages per se: they are unique SMILES for a given molecule (or reaction), without and with isomeric information, respectively. The SMILES language was specifically designed to be "uniquifiable", i.e., one can automatically generate a single, unique SMILES string from any of the SMILES that describe the same molecular graph or reaction. Unique SMILES forms the basis of the THOR (thesaurus-oriented retrieval) chemical database system.
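The idea of "uniquifying" can be illustrated on very small molecular graphs. The sketch below is not the algorithm used to generate unique SMILES; it is a brute-force stand-in that tries every atom numbering and keeps the smallest description, so that equivalent graphs end up with the same key. The function name `canonical_key` and the graph layout (a list of atom symbols plus a list of `(i, j, order)` bonds) are assumptions made for illustration.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Smallest (labels, bonds) tuple over all atom renumberings.

    Brute force: the cost grows factorially, so this only suits tiny graphs.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[old_index] is the new index of that atom
        labels = [None] * len(atoms)
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[a], perm[b]), max(perm[a], perm[b]), order)
            for a, b, order in bonds
        )
        key = (tuple(labels), tuple(edges))
        if best is None or key < best:
            best = key
    return best


# Ethanol as it would be read from "CCO" and from "OCC": same graph, different numbering.
ethanol_a = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
# Dimethyl ether, "COC": same atoms, different connectivity.
ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

same = canonical_key(*ethanol_a) == canonical_key(*ethanol_b)
different = canonical_key(*ethanol_a) != canonical_key(*ether)
```

A practical system would replace the exhaustive search with an efficient canonical ordering and then write the result back out as a SMILES string.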

SMARTS (SMILES arbitrary target specification) is a superset of SMILES which expresses molecular patterns. SMARTS patterns are expressed as SMILES with extra primitives and boolean operators, e.g., the pattern of "a halogen ortho to hydroxy" may be written in SMARTS as:

[F,Cl,Br,I]cc[OH]

SMARTS also represents reaction patterns, e.g., the pattern of a reaction with "at least one reactant molecule containing pyridine and one product molecule containing a halogen ortho to hydroxy" may be written:

n1ccccc1>>[F,Cl,Br,I]cc[OH]
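Reading the `>` and `.` delimiters in such a string can be sketched as follows. The helper name `split_reaction` and the dictionary it returns are assumptions for illustration; the sketch only separates the text into components and does not interpret the patterns themselves.

```python
def split_reaction(text):
    """Split 'reactants>agents>products' into lists of '.'-separated components."""
    parts = text.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")

    def components(part):
        # An empty part (as in '>>') simply yields no components.
        return [c for c in part.split(".") if c]

    reactants, agents, products = (components(p) for p in parts)
    return {"reactants": reactants, "agents": agents, "products": products}


pattern = split_reaction("n1ccccc1>>[F,Cl,Br,I]cc[OH]")
```

Here the agents list is empty because nothing appears between the two `>` characters.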

SMIRKS (SMILES reaktion specification) is a superset of both SMILES and SMARTS which is used for expressing transformations, i.e., generic patterns which can be applied to new molecules. In general, atoms and bonds participating in the reaction must be specified in their SMILES form (with correct oxidation state) but other atoms and bonds can be specified as SMARTS patterns. For instance, the acid-catalyzed esterification of aliphatic alcohols and acids may be written generically in SMIRKS as:

[A:1][O:2].[A:3][C:4](=[O:5])[OH:6]>>[A:3][C:4](=[O:5])[O:2][A:1].[OH2:6]
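The `:n` labels in a transform pair each reactant atom with a product atom, so a simple consistency check is possible: each label should occur once on each side. The sketch below assumes that convention; the names `map_classes` and `check_mapping` are hypothetical, and the example strings are written with `.` between components as an assumption for illustration.

```python
import re
from collections import Counter

# A bracket atom ending in ':<digits>' carries an atom map label.
MAP_CLASS = re.compile(r"\[[^\]]*:(\d+)\]")


def map_classes(side):
    """Count how often each map label occurs in one side of a transform."""
    return Counter(int(n) for n in MAP_CLASS.findall(side))


def check_mapping(transform):
    """Return a list of human-readable problems (empty if the labels pair up)."""
    parts = transform.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    left, right = map_classes(parts[0]), map_classes(parts[2])
    problems = []
    for name, counts in (("reactant", left), ("product", right)):
        for label, n in sorted(counts.items()):
            if n > 1:
                problems.append(f"map class {label} appears {n} times on the {name} side")
    for label in sorted(set(left) ^ set(right)):
        side = "reactant" if label in left else "product"
        problems.append(f"map class {label} appears only on the {side} side")
    return problems


ok = check_mapping(
    "[A:1][O:2].[A:3][C:4](=[O:5])[OH:6]>>[A:3][C:4](=[O:5])[O:2][A:1].[OH2:6]"
)
bad = check_mapping("[C:1][O:2]>>[C:1][O:2].[O:2]")
```

A check like this only looks at the labels; it says nothing about whether the chemistry described by the transform is sensible.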

CHUCKLES, CHORTLES and CHARTS are SMILES-based languages which express monomer-based structures, combinatorial mixtures and patterns, respectively.

XSMILES and CEX

CEX (chemical exchange) is a set of software tools which allow chemical information to be exchanged between widely differing environments while retaining well-defined semantics. CEX is being developed by an alliance of producers and consumers of chemical information software. One advantage of CEX is that its code will be available free of charge and will be supported by this alliance.

XSMILES is the molecular language used by CEX (more accurately, the lexical form of the molecule object being passed on a CEX stream). XSMILES is similar to, but not identical to, SMILES as described in this document. In brief, XSMILES is like SMILES with all traces of Daylight conventions removed, e.g., aromaticity conventions. As such, XSMILES is much more suitable than SMILES for exchange of chemical information. On the other hand, it isn't powerful enough to express unique nomenclature, but that's probably just as well, since unique nomenclature is in the purview of application software (rather than information exchange software).

It is anticipated that the first CEX release will be available in late 1995, complete with XSMILES parsers and generators and lots of other goodies. A hyperlink will be posted here when that happens.

The "chemical/smiles" MIME type

The "chemical" MIME type, championed by H. Rzepa of Imperial College, London, is (hopefully) nearing ratification. Should this happen, SMILES will almost certainly appear as a MIME subtype, i.e., the MIME type: chemical/smiles. Until that blessed day, the IETF recommendation is to use something like application/x-smiles or x-chemical/x-smiles.
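As a rough illustration of labelling SMILES data with one of these interim types, the sketch below builds a MIME message with Python's standard `email` package. The helper name `smiles_message` is hypothetical, and the choice of `application/x-smiles` as the default simply follows the suggestion above; nothing here is an official registration.

```python
from email.message import EmailMessage


def smiles_message(smiles, content_type="application/x-smiles"):
    """Wrap a SMILES string in a MIME message carrying the given content type."""
    maintype, subtype = content_type.split("/", 1)
    msg = EmailMessage()
    # Passing bytes lets the caller choose the main type and subtype explicitly.
    msg.set_content(smiles.encode("ascii"), maintype=maintype, subtype=subtype)
    return msg


msg = smiles_message("CCO")
declared = msg.get_content_type()
```

A receiving program could dispatch on the declared type and hand the payload to a SMILES parser.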

Information about this issue can be obtained via Chemical MIME types.


Daylight Chemical Information Systems, Inc.
info@daylight.com