Reaction Languages: Reaction SMILES


Reaction specification

Reaction SMILES is the language used for describing specific, single-step reactions. Reaction SMILES are a strict superset of molecule SMILES: any valid molecule SMILES can be a component of a reaction SMILES. Reaction SMILES add two isomeric features (features which are part of the Absolute SMILES but not part of the Unique SMILES): agent components and atom maps.

Reaction components

A molecule in a reaction has one of three roles: reactant, agent, or product. Reactions are indicated in SMILES by specifying, in left-to-right order, reactant, agent and product molecule(s) separated by the "greater-than" symbol (">"). Multiple molecules per role may be specified by use of dot disconnections.

All of the roles of molecules are optional within a reaction. That is, it is legal to have a reaction SMILES with no reactant, agent, or product molecules, or any combination thereof. For example, ">>" is a legal reaction SMILES which contains no molecules.
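The role layout described above can be illustrated with a minimal Python sketch. The function name split_reaction_smiles is hypothetical and not part of any toolkit. It assumes that every "." separates whole molecules, which ignores the case of ring-closure digits bonding across a dot.

```python
def split_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn.strip().split(">")
    if len(parts) != 3:
        raise ValueError(
            "expected exactly two '>' separators, got %d" % (len(parts) - 1)
        )
    roles = ("reactants", "agents", "products")
    # An empty role yields an empty list; "." separates molecules in a role.
    return {
        role: [mol for mol in part.split(".") if mol]
        for role, part in zip(roles, parts)
    }


if __name__ == "__main__":
    print(split_reaction_smiles("C=O.O=O>CCO>O=C=O.O"))
    print(split_reaction_smiles(">>"))
```

Because all three roles are optional, the empty reaction ">>" splits into three empty lists rather than raising an error.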

Reaction SMILES are not required to be stoichiometrically balanced, nor to preserve net charge, although both are generally recommended.
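Since balance is recommended but not enforced, a rough check can be useful. The following Python sketch compares heavy-atom counts on the reactant and product sides. The names heavy_atom_counts and is_heavy_atom_balanced are hypothetical, the tokenizer is a simplified regular expression rather than a full SMILES parser, and hydrogens and charges are deliberately not checked.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then one-letter
# organic-subset atoms (upper case aliphatic, lower case aromatic).
ATOM_RE = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope, then the element symbol.
BRACKET_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(side):
    """Count non-hydrogen atoms per element on one side of a reaction."""
    counts = Counter()
    for token in ATOM_RE.findall(side):
        if token.startswith("["):
            match = BRACKET_RE.match(token)
            if match is None:
                continue  # e.g. a wildcard atom; ignored in this sketch
            symbol = match.group(1)
        else:
            symbol = token
        symbol = symbol.capitalize()  # aromatic 'c' counts as 'C'
        if symbol != "H":
            counts[symbol] += 1
    return counts


def is_heavy_atom_balanced(rxn):
    """True if reactants and products hold the same heavy atoms."""
    reactants, _agents, products = rxn.strip().split(">")
    return heavy_atom_counts(reactants) == heavy_atom_counts(products)


if __name__ == "__main__":
    print(is_heavy_atom_balanced("C.O=O.O=O>>O=C=O.O.O"))
    print(is_heavy_atom_balanced("CC(=O)O.CCO>>CC(=O)OCC"))
```

Under these assumptions the methane combustion example is reported as balanced, while the esterification example written without water is not, because one oxygen is missing on the product side.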

With respect to canonical SMILES, agents are present in Absolute SMILES and are omitted from Unique SMILES.

Simple Reaction SMILES
Reaction SMILES and remark
C.O=O.O=O>>O=C=O.O.O

combustion of methane (stoichiometric)

C=O.O=O>CCO>O=C=O.O

life ... don't talk to me about life

CCC.O=N(=O)O>>CCCN(=O)(=O).CC(C)N(=O)=O.CCN(=O)=O.CN(=O)=O

nitration of propane to form various products (non-stoichiometric representation)

CC(=O)O.CCO>>CC(=O)OCC

esterification of acetic acid and ethanol to ethyl acetate

C1[C@H](C)C[C@H](Cl)C1.[OH-]>>C1[C@H](C)C[C@@H](O)C1.[Cl-]

SN2 displacement of cis-3-methylcyclopentyl chloride to the trans alcohol.

Atom mapping

SMILES atom maps allow specification of the correspondence between reactant and product atoms. Reaction atom mapping is never absolutely required, although it is very useful for describing reaction mechanisms and for searching atom-mapped databases.

Atom maps are numeric labels which represent classes and are normally paired (i.e., one reactant and one product atom have the same map-class). The numeric values of the atom maps in SMILES have no actual meaning (there is no ranking or ordering implied by the numbers); the atom maps are used only to associate sets of reactant and product atoms with one another.

It is possible to give many atoms the same map class (e.g., when atoms are equivalent on both sides). Atoms in agent molecules are never given atom maps.

The atom map syntax in SMILES is the "colon" character followed by a non-negative integer in the range 0 to 2^31 (e.g., [CH3:1] is an atom in map class "1"). The default value for an atom map is "unspecified", so any explicit atom map value must be part of a bracket-enclosed atomic expression.
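Because atom maps only ever appear as a colon and integer at the end of a bracket atom, they can be pulled out with a regular expression. The Python sketch below collects the map classes on each side and reports which are paired. The names map_classes and map_report are hypothetical, and the pattern is a simplification that assumes well-formed bracket atoms.

```python
import re
from collections import Counter

# A map class is ":<digits>" immediately before the closing bracket.
MAP_RE = re.compile(r"\[[^\]]*:(\d+)\]")


def map_classes(side):
    """Count how many atoms carry each map class on one side."""
    return Counter(int(n) for n in MAP_RE.findall(side))


def map_report(rxn):
    """Summarize which map classes are paired across the reaction."""
    reactants, agents, products = rxn.strip().split(">")
    r, p = map_classes(reactants), map_classes(products)
    return {
        "paired": sorted(set(r) & set(p)),
        "reactant_only": sorted(set(r) - set(p)),
        "product_only": sorted(set(p) - set(r)),
        # Agents are not expected to carry maps; list any that do.
        "agent_maps": sorted(map_classes(agents)),
    }


if __name__ == "__main__":
    rxn = "CC(=[O:1])[OH:1].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:1]"
    print(map_classes(rxn.split(">")[0]))
    print(map_report(rxn))
```

The numbers are compared only for equality, never ordered or ranked against each other, which matches their role as class labels. A class shared by several atoms simply shows up with a count greater than one.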

With respect to canonical SMILES, atom maps are isomeric features and hence are included in Absolute SMILES and omitted from Unique SMILES.
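The two reaction-specific isomeric features, agents and atom maps, can be removed with plain string handling, as in the Python sketch below. The function name strip_reaction_isomeric_features is hypothetical. The result is not a canonical Unique SMILES: producing one requires a canonicalization algorithm from a chemistry toolkit, and this sketch only shows which information is dropped.

```python
import re

# ":<digits>" directly before a closing bracket is an atom map.
MAP_SUFFIX_RE = re.compile(r":\d+(?=\])")


def strip_reaction_isomeric_features(rxn):
    """Drop agents and atom maps; atoms and bonds are left untouched."""
    reactants, _agents, products = rxn.strip().split(">")
    without_agents = reactants + ">>" + products
    return MAP_SUFFIX_RE.sub("", without_agents)


if __name__ == "__main__":
    mapped = "CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2]"
    print(strip_reaction_isomeric_features(mapped))
```

The brackets left behind, such as [OH], are still valid SMILES. A canonicalizer would normally rewrite them in its own preferred form.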

Hydrogen specification

Hydrogens in molecule SMILES often do not need to be specified explicitly; their existence is sufficiently described by the default valence and connectivity for a given atom. In these cases, the hydrogens are said to be implicit. For molecule SMILES, there are four cases where specification of explicit hydrogens is required.

- charged hydrogen, i.e. a proton, [H+]
- hydrogen connected to other hydrogens, e.g., molecular hydrogen, [H][H]
- hydrogen connected to more than one other atom, e.g., bridging hydrogen
- isotopic hydrogen specifications, e.g. in heavy water, [2H]O[2H]

For reaction SMILES, a fifth case is added: atom-mapped hydrogens must be specified explicitly. Since atom maps are considered an "isomeric feature", this new rule parallels the fourth case above, for isotopic hydrogens.

This rule allows hydrogens to be used explicitly as atoms in a reaction, and reaction mechanisms which include hydrogens can be stated precisely.
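As an illustration of the explicit-hydrogen rule for reactions, the Python sketch below lists the map classes carried by explicit hydrogen atoms. The name mapped_hydrogens is hypothetical, and the pattern is a simplification that accepts an optional isotope and charge but does not validate the rest of the SMILES.

```python
import re

# An explicit hydrogen atom with a map: "[H:6]", "[2H:1]", "[H+:3]".
# The "H" must directly follow the bracket or isotope, so "[CH2:1]"
# and elements such as "[Hg:1]" do not match.
MAPPED_H_RE = re.compile(r"\[\d*H[+-]?\d*:(\d+)\]")


def mapped_hydrogens(side):
    """Return the map classes of explicit hydrogens on one side."""
    return [int(n) for n in MAPPED_H_RE.findall(side)]


if __name__ == "__main__":
    rxn = ("[CH2:1]=[CH:2][CH:3]=[CH:4][CH2:5][H:6]"
           ">>[H:6][CH2:1][CH:2]=[CH:3][CH:4]=[CH2:5]")
    reactants, _agents, products = rxn.split(">")
    print(mapped_hydrogens(reactants), mapped_hydrogens(products))
```

Hydrogens written inside another atom's brackets, as in [CH2:5], are not reported, because the map there belongs to the carbon.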

Atom-mapped Reaction SMILES
Reaction SMILES and remark
CC(=[O:1])[OH:2].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:2]

A stoichiometrically complete specification of the acid-catalyzed esterification of acetic acid and ethanol. Note that [H+] is indicated as an agent. The atom mapping of the oxygens specifies that the hydroxyl oxygen of the carboxyl group becomes water and that the alcohol oxygen becomes the ester oxygen.

CC(=[O:1])[OH:1].CC[OH:3]>[H+]>CC(=[O:1])[O:3]CC.[OH2:1]

The same overall reaction as above; however, the carbonyl and carboxyl oxygens are both in the same map class. This is a case where the atom maps allow the user to specify more accurate information about the actual mechanism than our valence model permits.

[NH2:1][CH2:4][CH2:5][c:11]1[cH:7][cH:6][c:9]([OH:2])[c:10]([OH:3])[cH:8]1>[Na+].[O-]N=O>[NH2:1][CH2:4][CH2:5][c:11]1[cH:8][c:10]([OH:3])[c:9]([OH:2])[cH:6][c:7]1[N+](=O)[O-]

This reaction from CCR97 is typical of database records, where atoms added or deleted during the reaction are not mapped. This becomes important for transformations.

[CH2:1]=[CH:2][CH:3]=[CH:4][CH2:5][H:6]>>[H:6][CH2:1][CH:2]=[CH:3][CH:4]=[CH2:5]

A 1,5-hydride shift. Atom maps allow the user to explicitly say which hydrogen is involved in the reaction.


Daylight Chemical Information Systems, Inc.
jjdelany@daylight.com