EuroMUG '98: Reaction SMARTS

Reaction Languages: Reaction SMARTS

Reaction specification

Reaction SMARTS, the reaction query language, is a strict superset of molecule SMARTS. The extensions to SMARTS that describe reaction query features are essentially the same as those described for reaction SMILES. Their semantics are described here.

Reaction components

These are pretty straightforward. Each component of the reaction query describes a substructure which appears within that component role during a substructure match.

The behavior when mixing molecule and reaction SMARTS with molecule and reaction targets is worth describing. Molecule SMARTS (SMARTS without ">" characters) can match anywhere in a reaction target (reactant, agent, or product). Reaction queries never match molecule targets.
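
As a rough illustration of how the ">" characters separate the component roles, the following Python sketch splits a SMARTS string into its reactant, agent, and product parts. The function name split_reaction_smarts is hypothetical and is not part of any Daylight toolkit, and the sketch assumes that ">" appears only as a role separator.

```python
from typing import Dict, Optional


def split_reaction_smarts(smarts: str) -> Optional[Dict[str, str]]:
    """Return the three role components, or None for a molecule SMARTS."""
    if ">" not in smarts:
        return None  # molecule SMARTS: no roles are specified
    parts = smarts.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMARTS has exactly two '>' separators")
    reactants, agents, products = parts
    return {"reactant": reactants, "agent": agents, "product": products}


# ">>C=C" has empty reactant and agent parts and a product part "C=C".
print(split_reaction_smarts(">>C=C"))
```

A result of None marks a molecule query, which may match in any role of a reaction target.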

Another way to think of these cases is to consider the "role" of each atom as a searchable property. A molecule query is a search for a collection of atoms and bonds for which no role has been specified; hence the molecule query can potentially match any role in a reaction. A reaction query, on the other hand, specifies roles for all the atoms and bonds in its query, and since the atoms in a target molecule don't have specified roles, they can't match.
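
The "role as a searchable property" view can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function role_matches and the use of None for "no role specified" are assumptions made for the example.

```python
from typing import Optional


def role_matches(query_role: Optional[str], target_role: Optional[str]) -> bool:
    """Compare atom roles; None means that no role is specified."""
    if query_role is None:
        # Molecule query: no role requested, so any target atom qualifies.
        return True
    # Reaction query: the target atom must carry the same role.
    # An atom of a molecule target has role None and therefore never matches.
    return query_role == target_role


print(role_matches(None, "product"))       # molecule query vs. reaction target
print(role_matches("product", None))       # reaction query vs. molecule target
print(role_matches("product", "product"))  # reaction query vs. reaction target
```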

Simple Reaction SMARTS Matching
Target Reaction SMARTS and remark
C=C
The SMARTS matches double-bonded carbons anywhere in the target.
NOTE: All matches are highlighted in red.

>>C=C
Looking for double-bonded carbons in product components. This query never matches a molecule target (since molecule SMILES don't have "products").

C=C>>CC
This matches a pair of double-bonded carbons in the reactant and two single-bonded carbons in product. It matches a double bond reduction reaction, and also matches any case where each component appears separately (see the next example).
C=C>>CC
This would be a 'false hit' if looking for a reduction reaction. Atom maps must be used to refine the query and eliminate these false hits.

Atom mapping

SMARTS atom maps allow specification of the correspondence between reactant and product atoms within the target reaction. As with reaction SMILES, atom maps are numeric labels which represent classes and are normally paired (i.e., one reactant and one product atom have the same map class). Atom-mapped reaction SMARTS atomic expressions never match unmapped target atoms. Hence, mapped reaction queries will never hit unmapped reaction targets.

When atom maps in the SMARTS are pairwise (a reactant and a product atom have the same map class label), the SMARTS says that these two atoms are in the same class in the target. The actual values of the map classes in the pattern and target need not match (e.g., SMARTS [C:1]>>[C:1] does match the target [CH2:2]=[CH2:3]>>[CH3:2][CH3:3]), but the two target atoms must be in the same map class as one another. Furthermore, it is possible for additional atoms in the target to be mapped to the same class (e.g., SMARTS [C:1]>>[C:1] also matches the target [CH2:2]=[CH2:2]>>[CH3:2][CH3:2]).

It is legal in SMARTS for many atoms to have the same map class. When multiple reactant and product atoms have the same class label, the atom maps are interpreted with "or" logic: the SMARTS will match if each reactant atom of the ambiguous class matches at least one product atom in the class, and each product atom in the class matches at least one reactant atom. It is not necessary that all atoms in the target be in the same atom map class. In effect, using ambiguous atom maps in a query makes that query broader; it will match ambiguous targets as well as specifically mapped ones.

An atom map in SMARTS must always be the final part of an atom expression, and it is joined to the rest of the expression by an implicit lowest-precedence "and". An expression like [C;:6&:7] is not legal.
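
The placement rule can be illustrated with a deliberately naive Python check. The function map_is_final is hypothetical, and the regular expression is a simplification: it assumes the bracketed expression contains no recursive SMARTS and that ":" is used only to introduce an atom map (either :n or the :?n form used in the table of examples).

```python
import re

# Anything without ':' or brackets, then at most one trailing map, then ']'.
_ATOM_WITH_FINAL_MAP = re.compile(r"\[[^:\[\]]*(:\??\d+)?\]")


def map_is_final(atom_expr: str) -> bool:
    """True if any atom map is the last item inside the brackets."""
    return _ATOM_WITH_FINAL_MAP.fullmatch(atom_expr) is not None


for expr in ("[C:1]", "[C:?1]", "[CH3:7]", "[C;:6&:7]"):
    print(expr, map_is_final(expr))
```

Under these assumptions the check accepts the first three expressions and rejects [C;:6&:7], in which a map appears in the middle of the expression.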

Atom-mapped Reaction SMARTS
Target Reaction SMARTS and remark
[C:1]=[C:2]>>[C:1][C:2]
This matches a double-bond reduction reaction.

[C:1]=[C:2]>>[C:1][C:2]
This query correctly does not match the atom-mapped reaction target, eliminating the false hit.

This final table gives examples to clarify the nuances of atom map query semantics.

More Atom-mapped Reaction SMARTS
SMARTS Query:	Target:	Match count:	Comment:
C>>C	CC>>CC	4	No maps, normal match.
C>>C	[CH3:7][CH3:8]>>[CH3:7][CH3:8]	4	No maps in query, maps in target are ignored.
[C:1]>>[C:1]	CC>>CC	0	No maps in target, hence no matches.
[C:1]>>C	[CH3:7][CH3:8]>>[CH3:7][CH3:8]	4	Unpaired map in query ignored.
[C:?1]>>[C:?1]	CC>>CC	4	Query says mapped as shown or not present.
[C:1]>>[C:1]	[CH3:7][CH3:8]>>[CH3:7][CH3:8]	2	Matches for target 7,7 and 8,8 atom pairs.
[C:1]>>[C:2]	[CH3:7][CH3:8]>>[CH3:7][CH3:8]	4	When a query class is not found on both sides of the query, it is ignored; this query does NOT say that the atoms are in different classes.
[C:1][C:1]>>[C:1]	[CH3:7][CH3:7]>>[CH3:7][CH3:7]	4	Atom maps match with "or" logic. All atoms get bound to class 7.
[C:1][C:1]>>[C:1]	[CH3:7][CH3:8]>>[CH3:7][CH3:8]	4	The reactant atoms are bound to classes 7 and 8. Note that having the first query atom bound to class 7 does not preclude binding the second atom. Next, the product atom can bind to classes 7 or 8.
[C:1][C:1]>>[C:1]	[CH3:7][CH3:7]>>[CH3:7][CH3:8]	2	The reactants are bound to class 7. The product atom can bind to class 7 only.
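
The match counts for the single-atom queries in this table can be reproduced with a small Python sketch. It is not a SMARTS matcher: the function count_matches is hypothetical, map classes are given directly as integers with 0 standing for "no map", every target atom is assumed to be a carbon that satisfies the rest of the atomic expression, and the :?n form and the multi-atom "or" logic rows are not covered.

```python
from itertools import product
from typing import Sequence


def count_matches(q_reactant: int, q_product: int,
                  t_reactants: Sequence[int], t_products: Sequence[int]) -> int:
    """Count (reactant atom, product atom) hits for a one-atom>>one-atom query."""
    # A query class constrains the match only if it occurs on both sides.
    paired = q_reactant != 0 and q_reactant == q_product
    count = 0
    for r, p in product(t_reactants, t_products):
        # Paired query atoms need mapped target atoms in one common class.
        if paired and (r == 0 or r != p):
            continue
        count += 1
    return count


print(count_matches(0, 0, [0, 0], [0, 0]))  # C>>C on CC>>CC
print(count_matches(1, 1, [0, 0], [0, 0]))  # [C:1]>>[C:1] on CC>>CC
print(count_matches(1, 1, [7, 8], [7, 8]))  # [C:1]>>[C:1] on the 7,8 target
print(count_matches(1, 2, [7, 8], [7, 8]))  # [C:1]>>[C:2] on the 7,8 target
```

Each call counts every reactant/product atom pairing separately, which is how the counts of 4, 0, 2, and 4 in the corresponding rows arise.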

Daylight Chemical Information Systems, Inc.
jjdelany@daylight.com