The reaction toolkit provides a set of tools which support both specific and generic single-step reactions. These tools add the capability to address numerous reaction-oriented chemical information problems. These tools are integrated into the Daylight system and are used extensively within Thor and Merlin to add support for reactions to these systems.
The reaction toolkit adds support for two additional object types:
Reaction Toolkit Object Classes | |
Reaction | a single-step reaction |
Transform | a generic reaction |
The reaction object is actually implemented within the Smiles toolkit library. The transform object is implemented within the Smarts toolkit library. Note that the reaction toolkit is licensed separately, even though its functions are contained within the Smiles and Smarts libraries.
The extensive use of polymorphism for both reaction and transform objects is one of the key principles which makes the reaction toolkit convenient to use. A design criterion for the reaction object is that it behave as much like a molecule object as possible. Similarly, a design criterion for the transform object is that it behave like a pattern object.
In effect, a reaction object is a "superset" of a molecule object. A reaction can do everything a molecule can, and then some (which we'll cover in detail).
For example, a reaction contains one or more molecule objects. These are the components of the reaction (reactant, agent, and product molecules). Each of these molecule objects in turn contains atoms, bonds, and cycles. One can certainly take a stream of molecules over a reaction. This works as one would expect, returning a stream which contains every component molecule in the reaction.
dt_stream(reaction, TYP_MOLECULE) => all molecules in the reaction
One can also take streams of atoms, bonds, or cycles over a reaction, effectively ignoring the molecule layer of the reaction. In this case, the streams work exactly the same for molecules and reactions.
dt_stream(reaction, TYP_ATOM) => all atoms in the reaction
dt_stream(reaction, TYP_BOND) => all bonds in the reaction
dt_stream(reaction, TYP_CYCLE) => all cycles in the reaction
Note that for streams of atoms, bonds, or cycles over a reaction, the resulting stream contains ALL of the atoms, bonds, or cycles in every molecule in the reaction.
Generally, the strategy for reaction toolkit programming is to ignore the "molecule layer" of a reaction whenever possible. This results in toolkit code which is most flexible in that the code will correctly process both molecules and reactions.
As an example, consider the following code:
```c
#include <stdio.h>
#include <string.h>
#include "dt_smiles.h"
#include "dt_depict.h"

int main(void)
{
    dt_Handle ob, d, atoms, atom;
    char line[400];
    int count;

    /*** Get SMILES from user (fgets, unlike gets, cannot overflow line) ***/
    if (!fgets(line, sizeof line, stdin))
        return 1;
    line[strcspn(line, "\n")] = '\0';

    /*** Create object. dt_smilin returns a molecule or reaction, but we
         don't care which. The rest of the toolkit calls operate equally
         well on either. ***/
    ob = dt_smilin(strlen(line), line);
    if (ob == NULL_OB)
        return 1;   /* invalid SMILES, or reaction SMILES without a reaction license */

    /*** We could check the type of object returned if we wanted, but it
         isn't necessary (dt_type(ob) would return TYP_MOLECULE or
         TYP_REACTION) ***/
    count = 0;
    atoms = dt_stream(ob, TYP_ATOM);
    while (NULL_OB != (atom = dt_next(atoms)))
        if (dt_number(atom) == 6)
            count++;                      /*** Count carbons ***/
    dt_dealloc(atoms);
    printf("The object contains %d carbon atoms.\n", count);

    /*** Note that dt_alloc_depiction(3) can take a reaction or molecule
         object in version 4.5 ***/
    d = dt_alloc_depiction(ob);
    dt_calcxy(d);

    /*** Call drawing library to show depiction ***/
    dl_beginscreen();
    dt_depict(d);
    dl_endscreen(d);

    /*** Destroy objects. ***/
    dt_dealloc(d);
    dt_dealloc(ob);
    return 0;
}
```

Whether the user enters a reaction or molecule SMILES is completely irrelevant to the program, the way it is coded, or its execution. This example program and many others like it (cansmi, showparts, protons, hbonds, smarts_filter, addfp, etc.) only need to be recompiled under version 4.51 or later to be fully reaction-capable.
The other important factor which makes the reaction toolkit convenient is the treatment of derivative objects (paths, substruct, pathsets, depictions, conformations, fingerprints). Each of the derivative object types has been extended to handle Reaction objects directly. There is no need to use or understand the behavior of a bunch of new derivative objects specifically for reactions.
In the case of derivative objects, the molecule layer of a reaction is ignored; the derivative objects just work at the atom and bond layer. For example, the depiction object used in the example code above handles reactions just as well as molecules. One can create a depiction for either a molecule or a reaction object. The returned depiction objects behave exactly as in version 4.42 with one exception: the base object (dt_base(3)) of a depiction may now be either a reaction or molecule; in version 4.4 the base of a depiction was always a molecule. See section 20.7 for further discussion of derivative objects and reactions.
A reaction consists of a set of molecule objects, each of which has a specific role in the reaction: reactant, product, or agent. Agents are molecules which neither contribute atoms to the products nor accept atoms from the reactants. Note that this definition is not enforced by the toolkit; it is reflected in the definition of atom maps for reactions.
This section focuses on toolkit functions which are specific to reaction objects or functions which have new, unique behaviors for reaction objects. These functions are generally useful for building reactions from scratch and for manipulating reaction objects.
dt_alloc_reaction(void) => Handle reaction
dt_addcomponent(dt_Handle reaction, dt_Handle mol, dt_Integer role) => Handle mol
Practically speaking, a reaction object will have at most one each of reactant, agent, and product molecules, and these are generally processed (e.g., streams of molecules over a reaction) in reactant-agent-product order. If one adds multiple molecule objects to a reaction with the same role, these are combined within the reaction object. The way to think about this is that molecules are used as the internal representation of structural data in a reaction, yet the reaction object reserves the right to change its internal representation as necessary. Since the original molecules are unaffected, this works out well.
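In reaction SMILES the same reactant-agent-product order appears as three fields separated by '>' (as in the "CCO.CC(=O)O>OCC>CCOC(=O)CC" string used in the example later in this chapter), and a field may list several '.'-separated molecules, which the reaction object combines into one component for that role. The sketch below, with hypothetical helpers `split_reaction_smiles` and `count_molecules`, only locates those fields at the text level; it performs none of the validation that dt_smilin(3) does.

```c
#include <assert.h>
#include <stddef.h>

/* Role indices in the reactant-agent-product order. */
enum { ROLE_REACTANT = 0, ROLE_AGENT = 1, ROLE_PRODUCT = 2, N_ROLES = 3 };

/* Hypothetical helper (not part of the Daylight API): find the three
   '>'-separated role fields of a reaction SMILES. On success start[r] and
   len[r] describe field r and 1 is returned; a string without exactly two
   '>' characters is rejected with 0. */
int split_reaction_smiles(const char *smi, const char *start[N_ROLES],
                          size_t len[N_ROLES])
{
    const char *p;
    int role = ROLE_REACTANT;

    assert(smi != NULL);
    start[ROLE_REACTANT] = smi;
    for (p = smi; *p != '\0'; p++) {
        if (*p != '>')
            continue;
        if (role == ROLE_PRODUCT)
            return 0;                       /* more than two '>' */
        len[role] = (size_t)(p - start[role]);
        start[++role] = p + 1;
    }
    if (role != ROLE_PRODUCT)
        return 0;                           /* fewer than two '>' */
    len[role] = (size_t)(p - start[role]);
    return 1;
}

/* Number of '.'-separated molecules in one field; an empty field has none. */
int count_molecules(const char *field, size_t len)
{
    size_t i;
    int n = 1;

    if (len == 0)
        return 0;
    for (i = 0; i < len; i++)
        if (field[i] == '.')
            n++;
    return n;
}
```

For the esterification string above, the reactant field holds two molecules while the agent and product fields hold one each, matching the at-most-one-component-per-role picture once the two reactants are combined.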
dt_getrole(dt_Handle ob, dt_Handle reaction) => dt_Integer role
dt_smilin(dt_String smiles) => dt_Handle object
Input SMILES | Toolkit licenses available | dt_smilin(3) behavior |
Any SMILES | none | Program exits |
Molecule SMILES | smiles | returns Molecule object |
Molecule SMILES | smiles, reaction | returns Molecule object |
Reaction SMILES | smiles | returns NULL_OB, warning in error queue |
Reaction SMILES | smiles, reaction | returns Reaction object |
dt_cansmiles(dt_Handle reaction, dt_Integer iso) => String smiles
When 'iso' is TRUE, returns the absolute SMILES for the reaction. This includes all agents, isotopic and isomeric information, and atom maps.
dt_type(dt_Handle reaction) => dt_Integer TYP_REACTION
dt_typename(dt_Handle reaction) => dt_String "reaction"
dt_info(dt_Handle reaction, dt_String "smiles") => dt_String input SMILES
dt_mod_is_on(dt_Handle reaction) => dt_Boolean state
dt_mod_on(dt_Handle reaction) => dt_Boolean ok
dt_mod_off(dt_Handle reaction) => dt_Boolean ok
dt_dealloc(dt_Handle reaction) => dt_Boolean ok
The following code gives a simple example of creation and manipulation of a reaction object. In this example, a reaction is built two different ways: first, a reaction is created from scratch, and molecule objects are added to build up the reaction. Second, a reaction is built from a single reaction-SMILES. The resulting reactions have the same unique SMILES.
```c
#include <stdio.h>
#include <string.h>
#include "dt_smiles.h"

void build_reaction(void)
{
    dt_Handle reaction1, reaction2;
    dt_Handle mol1, mol2, mol3;
    dt_String smi1 = "CCO";
    dt_String smi2 = "CC(=O)O";
    dt_String smi3 = "CCOC(CC)=O";
    dt_String smi4 = "CCO.CC(=O)O>OCC>CCOC(=O)CC";
    dt_String cansmi1, cansmi2;
    dt_Integer slen1, slen2;

    /*** Make molecule objects. We'll build the reaction from its pieces ***/
    mol1 = dt_smilin(strlen(smi1), smi1);
    mol2 = dt_smilin(strlen(smi2), smi2);
    mol3 = dt_smilin(strlen(smi3), smi3);

    /*** Make an empty reaction. Set it to mod on. Add the pieces. ***/
    reaction1 = dt_alloc_reaction();
    dt_mod_on(reaction1);

    /*** Note: ethanol added twice, as reactant and agent. This is legal. ***/
    dt_addcomponent(reaction1, mol1, DX_ROLE_REACTANT);
    dt_addcomponent(reaction1, mol1, DX_ROLE_AGENT);
    dt_addcomponent(reaction1, mol2, DX_ROLE_REACTANT);
    dt_addcomponent(reaction1, mol3, DX_ROLE_PRODUCT);
    dt_mod_off(reaction1);

    /*** The molecules are no longer needed (copies are kept by the
         reaction). We can deallocate them. ***/
    dt_dealloc(mol1);
    dt_dealloc(mol2);
    dt_dealloc(mol3);

    /*** Get the unique SMILES for the reaction. ***/
    cansmi1 = dt_cansmiles(&slen1, reaction1, FALSE);
    if (cansmi1 == NULL) {
        dt_dealloc(reaction1);        /* don't leak the reaction on failure */
        return;
    }

    /*** Make a second reaction from a SMILES. ***/
    reaction2 = dt_smilin(strlen(smi4), smi4);
    cansmi2 = dt_cansmiles(&slen2, reaction2, FALSE);
    if (cansmi2 == NULL) {
        dt_dealloc(reaction1);
        dt_dealloc(reaction2);
        return;
    }

    /*** The two unique SMILES should be the same. ***/
    if ((slen1 == slen2) && (0 == strncmp(cansmi1, cansmi2, slen1)))
        fprintf(stderr, "The two SMILES are the same. Life is good.\n");
    else
        fprintf(stderr, "The two SMILES are different. Life is bad.\n");

    dt_dealloc(reaction1);
    dt_dealloc(reaction2);
    return;
}
```
Reactions are made up of molecule objects. These are normal molecules, with a new property, role, which is used to distinguish the reactant, product and agent in a reaction. Molecules within reactions have the reaction as a parent, and have a value defined for their role property, but are otherwise indistinguishable from any other molecules in the toolkit.
dt_parent(dt_Handle molecule) => dt_Handle parent
dt_dealloc(dt_Handle molecule) => dt_Boolean ok
dt_mod_on(dt_Handle molecule) => dt_Boolean ok
dt_mod_off(dt_Handle molecule) => dt_Boolean ok
This is identical to calling dt_mod_off(3) for the parent reaction. In effect, the toolkit treats a reaction and its component molecules as a single unit for structural modification; setting the state for either the reaction or one of its child molecules sets the state for all of them.
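One way to picture this shared state is a single flag stored on the reaction, with each component molecule forwarding to its parent. The sketch below uses hypothetical `Reaction`/`Molecule` structs and `mol_mod_*` functions; it models the documented behavior only and says nothing about how the toolkit actually stores the state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: the modification flag lives on the reaction. */
typedef struct Reaction {
    int mod_on;
} Reaction;

typedef struct Molecule {
    int mod_on;           /* used only when the molecule has no parent */
    Reaction *parent;     /* NULL for a free-standing molecule */
} Molecule;

/* A component molecule forwards to its parent, so the reaction and all of
   its components always read and write one shared flag. */
static int *mod_flag(Molecule *m)
{
    assert(m != NULL);
    return m->parent != NULL ? &m->parent->mod_on : &m->mod_on;
}

void mol_mod_on(Molecule *m)    { *mod_flag(m) = 1; }
void mol_mod_off(Molecule *m)   { *mod_flag(m) = 0; }
int  mol_mod_is_on(Molecule *m) { return *mod_flag(m); }
```

Turning modification on through any one component is therefore visible through every other component of the same reaction, while a free-standing molecule keeps its own flag.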
Within the SMILES language for reactions, atom maps are numeric atom labels. All atoms within a SMILES string with the same atom map label are associated in an atom map set.
Within the toolkit, atom maps are manipulable only as atom map sets. The toolkit takes care of interpreting the labels on input SMILES and labeling the output SMILES in a systematic way.
Agent atoms and atoms which are not part of a reaction may never be put in an atom map class. Only reactant and product atoms from the same reaction may appear in a given atom map class.
There are no requirements for completeness or uniqueness of the atom mappings over a reaction. Atom mappings are independent of the connectivity and properties of the underlying molecules. The rules for atom maps are as follows:
dt_setmap(dt_Handle atom1, dt_Handle atom2) => dt_Boolean ok
If either 'atom1' or 'atom2' already belongs to a map class, the result of this operation is to merge the sets of atoms into a single map class which contains 'atom1', 'atom2', and any atoms which were previously mapped to 'atom1' or 'atom2'. For example, the following four calls, applied in any order, result in a single map class which contains atoms r1, r2, r3, p1, and p2.
dt_setmap(r1, p1);
dt_setmap(r2, p1);
dt_setmap(r3, p1);
dt_setmap(r1, p2);
If 'atom2' is NULL_OB, 'atom1' is unmapped from its current map set. That is, 'atom1' will no longer be mapped to any other atoms in the reaction. The atom map set from which 'atom1' is removed remains intact unless the atom map set becomes invalid. A map class becomes invalid if it no longer contains at least one reactant and one product atom. If the atom map set becomes invalid, all of the remaining atoms are unmapped from one-another.
dt_getmap(dt_Handle atom) => dt_Handle substruct
dt_mapped(dt_Handle atom1, dt_Handle atom2) => dt_Boolean mapped
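The merging and dissolving rules for dt_setmap(3) can be modeled with a per-atom class label. The sketch below (hypothetical `AtomMaps`, `set_map`, `unmap`, and `mapped`, loosely mirroring dt_setmap(3) and dt_mapped(3)) is only a model of the documented rules, not the toolkit's data structure. One behavior is this sketch's own assumption: `set_map` refuses a merge that would leave the class without both a reactant and a product atom, since such a class would be invalid.

```c
#include <assert.h>

#define MAX_ATOMS 64

enum Role { REACTANT, AGENT, PRODUCT };

/* Hypothetical model: cls[i] == 0 means atom i is unmapped; atoms that
   share a nonzero label form one atom map class. */
typedef struct {
    int n;
    enum Role role[MAX_ATOMS];
    int cls[MAX_ATOMS];
    int next_label;
} AtomMaps;

static int in_class(const AtomMaps *m, int i, int label)
{
    return label != 0 && m->cls[i] == label;
}

/* Model of dt_setmap(a, b): a, b, and everything already mapped to either
   end up in one class. Returns 0 and changes nothing for agent atoms, or
   (assumption) when the merged class would lack a reactant or a product. */
int set_map(AtomMaps *m, int a, int b)
{
    int i, la, lb, label, has_r = 0, has_p = 0;

    assert(a >= 0 && a < m->n && b >= 0 && b < m->n);
    if (m->role[a] == AGENT || m->role[b] == AGENT)
        return 0;
    la = m->cls[a];
    lb = m->cls[b];
    for (i = 0; i < m->n; i++) {
        if (i != a && i != b && !in_class(m, i, la) && !in_class(m, i, lb))
            continue;
        has_r |= (m->role[i] == REACTANT);
        has_p |= (m->role[i] == PRODUCT);
    }
    if (!has_r || !has_p)
        return 0;
    label = la ? la : (lb ? lb : ++m->next_label);
    for (i = 0; i < m->n; i++)            /* merge into one label */
        if (i == a || i == b || in_class(m, i, la) || in_class(m, i, lb))
            m->cls[i] = label;
    return 1;
}

/* Model of dt_setmap(a, NULL_OB): remove a from its class, then dissolve
   the class if it no longer has at least one reactant and one product. */
void unmap(AtomMaps *m, int a)
{
    int i, label, has_r = 0, has_p = 0;

    assert(a >= 0 && a < m->n);
    label = m->cls[a];
    if (label == 0)
        return;
    m->cls[a] = 0;
    for (i = 0; i < m->n; i++)
        if (m->cls[i] == label) {
            has_r |= (m->role[i] == REACTANT);
            has_p |= (m->role[i] == PRODUCT);
        }
    if (!has_r || !has_p)
        for (i = 0; i < m->n; i++)
            if (m->cls[i] == label)
                m->cls[i] = 0;
}

/* Model of dt_mapped(a, b): are a and b in the same map class? */
int mapped(const AtomMaps *m, int a, int b)
{
    return m->cls[a] != 0 && m->cls[a] == m->cls[b];
}
```

Replaying the four dt_setmap calls from the text with r1, r2, r3 as atoms 0 to 2 and p1, p2 as atoms 3 and 4 leaves all five atoms in one class; unmapping p2 and then p1 leaves only reactants, so the class dissolves.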
Hydrogens in reactions are handled as with molecules (suppressed unless the hydrogen is special). With reactions, there is an additional case which makes a hydrogen special. It is often desirable (e.g., for a 1,5-hydride shift) to store information about the location of hydrogens as part of the atom map of a reaction. Hydrogens with a supplied atom map are considered "special", and these hydrogens are not suppressed in the toolkit. These mapped hydrogens appear explicitly in isomeric SMILES for reactions; in non-isomeric canonical SMILES, atom-mapped hydrogens do not appear.
Note that the special hydrogen dt_isohydro(3) cannot be part of any atom map class. Hence, this special hydrogen can never be used in place of an atom-mapped hydrogen in a reaction. Any atom-mapped hydrogens must be stored as explicit hydrogens.
A reaction query is expressed with the SMARTS language. SMARTS has been extended with reaction and atom map query syntax. There is no separate pattern object for a reaction query. When a SMARTS is interpreted, a pattern object is returned. In effect, the pattern object takes on the additional expressive capabilities for reactions.
dt_smartin(dt_String SMARTS) => dt_Handle pattern
dt_smarts_opt(dt_String SMARTS, dt_Integer vmatch) => dt_String SMARTS
The flexibility and utility of the Daylight toolkit arise partly from the ability to create derivative objects based on molecules. These objects include paths, substructs, pathsets, depictions, conformations, and fingerprints. Each of these objects has a specific, unique purpose within the toolkit, but they all share some common features which are important for reaction processing:
These features allowed us to directly extend these objects to handle reactions. As discussed in Section 20.3, the "molecule layer" of a reaction is ignored; only the atoms and bonds of a reaction are considered.
Hence, each of these objects is now defined as having either a molecule or a reaction as its "base" object. Otherwise, their behaviors are essentially unchanged. They still store data about the atoms and bonds in their base object, and they still ignore other non-relevant attributes of their base object (like the molecules).
Briefly, we address each of the main derivative types in the next sections and highlight their behaviors with regard to reactions.
Paths and substructs are collections of atoms and bonds which all come from the same base object. With reactions, this behavior remains unchanged. The atoms and bonds within a path or substruct must come from the same reaction, but they may be from different molecules within the reaction. For example, the following code creates a path from a reaction object, adds all of the double bonds from the reaction to the path, and returns the path.
```c
#include "dt_smiles.h"

dt_Handle get_db(dt_Handle ob)
{
    dt_Handle bonds, bond, path;

    /*** Inappropriate type ***/
    if ((dt_type(ob) != TYP_MOLECULE) && (dt_type(ob) != TYP_REACTION))
        return (NULL_OB);

    /*** Make a path. The base of the path will be "ob" ***/
    path = dt_alloc_path(ob);

    /*** If a reaction, ignore the molecule layer. Only deal with the
         bonds. If a molecule, this happens by default. ***/
    bonds = dt_stream(ob, TYP_BOND);
    while (NULL_OB != (bond = dt_next(bonds)))
        if (dt_bondorder(bond) == DX_BTY_DOUBLE)
            dt_add(path, bond);

    /*** Clean up and return ***/
    dt_dealloc(bonds);
    return (path);
}
```
Note that absolutely no consideration is given to the fact that the bonds may be in different molecules within the reaction. As long as the atoms and bonds added to a path or substruct are all part of the correct base object (the object given in dt_alloc_path(3)) this succeeds.
A pathset is a collection of paths over the same base object. The base object may be a reaction. A pathset is returned from the SMARTS matching functions.
In this case, the pathset returned depends on the type of target used for the match function:
dt_match(dt_Handle pattern, dt_Handle target, dt_Integer limit) => dt_Handle pathset
dt_umatch(dt_Handle pattern, dt_Handle target, dt_Integer limit) => dt_Handle pathset
The semantics for pattern matching are as follows:
Pattern | Target | Result |
Molecule query | Molecule object | Molecule substructure matches |
Molecule query | Reaction object | All substructure matches over entire reaction |
Reaction query | Molecule object | No hits |
Reaction query | Reaction object | Reaction substructure matches |
dt_vmatch(dt_Handle pattern, dt_Handle target, dt_Integer limit) => dt_Handle pathset
There is one important exception for vector-matching: It is only legal to use a molecule pattern for dt_vmatch(3). One may match the molecule pattern against either a reaction or molecule target, but it is not possible to use a reaction pattern for vector matching on any target (reaction or molecule).
The main distinction between a reaction depiction and a molecule depiction is the presence of a reaction arrow, and the potential desire to lay out the various reaction parts (reactant, agent, product) in different regions. These two features are handled by dt_depict(3) and dt_calcxy(3); all other depiction-related functions remain unchanged.
dt_calcxy(dt_Handle depiction) => dt_Boolean ok
If atom map classes are available for the atoms in the depiction, the toolkit attempts to orient the reactant and product sides of the depiction the same way. Before laying out the parts of the reaction, it reorients the product part of the depiction to minimize the RMS distance between mapped atom pairs. This orientation is based first on mapped ring atoms within the depiction; if no mapped ring atoms are found, non-ring atoms are used.
dt_depict(dt_Handle depiction) => dt_Boolean ok
The arrow is positioned as follows: a horizontal vector is laid out between the midpoints of the reactant and product parts of the depiction. The vector is clipped so that it doesn't overlap any parts of the reaction. Finally, the clipped vector is drawn with an arrowhead. If it is not possible to clip the vector so that it doesn't overlap any part of the reaction, the toolkit instead draws a short arrow between the midpoints of the reactants and products, ignoring any overlap.
The conformation object allows the storage of (x, y, z) coordinate data for the atoms in a molecule or reaction. A conformation object makes no distinction between the roles of atoms in the reaction object. With the exception of allowing a conformation to be created from a reaction, all conformation-oriented functions remain unchanged.
The fingerprint object does behave differently for a reaction object than for a molecule object. The differences are seen only when creating a fingerprint object; all other fingerprint toolkit functions remain unchanged. In addition, there is a new fingerprint-creation function, dt_fp_differencefp(3), which is designed primarily for reaction processing.
dt_fp_generatefp(dt_Handle object, dt_Integer minstep, dt_Integer maxstep, dt_Integer size) => dt_Handle fingerprint
For reactions, the fingerprints tend to be quite dense, and are somewhat less efficient as structural screens than they are for molecules. The main advantage of this scheme is the full compatibility of these reaction fingerprints with molecule fingerprints in the Daylight system. Note also that this fingerprint scheme doesn't provide the most appropriate measure of similarity for reactions.
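Why density hurts screening can be seen from the usual fingerprint screen for substructure search: a target can only contain the query if every bit set in the query's fingerprint is also set in the target's. In the sketch below (plain 64-bit words, hypothetical helper names `fp_density` and `fp_screen_passes`, not toolkit functions), a fully dense target passes every screen, so it can never be rejected early, while a sparse one can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fraction of bits set in a fingerprint stored as nwords 64-bit words. */
double fp_density(const uint64_t *fp, size_t nwords)
{
    size_t i, on = 0;

    assert(nwords > 0);
    for (i = 0; i < nwords; i++) {
        uint64_t w = fp[i];
        while (w) {
            w &= w - 1;          /* clear the lowest set bit */
            on++;
        }
    }
    return (double)on / (double)(nwords * 64);
}

/* Substructure screen: passes only if every query bit is set in the target.
   The denser the target, the more queries pass, so fewer non-matches are
   eliminated before the expensive atom-by-atom match. */
int fp_screen_passes(const uint64_t *query, const uint64_t *target, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        if (query[i] & ~target[i])
            return 0;
    return 1;
}
```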
dt_fp_differencefp(dt_Handle object, dt_Integer minstep, dt_Integer maxstep, dt_Integer size) => dt_Handle fingerprint
For a molecule or molecule-derived object, returns the normal fingerprint, (identical to dt_fp_generatefp(3)).
For a reaction or reaction-derived object, returns the difference in fingerprint between the reactant and product parts of the object as follows:
There is one important caveat for difference fingerprints: to work optimally, the reaction must have unit stoichiometry. If not, missing atoms on either side of the reaction will result in extraneous bits being set in the difference fingerprint.
Transforms are very similar in behavior to patterns. Essentially, the transform language (SMIRKS) is a subset of SMARTS, with some additional specific requirements. These requirements are validated when the transform is read. This also means that any valid SMIRKS is a valid SMARTS and hence can be optimized by dt_smarts_opt(3). A more extensive discussion of the relationship of SMILES, SMARTS, and SMIRKS can be found in the Daylight Theory Manual.
dt_smirkin(dt_String SMIRKS) => dt_Handle transform
dt_smarts_opt(dt_String SMIRKS, dt_Integer vmatch) => dt_String SMIRKS
dt_type(dt_Handle transform) => dt_Integer TYP_TRANSFORM
dt_typename(dt_Handle transform) => dt_String "transform"
dt_info(dt_Handle transform, dt_String "smirks") => dt_String input SMIRKS
dt_match(dt_Handle transform, dt_Handle target, dt_Integer limit) => dt_Handle pathset
dt_pattern(dt_Handle transform, dt_Integer role) => dt_Handle pattern
dt_transform(dt_Handle transform, dt_Handle som, dt_Integer direction, dt_Integer limit) => dt_Handle sequence of reactions
dt_utransform(dt_Handle transform, dt_Handle som, dt_Integer direction, dt_Integer limit) => dt_Handle sequence of reactions
The "direction" may be one of DX_FORWARD or DX_REVERSE. When direction is DX_FORWARD, the given molecules are treated as reactants and the transform is applied in the forward direction to the molecules. When "direction" is DX_REVERSE, the given molecules are treated as products and the transform is applied in the reverse direction.
The application of a transform logically occurs in two steps. In the forward direction, the reactant side of the transform is matched, as SMARTS, against the set of molecules given. Each place where the SMARTS matches is marked. In the second step, the atom and bond changes in the transform are applied to the matched molecules.
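The second step can be pictured as editing a copy of the target at the matched atoms. The sketch below is hypothetical throughout (`Mol`, `BondChange`, `apply_changes`) and covers only bond-order changes; a real transform can also change atoms, charges, and hydrogen counts, none of which is modeled here. `match[t]` stands for the target atom that the first (matching) step assigned to transform atom `t`.

```c
#include <assert.h>

#define MAXA 8

/* Hypothetical target molecule: a symmetric bond-order matrix (0 = no bond). */
typedef struct {
    int n;
    int order[MAXA][MAXA];
} Mol;

/* One bond change from a transform, in terms of transform atoms. */
typedef struct {
    int a, b, new_order;
} BondChange;

/* Step 2: copy the target, then apply each bond change at the target atoms
   the match assigned to the transform atoms. The input molecule itself is
   left untouched. */
Mol apply_changes(const Mol *target, const int *match,
                  const BondChange *chg, int nchg)
{
    Mol out = *target;
    int i;

    for (i = 0; i < nchg; i++) {
        int x = match[chg[i].a], y = match[chg[i].b];

        assert(x >= 0 && x < out.n && y >= 0 && y < out.n);
        out.order[x][y] = out.order[y][x] = chg[i].new_order;
    }
    return out;
}
```

For propene modeled as atoms 0=1-2, a change that turns the double bond between transform atoms 0 and 1 into a single bond, matched onto target atoms 0 and 1, yields a copy with a 0-1 single bond while the original still has its double bond.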
The only difference between dt_transform(3) and dt_utransform(3) is the function which is used to match the SMARTS expression (dt_match(3) and dt_umatch(3) respectively). The net result is that with dt_utransform(3), the resulting answers are generated from the unique set of matches, while with dt_transform(3), the complete set of answers results.
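To see why the unique set can be smaller, assume (as is usual for unique SMARTS matching, though this sketch does not claim to reproduce dt_umatch(3) exactly) that two matches are duplicates when they cover the same set of target atoms, whatever order the pattern atoms hit them in. The hypothetical `count_unique_matches` below counts matches under that rule; each match is a row of `k` target-atom indices.

```c
#include <assert.h>
#include <string.h>

#define MAX_MATCH 16

/* Copy k atom indices and sort the copy ascending (insertion sort). */
static void sorted_copy(const int *in, int *out, int k)
{
    int i, j, v;

    memcpy(out, in, (size_t)k * sizeof *in);
    for (i = 1; i < k; i++) {
        v = out[i];
        for (j = i - 1; j >= 0 && out[j] > v; j--)
            out[j + 1] = out[j];
        out[j + 1] = v;
    }
}

/* matches is an nmatch x k array stored row by row. A match is counted
   only if no earlier match covers exactly the same set of atoms. */
int count_unique_matches(const int *matches, int nmatch, int k)
{
    int a[MAX_MATCH], b[MAX_MATCH];
    int i, j, unique = 0;

    assert(k > 0 && k <= MAX_MATCH);
    for (i = 0; i < nmatch; i++) {
        int seen = 0;

        sorted_copy(matches + i * k, a, k);
        for (j = 0; j < i && !seen; j++) {
            sorted_copy(matches + j * k, b, k);
            seen = memcmp(a, b, (size_t)k * sizeof a[0]) == 0;
        }
        if (!seen)
            unique++;
    }
    return unique;
}
```

A symmetric two-atom pattern matched along a three-atom chain produces four ordered matches, (0,1), (1,0), (1,2), and (2,1), but only two unique ones; a transform applied through the unique matches would correspondingly yield half as many answers.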
A transform (at least in one direction) can be thought of as a SMARTS expression plus a set of atom and bond changes.
The resulting sequence of reaction objects is owned by the user. Both the sequence and the reactions must be deallocated by the calling program when done with them. The given molecules or sequence of molecules are not modified by the function.