General
Version 4.5 of Thor supports existing datatype normalizations for
reaction data and introduces two new reaction-specific normalizations.
- ASMILES
"Absolute (isomeric) SMILES" --
ASMILES normalization preserves all reaction information,
including:
reactant/agent/product roles,
atom maps,
isomeric and isotopic information.
- USMILES
"Unique (generic) SMILES" --
USMILES normalization preserves reactant/product roles,
removes the agent and discards atom maps.
- MAKEGRAPH
"Graph (oxidation-state-suppressed SMILES)" --
Reaction roles are discarded when making GRAPHs, leaving a normal
unique SMILES of the oxidation-state-suppressed, dot-disconnected
reactants and products.
Aside from its normal use in tautomer lookup, this is useful
for indexing reactions in the reverse direction.
- MAKERXNMOL Rtype Atype Ptype
MAKERXNMOL takes three tags (reactant, agent, product) and, when
applied to reaction data, autogenerates a dataitem for each reaction
component according to its role (e.g., $RMOL, $AMOL, and $PMOL).
- ATOM_NTUPLE, BOND_NTUPLE, PART_NTUPLE
Reaction tuples are processed as if the > reaction
delimiter were a dot, i.e., tuples apply over the entire reaction.
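The role-based splitting described for MAKERXNMOL can be pictured with a
minimal Python sketch. This is an illustration only, not Thor's
implementation: the function name split_reaction is hypothetical, and the
sketch assumes a plain reaction SMILES with exactly two > delimiters. It
performs no canonicalization.

```python
def split_reaction(smiles):
    """Split 'reactants>agents>products' into three lists of components.

    Hypothetical helper for illustration; it only splits strings and does
    not normalize the SMILES in any way.
    """
    parts = smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' delimiters")
    # Components within one role are dot-separated; drop empty pieces
    # so that an empty agent section yields an empty list.
    return tuple([c for c in part.split(".") if c] for part in parts)


reactants, agents, products = split_reaction("O=C=O.[OH2]>>[OH]C(=O)[OH]")
print(reactants)  # ['O=C=O', '[OH2]']
print(agents)     # []
print(products)   # ['[OH]C(=O)[OH]']
```

Each list corresponds to one role, which is the grouping by which
dataitems such as $RMOL, $AMOL, and $PMOL would be generated.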
Examples
The following simplified datatrees containing carbonate equilibria
reactions are offered as an example.
Assume that the $SMI datatype is defined with USMILES, MAKEGRAPH and
MAKERXNMOL normalizations, that $RMOL, $PMOL and REM are defined in the
usual way (simply), that PK has two NUMERIC fields (pK and Temperature),
and that PC has a single PART_NTUPLE field (pC).
The following datatrees might be loaded into a Thor database.
$SMI<"O=C=O.[OH2]>>[OH]C(=O)[OH]">
REM<"Dissolution of carbon dioxide in water.">
PK<1.47;25.0>
PC<6.99,-1.74,4.40>
|
$SMI<"[OH]C(=O)[OH]>>[H+].[O-]C(=O)[OH]">
REM<"Dissociation of carbonic acid to bicarbonate.">
PK<6.35;25.0>
|
$SMI<"[O-]C(=O)[OH]>>[H+].[O-]C(=O)[O-]">
REM<"Dissociation of bicarbonate to carbonate.">
PK<10.33;25.0>
|
Note that the SMILES data are quoted in the above datatrees (as they must
be because of the `>' characters). The REM(ark) data are also
quoted (though they don't need to be in these cases). The three numbers
6.99, -1.74, and 4.40 in the PC dataitem form a component-tuple,
corresponding to the reaction components in order: carbon dioxide, water,
and carbonic acid, respectively.
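The correspondence between tuple values and reaction components can be
sketched as follows. This is a hedged illustration, not Thor code: the
function name label_tuple is hypothetical, and the > delimiter is simply
treated like a dot so that components are counted across the whole
reaction.

```python
def label_tuple(reaction_smiles, values):
    """Pair each component of a reaction SMILES with its tuple value.

    Hypothetical helper: '>' is treated like '.', so the tuple applies
    over reactants, agents, and products taken together, in order.
    """
    components = [c for c in reaction_smiles.replace(">", ".").split(".") if c]
    if len(components) != len(values):
        raise ValueError("need exactly one value per reaction component")
    return list(zip(components, values))


pairs = label_tuple("O=C=O.[OH2]>>[OH]C(=O)[OH]", [6.99, -1.74, 4.40])
for component, pc in pairs:
    print(component, pc)
```

Here 6.99 is paired with carbon dioxide, -1.74 with water, and 4.40 with
carbonic acid.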
Loading the above trees into a Thor database and then thorlist-ing them
would produce the following datatrees containing normalized data:
$SMI<"O.O=C=O>>OC(=O)O">
REM<Dissolution of carbon dioxide in water.>
PK<1.47;25.0>
PC<-1.74,6.99,4.40>
$GRF<O.OCO.OC(O)O>
$RMOL<O>
$RMOL<O=C=O>
$PMOL<OC(=O)O>
|
$SMI<"OC(=O)O>>[H+].OC(=O)[O-]">
REM<Dissociation of carbonic acid to bicarbonate.>
PK<6.35;25.0>
$GRF<OC(O)O.OC(O)O>
$RMOL<OC(=O)O>
$PMOL<[H+]>
$PMOL<OC(=O)[O-]>
|
$SMI<"OC(=O)[O-]>>[H+].[O-]C(=O)[O-]">
REM<Dissociation of bicarbonate to carbonate.>
PK<10.33;25.0>
$GRF<OC(O)O.OC(O)O>
$RMOL<OC(=O)[O-]>
$PMOL<[H+]>
$PMOL<[O-]C(=O)[O-]>
|
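Comparing the PC dataitem of the first datatree before and after loading
shows the tuple values moving with their components. A minimal Python
sketch of that bookkeeping follows. It is an illustration under stated
assumptions: the function name reorder_tuple is hypothetical, and the
mapping from input SMILES to normalized SMILES is copied by hand from the
datatrees above rather than computed.

```python
def reorder_tuple(input_components, values, normalized_components, canonical):
    """Rearrange tuple values to follow the normalized component order.

    'canonical' maps each input SMILES to its normalized form. In this
    sketch the mapping is supplied by hand; it is not computed.
    Assumes no two components share the same normalized form.
    """
    by_component = {canonical[c]: v for c, v in zip(input_components, values)}
    return [by_component[c] for c in normalized_components]


# Mapping copied from the example datatrees (an assumption of this sketch).
canonical = {"O=C=O": "O=C=O", "[OH2]": "O", "[OH]C(=O)[OH]": "OC(=O)O"}

reordered = reorder_tuple(
    ["O=C=O", "[OH2]", "[OH]C(=O)[OH]"],  # component order as loaded
    [6.99, -1.74, 4.40],                  # PC values as loaded
    ["O", "O=C=O", "OC(=O)O"],            # component order after normalization
    canonical,
)
print(reordered)  # [-1.74, 6.99, 4.4]
```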
Several normalization features are illustrated by the normalized
datatrees above.
- The SMILES data are normalized ("uniquified"), with each component
maintaining its role in the reaction.
- The SMILES data are quoted (due to the > characters).
- Data other than SMILES are not quoted (no reserved characters).
- The PC component-tuple has been rearranged to correspond in order
with the canonical SMILES.
- Each component of each reaction appears as an $RMOL or $PMOL
identifier, allowing the reactions to be looked up by any component.
- A Graph ($GRF) has been generated for each reaction.
Note that within the $GRF, the representations of carbonic acid at
different oxidation states are identical. Also note that the $GRFs of
the two dissociation reactions, which lead to different oxidation
states, are identical to each other.
Finally, note that the $GRF of the dissolution reaction is not the
same as the other two, but would be identical to that of the reverse
reaction (allowing the reverse reaction to be found efficiently).
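Why discarding roles makes a reaction and its reverse indistinguishable
can be shown with a small Python sketch. It is not the MAKEGRAPH
algorithm: the function name role_free_key is hypothetical, the inputs
are assumed to be already reduced to the graph forms seen in the $GRF
values above, and a sorted tuple stands in for Thor's canonical component
ordering.

```python
def role_free_key(graph_reaction):
    """Return an order-independent key with reaction roles discarded.

    Hypothetical helper: '>' is treated like '.', and the components are
    sorted as plain strings. This is a stand-in for canonical ordering,
    not a reproduction of it.
    """
    components = [c for c in graph_reaction.replace(">", ".").split(".") if c]
    return tuple(sorted(components))


forward = role_free_key("O.OCO>>OC(O)O")       # dissolution
reverse = role_free_key("OC(O)O>>O.OCO")       # the reverse reaction
dissociation = role_free_key("OC(O)O>>OC(O)O")

print(forward == reverse)       # True
print(forward == dissociation)  # False
```

Because the key ignores which side of the reaction a component came from,
a lookup by this key finds a reaction regardless of the direction in
which it was stored.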
Daylight Chemical Information Systems, Inc.
info@daylight.com