2. Molecules and Reactions in a Computer

The foundation of a chemical information system is its ability to represent molecules in a computer and to communicate a molecule's structure from one place to another. At first glance this can seem like a simple problem, so easy solutions are often proposed and implemented. But a close examination reveals several subtle traps that await the unwary, and methods of avoiding them must be considered before an effective computer representation of a molecule can be designed.
2.1 Representing Molecules

To represent a molecule in a computer, we must first choose a particular physical model. Many models have served chemists, ranging from the Bohr model through the most modern quantum theory; all have had adherents, detractors, uses, and flaws. When using such models, we must always avoid the trap of arguing that a particular model is right rather than arguing that it is useful. Models are just that - models.

Daylight's system represents molecules using a fairly standard valence model. For example, the Daylight system understands the normal valences of organic compounds, and by counting the bonding electrons in a molecule, it can fill in unspecified hydrogens, detect aromatic and anti-aromatic ring systems, and issue warnings when unlikely or impossible molecules are specified.

The Daylight system represents a molecule as a graph in which the nodes are atoms and the edges are bonds. Each atom has several properties, including its atomic number, atomic weight, charge, and the number of attached hydrogens. If the atom is a chiral center, it can also have chiral specifications. Bond properties are simpler: a bond is single, double, triple, or aromatic. The concept of aromaticity in the Daylight system is not a chemical one, but rather a set of rules designed for a chemical nomenclature system (this is discussed more in the SMILES chapter).

There is some flexibility in this valence model. Molecules can be represented as a hydrogen-suppressed graph (hydrogen atoms are represented as a property of "heavy" atoms) or as a hydrogen-complete graph (hydrogens are represented the same way as other atoms). Bonds in cyclic structures can be represented as aromatic or as the alternating single/double bond Kekulé form. Isomeric and isotopic information, such as chirality and atomic mass, can be unspecified, partially specified, or completely specified.
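The graph model described above can be illustrated with a minimal sketch. This is not the Daylight toolkit's data structure or API; the names Atom, Molecule, NORMAL_VALENCE, and implicit_hydrogens are hypothetical, the valence table covers only a few neutral elements, and charge and aromatic bonds are deliberately ignored when filling in hydrogens.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed "normal" valences for a few neutral organic elements (illustrative only).
NORMAL_VALENCE = {6: 4, 7: 3, 8: 2, 9: 1, 17: 1}


@dataclass
class Atom:
    atomic_number: int
    charge: int = 0
    mass: Optional[int] = None      # isotopic mass; None means unspecified
    hcount: Optional[int] = None    # attached hydrogens; None means "fill in"


@dataclass
class Molecule:
    """Hydrogen-suppressed graph: atoms are nodes, bonds are edges."""
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)   # (i, j) with i < j -> bond order

    def add_atom(self, atomic_number, **props):
        self.atoms.append(Atom(atomic_number, **props))
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        self.bonds[(min(i, j), max(i, j))] = order

    def bond_order_sum(self, i):
        return sum(order for (a, b), order in self.bonds.items() if i in (a, b))

    def implicit_hydrogens(self, i):
        """Fill in unspecified hydrogens from the normal valence (charge ignored)."""
        atom = self.atoms[i]
        if atom.hcount is not None:
            return atom.hcount
        return max(NORMAL_VALENCE[atom.atomic_number] - self.bond_order_sum(i), 0)


# Ethanol as a hydrogen-suppressed graph: C-C-O
ethanol = Molecule()
c1 = ethanol.add_atom(6)
c2 = ethanol.add_atom(6)
o = ethanol.add_atom(8)
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
hydrogens = [ethanol.implicit_hydrogens(i) for i in (c1, c2, o)]
```

For the ethanol graph built here, the hydrogen counts come out as 3, 2, and 1. A hydrogen-complete graph would instead add each hydrogen as its own Atom with a single bond to its neighbor.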
2.2 Analyzing Molecules

There are two classes of properties associated with a molecule and its constituent parts (atoms and bonds): explicit properties and derived properties. Explicit properties are those needed to completely specify the graph of a molecule: its atoms and bonds and their properties. Derived properties are properties that can be computed from the graph. The following sections describe the derived properties that are of interest in the Daylight system.

2.2.1 Cycles

When a molecule's atoms and bonds are specified, there may be cycles in the graph. A bond is said to be a ring bond if it can be removed without breaking the graph (molecule) into two pieces. (In this case, the graph is said to be biconnected.) Atoms that are connected via ring bonds are ring atoms.

There are two parts to ring-detection in a graph:
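The ring-bond definition given earlier in this section (a bond that can be removed without splitting the molecule) can be checked directly, as in the sketch below. This is a brute-force illustration, not the Daylight ring-detection algorithm; the function names ring_bonds, ring_atoms, and still_connected are hypothetical, and atoms are assumed to be numbered from 0.

```python
def still_connected(start, goal, n_atoms, bonds):
    """Depth-first search: is there a path from start to goal using these bonds?"""
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return False


def ring_bonds(n_atoms, bonds):
    """A bond is a ring bond if its two atoms stay connected after it is removed."""
    result = set()
    for bond in bonds:
        remaining = [other for other in bonds if other != bond]
        if still_connected(bond[0], bond[1], n_atoms, remaining):
            result.add(bond)
    return result


def ring_atoms(n_atoms, bonds):
    """Ring atoms are the atoms that take part in at least one ring bond."""
    return {atom for bond in ring_bonds(n_atoms, bonds) for atom in bond}


# Methylcyclopropane: atoms 0, 1, 2 form the ring; atom 3 is the methyl carbon.
bonds = [(0, 1), (1, 2), (0, 2), (0, 3)]
found_bonds = ring_bonds(4, bonds)
found_atoms = ring_atoms(4, bonds)
```

Removing each bond in turn costs one graph search per bond, which is fine for illustration; a linear-time bridge-finding algorithm would be the usual choice for large structures.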
2.2.2 Bond Type, Bond Order, and Aromaticity

Bonds are, in a sense, both an explicit and a derived property. Although you generally specify the bond type of each bond, the Daylight system will sometimes rearrange them, such as in Kekulé ring systems.

The Daylight system defines bond type and bond order as follows:
Note that the definition of aromaticity is not intended to imply anything about the reactivity, magnetic resonance spectra, heat of formation, or odor of substances. Rather, the definition is designed to be useful in a chemical nomenclature system (SMILES) that is discussed in detail in the subsequent chapter.

2.2.3 Symmetry

A molecule's symmetry is useful for many purposes, including generating canonical labelings (such as when generating a unique SMILES), classifying chirality, detecting degenerate chiral specifications, and eliminating redundant calculations. The Daylight system automatically detects symmetry in the molecules it represents as those with two-dimensional rotations.

2.2.4 Canonical Labeling

A computer representation of a molecule is often built in an arbitrary fashion; one can start with any atom in the molecule and add atoms and bonds in any order. If a "label" (typically a number) is assigned to each atom and bond as it is specified, the labeling is also arbitrary - a different input order of the same molecule results in a different set of labels.

Chemical nomenclature systems such as SMILES require a canonical labeling of the atoms and bonds - a numbering that is independent of the history of the molecule's representation. The Daylight system generates such a labeling whenever it generates a unique SMILES.

2.2.5 Chirality

Chirality, like bonds, is both an explicit property and a derived property. That is, the Daylight system accepts various chiral specifications on input, though it will sometimes change the specification to a different (but equivalent) form. Like the canonical labeling of atoms discussed above, a "canonical chiral representation" must be chosen if a unique SMILES is desired.
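To make the idea of canonical labeling from section 2.2.4 concrete, here is a deliberately naive sketch. It tries every possible numbering of the atoms and keeps the one whose description sorts first, so the result does not depend on input order. This is not the algorithm the Daylight system uses; it is exponential in the number of atoms and is only practical for tiny graphs. The function name canonical_form and the tuple-based description are assumptions made for the example.

```python
from itertools import permutations


def canonical_form(elements, bonds):
    """Return an input-order-independent description of a small molecule.

    elements: element symbols indexed by (arbitrary) input label
    bonds:    (i, j, order) triples using the same input labels
    """
    n = len(elements)
    best = None
    for perm in permutations(range(n)):      # perm[old_label] = new_label
        atoms = [None] * n
        for old, new in enumerate(perm):
            atoms[new] = elements[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(atoms), tuple(edges))
        if best is None or candidate < best:
            best = candidate                 # keep the smallest description
    return best


# Ethanol entered in two different atom orders, and dimethyl ether for contrast.
ethanol_a = canonical_form(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_b = canonical_form(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
ether = canonical_form(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])
same_molecule = ethanol_a == ethanol_b
different_molecule = ethanol_a != ether
```

Both ethanol inputs reduce to the same description, while dimethyl ether, which has the same atoms but different connectivity, does not. Practical systems avoid the exhaustive search by first ranking atoms with graph invariants and breaking ties only where symmetry leaves a real choice.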
2.3 Representing Reactions

A reaction consists of a set of molecules, each of which plays a specific role in the reaction: reactant, product, or agent. Since reactions are made up of molecules, reactions naturally use the same valence model, bonding, aromaticity, and symmetry rules as molecules. At minimum, a reaction must contain valid molecules based on these rules.

In an ideal world (at least from an information-processing point of view), all reactions would be represented stoichiometrically (every relevant atom shown), and enough information would be present to tell unambiguously which atom was which between the reactants and products. This information would be provided by a pairwise mapping of the reactant and product atoms. In effect, the only difference between the reactant molecule(s) and product molecule(s) would be the bond changes and atom property changes (chirality, charge, aromaticity) that occur during the reaction. If these criteria are met, one can "superimpose" the reactants and products on one another and represent the reaction as a reaction graph. This is both a complete and compact description of a reaction.

Unfortunately, these stringent requirements can rarely be met for reactions available in electronic form. The Daylight system is designed to represent and store both completely specified (reaction graph-like) reactions and information-deficient reactions in a repeatable and searchable fashion. Although all of the molecules within a reaction must be chemically valid, an overall analysis of the reaction for chemical sensibility is not carried out.

The Daylight system is oriented towards single-step reactions with the following three roles for molecules defined:
Note that the above distinctions between reactant, agent, and product all involve the participation of atoms in the reaction. This participation is recorded via the reaction atom map. The atom map simply maps the correspondence of the reactant and product atoms in the reaction. Agents never have meaningful atom maps, since by definition agent atoms do not participate directly in a reaction.

Clearly, reactions have additional data that one wants to store about them. The Daylight approach is to encode only the pure structural information in the lexical representation of the reaction and to handle the additional data outside of the reaction. A standardized THOR database can allow coupling of the following data about the individual components of a reaction to those components:
2.4 Depictions

One of the most important jobs of a chemical information system is to communicate information to the chemist effectively. One of the best ways to do this is graphically, using the standard schematic representation of chemicals familiar to all chemists.

The Daylight system provides an algorithm for generating these schematic diagrams ab initio - a drawing can be made of any molecule or reaction, whether or not it has ever been seen before. When generating a schematic diagram, two criteria are critical: