Toolkit Tutorial: Overview


  1. Introduction
  2. Datatypes, Objects, & Relationships
  3. SMILES - Molecules
  4. SMILES - Reactions
  5. SMARTS - Patterns
  6. SMIRKS - Transforms
  7. Exercises

  1. Introduction

    The Daylight Toolkits...

    ... are a set of shared object libraries and header files.
    ... give programmers the power to handle complex chemistry with ease.
    ... use a robust, consistent and well-defined API.
    ... are written to allow cross-platform portability and language-independence.

  2. Datatypes, Objects, & Relationships

    Datatypes

    Several fundamental datatypes are used by the toolkit in order to generalize platform-specific handling of data. The following table lists the datatypes that facilitate cross-platform portability:

    Datatype Concept & Example
    dt_Boolean Concept: Logical
    Example: The dt_Boolean datatype represents one bit of information. Valid values are 1 (for TRUE) and 0 (for FALSE).
    dt_Integer Concept: Integral
    Example: The dt_Integer datatype represents whole numbers within the range of -2^31 to 2^31 - 1.
    dt_Real Concept: Floating-point
    Example: The dt_Real datatype represents floating-point values from -(2 - 2^-23) x 2^127 to (2 - 2^-23) x 2^127 (the IEEE 754 single-precision range) and is able to represent zero.
    dt_String Concept: Character Array
    Example: The dt_String datatype represents an array of characters. For platform independence, the length of the character array is specified with an associated dt_Integer, not with a terminating character (i.e., a null character) within the array. Functions that return a string typically accept the address of a dt_Integer as the first parameter, which is used to pass the length of the string back to the caller.
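
    As a rough illustration of why the length travels in a separate integer, the sketch below contrasts two ways of reading a character buffer. It is a Python toy, not toolkit code: the helper names c_style_text and counted_text are hypothetical, and the buffer contents are made up for the example.

```python
# Toy illustration only: these helpers are hypothetical, not toolkit functions.

def c_style_text(buffer: bytes) -> bytes:
    """Read a buffer the C way: stop at the first null byte."""
    end = buffer.find(b"\x00")
    return buffer if end == -1 else buffer[:end]


def counted_text(buffer: bytes, length: int) -> bytes:
    """Read a buffer using an explicit length; null bytes are ordinary data."""
    return buffer[:length]


# Five meaningful bytes followed by unrelated leftover bytes.
buffer = b"CC\x00OC" + b"leftover"
length = 5

terminated = c_style_text(buffer)      # truncated at the embedded null byte
counted = counted_text(buffer, length)  # exactly the five bytes requested
print(terminated, counted)
```

    With an explicit length, the caller never has to scan for a terminator or assume that one is present.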

    Objects

    A generic datatype, called a handle, is used to represent all objects in the toolkit. An object handle is an integer whose value has meaning only within the toolkit. It's clean and simple. The opaqueness of objects and the standardization of the toolkit interface make the toolkit stable and enable predictable and reproducible behavior.

    Datatype Concept & Example
    dt_Handle Concept: Object Handling
    Example: A handle is an integer representing an object. The integer is an index into the internal toolkit table. A handle is used to represent objects, including: Atom, Bond, Cycle, Molecule, Reaction, Pattern, Pathset, Path, Transform, Stream, Sequence.
    Typically, the first object created with the toolkit is represented by the number one, the second object is number two, and so on. The integer is an index into an internal table and the toolkit efficiently sorts out everything from there. Further, representing objects in this way makes them opaque - you can't access data about the objects directly, only operate on them with a variety of methods.
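
    The handle idea can be sketched with a small table that hands out integers. This is an assumption-laden Python toy, not the toolkit's implementation: the HandleTable class, its method names, and the use of 0 as a "no object" value are all hypothetical choices made for the example.

```python
NULL_HANDLE = 0  # hypothetical "no object" value used by this sketch


class HandleTable:
    """Toy table mapping integer handles to hidden objects."""

    def __init__(self):
        self._objects = {}
        self._next = 1  # the first object created gets handle 1

    def alloc(self, obj) -> int:
        """Store an object and return the integer that stands for it."""
        handle = self._next
        self._next += 1
        self._objects[handle] = obj
        return handle

    def lookup(self, handle: int):
        """Return the object behind a handle, or None if it is not valid."""
        return self._objects.get(handle)

    def dealloc(self, handle: int) -> bool:
        """Forget an object; return True if the handle was valid."""
        if handle in self._objects:
            del self._objects[handle]
            return True
        return False


table = HandleTable()
water = table.alloc({"type": "molecule", "smiles": "O"})
reaction = table.alloc({"type": "reaction", "smiles": "O>>[OH-].[H+]"})
print(water, reaction)
```

    Callers hold only the integers; everything about the objects stays behind the table, which is what makes them opaque.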

    The figures use three shapes, one for each kind of object:

    Shape Concept & Example
    Concept: Basic Object
    Example: Molecule, Reaction, Pattern, Transform
    Concept: Child Object
    Example: Atom, Bond, Cycle, Molecule in a Reaction
    Concept: Container Object
    Example: Stream, Sequence

    Relationships

    Object Concept & Example
    Concept: A child is an integral part of a parent.
    Example: Atoms, bonds, and cycles (children) are parts of a molecule (parent). You may allocate a molecule and add an oxygen atom to it to create water ("O"). The water molecule has an atom count of 1, a bond count of 0, and a cycle count of 0. The atom refers to the molecule as its parent.

    Molecules (as children, i.e., reactants, agents, and products) are parts of a reaction (parent). You may allocate a reaction and add components to create a water ionization reaction ("O>>[OH-].[H+]"). To do so, add the water as a reactant component. Then, set the oxygen's implicit hydrogen count to 1 and its charge to -1 and add the molecule as a product component. Then, set the atomic number to 1 and the charge to 1 and add the molecule as a product component. The reaction has a molecule count of 2, an atom count of 3, a bond count of 0, and a cycle count of 0. The reaction adds copies of the molecules, which refer to the reaction as their parent.

    Concept: A child affects its parent.
    Example: Deallocation of the atom changes the molecule's atom count to 0.
    Concept: A parent owns its child.
    Example: Deallocation of the molecule causes deallocation of the atom, which makes the atom invalid.
    Concept: A parent is directly accessible from the child.
    Example: The following are the parent-child relationships:

    Molecule-Atom
    Molecule-Bond
    Molecule-Cycle
    Reaction-Molecule
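
    The parent-child rules above can be sketched with two small classes. This is a Python toy under stated assumptions, not the toolkit's object model: the Molecule and Atom classes, their methods, and the valid flag are hypothetical names chosen for the example.

```python
class Molecule:
    """Toy parent object that owns its atoms."""

    def __init__(self):
        self.atoms = []
        self.valid = True

    def add_atom(self, symbol):
        atom = Atom(symbol, parent=self)
        self.atoms.append(atom)
        return atom

    def atom_count(self):
        return len(self.atoms)

    def dealloc(self):
        # A parent owns its children: deallocating it deallocates them too.
        for atom in list(self.atoms):
            atom.dealloc()
        self.valid = False


class Atom:
    """Toy child object that can always reach its parent."""

    def __init__(self, symbol, parent):
        self.symbol = symbol
        self.parent = parent
        self.valid = True

    def dealloc(self):
        # A child affects its parent: removing it changes the atom count.
        if self.valid:
            self.parent.atoms.remove(self)
            self.valid = False


water = Molecule()
oxygen = water.add_atom("O")
count_before = water.atom_count()
oxygen.dealloc()
count_after = water.atom_count()

proton = Molecule()
hydrogen = proton.add_atom("H")
proton.dealloc()
print(count_before, count_after, hydrogen.valid)
```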

  3. SMILES - Molecules

    Object Concept & Example
    Concept: A molecule is input and output using the SMILES Toolkit.
    Example: You can read SMILES in and create an ionized water molecule ("[OH-].[H+]"). The molecule can be written in canonical SMILES or arbitrary SMILES form.
    Concept: A stream is a container object.
    Example: A molecule is streamed to access its atoms, bonds, or cycles.
    Concept: A stream is like a linked list.
    Example: The atom stream contains a next atom. Initially, asking for the next item gives the first atom ("[OH-]"). The atomic number is 8, the symbol is "O", and the charge is -1. The next item gives the second atom ("[H+]"). The atomic number is 1, the symbol is "H", and the charge is 1. When there are no more atoms, the next item is the NULL object. The atom stream can be reset and the next item gives the first atom again. Each atom refers to the molecule as its parent.
    Concept: A stream is derived from a base object.
    Example: The atom stream refers to its molecule as its base. The atoms are not deallocated when the atom stream is deallocated. The stream is deallocated when the molecule is deallocated.
    Concept: A stream is invalid when its base object changes.
    Example: The atom stream is invalid when an atom is added to the molecule, or an atom or the molecule is deallocated.
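
    The stream behavior described above (next item, reset, NULL at the end, invalidation when the base changes) can be sketched as follows. This is a Python toy, not the toolkit's stream: the class names, the revision counter used to detect changes, and the use of None for the NULL object are assumptions made for the example.

```python
class Molecule:
    """Toy base object; atoms are (symbol, atomic_number, charge) tuples."""

    def __init__(self, atoms):
        self.atoms = list(atoms)
        self.revision = 0  # bumped on every modification

    def add_atom(self, atom):
        self.atoms.append(atom)
        self.revision += 1


class AtomStream:
    """Toy stream over the atoms of a base molecule."""

    def __init__(self, base):
        self.base = base
        self._items = list(base.atoms)
        self._revision = base.revision
        self._pos = 0

    def is_valid(self):
        # The stream is only usable while its base is unchanged.
        return self._revision == self.base.revision

    def next(self):
        if not self.is_valid():
            raise RuntimeError("stream is invalid: base molecule changed")
        if self._pos >= len(self._items):
            return None  # stands in for the NULL object
        item = self._items[self._pos]
        self._pos += 1
        return item

    def reset(self):
        self._pos = 0


ionized_water = Molecule([("O", 8, -1), ("H", 1, 1)])
stream = AtomStream(ionized_water)
first = stream.next()
second = stream.next()
end = stream.next()
stream.reset()
first_again = stream.next()

ionized_water.add_atom(("H", 1, 0))  # changing the base invalidates the stream
print(first, second, end, first_again, stream.is_valid())
```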

  4. SMILES - Reactions

    Object Concept & Example
    Concept: A reaction is input and output using the SMILES Toolkit with reaction capabilities.
    Example: You can read SMILES in and create a water ionization reaction ("O>>[OH-].[H+]"). The reaction can be written in canonical SMILES or arbitrary SMILES form.
    Concept: A stream is a container object.
    Example: A reaction is streamed to access its molecules, atoms, bonds, or cycles.
    Concept: A stream is like a linked list.
    Example: The molecule stream contains a next molecule. Initially, asking for the next item gives the first molecule ("O"). The molecule role is "reactant". The next item gives the second molecule ("[OH-].[H+]"). The molecule role is "product". When there are no more molecules, the next item is the NULL object. The molecule stream can be reset and the next item gives the first molecule again. Each molecule refers to the reaction as its parent.
    Concept: A stream is derived from a base object.
    Example: The molecule stream refers to its reaction as its base. The molecules are not deallocated when the molecule stream is deallocated. The stream is deallocated when the reaction is deallocated.
    Concept: A stream may become invalid when its base object changes.
    Example: The molecule stream is invalid when a component is added to the reaction, or a molecule or the reaction is deallocated.
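
    To make the roles concrete, the sketch below splits a reaction SMILES string on its ">" separators, which divide it into reactants, agents, and products. It is a naive Python string split, not how the toolkit parses reactions, and the function name split_reaction_smiles is hypothetical.

```python
def split_reaction_smiles(rxn_smiles):
    """Naively split 'reactants>agents>products' into (role, SMILES) pairs."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected the form 'reactants>agents>products'")
    roles = ("reactant", "agent", "product")
    # Empty roles (here, the agents) contribute no molecule.
    return [(role, part) for role, part in zip(roles, parts) if part]


molecules = split_reaction_smiles("O>>[OH-].[H+]")
for role, smiles in molecules:
    print(role, smiles)
```

    Walking the resulting list in order mirrors the molecule stream in the example: the reactant "O" first, then the product "[OH-].[H+]".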

  5. SMARTS - Patterns

    Object Concept & Example
    Concept: A pattern object is input using the SMARTS Toolkit.
    Example: You can read SMARTS in and create an oxygen-aromatic pattern ("Oa"), i.e., an aliphatic oxygen bonded to an aromatic atom. The pattern can be optimized for matching against typical molecules.
    Concept: A pattern object and either a molecule or reaction are used to find matching paths.
    Example: The oxygen-aromatic pattern and hydroquinone ("Oc1ccc(O)cc1") have matching paths.
    Concept: A pathset object is a set of paths.
    Example: There are two oxygen-aromatic paths in hydroquinone.
    Concept: A path object is a set of atoms and bonds.
    Example: The oxygen-aromatic path contains 2 atoms and 1 bond.
    Concept: A pathset and its paths are derived from a base object.
    Example: The pathset and paths refer to the molecule as their base. The pathset and oxygen-aromatic paths are deallocated when the hydroquinone molecule is deallocated. Paths are deallocated when the pathset is deallocated.
    Concept: A pathset and its paths may become invalid when their base object changes.
    Example: The pathset and the oxygen-aromatic paths are invalid when an atom is added to the molecule, or an atom or the molecule is deallocated.
    Concept: A stream is a container object.
    Example: The pathset is streamed to access its oxygen-aromatic paths.
    Concept: A stream is like a linked list.
    Example: The path stream contains a next path. Initially, asking for the next item gives the first path ("Oa"). The unique identifiers (uid) of the atoms and bond reflect their position in the input SMILES string. When there are no more paths, the next item is the NULL object. The path stream can be reset and the next item gives the first path again. Each path refers to the molecule as its base.
    Concept: A stream is derived from a base object.
    Example: The path stream refers to its pathset as its base. The paths are not deallocated when the path stream is deallocated. The stream is deallocated when the pathset is deallocated.
    Concept: A stream may become invalid when its base object changes.
    Example: The path stream is invalid when an atom is added to the molecule, or an atom or the molecule is deallocated.
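
    The pathset in the example can be reproduced with a hand-built graph of hydroquinone. This is a Python toy, not a SMARTS matcher: the atom and bond lists are typed in by hand from "Oc1ccc(O)cc1", and the function match_oxygen_aromatic is a hypothetical helper that only handles the single pattern "Oa".

```python
# Hydroquinone, "Oc1ccc(O)cc1", entered by hand as (symbol, is_aromatic),
# indexed by each atom's position in the SMILES string.
atoms = [("O", False), ("C", True), ("C", True), ("C", True),
         ("C", True), ("O", False), ("C", True), ("C", True)]
# Bonds as pairs of atom indices; the last one is the ring closure.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (6, 7), (7, 1)]


def match_oxygen_aromatic(atoms, bonds):
    """Return one path per match of an aliphatic O bonded to an aromatic atom."""
    pathset = []
    for bond_index, (a, b) in enumerate(bonds):
        # Try the bond in both directions so the oxygen may be either end.
        for first, second in ((a, b), (b, a)):
            symbol, aromatic = atoms[first]
            if symbol == "O" and not aromatic and atoms[second][1]:
                pathset.append({"atoms": (first, second),
                                "bonds": (bond_index,)})
    return pathset


pathset = match_oxygen_aromatic(atoms, bonds)
for path in pathset:
    print(path)
```

    Each path holds two atoms and one bond, and the indices point back into the molecule, which is why the paths depend on the molecule staying unchanged.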

  6. SMIRKS - Transforms

    Object Concept & Example
    Concept: A transform object is input using the SMILES Toolkit with reaction capabilities.
    Example: You can read SMIRKS in and create a water ionization transform ("[O:1][H:2]>>[O-:1].[H+:2]").
    Concept: A sequence is a container object.
    Example: Any object can be appended to or inserted into a sequence.
    Concept: A sequence is like a linked list.
    Example: Like a stream, the molecule sequence contains a next molecule. Initially, asking for the next item gives the first molecule. The next item gives the second molecule. When there are no more molecules, the next item is the NULL object. The molecule sequence can be reset and the next item gives the first molecule again.
    Concept: A sequence is completely separate and not derived from another object.
    Example: The molecule sequence does not refer to another object. The molecules are not deallocated when the sequence is deallocated. The sequence is not deallocated when the molecules are deallocated. The proper way to deallocate a sequence and its contents is to reset the container, delete and deallocate each item, then deallocate the container.
    Concept: A sequence does not become invalid when other objects change.
    Example: The molecule sequence is not invalid when adding an atom to a molecule, or when a molecule is deallocated.
    Concept: A transform object and a sequence of molecules are used to transform the molecules into a sequence of reactions.
    Example: Applying the water ionization transform to a sequence containing water and hydroquinone molecules yields ionized water and ionized hydroquinone reactions ("[OH-:1].[H+:2]" and "[O-:1]c1ccc(cc1)O.[H+:2]"). The atoms involved in the transformation are "mapped" by class number.
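
    The sequence rules above, including the cleanup order (reset, delete and deallocate each item, then deallocate the container), can be sketched as follows. This is a Python toy, not the toolkit's sequence: the Sequence class, its method names, and the dealloc_contents helper are hypothetical, and the molecules are plain dictionaries.

```python
class Sequence:
    """Toy container that holds items without owning them."""

    def __init__(self):
        self._items = []
        self._pos = 0  # index of the item that next_item() returns

    def append(self, item):
        self._items.append(item)

    def reset(self):
        self._pos = 0

    def next_item(self):
        if self._pos >= len(self._items):
            return None  # stands in for the NULL object
        item = self._items[self._pos]
        self._pos += 1
        return item

    def delete_current(self):
        """Remove the most recently returned item from the sequence only."""
        if self._pos == 0:
            raise IndexError("no current item")
        self._pos -= 1
        del self._items[self._pos]

    def __len__(self):
        return len(self._items)


def dealloc_contents(seq, dealloc):
    """Reset the sequence, then delete and deallocate each item in turn."""
    seq.reset()
    item = seq.next_item()
    while item is not None:
        seq.delete_current()
        dealloc(item)
        item = seq.next_item()


water = {"smiles": "O", "valid": True}
hydroquinone = {"smiles": "Oc1ccc(O)cc1", "valid": True}
seq = Sequence()
seq.append(water)
seq.append(hydroquinone)

freed = []
dealloc_contents(seq, lambda mol: (mol.update(valid=False),
                                   freed.append(mol["smiles"])))
print(len(seq), freed)
```

    Because the sequence does not own its items, nothing is deallocated unless the caller does it explicitly, which is why the cleanup walks the items one by one.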



Michael A. Kappler
Last modified: Wed Jun 9 13:13:50 MDT 2004