7. SMILES Toolkit: Molecules
A molecule object represents the atoms, bonds, cycles
and chiral centers of a molecule. Because it is such a fundamental object in
computational chemistry, there are more functions that operate on molecules
than any other object. One can:
- Produce a molecule from a SMILES string.
- Produce a SMILES string or a unique SMILES string from a
molecule.
- Build a molecule "from scratch" using functions to create an
empty molecule, then adding atoms and bonds.
- Add and delete atoms and bonds.
- Change the properties of atoms and bonds.
- Test for aromaticity of a molecule, atom, or bond. Aromaticity
is determined automatically for Kekulé structures.
- Find symmetry classes for atoms.
- Test for and set chiral features.
- Generate streams of the atoms, bonds, and cycles of a molecule,
and streams of atoms of a cycle, bonds of a cycle, and so forth.
7.1 Creating Molecules
There are two ways to create a molecule object: "From scratch"
(allocate an empty molecule), and by parsing a SMILES string:
-
dt_alloc_mol() => molecule
-
Returns a new, empty molecule.
-
dt_smilin(string smiles) => molecule
-
Interprets the given SMILES string and returns a handle for the
resulting molecule structure.
Efficiency Note: The Toolkit's internal representation of molecule
objects is designed for efficient analysis of the molecule's
properties, and for responding to queries about the molecule quickly.
It is not intended to be a compact representation of the molecule,
and uses many times more memory to store than a compact
representation such as a SMILES string. Applications that require
many thousands of molecules in memory simultaneously should use a
more compact representation for those molecules that are not of
immediate interest.
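The trade-off described in the efficiency note can be sketched as follows.
This is a toy illustration in Python, not Toolkit code: parse_molecule() is a
hypothetical stand-in for a parser such as dt_smilin(), and MoleculePool is an
invented name. The pool keeps every molecule as a SMILES string and holds
parsed objects only for the few molecules used most recently.

```python
from collections import OrderedDict


def parse_molecule(smiles):
    """Hypothetical stand-in for a real parser; returns a 'heavy' object."""
    return {"smiles": smiles, "atoms": [ch for ch in smiles if ch.isalpha()]}


class MoleculePool:
    """Stores all molecules as compact strings and parses them on demand."""

    def __init__(self, max_parsed=2):
        self.smiles = []             # compact form, one string per molecule
        self.parsed = OrderedDict()  # index -> parsed object, oldest first
        self.max_parsed = max_parsed

    def add(self, smiles):
        self.smiles.append(smiles)
        return len(self.smiles) - 1

    def get(self, index):
        if index in self.parsed:
            self.parsed.move_to_end(index)       # mark as recently used
        else:
            self.parsed[index] = parse_molecule(self.smiles[index])
            if len(self.parsed) > self.max_parsed:
                self.parsed.popitem(last=False)  # drop least recently used
        return self.parsed[index]


pool = MoleculePool(max_parsed=2)
ids = [pool.add(s) for s in ["CCO", "CC(=O)O", "c1ccccc1", "CN"]]
for i in ids:
    pool.get(i)
```

After the loop, all four strings are still stored but only the last two
molecules remain in parsed form; asking for an earlier one parses it again.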
7.2 Constituents of a Molecule
These functions provide ways to enumerate (generate streams of) the
atoms, bonds, cycles, and chiral features of molecules. Also
included are two functions, dt_bond() and dt_xatom(), for accessing
related constituents without the necessity of creating a stream.
-
dt_stream(Handle ob, integer typeval) => stream
-
Generates a stream of atoms, bonds or cycles: a stream that contains all of the
objects of the specified type that are part of the object.
The object can be a molecule, atom, bond or cycle. For example,
dt_stream(bond, TYP_ATOM) returns a stream of the two atoms at either end of
the bond; dt_stream(cycle, TYP_BOND) returns a stream of all the bonds that
are part of the cycle.
Note: remember, dt_stream() is polymorphic --
it applies to other objects, too. Here, we are only discussing the
molecule and its constituent parts.
-
dt_canstream(Handle object,Integer type, boolean iso, boolean addh) => stream
-
Allocates a stream of type 'type', in canonical order, for the molecule or reaction 'object'.
Object can be a molecule, atom, bond or cycle.
-
dt_origstream(Handle object,Integer type) => stream
-
Returns a stream of objects in which the objects appear in "original"
order. That is, dt_next() will return atoms in the same order as
they appear in the original string used to create the
molecule via dt_smilin(), or in the order in which they were added to
the molecule using dt_addatom().
-
dt_bond(Handle at1, Handle at2) => bond
-
Returns the handle of the bond joining the two atoms.
-
dt_xatom(Handle a, Handle b) => atom
-
Returns the atom that is across the bond b from the atom a.
-
dt_uid(Handle abc) => integer
-
Returns the unique id of an atom, bond or cycle
within the containing molecule.
A unique id is a smallish non-negative integer (i.e. it can be zero)
that is guaranteed to not change for as long as the object
abc exists. The intention is that unique id's, unlike
handles, be reasonably dense; for this reason the uid makes a good
array index but a handle does not. Note that unlike handles, uid's
are only unique across a single containing object; for example, atoms
from two different molecules may have the same uid.
-
dt_uidrange(Handle molecule, integer typ) => integer
-
Returns a number that is at least 1 greater than the largest uid
currently associated with any constituent having type typ
contained in the molecule.
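The use of uids as array indexes can be pictured with a small toy model.
The sketch below is written in Python and does not call the Toolkit;
ToyMolecule, add_atom(), remove_atom() and uid_range() are invented names
that only imitate the behavior described for dt_uid() and dt_uidrange().
It assumes uids are handed out as consecutive integers, which the text
above does not promise.

```python
class ToyMolecule:
    """Toy container that hands out small, stable integer ids to atoms."""

    def __init__(self):
        self.atoms = {}        # uid -> element symbol
        self._next_uid = 0

    def add_atom(self, symbol):
        uid = self._next_uid
        self._next_uid += 1
        self.atoms[uid] = symbol
        return uid

    def remove_atom(self, uid):
        del self.atoms[uid]    # the remaining atoms keep their uids

    def uid_range(self):
        # One more than the largest uid currently in use (0 if empty).
        return max(self.atoms, default=-1) + 1


mol = ToyMolecule()
c1, c2, o = (mol.add_atom(s) for s in ("C", "C", "O"))
mol.remove_atom(c2)

# A per-atom property table sized by uid_range(); unused slots stay None.
partial_charge = [None] * mol.uid_range()
for uid in mol.atoms:
    partial_charge[uid] = 0.0
```

Because a uid never changes while its object exists, deleting an atom leaves
a gap rather than renumbering the others, so a table sized by the uid range
may contain unused slots.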
7.3 Modifying Molecules
7.3.1 Derived Properties
Many molecule properties are derived properties. Derived properties
are not explicitly specified as you create the molecule; rather, they are
computed once the molecule is assembled. For example, you don't
directly add a cycle (a ring) to a molecule; instead you add various
bonds between the molecule's atoms; the Toolkit detects the
existence of cycles after a molecule's atoms and bonds are completely
specified. Cycles are thus a derived property. Other derived
properties include aromaticity, chirality and, in some cases, bond
type (see also dt_bondtype() and dt_bondorder()).
7.3.2 The Modify-on and Modify-off States
Before a molecule object can be modified it must be put into the
modify-on state; when modifications are complete, the molecule object
is returned to the modify-off state. Generally speaking, functions
that modify significant properties of a molecule or its constituents
may be applied only in the modify-on state. These functions are
further divided into structural-modification functions (described
below), which change the structure of the molecule, and
non-structural-modification functions, which merely change the properties
of the existing structure of the molecule.
These modify-on and modify-off states serve two purposes. First,
when modifying a molecule or building one "from scratch," the
molecule may enter temporary configurations in which it does not
represent a valid chemical compound. The modify-on state indicates
that the molecule may be in such a state, and prevents the
application from asking questions (such as questions about derived
properties) that the Toolkit may not be able to answer. Second, some
of the derived properties take a significant amount of time to
compute (e.g. finding a "smallest set of smallest rings" is a
computationally difficult task for which no fast algorithm exists).
The transition from modify-on to modify-off tells the Toolkit to
recompute derived properties as necessary.
-
dt_mod_on(Handle m) => boolean
-
Puts the given molecule into the modify-on state; molecules in this
state may be modified.
-
dt_mod_off(Handle m) => boolean
-
Puts the given molecule into the modify-off state. This function
causes the molecule's structure to be analyzed; its properties may be
changed as a result. The most notable change is to the aromaticities
of constituents (atoms, bonds, and cycles). A recalculation of
contained cycles may also take place.
If there is an error, the molecule is deallocated just as though dt_dealloc() had been called.
(This is an unfortunate side-effect of the structure-analysis
functions: if they fail, they leave the molecule in an unusable state.
Molecules that are "precious" should be copied just prior to
invoking dt_mod_off(); if it returns TRUE the copy can be discarded.
The copy-and-discard operation is "cheap" (i.e. fast) compared to the
structural analysis.)
-
dt_mod_is_on(Handle m) => boolean
-
Returns TRUE if the molecule is in the modify-on state, FALSE
otherwise.
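The interplay of the two states can be pictured with a toy model. The
Python sketch below does not use the Toolkit; ToyMolecule and its methods
are invented for illustration. Its only derived property is a ring count,
computed from the standard graph relation rings = bonds - atoms + connected
components, which stands in for the far more involved analysis the Toolkit
performs.

```python
class ToyMolecule:
    """Toy graph with a modify-on/modify-off switch (not Toolkit code)."""

    def __init__(self):
        self.atoms = []            # element symbols
        self.bonds = []            # (atom index, atom index) pairs
        self.modify_on = False
        self._ring_count = 0       # derived; trusted only in modify-off

    def mod_on(self):
        self.modify_on = True

    def mod_off(self):
        # Leaving modify-on is the moment derived properties are recomputed.
        self._ring_count = (
            len(self.bonds) - len(self.atoms) + self._components()
        )
        self.modify_on = False

    def add_atom(self, symbol):
        self._require(self.modify_on, "add_atom needs modify-on")
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, a, b):
        self._require(self.modify_on, "add_bond needs modify-on")
        self.bonds.append((a, b))

    def ring_count(self):
        self._require(not self.modify_on, "ring_count needs modify-off")
        return self._ring_count

    def _components(self):
        # Count connected components with a simple union-find.
        parent = list(range(len(self.atoms)))

        def find(i):
            while parent[i] != i:
                i = parent[i]
            return i

        for a, b in self.bonds:
            parent[find(a)] = find(b)
        return len({find(i) for i in range(len(self.atoms))})

    @staticmethod
    def _require(ok, message):
        if not ok:
            raise RuntimeError(message)


mol = ToyMolecule()
mol.mod_on()
a, b, c = (mol.add_atom("C") for _ in range(3))
for pair in ((a, b), (b, c), (c, a)):
    mol.add_bond(*pair)

try:
    mol.ring_count()               # derived property asked for too early
    asked_too_early = False
except RuntimeError:
    asked_too_early = True

mol.mod_off()
rings = mol.ring_count()
```

The three-membered ring is reported only after the switch to modify-off;
the earlier query is refused instead of returning a stale answer.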
7.3.3 Functions Applicable Only During Modify-On
These functions can only be applied to a molecule or its constituent
parts when the molecule is in the modify-on state. Generally
speaking, such functions modify the structure of a molecule in some
significant way.
dt_addatom()
dt_addbond()
dt_dealloc() (when applied to an atom or bond)
dt_setbondorder()
dt_setbondtype()
dt_setcharge()
dt_setchival()
dt_setdbo()
dt_setimp_hcount()
dt_setnumber()
dt_setweight()
7.3.4 Functions Applicable Only During Modify-Off
These functions can only be applied to a molecule or its constituent
parts when the molecule is in the modify-off state. Generally
speaking, such functions only make sense when applied to well-formed
molecules.
dt_arbsmiles()
dt_cansmiles()
dt_symclass()
dt_symorder()
dt_xsmiles()
7.3.5 Functions Applicable At All Times
All functions not listed either here or in the previous section that
normally apply to molecules can be applied to a molecule in either the
modify-on or the modify-off state.
7.4 Structural-Modification Functions
The three functions dt_addatom(), dt_addbond(), and dt_dealloc()
(when applied to atoms or bonds) are collectively referred to as
structural modification functions. After calling a structural
modification function, future streams returned by dt_stream() are no
longer guaranteed to return objects in the same order that they were
returned before the modification. Note that this remains true even
if the structure of the molecule is later restored to an equivalent
form.
Also, remember that any structural modification to a molecule causes
all streams of atoms, bonds or cycles over the molecule to be
deallocated.
-
dt_addatom(molecule m, integer atno, integer hcount) => atom
-
Adds an atom with atomic number atno and
hcount hydrogens to the given molecule.
-
dt_addbond(atom a1, atom a2, integer btype) => bond
-
Adds a bond with the given bond type between the two atoms.
-
dt_dealloc(object ab) => boolean
-
Atoms and bonds are removed from a molecule by deallocating them.
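One way to picture why existing streams cannot survive a structural
modification is a version stamp. The Python sketch below is a toy model,
not the Toolkit's mechanism: ToyMolecule, ToyStream and StaleStream are
invented names, and raising an exception merely stands in for the stream
having been deallocated.

```python
class StaleStream(Exception):
    """Raised when a stream outlives a structural modification."""


class ToyMolecule:
    def __init__(self, atoms):
        self.atoms = list(atoms)
        self.version = 0           # bumped on every structural change

    def add_atom(self, symbol):
        self.atoms.append(symbol)
        self.version += 1

    def stream(self):
        return ToyStream(self)


class ToyStream:
    def __init__(self, mol):
        self.mol = mol
        self.version = mol.version  # remember the structure we were built on
        self.items = list(mol.atoms)
        self.pos = 0

    def next(self):
        if self.version != self.mol.version:
            raise StaleStream("molecule was modified; request a new stream")
        if self.pos >= len(self.items):
            return None             # end of stream
        item = self.items[self.pos]
        self.pos += 1
        return item


mol = ToyMolecule(["C", "C", "O"])
stream = mol.stream()
first = stream.next()
mol.add_atom("N")                   # structural modification

try:
    stream.next()
    stale = False
except StaleStream:
    stale = True

fresh = mol.stream()                # a new stream sees the new atom
```

The practical consequence is the same as in the text: after adding or
removing atoms or bonds, request new streams rather than reusing old ones,
and do not rely on the new stream's order matching the old one.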
7.5 Properties of Atoms
Arbitrary SMILES:
An Arbitrary SMILES is derived by the same algorithm as a
unique SMILES, except that a user-specified set of labelings is
used, allowing the generation of a SMILES in an arbitrary order. The
user-specified labeling of each atom is called the arbitrary
order of the atom. The SMILES begins with the atom whose
arbitrary order is lowest; when branch points are reached, the branch
with the atom whose arbitrary order is lowest is written first. The
following functions are related to Arbitrary SMILES:
-
dt_setarborder(atom at, integer order) => boolean
-
Sets the atom's arbitrary order value.
-
dt_arborder(atom at) => integer
-
Returns the arbitrary order value for the given atom.
-
dt_arbsmiles(molecule m, boolean iso) => string
-
Returns an Arbitrary SMILES string for the given molecule.
The iso parameter indicates whether
the returned SMILES string should contain isomeric labelings.
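The ordering rule for Arbitrary SMILES (start at the atom with the lowest
arbitrary order, and at each branch point write the branch with the lowest
arbitrary order first) can be sketched for a very restricted case. The
Python function below is a toy, not the Toolkit's algorithm: it handles
only acyclic graphs with single bonds and plain element symbols, and the
name toy_arbitrary_smiles() is invented.

```python
def toy_arbitrary_smiles(symbols, bonds, order):
    """Write a SMILES-like string for an acyclic, single-bonded graph.

    symbols: element symbol per atom; bonds: (i, j) index pairs;
    order: the user-chosen arbitrary order value per atom.
    """
    neighbors = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def write(atom, came_from):
        # Unvisited neighbors, lowest arbitrary order first.
        branches = sorted(
            (n for n in neighbors[atom] if n != came_from),
            key=lambda n: order[n],
        )
        text = symbols[atom]
        for n in branches[:-1]:
            text += "(" + write(n, atom) + ")"   # earlier branches go in ()
        if branches:
            text += write(branches[-1], atom)    # last branch continues chain
        return text

    start = min(range(len(symbols)), key=lambda i: order[i])
    return write(start, None)


# Ethanol as a three-atom graph: C0 - C1 - O2.
symbols = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

from_carbon = toy_arbitrary_smiles(symbols, bonds, order=[0, 1, 2])
from_oxygen = toy_arbitrary_smiles(symbols, bonds, order=[2, 1, 0])
from_middle = toy_arbitrary_smiles(symbols, bonds, order=[1, 0, 2])
```

The three results are different strings for the same molecule; only the
arbitrary order values changed.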
Atomic Charge:
Two functions are provided to set and get the charge on an atom:
-
dt_setcharge(atom at, integer charge) => boolean
-
Sets the atom's formal charge.
-
dt_charge(atom at) => integer
-
Returns the atom's formal charge.
Hydrogen Count:
The graphs used to represent molecules are usually
hydrogen-suppressed: hydrogens are represented as a property of the "heavy"
atoms to which they are attached rather than as separate atom
objects. Such hydrogens are called implicit hydrogens. In some
cases hydrogens must be actual objects (e.g. when there is isotopic
information or more than one bond to the hydrogen); in other cases it
may be convenient to have hydrogen objects (e.g. when data, such as
xyz coordinates, are known about them). Such hydrogens are called
explicit hydrogens.
The following functions are used for implicit and explicit hydrogens
(also see dt_addatom()):
-
dt_hcount(atom at) => integer
-
Returns the total number of hydrogen atoms (implicit and explicit
hydrogens) bonded to the atom.
-
dt_imp_hcount(atom at) => integer
-
Returns the number of implicit hydrogens bonded to the atom.
-
dt_setimp_hcount(atom at, integer count) => boolean ok
-
Sets the number of implicit hydrogens on the atom.
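The relationship between the implicit, explicit and total hydrogen counts
can be sketched with a toy data structure. The Python code below does not
use the Toolkit; ToyAtom, connect() and total_hcount() are invented names
that imitate what dt_imp_hcount() and dt_hcount() are described as
returning.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyAtom:
    number: int                 # atomic number
    implicit_h: int = 0         # hydrogens stored as a property of this atom
    weight: int = 0             # 0 means "unspecified"
    neighbors: list = field(default_factory=list, repr=False)


def connect(a, b):
    """Record a bond between two explicit atom objects."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def total_hcount(atom):
    """Implicit hydrogens plus explicit hydrogen neighbors."""
    explicit = sum(1 for n in atom.neighbors if n.number == 1)
    return atom.implicit_h + explicit


# Methane with one deuterium: the isotope forces an explicit hydrogen atom,
# while the other three hydrogens remain implicit on the carbon.
carbon = ToyAtom(number=6, implicit_h=3)
deuterium = ToyAtom(number=1, weight=2)
connect(carbon, deuterium)

implicit = carbon.implicit_h
total = total_hcount(carbon)
```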
Atomic Number, Symbol, and Weight:
An atom's atomic number and weight are independent in the Daylight
Toolkit. In real life, only certain isotopes exist for each atomic
number; the Daylight Toolkit imposes no such constraint.
The atomic symbol is derived directly from the atomic number; the
Toolkit doesn't provide a way to set it independently.
-
dt_number(atom at) => integer
-
Returns the atom's atomic number.
-
dt_setnumber(atom at, integer num) => boolean
-
Sets the atom's atomic number.
-
dt_symbol(atom at) => string
-
Returns the atom's atomic symbol (e.g. "C", "Si").
-
dt_weight(atom at) => integer
-
Returns the atom's atomic weight. The returned weight is 0 if the
weight is unspecified (e.g. the default weight of an atom). For atoms
that have been set to a specific isotope value with dt_setweight(),
the integer weight is returned.
-
dt_setweight(atom at, integer weight) => boolean
-
Sets the atom's atomic weight.
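The independence of atomic number and weight, and the way the symbol
follows the number, can be sketched as below. This is a toy Python model
with invented names (ToyAtom, SYMBOLS) and a deliberately tiny element
table; it does not call the Toolkit.

```python
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O", 14: "Si"}  # tiny excerpt only


class ToyAtom:
    def __init__(self, number, weight=0):
        self.number = number       # atomic number
        self.weight = weight       # 0 means "unspecified"; never validated

    @property
    def symbol(self):
        # Derived from the atomic number; there is no setter on purpose.
        return SYMBOLS[self.number]


carbon13 = ToyAtom(6, weight=13)
odd = ToyAtom(6, weight=99)        # no such isotope, but nothing objects
plain = ToyAtom(6)

plain.number = 14                  # changing the number changes the symbol
```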
7.6 Properties of Bonds
Bond type and bond order are closely related but not identical
properties of a bond object.
Bond order: a formal property of the bond, which can only
be one of DX_BTY_SINGLE, DX_BTY_DOUBLE, DX_BTY_TRIPLE, representing
single bonds, double bonds, and triple bonds, respectively.
Bond type: a derived property, which is normally computed by the
Toolkit when the molecule goes from modify-on to modify-off. The primary
situation where the bond type will differ from bond order is in
aromatic structures, in which single and double bonds can be
converted to aromatic bonds. Bond type can be any of DX_BTY_SINGLE,
DX_BTY_DOUBLE, DX_BTY_TRIPLE, or DX_BTY_AROMAT.
When a molecule is in the modify-on state, a bond's type or order can
be changed. Normally, one specifies a bond's type, and lets the
Toolkit generate the bond order from that. If you specify a bond's
order via dt_setbondorder(), its type will be changed too. If you
specify its type via dt_setbondtype(), the bond order may not agree
until dt_mod_off() is called.
-
dt_bondtype(Handle bond) => integer
-
Returns the bond's type. This
value can change when a molecule changes from the modify-on state to
the modify-off state (for example, it might change from single or
double to aromatic).
-
dt_setbondtype(Handle bond, integer type) => boolean ok
-
Sets the bond's type.
This may also affect the bond's order: if the bond
type is set to single, double, or triple, the bond order is too; if
the bond type is set to aromatic, bond order becomes unknown.
-
dt_bondorder(Handle bond) => integer order
-
Returns the bond's order.
-
dt_setbondorder(Handle bond, integer order) => boolean ok
-
Sets the bond's order.
Also affects the bond's type, which is set
to the same value as order.
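The coupling between the two setters can be summarized in a toy model.
The Python sketch below does not use the Toolkit: ToyBond and the integer
codes are invented (they are not the DX_BTY_* values), and "unknown" is
represented by None.

```python
SINGLE, DOUBLE, TRIPLE, AROMATIC = 1, 2, 3, 4   # toy codes, not DX_BTY_*
UNKNOWN = None


class ToyBond:
    def __init__(self, bond_type=SINGLE):
        self.type = None
        self.order = None
        self.set_type(bond_type)

    def set_order(self, order):
        if order not in (SINGLE, DOUBLE, TRIPLE):
            raise ValueError("order must be single, double, or triple")
        self.order = order
        self.type = order          # setting the order also sets the type

    def set_type(self, bond_type):
        if bond_type not in (SINGLE, DOUBLE, TRIPLE, AROMATIC):
            raise ValueError("unknown bond type")
        self.type = bond_type
        # "Aromatic" does not say single or double, so the order is unknown.
        self.order = UNKNOWN if bond_type == AROMATIC else bond_type


bond = ToyBond(DOUBLE)
after_create = (bond.type, bond.order)

bond.set_type(AROMATIC)
after_aromatic = (bond.type, bond.order)

bond.set_order(SINGLE)
after_order = (bond.type, bond.order)
```

In the Toolkit the unknown order of an aromatic bond is resolved when the
molecule returns to modify-off; the toy model leaves that step out.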
7.7 Properties of Cycles
There are no specific functions for accessing or modifying
cycles in a molecule, as cycles are a derived property of the
bonds. The
general function dt_stream() will return the cycles of a molecule,
bond, or atom.
7.8 Generating SMILES
-
dt_cansmiles(molecule m, boolean iso) => string
-
Returns a canonical SMILES string for the given molecule. (Note that
this causes calculation of the canonical labelings if it has not yet
been done, a potentially time-consuming operation.) The iso
parameter tells whether the SMILES string should contain isomeric
labelings (isotopic and chiral information). (A canonical SMILES
string with isomeric labelings is called an Absolute SMILES. Without
isomeric labelings, it is called a Unique SMILES.) The molecule
must be in the modify-off state (see dt_mod_off()).
Note: The string returned is part of the molecule object and may
change or be discarded if the molecule is modified or deallocated.
In general, you should copy the string if you will need it later.
-
dt_xsmiles(molecule m, boolean iso, boolean explicit) => string
-
Returns an exchange SMILES string for the given molecule. An exchange SMILES
is a SMILES with Daylight aromaticity conventions eliminated.
The iso parameter tells whether the SMILES string should contain
isomeric labelings (isotopic and chiral information).
The explicit parameter tells whether to also explicitly list
attached hydrogens for all atoms. The molecule must be in the modify-off
state (see dt_mod_off()).
Note: The string returned is part of the molecule object and may
change or be discarded if the molecule is modified or deallocated.
In general, you should copy the string if you will need it later.
7.9 Aromaticity
These functions test the aromaticity of molecules, atoms, bonds and cycles,
and where appropriate, allow you to set those attributes.
Aromaticity in the Daylight Toolkit is a complex subject. For a more
thorough discussion of aromaticity in the Daylight System, please see
the SMILES chapter of the Daylight Theory Manual.
-
dt_aromatic(object ob) => boolean
-
Returns TRUE if the given object (an atom, bond, cycle or molecule)
is considered aromatic.
-
dt_setaromatic(atom at, boolean isarom)
-
Sets the aromaticity of the atom at to TRUE or FALSE according to the
value of isarom.
7.10 Symmetry
The Daylight Toolkit can compute the symmetry of a molecule. There
are two different symmetry values you can access.
Symmetry Class:
Two atoms in a molecule will be in the same symmetry class if
and only if they are symmetrically equivalent. The actual number
assigned to a symmetry class is arbitrary; the only significance
of the numbers is whether two atoms have the same
class number or not.
Symmetry Order:
The algorithm
that generates the symmetry order uses graph invariants (including
the symmetry classes described above) to generate a unique
labeling (the symmetry order) of the molecule's graph.
An atom's symmetry order controls the generation of the
Unique SMILES (see dt_cansmiles()).
Note that any change, however slight, to the molecule may cause the
symmetry class and/or symmetry order values to change.
-
dt_symclass(atom at) => integer
-
Returns the unique symmetry class of an atom in its parent molecule.
-
dt_symorder(atom at) => integer
-
Returns the unique symmetry order of an atom in its parent molecule.
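The idea of grouping atoms by graph invariants can be sketched with a
simple iterative refinement. The Python function below is not the
Toolkit's algorithm; it is a swapped-in textbook technique (each atom's
class is repeatedly refined by the classes of its neighbors), the name
toy_symmetry_classes() is invented, and on some highly regular graphs such
refinement can leave non-equivalent atoms in the same class.

```python
def toy_symmetry_classes(symbols, bonds):
    """Partition atoms into classes by refining a simple graph invariant."""
    neighbors = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def renumber(keys):
        # Map each distinct key to a small integer, in sorted key order.
        ranks = {k: r for r, k in enumerate(sorted(set(keys)))}
        return [ranks[k] for k in keys]

    # Initial invariant: element symbol and number of neighbors.
    classes = renumber([(symbols[i], len(neighbors[i])) for i in neighbors])
    while True:
        keys = [
            (classes[i], tuple(sorted(classes[n] for n in neighbors[i])))
            for i in neighbors
        ]
        refined = renumber(keys)
        if len(set(refined)) == len(set(classes)):
            return refined         # no class was split; the partition is stable
        classes = refined


# 2-Propanol heavy atoms: C0 - C1(-O3) - C2.
propanol = toy_symmetry_classes(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])

# Pentane heavy atoms: C0 - C1 - C2 - C3 - C4.
pentane = toy_symmetry_classes(["C"] * 5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

As in the text, only equality of class numbers is meaningful: the two
methyl carbons of 2-propanol share a class, and pentane's carbons fall
into three classes (ends, next-to-ends, middle).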
7.11 Chirality
The most complex attributes are chirality attributes, which are
specified by single integer codes called chiral values. These values
combine two separate pieces of information, a chiral class
(corresponding to a geometric configuration such as tetrahedral,
octahedral, and so on) and a chiral order (a particular ordering
around the chiral center, such as clockwise, counter-clockwise, and so
on).
Symbolic constants are defined to simplify the specification of
chiral values. In the current implementation, only cis/trans and
tetrahedral chirality are supported. The following symbolic
constants combine the chiral class and chiral order information for
convenience:
Cis/Trans Chirality
DX_CHI_NO_DBO | cis/trans situation, but chirality is unspecified
DX_CHI_CIS    | cis configuration around a double bond
DX_CHI_TRANS  | trans configuration around a double bond
Tetrahedral Chirality
DX_CHI_NONE   | unspecified chirality
DX_CHI_THCCW  | tetrahedral center with counterclockwise configuration
DX_CHI_THCW   | tetrahedral center with clockwise configuration
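How a single integer can carry both a chiral class and a chiral order can
be sketched with simple bit packing. The encoding below is purely
hypothetical: the actual numeric layout of Daylight chiral values is not
given in this chapter, and every name and constant in the Python code
(make_chival(), class_of(), order_of(), the CLASS_* and ORDER_* values) is
invented for illustration.

```python
# Hypothetical layout: class in the high bits, order in the low 4 bits.
ORDER_BITS = 4
ORDER_MASK = (1 << ORDER_BITS) - 1

CLASS_DOUBLE_BOND, CLASS_TETRAHEDRAL = 1, 2      # invented class codes
ORDER_UNSPECIFIED, ORDER_FIRST, ORDER_SECOND = 0, 1, 2   # invented orders


def make_chival(chiral_class, chiral_order):
    """Pack a class code and an order code into one integer."""
    if not 0 <= chiral_order <= ORDER_MASK:
        raise ValueError("order does not fit in the low bits")
    return (chiral_class << ORDER_BITS) | chiral_order


def class_of(chival):
    """Recover just the class part (the role dt_chiclass() plays)."""
    return chival >> ORDER_BITS


def order_of(chival):
    """Recover just the order part (the role dt_chiorder() plays)."""
    return chival & ORDER_MASK


tetra_second = make_chival(CLASS_TETRAHEDRAL, ORDER_SECOND)
```

Applications should use the symbolic constants and the class and order
accessor functions rather than depend on any particular numeric layout.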
-
dt_dbo(bond db, bond b1, bond b2) => integer
-
Returns the "double-bond orientation" between b1 and b2. The bond
db should be a double bond that is at the center of a cis/trans
configuration. Bonds b1 and b2 should be single bonds attached to the
atoms at the ends of db, one on each of the two atoms. The return
value will be equal to one of the symbolic constants DX_CHI_CIS,
DX_CHI_TRANS, or DX_CHI_NO_DBO. The last of these indicates that the
cis/trans configuration around db is unspecified.
-
dt_setdbo(bond db, bond b1, bond b2, integer dboval) => boolean
-
Sets the "double-bond orientation" between b1 and b2 to the
given value. The first three parameters are as described above for
dt_dbo(). The last parameter is one of
DX_CHI_CIS, DX_CHI_TRANS, or DX_CHI_NO_DBO.
-
dt_chival(atom at, sequence seq) => integer
-
Returns the chiral value around the given chiral center at,
determined with respect to the order of the bonds in the sequence seq.
See the function's full description for details.
-
dt_chiseq(atom at, integer chival) => sequence
-
Returns a sequence of bonds having the chirality given by chival
around the given atom at (the chiral center).
The chiral order portion of the value is used to determine the ordering
of the returned sequence.
See the function's full description for details.
-
dt_setchival(atom at, sequence seq, integer chival) => boolean
-
Sets the chiral value at the given chiral center at. The parameter
seq is a sequence of bonds that meets the conditions specified for
dt_chival(); the chiral value is set with respect to the order of
bonds in this sequence.
-
dt_chiperm(sequence seq, bond start, integer chival) => sequence
-
Given a sequence of bonds having the given chiral value, modify it
(i.e., permute it) so that the chiral value is preserved, but so that
it begins with the given bond start.
-
dt_chiclass(integer chival) => integer
-
Returns an integer code for just the chiral class part of the given
chiral value.
-
dt_chiorder(integer chival) => integer
-
Returns an integer code for just the chiral order portion of the
given chiral value.
-
dt_isohydro() => atom
-
Returns a hydrogen-atom object that is useful for representing
implicit-hydrogen atoms in calls to the isomeric functions. Each
call to this function returns the same special atom. The atom may
not be modified (attempts will fail) and it has no parent molecule
(calls to dt_parent() will return NULL_OB). In general, applications
should not attempt to play around with it too much; its only intended
use is in calls to the isomeric functions defined above.
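The kind of reordering that dt_chiperm() is described as performing can be
illustrated for a tetrahedral center with four neighbors. The Python
sketch below does not use the Toolkit and does not claim to reproduce the
order dt_chiperm() returns; toy_chiperm() and parity() are invented names.
It relies on a standard fact: an even permutation of the four neighbors
leaves the tetrahedral sense unchanged, while an odd one inverts it.

```python
def toy_chiperm(seq, start):
    """Reorder four neighbors to begin with `start` via an even permutation."""
    items = list(seq)
    i = items.index(start)
    if i == 0:
        return items                               # already in front
    items[0], items[i] = items[i], items[0]        # one swap: sense inverted
    items[2], items[3] = items[3], items[2]        # second swap: restored
    return items


def parity(original, permuted):
    """Return 0 if `permuted` is an even permutation of `original`, else 1."""
    perm = [original.index(x) for x in permuted]
    inversions = sum(
        1
        for a in range(len(perm))
        for b in range(a + 1, len(perm))
        if perm[a] > perm[b]
    )
    return inversions % 2


bonds = ["b1", "b2", "b3", "b4"]                   # stand-ins for bond handles
reordered = toy_chiperm(bonds, "b3")
```

Two swaps always compose to an even permutation, so the reordered sequence
describes the same chirality as the original while starting with the
requested bond.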