7 Appendix

GENSMI Algorithm
Input: SMILES, Output: indefinite number of SMILES
molecule <- dt_smilin(seed) // interpret a SMILES string
MakeEdge(molecule) // form cycles recursively
MakeNode(molecule) // add an atom and form cycles recursively
dt_dealloc(molecule) // remove object
Figure 1: Pseudo-Code for generating SMILES. The seed can be an atom or a fragment.
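
The top-level flow of Figure 1 can be sketched in Python as follows. This is only an illustration, not the GENSMI implementation: the Limits class is a hypothetical stand-in for the NotDone() check (it tracks output count and elapsed time but not memory use), and parse, make_edge, and make_node are caller-supplied placeholders for dt_smilin() and the two routines.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Limits:
    """Hypothetical stand-in for NotDone(): caps on output count and elapsed time."""
    max_smiles: int = 1000
    max_seconds: float = 10.0
    emitted: int = 0
    started: float = field(default_factory=time.monotonic)

    def not_done(self) -> bool:
        # True while neither the output cap nor the time cap has been reached.
        return (self.emitted < self.max_smiles
                and time.monotonic() - self.started < self.max_seconds)


def gensmi(seed, parse, make_edge, make_node, limits):
    """Driver mirroring Figure 1; every callable is supplied by the caller."""
    molecule = parse(seed)                  # plays the role of dt_smilin()
    results = []
    make_edge(molecule, limits, results)    # form cycles on the seed
    make_node(molecule, limits, results)    # add atoms, then form cycles
    return results                          # Python frees the molecule; no dt_dealloc() step
```

Passing the limits object into both routines lets every recursion level consult the same counters before doing further work.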
MakeNode Routine
Input: molecule, Output: indefinite number of SMILES
NotDone() // check limits on number of SMILES, memory use, time elapsed
  setA <- SequenceOfAtoms(molecule, TYP_ATOM) // like dt_stream()
  setA <- DeleteRedundancy(setA) // uses dt_symclass() for atom uniqueness
  atomA <- dt_next(setA) // loop over set A
    OkToConnect(atomA) // check valence and ring membership
      i <- 1 to atom types // loop over elements - C, N, O, etc.
        j <- 1 to bond types // loop over bond orders - single, double, etc.
          Index(atomA) > PrevIndexA(molecule) // avoid redundancy between recursion levels
          and AtomComposition(molecule, atom type i) // check distribution
          and ApproxMass(molecule, 2*atom type i + atom valence i - 2*bond type j) // check maximum
            molecule <- Extend(atomA, bond type j) // uses dt_addbond()
            Branching(molecule) // number of bonds per non-terminal atom
            and Complexity(molecule) // SCMC - Allu and Oprea, CUP IV
            and Cyclization(molecule) // number of cycles per ring atom
            and ChemConstraints(molecule) // uses dt_match() for SMARTS patterns
            and UniqueSmiles(molecule) // uses dt_cansmiles() for graph isomorphism
              Similarity(molecule) // uses dt_fp_tanimoto(), dt_fp_tversky(), and dt_fp_euclid()
              Output(molecule) // write SMILES and stereoisomers, uses du_chiralify()
              MakeEdge(molecule) // form cycles recursively
              MakeNode(molecule) // add an atom and form cycles recursively
            molecule <- UnExtend(atomA, bond type j) // uses dt_dealloc()
Figure 2: Pseudo-Code for adding an atom. Input requires a molecule. If all constraints pass, the routine outputs a SMILES string and recurses; otherwise it returns.
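
The extend, check, recurse, un-extend pattern of Figure 2 can be illustrated with a small self-contained Python sketch. It is a toy under stated assumptions, not the GENSMI code: molecules are acyclic graphs held as an element list plus a bond dictionary, the valence table and bond orders are assumed, and a canonical tree encoding (the helper canonical(), invented here) stands in for dt_cansmiles(). The symmetry-class, index-ordering, composition, mass, complexity, and similarity checks are omitted, so duplicates are caught only by the uniqueness test.

```python
VALENCE = {"C": 4, "N": 3, "O": 2}   # assumed toy valence table
BOND_ORDERS = (1, 2)                 # single and double bonds only


def used_valence(bonds, i):
    """Sum of bond orders on atom i."""
    return sum(order for pair, order in bonds.items() if i in pair)


def canonical(atoms, bonds):
    """Canonical string for an acyclic graph: smallest rooted encoding over all roots."""
    nbrs = {i: [] for i in range(len(atoms))}
    for (a, b), order in bonds.items():
        nbrs[a].append((b, order))
        nbrs[b].append((a, order))

    def encode(i, parent):
        kids = sorted("-="[o - 1] + encode(j, i) for j, o in nbrs[i] if j != parent)
        return atoms[i] + "(" + "".join(kids) + ")"

    return min(encode(root, None) for root in range(len(atoms)))


def make_node(atoms, bonds, max_atoms, seen, out):
    if len(atoms) >= max_atoms:          # crude stand-in for the NotDone()/mass limits
        return
    for a in range(len(atoms)):
        for element in VALENCE:
            for order in BOND_ORDERS:
                # OkToConnect: both ends need enough free valence for this bond order
                if used_valence(bonds, a) + order > VALENCE[atoms[a]]:
                    continue
                if order > VALENCE[element]:
                    continue
                new = len(atoms)
                atoms.append(element)    # Extend: add the atom and its bond
                bonds[(a, new)] = order
                key = canonical(atoms, bonds)
                if key not in seen:      # UniqueSmiles
                    seen.add(key)
                    out.append(key)      # Output
                    make_node(atoms, bonds, max_atoms, seen, out)  # recurse
                del bonds[(a, new)]      # UnExtend: restore the previous molecule
                atoms.pop()


def enumerate_acyclic(seed_element, max_atoms):
    atoms, bonds, seen, out = [seed_element], {}, set(), []
    seen.add(canonical(atoms, bonds))
    make_node(atoms, bonds, max_atoms, seen, out)
    return out
```

Because the molecule is modified in place and restored after each attempt, only one copy exists at any recursion depth, which mirrors the Extend/UnExtend pairing in the pseudo-code.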

MakeEdge Routine
Input: molecule, Output: indefinite number of SMILES
NotDone() // check limits on number of SMILES, memory use, time elapsed
  setA <- SequenceOfAtoms(molecule, TYP_ATOM) // like dt_stream()
  setA <- DeleteRedundancy(setA) // uses dt_symclass() for atom uniqueness
  setZ <- SequenceOfAtoms(molecule, TYP_ATOM)
  atomA <- dt_next(setA) // loop over set A
    OkToConnect(atomA) // check valence and ring membership
      setB <- RingSet(atomA, setZ) // check A-B path length
      atomB <- dt_next(setB) // loop over set B
        Index(atomA) < Index(atomB) // avoid redundancy of A,B and B,A
        and NotConnected(atomA, atomB) // check for existing bond
        and OkToConnect(atomB) // check valence and ring membership
        and RingStrain(atomA, atomB) // avoid pro-spiral and bridging aromatic systems
          i <- 1 to bond types // loop over bond orders - single, double, etc.
            Index(atomA) > PrevIndexA(molecule) // avoid redundancy between recursion levels
            and BondComposition(molecule, bond type i) // check distribution
            and ApproxMass(molecule, 2*bond type i) // check minimum
              molecule <- Cyclize(atomA, atomB, bond type i) // uses dt_addbond()
              Branching(molecule) // number of bonds per non-terminal atom
              and Complexity(molecule) // SCMC - Allu and Oprea, CUP IV
              and Cyclization(molecule) // number of cycles per ring atom
              and ChemConstraints(molecule) // uses dt_match() for SMARTS patterns
              and UniqueSmiles(molecule) // uses dt_cansmiles() for graph isomorphism
                Similarity(molecule) // uses dt_fp_tanimoto(), dt_fp_tversky(), and dt_fp_euclid()
                Output(molecule) // write SMILES and stereoisomers, uses du_chiralify()
                MakeEdge(molecule) // form cycles recursively
              molecule <- UnCyclize(atomA, atomB, bond type i) // uses dt_dealloc()
Figure 3: Pseudo-Code for forming a cycle. Input requires a molecule. If all constraints pass, the routine outputs a SMILES string and recurses; otherwise it returns.
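
The selection of ring-closure candidates in Figure 3 can be sketched in Python as below. The sketch assumes that the path-length check in RingSet amounts to bounding the size of the ring a new A-B bond would create; that reading, the min_ring and max_ring parameters, and the toy valence table are assumptions made for illustration. RingStrain and the composition and mass checks are not modelled.

```python
from collections import deque

VALENCE = {"C": 4, "N": 3, "O": 2}   # assumed toy valence table


def path_lengths(n_atoms, bonds, start):
    """Breadth-first search: bonds on the shortest path from start to each reachable atom."""
    nbrs = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in nbrs[i]:
            if j not in dist:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist


def ring_closure_pairs(atoms, bonds, min_ring=3, max_ring=7):
    """Return (atomA, atomB, ring size) for every admissible single-bond closure."""
    free = [VALENCE[el] - sum(o for pair, o in bonds.items() if i in pair)
            for i, el in enumerate(atoms)]
    pairs = []
    for a in range(len(atoms)):
        if free[a] < 1:                          # OkToConnect(atomA)
            continue
        dist = path_lengths(len(atoms), bonds, a)
        for b in range(a + 1, len(atoms)):       # Index(atomA) < Index(atomB)
            if (a, b) in bonds or (b, a) in bonds:   # NotConnected
                continue
            if free[b] < 1 or b not in dist:     # OkToConnect(atomB), same component
                continue
            ring_size = dist[b] + 1              # atoms in the ring the new bond closes
            if min_ring <= ring_size <= max_ring:
                pairs.append((a, b, ring_size))
    return pairs
```

For a six-carbon chain with bonds 0-1 through 4-5 and ring sizes limited to 5 and 6, the function returns [(0, 4, 5), (0, 5, 6), (1, 5, 5)], that is, two cyclopentane closures and one cyclohexane closure.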