Daylight's SMILES Toolkit was used to build the structure generator; pseudo-code is listed in the appendix. Daylight's SMARTS Toolkit was used to identify undesired chemical patterns, and Daylight's FINGERPRINT Toolkit was used to evaluate similarity to compounds in the following databases: ACD03.4, BioScrNP99, BioScrSC99, MedChem03, NatProd03, NCI00, QSAR03, Spresi95, SpresiRxn98, TCM01, and WDI03.4. Canonical (structurally unique) representations were produced with dt_cansmiles(), and isomorphic duplicates were rejected by looking up each canonical SMILES in an in-memory hash table. Drug databases were profiled by average molecular weight, number of bonds per non-terminal atom, number of cycles per molecule, number of cycles per ring atom, and atom and bond type composition. The mean and two standard deviations of each metric were used to bias the algorithm towards ``drug-like'' structures. Structures were evaluated using a ``Simple Metric for Molecular Complexity'' developed previously.[1] Each molecule was recorded with a canonical string representing how the structure was assembled.
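The profiling step can be pictured as an acceptance window of the mean plus or minus two standard deviations per metric. The text does not say how the window biases the generator, so treating it as a hard pass/fail filter is an assumption, as are the metric names, the toy reference values, and the choice of the sample standard deviation below. This is a minimal sketch, not the GENSMI implementation.

```python
from statistics import mean, stdev

# Hypothetical toy reference set. The real profile also covers bonds per
# non-terminal atom, cycles per ring atom and atom/bond type composition.
REFERENCE = [
    {"mol_weight": 300.0, "cycles": 2},
    {"mol_weight": 350.0, "cycles": 3},
    {"mol_weight": 400.0, "cycles": 2},
    {"mol_weight": 450.0, "cycles": 3},
]


def profile(reference, k=2.0):
    """Return {metric: (low, high)} spanning mean +/- k sample standard deviations."""
    windows = {}
    for name in reference[0]:
        values = [mol[name] for mol in reference]
        centre, spread = mean(values), stdev(values)
        windows[name] = (centre - k * spread, centre + k * spread)
    return windows


def within_profile(candidate, windows):
    """Assumed bias rule: accept a candidate only if every metric lies inside its window."""
    return all(low <= candidate[name] <= high for name, (low, high) in windows.items())


if __name__ == "__main__":
    windows = profile(REFERENCE)
    print(windows)
    print(within_profile({"mol_weight": 420.0, "cycles": 3}, windows))  # True
    print(within_profile({"mol_weight": 700.0, "cycles": 2}, windows))  # False
```

A softer variant could down-weight rather than reject out-of-window candidates; which form the original generator used is not stated here.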
First, two control experiments were performed to verify that GENSMI was programmed correctly. Because the numbers of unique acyclic and monocyclic isomers of a given molecular formula have been computed previously, GENSMI was tested by generating all isomers from C7 to C20 and comparing the counts against the published values. The experiment involves one node type (carbon), one edge type (single bond), and one to four connections per atom. Open valences are filled with a null node (an implicit hydrogen). The number of graph-isomorphism checks was counted to measure the redundancy of the algorithm, and the total number of graphs generated without isomorphic pruning was counted to assess the value of dt_cansmiles().
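To make the acyclic control experiment concrete, the sketch below grows carbon skeletons one atom at a time (carbon nodes, single-bond edges, at most four connections per atom) and prunes duplicates with a hash table, counting every lookup as an isomorphism check. It is an illustration, not GENSMI: dt_cansmiles() is replaced by an AHU-style canonical code for trees, an assumption that only holds for acyclic skeletons, so the monocyclic case is not covered. The function names canonical() and enumerate_alkanes() are hypothetical.

```python
def _encode(adj, node, parent):
    # AHU code of the subtree rooted at `node`: sorted child codes wrapped in parentheses.
    children = sorted(_encode(adj, nb, node) for nb in adj[node] if nb != parent)
    return "(" + "".join(children) + ")"


def canonical(adj):
    """Canonical string for an unrooted tree (adjacency list); stand-in for dt_cansmiles()."""
    n = len(adj)
    if n == 1:
        return "()"
    degree = [len(nbrs) for nbrs in adj]
    leaves = [v for v in range(n) if degree[v] == 1]
    remaining = n
    # Peel leaves layer by layer until the one or two centre atoms remain.
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
        leaves = next_leaves
    # Root the tree at each centre and keep the smallest code.
    return min(_encode(adj, c, -1) for c in leaves)


def enumerate_alkanes(max_carbons, max_connections=4):
    """Return {n: (unique_isomers, hash_lookups)} for n = 1..max_carbons."""
    level = {"()": [[]]}  # methane: one carbon, no C-C bonds
    results = {1: (1, 0)}
    for n in range(2, max_carbons + 1):
        seen = {}  # in-memory hash table: canonical key -> graph
        lookups = 0
        for adj in level.values():
            for atom in range(len(adj)):
                if len(adj[atom]) >= max_connections:
                    continue  # no open valence (no implicit hydrogen left)
                child = [list(nbrs) for nbrs in adj] + [[atom]]
                child[atom].append(len(adj))  # bond the new carbon to `atom`
                lookups += 1  # one isomorphism check per candidate graph
                key = canonical(child)
                if key not in seen:
                    seen[key] = child
        results[n] = (len(seen), lookups)
        level = seen
    return results


if __name__ == "__main__":
    for n, (unique, lookups) in enumerate_alkanes(12).items():
        print(f"C{n}: {unique} isomers, {lookups} hash-table lookups")
```

The isomer counts for C7 to C10 come out as 9, 18, 35 and 75, the known numbers of alkane constitutional isomers, so a comparison like the one described above can catch generator bugs. The gap between lookups and unique isomers gives a rough sense of the redundancy the hash table absorbs.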