Leads From Large Databases

Talk given at the Daylight User Meeting, 17 December 1996
By A. Gobbi (Ciba-Geigy AG, Basel)

Table of Contents

A Method to Find More Leads?

How to Select Next Sample?

The "Most Common Substructures" method:

The "Genetic" method:

A Few Words on Similarity

Tanimoto Similarity:

Asymmetric Similarity

Does it Work ?!

Test data set from the NCI database:

Results: "Most Common Substructures"

Results: The Structures

Results: "Genetic"

Results: Tanimoto

Optimize the Optimization?!

Leads From Large Databases

More leads by design? No!

A Method to Find More Leads?

What do you do at the bench? Optimize your biological activity!

Is it for free? No: iterate selection and experiments.
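The iterate-select-test cycle can be sketched as a loop. This is a minimal sketch under stated assumptions: a table of known activities stands in for the bench experiment (as when replaying a test data set), and `select` is any selection method that proposes the next m compounds from the current best ones. The names `optimize`, `random_select`, and all parameters are hypothetical, and the random selector is only a placeholder.

```python
from __future__ import annotations

import random
from typing import Callable

# Hypothetical selection interface: (best ids, untested ids, m) -> ids to test next.
Selector = Callable[[list, list, int], list]


def optimize(activity: dict[str, float], select: Selector, start: list[str],
             m: int, cycles: int, n_best: int) -> list[str]:
    """Iterate selection and (simulated) experiments; return the n_best compounds found."""
    tested = {cid: activity[cid] for cid in start}  # results of the first experiment
    for _ in range(cycles):
        best = sorted(tested, key=tested.get, reverse=True)[:n_best]
        untested = [cid for cid in activity if cid not in tested]
        if not untested:
            break
        for cid in select(best, untested, m):
            tested[cid] = activity[cid]  # table lookup stands in for the assay
    return sorted(tested, key=tested.get, reverse=True)[:n_best]


rng = random.Random(0)


def random_select(best: list[str], untested: list[str], m: int) -> list[str]:
    """Placeholder selector: ignores `best` and picks m untested compounds at random."""
    return rng.sample(untested, min(m, len(untested)))


# Toy data: 50 hypothetical compounds with activities 0..6.
activity = {f"c{i}": float(i % 7) for i in range(50)}
print(optimize(activity, random_select, start=["c0", "c1"], m=5, cycles=4, n_best=3))
```

A real method would replace `random_select` with a selector that uses the fingerprints of the best compounds found so far.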

How to Select Next Sample?

The "Most Common Substructures" method:

1. Get the fingerprints of all your best compounds.
2. Set a bit in the search fingerprint if at least n% of these fingerprints have that bit set.
3. Use the m compounds most similar to this search fingerprint as the next sample.
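The three steps can be sketched with fingerprints stored as Python integers used as bit sets. This is a minimal sketch, assuming that `library` holds only compounds not yet tested and that the asymmetric similarity is the fraction of the search fingerprint's bits also set in the candidate (an assumed definition, chosen because it rewards candidates that contain the common substructures). The function names and the toy 4-bit fingerprints are hypothetical.

```python
from __future__ import annotations


def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def asymmetric(query: int, target: int) -> float:
    """Assumed definition: fraction of the query's bits that are also set in the target."""
    n = popcount(query)
    return popcount(query & target) / n if n else 0.0


def consensus_fingerprint(best_fps: list[int], n_percent: float, n_bits: int) -> int:
    """Step 2: set bit i if at least n_percent % of the best fingerprints have bit i set."""
    threshold = n_percent / 100.0 * len(best_fps)
    consensus = 0
    for i in range(n_bits):
        if sum((fp >> i) & 1 for fp in best_fps) >= threshold:
            consensus |= 1 << i
    return consensus


def next_sample(consensus: int, library: dict[str, int], m: int) -> list[str]:
    """Step 3: return the m library compounds most similar to the search fingerprint."""
    ranked = sorted(library, key=lambda cid: asymmetric(consensus, library[cid]), reverse=True)
    return ranked[:m]


# Toy 4-bit fingerprints (hypothetical data).
best = [0b1011, 0b0011, 0b0111, 0b1001]
search = consensus_fingerprint(best, n_percent=75, n_bits=4)  # bits 0 and 1 -> 0b0011
library = {"A": 0b0111, "B": 0b1000, "C": 0b0001}
print(next_sample(search, library, m=2))  # ['A', 'C']
```

With n = 75%, only bits present in at least three of the four best fingerprints survive, so compound A (which contains both) ranks first.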

The "Genetic" method:

1. Get the fingerprints of a pair of your best compounds.
2. Create a new fingerprint from them by crossover.
3. Add the compound most similar to the new fingerprint to your new sample set.
4. Repeat steps 1-3 until you have m compounds.
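Steps 1-3 and their repetition can be sketched as follows. The talk does not specify the crossover operator or the similarity measure for this method, so this sketch assumes a single-point crossover on the bit string and Tanimoto similarity; it also removes each chosen compound from the candidates so that the m compounds are distinct. All names and the toy 8-bit fingerprints are hypothetical.

```python
from __future__ import annotations

import random


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit-set fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def crossover(fp_a: int, fp_b: int, n_bits: int, rng: random.Random) -> int:
    """Assumed single-point crossover: bits below a random cut from fp_a, the rest from fp_b."""
    cut = rng.randrange(1, n_bits)
    low_mask = (1 << cut) - 1
    return (fp_a & low_mask) | (fp_b & ~low_mask)


def genetic_sample(best_fps: list[int], library: dict[str, int], m: int,
                   n_bits: int, rng: random.Random) -> list[str]:
    """Pick up to m distinct compounds, each the closest match to a crossover child."""
    candidates = dict(library)  # untested compounds still available
    chosen: list[str] = []
    while len(chosen) < m and candidates:
        parent_a, parent_b = rng.sample(best_fps, 2)  # step 1: pair of best compounds
        child = crossover(parent_a, parent_b, n_bits, rng)  # step 2: new fingerprint
        cid = max(candidates, key=lambda c: tanimoto(child, candidates[c]))  # step 3
        chosen.append(cid)
        del candidates[cid]  # avoid choosing the same compound twice
    return chosen


rng = random.Random(42)
best = [0b11010010, 0b10110001, 0b11100011]
library = {"A": 0b11000011, "B": 0b00111100, "C": 0b11110000, "D": 0b00000011}
print(genetic_sample(best, library, m=2, n_bits=8, rng=rng))
```

Other operators (uniform crossover, added mutation) would fit the same loop; only `crossover` would change.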

A Few Words on Similarity

Tanimoto Similarity:

[Figure: example structures; Tanimoto similarity 0.79]

Asymmetric Similarity

[Figure: example structures; asymmetric similarity 0.85]

If structure 1 is a substructure of structure 2, then Asymmetric(1,2) = 1.0.
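Both measures can be written on fingerprints stored as Python integers used as bit sets. Tanimoto similarity is the number of bits common to both fingerprints divided by the number of bits set in either. For the asymmetric similarity, this sketch assumes the fraction of the bits of fingerprint 1 that are also set in fingerprint 2 (a Tversky index with weights 1 and 0); with path-based fingerprints a substructure's bits are a subset of its superstructure's bits, which gives the stated value of 1.0. The function names and toy 8-bit fingerprints are hypothetical.

```python
def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(fp1: int, fp2: int) -> float:
    """Common bits divided by bits set in either fingerprint (symmetric)."""
    union = popcount(fp1 | fp2)
    return popcount(fp1 & fp2) / union if union else 0.0


def asymmetric(fp1: int, fp2: int) -> float:
    """Assumed definition: fraction of the bits of fp1 that are also set in fp2."""
    n1 = popcount(fp1)
    return popcount(fp1 & fp2) / n1 if n1 else 0.0


# Toy fingerprints: every bit of `query` is also set in `target`.
query = 0b00101100
target = 0b01101110
print(tanimoto(query, target))    # 0.6  (3 common bits / 5 bits in the union)
print(asymmetric(query, target))  # 1.0  (all 3 query bits found in target)
print(asymmetric(target, query))  # 0.6  (only 3 of 5 target bits found in query)
```

The last two lines show why the measure is called asymmetric: swapping the arguments changes the value.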

Does it Work ?!

Test data set from the NCI database:

Number of Compounds: 19 719

Smallest: -4 (4 Compounds)

Largest: 13.3 (1 Compound)

Results: "Most Common Substructures"

[Figure: progress of the optimization; labels: Starting-Point 2.0, 2.49]

(Asymmetric similarity; bits set if present in 75% of the fingerprints)

Results: The Structures

Results: "Genetic"

2 random structures added per cycle; structures removed after 10 cycles

Results: Tanimoto

Most Common Substructures: bits set if present in 75% of the selected molecules.
Genetic: 2 random structures added per cycle; structures removed after 10 cycles.

Optimize the Optimization?!

Dieter Poppinger

Bernd Rohde

L. Weber, S. Wallbaum, C. Broger, K. Gubernator, Angew. Chem. 1995, 107, 2452.

N. E. Shemetulskis, D. Weininger, C. J. Blakley, J. J. Yang, C. Humblet, J. Chem. Inf. Comput. Sci. 1996, 36, 862.

NCI-Database: http://epnws1.ncifcrf.gov:2345/
