Leads From Large Databases

Talk given at the Daylight User Meeting, 17 December 1996
By A. Gobbi (Ciba Geigy AG Basel)

Table of Contents

Leads From Large Databases

  • A Method to Find More Leads?
  • How to Select Next Sample?
  • The "Most Common Substructures" method:
  • The "Genetic" method:
  • A Few Words on Similarity
  • Tanimoto Similarity:
  • Asymmetric Similarity
  • Does it Work ?!
  • Test data set from the NCI database:
  • Results: "Most Common Substructures"
  • Results: The Structures
  • Results: "Genetic"
  • Results: Tanimoto
  • Optimize the Optimization?!

  • Leads From Large Databases

    More leads by design? No!


    A Method to Find More Leads?

    What do you do at the bench?
    Optimize your biological activity!
    Is it for free?
    No: iterate selection and experiments.


    How to Select Next Sample?

    The "Most Common Substructures" method:

    1. Get the fingerprints of all your best compounds.
    2. Set a bit in the search fingerprint if at least n% of these fingerprints have that bit set.
    3. Use the m compounds most similar to this fingerprint for the next sample.
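
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the talk: fingerprints are modelled as sets of "on" bit positions, the threshold is read as "at least n%", and the names consensus_fingerprint, select_next_sample and the toy data are hypothetical. Tanimoto is used as the default similarity, but any similarity function can be passed in.

```python
from collections import Counter

def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def consensus_fingerprint(best_fps, n_percent):
    """Step 2: set a bit if at least n_percent of the fingerprints have it set."""
    counts = Counter(bit for fp in best_fps for bit in fp)
    threshold = n_percent / 100.0 * len(best_fps)
    return {bit for bit, count in counts.items() if count >= threshold}

def select_next_sample(database, best_fps, n_percent, m, similarity=tanimoto):
    """Step 3: ids of the m database compounds most similar to the consensus."""
    query = consensus_fingerprint(best_fps, n_percent)
    ranked = sorted(database,
                    key=lambda cid: similarity(query, database[cid]),
                    reverse=True)
    return ranked[:m]

# Toy data (hypothetical): fingerprints of the four best compounds so far
best = [{1, 2, 3}, {1, 2, 4}, {1, 3, 5}, {1, 2, 3, 6}]
# Toy database (hypothetical): compound id -> fingerprint
database = {"A": {1, 2, 3}, "B": {1, 2}, "C": {7, 8}, "D": {1, 2, 3, 9}}

query = consensus_fingerprint(best, 75)
next_sample = select_next_sample(database, best, 75, m=2)
print(sorted(query), next_sample)
```

With n = 75 only bits 1, 2 and 3 survive in the toy data, since each is set in at least three of the four fingerprints.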


    The "Genetic" method:

    1. Get fingerprints for a pair of your best compounds.
    2. Create a new fingerprint by crossover.
    3. Include the most similar compound into your new sample set.
    4. Repeat steps 1-3 to get m compounds.
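
A minimal sketch of this loop follows. The slide only says "crossover", so the one-point crossover used here is an assumption, as are the 16-bit fingerprint length, the rule that a compound is picked at most once, and all function names. Fingerprints are modelled as Python integers used as bit strings.

```python
import random

N_BITS = 16  # hypothetical fingerprint length, for illustration only

def tanimoto(a, b):
    """Tanimoto coefficient between two integer bit strings."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def crossover(fp1, fp2, cut, n_bits=N_BITS):
    """One-point crossover (assumed): low `cut` bits from fp1, the rest from fp2."""
    low_mask = (1 << cut) - 1
    full_mask = (1 << n_bits) - 1
    return (fp1 & low_mask) | (fp2 & full_mask & ~low_mask)

def genetic_sample(database, best_fps, m, rng, similarity=tanimoto):
    """Repeat steps 1-3 until m distinct compounds have been chosen."""
    chosen = []
    candidates = dict(database)  # copy, so picked compounds can be removed
    while len(chosen) < m and candidates:
        parent1, parent2 = rng.sample(best_fps, 2)                 # step 1
        child = crossover(parent1, parent2,
                          rng.randrange(1, N_BITS))                # step 2
        nearest = max(candidates,
                      key=lambda cid: similarity(child, candidates[cid]))
        chosen.append(nearest)                                     # step 3
        del candidates[nearest]
    return chosen

# Toy data (hypothetical)
best = [0xF0F0, 0x0F0F, 0xFF00]
database = {"A": 0x0FF0, "B": 0xF00F, "C": 0x00FF, "D": 0xFFFF}
sample = genetic_sample(database, best, m=2, rng=random.Random(0))
print(sample)
```

Passing the random generator in explicitly keeps a run reproducible, which helps when comparing selection strategies on the same data set.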




    A Few Words on Similarity

    Tanimoto Similarity:

    0.79
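
For reference, the Tanimoto coefficient of two fingerprints is c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The sketch below models fingerprints as sets of bit positions; the example fingerprints are made up and are unrelated to the pair of structures that gave the 0.79 above.

```python
def tanimoto(fp1, fp2):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    a = len(fp1)            # bits set in fingerprint 1
    b = len(fp2)            # bits set in fingerprint 2
    c = len(fp1 & fp2)      # bits set in both
    denominator = a + b - c
    return c / denominator if denominator else 0.0

# Hypothetical fingerprints: 2 shared bits out of 6 distinct bits
fp_x = {1, 2, 3, 4}
fp_y = {3, 4, 5, 6}
t_xy = tanimoto(fp_x, fp_y)
t_yx = tanimoto(fp_y, fp_x)
print(round(t_xy, 3), t_xy == t_yx)
```

The measure is symmetric: swapping the two fingerprints gives the same value.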



    Asymmetric Similarity

    0.85

    If 1 is a substructure of 2 Asymmetric(1,2) = 1.0.
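
The slide does not give the formula. One definition consistent with the property stated above is c / a: the fraction of the bits set in fingerprint 1 that are also set in fingerprint 2. The sketch below assumes that definition. It relies on the usual property of substructure-based fingerprints that every bit set for a substructure is also set for any structure containing it. Fingerprints are modelled as sets of bit positions and the example data is hypothetical.

```python
def asymmetric(fp1, fp2):
    """Assumed definition: fraction of the bits of fp1 that are also set in fp2."""
    a = len(fp1)            # bits set in fingerprint 1
    c = len(fp1 & fp2)      # bits set in both
    return c / a if a else 0.0

# Hypothetical fingerprints: every bit of `small` is also set in `large`,
# as would be the case when `small` is a substructure of `large`.
small = {1, 2}
large = {1, 2, 3, 4}

forward = asymmetric(small, large)   # all bits of small are found in large
backward = asymmetric(large, small)  # only half the bits of large are in small
print(forward, backward)
```

Unlike Tanimoto, the order of the arguments matters: the small fingerprint is fully contained in the large one, but not the other way round.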


    Does it Work ?!

    Test data set from the NCI database:


    Number of Compounds: 19 719
    Smallest: -4 (4 Compounds)
    Largest: 13.3 (1 Compound)


    Results: "Most Common Substructures"

    Starting-Point

    2.0

    2.49

    (Asymmetric similarity; 75% Fingerprints)


    Results: The Structures


    Results: "Genetic"

    2 random structures added per cycle; structures removed after 10 cycles

    Results: Tanimoto

    Most Common Substructures
    Genetic
    (Bits set if present in 75% of the selected molecules)
    2 random structures added per cycle; structures removed after 10 cycles


    Optimize the Optimization?!



    Dieter Poppinger

    Bernd Rohde





    L. Weber, S. Wallbaum, C. Broger, K. Gubernator, Angew. Chem. 1995, 107, 2452

    N. E. Shemetulskis, D. Weininger, C. J. Blakley, J. J. Yang, C. Humblet, J. Chem. Inf. Comput. Sci. 1996, 36, 862

    NCI-Database: http://epnws1.ncifcrf.gov:2345/