ENHANCING THE DIVERSITY OF A CORPORATE DATABASE USING CHEMICAL DATABASE CLUSTERING AND ANALYSIS

SHEMETULSKIS NE, DUNBAR JB JR, DUNBAR BW, MORELAND DW, HUMBLET C

Parke-Davis Pharmaceutical Research Division of the Warner Lambert Company, 2800 Plymouth Road, Ann Arbor, MI 48105

The contribution that the Chemical Abstracts structural database (CAST-3D) and the Maybridge database (MAY) would make to diversifying the structural information and property space spanned by our corporate database (CBI) is assessed. A subset of the CAST-3D database has been selected to augment the structural diversity of various electronic databases used in computer-assisted drug design projects. Analysis of the MAY database not only offers a direct route to expanding the CBI compound library but also provides a source of structural diversity in a format suitable for computer-assisted database searching and molecular design. The analysis performed is twofold. First, a non-hierarchical clustering technique available in the Daylight clustering package is applied to evaluate the structural differences between databases. The comparison is then extended to analyze various structure-derived property spaces calculated from molecular descriptors such as the logarithm of the octanol-water partition coefficient (CLOGP), the molar refractivity (CMR) and the electronic dipole moment (CDM). The diversity contribution of each database to these property spaces is quantified in relation to our corporate database.
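The abstract does not say how the diversity contribution to a property space is quantified. One simple way to picture such a comparison, offered here only as a sketch and not as the authors' method, is to divide the (CLOGP, CMR, CDM) space into grid cells and count the cells that a candidate database occupies but the corporate database does not. The function names, bin widths and descriptor values below are all hypothetical.

```python
from math import floor


def occupied_cells(compounds, bin_widths):
    """Map each (CLOGP, CMR, CDM) tuple to a grid cell; return the set of occupied cells."""
    return {
        tuple(floor(value / width) for value, width in zip(props, bin_widths))
        for props in compounds
    }


def added_cells(corporate, candidate, bin_widths):
    """Cells occupied by the candidate database but not by the corporate one."""
    return occupied_cells(candidate, bin_widths) - occupied_cells(corporate, bin_widths)


# Made-up descriptor values (CLOGP, CMR, CDM), for illustration only.
corporate = [(1.2, 7.5, 2.1), (1.4, 7.9, 2.3), (3.8, 10.2, 0.5)]
candidate = [(1.3, 7.6, 2.2), (5.6, 12.4, 4.0), (-0.7, 4.1, 1.0)]
bin_widths = (1.0, 2.0, 1.0)  # assumed cell size along each descriptor axis

new_cells = added_cells(corporate, candidate, bin_widths)
print(len(new_cells), "cells occupied only by the candidate set")
```

In this toy comparison the first candidate compound falls in a cell the corporate set already covers, so only the other two count as new regions of property space. The choice of bin widths strongly affects such a count and would need to be justified for real data.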

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN,9,1995,407-416