Tom Doman
G. D. Searle
Daylight User's Group Meeting
2/27/96 - 3/1/96
Santa Fe, New Mexico
Limitations of Jarvis-Patrick Clustering
Requirements for an Improved Clustering Algorithm
Put Jarvis-Patrick to work on those areas of the inventory where it works best, then "fuzzy cluster" around the J.P. clusters.
STEP 1:
Choose a subset of the compound space which consists only of those compounds which have a sufficient number of "close" neighbors.
STEP 2:
Apply the J-P algorithm to this subset using a number of nearest neighbors which would insure that the nearest neighbor list for each compound consisted only of "close" neighbors. (No undesired "chaining" can occur.)
STEP 3:
360 Create a fuzzy clustering of the entire database by assigning p's to each compound which describe the "degree of membership" to each of the clusters determined in STEP 2.
STEP 4:
Determine which compounds do not belong to any of the existing clusters on the basis of their p-values. If "almost all" compounds already "belong" to some cluster, STOP.
STEP 5:
Determine a new "sufficient number" to be used in STEP 1. Repeat the process with STEPS 1 and 2 only operating upon those compounds identified in STEP 4.
STEP 1:
Choose a subset of the compound space which consists only of those compounds which have a sufficient number of "close" neighbors.
STEP 2:
Apply the J-P algorithm to this subset using a number of nearest neighbors which would insure that the nearest neighbor list for each compound consisted only of "close" neighbors. (No undesired "chaining" can occur.)
Perform Jarvis-Patrick clustering on this subset, using fairly restrictive parameters (e.g. near=20, need=15); refer to resulting clusters as "crisp".
STEP 3:
Create a fuzzy clustering of the entire database by assigning p's to each compound which describe the "degree of membership" to each of the clusters determined in STEP 2.
STEP 4:
Determine which compounds do not belong to any of the existing clusters on the basis of their p-values. If "almost all" compounds already "belong" to some cluster, STOP.
STEP 5:
Determine a new "sufficient number" to be used in STEP 1. Repeat the process with STEPS 1 and 2 only operating upon those compounds identified in STEP 4.
Collect the next small portion of unclustered compounds and loop back through step 3 until finished.
John Cibulskis
Mike Cibulskis
Patrick McCray
Compounds shown in figures are from the "Master File":
Hansch, C.; Leo, A.; Hoekman, D. Exploring QSAR. Hydrophobic, Electronic, and Steric Constants.; American Chemical Society: Washington, D.C., 1995.