Ipcress
Icosahedral permuted comparison of relative shape similarity
Euromug '96
Synopsis
The goal of this project is to develop molecular surface shape
descriptors, examine the utility of various similarity metrics,
and explore the possibility of building a high-speed search method
for molecular shapes.
Status
Fast methods for generating shape descriptors based on tessellated
icosahedra and X-based development applications have been implemented.
Current tasks include design of the search algorithm,
evaluation of optimization methods,
and integration of surface properties.
Results are promising so far except that our current approach for
integrating surface properties plays havoc with many of the most
powerful speed optimizations.
People
- Dave Weininger (lead and design)
- Anthony Nichols (grid-based algorithms)
- Mark Hermsmeier (concept and collaborator)
Description
The IPCRESS strategy is to generate descriptions of molecular surfaces
based on a tessellated icosahedron and to use these for fast comparisons of
molecular shape.
The general approach is similar to that used in the BURST program,
but IPCRESS works from the inside out instead of vice versa.
The method used for generating shape descriptors is essentially the same
as presented at MUG '96, in theory:
- normalize orientation of conformation
- generate molecular surface
- find center of molecular surface
- orient centered tessellated icosahedron
- extend rays from vertices to last molecular surface intersection
- shape descriptor is an ordered list of ray distances
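The ray-extension step above can be sketched as follows. This is a minimal illustration, not the IPCRESS code: it assumes the molecular surface is the union of atom-centered spheres, that the center is the mean of the atomic coordinates, and that the ray directions are supplied as unit vectors. The names `last_intersection` and `shape_descriptor` are hypothetical.

```python
import math

def last_intersection(direction, atoms):
    """Distance from the origin to the farthest point where the ray exits
    any atom sphere. `atoms` is a list of ((x, y, z), radius) pairs already
    centered on the ray origin; `direction` must be a unit vector."""
    dx, dy, dz = direction
    best = 0.0
    for (cx, cy, cz), r in atoms:
        b = dx * cx + dy * cy + dz * cz            # projection of center on ray
        disc = b * b - (cx * cx + cy * cy + cz * cz) + r * r
        if disc < 0.0:
            continue                                # ray misses this sphere
        t = b + math.sqrt(disc)                     # far (exit) intersection
        if t > best:
            best = t
    return best

def shape_descriptor(directions, atoms):
    """Ordered list of ray distances, one per direction.
    Assumption: the center is the mean of the atomic coordinates."""
    n = len(atoms)
    center = [sum(a[0][k] for a in atoms) / n for k in range(3)]
    centered = [((x - center[0], y - center[1], z - center[2]), r)
                for (x, y, z), r in atoms]
    return [last_intersection(d, centered) for d in directions]
```

Taking the largest exit distance over all spheres gives the last crossing of the union surface, because a farther exit point cannot lie inside any other sphere.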
Methods used to compare molecular shape descriptors are also similar to
those previously reported, in general:
- given two molecular shape descriptors
- iterate over all unique, rigid/mirrored, isomorphic rotations (120)
- compute metric (e.g., RMS) of vector element differences
- shape similarity is the lowest metric (e.g., RMS)
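The comparison loop above can be sketched as follows. For brevity the sketch uses the 12 vertices of an untessellated icosahedron rather than a tessellated one, and it builds the 120 vertex permutations by closing three generators (a 5-fold rotation, a 3-fold rotation, and inversion). All function names are hypothetical, and the construction is one possible way to obtain the permutation table, not necessarily the one used in IPCRESS.

```python
import math
from itertools import product

PHI = (1 + math.sqrt(5)) / 2

def icosahedron_vertices():
    """The 12 vertices: cyclic permutations of (0, +-1, +-PHI)."""
    verts = []
    for a, b in product((-1.0, 1.0), repeat=2):
        verts += [(0.0, a, b * PHI), (a, b * PHI, 0.0), (b * PHI, 0.0, a)]
    return verts

def rotate_about(axis, angle):
    """Return a function rotating a point about `axis` (Rodrigues formula)."""
    n = math.sqrt(sum(c * c for c in axis))
    kx, ky, kz = (c / n for c in axis)
    cs, sn = math.cos(angle), math.sin(angle)
    def f(v):
        x, y, z = v
        dot = kx * x + ky * y + kz * z
        cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x
        return (x * cs + cx * sn + kx * dot * (1 - cs),
                y * cs + cy * sn + ky * dot * (1 - cs),
                z * cs + cz * sn + kz * dot * (1 - cs))
    return f

def as_permutation(f, verts):
    """p[i] = index of the vertex that f sends vertex i to."""
    return tuple(min(range(len(verts)), key=lambda j: math.dist(f(v), verts[j]))
                 for v in verts)

def symmetry_permutations(verts):
    """Close the generators under composition; yields 120 permutations."""
    gens = [as_permutation(rotate_about(verts[0], 2 * math.pi / 5), verts),
            as_permutation(lambda v: (v[1], v[2], v[0]), verts),
            as_permutation(lambda v: (-v[0], -v[1], -v[2]), verts)]
    identity = tuple(range(len(verts)))
    seen, frontier = {identity}, [identity]
    while frontier:
        p = frontier.pop()
        for g in gens:
            q = tuple(g[i] for i in p)      # apply p, then g
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return sorted(seen)

def shape_distance(a, b, perms):
    """Lowest RMS of element differences over all symmetry permutations."""
    best = float("inf")
    for p in perms:
        sq = sum((a[i] - b[p[i]]) ** 2 for i in range(len(a)))
        best = min(best, math.sqrt(sq / len(a)))
    return best
```

The trigonometry is needed only once, when the permutation table is built; each comparison afterwards is pure index shuffling.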
This method is theoretically amenable to very profound speed optimization.
Exact shape descriptors are easy to compute (hundreds per second);
approximate descriptors can be computed at amazing speeds using a grid
method (~1000 operations per molecule).
The rotational iteration is intrinsically fast because
vector permutation can be used instead of transcendental methods
(this is the main advantage of the icosahedral algorithm).
Because there are no "real" geometry calculations,
the whole search can be done with integer arithmetic.
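A minimal sketch of an all-integer comparison follows. It assumes ray distances are quantized to a fixed step (0.01 here is an arbitrary choice) and that a table of index permutations is already available. It minimizes the sum of squared differences, which ranks candidates the same way as RMS without taking a square root. The early exit is an added illustration, and the names `quantize` and `best_ssd` are hypothetical.

```python
def quantize(distances, step=0.01):
    """Round ray distances to integer multiples of `step` (assumed unit)."""
    return [int(round(d / step)) for d in distances]

def best_ssd(a, b, perms):
    """Smallest integer sum of squared differences over all permutations.
    `a` and `b` are integer descriptors; each p in `perms` maps index i
    of `a` to index p[i] of `b`."""
    best = None
    for p in perms:
        total = 0
        for i, ai in enumerate(a):
            diff = ai - b[p[i]]
            total += diff * diff
            if best is not None and total >= best:
                break               # cannot beat the current best; abandon
        else:
            best = total            # loop finished, so total is a new minimum
    return best
```

For example, with cyclic shifts of a four-element vector standing in for the real permutation table:

```python
perms = [tuple((i + k) % 4 for i in range(4)) for k in range(4)]
best_ssd([100, 200, 300, 400], [210, 300, 400, 100], perms)
```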
The descriptors are similar to fingerprints and are amenable to many of
the same speed optimizations used in merlinserver.
This is the main window of the development program xvip:
The following examples are from an experiment using selected
conformations from the medchem95c database.
The idea was to determine whether ipcress could find similarities in
"3-D" molecular shape among a small set of structures which differed
in their "2-D" connectivity.
Starting with standard medchem95c data
(using conformations and clusters in the standard distribution),
structures were selected which were the "centroid" of 2D-based J-P clusters
and which were also present in wdi95 with known mechanism of action,
indications, and at least 10 tradenames.
Molecular shape descriptors were generated by
computing 162 exact ray intersection distances from the atomic geometric
mean to the VDW surface using non-unified radii (these computations were
done with hydrogens).
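One tessellation that yields exactly 162 vertices is two rounds of edge-midpoint subdivision of an icosahedron (12, then 42, then 162 vertices), with each new point projected onto the unit sphere. Whether this is the tessellation used for the experiment is an assumption; the sketch below only shows that 162 ray directions can be produced this way. All function names are hypothetical.

```python
import math
from itertools import combinations, product

PHI = (1 + math.sqrt(5)) / 2

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def icosahedron():
    """Unit-length vertices and the 20 triangular faces (as index triples)."""
    verts = []
    for a, b in product((-1.0, 1.0), repeat=2):
        verts += [(0.0, a, b * PHI), (a, b * PHI, 0.0), (b * PHI, 0.0, a)]
    # Edges have length 2 in these coordinates; a face is any triple of
    # mutually adjacent vertices.
    def near(i, j):
        return abs(math.dist(verts[i], verts[j]) - 2.0) < 1e-9
    faces = [f for f in combinations(range(12), 3)
             if all(near(i, j) for i, j in combinations(f, 2))]
    return [unit(v) for v in verts], faces

def subdivide(verts, faces):
    """Split every triangle into four, sharing midpoints between neighbors."""
    verts = list(verts)
    index = {}
    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in index:
            m = unit(tuple((p + q) / 2 for p, q in zip(verts[i], verts[j])))
            index[key] = len(verts)
            verts.append(m)
        return index[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces

def ray_directions(rounds=2):
    """Unit vectors through the vertices of the tessellated icosahedron."""
    verts, faces = icosahedron()
    for _ in range(rounds):
        verts, faces = subdivide(verts, faces)
    return verts
```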
Results were evaluated by comparing various metrics across all possible
inter-conformational comparisons.
Graphical displays were used only to verify reasonable operation,
and are shown here for purposes of illustration.
Although the structures were selected to represent clusters of differing
connectivity, some pairs remain which are similar by all other criteria,
e.g., ofloxacin and norfloxacin:
Graphical representation of shape descriptors
for these conformations of ofloxacin and norfloxacin:
Display of ipcress match:
overlaid conformations (offset is due to geometric mean centering)
and deviation in ray distances (white stars are ofloxacin intersections).
Sulindac is similar to ofloxacin in shape but not in connectivity.
Here is the matching conformation of sulindac and its shape descriptor:
Overlaid conformations and shape descriptors of ofloxacin and sulindac
(note that the match was for sulindac's mirror image):
Surface properties can be included in the similarity metric,
but it's hard to determine a "correct" scaling metric.
Using simple connectivity-based charges as surface properties,
and counting normalized shape and surface errors equally,
the best match to testosterone in this set is stanolone:
Testosterone and stanolone conformations are shown as matched by ipcress,
coloring by surface property.
The testosterone-oxymetazoline match, which is close in shape
but not in surface property, is also shown.
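One way to count normalized shape and surface errors equally is sketched below. The normalization is not specified above, so the sketch assumes each RMS error is divided by a caller-supplied scale (`shape_scale`, `prop_scale`) and the two results are averaged with equal weight. It also assumes both vector pairs have already been aligned by the same permutation. The function names and parameters are hypothetical.

```python
import math

def rms(a, b):
    """Root-mean-square difference between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def combined_error(shape_a, shape_b, prop_a, prop_b,
                   shape_scale, prop_scale, weight=0.5):
    """Weighted sum of normalized shape and surface-property errors.
    weight=0.5 counts the two terms equally; the scales are assumptions
    chosen by the caller to make the two errors comparable."""
    shape_err = rms(shape_a, shape_b) / shape_scale
    prop_err = rms(prop_a, prop_b) / prop_scale
    return weight * shape_err + (1 - weight) * prop_err
```

Changing `weight` or the two scales changes which match ranks best, which is the difficulty with choosing a "correct" scaling.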
Daylight Chemical Information Systems, Inc.
info@daylight.com