Daylight/Oracle Chemistry Cartridge
What are its capabilities?
Dayblob is a reimplementation of the Daylight chemical structure processing algorithms for use inside ORDB/RDB systems. Essentially all structure-oriented database capabilities in Daylight Release 4.61 are delivered, including: management of multi-component molecules and reactions; support for isotopic and isomeric features; indexed lookup of full structures and tautomers; selection by substructure, SMARTS pattern, and structural similarity; and fast nearest-neighbor selection. A number of utility functions are also provided, e.g., calculation of canonical molecular formula, atom counts, and molecular weight, as well as format conversions.
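As an illustration of what selection by structural similarity and nearest-neighbor selection involve, here is a minimal Python sketch. It assumes structures are represented as bit-set fingerprints compared with the Tanimoto coefficient, a common choice for this kind of search; the text above does not say which measure dayblob uses internally. The function names, keys, and 8-bit fingerprints are all made up for the example and are not part of any Daylight or Oracle API.

```python
from heapq import nlargest


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints held as Python ints (bit sets)."""
    union = bin(a | b).count("1")
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return bin(a & b).count("1") / union


def nearest_neighbors(query: int, fingerprints: dict, k: int) -> list:
    """Return the k (key, score) pairs most similar to the query, best first."""
    scored = ((key, tanimoto(query, fp)) for key, fp in fingerprints.items())
    return nlargest(k, scored, key=lambda pair: pair[1])


# Toy data: hypothetical keys mapped to made-up 8-bit fingerprints.
toy = {"r1": 0b10110010, "r2": 0b10110000, "r3": 0b00001111}
hits = nearest_neighbors(0b10110010, toy, k=2)
print(hits)
```

This sketch scans every fingerprint. A production index would presumably use screening or other pruning to make nearest-neighbor selection fast, which is not shown here.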
The most fundamental difference between dayblob and other Daylight products is that dayblob references data exclusively by "Row ID" (rather than by identifier/data content, as with Thor/Merlin). Living in the ORDBMS environment also requires many other changes, such as: support for arbitrary binary identifiers; use of buffered persistent data; support for parallel read-only access; implementation of both full-table-scan and functional interfaces to support query optimization; and atomic insertion, deletion, replacement, and reindexing.
Dayblob is not based on either Thor or Merlin, although it shares many concepts and low-level algorithms with those systems. It is an independent module, designed and written from scratch to provide high-performance processing of chemical information from within an ORDB/RDB environment. This presentation will describe design criteria and product performance as well as the collaboration process.
The Chemical Database Cartridge is implemented using the Oracle8i object and extensibility technology. As currently implemented, this cartridge consists of an extensible index with an associated set of SQL3 chemical search operators. The extensible index assumes that each chemical structure to be indexed is stored as an Oracle8i VARCHAR2(2000) value. In this model, the chemical data can be held either in a column of an Oracle8i table or in a data attribute of an Oracle8i Object Type. The most novel feature of this extensible index is its physical storage: the index is stored in an Oracle8i Binary Large Object, or BLOB. At execution time, the BLOB containing the extensible index can be loaded into memory and retained for the life of the related Oracle8i database session. This means that the index, after the initial load from disk, can usually be searched with zero I/O. The cost, however, is about 100MB of memory per Oracle8i session. Given that the cost of memory continues to drop, the space versus time tradeoff seems well worth the price. This presentation will describe the extensible index implementation in detail, address limitations, and sketch out how the cartridge is likely to evolve beyond Oracle8i.
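The load-once, search-in-memory pattern and its memory cost can be sketched in a few lines of Python. This is only an illustration of the tradeoff, not the cartridge's implementation: the class, the loader callable, and the helper function are hypothetical names, and the only figure taken from the text is the roughly 100MB per session.

```python
class SessionIndexCache:
    """Loads an index blob once and keeps it for the lifetime of the session."""

    def __init__(self, load_blob):
        # load_blob is a hypothetical callable standing in for reading the BLOB from disk.
        self._load_blob = load_blob
        self._index = None
        self.loads = 0  # number of times the blob was actually read

    def index(self) -> bytes:
        """Return the in-memory index, reading it from storage only on first use."""
        if self._index is None:
            self._index = self._load_blob()
            self.loads += 1
        return self._index


def total_index_memory_mb(sessions: int, per_session_mb: int = 100) -> int:
    """Memory needed when every session holds its own copy of the index."""
    return sessions * per_session_mb


cache = SessionIndexCache(lambda: b"\x00" * 1024)  # tiny stand-in for the real BLOB
first = cache.index()   # first search in the session: one load
second = cache.index()  # later searches: served from memory, no further load
print(cache.loads, total_index_memory_mb(50))
```

Because each session keeps its own copy, the memory cost grows linearly with the number of concurrent sessions: at about 100MB each, 50 sessions would need roughly 5000MB.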
Our plans for use of the Chemical Cartridge at Novartis ...