Daylight Chemistry Database Cartridge
Oracle8i Implementation Overview
Sam DeFazio, Cathy Trezza
Oracle Corporation
New England Development Center
Nashua, NH 03062
sdefazio@us.oracle.com
ctrezza@us.oracle.com
| NAME VARCHAR2(100) | SMILES VARCHAR2(4000) |
|
|
|
|
|
| HASH BLOB | REFRESH_NUMBER NUMBER |
The index data is stored in an ancillary index table. This table contains a couple of columns. One is a Binary Large Object, also known as a BLOB (shown as HASH here). This column contains the index structure used in any queries. The second column is called REFRESH_NUMBER, and is used for caching the index (which is discussed below). This table contains exactly one row.
DML Methods
Query Operator Methods
create or replace type SmilesIndexIM as object
(
-- Query context/state
db_loc RAW(8),
lob_loc RAW(8),
-- DDL Methods
member function ODCIGetInterfaces(...),
member function ODCIIndexCreate(...),
member function ODCIIndexDrop(...),
-- DML Methods
ODCIIndexInsert (...) return number,
member function ODCIIndexDelete (...) return number,
member function ODCIIndexUpdate (...) return number,
-- Query Operator Methods
-- IndexStart method for Contains
member function ODCIIndexStart(..., strt number,
stop number,
cmpval varchar2)
return number,
-- IndexStart method for SIMILAR
member function ODCIIndexStart(..., strt number,
stop number,
cmpval varchar2,
threshold float)
return number,
member function ODCIIndexFetch(nrows number,
rids OUT
sys.odciridlist)
return number,
member function ODCIIndexClose return number
)
create or replace indextype SmilesIndex
for
Contains(VARCHAR2,VARCHAR2),
LookFor(VARCHAR2,VARCHAR2),
Tautomers(VARCHAR2,VARCHAR2),
Matches(VARCHAR2,VARCHAR2),
Similar(VARCHAR2,VARCHAR2,FLOAT),
Nearest(VARCHAR2,VARCHAR2,INTEGER)
using SmilesIndexIM ;
-- Create sample table of compounds
create table Compound
(ID number,
Name varchar2(256),
Smiles varchar2(4000) ...
);
-- Create index on smiles column
create index on Compound(Smiles)
INDENXTYPE IS SmilesIndex ... ;
-- Perform similarity query and return score
select Score(1)
from Compound
where Similar(Smiles, 'S=PS', .1, 1)=1;
-- Select compounds similar to given structure
select Name, Smiles
from Compound
where Contains(Smiles, 'CN(C)CCCN')=1;
-- Select the nearest 10 compounds to given structure
select Score(1)
from Compound
where Nearest(Smiles, 'S=PS', 10)=1;

The start, fetch, and close methods of a query operator run in a separate external process, called extproc. There is one external process per session. The external process is separate from the database process, and CAN NOT bring down the server in the event of a program error in the index. The data from the buffer cache is passed to the external process via standard OCI requests to read the data from the BLOB. This data is stored in memory and passed to the Dayblob interface to process the query. The results of the query are the ROWIDs which are then passed back to the database along with any ancillary information, such as score.
The index itself can be large. A 1 million compound table would have an index table that is about 136M, and the entire amount could be allocated for each session using the index. To deal with this issue, we recommend session multiplexing, discussed below.

Common queries run in a short period of time, so the best way to deal with this problem is to place an application tier between the user and the database and multiplex the requests to the available sessions in the database. This allows you to process all the queries but keep a handle on the memory usage on the server.
Caching is turned on and off via a procedure call and is under the user's control. The cache is automatically reloaded when a change occurs to the index. This synchronization is handled by a sequence number in the index table (REFRESH_NUMBER) and the sequence number of a package holding the memory for the index. When these numbers match, the memory is in synch with the table. When these numbers differ, such as when a transaction is commited in this or another session, the two numbers will be different and the index is marked to be reloaded at the start of the next query.
