Jack Delany
PL/SQL Functions, SQL Operators:
There are quite a few user-accessible PL/SQL functions and SQL operators implemented for the cartridge. All are stateless. All of the PL/SQL functions are contained in a PL/SQL package called 'DDPACKAGE'. This package is owned by 'DDUSER' by default and has public execute privilege.
General Purpose Functions:
function ddpackage.fsetdebug ( value IN NUMBER ) => NUMBER
operator setdebug ( value IN NUMBER ) => NUMBER
0 - No logging at all. 1 - Error messages only (the default value). 5 - Warnings and errors. 9 - Notes, warnings, and errors.
This value is set on a per-session basis.
function ddpackage.ftestlicense ( product IN VARCHAR2 ) => NUMBER
operator testlicense ( product IN VARCHAR2 ) => NUMBER
The license is contained in a special table which must be created and populated at cartridge install time:
SQL> desc dduser.license; Name Null? Type ------------------------------- -------- ---------------------------- PRODUCT VARCHAR2(40) KEY VARCHAR2(32) EXPIRATION DATE SQL> Select * from dduser.license; PRODUCT KEY EXPIRATION ------------- --------------------------------- ---------- daycart '8befd3d06296e9b89abfb158b431d278 01-JAN-00
The normal cartridge installation script will create this table. Upon optaining a license for the cartridge, you will populate the table with one or more rows of data, depending on the licensed products which you purchase.
function ddpackage.fsmi2cansmi ( smiles IN VARCHAR2, type IN NUMBER ) => VARCHAR2
operator smi2cansmi ( smiles IN VARCHAR2, type IN NUMBER ) => VARCHAR2
function ddpackage.fsmi2netch ( smiles IN VARCHAR2 ) => NUMBER
operator smi2netch ( smiles IN VARCHAR2 ) => NUMBER
function ddpackage.fsmi2hcount ( smiles IN VARCHAR2 ) => NUMBER
operator smi2hcount ( smiles IN VARCHAR2 ) => NUMBER
function ddpackage.fsmi2mf ( smiles IN VARCHAR2 ) => VARCHAR2
operator smi2mf ( smiles IN VARCHAR2 ) => VARCHAR2
function ddpackage.fsmi2amw ( smiles IN VARCHAR2 ) => NUMBER
operator smi2amw ( smiles IN VARCHAR2 ) => NUMBER
function ddpackage.fsmi2graph ( smiles IN VARCHAR2 ) => VARCHAR2
operator smi2graph ( smiles IN VARCHAR2 ) => VARCHAR2
function ddpackage.fsmi2fp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER,
nbits IN NUMBER ) => VARCHAR2
operator smi2fp ( smiles IN VARCHAR2, min IN NUMBER, max IN NUMBER,
nbits IN NUMBER ) => VARCHAR2
function ddpackage.fsmi2xfp ( smiles IN VARCHAR2, min IN NUMBER,
max IN NUMBER, nbits IN NUMBER ) => VARCHAR2
operator smi2xfp ( smiles IN VARCHAR2, min IN NUMBER,
max IN NUMBER, nbits IN NUMBER ) => VARCHAR2
function ddpackage.ffoldfp ( fpstr IN VARCHAR2, nbits IN NUMBER,
dens IN NUMBER ) => VARCHAR2
operator foldfp ( fpstr IN VARCHAR2, nbits IN NUMBER,
dens IN NUMBER ) => VARCHAR2
function ddpackage.fbitcount ( fpstr IN VARCHAR2 ) => NUMBER
operator bitcount ( fpstr IN VARCHAR2 ) => NUMBER
function ddpackage.fnbits ( fpstr IN VARCHAR2 ) => NUMBER
operator nbits ( fpstr IN VARCHAR2 ) => NUMBER
function ddpackage.fisfp ( fpstr IN VARCHAR2 ) => NUMBER
operator isfp ( fpstr IN VARCHAR2 ) => NUMBER
function ddpackage.fatomnorm ( smiles IN VARCHAR2, list IN VARCHAR2,
ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2
operator atomnorm ( smiles IN VARCHAR2, list IN VARCHAR2,
ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2
function ddpackage.fbondnorm ( smiles IN VARCHAR2, list IN VARCHAR2,
ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2
operator bondnorm ( smiles IN VARCHAR2, list IN VARCHAR2,
ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2
Comparison functions:
function ddpackage.fexact ( a IN VARCHAR2, b IN VARCHAR2) => NUMBER
operator exact ( a IN VARCHAR2, b IN VARCHAR2) => NUMBER
function ddpackage.fgraph ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) =>
NUMBER
operator graph ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER
function ddpackage.ftautomer ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) =>
NUMBER
operator tautomer ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER
function ddpackage.freactant ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) =>
NUMBER
operator reactant ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER
function ddpackage.fagent ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) =>
NUMBER
operator agent ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER
function ddpackage.fproduct ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) =>
NUMBER
operator product ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2 ) => NUMBER
function ddpackage.fcontains ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2) =>
NUMBER
operator contains ( smiles1 IN VARCHAR2, smiles2 IN VARCHAR2) => NUMBER
function ddpackage.fmatches ( smiles IN VARCHAR2, smarts IN VARCHAR2) => NUMBER
operator matches ( smiles IN VARCHAR2, smarts IN VARCHAR2) => NUMBER
function ddpackage.feuclid ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 )
=> NUMBER
operator euclid ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER
function ddpackage.ftanimoto ( fp_or_smi1 IN VARCHAR2,
fp_or_smi2 IN VARCHAR2 ) => NUMBER
operator tanimoto ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 ) => NUMBER
function ddpackage.ftversky ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2,
alpha IN NUMBER, beta IN NUMBER ) => NUMBER
operator tversky ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2,
alpha IN NUMBER, beta IN NUMBER ) => NUMBER
function ddpackage.ffingertest ( fp_or_smi1 IN VARCHAR2,
fp_or_smi2 IN VARCHAR2 ) => NUMBER
operator fingertest ( fp_or_smi1 IN VARCHAR2, fp_or_smi2 IN VARCHAR2 )
=> NUMBER
Extensible Indexes:
The main advantage of the ddexact indextype is that it can index strings up to 4000 bytes (the maximum varchar2 length), while the internal BTree can only index strings up to approximately 40% of a block; for a database with a 2048 byte blocksize, this means that the largest indexable varchar2 string using the BTree is approximately 700 bytes. Similarly, in a future version of the cartridge this index will support the comparison of LOB datatypes; the BTree index does not.
The valid form for this operator to use the index is:
SQL> select * from table where exact(smiles, 'Oc1ccccc1') = 1;
The valid forms for the operators to use the index are:
SQL> select * from table where graph(smiles, 'Oc1ccccc1') = 1; SQL> select * from table where tautomer(smiles, 'Oc1ccccc1') = 1;
The valid forms for the operators to use the index are:
SQL> select * from rxn where reactant(smiles, 'Oc1ccccc1') = 1; SQL> select * from rxn where agent(smiles, 'Oc1ccccc1') = 1; SQL> select * from rxn where product(smiles, 'Oc1ccccc1') = 1;
The index can be created on either a SMILES or fingerprint column, provided that the fingerprints are all the same size. If the index is created on a SMILES column, all six searches are implemented on that column: contains(), matches(), tanimoto(), tversky(), euclid(), and fingertest(). If the index is created on a fingerprint column, then only the four fingerprint-specific searches (tanimoto(), tversky(), euclid(), and fingertest()) are supported for that column.
The index creation command takes an optional parameter 'fpsize'. This value is the number of bits that the blob index will use for fingerprint creation.
When indexing a SMILES column, the blob will use the 'fpsize' parameter for the creation size for fingerprints within the internal blob index or a default value of 512 bits if the 'fpsize' is not specified.
When indexing a fingerprint column, all fingerprints within that column must match the 'fpsize' parameter. If the 'fpsize' parameter is not specified, the size of the first fingerprint encountered by the index code is used. Any fingerprints which do not match this size are rejected by the index.
The query for tanimoto(), tversky(), euclid(), and fingertest() can be either a fingerprint or a SMILES. If a SMILES, it will be used to create a normal fingerprint of the appropriate size for the index which will be used for comparison. If the query is a fingerprint, the size of the fingerprint must match the size of the fingerprint in the column (or in the blob for a SMILES index).
The valid forms for the operators to use the index are:
select count(1) from table where contains(smiles, 'O=c1ccocc1') = 1; select count(1) from table where matches(smiles, 'O=c1ccocc1') = 1; select count(1) from medium where tanimoto(smiles, 'OC(=O)CS') > 0.8; select count(1) from medium where tversky(smiles, 'OC(=O)CS', 0.5, 0.5) > 0.8; select count(1) from medium where euclid(smiles, 'OC(=O)CS') < 0.2; select count(1) from medium where fingertest(smiles, 'OC(=O)CS') = 1; select count(1) from medium where tanimoto(fp, 'OC(=O)CS') > 0.8; select count(1) from medium where tversky(fp, 'OC(=O)CS', 0.5, 0.5) > 0.8; select count(1) from medium where euclid(fp, 'OC(=O)CS') < 0.2; select count(1) from medium where fingertest(fp, 'OC(=O)CS') = 1;
To further illustrate the blob index, consider the following sequence:
create table test (smiles varchar2(4000), fp varchar2(4000)); insert into test values (smi2cansmi('CCN', 0), NULL); ... load bunches of SMILES... commit; update test set fp = smi2fp(smiles, 0, 7, 128); commit; -- We specified a preferred size for the internally generated fingerprints -- as 1024 bits. These are internal to the blob and aren't visible. create index s1 on test(smiles) indextype is ddblob parameters ('fpsize=1024'); create index f1 on test(fp) indextype is ddblob; commit; select count(1) from test where contains(smiles, 'NCCc1ccccc1') = 1; -- This screens and searches using the index s1; internally using 1024 bit -- fingerprints. The output is generated using the functional form, at -- its default of 512 bits. select smiles, tanimoto(smiles, 'NCCc1ccccc1') from test where tanimoto(smiles, 'NCCc1ccccc1') > 0.8; -- This screens and searches using the index s1; internally using 1024 bit -- fingerprints. The output is generated using the functional form; this -- time calculated at 1024 bits. select smiles, tanimoto(smiles, smi2fp('NCCc1ccccc1', 0, 7, 1024)) from test where tanimoto(smiles, 'NCCc1ccccc1') > 0.8; -- This searches using the index f1, using the 128 bit fingerprints from -- the actual column for comparison. select smiles, tanimoto(fp, 'NCCc1ccccc1') from test where tanimoto(fp, 'NCCc1ccccc1') > 0.8; -- This does two similarity searches at once, using the two different -- indexes. select * from test where contains(smiles, 'Oc1ccccc1') = 1 and tanimoto(fp, 'Oc1ccccc1F') > 0.5;