This set of labs guides the user through additional use of Daycart functionality with an emphasis on understanding performance and tuning issues for queries.
The first operation will be to load the demo tables into your local Oracle instance and create Daycart indexes. For this lab we use the NCI95 public dataset, which contains 126k structures and approximately 25k results from HIV testing in two tables. The files nci.sql and hiv.sql are available on the ftp server "mugserver".
Go ahead and execute the SQL scripts to create and populate the NCI and NCI_HIV tables.
$ sqlplus mug/coffee
SQL> @nci
... (lots of output)
SQL> @hiv
... (lots of output)
The table descriptions are as follows:
CREATE TABLE NCI ( SMI VARCHAR2(428) NOT NULL, NSC NUMBER(6) NOT NULL, CAS_RN VARCHAR2(11) NOT NULL, CL_1 NUMBER(5), CL_2 NUMBER(3), CL_3 NUMBER(5,4)); NSC: NCI structure number. Primary key. SMI: Unique canonical SMILES string. CL_1: Cluster number: all structures in the same cluster have the same cluster number. CL_2: Size of cluster: number of members in this cluster. CL_3: Unexplained variance: variance of the cluster unexplained by this member. CAS_RN: CAS Registry number. CREATE TABLE NCI_HIV ( NSC NUMBER(6) NOT NULL HIV VARCHAR2(2) NOT NULL); NSC: NCI structure number. Primary key. HIV: HIV response (CI: confirmed inactive CM: confirmed moderately active CA: confirmed active)
set timing on create index nci_sbi on nci(smi) indextype is c$dcischem.ddblob;
Since the NCI table doesn't include an FP column you'll need to generate it.
set timing on
alter table nci add fp(varchar2(90));
update nci set fp=smi2fp(smi, 0, 7, 512);
create index nci_fbi on nci(fp) indextype is c$dcischem.ddblob;
set timing on
create index nci_sbi on nci(smi) indextype is c$dcischem.ddblob
parameters ('initfpcolumn=fp');
/ramdisk is a RAM-based filesystem. This is a
quick-and-dirty way to demonstrate the effect on performance of minimizing
I/O bottlenecks in the Oracle database.
The caveat is that data in the /ramdisk filesystem does not persist after
a reboot or machine crash, so for these experiments it's best to do the
work and then remove the index and tablespace immediately thereafter to avoid
the possiblity of corrupting the database.
The total elapsed time for index creation should decrease. Also, if you
watch the system during the process you'll observe that the system I/O waits
decreases during the phase of index creation where the data is being written
out to the Oracle tables.
create tablespace test_ts datafile '/ramdisk/testts.dbf' size 50M';
create index nci_bi on nci(smi) indextype is ddblob
parameters ('initfpcolumn=fp table_tablespace=test_ts');
drop index nci_bi;
drop tablespace test_ts;
Here's a simple example. Note that all the NSC numbers are 500000 so the
rows can be conveniently deleted when done. This allows the tests to
be repeated quickly.
insert into nci values ('NCCc1ccccc1', 500000, '00-00-0', NULL, NULL, NULL);
insert into nci values ('c1ccccc1', 500000, '00-00-0', NULL, NULL, NULL);
...
The following will give an overall elapsed time for the operations.
Alternately you can run it in SQLPlus with timing on and get the individual
insertion times.
$ time sqlplus -s 'mug/coffee' < insert.sql
create index ncibi on nci(smi) indextype is c$dcischem.ddblob
parameters('fpsize=8192');
A simple sql script with contains(), tanimoto(), matches() etc. searches is reasonable. Using the count(1) function as the returned value eliminates screen I/O as a variable in the timings.
Here's an example:
set timing on
select count(1) from nci where contains(smi, 'NCC') = 1;
select count(1) from nci where contains(smi, 'NCCc1cc(O)c(O)cc1', 20) = 1;
select count(1) from nci where tanimoto(smi, 'NCCc1cc(O)c(O)cc1') > 0.9;
select count(1) from nci where contains(smi, 'NCCc1cc(O)c(O)cc1', 20) = 1;
select count(1) from nci where matches(smi, '[OH1]C(=O)CS') = 1;
Edit the init.ora file in /oracle/admin/dev/pfile/. Then stop and restart
the instance. Watch the startup messages to make sure the "Database Buffers"
has decreased.
If the database uses an SPFILE, then you'll need to regenerate it after
editing the init.ora file with:
$ sqlplus '/ as sysdba'
SQL> create spfile from pfile='/oracle/admin/dev/pfile/init.ora'