Daylight Summer School
Lab: DayCart --
For HINTS
This set of labs guides the user through additional use of Daycart
functionality with an emphasis on understanding performance and tuning
issues for queries.
LOAD DEMO DATA:
The first operation will be to load the demo tables into your local
Oracle instance and create Daycart indexes. For this lab we use the
NCI95 public dataset, which contains 126k structures and approximately
25k results from HIV testing in two tables. The files nci.sql and hiv.sql
are available on the ftp server "mugserver".
Go ahead and execute the SQL scripts to create and populate the NCI and
NCI_HIV tables.
The table descriptions are as follows:
CREATE TABLE NCI (
SMI VARCHAR2(428) NOT NULL,
NSC NUMBER(6) NOT NULL,
CAS_RN VARCHAR2(11) NOT NULL,
CL_1 NUMBER(5),
CL_2 NUMBER(3),
CL_3 NUMBER(5,4));
NSC: NCI structure number. Primary key.
SMI: Unique canonical SMILES string.
CL_1: Cluster number: all structures in the same cluster have the
same cluster number.
CL_2: Size of cluster: number of members in this cluster.
CL_3: Unexplained variance: variance of the cluster unexplained by
this member.
CAS_RN: CAS Registry number.
CREATE TABLE NCI_HIV (
NSC NUMBER(6) NOT NULL
HIV VARCHAR2(2) NOT NULL);
NSC: NCI structure number. Primary key.
HIV: HIV response (CI: confirmed inactive
CM: confirmed moderately active
CA: confirmed active)
Index Creation:
- Measure the time to create a ddblob index on the NCI(SMI) column.
- Measure the time to create a ddblob index on the NCI(FP) column.
- Measure the time to create a ddblob index on the NCI(SMI) column
using the initfpcolumn option. How do the timings of the three previous
steps relate to one-another?
- Dangerous!!! Create a small tablespace in /ramdisk and measure the
time to create the same index in that tablespace. Then remove the small
tablespace.
Data modification:
- Create a test set to insert a hundred or so rows of data into the NCI
table.
- Execute it and measure the performance.
- Look at the difference in time for the insertions on the table, both
with and without the Daycart indexes on the SMI column.
- Repeat the timings against the index created in a tablespace created
in the /ramdisk partition (as in the "Index Creation" section).
Searching performance:
- Create a ddblob index on the NCI(SMI) column. Use a fingerprint size
of 8192 bits. This will artificially force the index data structure to be
larger than typical for the tablesize.
- Create a small (~5-7) set of timing benchmarks for ddblob searches.
Execute it a few repetitions to measure the times.
-
Shut down the Oracle instance and modify the db_cache_size parameter,
making it a very small value. Start up the instance and run the queries
again, observing the timings and system I/O.
Daylight Chemical Information Systems Inc.
support@daylight.com