Daycart 9i Optimizer Package

THE PROBLEM:

Given a query like:

  select * from my_table where
     contains(smi, 'c1ccccc1') = 1
     and
     name like '%dopa%';

What is the optimium order in which to evaluate the query for best performance. Depending on the situation, best performance might mean finding all the results or might mean finding the first/most relevant results.

API DESCRIPTION:

The following functions allow the user to query the actual CPU and IO
costs and selectivies provided to Oracle by the ddoptimizer package. 
These are the values which the SQL execution engine uses internally
to decide between multiple execution plans.

FUNCTION ddoptimizer.fgetsel (
             operator_name IN VARCHAR2,
             query_string IN VARCHAR2_OR_CLOB,
             direction IN NUMBER,
             hitcount IN NUMBER) => NUMBER

Returns the query selectivity as a percentage, the range is 1 - 100.

  - operator_name: the index function, eg. 'exact', 'contains'.,
  - query_string: the SMILES/SMARTS query,
  - direction: 1=find hits, 0=find non-hits,
  - hitcount: For 'nearest' queries, the number of hits desired.


FUNCTION ddoptimizer.fgetfunccpucost (operator_name IN VARCHAR2)
    => NUMBER
FUNCTION ddoptimizer.fgetfunciocost (operator_name IN VARCHAR2)
    => NUMBER

Returns the CPU cost for the functional execution path for the
given operator.

  - operator_name: the index function, eg. 'exact', 'contains'. 


FUNCTION ddoptimizer.fgetindexcpucost (
             index_name IN VARCHAR2,
             operator_name IN VARCHAR2,
             selectivity IN NUMBER,
             hitcount IN NUMBER) => NUMBER

FUNCTION ddoptimizer.fgetindexiocost (
             index_name IN VARCHAR2,
             operator_name IN VARCHAR2,
             selectivity IN NUMBER,
             hitcount IN NUMBER) => NUMBER

Returns the CPU and IO costs for the index execution path for the given
operator, against the given index.

  - index_name: the Oracle index name being used,
  - operator_name: the index function, eg. 'exact', 'contains',
  - selectivity: the percentage selectivity for the query,
  - hitcount: For 'nearest' queries, the number of hits desired.


The ddpackage.finfo() and ddpackage.fsetinfo() functions are general
utilities to get and set parameters within Daycart.  Parameters can
be modified on a per-session basis.  The following parameters apply
to the new ddoptimizer package:

 -ddoptimizer_index_cpu_factor: Scaling factor controlling the
     CPU cost via the index path (default: 100)
 -ddoptimizer_index_io_factor: Scaling factor controlling the
     IO cost via the index path (default: 100)
 -ddoptimizer_func_cpu_factor: Scaling factor controlling the
     CPU cost via the functional path (default: 100)
 -ddoptimizer_func_io_factor: Scaling factor controlling the
     IO cost via the functional path (default: 100)

These parameters can be modified before every query, if necessary.
Unfortunately they can not be made to apply to parts of a compound
query.  Consider the following query:

select * from my_table where 
  contains(smi, 'c1ccccc1') = 1 
  and
  matches(smi, '[CX3]') = 1

The scaling factors apply to both the contains() and matches()
queries with the SQL session; it is not possible to use the scaling 
factors to bias the execution plan to choose either predicate 
preferentially.


EXAMPLES OF USE:

The following examples assume a ddblob and ddgraph index on the
table my_table as follows:

  create index my_graph_index on my_table(smi) indextype is ddgraph;
  create index my_blob_index  on my_table(smi) indextype is ddblob;


Query 1: (basic query)

  select * from my_table where contains(smi, 'c1ccccc1') = 1;

  The following ddoptimizer functions will provide the complete
  statistics for this query:

  sel  := ddoptimizer.fgetsel('contains', 'c1ccccc1', 1, 0);
  fcpu := hddoptimizer.fgetfunccpucost('contains');
  fio  := ddoptimizer.fgetfunciocost('contains');
  icpu := ddoptimizer.fgetindexcpucost('my_blob_index', 'contains', sel, 0);
  iio  := ddoptimizer.fgetindexiocost('my_blob_index', 'contains', sel, 0);


Query 2: (note extra 'nearest' parameter)

  select * from my_table where matches(smi, 'c1ccccc1', 30) = 1;

  The following ddoptimizer functions will provide the complete
  statistics for this query:

  sel  := ddoptimizer.fgetsel('contains', 'c1ccccc1', 1, 30);
  fcpu := hddoptimizer.fgetfunccpucost('contains');
  fio  := ddoptimizer.fgetfunciocost('contains');
  icpu := ddoptimizer.fgetindexcpucost('my_blob_index', 'contains', sel, 30);
  iio  := ddoptimizer.fgetindexiocost('my_blob_index', 'contains', sel, 30);


Query 3: (note non-hits are wanted)

  select * from my_table where tautomer(smi, 'c1ccccc1') = 0;

  The following ddoptimizer functions will provide the complete
  statistics for this query:

  sel  := ddoptimizer.fgetsel('tautomer', 'c1ccccc1', 0, 0);
  fcpu := hddoptimizer.fgetfunccpucost('tautomer');
  fio  := ddoptimizer.fgetfunciocost('tautomer');
  icpu := ddoptimizer.fgetindexcpucost('my_graph_index', 'tautomer', sel, 0);
  iio  := ddoptimizer.fgetindexiocost('my_graph_index', 'tautomer', sel, 0);