THE PROBLEM:
Given a query like:
select * from my_table where contains(smi, 'c1ccccc1') = 1 and name like '%dopa%';
What is the optimium order in which to evaluate the query for best performance. Depending on the situation, best performance might mean finding all the results or might mean finding the first/most relevant results.
API DESCRIPTION:
The following functions allow the user to query the actual CPU and IO costs and selectivies provided to Oracle by the ddoptimizer package. These are the values which the SQL execution engine uses internally to decide between multiple execution plans. FUNCTION ddoptimizer.fgetsel ( operator_name IN VARCHAR2, query_string IN VARCHAR2_OR_CLOB, direction IN NUMBER, hitcount IN NUMBER) => NUMBER Returns the query selectivity as a percentage, the range is 1 - 100. - operator_name: the index function, eg. 'exact', 'contains'., - query_string: the SMILES/SMARTS query, - direction: 1=find hits, 0=find non-hits, - hitcount: For 'nearest' queries, the number of hits desired. FUNCTION ddoptimizer.fgetfunccpucost (operator_name IN VARCHAR2) => NUMBER FUNCTION ddoptimizer.fgetfunciocost (operator_name IN VARCHAR2) => NUMBER Returns the CPU cost for the functional execution path for the given operator. - operator_name: the index function, eg. 'exact', 'contains'. FUNCTION ddoptimizer.fgetindexcpucost ( index_name IN VARCHAR2, operator_name IN VARCHAR2, selectivity IN NUMBER, hitcount IN NUMBER) => NUMBER FUNCTION ddoptimizer.fgetindexiocost ( index_name IN VARCHAR2, operator_name IN VARCHAR2, selectivity IN NUMBER, hitcount IN NUMBER) => NUMBER Returns the CPU and IO costs for the index execution path for the given operator, against the given index. - index_name: the Oracle index name being used, - operator_name: the index function, eg. 'exact', 'contains', - selectivity: the percentage selectivity for the query, - hitcount: For 'nearest' queries, the number of hits desired. The ddpackage.finfo() and ddpackage.fsetinfo() functions are general utilities to get and set parameters within Daycart. Parameters can be modified on a per-session basis. The following parameters apply to the new ddoptimizer package: -ddoptimizer_index_cpu_factor: Scaling factor controlling the CPU cost via the index path (default: 100) -ddoptimizer_index_io_factor: Scaling factor controlling the IO cost via the index path (default: 100) -ddoptimizer_func_cpu_factor: Scaling factor controlling the CPU cost via the functional path (default: 100) -ddoptimizer_func_io_factor: Scaling factor controlling the IO cost via the functional path (default: 100) These parameters can be modified before every query, if necessary. Unfortunately they can not be made to apply to parts of a compound query. Consider the following query: select * from my_table where contains(smi, 'c1ccccc1') = 1 and matches(smi, '[CX3]') = 1 The scaling factors apply to both the contains() and matches() queries with the SQL session; it is not possible to use the scaling factors to bias the execution plan to choose either predicate preferentially.
EXAMPLES OF USE:
The following examples assume a ddblob and ddgraph index on the table my_table as follows: create index my_graph_index on my_table(smi) indextype is ddgraph; create index my_blob_index on my_table(smi) indextype is ddblob; Query 1: (basic query) select * from my_table where contains(smi, 'c1ccccc1') = 1; The following ddoptimizer functions will provide the complete statistics for this query: sel := ddoptimizer.fgetsel('contains', 'c1ccccc1', 1, 0); fcpu := hddoptimizer.fgetfunccpucost('contains'); fio := ddoptimizer.fgetfunciocost('contains'); icpu := ddoptimizer.fgetindexcpucost('my_blob_index', 'contains', sel, 0); iio := ddoptimizer.fgetindexiocost('my_blob_index', 'contains', sel, 0); Query 2: (note extra 'nearest' parameter) select * from my_table where matches(smi, 'c1ccccc1', 30) = 1; The following ddoptimizer functions will provide the complete statistics for this query: sel := ddoptimizer.fgetsel('contains', 'c1ccccc1', 1, 30); fcpu := hddoptimizer.fgetfunccpucost('contains'); fio := ddoptimizer.fgetfunciocost('contains'); icpu := ddoptimizer.fgetindexcpucost('my_blob_index', 'contains', sel, 30); iio := ddoptimizer.fgetindexiocost('my_blob_index', 'contains', sel, 30); Query 3: (note non-hits are wanted) select * from my_table where tautomer(smi, 'c1ccccc1') = 0; The following ddoptimizer functions will provide the complete statistics for this query: sel := ddoptimizer.fgetsel('tautomer', 'c1ccccc1', 0, 0); fcpu := hddoptimizer.fgetfunccpucost('tautomer'); fio := ddoptimizer.fgetfunciocost('tautomer'); icpu := ddoptimizer.fgetindexcpucost('my_graph_index', 'tautomer', sel, 0); iio := ddoptimizer.fgetindexiocost('my_graph_index', 'tautomer', sel, 0);