Optimizer Support

5. Optimizer Support

The cartridge will provide estimates of the costs of extensible index queries to the Oracle optimizer. The Oracle optimizer uses these cost estimates for its decision making with respect to the execution plan for a given SQL query. The execution plan chosen by Oracle can have a dramatic effect on the time and resources required to perform a particular query. The cartridge has the ability to provide selectivity and resource estimates of varying precision to Oracle, depending upon the configuration of the database index.

The default cartridge installation includes the code for performing cost estimates, but does not enable the cartridge optimizer code. All costs for index queries default to zero. Any Oracle query will preferrentially execute the extensible index query first.

INSTALLATION

In order to activate the optimizer code for the Daylight cartridge, one must associate statistics with the package and indextypes with the following code:

  SQL> connect c$dcischem/secret
  SQL> associate statistics with packages ddpackage using ddoptimizer;
  SQL> associate statistics with indextypes ddexact using ddoptimizer;
  SQL> associate statistics with indextypes ddgraph using ddoptimizer;
  SQL> associate statistics with indextypes ddrole using ddoptimizer;
  SQL> associate statistics with indextypes ddblob using ddoptimizer;

To test the optimizer code, execute the verification SQL script in $DY_ROOT/dcischem/TEST/dd_optimizer_test.sql. The results can be checked against dd_optimizer_test.ref_out.

  sqlplus -s 'c$dcischem/secret' < dd_optimizer_test.sql > test.out
  diff test.out dd_optimizer_test.ref_out

There should be no differences found.

Once the optimizer code is enabled, one can view the cost estimates provided to Oracle and the execution plan chosen using the "explain plan" command in SQL.

The command "analyze index <indexname> compute statistics" will collect and store information about the size of the index table for the ddgraph, ddrole, and ddexact indextypes. This size information will then be used to refine the cost estimates provided to the Oracle optimizer. The ddblob indextype does not require a separate "analyze index" command; the blob index always knows its exact size.

Note that the cost and selectivity estimates are not going to be reasonable in all cases; developers may find that forcing a particular execution order will result in better query performance. Oracle SQL supports a number of hints which allow the developer to control the execution order. See the section on "Optimizer Hints" in the Oracle SQL Reference manual for more details.

DAYCART API DESCRIPTION: FINDING THE COST

The optimizer features allow the user access to the data used within the optimizer to determine SQL statement execution paths.

To understand the execution path of a SQL statement, the user needs to understand the actual costs involved when DayCart functions are called. The costs referred to here are the actual CPU and I/O costs, or at least the best estimation of those.

The following functions allow the user to query the actual CPU and IO costs and selectivities provided to Oracle by the ddoptimizer package. These are the values which the SQL execution engine uses internally to decide between multiple execution plans.

   FUNCTION ddoptimizer.fgetsel (
           operator_name IN VARCHAR2,
           query_string IN VARCHAR2_OR_CLOB,
           direction IN NUMBER,
           hitcount IN NUMBER) => NUMBER

Returns the query selectivity as a percentage, the range is 1 - 100. Operator_name is the index function (eg. 'exact', 'contains'). Query_string is the SMILES, SMARTS, or fingerprint query. The direction indicates whether to find matches (1) or non-matches(0), and the hitcount, for 'nearest', queries, is the number of hits desired.

   FUNCTION ddoptimizer.fgetfunccpucost (
           operator_name IN VARCHAR2) => NUMBER
   FUNCTION ddoptimizer.fgetfunciocost (
           operator_name IN VARCHAR2) => NUMBER

Returns the CPU cost for the functional execution path for the given operator.


   FUNCTION ddoptimizer.fgetindexcpucost (
             index_name IN VARCHAR2,
             operator_name IN VARCHAR2,
             selectivity IN NUMBER, 
             hitcount IN NUMBER) => NUMBER
   FUNCTION ddoptimizer.fgetindexiocost (
             index_name IN VARCHAR2,
             operator_name IN VARCHAR2,
             selectivity IN NUMBER, 
             hitcount IN NUMBER) => NUMBER

These return the CPU and IO costs for the index execution path for the given operator, against the given index, where index_name is the Daycart index being used, operator_name is the index function, selectivity is the percentage selectivity of the query (from fgetsel()), and hitcount is the number of hits desired for 'nearest' queries, or zero.

DAYCART API DESCRIPTION: CUSTOMIZING THE COST

In addition to finding the estimated costs involved, you can also adjust those settings as needed. Because all databases, data, and usage is different, the user is the person best equipped to set the actual costs. These functions provide the user with the tools to adjust the costs involved and maintain an optimal SQL statement execution plan.

The ddpackage.finfo() and ddpackage.fsetinfo() functions are general utilities to get and set parameters within Daycart. Parameters can be modified on a per-session basis. The following parameters apply to the ddoptimizer package:

 -ddoptimizer_index_cpu_factor: Scaling factor controlling the
     CPU cost via the index path (default: 100)
 -ddoptimizer_index_io_factor: Scaling factor controlling the
     IO cost via the index path (default: 100)
 -ddoptimizer_func_cpu_factor: Scaling factor controlling the
     CPU cost via the functional path (default: 100)
 -ddoptimizer_func_io_factor: Scaling factor controlling the
     IO cost via the functional path (default: 100)

These parameters can be modified before every query, if necessary. Unfortunately they can not be made to apply to parts of a compound query. Consider the following query:

   select * from my_table where
     contains(smi, 'c1ccccc1') = 1
     and
     matches(smi, '[CX3]') = 1

The scaling factors apply to both the contains() and matches() queries with the SQL session; it is not possible to use the scaling factors to bias the execution plan to choose either predicate preferentially.

EXAMPLES OF USE:

The following examples assume a ddblob and ddgraph index on the table my_table as follows:

  create index my_graph_index on my_table(smi) indextype is ddgraph;
  create index my_blob_index  on my_table(smi) indextype is ddblob;

For these examples, my_table held 101227 rows.

Query 1: (basic query)

  select * from my_table where contains(smi, 'c1ccccc1') = 1;

The following ddoptimizer functions will provide the complete statistics for this query:

  sel  := ddoptimizer.fgetsel('contains', 'c1ccccc1', 1, 0);
  fcpu := ddoptimizer.fgetfunccpucost('contains');
  fio  := ddoptimizer.fgetfunciocost('contains');
  icpu := ddoptimizer.fgetindexcpucost('my_blob_index', 'contains', sel, 0);
  iio  := ddoptimizer.fgetindexiocost('my_blob_index', 'contains', sel, 0);

The following results are with scaling factors set to 100 across the board:

  Cost of select... contains('c1ccccc1'):       12
  CPU cost of contains() function:              700
  IO  cost of contains() function:              28
  CPU cost of my_blob_index index               30492
  IO  cost of my_blob_index index               175

And now if we set the scaling factors to 200 across the board by using the commands:

   SQL> select ddpackage.fsetinfo('ddoptimizer_index_cpu_factor=200') from dual;
   SQL> select ddpackage.fsetinfo('ddoptimizer_index_io_factor=200') from dual;
   SQL> select ddpackage.fsetinfo('ddoptimizer_func_cpu_factor=200') from dual;
   SQL> select ddpackage.fsetinfo('ddoptimizer_func_io_factor=200') from dual;

The results change for the new scale:

  Cost of select... contains('c1ccccc1'):       12
  CPU cost of contains() function:              1400
  IO  cost of contains() function:              56
  CPU cost of my_blob_index index               60983
  IO  cost of my_blob_index index               349

It can also be useful to specify the functional form of a query in order to eliminate consideration of the index path for the execution plan. Consider the following query:

  SQL> select * from <table> where
          graph(smiles, 'c1ccccc1') = 1 and
          contains(smiles, 'C') = 1;

By default, (assuming that both required indexes exist) this query will use indexes for both the graph and contains search, and then will merge the results. In most cases, it would be much more efficient to perform the graph index search, then perform the functional version of contains() on the handful of hits returned from the graph search. The following query will force the contains() operation to occur functionally:

  SQL> select * from <table> where
          graph(smiles, 'c1ccccc1') = 1 and
          ddpackage.fcontains(smiles, 'C') = 1;

In order to disable the cartridge optimizer code, any computed statistics must be dropped with "analyze index <indexname> delete statistics". Then, The cartridge optimizer code can be disabled with the following commands:

  SQL> connect c$dcischem/secret
  SQL> disassociate statistics from indextypes ddblob;
  SQL> disassociate statistics from indextypes ddgraph;
  SQL> disassociate statistics from indextypes ddrole;
  SQL> disassociate statistics from indextypes ddexact;
  SQL> disassociate statistics from packages ddpackage;

Once the cartridge optimizer code is disabled, cost estimates revert to their default values of zero cost.