Daycart PL/SQL Functions, SQL Operators

2. PL/SQL Functions, SQL Operators

Several stateless PL/SQL functions and SQL operators have been implemented for the cartridge. All of the PL/SQL functions are contained in a package called 'ddpackage' owned by 'c$dcischem'. Privileges required to access these functions are granted by the 'daycart' role.

The default installation (create.sql) creates public synonyms for the package 'ddpackage' and all of the operators described herein. Therefore, any Oracle user can access functions and operators using the synonyms (e.g., select tanimoto(...) from table rather than select c$dcischem.tanimoto(...) from table). Please note that synonyms are not allowed for indextype specifications; hence one must use full names when creating indexes as a user other than c$dcischem, e.g., create index <name> on small(smi) indextype is c$dcischem.ddblob.
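As a brief illustration of the naming rules above, the following sketch calls the same operator through its public synonym and through its owning schema, and then creates an index with the fully-qualified indextype name. The index name small_smi_idx is hypothetical; the table small and column smi are taken from the example in the text.

```sql
-- Operator referenced through its public synonym (default installation)
SELECT tanimoto('c1ccccc1', 'c1ccccc1O') FROM dual;

-- The same operator referenced through its owning schema
SELECT c$dcischem.tanimoto('c1ccccc1', 'c1ccccc1O') FROM dual;

-- Indextypes have no synonyms, so the schema name is required here
-- (small_smi_idx is a made-up index name)
CREATE INDEX small_smi_idx ON small(smi) INDEXTYPE IS c$dcischem.ddblob;
```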

2.1 String Data Handling

Daycart supports both VARCHAR2 and CLOB string datatypes interchangeably in all Daycart functions, operators and index searches. At runtime Oracle uses argument overloading to transparently handle the string arguments passed into the Daycart functions. However, since Oracle does not transparently handle return types, the user must be aware of the string type returned by a function. In general, Daycart functions return strings of the same string type that was passed into the function as the 'key' string parameter. For example:

function ddpackage.fsmi2cansmi (smiles IN VARCHAR2_OR_CLOB, type IN NUMBER) => VARCHAR2_OR_CLOB

If the string argument passed into smi2cansmi() is a VARCHAR2, then the function will always return a VARCHAR2. Similarly, if the string argument is a CLOB, the function will always return a CLOB.
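The return-type rule can be sketched as follows. This assumes the default public synonyms are installed; TO_CLOB is the standard Oracle conversion function, used here only to force a CLOB argument.

```sql
-- VARCHAR2 literal in, so the result is a VARCHAR2
SELECT smi2cansmi('OCC', 0) FROM dual;

-- CLOB in, so the result is a CLOB
SELECT smi2cansmi(TO_CLOB('OCC'), 0) FROM dual;
```

Client code that binds the result should therefore choose its output type to match the type of the 'key' argument it passes in.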

There are three Daycart functions which take multiple input strings and also return a string --- partnorm(), atomnorm(), and bondnorm(). For these functions, the 'key' argument is the ntuple-list data.

2.2 General Purpose Functions

There are six general purpose functions available that are typically used for debugging or setting session-level options only. Operators and public synonyms are not created for these functions during the installation. Hence, they must be referenced by their fully-qualified names, e.g., c$dcischem.ddpackage.ftestlicense().

fsetdebug

function ddpackage.fsetdebug (value IN NUMBER) => NUMBER

Controls the level of messages written to the log by setting the logging level to 'value' for the current session. The default logfile is /tmp/extproc.log; the logfile name can be changed with the DAYCART_LOGFILE environment variable in listener.ora. Valid values are integers in the range 0 - 9. By default only error messages are written to the log.

    0 - No logging at all
    1 - Error messages only
    5 - Warnings and errors
    9 - Notes, warnings, and errors
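A minimal sketch of raising the logging level for the current session and reading it back. The return value of fsetdebug() is not described here, so none is shown; the finfo() key 'debug_level' is the one documented under finfo.

```sql
-- Log notes, warnings, and errors for this session
SELECT c$dcischem.ddpackage.fsetdebug(9) FROM dual;

-- Read the current level back
SELECT c$dcischem.ddpackage.finfo('debug_level') FROM dual;
```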

fgeterrors

function ddpackage.fgeterrors (level IN NUMBER) => VARCHAR2

Returns error strings from the error queue for previously failed functions or operators based on the level requested and clears the error queue. By default only messages at the level of Errors or above are sent to the screen when a function fails.

    0 - All messages
    1 - Notes
    2 - Warnings
    3 - Errors
    4 - Fatal errors

fgetlog

function ddpackage.fgetlog => CLOB

Returns error strings from the log messages and clears the local buffer of log messages. Behavior is controlled by the session-level option 'log'.

In order to get log messages from the ddpackage.fgetlog() function the 'log' option must be set to either 'LOCAL' or 'BOTH'. From that point log messages will be kept and can be retrieved with the fgetlog() function. By default, the 'log' option is set to 'CENTRAL' and all log messages go to the central logfile, which by default is /tmp/extproc.log. The central logfile is controlled by the environment variable DAYCART_LOGFILE, which can be set in the listener.ora parameter file for extproc.
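The sequence described above might look like the following sketch. It assumes the 'log' option is set through fsetinfo() using the 'log={place}' form listed for that function.

```sql
-- Keep log messages in the session-local buffer (and in the central logfile)
SELECT c$dcischem.ddpackage.fsetinfo('log=BOTH') FROM dual;

-- ... run the Daycart functions or searches of interest ...

-- Retrieve the buffered messages; this also clears the local buffer
SELECT c$dcischem.ddpackage.fgetlog() FROM dual;

-- Restore the default behavior
SELECT c$dcischem.ddpackage.fsetinfo('log=CENTRAL') FROM dual;
```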

ftestlicense

function ddpackage.ftestlicense (product IN VARCHAR2_OR_CLOB) => NUMBER

Checks the license. The license is contained in a special table that is created at install time. Currently the only recognized value for product is 'daycart'. Returns 1 if the Daylight cartridge has a valid license and 0 if not.

finfo

function ddpackage.finfo (which IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB

Returns informational strings from DayCart. Valid input parameters are listed below. In addition, finfo can be used to find the value for any dayconvert option. See Section 2.3 Molecule / Reaction Functions for detailed descriptions of the dayconvert options.

'toolkit_version'

Returns the current Daylight Toolkit version.

'daycart_version'

Returns the cartridge executable version.

'extproc_pid'

Returns the extproc process ID for this DayCart session.

'hostid'

Returns the hardware hostid used for generation of the license key.

'debug_level'

Returns the debug level set for logging messages.

'session_tag'

Returns the session tag set for a given session. The session tag, if set, is included in any log messages and can be used to identify the session which generated a log entry.

'exit_on_fault'

Returns TRUE or FALSE. When TRUE, the daycart session will keep track of the count of severe errors encountered (ORA-00600, ORA-03113, ORA-03114, ORA-07445) and if more than fifty of these errors are logged, the extproc process for the current session will exit.

'log'

Returns where the session log info is written. Choices are NONE, LOCAL, CENTRAL, and BOTH. CENTRAL is the default, and refers to the value of DAYCART_LOGFILE (in the listener.ora) or a default of /tmp/extproc.log. LOCAL logging indicates that any log messages are stored for retrieval by the ddpackage.fgetlog() function.

'default_delimiter'

Returns the default delimiter used to separate lines of multi-line output.

'force_delimiter'

Returns TRUE or FALSE. When TRUE, the default delimiter is always used for multi-line output. When FALSE, Daycart may attempt to detect the delimiter to use (from other input).

'vcs_table_cache'

Returns the value for the state of the VCS cache of salts and transforms. When off, each invocation of a vcs function causes the salt or transform table to be reread.

'default_fpsize'

Returns the default fingerprint size used for similarity comparisons and index creation when not otherwise defined.

'timeout_interrupt'

Returns the current search interrupt time in seconds. An ongoing ddblob search will return to Oracle from extproc in at most this many seconds, provided that some search results have been found. Unless the search is interrupted by the client the search will continue.

'timeout_abort'

Returns the current search abort time in seconds. An ongoing ddblob search will abort and return the hits found so far after this elapsed time.

'timeout_progress'

For the last ddblob search, returns zero if the search completed or a positive integer if the search aborted due to timeout_abort. In the event of an abort, the value returned is the number of structures searched; this can be used as the approximate progress of the search (relative to the number of rows in the table being searched).

'thread_count'

Returns the number of worker threads which will be allocated in a session for multithreaded searches. Multithreading is used for contains(), isin(), and matches() searches.

'count_value'

Returns the number of hits expected (estimated or actual) for the previously-run query. One must set the count_mode using ddpackage.fsetinfo('count_mode={ACTUAL|ESTIMATE}') immediately before executing the query, and then get the count_value immediately after the query. When in count mode, no hits will be returned from any Daycart search but the count_value variable will reflect the number of hits which would have been returned.
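The set, query, read sequence for counting can be sketched as below. The table small and column smi are assumed example names, and the contains() search stands in for any Daycart search.

```sql
-- Switch the session into counting mode immediately before the query
SELECT c$dcischem.ddpackage.fsetinfo('count_mode=ACTUAL') FROM dual;

-- Run the search; no hits are returned while count mode is active
SELECT smi FROM small WHERE contains(smi, 'c1ccccc1') = 1;

-- Read the number of hits that would have been returned
SELECT c$dcischem.ddpackage.finfo('count_value') FROM dual;

-- Return to normal searching
SELECT c$dcischem.ddpackage.fsetinfo('count_mode=NONE') FROM dual;
```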

fsetinfo

function ddpackage.fsetinfo (name_value_pair IN VARCHAR2_OR_CLOB) => NUMBER

Allows the user to individually set session-level options in DayCart for the following parameters and all dayconvert options (see Section 2.3 Molecule / Reaction Functions).

'vcs_table_cache={on|off}'

'debug_level={level}'

'session_tag={value}'

'exit_on_fault={TRUE|FALSE}'

'log={place}'

'default_fpsize={nbits}'

'default_delimiter={string}'

'force_delimiter={TRUE|FALSE}'

'timeout_interrupt={seconds}'

'timeout_abort={seconds}'

'options={class}'

'thread_count={count}'

'count_mode={NONE|ACTUAL|ESTIMATE}'

Starting with version 4.93, fsetinfo can be used to globally set options on a session-level basis. An OPTIONS table which can be populated with user-defined sets of parameters is automatically created during installation. Executing fsetinfo using 'options={class}' will reset all options to their default values and then will set the values associated with the given class. On session startup, the options with class of zero are set.

Name                                      Type
----------------------------------------- ----------------------------
NAME                                      VARCHAR2(100)
VALUE                                     VARCHAR2(100)
CLASS                                     NUMBER(7)

NAME = parameter name as listed above or any of the dayconvert options
VALUE = new value
CLASS = set number for options.
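As a hedged sketch of using option classes: the statements below store two settings under class 1 and then activate them. The schema owning the OPTIONS table is assumed here to be c$dcischem, and the class number and values are made up for illustration.

```sql
-- Define option class 1 (owner c$dcischem is an assumption)
INSERT INTO c$dcischem.options (name, value, class)
    VALUES ('default_fpsize', '1024', 1);
INSERT INTO c$dcischem.options (name, value, class)
    VALUES ('timeout_abort', '60', 1);
COMMIT;

-- Reset all options to their defaults, then apply the class 1 values
SELECT c$dcischem.ddpackage.fsetinfo('options=1') FROM dual;
```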

2.3 Molecule / Reaction Functions

Functions and their respective operations relating to conversion, transformation, property values, and normalization of molecules and reactions are described below.

fsmi2cansmi

function ddpackage.fsmi2cansmi (smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER) => VARCHAR2_OR_CLOB
operator smi2cansmi (smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER) => VARCHAR2_OR_CLOB

Returns a canonical SMILES string from an input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively.

fsmi2xsmi

function ddpackage.fsmi2xsmi (smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, explicit IN NUMBER) => VARCHAR2_OR_CLOB
operator smi2xsmi (smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, explicit IN NUMBER) => VARCHAR2_OR_CLOB

Returns an exchange SMILES string which is semantically identical to the input SMILES but which does not use Daylight-specific aromaticity conventions. Type is either 0 or 1, for unique or absolute SMILES, respectively. When 'explicit' is 1, it supplies hydrogen count and other atomic properties explicitly for every atom. Note that exchange SMILES are not canonical; the same input molecule or reaction may return different exchange SMILES depending on the input order to this function.

fsmi2netch

function ddpackage.fsmi2netch (smiles IN VARCHAR2_OR_CLOB) => NUMBER
operator smi2netch (smiles IN VARCHAR2_OR_CLOB) => NUMBER

Returns the net charge of the input molecule or reaction.

fsmi2hcount

function ddpackage.fsmi2hcount (smiles IN VARCHAR2_OR_CLOB) => NUMBER
operator smi2hcount (smiles IN VARCHAR2_OR_CLOB) => NUMBER

Returns the total hydrogen count for the input molecule or reaction.

fsmi2mf

function ddpackage.fsmi2mf (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB
operator smi2mf (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB

Returns the molecular formula string for the input molecule or reaction.

fsmi2amw

function ddpackage.fsmi2amw (smiles IN VARCHAR2_OR_CLOB) => NUMBER
operator smi2amw (smiles IN VARCHAR2_OR_CLOB) => NUMBER

Returns the average molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the average atomic weight. The weight used for atoms with a specified isotope is the high precision molecular weight for that atom. For example, "c1ccccc1" returns 78.1184, while "[1H][12c]1[12c]([1H])[12c]([1H])[12c]([1H])[12c]([1H])[12c]1[1H]" returns 78.0469502.

fsmi2pmw

function ddpackage.fsmi2pmw (smiles IN VARCHAR2_OR_CLOB) => NUMBER
operator smi2pmw (smiles IN VARCHAR2_OR_CLOB) => NUMBER

Returns the high-precision molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the high precision weight for the most abundant isotope. The weight used for atoms with a specified isotope is the high precision molecular weight for that atomic isotope. For example, "c1ccccc1" returns 78.0469502, the high precision molecular weight. "BrBr" returns 157.836675, while "[81Br][81Br]" returns 161.832582.

Both smi2amw() and smi2pmw() return the same values for SMILES with fully specified isotopes; they only differ in their handling of SMILES with unspecified isotopic weights. Note also that the unique canonical SMILES (e.g., smi2cansmi('smiles', 0)) returns the structure with all isotopic information removed; this is useful for consistent handling of partially specified isotopic information in conjunction with the two functions.
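The difference between the two weight operators can be seen side by side with the benzene example used above. The values in the comments are the ones quoted in the descriptions of smi2amw() and smi2pmw(); the default public synonyms are assumed.

```sql
-- Average weight versus high-precision weight for unlabeled benzene
SELECT smi2amw('c1ccccc1') AS amw,   -- documented above as 78.1184
       smi2pmw('c1ccccc1') AS pmw    -- documented above as 78.0469502
FROM dual;

-- Strip isotopic information first so both operators see the same structure
SELECT smi2amw(smi2cansmi('[81Br][81Br]', 0)) AS amw,
       smi2pmw(smi2cansmi('[81Br][81Br]', 0)) AS pmw
FROM dual;
```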

fsmi2graph

function ddpackage.fsmi2graph (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB 
operator smi2graph (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB

Returns the hydrogen- and charge-suppressed canonical graph string for the input molecule or reaction.

fvcs_desalt

function ddpackage.fvcs_desalt(smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, class IN NUMBER(7,0)) => VARCHAR2_OR_CLOB
operator vcs_desalt(smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, class IN NUMBER(7,0)) => VARCHAR2_OR_CLOB

Removes molecule fragments found in the c$dcischem.salts table from the input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively. The class value is the class of salts entries used from the salts table. All of the structures in the salts table with the given class are checked against the input SMILES and if found, they are removed.

fvcs_normalize

function ddpackage.fvcs_normalize(smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, class IN NUMBER(7,0)) => VARCHAR2_OR_CLOB
operator vcs_normalize(smiles IN VARCHAR2_OR_CLOB, 
	type IN NUMBER, class IN NUMBER(7,0)) => VARCHAR2_OR_CLOB

Performs a SMIRKS-based structure normalization on the input SMILES. All of the SMIRKS from the c$dcischem.transform table with the given class are applied to the input molecule. The resulting molecule is output as a canonical SMILES. Type is either 0 or 1 for unique or absolute SMILES, respectively.
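A sketch of chaining the two VCS operators: desalt first, then normalize. The class value 1 is hypothetical for both tables, and the result depends entirely on the rows present in c$dcischem.salts and c$dcischem.transform with that class, so no output is shown.

```sql
-- Remove salt fragments of class 1, then apply the class 1 transforms;
-- both steps emit unique SMILES (type 0)
SELECT vcs_normalize(
           vcs_desalt('CC(=O)[O-].[Na+]', 0, 1),
           0, 1) AS cleaned
FROM dual;
```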

fatomnorm
fbondnorm
fpartnorm

function ddpackage.fatomnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB
operator atomnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB

function ddpackage.fbondnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB
operator bondnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB

function ddpackage.fpartnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB
operator partnorm (smiles IN VARCHAR2_OR_CLOB, 
	list IN VARCHAR2_OR_CLOB, ntuple IN NUMBER, isotype IN NUMBER) => VARCHAR2_OR_CLOB

Returns a potentially reordered N-tuple string for the given list input parameter. The list string is a comma-separated list of data which is associated in order with the atoms, bonds or parts of the input SMILES. The list string is reordered based on the canonical atom, bond, or part ordering of the input SMILES. ntuple is the number of comma-separated values per atom, bond, or dot-separated part, and isotype is 0 for unique SMILES canonicalization and 1 for absolute SMILES canonicalization.
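To illustrate the list and ntuple arguments, the sketch below attaches made-up labels to the three atoms of 'OCC'. The returned order depends on the canonical atom ordering, so no output is shown.

```sql
-- One value per atom (ntuple = 1): labels a, b, c belong to O, C, C in input order
SELECT atomnorm('OCC', 'a,b,c', 1, 0) FROM dual;

-- Two values per atom (ntuple = 2): each atom carries a label and a number
SELECT atomnorm('OCC', 'a,1,b,2,c,3', 2, 0) FROM dual;
```

Because the list is the 'key' argument for these operators, the string type of the result follows the type of the list, not the type of the SMILES.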

fgen_molecules
fgen_reactions

function ddpackage.fgen_molecules(smiles IN VARCHAR2_OR_CLOB, 
	smirks IN VARCHAR2_OR_CLOB, direction IN NUMBER, limit IN NUMBER, 
	type IN NUMBER) => VARCHAR2_OR_CLOB
operator gen_molecules(smiles IN VARCHAR2_OR_CLOB, 
	smirks IN VARCHAR2_OR_CLOB, direction IN NUMBER, limit IN NUMBER, 
	type IN NUMBER) => VARCHAR2_OR_CLOB

function ddpackage.fgen_reactions(smiles IN VARCHAR2_OR_CLOB, 
	smirks IN VARCHAR2_OR_CLOB, direction IN NUMBER, limit IN NUMBER, 
	type IN NUMBER) => VARCHAR2_OR_CLOB
operator gen_reactions(smiles IN VARCHAR2_OR_CLOB, 
	smirks IN VARCHAR2_OR_CLOB, direction IN NUMBER, limit IN NUMBER, 
	type IN NUMBER) => VARCHAR2_OR_CLOB

Applies the transform created from the given SMIRKS to the input molecules. The direction argument indicates whether the transform is applied in the forward (0) or reverse (1) direction. The limit parameter is the maximum count of specific molecules to return; 0 indicates no limit. The resulting molecules or reactions are output as a set of newline-delimited canonical SMILES. The gen_molecules() function and operator return the molecules which result from a transformation; the gen_reactions() function and operator return the complete reaction. The type is either 0 or 1 for unique or absolute SMILES, respectively.

The results are returned as a single string. If more than one molecule or reaction is formed, then each is on a separate line of the output, delimited by the default_delimiter string (see ddpackage.finfo()).
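A hedged example of both operators: the SMIRKS below is a commonly quoted nitro-group normalization (pentavalent to charge-separated form), chosen here only for illustration, and the input SMILES is nitromethane written in the pentavalent form. Output is not shown, since it depends on the toolkit's canonicalization.

```sql
-- Products only: forward direction (0), no limit (0), unique SMILES (0)
SELECT gen_molecules('CN(=O)=O',
                     '[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]',
                     0, 0, 0) AS products
FROM dual;

-- Complete reactions for the same transform
SELECT gen_reactions('CN(=O)=O',
                     '[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]',
                     0, 0, 0) AS reactions
FROM dual;
```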

fdayconvert

function ddpackage.fdayconvert (data IN VARCHAR2_OR_CLOB, 
	ifmt IN VARCHAR2, ofmt IN VARCHAR2, type IN NUMBER, 
	ptable_class IN NUMBER) => VARCHAR2_OR_CLOB
operator dayconvert (data IN VARCHAR2_OR_CLOB, 
	ifmt IN VARCHAR2, ofmt IN VARCHAR2, type IN NUMBER, 
	ptable_class IN NUMBER) => VARCHAR2_OR_CLOB

Returns the input chemical information in a different format. See the Daylight Conversion Manual for additional information. Note: The 'type' and 'ptable_class' parameters are optional as described below.

The 'ifmt' and 'ofmt' parameters are used to designate the input and output formats, respectively. Each parameter must be given as one of the particular letter sequences shown below. Valid combinations of input and output formats for conversion are as follows, where tdtsma and tdtsmrk are tdt versions with smarts and smirks, respectively:

	smi ---> mol/sdf/rdf
	tdt ---> mol/sdf/rdf
	mol or sdf ---> smi/ism/sma/tdt/tdtsma
	rdf ---> smi/ism/sma/smrk/tdt/tdtsma/tdtsmrk

Either mol or sdf can be used for rgfile input.

Delimiters for interpreting multi-line input data are detected from the input stream. Input delimiters may be the strings "\n", "\r", or "\r\n". On output, the delimiter chosen for multi-line output depends on the input delimiter detected and the settings of "force_delimiter" and "default_delimiter". See the description of these items in the documentation for the ddpackage.finfo() function.

The 'type' parameter has been included for backwards compatibility and no longer controls the inclusion of isomeric information in the conversion output. Conversion output from a SMILES form to an MDL form will always include the isomeric information, while conversion from an MDL form to SMILES is controlled by the choice of ism versus smi for ofmt. However, a value for the type parameter (0 or 1) is required for version 4.92:

select ddpackage.fdayconvert ('CCCOC', 'smi', 'sdf', 1) from dual;

In contrast, for versions 4.93 and later, a value for the type parameter is only required if a ptable_class value is given.

select ddpackage.fdayconvert ('CCCOC', 'smi', 'sdf') from dual;

OR

select ddpackage.fdayconvert ('CCCOC', 'smi', 'sdf', 1, 10) from dual;
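Since both smi to sdf and sdf to ism appear in the list of valid format combinations, a round trip can be sketched by nesting two calls. This assumes version 4.93 or later, where the trailing parameters may be omitted; the exact SMILES returned is not shown.

```sql
-- Convert SMILES to an SD file record, then back to isomeric SMILES
SELECT ddpackage.fdayconvert(
           ddpackage.fdayconvert('CCCOC', 'smi', 'sdf'),
           'sdf', 'ism') AS round_trip
FROM dual;
```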

The ptable_class parameter is optional. If the 'ptable_class' parameter is provided, it indicates that valence and charge information in the user-defined PTABLE table information is to be used instead of that provided in the default p-table. The specific information to be used is based upon the class number supplied.

Name                                      Type
----------------------------------------- ----------------------------
AT_NO                                     NUMBER
SYMBOL                                    VARCHAR2(8)
AT_MASS                                   NUMBER
VALENCE_CHARGE_LIST                       VARCHAR2(4000)
CLASS                                     NUMBER

AT_NO = atomic number
SYMBOL = atomic symbol 	 	
AT_MASS = atomic mass
VALENCE_CHARGE_LIST = list of valence and charge, e.g., '2,-1,3,0'  
	for valence 2 with -1 charge or valence 3 with 0 charge

The following are a series of dayconvert related options. The value for any of these options can be found using the finfo() function. Values for these options can be changed on a per session basis for an individual option or a group of options using the fsetinfo() function as described in Section 2.2 General Purpose Functions.

'conv_smi_is_ism={TRUE|FALSE}'

FALSE

TRUE

'conv_add_3d={TRUE|FALSE}'

TRUE

'conv_add_2d={TRUE|FALSE}'

TRUE

'conv_use_3d={TRUE|FALSE}'

FALSE

TRUE

'conv_split_fields={TRUE|FALSE}'

FALSE

TRUE

'conv_id_field={id_field}'

Sets the id field to be used. The default behavior for molecules being converted to tdt format is to use the first line of each header block as the id field. The default behavior for reactions is to use the value following $RIREG as the id.

'conv_prefix={prefix}'

Sets the designated prefix to be parsed from the datatype names for conversion of rdf files. The default is none.

'conv_implicit_chirality={TRUE|FALSE}'

FALSE

TRUE

'conv_ring_cistrans={TRUE|FALSE}'

FALSE

TRUE

'conv_db_explicit_h={TRUE|FALSE}'

FALSE

TRUE

'conv_chi_explicit_h={TRUE|FALSE}'

FALSE

TRUE

'conv_fix_radical_rings={TRUE|FALSE}'

TRUE

FALSE

'conv_nametag={tag}'

Sets the tdt datatype tag used for the id. Default is $NAM.

'conv_comment_smi={TRUE|FALSE}'

FALSE

TRUE

'conv_smi_tuples={TRUE|FALSE}'

TRUE

FALSE

'conv_day_hcount={TRUE|FALSE}'

TRUE

'conv_day_stereo={TRUE|FALSE}'

TRUE

FALSE

'conv_day_chih={TRUE|FALSE}'

TRUE

FALSE

fsmi2scbits

function ddpackage.fsmi2scbits (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB 
operator smi2scbits (smiles IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB

Returns ASCII encoded scaffold data for a given molecule or reaction. This data is not normally visible within Daycart. The function is provided as a convenience when creating ddblob indexes. It allows one to precompute and store the scaffold data. One can then use the 'initsccolumn' parameter for ddblob creation to avoid recomputing the scaffold data.
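Precomputing the scaffold data might be sketched as follows. The table small and column smi are assumed example names, the column name scbits and its CLOB type are hypothetical choices, and the syntax for passing 'initsccolumn' at index creation is not shown because it is not described in this section.

```sql
-- Add a column to hold the precomputed scaffold data (name and type are assumptions)
ALTER TABLE small ADD (scbits CLOB);

-- Populate it once, using the function form of smi2scbits
UPDATE small SET scbits = ddpackage.fsmi2scbits(smi);
COMMIT;
```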

2.4 Fingerprint Functions

This section describes functions and their respective operations relating to generation of and information concerning fingerprints.

fsmi2fp

function ddpackage.fsmi2fp (smiles IN VARCHAR2_OR_CLOB, 
	min IN NUMBER, max IN NUMBER, nbits IN NUMBER) => VARCHAR2_OR_CLOB 
operator smi2fp (smiles IN VARCHAR2_OR_CLOB, min IN NUMBER, 
	max IN NUMBER, nbits IN NUMBER) => VARCHAR2_OR_CLOB

Returns the ASCII fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint.

fsmi2xfp

function ddpackage.fsmi2xfp (smiles IN VARCHAR2_OR_CLOB, 
	min IN NUMBER, max IN NUMBER, nbits IN NUMBER) => VARCHAR2_OR_CLOB 
operator smi2xfp (smiles IN VARCHAR2_OR_CLOB, min IN NUMBER, 
	max IN NUMBER, nbits IN NUMBER) => VARCHAR2_OR_CLOB

Returns the ASCII difference fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint.

ffoldfp

function ddpackage.ffoldfp (fpstr IN VARCHAR2_OR_CLOB, 
	nbits IN NUMBER, dens IN NUMBER) => VARCHAR2_OR_CLOB
operator foldfp (fpstr IN VARCHAR2_OR_CLOB, nbits IN NUMBER, 
	dens IN NUMBER) => VARCHAR2_OR_CLOB

Folds the given fingerprint to the minimum appropriate size or density, whichever is limiting, and returns the new, folded fingerprint.

fbitcount

function ddpackage.fbitcount (fpstr IN VARCHAR2_OR_CLOB) => NUMBER
operator bitcount (fpstr IN VARCHAR2_OR_CLOB) => NUMBER

Returns the number of bits on in the fingerprint.

fnbits

function ddpackage.fnbits (fpstr IN VARCHAR2_OR_CLOB) => NUMBER
operator nbits (fpstr IN VARCHAR2_OR_CLOB) => NUMBER

Returns the total size of the fingerprint, in bits. In the current Daylight toolkit this will always be a power of two.

fisfp

function ddpackage.fisfp (fpstr IN VARCHAR2_OR_CLOB) => NUMBER
operator isfp (fpstr IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the string is a fingerprint, 0 otherwise. The syntax of a fingerprint can never be confused with a valid SMILES. This is the bit of cleverness which allows us to overload the searching functions.
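The fingerprint operators can be combined in one query, as in the sketch below. The pathlength range 0 to 7 and the size 2048 are illustrative values, not documented defaults. Per the descriptions above, isfp() is expected to give 1 for the generated fingerprint and 0 for the SMILES.

```sql
-- Generate a fingerprint once in an inline view, then inspect it
SELECT nbits(fp)         AS total_bits,
       bitcount(fp)      AS bits_on,
       isfp(fp)          AS fp_is_fp,
       isfp('c1ccccc1O') AS smiles_is_fp
FROM (SELECT smi2fp('c1ccccc1O', 0, 7, 2048) AS fp FROM dual);
```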

2.5 Comparison Functions

Functions and their respective operations relating to a variety of direct comparisons between input smiles, smarts, or fingerprints are described below.

fexact

function ddpackage.fexact (a IN VARCHAR2_OR_CLOB, b IN VARCHAR2_OR_CLOB) => NUMBER
operator exact (a IN VARCHAR2_OR_CLOB, b IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the two input strings are identical, 0 otherwise. The operator is optionally backed by the ddexact indextype.

fgraph

function ddpackage.fgraph (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator graph (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the two input SMILES share the same canonical graph, 0 otherwise. The operator is optionally backed by the ddgraph indextype.

ftautomer

function ddpackage.ftautomer (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator tautomer (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the two input SMILES share the same canonical graph, net charge, and total hydrogen count, 0 otherwise. The operator is optionally backed by the ddgraph indextype.

fusmiles

function ddpackage.fusmiles (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator usmiles (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the two input SMILES share the same unique canonical smiles, 0 otherwise. The operator is optionally backed by the ddgraph indextype.

fasmiles

function ddpackage.fasmiles (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator asmiles (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if the two input SMILES share the same absolute canonical smiles, 0 otherwise. The operator is optionally backed by the ddgraph indextype.

fcomponent

function ddpackage.fcomponent (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator component (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (any SMILES), otherwise returns 0. A component is a single dot-separated part of a larger molecule or reaction SMILES. The function also returns 0 if smiles2 is not a single-component query; the last example in the table below illustrates this case. The operator is optionally backed by the ddrole indextype. Some examples follow:

smiles1	smiles2	Returns
CCC	CCC	1
CCC.CCCN	CCC	1
CCC.CCCN	CCCN	1
CCC>>CCCN	CCC	1
CCC>>CCCN	CCCN	1
CCC.CCCN	CCC.CCCN	0

freactant
fagent
fproduct

function ddpackage.freactant (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator reactant (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

function ddpackage.fagent (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator agent (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

function ddpackage.fproduct (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER
operator product (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB) => NUMBER

Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (a reaction SMILES) with the appropriate role, otherwise returns 0. The operators are optionally backed by the ddrole indextype.

fcontains

function ddpackage.fcontains (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator contains (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns 1 if smiles1 contains smiles2; that is, smiles2, assuming opened valences for all hydrogens, is a substructure of smiles1. The operator is optionally backed by the ddblob indextype.

The optional count argument is ignored in the functional forms. It is used in the index implementation as the desired number of hits to be returned. When used, the best 'count' hits (based on tanimoto similarity) will be returned that meet the search criteria.
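A substructure search with and without the count argument might look like the following sketch. The table small and column smi are assumed example names, and the count form is only meaningful when the column carries a ddblob index.

```sql
-- All rows whose structure contains a benzene ring
SELECT smi FROM small WHERE contains(smi, 'c1ccccc1') = 1;

-- At most the 10 best hits, as ranked by the index implementation
SELECT smi FROM small WHERE contains(smi, 'c1ccccc1', 10) = 1;
```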

fisin

function ddpackage.fisin (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator isin (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns 1 if smiles2 contains smiles1; that is, smiles1, assuming opened valences for all hydrogens, is a substructure of smiles2. This functionality is identical to 'contains()' with the arguments swapped. The operator is optionally backed by the ddblob indextype.

fmatches

function ddpackage.fmatches (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator matches (smiles1 IN VARCHAR2_OR_CLOB, 
	smiles2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns 1 if the smarts expression matches the given SMILES, 0 otherwise. The operator is optionally backed by the ddblob indextype.

The optional count argument is ignored in the functional forms. It is used in the index implementation as the desired number of hits to be returned. When used, the best 'count' hits (based on bits set in the target fingerprint) will be returned that meet the search criteria.
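A SMARTS search can be sketched in the same way. This assumes the target SMILES is the first argument and the SMARTS expression the second, by analogy with matchcover(); the table small and column smi are assumed example names.

```sql
-- Rows matching a SMARTS pattern for a hydroxyl group attached to an aromatic carbon
SELECT smi FROM small WHERE matches(smi, '[OH]c') = 1;

-- Functional form on a single pair of strings
SELECT ddpackage.fmatches('c1ccccc1O', '[OH]c') FROM dual;
```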

fmatchcover

function ddpackage.fmatchcover (smiles IN VARCHAR2_OR_CLOB, 
	smarts IN VARCHAR2_OR_CLOB) => NUMBER
operator matchcover (smiles IN VARCHAR2_OR_CLOB, 
	smarts IN VARCHAR2_OR_CLOB) => NUMBER

Returns the ratio of atoms in the target SMILES matched by the given query to the number of atoms in the target as a number between zero and one.

feuclid

function ddpackage.feuclid (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator euclid (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns the euclidean distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. This is optionally backed by the ddblob indextype.

ftanimoto

function ddpackage.ftanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator tanimoto (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns the tanimoto distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. This is optionally backed by the ddblob indextype.
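The argument combinations described above can be sketched as follows. The table small and column smi are assumed example names, and the 0.8 threshold and fingerprint parameters are illustrative only.

```sql
-- Two SMILES: fingerprints are generated at 512 bits
SELECT tanimoto('c1ccccc1O', 'c1ccccc1N') FROM dual;

-- One fingerprint, one SMILES: the SMILES is fingerprinted at the size of the other argument
SELECT tanimoto(smi2fp('c1ccccc1O', 0, 7, 1024), 'c1ccccc1N') FROM dual;

-- Search a table for rows with a value of at least 0.8 against the query structure
SELECT smi, tanimoto(smi, 'c1ccccc1O') AS score
FROM small
WHERE tanimoto(smi, 'c1ccccc1O') >= 0.8;
```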

ftversky

function ddpackage.ftversky (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, alpha IN NUMBER, beta IN NUMBER, 
	count IN NUMBER) => NUMBER   (count is optional)
operator tversky (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, alpha IN NUMBER, beta IN NUMBER, 
	count IN NUMBER) => NUMBER   (count is optional)

Returns the Tversky similarity between two fingerprints or SMILES, weighted by the alpha and beta parameters. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. The operator can optionally be backed by the ddblob indextype.
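The role of alpha and beta can be sketched in Python using the textbook Tversky index, c/(alpha*a + beta*b + c). This is an assumption: the text above does not state the exact expression Daycart evaluates, nor which fingerprint each weight applies to. Here a and b are taken as the bits set only in the first and only in the second fingerprint, and c as the bits set in both. The function names are hypothetical.

```python
def bit_counts(fp1: int, fp2: int):
    """Return (a, b, c): bits only in fp1, bits only in fp2, bits in both."""
    a = bin(fp1 & ~fp2).count("1")
    b = bin(fp2 & ~fp1).count("1")
    c = bin(fp1 & fp2).count("1")
    return a, b, c


def tversky(fp1: int, fp2: int, alpha: float, beta: float) -> float:
    """Textbook Tversky index c/(alpha*a + beta*b + c); assumed, not confirmed."""
    a, b, c = bit_counts(fp1, fp2)
    denom = alpha * a + beta * b + c
    # Sketch-only convention: a zero denominator gives 0.0.
    return c / denom if denom else 0.0
```

Under this form, alpha = beta = 1 reproduces c/(a+b+c), and unequal weights make the measure asymmetric: for fp1 = 0b1111 and fp2 = 0b0011, the weights (1, 0) give 0.5 while (0, 1) give 1.0.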

ffingertest

function ddpackage.ffingertest (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)
operator fingertest (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, count IN NUMBER) => NUMBER   (count is optional)

Returns 1 if all of the bits in fp_or_smi2 are also present in fp_or_smi1, and 0 otherwise. That is, a result of 1 means the fingerprint from fp_or_smi2 represents a possible substructure of fp_or_smi1. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The operator can optionally be backed by the ddblob indextype.
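The bit test described above amounts to a subset check. A minimal Python sketch follows, assuming both fingerprints are already the same size and are held as integer bitmasks. The function name fingertest() mirrors the operator for readability only; this is not the cartridge's implementation.

```python
def fingertest(fp1: int, fp2: int) -> int:
    """Return 1 if every bit set in fp2 is also set in fp1, else 0."""
    return 1 if (fp1 & fp2) == fp2 else 0
```

A result of 1 only indicates a possible substructure match, since different structural features can set the same fingerprint bits.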

The optional count argument is ignored in the functional forms. It is used in the index implementation as the desired number of hits to be returned. When it is used, the best 'count' hits that meet the search criteria (ranked by the number of bits set in the target) are returned.

fsimilarity

function ddpackage.fsimilarity (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, expression IN VARCHAR2, 
	count IN NUMBER) => NUMBER   (count is optional)
operator similarity (fp_or_smi1 IN VARCHAR2_OR_CLOB, 
	fp_or_smi2 IN VARCHAR2_OR_CLOB, expression IN VARCHAR2, 
	count IN NUMBER) => NUMBER   (count is optional)

Returns the similarity or distance between two fingerprints or SMILES, based on the computed expression. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number. The operator can optionally be backed by the ddblob indextype.

The value of expression can be any legal expression based on the counts of bits in the corresponding fingerprints (see the man page for expression(5)). Optionally, the expression string can be preceded by either "distance=" or "similarity=". These prefixes are used by the index implementation but ignored in the functional form. The following examples are all equivalent:

  fsimilarity(smi1, smi2, 'TANIMOTO')
  fsimilarity(smi1, smi2, 'tanimoto')
  fsimilarity(smi1, smi2, 'similarity=tanimoto')
  fsimilarity(smi1, smi2, 'c/(a+b+c)')
  fsimilarity(smi1, smi2, 'similarity=c/(a+b+c)')
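To make the equivalence of the five calls concrete, the following Python sketch reduces an expression string to a canonical formula, the way the functional form is described as treating it: the optional prefix is dropped and the named measure is matched without regard to case. The function normalize_expression() and the one-entry table are hypothetical. Only the tanimoto mapping is taken from the examples above; expression(5) remains the authority on the full syntax.

```python
# Only this mapping is confirmed by the examples in the text.
NAMED_MEASURES = {"tanimoto": "c/(a+b+c)"}

PREFIXES = ("similarity=", "distance=")


def normalize_expression(expression: str) -> str:
    """Strip an optional prefix and expand a known measure name."""
    text = expression.strip()
    lowered = text.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    # Measure names are looked up case-insensitively; formulas pass through.
    return NAMED_MEASURES.get(text.lower(), text)
```

All five expression strings shown above normalize to 'c/(a+b+c)' under this sketch.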

2.6 Program Object Functions

There are two lower-level functions which invoke program objects. It is expected that they will almost always be called from a PL/SQL wrapper layer, which would be responsible for packaging the communication between Oracle and the program object in a meaningful way. Hence, the default cartridge installation does not create operators for these two functions.

fprogob

function ddpackage.fprogob (name IN VARCHAR2_OR_CLOB,
        message IN VARCHAR2_OR_CLOB) => VARCHAR2_OR_CLOB

Communicates with a program object running on the Oracle server. The parameter 'name' is the symbolic name of the program object. The table c$dcischem.progob contains the mappings of symbolic names to actual executable programs. The function fprogob() proceeds as follows:

1. Looks up the symbolic name in the table c$dcischem.progob,
2. If the executable is not already running, starts the program object,
3. Parses the 'message' parameter into a sequence of strings,
4. Sends the sequence of strings to the program object,
5. Takes the returned sequence of strings and converts it into a newline delimited VARCHAR2_OR_CLOB,
6. Returns the VARCHAR2_OR_CLOB string.

Valid delimiters for the lines in the 'message' parameter include the normal line-termination characters for UNIX, Mac, and PC: '\n', '\r', and '\r\n'. The function properly handles all three termination cases. The returned VARCHAR2_OR_CLOB string is delimited by UNIX line-termination '\n'.
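The delimiter handling can be illustrated with a short Python sketch. The helper names split_message() and join_reply() are hypothetical. Treating a trailing terminator as not introducing an extra empty string is an assumption of this sketch; the text above does not specify that case.

```python
import re
from typing import List

# '\r\n' must be tried first so a PC line ending is not split twice.
_LINE_END = re.compile(r"\r\n|\r|\n")


def split_message(message: str) -> List[str]:
    """Split on UNIX, Mac, or PC line terminators, in any mixture."""
    parts = _LINE_END.split(message)
    # Assumption: a trailing terminator does not add an empty final string.
    if parts and parts[-1] == "":
        parts.pop()
    return parts


def join_reply(strings: List[str]) -> str:
    """Join returned strings with the UNIX terminator, as fprogob() does."""
    return "\n".join(strings)
```

For instance, join_reply(split_message("a\rb\r\n")) yields "a\nb" regardless of which terminators the caller used.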