Daycart SQL Functions

2. SQL Functions

A number of user-defined SQL functions have been implemented for the cartridge. All of the functions are defined in the current database during the installation process.

The default installation (pg_create.sql) allows access to the functions by any user of the database.

2.1 String Data Handling

Daycart supports TEXT string datatypes for all string arguments. Postgresql will transparently convert VARCHAR and CHAR types into TEXT to execute Daycart functionality.

2.2 General Purpose Functions

Thg general purpose functions are typically used for debugging or setting session-level options.

geterrors

function geterrors (INTEGER level) => TEXT

Returns error strings from the error queue for previously failed functions or operators based on the level requested and clears the error queue. By default only message at the level of Errors or above are sent to the screen when a function fails.

    0 - All messages
    1 - Notes
    2 - Warnings
    3 - Errors
    4 - Fatal errors

getlog

function getlog => TEXT

Returns error strings from the log messages and clears the local buffer of log messages. Behavior is controlled by the session-level option 'log'.

In order to get log messages from the getlog() function the 'log' option must be set to either 'LOCAL' or 'BOTH'. From that point log messages will be kept and can be retrieved with the getlog() function. By default, the 'log' option is set to 'CENTRAL' and all log messages go to the central logfile, which by default is in the databases pg_log directory.

testlicense

function testlicense (TEXT product) => INTEGER

Checks the license. The license is contained in a special table is created at install time. Currently the only recognized value for product is 'daycart'. Returns 1 if the Daylight cartridge has a valid license and 0 if not.

getinfo

function getinfo (TEXT which) => TEXT

Returns informational strings from DayCart. Valid input parameters are listed below. In addition, getinfo can be used to find the value for any dayconvert option. See Section 2.3 Molecule / Reaction Functions for detailed descriptions of the dayconvert options.

'toolkit_version'

Returns the current Daylight Toolkit version.

'daycart_version'

Returns the cartridge executable version.

'extproc_pid'

Returns the extproc process ID for this DayCart session.

'hostid'

Returns the hardware hostid used for generation of the license key.

'debug_level'

Returns the debug level set for logging messages.

'session_tag'

Returns the session tag set for a given session. The session tag, if set, is included in any log messages and can be used to identify the session which generated a log entry.

'log'

Returns where the session log info is written. Choices are NONE, LOCAL, CENTRAL, and BOTH. CENTRAL is the default, and refers to the value of DAYCART_LOGFILE (in the listener.ora) or a default of /tmp/extproc.log. LOCAL logging indicates that any log messages are stored for retrieval by the ddpackage.fgetlog() function.

'default_delimiter'

Returns the default delimiter used to separate lines of multi-line output.

'force_delimiter'

Returns TRUE or FALSE. When TRUE, the default delimiter is always used for multi-line output. When FALSE, Daycart may attempt to detect the delimiter to use (from other input).

setinfo

function setinfo (TEXT name_value_pair) => NUMBER

Allows the user to individually set session-level options in DayCart for the following parameters and all dayconvert options (see Section 2.3 Molecule / Reaction Functions).

'debug_level={level}'

'session_tag={value}'

'log={place}'

'default_delimiter={string}'

'force_delimiter={TRUE|FALSE}'

'options={class}'

The function setinfo can be used to globally set options on a session-level basis. An OPTIONS table which can be populated with user-defined sets of parameters is automatically created during installation. Executing setinfo using 'options={class}' will reset all options to their default values and then will set the values associated with the given class. On session startup, the options with class of zero are set.

Name                                      Type
----------------------------------------- ----------------------------
NAME                                      VARCHAR(100)
VALUE                                     VARCHAR(100)
CLASS                                     INTEGER

NAME = parameter name as listed above or any of the dayconvert options
VALUE = new value
CLASS = set number for options.

2.3 Molecule / Reaction Functions

Functions and their respective operations relating to conversion, transformation. property values and normalization and of molecules and reactions are described below.

smi2cansmi

function smi2cansmi (TEXT smiles, INTEGER type) => TEXT

Returns a canonical SMILES string from an input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively.

smi2xsmi

function smi2xsmi (TEXT smiles, INTEGER type, INTEGER explicit) => TEXT

Returns an exchange SMILES string which is semantically identical to the input SMILES but which does not use Daylight-specific aromaticity conventions. Type is either 0 or 1, for unique or absolute SMILES, respectively. When 'explicit' is 1, it supplies hydrogen count and other atomic properties explicitly for every atom. Note that exchange SMILES are not canonical; the same input molecule or reaction may return different exchange SMILES depending on the input order to this function.

smi2netch

function smi2netch (TEXT smiles) => INTEGER

Returns the net charge of the input molecule or reaction.

smi2hcount

function smi2hcount (TEXT smiles) => INTEGER

Returns the total hydrogen count for the input molecule or reaction.

smi2mf

function smi2mf (TEXT smiles) => TEXT

Returns the molecular formula string for the input molecule or reaction.

fsmi2amw

function smi2amw (TEXT smiles) => DOUBLE PRECISION

Returns the average molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the average atomic weight. The weight used for atoms with a specified isotope is the high precision molecular weight for that atom. For example, "c1ccccc1" returns 78.1184, while "[1H][12c]1[12c]([1H])[12c]([1H])[12c]([1H])[12c]([1H])[12c]1[1H]" returns 78.0469502.

smi2pmw

function smi2pmw (TEXT smiles) => DOUBLE PRECISION

Returns the high-precision molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the high precision weight for the most abundant isotope. The weight used for atoms with a specified isotope is the high precision molecular weight for that atomic isotope. For example, "c1ccccc1" returns 78.0469502, the high precision molecular weight. "BrBr" returns 157.836675, while "[81Br][81Br]" returns 161.832582.

Both smi2amw() and smi2pmw() return the same values for SMILES with fully specified isotopes; they only differ in their handling of SMILES with unspecified isotopic weights. Note also that the unique canonical SMILES (eg. smi2cansmi('smiles', 0)) returns the structure with all isotopic information removed; this is useful for consistent handling of partially specified isotopic information in conjunction with the two functions.

smi2graph

function smi2graph (TEXT smiles) => TEXT

Returns the hydrogen- and charge-suppressed canonical graph string for the input molecule or reaction.

vcs_desalt

function vcs_desalt(TEXT smiles, INTEGER type, INTEGER class) => TEXT

Removes molecule fragments found in the salts table from the input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively. The class value is the class of salts entries used from the salts table. All of the structures in the salts table with the given class are checked against the input SMILES and if found, they are removed.

vcs_normalize

function vcs_normalize(TEXT smiles, INTEGER type, INTEGER class) => TEXT

Performs a SMIRKS-based structure normalization on the input SMILES. All of the SMIRKS from the transform table with the given class are applied to the input molecule. The resulting molecule is output as a canonical SMILES. Type is either 0 or 1 for unique or absolute SMILES, respectively.

gen_molecules
gen_reactions

function gen_molecules(TEXT smiles, TEXT smirks,
	INTEGER direction, INTEGER limit, INTEGER type) => TEXT
function gen_reactions(TEXT smiles, TEXT smirks,
	INTEGER direction, INTEGER limit, INTEGER type) => TEXT

Applies the transform created from the given SMIRKS to the input molecules. The argument direction indicates that the transform is applied in the forward (0) or reverse (1) direction. The limit parameter is the maximum count of specific molecules to return and 0 indicates no limit. The resulting molecules or reactions are output as a set of newline-delimited canonical SMILES. The gen_molecules() function and operator returns the molecules which result from a transformation; the gen_reactions() function and operator return the complete reaction. The type indicates either 0 or 1 for unique or absolute SMILES, respectively.

The results are returned as a single string. If more than one molecule or reaction is formed, then each is on a separate line of the output, delimited by the default_delimiter string (see info().

dayconvert

function dayconvert (TEXT data, TEXT ifmt, TEXT ofmt) => TEXT
function dayconvert (TEXT data, TEXT ifmt, TEXT ofmt,
    INTEGER ptable_class) => TEXT

Returns the input chemical information in a different format. See the Daylight Conversion Manual for additional information. Note: The 'ptable_class' parameters are optional as described below.

The 'ifmt' and 'ofmt' parameters are used to designate the input and output formats, respectively. Both parameters need to be identified by particular letter sequence. Valid combinations of input and output formats for conversion are as follows where tdtsma and tdtsmrk are tdt versions with smarts and smirks, respectively:

	smi ---> mol/sdf/rdf
	tdt ---> mol/sdf/rdf
	mol or sdf ---> smi/ism/sma/tdt/tdtsma
	rdf ---> smi/ism/sma/smrk/tdt/tdtsma/tdtsmrk

Either mol or sdf can be used for rgfile input.

Delimiters for interpreting multi-line input data are detected from the input stream. Input delimiters may be the strings "\n", "\r", or "\r\n". On output, the delimiter chosen for multi-line output depends on the input delimiter detected and the settings of "force_delimiter" and "default_delimiter". See the description of these items in the documentation for the info(). function.

The ptable_class parameter is optional. If the 'ptable_class' parameter is provided, it indicates that valence and charge information in the user-defined PTABLE table information is to be used instead of that provided in the default ptable. The specific information to be used is based upon the class number supplied.

Name                                      Type
----------------------------------------- ----------------------------
AT_NO                                     INTEGER
SYMBOL                                    VARCHAR(8)
AT_MASS                                   INTEGER
VALENCE_CHARGE_LIST                       VARCHAR(4000)
CLASS                                     INTEGER

AT_NO = atomic number
SYMBOL = atomic symbol 	 	
AT_MASS = atomic mass
VALENCE_CHARGE_LIST = list of valence and charge, e.g., '2,-1,3,0'  
	for valence 2 with -1 change or valence 3 with 0 charge

The following are a series of dayconvert related options. The value for any of these options can be found using the finfo() function. Values for these options can be changed on a per session basis for an individual option or a group of options using the fsetinfo() function as described in Section 2.2 General Purpose Functions.

'conv_smi_is_ism={TRUE|FALSE}'

FALSE

TRUE

'conv_add_3d={TRUE|FALSE}'

TRUE

'conv_add_2d={TRUE|FALSE}'

TRUE

'conv_use_3d={TRUE|FALSE}'

FALSE

TRUE

'conv_split_fields={TRUE|FALSE}'

FALSE

TRUE

'conv_id_field={id_field}'

Sets the id field to be used. The default behavior for molecules being converted to tdt format is to use the first line of each header block as the id field. The default behavior for reactions is to use the value following $RIREG as the id.

'conv_prefix={prefix}'

Sets the designated prefix to be parsed from the datatype names for conversion of rdf files. The default is none.

'conv_implicit_chirality={TRUE|FALSE}'

FALSE

TRUE

'conv_ring_cistrans={TRUE|FALSE}'

FALSE

TRUE

'conv_db_explicit_h={TRUE|FALSE}'

FALSE

TRUE

'conv_chi_explicit_h={TRUE|FALSE}'

FALSE

TRUE

'conv_fix_radical_rings={TRUE|FALSE}'

TRUE

FALSE

'conv_nametag={tag}'

Sets the tdt datatype tag used for the id. Default is $NAM.

'conv_comment_smi={TRUE|FALSE}'

FALSE

TRUE

'conv_smi_tuples={TRUE|FALSE}'

TRUE

FALSE

'conv_day_hcount={TRUE|FALSE}'

TRUE

'conv_day_stereo={TRUE|FALSE}'

TRUE

FALSE

'conv_day_chih={TRUE|FALSE}'

TRUE

FALSE

atomnorm
bondnorm
partnorm

function atomnorm (TEXT smiles, TEXT list, INTEGER ntuple,
    INTEGER isotype) => TEXT
function bondnorm (TEXT smiles, TEXT list, INTEGER ntuple,
    INTEGER isotype) => TEXT
function partnorm (TEXT smiles, TEXT list, INTEGER ntuple,
    INTEGER isotype) => TEXT

Returns a potentially reordered N-tuple string for the given list input parameter. The list string is a comma-separated list of data which is associated in order with the atoms, bonds or parts of the input SMILES. The list string is reordered based on the canonical atom, bond, or part ordering of the input SMILES. ntuple is the number of comma-separated values per atom, bond, or dot-separated part, and isotype is 0 for unique SMILES canonicalization and 1 for absolute SMILES canonicalization.

2.4 Fingerprint Functions

This section describes functions and their respective operations relating to generation of and information concerning fingerprints.

smi2fp

function smi2fp (TEXT smiles, INTEGER min, INTEGER max,
    NUMBER nbits) => TEXT

Returns the ASCII fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint.

smi2xfp

function smi2xfp (TEXT smiles, INTEGER min, INTEGER max,
    NUMBER nbits) => TEXT

Returns the ASCII difference fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint.

smi2screen

function smi2screen (TEXT smiles) => TEXT

Returns an ASCII string which contains screening data for a given molecule or reaction. The screen data can be used by the contains() and matches() functions for additional pre-screening.

foldfp

function foldfp (TEXT fpstr, INTEGER nbits, DOUBLE PRECISION dens) => TEXT

Folds the given fingerprint to the minimum appropriate size or density, whichever is limiting, and returns the new, folded fingerprint.

bitcount

function bitcount (TEXT fpstr) => INTEGER

Returns the number of bits on in the fingerprint.

nbits

function nbits (TEXT fpstr) => INTEGER

Returns the total size of the fingerprint, in bits. In the current Daylight toolkit this will always be a power of two.

isfp

function isfp (TEXT fpstr) => INTEGER

Returns 1 if the string is a fingerprint, 0 otherwise. The syntax of a fingerprint can never be confused with a valid SMILES. This is the bit of cleverness which allows us to overload the searching functions.

2.5 Comparison Functions

Functions and their respective operations relating to a variety of direct comparisons between input smiles, smarts, or fingerprints are described below.

graph

function graph (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if the two input SMILES share the same canonical graph, 0 otherwise.

tautomer

function tautomer (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if the two input SMILES share the same canonical graph, net charge, and total hydrogen count, 0 otherwise.

usmiles

function usmiles (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if the two input SMILES share the same unique canonical smiles, 0 otherwise.

asmiles

function asmiles (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if the two input SMILES share the same absolute canonical smiles, 0 otherwise.

component

function component (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (any SMILES), otherwise returns 0. Also returns false if smiles2 is not a single-component query. The last example in the table below illustrates this problem. A component is a single dot-separated part of a larger molecule or reaction SMILES. Some examples follow:

smiles1	smiles2	Returns
CCC	CCC	1
CCC.CCCN	CCC	1
CCC.CCCN	CCCN	1
CCC>>CCCN	CCC	1
CCC>>CCCN	CCCN	1
CCC.CCCN	CCC.CCCN	0

reactant
agent
product

function reactant (TEXT smiles1, TEXT smiles2) => INTEGER
function agent (TEXT smiles1, TEXT smiles2) => INTEGER
function product (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (a reaction SMILES) with the appropriate role, otherwise returns 0.

contains

function contains (TEXT smiles1, TEXT smiles2) => INTEGER
function contains (TEXT smiles1, TEXT smiles1_screen,
    TEXT smiles2) => INTEGER

Returns 1 if smiles1 contains smiles2; that is, smiles2, assuming opened valences for all hydrogens, is a substructure of smiles1.

The additional smiles1_screen argument can be used to pass in pre-computed screening data. The data should be computed and stored in advance of a search using the smi2screen() function. This data is used for fast searching of large tables as it allows Postgresql to perform some screening operations which speed up searching.

isin

function isin (TEXT smiles1, TEXT smiles2) => INTEGER

Returns 1 if smiles2 contains smiles1; that is, smiles1, assuming opened valences for all hydrogens, is a substructure of smiles2. This functionality is identical to 'contains()' with the arguments swapped.

matches

function matches (TEXT smiles, TEXT smarts) => INTEGER
function matches (TEXT smiles, TEXT smiles_screen, TEXT smarts) => INTEGER

Returns 1 if the smarts expression matches the given SMILES, 0 otherwise.

matchcover

function matchcover (TEXT smiles, TEXT smarts) => DOUBLE PRECISION

Returns the ratio of atoms in the target SMILES matched by the given query to the number of atoms in the target as a number between zero and one.

euclid

function euclid (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION

Returns the euclidean distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0.

tanimoto

function tanimoto (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION

Returns the tanimoto distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0.

tversky

function tversky (TEXT fp_or_smi1, TEXT fp_or_smi2,
    DOUBLE PRECISION alpha, DOUBLE PRECISION beta) => DOUBLE PRECISION

Returns the tversky distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0.

fingertest

function fingertest (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION

Returns 1 if all of the bits in fp_or_smi2 are also present in fp_or_smi1. That is, the fingerprint from fp_or_smi2 represents a possible substructure of fp_or_smi1. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0.

similarity

function similarity (TEXT fp_or_smi1, TEXT fp_or_smi2,
    TEXT expression) => DOUBLE PRECISION

Returns the similarity or distance between two fingerprints or SMILES, based on the computed expression. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number.

The value of expression can be any legal expression based on the counts of bits in the corresponding fingerprints (see the man page for expression(5)). Optionally, the expression string can be preceded with either "distance=" or "similarity=", which are ignored. The following examples are all identical:

  similarity(smi1, smi2, 'TANIMOTO')
  similarity(smi1, smi2, 'tanimoto')
  similarity(smi1, smi2, 'similarity=tanimoto')
  similarity(smi1, smi2, 'c/(a+b+c)')
  similarity(smi1, smi2, 'similarity=c/(a+b+c)')