2.2 General Purpose Functions 2.3 Molecule / Reaction Functions 2.4 Fingerprint Functions 2.5 Comparison Functions
2. SQL FunctionsA number of user-defined SQL functions have been implemented for the cartridge. All of the functions are defined in the current database during the installation process.The default installation (pg_create.sql) allows access to the functions by any user of the database. 2.1 String Data HandlingDaycart supports TEXT string datatypes for all string arguments. Postgresql will transparently convert VARCHAR and CHAR types into TEXT to execute Daycart functionality. 2.2 General Purpose FunctionsThg general purpose functions are typically used for debugging or setting session-level options. geterrorsfunction geterrors (INTEGER level) => TEXT Returns error strings from the error queue for previously failed functions or operators based on the level requested and clears the error queue. By default only message at the level of Errors or above are sent to the screen when a function fails. 0 - All messages 1 - Notes 2 - Warnings 3 - Errors 4 - Fatal errorsgetlog function getlog => TEXT Returns error strings from the log messages and clears the local buffer of log messages. Behavior is controlled by the session-level option 'log'. In order to get log messages from the getlog() function the 'log' option must be set to either 'LOCAL' or 'BOTH'. From that point log messages will be kept and can be retrieved with the getlog() function. By default, the 'log' option is set to 'CENTRAL' and all log messages go to the central logfile, which by default is in the databases pg_log directory. testlicensefunction testlicense (TEXT product) => INTEGER Checks the license. The license is contained in a special table is created at install time. Currently the only recognized value for product is 'daycart'. Returns 1 if the Daylight cartridge has a valid license and 0 if not. getinfofunction getinfo (TEXT which) => TEXT Returns informational strings from DayCart. Valid input parameters are listed below. In addition, getinfo can be used to find the value for any dayconvert option. See Section 2.3 Molecule / Reaction Functions for detailed descriptions of the dayconvert options.
'daycart_version'
'extproc_pid'
'hostid'
'debug_level'
'session_tag'
'log'
'default_delimiter'
'force_delimiter'
function setinfo (TEXT name_value_pair) => NUMBER Allows the user to individually set session-level options in DayCart for the following parameters and all dayconvert options (see Section 2.3 Molecule / Reaction Functions).
'session_tag={value}' any string, truncated to 32 chars 'log={place}' valid values are NONE, LOCAL, CENTRAL, BOTH 'default_delimiter={string}' 'force_delimiter={TRUE|FALSE}' 'options={class}' valid values correspond to options table The function setinfo can be used to globally set options on a session-level basis. An OPTIONS table which can be populated with user-defined sets of parameters is automatically created during installation. Executing setinfo using 'options={class}' will reset all options to their default values and then will set the values associated with the given class. On session startup, the options with class of zero are set. Name Type ----------------------------------------- ---------------------------- NAME VARCHAR(100) VALUE VARCHAR(100) CLASS INTEGER NAME = parameter name as listed above or any of the dayconvert options VALUE = new value CLASS = set number for options.
2.3 Molecule / Reaction FunctionsFunctions and their respective operations relating to conversion, transformation. property values and normalization and of molecules and reactions are described below. smi2cansmifunction smi2cansmi (TEXT smiles, INTEGER type) => TEXT Returns a canonical SMILES string from an input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively. smi2xsmifunction smi2xsmi (TEXT smiles, INTEGER type, INTEGER explicit) => TEXT Returns an exchange SMILES string which is semantically identical to the input SMILES but which does not use Daylight-specific aromaticity conventions. Type is either 0 or 1, for unique or absolute SMILES, respectively. When 'explicit' is 1, it supplies hydrogen count and other atomic properties explicitly for every atom. Note that exchange SMILES are not canonical; the same input molecule or reaction may return different exchange SMILES depending on the input order to this function. smi2netchfunction smi2netch (TEXT smiles) => INTEGER Returns the net charge of the input molecule or reaction. smi2hcountfunction smi2hcount (TEXT smiles) => INTEGER Returns the total hydrogen count for the input molecule or reaction. smi2mffunction smi2mf (TEXT smiles) => TEXT Returns the molecular formula string for the input molecule or reaction. fsmi2amwfunction smi2amw (TEXT smiles) => DOUBLE PRECISION Returns the average molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the average atomic weight. The weight used for atoms with a specified isotope is the high precision molecular weight for that atom. For example, "c1ccccc1" returns 78.1184, while "[1H][12c]1[12c]([1H])[12c]([1H])[12c]([1H])[12c]([1H])[12c]1[1H]" returns 78.0469502. smi2pmwfunction smi2pmw (TEXT smiles) => DOUBLE PRECISION Returns the high-precision molecular weight for the input molecule or reaction. The weight used for any atoms which do not have specified isotopes is the high precision weight for the most abundant isotope. The weight used for atoms with a specified isotope is the high precision molecular weight for that atomic isotope. For example, "c1ccccc1" returns 78.0469502, the high precision molecular weight. "BrBr" returns 157.836675, while "[81Br][81Br]" returns 161.832582. Both smi2amw() and smi2pmw() return the same values for SMILES with fully specified isotopes; they only differ in their handling of SMILES with unspecified isotopic weights. Note also that the unique canonical SMILES (eg. smi2cansmi('smiles', 0)) returns the structure with all isotopic information removed; this is useful for consistent handling of partially specified isotopic information in conjunction with the two functions. function smi2graph (TEXT smiles) => TEXT Returns the hydrogen- and charge-suppressed canonical graph string for the input molecule or reaction. vcs_desaltfunction vcs_desalt(TEXT smiles, INTEGER type, INTEGER class) => TEXT Removes molecule fragments found in the salts table from the input SMILES. Type is either 0 or 1, for unique or absolute SMILES, respectively. The class value is the class of salts entries used from the salts table. All of the structures in the salts table with the given class are checked against the input SMILES and if found, they are removed. vcs_normalizefunction vcs_normalize(TEXT smiles, INTEGER type, INTEGER class) => TEXT Performs a SMIRKS-based structure normalization on the input SMILES. All of the SMIRKS from the transform table with the given class are applied to the input molecule. The resulting molecule is output as a canonical SMILES. Type is either 0 or 1 for unique or absolute SMILES, respectively. gen_moleculesgen_reactions function gen_molecules(TEXT smiles, TEXT smirks, INTEGER direction, INTEGER limit, INTEGER type) => TEXT function gen_reactions(TEXT smiles, TEXT smirks, INTEGER direction, INTEGER limit, INTEGER type) => TEXT Applies the transform created from the given SMIRKS to the input molecules. The argument direction indicates that the transform is applied in the forward (0) or reverse (1) direction. The limit parameter is the maximum count of specific molecules to return and 0 indicates no limit. The resulting molecules or reactions are output as a set of newline-delimited canonical SMILES. The gen_molecules() function and operator returns the molecules which result from a transformation; the gen_reactions() function and operator return the complete reaction. The type indicates either 0 or 1 for unique or absolute SMILES, respectively. The results are returned as a single string. If more than one molecule or reaction is formed, then each is on a separate line of the output, delimited by the default_delimiter string (see info(). dayconvertfunction dayconvert (TEXT data, TEXT ifmt, TEXT ofmt) => TEXT function dayconvert (TEXT data, TEXT ifmt, TEXT ofmt, INTEGER ptable_class) => TEXT Returns the input chemical information in a different format. See the Daylight Conversion Manual for additional information. Note: The 'ptable_class' parameters are optional as described below. The 'ifmt' and 'ofmt' parameters are used to designate the input and output formats, respectively. Both parameters need to be identified by particular letter sequence. Valid combinations of input and output formats for conversion are as follows where tdtsma and tdtsmrk are tdt versions with smarts and smirks, respectively: smi ---> mol/sdf/rdf tdt ---> mol/sdf/rdf mol or sdf ---> smi/ism/sma/tdt/tdtsma rdf ---> smi/ism/sma/smrk/tdt/tdtsma/tdtsmrk Either mol or sdf can be used for rgfile input. Delimiters for interpreting multi-line input data are detected from the input stream. Input delimiters may be the strings "\n", "\r", or "\r\n". On output, the delimiter chosen for multi-line output depends on the input delimiter detected and the settings of "force_delimiter" and "default_delimiter". See the description of these items in the documentation for the info(). function. The ptable_class parameter is optional. If the 'ptable_class' parameter is provided, it indicates that valence and charge information in the user-defined PTABLE table information is to be used instead of that provided in the default ptable. The specific information to be used is based upon the class number supplied. Name Type ----------------------------------------- ---------------------------- AT_NO INTEGER SYMBOL VARCHAR(8) AT_MASS INTEGER VALENCE_CHARGE_LIST VARCHAR(4000) CLASS INTEGER AT_NO = atomic number SYMBOL = atomic symbol AT_MASS = atomic mass VALENCE_CHARGE_LIST = list of valence and charge, e.g., '2,-1,3,0' for valence 2 with -1 change or valence 3 with 0 chargeThe following are a series of dayconvert related options. The value for any of these options can be found using the finfo() function. Values for these options can be changed on a per session basis for an individual option or a group of options using the fsetinfo() function as described in Section 2.2 General Purpose Functions.
'conv_add_3d={TRUE|FALSE}'
'conv_add_2d={TRUE|FALSE}'
'conv_use_3d={TRUE|FALSE}'
'conv_split_fields={TRUE|FALSE}'
'conv_id_field={id_field}'
'conv_prefix={prefix}'
'conv_implicit_chirality={TRUE|FALSE}'
'conv_ring_cistrans={TRUE|FALSE}'
'conv_db_explicit_h={TRUE|FALSE}'
'conv_chi_explicit_h={TRUE|FALSE}'
'conv_fix_radical_rings={TRUE|FALSE}'
'conv_nametag={tag}'
'conv_comment_smi={TRUE|FALSE}'
'conv_smi_tuples={TRUE|FALSE}'
'conv_day_hcount={TRUE|FALSE}'
'conv_day_stereo={TRUE|FALSE}'
'conv_day_chih={TRUE|FALSE}'
function atomnorm (TEXT smiles, TEXT list, INTEGER ntuple, INTEGER isotype) => TEXT function bondnorm (TEXT smiles, TEXT list, INTEGER ntuple, INTEGER isotype) => TEXT function partnorm (TEXT smiles, TEXT list, INTEGER ntuple, INTEGER isotype) => TEXT Returns a potentially reordered N-tuple string for the given list input parameter. The list string is a comma-separated list of data which is associated in order with the atoms, bonds or parts of the input SMILES. The list string is reordered based on the canonical atom, bond, or part ordering of the input SMILES. ntuple is the number of comma-separated values per atom, bond, or dot-separated part, and isotype is 0 for unique SMILES canonicalization and 1 for absolute SMILES canonicalization.
2.4 Fingerprint FunctionsThis section describes functions and their respective operations relating to generation of and information concerning fingerprints. smi2fpfunction smi2fp (TEXT smiles, INTEGER min, INTEGER max, NUMBER nbits) => TEXT Returns the ASCII fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint. smi2xfpfunction smi2xfp (TEXT smiles, INTEGER min, INTEGER max, NUMBER nbits) => TEXT Returns the ASCII difference fingerprint for a given molecule or reaction. Min and max are the minimum and maximum pathlengths, respectively, and nbits is the number of bits in the fingerprint. smi2screenfunction smi2screen (TEXT smiles) => TEXT Returns an ASCII string which contains screening data for a given molecule or reaction. The screen data can be used by the contains() and matches() functions for additional pre-screening. foldfpfunction foldfp (TEXT fpstr, INTEGER nbits, DOUBLE PRECISION dens) => TEXT Folds the given fingerprint to the minimum appropriate size or density, whichever is limiting, and returns the new, folded fingerprint. bitcountfunction bitcount (TEXT fpstr) => INTEGER Returns the number of bits on in the fingerprint. nbitsfunction nbits (TEXT fpstr) => INTEGER Returns the total size of the fingerprint, in bits. In the current Daylight toolkit this will always be a power of two. isfpfunction isfp (TEXT fpstr) => INTEGER Returns 1 if the string is a fingerprint, 0 otherwise. The syntax of a fingerprint can never be confused with a valid SMILES. This is the bit of cleverness which allows us to overload the searching functions. 2.5 Comparison FunctionsFunctions and their respective operations relating to a variety of direct comparisons between input smiles, smarts, or fingerprints are described below. graphfunction graph (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if the two input SMILES share the same canonical graph, 0 otherwise. tautomerfunction tautomer (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if the two input SMILES share the same canonical graph, net charge, and total hydrogen count, 0 otherwise. usmilesfunction usmiles (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if the two input SMILES share the same unique canonical smiles, 0 otherwise. asmilesfunction asmiles (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if the two input SMILES share the same absolute canonical smiles, 0 otherwise. componentfunction component (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (any SMILES), otherwise returns 0. Also returns false if smiles2 is not a single-component query. The last example in the table below illustrates this problem. A component is a single dot-separated part of a larger molecule or reaction SMILES. Some examples follow:
reactant agent product function reactant (TEXT smiles1, TEXT smiles2) => INTEGER function agent (TEXT smiles1, TEXT smiles2) => INTEGER function product (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if smiles2 (a molecule SMILES) is a component of smiles1 (a reaction SMILES) with the appropriate role, otherwise returns 0. containsfunction contains (TEXT smiles1, TEXT smiles2) => INTEGER function contains (TEXT smiles1, TEXT smiles1_screen, TEXT smiles2) => INTEGER Returns 1 if smiles1 contains smiles2; that is, smiles2, assuming opened valences for all hydrogens, is a substructure of smiles1. The additional smiles1_screen argument can be used to pass in pre-computed screening data. The data should be computed and stored in advance of a search using the smi2screen() function. This data is used for fast searching of large tables as it allows Postgresql to perform some screening operations which speed up searching. isinfunction isin (TEXT smiles1, TEXT smiles2) => INTEGER Returns 1 if smiles2 contains smiles1; that is, smiles1, assuming opened valences for all hydrogens, is a substructure of smiles2. This functionality is identical to 'contains()' with the arguments swapped. matchesfunction matches (TEXT smiles, TEXT smarts) => INTEGER function matches (TEXT smiles, TEXT smiles_screen, TEXT smarts) => INTEGER Returns 1 if the smarts expression matches the given SMILES, 0 otherwise. The additional smiles1_screen argument can be used to pass in pre-computed screening data. The data should be computed and stored in advance of a search using the smi2screen() function. This data is used for fast searching of large tables as it allows Postgresql to perform some screening operations which speed up searching. function matchcover (TEXT smiles, TEXT smarts) => DOUBLE PRECISION Returns the ratio of atoms in the target SMILES matched by the given query to the number of atoms in the target as a number between zero and one. function euclid (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION Returns the euclidean distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. tanimotofunction tanimoto (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION Returns the tanimoto distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. tverskyfunction tversky (TEXT fp_or_smi1, TEXT fp_or_smi2, DOUBLE PRECISION alpha, DOUBLE PRECISION beta) => DOUBLE PRECISION Returns the tversky distance between two fingerprints or SMILES. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. fingertestfunction fingertest (TEXT fp_or_smi1, TEXT fp_or_smi2) => DOUBLE PRECISION Returns 1 if all of the bits in fp_or_smi2 are also present in fp_or_smi1. That is, the fingerprint from fp_or_smi2 represents a possible substructure of fp_or_smi1. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number between 0.0 and 1.0. similarityfunction similarity (TEXT fp_or_smi1, TEXT fp_or_smi2, TEXT expression) => DOUBLE PRECISION Returns the similarity or distance between two fingerprints or SMILES, based on the computed expression. If both parameters are fingerprints and are not the same size (nbits()), then the larger will be folded automatically to match the size of the smaller before comparison. If one parameter is a SMILES, its fingerprint is generated automatically at the size of the other parameter. If both parameters are SMILES, then a fingerprint size of 512 bits is used. The returned value is a floating point number. The value of expression can be any legal expression based on the counts of bits in the corresponding fingerprints (see the man page for expression(5)). Optionally, the expression string can be preceded with either "distance=" or "similarity=", which are ignored. The following examples are all identical: similarity(smi1, smi2, 'TANIMOTO') similarity(smi1, smi2, 'tanimoto') similarity(smi1, smi2, 'similarity=tanimoto') similarity(smi1, smi2, 'c/(a+b+c)') similarity(smi1, smi2, 'similarity=c/(a+b+c)') |