10. Fingerprint Toolkit
10.1 Introduction
Fingerprints, their uses and the history of their development in the Daylight
Toolkit (tm) are described in detail in the Daylight Theory Manual,
chapter on Fingerprints.
The Daylight Fingerprint Toolkit provides a set of tools for rapidly
screening very large databases of chemical structures for
substructure searching, and for computing the structural similarity
between molecules.
Those who have seen or used Daylight's Merlin program will
immediately recognize that the Fingerprint Toolkit is part of the
foundation of Merlin. However, it should be noted that Merlin has
many more capabilities than just the functionality available via the
Fingerprint Toolkit; fingerprinting is only a base on which a much
larger set of capabilities is built.
The Fingerprint Toolkit, unlike other Daylight Toolkits, is not
recommended for most programming projects. It is intended for a
few special situations where customers have existing
database-searching capabilities and wish to enhance performance or add
similarity metrics. If you are contemplating building a chemical
information system, we strongly recommend that you consider using
Merlin and THOR rather than starting with the Fingerprint Toolkit.
10.2 Fingerprint Functions
The Daylight Fingerprint Toolkit uses Fingerprint Objects to
represent fingerprints. Fingerprints have the following properties:
Fingerprint Properties
Property | Description
bitmap | the fingerprint itself
number of bits | the number of bits in the fingerprint (its length in bits)
original number of bits | the number of bits in the original fingerprint (before folding)
number of bits set | the number of 1's in the fingerprint's bitmap
original number of bits set | the number of 1's in the original fingerprint's bitmap (before folding)
version | the version of the Daylight Toolkit used to create the fingerprint
10.2.1 Global Settings
In versions prior to 4.42, there were three global toolkit values
which controlled fingerprint creation size and folding. These are no
longer needed.
10.2.2 Creating Fingerprints
There are two functions to create fingerprints. You can allocate a
"blank" fingerprint, then fill it in later with data from an
external source (see
Fingerprint Bit Operations, below),
or you can create a fingerprint directly from a molecule.
-
dt_fp_allocfp(integer size) => Handle fingerprint
-
Allocate an empty fingerprint. The fingerprint's size will be the given
"size" value. It is an error to apply
any function that returns a property of the fingerprint unless that
property has been explicitly set.
-
dt_fp_generatefp(Handle ob,
integer minstep, integer maxstep, integer size) => Handle fingerprint
-
Allocate a fingerprint object of the given size, fill it with a
fingerprint generated from the object ob, then set the object's "original
size", "original bits set", "size" and "bits set" properties (see
dt_fp_obits(), dt_fp_obitcount(), dt_fp_nbits() and dt_fp_bitcount()).
The object ob can be any object for which dt_stream(ob, TYP_ATOM) and
dt_stream(ob, TYP_BOND) will return a stream of atoms and bonds,
respectively. Typically ob is a molecule object, but one can
fingerprint various substructures using path, pathset, substructure,
cycle, atom, or bond objects. For example, one can produce a
"ring-system fingerprint" using a substructure object that contains all of
the atoms and bonds in all cycles of a molecule.
The parameters "minstep" and "maxstep" control the
fingerprint generation. "minstep" sets the minimum-length path to
be included in the fingerprint; "maxstep" sets the maximum-length
path included.
-
dt_fp_partfp(Handle part,
Handle ob, integer minstep, integer maxstep, integer size) => Handle
fingerprint
-
Like dt_fp_generatefp(), except that it sets bits only for paths
which include the object 'part', which may be an atom or bond. This
function performs the full path enumeration over 'ob', but only
sets bits in the resulting fingerprint for paths containing 'part'.
This function is a supported version of the function previously
included in the contrib/stigmata directory. Note that the results
using this function will be slightly different, because this
version correctly includes branch and cycle paths containing the
object 'part'. The contributed version only considered the
straight-chain paths containing 'part'.
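The role of "minstep" and "maxstep" in the path enumeration described for dt_fp_generatefp() can be illustrated with a toy model. The following C sketch is not the toolkit's algorithm. The ToyMol type, the toy_generatefp() function and the FNV-1a hash are all hypothetical, bond types are ignored, and a "step" is assumed to mean one bond traversed. It enumerates every simple path of minstep to maxstep steps and sets one bit per path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ATOMS 16
#define MAX_NBRS  4

/* Hypothetical toy molecule: atom labels plus an adjacency list. */
typedef struct {
    int  natoms;
    char label[MAX_ATOMS];
    int  nnbrs[MAX_ATOMS];
    int  nbr[MAX_ATOMS][MAX_NBRS];
} ToyMol;

/* FNV-1a hash of the first len bytes of s (an arbitrary choice for this sketch). */
static uint32_t hash_path(const char *s, int len)
{
    uint32_t h = 2166136261u;
    for (int i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Depth-first walk over simple paths; "steps" is the number of bonds so far. */
static void walk(const ToyMol *m, int atom, int steps, int minstep, int maxstep,
                 char *path, int *visited, unsigned char *bits, int size)
{
    path[steps] = m->label[atom];
    visited[atom] = 1;
    if (steps >= minstep) {
        /* Assumption: bit 0 is the least significant bit of byte 0. */
        uint32_t b = hash_path(path, steps + 1) % (uint32_t)size;
        bits[b / 8] |= (unsigned char)(1u << (b % 8));
    }
    if (steps < maxstep) {
        for (int i = 0; i < m->nnbrs[atom]; i++) {
            int next = m->nbr[atom][i];
            if (!visited[next])
                walk(m, next, steps + 1, minstep, maxstep, path, visited, bits, size);
        }
    }
    visited[atom] = 0;
}

/* Fill bits (size/8 bytes; size must be a multiple of 8) with a toy path fingerprint. */
void toy_generatefp(const ToyMol *m, int minstep, int maxstep,
                    unsigned char *bits, int size)
{
    char path[MAX_ATOMS];
    int visited[MAX_ATOMS] = {0};
    assert(size > 0 && size % 8 == 0 && m->natoms <= MAX_ATOMS);
    memset(bits, 0, (size_t)size / 8);
    for (int a = 0; a < m->natoms; a++)
        walk(m, a, 0, minstep, maxstep, path, visited, bits, size);
}
```

Because every path in a substructure also occurs in the larger structure, the bits this sketch sets for a two-atom chain CC are a subset of those it sets for CCO. That subset property is what makes fingerprints usable as a substructure screen.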
10.2.3 Properties
-
dt_fp_nbits(Handle fp) => integer nbits
-
Return the fingerprint's size (number of bits in the bitmap).
-
dt_fp_obits(Handle fp) => integer obits
-
Return the fingerprint's original size (before folding). This is the
value of "size" which was provided when the fingerprint was created
(see above).
-
dt_fp_bitcount(Handle fp) => integer bitcount
-
Return the number of 1's (bits set) in the fingerprint's bitmap.
-
dt_fp_obitcount(Handle fp) => integer obitcount
-
Return the original bitcount (before folding).
-
dt_fp_setobitcount(Handle fp, integer obc) => boolean ok
-
Set the original bitcount. This is intended to be used only with
fingerprint objects created via
dt_fp_allocfp() and filled
manually.
-
dt_fp_setobits(Handle fp, integer ob) => boolean ok
-
Set the original number of bits. As with
dt_fp_setobitcount(), only
for use with fingerprint objects created via
dt_fp_allocfp().
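When a fingerprint is allocated with dt_fp_allocfp() and filled from an external source, the caller has to supply the value passed to dt_fp_setobitcount(). A minimal sketch of counting the 1's in a raw bitmap follows. The function name toy_bitcount() and the byte-array representation are assumptions made for illustration and are not part of the toolkit.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 1 bits in a bitmap stored as nbytes bytes. */
int toy_bitcount(const unsigned char *bits, size_t nbytes)
{
    int count = 0;
    assert(bits != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char byte = bits[i];
        while (byte) {
            byte &= (unsigned char)(byte - 1);  /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}
```

The result of such a count is the kind of value one would pass as "obc" for a manually filled fingerprint.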
10.2.4 Fingerprint Bit Operations
These operations allow the user to manipulate the individual bit-values of the
fingerprint. They are useful for creating custom fingerprints (e.g.
bitscreens, 3-D or spectral fingerprints), combining multiple fingerprints, or
writing specialized comparison functions.
Note that dt_stringvalue()
and dt_setstringvalue()
can be used to get and set the entire binary value of a fingerprint. The
functions described here are useful for manipulating individual or ranges of
bits within a fingerprint.
-
dt_fp_bitvalue(Handle fp,
integer bitno) => integer value
-
Returns the current value of a bit in the fingerprint.
-
dt_fp_setbitvalue(Handle
fp, integer bitno, integer value) => boolean ok
-
Sets the current value of a bit in the fingerprint to "value".
-
dt_fp_range(Handle fp, integer
offset, integer nbits, integer *soffset) => string range
-
Gets a range of bits from a fingerprint. The range is returned as a string of
binary data; the requested bits start "soffset" bits from the beginning of the
returned string. The range begins at bit "offset" of the fingerprint and
extends for "nbits" bits, or to the end of the fingerprint.
-
dt_fp_setrange(Handle fp,
integer offset, integer nbits, integer slen, string string,
integer soffset, integer operation) => boolean ok
-
Sets the values of a range of bits in the fingerprint. The range
is given by "offset" for "nbits" bits. The
"operation" and the given string, read starting "soffset" bits
from the beginning of the string, control how the bits are set. Legal
operations are:
DX_FP_SET | sets each bit in the range to the source (string) value.
DX_FP_NOT | sets each bit in the range to the inverse of the source (string) value.
DX_FP_OR | sets each bit in the range to the logical-"or" of the source (string) value and current bit value.
DX_FP_AND | sets each bit in the range to the logical-"and" of the source (string) value and current bit value.
DX_FP_XOR | sets each bit in the range to the logical-"xor" of the source (string) value and current bit value.
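The five operations can be modelled on a plain byte array. The sketch below is an illustration under stated assumptions, not the toolkit's implementation. The ToyOp codes and toy_setrange() are hypothetical names, bit 0 is assumed to be the least significant bit of byte 0, and clipping the range at the end of the fingerprint is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes; these are not the toolkit's DX_FP_ values. */
typedef enum { OP_SET, OP_NOT, OP_OR, OP_AND, OP_XOR } ToyOp;

/* Assumption: bit 0 is the least significant bit of byte 0. */
static int get_bit(const unsigned char *buf, int bitno)
{
    return (buf[bitno / 8] >> (bitno % 8)) & 1;
}

static void put_bit(unsigned char *buf, int bitno, int value)
{
    unsigned char mask = (unsigned char)(1u << (bitno % 8));
    if (value) buf[bitno / 8] |= mask;
    else       buf[bitno / 8] &= (unsigned char)~mask;
}

/* Combine nbits source bits (read from src starting at bit soffset) into the
   fpbits-long bitmap fp starting at bit offset. slen is the source length in
   bytes. Returns 1 on success, 0 on bad arguments. */
int toy_setrange(unsigned char *fp, int fpbits, int offset, int nbits,
                 int slen, const unsigned char *src, int soffset, ToyOp op)
{
    if (fp == NULL || src == NULL || offset < 0 || nbits < 0 ||
        soffset < 0 || slen < 0 || offset > fpbits)
        return 0;
    if (offset + nbits > fpbits)          /* clip at the end of the bitmap */
        nbits = fpbits - offset;
    if (soffset + nbits > slen * 8)       /* source too short for the range */
        return 0;
    for (int i = 0; i < nbits; i++) {
        int s = get_bit(src, soffset + i);
        int d = get_bit(fp, offset + i);
        switch (op) {
        case OP_SET: d = s;     break;
        case OP_NOT: d = !s;    break;
        case OP_OR:  d = d | s; break;
        case OP_AND: d = d & s; break;
        case OP_XOR: d = d ^ s; break;
        default:     return 0;
        }
        put_bit(fp, offset + i, d);
    }
    return 1;
}
```

Note that the "not" operation, as described above, inverts the source value rather than the current bit, so it ignores what was previously stored in the range.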
10.2.5 Comparisons
-
dt_fp_fingertest(Handle patfp, Handle molfp) => boolean sub
-
Returns TRUE if all of the bits that are set (1) in patfp
are also set in molfp; that is, returns the value of the logical
expression patfp == (patfp AND molfp).
In other words, returns TRUE if the molecule that generated patfp
could be a substructure of the molecule that generated molfp.
Returns FALSE if any set bit in patfp is not also set in
molfp, or if the two fingerprints are not compatible
(e.g. different sizes), or if either object is not a fingerprint.
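The logical expression patfp == (patfp AND molfp) can be written directly over raw bitmaps. The following is a minimal sketch, assuming both fingerprints are byte arrays of the same length. The name toy_fingertest() is hypothetical and the code is not the toolkit's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every bit set in pat is also set in mol, else 0.
   Both bitmaps must be nbytes long. */
int toy_fingertest(const unsigned char *pat, const unsigned char *mol,
                   size_t nbytes)
{
    assert(pat != NULL && mol != NULL);
    for (size_t i = 0; i < nbytes; i++) {
        if ((pat[i] & mol[i]) != pat[i])
            return 0;   /* a pattern bit is missing: screened out */
    }
    return 1;           /* passes the screen; it is only a possible match */
}
```

A result of 1 means only that the pattern could be a substructure. A full substructure match is still needed to confirm it.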
-
dt_fp_euclid(Handle fp1, Handle fp2) => float dist
-
Returns the Euclidean distance between fp1 and fp2, or -1.0 if an
error is detected (i.e. the fingerprints are not compatible or are
not fingerprint objects).
-
dt_fp_tanimoto(Handle fp1, Handle fp2) => float tan_coeff
-
Returns the Tanimoto coefficient between fp1 and fp2, or -1.0 if an
error is detected (i.e. the fingerprints are not compatible or are not
fingerprint objects).
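For two bitmaps, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in either. The sketch below computes it over equal-length byte arrays. The name toy_tanimoto() is hypothetical, and the value returned for two empty bitmaps is a convention chosen for this sketch, not documented toolkit behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Count the 1 bits in a single byte. */
static int popcount8(unsigned char x)
{
    int n = 0;
    while (x) {
        x &= (unsigned char)(x - 1);
        n++;
    }
    return n;
}

/* Tanimoto coefficient of two nbytes-long bitmaps, or -1.0 on a NULL argument. */
double toy_tanimoto(const unsigned char *fp1, const unsigned char *fp2,
                    size_t nbytes)
{
    int both = 0, either = 0;
    if (fp1 == NULL || fp2 == NULL)
        return -1.0;
    for (size_t i = 0; i < nbytes; i++) {
        both   += popcount8((unsigned char)(fp1[i] & fp2[i]));
        either += popcount8((unsigned char)(fp1[i] | fp2[i]));
    }
    assert(either >= both);
    /* Sketch convention: two empty bitmaps are treated as identical. */
    return either == 0 ? 1.0 : (double)both / (double)either;
}
```

For example, bitmaps 0x0F and 0x3C share two set bits out of six set in either, giving a coefficient of one third.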
-
dt_fp_foldfp(Handle
fp, integer minsize, float mindensity) => boolean ok
-
Fold the fingerprint. Returns TRUE if no errors are detected. Note that
folding may not actually occur. Folds zero or more times until the
"minsize" or "mindensity" values are reached.
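Folding can be pictured as OR-ing the upper half of the bitmap onto the lower half, which halves the size and raises the bit density. The sketch below is one plausible reading of the stopping rule and is not the toolkit's implementation. The name toy_fold() is hypothetical, sizes are assumed to be multiples of 8 bits, and folding is assumed to stop once the density reaches "mindensity" or a further fold would go below "minsize".

```c
#include <assert.h>
#include <stddef.h>

/* Count the 1 bits in the first nbits bits (nbits is a multiple of 8). */
static int count_bits(const unsigned char *bits, int nbits)
{
    int n = 0;
    for (int i = 0; i < nbits / 8; i++) {
        unsigned char x = bits[i];
        while (x) {
            x &= (unsigned char)(x - 1);
            n++;
        }
    }
    return n;
}

/* Fold bits in place zero or more times; returns the new size in bits. */
int toy_fold(unsigned char *bits, int nbits, int minsize, double mindensity)
{
    assert(bits != NULL && nbits > 0 && nbits % 8 == 0);
    while (nbits / 2 >= minsize && nbits % 16 == 0 &&
           (double)count_bits(bits, nbits) / nbits < mindensity) {
        int half = nbits / 16;              /* bytes in each half */
        for (int i = 0; i < half; i++)
            bits[i] |= bits[i + half];      /* OR the upper half onto the lower */
        nbits /= 2;
    }
    return nbits;
}
```

With a 32-bit bitmap holding two set bits, a minimum size of 8 and a minimum density of 0.2, this sketch folds twice and stops at 8 bits, where the density is 0.25. With a minimum density of 0.0 it does not fold at all, which matches the note above that folding may not actually occur.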