Daylight Fingerprints-

The fingerprint program is included in the Database Package and the Cluster Package, and there is also a Fingerprint Toolkit.

Fingerprints were devised to enable high-speed structural screening.
 

Daylight Molecular Fingerprints contain:

Example:
        The molecule OC=CN would generate the following patterns:

 

0-bond paths: C O N
1-bond paths: OC C=C CN
2-bond paths: OC=C C=CN
3-bond paths: OC=CN
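
The enumeration above can be sketched in a few lines of Python. This is an illustrative sketch only, not Daylight's implementation: the function name enumerate_paths, the atom and bond list representation, and the rule of keeping each path in the orientation that starts at the lower atom index are all assumptions made for this example.

```python
from collections import defaultdict


def enumerate_paths(atoms, bonds, max_bonds):
    """Return {number_of_bonds: sorted unique path strings} (hypothetical helper)."""
    adjacency = defaultdict(list)
    for a, b, symbol in bonds:
        adjacency[a].append((b, symbol))
        adjacency[b].append((a, symbol))

    paths = defaultdict(set)

    def walk(visited, text):
        n_bonds = len(visited) - 1
        # Record each undirected path once: only the orientation that
        # starts at the lower atom index (an assumption of this sketch).
        if visited[0] <= visited[-1]:
            paths[n_bonds].add(text)
        if n_bonds == max_bonds:
            return
        for neighbor, symbol in adjacency[visited[-1]]:
            if neighbor not in visited:  # a path never revisits an atom
                walk(visited + [neighbor], text + symbol + atoms[neighbor])

    for start in range(len(atoms)):
        walk([start], atoms[start])
    return {k: sorted(v) for k, v in sorted(paths.items())}


# OC=CN: atoms indexed 0..3, single bonds written as "" and the double bond as "="
atoms = ["O", "C", "C", "N"]
bonds = [(0, 1, ""), (1, 2, "="), (2, 3, "")]
patterns = enumerate_paths(atoms, bonds, max_bonds=3)
for n_bonds, found in patterns.items():
    print(f"{n_bonds}-bond paths:", " ".join(found))
```

The sketch stops at listing the patterns; how patterns are turned into fingerprint bits is not covered here.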
 

Reaction Fingerprints

Example

SN2 displacement reaction:

[I-].[Na+].C=CCBr>>[Na+].[Br-].C=CCI

The paths generated for the molecules would be as follows:

Enumerated Fingerprint Paths:

Path Length:  Reactant (count/path):       Product (count/path):
0             1 I, 1 Na, 3 C, 1 Br         1 I, 1 Na, 3 C, 1 Br
1             1 C=C, 1 C-C, 1 C-Br         1 C=C, 1 C-C, 1 C-I
2             1 C=C-C, 1 C-C-Br            1 C=C-C, 1 C-C-I
3             1 C=C-C-Br                   1 C=C-C-I

Difference in Path Counts:

Path Length:  Difference (count/path):
0             0 I, 0 Na, 0 C, 0 Br
1             0 C=C, 0 C-C, 1 C-Br, 1 C-I
2             0 C=C-C, 1 C-C-Br, 1 C-C-I
3             1 C=C-C-Br, 1 C=C-C-I
 
After generating the difference in counts, we only use the six paths with non-zero differences to set bits in the difference fingerprint. These are the paths which walk through bonds that change during the reaction. By considering only these paths, we get a fingerprint which reflects the overall bond changes in the reaction.
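
The counting step described above can be illustrated with a short Python sketch. The path counts are copied from the Sn2 example; the names reactant_paths, product_paths and changed_paths are hypothetical, and the step that maps the surviving paths to bits is not shown because it is not described here.

```python
from collections import Counter

# Path counts for the reactant and product sides of the example reaction
reactant_paths = Counter({
    "I": 1, "Na": 1, "C": 3, "Br": 1,
    "C=C": 1, "C-C": 1, "C-Br": 1,
    "C=C-C": 1, "C-C-Br": 1,
    "C=C-C-Br": 1,
})
product_paths = Counter({
    "I": 1, "Na": 1, "C": 3, "Br": 1,
    "C=C": 1, "C-C": 1, "C-I": 1,
    "C=C-C": 1, "C-C-I": 1,
    "C=C-C-I": 1,
})


def changed_paths(reactant, product):
    """Return {path: |count difference|} for paths whose counts differ."""
    all_paths = set(reactant) | set(product)
    # A Counter returns 0 for a path that is absent on one side
    return {
        path: abs(reactant[path] - product[path])
        for path in all_paths
        if reactant[path] != product[path]
    }


difference = changed_paths(reactant_paths, product_paths)
for path in sorted(difference):
    print(difference[path], path)
```

Only the paths that pass through the breaking C-Br bond or the forming C-I bond survive the filter.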

Mixture Fingerprints-Fingerprint tuples


Fingerprint options:


Comparing Fingerprints-

Three Similarity Metrics

 
Symbol   Definition                 Description
bits(F)                             A function that returns the number of "1" bits in a bitmap
BT                                  The total number of bits (the fingerprint's size); a constant
B1       = bits(F1)                 The number of 1's in F1
B2       = bits(F2)                 The number of 1's in F2
BC       = bits(F1 AND F2)          The number of 1's in common between F1 and F2
BI       = bits(F1 XOR (NOT F2))    The number of identical bits (1's and 0's) between F1 and F2
BU1      = bits(F1 AND (NOT F2))    The number of unique bits (1's) in F1
BU2      = bits(F2 AND (NOT F1))    The number of unique bits (1's) in F2
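
As a rough illustration of these definitions, the quantities can be computed with Python integers standing in for bitmaps. The 8-bit size and the two example fingerprints are made up for this sketch; real fingerprints are much larger.

```python
BT = 8                 # total number of bits; tiny size chosen only for illustration
MASK = (1 << BT) - 1   # keeps NOT results within BT bits


def bits(f):
    """Number of 1 bits in a fingerprint stored as a non-negative int."""
    return bin(f).count("1")


F1 = 0b10110100
F2 = 0b10010110

B1 = bits(F1)
B2 = bits(F2)
BC = bits(F1 & F2)              # 1's in common
BI = bits((F1 ^ ~F2) & MASK)    # positions where F1 and F2 agree (1's and 0's)
BU1 = bits(F1 & ~F2 & MASK)     # 1's only in F1
BU2 = bits(F2 & ~F1 & MASK)     # 1's only in F2

print(B1, B2, BC, BI, BU1, BU2)
```

The mask is needed because Python integers are unbounded, so NOT would otherwise produce a negative number rather than a BT-bit bitmap.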
 
Tanimoto Coefficient- the number of bits in common divided by the total number of bits that could be in common (the bits set in either fingerprint). Scale: 1.0 identical fingerprints, 0.7 highly similar, 0.5 roughly similar.

 

TC = BC / (B1 + B2 - BC)
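
A minimal Python sketch of the Tanimoto formula, with fingerprints held as integers. The function names and the example bitmaps are assumptions of this sketch, as is the choice to return 1.0 when both fingerprints are empty.

```python
def bits(f):
    """Number of 1 bits in a fingerprint stored as a non-negative int."""
    return bin(f).count("1")


def tanimoto(f1, f2):
    """TC = BC / (B1 + B2 - BC)."""
    b1, b2, bc = bits(f1), bits(f2), bits(f1 & f2)
    union = b1 + b2 - bc
    if union == 0:
        return 1.0  # assumption: two empty fingerprints are treated as identical
    return bc / union


# Example bitmaps with B1 = 4, B2 = 4 and BC = 3, so TC = 3 / 5
tc = tanimoto(0b10110100, 0b10010110)
print(tc)
```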

Euclidean distance- a measure of the geometric distance between two fingerprints. Scale: 0.0 identical fingerprints, 0.3 highly similar, 0.5 roughly similar.
 
 
DE(F1,F2) = (BT - BI) / BT
 

 The distance-as-substructure metric is:
 

DSE(F1,F2) = (B1 - bits(F1 AND F2)) / B1
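
The two distance formulas above can be sketched in Python as follows. The function names, the 8-bit size and the example bitmaps are assumptions made for illustration; the code follows the formulas exactly as written in this section.

```python
def bits(f):
    """Number of 1 bits in a fingerprint stored as a non-negative int."""
    return bin(f).count("1")


def euclidean_distance(f1, f2, bt):
    """DE = (BT - BI) / BT, where BI counts positions on which f1 and f2 agree."""
    mask = (1 << bt) - 1
    bi = bits((f1 ^ ~f2) & mask)
    return (bt - bi) / bt


def substructure_distance(f1, f2):
    """DSE = (B1 - bits(F1 AND F2)) / B1; assumes f1 has at least one bit set."""
    b1 = bits(f1)
    return (b1 - bits(f1 & f2)) / b1


de = euclidean_distance(0b10110100, 0b10010110, bt=8)

small = 0b00010100   # every bit of `small` is also set in `large`
large = 0b10010110
dse_forward = substructure_distance(small, large)
dse_reverse = substructure_distance(large, small)
print(de, dse_forward, dse_reverse)
```

The distance-as-substructure metric is asymmetric: it is zero when every bit of F1 is also set in F2, but not the other way round.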

Tversky Similarity-
For a complete description of Tversky similarity see John Bradshaw's MUG '97 presentation, "Introduction to Tversky similarity measure".

TS = BC / (α*BU1 + β*BU2 + BC)

Here α is the weighting of the prototype features (BU1) and β is the weighting of the variant features (BU2).

Example: Setting the weighting of prototype features to 100% and variant features to 100%, i.e. α=1, β=1, produces a symmetrical similarity metric identical to the Tanimoto metric.

Example: Setting the weighting of prototype and variant features asymmetrically produces a similarity metric in a more-substructural or more-superstructural sense. Setting the weighting of prototype features to 100% (α=1) and variant features to 0% (β=0) means that only the prototype features are important, i.e., this produces a "superstructure-likeness" metric. In this case, a Tversky similarity value of 1.0 means that all prototype features are represented in the variant, 0.0 that none are.

Example: Setting the weights to 0% prototype (α=0) / 100% variant (β=1) produces a "substructure-likeness" metric, where completely embedded structures have a 1.0 value and "near-substructures" have values near 1.0.
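
The effect of the weights can be tried out with a small Python sketch. It assumes the weighted form TS = BC / (alpha*BU1 + beta*BU2 + BC), with alpha as the prototype weight and beta as the variant weight; the parameter names, the 8-bit example bitmaps and the choice of F1 as the prototype are assumptions of this sketch.

```python
def bits(f):
    """Number of 1 bits in a fingerprint stored as a non-negative int."""
    return bin(f).count("1")


def tversky(prototype, variant, alpha, beta):
    """TS = BC / (alpha*BU1 + beta*BU2 + BC), with BU1/BU2 the unique 1's on each side."""
    bc = bits(prototype & variant)
    bu1 = bits(prototype) - bc   # 1's only in the prototype
    bu2 = bits(variant) - bc     # 1's only in the variant
    denominator = alpha * bu1 + beta * bu2 + bc
    if denominator == 0:
        return 0.0  # assumption: no weighted features at all scores 0.0
    return bc / denominator


prototype = 0b00010100   # all of its bits are also set in the variant
variant = 0b10010110

symmetric = tversky(prototype, variant, alpha=1, beta=1)       # same value as Tanimoto
prototype_only = tversky(prototype, variant, alpha=1, beta=0)  # only prototype features count
variant_only = tversky(prototype, variant, alpha=0, beta=1)    # only variant features count
print(symmetric, prototype_only, variant_only)
```

With these bitmaps the prototype-only weighting gives 1.0, because every prototype bit is present in the variant, while the other two weightings give 0.5.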
 

 


Daylight Chemical Information Systems Inc.