The bit values in Daylight fingerprints are categorical data.
The binary values of 0 and 1 indicate the absence (0) or possible presence (1) of a particular path.
There is no sense in which any of the 0 values along a particular fingerprint can be equated or related. The fact that a snake possesses neither wheels nor legs tells us nothing about the relative value of wheels and legs.
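The same is true for the 1 values. In Daylight fingerprints the situation is further complicated because a single set bit is ambiguous: paths are hashed onto the bit string, so several different paths can set the same bit, and a set bit indicates only that one of them may be present.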
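It is therefore inappropriate to average these bit values or to measure the dissimilarity between fingerprints with a Euclidean distance, since both treat the 0s and 1s as quantities rather than as categories.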
As the popular k-means algorithm does both these things, it too is inappropriate as an algorithm to cluster objects described by Daylight fingerprints.
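To make the problem with averaging concrete, the short Python sketch below compares the bitwise mean of a few toy fingerprints (what a k-means centroid would be) with the bitwise mode. The fingerprints are made-up 4-bit vectors, not real Daylight fingerprints, and the function names are hypothetical. The mean is a vector of fractions that is not itself a fingerprint, whereas the mode is.

```python
def bitwise_mean(fps):
    """Arithmetic mean of each bit position, as a k-means centroid would be computed."""
    n = len(fps)
    return [sum(col) / n for col in zip(*fps)]


def bitwise_mode(fps):
    """Most frequent value at each bit position; ties go to 0 (an arbitrary choice)."""
    n = len(fps)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*fps)]


# Hypothetical toy fingerprints, one list of bits per object.
fps = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]

print(bitwise_mean(fps))  # [1.0, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
print(bitwise_mode(fps))  # [1, 0, 0, 0]
```

A value such as 0.333 has no meaning as a path being "one third present", which is the point made above about treating categorical bits as numbers.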
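k-modes clustering (Chaturvedi, A., Green, P.E. et al. (2001) Journal of Classification 18, 35-55) gets around these problems by representing each cluster with a mode (the most frequent value at each bit position) instead of a mean, and by measuring dissimilarity with a simple matching measure, the number of bit positions at which two fingerprints differ.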
We have implemented the algorithm as described in Huang, Z. (1998) Data Mining and Knowledge Discovery 2, 283-304.
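The following is a minimal Python sketch of k-modes on binary fingerprints, offered only to illustrate the idea; it is not the Daylight implementation. Assumptions, all hypothetical: fingerprints are plain lists of 0/1 values, the initial modes are simply the first k fingerprints, ties in the mode go to 0, and modes are recomputed once per pass over the data (a batch update), whereas Huang's paper updates a cluster's mode after each individual allocation.

```python
from typing import List, Sequence, Tuple

Fingerprint = Sequence[int]


def matching_dissimilarity(a: Fingerprint, b: Fingerprint) -> int:
    """Simple matching dissimilarity: number of bit positions where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def bitwise_mode(members: List[Fingerprint]) -> List[int]:
    """Most frequent value at each bit position; ties go to 0 (an arbitrary choice)."""
    n = len(members)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*members)]


def k_modes(fps: List[Fingerprint], k: int,
            max_iter: int = 100) -> Tuple[List[int], List[List[int]]]:
    # Assumption: seed the modes with the first k fingerprints.
    modes = [list(fp) for fp in fps[:k]]
    labels = [-1] * len(fps)
    for _ in range(max_iter):
        # Assign each fingerprint to its nearest mode (ties go to the lowest index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(fp, modes[j]))
            for fp in fps
        ]
        if new_labels == labels:
            break  # no fingerprint changed cluster, so the clustering has converged
        labels = new_labels
        # Recompute each mode from its members; keep the old mode if a cluster is empty.
        for j in range(k):
            members = [fp for fp, lab in zip(fps, labels) if lab == j]
            if members:
                modes[j] = bitwise_mode(members)
    return labels, modes


# Hypothetical 6-bit fingerprints forming two obvious groups.
fps = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0],
]
labels, modes = k_modes(fps, k=2)
print(labels)  # [0, 1, 0, 1, 0]
print(modes)   # [[1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
```

Every mode is itself a valid bit string and every distance is a count of mismatched bits, so neither of the operations objected to above (averaging bits, Euclidean distance) is ever performed.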
Daylight Chemical Information Systems, Inc. support@daylight.com
John Bradshaw.