N.E. Shemetulskis, D. Weininger, C.J. Blankley, J.J. Yang, and C. Humblet, "Stigmata: An Algorithm To Determine Structural Commonalities in Diverse Datasets", Journal of Chemical Information and Computer Sciences, 36(4), 1996, 862-871.
The flexibility feature is driven by a user-defined threshold value, which determines what fraction of the structures must contain a feature for it to be considered one of the common features of the set. If you have a highly structurally diverse collection, a low threshold value (e.g. 0.5) is appropriate, because it gives the algorithm more freedom to find common features. With a threshold value of 0.5, a feature is treated as common if it occurs in at least half of the structures in the dataset. The commonalities are accumulated into a single representation called a modal fingerprint. The modal fingerprint is used to generate similarity metrics for each structure in the dataset and also for the database searching option.
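The thresholding step described above can be sketched in Python as follows. This is only an illustration, not the Stigmata implementation: fingerprints are assumed to be represented as sets of bit positions, and the function name modal_fingerprint and its parameters are hypothetical.

```python
from collections import Counter


def modal_fingerprint(fingerprints, threshold=0.5):
    """Return the set of bits present in at least `threshold` of the fingerprints.

    Assumption: each fingerprint is an iterable of integer bit positions.
    This is an illustrative sketch, not the actual Stigmata code.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in the interval (0, 1]")
    if not fingerprints:
        return set()

    # Count, for every bit, how many structures have it set.
    counts = Counter(bit for fp in fingerprints for bit in set(fp))

    # A bit is "common" if it occurs in at least threshold * N structures.
    needed = threshold * len(fingerprints)
    return {bit for bit, n in counts.items() if n >= needed}


if __name__ == "__main__":
    fps = [{1, 2, 3}, {2, 3, 4}, {3, 5}, {3, 4}]
    print(sorted(modal_fingerprint(fps, threshold=0.5)))
    print(sorted(modal_fingerprint(fps, threshold=1.0)))
```

Under these assumptions, lowering the threshold admits more bits into the modal fingerprint, while a threshold of 1.0 keeps only the bits shared by every structure.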
Each structure in the analyzed dataset can be visualized with the visualization routine, xvstigmata. Each atom in each structure is given a color based on its stigmata-generated atom score. The scores range from zero to one, and the colors follow a temperature scale where zero is red and one is white. The full range mapping is red-orange-yellow-green-blue-white. Atoms which are part of many of the common paths of the set appear in green, blue, or white; those not in any common paths appear in red. Xvstigmata also displays the similarity metrics, MSIM and MODP. MSIM is the Tanimoto similarity between the molecular fingerprint for the structure and the modal fingerprint. MODP also ranges from zero to one and represents the fraction of the bits set in the modal fingerprint that are also set in the structure's fingerprint. Together, these two values indicate how much of the commonality found in the dataset is also found in the structure, and how many novel features the structure contains. The visualization coupled with these two metrics provides a means for quick structural analysis of the input dataset.
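The two metrics and the color scale can be illustrated with a short Python sketch. It assumes fingerprints are sets of bit positions. The function names msim, modp, and atom_color are hypothetical, and the equal-width binning of scores into the six colors is an assumption made for illustration; the exact mapping used by xvstigmata is not specified here.

```python
def msim(fp, modal):
    """Tanimoto similarity: shared bits divided by bits set in either fingerprint."""
    union = fp | modal
    return len(fp & modal) / len(union) if union else 0.0


def modp(fp, modal):
    """Fraction of the modal fingerprint's bits that are also set in fp."""
    return len(fp & modal) / len(modal) if modal else 0.0


# Color order taken from the text: zero is red, one is white.
COLORS = ["red", "orange", "yellow", "green", "blue", "white"]


def atom_color(score):
    """Map an atom score in [0, 1] to a color name.

    Assumption: six equal-width bins; the real mapping may differ.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    index = min(int(score * len(COLORS)), len(COLORS) - 1)
    return COLORS[index]


if __name__ == "__main__":
    structure = {1, 2, 3}
    modal = {2, 3, 4}
    print(msim(structure, modal))  # 2 shared bits / 4 bits in the union
    print(modp(structure, modal))  # 2 shared bits / 3 bits in the modal fingerprint
    print(atom_color(0.0), atom_color(1.0))
```

A structure with a high MODP but a noticeably lower MSIM contains most of the common features and, in addition, sets bits that are absent from the modal fingerprint.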
The third main feature enables the modal fingerprint to be used as a database query. The top hits from the search (determined from MODP and MSIM scores) can be visualized with xvstigmata. This feature is useful for finding database structures that contain the common features together with some unique features. Such structures may prove interesting for idea generation, further screening, etc.
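One way to picture the database search is the Python sketch below. It is only an assumption-laden illustration: the text does not say how MODP and MSIM are combined when ranking hits, so the sketch simply sorts by MODP first and MSIM second. The database is modeled as a hypothetical dictionary of names to bit sets, and top_hits is not a real Stigmata function.

```python
def msim(fp, modal):
    """Tanimoto similarity between a fingerprint and the modal fingerprint."""
    union = fp | modal
    return len(fp & modal) / len(union) if union else 0.0


def modp(fp, modal):
    """Fraction of the modal fingerprint's bits that are also set in fp."""
    return len(fp & modal) / len(modal) if modal else 0.0


def top_hits(database, modal, n=10):
    """Return up to n (name, MODP, MSIM) tuples, best first.

    Assumption: hits are ranked by MODP, with MSIM breaking ties.
    The actual ranking rule is not described in the text.
    """
    scored = [(name, modp(fp, modal), msim(fp, modal))
              for name, fp in database.items()]
    scored.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return scored[:n]


if __name__ == "__main__":
    modal = {1, 2, 3, 4}
    database = {
        "a": {1, 2, 3, 4, 9},  # all common bits plus one novel bit
        "b": {1, 2},           # half of the common bits
        "c": {1, 2, 3, 4},     # identical to the modal fingerprint
        "d": {7, 8},           # nothing in common
    }
    for name, p, s in top_hits(database, modal, n=3):
        print(name, round(p, 2), round(s, 2))
```

In this toy database, entry "a" is the kind of hit the text describes: it carries every bit of the modal fingerprint (MODP of 1.0) and one additional bit of its own, which lowers its MSIM below 1.0.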