Daylight v4.9
Release Date: 1 February 2008

Name

spherex - calculate sphere exclusion clustering

Unix Synopsis

spherex [options] in.tdt [out.tdt]

Description

spherex(1) performs a sphere exclusion clustering of the input dataset, which must contain fixed-size fingerprints. Its output is designed to be post-processed using the listclusters(1) and showclusters(1) programs. The input file must be a .tdt file (either "list" or "dump" format) containing fingerprint (FP) data. Input is copied to output with a "cluster" (CL) data item inserted after each fingerprint item. In addition, the selected member of each cluster (as opposed to the excluded members) is marked with a tag (CLT). If the name of the output file is not specified, output will be written to standard output.

A "CL generation" ($CLG) datatree is also written to output which includes the run ID (if set via the -id option), program name, version number, and the parameters used.
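
For illustration only, the sphere exclusion procedure described above can be sketched in Python. This is a minimal sketch, not the spherex implementation. It makes these assumptions: fingerprints are modeled as sets of on-bit positions, items are visited in order of ascending density (the behavior described for -RANDOM FALSE), and an item is excluded when its Tanimoto similarity to the selected item is at least the threshold. The function names, the data representation, and the inclusive comparison are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sphere_exclusion(fps, threshold=0.8):
    """Return (selected, cluster_of).

    selected   -- indices of the items picked as cluster representatives
    cluster_of -- maps every item index to the index of its representative
    """
    # For fixed-size fingerprints, density is proportional to the bit count,
    # so sorting by bit count gives ascending density.
    order = sorted(range(len(fps)), key=lambda i: len(fps[i]))
    cluster_of = {}
    selected = []
    for i in order:
        if i in cluster_of:
            continue  # already excluded by an earlier selection
        selected.append(i)
        cluster_of[i] = i
        # Exclude every unassigned item that falls inside the sphere around i.
        for j in order:
            if j not in cluster_of and tanimoto(fps[i], fps[j]) >= threshold:
                cluster_of[j] = i
    return selected, cluster_of


# Toy data: items 0 and 1 are similar (Tanimoto 0.8); items 2 and 3 are not
# (Tanimoto 0.5). A threshold of 0.7 keeps the example away from boundary cases.
fps = [
    frozenset({1, 2, 3, 4, 5}),
    frozenset({1, 2, 3, 4}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13, 14, 15}),
]
selected, cluster_of = sphere_exclusion(fps, threshold=0.7)
print(selected)    # indices of the selected representatives
print(cluster_of)  # representative assigned to each item
```

In this sketch the selected items correspond to the members that spherex marks with the CLT tag, and the remaining items are the excluded members of each cluster.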

Options

-FID fpid
Use only fingerprints identified by `fpid' rather than the first one encountered in each tree. This is chiefly useful for testing: in normal use, there is usually only one fingerprint per tree. (-in)
-JP_RUNID runid
Identify this run by `runid' in $CLG and CL output data. (-id)
-RANDOM bool
Controls whether to select the items at random from the input dataset (TRUE) or to select the items in order of ascending fingerprint density (FALSE). The default is FALSE.
-RANDOM_SEED val
When the -RANDOM option is used, this sets the initial numeric seed for the random number generator. Supplying the same seed repeats a run with the same pseudo-random selection of items.
-RECORD_COUNT count
Initially allocate memory for `count' structures. Ideally, `count' should be equal to or slightly greater than the number of structures to be input. If more than `count' structures are encountered while reading input, memory is reallocated as needed, which incurs a performance penalty and may cause an "out of memory" error. The default is 10000. (-m)
-SPHEREX_RANK val
When set, the closest `val' items are excluded around each selected item: instead of using a sphere defined by the THRESHOLD value, the program takes the best `val' items by rank. By default, rank is not used and exclusion is based on the THRESHOLD value.
-THRESHOLD val
The similarity/distance threshold to use for exclusion. This controls the size of the sphere around each selected item for the exclusion step. The default is 0.8.
-EXPRESSION expr
Use the given expression as the similarity measure for neighbor list generation. The default is tanimoto().
-COMPARISON [DISTANCE|SIMILARITY]
Controls relative goodness of similarity comparisons for list ranking. SIMILARITY means that higher values are better; DISTANCE means that lower values are better. If not specified, the program attempts to derive the directionality of the measure given in the -EXPRESSION option by computing it at the endpoints. Note: The COMPARISON option is only used in conjunction with the EXPRESSION option.
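
The interplay of the COMPARISON and SPHEREX_RANK options can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the program's actual logic: "computing it at the endpoints" is interpreted here as evaluating the measure once on two identical fingerprints and once on two fingerprints with no bits in common, and fingerprints are modeled as sets of on-bit positions. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def tanimoto_distance(a, b):
    """A distance-style measure: lower values mean closer fingerprints."""
    return 1.0 - tanimoto(a, b)


def infer_direction(measure):
    """Guess whether higher or lower values of `measure` are better.

    Assumption: the two endpoints are an identical pair and a disjoint pair.
    """
    same = frozenset({1, 2, 3})
    other = frozenset({4, 5, 6})
    at_identical = measure(same, same)
    at_disjoint = measure(same, other)
    if at_identical > at_disjoint:
        return "SIMILARITY"
    if at_identical < at_disjoint:
        return "DISTANCE"
    raise ValueError("measure does not distinguish the endpoints")


def closest_by_rank(center, candidates, measure, rank, direction):
    """Return the `rank` candidates closest to `center`, best first."""
    best_first = sorted(
        candidates,
        key=lambda fp: measure(center, fp),
        reverse=(direction == "SIMILARITY"),  # higher is better for similarity
    )
    return best_first[:rank]


center = frozenset({1, 2, 3, 4})
candidates = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4, 5}), frozenset({9})]
for measure in (tanimoto, tanimoto_distance):
    direction = infer_direction(measure)
    print(direction, closest_by_rank(center, candidates, measure, 2, direction))
```

Both measures rank the same two candidates first, which is the point of tracking the direction: a rank-based exclusion gives the same result whether the expression is written as a similarity or as a distance.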

Return Value

Returns 0 to its environment on success, or 1 on error, in which case a diagnostic message is printed:

spherex: input file not specified
An input file was not specified on the command line.
spherex: can't open input file
The input file specified on the command line does not exist or is not readable.
spherex: can't open output file
The output file specified on the command line can't be accessed for writing.
spherex: problem with option manager
The option manager could not be initialized. Verify that DY_ROOT is set properly.
spherex: note, x of x trees contain valid fingerprints
This (non-fatal) message appears if not all input trees contain fingerprints. The first number is the count of trees with valid fingerprints and the second is the total number of trees read. The message lets the user know how much work is actually being done, since trees without fingerprints are ignored in the computations. The number of trees with fingerprints is typically a few less than the total.
spherex: no trees with valid fingerprints were found
No valid fingerprints were found, either because no FP items were in the input or their "Run ID" didn't match that specified with the -FID option.
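
When spherex is driven from a script, the return value can be checked as in the following Python sketch. It assumes that spherex is on the PATH, that DY_ROOT is set, and that diagnostics are written to standard error (the text above does not say which stream is used, so both streams are captured). The helper names are hypothetical; the command line follows the synopsis and the -THRESHOLD option documented above.

```python
import subprocess


def build_spherex_command(in_tdt, out_tdt=None, threshold=None):
    """Build an argument list of the form: spherex [options] in.tdt [out.tdt]."""
    cmd = ["spherex"]
    if threshold is not None:
        cmd += ["-THRESHOLD", str(threshold)]
    cmd.append(in_tdt)
    if out_tdt is not None:
        cmd.append(out_tdt)
    return cmd


def run_spherex(in_tdt, out_tdt=None, threshold=None):
    """Run spherex and return its standard output.

    Raises RuntimeError carrying the captured diagnostic text if the
    program returns a non-zero value.
    """
    result = subprocess.run(
        build_spherex_command(in_tdt, out_tdt, threshold),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Assumption: the diagnostic is on stderr; fall back to stdout.
        message = result.stderr.strip() or result.stdout.strip()
        raise RuntimeError(f"spherex failed ({result.returncode}): {message}")
    return result.stdout


# Show the command that would be run, without invoking the program.
print(build_spherex_command("in.tdt", "out.tdt", threshold=0.7))
```

If no output file is given, the string returned by run_spherex holds the clustered TDT data, since spherex then writes its output to standard output.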

Files

$DY_ROOT/bin/spherex

Daylight License

programs: cluster

Related Topics

fingerprint(1) listclusters(1) showclusters(1) licensing(5)

Daylight Theory Manual