LOOKFOR
The lookfor operator allows selection of rows
containing exactly the same structure as a given SMILES.
This is a fast search which is done by indexed (hashed) retrieval.
A related operator, lookup, is also available.
It is even faster, but requires a canonical SMILES as input.
The functions smi2usmi() and id2usmi() return a canonical SMILES,
given a generic SMILES or a binary row ID, respectively.
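As a rough illustration of why lookup can skip a step that lookfor must perform, the Python sketch below models an exact-match index as a hash table keyed by canonical SMILES. ExactIndex, toy_canonicalize and TOY_CANONICAL are hypothetical names. The toy canonicalizer is a hard-coded table standing in for a real canonicalization routine such as smi2usmi(), and nothing here describes how the operators actually store their index.

```python
from typing import Callable, Dict, List

# Hypothetical toy canonicalizer: a lookup table, NOT a SMILES parser.
TOY_CANONICAL = {"OCC": "CCO", "C(O)C": "CCO", "C1=CC=CC=C1": "c1ccccc1"}

def toy_canonicalize(smiles: str) -> str:
    return TOY_CANONICAL.get(smiles, smiles)

class ExactIndex:
    """Hash table from canonical SMILES to the row IDs holding that structure."""

    def __init__(self, canonicalize: Callable[[str], str]) -> None:
        self.canonicalize = canonicalize
        self.buckets: Dict[str, List[int]] = {}

    def add(self, row_id: int, smiles: str) -> None:
        # Rows are always filed under their canonical form.
        key = self.canonicalize(smiles)
        self.buckets.setdefault(key, []).append(row_id)

    def lookup(self, canonical_smiles: str) -> List[int]:
        # Input is trusted to be canonical: a single hash probe, no canonicalization.
        return list(self.buckets.get(canonical_smiles, []))

    def lookfor(self, smiles: str) -> List[int]:
        # Any spelling the canonicalizer understands: canonicalize, then probe.
        return self.lookup(self.canonicalize(smiles))

index = ExactIndex(toy_canonicalize)
index.add(1, "CCO")
index.add(2, "OCC")
index.add(3, "c1ccccc1")
print(index.lookfor("C(O)C"))  # [1, 2]
print(index.lookup("OCC"))     # [] because "OCC" is not the canonical key
```

lookfor("C(O)C") finds both ethanol rows because it canonicalizes first, whereas lookup("OCC") misses them: a non-canonical key hashes to an empty bucket, which is why lookup requires canonical input.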
TAUTOMERS
The tautomers operator allows selection of rows
containing possible tautomers of a given SMILES.
For this purpose, structures are considered tautomers if they have the same
molecular graph, molecular formula (including hydrogen count) and net
charge; that is, they can be converted into one another by moving only
electrons and hydrogen atoms.
This is a fast search which is done by indexed (hashed) retrieval.
The functional form is available as smi_tautomers()
which tests if two SMILES are tautomeric by the above definition.
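The definition above suggests a hashable key: two structures are tautomers exactly when they agree on graph, formula and net charge. The Python sketch below builds such a key, assuming that "molecular graph" means heavy-atom connectivity with bond orders ignored and, for simplicity, that both structures share the same atom numbering (a real implementation would need a canonical graph labelling). Molecule, make, tautomer_key and are_tautomers are hypothetical names, and the molecules are built by hand rather than parsed from SMILES.

```python
from collections import Counter
from typing import FrozenSet, NamedTuple, Sequence, Tuple

class Molecule(NamedTuple):
    elements: Tuple[str, ...]             # heavy atom i -> element symbol
    h_counts: Tuple[int, ...]             # heavy atom i -> attached hydrogens
    charges: Tuple[int, ...]              # heavy atom i -> formal charge
    skeleton: FrozenSet[Tuple[int, int]]  # bonds (i, j) with i < j; orders ignored

def make(elements: Sequence[str], h_counts: Sequence[int],
         bonds: Sequence[Tuple[int, int]], charges: Sequence[int] = ()) -> Molecule:
    return Molecule(tuple(elements), tuple(h_counts),
                    tuple(charges) or (0,) * len(elements),
                    frozenset(tuple(sorted(b)) for b in bonds))

def tautomer_key(m: Molecule):
    """Everything tautomers must share: graph, formula (with H) and net charge."""
    formula = Counter(m.elements)
    formula["H"] += sum(m.h_counts)
    return ((m.elements, m.skeleton),        # same molecular graph
            tuple(sorted(formula.items())),  # same formula, hydrogens included
            sum(m.charges))                  # same net charge

def are_tautomers(a: Molecule, b: Molecule) -> bool:
    return tautomer_key(a) == tautomer_key(b)

acetone = make(["C", "C", "O", "C"], [3, 0, 0, 3], [(0, 1), (1, 2), (1, 3)])   # CC(=O)C
enol = make(["C", "C", "O", "C"], [2, 0, 1, 3], [(0, 1), (1, 2), (1, 3)])      # C=C(O)C
propanal = make(["C", "C", "C", "O"], [3, 2, 1, 0], [(0, 1), (1, 2), (2, 3)])  # CCC=O
print(are_tautomers(acetone, enol))      # True
print(are_tautomers(acetone, propanal))  # False
```

Acetone and its enol differ only in where one hydrogen sits and which bonds are double, so their keys agree. Propanal has the same formula, C3H6O, but a different skeleton, so it is rejected.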
CONTAINS
The contains operator is extended to allow
selection of structures which contain a query substructure.
The query substructure is given as a SMILES structure; results will
include structures in which any of its hydrogens are replaced by other substituents.
This is the classic "open substructure search".
WARNING: Searching for superstructures of common substructures
like CCC is not advised: it may take a long time, produce a lot
of output, and the results will not be very interesting.
The functional form is available as smi_contains(),
which tests whether one SMILES structure is contained in another.
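Open substructure search amounts to subgraph matching over heavy atoms. The Python sketch below is a plain backtracking matcher, not the optimized algorithm behind the contains operator. Molecules are hand-built as element lists plus bond-order dictionaries (a hypothetical representation with no SMILES parsing), and the function contains mirrors smi_contains() only in spirit.

```python
from typing import Dict, List, Tuple

# A molecule as heavy-atom element symbols plus bonds {(i, j): bond order}.
Mol = Tuple[List[str], Dict[Tuple[int, int], int]]

def _symmetric(bonds: Dict[Tuple[int, int], int]) -> Dict[Tuple[int, int], int]:
    out = dict(bonds)
    out.update({(j, i): order for (i, j), order in bonds.items()})
    return out

def contains(target: Mol, query: Mol) -> bool:
    """Every query atom maps to a distinct target atom of the same element and
    every query bond to a target bond of the same order. Extra target atoms and
    bonds are allowed, so hydrogens on the query may carry substituents."""
    t_elem, t_bonds = target[0], _symmetric(target[1])
    q_elem, q_bonds = query[0], _symmetric(query[1])
    mapping: Dict[int, int] = {}  # query atom -> target atom

    def extend(qi: int) -> bool:
        if qi == len(q_elem):
            return True
        for ti, element in enumerate(t_elem):
            if element != q_elem[qi] or ti in mapping.values():
                continue
            # Bonds to query atoms already placed must exist in the target.
            if all(t_bonds.get((ti, mapping[qj])) == order
                   for (a, qj), order in q_bonds.items()
                   if a == qi and qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack
        return False

    return extend(0)

ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})                   # CCO
propanol = (["C", "C", "C", "O"], {(0, 1): 1, (1, 2): 1, (2, 3): 1})  # CCCO
ether = (["C", "O", "C"], {(0, 1): 1, (1, 2): 1})                     # COC
print(contains(propanol, ethanol))  # True
print(contains(ether, ethanol))     # False
```

Because unmatched target atoms are ignored, CCCO contains CCO (a hydrogen on the query's terminal carbon has been replaced by a methyl group), while COC does not, since it has no C-C bond.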
SIMILAR (score)
The similar operator allows selection of rows
containing structures which are similar to a given structure.
A fingerprint-based binary Tanimoto coefficient is used to determine
similarity, where
a value of 1.0 means "identical" and 0.0 means "completely dissimilar".
The required level of similarity is user-adjustable.
At high levels (~0.8 or more), this is a fast, robust search type.
At low levels (~0.65 or less), the results are not meaningful and
the answer sets become large.
The functional forms smi_similarity()
and id_similarity() return the similarity
of two SMILES or two row IDs, respectively.
The operator score() is also provided as an
ancillary operator to similar().
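For binary fingerprints, the Tanimoto coefficient is the number of bits set in both divided by the number set in either, c / (a + b - c). The Python sketch below computes it on integers used as bit sets and applies a threshold in the manner of the similar operator. Fingerprint generation is out of scope, the 4-bit fingerprints are made up, and returning each hit with its score only suggests the kind of value score() exposes; tanimoto and similar_rows are hypothetical names.

```python
from typing import Dict, List, Tuple

def tanimoto(fp_a: int, fp_b: int) -> float:
    """Bits set in both fingerprints divided by bits set in either: c / (a + b - c)."""
    both = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    # Assumption: two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0

def similar_rows(rows: Dict[int, int], query_fp: int,
                 threshold: float) -> List[Tuple[int, float]]:
    """(row ID, score) for every row scoring at least `threshold` against the query."""
    hits = []
    for row_id, fp in rows.items():
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            hits.append((row_id, score))
    return hits

rows = {1: 0b1011, 2: 0b1110, 3: 0b0100}   # made-up 4-bit fingerprints
print(tanimoto(0b1011, 0b1110))            # 0.5
print(similar_rows(rows, 0b1011, 0.8))     # [(1, 1.0)]
```

Here 0b1011 and 0b1110 share two set bits out of four set in either, giving 0.5. Raising the threshold shrinks the answer set; lowering it toward 0.0 admits nearly every row.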
NEAREST
The nearest operator allows selection of rows
containing the nearest neighbors of a given structure.
The Tanimoto similarity coefficient is used for nearest neighbor
determination.
The number of nearest neighbors to be selected is user-adjustable.
This is a fast, robust search type.
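Nearest-neighbor selection ranks candidates by the same coefficient and keeps the top k instead of applying a threshold. The Python sketch below does this with heapq.nlargest over made-up 4-bit fingerprints. nearest and tanimoto are hypothetical names, breaking ties by the smaller row ID is an arbitrary choice, and scoring every row is a simplification; it says nothing about how the operator is indexed.

```python
import heapq
from typing import Dict, List, Tuple

def tanimoto(fp_a: int, fp_b: int) -> float:
    """Binary Tanimoto coefficient of two integer bit sets."""
    either = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / either if either else 1.0

def nearest(rows: Dict[int, int], query_fp: int, k: int) -> List[Tuple[int, float]]:
    """The k (row ID, score) pairs most similar to the query, best first."""
    scored = [(row_id, tanimoto(query_fp, fp)) for row_id, fp in rows.items()]
    # Higher score wins; among equal scores the smaller row ID comes first.
    return heapq.nlargest(k, scored, key=lambda item: (item[1], -item[0]))

rows = {1: 0b1011, 2: 0b1110, 3: 0b0100, 4: 0b1001}  # made-up fingerprints
print([row_id for row_id, _ in nearest(rows, 0b1011, 2)])  # [1, 4]
```

k rows come back whenever at least k exist, even if their similarity is low; a threshold-based search would instead return nothing in that case.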
MATCHES
The matches operator allows selection of rows
containing structures which match a given SMARTS pattern.
This very powerful search is primarily useful for advanced users.
Although this search is highly optimized, it can be slow if
every query node is variable.
Note that this search takes a SMARTS (pattern) as input
rather than a SMILES (structure).
The functional form is available as smi_matches()
which tests if a SMILES structure matches a given pattern.
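A SMARTS pattern generalizes the query graph: a node may accept several atom types and an edge several bond types. The Python sketch below models only a tiny fraction of that, using sets of allowed elements and None for "any bond", with no SMARTS parsing; matches, Pattern and the hand-built molecules are hypothetical. The candidate lists hint at why variable query nodes cost time: the more elements a node accepts, the more target atoms survive the per-atom filter and the more branches the backtracking must explore.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Mol = Tuple[List[str], Dict[Tuple[int, int], int]]
# Pattern atoms are sets of allowed elements; a bond order of None means "any bond".
Pattern = Tuple[List[FrozenSet[str]], Dict[Tuple[int, int], Optional[int]]]

def _symmetric(bonds):
    out = dict(bonds)
    out.update({(j, i): v for (i, j), v in bonds.items()})
    return out

def matches(target: Mol, pattern: Pattern) -> bool:
    t_elem, t_bonds = target[0], _symmetric(target[1])
    p_atoms, p_bonds = pattern[0], _symmetric(pattern[1])
    # Candidate target atoms per pattern node: wider allowed sets, more branches.
    candidates = [[ti for ti, el in enumerate(t_elem) if el in allowed]
                  for allowed in p_atoms]
    mapping: Dict[int, int] = {}  # pattern atom -> target atom

    def extend(pi: int) -> bool:
        if pi == len(p_atoms):
            return True
        for ti in candidates[pi]:
            if ti in mapping.values():
                continue
            # Each pattern bond to an already-placed atom must exist in the target.
            if all((ti, mapping[pj]) in t_bonds
                   and (order is None or t_bonds[(ti, mapping[pj])] == order)
                   for (a, pj), order in p_bonds.items()
                   if a == pi and pj in mapping):
                mapping[pi] = ti
                if extend(pi + 1):
                    return True
                del mapping[pi]  # backtrack
        return False

    return extend(0)

ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})  # CCO
methylamine = (["C", "N"], {(0, 1): 1})              # CN
pattern = ([frozenset({"C", "N"}), frozenset({"O"})], {(0, 1): None})
print(matches(ethanol, pattern))      # True
print(matches(methylamine, pattern))  # False
```

The pattern corresponds roughly to the SMARTS [C,N]~O (carbon or nitrogen bonded to oxygen by any bond); aromaticity and every other SMARTS primitive are ignored.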