Some SMILES/SMARTS Searching Subtleties
In the Daylight system, SMILES and SMARTS searching refer to explicit
and general structural searches, respectively, over a dataset. Typically,
these searches are performed with the Merlinserver via a Merlin client
program such as XVMerlin. "Searching" is distinguished from "lookup",
a Thor function that accesses a single record (TDT) by exact
specification of its identifier.
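The difference between the two access patterns can be sketched in Python. The dictionary, its contents, and the functions `lookup` and `search` are hypothetical illustrations, not the Thor or Merlin interfaces: a lookup goes straight to one record by its exact identifier, while a search tests every record against a criterion and returns a hitlist.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy dataset: identifier (here a SMILES) -> record contents (a name).
RECORDS: Dict[str, str] = {
    "C1CCCCC1": "cyclohexane",
    "CC1CCCCC1": "methylcyclohexane",
    "c1ccccc1": "benzene",
}


def lookup(identifier: str) -> Optional[str]:
    """Lookup: fetch exactly one record by its identifier, without scanning."""
    return RECORDS.get(identifier)


def search(predicate: Callable[[str], bool]) -> List[str]:
    """Search: test every record against a criterion and return the hitlist."""
    return [key for key, name in RECORDS.items() if predicate(name)]


print(lookup("C1CCCCC1"))
print(search(lambda name: "cyclohexane" in name))
```

The string test passed to `search` is only a stand-in criterion; in a real structural search the predicate would be a substructure or pattern match.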
SMILES Searching
A "SMILES search" is a search across a dataset of structures for an
exact substructure specified by a SMILES. The Merlinserver uses Daylight
"fingerprints" and in-memory techniques to speed up SMILES searches.
However, this methodology does not affect the resulting hitlist, i.e.,
the search logic. The fingerprints serve only as a high-speed screen
that discards structures in the dataset which cannot possibly match;
an authoritative check is then performed on the remaining candidates.
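A minimal sketch of the screen-then-verify idea, assuming a toy molecule representation (atom labels plus bond-order symbols, hydrogens omitted) and a simple path-based fingerprint hashed into 256 bits. The names `fingerprint`, `is_substructure`, and `smiles_search`, the bit count, and the path length limit are assumptions for illustration; this is not Daylight's fingerprint algorithm or the Merlinserver's code. The property it relies on is that if the query is a substructure of a target, every query path also occurs in the target, so a query bit missing from the target proves the target cannot match.

```python
import zlib
from typing import Dict, List, Tuple

NBITS = 256     # hypothetical fingerprint size
MAX_BONDS = 3   # hypothetical maximum path length, in bonds

Mol = Tuple[List[str], List[Dict[int, str]]]  # (atom labels, adjacency with bond orders)


def make_mol(atoms: List[str], bonds: List[Tuple[int, int, str]]) -> Mol:
    adj: List[Dict[int, str]] = [{} for _ in atoms]
    for i, j, order in bonds:
        adj[i][j] = order
        adj[j][i] = order
    return atoms, adj


def fingerprint(mol: Mol) -> int:
    """Set one hashed bit for every simple path of up to MAX_BONDS bonds."""
    atoms, adj = mol
    fp = 0

    def walk(path: List[int], tokens: List[str]) -> None:
        nonlocal fp
        # Take the smaller of the two directions so a path hashes the same either way.
        key = " ".join(min(tokens, tokens[::-1]))
        fp |= 1 << (zlib.crc32(key.encode()) % NBITS)
        if len(path) > MAX_BONDS:
            return
        for nxt, order in adj[path[-1]].items():
            if nxt not in path:
                walk(path + [nxt], tokens + [order, atoms[nxt]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return fp


def is_substructure(query: Mol, target: Mol) -> bool:
    """Authoritative check: backtracking search for an atom mapping that preserves
    labels and bond orders (extra target atoms and bonds are allowed)."""
    q_atoms, q_adj = query
    t_atoms, t_adj = target
    mapping: Dict[int, int] = {}

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True
        for ti in range(len(t_atoms)):
            if ti in mapping.values() or t_atoms[ti] != q_atoms[qi]:
                continue
            # Every bond from qi to an already-mapped query atom must exist in the target.
            if all(t_adj[ti].get(mapping[qj]) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def smiles_search(query: Mol, dataset: Dict[str, Mol]) -> Tuple[List[str], List[str]]:
    q_fp = fingerprint(query)
    passed: List[str] = []
    hits: List[str] = []
    for name, mol in dataset.items():
        # Screen: a query bit absent from the target proves it cannot match.
        if (q_fp & fingerprint(mol)) != q_fp:
            continue
        passed.append(name)
        if is_substructure(query, mol):  # authoritative check on the candidates
            hits.append(name)
    return passed, hits


RING6 = [(i, (i + 1) % 6, "-") for i in range(6)]
cyclohexane = make_mol(["C"] * 6, RING6)
dataset = {
    "cyclohexane": cyclohexane,
    "methylcyclohexane": make_mol(["C"] * 7, RING6 + [(0, 6, "-")]),
    "benzene": make_mol(["c"] * 6, [(i, (i + 1) % 6, ":") for i in range(6)]),
    "hexane": make_mol(["C"] * 6, [(i, i + 1, "-") for i in range(5)]),
}
passed, hits = smiles_search(cyclohexane, dataset)
print(hits)
```

Hexane passes the screen here because its short paths look exactly like cyclohexane's, which is why the screen alone cannot decide the hitlist; the backtracking check then rejects it for lacking the ring-closure bond.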
Note that while the SMILES language used for search targets is
identical to that used for structure specification, the meaning of a
search-SMILES is subtly different from that of a structure-SMILES. The
biggest difference is that the implicit hydrogens of a search-SMILES
are ignored. For example, the SMILES for cyclohexane, C1CCCCC1, will
match any six-membered ring of singly bonded aliphatic carbons,
whatever their substituents (the ring in methylcyclohexane, for
instance).
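The effect of ignoring implicit hydrogens can be sketched with a toy matcher in which each atom carries an element symbol and an implicit hydrogen count, and all bonds are single. The `compare_h` flag and all names here are hypothetical: setting it to `True` stands in for a naive reading of the query as a complete structure, while `False` mimics the search-SMILES behavior described above.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, int]                 # (element symbol, implicit hydrogen count)
Mol = Tuple[List[Atom], List[Set[int]]]


def make_mol(atoms: List[Atom], bonds: List[Tuple[int, int]]) -> Mol:
    adj: List[Set[int]] = [set() for _ in atoms]
    for i, j in bonds:                 # every bond is single in this toy
        adj[i].add(j)
        adj[j].add(i)
    return atoms, adj


def matches(query: Mol, target: Mol, compare_h: bool) -> bool:
    q_atoms, q_adj = query
    t_atoms, t_adj = target
    mapping: Dict[int, int] = {}

    def atom_ok(qa: Atom, ta: Atom) -> bool:
        if qa[0] != ta[0]:
            return False
        # With compare_h=False the query's hydrogen count is ignored.
        return qa[1] == ta[1] if compare_h else True

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True
        for ti in range(len(t_atoms)):
            if ti in mapping.values() or not atom_ok(q_atoms[qi], t_atoms[ti]):
                continue
            # Bonds to already-mapped query atoms must also exist in the target.
            if all(mapping[qj] in t_adj[ti] for qj in q_adj[qi] if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


RING6 = [(i, (i + 1) % 6) for i in range(6)]
# Cyclohexane as written: every ring carbon carries two implicit hydrogens.
cyclohexane = make_mol([("C", 2)] * 6, RING6)
# Methylcyclohexane: ring carbon 0 has one H; atom 6 is the CH3 group.
methylcyclohexane = make_mol([("C", 1)] + [("C", 2)] * 5 + [("C", 3)],
                             RING6 + [(0, 6)])

print(matches(cyclohexane, methylcyclohexane, compare_h=False))  # True
print(matches(cyclohexane, methylcyclohexane, compare_h=True))   # False
```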
To fingerprint a search-SMILES, the system must interpret it as a
molecule. Thus, "cOc" is not a valid search-SMILES: its aromatic atoms
are not in a ring, so it cannot be interpreted as a molecule (though it
is a valid SMARTS).
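A sketch of one necessary condition behind this rule, assuming a bare graph representation: each aromatic (lowercase) atom must lie on a ring, tested here by checking whether removing one of its bonds still leaves the two ends connected. The function names are hypothetical, and a real SMILES parser also has to perceive aromaticity for the ring as a whole, which this sketch does not attempt.

```python
from collections import deque
from typing import List, Tuple

Bond = Tuple[int, int]


def connected(n_atoms: int, bonds: List[Bond], src: int, dst: int) -> bool:
    """Breadth-first search: is there a path from src to dst using these bonds?"""
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    seen = {src}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def in_ring(n_atoms: int, bonds: List[Bond], atom: int) -> bool:
    """An atom is in a ring if one of its bonds is not a bridge, i.e. the bond's
    two ends stay connected after the bond is removed."""
    for k, (u, v) in enumerate(bonds):
        if atom in (u, v):
            rest = bonds[:k] + bonds[k + 1:]
            if connected(n_atoms, rest, u, v):
                return True
    return False


def aromatic_atoms_in_rings(atoms: List[str], bonds: List[Bond]) -> bool:
    """Necessary (not sufficient) condition: every lowercase atom lies on a ring."""
    return all(in_ring(len(atoms), bonds, i)
               for i, sym in enumerate(atoms) if sym.islower())


# "cOc": two aromatic carbons joined through an oxygen, with no ring at all.
print(aromatic_atoms_in_rings(["c", "O", "c"], [(0, 1), (1, 2)]))  # False
# Furan, c1ccoc1: all five aromatic atoms are on the ring.
furan_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(aromatic_atoms_in_rings(["c", "c", "c", "o", "c"], furan_bonds))  # True
```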
SMARTS Searching
A "SMARTS search" is a search across a dataset of structures (specified
by their SMILES) for a structural pattern specified by a SMARTS. A
SMARTS may be less restrictive or more restrictive than a SMILES. For
example, "[#6]" means any carbon, aliphatic or aromatic, and "[C,N,O]"
means an aliphatic carbon, nitrogen, or oxygen; both are broader than a
single SMILES atom. By contrast, "[C;H0]" means an aliphatic carbon with
no hydrogens attached, a constraint a search-SMILES cannot express
because its implicit hydrogens are ignored.
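To make the three examples concrete, here is a tiny evaluator for a small subset of SMARTS atom expressions: `#n` (atomic number), `Hn` (total hydrogen count), bare element symbols (uppercase for aliphatic, lowercase for aromatic), `,` for OR, and `;` for low-precedence AND. The `Atom` class, the sample atoms, and the function names are assumptions; other SMARTS operators and primitives (`&`, `!`, `X`, `R`, and so on) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str      # element symbol, e.g. "C", "N", "O"
    atomic_num: int
    aromatic: bool
    total_h: int     # total number of attached hydrogens


def primitive_matches(prim: str, atom: Atom) -> bool:
    if prim.startswith("#"):                         # atomic number, any aromaticity
        return atom.atomic_num == int(prim[1:])
    if prim.startswith("H") and prim[1:].isdigit():  # total hydrogen count
        return atom.total_h == int(prim[1:])
    # Bare element symbol: uppercase = aliphatic, lowercase = aromatic.
    return atom.symbol == prim.capitalize() and atom.aromatic == prim[0].islower()


def atom_expr_matches(expr: str, atom: Atom) -> bool:
    """Evaluate a bracket expression in which ';' (AND) binds more loosely than ',' (OR)."""
    inner = expr.strip("[]")
    return all(any(primitive_matches(p, atom) for p in part.split(","))
               for part in inner.split(";"))


ATOMS = {
    "methyl C": Atom("C", 6, False, 3),
    "quaternary C": Atom("C", 6, False, 0),
    "aromatic CH": Atom("C", 6, True, 1),
    "amine N": Atom("N", 7, False, 2),
}

for expr in ["[#6]", "[C,N,O]", "[C;H0]"]:
    print(expr, [name for name, a in ATOMS.items() if atom_expr_matches(expr, a)])
```

Splitting on `;` first and then on `,` is what gives `;` the lower precedence, so `[C;H0]` reads as "aliphatic carbon AND zero hydrogens".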
SMARTS searches are generally slower than SMILES searches. However, as
of release 4.4, fingerprint screening is used to a limited extent:
essentially, the screen is based on the explicit portion of the SMARTS.
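The idea of screening on the explicit portion of a SMARTS can be sketched as follows. The sketch assumes that only query atoms restricted to a single element count as explicit, so only paths through those atoms contribute screen bits, while atoms such as `[C,N,O]` are left to the full match. This is an illustrative guess, not a description of how release 4.4 actually derives its screen; all names, the toy representation (hydrogens omitted), and the 256-bit path fingerprint are hypothetical.

```python
import zlib
from typing import Dict, List, Sequence, Set, Tuple

NBITS = 256     # hypothetical fingerprint size
MAX_BONDS = 3   # hypothetical maximum path length, in bonds
Adj = List[Dict[int, str]]


def build_adj(n: int, bonds: Sequence[Tuple[int, int, str]]) -> Adj:
    adj: Adj = [{} for _ in range(n)]
    for i, j, order in bonds:
        adj[i][j] = adj[j][i] = order
    return adj


def path_fp(labels: Sequence[str], adj: Adj, usable: Set[int]) -> int:
    """Hash each simple path of up to MAX_BONDS bonds that stays inside `usable`."""
    fp = 0

    def walk(path: List[int], tokens: List[str]) -> None:
        nonlocal fp
        key = " ".join(min(tokens, tokens[::-1]))   # direction-independent key
        fp |= 1 << (zlib.crc32(key.encode()) % NBITS)
        if len(path) > MAX_BONDS:
            return
        for nxt, order in adj[path[-1]].items():
            if nxt in usable and nxt not in path:
                walk(path + [nxt], tokens + [order, labels[nxt]])

    for start in usable:
        walk([start], [labels[start]])
    return fp


def full_match(q_sets, q_adj: Adj, t_labels, t_adj: Adj) -> bool:
    """Authoritative check: each query atom accepts any label in its set."""
    mapping: Dict[int, int] = {}

    def extend(qi: int) -> bool:
        if qi == len(q_sets):
            return True
        for ti, label in enumerate(t_labels):
            if ti in mapping.values() or label not in q_sets[qi]:
                continue
            if all(t_adj[ti].get(mapping[qj]) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]
        return False

    return extend(0)


def smarts_search(q_sets, q_bonds, dataset):
    q_adj = build_adj(len(q_sets), q_bonds)
    # Assumption: only atoms restricted to one element are "explicit".
    explicit = {i for i, s in enumerate(q_sets) if len(s) == 1}
    q_labels = [next(iter(s)) if len(s) == 1 else "?" for s in q_sets]
    q_fp = path_fp(q_labels, q_adj, explicit)
    passed, hits = [], []
    for name, (labels, bonds) in dataset.items():
        t_adj = build_adj(len(labels), bonds)
        if (q_fp & path_fp(labels, t_adj, set(range(len(labels))))) != q_fp:
            continue                      # screened out on the explicit portion
        passed.append(name)
        if full_match(q_sets, q_adj, labels, t_adj):
            hits.append(name)
    return passed, hits


# Query roughly like the SMARTS C(=O)[C,N,O]; hydrogens are not modeled.
query_sets = [frozenset({"C"}), frozenset({"O"}), frozenset({"C", "N", "O"})]
query_bonds = [(0, 1, "="), (0, 2, "-")]
dataset = {
    "acetic acid": (["C", "C", "O", "O"], [(0, 1, "-"), (1, 2, "="), (1, 3, "-")]),
    "ethanol": (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]),
    "formyl chloride": (["O", "C", "Cl"], [(0, 1, "="), (1, 2, "-")]),
}
passed, hits = smarts_search(query_sets, query_bonds, dataset)
print(hits)
```

Formyl chloride passes the screen, since it contains the explicit C=O, but the full match rejects it because its carbonyl carbon has no carbon, nitrogen, or oxygen neighbor. Ethanol is normally screened out; a chance hash collision would only send it on to the full match, never change the hitlist.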