MUG '97: 4.5 Thor and Merlin systems
There are quite a few changes to Thor/Merlin systems in the 4.51 release,
most of which are evolutionary rather than revolutionary.
The major visible changes provide support for reactions and
live CD-ROM databases.
The following list describes most changes that will be visible to
database users and managers, but please check the readme files
supplied with the distribution for an authoritative list (as always).
- Most datatype definitions now require quotes
- All databases must be reloaded (thordbfix451 is supplied).
- No changes are visible when working at the object level
(i.e., with toolkits).
- Thorserver merges datatrees locally for better performance
(5x faster reloads).
- Thorserver now reuses vacated space and coalesces adjacent
empty blocks when able.
- Extra reserved space is inserted after cross-references
in an adaptive manner.
- Thorload will optionally generate indirect data keys.
- Precise datatype/datafield control is provided.
- A particularly efficient quadruple hash scheme is used.
- USMILES -- produce unique SMILES for "generic" reaction
- AUTOGEN GRAPH -- record role-free, oxidation-state-suppressed reaction
- MAKERXNMOL -- crossreference reaction by component and role
- ATOM_NTUPLE, BOND_NTUPLE, PART_NTUPLE -- component-specific data
- Datafields may have the PART_NTUPLE normalization tag.
- Data in such fields are vectors of data corresponding to
disconnected components.
- Order correspondence is maintained on canonicalization.
- Useful for mole-fractions of mixtures,
fractional stoichiometry, etc.
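The order correspondence described above can be sketched as follows. This is only an illustration: sorting the component strings stands in for real canonicalization, the naive split on "." ignores any subtleties of SMILES parsing, and the function name `canonicalize_with_data` is hypothetical rather than part of any Daylight toolkit.

```python
def canonicalize_with_data(smiles, part_data):
    """Reorder dot-separated components and carry per-component data along.

    ASSUMPTION: plain string sorting is a stand-in for canonicalization.
    The point is only that each datum moves with its own component.
    """
    parts = smiles.split(".")
    if len(parts) != len(part_data):
        raise ValueError("need exactly one data value per component")
    # Sort (component, datum) pairs together so the pairing survives.
    paired = sorted(zip(parts, part_data), key=lambda pair: pair[0])
    new_parts = [component for component, _ in paired]
    new_data = [datum for _, datum in paired]
    return ".".join(new_parts), new_data


# Mole fractions of a three-component mixture follow their components.
mixture, fractions = canonicalize_with_data("OCC.CC(=O)O.C", [0.7, 0.2, 0.1])
print(mixture, fractions)
```

Whatever ordering the canonicalizer picks, the data vector is permuted identically, which is why a field such as a mole-fraction vector stays meaningful after normalization.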
- Component fingerprint-tuples are used by Merlin to good effect.
- Fingerprint-tuples (FPP) are component-tuples of
fingerprints.
- FPP data is produced with the new fingerprint -m program option.
- A database may contain both FP and FPP data.
- Merlin will use FP and FPP
data as available and needed.
- FPP availability dramatically improves performance of
screening large libraries.
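The notes do not say how Merlin uses FPP data internally. One plausible reading, sketched below under that assumption, is that a connected query must fit inside a single component, so screening against each component's fingerprint rejects candidates that a whole-record fingerprint would let through. Fingerprints are modeled as plain Python integers, and both function names are hypothetical.

```python
def passes_fp_screen(query_fp, component_fps):
    """Screen against the whole-record fingerprint (OR of all components)."""
    whole = 0
    for fp in component_fps:
        whole |= fp
    return query_fp & whole == query_fp


def passes_fpp_screen(query_fp, component_fps):
    """Screen against each component separately (ASSUMED use of FPP data).

    A connected query can only match inside one component, so all of its
    bits must be present in at least one component fingerprint.
    """
    return any(query_fp & fp == query_fp for fp in component_fps)


mixture_fps = [0b0011, 0b1100]   # two components with disjoint features
query = 0b0110                   # needs one feature from each component

print(passes_fp_screen(query, mixture_fps))    # whole-record screen passes
print(passes_fpp_screen(query, mixture_fps))   # per-component screen rejects
```

The more components a record has, the more bits its combined fingerprint sets and the weaker it becomes as a screen, which would be consistent with the benefit being largest for large libraries.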
- Reaction fingerprints distinguish reactant and product features.
- Product bits are offset in reaction fingerprints.
- No special action is needed to obtain such fingerprints --
these are the "normal" reaction fingerprints.
- Suitable for structure screening and similarity comparison
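As a rough illustration of "product bits are offset", the sketch below sets reactant feature bits at their own positions and product feature bits at shifted positions in the same bit string. The fingerprint size, the offset value, and the wrap-around are all assumptions chosen for the example; the notes do not give the actual scheme.

```python
def reaction_fingerprint(reactant_bits, product_bits, nbits=16, offset=5):
    """Combine reactant and product features into one fingerprint.

    ASSUMPTIONS: nbits, offset, and modular wrap-around are illustrative
    choices, not values taken from the Daylight software.
    """
    fp = 0
    for bit in reactant_bits:
        fp |= 1 << (bit % nbits)
    for bit in product_bits:
        # Same feature, different position when it occurs in a product.
        fp |= 1 << ((bit + offset) % nbits)
    return fp


forward = reaction_fingerprint({1}, {3})
reverse = reaction_fingerprint({3}, {1})
print(format(forward, "016b"))
print(format(reverse, "016b"))
```

Because the same feature lands on different bits depending on its role, a reaction and its reverse get different fingerprints even though they involve the same molecules.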
- Difference fingerprints represent the difference between
reactants and products of a reaction.
- Such fingerprints characterize the transformation
(sans atom mapping).
- Can be used for clustering and as "alternative fingerprints".
- Can't be used at the same time as normal fingerprints (yet).
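A difference fingerprint can be pictured as follows: count each feature on the reactant side and on the product side, and set a bit only where the counts differ. This is a simplified sketch under that assumption, not the algorithm used by the fingerprint program; features are represented by hypothetical integer ids.

```python
from collections import Counter


def difference_fingerprint(reactant_features, product_features, nbits=16):
    """Set a bit for every feature whose count changes across the reaction.

    ASSUMPTION: a count comparison is used here for illustration; features
    that are unchanged by the reaction contribute nothing.
    """
    reactant_counts = Counter(reactant_features)
    product_counts = Counter(product_features)
    fp = 0
    for feature in set(reactant_counts) | set(product_counts):
        if reactant_counts[feature] != product_counts[feature]:
            fp |= 1 << (feature % nbits)
    return fp


# Feature 1 is untouched; feature 2 loses one copy; 5 vanishes; 7 appears.
diff = difference_fingerprint([1, 2, 2, 5], [1, 2, 7])
print(format(diff, "016b"))
```

Spectator features cancel out, so what remains reflects the transformation itself, which is why no atom mapping is needed.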
Reaction searching
- Widgets now handle reactions (e.g., grins and depict in xvmerlin).
- Character screening is changed to accommodate new SMILES syntax.
- Reaction fingerprinting distinguishes reactant and product features.
- All previously-available structure-based searches now operate
on reactions.
- Similarity and cluster analysis methods work with reactions just
as with molecules.
- Searching reaction databases (with reaction queries) is generally
faster than molecule searching.
Tversky similarity search
- Merlin supports Tversky measures between binary fingerprints
for searching and sorting.
- A method to calculate these measures is supplied in the
fingerprint toolkit as dt_fp_tversky().
- Provides a continuous range of super- to sub-structure similarity
- Extremes provide similarity as superstructure and as substructure
(missing since the old VAX v3.6)
- Very powerful tool for reaction searching.
- Might provide a "diversity metric" which measures distinctive
features.
- First described by John Bradshaw at EuroMUG-93
- Will be the subject of a presentation by John Bradshaw at this meeting.
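For readers unfamiliar with the measure, the general Tversky index between two binary fingerprints A and B is c / (alpha*a + beta*b + c), where c is the number of bits set in both, a the number set only in A, and b the number set only in B. The sketch below is a stand-alone helper on Python integers; it is not dt_fp_tversky(), whose signature is not given here, and returning 0.0 for a zero denominator is a convention chosen for the example.

```python
def tversky(fp_a, fp_b, alpha, beta):
    """Tversky index between two fingerprints held as Python integers."""
    common = bin(fp_a & fp_b).count("1")
    only_a = bin(fp_a & ~fp_b).count("1")
    only_b = bin(fp_b & ~fp_a).count("1")
    denominator = alpha * only_a + beta * only_b + common
    if denominator == 0:
        return 0.0  # ASSUMED convention when there is nothing to compare
    return common / denominator


query = 0b0110
target = 0b1110   # contains every query bit plus one extra

print(tversky(query, target, 1.0, 1.0))  # alpha = beta = 1 is Tanimoto
print(tversky(query, target, 1.0, 0.0))  # only unmatched query bits count
print(tversky(query, target, 0.0, 1.0))  # only unmatched target bits count
```

With alpha = 1 and beta = 0 the score reaches 1.0 whenever every query bit is present in the target, which is the substructure-like extreme; swapping the weights gives the superstructure-like extreme, and intermediate weights fill the range in between.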
MCL (Merlin Control Language) interface updated
- MCL now allows searching and sorting based on
Tversky similarity measures.
- The MCL processor writes "embedded HTML" output via
the mcl -h option.
- MCL documentation has been overhauled.
Alternative Thor database file suffixes
- Previously, the primary Thor database file had to end in ".THOR".
- Thorserver -DATABASE_SUFFIX_LIST allows
specification of alternative suffixes.
- The default value of this option is
".THOR|.TDB|thor|.tdb"
- This is required to operate with some filesystems (e.g., ISO-9660).
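The suffix list reads naturally as a "|"-separated set of allowed filename endings. The sketch below shows that interpretation using the default value quoted above verbatim; how thorserver actually parses and applies the list is an assumption, and `has_database_suffix` is a hypothetical helper.

```python
DEFAULT_SUFFIX_LIST = ".THOR|.TDB|thor|.tdb"  # default quoted in the notes


def has_database_suffix(filename, suffix_list=DEFAULT_SUFFIX_LIST):
    """Return True if filename ends with any suffix in the list.

    ASSUMPTION: entries are separated by '|' and matched case-sensitively
    against the end of the file name.
    """
    suffixes = tuple(s for s in suffix_list.split("|") if s)
    return filename.endswith(suffixes)


for name in ("medchem.THOR", "medchem.tdb", "medchem.txt"):
    print(name, has_database_suffix(name))
```

The file names here are made up for the example.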
Relative database pathnames in .THOR file
- File names in a .THOR file are now interpreted relative to
the .THOR file's directory.
- Simplifies moving databases from one directory to another
- Allows databases to reside on removable media such as a CD-ROM
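The effect of relative pathnames can be sketched with ordinary path arithmetic: each listed file name is joined onto the directory that holds the .THOR file. The file names and the helper `resolve_db_file` are hypothetical, and letting an absolute listed name pass through unchanged is an assumption rather than documented behavior.

```python
import posixpath


def resolve_db_file(thor_file, listed_name):
    """Resolve a name listed in a .THOR file against that file's directory.

    ASSUMPTION: an absolute listed_name is returned as-is (normalized),
    which is simply how posixpath.join behaves.
    """
    base_dir = posixpath.dirname(thor_file)
    return posixpath.normpath(posixpath.join(base_dir, listed_name))


# The same .THOR contents work before and after the database is moved.
print(resolve_db_file("/cdrom/db/demo.THOR", "demo_data.001"))
print(resolve_db_file("/home/chem/db/demo.THOR", "demo_data.001"))
```

Since nothing in the .THOR file mentions the enclosing directory, copying the whole directory to another disk or to a CD-ROM needs no edits.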
"Readonly" Thor databases are supported
- Thor database "readonly" property is saved in the .THOR file.
- Readonly property set via sthorman or thorchange -setaccess.
- Databases which are on "readonly" media (e.g., CD-ROMs)
are automatically readonly.
- Absolutely no writing takes place for a readonly database
(e.g., no .LCK file is created).
- More than one thorserver can access a readonly database.
"Live" Thor CD-ROM databases are supported
- Live databases can be run directly from the CD-ROM.
- Thor performance suffers but remains useful for interactive work.
- Merlin performance is not degraded at all.
- Live databases can be "burned" onto ISO-9660 formatted CDs.
Non-identifiers can be cross-referenced
- Identifiers in a Thor datatree are cross-referenced to
the tree root (as always).
- Tree roots must be identifiers,
and may or may not be a SMILES (as always).
- Non-identifiers preceded by slash ('/') are also
crossreferenced (new).
- Allows you to build many-to-many relationships --
use with caution!
- Provides database status including header contents
- "thorfilters" now provides all thor management functions
- sh and perl scripters are now first class citizens.
Lone hydrogens removed from GRAPHs
- GRAPH data represent oxidation-state suppressed molecules
- Lone hydrogens are now removed during Thor's GRAPH normalization
- This was a bug-fix.
- Thor clients using a database with monomer-level structures
require the monomer definitions.
- When first needed, the database's whole monomer table is downloaded
to the client.
- This approach is fine for hundreds to thousands of monomers, but too
slow for thousands to tens of thousands.
- Added the thor-client option of caching monomer tables in a local
directory (e.g., /tmp).
- The monomer-table cache is updated only when the monomer-database
changes.
- Best for databases which are accessed many times between
changes to the monomer-table
- Ideal for remote toolkit applications using combinatorial databases
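The caching idea above can be sketched as a small read-through cache keyed on a version stamp. Everything specific here is an assumption made for illustration: the JSON file format, the version number as the change indicator, and the names `load_monomer_table` and `fake_fetch`. The notes do not describe how the Thor client detects that the monomer database has changed.

```python
import json
import os
import tempfile


def load_monomer_table(db_name, db_version, fetch, cache_dir):
    """Return the monomer table, downloading only when the cache is stale.

    ASSUMPTION: db_version is some stamp that changes whenever the
    monomer database changes; fetch() performs the slow download.
    """
    path = os.path.join(cache_dir, db_name + ".monomers.json")
    if os.path.exists(path):
        with open(path) as handle:
            cached = json.load(handle)
        if cached.get("version") == db_version:
            return cached["table"]          # cache hit: no download
    table = fetch()                         # cache miss or stale copy
    with open(path, "w") as handle:
        json.dump({"version": db_version, "table": table}, handle)
    return table


calls = []

def fake_fetch():
    calls.append(1)                         # count simulated downloads
    return {"Ala": "NC(C)C(=O)O"}

with tempfile.TemporaryDirectory() as cache_dir:
    load_monomer_table("demo", 1, fake_fetch, cache_dir)
    load_monomer_table("demo", 1, fake_fetch, cache_dir)  # served locally
    load_monomer_table("demo", 2, fake_fetch, cache_dir)  # table changed
print(len(calls))
```

The second call costs only a local file read, which is where the benefit comes from when a database is opened many times between changes to its monomer table.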
Daylight Chemical Information Systems, Inc.
info@daylight.com