Daylight Software Version 4.51 Release Notes

============================ CONTENTS ============================



Rev: May 1, 1997

This document describes changes and features specific to version 4.51
of Daylight Chemical Information Systems software.  If there is a
machine-specific file for your machine in this directory (e.g.
"readme_v451_sgi"), please read it too.

    - Reactions
    - Tversky Similarity
    - Read-Only CDROM-able Databases
    - Fingerprint-tuples
    - Component-tuples
    - Properties of objects
    - Non-identifier Cross-referencing
    - Automatic Indirect Data
    - Data quoting is uniformly enforced
    - Datatree merging moved to server
    - Monomer table caching
    - Merlin Component-tuple fingerprint optimization
    - THORDBFIX451
    - Thordbinfo
    - Admin and Theory Manuals Restructured

====================== 2. DETAILS OF CHANGES =====================

Reactions

    The Reaction Toolkit and Reaction capability in Thor and Merlin are
    introduced.  Reactions are stored as SMILES in which the
    reactant(s), agent(s), and product(s) are separated by the ">"
    character.  Reaction atom-maps may be complete or incomplete and are
    stored as Absolute SMILES.  Reactions are searchable with all
    structural searches (superstructure, substructure, and similarity),
    with reaction-role specificity.  A "difference fingerprint" is
    introduced which characterizes the topological changes involved in
    the reaction, rather than the total features in all reaction-roles.
    Reactions are depicted in the usual way, and GRINS can handle
    Reaction SMILES.  In the Reaction Toolkit, Reaction Transformations
    are introduced, which apply reactions programmatically, offering
    many applications, including combinatorial library development and
    genetic algorithms.  Finally, several commercial databases of
    reactions are being released by Daylight.
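
    As an illustration of the reaction SMILES layout described above,
    the Python sketch below splits a reaction into its three roles and
    collects atom-map numbers.  It is a hypothetical helper, not part of
    the Reaction Toolkit.  It assumes that "." separates components (ring
    closures spanning components are not handled) and that atom maps are
    written as ":n" inside bracket atoms, e.g. [CH3:1].

```python
import re
from typing import Dict, List, Set

ROLES = ("reactants", "agents", "products")

# An atom-map class is written as ":n" at the end of a bracket atom, e.g. [CH3:1].
_ATOM_MAP = re.compile(r"\[[^\]]*:(\d+)\]")


def split_reaction(rxn: str) -> Dict[str, List[str]]:
    """Split 'reactants>agents>products' into per-role lists of components."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("a reaction SMILES needs exactly two '>' separators")
    # Empty roles (e.g. no agents in 'A>>B') become empty lists.
    return {role: [c for c in part.split(".") if c]
            for role, part in zip(ROLES, parts)}


def atom_maps(components: List[str]) -> Set[int]:
    """Collect the atom-map numbers used in a list of SMILES components."""
    return {int(n) for smi in components for n in _ATOM_MAP.findall(smi)}


if __name__ == "__main__":
    # Esterification with an incomplete atom map (only two atoms mapped).
    rxn = "[CH3:1][C:2](=O)O.OCC>>[CH3:1][C:2](=O)OCC.O"
    roles = split_reaction(rxn)
    print(roles["agents"])  # -> []
    # Map numbers that appear on both sides of the reaction.
    print(atom_maps(roles["reactants"]) & atom_maps(roles["products"]))  # -> {1, 2}
```
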

Tversky Similarity

    A new Merlin capability supports a flexible user-configurable 
    similarity coefficient.  By varying parameters alpha and beta, 
    the weight placed on shared and unique structural features may
    be adjusted over a continuous range.  Particular settings yield the
    Tanimoto coefficient itself, 'similarity as substructure', and
    'similarity as superstructure'.  These features are supported
    by the Merlin Toolkit, Merlinserver, xvmerlin and MCL.
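
    Assuming the standard Tversky form, the coefficient is built from the
    number of bits common to two fingerprints (c) and the numbers unique
    to each (a and b): Tversky = c / (alpha*a + beta*b + c), which
    reduces to the Tanimoto coefficient when alpha = beta = 1.  The
    Python sketch below holds fingerprints as plain integers purely for
    illustration; it is not Merlin's representation or API.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tversky(a_fp: int, b_fp: int, alpha: float, beta: float) -> float:
    """Tversky similarity of fingerprint A to fingerprint B."""
    common = popcount(a_fp & b_fp)   # c: features shared by A and B
    only_a = popcount(a_fp & ~b_fp)  # a: features unique to A
    only_b = popcount(b_fp & ~a_fp)  # b: features unique to B
    denom = alpha * only_a + beta * only_b + common
    # Two empty fingerprints share nothing; report 0.0 by convention.
    return common / denom if denom else 0.0


if __name__ == "__main__":
    query, target = 0b0110, 0b1111           # every query bit is set in the target
    print(tversky(query, target, 1.0, 1.0))  # Tanimoto: 2 / (0 + 2 + 2) -> 0.5
    print(tversky(query, target, 1.0, 0.0))  # fraction of query bits in target -> 1.0
```

    With alpha = 1 and beta = 0 the score is the fraction of A's features
    found in B, so it is 1.0 whenever A's features are a subset of B's;
    swapping the two weights gives the mirror-image measure.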

Read-Only CDROM-able Databases

    Read-only databases are introduced.  No lock file is created, so
    the database can be read from CDROM.  Variable database file
    suffixes are allowed, specified by the option DATABASE_SUFFIX_LIST,
    thereby supporting the ISO 9660 CDROM format (8.3 names) and
    allowing lower-case file names.  Also, relative pathnames are
    allowed in the Thor header, facilitating CDROM databases and
    simplifying installations.
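
    The 8.3 restriction is why configurable suffixes matter: a file name
    on an ISO 9660 level 1 disc may have at most eight characters before
    the dot and three after it, drawn from upper-case letters, digits and
    underscore.  The Python check below is a hypothetical helper, not a
    Daylight utility; it covers only level 1 names, not the Rock Ridge or
    Joliet extensions.

```python
import re

# ISO 9660 level 1: up to 8 "d-characters" (A-Z, 0-9, _), an optional
# extension of up to 3, and an optional ";n" version number.
_LEVEL1_NAME = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{0,3})?(;[0-9]+)?")


def fits_iso9660_level1(filename: str) -> bool:
    """True if filename is a valid ISO 9660 level 1 (8.3) file name."""
    return _LEVEL1_NAME.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ("MYDB.DAT", "MYDB.DAT;1", "mydb.dat", "MYDATABASE.DAT", "MYDB.DATA"):
        print(name, fits_iso9660_level1(name))
```
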

Fingerprint-tuples

    The fingerprint program will produce component-tuples of
    fingerprints (FPP datatype) when given the new -m option.

    A database may contain both FP and FPP data which will be used as
    available and needed.  For databases of large dot-disconnected
    mixtures, this can produce a significant increase in screening
    speed (with a penalty in database/pool size).
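
    The speed-up comes from screening.  Assuming a connected query, a
    match must lie within a single component, so a mixture can be
    rejected unless one component's fingerprint contains every query bit;
    the whole-structure fingerprint, which for path-based features is
    essentially the union of the component fingerprints, may let it
    through.  The Python sketch below, with fingerprints held as integers
    for illustration only, shows a mixture the per-component screen
    rejects but the whole-structure screen passes.

```python
from functools import reduce
from operator import or_
from typing import Sequence


def covers(fp: int, query: int) -> bool:
    """True if every bit set in query is also set in fp."""
    return (query & ~fp) == 0


def passes_fp_screen(query: int, component_fps: Sequence[int]) -> bool:
    """Screen against the whole-mixture fingerprint (union of components)."""
    return covers(reduce(or_, component_fps, 0), query)


def passes_fpp_screen(query: int, component_fps: Sequence[int]) -> bool:
    """Screen component by component: one component must hold all query bits."""
    return any(covers(fp, query) for fp in component_fps)


if __name__ == "__main__":
    query = 0b1100
    mixture = [0b1000, 0b0100]                # each component has half the query bits
    print(passes_fp_screen(query, mixture))   # -> True: the union covers the query
    print(passes_fpp_screen(query, mixture))  # -> False: no single component does
```

    A mixture that passes the per-component screen always passes the
    whole-structure screen, so the tuple test never discards a candidate
    the older test would keep; it only removes extra candidates before
    the more expensive atom-level search.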

Component-tuples

    Datafields which have the PART_NTUPLE normalization are treated as
    vectors of data which have a correspondence to the disconnected
    components of a structure. The order-correspondence of the vector
    is maintained on canonicalization. Like other ntuples, any kind of
    data can be stored in component-tuples (string, integer, real,
    binary, etc.)

    Such vectors are very useful for representing things like
    mole-fractions of mixtures and stoichiometry of reactions.
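
    Maintaining the order-correspondence means that whenever
    canonicalization reorders the components, the tuple values are
    permuted the same way.  The Python sketch below illustrates the idea;
    plain lexicographic sorting of the component strings is an assumption
    standing in for Daylight's canonical component order, and the
    components themselves are not canonicalized.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def reorder_components(smiles: str, values: Sequence[T]) -> Tuple[str, List[T]]:
    """Sort dot-disconnected components and carry per-component values along."""
    components = smiles.split(".")
    if len(components) != len(values):
        raise ValueError("need exactly one value per component")
    # Stand-in for canonical ordering: sort component strings lexicographically.
    order = sorted(range(len(components)), key=lambda i: components[i])
    return ".".join(components[i] for i in order), [values[i] for i in order]


if __name__ == "__main__":
    # Mole fractions for water and ethanol travel with their components.
    print(reorder_components("O.CCO", [0.9, 0.1]))  # -> ('CCO.O', [0.1, 0.9])
```
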

    Component fingerprint-tuples may be used to enhance the speed of
    atom-level searching in large mixtures.

Properties of objects

    Toolkit functions for assigning named properties to objects
    are added: dt_setboolean(), dt_sethandle(), dt_setinteger(),
    dt_setreal(), dt_setstring(), dt_proptype(), and dt_propnames().
    In this way flexible data structures may be built onto any
    toolkit object.
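
    The kind of structure these functions support can be modelled as a
    table of named, typed values attached to an object.  The Python class
    below is a hypothetical sketch of that model only: it does not
    reproduce the toolkit's C signatures, and it uses a single set()
    method that infers the type, where the toolkit provides one setter
    per type (dt_setboolean(), dt_setinteger(), and so on).

```python
from typing import Any, Dict, List, Optional, Tuple


class PropertyBag:
    """Named, typed properties attached to some object (illustrative model)."""

    def __init__(self) -> None:
        self._props: Dict[str, Tuple[str, Any]] = {}

    def set(self, name: str, value: Any) -> None:
        # bool must be tested before int, because bool is a subclass of int.
        if isinstance(value, bool):
            kind = "boolean"
        elif isinstance(value, int):
            kind = "integer"
        elif isinstance(value, float):
            kind = "real"
        elif isinstance(value, str):
            kind = "string"
        else:
            kind = "handle"  # any other object is kept as a reference
        self._props[name] = (kind, value)

    def get(self, name: str) -> Any:
        return self._props[name][1]

    def proptype(self, name: str) -> Optional[str]:
        """Type name of a property, or None if it is not set."""
        entry = self._props.get(name)
        return entry[0] if entry else None

    def propnames(self) -> List[str]:
        return sorted(self._props)


if __name__ == "__main__":
    atom_props = PropertyBag()
    atom_props.set("visited", True)
    atom_props.set("ring_count", 2)
    print(atom_props.propnames())          # -> ['ring_count', 'visited']
    print(atom_props.proptype("visited"))  # -> boolean
```
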

Non-identifier Cross-referencing

    Non-identifier dataitems in a Thor datatree that are preceded by a
    slash ('/') are automatically cross-referenced to the tree root
    (which may or may not be a SMILES).

Automatic Indirect Data

    Added a facility to automatically generate indirect references from
    datatrees in thorload.  Previously, a user was required to
    generate and load indirect data manually.  Typically this was
    accomplished by assigning arbitrary indirect keys to the data,
    followed by the manual registration of the separate main and
    indirect data.

    The relevant options to thorload are:

        -GENERATE_INDIRECT TRUE or FALSE

    Controls automatic generation of indirect references.  Default:

        -EXCLUDE_INDIRECT ALL or [tag ...]
        -INCLUDE_INDIRECT ALL or [tag ...]

    These two options select which indirect datatypes are to be
    automatically processed.  Each takes a list of datatype tags or the
    keyword "ALL".

        -INDIRECT_DATABASE dbname

    Specifies the indirect database name to which the generated
    indirect references are registered.  This is required when
    -GENERATE_INDIRECT TRUE is specified.  Default: none.
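
    Conceptually, automatic indirect data replaces each repeated data
    value with a generated key and registers the value once, under that
    key, in the indirect database.  The Python sketch below shows only
    that bookkeeping.  The record layout, the tag name LIT and the key
    format IND1, IND2, ... are hypothetical and do not describe how
    thorload names or stores its keys.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def split_indirect(records: List[Record], tag: str) -> Tuple[List[Record], Dict[str, str]]:
    """Move values of `tag` into an indirect table, leaving keys behind."""
    key_for_value: Dict[str, str] = {}
    main: List[Record] = []
    for record in records:
        record = dict(record)  # do not modify the caller's records
        if tag in record:
            value = record[tag]
            if value not in key_for_value:
                # Hypothetical key scheme: sequential IND1, IND2, ...
                key_for_value[value] = f"IND{len(key_for_value) + 1}"
            record[tag] = key_for_value[value]
        main.append(record)
    # Indirect table: one entry per distinct value, indexed by its key.
    indirect = {key: value for value, key in key_for_value.items()}
    return main, indirect
```
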

Data quoting is uniformly enforced

    All data containing characters used for datatree syntax ($ < ; > |)
    must be quoted in lexical datatrees.  This is the same quoting
    convention as before, but it is now enforced uniformly, in particular
    in datatype definitions, e.g. $D<"$SMI"> or _P<"*;*">
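
    A program that writes lexical datatrees can apply the rule
    mechanically: quote any value that contains one of the five syntax
    characters.  The Python sketch below uses hypothetical helpers, not
    Daylight functions.  It assumes fields within a dataitem are
    separated by ";" and that values contain no double quote, since these
    notes do not describe how embedded quotes are escaped.

```python
DATATREE_SYNTAX_CHARS = set("$<;>|")


def quote_if_needed(value: str) -> str:
    """Wrap a dataitem value in double quotes if it contains datatree syntax."""
    if '"' in value:
        # Escaping of embedded quotes is not covered by this sketch.
        raise ValueError("values containing '\"' are not handled here")
    if any(ch in DATATREE_SYNTAX_CHARS for ch in value):
        return f'"{value}"'
    return value


def dataitem(tag: str, *fields: str) -> str:
    """Format TAG<field;field;...> with each field quoted as required."""
    return f"{tag}<{';'.join(quote_if_needed(f) for f in fields)}>"


if __name__ == "__main__":
    print(dataitem("$D", "$SMI"))  # -> $D<"$SMI">
    print(dataitem("_P", "*;*"))   # -> _P<"*;*">
```
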

    All databases must be updated (rebuilt).  thordbfix451 is supplied 
    for this purpose. 

    No changes are visible when working at the object level
    (i.e., with toolkits). 

Datatree merging moved to server

    Loading Thor databases was slow, even when "raw" data loading was
    used (i.e., when reloading a database).  Moving the merge operation
    to the server makes datatree merging much faster (5x) for raw data
    loads.  (Previously, the client fetched the existing datatree from
    the server, merged it with the new one, and returned the merged
    datatree to the server.)

    The server now checks whether the new tree fits in the current
    record's location; if so, it reuses that location rather than
    allocating a new one and updating the hash lists.  When deleting
    records, the server now coalesces adjacent empty blocks when possible.
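
    The two server-side space checks can be sketched as follows.  The
    Python below models free space as non-overlapping (offset, length)
    blocks; that representation and the function names are assumptions
    for illustration, not Thor's on-disk format.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (offset, length) of a free region


def fits_in_place(new_length: int, record_length: int) -> bool:
    """A merged tree can overwrite the old record if it is no larger."""
    return new_length <= record_length


def coalesce(free_blocks: List[Block]) -> List[Block]:
    """Merge non-overlapping free blocks that touch end-to-start."""
    merged: List[Block] = []
    for offset, length in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


if __name__ == "__main__":
    # Freeing the block at offset 10 joins it with its free neighbour at 0.
    print(coalesce([(10, 5), (0, 10), (20, 4)]))  # -> [(0, 15), (20, 4)]
```
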

    Extra reserved space is inserted adaptively after commonly used
    cross-references.  Such space is not subject
    to garbage collection (except when the database is explicitly

Monomer table caching

    Thor clients connecting to a database with a defined monomer table
    download the whole monomer table the first time it is used (e.g.,
    for normalization). This can get time-consuming if there are many
    thousands of monomer definitions.

    The capability of caching monomer tables in a local directory
    (e.g., /tmp) was added.  This is a new, experimental feature.  It is
    not part of the Thor toolkit interface but may be made part of the
    formal Thor toolkit interface in a future release.  For version 4.51,
    it is implemented only in thorlookup and daytoolserver as the
    "THOR_MONOCACHE_DIR" option.  When set, the programs use the given
    directory to cache local copies of the monomer table.
    The new dt_info() property "monomtime" returns the date and time
    that a database's monomer table was last modified.

    These features combine to produce the following behavior: if a
    local directory is specified, the monomer table is downloaded into
    that directory only if no cached copy exists there yet or if the
    database's table has been modified since it was cached; otherwise the
    cached copy is used.

    This is most suitable for remote toolkit applications and for
    applications which access combinatorial databases over slow lines.
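
    The caching rule amounts to comparing the database's "monomtime" with
    the value recorded when the local copy was made.  The Python sketch
    below shows that logic only.  The cache file name and JSON layout are
    hypothetical, and the download function and monomtime value are
    passed in rather than fetched from Thor, so nothing here reflects the
    actual THOR_MONOCACHE_DIR file format.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def load_monomer_table(cache_dir: Path, db_name: str, db_monomtime: str,
                       download: Callable[[], List[str]]) -> List[str]:
    """Return the monomer table, downloading it only when the cache is stale."""
    cache_file = cache_dir / f"{db_name}.monomers.json"  # hypothetical name
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        # The timestamp is compared as an opaque value: equal means unchanged.
        if cached["monomtime"] == db_monomtime:
            return cached["table"]
    table = download()
    cache_file.write_text(json.dumps({"monomtime": db_monomtime, "table": table}))
    return table


if __name__ == "__main__":
    downloads = []

    def fetch() -> List[str]:
        downloads.append(1)
        return ["Ala", "Gly"]

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        load_monomer_table(cache, "peptides", "t1", fetch)
        load_monomer_table(cache, "peptides", "t1", fetch)  # served from cache
        load_monomer_table(cache, "peptides", "t2", fetch)  # table changed
    print(len(downloads))  # -> 2
```
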

Merlin Component-tuple fingerprint optimization

    Mixtures stored as dot-disconnected SMILES are searched faster by
    using the FPP component-ntuple fingerprint to avoid interpreting
    the entire SMILES if possible.

THORDBFIX451

    Added "thordbfix451", which Daylight and users can use to rebuild
    Thor databases for compatibility with the 4.51 release.

    This is a suite of programs that works with thorfilters to dump,
    modify and rebuild pre-4.51 Thor databases.

Thordbinfo

    New "thorfilter" program: thordbinfo(1).  Prints information about
    a database that isn't available via any other thorfilter program.
    This was motivated by the need for thordbfix451 (see above), but is a
    generally useful new program.

Admin and Theory Manuals Restructured

    The Administration and Theory Manuals were restructured so that
    the Theory Manual consists only of Daylight computational
    chemistry theory, and all Thor and Merlin administration topics are
    covered in the Administration Manual, retitled the "Daylight
    Installation and Administration Guide."  The Administration Guide
    includes a well-defined installation section and user guides for
    the administrator programs, including sthorman and the Thorfilters
    programs.  Among other benefits, Daylight administrators should now
    only need to refer to one manual!

========================== 3. BUGS FIXED ==========================

>   SMILES Toolkit bugs involving aromaticity detection have been
    fixed.  Previously, there existed "a few" non-stable dt_cansmi()
    SMILES which oscillated between two values, or were not
    interpretable by dt_smilin().

>   Fixed showclusters/listclusters bugs.  Cluster numbers no longer
    need to be ascending, and cluster sizes are no longer required as
    input.  The programs warn but proceed correctly in both of the above
    cases.

>   The number of bonds per atom in SMILES is no longer limited to 10.

>   Fixed a bug in Prado.  If one used the -print_smarts option, it
    would fail because the SMARTS toolkit wasn't licensed.  This was a
    problem in the order in which the licenses were checked (a toolkit
    function was called before the dy_lm_check_program() function).

>   Fixed a bug in Rubicon.  +RUBE_WRITE_BOUNDS works correctly (as
    documented) now.

>   Fixed a bug in the SMILES toolkit.  If the atomic number of an atom
    was set above the highest legal value (DX_ATN_MAX), the toolkit
    core-dumped when a canonical SMILES was generated.

>   Fixed a bug where dt_setlabel[12]ga() simply didn't work.  They
    always used DL_GA_TEXTLABEL.  Now they use the value set by the
    user.  The default is now DL_GA_DEFAULT, which is different from
    the previous value.  This will cause some colors to change for
    applications which don't set the labelgas.

>   Fixed a bug in tablet.  The correct version of the program wasn't
    being written out.

>   Fixed a bug in dt_stream(bond, TYP_CYCLE).  Returned an empty
    stream if the bond wasn't in any cycles.  Now returns NULL_OB.

>   Fixed part searching in SMARTS.  Also, dt_smarts_opt() now works
    properly.

>   New functions dt_canstream(), dt_origstream(), and dt_arbstream()
    replace dt_canatom_stream(), dt_canbond_stream(), and
    dt_origsmi_stream().

>   jpscan, jarpat - fixed -NNID option to work as in previous
    versions, takes the first NN<> dataitem if the option isn't

>   listclusters, showclusters - fixed memory deallocation bug in

>   Fixed bug in progob toolkit for remote toolkit.  Program objects in
    the remote toolkit didn't work because the program object toolkit
    got the interrupt from the license management of the toolserver.
    Fixed in v442p1.

>   Changed SUN5 compilation to use -K PIC.  This allows one to build
    shareable object libraries for Perl and TCL.  Looked at speed
    difference in Merlinserver.  Fixed in v442p1.

>   Changed SMILES toolkit to remove several limits.  First, removed a
    20K character limit on the length of a SMILES.  Also, removed the
    limit on the number of bonds allowed to a single atom (used to be

>   Lone hydrogens are removed during GRAPH normalization.

========================== 4. KNOWN BUGS =========================


======================= 5. RELEASE HISTORY =======================

Daylight releases are numbered using the following scheme:

   The "system number" (e.g. 3.xx, 4.xx) indicates completely different
   systems.  Each system is a complete new design and coding.

   Major releases (e.g. 4.1x, 4.2x, 4.3x) contain new features and
   enhancements.  Often, programs and databases from one major release
   aren't compatible with those from another.

   Minor releases, or "updates" (e.g. 4.32, 4.33) are for bug fixes and
   minor additional features.  They are also occasionally for adding
   new platforms (computers and/or operating-system versions).

The first two releases of system 4 were called "4.1" and "4.2"; under the
above-described scheme they would have been called "4.11" and "4.21".
There were two additional releases of the "demo" tape, which would have
been "4.22" and "4.23".  Release 4.24 (October '92) was the first to use
this new version-numbering scheme.
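
The numbering scheme is regular enough to parse mechanically.  The
Python sketch below is an illustration only: it splits a release number
into system, major and minor digits, treats the "b" and "pN" tags seen in
the history below as free-form suffixes (an assumption about their
meaning beyond beta and patch), and maps the early names 4.1 and 4.2 onto
4.11 and 4.21 as described above.

```python
import re
from typing import NamedTuple


class Release(NamedTuple):
    system: int  # e.g. 4 in "4.51"
    major: int   # e.g. 5 in "4.51"
    minor: int   # e.g. 1 in "4.51"
    suffix: str  # "" or a tag such as "b" (beta) or "p1" (patch)


_RELEASE = re.compile(r"(\d+)\.(\d)(\d)?(b|p\d+)?")


def parse_release(version: str) -> Release:
    m = _RELEASE.fullmatch(version)
    if m is None:
        raise ValueError(f"unrecognized release number: {version!r}")
    system, major, minor, suffix = m.groups()
    # "4.1" and "4.2" predate the scheme; they correspond to 4.11 and 4.21.
    return Release(int(system), int(major), int(minor or 1), suffix or "")


def same_major_release(a: str, b: str) -> bool:
    """True if a and b belong to the same major release (e.g. 4.41 and 4.42)."""
    return parse_release(a)[:2] == parse_release(b)[:2]
```
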

4.1    20 Dec 1991   First 4.x release: SunOS only

4.21   20 Mar 1992   Second release: bug fixes, added SGI platform

4.22   ??            Demo Tape update (applics and toolkits not affected).
                     Updated the demo database: added clustering data to
                     illustrate Daylight's clustering product.

4.23   22 Sep 1992   Demo Tape update (applics and toolkits not affected)
                     Same as 4.22, but added workaround for a bug in the
                     SGI X window system.

4.24   02 Oct 1992   Update: many bug fixes, some added features.

4.25   13 Nov 1992   SGI Toolkit Tape only; corrects incompatibility between
                     versions of SGI IRIX Operating system.  All other 4.24
                     SGI and Sun tapes are unaffected.

4.31   01 May 1993   Added Print Package, Merlin Toolkit.  Many bug fixes
                     and enhancements.  Added support for VAX/VMS (Toolkits,
                     servers, and some non-X-Windows programs).  Added Thor
                     and Merlin management utilities.

4.32   01 Jul 1993   Added Rubicon program and Rubicon Toolkit.  Added
                     support for HPUX on HP9000/7xx series, and for
                     Solaris on Sun machines.  Improved "man" pages and
                     help-widget text files.  A number of minor bug fixes.

4.33   28 Jan 1994   Improved merlinserver, thorserver, and Merlin and
                     Thor Toolkits.  Improved printing.  A number of minor
                     bug fixes.  Restructured & added to "contrib" programs.

4.34   25 Feb 1994   Revamped clustering programs.  Partial molecular
                     fingerprint generation.  Fixed bugs introduced in 4.33.

4.40   Nov 23 1994   Preliminary "beta-test" versions of 4.41.  
4.40b  Feb 12 1995   Preliminary "beta-test" versions of 4.41.  

4.41   Mar 17 1995   Databases of mixtures, "monomer" toolkit (CHUCKLES,
                     CHORTLES, & CHARTS), Program-Object Toolkit,
                     parallelized (multi-CPU) version of clustering, MCL.

4.42   Feb 02 1996   HTML Documentation, CGI application programs, record
                     locking, thordestroy(1), Thor/Merlin messages,
                     Thor/Merlin eviction, faster TDT merging, Merlin
                     parallel SMARTS searches, better Merlin performance
                     under heavy load, Merlin program objects.

4.42p1 Apr 02 1996   Merlinserver, Merlinsmartstalk, and 
                     Daytoolserver bugs fixed in this patch.

4.51   Mar ** 1997   Reaction Toolkit, reaction databases in Thor/Merlin,
                     formal object properties, read-only (CDROM) databases,
                     cross-referencing non-identifiers in THOR, Merlin
                     "similarity as sub/superstructure", and more.