Daylight Summer School 1998, July 28-30, St. John's College, Santa Fe, NM

Daylight Administration - Class Notes

These are intended to be brief notes supplementing and outlining the course material presented in the course Introduction to Daylight. The Daylight manuals should be considered the text for the course and the authoritative documentation, and should be used in conjunction with these notes for best results!

The Daylight Installation and Administration Guide is the relevant manual for this unit.

A. Installation

  1. FTP, WWW, CDROMs

    The Daylight 4.61 release is distributed on CDROM, via FTP at ftp.daylight.com, and via WWW at daylight.daylight.com, as well as at our mirror sites at Indiana University and Imperial College. All of these means are equivalent and yield a tar/gzip archive of either the full Daylight release or the "lite" release without demo databases. Current platforms for Daylight 4.61 are Sun SunOS 5.3-5.6 (Solaris 2.3-2.6) and SGI IRIX 5.3-6.5 (R4000+ cpu). The archive names reflect the version and platform. The current archive names are:

    day461-irix-lite.tar.gz
    day461-irix.tar.gz
    day461-sun-lite.tar.gz
    day461-sun.tar.gz
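
    The naming pattern above can be sketched in a few lines of Python. This is only an illustration inferred from the four names listed; the function name archive_name and its parameters are hypothetical and not part of the Daylight distribution.

```python
def archive_name(version: str, platform: str, lite: bool = False) -> str:
    """Build an archive file name following the pattern of the four names above."""
    digits = version.replace(".", "")      # "4.61" -> "461"
    suffix = "-lite" if lite else ""       # "lite" release omits the demo databases
    return f"day{digits}-{platform}{suffix}.tar.gz"


# Reproduce the four archive names listed for release 4.61.
for platform in ("irix", "sun"):
    for lite in (True, False):
        print(archive_name("4.61", platform, lite))
```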

  2. The Daylight distribution

    Once installed, the Daylight system consists of a set of directories and files with topmost directory v461 and a database directory called thordb. The files and directories should all be owned by the Daylight administrator, usually "thor". There should also be a directory for keeping configuration files across version upgrades. Possibilities are /usr/local/daylight or /daylight.

  3. Daylight environment variables

    The following should be defined for all Daylight users:

    $DY_ROOT - software distribution directory
    $DY_LICENSEDATA - license file

    And for the administrator:

    $DY_THORDB - database directory
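
    A quick way to confirm that the variables above are defined is a small check script. The following Python sketch is not a Daylight tool; the helper name missing_variables is hypothetical, and it only tests that each variable is set and non-empty, not that it points at a valid location.

```python
import os

# Variables named in these notes (without the leading "$").
REQUIRED_FOR_USERS = ("DY_ROOT", "DY_LICENSEDATA")
REQUIRED_FOR_ADMIN = REQUIRED_FOR_USERS + ("DY_THORDB",)


def missing_variables(env, admin=False):
    """Return the required variable names that are unset or empty in env."""
    required = REQUIRED_FOR_ADMIN if admin else REQUIRED_FOR_USERS
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, admin=False)
    print("missing:", ", ".join(missing) if missing else "none")
```

    Pass admin=True to include $DY_THORDB in the check for the administrator account.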

    Other environment variables may be defined as a means of setting Daylight options. The following, in particular, may be useful for administrators:

    $DY_DATABASE_PASSWORDS_FILE - server security file
    $DY_THOR_LOG_FILE - Thorserver log
    $DY_MERLIN_LOG_FILE - Merlinserver log
    $DY_MERLIN_SERVER_LIST - Hosts with Merlinservers
    $DY_MERLIN_MEMORY_LIMIT - Max process size

  4. /etc/services

    This file must be edited to list the Daylight TCP/IP services. Add the following lines:

    daytools 5554/tcp
    thor 5555/tcp
    merlin 5556/tcp
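
    Whether the three entries are present can be verified by scanning the text of the services file. The Python sketch below assumes the usual "name port/protocol" layout with optional "#" comments; the function name missing_services and the sample text are illustrative only.

```python
# Service entries listed in these notes.
EXPECTED = {"daytools": "5554/tcp", "thor": "5555/tcp", "merlin": "5556/tcp"}


def missing_services(services_text):
    """Return the expected service names that are absent or have a different port."""
    found = {}
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop trailing comments
        if len(fields) >= 2:
            found[fields[0]] = fields[1]
    return sorted(name for name, port in EXPECTED.items() if found.get(name) != port)


# Made-up sample: daytools has not been added yet.
sample = """\
ftp       21/tcp
thor      5555/tcp   # Daylight
merlin    5556/tcp
"""
print(missing_services(sample))
```

    On a real system the text would be read from /etc/services, for example with open("/etc/services").read().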

  5. httpd configuration

    Daylight web tools require a webserver to be running on the machine equipped with Daylight software, and configured with the following aliases:

    /dayhtml/ -> $DY_ROOT/dayhtml/
    /dayicon/ -> $DY_ROOT/dayhtml/icons/
    /daycgi/ -> $DY_ROOT/daycgi/ (script alias)

  6. Licensing

    A license supplied by Daylight must be installed at $DY_LICENSEDATA. The cpu must be identified as follows:

    (1) output of "testlicense -i" (a Daylight command)
    or (2) output of "uname -a" and "hostid" (Sun)
    or (3) output of "uname -a" and of either "lmhostid" or "printf "%x\n" `sysinfo -s`" (SGI)

B. Servers & Users

Security is control over access to data and other resources. The Daylight system is concerned with:

  1. Access to the Daylight servers (Thor, Merlin, Daytools).
  2. Access to data in databases.

The security of the system is dependent upon the underlying filesystem permissions. For this reason we recommend that all software and databases be owned by a single user "thor", and that the servers be run by thor. Other users should be able to run the client executables.

Server access is controlled by a system of users and passwords. This user list is completely separate from the list of unix users. However, some Daylight client programs (e.g., xvmerlin) may use the unix username as a "guess" to attempt server access. The list of Daylight users and passwords is contained in the file dy_passwords.dat, located in $DY_ROOT/etc by default or specified by environment variable DY_DATABASE_PASSWORDS_FILE. Although this file may be easily edited by hand, it is designed to be modified by inputs to the sthorman program.

  1. dy_passwords.dat

    This file specifies allowed hosts and users, and passwords. It is normally not edited but accessed only via the Thorserver. However, it is a simple text file.

    Example dy_passwords.dat file:

            host:*only*
            host:gator
            host:corona
            user:norah:1aA3h3azZaqw23DS
            user:jj:
            user:june:GjkO96REnK2G1Waw
            user:mug:AsDF12REIO9PlLYb
            user:thor:
            user:thorinfo:
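
    Because the file is plain text, its layout is easy to read programmatically. The Python sketch below parses only the two line types shown in the example ("host:" and "user:"); the function name parse_passwords is hypothetical, and the stored password strings are kept verbatim without interpretation.

```python
def parse_passwords(text):
    """Split dy_passwords.dat-style text into a host list and a user -> password map."""
    hosts, users = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        kind, _, rest = line.partition(":")
        if kind == "host":
            hosts.append(rest)
        elif kind == "user":
            name, _, password = rest.partition(":")
            users[name] = password        # empty string when the field is blank
    return hosts, users


example = """\
host:*only*
host:gator
host:corona
user:norah:1aA3h3azZaqw23DS
user:jj:
user:thor:
"""
hosts, users = parse_passwords(example)
print(hosts)
print(sorted(users))
```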
    

  2. restricted vs. allowed hosts

    Server security may be in one of these two modes. In equivalent hosts mode, if a host is listed, any user from that host may connect with no server password. Allowed users may connect from any host. In restricted hosts mode, only allowed users, from allowed hosts, may connect.

    In addition, a third security level is "no security", resulting from specifying no passwords file:

    thorserver -DATABASE_PASSWORDS_FILE ""
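
    The three security levels can be summarized as a small decision function. This Python sketch models only the host and user rules as worded above; it does not check passwords, the mode labels "none", "equivalent" and "restricted" are names chosen for this illustration, and the user "guest" is made up.

```python
def may_connect(mode, host, user, allowed_hosts, allowed_users):
    """Decide whether a connection passes the host/user rules for the given mode."""
    if mode == "none":            # no passwords file: no security
        return True
    if mode == "equivalent":      # listed host, or allowed user from any host
        return host in allowed_hosts or user in allowed_users
    if mode == "restricted":      # allowed user AND allowed host
        return host in allowed_hosts and user in allowed_users
    raise ValueError(f"unknown mode: {mode!r}")


hosts = {"gator", "corona"}
users = {"norah", "jj", "june", "mug", "thor", "thorinfo"}

print(may_connect("equivalent", "gator", "guest", hosts, users))   # listed host
print(may_connect("restricted", "gator", "guest", hosts, users))   # user not allowed
```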

  3. "thor" and "thorinfo"

    The software is pre-configured with these allowed users. In addition, thor is hard-coded to have special administrative privileges, and thorinfo is hard-coded to have read-only privileges.

  4. user limits

    The Thorserver, Merlinserver and DayToolserver each are licensed to allow a specific number of simultaneous client connections.

  5. thorserver vs. merlinserver

    The Thorserver and Merlinserver are two separate executable programs which work in tandem to provide access to databases. A client may be a Thor-client (xvthor), or a Merlin-client (mcl), or both (xvmerlin). They share the same passwords file ($DY_DATABASE_PASSWORDS_FILE), and database path ($DY_DATABASE_PATH).

  6. daytoolserver

    The Remote Toolkit consists of the daytoolserver and one or more Mac or Windows client programs. The daytoolserver may use the same passwords file as the database servers or a different one ($DY_TOOLSERVER_PASSWORDS_FILE). If program-objects are needed, an allowed directory for program-object executables must be specified (option TOOLSERVER_PROGOB_DIR).

  7. tasks

    The Daylight administrator must be able to specify allowed users and their passwords, allowed hosts, and other configuration parameters.

C. Customizing

  1. Options

    Daylight applications have a unified options manager whereby options and allowed values are defined, defaults specified, and non-default values can be specified in several ways according to defined precedence.

    1. User Profile - Each user may have a user profile which specifies option values. The default path for the file dy_profile.opt is $HOME/dy_profile.opt (but this can be modified by environment variable DY_PROFILE).

    2. System Profile - The system profile is a file shared by all users of a Daylight installation. It is located by default at $DY_ROOT/etc/unix/dy_sysprofile.opt (but this can be modified by environment variable DY_SYSPROFILE). The user profile supersedes the system profile.

    3. Environment variables - As an alternative to the user and system profiles, all Daylight options may be set by defining their associated environment variables, each of which is the option name prefixed by "DY_".

    4. Command line options - Options may be set on the command line when the application is launched. These specifications will override all others.

    5. Option definition files - Option names and allowed values are defined in .dat files in the $DY_ROOT/etc/ directory. These files are not to be modified by the user. However, they may be useful as option definition references.
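
    The precedence rules can be pictured as a lookup that consults each source in turn. The Python sketch below is a model of the idea, not the Daylight options manager. The notes state that the user profile supersedes the system profile and that the command line overrides everything; placing environment variables between the profiles and the command line is an assumption made for this illustration, and the log file paths are made up.

```python
def env_name(option):
    """Environment variable associated with an option: the name prefixed by DY_."""
    return "DY_" + option


def resolve(option, default, sysprofile, profile, environ, cmdline):
    """Return the effective value of an option, lowest precedence first."""
    value = default
    for source in (sysprofile, profile):      # user profile supersedes system profile
        if option in source:
            value = source[option]
    if env_name(option) in environ:           # ASSUMED position in the ordering
        value = environ[env_name(option)]
    if option in cmdline:                     # command line overrides all others
        value = cmdline[option]
    return value


print(resolve(
    "THOR_LOG_FILE",
    default="thor.log",
    sysprofile={"THOR_LOG_FILE": "/var/log/thor.log"},
    profile={},
    environ={"DY_THOR_LOG_FILE": "/tmp/thor.log"},
    cmdline={},
))
```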

    Daylight options are defined in directories $DY_ROOT/etc/unix and $DY_ROOT/etc/common. The system as shipped has options specified by .dat files in these directories, all of which are listed in $DY_ROOT/etc/unix/dy_options.dat. Applications look for $DY_ROOT/etc/unix/dy_options.dat, unless variable DY_OPTIONS is set otherwise.

    dy_options.dat specifies, among others, dy_basic_opts.dat, which specifies options SYSPROFILE and PROFILE. SYSPROFILE is set to $DY_ROOT/etc/unix/dy_sysprofile.opt, which exists, and PROFILE is set to $HOME/dy_profile.opt, which may not exist in $HOME, though a sample is provided in $DY_ROOT/etc/common.

    Daylight administrators should modify dy_sysprofile.opt to make changes applicable to everyone at a site, and users should create and modify their own $HOME/dy_profile.opt to customize options for themselves alone. The environment variable DY_PROFILE may be redefined to, say, "$HOME/.dy_profile.opt" if desired.

    It should be noted that the environment variables DY_ROOT and DY_THORDB are not Daylight options. DY_ROOT must be set for all users, and DY_THORDB is normally set for the Daylight administrator. DY_LICENSEDATA should also normally be set for all users.

  2. Useful shortcuts

D. Databases

  1. installing databases

    Database installation means copying database files from CDROM or via FTP to a local disk for access by the database servers. Sufficient disk space must be available, and for Merlin access, sufficient RAM.

  2. configuring databases:

    Daylight databases are configurable in several ways. Configuration options are stored as fields in the header file for the database (the .THOR file). The header file can be edited directly, as it is a simple text file. However, the approved and more reliable method is to use a Thor-manager client (sthorman, thorchange, etc.).

    Auxiliary Databases - All databases have at least one auxiliary database, a datatypes database. Other possible auxiliary databases are indirect and monomer. The auxiliary databases can be set at database creation, and can be modified by sthorman or thorchange.

    Database Passwords - Databases each have three passwords:

    In practice, read/write access is sufficient to severely or completely corrupt a database. Thus, the executive password may be most useful for preventing unauthorized, accidental, inconvenient or catastrophic "boo-boos", while the write password is responsible for securing the data from corruption.

    Read-only - Databases can be set to read-only.

    Caching - Databases can be configured for caching in several ways to improve their performance (speed). Caching uses RAM instead of disk: it forces the Thorserver to hold some or all of a database in memory for fast access, supplementing the operating system's normal file caching. Daylight caching may be specified by the configuration of a database, or initiated by client request if allowed by the configuration.

    Caching configuration specifications are normally made by thormake, thorchange, or sthorman.

    The option CACHE_LEVEL is ignored unless CACHE_WHEN is ALWAYS.

     -CACHE_WHEN NEVER
          Disable caching; all data remain in the disk files.
          Ignore caching requests from Thor clients.
    
     -CACHE_WHEN OK
          Cache if, when, and as specified by a Thor client.
          (Synonymous with "ON_REQUEST".)
    
     -CACHE_WHEN ALWAYS
    
          -CACHE_LEVEL WRITETHRU
               Read hashtable from RAM, write to disk (and RAM).
          -CACHE_LEVEL READWRITE
               Read and write hashtable in RAM.  Disk synced when necessary.
          -CACHE_LEVEL WRITETHRU_ALL
               Read entire database from RAM, write to disk (and RAM).
          -CACHE_LEVEL READWRITE_ALL
               Read and write entire database in RAM.  Disk synced when necessary.
    

    By default, both primary data and cross-references are cached, but either one can be cached alone. Note that primary data and xref data each have their own hash table and datafile.

          -CACHE_WHAT XREFS
          -CACHE_WHAT DATA
    

    Possible values in the .THOR header file:

            cache level: 
            cache level: NEVER
            cache level: ON_REQUEST
            cache level: ALWAYS WRITETHRU
            cache level: ALWAYS WRITETHRU XREFS_ONLY
            cache level: ALWAYS WRITETHRU DATA_ONLY
            cache level: ALWAYS READWRITE
            cache level: ALWAYS READWRITE XREFS_ONLY
            cache level: ALWAYS READWRITE DATA_ONLY
            cache level: ALWAYS WRITETHRU_ALL
            cache level: ALWAYS WRITETHRU_ALL XREFS_ONLY
            cache level: ALWAYS WRITETHRU_ALL DATA_ONLY
            cache level: ALWAYS READWRITE_ALL
            cache level: ALWAYS READWRITE_ALL XREFS_ONLY
            cache level: ALWAYS READWRITE_ALL DATA_ONLY
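
    The non-empty header values listed above follow a regular pattern that can be generated from the three options. The Python sketch below assumes, from the lists in these notes, that CACHE_WHEN OK is written as ON_REQUEST and that CACHE_WHAT XREFS and DATA appear as XREFS_ONLY and DATA_ONLY; the function names are hypothetical.

```python
LEVELS = ("WRITETHRU", "READWRITE", "WRITETHRU_ALL", "READWRITE_ALL")
WHAT_SUFFIX = {None: "", "XREFS": " XREFS_ONLY", "DATA": " DATA_ONLY"}


def header_value(when, level=None, what=None):
    """Compose the value that follows 'cache level:' in the header file."""
    if when == "NEVER":
        return "NEVER"
    if when in ("OK", "ON_REQUEST"):          # the two are synonymous
        return "ON_REQUEST"
    if when == "ALWAYS":                      # CACHE_LEVEL only matters here
        if level not in LEVELS or what not in WHAT_SUFFIX:
            raise ValueError("unknown CACHE_LEVEL or CACHE_WHAT")
        return "ALWAYS " + level + WHAT_SUFFIX[what]
    raise ValueError(f"unknown CACHE_WHEN: {when!r}")


def all_header_values():
    """Enumerate every non-empty value, in the order used in the list above."""
    values = ["NEVER", "ON_REQUEST"]
    for level in LEVELS:
        for what in (None, "XREFS", "DATA"):
            values.append(header_value("ALWAYS", level, what))
    return values


for value in all_header_values():
    print("cache level: " + value)
```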
    

    Another choice to be made is whether to cache the "regular" database, or perhaps the indirect database. For databases where there are a large number of indirect references per TDT, caching the indirect database can provide the greatest performance gains.

    Database holding -

    "Holding" a database keeps it open when no clients are using it. This improves performance of frequently-opened and cached databases. Holding is independent of the caching level.

    Record locking -

    With record-locking enabled, clients can lock individual records for exclusive access, "commit" and "rollback" changes, and unlock records.

    Header (.THOR) file entry:

            tdt locking: TRUE         (Either present or not.)
    

    Read-only databases -

    Databases can be configured writable (the default) or read-only. Read-only databases do not create lockfiles, so they are usable from CDROM. It is also possible to open one such database with two separate Thorservers. Header file entry:

            read only: TRUE           (Either present or not.)
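
    Header entries of the "name: value" form shown in these notes can be inspected with a few lines of code. The Python sketch below assumes every field of interest sits on one line as "name: value"; a real .THOR header may contain more than this sample, and the helper names are hypothetical. Editing the header is still best left to sthorman or thorchange.

```python
def parse_header_fields(text):
    """Collect 'name: value' lines into a dict keyed by lower-cased name."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def flag(fields, name):
    """True when the named entry is present with the value TRUE."""
    return fields.get(name, "").upper() == "TRUE"


# Made-up header fragment using only entries that appear in these notes.
sample = """\
cache level: ALWAYS WRITETHRU
tdt locking: TRUE
"""
fields = parse_header_fields(sample)
print("record locking:", flag(fields, "tdt locking"))
print("read only:", flag(fields, "read only"))
```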
    

  3. pools and RAM

    Sufficient physical memory (RAM) is essential to the operation of Merlin. Unlike some applications which can perform acceptably with virtual memory or swapping, the Merlinserver is optimized to achieve high speed searching in RAM, and will slow prohibitively when swapping occurs. Therefore, the administrator should configure the Daylight system to avoid overuse of memory, taking into account:

    It is possible to estimate memory usage a priori. However, a simpler way is to obtain pool-size information from the Merlinserver log file. Pool size data for commercial databases is available from Daylight.
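
    Once pool sizes are known, checking them against physical memory is simple arithmetic. The Python sketch below uses made-up database names and sizes; the reserve for the operating system and other processes is an assumed parameter, not a Daylight recommendation.

```python
def fits_in_ram(pool_sizes_mb, physical_ram_mb, reserve_mb=0):
    """Return (fits, needed_mb) for the given pools plus a reserve."""
    needed = sum(pool_sizes_mb.values()) + reserve_mb
    return needed <= physical_ram_mb, needed


# Hypothetical pool sizes in megabytes, e.g. as read from the Merlinserver log.
pools = {"database_a": 120, "database_b": 310, "database_c": 45}

ok, needed = fits_in_ram(pools, physical_ram_mb=512, reserve_mb=64)
print(f"need {needed} MB of 512 MB:", "ok" if ok else "expect swapping")
```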

  4. Database Design

    Given a set of data, there may be several ways to design the datatypes and database for convenience, compactness, and searchability.

    Indirect databases - frequently occurring data may best be represented indirectly. By using an indirect auxiliary database, a common datum may be stored only once, and compact indirect references stored in its place.
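
    The space saving from indirection can be illustrated with a generic lookup table. The Python sketch below is a conceptual model only and does not reflect how Thor stores indirect data; the class name, record names and data values are all made up.

```python
class IndirectStore:
    """Store each distinct datum once and hand out small integer references."""

    def __init__(self):
        self._refs = {}    # datum -> reference number
        self._data = []    # reference number -> datum

    def ref(self, datum):
        """Return the reference for datum, storing it on first use."""
        if datum not in self._refs:
            self._refs[datum] = len(self._data)
            self._data.append(datum)
        return self._refs[datum]

    def get(self, reference):
        """Resolve a reference back to the stored datum."""
        return self._data[reference]

    def __len__(self):
        return len(self._data)


store = IndirectStore()
supplier = "Example Chemical Supply Company, Anytown"
records = {name: store.ref(supplier) for name in ("rec1", "rec2", "rec3")}

print(records)          # three records share one reference
print(len(store))       # the long string is stored once
```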

    Monomer databases - are auxiliary databases which define a table of monomers, or molecular building blocks, which may be referenced by monomer symbols in a Chortles. Combinatorial mixtures can thus be specified by a single Chortles which denotes a mixture of thousands of individual compounds. Mixture databases with auxiliary monomer databases normally are still rooted by SMILES, where the SMILES represents the mixture by replacing variable positions with the wildcard ("*").

    Reaction databases - reactions are stored as SMILES, as are single molecules. Reaction databases are not auxiliary databases. It is also possible to store reactions and molecules together in the same database.

    Datatypes - Given the database type, database design consists mostly of datatype design. Use the examples in $DY_ROOT/data/datatypes/ for common datatypes and as templates for new datatypes.

  5. Creating Datatypes

    For each new datatype invented, decide whether it will be an identifier. Identifiers require additional disk space, but offer Thor-lookup capability, and logically identify a distinct chemical entity (isomer, sample, vial, registration number) to which data can be associated. Each identifier tag begins with "$".

  6. Creating Databases

    Databases are created using sthorman or thormake. Creating a database means creating the empty files which are in the format required by the Daylight database servers. These files must then be loaded with data to be useful. When creating a database, the following must be specified: (1) a datatypes database, (2) primary hash table size, (3) cross-reference hash table size. Also, the path of the database directory must be known. This is normally referenced by $DY_THORDB.

  7. Converting to SMILES from other formats

    Several conversion programs are provided with the SMILES Toolkit for converting structure files in other formats to Daylight SMILES-rooted TDTs. These programs are provided as contributed source code, and found in $DY_ROOT/contrib/src/applics/convert.

  8. Automating DB Administration

    Starting and stopping servers - Most administrators find it convenient to start and stop the Daylight servers using scripts designed to do this safely and efficiently. The start script can specify the full environment needed, which databases are to be loaded, the log files, etc. The stop script can evict users and make sure databases are closed before killing the server processes. These scripts can be invoked by the operating system automatically at boot and shutdown for added convenience.

    Other DB Admin Tasks - Many other database administration tasks are well suited to the use of scripts. Building databases, saving them to files, extracting subsets, performing periodic searches, obtaining database statistics, server access statistics -- all these and more may be automated using scripts and the Daylight suite of Thorfilters programs.

  9. Non-structures in Daylight Databases

    Many features are available only to SMILES-rooted databases, making it highly desirable to use SMILES as the root of all TDTs. However, this may be impossible for substances of unknown or non-determinate structure (e.g., beeswax, Fresca, eye of newt). Thor permits TDTs rooted in non-SMILES identifiers, but does not allow sub-trees with additional identifiers in these TDTs. Only data belonging to the ID is allowed.

  10. Non-identifier crossreferences

    A new category of datatype is introduced in 4.61, the non-identifier crossreference. Denoted by a preceding slash "/", these datatypes may occur anywhere and are regarded syntactically as data. However, Thor will automatically create a cross-reference corresponding to this datum providing access to the root TDT, SMILES-rooted or non-SMILES-rooted.
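
    The tag prefixes described in these notes ("$" for identifiers, "/" for non-identifier crossreferences) make the category of a datatype recognizable from its tag alone. The Python sketch below illustrates that rule; the function name and the sample tags are hypothetical.

```python
def classify_tag(tag):
    """Classify a datatype tag by its leading character."""
    if tag.startswith("$"):
        return "identifier"
    if tag.startswith("/"):
        return "non-identifier crossreference"
    return "data"


# Hypothetical tags, chosen only to exercise the three cases.
for tag in ("$ID", "/REF", "NOTE"):
    print(tag, "->", classify_tag(tag))
```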

  11. Survey of Commercial Databases


Daylight Chemical Information Systems Inc.
support@daylight.com