These are intended to be brief notes supplementing and outlining the course material presented in the course Introduction to Daylight. The Daylight manuals should be considered the text for the course and the authoritative documentation, and should be used in conjunction with these notes for best results!
The Daylight Installation and Administration Guide is the relevant manual for this unit.
The Daylight 4.62 release is distributed on CDROM, via FTP at ftp.daylight.com, via the WWW at daylight.daylight.com, and from our mirror sites at Indiana University and Imperial College. All of these means are equivalent: each provides a tar/gzip archive of either the full Daylight release or the "lite" release without demo databases. Current platforms for Daylight 4.62 are Sun SunOS 5.3-5.6 (Solaris 2.3-2.6) and SGI IRIX 5.3-6.5 (R4000+ CPU). The archive names reflect the version and platform; the current archive names are:
day462-sgi-lite.tar.gz
day462-sgi.tar.gz
day462-sun5-lite.tar.gz
day462-sun5.tar.gz
Note that the lite versions lack the demo databases and the "exotic" directory, which contains third-party software such as XView, the JRE (Java Runtime Environment, needed for JavaGRINS), and CEX. However, the entire 4.62 release, including all applications and contrib, is included in the lite versions.
Once installed, the Daylight system consists of a set of directories and files with the topmost directory v462, and a database directory called thordb. The files and directories should all be owned by the Daylight administrator, usually "thor". There should also be a directory for keeping configuration files across version upgrades. Possibilities are /usr/local/daylight or /daylight.
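Ownership drifts easily when files are copied or edited as root. The Python sketch below lists every path under an installation directory whose owner is not the Daylight administrator. The function name is hypothetical, the default owner "thor" is only the usual convention mentioned above, and symlinked directories are neither followed nor checked.

```python
import os
import pwd


def files_not_owned_by(root, owner="thor"):
    """Return the paths under `root` (including `root` itself) not owned by `owner`.

    Files are checked with lstat, so a symlink's own owner is examined.
    Symlinked directories are neither followed nor checked by os.walk here.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory appears exactly once as `dirpath`, with its files.
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            uid = os.lstat(path).st_uid
            try:
                name = pwd.getpwuid(uid).pw_name
            except KeyError:  # uid with no passwd entry
                name = None
            if name != owner:
                offenders.append(path)
    return offenders
```

For example, `files_not_owned_by("/usr/local/daylight/v462")` (a hypothetical install path) should return an empty list on a correctly set-up system.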
The following should be defined for all Daylight users:
$DY_ROOT
- software distribution directory
$DY_LICENSEDATA
- license file
And for the administrator:
$DY_THORDB
- database directory
Other environment variables may be defined as a means of setting Daylight options. The following, in particular, may be useful for administrators:
$DY_DATABASE_PASSWORDS_FILE
- server security file
$DY_THOR_LOG_FILE
- Thorserver log
$DY_MERLIN_LOG_FILE
- Merlinserver log
$DY_MERLIN_SERVER_LIST
- Hosts with Merlinservers
$DY_MERLIN_MEMORY_LIMIT
- Max process size
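A login script or server start script can check the required variables before anything else runs. This is a minimal Python sketch; the function name is hypothetical, and it only tests that each variable is set and non-empty, not that the paths exist.

```python
import os

# Needed by every Daylight user (per the list above).
USER_VARS = ("DY_ROOT", "DY_LICENSEDATA")
# Additionally needed by the Daylight administrator.
ADMIN_VARS = ("DY_THORDB",)


def missing_daylight_vars(environ=None, admin=False):
    """Return the names of required Daylight variables that are unset or empty."""
    env = os.environ if environ is None else environ
    required = USER_VARS + (ADMIN_VARS if admin else ())
    return [name for name in required if not env.get(name)]
```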
As of version 4.61, executables are dynamically linked and require shared objects (the Daylight libraries) at runtime. These are normally found by setting the environment variable LD_LIBRARY_PATH to include $DY_ROOT/lib. On SGI systems that support multiple binary formats (32-bit and 64-bit), the variables LD_LIBRARYN32_PATH and/or LD_LIBRARY64_PATH may also need to be set appropriately.
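The following Python sketch shows one way to add $DY_ROOT/lib to a library search path variable without duplicating it. Prepending (rather than appending) is a design choice of this sketch so that Daylight's libraries win over any stale copies; the function name is hypothetical. Passing `var="LD_LIBRARYN32_PATH"` or `var="LD_LIBRARY64_PATH"` covers the SGI case.

```python
import os


def with_daylight_libs(environ, var="LD_LIBRARY_PATH"):
    """Return a copy of `environ` with $DY_ROOT/lib at the front of `var`."""
    env = dict(environ)  # never modify the caller's mapping
    lib = os.path.join(env["DY_ROOT"], "lib")
    parts = [p for p in env.get(var, "").split(os.pathsep) if p]
    if lib not in parts:
        parts.insert(0, lib)
    env[var] = os.pathsep.join(parts)
    return env
```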
The services file (normally /etc/services) must be edited to list the Daylight TCP/IP services. Add the following lines:
daytools 5554/tcp
thor 5555/tcp
merlin 5556/tcp
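To confirm that the three entries are present, the services file can be parsed directly. This Python sketch assumes the standard services-file layout (service name, port/protocol, optional aliases, `#` comments); the function names are hypothetical.

```python
# Expected Daylight entries: name -> (port, protocol).
DAYLIGHT_SERVICES = {
    "daytools": (5554, "tcp"),
    "thor": (5555, "tcp"),
    "merlin": (5556, "tcp"),
}


def parse_services(text):
    """Map (name, protocol) -> port for every name and alias in a services file."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if not port.isdigit():
            continue
        for name in [fields[0]] + fields[2:]:  # primary name plus aliases
            entries[(name, proto)] = int(port)
    return entries


def missing_services(text):
    """Return the Daylight service names that are absent or on the wrong port."""
    have = parse_services(text)
    return sorted(
        name
        for name, (port, proto) in DAYLIGHT_SERVICES.items()
        if have.get((name, proto)) != port
    )
```

Typical use would be `missing_services(open("/etc/services").read())`.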
Daylight web tools require a webserver to be running on the machine equipped with Daylight software, and configured with the following aliases:
/dayhtml/ -> $DY_ROOT/dayhtml/
/dayicon/ -> $DY_ROOT/dayhtml/icons/
/daycgi/ -> $DY_ROOT/daycgi/ (script alias)
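The alias table amounts to a prefix mapping from URL paths to directories under $DY_ROOT, with /daycgi/ served as executable scripts. The Python sketch below models that mapping only to make the behavior concrete; it is not webserver configuration, and the function name, the `..` rejection, and the example file names are assumptions of this sketch.

```python
import posixpath

# (URL prefix, directory relative to $DY_ROOT, served as CGI scripts?)
DAYLIGHT_ALIASES = [
    ("/dayhtml/", "dayhtml", False),
    ("/dayicon/", "dayhtml/icons", False),
    ("/daycgi/", "daycgi", True),
]


def resolve_daylight_url(url_path, dy_root):
    """Map a request path to (filesystem path, is_cgi), or None if no alias applies."""
    for prefix, subdir, is_cgi in DAYLIGHT_ALIASES:
        if url_path.startswith(prefix):
            # Strip extra leading slashes so join() cannot jump to the filesystem root.
            rest = url_path[len(prefix):].lstrip("/")
            # Refuse paths that try to climb out of the aliased directory.
            if ".." in rest.split("/"):
                return None
            return posixpath.join(dy_root, subdir, rest), is_cgi
    return None
```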
A license supplied by Daylight must be installed at $DY_LICENSEDATA. The CPU must be identified as follows:
(1) Hostname (output of "hostname").
AND
(2) output of "$DY_ROOT/bin/testlicense -i"
OR (2b) output of "uname -a" and "hostid" (Sun)
OR (2c) output of "uname -a" AND "lmhostid" (SGI)
OR (2d) output of "uname -a" AND "printf "%x\n" `sysinfo -s`" (SGI)
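Collecting this information can be scripted. The Python sketch below gathers the host name and whichever of the listed commands exist on the machine (`$DY_ROOT/bin/testlicense -i`, `uname -a`, `hostid`, `lmhostid`), silently skipping the rest. The function names are hypothetical, and the SGI `sysinfo -s` variant is left out for brevity.

```python
import os
import shutil
import socket
import subprocess


def _run(argv):
    """Run a command and return its stripped stdout, or None if it is not available."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout.strip()


def cpu_identification():
    """Return a dict of identification strings to send with a license request."""
    info = {"hostname": socket.gethostname()}  # same value as `hostname`
    dy_root = os.environ.get("DY_ROOT")
    if dy_root:
        testlicense = os.path.join(dy_root, "bin", "testlicense")
        if os.access(testlicense, os.X_OK):
            info["testlicense -i"] = _run([testlicense, "-i"])
    for argv in (["uname", "-a"], ["hostid"], ["lmhostid"]):
        out = _run(argv)
        if out is not None:
            info[" ".join(argv)] = out
    return info
```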
Changes proposed in 4.71
In 4.71, "daliserver" is introduced. Daliserver is a new Daylight server providing license verification over the network. In this way, license administration can be centralized. In 4.71, the "old system" of license files on each licensed cpu will also be supported, to facilitate a smooth migration. Daliserver provides many other advantages and potential advantages. See dali-design.txt for more information.
Security is control over access to data and other resources. The Daylight system is concerned with:
Server access is controlled by a system of users and passwords. This user list is completely separate from the list of Unix users. However, some Daylight client programs (e.g., xvmerlin) may use the Unix username as a "guess" when attempting server access. The list of Daylight users and passwords is contained in the file dy_passwords.dat, located in $DY_ROOT/etc by default or specified by the environment variable DY_DATABASE_PASSWORDS_FILE. Although this file may easily be edited by hand, it is designed to be modified through the sthorman program.
This file specifies allowed hosts, users, and passwords. It is normally not edited directly, but accessed only via the Thorserver. However, it is a simple text file.
Example dy_passwords.dat file:
host:*only*
host:gator
host:corona
user:norah:1aA3h3azZaqw23DS
user:jj:
user:june:GjkO96REnK2G1Waw
user:mug:AsDF12REIO9PlLYb
user:thor:
user:thorinfo:
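The entries follow a simple `host:NAME` / `user:NAME:PASSWORD` pattern, so the file is easy to inspect with a script. This Python sketch reads such a file into hosts, users, and a mode flag. Two points are assumptions inferred from the example rather than documented here: that the `host:*only*` entry selects restricted hosts mode, and that an empty password field (as for jj) simply means no password is stored. Password strings are kept exactly as they appear in the file.

```python
def parse_passwords(text):
    """Parse dy_passwords.dat-style text into hosts, users, and a mode flag.

    Entries may be separated by newlines or other whitespace.
    ASSUMPTION: a "host:*only*" entry selects restricted hosts mode.
    """
    restricted = False
    hosts = set()
    users = {}
    for entry in text.split():
        kind, _, rest = entry.partition(":")
        if kind == "host":
            if rest == "*only*":
                restricted = True
            else:
                hosts.add(rest)
        elif kind == "user":
            name, _, password = rest.partition(":")
            users[name] = password  # may be "" (no password stored)
    return {"restricted": restricted, "hosts": hosts, "users": users}
```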
Server security may be in one of these two modes. In equivalent hosts mode, if a host is listed, any user from that host may connect with no server password. Allowed users may connect from any host. In restricted hosts mode, only allowed users, from allowed hosts, may connect.
In addition, a third security level is "no security", resulting from specifying no passwords file:
thorserver -DATABASE_PASSWORDS_FILE ""
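The three security levels can be summarized as a small decision function. This Python sketch returns whether a connection is allowed and whether the user's server password would be checked. The function name and the config layout (`None` for no passwords file, otherwise a dict with `restricted`, `hosts`, `users`) are hypothetical, and it assumes that users in restricted hosts mode still give their passwords.

```python
def connection_policy(config, host, user):
    """Return (allowed, password_checked) for a connection attempt.

    `config` is None for "no security" (no passwords file), or a dict with
    keys "restricted" (bool), "hosts" (set of names), "users" (names).
    """
    if config is None:
        return True, False
    host_ok = host in config["hosts"]
    user_ok = user in config["users"]
    if config["restricted"]:
        # Restricted hosts: only allowed users, from allowed hosts.
        allowed = host_ok and user_ok
        return allowed, allowed
    if host_ok:
        # Equivalent hosts: any user from a listed host, no server password.
        return True, False
    # Allowed users may connect from any host, subject to their password.
    return user_ok, user_ok
```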
The software is pre-configured with these allowed users. In addition, thor is hard-coded to have special administrative privileges, and thorinfo is hard-coded to have read-only privileges.
The Thorserver, Merlinserver and DayToolserver each are licensed to allow a specific number of simultaneous client connections.
The Thorserver and Merlinserver are two separate executable programs which work in tandem to provide access to databases. A client may be a Thor-client (xvthor), a Merlin-client (mcl), or both (xvmerlin).
They share the same passwords file ($DY_DATABASE_PASSWORDS_FILE) and database path ($DY_DATABASE_PATH).
The Remote Toolkit consists of the daytoolserver and one or more Mac or Windows client programs. The daytoolserver may use the same passwords file as the database servers or a different one ($DY_TOOLSERVER_PASSWORDS_FILE). If program-objects are needed, an allowed directory for program-object executables must be specified (option TOOLSERVER_PROGOB_DIR).
The Dayutilserver provides computing and licensing services for JavaGRINS clients. Other clients may be added in future.
The Daylight administrator must be able to specify allowed users and their passwords, allowed hosts, and other configuration parameters.
Daylight applications have a unified options manager whereby options and allowed values are defined, defaults specified, and non-default values can be specified in several ways according to defined precedence.
The user profile, dy_profile.opt, is $HOME/dy_profile.opt (but this can be modified by environment variable DY_PROFILE).
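The system profile is $DY_ROOT/etc/unix/dy_sysprofile.opt (but this can be modified by environment variable DY_SYSPROFILE). The user profile supersedes the system profile.
An option may also be set by an environment variable whose name is the option name prefixed with "DY_".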
Option definition files are found in the $DY_ROOT/etc/ directory. These files are not to be modified by the user. However, they may be useful as option definition references.
Daylight options are defined in directories $DY_ROOT/etc/unix and $DY_ROOT/etc/common. The system as shipped has options specified by .dat files in these directories, all of which are listed in $DY_ROOT/etc/unix/dy_options.dat.
Applications look for $DY_ROOT/etc/unix/dy_options.dat, unless the variable DY_OPTIONS is set otherwise.
dy_options.dat specifies, among others, dy_basic_opts.dat, which specifies the options SYSPROFILE and PROFILE.
SYSPROFILE is set to $DY_ROOT/etc/unix/dy_sysprofile.opt, which exists, and PROFILE is set to $HOME/dy_profile.opt, which may not exist in $HOME, though a sample is provided in $DY_ROOT/etc/common.
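The file-location rules above can be written as a lookup that uses the shipped defaults unless the corresponding variable overrides them. This is a Python sketch; the function name and the returned dictionary keys are hypothetical.

```python
import os


def daylight_config_paths(environ):
    """Locate the options file and the system and user profiles.

    Each default is replaced by the value of DY_OPTIONS, DY_SYSPROFILE,
    or DY_PROFILE when that variable is set.
    """
    root = environ["DY_ROOT"]
    home = environ.get("HOME", "")
    return {
        "options": environ.get("DY_OPTIONS")
        or os.path.join(root, "etc", "unix", "dy_options.dat"),
        "sysprofile": environ.get("DY_SYSPROFILE")
        or os.path.join(root, "etc", "unix", "dy_sysprofile.opt"),
        "profile": environ.get("DY_PROFILE")
        or os.path.join(home, "dy_profile.opt"),
    }
```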
Daylight administrators should modify dy_sysprofile.opt to make changes applicable to everyone at a site, and users should create and modify their own $HOME/dy_profile.opt to customize options for themselves alone. The environment variable DY_PROFILE may be redefined to, say, "$HOME/.dy_profile.opt" if desired.
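The options manager resolves each option by consulting several sources in order. This Python sketch illustrates that layering with plain dictionaries (parsing .opt files is not shown). Only "user profile over system profile" is stated in these notes; the complete order used here (default, system profile, user profile, DY_ environment variable, command line) is an assumption to verify against the manuals.

```python
def resolve_option(name, defaults, sysprofile, profile, environ, cmdline):
    """Return the effective value of option `name`, or None if it is never set.

    ASSUMED precedence, lowest to highest: built-in default, system profile,
    user profile, environment variable DY_<name>, command-line -<name>.
    """
    value = defaults.get(name)
    for source in (sysprofile, profile):  # user profile overrides system profile
        if name in source:
            value = source[name]
    if "DY_" + name in environ:
        value = environ["DY_" + name]
    if name in cmdline:  # an explicit empty string still counts as set
        value = cmdline[name]
    return value
```

For instance, `thorserver -DATABASE_PASSWORDS_FILE ""` would correspond to `cmdline = {"DATABASE_PASSWORDS_FILE": ""}`, which overrides every other source.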
It should be noted that the environment variables DY_ROOT and DY_THORDB are not Daylight options. DY_ROOT must be set for all users, and DY_THORDB is normally set for the Daylight administrator. DY_LICENSEDATA should also normally be set for all users.
Database installation means copying database files from CDROM or via FTP to a local disk for access by the database servers. Sufficient disk space must be available, and for Merlin access, sufficient RAM.
Daylight databases are configurable in several ways. Configuration options are stored as fields in the header file for the database (the .THOR file). The header file can be edited directly, as it is a simple text file. However, the approved and more reliable method is to use a Thor-manager client (sthorman, thorchange, etc.).
All databases have at least one auxiliary database, a datatypes database. Other possible auxiliary databases are indirect and monomer databases. The auxiliary databases can be set at database creation, and can be modified by sthorman or thorchange.
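Databases each have three passwords: read, write, and executive. The executive password guards administrative operations such as those performed by thorchange and thordestroy.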
In practice, read/write access is sufficient to severely or completely corrupt a database. Thus, the executive password may be most useful for preventing unauthorized, accidental, inconvenient, or catastrophic "boo-boos", while the write password is what secures the data against corruption.
Databases can be configured for caching in several ways to improve their performance (speed). Caching forces the Thorserver to hold some or all of a database in memory for fast access, supplementing the operating system's normal file caching; in effect, it spends RAM to avoid disk access. Daylight caching may be specified by the configuration of a database, or initiated by client request if the configuration allows it.
Caching configuration specifications are normally made by thormake, thorchange, or sthorman.
The option CACHE_LEVEL is ignored unless CACHE_WHEN is ALWAYS.
-CACHE_WHEN NEVER: Disable caching; all data remain in the disk files. Ignore caching requests from Thor clients.
-CACHE_WHEN OK: Cache if, when, and as specified by a Thor client. (Synonymous with "ON_REQUEST".)
-CACHE_WHEN ALWAYS: Cache according to CACHE_LEVEL, one of:
-CACHE_LEVEL WRITETHRU: Read hashtable from RAM, write to disk (and RAM).
-CACHE_LEVEL READWRITE: Read and write hashtable in RAM. Disk synced when necessary.
-CACHE_LEVEL WRITETHRU_ALL: Read entire database from RAM, write to disk (and RAM).
-CACHE_LEVEL READWRITE_ALL: Read and write entire database in RAM. Disk synced when necessary.
By default, both primary data and cross-references are cached. However, caching can be restricted to one or the other. Note that primary and xref data each have their own hash table and data file.
-CACHE_WHAT XREFS
-CACHE_WHAT DATA
Possible values in the .THOR header file:
cache level:
cache level: NEVER
cache level: ON_REQUEST
cache level: ALWAYS WRITETHRU
cache level: ALWAYS WRITETHRU XREFS_ONLY
cache level: ALWAYS WRITETHRU DATA_ONLY
cache level: ALWAYS READWRITE
cache level: ALWAYS READWRITE XREFS_ONLY
cache level: ALWAYS READWRITE DATA_ONLY
cache level: ALWAYS WRITETHRU_ALL
cache level: ALWAYS WRITETHRU_ALL XREFS_ONLY
cache level: ALWAYS WRITETHRU_ALL DATA_ONLY
cache level: ALWAYS READWRITE_ALL
cache level: ALWAYS READWRITE_ALL XREFS_ONLY
cache level: ALWAYS READWRITE_ALL DATA_ONLY
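The correspondence between the command-line caching options and the header values can be expressed as a small function. This Python sketch is inferred from the two lists above rather than taken from Daylight code. The function name is hypothetical, and ignoring CACHE_WHAT unless CACHE_WHEN is ALWAYS is an assumption based on the header list containing no such combinations.

```python
LEVELS = ("WRITETHRU", "READWRITE", "WRITETHRU_ALL", "READWRITE_ALL")
WHAT_SUFFIX = {None: "", "XREFS": " XREFS_ONLY", "DATA": " DATA_ONLY"}


def cache_header(when, level=None, what=None):
    """Return the .THOR "cache level:" line for the given caching options."""
    when = when.upper()
    if when == "NEVER":
        return "cache level: NEVER"  # CACHE_LEVEL is ignored
    if when in ("OK", "ON_REQUEST"):
        return "cache level: ON_REQUEST"  # CACHE_LEVEL is ignored
    if when != "ALWAYS":
        raise ValueError(f"unknown CACHE_WHEN value: {when}")
    if level not in LEVELS:
        raise ValueError(f"CACHE_WHEN ALWAYS needs a CACHE_LEVEL from {LEVELS}")
    if what not in WHAT_SUFFIX:
        raise ValueError(f"unknown CACHE_WHAT value: {what}")
    return f"cache level: ALWAYS {level}{WHAT_SUFFIX[what]}"
```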
Another choice to be made is whether to cache the "regular" database, or perhaps the indirect database. For databases where there are a large number of indirect references per TDT, caching the indirect database can provide the greatest performance gains.
"Holding" a database keeps it open when no clients are using it. This improves performance of frequently-opened and cached databases. Holding is independent from caching level.
With record-locking enabled, clients can lock individual records for exclusive access, "commit" and "rollback" changes, and unlock records.
Header (.THOR) file entry:
tdt locking: TRUE (Either present or not.)
Databases can be configured as writable (the default) or readonly. Readonly databases do not create lockfiles, so they are usable from CDROM. A readonly database can also be opened by two separate Thorservers. Header file entry:
read only: TRUE (Either present or not.)
Sufficient physical memory (RAM) is essential to the operation of Merlin. Unlike some applications which can perform acceptably with virtual memory or swapping, the Merlinserver is optimized to achieve high speed searching in RAM, and will slow prohibitively when swapping occurs. Therefore, the administrator should configure the Daylight system to avoid overuse of memory, taking into account:
It is possible to estimate memory usage a priori. However, a simpler way is to obtain pool-size information from the Merlinserver log file. Pool size data for commercial databases is available from Daylight.
Given a set of data, there may be several ways to design the datatypes and database for convenience, compactness, and searchability.
Frequently occurring data may best be represented indirectly. By using an indirect auxiliary database, a common datum may be stored only once, with compact indirect references stored in its place.
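The idea behind indirect storage is the same as string interning: each distinct value is stored once and every occurrence holds a small reference to it. This hypothetical Python class illustrates only that principle; it does not reflect Daylight's on-disk format for indirect databases.

```python
class IndirectStore:
    """Store each distinct datum once; hand out integer references to it."""

    def __init__(self):
        self._values = []  # reference -> datum
        self._index = {}   # datum -> reference

    def ref(self, value):
        """Return the reference for `value`, storing it on first use."""
        if value not in self._index:
            self._index[value] = len(self._values)
            self._values.append(value)
        return self._index[value]

    def deref(self, reference):
        """Return the datum a reference points to."""
        return self._values[reference]

    def __len__(self):
        return len(self._values)
```

A supplier name repeated in 100,000 TDTs would then be stored once, with each TDT carrying only the small reference.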
Monomer databases are auxiliary databases which define a table of monomers, or molecular building blocks, which may be referenced by monomer symbols in a Chortles string. Combinatorial mixtures can thus be specified by a single Chortles string denoting a mixture of thousands of individual compounds. Mixture databases with auxiliary monomer databases are normally still rooted by SMILES, where the SMILES represents the mixture by replacing variable positions with the wildcard ("*").
Reactions are stored as SMILES, just as single molecules are. Reaction databases are not auxiliary databases. It is also possible to store reactions and molecules together in the same database.
Given the database type, database design consists mostly of datatype design. Use the examples in $DY_ROOT/data/datatypes/ for common datatypes and as templates for new datatypes.
For each new datatype invented, decide whether it will be an identifier. Identifiers require additional disk space, but offer Thor-lookup capability, and logically identify a distinct chemical entity (isomer, sample, vial, registration number) to which data can be associated. Each identifier tag begins with "$".
Databases are created using sthorman or thormake. Creating a database means creating the empty files in the format required by the Daylight database servers. These files must then be loaded with data to be useful. When creating a database, the following must be specified: (1) a datatypes database, (2) the primary hash table size, and (3) the cross-reference hash table size. Also, the path of the database directory must be known. This is normally referenced by $DY_THORDB.
Several conversion programs are provided with the SMILES Toolkit for converting structure files in other formats to Daylight SMILES-rooted TDTs. These programs are provided as contributed source code and are found in $DY_ROOT/contrib/src/applics/convert.
Most administrators find it convenient to start and stop the Daylight servers using scripts designed to do this safely and efficiently. The start script can specify the full environment needed, which databases are to be loaded, the log files, etc. The stop script can evict users and make sure databases are closed before killing the server processes. These scripts can be invoked by the operating system automatically at boot and shutdown for added convenience.
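As an illustration of what a start script sets up, this Python sketch builds the command line and environment for a Thorserver without launching it. Assumptions to check locally: the server binary lives at $DY_ROOT/bin/thorserver, and the function name is hypothetical. The `-DATABASE_PASSWORDS_FILE` option, the DY_THOR_LOG_FILE variable, and the $DY_ROOT/lib library path are the ones named in these notes.

```python
import os


def thorserver_launch(environ, passwords_file=None, log_file=None):
    """Build (argv, env) for starting a Thorserver; nothing is executed here.

    ASSUMPTION: the server binary is $DY_ROOT/bin/thorserver.
    """
    env = dict(environ)
    root = env["DY_ROOT"]
    # Dynamically linked executables (4.61+) need $DY_ROOT/lib at runtime.
    lib = os.path.join(root, "lib")
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib + (os.pathsep + old if old else "")
    if log_file is not None:
        env["DY_THOR_LOG_FILE"] = log_file
    argv = [os.path.join(root, "bin", "thorserver")]
    if passwords_file is not None:
        # An empty string selects "no security" (no passwords file).
        argv += ["-DATABASE_PASSWORDS_FILE", passwords_file]
    return argv, env
```

A real start script would then call `subprocess.Popen(argv, env=env)`, and a matching stop script would evict users and close databases before terminating the process.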
Many other database administration tasks are well suited to the use of scripts. Building databases, saving them to files, extracting subsets, performing periodic searches, obtaining database statistics, server access statistics -- all these and more may be automated using scripts and the Daylight suite of Thorfilters programs.
Many features are available only to SMILES-rooted databases, making it highly desirable to use SMILES as the root of all TDTs. However, this may be impossible for substances of unknown or non-determinate structure (e.g., beeswax, Fresca, eye of newt). Thor permits TDTs rooted in non-SMILES identifiers, but does not allow sub-trees with additional identifiers in these TDTs. Only data belonging to the ID is allowed.
Ref: A beginner's guide to responsible parenting or knowing your roots, John Bradshaw, EuroMUG '98, Cambridge, UK.
In keeping with the GIGO principle, the administrator must take responsibility for the molecules loaded into the Daylight system to make best use of its capabilities. Consistency of valence models and tautomer representations is a prerequisite to reliable and comprehensible queries. (E.g., nitro-group representation and search).
A new category of datatype is introduced in 4.62: the non-identifier cross-reference. Denoted by a preceding slash ("/"), these datatypes may occur anywhere and are regarded syntactically as data. However, Thor will automatically create a cross-reference for each such datum, providing access to the root TDT, whether SMILES-rooted or not.
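The leading character of a tag therefore determines how Thor treats a field: "$" marks an identifier and "/" a non-identifier cross-reference, and both give lookup access to the root TDT. This Python sketch classifies (tag, value) pairs on that basis. The function names are hypothetical, and apart from $SMI the tags in the usage example are made up for illustration.

```python
def tag_kind(tag):
    """Classify a datatype tag by its leading character."""
    if tag.startswith("$"):
        return "identifier"
    if tag.startswith("/"):
        return "non-identifier cross-reference"
    return "data"


def cross_referenced(fields):
    """Return the (tag, value) pairs for which Thor keeps a cross-reference."""
    return [(tag, value) for tag, value in fields if tag_kind(tag) != "data"]
```

For example, `cross_referenced([("$SMI", "CCO"), ("/SUP", "Aldrich"), ("MW", "46.07")])` keeps the first two fields and drops the plain data field.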