XVMerlin Manual
Daylight Version 4.9
Release Date 08/01/11
Copyright notice
This document is copyrighted © 1991-2011 by Daylight
Chemical Information Systems, Inc. Daylight explicitly
grants permission to reproduce this document under the
condition that it is reproduced in its entirety, including
this notice. All other rights are reserved.
Table of Contents
1. Introduction to XVMerlin
2. Basic Operation of XVMerlin
3. Using the XVMerlin Window Menus
4. Searching
5. Configuring Merlin
1. Introduction to XVMerlin
Merlin is designed for chemical database searching, and is particularly
good at structural searching. The inherent compactness of SMILES, in
conjunction with a structural "fingerprint", allows Merlin to maximize
searching speed by searching in memory. A database of 100,000
structures might typically be searched in 10MB of memory.
Merlin's capabilities complement those of THOR, Daylight's
disk-resident database system. THOR stores very large datasets and
provides fast retrieval and read/write access to one THOR Data Tree at
a time; Merlin's job is to keep a subset of a THOR database in memory
for fast searching. Daylight has separated the functions of data lookup
(THOR) and data searching (Merlin) to optimize the performance of both.
However, the Merlin client can also query the THOR server to look up
the datatree for an xvmerlin row.
Merlin provides flexible display of text, 2D and 3D graphics, and
allows structure input via SMILES or graphic entry (GRINS).
Substructure, similarity, and string searching are available, as well
as a flexible sorting menu for numerical and textual data.
This XVMerlin User Guide is aimed at the beginning user. It describes
and illustrates the main capabilities of Merlin, and should enable the
user to start searching a database. Refer to the Daylight THOR and Merlin
Administration Guide for more information on the methodology of Merlin.
Prerequisites for running Merlin:
- The Merlin program has been installed locally
- A database has been installed and is accessible to the server.
- The Merlin server has been started, and a database "pool" has been loaded.
- Local environment variables have been defined (normally DY_ROOT and
DY_LICENSEDATA).
- The Daylight Software License is valid for "merlin".
- The database read password and the server password (if any) are known.
- To start the Merlin program, enter: "xvmerlin" (for SGIs,
"xvmerlin4d").
2. Basic Operation of XVMerlin
The Merlin client, xvmerlin, appears to the user as a set of windows.
The main Merlin window accesses all other windows and menus, and
displays status information and a configurable set of data columns.
In general, Merlin keeps a data "pool" of structures in memory of which
the current "hitlist" is a subset. The hitlist may consist of the
entire pool or the null set. The hitlist is a list of "hits" from the
previous search, ordered by the previous sort, or in their original
pool order. Iterative searching and/or sorting can be used to impose
several criteria in a step-by-step fashion, pruning the original pool
to a final hitlist of desired structures.
A non-empty hitlist has a current structure, indicated by the hitlist
pointer (highlighted in the Merlin window). Searches and sorts can be
based on this structure and its associated data, although search and
sort keys that are not present in the pool may also be entered.
The pool is a subset of a THOR database which is loaded into memory by
the Merlin server. The hitlist is the subset of those substances which
are present as rows on the Merlin scrolling region, visible or not.
The set of visible Merlin data columns may or may not include all
datatypes present in the pool. Certain functions, such as 'Print
hitlist' will print only those datatypes for which columns are visible,
though the entire hitlist (including non-visible rows) will be
printed. In general, the "hitlist" refers to a set of substances or
rows, but not a specific set of associated data. In addition, it is
possible to store a hitlist in a buffer and retrieve it, so there is
the concept of the "current hitlist" and the "stored hitlist". Finally,
the function 'Draw hits' is somewhat idiosyncratic: it depicts only the
hits that are visible.
3. Using the XVMerlin Window Menus
3.1 The Hitlist Menu
The Hitlist menu provides the following functions:
- Set all hits
- Sets all pool structures "on", so the hitlist contains the entire pool
- Invert
- Structures that are "on" are turned "off", and vice versa
- Reverse order
- Orders the hitlist in reverse of the original pool order
- Native order
- Restores the original pool order
- Search...
- Search window (see sections on searching)
- Undo
- Undo previous hitlist operation
- Store
- Store current hitlist in buffer
- Recall
- Recall hitlist previously stored
- Exchange
- Store + Recall in one step: swaps the current and stored hitlists
- Union
- Union of stored and current hitlist
- Intersect
- Intersection of stored and current hitlist
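The Hitlist menu operations can be pictured as operations on an ordered subset of pool indices with a single store buffer. The following Python sketch is a toy model for illustration only: the class and method names are hypothetical, not part of Merlin, and the ordering rules for Invert, Union, and Intersect are assumptions, since this guide does not specify them.

```python
from typing import Callable, List, Optional


class Hitlist:
    """Toy model of a Merlin pool and hitlist (hypothetical, not Merlin's API).

    The pool is a fixed list of structures in native (load) order; the
    hitlist is an ordered list of pool indices. One store buffer and a
    one-level undo mirror the Hitlist menu entries.
    """

    def __init__(self, pool: List[str]) -> None:
        self.pool = list(pool)
        self.hits: List[int] = list(range(len(self.pool)))  # every structure "on"
        self.stored: Optional[List[int]] = None
        self._previous: Optional[List[int]] = None

    def _update(self, new_hits: List[int]) -> None:
        # Remember the old hitlist so that undo() can restore it.
        self._previous = self.hits
        self.hits = new_hits

    def keep(self, predicate: Callable[[str], bool]) -> None:
        # Stand-in for a search with action "Remove non-matches from hitlist".
        self._update([i for i in self.hits if predicate(self.pool[i])])

    def set_all_hits(self) -> None:
        self._update(list(range(len(self.pool))))

    def invert(self) -> None:
        # "On" structures go "off" and vice versa; native order is assumed.
        on = set(self.hits)
        self._update([i for i in range(len(self.pool)) if i not in on])

    def native_order(self) -> None:
        self._update(sorted(self.hits))

    def reverse_order(self) -> None:
        self._update(sorted(self.hits, reverse=True))

    def undo(self) -> None:
        # One-level undo: swap back to the hitlist before the last operation.
        if self._previous is not None:
            self.hits, self._previous = self._previous, self.hits

    def store(self) -> None:
        self.stored = list(self.hits)

    def recall(self) -> None:
        if self.stored is not None:
            self._update(list(self.stored))

    def exchange(self) -> None:
        # Store + Recall in one step: swap the current and stored hitlists.
        if self.stored is not None:
            current = self.hits
            self._update(list(self.stored))
            self.stored = list(current)

    def union(self) -> None:
        # Assumed order: current hits first, then stored-only hits.
        on = set(self.hits)
        extra = [i for i in (self.stored or []) if i not in on]
        self._update(self.hits + extra)

    def intersect(self) -> None:
        # Assumed order: keep the current hitlist's order.
        stored = set(self.stored or [])
        self._update([i for i in self.hits if i in stored])

    def names(self) -> List[str]:
        return [self.pool[i] for i in self.hits]
```

For example, with a pool of benzene, phenol, ethanol, and toluene, `keep(lambda s: s.endswith("ol"))` leaves phenol and ethanol; `store()` followed by `invert()` and `union()` then brings back all four structures, with the stored hits appended after the inverted ones.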
3.2 The Display Menu
The Display menu (shown here with 'Lines per row' submenu) provides the
following functions:
- Keypad...
- See below
- Set Colors...
- Invokes EDGAR, the graphics-attributes widget
- Font
- Submenu to modify font characteristics
- Lines per row
- Specifies how many text-lines per hitlist row
- Show SMILES
- Displays SMILES as text or as graphic depictions. Depictions are displayed only when 'Lines per row' is at least 3.
3.3 The Keypad
The keypad provides handy access to several tools for manipulating
the hitlist, some of which are also present in the Hitlist menu and
identical in function. The Reset button is identical to 'Set all
hits'. Home, Line-up, Page-up, View@top, View@center, View@bottom,
End, Line down, and Page down all scroll through the existing
hitlist and/or move the hitlist pointer from one row to another.
3.4 The File Menu
The File menu provides the following functions:
- Open Database
- View and select from available servers and databases
- Close Database
- Close any open database
- Servers...
- Invoke servers control panel (see below)
- Read hitlist...
- Retrieve hitlist saved as SMILES or other root-ids
- Save hits as .tdt...
- Save current hitlist as a TDT (.tdt) file
- Save hits as .tab...
- Save current hitlist in a tab-delimited file
- Print depictions...
- Print depictions for current hitlist
- Print hitlist...
- Print current hitlist
- Iconify
- Iconify XVMerlin
- Quit
- Quit XVMerlin
The server panel provides access to Merlin servers on the network.
Merlin will start up with the servers specified by option
MERLIN_SERVER_LIST, but additional servers may be added.
3.5 The Data Column Menu
The Data Column menu provides the following functions:
- Draw hits
- Creates a window of depictions for all visible rows
- Datatype (submenu)
- Modifies the column datatype
- Function (submenu)
- Selects among, or operates on, multiple values of one datatype: first, last, min, max, longest, shortest, all, count
- Graphic selection...
- Edit hitlist graphically
- Remove repeats
- Deletes rows where this data value is repeated
- Remove n/a's
- Deletes rows without this datatype
- Save hitlist...
- Save current hitlist as a SMILES file
- Search (submenu)
- Structural, similarity, and string/expression searching
- Sort (submenu)
- Numeric, ascii, and other sorting
- Zap this column
- Removes column from scrolling canvas
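The Function submenu and the two row-removal commands can be sketched as follows, assuming each cell of a column holds every value of that column's datatype for one substance (an empty list meaning n/a). The function table, its names, and the handling of ties and n/a cells are illustrative assumptions, not Merlin's implementation; min and max here assume numeric data.

```python
from typing import Callable, Dict, List, Optional

Cell = List[str]  # all values of one datatype for one row; [] means n/a

# Hypothetical equivalents of the Function submenu entries.
FUNCTIONS: Dict[str, Callable[[Cell], str]] = {
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "min": lambda v: min(v, key=float),  # assumes numeric values
    "max": lambda v: max(v, key=float),
    "longest": lambda v: max(v, key=len),  # first of equally long values
    "shortest": lambda v: min(v, key=len),
    "all": lambda v: "; ".join(v),
    "count": lambda v: str(len(v)),
}


def cell_text(cell: Cell, function: str) -> Optional[str]:
    """Value shown in the column for one row, or None for n/a."""
    if not cell:
        return None
    return FUNCTIONS[function](cell)


def remove_na(column: List[Cell]) -> List[Cell]:
    """'Remove n/a's': drop rows that have no value of this datatype."""
    return [cell for cell in column if cell]


def remove_repeats(column: List[Cell], function: str) -> List[Cell]:
    """'Remove repeats': keep only the first row showing each value.

    In this sketch n/a rows share the value None, so only the first is kept.
    """
    seen = set()
    kept = []
    for cell in column:
        value = cell_text(cell, function)
        if value not in seen:
            seen.add(value)
            kept.append(cell)
    return kept
```

Comparing with `key=float` matters for numeric data: compared as plain strings, "7" would sort above "12".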
3.5.1 Draw Hits & the Depict Widget
The Draw Hits command is in each data column menu, and invokes a window
depicting all of the structures visible on the hitlist page, subtitled
with the datatype of that column. To bring up a larger image of a
structure in its own window, click on a pane in the depict widget with
the middle mouse button.
3.6 The Popup Menu
The popup menu is invoked by pressing the right mouse button while the
pointer is on a hitlist cell. The popup menu provides functions which
involve the row and/or cell from which they were invoked. This often
saves typing values into the search fields manually.
The Popup menu provides the following functions:
- Show (submenu)
- Text, 2D or 3D graphics, or the full TDT (via THOR)
- Move to top
- Moves the structure at the hitlist pointer to the top of the hitlist
- Move to bottom
- Moves the structure at the hitlist pointer to the bottom of the hitlist
- Delete (submenu)
- Deletes current structure, all above or below, or n/a's
- Search (submenu)
- Search panel preloaded with cell contents
- Set buttons (submenu)
- Sets mouse button functions (e.g., show TDT, show 3D)
The xvmerlin client can query the THOR server for the entire
datatree, or TDT. This capability allows the user to view all data
associated with a substance via the handy TDT widget. Use the popup
menu from the row of interest to specify Show->datatree. This function
is also the default action set for the middle mouse button.
4. Searching
Searching with Merlin means scanning all of a selected datatype
(SMILES, name, molecular weight, etc.) for the current hitlist, or the
entire pool, or the un-hit portion of the pool, making some comparison
or evaluation with respect to a predefined key, and taking a specified
action (delete from hitlist, add to hitlist, etc.).
The searching window is invoked from the Hitlist menu, Column menu, or
Popup menu. The types of searches are listed in the 'Look for' menu of
the Search Control Panel, and the action taken on the hitlist is
specified by the Action menu.
The search type and the action taken are independent choices. This gives
considerable flexibility over the search procedure, so it is important to
be aware of the choices available. In particular, the default action
is 'Make a new hitlist', which results in a complete search of the
entire pool (hitlist and un-hit). While this is appropriate for a
first search, to then search only the hitlist resulting from the first
search will require a different action, such as 'Remove non-matches
from hitlist'.
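The interplay between search scope and action can be made concrete with a short Python sketch. The function name, the action keywords, and the "add matches" action are hypothetical stand-ins for Action menu entries; only 'Make a new hitlist' and 'Remove non-matches from hitlist' are named in this guide. The example shows why a second criterion should normally use 'Remove non-matches': repeating 'Make a new hitlist' rescans the whole pool and discards the first result.

```python
from typing import Callable, List


def run_search(values: List[str], hits: List[int],
               matches: Callable[[str], bool], action: str) -> List[int]:
    """Return the new hitlist (pool indices) after one search.

    values[i] is the selected datatype for pool entry i; hits is the
    current hitlist. The action keywords are hypothetical.
    """
    if action == "make_new":
        # Default action: scan the entire pool, hits and un-hits alike.
        return [i for i, v in enumerate(values) if matches(v)]
    if action == "remove_non_matches":
        # Scan only the current hitlist and prune it.
        return [i for i in hits if matches(values[i])]
    if action == "add_matches":
        # Scan only the un-hit part of the pool and append its matches.
        on = set(hits)
        return hits + [i for i, v in enumerate(values)
                       if i not in on and matches(v)]
    raise ValueError(f"unknown action: {action}")


names = ["BARBITURIC ACID", "PHENOBARBITAL", "ASPIRIN", "BARBITAL SODIUM"]
first = run_search(names, [], lambda v: "BARBI" in v, "make_new")
refined = run_search(names, first, lambda v: "ACID" not in v,
                     "remove_non_matches")
restarted = run_search(names, first, lambda v: "ACID" not in v, "make_new")
print(first, refined, restarted)  # prints [0, 1, 3] [1, 3] [1, 2, 3]
```

The `restarted` result brings ASPIRIN (index 2) back into the hitlist, because the second criterion was applied to the whole pool rather than to the first search's hits.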
The 'Find first' and 'Find next' buttons apply only for the actions
'Find match...' and 'Find non-match...'.
The Search window appears differently for each search type. The search
types are:
- String search:
- Looks for the specified string in the specified column.
'Select & sort' also performs an ascii sort.
- Regular expression search:
- Looks for the specified regular expression
(UNIX-style) in the specified column. Refer to UNIX documentation or a
local guru for help with regular expressions.
- Approximate-string search:
- Ranks entries by similarity to the given string or regular expression.
- Structures containing given substructure:
- In this search the specified structure is searched for as a substructure.
GRINS may be used for structure input. The search type may be for SMILES,
Isomeric SMILES, SMARTS, or Isomeric SMARTS. SMARTS searches can be much
more chemically meaningful, but the search algorithm is much slower. The
'Optimize target' checkbox invokes a routine for rearranging the search
target to place uncommon atoms first, to speed up the search. This box
should normally be checked.
In general terms, a SMILES search looks for the non-hydrogen graph of
the specified SMILES. The SMILES must represent a valid molecule.
SMARTS represent substructures and may or may not be valid SMILES.
Press the Help button to find documentation on SMILES and SMARTS.
Refer also to the Daylight Theory Manual.
Merlin utilizes fingerprints for fast screening of structures as
a first step. If the database contains FPP part N-tuple fingerprints,
this screen can be optimized by checking the "Use FPP" box.
- Structures embedded within given structure:
- This search looks for SMILES which represent substructures of the
specified structure.
- Similarity search:
- Compares the fingerprint for the specified SMILES
with the fingerprints in the pool. For each comparison a similarity
coefficient is generated (and this coefficient can be displayed as a
column of datatype "Similarity"). This coefficient is generated by the
Tanimoto similarity algorithm. Merlin can either sort the hitlist on
this coefficient, or select and sort, deleting those SMILES whose
coefficient is less than a chosen threshold. The "very high" through
"rough" choices represent preset numerical thresholds, and these
values can be reset by options.
The correspondence between Merlin-similarity and chemical similarity
will, of course, be dependent on the user's definition of chemical
similarity. The fingerprint program is equipped with adjustable
information-content and information-density settings, which can improve
similarity searching for any given dataset. It is important to
recognize that no similarity metric is likely to be optimal for all
chemical tasks.
- Tversky similarity search:
- The most powerful structural search is now (as of 4.51) the Tversky
search (but not as simple to use or interpret as the Tanimoto metric).
Like the Tanimoto search, this compares features in a given structure
(the "prototype") to features in database structures (as "variants"),
and allows hitlist selection or sorting based on the results. However,
the Tversky search allows you to specify the weighting that will be
given to each set of features.
Setting the weighting of prototype features to 100% and variant
features to 100% produces a symmetrical similarity metric identical to
the Tanimoto metric. (Setting them symmetrically to values less than 100%
doesn't change the rank ordering, just the absolute value, i.e., more
structures will meet a given similarity criterion).
Setting the weighting of prototype and variant features asymmetrically
produces a similarity metric in a more-substructural or
more-superstructural sense. Setting the weighting of prototype
features to 100% and variant features to 0% means that only the
prototype features are important, i.e., this produces a
"superstructure-likeness" metric. In this case, a Tversky similarity
value of 1.0 means that all prototype features are represented in the
variant, and 0.0 that none are. Conversely, setting the weights to 0%
prototype / 100% variant produces a "substructure-likeness" metric,
where completely embedded structures have a value of 1.0 and
"near-substructures" have values near 1.0. (Note: with no weight at
all given to variant features, this metric is quite sensitive to
fingerprint "noise", and settings of 90%/10% generally produce a more
useful ranking.)
Tversky metrics where the two weightings add up to 100% (1.0) are of
special interest (e.g., the 50/50 metric is known as the Dice index).
The Tversky search query panel provides a "Sum 100%" checkbox which,
when selected, forces the two weights to add up to 100%.
Advanced users may wish to experiment with Tversky metrics where
weightings are not limited to 100%. (Doing so is rank-equivalent to
raising the Tversky "theta" parameter above 1.0.) Weightings greater
than 100% cause the distinguishing features to be emphasized more than
common features, which may be useful in analysis of diversity or
dissimilarity. The Tversky query panel does not provide control of the
maximum allowed weighting directly; it must be set with an option,
e.g., xvmerlin -MERLIN_TVERSKY_ABMAX 2.0
Four xvmerlin options are used to control the default Tversky
parameters: MERLIN_TVERSKY_ALPHA (prototype weighting),
MERLIN_TVERSKY_BETA (variant weighting), MERLIN_TVERSKY_ABMAX (maximum
weighting), and MERLIN_TVERSKY_ONE (setting of "Sum 100%" checkbox).
- Equivalent search:
- This search has three distinct modes. An equivalent-SMILES search
is the same as a Thor lookup for the given SMILES. A graph search looks
for all structures with the same hydrogen-stripped, bond-order-stripped graph.
A tautomer search looks for graph-equivalents with the same formula and
charge.
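The fingerprint comparisons described above reduce to bit-count arithmetic. Writing a for the number of bits set in the prototype's fingerprint, b for the variant's, and c for the bits they share, the Tversky coefficient with prototype weight alpha and variant weight beta (the MERLIN_TVERSKY_ALPHA and MERLIN_TVERSKY_BETA weightings, expressed as fractions) is c / (alpha(a - c) + beta(b - c) + c). With alpha = beta = 1 this is the Tanimoto coefficient c / (a + b - c), and with alpha = beta = 0.5 it is the Dice index 2c / (a + b). The Python sketch below encodes fingerprints as plain integers with a handful of bits, purely for illustration (real Daylight fingerprints are much longer and are generated by the Daylight software). It also shows the substructure screen: a target can contain the query only if every query bit is also set in the target's fingerprint.

```python
def popcount(fp: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def passes_screen(query: int, target: int) -> bool:
    # Necessary, not sufficient: every query bit must also be set in the
    # target, otherwise the target cannot contain the query substructure.
    return (query & ~target) == 0


def tversky(prototype: int, variant: int, alpha: float, beta: float) -> float:
    """Tversky similarity with prototype weight alpha and variant weight beta."""
    a = popcount(prototype)
    b = popcount(variant)
    c = popcount(prototype & variant)
    denominator = alpha * (a - c) + beta * (b - c) + c
    # Returning 0.0 for a zero denominator (empty fingerprints) is an assumption.
    return c / denominator if denominator else 0.0


def tanimoto(prototype: int, variant: int) -> float:
    return tversky(prototype, variant, 1.0, 1.0)


def dice(prototype: int, variant: int) -> float:
    return tversky(prototype, variant, 0.5, 0.5)


# Toy fingerprints: the variant's bits are a subset of the prototype's.
big = 0b111111
small = 0b000111
print(passes_screen(small, big), passes_screen(big, small))  # True False
print(tversky(big, small, 0.0, 1.0))  # 1.0: variant fully embedded
print(tversky(big, small, 1.0, 0.0))  # 0.5: half the prototype features present
```

The same functions illustrate the rank-equivalence remark above: lowering alpha and beta together, for example from 1.0/1.0 to 0.5/0.5, raises every similarity value without changing the order of the variants.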
4.1 Sample Searches
The first search described is a string search; the second and third
are structural searches. Note that the flexibility of Merlin allows many
options in performing these searches. These are only a few possible
procedures.
4.1.1 A String Search
- From the XVMerlin window, invoke the Search Control Panel by choosing 'Search...' from the Hitlist menu.
- From the 'Look for' menu, choose 'Strings containing given substring'
- From the 'In column' menu, choose datatype 'Local name'.
- Enter the string "BARBI" and press Select.
- Use the Sort submenu of the SMILES column's menu to sort the hitlist based on SMILES length.
- Use the scroll bar (or the keypad) to find the first SMILES.
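The two steps of this sample search, a substring selection on the 'Local name' column followed by a sort on SMILES length, can be mimicked in a few lines of Python. The rows below are a hypothetical miniature pool, and the substring test is case-sensitive by assumption; this guide does not say how Merlin treats case.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (local name, SMILES); a hypothetical two-column pool

pool: List[Row] = [
    ("PHENOBARBITAL", "CCC1(C(=O)NC(=O)NC1=O)c1ccccc1"),
    ("ASPIRIN", "CC(=O)Oc1ccccc1C(=O)O"),
    ("BARBITURIC ACID", "O=C1CC(=O)NC(=O)N1"),
    ("BARBITAL", "CCC1(CC)C(=O)NC(=O)NC1=O"),
]


def substring_search(rows: List[Row], key: str) -> List[Row]:
    # Keep rows whose local name contains the key (case-sensitive here).
    return [row for row in rows if key in row[0]]


def sort_by_smiles_length(rows: List[Row]) -> List[Row]:
    # Python's sort is stable, so equal-length SMILES keep their prior order.
    return sorted(rows, key=lambda row: len(row[1]))


hits = sort_by_smiles_length(substring_search(pool, "BARBI"))
print([name for name, _ in hits])
# prints ['BARBITURIC ACID', 'BARBITAL', 'PHENOBARBITAL']
```

Barbituric acid has the shortest SMILES of the matching rows, which is why it ends up at the top of the hitlist used as the starting point for the next sample search.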
4.1.2 A Superstructure Search (SMILES)
- From the previous string search and sort, Barbituric Acid
should be at the top of the hitlist. Use the scrollbar to make the
first row the current structure.
- Invoke the popup menu from Barbituric Acid. Select the Search
panel in 'Superstructure search' mode. Note that the Search panel is
preloaded with the corresponding SMILES. Press the Select & sort
button to start the search. Sorting will be based on similarity.
After the search, note that depictions are highlighted to show the hit
atoms.
4.1.3 A SMARTS Search
- SMARTS searching is slower than SMILES searching, but is more
powerful. With the Search panel in Superstructure mode, specify SMARTS
searching, and type in the following SMARTS:
[!$(*#*)&!D1]-&!@[!$(*#*)&!D1]. Then press Select. Note that this
SMARTS represents two atoms connected by a "rotatable bond".
- This search may take a few minutes. The status widget will report
progress.
- The resulting depictions are highlighted to show the hit atoms.
5. Configuring Merlin
There are several steps that can be taken to configure Merlin so that
databases are loaded and opened automatically and displayed to the
user in a convenient and preferred way. These steps can be divided
into two categories: Merlinserver configuration issues, which are
not covered in this manual, and xvmerlin-user configuration issues,
which are. For clarity, these are some Merlinserver configuration
issues not covered here but important to the overall configuration:
- Starting the Merlinserver and Thorserver (maybe automatically)
- Loading specified databases
- Loading specified datatypes and datafields
- Server and database security
In the user's environment, there are a few Daylight options which
affect the Merlin environment. Foremost among these is
MERLIN_SERVER_LIST, which should be set to a comma-separated
list of Merlin servers. This can be done in the user's Daylight
profile file, or by environment variable, e.g., in csh:
setenv DY_MERLIN_SERVER_LIST "challenge,corona,cojones:merlin:thor"
5.1 Saving and restoring XVMerlin's state
After opening databases and defining columns with xvmerlin, you
may save the state of the program so that the same configuration
is restored when restarted. To accomplish this, follow these
steps:
- Set up xvmerlin as desired.
- Quit using the File->Quit command, and press the "save
state" button.
- Provide a password for reopening xvmerlin when prompted.
- The previous actions will result in an xvmerlin configuration
file being saved in $HOME/.dy_merlinprofile.opt. However,
to use this file it must be included in the user's Daylight
profile file, which is $HOME/dy_profile.opt by default or the file
named by the environment variable DY_PROFILE. Add the following line to
your profile file:
#include $HOME/.dy_merlinprofile.opt
Now restart xvmerlin. You should be prompted for a password,
and the original saved configuration should be restored.