Daylight v4.9
Release Date: 1 February 2008

Name

mcl - Merlin Control Language

Description

1. INTRODUCTION

MCL is an English language interface to Merlin. In general:

  o MCL is intended to be readable and writable by both machines and humans.

2. CONVENTIONS USED IN THIS DOCUMENT

The following section includes prototypes and brief explanations for all MCL statements, using these conventions.

  text ..... user text prototype; quotes aren't needed if the text doesn't
             collide with a reserved word and contains only letters and digits.

The following prototypes are used in this document:

  "column" .......... column's name (from previous Create column...)

User entry conventions:

+---------------+---------------------------+
| this entry    | is the same as            |
+===============+===========================+
| "column"      | [ column ] "column"       |
+---------------+---------------------------+
| ["hitlist"]   | [ [ hitlist ] "hitlist" ] |
+---------------+---------------------------+
| alpha         | alpha "alpha"             |
+---------------+---------------------------+
| beta          | beta "beta"               |
+===============+===========================+

Assume this has been done:

    Set default hitlist "defhits".

+-------------------------------+------------------------------------+
| this statement                | is equivalent to full statement    |
+===============================+====================================+
| Remove "CLUSTER" repeats.     | Remove column "CLUSTER" repeats.   |
+-------------------------------+------------------------------------+
| Reset "defhits".              | Reset hitlist "defhits".           |
+-------------------------------+------------------------------------+
| Reset.                        | Reset hitlist "defhits".           |
+-------------------------------+------------------------------------+
| Move to row 10 of "defhits".  | Move to row 10 of "defhits".       |
+-------------------------------+------------------------------------+
| Move to row 10.               | Move to row 10 of "defhits".       |
+-------------------------------+------------------------------------+
| ... Tversky 0.9 0.1 ...       | ... Tversky alpha 0.9 beta 0.1 ... |
+===============================+====================================+

3. RESERVED WORDS

The following (case-sensitive) words are reserved in MCL:

    above default least Print structure(s) Add depiction less Put alpha
    Display lines range substring ares down list Read substructure(s) as
    entitled matching regexp at Exchange missing Remove superstructure(s)
    available field repeats table below file molecule(s) Reset tautomer(s)
    beta font Move Reverse text by Free named row than Clear from
    nativeorder Select to column function next Set Tversky containing
    graph(s) non similar tversky Copy hitlist not similarity up Create in
    of Sort value(s) database into pattern status with datatype Invert per
    string(s) Write

The following words (sort flags) are also reserved in MCL:

    /AAZ /ANC /ANCP /ANCPW /ANCW /ANP /ANPW /ANW /ASC /CAS /LEN /MF /NAB /NUM

The following pairs of reserved words are synonyms:

    graph ........... graphs
    molecule ........ molecules
    string .......... strings
    structure ....... structures
    substructure .... substructures
    superstructure .. superstructures
    tautomer ........ tautomers
    value ........... values

Additionally, the symbols $1 - $9 are reserved as formal parameters.

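The quoting convention from section 2 (quotes may be dropped only when the text is not a reserved word and contains only letters and digits) can be sketched as a small check. This is an illustration, not part of MCL: the function name `needs_quotes` is hypothetical, and `RESERVED_EXCERPT` holds only a handful of the reserved words listed above.

```python
# Hypothetical helper illustrating the quoting convention; not part of MCL.
# RESERVED_EXCERPT is only an excerpt of the reserved-word list above.
RESERVED_EXCERPT = frozenset(
    {"Add", "Put", "Sort", "Reset", "column", "hitlist", "by", "to", "of", "in"}
)


def needs_quotes(text, reserved=RESERVED_EXCERPT):
    """Return True if user text must be quoted under the stated convention."""
    if text in reserved:  # reserved words are case-sensitive
        return True
    # Unquoted text may contain only letters and digits.
    return not (text.isascii() and text.isalnum())
```

Under this sketch, `COST` and `hits` may be written bare, while `Sort` (reserved) and `Times-14` (contains a hyphen) must be quoted.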
4. THE MCL ENVIRONMENT

MCL is designed to run in at least two different environments: internally in an interactive program such as xvmerlin, and as a non-interactive "batch" process. In both cases MCL actually runs in a non-interactive way, i.e., a whole program (or at least a whole statement) is expected to be delivered to the MCL processor. MCL features which are environment-specific are described here.

A number of MCL output-control statements are provided for use in an xvmerlin-like environment. These statements are ignored in an ASCII-oriented batch environment. Examples of such commands are:

    Set font "Times-14".
    Display smiles as depiction.

In an interactive environment like xvmerlin, user input is typically transcribed to the MCL program which is then run. Although this is possible in a batch environment, it is often more convenient to use a fixed MCL program with data supplied externally, e.g., via command line arguments in the "mcl" program.

The symbols $1 - $9 are used to refer to formal parameters to the MCL program. Each symbol represents an externally supplied string which is used as an MCL language token. If a parameter is referenced in MCL but not supplied by the environment, the MCL program generates an error and quits.

A few (very few) other MCL statements operate differently in different environments. The most important example is "Select database...". In an interactive environment such as xvmerlin, the database to be selected is expected to be already open; if not, the statement fails and a warning is issued. In a batch environment, "Select database..." connects to servers and opens the specified database(s); the statement only fails if a database fails to open for some reason.

5. MCL OBJECTS

The most fundamental MCL object is the database, selected with the "Select database..." statement. All MCL statements apply to the currently selected database.

To be useful, each database must have one or more named columns, which represent a kind of data (actually, one field of a datatype) and a function (e.g., the FIRST, LONGEST, AVERAGE, COUNT, etc. of that kind of data).

To be useful, each database must also have one or more named hitlists, which represent an ordered set of entries in the database. Positions in a hitlist are referred to as "rows". On initial creation, hitlists represent all entries in the database, i.e., there are the same number of rows in a hitlist as there are entries in its database, and the initial order is defined as the "native order" of the database.

Hitlists have a "current position" property. Although this is referred to by its numeric (1-origin) position in the hitlist, the "current position" is defined by the entry in that position, and is stable with respect to operations which change the hitlist order. In other words, if the entry at the "current position" before an operation is present after the operation, it will still be current.

6. DEFAULTS AND STYLE

To produce readable MCL code, it is wise to select names carefully. In our examples we give columns short names consisting of capital letters (which tends to make them stand out) and hitlist names that end with the word "hits", e.g., "hits" or "curhits" or "savehits".

A number of conventions are used in our examples for clarity: hitlists are usually specified explicitly, user-supplied names are shown quoted, and each statement is shown on its own line, e.g.:

    Reset "hits".
    Sort "hits" by "COST".
    Sort "hits" by "CLUSTER".
    Remove "CLUSTER" repeats in "hits".

In general, columns and hitlists may be specified using only their names. If it seems clearer to do so, the keywords "column" or "hitlist" may precede column and hitlist names, although this is never required. The following MCL code produces the same program as that above:

    Reset hitlist "hits".
    Sort hitlist "hits" by column "COST".
    Sort hitlist "hits" by column "CLUSTER".
    Remove column "CLUSTER" repeats in hitlist "hits".

However, it is perfectly acceptable to omit quotes and references to the default hitlist as long as the meaning remains clear, e.g., the following MCL code also means the same thing as that shown above:

    Reset.
    Sort by COST.
    Sort by CLUSTER.
    Remove CLUSTER repeats.

Tautomer and graph searches are unusual because they are done by the Thor server directly from SMILES (all other searches are done by the Merlin server on data in columns). In these searches (only) the search is always done with data from a SMILES ($SMI) column; MCL syntax gives you no opportunity to move to the next tautomer in an existing hitlist or specify which column is to be searched, e.g.:

    Put tautomers of "O=c1[nH]c2cncnc2[nH]1" into "hits".

7. SEARCH LOGIC

Merlin provides 10 different types of searches:

  o find superstructures of a given molecule (replacing H's)
  o find molecules which match a given SMARTS pattern
  o find substructures with a given embedded molecule
  o find tautomers of a given molecule
  o find molecules with same oxidation-suppressed graph as the given molecule
  o find structures which are similar to a given molecule
  o find strings which contain a given embedded substring
  o find strings which match a regular expression exactly
  o find strings which match a regular expression approximately
  o find values in a given range

Each type of search may be invoked to produce seven different actions:

  o create a new hitlist of matching entries
  o add matching entries to a hitlist
  o add non-matching entries to a hitlist
  o delete matching entries from a hitlist
  o delete non-matching entries from a hitlist
  o find matching entry in a hitlist
  o find non-matching entry in a hitlist

The English-like MCL language allows these (70) different searches to be specified quite naturally, e.g., to create a hitlist of dopamine superstructures:

    Put SMILES superstructures of "NCCc1ccc(O)c(O)c1" into "hitlist".

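The five hitlist-modifying actions listed above can be read as set operations on the entries that a search matches. The sketch below is only a model for reasoning about them: it treats a hitlist as an unordered set of entry numbers (real hitlists are ordered), and the function name `apply_action` and the action labels are hypothetical.

```python
def apply_action(action, hits, matches, universe):
    """Model a search action as a set operation (hypothetical helper).

    hits     -- entries currently in the hitlist
    matches  -- entries that satisfy the search
    universe -- all entries in the database
    """
    non_matches = universe - matches
    if action == "put":         # create a new hitlist of matching entries
        return set(matches)
    if action == "add":         # add matching entries
        return hits | matches
    if action == "add_non":     # add non-matching entries
        return hits | non_matches
    if action == "remove":      # delete matching entries
        return hits - matches
    if action == "remove_non":  # delete non-matching entries
        return hits - non_matches
    raise ValueError(f"unknown action: {action}")
```

The two "find" actions are not modeled here because they move the current position rather than change the hitlist contents.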
Note that there is no way to create a hitlist of non-matches using just one statement. Use two statements to do this, e.g.,

    Put SMILES superstructures of "NCCc1ccc(O)c(O)c1" into "hitlist".
    Invert "hitlist".

or use the reverse logic:

    Clear "hitlist".
    Add SMILES non superstructures of "NCCc1ccc(O)c(O)c1" to "hitlist".

The word "non" or "not" reverses the meaning of a match. The one exception is similarity search, where the following two statements refer to complementary sets of structures:

    Remove SMILES structures at least 0.9 similar to "NCCc1cc(O)c(O)c1".
    Remove SMILES structures less than 0.9 similar to "NCCc1cc(O)c(O)c1".

To clarify the effect of a search operation, MCL syntax requires that the preposition match the verb, e.g.:

    Put ....... into "hitlist".
    Add ....... to "hitlist".
    Remove .... from "hitlist".
    Move to ... in "hitlist".

"Move to ..." searches cause the current position to be moved to the next entry which meets the given requirements, i.e., the next position in the current hitlist, starting with the current position. For example, this MCL code produces a hitlist of entries with known pKa's, sorted by pKa:

    Reset "hits".
    Remove missing "pKa" from "hits".
    Sort "hits" by "pKa".

If we move to the first entry in this list which meets another criterion, e.g., a given pharmacological activity, we are assured that we are pointing to the one which has the lowest-valued pKa:

    Move to row 1.
    Move to "ACTIVITY" string containing "NARCOTIC" in "hits".

We can remove entries with lower pKa values (row 0 is the current position):

    Remove above row 0 of "hits".

And then do the reverse:

    Reverse "hits".
    Move to row 1.
    Move to "ACTIVITY" string containing "NARCOTIC" in "hits".
    Remove above row 0 of "hits".

The resulting hitlist now contains "all entries with pKa's in the range of pKa's of known narcotics". One might repeat this type of search for other properties (LogP, CMR, etc.) to select structures which meet an observed physicochemical profile (e.g., of known narcotics).

All Merlin (and thus MCL) sorts are stable, i.e., after a sort, the order of entries with equal value is unchanged. For example, the following MCL code produces a hitlist containing only the lowest-cost member of each cluster:

    Reset "hits".
    Sort "hits" by "COST".
    Sort "hits" by "CLUSTER".
    Remove "CLUSTER" repeats in "hits".

8. MCL STATEMENTS

Select database "database" [ "thorbase" ].
Selects a database by name in the form:

    base@host:service:user

"database" is used for Merlin access; "thorbase" for Thor access. If "thorbase" is not specified, "database" is used with the service name "thor". Only the "base" part of the name is needed, e.g. "wdi93". Default values are the local machine name for "host"; "merlin" or "thor" for "service"; and the user's login name for "user".

Create column of datatype "tag" [field "field"] [function <func>] named "name".
Creates a named column. "tag" is an internal tag, e.g. "$CAS".

Free column "column".
Free object by name in current context (database). Freeing a database automatically frees its columns and hitlists.

Set default hitlist "hitlist".
Makes the named hitlist the default: this hitlist is used wherever an MCL statement omits an optional hitlist. The first hitlist created for each database is normally the default and need not be set explicitly. You need to set the default hitlist only if you want to use a different one or if you deallocate an existing default hitlist (with Free hitlist...).

Reset ["hitlist"].
Reset the named (or default) hitlist such that all rows are hit, i.e., same as xvmerlin's "Set all hit" menu item.

Clear ["hitlist"].
Clear the named (or default) hitlist such that no rows are hit, i.e., same as the sequence:

    Reset "hitlist".
    Invert "hitlist".

Invert ["hitlist"].
Invert the named (or default) hitlist, i.e., make non-hits hits and vice versa.

Reverse ["hitlist"].
Reverse the order of the named (or default) hitlist.

Copy "hitlist" to "hitlist2".
Copy the contents of one extant hitlist to another. This can be used for backup before later operations (e.g., Copy curhits to bkphits.) or for restoration (e.g., Copy bkphits to curhits.).

Add "hitlist" to "hitlist2".
Add hits in the first hitlist to those in the second. E.g., the statement:

    Add bkphits to curhits.

replaces curhits with the union of curhits and bkphits.

Select "hitlist" in "hitlist2".
Remove hits in the second hitlist which are not in the first. E.g.,

    Select bkphits in curhits.

replaces curhits with the intersection of curhits and bkphits.

Exchange "hitlist" with "hitlist2".
Exchange the contents of the two hitlists, e.g., the implementation of the "undo" facility in xvmerlin is equivalent to:

    Exchange undohits with curhits.

Move to row "integer" [of "hitlist"].
Set the current hitlist position, where "integer" is interpreted as:

    unsigned number ... absolute row number (1 is first row)
    zero .............. current row
    signed number ..... number relative to current row

Moves to the appropriate extreme value (first or last row) if out-of-range, e.g., Move to row 9999999... will move to the last row of most databases.

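The interpretation of "integer" above (unsigned is absolute, zero is the current row, signed is relative, out-of-range values clamp to the first or last row) can be sketched as follows. The function name `resolve_row` is hypothetical, and the sketch assumes the hitlist has at least one row and that the token is a well-formed integer.

```python
def resolve_row(token, current, nrows):
    """Resolve an MCL-style row token to a 1-origin row number (sketch).

    token   -- row text as written, e.g. "10", "0", "+3", "-2"
    current -- current 1-origin position in the hitlist
    nrows   -- number of rows in the hitlist (assumed >= 1)
    """
    token = token.strip()
    if token[0] in "+-":       # signed number: relative to current row
        target = current + int(token)
    elif int(token) == 0:      # zero: current row
        target = current
    else:                      # unsigned number: absolute row number
        target = int(token)
    # Out-of-range values move to the first or last row.
    return max(1, min(nrows, target))
```

For example, with the current position at row 5 of a 100-row hitlist, `resolve_row("+3", 5, 100)` gives 8 and `resolve_row("9999999", 5, 100)` gives 100.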
Remove row "integer" [of "hitlist"].
Removes row(s) from hitlist.

Remove "column" repeats [in "hitlist"].
Removes rows from hitlist with values in given column which are identical to the value in the previous row. Typically used to select the first row of each value in a sorted list.

Remove missing "column" [in "hitlist"].
Removes rows from hitlist for which data is not available in given column.

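Because Merlin sorts are stable, two sorts followed by "Remove ... repeats" keep the lowest-cost member of each cluster. The sketch below models that sequence on an in-memory list of rows; the sample data, the `ID` field, and the function names are made up for illustration, and Python's `sorted` stands in for the MCL Sort statement because it is also stable.

```python
_NO_PREVIOUS = object()  # sentinel: the first row has no previous row

# Made-up sample rows for illustration only.
rows = [
    {"ID": "a", "CLUSTER": 2, "COST": 30.0},
    {"ID": "b", "CLUSTER": 1, "COST": 12.5},
    {"ID": "c", "CLUSTER": 2, "COST": 9.0},
    {"ID": "d", "CLUSTER": 1, "COST": 40.0},
]


def remove_repeats(hits, column):
    """Drop rows whose value in `column` equals the previous row's value."""
    kept, previous = [], _NO_PREVIOUS
    for row in hits:
        if row[column] != previous:
            kept.append(row)
        previous = row[column]
    return kept


def lowest_cost_per_cluster(hits):
    by_cost = sorted(hits, key=lambda r: r["COST"])           # Sort by COST
    by_cluster = sorted(by_cost, key=lambda r: r["CLUSTER"])  # stable sort by CLUSTER
    return remove_repeats(by_cluster, "CLUSTER")              # Remove CLUSTER repeats
```

Note that `remove_repeats` only compares adjacent rows, which is why the list must be sorted by the same column first.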
Put "column" superstructures of "smiles" [into "hitlist"].
The classic "substructure search". The "Put" form replaces the given hitlist; "Add" and "Remove" modify it; "Move" sets the current position but leaves the hitlist unchanged. $SMI or ISM columns are typically used.
Put "column" structures matching "smarts" [into "hitlist"].
Search for a SMARTS pattern. The "Put" form replaces the given hitlist; "Add" and "Remove" modify it; "Move" resets the current position but leaves the hitlist unchanged. $SMI or ISM columns are typically used.
Put "column" substructures of "smiles" [into "hitlist"].
Converse of the classic "substructure" search, this one looks for structures embedded in the given molecule. The "Put" form replaces the given hitlist; "Add" and "Remove" modify it; "Move" resets the current position but leaves the hitlist unchanged. $SMI or ISM columns are typically used for "column".
Put "column" structures <op> "tan" similar to "smiles" [into "hitlist"].
Select (or find) structures based on similarity to a given molecule. "tan" is a number indicating Tanimoto similarity for qualification where 1.0 is perfect similarity (identity). These values are typically used:

    0.90 ...... very highly similar
    0.75 ...... highly similar
    0.60 ...... moderately similar
    0.50 ...... roughly similar
    0.00 ...... select all structures

$SMI or ISM columns are typically used for "column", e.g.,

    Add SMILES structures at least 0.9 similar to "NCCc1ccc(O)c(O)c1".

Note: a column of type SIMILARITY must exist to do these searches.

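For readers unfamiliar with the measure, Tanimoto similarity between two feature sets is the size of their intersection divided by the size of their union. The sketch below shows the arithmetic and how the "at least" and "less than" forms split a collection into complementary sets. It assumes fingerprints are represented as Python sets of feature numbers; how Merlin actually computes and stores fingerprints is not described in this document, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets: |a & b| / |a | b|."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / union


def split_by_similarity(fingerprints, query, threshold):
    """Partition names into ("at least", "less than") the threshold."""
    at_least = [n for n, fp in fingerprints.items() if tanimoto(fp, query) >= threshold]
    less_than = [n for n, fp in fingerprints.items() if tanimoto(fp, query) < threshold]
    return at_least, less_than
```

Every structure falls into exactly one of the two lists, which is why the two forms refer to complementary sets.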
Put "column" structures <op> "value" Tversky alpha beta to "smiles"
[into "hitlist"].
Select (or find) structures based on Tversky similarity to a given molecule using given alpha/beta parameters. Alpha and beta are typically in the range 0.0 - 1.0. "value" is a number indicating Tversky similarity; the meaning depends on specific alpha/beta settings but higher values are typically used than with Tanimoto similarity, e.g.,

    0.95 ...... very highly similar
    0.90 ...... highly similar
    0.85 ...... moderately similar
    0.80 ...... roughly similar
    0.00 ...... select all structures

$SMI or ISM columns are typically used for "column", e.g.,

    Add SMILES structures at least 0.9 similar Tversky alpha 0.9 beta 0.1 to "NCCc1ccc(O)c(O)c1".

Note: a column of type SIMILARITY must exist to do these searches.

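The Tversky index generalizes Tanimoto similarity by weighting the features unique to each side separately. The sketch below shows the standard formula on feature sets. Which side alpha applies to (here, the features found only in the query) is an assumption of this sketch, not something this document states, and the function name is hypothetical.

```python
def tversky(query, target, alpha, beta):
    """Tversky index: c / (c + alpha*|query - target| + beta*|target - query|).

    Assumption: alpha weights features unique to the query and beta weights
    features unique to the target; the document does not specify this.
    """
    common = len(query & target)
    denom = common + alpha * len(query - target) + beta * len(target - query)
    if denom == 0:
        return 1.0  # assumption: two empty sets are treated as identical
    return common / denom
```

With alpha = beta = 1 the value equals Tanimoto similarity; with unequal weights the measure is asymmetric, so swapping query and target generally changes the result.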
Put <tautomers | graphs> of "smiles" [into "hitlist"].
Select tautomers or graphs of a given molecule. A "graph" match discounts oxidation state completely, i.e., the molecules' heavy atoms are connected in the same way (ignoring hydrogens, bond orders, and charges). A "tautomer" match requires a graph match, and the net charge and hydrogen count must also match, i.e., it finds molecules which differ only in the positions of H's (protons) and electrons. (Labile hydrogens are not distinguished from non-labile ones.)
Put "column" strings containing[/cmp] "string"
[into "hitlist"].
... where cmp is one of:

    /ASC .. ASCII (default)       /ANCW ... ignore case and w/s
    /ANC .. ignore case           /ANCP ... ignore case and punct
    /ANW .. ignore whitespace     /ANPW ... ignore punct and w/s
    /ANP .. ignore punctuation    /ANCPW .. ignore case, punct, and w/s

Select (or find) strings containing a given substring. Various character classes can be ignored by appending an option to the keyword "containing". For instance:

    Put "NAME" strings containing/ANCPW "METHYLPYRROLE" into "curhits".

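One way to picture the comparison options is as a normalization applied to both strings before the substring test. The sketch below is a rough model only: it assumes the letters after "/AN" combine independently (C for case, P for punctuation, W for whitespace), and it uses Python's `string.punctuation` as the punctuation set, which may differ from what Merlin ignores. The function names and sample strings are hypothetical.

```python
import string


def normalize(text, flags):
    """Apply the ignore-classes named by `flags` (a subset of "CPW")."""
    if "C" in flags:  # ignore case
        text = text.upper()
    if "P" in flags:  # ignore punctuation (assumed: string.punctuation)
        text = "".join(ch for ch in text if ch not in string.punctuation)
    if "W" in flags:  # ignore whitespace
        text = "".join(text.split())
    return text


def contains(value, substring, cmp="/ASC"):
    """Substring test under a containing[/cmp] option (sketch)."""
    if cmp == "/ASC":
        flags = ""
    elif cmp.startswith("/AN") and set(cmp[3:]) <= set("CPW"):
        flags = cmp[3:]  # e.g. "/ANCPW" -> "CPW"
    else:
        raise ValueError(f"unsupported comparison option: {cmp}")
    return normalize(substring, flags) in normalize(value, flags)
```

Under this model, `contains("N-Methyl pyrrole", "METHYLPYRROLE", "/ANCPW")` is true, while the same test with the default `/ASC` is false.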
Put "column" strings matching "regexp" [into "hitlist"].
Select (or find) strings matching a regular expression.
Put "column" values in range "min" to "max"
[into "hitlist"].
Select (or find) values by range specification.
Sort[/cmp] "hitlist" by "column".
... where cmp is one of:

    /ASC .... ASCII                   /ANCPW .. ignore case, punct, w/s
    /ANC .... ignore case             /AAZ .... letters only, ignore case
    /NUM .... numeric                 /ANP .... ignore punctuation
    /NAB .... absolute numeric        /ANW .... ignore whitespace
    /CAS .... CAS Number compare      /ANCP ... ignore case and punct
    /ANCW ... ignore case, w/s        /MF ..... molecular formula
    /ANPW ... ignore punct, w/s       /LEN .... length of string

Sort hitlist by value of given column. If cmp is not specified, the "normal" comparison type for the column's datatype will be used.

Sort ["hitlist"] by nativeorder. Rearrange the given hitlist to original pool order.
Sort ["hitlist"] by similarity ["column"] [to "smiles"]. Sort hitlist by Tanimoto similarity to given "smiles", saving similarity values in similarity column "column", if specified. If "column" is omitted, the default similarity column will be used. If "smiles" is omitted, existing values in the column will be used.
Set font "fontname".
Special functions for interactive environments such as xvmerlin. "fontname" must be as specified in a FONT_ option (or "default"). "lpr" (lines per row) must be in the range 1-8.

Print status.
Print the status of the current environment: name the database, name all columns and show their datatypes, name all hitlists and show their length, indicate which hitlist is default and report the current hitlist position.

Print "string" ["string" ...]
Print string literally, followed by newline. If more than one string is supplied they are concatenated then output. Special characters such as newline and bell are copied to output literally. There is no way to print the NULL character to output.

Print <list | table> [of "hitlist"] from row "integer" to row "integer" [ containing "column" ["column" ...] ] [ entitled "string" ["string" ...] ]
... where integer row values are interpreted as:

    unsigned number ... absolute row number (1 is first row)
    zero .............. current row
    signed number ..... number relative to current row

If the "containing ..." phrase is specified, only those columns will be printed; if omitted, all columns will be printed.

Related Topics

mcl(1) xvmerlin(1) merlinserver(1)