Rubicon Manual

Daylight Version 4.9
Release Date 08/01/11

David Weininger
Daylight Chemical Information Systems, Inc.

Copyright notice

This document, the programs rubicon and autorules are copyrighted 1992-2011 by Daylight Chemical Information Systems, Inc. of Laguna Niguel, CA. Daylight explicitly grants permission to reproduce this document under the condition that it is reproduced in its entirety, including this notice. All other rights are reserved.

Appendix 1: References

Appendix 2: Notes about Rubicon
1. Introduction

"Rubicon" stands for "Rule-Based Invention of Conformations". Rubicon is a distance-geometry method which produces 3-D conformations given chemical structures with only connectivity specified. Distance-geometry methods randomly sample conformations and are particularly powerful for problems dealing with molecular matching and flexibility. The method has three basic parts: establishing geometric constraints in distance space, sampling a conformation in that space, and embedding it in three dimensions while minimizing bounds violations.

One of the fundamental difficulties with most distance geometry programs is that they have a very naive view of chemical geometry. Also, the chemical intelligence in typical distance-geometry programs is hard coded into the program and is very difficult to improve upon. Rubicon addresses this problem by employing a "soft", or rule-based method for establishing geometric constraints which is based on a powerful language for describing chemical patterns (SMARTS).

Rubicon's chemical intelligence is embodied by its rule set, which is specified at run time. Various rule sets can be devised for solving a wide variety of 3-dimensional chemical problems. The simplest approach is probably to employ a naive rule set. With appropriate rule sets, Rubicon will exactly mimic most existing distance geometry algorithms. At the other extreme, very sophisticated rule sets can be devised to predict molecular geometries. In this case, Rubicon behaves like a non-distance-geometry rule-based model builder, with the advantage that unspecified geometries are sampled (and also that additional knowledge may be added as needed without rebuilding the program).

The autorules program (supplied with Rubicon) automatically derives a set of constraint rules from a given set of training conformations. Using autorules-generated rule sets, Rubicon will generate structures that have geometries similar to those found in the training set, e.g. crystal structures, docked structures, computed low energy structures, your favorite structures, etc.

Using distance geometry to sample from all energetically reasonable conformations is a powerful idea, but it's not the only useful one. One of the most powerful uses of distance geometry is to sample in a very biased way, e.g. to only sample conformations of interest which match a desired pharmacophore or fit an enzyme's binding site.

To allow maximum flexibility, Rubicon is supplied in two forms: a ready-to-run program (rubicon) and a programming library (libdc_rube.a). The rubicon program is very robust and easy to use, but is limited to sampling conformations based on information from a rule set (e.g. only constraint errors are minimized). The libdc_rube.a programming library provides the Rubicon algorithm as a tool to be used within other programs. This allows sampling conformations based on other criteria, such as docking.

2. Rubicon Program

Rubicon is a non-interactive program which requires structures in SMILES (.tdt format) and a rule file (default file is $DY_ROOT/data/rubicon.rules). Output is in .tdt or .pdb format. Given one or more input structures in .tdt format and a desired rule file, Rubicon is a very straightforward program to run:

rubicon [options] < input.tdt > output.tdt

By default, Rubicon produces one conformation ($D3D data item) per SMILES-rooted datatree on the input. If you have SMILES in .smi format, you will need to edit it a bit, e.g. the line "CCO ethanol" needs to be changed to "$SMI<CCO>PCN<ethanol>|", or just "$SMI<CCO>|" if the name isn't important. The shell script $DY_ROOT/bin/smi2tdt is provided to make this transformation.
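The .smi to .tdt edit described above is mechanical, so it can be sketched in a few lines of C. This is only an illustration of the transformation, not the smi2tdt script itself; the function name smi_line_to_tdt is hypothetical, and the sketch assumes the name contains no '<', '>' or '|' characters.

```c
#include <stdio.h>
#include <string.h>

/* Convert one .smi line ("SMILES [name]") into a one-line TDT record.
 * Returns 0 on success, -1 if the line is empty or the buffer is too small.
 * Assumption: the name needs no quoting (no '<', '>' or '|' in it). */
int smi_line_to_tdt(const char *line, char *out, size_t outsz)
{
    size_t smi_len, name_len;
    const char *name;
    int n;

    line += strspn(line, " \t");             /* skip leading blanks */
    smi_len = strcspn(line, " \t\r\n");      /* SMILES is the first token */
    if (smi_len == 0)
        return -1;

    name = line + smi_len;
    name += strspn(name, " \t");             /* skip the separator */
    name_len = strcspn(name, "\r\n");        /* rest of the line is the name */
    while (name_len > 0 &&
           (name[name_len - 1] == ' ' || name[name_len - 1] == '\t'))
        name_len--;                          /* trim trailing blanks */

    if (name_len > 0)
        n = snprintf(out, outsz, "$SMI<%.*s>PCN<%.*s>|",
                     (int)smi_len, line, (int)name_len, name);
    else
        n = snprintf(out, outsz, "$SMI<%.*s>|", (int)smi_len, line);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Called on "CCO ethanol" this fills the buffer with the record shown in the text; a line with no name yields the shorter "$SMI<CCO>|" form.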

2.1 Rubicon help

Rubicon provides a fairly large number of options, some of which exist to help you manage the rest. A summary of options can be printed with the "-options" option:

$ rubicon -options

    NOTE: ========================================================== (rubicon)
    NOTE: == RUBICON ================= 1992-2011 (c) DAYLIGHT CIS == (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE:                                                            (rubicon)
    NOTE: Rubicon program options are:                               (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -HELP or -help                                           (rubicon)
    NOTE:      Show syntax and help instructions                     (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -OPTIONS or -options                                     (rubicon)
    NOTE:      Show options summary (this info)                      (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -SETTINGS or -settings                                   (rubicon)
    NOTE:      Show options settings in effect                       (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_ACCEPT_GRMS                                  (rubicon)
    NOTE:      Acceptable RMS of gradient vector for convergence     (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_ACCEPT_MXDV                                  (rubicon)
    NOTE:      Acceptable maximum distance violation, Angstroms      (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_ACCEPT_MXVV                                  (rubicon)
    NOTE:      Acceptable maximum volume violation, cubic Angstroms  (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_ACCURACY                                      (rubicon)
    NOTE:      Effective machine accuracy                            (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -[NO]RUBE_BUMP14                                         (rubicon)
    NOTE:      (Don't) apply VDW bumping to nonbonded 1-4 distances  (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_DEBUG                                        (rubicon)
    NOTE:      QUIET ..... no output on standard error               (rubicon)
    NOTE:      TERSE ..... only show progress                        (rubicon)
    NOTE:      ERRORS .... also show error messages                  (rubicon)
    NOTE:      WARNINGS .. also show warning messages                (rubicon)
    NOTE:      NOTES ..... also show notes                           (rubicon)
    NOTE:      DEBUG ..... also show debugging output                (rubicon)
    NOTE:      VERBOSE ... also show all debugging output            (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -[NO]RUBE_FILTER_CHIRAL                                  (rubicon)
    NOTE:      (Don't) enforce chirality in conformation filter.     (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_FILTER_SLOP                             (rubicon)
    NOTE:      Minimum distance between equivalent atoms to consider (rubicon)
    NOTE:      conformations non-identical, Angstroms (off if 0.0).  (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_FILTER_SMARTS                    (rubicon)
    NOTE:      SMARTS used for conformation filter atom matching.    (rubicon)
    NOTE:      If "USMILES", unique SMILES is used as target.        (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_HYDROGENS                           (rubicon)
    NOTE:      ALL ....... include all hydrogens                     (rubicon)
    NOTE:      SOME ...... include specified hydrogens               (rubicon)
    NOTE:      NONE ...... suppress all normal hydrogens             (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_LIMITEVAL                                 (rubicon)
    NOTE:      Limit function evaluations allowed per minimization   (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_NCONFS                                     (rubicon)
    NOTE:      Number of conformations generated for each molecule   (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_OUTPUT_FORMAT                             (rubicon)
    NOTE:      TDT ....... Write output in TDT format                (rubicon)
    NOTE:      PDB ....... write output in PDB format                (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_RULES                                   (rubicon)
    NOTE:      Name(s) of Rubicon rule file(s)                       (rubicon)
    NOTE:      Rubicon rule file names typically end .rules          (rubicon)
    NOTE:      Quote if multiple, e.g. -RUBE_RULES "a.rules b.rules" (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_RUNID                                  (rubicon)
    NOTE:      Add $D3DG generation item with runid to TDT output.   (rubicon)
    NOTE:      If set to "NONE", $D3DG item will not be output.      (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_SEED                                         (rubicon)
    NOTE:      Random number seed in range 0 to 900000000            (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -RUBE_TRIALS                                       (rubicon)
    NOTE:      Number of trials per conformation                     (rubicon)
    NOTE:                                                            (rubicon)
    NOTE:   -[NO]RUBE_WRITE_BOUNDS                                   (rubicon)
    NOTE:      Add smoothed distance bounds matrix and list of       (rubicon)
    NOTE:      volume bounds to output (TDT output only).            (rubicon)
    NOTE:                                                            (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE: == RUBICON ================= 1992-2011 (c) DAYLIGHT CIS == (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE: -options request, exiting (rubicon)

Commonly used options can be set in your environment (prefix the option name with "DY_") or in your profile (typically $HOME/dy_profile.opt). For a complete discussion of options and environment variables, see the Daylight Systems Administration Manual.

To see the current option settings (including the effect of any command line arguments) enter

rubicon -settings

which will produce something like:

$ rubicon -settings
    NOTE: ========================================================== (rubicon)
    NOTE: == RUBICON ================= 1992-2011 (c) DAYLIGHT CIS == (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE:                                                            (rubicon)
    NOTE: Current values of Rubicon-specific options are:            (rubicon)
    NOTE:                                                            (rubicon)
    NOTE: RUBE_ACCEPT_GRMS .. 0.01                                   (rubicon)
    NOTE: RUBE_ACCEPT_MXDV .. 0.5                                    (rubicon)
    NOTE: RUBE_ACCEPT_MXVV .. 0.5                                    (rubicon)
    NOTE: RUBE_ACCURACY ..... 1e-20                                  (rubicon)
    NOTE: RUBE_BUMP14 ....... FALSE                                  (rubicon)
    NOTE: RUBE_DEBUG ........ QUIET                                  (rubicon)
    NOTE: RUBE_FILTER_CHIRAL  TRUE                                   (rubicon)
    NOTE: RUBE_FILTER_SLOP .. 0                                      (rubicon)
    NOTE: RUBE_FILTER_SMARTS  USMILES                                (rubicon)
    NOTE: RUBE_HYDROGENS .... ALL                                    (rubicon)
    NOTE: RUBE_LIMITEVAL .... 1000                                   (rubicon)
    NOTE: RUBE_NCONFS ....... 1                                      (rubicon)
    NOTE: RUBE_OUTPUT_FORMAT  TDT                                    (rubicon)
    NOTE: RUBE_RULES ........ $DY_ROOT/data/rubicon.rules            (rubicon)
    NOTE: RUBE_RUNID ........ NONE                                   (rubicon)
    NOTE: RUBE_SEED ......... 281191802                              (rubicon)
    NOTE: RUBE_TRIALS ....... 1                                      (rubicon)
    NOTE: RUBE_WRITE_BOUNDS . FALSE                                  (rubicon)
    NOTE:                                                            (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE: == RUBICON ================= 1992-2011 (c) DAYLIGHT CIS == (rubicon)
    NOTE: ========================================================== (rubicon)
    NOTE: -settings request, exiting (rubicon)

2.2 Rubicon program operation options

RUBE_RULES (default $DY_ROOT/data/rubicon.rules)

Specify the desired Rubicon rule file with -RUBE_RULES.

RUBE_TRIALS (default 1)
RUBE_NCONFS (default 1)

Rubicon attempts to produce RUBE_NCONFS acceptable conformations by trying RUBE_TRIALS random samplings. Those which meet the acceptance criteria (see RUBE_ACCEPT_* options) are passed through the conformation filter (see RUBE_FILTER_* options) for output. The default values cause Rubicon to try just once.

RUBE_ACCEPT_GRMS (default 0.01)
RUBE_ACCEPT_MXDV (default 0.50 Ångstroms)
RUBE_ACCEPT_MXVV (default 0.50 cubic Ångstroms)

Consider conformations to be acceptable if they converge to gradient root-mean-square RUBE_ACCEPT_GRMS, maximum distance violation RUBE_ACCEPT_MXDV, and maximum volume violation RUBE_ACCEPT_MXVV. The default values are somewhat generous for very simple structures and somewhat strict for peptides.
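The acceptance test amounts to three threshold checks that must all pass. A minimal sketch follows; the struct and function names are hypothetical, and treating the limits as inclusive is an assumption, since the text does not say.

```c
/* Acceptance thresholds; the fields mirror the RUBE_ACCEPT_* options. */
typedef struct {
    double grms;   /* RUBE_ACCEPT_GRMS: RMS of the gradient vector */
    double mxdv;   /* RUBE_ACCEPT_MXDV: max distance violation, Angstroms */
    double mxvv;   /* RUBE_ACCEPT_MXVV: max volume violation, cubic Angstroms */
} AcceptLimits;

/* Returns 1 if all three measures are within their limits, else 0.
 * Assumption: a value exactly equal to its limit is accepted. */
int conformation_acceptable(const AcceptLimits *lim,
                            double grms, double mxdv, double mxvv)
{
    return grms <= lim->grms && mxdv <= lim->mxdv && mxvv <= lim->mxvv;
}
```

With the default limits (0.01, 0.5, 0.5), a trial that converges well but leaves a 0.7 Angstrom distance violation would be rejected.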

RUBE_FILTER_SLOP (default 0 Ångstroms)
RUBE_FILTER_CHIRAL (default TRUE)
RUBE_FILTER_SMARTS (default "USMILES")

Conformations are passed through a filter which suppresses output of identical conformations by comparing their distance matrices. Conformations which have an isomorph with all interatomic distances within RUBE_FILTER_SLOP Ångstroms are considered to be identical. Isomorphs (atom-atom matchings) are determined by RUBE_FILTER_SMARTS - the flag USMILES (the default value) uses the unique SMILES (i.e., all heavy atoms). This can be set to any valid SMARTS, e.g. "a" will compare only distances between aromatic atoms and "[!#6!#1]" will compare only heteroatoms. If RUBE_FILTER_CHIRAL is TRUE, signed chiral volumes are also compared so enantiomers will be considered non-identical. This filter is disabled by default (RUBE_FILTER_SLOP is 0), so all acceptable conformations are output.
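The distance-matrix comparison at the heart of the filter can be sketched as follows. This covers a single atom-atom matching only and leaves out the chiral volume check; a full filter would repeat it for every isomorph. The function name same_conformation is hypothetical.

```c
#include <stddef.h>

/* d1 and d2 are n*n row-major interatomic distance matrices for the matched
 * atoms of two conformations, listed in the same (isomorph) order.
 * Returns 1 if every pair of distances agrees to within slop Angstroms.
 * A slop of 0 disables the filter, so nothing is reported as identical. */
int same_conformation(const double *d1, const double *d2, size_t n, double slop)
{
    size_t i, j;

    if (slop <= 0.0)
        return 0;
    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            double diff = d1[i * n + j] - d2[i * n + j];
            if (diff < 0.0)
                diff = -diff;
            if (diff > slop)
                return 0;       /* one pair differs: not identical */
        }
    }
    return 1;
}
```

Because only distances are compared, mirror images pass this test unchanged, which is why a separate signed-volume comparison is needed to tell enantiomers apart.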

RUBE_HYDROGENS (default ALL)

NONE is much faster
ALL produces slightly better structures
SOME isn't very useful from this program.

RUBE_OUTPUT_FORMAT (default TDT)

The only alternative to TDT is the venerable PDB format, which lots of programs read and write in different flavors. Rubicon's PDB output is pretty simple, and intended for output of small molecules:

REMARK Several lines of them (contain SMILES, name, source, errors)

ATOM Atom names are upper case atomic symbols (e.g. CA is calcium, you have a problem with that?) followed by ordinal per-element count up to "99" then "**". Residue names are all "RES"; residue numbers all 1.

TER One TER record output between ATOM and CONECT records

CONECT One per atom, bonds listed both ways, double bonds are double on the line, triple bonds three times, "to" atoms are not sorted on the line.

END Separates conformations
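The atom naming scheme described for ATOM records can be sketched as below. The function name pdb_atom_name is hypothetical, and the sketch makes no attempt at PDB column alignment or padding, which the text does not specify.

```c
#include <ctype.h>
#include <stdio.h>

/* Build an atom name from an element symbol and its 1-based per-element
 * count: upper-case symbol followed by the count, or "**" above 99.
 * Assumption: no padding or column alignment is applied here. */
void pdb_atom_name(const char *symbol, int count, char *out, size_t outsz)
{
    char sym[4];
    size_t i;

    for (i = 0; i < sizeof sym - 1 && symbol[i] != '\0'; i++)
        sym[i] = (char)toupper((unsigned char)symbol[i]);
    sym[i] = '\0';

    if (count <= 99)
        snprintf(out, outsz, "%s%d", sym, count);
    else
        snprintf(out, outsz, "%s**", sym);
}
```

Under this scheme the first calcium atom is named CA1, and the hundredth carbon and every carbon after it share the name C**.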

If this flavor of PDB is not suitable for your purposes, consider writing TDT output and converting it to PDB format with the program tdt2pdb (a program with contributed source code which can be modified as desired).

RUBE_SEED (default 281191802)

Rubicon uses a pseudo-random number generator which provides identical behavior on all platforms (RANMAR, Marsaglia and Zaman, 1987). RUBE_SEED sets the seed for this number generator, which must be in the range 0 to 900000000. Given otherwise identical input, Rubicon should produce identical results for a given seed on all supported platforms.

NOTE: To induce Rubicon to produce different pseudorandom trials at each invocation, specify a different value for RUBE_SEED each time it is called, e.g. the time of day written as HHMMSS:

rubicon -RUBE_SEED `date +%H%M%S`

RUBE_RUNID (default NONE)

A "runid" is a string which is appended to the output source field. This is particularly useful if you are making multiple Rubicon runs on a given structure with varying parameters and loading them in a single database.

2.3 Rubicon algorithm operation options

You will probably not need to modify options pertaining to the Rubicon method and minimizers (see the "Rubicon library" section below for more detail on the method). This set of options has been simplified since the first release (v4.31) of Rubicon.

RUBE_ACCURACY (default 1e-20)

This value works well on all machines for which v4.3x software is distributed. This option is insurance against the possibility that exotic, yet compatible, computer hardware might appear.

RUBE_BUMP14 (default is TRUE)

If set TRUE, van der Waals interactions are applied to acyclic 1-4 distances, restricting minimum torsions, which is usually a good thing to do.

RUBE_LIMITEVAL (default is 1000)

Rubicon's minimizer typically converges with 50-300 function evaluations when things are going well. If you are working with really tough structures which fail to converge a lot (either the structures or the rule file would need to be pretty weird), you might try to gain speed by running a larger number of trials with a lower evaluation limit and just save the ones that converge.

2.4 Rubicon options you probably don't want to know about

RUBE_DEBUG (default QUIET)

Mainly for debugging, this option controls output of non-essential text to standard error. The amount of output generated at the "VERBOSE" level is truly staggering (all N-square bounds matrices are dumped multiple times).

RUBE_WRITE_BOUNDS (default FALSE)

When set TRUE, the complete smoothed distance bounds matrix is added to the output as a DBM data item and the list of chiral and unsigned volumes as a VCR data item. This option is available only with TDT output format. The bounds matrix is N-squared, so use it sparingly on large molecules, particularly if hydrogens are included. If you would like to use Rubicon's front-end with your own distance geometry package (i.e. output the DBM only), try to organize it as a filter.

The DBM ("Distance Bounds Matrix") datatype consists of two fields, the bounds matrix and the name of the data source (e.g. "Rubicon 4.34"). The bounds matrix field contains the full bounds matrix after smoothing, as comma-delimited numbers, with atoms indexed in SMILES order, and with lower bounds in the lower triangle.
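Once the comma-delimited field has been read into an array, looking up the bounds for an atom pair is a matter of picking the right triangle. A minimal sketch, with hypothetical function names; it assumes the upper triangle holds the upper bounds, which the text implies but does not state.

```c
#include <stddef.h>

/* m is an n*n row-major smoothed bounds matrix indexed in SMILES atom order.
 * Lower bounds sit in the lower triangle (row > column).
 * Assumption: the upper triangle holds the matching upper bounds. */
double dbm_lower(const double *m, size_t n, size_t i, size_t j)
{
    size_t row = i > j ? i : j;
    size_t col = i > j ? j : i;
    return m[row * n + col];
}

double dbm_upper(const double *m, size_t n, size_t i, size_t j)
{
    size_t row = i < j ? i : j;
    size_t col = i < j ? j : i;
    return m[row * n + col];
}
```

Both accessors accept the atom indices in either order, so callers do not need to sort them first.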

The VCR (Volume Constraint Rules) datatype also consists of two fields, the volume constraints which apply to the molecule and the name of the data source (e.g. "Rubicon 4.34"). The volume constraint field contains six comma-delimited numbers for each constraint: the index of four atoms defining the volume followed by the upper and lower volume bounds (in cubic angstroms). Chiral (signed) volume constraints are indicated by signed bounds (e.g., "+0.0,+5.0"); unsigned volume constraints are indicated by unsigned bounds (e.g. "0.0,5.0"). No VCR data item is output if no volume constraints apply.

3. Rubicon Rules

Rubicon constraint rules are in this general form:

RULENAME smarts (atomlist) bounds

For instance, the following rule sets the bond length bounds for bonds between all aromatic carbon and aromatic oxygen atoms to the range (1.324 - 1.402) angstroms, inclusive:

DISTANCE c:o (1,2) 1.324 1.402

Rubicon rule sets do not use conditional logic in any way - a rule is asserted to be true in every environment which it matches. This allows knowledge (rule sets) to be combined from various sources without needing to express them within a potentially restrictive logical framework; it also implies that results are not affected by the order in which they are applied. (For example, the rules illustrated here were taken from publications based on CSDB.) Rules may be expressed at various degrees of generality, e.g. the above rule is more general (applies to all aromatic carbon-oxygen bonds) than the first of the following rules, which apply to furan substructures:

DISTANCE o1cccc1 (1,2) 1.338 1.398
DISTANCE o1cccc1 (2,3) 1.299 1.383
DISTANCE o1cccc1 (3,4) 1.392 1.456

The most restrictive bound that matches applies. Upper and lower bounds are applied independently. Note that the atom list (3rd field) indicates to which SMARTS atoms the bounds apply, e.g., in the last rule above, the C3-C4 furan carbon bond length bounds are (1.392-1.456), which differs entirely from the range of the furan C2-C3 bond length.
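The "most restrictive bound applies" rule, with upper and lower bounds handled independently, can be sketched as a simple merge of two matching rules. The names Bounds and tighten are hypothetical.

```c
/* A lower and upper bound on one geometric quantity. */
typedef struct {
    double lower;
    double upper;
} Bounds;

/* Combine two matching rules: the larger lower bound and the smaller upper
 * bound win, each chosen independently of the other. */
Bounds tighten(Bounds a, Bounds b)
{
    Bounds r;
    r.lower = a.lower > b.lower ? a.lower : b.lower;
    r.upper = a.upper < b.upper ? a.upper : b.upper;
    return r;
}
```

For the furan O-C bond, merging the general c:o rule (1.324, 1.402) with the furan rule (1.338, 1.398) gives (1.338, 1.398). Since each side is chosen independently, the result can take its lower bound from one rule and its upper bound from another; the order of merging does not matter.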

3.1 Rubicon rule file syntax other than for rules

The general format of a Rubicon rule file and commands other than constraint rules are discussed in this section.

COMMENTS. Exclamation point (!) is used for comments: `!' and all characters following on the line are ignored.

WHITESPACE. Tokens ("words") in a Rubicon rule file are delimited by one or more whitespace characters (space, tab, newline).
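A tokenizer for these two conventions might look like the sketch below. One wrinkle: SMARTS also uses '!' as its NOT operator (e.g. [NX3;!$Nplanar]), so this sketch treats '!' as a comment only when it begins a token. That reading is an assumption, and the real Rubicon parser may decide differently. The function name count_tokens is hypothetical.

```c
#include <string.h>

/* Count the whitespace-delimited tokens on a rule-file line, stopping at a
 * comment. Assumption: '!' starts a comment only at the start of a token,
 * so a '!' inside a SMARTS token is left alone. */
int count_tokens(const char *line)
{
    int count = 0;
    const char *p = line;

    for (;;) {
        p += strspn(p, " \t\r\n");       /* skip whitespace */
        if (*p == '\0' || *p == '!')     /* end of line or comment */
            break;
        count++;
        p += strcspn(p, " \t\r\n");      /* skip over the token */
    }
    return count;
}
```

A DISTANCE rule with a trailing comment yields five tokens (command, SMARTS, atom list, and two bounds), and a comment-only line yields none.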

COMMANDS. Command names are case-insensitive reserved words which must begin at the start of a line. They are:

DEFINE PRAGMA RADIUS DISTANCE ANGLE TORSION VOLUME BOUNDS PAIRBOUNDS TRIPLEBOUNDS QUADBOUNDS BRANCHBOUNDS RINGBOUNDS

PRAGMA. The PRAGMA command is used to provide information to the Rubicon processor which does not affect the interpretation of the constraint rules. (Rubicon processors are free to ignore PRAGMAs; they're like hints.) PRAGMA commands consist of the command PRAGMA followed by a name and a value. Two PRAGMAs are used by the 4.3 Rubicon processor: HYDROGENS (values are HCOMPLETE and HSUPPRESSED) and AUTORULES (values indicate which rule classes are automatically generated). For instance, consider the line:

PRAGMA HYDROGENS HCOMPLETE

This tells Rubicon that this rule set is intended to be used with hydrogen-complete molecules (H-complete and H-suppressed van der Waals radii differ). Rubicon v4.3 will generate a warning if this rule file is used on a hydrogen-suppressed molecule.

DEFINE. The DEFINE command creates a SMARTS vector definition which can be used as a primitive in SMARTS later in the file. The format is:

DEFINE name smarts

Use of SMARTS vector definitions can greatly improve the readability of a Rubicon rule file and make it easier to maintain. This is illustrated by this excerpt:

DEFINE $Namide  [NX3]C=*          ! amide and other semi-conjugated N
DEFINE $Nitro   [NX3](=*)=*       ! nitro nitrogen
DEFINE $Nplanar [$Namide,$Nitro]  ! planar nitrogen
DEFINE $N34     [NX3;!$Nplanar]   ! 3-connected, tetrahedral N
....
!
! *=C-[N,n] -- note peptide bond is 1.33
!
DISTANCE [CX3]-[$N34]      (1,2) 1.41 1.41
DISTANCE [CX3]-[$Nplanar]  (1,2) 1.33 1.35
DISTANCE [CX3]-[$Npeptide] (1,2) 1.33 1.33
DISTANCE [CX3]-n           (1,2) 1.34 1.34
DISTANCE [CX2]-[NX2]       (1,2) 1.49 1.49

That's it! Everything else in a Rubicon rule file is a constraint rule.

3.2 Rubicon single-constraint rules

There are five rules which specify single constraints: RADIUS, DISTANCE, ANGLE, TORSION, and VOLUME. Each rule provides two constraints (upper and lower bound) except for RADIUS (upper radius bounds are ignored). The chief difference between these rules is how many points are specified: RADIUS (1), DISTANCE (2), ANGLE (3), TORSION (4), and VOLUME (4). The VOLUME rule is special in two ways: the units are in Ångstroms-cubed, and if all four points are connected to a common atom with specified chirality, the rule is interpreted as a signed volume (using right-hand rule).

The bounds specification in a single constraint rule is a single pair of numbers representing the lower and upper bounds, in that order:

RULENAME smarts (atomlist) min max

Some examples follow:

RADIUS   [#1]          (1)       0.90   0.90    ! proton VDW
DISTANCE [CX3]=[NX2]   (1,2)     1.316  1.316   ! imine bond
ANGLE    *1~*~*~1      (1,2,3)   57.00  63.00   ! general 3-member ring
ANGLE    *~[SD6]~*     (1,2,3)   84.0   90.0    ! octahedral sulfur
TORSION  */*=*\*       (1,2,3,4) 0.00   3.00    ! cis torsion
TORSION  */*=*/*       (1,2,3,4) 177.00 180.00  ! trans torsion
VOLUME   **=**         (1,2,3,4) 0.00   0.00    ! flatten conjugations
VOLUME   aaa*          (1,2,3,4) 0.00   0.00    ! flatten aromatic subst
VOLUME   *[C@@](*)(*)* (1,3,4,5) -50.80 -3.80   ! note signed volume
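To see why a signed volume can encode chirality while a zero volume forces planarity, consider the signed triple product of four points. The sketch below is illustrative only: the function name is hypothetical, and this manual does not state how Rubicon normalizes its volumes (dividing the triple product by 6 gives the tetrahedron volume) or which point order it treats as positive.

```c
/* Signed triple product (a-d) . ((b-d) x (c-d)) of four points.
 * Swapping any two points flips the sign; four coplanar points give 0.
 * Assumption: no 1/6 normalization is applied here. */
double signed_triple_product(const double a[3], const double b[3],
                             const double c[3], const double d[3])
{
    double u[3], v[3], w[3];
    int k;

    for (k = 0; k < 3; k++) {
        u[k] = a[k] - d[k];
        v[k] = b[k] - d[k];
        w[k] = c[k] - d[k];
    }
    return u[0] * (v[1] * w[2] - v[2] * w[1])
         + u[1] * (v[2] * w[0] - v[0] * w[2])
         + u[2] * (v[0] * w[1] - v[1] * w[0]);
}
```

A rule with bounds 0.00 0.00 therefore pins the four matched atoms into a plane, while a rule whose bounds are both negative (as in the last example above) admits only one handedness.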

3.3 Rubicon BOUNDS rule

The BOUNDS rule allows any number of distance constraints to be specified simultaneously by allowing the bounds to be specified as a bounds sub-matrix. After matching against the molecule, each cell is applied independently. The bounds specification in a BOUNDS rule is a square distance submatrix with rows and columns identified by the atom list:

BOUNDS smarts (atomlist) submatrix

This example specifies an aromatic iodo-substituent environment:

BOUNDS [ID1]-!@[cD3]:[cD3] (1,2,3)
    0.00 2.07 3.11
    1.38 0.00 2.07
    2.99 1.38 0.00

BOUNDS rules are especially good for specifying long-range constraints in large patterns, such as steroids and other ring systems (such rules are not typically generated by hand, however).

3.4 Automatic bounds rules

Realistic high-quality (low tolerance) rules must be quite specific to a particular molecular environment. One problem with generating automatic rules of this type is that you get lots of them (tens of thousands), and the CPU time required to match rule definition substructures becomes prohibitive. It is possible to create patterns (SMARTS) in a systematic way that allows unique SMARTS to be generated for particular classes of substructures. Given a molecule, the substructures that rules apply to can be extracted and "hashed" in memory, a very fast process. After this is done for a training set, the amount of time needed to find rules that apply to a given molecule is no longer dependent on the number of rules in the database, only the number of rules that apply to that molecule.

The key to building hashable rules is to specify environments which are both unambiguous and geometrically meaningful. For this purpose, Rubicon uses the following attributes to characterize an atom: atomic number, aromaticity, total connectivity (X), valence (v), and the size of the smallest ring that it is a member of. Subgraphs that are exhaustively generated include: bonded pairs, triples, and quads, 3- and 4-way branches, and all SSSR rings.
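The per-atom part of such a hashable name can be sketched directly from the attribute list above. The function name atom_key is hypothetical, and the exact spelling Rubicon uses is inferred from the automatically generated rules shown in this section (for instance [nX3v5r6]), not from a stated specification.

```c
#include <ctype.h>
#include <stdio.h>

/* Write an atom environment key of the form [<symbol>X<x>v<v>r<r>].
 * symbol: element symbol; aromatic atoms are written in lower case.
 * x: total connectivity, v: valence, r: smallest ring size (0 if acyclic).
 * Returns the value returned by snprintf (the key length on success). */
int atom_key(const char *symbol, int aromatic, int x, int v, int r,
             char *out, size_t outsz)
{
    char sym[4];
    size_t i;

    for (i = 0; i < sizeof sym - 1 && symbol[i] != '\0'; i++)
        sym[i] = aromatic ? (char)tolower((unsigned char)symbol[i])
                          : symbol[i];
    sym[i] = '\0';

    return snprintf(out, outsz, "[%sX%dv%dr%d]", sym, x, v, r);
}
```

Because the same atom always produces the same key, keys for pairs, triples, branches and rings can be concatenated into a rule name and looked up in a hash table instead of being matched as general SMARTS.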

Special rule types are used to store such data: PAIRBOUNDS, TRIPLEBOUNDS, QUADBOUNDS, BRANCHBOUNDS, and RINGBOUNDS rules differ from normal BOUNDS rules only in how they are accessed. Instead of looping through all rules looking for one that matches the molecule of interest (as must be done with general BOUNDS rules), rule names are generated from the molecule and looked up directly. For instance, molecules which have the O=n(c)c pyridine-N-oxide moiety would generate and match the following automatically generated rule:

BRANCHBOUNDS [nX3v5r6](=[OX1v2r0])(:[cX3v4r6]):[cX3v4r6] (1,2,3,4)
    0.000 1.311 1.392 1.392
    1.291 0.000 2.340 2.340
    1.370 2.310 0.000 2.408
    1.370 2.310 2.376 0.000

The net effect of this is pleasantly surprising: operating with a set of specific rules which is large enough to be reasonably comprehensive (50,000+), the time to find all rules which apply to drug-size molecules is about 1 sec.

4. Autorules Program

The autorules program reads conformations as input and writes a rule file as output. Autorules calling syntax is:

autorules [options] < input.tdt > output.rules

There are only two options:

-u produce (unsigned) volume rules. Unsigned volumes of branches and quads provide additional constraints, but add considerably to the size of the rule file (and much of the information is redundant). The default is not to produce unsigned volume rules.

-v (verbose) add comments to the output file showing SMILES of structures resulting in extremal limits. This option increases the size of the rule file and is useful if you are going to examine it manually. Default is not to produce extremal SMILES as comments.

To run autorules, you will need a TDT file containing desired training conformations as $D3D data items. Autorules uses every conformation in the input file, so limit input to the conformations that you want to train on. Remember: garbage in, garbage out! Trees with isomeric SMILES produce isomeric rules. Autorules has a built-in "slop" parameter of 0.0005 Å, i.e., bounds within 0.0005 Å are considered identical.

Autorules writes progress to standard error, indicating how many rules of each type have been created. An example run follows:

autorules -v < /home2/tdt/demo.tdt > demo.rules

Unsigned volume rules .... OFF
Extremal SMILES output ... ON
10 tdts: 44 PAIR 101 TRIP 163 QUAD 55 BRAN 7 RING 370 total
20 tdts: 80 PAIR 209 TRIP 372 QUAD 117 BRAN 19 RING 797 total
30 tdts: 102 PAIR 287 TRIP 535 QUAD 170 BRAN 27 RING 1121 total
40 tdts: 130 PAIR 374 TRIP 709 QUAD 220 BRAN 39 RING 1472 total
50 tdts: 139 PAIR 416 TRIP 816 QUAD 248 BRAN 44 RING 1663 total
60 tdts: 147 PAIR 444 TRIP 890 QUAD 275 BRAN 51 RING 1807 total
70 tdts: 152 PAIR 459 TRIP 933 QUAD 287 BRAN 53 RING 1884 total
80 tdts: 162 PAIR 499 TRIP 1042 QUAD 313 BRAN 61 RING 2077 total
90 tdts: 165 PAIR 510 TRIP 1069 QUAD 327 BRAN 64 RING 2135 total
100 tdts: 176 PAIR 536 TRIP 1123 QUAD 343 BRAN 66 RING 2244 total
...
4800 tdts: 556 PAIR 2285 TRIP 6156 QUAD 1969 BRAN 418 RING 11384 total
4810 tdts: 556 PAIR 2286 TRIP 6160 QUAD 1970 BRAN 418 RING 11390 total
4820 tdts: 556 PAIR 2286 TRIP 6160 QUAD 1971 BRAN 418 RING 11391 total
4830 tdts: 556 PAIR 2286 TRIP 6161 QUAD 1971 BRAN 418 RING 11392 total
4840 tdts: 556 PAIR 2286 TRIP 6161 QUAD 1973 BRAN 419 RING 11395 total
4850 tdts: 557 PAIR 2290 TRIP 6172 QUAD 1976 BRAN 419 RING 11414 total
4860 tdts: 557 PAIR 2292 TRIP 6176 QUAD 1978 BRAN 419 RING 11422 total
4870 tdts: 557 PAIR 2292 TRIP 6176 QUAD 1978 BRAN 419 RING 11422 total

Rules produced by autorules are in the form described in the previous section, "Automatic bounds rules". A rule produced by the above run:

!
BRANCHBOUNDS [CX3v4r0](=[OX1v2r0])(-[NX3v3r0])-[NX3v3r0] (1,2,3,4)
    0.000 1.260 1.341 1.341
    1.239 0.000 2.244 2.244
    1.320 2.224 0.000 2.314
    1.320 2.224 2.293 0.000
!
! from 127 examples (260 isomorphs) of above, max range: 0.021
! 1-3 min: CN(C)C(=O)Nc1ccc(Cl)c(Cl)c1
! 1-3 max: CN(N=O)C(=O)Nc1ccc(F)cc1
!

The comment trailing the rule is typical of those produced by the -v option.
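The "min" and "max" comments suggest that each bound is the extreme value observed across the training examples. Assuming that reading is right, the bookkeeping for one matrix cell can be sketched as a running minimum and maximum; the names Extremes and extremes_add are hypothetical, and this is not presented as the actual autorules implementation.

```c
/* Running extremes of one geometric measurement across training examples. */
typedef struct {
    double min;
    double max;
    int    count;   /* number of examples folded in so far */
} Extremes;

/* Fold one observed value into the running minimum and maximum. */
void extremes_add(Extremes *e, double value)
{
    if (e->count == 0 || value < e->min)
        e->min = value;
    if (e->count == 0 || value > e->max)
        e->max = value;
    e->count++;
}
```

Start from a zero-initialized struct and call extremes_add once per matching substructure; the final min and max become the lower and upper bounds for that cell.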

Other rule files can be appended to those produced by autorules to produce a combined rule file for Rubicon. Autorules-generated files should not be concatenated to each other.

5. Rubicon Library: libdc_rube.a

The Rubicon programming library provides access to the Rubicon method from your own program. The fundamental Rubicon function is dc_rube_conformations() which produces a sequence of conformation objects from a molecule object.

The following code fragments suffice to access Rubicon with the default method and the default rule file:

#include "dt_smiles.h"
#include "dt_depict.h"
#include "dc_rubicon.h"
...
dt_Handle mol, conf, confseq;
void *rubemethod;
....
rubemethod = dc_rube_alloc_method();    /* get the default method */
dc_rube_set_trials(rubemethod, 10);     /* set the number of trials */
....
confseq = dc_rube_conformations(mol, rubemethod);
while (NULL_OB != (conf = dt_next(confseq)))
    ... process conformation ...

Rubicon is very usable at this simple level. Most other functions in the Rubicon library are provided for the purpose of customizing the method. If you just want to sample conformations of a molecule, you probably don't need to digest all the gruesome detail that follows. For such purposes, skip down to the sections marked "Rubicon method control attributes" and "Rubicon access".

5.1 The Rubicon library interface

Unlike all other 4.x-level Daylight programming libraries, the Rubicon library is provided as an object library with a C interface (rather than using the strictly defined Daylight Toolkit(TM) object interface). This is not a cool, new feature. The Daylight Toolkit interface doesn't provide the ability to support some constructs required by Rubicon in a language-independent way. (E.g., functions in lexically-scoped languages such as Pascal and Ada present a problem.)

So, for pragmatic reasons only, the 4.3 Rubicon library is supplied with a C-language interface. All function names start with "dc_" in this interface (rather than the usual "dt_"). One header file is required, dc_rubicon.h. Macro names start "DC_". This interface is called the "dc_" interface.

The dc_ interface resembles an object-oriented interface more than a typical C-language interface. Programming with the dc_ interface is almost identical to programming with the dt_ interface, except that handy polymorphic functions aren't available for Rubicon constructs, such as methods. All structs and functions are passed as (void *) arguments or results. Like objects ("handles"), you can't dereference them (they're opaque). The Rubicon library relies on the Daylight Toolkit for almost everything else, e.g., it uses the normal molecule and conformation objects.

Floating-point representation is another special issue for Rubicon. The normal kind of floating point number used by the Daylight Toolkit is defined as dt_Real (float for most compilers). The equivalent definition in the dc_ interface is DC_REAL. (In release 4.3x, it's the same as dt_Real.) In general, it seems satisfactory to pass real numbers back and forth as 32-bit things. Given the popularity of 64-bit computers (and the prospect of 128-bit hardware), it seems prudent to leave the door open for interfaces based on larger representations of floating point numbers. Hence, DC_REAL.

Although the dc_ interface is C-specific, it may also be accessed from Fortran programs by use of wrapper functions (typically written in C).

5.2 Methods

The libdc_rube.a library includes functions which support generic "methods". (In 4.3, methods are currently implemented as C-structs but will probably become objects in future releases.) Some functions in this library are intended to be used only by programs which create methods and are not useful for programs which only use such methods. The following discussion is limited to public functions which are defined in the include file dc_rubicon.h.

5.2.1. Attributes common to all methods

All methods have five visible attributes:

class name of method class, e.g. "minimizer"

name name of method, e.g. "Rubicon 4.31"

version integral version number, e.g. 431

clientdata pointer to user data (void *)

next pointer to next method (for linked lists of methods)

These attributes are stored in a "method header". No public functions are provided to create or destroy method headers - these are reserved for functions creating methods, such as dc_cg_alloc_minimizer(). Method attributes should be accessed using the following public functions:

int dc_method_set_name       ( void *method, int len, char *name );
int dc_method_set_version    ( void *method, int version );
int dc_method_set_clientdata ( void *method, void *clientdata );
int dc_method_set_next       ( void *method, void *nextmethod );

Each of these functions sets one attribute of the given method and returns TRUE on success or FALSE on error (e.g. invalid method). The clientdata attribute is intended for use by any function which needs to associate data with a method. The next attribute can be used to form linked lists of methods.

No public function is provided to set the class attribute, since this is only done by higher level functions which create methods. For instance, methods returned by dc_cg_alloc_minimizer() will have the method class attribute permanently set to "minimizer".

char *dc_method_class      (void *method, int *len);
char *dc_method_name       (void *method, int *len);
int   dc_method_version    (void *method );
void *dc_method_clientdata (void *method );
void *dc_method_next       (void *method );

Each of these functions returns one attribute of the given method. On error, they return 0 or NULL as appropriate. These functions are intended for use at any level, although some are intended for special purposes. For instance, a function expecting a minimizer method can check a method's class attribute to verify that the method is in fact a "minimizer".
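The two string getters return a character pointer together with a length. The following is a minimal sketch of copying such a counted string into a NUL-terminated buffer before comparing it. It assumes that the returned text is not guaranteed to be NUL-terminated; the helper name counted_to_cstr is hypothetical and is not part of libdc_rube.a.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libdc_rube.a): copy a counted string,
 * such as one returned by dc_method_class() or dc_method_name(), into a
 * NUL-terminated buffer. Returns the number of characters copied, or -1
 * if the arguments are unusable. Strings too long for buf are truncated. */
int counted_to_cstr(const char *s, int len, char *buf, size_t bufsize)
{
    size_t n;

    if (s == NULL || len < 0 || buf == NULL || bufsize == 0)
        return -1;
    n = (size_t)len;
    if (n > bufsize - 1)
        n = bufsize - 1;          /* leave room for the terminator */
    memcpy(buf, s, n);
    buf[n] = '\0';
    return (int)n;
}
```

A caller might obtain the class with dc_method_class(method, &len), pass the result through counted_to_cstr(), and then compare the buffer with "minimizer" using strcmp().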

void dc_method_print_header(void *method, int len, char *pre, FILE *fp)

Print header attributes for given method on stream fp. Each line is preceded by len chars of prefix pre. This function works for any method (used mainly for debugging). Output includes the "id" attribute (an invisible magic number).

5.2.2. Minimizer methods

A minimizer is an algorithm which, given an objective function, finds a set of independent variable values which produce a minimal function value (dependent variable). Methods of this type are given the class value minimizer. Minimizers typically have a number of adjustable parameters which control their operation. Minimizer methods may differ in the fundamental algorithm used (e.g. conjugate-gradient method vs. Newton-Raphson) or in their control parameters (e.g. same algorithm with different convergence limits). A "minimizer method" is a fully described algorithm, i.e., every possible parameter is determined, including the objective functions to be minimized.

5.2.2.1 Conjugate-gradient minimizer

Only one minimizer algorithm is provided with version 4.3 of Rubicon, a conjugate-gradient minimizer derived from the algorithm published by D. F. Shanno and K. H. Phua, ACM Transactions on Mathematical Software Vol. 6, No. 4, 618-622 (Dec. 1980). Various adjustable control parameters can be combined with various objective functions to provide a wide range of minimizer methods which become part of the Rubicon method. All public conjugate-gradient minimizer functions start dc_cg_.

The following high-level functions operate on CG minimizers per se:

void *dc_cg_alloc_minimizer(void);

Allocate a conjugate-gradient minimizer with default parameters.

int dc_cg_dealloc_minimizer(void *cgmeth)

Deallocate minimizer obtained from dc_cg_alloc_minimizer(). Returns TRUE on success or FALSE on error (i.e., if argument cgmeth is not a conjugate-gradient minimizer method).

void *dc_cg_next_minimizer(void *cgmeth)

Return next minimizer in linked list of minimizers (or NULL).

int dc_cg_print_minimizer(void *cgmeth, int len, char *pre, FILE *fp);

Print all attributes of given conjugate-gradient minimizer method cgmeth on stream fp. Each line is preceded by len chars of prefix pre. Output includes method header attributes. Returns TRUE on success or FALSE on error (i.e., if argument cgmeth is not a conjugate-gradient minimizer method).

5.2.2.2 CG minimizer objective functions

Minimizers specified by the default Rubicon method use a 4-dimensional error function as the objective function. Programs can specify additional (or alternative) objective functions to be minimized. The C prototype for objective functions used with the CG minimizer is:

typedef DC_REAL (*DY_CG_OBJFUNC)(void *cgm, int n, DC_REAL *x, DC_REAL *g);

i.e., a function returning a DC_REAL value with 4 arguments: cgm, pointer to the method; n, the number of parameters (input); x, an n-by-4 array of DC_REAL variables (input); and g, an n-by-4 array of gradients (output). DC_REAL is defined in dc_rubicon.h and is implementation-dependent. (For all architectures supported in v4.3x, it's defined as float.)

CG minimizers have two objective function attributes: objfunc, the "real" objective function; and usrfunc, an additional function. In the default Rubicon method, objfunc is the standard 4-D error function used for minimizing distance-geometry bounds and usrfunc is NULL. A user-supplied function can be specified to minimize an additional function, e.g. for matching a pharmacophore or docking to a binding site. If a usrfunc is specified, the results (error and gradients) are added to those of the standard function to produce the combined function to be minimized.
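As an illustration of the objective-function form, here is a minimal sketch of a usrfunc that restrains the separation of two atoms. Several things in it are assumptions rather than documented behavior: that x and g are row-major n-by-4 arrays (x, y, z, w per atom), that g should be overwritten rather than accumulated, and the choice of atoms 0 and 1 with a 1.5 target distance. The name restraint_usrfunc and the local DC_REAL typedef are stand-ins so that the fragment compiles on its own.

```c
#include <stddef.h>

typedef float DC_REAL;      /* stand-in; the real definition is in dc_rubicon.h */

#define TARGET_DIST 1.5f    /* hypothetical target separation for atoms 0 and 1 */

/* Same shape as the objective-function prototype. The error is
 * (d^2 - t^2)^2, where d is the 4-D distance between atoms 0 and 1 and
 * t is TARGET_DIST; working with squared distances avoids a square root.
 * ASSUMPTION: x and g hold 4 values per atom, and g is overwritten. */
DC_REAL restraint_usrfunc(void *cgm, int n, DC_REAL *x, DC_REAL *g)
{
    DC_REAL d[4], dist2 = 0.0f, diff;
    int i, k;

    (void)cgm;                          /* method pointer unused in this sketch */
    for (i = 0; i < 4 * n; i++)
        g[i] = 0.0f;
    if (n < 2)
        return 0.0f;

    for (k = 0; k < 4; k++) {
        d[k] = x[k] - x[4 + k];         /* atom 0 minus atom 1 */
        dist2 += d[k] * d[k];
    }
    diff = dist2 - TARGET_DIST * TARGET_DIST;
    for (k = 0; k < 4; k++) {
        g[k]     =  4.0f * diff * d[k]; /* d(error)/d(atom 0 coordinate k) */
        g[4 + k] = -4.0f * diff * d[k]; /* equal and opposite for atom 1 */
    }
    return diff * diff;
}
```

A real usrfunc would more likely read its targets from the method's clientdata than from a compile-time constant.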

Four functions are provided to set and get objective functions:

int dc_cg_set_objfunc(void *cgmeth, void *objfunc);
int dc_cg_set_usrfunc(void *cgmeth, void *usrfunc);

Set the primary and additional objective functions of cgmeth. cgmeth must be obtained from dc_cg_alloc_minimizer(). objfunc and usrfunc must be objective functions in the correct form. Returns TRUE on success, FALSE on error.

CAUTION: For use with Rubicon, it is strongly recommended that modifications to the objective function be done via a usrfunc. The default objfunc has been refined by many people over many years -- it is unlikely to be improved by casual changes.

NOTE: If you want to turn off the standard objective function but still use a usrfunc, do so by setting scale_dist and scale_vol attributes to 0.0, rather than setting objfunc to NULL. (Results will be identical, but Rubicon recognizes these values as a special case which allows slightly faster operation.)

void *dc_cg_objfunc(void *cgmeth);
void *dc_cg_usrfunc(void *cgmeth);

Return the objective functions of cgmeth (or NULL on error).

An additional function provides information for the convenience of usrfuncs:

dt_Handle dc_cg_conformation(void *cgmeth, int nv, DC_REAL *xyzw);

Return the xyzw conformation as an object.

NOTE: The molecule that Rubicon is working with is the dt_base() of the conformation object returned by this function. This is not the same molecule as submitted to dc_rube_conformations(). (Rubicon works on a copy of the molecule.)

5.2.2.3 CG minimizer report functions

A report function is a user-supplied function which is called to provide intermediate results. The C prototype for report functions used with the CG minimizer is:

typedef int (*DY_CG_REPORTFUNC)(int iter, int neval, DC_REAL fmin, DC_REAL gsq);

where the function returns TRUE on success (if FALSE, the minimization is aborted) and is supplied with the number of the iteration, the number of function evaluations so far, the minimum function value so far, and the current total gradient-squared.
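A minimal sketch of a report function in this form follows. It logs each step to stderr and asks the minimizer to stop once a self-imposed evaluation budget is exceeded. The name progress_report, the budget of 500 evaluations, and the local DC_REAL typedef are all assumptions made for the illustration.

```c
#include <stdio.h>

typedef float DC_REAL;   /* stand-in; the real definition is in dc_rubicon.h */

#define MAX_EVALS 500    /* hypothetical budget chosen for this sketch */

/* Matches the report-function prototype: log progress, then return
 * 1 (TRUE) to continue or 0 (FALSE) to abort the minimization. */
int progress_report(int iter, int neval, DC_REAL fmin, DC_REAL gsq)
{
    fprintf(stderr, "iter %d  evals %d  fmin %g  gsq %g\n",
            iter, neval, (double)fmin, (double)gsq);
    return neval <= MAX_EVALS;
}
```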

Two functions are provided to set and get report functions:

int dc_cg_set_stepreport(void *cgmeth, void *reportfunc);

Set the step report function for cgmeth to be reportfunc. reportfunc will be called after each minimizer iteration. cgmeth must be obtained from dc_cg_alloc_minimizer(). reportfunc must be a report function in the correct form.

Returns TRUE on success, FALSE on error.

void *dc_cg_stepreport(void *cgmeth);

Return the step report function of cgmeth (or NULL on error).

5.2.2.4 CG minimizer control parameters

The conjugate-gradient minimizer supplied with Rubicon provides a number of parameters which control how the algorithm operates.

int   dc_cg_set_accuracy(void *cgmeth, float accuracy);
float dc_cg_accuracy    (void *cgmeth);

Set and get the effective machine accuracy. The default value of 10.e-20 seems to work well.

int   dc_cg_set_convergence(void *cgmeth, float convergence);
float dc_cg_convergence    (void *cgmeth);

Set and get the conjugate-gradient convergence parameter (the minimization is assumed to have converged when the root mean square of the gradient vector falls below this value). Values of 0.3 and 0.2 (defaults for 1st/2nd stage) seem to work well.
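The convergence test described above can be sketched as follows. This is an illustration of the stated criterion only, not the library's code; the function name gradient_converged is hypothetical, and the sketch compares squared quantities so that no square root is needed.

```c
/* Sketch of the stated criterion: converged when the root mean square of
 * the gradient vector is below the convergence parameter. Comparing the
 * mean square with convergence squared is equivalent for positive values. */
int gradient_converged(const float *g, int len, float convergence)
{
    double sumsq = 0.0;
    int i;

    if (g == 0 || len <= 0)
        return 0;
    for (i = 0; i < len; i++)
        sumsq += (double)g[i] * (double)g[i];
    return sumsq / len < (double)convergence * (double)convergence;
}
```

For a two-component gradient (0.3, 0.4) the root mean square is about 0.354, so the sketch reports no convergence at the default values of 0.3 and 0.2.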

int dc_cg_set_eval_limit(void *cgmeth, int eval_limit);
int dc_cg_eval_limit    (void *cgmeth);

Set and get the limit of function evaluations. If exceeded, the minimizer gives up. Default value is 1000.

int   dc_cg_set_scale_dist   (void *cgmeth, float scale_dist   );
int   dc_cg_set_scale_vol    (void *cgmeth, float scale_vol    );
int   dc_cg_set_scale_usrfunc(void *cgmeth, float scale_usrfunc);
float dc_cg_scale_4d         (void *cgmeth);
float dc_cg_scale_vol        (void *cgmeth);
float dc_cg_scale_usrfunc    (void *cgmeth);

Set and get error function scaling factors. These should be set to make gradient magnitudes approximately equal between functions returning results with different units.

scale_dist is applied to distance bounds violations and gradients in the standard objective function, which operates in units of normalized distance squared in the gradients and to the fourth power in the error. scale_vol is applied to volume errors (Å³) and gradients (Å²). The default values for scale_dist (1.0) and scale_vol (0.1) work well for distance geometry purposes.

scale_usrfunc is used to scale the error and gradients returned by usrfunc to match those of the standard error function. If usrfunc is specified, scale_usrfunc should be set to a value appropriate to the units of the result. The default value is 1.0.
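The arithmetic of combining a scaled usrfunc with the standard function can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: it assumes the standard function has already applied scale_dist and scale_vol internally, that arrays hold 4 values per atom, and that the caller supplies scratch space. All names (sketch_combined, SketchObjFunc, toy_sumsq, toy_sum) are hypothetical; the two toy functions exist only to exercise the sketch.

```c
#include <stddef.h>

typedef float DC_REAL;   /* stand-in; the real definition is in dc_rubicon.h */
typedef DC_REAL (*SketchObjFunc)(void *cgm, int n, DC_REAL *x, DC_REAL *g);

/* total error    = standard error    + scale_usrfunc * user error
 * total gradient = standard gradient + scale_usrfunc * user gradient
 * tmp is caller-supplied scratch space of 4*n values. */
DC_REAL sketch_combined(SketchObjFunc stdfunc, SketchObjFunc usrfunc,
                        DC_REAL scale_usrfunc, void *cgm, int n,
                        DC_REAL *x, DC_REAL *g, DC_REAL *tmp)
{
    DC_REAL err = stdfunc(cgm, n, x, g);
    int i;

    if (usrfunc != NULL) {
        err += scale_usrfunc * usrfunc(cgm, n, x, tmp);
        for (i = 0; i < 4 * n; i++)
            g[i] += scale_usrfunc * tmp[i];
    }
    return err;
}

/* Toy objective: sum of squares, gradient 2x. */
DC_REAL toy_sumsq(void *cgm, int n, DC_REAL *x, DC_REAL *g)
{
    DC_REAL e = 0.0f;
    int i;

    (void)cgm;
    for (i = 0; i < 4 * n; i++) {
        e += x[i] * x[i];
        g[i] = 2.0f * x[i];
    }
    return e;
}

/* Toy objective: plain sum, gradient 1. */
DC_REAL toy_sum(void *cgm, int n, DC_REAL *x, DC_REAL *g)
{
    DC_REAL e = 0.0f;
    int i;

    (void)cgm;
    for (i = 0; i < 4 * n; i++) {
        e += x[i];
        g[i] = 1.0f;
    }
    return e;
}
```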

int   dc_cg_set_scale_4d(void *cgmeth, float factor4d);
float dc_cg_scale_4d    (void *cgmeth);

Set and get the scaling factor for the 4th-dimension in the standard error function. This value is defined in terms of distance along the normal 3-D axes, i.e., a value of 0.5 means that 4th-dimensional errors are scaled to one half of the others.

If set to 0.0, a 4-D minimization is done using 3-D coordinates only; atoms are free to move through each other in the 4th dimension (used for the 1st minimizer in the default Rubicon method). If set to a positive value, the 4th dimension is also minimized and coordinates are forced back into 3 dimensions (0.5 is used for Rubicon's default 2nd stage).

NOTE: To obtain good conformations, it is important that the final minimization is done with a positive 4D scaling factor, especially if the previous minimization allowed the conformation to move in 4 dimensions (i.e., had a 4D scaling factor of 0.0).

5.2.2.5 CG minimizer informational functions

CG methods have 5 attributes which provide information about the operation of the minimization. Functions which get this information are typically used after the minimization is complete, to determine if the minimization was successful, how many iterations were used, etc. Functions to set these attributes are also provided, but are useful only to implementers of new CG methods.

int dc_cg_status(void *cgmeth);

Returns the termination status of the last-completed minimization. The result is an integer flag defined in dc_rubicon.h. If the minimization was successful, the return value will be CG_CONVERGED (0). CG_INVALID is returned on error (e.g., cgmeth is not a valid method). Otherwise the return value will indicate the reason for termination, e.g., CG_MAXEVAL_STOP, CG_UPHILL_STOP, etc.

int dc_cg_eval_count(void *cgmeth);

Returns number of times the objective function was evaluated, or -1 on error.

int dc_cg_iter_count(void *cgmeth);

Returns the number of iterations completed by the minimizer.

float dc_cg_best_value(void *cgmeth);

Returns the best (lowest) function value found.

float dc_cg_grad_norm(void *cgmeth);

Returns the normalized gradient root mean square.

5.2.3. Rubicon methods

Rubicon samples 3-D conformations for a given 2-D molecular structure. In object-oriented terms, a Rubicon method produces a sequence of conformations for a given molecule object.

Like any method, a Rubicon method is completely defined by attributes which are accessed via functions. In turn, the operation of Rubicon is completely defined by the method. For instance, the "trials" attribute controls how many random distance-geometry trials will be performed per invocation.

Rubicon methods have two attributes which are somewhat unusual: a rule set and a linked list of minimizers. In version 4.3, Rubicon rules that provide distance-bounds constraints are defined by the name of a file containing the rules. (Although simple and effective, this solution is inadequate when operating over networks, and will probably be changed in future releases.)

The other attribute which deserves special mention is a linked list of minimizers. Rubicon does a number of minimization steps (by default, 2) to optimize a 3 dimensional conformation before returning it. Various aspects of minimization can be controlled, e.g. the objective function to be minimized, various control parameters, etc. Each minimization is treated as a separate "minimizer method" (referred to as a "minimizer"). The desired minimizers are specified to Rubicon in the order that they are to be executed.

5.2.3.1 Functions operating on complete Rubicon methods

void *dc_rube_alloc_method(void);

Allocate a default Rubicon method. See below for a list of default attributes. Method attributes can be modified via functions below. No public deallocation routine is provided in v4.3.

void dc_rube_print_method(void *rubemeth, FILE *fp);

Print all attributes of given Rubicon method rubemeth to given stream fp. Each line is preceded by len chars of prefix pre. Output includes all method header and minimizer attributes. Returns TRUE on success or FALSE on error (i.e., if argument rubemeth is not a Rubicon method).

5.2.3.2 Rubicon minimizers

The distance-geometry method samples random conformations, then modifies them to minimize distance and volume bounds violations. The default Rubicon method uses two minimizers: the first minimizes bounds violations in 4 dimensions (which allows atoms to pass through each other); the second forces the conformation into 3 dimensions.

By modifying Rubicon's minimizer method(s), one can minimize other functions or concurrently minimize additional constraint violations. For example, one might add constraints to force conformations to fit a pharmacophore model or dock a conformation into a 3-D receptor.

Customized minimizer methods are produced by modifying the default minimizer as discussed above in section 5.2.2. Three functions are provided to control which minimizers are used by a Rubicon method.

void *dc_rube_1st_minimizer(void *rubemeth);

Return the first minimizer for Rubicon method rubemeth, or NULL if none are defined. In v4.3, this should always be a CG minimizer method. Subsequent minimizers may be obtained via the function dc_cg_next_minimizer().

int dc_rube_set_minimizer(void *rubemeth, void *cgmeth);

Replace the list of minimizers for Rubicon method rubemeth with the single minimizer cgmeth, which should be a CG minimizer method. Returns TRUE on success, FALSE on error.

int dc_rube_add_minimizer(void *rubemeth, void *cgmeth);

Append CG minimizer method cgmeth to the end of the list of minimizers used by Rubicon method rubemeth. Returns TRUE on success, FALSE on error.
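The list semantics of these three functions can be modeled with an ordinary singly linked list. The sketch below is a self-contained model only: real minimizer methods are opaque, and the struct SketchMin and the sketch_ functions are hypothetical stand-ins that mirror the documented behavior (replace the list, append to the end, walk it in execution order).

```c
#include <stddef.h>

/* Hypothetical node standing in for an opaque minimizer method. */
typedef struct SketchMin {
    const char *label;
    struct SketchMin *next;
} SketchMin;

/* Replace the whole list with one minimizer (cf. dc_rube_set_minimizer). */
int sketch_set_minimizer(SketchMin **head, SketchMin *m)
{
    if (head == NULL || m == NULL)
        return 0;
    m->next = NULL;
    *head = m;
    return 1;
}

/* Append to the end of the list (cf. dc_rube_add_minimizer). Refuses a
 * minimizer that is already in the list, which would create a cycle. */
int sketch_add_minimizer(SketchMin **head, SketchMin *m)
{
    SketchMin **p;

    if (head == NULL || m == NULL)
        return 0;
    for (p = head; *p != NULL; p = &(*p)->next)
        if (*p == m)
            return 0;
    m->next = NULL;
    *p = m;
    return 1;
}

/* Walk the list in execution order, as a caller would with
 * dc_rube_1st_minimizer() followed by dc_cg_next_minimizer(). */
int sketch_count(const SketchMin *head)
{
    int count = 0;

    for (; head != NULL; head = head->next)
        count++;
    return count;
}
```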

5.2.3.3 Rubicon rule set

In Rubicon v4.3, rules that provide distance-bounds constraints are defined by the name of a file containing the rules. This approach is simple and effective when running on a single CPU. Unfortunately, it is not adequate when operating over networks, so we will probably have to change it in future releases (e.g. to provide an object-oriented compute server).

int dc_rube_set_rulefile(void *rubemeth, int lens, char *rulefile);

Set the rule file for Rubicon method rubemeth to rulefile (a string of length lens). Returns TRUE iff successful.

char *dc_rube_rulefile(void *rubemeth, int *lens);

Return the current rule file for Rubicon method rubemeth, or NULL on error or if a rule file is not defined. The default value is $DY_ROOT/data/rubicon.rules.

5.2.3.4 Rubicon method control attributes

The following functions set and get Rubicon method attributes which control the operation of Rubicon.

int dc_rube_set_trials(void *rubemeth, int trials);
int dc_rube_set_nconfs(void *rubemeth, int nconfs);
int dc_rube_set_mxdv  (void *rubemeth, DC_REAL mxdv);
int dc_rube_set_mxvv  (void *rubemeth, DC_REAL mxvv);

Rubicon samples trials conformations randomly. Output consists of up to nconfs conformations which meet the maximum distance (mxdv) and volume (mxvv) bounds violation criteria. These functions set the number of trials, the maximum number of conformations to output, and the two acceptance criteria, respectively. It is an error to set nconfs greater than trials. These functions return TRUE on success, FALSE on error.

Default values are 1 (trials), 1 (nconfs), 0.5 (mxdv) and 0.5 (mxvv).
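The interaction between these attributes can be modeled in a few lines. The sketch below is a hypothetical stand-in, not the library's data structure: it assumes the nconfs check is made at the time nconfs is set, and that a conformation is accepted when both violations are less than or equal to their limits (the text does not say whether the comparison is strict).

```c
typedef float DC_REAL;   /* stand-in; the real definition is in dc_rubicon.h */

/* Hypothetical model of the four control attributes. */
typedef struct {
    int trials;
    int nconfs;
    DC_REAL mxdv;
    DC_REAL mxvv;
} SketchControls;

/* Defaults as documented: 1 trial, 1 conformation, 0.5 and 0.5. */
SketchControls sketch_defaults(void)
{
    SketchControls c;

    c.trials = 1;
    c.nconfs = 1;
    c.mxdv = 0.5f;
    c.mxvv = 0.5f;
    return c;
}

int sketch_set_trials(SketchControls *c, int trials)
{
    if (c == 0 || trials < 0)
        return 0;
    c->trials = trials;
    return 1;
}

/* ASSUMPTION: nconfs greater than the current trials is rejected here. */
int sketch_set_nconfs(SketchControls *c, int nconfs)
{
    if (c == 0 || nconfs < 0 || nconfs > c->trials)
        return 0;
    c->nconfs = nconfs;
    return 1;
}

/* ASSUMPTION: a violation exactly equal to its limit is accepted. */
int sketch_accept(const SketchControls *c, DC_REAL dv, DC_REAL vv)
{
    return dv <= c->mxdv && vv <= c->mxvv;
}
```

Under this model, a caller who wants 5 conformations from 10 trials sets trials first, since asking for 5 conformations while trials is still 1 would be rejected.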

int     dc_rube_trials(void *rubemeth);
int     dc_rube_nconfs(void *rubemeth);
DC_REAL dc_rube_mxdv  (void *rubemeth);
DC_REAL dc_rube_mxvv  (void *rubemeth);

Return the number of trials, the number of conformations to output, and the maximum acceptable distance and volume violations, respectively, for Rubicon method rubemeth.

int dc_rube_set_h_flag(void *rubemeth, int h_flag);
int dc_rube_h_flag    (void *rubemeth);

By design, Rubicon provides three ways of dealing with attached hydrogen atoms, which are represented by integer flags defined in the header file dc_rubicon.h:

DC_RUBE_H_ALL add all hydrogens before processing molecule
DC_RUBE_H_NONE remove all hydrogens before processing molecule
DC_RUBE_H_SOME operate on hydrogen atoms present in molecule

In general, geometries generated for hydrogen-complete molecules are superior to those generated for hydrogen-suppressed molecules, but consume more processing time. DC_RUBE_H_ALL is the option of choice when good geometries are required and/or processing time is not an issue. DC_RUBE_H_NONE is the option of choice when fast, approximate geometries are desired. DC_RUBE_H_SOME is useful when only a few hydrogens are important, e.g. hydroxy rotors for methods which account for hydrogen bonding.

The default setting is DC_RUBE_H_ALL.

NOTE: Rubicon will only work with DC_RUBE_H_ALL if the rule set contains distance bounds constraints for hydrogens. The converse is not strictly true, e.g., using DC_RUBE_H_NONE with a rule set that contains hydrogens works correctly (but works best with a rule set tuned for hydrogen suppressed molecules).

int dc_rube_set_bump14(void *rubemeth, int bump14);
int dc_rube_bump14    (void *rubemeth);

These functions set and get the bump14 attribute. When RADIUS rules are applied to the distance bounds matrix, non-bonded 1-4 distances are subjected to van der Waals bumping constraints only if this attribute is TRUE.

int dc_rube_set_zguess(void *rubemeth, int zguess);
int dc_rube_zguess    (void *rubemeth);

These functions set and get the zguess attribute. If TRUE, orientations about double bonds which are not otherwise specified are set to the presumed lower-energy orientation (rel-cis in rings, else biggest substituents rel-trans). If FALSE, unspecified double bond orientations are sampled.

NOTE: The zguess attribute is not implemented in version 4.3:
dc_rube_set_zguess() does nothing, unsuccessfully;
dc_rube_zguess() always returns FALSE.

int dc_rube_set_savebounds(void *rubemeth, int savebounds);
int dc_rube_savebounds    (void *rubemeth);

These functions set and get the savebounds attribute. If FALSE, all temporary storage used by dc_rube_conformations() is deallocated before returning (the default setting). If TRUE, distance and volume bounds are saved in a static area and may be retrieved via dc_rube_upperbound(), dc_rube_lowerbound(), dc_rube_vbounds_reset(), and dc_rube_vbounds_next().

NOTE: To generate a bounds matrix but no conformations, use a Rubicon method with savebounds set to TRUE and trials set to 0.

DC_REAL dc_rube_lowerbound(dt_Handle a1, dt_Handle a2, int *rule);
DC_REAL dc_rube_upperbound(dt_Handle a1, dt_Handle a2, int *rule);

These functions return smoothed distance bounds for the last molecule processed by dc_rube_conformations(), if the Rubicon method attribute savebounds was TRUE and it returned normally (i.e. not NULL_OB). On error, they return -1.0.

The argument *rule is set to the rule number (line number in the rule file) of the most restrictive rule that was applied to that bound. The value is set to zero if the requested bound was not set by any rule (i.e. set only by bounds smoothing; this shouldn't happen if VDW radii are properly defined).

int       dc_rube_vbounds_reset(void);
dt_Handle dc_rube_vbounds_next (DC_REAL *vmin, DC_REAL *vmax, int *signed);

These functions provide access to the volume constraints (as per the rule file) for the last molecule processed by dc_rube_conformations(), if the Rubicon method attribute savebounds was TRUE and it returned normally (i.e. not NULL_OB).

dc_rube_vbounds_reset() resets the list and returns the total number of volume constraints.

dc_rube_vbounds_next() returns the next volume constraint in the list by returning a sequence of four atoms, setting *vmin and *vmax to the constraint limits, and setting *signed to FALSE (0) if the rule is for an unsigned volume or TRUE (1) if for a signed (chiral) volume. NULL_OB is returned at the end of the list or on error.

5.2.3.5 Rubicon access

Given a molecule and a method, calling Rubicon is very simple:

dt_Handle dc_rube_conformations(dt_Handle mol, void *rubemeth);

Produce a sequence of conformations for the molecule mol using Rubicon method rubemeth. Returns NULL_OB on error.

Appendix 1: References

Blaney, J. B., "DGEOM", Program 159 of the Quantum Chemistry Program Exchange, Indiana University, Bloomington, IN (1990).

Marsaglia, G. and Zaman, A., Toward a Universal Random Number Generator, Florida State University Report FSU-SCRI-87-50 (1987).

Appendix 2: Notes about Rubicon

The current C-language-specific interface to Rubicon (libdc_rube.a) will possibly be replaced in future versions by an object-oriented interface in the style of the Daylight Toolkit(TM).

The current version of the Rubicon library has no provisions for setting values in the distance bounds matrix other than via Rubicon rules.

The current version of Rubicon samples distance space directly. The "partial metrization" method (providing improved sampling) is not implemented in this version.
