Daylight v4.9
Release Date: 1 February 2008

Name

rd2smi - converts a connection table-based reaction file into a SMILES-based file.

Unix Synopsis

rd2smi [options] [infile [outfile]]

Description

rd2smi(1) converts an MDL RDfile into a Daylight SMILES (SMI), isomeric SMILES (ISM), or Thor Data Tree (TDT) file. Alternatively, the output can be directed to two SQL loader (SQLLDR) files.

The input file must be an rxnfile or RDfile in v2000 format. R-group or S-group features are not recognized.

Default output is to stdout. In the case of SQLLDR output, the user must specify the rootname for the two output files.
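
The output rules above can be sketched as a small helper that assembles an argument list from Python. This is only an illustration: it assumes that an option and its value are passed as two separate arguments and that the SQLLDR rootname goes in the outfile position, neither of which is confirmed here. The function name and file names are hypothetical.

```python
def build_rd2smi_argv(infile, output_format="SMI", outfile=None):
    """Assemble an rd2smi argument list (a sketch, not an official wrapper)."""
    allowed = {"SMI", "ISM", "TDT", "SQLLDR"}
    if output_format not in allowed:
        raise ValueError("unknown output format: " + output_format)
    # SQLLDR writes two files, so a rootname is required.
    # Assumption: the rootname is supplied in the outfile position.
    if output_format == "SQLLDR" and outfile is None:
        raise ValueError("SQLLDR output needs a rootname for the two output files")
    # Assumption: the option name and its value are separate arguments.
    argv = ["rd2smi", "-OUTPUT_FORMAT", output_format, infile]
    if outfile is not None:
        argv.append(outfile)  # when omitted, output goes to stdout
    return argv
```

For example, build_rd2smi_argv("rxns.rdf", "TDT", "rxns.tdt") yields a list that could be handed to subprocess.run on a machine where rd2smi is installed.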

Double bond stereochemistry and tetrahedral chirality are inferred from the atom coordinates and bond style information in the connection table and encoded as isomeric SMILES. SMILES, isomeric SMILES, and associated structural information are automatically stored in the TDT and SQLLDR outputs.

Data in RDfiles are converted for TDT and SQLLDR outputs. The SQLLDR format stores data in one file (.dat) and structural information in another (.str). Legal characters for data tags are limited to: $, _, /, A-Z, a-z, and 0-9.
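
The character restriction on data tags can be checked before conversion. The following is a minimal sketch that tests a tag against the legal set listed above; the function name is hypothetical, and the requirement that a tag be non-empty is an assumption. How rd2smi itself treats tags with other characters is not described here.

```python
import re

# Legal characters as listed above: $, _, /, A-Z, a-z, and 0-9.
_LEGAL_TAG = re.compile(r"[$_/A-Za-z0-9]+")

def is_legal_data_tag(tag):
    """Return True if every character of the tag is in the legal set."""
    # Assumption: an empty tag is treated as not legal.
    return _LEGAL_TAG.fullmatch(tag) is not None
```

With this check, a tag such as "$RIREG" passes, while a tag containing a space, such as "melting point", does not.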

Unless otherwise specified using the ID_FIELD option, the characters in the first line of the header block for a molecule are assumed to be a unique ID. For reactions, the value for $RIREG is used as the default ID. In the SMI and ISM outputs, this ID follows the space-delimited SMILES or isomeric SMILES. The ID is stored in the $NAM field for the TDT format and as the first line of the SQLLDR files. If there is no default ID available, the isomeric SMILES will be used as the ID for TDT and SQLLDR output.
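
For SMI and ISM output, each line therefore holds a SMILES followed by a space and the ID. A minimal sketch of reading such a line back is shown below. The function name and the sample record are made up for illustration, and keeping any further spaces as part of the ID is an assumption.

```python
def split_smi_line(line):
    """Split one SMI/ISM output line into (smiles, id); None for a blank line."""
    # Split on the first run of whitespace only, so that an ID which itself
    # contains spaces stays in one piece (an assumption about such IDs).
    parts = line.rstrip("\n").split(None, 1)
    if not parts:
        return None
    smiles = parts[0]
    ident = parts[1].strip() if len(parts) > 1 else ""
    return smiles, ident

# Made-up example record: a reaction SMILES followed by its ID.
record = split_smi_line("CCO>>CC=O R0001\n")
```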

An RDfile with no structural information produces a TDT rooted in the $NAM. Data from an RDfile with no structural information is captured in the SQLLDR rootname.dat file. However, no entry is written for SMI, ISM, or SQLLDR rootname.str output files.

The manual page for "convert" describes features common to this and the other "convert" programs. Please refer to it for more information on general usage and options such as -HELP, -VERSION, -SKIP_RECORDS, -DO_RECORDS, -ERROR_LEVEL, -ERROR_LOG, and -REJECT_LOG.

Options

-OFMT [SMI|ISM|TDT|SQLLDR]

-OUTPUT_FORMAT [SMI|ISM|TDT|SQLLDR]

Controls whether the output is in SMILES, isomeric SMILES, TDT, or SQLLDR format. The default is SMI. For the TDT and SQLLDR formats, information on the first line of each input header block (molecules only) and any non-standard atom labels in the input file are stored as LINE1 and as an atom-tuple in the ASYM datatype, respectively. The original atom is designated by '*' in the SMILES. For TDT output, a special $SMIG datatype is written containing data about the conversion program name and version.
-ADD_2D [TRUE|FALSE]

-ADD_3D [TRUE|FALSE]

Adds 2D and/or 3D coordinates to the TDT or SQLLDR output. These data are taken from the actual coordinates in the input atom block and stored as a comma-separated list of values in unique SMILES order. Default is TRUE for both -ADD_2D and -ADD_3D. If non-zero coordinates are found in the atom block, then either 2D or 3D coordinates are written to the output file depending on which are available. Setting one of these values to FALSE eliminates the entry for that set of coordinates.
-SMI_IS_ISM [TRUE|FALSE]
Replaces SMILES with isomeric SMILES in the output. Some programs, such as rubicon, require that the SMILES datatype carry isomeric information. The default value for -SMI_IS_ISM is FALSE. Setting this option to TRUE allows isomeric information to be stored in the SMILES datatype.
-ID_FIELD <name>
Sets the data field identifier to be used as the unique ID. As described above, the default ID for molecules is the first line of each header block. Designating all or part of a data field identifier, including MIREG or MEREG, as the ID_FIELD causes the data in that field to be used as the ID. For reactions, the default ID is the value for $RIREG or, if RXN:VARIATION is specified as the -PREFIX, that value followed by '_' (underscore) and the RXN:VARIATION number. Designating all or part of a datatype, including REREG, as the ID_FIELD causes the data in that field to be recognized as the ID. Note that one may need to place the data field identifier in quotes and use \ before '$'. Input records not containing information in the designated field are rejected.
-PREFIX <name>
Parses the designated prefix from data field identifiers. The default is to use the full $DTYPE name. Specifying a prefix removes the defined string from the output $DTYPE name and stores the prefix in the TDT/SQLLDR output files. For molecules, any string can be used. For reactions, the string must cover from the beginning of the name to the first '('. For example, RXN:VARIATION can be stripped from the following $DTYPE names: RXN:VARIATION(1) and RXN:VARIATION(2). In this case, the reaction is also split on the RXN:VARIATION number, and each ID is formed from the $RIREG value followed by '_' (underscore) and the RXN:VARIATION number.
-IMPLICIT_CHIRALITY [TRUE|FALSE]
Alters the way in which chirality is determined in order to detect implicit chiral centers. This is useful for some natural products. For a bond A-hash-B, the interpretation is that B is below A from the perspective of A and A is above B from the perspective of B. The default is FALSE. Setting -IMPLICIT_CHIRALITY to TRUE allows both ends of chiral bonds to be used in the determination of chiral centers when generating isomeric SMILES.
-DB_RINGS_CISTRANS [TRUE|FALSE]
Toggles whether stereochemistry for ring double bonds is indicated. The default is FALSE. Setting this option to TRUE marks the cis/trans stereochemistry for all ring double bonds when generating isomeric SMILES.
-M__ISO_ARE_DEFECTS [TRUE|FALSE]
Indicates whether the values in the M ISO line of the property block are mass defects or actual masses for the isotopes listed. Default is FALSE. When -M__ISO_ARE_DEFECTS is set as TRUE, values in the line are treated as mass defects when generating isomeric SMILES.
-DB_EXPLICIT_H [TRUE|FALSE]
Determines whether double bonds in the input file must have explicit hydrogens. The default is FALSE. Setting this option to TRUE requires that double bonds have all hydrogens explicitly indicated in order to generate isomeric SMILES.
-CHI_EXPLICIT_H [TRUE|FALSE]
Determines whether chiral atoms in the input file must have explicit hydrogens. The default is FALSE. Setting this option to TRUE requires that chiral atoms have all hydrogens explicitly indicated in order to generate isomeric SMILES.
-FIX_RADICAL_RINGS [TRUE|FALSE]
Converts radical rings to aromatic. The default is TRUE, which allows certain types of five-, six-, and seven-membered radical rings to be converted to aromatic. Changing this option to FALSE keeps the rings as specified in the input file. For a ring to be converted, all atoms in the ring must be carbon and designated as doublet radicals. In addition, no atom in the ring may have a charge.
-PTABLE <name>
Provides the location of a user-defined periodic table. Setting this option to the name of a user-defined PTABLE causes the uncommented lines in the user PTABLE to be used in place of the corresponding information in the default PTABLE. An example table is located in $DY_ROOT/data. Uncomment and edit a specific line of this file to change the set of valence/charge pairs used for that atom.
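
The reaction ID rule described under -PREFIX (the $RIREG value, then '_', then the RXN:VARIATION number) can be sketched as follows. This only illustrates the naming rule and is not how rd2smi is implemented. The function name and the sample values are hypothetical.

```python
import re

# Matches a $DTYPE name that begins with RXN:VARIATION(n) and captures n.
_VARIATION = re.compile(r"RXN:VARIATION\((\d+)\)")

def variation_id(rireg, dtype_name):
    """Build the ID for a reaction split on its RXN:VARIATION number."""
    match = _VARIATION.match(dtype_name)
    if match is None:
        # No variation number in the name: fall back to the plain $RIREG value.
        return rireg
    return rireg + "_" + match.group(1)

# Hypothetical values: $RIREG of 1234 and the second variation.
example_id = variation_id("1234", "RXN:VARIATION(2)")
```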

Return Value

rd2smi returns 0 to the environment if it succeeds without errors or a non-zero value if there are errors.
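
A calling script can act on this return value. The sketch below uses Python's subprocess module. Because rd2smi may not be installed where the sketch is run, the demonstration uses the Python interpreter as a stand-in command; the function name is hypothetical.

```python
import subprocess
import sys

def run_ok(argv):
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(argv)
    return completed.returncode == 0

# Stand-in commands so the sketch runs without a Daylight installation.
# A real call would pass an rd2smi argument list instead.
ok = run_ok([sys.executable, "-c", "raise SystemExit(0)"])
failed = not run_ok([sys.executable, "-c", "raise SystemExit(3)"])
```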

Files

$DY_ROOT/bin/rd2smi

Daylight License

programs:

Related Topics

convert(1) mol2smi(1) smi2mol(1) smi2rd(1) sd2smarts(1) rd2smarts(1) rd2smirks(1) rubicon(1) licensing(5) options(5)