Daylight v4.9
Release Date: 1 February 2008

Name

smi2rd - converts a SMILES-based file into a connection table-based file.

Unix Synopsis

smi2rd [options] [infile [outfile]]

Description

smi2rd(1) converts a Daylight SMILES (SMI) or Thor Data Tree (TDT) file into an MDL-formatted file containing molecules and/or reactions (RDfile).

The input file must be either a SMI or TDT file. If a unique ID is included in the SMILES file, the SMILES must be followed by ' ' (space) and then the ID. If the structure is absent, the ID in the SMILES file must be preceded by a space. SMILES may include molecules and/or reactions and may or may not contain stereochemistry/isotopes. If an input TDT does not contain a structure, it must be rooted in a name identifier. If it is multi-branched, it must be rooted in $SMI and will be split into multiple records based on $NAM unless another tag is specified using the NAME_DATATAG option. The SMI_WITH_TUPLES option sets whether output coordinates use TDT structural information associated with $SMI or with ISM.
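The SMI line layout described above can be sketched in Python. This is only an illustration of the "SMILES, space, ID" rule; the helper name format_smi_line and the sample IDs are hypothetical and are not part of the Daylight toolkit.

```python
def format_smi_line(smiles="", ident=None):
    """Return one SMI input line: the SMILES, a space, then the optional ID.

    Hypothetical helper for illustration only.
    """
    if ident is None:
        # No ID: the line is just the SMILES.
        return smiles
    # With an empty SMILES the line still begins with the separating space,
    # matching the rule that an ID without a structure is preceded by a space.
    return f"{smiles} {ident}"


lines = [
    format_smi_line("CCO", "ethanol"),                             # molecule with ID
    format_smi_line("", "no_structure"),                           # ID only
    format_smi_line("CC(=O)O.OCC>>CC(=O)OCC.O", "esterification"),  # reaction with ID
]
```

Writing these lines to a file, one per record, gives an input of the shape described above for SMI format.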

Default output is to stdout in RDfile (v2000) format. If the input contains a mixture of molecules and reactions, the resulting RDfile output will contain both $RIREG and $MIREG datatypes. If the input TDT contains the LINE1 data tag, the information associated with this datatype is placed on the first line of the header block for each molecule connection table.

Stereochemistry is inferred from the isomeric SMILES, and the appropriate MDL bond styles are set to reflect it. While these follow the MDL rules, some users may have conventions that allow a less rigorous depiction of chirality. Note: The chirality flag is set in the output only if it is set in the input TDT.

To be visualized, RDfiles require coordinates for the structures. If the input file does not contain this information, 2D coordinates and/or bond styles are generated via the standard Daylight depict algorithm. All atoms are assumed to be visible (present in the output file) except normal hydrogen atoms. This can be overridden by setting the VIS datatype in the input TDT.

The data in an input TDT file are transferred to the RDfile using the same data tags. Note that the special meaning of the '$' in distinguishing identifiers is lost, as the RDfile does not have the same underlying tree structure as the TDT. In addition, consecutive record numbers are automatically generated for $MIREG and $RIREG, and the $NAM value is used for $MEREG and $REREG unless another tag is specified using the NAME_DATATAG option.

Since MDL allows non-standard atoms in the CTfile format, wildcards such as '*' are replaced by the corresponding string from the ASYM datatype, if available.

The manual page for "convert" describes features common to this and the other "convert" programs. Please refer to it for more information on general usage and options such as -HELP, -VERSION, -SKIP_RECORDS, -DO_RECORDS, -ERROR_LEVEL, -ERROR_LOG, and -REJECT_LOG.

Options

-IFMT [SMI|TDT]
Controls whether the input file is in SMILES or TDT format. Default is SMI.
-NAME_DATATAG <tag>
Designates the data tag to be used as the unique ID. For SMI input, the space-delimited ID after the SMILES is used. For TDT input, the default tag is $NAM; specifying another tag designates the data associated with that tag as the ID. In addition, multi-branched TDTs are split using the default or designated data tag. For reactions, the ID is written to the $REREG field; for molecules, it is written to the $MEREG field. Note: One may need to place the tag name in quotes on the command line and use '\\' before a '$' if it is an identifier in the TDT.
-SMI_COMMENT [TRUE|FALSE]
Determines whether the SMILES is placed in the comment line. Default is FALSE. Setting -SMI_COMMENT to TRUE writes the SMILES to the comment line (line 3) of the header block in each connection table. Note: The comment line is limited to 80 characters.
-SPLIT_FIELDS [TRUE|FALSE]
Splits multi-field TDT data into separate output entries. The default is FALSE. Setting SPLIT_FIELDS to TRUE allows each multi-line field type to be considered as a separate entry with the same data tag.
-USE_3D [TRUE|FALSE]
Designates whether 3D coordinates are included in the output. Default is FALSE. If -USE_3D is set to TRUE and the input TDT file contains 3D coordinates, then 3D coordinates are included in the output file.
-SMI_WITH_TUPLES [TRUE|FALSE]
Determines whether output tuple information is associated with SMILES or isomeric SMILES. Default is TRUE so that tuples associated with $SMI (2D or $D3D) are saved in the output file. Setting this option to FALSE outputs the tuple information associated with the ISM (2DI or 3DI).
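
As an illustration of how the options above combine with the synopsis, the following Python sketch assembles an argument list for smi2rd. It assumes that each option is passed as the option name followed by its value as a separate argument, which is inferred from the bracket notation above rather than confirmed. The helper name build_smi2rd_argv and the file names are hypothetical.

```python
def build_smi2rd_argv(infile, outfile=None, **options):
    """Build an argument list of the form: smi2rd [options] [infile [outfile]].

    Hypothetical helper; keyword names are upper-cased into option names,
    e.g. use_3d=True becomes "-USE_3D", "TRUE".
    """
    argv = ["smi2rd"]
    for name, value in options.items():
        if isinstance(value, bool):
            # The documented switches take the literal values TRUE or FALSE.
            value = "TRUE" if value else "FALSE"
        # Assumption: option name and value are separate arguments.
        argv += [f"-{name.upper()}", str(value)]
    argv.append(infile)
    if outfile is not None:
        argv.append(outfile)
    return argv


# Convert a TDT file, write the SMILES to the comment line, and keep 3D coordinates.
argv = build_smi2rd_argv(
    "input.tdt", "output.rdf", ifmt="TDT", smi_comment=True, use_3d=True
)
```

The resulting list can be handed to a process launcher such as subprocess.run on a system where smi2rd is installed and licensed.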

Return Value

smi2rd returns 0 to the environment if it succeeds without errors or a non-zero value if there are errors.
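
A calling script can rely on this convention to decide whether a conversion succeeded. The Python sketch below shows one way to do so. The function name ran_without_errors is hypothetical, and the demonstration uses the Python interpreter as a stand-in command so that it runs without a Daylight installation.

```python
import subprocess
import sys


def ran_without_errors(argv):
    """Run the command in argv and report whether it exited with status 0."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


# Stand-in commands that exit with a chosen status; a real call would pass
# an smi2rd argument list instead.
ok = ran_without_errors([sys.executable, "-c", "raise SystemExit(0)"])
failed = ran_without_errors([sys.executable, "-c", "raise SystemExit(3)"])
```

Because any non-zero value signals errors, the check compares against 0 rather than against a specific error code.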

Files

$DY_ROOT/bin/smi2rd

Daylight License

programs: convert

Related Topics

convert(1) mol2smi(1) rd2smi(1) sd2smarts(1) rd2smarts(1) rd2smirks(1) smi2mol(1) licensing(5) options(5)