Name

TDT - Syntax for a Thor Data Tree (TDT)

Description

The TDT format is an ASCII format used by many Daylight applications to handle various data, both structural and non-structural. TDT files or streams consist of one or more TDTs. Thor Data Trees are also objects at the Thor Toolkit level, but this document is specifically about the ASCII TDT format.

Each TDT consists of the following subunits, in descending hierarchical order:

sub-tdts
Each sub-tdt consists of exactly one identifier dataitem followed by zero or more non-identifier dataitems.

dataitems
Each dataitem begins with a datatag consisting of one or more non-blank non-special characters followed by the data enclosed in angle brackets, e.g.,
      $NAM<GRAPEJELLY>
    
There are two types of dataitems, identifier dataitems and non-identifier dataitems. Identifier datatags begin with '$' and non-identifier datatags do not. Here is a non-identifier dataitem with three datafields.
    CP<3.219;-55P;4.51>
    

datafields
Each dataitem consists of one or more datafields. Datafields are separated by the semicolon character. Here is a dataitem with six datafields:
    FP<.091A0FFMU,,UJEk42BUe.3hDH,kAWbk6Tb1ekEdVFY.2;2048;103;256;88;1>
    
subfields (obsolete)
Some obsolete datatypes used multiple subfields within one datafield, separated by the tilde character '~'. The fourth datafield in the following dataitem consists of three subfields separated by tildes (these subfields happen to be indirect references, but this is irrelevant to the TDT format).
    P<-1.30;S5;R1805;F272~F402~F12;;;>
    

Each TDT must begin with an identifier dataitem. If this dataitem is not a SMILES ($SMI), there may be only one sub-tdt and one identifier dataitem in the TDT.

Each TDT terminates with the vertical bar character '|'.

Whitespace between dataitems, sub-tdts, and TDTs is ignored.
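
The hierarchy described above can be illustrated with a short Python sketch. It is not part of any Daylight toolkit: the function name parse_tdt and the returned structure are hypothetical, and the sketch assumes that no datafield is quoted, so that '<', '>', ';' and '|' occur only as TDT syntax. Subfields are not split.

```python
import re

# Assumption: no datafield in the input is quoted, so '<', '>', ';' and '|'
# appear only as TDT syntax. Quoted data would need a character-level scanner.
# A datatag is an optional leading '$' followed by non-blank, non-special
# characters; the data is everything up to the closing angle bracket.
DATAITEM = re.compile(r'\s*(\$?[^\s$<>;~|"]+)<([^<>]*)>')


def parse_tdt(text):
    """Parse one unquoted TDT into a list of sub-tdts.

    Each sub-tdt is a list of (datatag, datafields) pairs whose first
    pair is the identifier dataitem.
    """
    body, bar, _ = text.partition("|")
    if not bar:
        raise ValueError("TDT is not terminated by '|'")
    subtdts = []
    pos = 0
    while True:
        match = DATAITEM.match(body, pos)
        if match is None:
            break
        tag, data = match.group(1), match.group(2)
        item = (tag, data.split(";"))  # datafields are separated by ';'
        if tag.startswith("$"):
            subtdts.append([item])  # an identifier starts a new sub-tdt
        elif not subtdts:
            raise ValueError("TDT must begin with an identifier dataitem")
        else:
            subtdts[-1].append(item)
        pos = match.end()
    if body[pos:].strip():
        raise ValueError("unparseable text: %r" % body[pos:].strip())
    return subtdts
```

Calling parse_tdt on a TDT such as "$SMI<CCO> CP<3.219;-55P;4.51> $NAM<ETHANOL> |" yields two sub-tdts, the first holding the $SMI identifier and its CP dataitem, the second holding only the $NAM identifier.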

Special Characters
The following are special characters used in TDT formatting and must be quoted according to the TDT quoting rules to be used as data.

$     dollar sign
<     less than sign
>     greater than sign
;     semicolon
~     tilde (obsolete)
|     vertical bar
"     double-quote character

Quoting Rules
If a datafield contains any special characters, the datafield must be quoted correctly or the TDT is illegal. If a datafield does not contain any special characters, it may be quoted or not.

If a datafield contains no double-quote characters, it may be quoted correctly by simply enclosing the datafield in double-quote characters. In the following example, the second datafield is correctly quoted.

    PRICE<9.95;"US $">
    
If a datafield does contain one or more double-quote characters, each double-quote character must be replaced with two double-quote characters, then the entire datafield must be enclosed in double-quote characters. E.g.,
    REM<"He said ""Try weakly"", not ""Tri-weekly.""">
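
The quoting rules can be expressed as a pair of small Python helpers. This is a minimal sketch, not a Daylight API: the names SPECIAL, quote_field and unquote_field are hypothetical, and the choice to leave fields without special characters unquoted is only one of the two options the rules allow.

```python
# The special characters listed above, including the obsolete tilde.
SPECIAL = set('$<>;~|"')


def quote_field(field):
    """Return the datafield, quoted if it contains a special character."""
    if not SPECIAL.intersection(field):
        return field  # quoting is optional here, so leave the field bare
    # Double every double-quote character, then enclose the whole field.
    return '"' + field.replace('"', '""') + '"'


def unquote_field(field):
    """Reverse quote_field for a single datafield."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field
```

For example, quote_field('US $') gives "US $" with its enclosing double quotes, and unquote_field applied to that result returns the original four characters.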
    
List vs. Dump Format
For ease of handling, two TDT file formats are defined. List format means one line per dataitem and one line for the vertical bar. Dump format means one line per TDT. The two formats are identical in meaning.
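
As an illustration of the two layouts, the following Python sketch renders the same dataitems both ways. The function names are hypothetical, the input is assumed to be a sequence of (datatag, datafields) pairs whose datafields are already quoted where necessary, and writing the dump line with no separator between dataitems is an assumption (whitespace between dataitems is ignored either way).

```python
def format_dataitem(tag, fields):
    """Render one dataitem; fields are assumed to be quoted already."""
    return "%s<%s>" % (tag, ";".join(fields))


def to_list_format(dataitems):
    """One line per dataitem, then one line for the vertical bar."""
    lines = [format_dataitem(tag, fields) for tag, fields in dataitems]
    lines.append("|")
    return "\n".join(lines) + "\n"


def to_dump_format(dataitems):
    """The whole TDT, including the vertical bar, on a single line."""
    items = "".join(format_dataitem(tag, fields) for tag, fields in dataitems)
    return items + "|\n"
```

Both functions emit the same dataitems in the same order, so the outputs differ only in where the newlines fall.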

Examples

Here's an example of a TDT in list format (with indentations for clarity) consisting of six sub-TDTs (including the root sub-TDT).
$SMI<CC(C)(C)CNC(=O)N(CCCl)N=O>
	CP<2.580;-0P;4.51>
	FP<W6jZU.0.6s1Ld73I65Y65e..A4VUAUkE.SUEO,MWa0U.2;2048;87;256;77;1;>
	F<C8H16ClN3O2>
	CR<5.611;-0R;4.51>
	CL<66;54;M97;0.1129>
	TS<199705301950.07>
	$GRF<CC(C)(C)CNC(O)N(CCCl)NO>
	$WLN<ONN2GVM1X>
		PCN<PENTAMUSTINE>
		AC<AC1;>
	$NAM<PENTAMUSTINE>
	$CAS<73105-03-0>
	$NAM<NCNU>
|
    

Bugs

  • Some Daylight applications cannot handle TDTs that are in neither list nor dump format.
  • Daylight applications do not handle newlines as data consistently.

Related topics

thorfilters(1)


    Daylight Chemical Information Systems, Inc.
    info@daylight.com