Name
TDT - Syntax for a Thor Data Tree (TDT)
Description
The TDT format is an ASCII format used by many Daylight applications to
handle various data, both structural and non-structural. TDT files or
streams consist of one or more TDTs. Thor Data Trees are also
objects at the Thor Toolkit level, but this document is
specifically about the ASCII TDT format.
Each TDT consists of the following subunits, in descending
hierarchical order:
- sub-tdts
- Each sub-tdt consists of exactly one identifier dataitem
followed by zero or more non-identifier dataitems.
- dataitems
- Each dataitem begins with a datatag consisting of one
or more non-blank non-special characters followed by
the data enclosed in angle brackets, e.g.,
$NAM<GRAPEJELLY>
There are two types of dataitems, identifier dataitems
and non-identifier dataitems. Identifier datatags
begin with '$' and non-identifier datatags do not.
Here is a non-identifier dataitem with three datafields.
CP<3.219;-55P;4.51>
- datafields
- Each dataitem consists of one or more datafields.
Datafields are separated by the semicolon character.
Here is a dataitem with six datafields:
FP<.091A0FFMU,,UJEk42BUe.3hDH,kAWbk6Tb1ekEdVFY.2;2048;103;256;88;1>
- subfields (obsolete)
- Some obsolete datatypes used multiple subfields within
one datafield, separated by the tilde character '~'.
The fourth datafield in the following dataitem consists
of three subfields separated by tildes (these subfields
happen to be indirect references, but this is irrelevant
to the TDT format).
P<-1.30;S5;R1805;F272~F402~F12;;;>
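The hierarchy above can be illustrated with a minimal Python sketch. The
function name parse_dataitem is hypothetical and not part of any Daylight
toolkit, and the sketch assumes the dataitem contains no quoted datafields,
so that every ';' and '~' is a separator.

```python
def parse_dataitem(item):
    """Split one unquoted dataitem into (datatag, is_identifier, datafields).

    Assumption: no datafield is quoted, so ';' and '~' always separate.
    """
    tag, sep, rest = item.strip().partition("<")
    if not sep or not tag or not rest.endswith(">"):
        raise ValueError("expected TAG<data>")
    # Each datafield is itself split on '~' to expose obsolete subfields.
    fields = [field.split("~") for field in rest[:-1].split(";")]
    # Identifier datatags begin with '$'; non-identifier datatags do not.
    return tag, tag.startswith("$"), fields


print(parse_dataitem("$NAM<GRAPEJELLY>"))   # ('$NAM', True, [['GRAPEJELLY']])
tag, is_id, fields = parse_dataitem("P<-1.30;S5;R1805;F272~F402~F12;;;>")
print(fields[3])                            # ['F272', 'F402', 'F12']
```

Returning every datafield as a list of subfields keeps the result uniform;
a datafield without tildes is simply a one-element list.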
Each TDT must begin with an identifier dataitem. If this dataitem
is not a SMILES ($SMI), the TDT may contain only one sub-tdt, and
therefore only one identifier dataitem.
Each TDT terminates with the vertical bar character '|'.
Whitespace between dataitems, sub-tdts, and TDTs is ignored.
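As a rough illustration of these rules, the following Python sketch groups
the dataitems of a single TDT into sub-tdts. The name split_sub_tdts and the
regular expression are assumptions made for this example, not part of any
Daylight toolkit, and the sketch only handles TDTs without quoted datafields.

```python
import re

# Matches one dataitem: a datatag of non-blank, non-special characters
# (optionally led by '$' for identifiers), then data in angle brackets.
# Assumption: no quoted datafields, so '>' always ends the dataitem.
DATAITEM = re.compile(r'(\$?[^\s$<>;~|"]+)<([^<>]*)>')


def split_sub_tdts(tdt):
    """Group the dataitems of one unquoted TDT into sub-tdts."""
    body = tdt.strip()
    if not body.endswith("|"):
        raise ValueError("a TDT must terminate with '|'")
    sub_tdts = []
    for tag, data in DATAITEM.findall(body[:-1]):
        item = (tag, data.split(";"))
        if tag.startswith("$"):
            sub_tdts.append([item])      # an identifier starts a new sub-tdt
        elif not sub_tdts:
            raise ValueError("a TDT must begin with an identifier dataitem")
        else:
            sub_tdts[-1].append(item)    # attach to the current sub-tdt
    return sub_tdts


example = ("$SMI<CC(C)(C)CNC(=O)N(CCCl)N=O> CP<2.580;-0P;4.51> "
           "$NAM<PENTAMUSTINE> $CAS<73105-03-0> |")
for sub in split_sub_tdts(example):
    print(sub)
```

Because whitespace between dataitems is ignored, the same function accepts
input with one dataitem per line or with everything on a single line.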
- Special Characters
- The following are special characters used in TDT formatting and
must be quoted according to the TDT quoting rules to be used
as data.
$ dollar sign
< less than sign
> greater than sign
; semicolon
~ tilde (obsolete)
| vertical bar
" double-quote character
- Quoting Rules
- If a datafield contains any special characters, the datafield
must be quoted correctly or the TDT is illegal. If a datafield
does not contain any special characters, it may be quoted or
not.
If a datafield contains no double-quote characters, it may be
quoted correctly by simply enclosing the datafield in double-
quote characters. In the following example, the second
datafield is correctly quoted.
PRICE<9.95;"US $">
If a datafield does contain one or more double-quote characters,
each double-quote character must be replaced with two
double-quote characters, then the entire datafield must be
enclosed in double-quote characters. E.g.,
REM<"He said ""Try weakly"", not ""Tri-weekly.""">
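The quoting rules can be sketched in Python as follows. The names
quote_field and split_fields are hypothetical helpers written for this
example, not Daylight functions; split_fields expects the text found between
the angle brackets of one dataitem.

```python
SPECIAL = set('$<>;~|"')


def quote_field(field, force=False):
    """Quote a datafield if it contains special characters (or if forced)."""
    if not force and not (SPECIAL & set(field)):
        return field                      # quoting is optional here
    # Double every embedded double-quote, then enclose the whole datafield.
    return '"' + field.replace('"', '""') + '"'


def split_fields(data):
    """Split dataitem data into datafields, removing any quoting."""
    fields, buf, quoted, i = [], [], False, 0
    while i < len(data):
        ch = data[i]
        if ch == '"':
            if quoted and data[i + 1:i + 2] == '"':
                buf.append('"')           # doubled quote inside a quoted field
                i += 1
            else:
                quoted = not quoted       # opening or closing quote
        elif ch == ";" and not quoted:
            fields.append("".join(buf))   # unquoted ';' ends the datafield
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


print(quote_field("US $"))                # "US $"
print(split_fields('9.95;"US $"'))        # ['9.95', 'US $']
print(quote_field('He said "Try weakly", not "Tri-weekly."'))
```

The last call prints the quoted datafield exactly as it appears inside the
REM dataitem above.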
- List vs. Dump Format
- For ease of handling, two TDT file formats are defined.
List format means one line per dataitem and one line for the
vertical bar. Dump format means one line per TDT. The two
formats are identical in meaning.
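Since the two formats differ only in line breaks, converting between them
is simple. The Python sketch below turns list format into dump format; the
name list_to_dump is hypothetical, and the sketch assumes that no datafield
contains a newline and that each terminating '|' sits on its own line.

```python
def list_to_dump(text):
    """Convert list-format TDT text to dump format, one TDT per line."""
    dumped, current = [], []
    for line in text.splitlines():
        line = line.strip()       # whitespace between dataitems is ignored
        if not line:
            continue
        if line == "|":           # list format puts '|' on its own line
            dumped.append("".join(current) + "|")
            current = []
        else:
            current.append(line)
    if current:
        raise ValueError("last TDT is not terminated with '|'")
    return "\n".join(dumped)


listing = """$SMI<CC(C)(C)CNC(=O)N(CCCl)N=O>
  F<C8H16ClN3O2>
  $NAM<PENTAMUSTINE>
|
"""
print(list_to_dump(listing))
# $SMI<CC(C)(C)CNC(=O)N(CCCl)N=O>F<C8H16ClN3O2>$NAM<PENTAMUSTINE>|
```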
Examples
Here's an example of a TDT in list format (with indentations for
clarity) consisting of six sub-TDTs (including the root sub-TDT).
$SMI<CC(C)(C)CNC(=O)N(CCCl)N=O>
CP<2.580;-0P;4.51>
FP<W6jZU.0.6s1Ld73I65Y65e..A4VUAUkE.SUEO,MWa0U.2;2048;87;256;77;1;>
F<C8H16ClN3O2>
CR<5.611;-0R;4.51>
CL<66;54;M97;0.1129>
TS<199705301950.07>
$GRF<CC(C)(C)CNC(O)N(CCCl)NO>
$WLN<ONN2GVM1X>
PCN<PENTAMUSTINE>
AC<AC1;>
$NAM<PENTAMUSTINE>
$CAS<73105-03-0>
$NAM<NCNU>
|
Bugs
Some Daylight applications cannot handle TDTs that are in neither list nor dump format.
Daylight applications do not handle newlines as data consistently.
Related topics
thorfilters(1)
Daylight Chemical Information Systems, Inc.
info@daylight.com