The TDT format comprises Datatypes, Dataitems, Datafields, and Datatrees:
- Datatypes are ASCII tags that precede data. There are two types:
  - Identifiers: data tags that begin with "$" and have other pure data tags associated with them
  - Pure data tags
- The main "root" of a Datatree is traditionally the $SMI identifier, with a unique SMILES (USMILES) as its data
- The Dataitem associated with a given data tag appears between "<" and ">"
- Datafields are sub-entries of a Dataitem, separated by ";"
- A Datatree is one complete entry in the database for a root identifier. Datatrees are separated by "|"
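The rules above can be sketched as a small parser. This is a minimal illustration, not the THOR implementation: the function name `parse_datatree` is hypothetical, and the handling of double-quoted data (quotes protect ">" and ";" inside a Dataitem and are stripped from the result) is an assumption based on the quoted reaction SMILES that appear in this document. Escaped quotes inside quoted data are not handled.

```python
def parse_datatree(text):
    """Parse one Datatree into a list of (tag, [datafields]) pairs.

    Assumptions (not confirmed by the source): data in double quotes may
    contain ">" and ";" literally, and the quotes are not part of the data.
    """
    items = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch == "|":  # end of this Datatree
            break
        # Read the tag, which runs up to the opening "<".
        start = i
        while i < n and text[i] != "<":
            i += 1
        if i >= n:
            raise ValueError("tag without a dataitem: " + text[start:])
        tag = text[start:i].strip()
        i += 1  # skip "<"
        # Read the Dataitem, splitting it into Datafields on unquoted ";".
        fields, buf, in_quotes = [], [], False
        while i < n:
            ch = text[i]
            if ch == '"':
                in_quotes = not in_quotes
            elif ch == ";" and not in_quotes:
                fields.append("".join(buf))
                buf = []
            elif ch == ">" and not in_quotes:
                break
            else:
                buf.append(ch)
            i += 1
        else:
            raise ValueError("unterminated dataitem for tag " + tag)
        fields.append("".join(buf))
        i += 1  # skip ">"
        items.append((tag, fields))
    return items


if __name__ == "__main__":
    for tag, fields in parse_datatree('$SMI<OCC>2D<1,2,3,4,5,6>|'):
        kind = "identifier" if tag.startswith("$") else "pure data"
        print(tag, kind, fields)
```

Each returned pair holds a tag and its Datafields; a tag beginning with "$" marks an identifier, so the first pair of a Datatree is normally the $SMI root.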
$D<tag> | Defines the internal tag of a new datatype.
_V<vtag[;vtag...]> | The verbose tags. This datatype is the only required part of a datatype definition. It serves two purposes. First, it defines the number of datafields in the datatype being defined. Second, it provides the "verbose tags" (human-readable labels) for those datafields.
_B<btag[;btag...]> | The brief tags. A short name for the datatype, suitable for labelling buttons and for use in pull-down menus.
_N<ntype[;ntype...]> | The normalizations of each datafield in the datatype.
_P<[*][;[*]...]> | The Merlin-pool-inclusion flag. Each non-zero-length field is loaded into Merlin's in-memory pool when Merlin opens the pool. The _P<!> inclusion flag creates a row in Merlin from each subtree rooted by the identifier datatype to which it applies. For example, if you set this flag in the $NAM datatype definition, Merlin would create a row for the $SMI (which is standard), and it would also create a row for each $NAM.
_S<summary> | One-line summary of the datatype's meaning and use.
_D<description> | Long description of the datatype's meaning and use.
_M<set> | Set membership of the datatype. For administration.
_C | Comment. You can put anything you like in this datatype.
_O | The "owner" of the datatype. You can put anything you like in this datatype.
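Because _V fixes the number of datafields, a definition can be checked for consistency before it is loaded. The following is only a sketch: the function `check_field_counts` and the sample definition (tag `XMPL` and its field values) are hypothetical, and the rule that _B, _N, and _P should carry the same number of fields as _V is an assumption inferred from their per-field syntax, not something the source states.

```python
def check_field_counts(definition):
    """Return a list of problems found in a datatype definition.

    `definition` maps each definition tag (e.g. "_V") to its list of
    datafields. Assumption: per-field tags should match the _V field count.
    """
    if "_V" not in definition:
        return ["missing required _V (verbose tags)"]
    expected = len(definition["_V"])
    problems = []
    for tag in ("_B", "_N", "_P"):
        fields = definition.get(tag)
        if fields is not None and len(fields) != expected:
            problems.append(
                f"{tag} has {len(fields)} field(s), expected {expected}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical definition with two datafields; "ntype" is a placeholder,
    # not a real normalization name.
    sample = {
        "$D": ["XMPL"],
        "_V": ["Example value", "Example source"],
        "_B": ["Value", "Source"],
        "_N": ["ntype"],
    }
    for problem in check_field_counts(sample):
        print(problem)
```

For the sample above the only reported problem is the _N entry, which supplies one field where _V declares two.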
$SMI<OCC>2D<1,2,3,4,5,6>| | (original SMILES) |
$SMI<CCO>2D<5,6,3,4,1,2>| | (unique SMILES) |
$SMI<"BrCC=C>>ICC=C"> $RNO<12345> ISM<"BrCC=C>CC(=O)C.CCC(=O)C>ICC=C"> |
$SMI<"BrCC=C>>ICC=C"> $RNO<12345> ISM<"BrCC=C>CC(=O)C.CCC(=O)C>ICC=C"> $RMOL<BrCC=C> $AMOL<CC(=O)C> $AMOL<CCC(=O)C> $PMOL<ICC=C> |
PART_NTUPLE -- Component-order n-tuple data
MAKEGRAPH -- produce a GRAPH subtree
THOR uses the concept of a molecule's graph to allow retrieval of structures that might be tautomers, isomers, or otherwise an inexact match to a particular SMILES. One of the problems in representing molecules in a computer is that we must choose one valence model as the preferred representation, but there are many valid valence models. The graph of a molecule is an information-deficient representation that removes most valence-model information, allowing greater flexibility in retrieving data.
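The idea can be illustrated with a toy model. This is not how MAKEGRAPH is implemented: the function `to_graph` and the tuple-based molecule representation are hypothetical, hydrogens are omitted, and "valence-model information" is assumed here to mean bond orders and formal charges. Both molecules must also use the same atom numbering; a real system would canonicalize the graph first.

```python
def to_graph(atoms, bonds):
    """Reduce a molecule to element labels plus bare connectivity.

    atoms: list of (element_symbol, formal_charge)
    bonds: list of (atom_index_a, atom_index_b, bond_order)
    Bond orders and charges are discarded, so different valence models of
    the same skeleton produce the same result.
    """
    elements = tuple(symbol for symbol, _charge in atoms)
    edges = frozenset(frozenset((a, b)) for a, b, _order in bonds)
    return elements, edges


# Heavy atoms shared by both tautomers: C0, C1, O2, C3.
ATOMS = [("C", 0), ("C", 0), ("O", 0), ("C", 0)]

# Keto form CC(=O)C: double bond between C1 and O2.
KETO_BONDS = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

# Enol form CC(O)=C: double bond between C1 and C3.
ENOL_BONDS = [(0, 1, 1), (1, 2, 1), (1, 3, 2)]

if __name__ == "__main__":
    same = to_graph(ATOMS, KETO_BONDS) == to_graph(ATOMS, ENOL_BONDS)
    print("tautomers share a graph:", same)
```

The two bond lists differ only in where the double bond sits, so the reduced graphs compare equal, which is what lets a graph lookup retrieve either tautomer.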
EXAMPLE:
spresi95demo_datatypes.tdt ->