16. THOR and MERLIN Toolkits: Datatypes

16.1 Datatype and Fieldtype Objects

The syntax and semantics of each datum (i.e. each datafield) in a THOR database or Merlin database are defined by a datatype definition. In this chapter we examine how the THOR and Merlin Toolkits represent these datatype definitions as objects, and how to get a datatype's properties via its datatype object. Datatype definitions are discussed in detail in the Daylight Theory Manual, and the practical aspects of creating and loading datatype definitions into a database are discussed in the Daylight System Administration Manual.

A datatype object represents the definition of a datatype in object form. Datatype objects are considered a constituent part of a database or pool: They are automatically created when the database or pool is opened, and deallocated when it is closed. Datatype objects always exist for the life of the parent database or pool; they cannot be deallocated by dt_dealloc(), nor can they be copied by dt_copy().

A fieldtype object, a child of the datatype object, represents the sub-part of a datatype definition for a particular field in the datatype. For example, if a datatype defines four datafields, the datatype object will have four child fieldtype objects. Like datatype objects, fieldtype objects cannot be deallocated or copied.

If the definition of a datatype is modified while the database or pool is open (that is, the datatype-definition TDTs are re-loaded or edited), the datatype or fieldtype objects are not affected by the change; the database or pool must be closed and reopened before the change will take effect.

16.2 Getting Datatype and Fieldtype Objects

There are several methods a program can use to get datatype-object handles.

  • A specific datatype can be retrieved by name from a database object; a stream over a database will return all datatype objects; and any object associated with a datatype (e.g. dataitems in THOR, columns in Merlin) can be asked for its datatype.

  • Fieldtype objects can be retrieved via a stream over the datatype object, and any object associated with a fieldtype (e.g. datafields in THOR, columns in Merlin) can be asked for its fieldtype.

If you are reading through this manual front-to-back, the uses of datatype objects may not yet be apparent. Datatype objects are heavily used in the THOR and Merlin Toolkits when retrieving data from THOR and Merlin. If you are unfamiliar with how TDTs are retrieved from a THOR server, or how columns are created in a Merlin server, you should skim this material and return to it after studying the chapters on those subjects.

Functions for retrieving datatype objects and fieldtype objects are:

dt_stream(Handle database, integer TYP_DATATYPE)
Returns a stream of all datatype objects in the THOR database or Merlin pool. For example:
     dstream = dt_stream(database, TYP_DATATYPE);
     while (NULL_OB != (datatype = dt_next(dstream))) {
         /* do something with the datatype */
     }
     dt_dealloc(dstream);   /* the stream, unlike the datatypes, is ours to free */

dt_stream(Handle datatype, integer TYP_FIELDTYPE)
Returns a stream of all fieldtype objects in the datatype object. For example:
     fstream = dt_stream(datatype, TYP_FIELDTYPE);
     while (NULL_OB != (fieldtype = dt_next(fstream))) {
         /* do something with the fieldtype */
     }
     dt_dealloc(fstream);   /* the stream, unlike the fieldtypes, is ours to free */

dt_getdatatype(Handle database, string tag) ==> Handle datatype
Retrieves the datatype identified by tag from the database. Returns a datatype object, or NULL_OB if a problem is detected, for example if the database has no such datatype or the datatype's definition is badly formed.

Note that this function, called with identical parameters, will return the same handle. There is never more than one copy of a particular datatype object.
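
The two behaviours described above (NULL_OB on failure, and one shared handle per datatype) can be illustrated with a small self-contained sketch. It does not include the Daylight headers: Handle, NULL_OB and dt_getdatatype() are hypothetical stand-ins that copy the language-neutral prototype used in this chapter, and the real C binding may differ, for example in how the tag string and its length are passed. The helper lookup_ok() is likewise an illustrative name, not a toolkit function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- Hypothetical stand-ins, NOT the Daylight headers ------------- */
typedef int Handle;
#define NULL_OB 0

/* A toy "database" holding two datatype definitions, keyed by tag. */
static const char *toy_tags[] = { "$SMI", "CAS" };
#define TOY_NTAGS 2

/* Stub with the language-neutral signature used in this chapter.
 * Each tag maps to one fixed handle, so repeated lookups agree. */
Handle dt_getdatatype(Handle database, const char *tag)
{
    int i;

    if (database == NULL_OB || tag == NULL)
        return NULL_OB;
    for (i = 0; i < TOY_NTAGS; i++)
        if (strcmp(toy_tags[i], tag) == 0)
            return 100 + i;          /* same tag, same handle */
    return NULL_OB;                  /* no such datatype */
}
/* ------------------------------------------------------------------ */

/* Look up a datatype by tag; return 1 and store the handle on success,
 * or return 0 and leave *out untouched if the lookup yields NULL_OB. */
int lookup_ok(Handle database, const char *tag, Handle *out)
{
    Handle datatype;

    assert(out != NULL);
    datatype = dt_getdatatype(database, tag);
    if (datatype == NULL_OB) {
        fprintf(stderr, "no usable datatype for tag %s\n", tag);
        return 0;
    }
    *out = datatype;
    return 1;
}
```

Because a second lookup with the same arguments yields the same handle, a program can compare datatype handles directly instead of comparing tags, and it never needs to deallocate the result.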

dt_datatype(Handle obj) ==> Handle datatype
Returns an object's datatype. Works on dataitems and datafields (THOR), or columns (Merlin).

dt_fieldtype(Handle obj) ==> Handle fieldtype
Returns an object's fieldtype object. Works on datafields (THOR), or columns (Merlin).
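
The two stream forms combine naturally into a nested walk: one stream over the database yields datatypes, and for each datatype a second stream yields its fieldtypes. The sketch below shows only that control flow. It is written against hypothetical stand-ins, not the Daylight headers: the Handle type, the TYP_ values, and the bodies of dt_stream(), dt_next() and dt_dealloc() are toy placeholders with simplified signatures, and count_fieldtypes() is an illustrative helper name.

```c
#include <assert.h>
#include <stdio.h>

/* --- Hypothetical stand-ins, NOT the Daylight headers ------------- */
typedef int Handle;
#define NULL_OB       0
#define TYP_DATATYPE  1              /* placeholder value */
#define TYP_FIELDTYPE 2              /* placeholder value */

/* Toy pool: database 1 owns datatypes 10 and 20; datatype 10 has
 * fieldtypes 11 and 12, datatype 20 has fieldtype 21. */
#define MAX_STREAMS 8
static struct { int in_use; Handle item[3]; int n, pos; } streams[MAX_STREAMS];

Handle dt_stream(Handle parent, int type)
{
    int s = 0;

    assert(type == TYP_DATATYPE || type == TYP_FIELDTYPE);
    while (s < MAX_STREAMS && streams[s].in_use)
        s++;
    if (s == MAX_STREAMS)
        return NULL_OB;
    streams[s].pos = 0;
    if (parent == 1 && type == TYP_DATATYPE) {
        streams[s].item[0] = 10; streams[s].item[1] = 20; streams[s].n = 2;
    } else if (parent == 10 && type == TYP_FIELDTYPE) {
        streams[s].item[0] = 11; streams[s].item[1] = 12; streams[s].n = 2;
    } else if (parent == 20 && type == TYP_FIELDTYPE) {
        streams[s].item[0] = 21; streams[s].n = 1;
    } else {
        return NULL_OB;              /* unknown parent in the toy pool */
    }
    streams[s].in_use = 1;
    return 1000 + s;
}

Handle dt_next(Handle stream)
{
    int s = stream - 1000;

    if (s < 0 || s >= MAX_STREAMS || !streams[s].in_use)
        return NULL_OB;
    if (streams[s].pos >= streams[s].n)
        return NULL_OB;              /* stream exhausted */
    return streams[s].item[streams[s].pos++];
}

int dt_dealloc(Handle ob)
{
    int s = ob - 1000;

    if (s < 0 || s >= MAX_STREAMS)
        return 0;
    streams[s].in_use = 0;           /* release the toy stream slot */
    return 1;
}
/* ------------------------------------------------------------------ */

/* Count every fieldtype in the database by walking both stream levels.
 * Returns -1 if the datatype stream cannot be created. */
int count_fieldtypes(Handle database)
{
    Handle dstream, fstream, datatype, fieldtype;
    int total = 0;

    dstream = dt_stream(database, TYP_DATATYPE);
    if (dstream == NULL_OB)
        return -1;
    while (NULL_OB != (datatype = dt_next(dstream))) {
        fstream = dt_stream(datatype, TYP_FIELDTYPE);
        while (NULL_OB != (fieldtype = dt_next(fstream))) {
            printf("datatype %d has fieldtype %d\n", datatype, fieldtype);
            total++;
        }
        dt_dealloc(fstream);         /* streams are deallocated ... */
    }
    dt_dealloc(dstream);
    return total;                    /* ... datatypes and fieldtypes are not */
}
```

Note that only the stream handles are passed to dt_dealloc(); the datatype and fieldtype objects they yield belong to the database or pool, as described in section 16.1.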

Functions for retrieving datatype properties are:

dt_dfnorm(Handle obj, integer norm) ==> boolean isnorm
Tests the object's normalization against "norm"; returns TRUE if "norm" is one of the object's normalizations. The object can be a datafield or fieldtype (THOR), or a column or fieldtype (Merlin). The detailed definitions of these normalizations are discussed in the Daylight Theory Manual; the following is a brief synopsis:

DX_THOR_AUTOGEN         generate second datafield from this
DX_THOR_USMILES         unique SMILES
DX_THOR_USMILESANY      unique SMILES, not TDT's root
DX_THOR_ASMILES         absolute SMILES
DX_THOR_ASMILESANY      absolute SMILES, not TDT's root
DX_THOR_GRAPH           convert SMILES to GRAPH
DX_THOR_MAKEGRAPH       produce a GRAPH subtree
DX_THOR_WHITE0          zap all spaces
DX_THOR_WHITE1          compress 2 or more spaces to one space
DX_THOR_WHITE2          compress 3 or more spaces to one space
DX_THOR_UPCASE          convert lowercase a-z to uppercase A-Z
DX_THOR_DOWNCASE        convert uppercase A-Z to lowercase a-z
DX_THOR_NOPUNCT         remove all punctuation
DX_THOR_SOMEPUNCT       remove some punctuation
DX_THOR_CASNUM          insert hyphens, verify checksum
DX_THOR_D3D             compute 3D hash
DX_THOR_REGEXP          must match regexp
DX_THOR_SMILES_NTUPLE   SMILES-ordered n-tuple data
DX_THOR_BINARY          binary data
DX_THOR_READONLY        field can't be set by user
DX_THOR_NUMERIC         field is numeric
DX_THOR_INDIRECT        indirect data field

dt_dfnormdata(Handle obj, integer norm) ==> string normdata
If a normalization has extra data (i.e. DX_THOR_REGEXP, DX_THOR_INDIRECT, DX_THOR_SMILES_NTUPLE), returns a string containing that data.
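
A common use of these two functions is to test a fieldtype against each normalization of interest and, where one applies, fetch its extra data. The following is a minimal sketch of that pattern under stated assumptions, not the toolkit's own code. The Handle type, the numeric values of the DX_THOR_ constants, and the bodies of dt_dfnorm() and dt_dfnormdata() are hypothetical stand-ins with simplified signatures (the real C binding may, for instance, return string lengths separately), and describe_norms() is an illustrative helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- Hypothetical stand-ins, NOT the Daylight headers ------------- */
typedef int Handle;
#define NULL_OB 0
enum {                               /* placeholder values, a subset only */
    DX_THOR_WHITE0 = 1,
    DX_THOR_UPCASE,
    DX_THOR_REGEXP,
    DX_THOR_NUMERIC
};

/* Toy fieldtype 11 is normalized with UPCASE and REGEXP "^[A-Z]+$". */
int dt_dfnorm(Handle obj, int norm)
{
    return obj == 11 && (norm == DX_THOR_UPCASE || norm == DX_THOR_REGEXP);
}

const char *dt_dfnormdata(Handle obj, int norm)
{
    return (obj == 11 && norm == DX_THOR_REGEXP) ? "^[A-Z]+$" : NULL;
}
/* ------------------------------------------------------------------ */

/* Write a one-line summary of obj's normalizations into buf, appending
 * any extra data in parentheses. Returns how many normalizations from
 * the table apply to obj. */
int describe_norms(Handle obj, char *buf, size_t buflen)
{
    static const struct { int norm; const char *label; } table[] = {
        { DX_THOR_WHITE0,  "WHITE0"  },
        { DX_THOR_UPCASE,  "UPCASE"  },
        { DX_THOR_REGEXP,  "REGEXP"  },
        { DX_THOR_NUMERIC, "NUMERIC" }
    };
    size_t i, used;
    int found = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        const char *data;

        if (!dt_dfnorm(obj, table[i].norm))
            continue;                /* this normalization does not apply */
        data = dt_dfnormdata(obj, table[i].norm);
        used = strlen(buf);
        snprintf(buf + used, buflen - used, "%s%s%s%s%s",
                 found ? " " : "", table[i].label,
                 data ? "(" : "", data ? data : "", data ? ")" : "");
        found++;
    }
    return found;
}
```

Calling dt_dfnormdata() only after dt_dfnorm() has returned TRUE keeps the two questions separate: whether a normalization applies, and what parameters it carries.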

dt_name(Handle obj) ==> string name
dt_briefname(Handle obj) ==> string briefname
dt_summary(Handle obj) ==> string summary
dt_description(Handle obj) ==> string description

These functions return an object's name ("verbose tag"), brief name, summary, and long description, respectively. They apply to datafield or fieldtype objects (THOR), or to column and fieldtype objects (Merlin).

dt_tag(Handle obj) ==> string tag
Returns the internal tag (e.g. "$SMI") of an object. Works on datatypes and fieldtypes; in THOR it also works on dataitems and datafields, and in Merlin it also works on columns.
