16. THOR and MERLIN Toolkits: Datatypes
16.1 Datatype and Fieldtype Objects
The syntax and semantics of each datum (i.e. each datafield) in a
THOR database or Merlin database are defined by a datatype
definition. In this chapter we examine how the THOR and Merlin
Toolkits represent these datatype definitions as objects, and how to
get a datatype's properties via its datatype object.
Datatype definitions are discussed in detail in the Daylight Theory
Manual, and the practical aspects of creating and loading datatype
definitions into a database are discussed in the Daylight System
Administration Manual.
A datatype object represents the definition of a datatype in object
form. Datatype objects are considered a constituent part of a
database or pool: they are automatically created when the database
or pool is opened, and deallocated when it is closed. Datatype
objects always exist for the life of the parent database or pool;
they cannot be deallocated by dt_dealloc(), nor can they be copied by
dt_copy().
A fieldtype object, a child of the datatype object, represents the
sub-part of a datatype definition for a particular field in the
datatype. For example, if a datatype defines four datafields, the
datatype object will have four child fieldtype objects. Like
datatype objects, fieldtype objects cannot be deallocated or copied.
If the definition of a datatype is modified while the database or
pool is open (that is, the datatype-definition TDTs are re-loaded or
edited), the datatype or fieldtype objects are not affected by the
change; the database or pool must be closed and reopened before the
change will take effect.
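The "fixed at open time" behavior described above can be pictured with a small toy model in C. This is only a sketch and does not call the Daylight library: every sk_ name is a hypothetical stand-in, and the single text field stands for a whole datatype definition. It shows that an open database keeps the definition it saw when it was opened, and only picks up an edit after a close and reopen.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not the Daylight API. */
typedef struct { char text[64]; } sk_StoredDef;           /* definition as stored */
typedef struct { int is_open; char text[64]; } sk_OpenDb; /* open database's view */

/* Edit the stored definition (e.g. the definition TDT is re-loaded). */
static void sk_edit(sk_StoredDef *def, const char *text) {
    strncpy(def->text, text, sizeof def->text - 1);
    def->text[sizeof def->text - 1] = '\0';
}

/* Opening creates the "datatype object" from the definition as it is now. */
static void sk_open(sk_OpenDb *db, const sk_StoredDef *def) {
    strcpy(db->text, def->text);
    db->is_open = 1;
}

/* Closing discards the object; the caller never deallocates it directly. */
static void sk_close(sk_OpenDb *db) {
    db->is_open = 0;
    db->text[0] = '\0';
}

/* What the open database currently reports for the datatype. */
static const char *sk_summary(const sk_OpenDb *db) {
    assert(db->is_open);
    return db->text;
}
```

Calling sk_edit() between sk_open() and sk_summary() leaves the reported text unchanged; the new text appears only after sk_close() followed by sk_open().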
16.2 Getting Datatype and Fieldtype Objects
There are several methods a program can use to get datatype-object
handles.
- A specific datatype can be retrieved by name from a database
object; a stream over a database will return all datatype objects;
and any object associated with a datatype (e.g. dataitems in THOR,
columns in Merlin) can be asked for its datatype.
- Fieldtype objects can be retrieved via a stream over the datatype
object, and any object associated with a fieldtype (e.g. datafields
in THOR, columns in Merlin) can be asked for its fieldtype.
If you are reading through this manual front-to-back, the uses of
datatype objects may not yet be apparent. Datatype objects are
heavily used in the THOR and Merlin Toolkits when retrieving data
from THOR and Merlin. If you are unfamiliar with how TDTs are
retrieved from a THOR server, or how columns are created in a Merlin
server, you should skim this material and return to it after studying
the chapters on those subjects.
Functions for retrieving datatype objects and fieldtype objects are:
- dt_stream(Handle database, integer TYP_DATATYPE)
  Returns a stream of all datatype objects in the THOR database or
  Merlin pool. For example:
    dstream = dt_stream(database, TYP_DATATYPE);
    while (NULL_OB != (datatype = dt_next(dstream))) {
      /* do something with the datatype */
    }
- dt_stream(Handle datatype, integer TYP_FIELDTYPE)
  Returns a stream of all fieldtype objects in the datatype object.
  For example:
    fstream = dt_stream(datatype, TYP_FIELDTYPE);
    while (NULL_OB != (fieldtype = dt_next(fstream))) {
      /* do something with the fieldtype */
    }
- dt_getdatatype(Handle database, string tag) ==> Handle datatype
  Retrieves a datatype's definition from the database using the
  identifier tag. Returns a datatype object, or NULL_OB if a problem
  is detected. There will be a problem, for example, if there is no
  such datatype in the database, or if the datatype's definition is
  badly formed. Note that this function, called with identical
  parameters, will return the same handle. There is never more than
  one copy of a particular datatype object.
- dt_datatype(Handle obj) ==> Handle datatype
  Returns an object's datatype. Works on dataitems and datafields
  (THOR), or columns (Merlin).
- dt_fieldtype(Handle obj) ==> Handle fieldtype
  Returns an object's fieldtype object. Works on datafields (THOR), or
  columns (Merlin).
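The retrieval functions above can be pictured with a small self-contained C sketch. It does not call the Daylight library: every name prefixed sk_ or SK_ is a hypothetical stand-in, the tags other than "$SMI" and all field counts are invented, and a real program would instead call dt_stream(), dt_next(), and dt_getdatatype() with the signatures of its toolkit version. The sketch shows the nested datatype/fieldtype stream loop and the rule that looking up a datatype twice by tag yields the same handle.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not the Daylight API. */
typedef int sk_Handle;
#define SK_NULL_OB 0

typedef struct {
    const char *tag;  /* internal tag of the datatype */
    int nfields;      /* number of child fieldtype objects */
} sk_Datatype;

/* Toy "open database"; handles are 1-based indexes into this table. */
static const sk_Datatype sk_db[] = {
    { "$SMI", 1 },
    { "$AAA", 2 },  /* invented tag */
    { "$BBB", 4 },  /* invented tag */
};
#define SK_NTYPES ((int)(sizeof sk_db / sizeof sk_db[0]))

typedef struct { int pos, limit; } sk_Stream;

/* Stream over all datatypes in the toy database. */
static sk_Stream sk_stream_datatypes(void) {
    sk_Stream s = { 0, SK_NTYPES };
    return s;
}

/* Stream over the fieldtypes of one datatype. */
static sk_Stream sk_stream_fieldtypes(sk_Handle datatype) {
    assert(datatype >= 1 && datatype <= SK_NTYPES);
    sk_Stream s = { 0, sk_db[datatype - 1].nfields };
    return s;
}

/* Next handle from a stream, or SK_NULL_OB when it is exhausted. */
static sk_Handle sk_next(sk_Stream *s) {
    return (s->pos < s->limit) ? ++s->pos : SK_NULL_OB;
}

/* Lookup by tag: the same tag always maps to the same handle. */
static sk_Handle sk_getdatatype(const char *tag) {
    for (int i = 0; i < SK_NTYPES; i++)
        if (strcmp(sk_db[i].tag, tag) == 0)
            return i + 1;
    return SK_NULL_OB;  /* no such datatype */
}

/* Nested loop: every datatype, then every fieldtype below it. */
static int sk_count_fieldtypes(void) {
    sk_Stream dstream = sk_stream_datatypes();
    sk_Handle datatype, fieldtype;
    int total = 0;
    while (SK_NULL_OB != (datatype = sk_next(&dstream))) {
        sk_Stream fstream = sk_stream_fieldtypes(datatype);
        while (SK_NULL_OB != (fieldtype = sk_next(&fstream)))
            total++;
    }
    return total;
}
```

In this toy model sk_count_fieldtypes() visits 1 + 2 + 4 fieldtypes, and sk_getdatatype() returns SK_NULL_OB for a tag that is not in the table, which is the case a caller should always check for.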
Functions for retrieving datatype properties are:
- dt_dfnorm(Handle obj, integer norm) ==> boolean isnorm
  Tests the object's normalization against "norm"; returns TRUE if
  "norm" is one of the object's normalizations. The object can be a
  datafield or fieldtype (THOR), or a column or fieldtype (Merlin).
  The detailed definitions of these normalizations are discussed in the
  Daylight Theory Manual; the following is a brief synopsis:
    DX_THOR_AUTOGEN         generate second datafield from this
    DX_THOR_USMILES         unique SMILES
    DX_THOR_USMILESANY      unique SMILES, not TDT's root
    DX_THOR_ASMILES         absolute SMILES
    DX_THOR_ASMILESANY      absolute SMILES, not TDT's root
    DX_THOR_GRAPH           convert SMILES to GRAPH
    DX_THOR_MAKEGRAPH       produce a GRAPH subtree
    DX_THOR_WHITE0          zap all spaces
    DX_THOR_WHITE1          compress 2 or more spaces to one space
    DX_THOR_WHITE2          compress 3 or more spaces to one space
    DX_THOR_UPCASE          convert lowercase a-z to uppercase A-Z
    DX_THOR_DOWNCASE        convert uppercase A-Z to lowercase a-z
    DX_THOR_NOPUNCT         remove all punctuation
    DX_THOR_SOMEPUNCT       remove some punctuation
    DX_THOR_CASNUM          insert hyphens, verify checksum
    DX_THOR_D3D             compute 3D hash
    DX_THOR_REGEXP          must match regexp
    DX_THOR_SMILES_NTUPLE   SMILES-ordered n-tuple data
    DX_THOR_BINARY          binary data
    DX_THOR_READONLY        field can't be set by user
    DX_THOR_NUMERIC         field is numeric
    DX_THOR_INDIRECT        indirect data field
- dt_dfnormdata(Handle obj, integer norm) ==> string normdata
  If a normalization has extra data (i.e. DX_THOR_REGEXP,
  DX_THOR_INDIRECT, DX_THOR_SMILES_NTUPLE), returns a string
  containing that data.
- dt_name(Handle obj) ==> string name
  dt_briefname(Handle obj) ==> string briefname
  dt_summary(Handle obj) ==> string summary
  dt_description(Handle obj) ==> string description
  These functions return an object's name ("verbose tag"), brief name,
  summary, and long description, respectively. They apply to datafield
  and fieldtype objects (THOR), or to column and fieldtype objects
  (Merlin).
- dt_tag(Handle obj) ==> string tag
  Returns the internal tag (e.g. "$SMI") of an object; works on
  datatypes and fieldtypes; in THOR also works on dataitems and
  datafields; in Merlin also works on columns.
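The relationship between dt_dfnorm() and dt_dfnormdata() can be sketched in self-contained C. Nothing here comes from the Daylight headers: the sk_ and SK_ names, the numeric values of the normalization codes, the example tag, and the example regular expression are all hypothetical, and the sketch assumes that a normalization without extra data yields a null result, which the real toolkit may handle differently. It shows a fieldtype carrying a set of normalizations, a membership test, and a lookup of the extra data attached to one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codes; the real DX_THOR_* values come from the toolkit. */
enum { SK_WHITE1 = 1, SK_UPCASE, SK_REGEXP, SK_NUMERIC };

typedef struct {
    int norm;          /* normalization code */
    const char *data;  /* extra data, or NULL when there is none */
} sk_Norm;

typedef struct {
    const char *tag;       /* internal tag */
    const char *name;      /* verbose name */
    const sk_Norm *norms;  /* normalizations defined for the field */
    int nnorms;
} sk_Fieldtype;

/* Counterpart of dt_dfnorm(): is "norm" one of the normalizations? */
static int sk_dfnorm(const sk_Fieldtype *ft, int norm) {
    assert(ft != NULL);
    for (int i = 0; i < ft->nnorms; i++)
        if (ft->norms[i].norm == norm)
            return 1;
    return 0;
}

/* Counterpart of dt_dfnormdata(): extra data for "norm", else NULL. */
static const char *sk_dfnormdata(const sk_Fieldtype *ft, int norm) {
    assert(ft != NULL);
    for (int i = 0; i < ft->nnorms; i++)
        if (ft->norms[i].norm == norm)
            return ft->norms[i].data;
    return NULL;
}

/* Invented example field: uppercased, and must match a regexp. */
static const sk_Norm sk_example_norms[] = {
    { SK_UPCASE, NULL },
    { SK_REGEXP, "^[0-9-]+$" },
};
static const sk_Fieldtype sk_example = {
    "$XXX", "Example field", sk_example_norms, 2
};
```

A caller would normally test with sk_dfnorm() first and ask for the data only when the test succeeds, mirroring the order in which the two toolkit functions are described above.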