15. THOR and Merlin Toolkits: Databases
Back to Table of Contents
15.1 Introduction
Although THOR and Merlin present very different views of the data in
a database, both systems present the very same data. Because of
this, most operations on databases are identical in the THOR and
Merlin Toolkits. This includes opening and closing databases,
setting the server's "search path" security operations, and
datatype-object operations. This chapter covers all of these common
operations.
An old adage might be paraphrased here: "An example is worth a
thousand words" Many tutorial examples of Thor and
Merlin Toolkit usage can be found in the "contrib"
directory:
$DY_ROOT/contrib/src/thor/
$DY_ROOT/contrib/src/merlin/
We strongly encourage you to study these examples before
attempting to write Thor and/or Merlin Toolkit programs.
15.2 Search Path
THOR
and
Merlin servers
maintain a "search path" -- a list of
directories which are to be searched for databases (see the Daylight
System Administration Manual for more details).
(Note that the search path is a property of a server, not a database. We
put it in the databases chapter rather than the server chapter because
it fits with other database operations.)
Note that the directories in the search path are interpreted by the
server's operating system, hence are in a format appropriate to that
operating system. For example, a Macintosh client connected to a UNIX
server would use UNIX syntax to specify a database path (e.g.
"/thordb/mydb"). Similarly, environment variables are interpreted on
the server's operating system, not the client's.
-
dt_getsearchpath(Handle server) ==> Handle sos
-
Returns sequence of strings (SOS), each string-object of which
contains a directory in the server's search path. The order of
directories in the SOS is the order in which the directories will be
searched to find a database.
-
dt_setsearchpath(Handle server, string password, string path, integer replace);
-
Sets the server's search-path. You can either replace the current path,
or add to it.
-
dt_getdatabases(Handle server) ==> Handle sos
-
Returns a sequence of string objects (SOS) containing all databases
in server's search path.
15.3 Creating and Configuring Databases
Database creation is only done by the THOR server, so the functions
in this section don't apply to the Merlin Toolkit. The following
functions are used to create and configure a
THOR database.
15.3.1 Database Creation
-
dt_thor_createdb(Handle server,int dlen, string path, int sizepri, int sizexref) => Handle database
-
Creates a new empty THOR database and opens it with "executive"
permission.
The parameter
path must be a complete path, not a relative
path or just a filename.
The parameters sizepri and sizexref
are the requested sizes of the
primary and cross-reference hash tables, respectively. For more
information about database sizes, see the
reference page for this function and the
Daylight THOR-Merlin Administration Manual.
15.3.2 Database Configuration
Each database can have one, two or three auxiallary databases associated
with it:
- The datatypes database: Contains special-purpose datatype-
definition TDTs (e.g. "$D<$SMI>_V...|". Each time a new
datatype
is encountered, its definition is retrieved from this
database. For more information about datatypes, see the
Daylight Theory Guide.
- The indirect-data database. Contains the expansions for
indirect references. For more information about indirect data, see the
Daylight Theory Guide.
Generally speaking:
- a datatypes database will have no associated databases
- an indirect-data database will have a datatypes database associated with
it that defines the indirect-datatype definitions
- a regular chemical database will always have a datatypes database
and will often have an indirect-data database.
-
dt_thor_getauxillarydb(Handle database, integer type) => string path
-
Returns the path (directory, filename, and suffix) of the auxillary
database associated with db of type type. Type will be either
DX_THOR_DATATYPESDB or DX_THOR_INDIRECTDB.
-
dt_thor_setauxillarydb(Handle database, integer type, string path) => boolean ok
-
Sets the database that is to be associated with db as type
type , where type is either
DX_THOR_DATATYPESDB or DX_THOR_INDIRECTDB.
15.3.3 Database Crunching
After a series of deletions and/or replacements, a database's data
files may have "holes" in them. For example, if a TDT is enlarged
(e.g. new dataitems added), it will no longer fit in its original
spot; new space is allocated for it and the old space is marked
"unused". THOR can sometimes re-use these available spaces
(depending on the server's implementation and configuration), but
generally the server is unable to make 100% use of the space in a
database that has been extensively modified. This can cause a
database to grow to be much larger than the amount of actual data it
contains.
Crunching is the process of moving all data "forward" in the file to
fill in these unused spaces, leaving all unused space at the end of
the file; a pass is made through the entire database, reading and re-
writing data and rebuilding the hash table. Once this is done, the
file is truncated to get rid of the unused space at the end, freeing
the file-system space for other uses.
The crunch operation should not be undertaken lightly, as the crunch
operation is indivisible; while a crunch is under way, the server
doesn't respond to other clients. Depending on database size, a
crunch can take anywhere from several seconds to many minutes.
During a crunch, the database is temporarily in invalid states; for
example, the hash table file is invalid until the crunch operation is
complete. The database may be corrupted if some error occurs
(usually an interruption such as a power failure) midway through a
crunch. The actual data records may not be damaged, but hash
information is usually destroyed; the data are no longer accessible.
In such a case, the thordump(1) utility may be required to recover
the data.
-
dt_thor_crunchdata(Handle database) => boolean
-
Crunches (recovers unused space from) the primary data file of a
database.
-
dt_thor_crunchxref(Handle database) => boolean
-
Crunches (recovers unused space from) the cross-reference data file
of a database.
-
dt_thor_autocrunch_limit(Handle database, float limit) => float limit
-
The database's "autocrunch" parameter is used to trigger an
automatic database crunch whenever the fraction of free space exceeds
a limit. The fraction is computed as:
bytes_free
free space = -----------------------
bytes free + bytes used
This function both sets and returns the "autocrunch"
limit -- the fraction of free space which, if exceeded, will trigger
an automatic crunch. The limit applies to both the primary and
cross-reference data files. If limit is <= 0.0, the
database's limit is unaffected; this serves as a way to query the
current value without modifying it. Values greater than 1.0 are not
permitted. A value of 1.0 will disable autocrunching.
15.4 Opening and Closing Databases
The Toolkit calls to open and close databases in THOR and Merlin are
identical, but the actual operations performed by the two servers are
quite different:
- THOR opens all of the database's data files (the primary and
cross-reference data files, and the primary and cross-reference hash
tables). These files remain open as long as the database is open.
If caching is enabled (see below), data
are read from the disk files into the Thor server's memory.
If multiple clients open the same database, the server creates a
"client context" for each, but shares the database resources (i.e.
the files) among the clients.
- Merlin opens the primary data file, reads its contents into
memory, and closes the file. The memory remains in use as long as
the database is in use (by any user). Each client that opens the
same database has its own "client context" in the server, but all
clients share the database's in-memory image.
-
dt_open(Handle server, string dbname, string permission, string password,RETURN integer isnew) ==> Handle database
-
Opens a database on a
THOR
or
Merlin
server.
The
path is the path (directories and filename) of the
database on the server machine.
If it is a simple filename (no directory information), the server
will search its search path for the database -- the first database
found in the path that matches the name is used.
If path contains any directory information, it must be a complete
path - partial and relative paths
are not allowed. When a complete path is specified, the server's
search path is ignored.
The string perm is one of "r", "w",
or "e", representing read, read/write, and executive
permission. The password must be the database's password for
the requested permission or higher (i.e. the executive password always
works, the write password works for reading or writing, and the read
password only works for reading.)
-
dt_exists (Handle server, string dbname) ==> boolean isopen
-
Returns TRUE if the named database exists.
-
dt_isopen (Handle server, string dbname) ==> boolean isopen
-
Returns TRUE if the named database is already open (either open by
some other client, or marked "hold" - see
dt_hold() and
below).
-
dt_ispublic(Handle server, string name) ==> boolean ispublic
-
Returns TRUE if the named database is "public"; that is, if
it has an empty read-permission password so that it can be opened
without a password.
15.5 Memory Usage: Cache and Hold
15.5.1 Merlin HOLD
It can take a long time for a Thor or Merlin server to open a
database: Merlin's in-memory high-speed searching requires that it
scan the entire database into memory; Thor provides various levels of
"caching" -- loading heavily-used parts of the database (or
even all of the database) into memory to improve performance.
Because of the potentially high overhead to open a database, both Thor
and Merlin provide a "hold" for databases which causes the
database to remain open even when no client is using it. For Merlin,
"hold" means the database is retained in memory. For Thor,
"hold" means the database files remain open, and cached
portions of the database remain in memory.
-
dt_hold(Handle database, string thorpassword) ==> boolean ok
-
Marks the specified database "held", so that it will be
retained in the Merlin server's memory. The password is that of the
user "thor", and must be supplied even if you connected to
the server as the user "thor". Returns TRUE if the
operation succeeded. The operation fails if the server determines
that the password is incorrect, or if database is not a Merlin
database (pool) object.
-
dt_isheld(Handle database) ==> boolean isheld
-
Returns TRUE if database is marked "hold". Returns FALSE if the
database is not marked "hold", or if database is not a Merlin
database (pool) object.
-
dt_release(Handle database, string execpassword) ==> boolean ok
-
Marks the specified database "released" (not held), so that it will
be removed from the Merlin server's memory when the last client
closes it. The password is that of the user "thor", and must be
supplied even if you connected to the server as the user "thor".
Returns TRUE if the operation succeeded. The operation fails if the
server determines that the password is incorrect, or if database is
not a Merlin database (pool) object. Note that the database is
not released as long as any client (including the one
performing this operation) has the database open. Clients can be
"evicted" to force closure; see
dt_evict().
15.5.2 THOR Caching
A THOR server's performance can be improved by "caching": storing
frequently-used sub-parts of the database in the server's memory.
This is discussed in more detail in the Daylight Theory Manual and
the Daylight System Administration Manual.
Remember that a server is free to silently ignore any and all caching
requests, depending on the particular implementation and the server's
configuration.
Valid caching levels are symbolic constants in the THOR Toolkit:
Thor Caching Levels
|
DX_THOR_OFF
| no caching
|
DX_THOR_RTABLE
| write-through cache of hash table
|
DX_THOR_TABLE
| complete cache of hash table
|
DX_THOR_RALL
| write-through cache of everything
|
DX_THOR_ALL
| complete cache of everything
|
The following functions control caching:
-
dt_thor_cache(Handle database, int level) => boolean
-
Enable caching for the database. The parameter
level
indicates what type of caching to perform; see the table above.
-
dt_thor_cachecontrol(Handle database, int when, int level) => boolean
-
Overrides cache requests from normal users; the cache-control
specification becomes a property of the database, and remains in
effect when the database is closed and reopened. Requires executive
permission. The parameter
level indicates how much caching to
perform, as described above. The parameter when indicates:
- DX_THOR_CACHE_NEVER
- Caching is always disabled; caching requests from other clients
are prohibited and are silently ignored.
- DX_THOR_CACHE_OK
- Caching requests from clients are allowed; the parameter
level is ignored. This is the default.
- DX_THOR_CACHE_ALWAYS
- Caching is forced whenever a database is
opened, to the level specified by level; caching requests from other
clients are prohibited and are silently ignored.
-
dt_thor_cachesync(Handle database) => boolean
-
Forces all cached data to be written to the disk immediately. This
should only be done occasionally, as it is an "atomic"
operation -- the entire sync is completed before any other client
requests are served, which can adversely affect performance.
There is only one function for managing the security of databases.
Note that it is polymorphic; it also applies to server objects; its
behavior when applied to server objects is described in the
Server Security Functions
chapter of this manual.
-
dt_setpassword(Handle database, string what, string authorizing_pw,string newpw) => boolean
-
Changes a password for the database.
Note that when a database's password is changed any existing users of
that database are unaffected; a client program can keep a database
open indefinitely even though the password used to open the database
is no longer valid. Authorization is only checked when the database
is opened.
The string what indicates which of the three passwords is to be changed; it
must be one of "r", "w", or "e", for read,
write, or executive passwords, respectively.
15.7 Record Locking
Thor provides a mechanism for "locking" a TDT ("record").
When a client program locks a record, the record is said to be "owned"
by that client. The owner of a record has exclusive write access to
that record; no other client can modify or delete that record
(although they can read the record). A record can only be locked by
one client at a time.
Record locking is an all-or-nothing affair: Conceptually, if record
locking is enforced, then all records
must be locked before they can be modified. In practice,
if you write an unlocked record, it is automatically locked, written,
then unlocked. This means if another client has that record locked, your
write will fail due to a lock violation.
Once a record is locked, the client that owns the lock can do the
following:
- Change the record:
-
The client with the lock can modify the record; no other client can.
- Write the record to the database:
-
If a modified, locked record is written to the database, the changes
are "invisible" to other clients until that record is
unlocked ("committed"). Other clients will "see"
the original record, even though the client holding the lock sees the
changes.
- Delete the record:
-
A deletion is essentially the same as a change: Only the owner of the
lock can delete the record, and the record will appear unchanged
(undeleted) to other clients until it is unlocked ("committed").
Deleting a record does not unlock it -- the lock remains in
effect until it is explicitely removed (which causes the deletion to
be "committed").
- Rollback modifications:
-
As long as a record remains locked, it can be "rolled back"
to its original state. That is, if it has been modified or deleted,
those changes are undone by the "rollback" operation.
Rolling a record back does not unlock the record.
- Commit modifications:
-
When the record is unlocked, it is "committed". That is,
all modifications are finalized and become visible to other clients
using the database. This includes deletion -- deletions take effect
when the record is unlocked.
When a record is locked by one client, all other clients that try
to use the record are restricted to read-only operations.
That is, they can only retrieve and examine the record (see
dt_thor_tdtget()),
and find out has it locked (see
dt_thor_tdtlockedby()).
It is possible to lock a record that does not exist. This is commonly
necessary when writing a new record to the database -- the record is
locked, then written and finally unlocked ("committed").
The actual record locks are maintained by the Thor server. If a
client disconnects from a Thor server or closes a database while it
still has records locked, the locks are automatically discarded and
the records are "rolled back". Any changes made but not
committed are lost. Locks can only be retained while a client is
connected to a Thor server and has a database open.
Record locking is not necessary in most situations. Thor's ability to
merge records makes it possible for users to simultaneously modify
records with little chance of conflicts. On the rare occasion when
conflicts arise, Thor's timestamp facility provides adequate warning.
The following functions control locking enforcement:
-
dt_thor_settdtlocking(Handle database, string password, dt_Integer enforce_locking) ==> boolean OK
-
Sets or unsets "record locking" enforcement for THOR database.
If
enforce_locking is TRUE, locking is enforced; if it is
FALSE, locking is disabled.
You can't change record locking enforcement while the database
is in use (i.e. open by any other client).
When record locking is enforced, records that are retrieved from a
writeable database are automatically locked (see
dt_thor_tdtget()).
A writeable database is one opened with "w" or
"e" permission using dt_open().
Record locking is a permanent property of the database
(i.e. it is retained when the database is closed and reopened), and it applies
to all client programs using the database.
-
dt_thor_tdtdttlocking(Handle database)
-
Returns TRUE or FALSE, indicating respectively that record locking is or is not
enforced for the specified database.
Other functions related to or affectd by record-locking enforcement are:
dt_thor_tdtget
dt_thor_tdtget_raw
dt_thor_tdtlockedby
dt_thor_tdtput
dt_thor_tdtput_raw
dt_thor_tdtremove
dt_thor_tdtremove_raw
Back to Table of Contents
Go to previous chapter THOR and Merline Servers
Go to next chapter THOR and Merlin Datatypes.
|