Daylight v4.9
Release Date: 1 February 2008

Name

thorload - load data into a Thor database

Unix Synopsis

thorload [options ...] database [inputfile.tdt]

Description

thorload loads Thor Datatrees (TDTs) into a Thor database via a Thor server. TDTs are loaded from inputfile.tdt; if it is not specified, standard input is used.

When the load is finished, thorload reports on the total number of TDTs loaded and the number of errors encountered.

Options

-REJECT_LOG file
TDTs that can't be loaded are stored in this file, prefixed with an error message indicating what the problem is. Default is "standard error".
-MERGE TRUE|FALSE
If TRUE, TDTs are merged with existing data; if FALSE, new TDTs replace existing data. If MERGE is FALSE and OVERWRITE is FALSE, then a TDT whose root identifier is already in the database will be rejected unless it has a timestamp that is newer than the one in the database. It is illegal to have both MERGE and OVERWRITE be TRUE. It is also illegal to merge datatype-definition TDTs; when loading a datatype-definition database, MERGE must be FALSE. Default is TRUE.
-OVERWRITE TRUE|FALSE
If TRUE, a new TDT will replace an existing TDT with the same root identifier. If OVERWRITE is TRUE, the database is opened with executive permission; the password you supply must be for executive permission. It is illegal to have both MERGE and OVERWRITE be TRUE. Default is FALSE.
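
The interaction of -MERGE and -OVERWRITE described above can be summarized as a small decision function. This is only an illustrative sketch in Python, not thorload's implementation: the function name, its arguments, and the returned labels are hypothetical, and it assumes that a newer-timestamped TDT replaces the stored one when both options are FALSE.

```python
def load_action(merge, overwrite, exists, incoming_is_newer):
    """Model what happens to one incoming TDT (hypothetical helper).

    merge, overwrite   -- values of -MERGE and -OVERWRITE
    exists             -- root identifier is already in the database
    incoming_is_newer  -- incoming timestamp is newer than the stored one
    """
    if merge and overwrite:
        # The man page calls this combination illegal.
        raise ValueError("MERGE and OVERWRITE cannot both be TRUE")
    if not exists:
        return "insert"
    if merge:
        return "merge"
    if overwrite:
        return "replace"
    # MERGE FALSE and OVERWRITE FALSE: rejected unless the timestamp is newer.
    # Assumption: an accepted TDT replaces the stored one.
    return "replace" if incoming_is_newer else "reject"
```

With the default settings (MERGE TRUE, OVERWRITE FALSE), a TDT whose root identifier already exists falls into the "merge" case.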
-RAW_DATA TRUE|FALSE
Causes TDTs to be loaded "raw" - a faster load, but with no standardization or validation. This option should ONLY be used with data that were obtained by dumping an existing Thor database. You should never load unstandardized data.
-CACHE_LEVEL OFF|WRITETHRU|READWRITE|WRITETHRU_ALL|READWRITE_ALL
Specifies the cache level. OFF disables caching altogether. WRITETHRU causes the database's hash table to be cached for reading, but writes are immediately posted to the disk. READWRITE caches the hash table for reading and writing; changes aren't posted to the disk until the database is closed. WRITETHRU_ALL caches the entire database (which may require considerable memory, depending on the database's size) for reading, but immediately posts modified records to the disk. READWRITE_ALL caches the entire database; changes aren't posted to the disk until the database is closed or "sync'ed". Note that this option has no effect if caching is disabled or forced for the database (see thormake(1), sthorman(1)). Default is "" (unspecified -- use the database's default).
-DO_RECORDS N
Write N records to the database then quit. A typical use of this option might be in conjunction with the -SKIP_RECORDS option (below) to process a file in "chunks". Default is -1 (do all records).
-SKIP_RECORDS N
Indicates that the first N records of input are to be ignored. This is useful for resuming an interrupted database load, or for processing an input file in "chunks" (see -DO_RECORDS, above). Default is zero.
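
The way -SKIP_RECORDS and -DO_RECORDS combine to process an input file in chunks can be sketched as follows. This Python model is an illustration only: the helper names are hypothetical, and it assumes that every input record counts toward both limits (whether rejected TDTs count toward -DO_RECORDS is not stated here).

```python
from itertools import islice

def select_window(records, skip=0, do=-1):
    """Ignore the first `skip` records, then pass through `do` records.

    do = -1 mirrors the documented default (do all records).
    """
    stop = None if do < 0 else skip + do
    return islice(records, skip, stop)

def chunk_options(total, chunk_size):
    """Return (SKIP_RECORDS, DO_RECORDS) pairs that cover `total` records."""
    return [(start, min(chunk_size, total - start))
            for start in range(0, total, chunk_size)]
```

For example, chunk_options(10, 4) yields three runs: skip 0 and do 4, skip 4 and do 4, skip 8 and do 2.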
-DEAL n/N
Only process the nth of each N input TDTs (similar in concept to dealing cards to players). This is useful for "parallel processing", in which several clients (typically on different CPUs) simultaneously load the same database from the same input file. For example, if three CPUs are available, all with access to the input file and with network access to the server, the first would use "-DEAL 1/3", the second would use "-DEAL 2/3", and the third would use "-DEAL 3/3". This option can also be used to build a "sample" database from a large TDT file. For example, loading a database with "-DEAL 1/10" will build a database with 1/10th of the whole TDT file. (Note that "-DEAL 3/10" also loads 1/10th of the whole file, not 3/10ths of the whole file. "-DEAL 3/10" selects the 3rd TDT of each group of 10, not the first 3 of each group of 10.) Default is 0/0 (i.e. process all input data).
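
The card-dealing selection can be modeled in a few lines of Python. This is a sketch of the semantics described above, not thorload's code: the function name is hypothetical, and it assumes that n counts from 1 within each consecutive group of N input TDTs.

```python
def deal(records, n, N):
    """Yield the nth record of each consecutive group of N (n is 1-based).

    n = 0 and N = 0 mirrors the documented default: process all input.
    """
    if n == 0 and N == 0:
        yield from records
        return
    if not 1 <= n <= N:
        raise ValueError("DEAL n/N requires 1 <= n <= N")
    for index, record in enumerate(records):
        if index % N == n - 1:
            yield record
```

Under this model, three clients using 1/3, 2/3, and 3/3 each receive a disjoint third of the input, and together they cover every TDT exactly once.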
-GENERATE_INDIRECT TRUE|FALSE
Controls automatic generation of indirect references. For databases with indirect datatypes, thorload creates indirect TDT records automatically: it generates a key from the data, stores the indirect data under that key, and replaces the data in the main database with the key. The key is generated by applying a hash function to the data, so thorload does not generate duplicate indirect references. Default is FALSE.
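
The reason a hash-derived key avoids duplicates can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the hash function thorload actually uses and the format of its keys are not documented in this page, so SHA-1 and a 12-character key are stand-ins, and a plain dictionary stands in for the indirect database.

```python
import hashlib

def store_indirect(indirect_db, data):
    """Store `data` under a key derived from its content; return the key.

    indirect_db is a dict standing in for the indirect database.
    SHA-1 is a stand-in for thorload's unspecified hash function.
    """
    key = hashlib.sha1(data.encode("utf-8")).hexdigest()[:12]
    # Identical data always hashes to the same key, so it is stored once.
    indirect_db.setdefault(key, data)
    return key
```

In this model the main record would keep only the returned key, and loading the same indirect data twice leaves a single entry in the indirect store.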
-EXCLUDE_INDIRECT ALL or [tag ...]
-INCLUDE_INDIRECT ALL or [tag ...]
These two options select which indirect datatypes are to be automatically processed. Each takes a list of datatype tags (e.g. "$ISK $IST") or the keyword "ALL", indicating that all datatypes are included. Tags in a list may be separated by space, bar (|), or comma (,). INCLUDE_INDIRECT is processed before EXCLUDE_INDIRECT, i.e. starting with no datatypes selected, the INCLUDEd datatypes are added, then the EXCLUDEd datatypes are removed.

The default for INCLUDE_INDIRECT is "ALL", and for EXCLUDE_INDIRECT is "NONE". The default values result in selection of all datatypes. Ancestors of included dataitems are included regardless of datatype.

Tags in the tag lists must match the internal tag exactly, e.g., "$IST" for "indirect title" (punctuation and case count).
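
The include-then-exclude order can be sketched as set arithmetic. This Python model is illustrative only: the function names are hypothetical, tags not present in the database's datatypes are assumed to be ignored, and the rule that ancestors of included dataitems are always included is not modeled.

```python
import re

def parse_tags(value):
    """Split a tag list on spaces, bars (|), or commas."""
    return [tag for tag in re.split(r"[ |,]+", value.strip()) if tag]

def selected_tags(known_tags, include="ALL", exclude="NONE"):
    """Apply INCLUDE_INDIRECT first, then EXCLUDE_INDIRECT.

    Matching is exact: punctuation and case count.
    """
    known = set(known_tags)
    included = parse_tags(include)
    excluded = parse_tags(exclude)
    selected = set(known) if included == ["ALL"] else known & set(included)
    if excluded == ["ALL"]:
        return set()
    if excluded != ["NONE"]:
        selected -= set(excluded)
    return selected
```

With the defaults ("ALL" and "NONE") every known tag is selected; including "$ISK|$IST" and excluding "$IST" leaves only "$ISK".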

-INDIRECT_DATABASE dbname
Specifies the indirect database name to which the generated indirect references are registered. This is required when -GENERATE_INDIRECT TRUE is specified. This must match the indirect database which is specified for the main database in its THOR file. This option is mainly present as a way to specify passwords to the indirect database. This option will NEVER override the indirect database which is set for the main database.
STANDARD THORFILTER OPTIONS:
----------------------------

The following options are common to most or all "thorfilter" programs. They are described in more detail in thorfilters(1).

-SECURE_PASSWORDS TRUE|FALSE
TRUE means don't allow passwords on the command line (require interactive entry). Default: TRUE.
-THOR_IPC_SERVICE service
Names the default TCP/IP service or "port" of the Thor server. Default: thor.
-MINOR_REPORT N
The interval (number of TDTs) between minor reports. The minor report is a period "." printed on "standard error". N = 0 suppresses the minor report. Default is 10.
-MAJOR_REPORT N
The interval (number of TDTs) between major reports. The major report prints the number of TDTs processed and the number of errors so far, followed by a newline, to "standard error". N = 0 suppresses the major report. If both MINOR_REPORT and MAJOR_REPORT are zero, summary information that is normally printed at the end is also suppressed. Default is 500.
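
The two reporting intervals can be modeled as below. This Python sketch is illustrative: the function names and the wording of the major-report line are assumptions (the page specifies only what the line contains, not its exact format), and it assumes a minor-report period is still printed when a count is a multiple of both intervals.

```python
import sys

def report_progress(count, errors, minor=10, major=500, out=sys.stderr):
    """Call after each TDT; `count` is the number processed so far."""
    if minor > 0 and count % minor == 0:
        out.write(".")  # minor report: a single period
    if major > 0 and count % major == 0:
        # major report: totals so far, followed by a newline (format assumed)
        out.write(f"{count} TDTs processed, {errors} errors\n")

def summary_enabled(minor, major):
    """The final summary is suppressed only when both intervals are zero."""
    return not (minor == 0 and major == 0)
```

With the documented defaults (10 and 500), this model prints fifty periods and then one line of totals for every 500 TDTs.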

Return Value

The return status is zero if the load succeeds, or one if it fails. Failure includes failing to open the database or the input file, or invalid options. Errors loading particular TDTs do not count as a failure.
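
A wrapper script might assemble the command line from the synopsis and test the return status as sketched below. This Python example is a hedged illustration: the helper names, the database name "mydb", and the file name "input.tdt" are hypothetical, and the database argument is treated as an opaque string.

```python
def build_argv(database, input_file=None, **options):
    """Build a thorload argument list: thorload [options ...] database [inputfile.tdt]."""
    argv = ["thorload"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "TRUE" if value else "FALSE"
        argv += [f"-{name.upper()}", str(value)]
    argv.append(database)
    if input_file is not None:
        argv.append(input_file)  # omitted: thorload reads standard input
    return argv

def load_succeeded(returncode):
    """Zero means the load ran; individual TDT errors do not change it."""
    return returncode == 0
```

Because a zero status does not mean that every TDT was accepted, a script that needs per-record results would also have to inspect the reject log or the final summary.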

Files

$DY_ROOT/bin/thorload

Daylight License

programs: thormanager

Related Topics

dayevict(1) daymessage(1) merlindbping(1) merlinload(1) merlinls(1) merlinping(1) merlinwho(1) thorchange(1) thorcrunch(1) thordbping(1) thordelete(1) thordestroy(1) thordiff(1) thordump(1) thorlist(1) thorlookup(1) thorls(1) thormake(1) thorping(1) thorwho(1)

sthorman(1) thorserver(1) merlinserver(1) licensing(5)

Daylight Theory Manual, Daylight System Administration Manual

$DY_ROOT/contrib/src/thor - several example shell scripts for Thor database administration.

Bugs

None known.