13. Program Object Toolkit

13.1 Introduction

Program objects are used to provide two-way communication with an external process, e.g., the clogp program for computing hydrophobicity for a structure represented in SMILES. Using program objects, a calling program can start an external program, send it input, receive its output, and perform other tasks while the external program remains running and ready for more input.

A number of programs supporting program objects are supplied with the release of Daylight Software. Most of these are supplied as contributed code, in the directory:

     $DY_ROOT/contrib/src/progob

The programs clogptalk and cmrtalk operate as program objects (in $DY_ROOT/bin).

13.2 Using Program Objects

Program objects are normal UNIX programs, scripts, etc. which communicate through standard input and standard output using ASCII messages with a specifically defined protocol (the "PIPETALK" protocol). Any executable within the UNIX environment which adheres to this protocol can be used as a Program Object. Note that program object programs need not be Daylight Toolkit programs. There are example program objects within the "contrib/src" directory in the standard distribution.

This approach allows a program to be used like a function, but without the need to link to the object libraries underlying the program. For instance, linking a program to functions written in C (e.g. X-windows) and in FORTRAN (e.g. the MedChem library) is extremely difficult in some versions of UNIX.

This approach also avoids the high overhead associated with running external programs from files whenever their functions are needed. For instance, some users have implemented the following approach to clogp computation from a SMILES:

  • write the SMILES to a Thor datatree file, e.g. /tmp/in.tdt,
  • execute the clogp program via system("clogp /tmp/in.tdt /tmp/out.tdt"),
  • open the output .tdt file and interpret the results,
  • remove the .tdt files via system("/bin/rm -f /tmp/in.tdt /tmp/out.tdt").

Aside from all that file manipulation, this method is extremely slow because the clogp program must initialize itself each time a computation is run. Although clogp's computations are fast, its initialization is slow because it has to read in the fragment database, read in customizations, etc. This poor performance is due to the one-way nature of pipe communication via the shell. Use of clogp as a program object eliminates such problems.

Program objects are created by the function dt_alloc_program() from an executable file name. Messages consist of zero or more ASCII strings and are represented in the Daylight Toolkit by a sequence of string objects. Once the calling program has created a program object, it can converse with it using messages, via dt_converse(). Program objects are deallocated with dt_dealloc().

13.2.1 Welcome and Farewell Messages

The primary type of communication with a program object is that the calling program sends a message and the program object responds with a message. There are two other situations where program objects can send messages.

A program object sends an unsolicited message when it is first invoked; this is called the "welcome message" and is obtained with dt_welcome().

All program objects must send a welcome message (although it may be empty), and all programs which allocate program objects should call dt_welcome() after a successful return from dt_alloc_program().

A program object also sends a message when it is terminated; this is called the "farewell message". Calling dt_converse() with a NULL_OB message terminates the program and returns the farewell message. Sending NULL_OB is like sending a program an end-of-file. Any further calls to dt_converse() will produce empty messages. The program object should still be deallocated via dt_dealloc(). It is acceptable to deallocate a program with dt_dealloc() at any time (however, the farewell message will be lost).

13.2.2 Other Special Messages

There are several properties which are useful to know for all programs. A number of special messages are defined which all program objects will respond to:

     DX_PT_HELP      respond with information about how to use the program
     DX_PT_PROGRAM   respond with the name of the program
     DX_PT_VERSION   respond with the integral version number of the program
     DX_PT_NOTICE    respond with copyright (and/or other) notices

These definitions aren't all that special; they are simply string constants which are sent to program objects. For example, DX_PT_HELP is defined as "Qwerty: Say HELP." (the period is part of the string). All program objects should respond to the above messages in some useful way.

You may define (and document!) such messages as needed. For instance, the program clogptalk recognizes the DX_TABLE message as a request for tabulated output (DX_TABLE is defined in medchemtalk.h as "Set TABLEOUTPUT.").

It is probable that other special messages will be defined in the future. You may register messages with Daylight Support - they will be included as comments in dt_progob.h so we (and others) will know not to use them.

13.2.3 Program Object Toolkit Functions

dt_alloc_program(Handle args) => Handle prog
Allocates a program object and executes a program. The parameter 'args' is an object containing the program name and any required arguments. 'args' must be a stream or sequence of objects which respond to dt_stringvalue(), or a single object which responds to dt_stringvalue().

dt_welcome(Handle prog) => Handle sos
Return prog's welcome message, i.e., its response to being executed before being sent data. All child programs which follow the pipetalk protocol write a welcome message (which may be empty), so calling programs *must* call dt_welcome() after allocating a program object.

dt_converse(Handle prog, Handle msgob) => Handle sos
Send strings in msgob to prog as standard input. msgob may be a string object, a sequence of string objects, or NULL_OB (meaning end-of-transmission). The return value is a sequence of strings containing prog's standard output response. NULL_OB is returned on error (e.g., program inaccessible).

dt_delimiter(Handle prog) => Integer value
Gets the delimiter property for the program object. The delimiter property will be either DX_PT_CR or DX_PT_NONE. DX_PT_CR (the default) means that returned messages from the program object are delimited by 'newline', and the message is returned as a sequence of string objects. DX_PT_NONE means that the returned messages are not delimited, and are returned as a single string object. This string object may have multiple newlines in it, and will have a trailing newline.

dt_setdelimiter(Handle prog, Integer value) => Boolean status
Sets the delimiter property for the program object. The delimiter property will be either DX_PT_CR or DX_PT_NONE.
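
The difference between the two delimiter settings can be illustrated without the toolkit. The following is only a sketch of how the same reply text would be presented in each mode; it is not toolkit code. The names ex_delimiter, EX_DELIM_CR, EX_DELIM_NONE, ex_copy() and ex_split_reply() are hypothetical stand-ins, and it is an assumption that strings returned in the delimited mode carry no trailing newline.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for DX_PT_CR and DX_PT_NONE; the real values
 * come from dt_progob.h and are not assumed here. */
enum ex_delimiter { EX_DELIM_CR, EX_DELIM_NONE };

/* Copy 'len' bytes of 's' into a new NUL-terminated string. */
char *ex_copy(const char *s, size_t len)
{
    char *p = malloc(len + 1);
    if (p != NULL) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

/* Split 'raw' (reply text in which every line ends in '\n') according to
 * 'mode'.  Returns a malloc'ed array of malloc'ed strings and stores the
 * number of strings in *count.  On allocation failure the strings built
 * so far are returned. */
char **ex_split_reply(const char *raw, enum ex_delimiter mode, size_t *count)
{
    char **parts = NULL;
    size_t n = 0;

    assert(raw != NULL && count != NULL);
    while (*raw != '\0') {
        /* CR mode: one string per line.  NONE mode: everything at once. */
        const char *nl = (mode == EX_DELIM_CR) ? strchr(raw, '\n') : NULL;
        size_t len = nl ? (size_t)(nl - raw) : strlen(raw);
        char **tmp = realloc(parts, (n + 1) * sizeof *parts);

        if (tmp == NULL)
            break;
        parts = tmp;
        parts[n] = ex_copy(raw, len);
        if (parts[n] == NULL)
            break;
        n++;
        raw += len + (nl ? 1 : 0);   /* step over the newline, if any */
    }
    *count = n;
    return parts;
}
```

With the reply text "2.14\nok\n", the delimited mode yields the two strings "2.14" and "ok", while the undelimited mode yields one string that still contains both newlines.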

13.3 PIPETALK Protocol

The "pipetalk protocol" is the communication protocol which programs must follow if they are to be successfully used as program objects. Note that the contributed examples implement this protocol, so they can be modified rather than developed from scratch.

13.3.1 Definitions

End-of-message (EOM), end-of-transmission (EOT), and send-message-list (MSGLIST) strings are defined in dt_progob.h as:

	#define DX_PT_EOM	"Qwerty: Over."
	#define DX_PT_EOT	"Qwerty: Over and out."
	#define DX_PT_MSGLIST	"Qwerty: Say MSGLIST."

These definitions should not be changed. Programs should not write any of these strings alone on a line for other purposes; the "Qwerty: " prefix is intended to make accidental use unlikely.

A message is defined as zero or more strings followed by the EOM string.

13.3.2 Receiving Messages

Messages are received by reading standard input until the receipt of a line containing only the EOM string.

Note that input lines to a program object can be arbitrarily long. Programmers should be careful not to use fixed-length buffers to receive input. The Daylight contributed code directory contains examples showing the correct way for a program object to read from standard input (see $DY_ROOT/contrib/src/c/progob).
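
As an illustration of reading without fixed-length buffers, here is a minimal sketch in standard C. It is not the contributed code; the function names pt_read_line() and pt_read_message() are hypothetical, and the EOM string is redefined locally only so that the fragment stands alone (a real program would take it from dt_progob.h).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DX_PT_EOM "Qwerty: Over."   /* normally supplied by dt_progob.h */

/* Read one line of any length from 'in', dropping the trailing newline.
 * Returns a malloc'ed string, or NULL at end of input or if memory
 * runs out. */
char *pt_read_line(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {              /* keep room for the '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {            /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

/* Read lines until one consists only of the EOM string.  On success,
 * returns 0 and stores a malloc'ed array of *count malloc'ed lines in
 * *lines (NULL for an empty message).  Returns -1 if input ends, or
 * memory runs out, before EOM is seen. */
int pt_read_message(FILE *in, char ***lines, size_t *count)
{
    char **msg = NULL, *line;
    size_t n = 0;

    assert(in != NULL && lines != NULL && count != NULL);
    while ((line = pt_read_line(in)) != NULL) {
        char **tmp;

        if (strcmp(line, DX_PT_EOM) == 0) {
            free(line);
            *lines = msg;
            *count = n;
            return 0;
        }
        tmp = realloc(msg, (n + 1) * sizeof *msg);
        if (tmp == NULL) {
            free(line);
            break;
        }
        msg = tmp;
        msg[n++] = line;
    }
    while (n > 0)                          /* incomplete message: discard */
        free(msg[--n]);
    free(msg);
    return -1;
}
```

The caller owns the returned lines and should free each string and then the array.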

13.3.3 Sending Messages

All messages must be written to standard output in this manner:

     message contents, each string followed by a newline
     EOM string followed by a newline
     flush standard output
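
The three steps above might be wrapped in a single helper, sketched here in standard C. The function name pt_send_message() is hypothetical, and the two marker strings are redefined locally only so that the fragment stands alone. Rejecting content lines that equal a marker string is a design choice of this sketch, following the rule that programs should not write those strings for other purposes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Normally supplied by dt_progob.h. */
#define DX_PT_EOM "Qwerty: Over."
#define DX_PT_EOT "Qwerty: Over and out."

/* Write 'count' strings, one per line, then the EOM line, then flush.
 * Returns 0 on success.  Returns -1, without writing anything, if a
 * content line equals a protocol marker; also returns -1 on a write
 * error. */
int pt_send_message(FILE *out, const char *const *lines, size_t count)
{
    size_t i;

    assert(out != NULL);
    /* Refuse content that the reader would mistake for a marker. */
    for (i = 0; i < count; i++)
        if (strcmp(lines[i], DX_PT_EOM) == 0 ||
            strcmp(lines[i], DX_PT_EOT) == 0)
            return -1;
    for (i = 0; i < count; i++)
        if (fprintf(out, "%s\n", lines[i]) < 0)
            return -1;
    if (fprintf(out, "%s\n", DX_PT_EOM) < 0)
        return -1;
    /* Without the flush the reply can sit in this process's buffer. */
    return fflush(out) == 0 ? 0 : -1;
}
```

An empty message is sent with a count of zero, which writes only the EOM line.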

13.3.4 Initial Response to Execution

Programs must send an initial "welcome" message upon execution. The sent message may be empty (i.e., the program must send at least the EOM line).

13.3.5 Program Operation

Programs operate by receiving a message and then sending a message. Each time a message is received, exactly one message must be sent. The sent message may be empty (i.e., the program need send only EOM). It is not OK to start sending while still reading.

To prevent "deadlock", it is critical that programs never send unsolicited messages, and that they never begin their replies until the entire input message is received (i.e. the EOM message is encountered). In addition, the program must ensure that its output buffer (standard output) is "flushed" after each message, as otherwise the parent program will sit waiting forever for a message that is stuck in the child program's internal buffers.
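
A minimal sketch of such a request/reply loop follows, in standard C. It is an illustration rather than the contributed code: the program simply replies with the number of content lines in each message, the names pt_skim_message() and pt_serve() and the welcome text are hypothetical, and termination handling (EOT) is deliberately left out. The input is scanned one character at a time, so no line buffer is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DX_PT_EOM "Qwerty: Over."   /* normally supplied by dt_progob.h */

/* Consume one message from 'in' without storing it, counting its content
 * lines in *nlines.  Returns 1 once the EOM line has been read, or 0 if
 * input ends first. */
int pt_skim_message(FILE *in, long *nlines)
{
    const size_t eomlen = strlen(DX_PT_EOM);
    size_t col = 0;       /* characters seen on the current line */
    int is_eom = 1;       /* does the line so far still match EOM? */
    int c;

    assert(in != NULL && nlines != NULL);
    *nlines = 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            if (is_eom && col == eomlen)
                return 1;
            (*nlines)++;
            col = 0;
            is_eom = 1;
        } else {
            if (col >= eomlen || DX_PT_EOM[col] != c)
                is_eom = 0;
            col++;
        }
    }
    return 0;
}

/* Send the welcome message, then exactly one reply per received message.
 * Each reply is written only after the whole input message (through EOM)
 * has been read, and output is flushed after every message. */
void pt_serve(FILE *in, FILE *out)
{
    long nlines;

    fprintf(out, "linecount ready\n%s\n", DX_PT_EOM);   /* welcome */
    fflush(out);
    while (pt_skim_message(in, &nlines)) {
        fprintf(out, "%ld\n%s\n", nlines, DX_PT_EOM);   /* one reply */
        fflush(out);
    }
}
```

A program built this way would call pt_serve(stdin, stdout) from main() and exit when it returns.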

13.3.6 Response to Special Messages

The following strings are defined in dt_progob.h:

     #define DX_PT_HELP         "Qwerty: Say HELP."
     #define DX_PT_PROGRAM      "Qwerty: Say PROGRAM."
     #define DX_PT_VERSION      "Qwerty: Say VERSION."
     #define DX_PT_NOTICE       "Qwerty: Say NOTICE."
     #define DX_PT_MSGLIST      "Qwerty: Say MSGLIST."

On receipt of one of the first four strings, programs should respond with an appropriate message (containing help on program operation, program name, program version, and copyright notice, respectively).

On receipt of a DX_PT_MSGLIST message, programs should send a message containing all other recognized control strings. This response can be empty. Each of the supplied messages should be responded to in a sensible manner, but how to do so is left entirely up to the program.
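
One simple way to handle these requests is a lookup table, sketched below in standard C. Everything program-specific here is hypothetical: the function name pt_special_reply(), the program name "exampletalk", its reply texts, and the extra control string EX_UPPERCASE reported by MSGLIST. The sketch also assumes that a special string arrives as the only line of its message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Normally supplied by dt_progob.h. */
#define DX_PT_HELP     "Qwerty: Say HELP."
#define DX_PT_PROGRAM  "Qwerty: Say PROGRAM."
#define DX_PT_VERSION  "Qwerty: Say VERSION."
#define DX_PT_NOTICE   "Qwerty: Say NOTICE."
#define DX_PT_MSGLIST  "Qwerty: Say MSGLIST."

/* Hypothetical control string specific to this example program. */
#define EX_UPPERCASE   "Set UPPERCASE."

/* Return the reply text for a special message, or NULL if 'line' is not
 * one of them and should be treated as ordinary input. */
const char *pt_special_reply(const char *line)
{
    static const struct {
        const char *request;
        const char *reply;
    } table[] = {
        { DX_PT_HELP,    "Send one string per line; each is echoed back." },
        { DX_PT_PROGRAM, "exampletalk" },
        { DX_PT_VERSION, "1" },
        { DX_PT_NOTICE,  "Example code, no copyright claimed." },
        { DX_PT_MSGLIST, EX_UPPERCASE },   /* all other control strings */
    };
    size_t i;

    assert(line != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(line, table[i].request) == 0)
            return table[i].reply;
    return NULL;
}
```

The returned text would then be sent as an ordinary message, i.e. followed by the EOM line and a flush.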

13.3.7 Program Termination

On receipt of an EOT message, programs must send their final "farewell" message (which may be empty) and go into a quiescent state awaiting EOF on standard input, at which time the program must exit. While in the quiescent state (after EOT but before EOF), the program should respond to all messages with an empty message (just EOM).
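
The termination rules can be viewed as a two-state machine, sketched below in standard C. The type and function names (pt_state, pt_action, pt_on_line()) are hypothetical. The sketch assumes that the EOT string arrives on a line by itself and that input lines have already had their newline removed. End-of-file is not an input to this function; the caller is expected to exit when its read fails.

```c
#include <assert.h>
#include <string.h>

/* Normally supplied by dt_progob.h. */
#define DX_PT_EOM "Qwerty: Over."
#define DX_PT_EOT "Qwerty: Over and out."

enum pt_state { PT_RUNNING, PT_QUIESCENT };

enum pt_action {
    PT_CONTENT,       /* ordinary line: collect it (or ignore if quiescent) */
    PT_REPLY,         /* message complete: send the normal reply */
    PT_FAREWELL,      /* EOT seen: send the farewell message */
    PT_EMPTY_REPLY    /* message complete after EOT: send just EOM */
};

/* Decide what to do with one input line, updating *state when the EOT
 * line moves the program into its quiescent state. */
enum pt_action pt_on_line(enum pt_state *state, const char *line)
{
    assert(state != NULL && line != NULL);
    if (*state == PT_RUNNING && strcmp(line, DX_PT_EOT) == 0) {
        *state = PT_QUIESCENT;
        return PT_FAREWELL;
    }
    if (strcmp(line, DX_PT_EOM) == 0)
        return *state == PT_RUNNING ? PT_REPLY : PT_EMPTY_REPLY;
    return PT_CONTENT;
}
```

Every action other than PT_CONTENT corresponds to exactly one outgoing message, which keeps the one-message-in, one-message-out rule intact through shutdown.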

13.3.8 Naming Convention

By convention, programs which communicate via the pipetalk protocol have names that end in "talk", e.g. clogptalk.
