Euromug01 24th-26th September 2002, Cambridge UK

Program objects

John Bradshaw
Daylight CIS Inc., Sheraton House, Castle Park, Cambridge, CB3 0AX, UK

Introduction

Program objects are used to provide two-way communication with an external process, e.g., the clogptalk program for computing hydrophobicity for a structure represented in SMILES. Using program objects, a calling program can start an external program, send it input, receive its output, and perform other tasks while the external program remains running and ready for more input.

A number of programs supporting program objects are supplied with the release of Daylight Software. Most of these are supplied as contributed code, in the directory: $DY_ROOT/contrib/src/c/progob/. The commercial programs clogptalk and cmrtalk operate as program objects (in $DY_ROOT/bin).

Using program objects

Program objects are normal UNIX programs, scripts, etc. which communicate through standard input and standard output using ASCII messages with a specifically defined protocol (the "PIPETALK" protocol). Any executable within the UNIX environment which adheres to this protocol can be used as a Program Object. Note that program object programs need not be Daylight Toolkit programs. There are example program objects within the $DY_ROOT/contrib/src/c/progob directory in the standard distribution.

This approach allows a program to be used like a function, but without the need to link to the object libraries underlying the program. For instance, linking a program to functions written in C (e.g. X-windows) and in FORTRAN (e.g. the MedChem library) is extremely difficult in some versions of UNIX.

This approach also avoids the high overhead associated with running external programs from files whenever their functions are needed. For instance, some users have implemented the following approach to clogp computation from a SMILES:

Aside from all that file manipulation, this is an extremely slow method because the clogp program must initialize itself each time a computation is run. Although clogp's computations are fast, its initialization is slow because it has to read in the fragment database, read in customizations, etc. This poor performance is due to the one-way nature of pipe communication via the shell. Use of clogp as a program object eliminates such problems.

Program objects are created by the function dt_alloc_program() from an executable file name. Messages consist of zero or more ASCII strings and are represented in the Daylight Toolkit by a sequence of string objects. Once the calling program has created a program object, it can converse with it using messages, via dt_converse(). Program objects are deallocated with dt_dealloc().
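The create, converse, deallocate life cycle described above can be sketched from the calling side. This is only an illustration under stated assumptions: it is written in Python rather than against the Daylight Toolkit, the Program class and its converse() and close() methods are hypothetical stand-ins for dt_alloc_program(), dt_converse() and dt_dealloc(), the child is a toy program that reverses each string it receives, and the "." end-of-message line is an invented framing, not the actual PIPETALK wire format.

```python
import subprocess
import sys

EOM = "."  # hypothetical end-of-message marker; NOT the real PIPETALK framing

# Toy child program (an assumption, not clogptalk): for each complete message
# it replies with the same strings reversed, then the marker, then flushes.
CHILD = r'''
import sys
msg = []
for raw in sys.stdin:
    line = raw.rstrip("\n")
    if line == ".":
        for s in msg:
            print(s[::-1])
        print(".")
        sys.stdout.flush()
        msg = []
    else:
        msg.append(line)
'''


class Program:
    """Hypothetical stand-in for a program object: one long-lived child."""

    def __init__(self, argv):
        # Start the external program once; it stays running between messages.
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE, text=True)

    def converse(self, strings):
        """Send one message (a list of strings) and return the one reply."""
        for s in strings:
            self.proc.stdin.write(s + "\n")
        self.proc.stdin.write(EOM + "\n")
        self.proc.stdin.flush()
        reply = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("child exited mid-conversation")
            line = line.rstrip("\n")
            if line == EOM:
                return reply
            reply.append(line)

    def close(self):
        """Closing the child's input lets it finish; then reap the process."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


prog = Program([sys.executable, "-c", CHILD])
first = prog.converse(["abc", "talk"])   # a message of two strings
second = prog.converse([])               # an empty message gets an empty reply
same_child = prog.proc.poll() is None    # the child is still running
prog.close()
```

Both conversations are served by the same child process, so any start-up cost is paid once. A real implementation would also need a framing that cannot collide with message content, which this sketch ignores.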

Note that the program may itself be a communication program using another protocol, such as rsh(1) to talk to another UNIX box, or a socket communication program allowing connection to any machine on the network, irrespective of its operating system, e.g. Windows 2000.

Program operation

Programs operate by receiving a message then sending a message. Each time a message is received, one message must be sent. The sent message may be empty (i.e., the program sends only the end-of-message marker, EOM). (It's OK to start sending while reading.)

To prevent "deadlock", it is critical that programs never send unsolicited messages, and that they never begin their replies until the entire input message is received (i.e. the EOM message is encountered). In addition, the program must ensure that its output buffer (standard output) is "flushed" after each message, as otherwise the parent program will sit waiting forever for a message that is stuck in the child program's internal buffers.
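The rules above (one reply per message, nothing sent until EOM arrives, output flushed after every reply) can be sketched as a small service loop for the child side. This is a minimal Python sketch under assumptions: the "." end-of-message line is an invented framing rather than the real PIPETALK format, and serve() and string_lengths() are hypothetical names. In-memory streams stand in for standard input and standard output so the behaviour can be checked without starting a process.

```python
import io

EOM = "."  # hypothetical end-of-message marker; NOT the real PIPETALK framing


def serve(inp, out, handler):
    """Answer each complete incoming message with exactly one reply."""
    message = []
    for raw in inp:
        line = raw.rstrip("\n")
        if line != EOM:
            message.append(line)   # still reading: send nothing yet
            continue
        # The whole message has arrived, so it is now safe to reply.
        for reply_line in handler(message):
            out.write(reply_line + "\n")
        out.write(EOM + "\n")      # an empty reply is just the marker
        out.flush()                # otherwise the parent may wait forever
        message = []


def string_lengths(lines):
    """Hypothetical handler: reply with the length of each string received."""
    return [str(len(s)) for s in lines]


# Two messages: one holding a single string, then an empty one.
demo_in = io.StringIO("CCO\n" + EOM + "\n" + EOM + "\n")
demo_out = io.StringIO()
serve(demo_in, demo_out, string_lengths)
```

In a real talk program the same loop would be called with sys.stdin and sys.stdout, and the handler would perform the actual computation.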

Naming conventions

By convention, programs which communicate via the pipetalk protocol have names that end in "talk", e.g. clogptalk. The programs which drive them usually have names which end in "talker", e.g. $DY_ROOT/contrib/src/c/progob/pipetalker. To maximise the value of this approach, the talker program needs to be persistent, so care needs to be taken when running in a stateless environment such as the web.

Example

As the only requirement is that a string is received and a string is sent, we can make use of a very powerful concept in the Daylight software, namely that there is a lexical (and printable) external form of all our internal objects. So, for instance, we can send a structure as a SMILES and return the SMARTS of the ring systems it contains.

Note that this particular example does not make use of the persistent nature of program objects. In a production system the link would remain open for continued input by some means or other.

Using program objects with DayCart®

The PIPETALK protocol provides an ideal way for the Daylight Oracle cartridge to communicate with external functions, for instance to populate a table column with clogp values using a SQL statement such as:

UPDATE my_table SET clogP = clogp(smiles);

Example talk programs

One of the advantages of talk programs is that you only need to have one copy of the code, and a whole variety of programs can access the same algorithm. This means that, for physical property calculations for instance, a given structure always gets the same answer, irrespective of the calling program. To this end we have produced a talk program which will calculate, directly from the structures, a variety of physical properties commonly used in molecular selection and design. These values can then be stored along with the structure from which they were derived. These functions are available via a program object interface, which allows them to be called from within DayCart® as described above. Many of these are included in the VCS building routines, calculated from the parent or version SMILES, as appropriate. All values are returned as part_tuples except for PART_COUNT, which is an integer.

These programs can be accessed in a demo form with the same caveats as above and will be included in the 4.8x release.



Daylight Chemical Information Systems, Inc.
support@daylight.com
