Fedora Servers

Scott Dixon and Dave Weininger
Metaphorics, LLC

1. Purpose

To provide a simple interface for exploring chemical and biological information and for showing the relationships between the different types of information.

To provide access with as little up-front structure as possible.

To be simple to install and to require little training.

2. Technology

Fedora is built on top of the HTTP toolkit, as well as Smiles, Smarts, Depict, etc., as appropriate.

Fedora servers use the HTTP protocol to transfer information to the client (browser) as well as to communicate with each other.

Each server provides access to a particular kind of information or provides a particular utility service.
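As a rough illustration of one server providing one kind of information over HTTP, here is a minimal sketch. It uses Python's standard library as a stand-in for the HTTP toolkit; the `/entry/<name>` path scheme, the `ENTRIES` table, and the function names are assumptions made for this example and are not Fedora's actual interface.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical data set: this server knows about one kind of information only.
ENTRIES = {"aspirin": "CC(=O)Oc1ccccc1C(=O)O"}


def lookup(path):
    """Map a request path such as /entry/aspirin to a (status, body) pair."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "entry" and parts[1] in ENTRIES:
        return 200, ENTRIES[parts[1]]
    return 404, "not found"


class EntryHandler(BaseHTTPRequestHandler):
    """Answer GET requests from a browser or from another server."""

    def do_GET(self):
        status, body = lookup(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the server until interrupted (the port number is an arbitrary choice)."""
    ThreadingHTTPServer(("127.0.0.1", port), EntryHandler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8080/entry/aspirin` in a browser would return the stored SMILES as plain text. The same request could equally come from another server.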

3. Why HTTP

HTTP is ubiquitous and is widely used for carrying large amounts of information. Networks are already set up for using HTTP.

Web browsers are also ubiquitous on client desktops and users are well trained in using browsers.

Each server can provide services to a whole corporate network and can deliver plain text, XML, images, applications, etc.

4. Why Fedora

Many data integration efforts try to create a unified data model for disparate data (e.g., ontologies, controlled vocabularies, etc.). This is a worthy but difficult task to accomplish. Such complex data models are also difficult for users to understand and difficult to extend to new information sources.

Fedora servers take a simpler approach in which each server specializes in a particular kind of information. A server can ask the other servers whether they have anything relevant to contribute. If they do, they send a pointer to where that information can be viewed (rather than sending the data itself). Usually the servers use similarity methods to determine what they can contribute.

Advantages:

Information is best represented by a native data model.
Each server does the best job representing its data, regardless of application.
Servers don't have to know about each other's conventions so it is easier to add new servers.
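The ask-and-point exchange described above can be sketched as follows. This is an in-process illustration under stated assumptions, not the Fedora protocol: the `Server` class, the `contribute` and `ask_peers` names, the example URLs, and the use of shared descriptive terms as the relevance test are all hypothetical. The point it shows is that a peer answers with pointers to where its information can be viewed, never with the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    base_url: str  # hypothetical address where this server's records can be viewed
    records: dict = field(default_factory=dict)  # record id -> set of descriptive terms

    def contribute(self, query_terms, min_shared=1):
        """Return pointers (URLs) to relevant records, best match first."""
        hits = []
        for rec_id, terms in self.records.items():
            shared = len(terms & query_terms)
            if shared >= min_shared:
                hits.append((shared, f"{self.base_url}/{rec_id}"))
        # Sort by number of shared terms (descending), then by URL for stability.
        return [url for _, url in sorted(hits, key=lambda h: (-h[0], h[1]))]


def ask_peers(peers, query_terms):
    """Ask every peer whether it has anything relevant; keep only those that do."""
    answers = {}
    for peer in peers:
        pointers = peer.contribute(set(query_terms))
        if pointers:
            answers[peer.name] = pointers
    return answers


# Made-up records for three servers; each keeps its own native data.
wdi = Server("WDI", "http://wdi.example/entry",
             {"w1": {"aspirin", "analgesic"}, "w2": {"ibuprofen"}})
pathos = Server("Pathos", "http://pathos.example/pathway",
                {"p1": {"prostaglandin", "aspirin", "cyclooxygenase"}})
tcm = Server("TCM", "http://tcm.example/prep", {"t1": {"ginseng"}})

answers = ask_peers([wdi, pathos, tcm], {"aspirin", "cyclooxygenase"})
```

The asking server needs to know only how to pose the question and how to display a link, so adding a new server means adding one more entry to the list of peers.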

5. Current servers

WDI

The World Drug Index contains over 60,000 entries and a large variety of related pharmacological information (~800,000 fields). The WDI Fedora server implements a full complement of structure/name/data searching methods.

Pathos

Biological pathways server. Currently holds metabolic pathway information. Has a complex data model including agents, cofactors, compounds, diseases, enzymes, landmarks, notes, pathways, regulators and steps. Pathway information provides a natural index for uniting chemical and biological information.

Planet

Protein-Ligand Association NETwork. Each data object is a protein-ligand association, i.e., a relationship between one or more proteins and one or more ligands. The data is taken from the Protein Data Bank, with considerable processing of the ligand data to get good connectivity information. Sequence similarity and small molecule similarity search methods are implemented. Could also contain other protein-ligand complexes (e.g., computed from docking).
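To make the idea of a small molecule similarity search concrete, here is a minimal sketch that ranks ligands by Tanimoto similarity of feature sets. The choice of Tanimoto, the integer feature sets standing in for fingerprints, the ligand names, and the threshold are all assumptions for illustration; the text above does not say which similarity measure Planet uses.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: shared / (total distinct)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library, threshold=0.5):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = [(tanimoto(query, features), name) for name, features in library.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical ligands, each described by a set of integer feature ids.
ligands = {
    "ligand_a": {1, 2, 3, 4},
    "ligand_b": {2, 3, 4, 5},
    "ligand_c": {7, 8, 9},
}
hits = rank_by_similarity({1, 2, 3, 4}, ligands)
```

Here `ligand_a` matches the query exactly, `ligand_b` shares three of five distinct features, and `ligand_c` shares none and is filtered out.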

Plaid

Protein-Ligand Accessibility and Interaction Diagrams. Generates 2D schematic diagrams of protein-ligand complexes. Its services will be used by Planet and perhaps other servers.

Plaid consists of three parts:

Analysis. Uses a set of pattern- and geometry-based rules to analyze a protein structure for interactions and geometric features. Provides an annotated structure for the next step. We are also using this engine to develop other tools for analysis of protein-ligand complexes and docking results.
Layout. Subsets the protein into ligands and interacting parts of the protein and generates a 2D layout using a 2D distance geometry algorithm. The 2D DG algorithm does well at generating ligand depictions (for example, this depiction of a heme vs. the standard Depict depiction) and can also incorporate constraints representing hydrogen bonds and other interactions with a protein. It should also be possible to extend this to laying out, for example, multiple ligands binding to the same protein. However, we have not tried this yet.
Rendering. Uses the 2D layout to generate a graphical depiction of the diagram. The current renderer generates PostScript, which is intended for printing at high resolution. Future renderers will be aimed at screen display and perhaps other formats.
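The layout step can be pictured with a toy example. The sketch below places points in 2D so that chosen pairwise distances approach target values, using plain gradient descent on the squared distance error. It is a simple stand-in under stated assumptions and not the 2D distance geometry algorithm used in Plaid; the function name, step count, and learning rate are illustrative choices. A hydrogen bond or other interaction would enter as one more target distance between two points.

```python
import math
import random


def layout_2d(n, targets, steps=2000, rate=0.05, seed=0):
    """Place n points in 2D so pairwise distances approach the targets.

    targets maps (i, j) index pairs to a desired distance. Pairs that are
    not listed are left unconstrained.
    """
    rng = random.Random(seed)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for (i, j), want in targets.items():
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            # Gradient of (dist - want)^2 with respect to point i;
            # point j receives the opposite contribution.
            g = 2.0 * (dist - want) / dist
            grads[i][0] += g * dx
            grads[i][1] += g * dy
            grads[j][0] -= g * dx
            grads[j][1] -= g * dy
        for p, g in zip(pts, grads):
            p[0] -= rate * g[0]
            p[1] -= rate * g[1]
    return pts


# Three points that should end up one unit apart from each other.
triangle = layout_2d(3, {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0})
```

A real layout would also need terms that keep unrelated atoms from overlapping, which this sketch omits.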

TCM

Traditional Chinese Medicine: structures, indications and effects. Contains information about primarily plant extract preparations along with pharmacological effects. Chemical entries contain structures observed in Chinese medicinal preparations.

DCM

Dictionary of Chinese Medicine. Provides definitions of TCM terms as well as some cross references to western medical concepts.

Utility servers

ecbook supplies information from the Enzyme Commission codebook. park supplies images. Various other servers provide for logging and process control.

Ruby

Example computational server which computes coordinates using the Rubicon distance geometry package. Fedora servers can be used to deliver computed results.

6. Demo

7. Conclusions

Fedora servers provide a method for supplying diverse chemical and biological information to a network with low setup and training costs. Fedora servers can host data sources as well as provide utility and computational services, and they allow scientists to rapidly explore the links and relationships between different types of information to develop questions or new ideas. Thus, the Fedora technology complements and supplements existing database resources and informatics and computational groups.