Infinity's Compound Registration System:
A Case Study

At Infinity Pharmaceuticals we use the Daylight toolkits for in-house development and have implemented DayCart® as a central component of our chemical registration system. This system has been fully integrated within all aspects of our drug discovery processes and software applications.

We have successfully "pushed" the registration process to the chemists; they can quickly and easily register compounds within the context of their electronic notebooks. This has provided a significant enhancement to both the efficiency of data capture and the quality of the data within the chemistry database(s).

DayCart® integrates well into our overall application and data environment. It provides a flexible chemical data model that supports a data architecture tailored to Infinity's processes and scientific approach, while also delivering speed and scalability.

The benefit to Infinity is a strong capability to develop intellectual property rapidly and efficiently and to access corporate knowledge effectively.

BACKGROUND

Infinity Pharmaceuticals is based on the scientific power of diversity-oriented organic synthesis (DOS), informatics, and chemical genetic screening of pathways and phenotypes. In concert, these technologies are being developed to transform the drug discovery process while addressing the pharmaceutical industry's shortage of biologically active and selective drug candidates directed against biologically well-validated targets.

Novel chemistry accessed through diversity-oriented synthesis generates molecules active and selective against a broader range of targets, including those previously thought to be intractable and "non-druggable." Chemical genetic screening strategies enable exploration of the entire range of possible drug targets in an unbiased manner. The efficient integration of these two technologies, in combination with advanced informatics and automation, yields expanded pools of both validated drug targets and active drug candidates. By working with better validated targets and a larger selection of active drug candidates, Infinity's goal is to increase the success rate of drug discovery and thereby create unique value in the pharmaceutical industry.

Infinity is pioneering the use of diversity-oriented synthetic approaches to generate its unique chemical library. Each library consists of large numbers of molecules with key features: complexity of three-dimensional structure, stereochemistry, and rigidity, the hallmarks of drugs that are both highly active and selective. Previously, compounds with these highly desirable "drug-like" features most often came from natural product sources. While a historically important source of drugs, natural products suffer from liabilities that severely limit their utility as a starting material for drug discovery.

Infinity's compounds suffer none of these limitations; they are designed to be readily synthesized (including at scale), are tractable for medicinal chemistry optimization, do not contain any toxic elements, can be biased with rationally designed pharmacophores to rapidly create focused sub-libraries, and can be formatted for all biological screening modalities. Structurally distinct, virtual diversity-oriented compound sets have been generated, ranging in size from 1,000 molecules to over 2 million molecules.

A chemical registration system is the essential connection between the science and business aspects of a chemistry-based company. This system rigorously links a company's scientific data to the information required to establish intellectual property. Properly built, a chemical registration system efficiently represents the scientific and business vision of the company and enables it to execute strategy quickly and powerfully by asking complex scientific questions whose answers lie at the intersection of chemical structures and the biological activities and characteristics of those structures.

Currently, there are no chemical registration systems available from 3rd parties which provide the flexibility and control required to support Infinity's ground-breaking new chemistry.

Infinity has attracted a world-class team of software development, infrastructure management, and informatics professionals. This team is deploying a state-of-the-art framework for the management and integration of chemical and biological knowledge: a standards-based framework, which leverages a next-generation "XML web services" software architecture and enables scientists to focus on science instead of administration. At the highest level, the goals of the Discovery Informatics team are as follows:
  • Enable truly high-throughput, tightly integrated chemistry and biology by delivering "real-time" analytical, operational and predictive applications to Infinity scientists and collaborators.
  • Minimize the cost of integrating large numbers of heterogeneous information sources through the standardization of interfaces for biological, medical, chemical, and business information applications and equipment.
  • Enable employees to share information seamlessly, thus improving the quality and velocity of decision-making without disrupting workflows.
Discovery Informatics' work encompasses many aspects of the scientific and business processes. While the primary focus continues to be on the goals described above, emphasis has also been placed on the underlying infrastructure required to make this possible. The group works in a number of areas, including the following:
  • Infrastructure: high performance computing (LINUX), databases (Oracle), e-mail, document repositories and basic computer needs.
  • Compound Management: sample, compound and plate tracking.
  • Results Management: trapping, vetting, curating and publishing of data.
  • Data Searching: information access and analysis.
  • Computational Science: this infrastructure and an eclectic mix of results give the computational arm of DI the ability to perform critical in-silico analyses and experiments, including SAR, pattern discovery and docking.
  • Library Production: RFID encoding and library enumeration.
  • Ad hoc Support: the group provides assistance in both Chemistry and Biology for data capture, analysis, curation and reporting.
Data flows into the current architecture (see below) through a variety of sources:
  • Compound registration: Every compound generated and tested at Infinity is registered and annotated through a standard mechanism, using a combination of in-house tools and third party APIs to provide structures (SMILES™, MOL and images) and common descriptors, including Lipinski parameters and Tanimoto similarity.
  • Internal high throughput screening (HTS) data: This includes both primary and retest data generated in both automated and semi-automated fashion.
  • External HTS data: Data generated by corporate partners is curated and loaded into the appropriate databases with available annotations.
  • Internal low throughput screening (LTS) data: Data of this sort includes dose response and secondary assay results.
  • External LTS data: Data generated by corporate partners is curated and loaded into the appropriate databases with available annotations. In addition, derived values (e.g., Ki) may be recalculated based on internally validated algorithms.
  • Compound management details: All results are associated with samples in the compound management system. As samples move throughout the drug discovery process at Infinity, we attempt to log all important aspects of the life cycle including container and aliquot volume, composition and location.
  • Computational Sciences: SAR, pattern discovery, clustering and other computational approaches leverage the infrastructure and data to provide scientists with analysis of data sets for critical path decisions.
  • Library production: Information regarding synthesis pathways is trapped in the ELN and used to encode RFID tags for library production.
  • Analytical data: Calculated values and instrument spectra are currently stored in existing repositories; the spectra have been converted to GAML.
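
The Tanimoto similarity and Lipinski parameters mentioned under Compound registration are standard calculations, sketched here in Python purely for illustration. The sketch assumes a fingerprint is already available as a set of on-bit positions (for example, from a toolkit's hashed structural fingerprint) and that molecular weight, cLogP, and hydrogen-bond donor and acceptor counts have already been computed. The names `tanimoto`, `Descriptors`, and `lipinski_violations` are hypothetical; they are not part of the Daylight toolkits or of Infinity's system. The thresholds are the usual rule-of-five cutoffs: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and at most 10 acceptors.

```python
from __future__ import annotations

from dataclasses import dataclass


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    # |A and B| / |A or B|, with the union size computed as |A| + |B| - |A and B|
    return shared / (len(fp_a) + len(fp_b) - shared)


@dataclass
class Descriptors:
    # Hypothetical precomputed descriptors; a toolkit would calculate these at registration.
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's rule-of-five criteria a compound violates."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


if __name__ == "__main__":
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))  # 2 shared bits / 5 bits in the union
    print(lipinski_violations(Descriptors(612.7, 5.8, 2, 9)))  # MW and cLogP both fail
```

Representing a fingerprint as a set keeps the arithmetic obvious; a production system would use packed bit vectors and population counts for speed.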

THE CHALLENGE

Infinity envisioned a compound registration system which would serve as the primary index for scientific information within the company. It would link chemical samples to biological data and would have the following key features:
  • Allow registration of large libraries of compounds having sophisticated skeletal, chiral and functional diversity.
  • Allow registration of compounds from a variety of outside sources, including building blocks (reagents) used in Infinity's synthetic processes.
  • Allow annotation with aliases that permit flexible access to data through any of the ways in which compounds are named or aliased.
  • Handle salts and mixtures.
  • Allow corrections to be made to structures without changing the key index to the biological data and other data generated within Infinity.
  • Allow many interfaces to leverage the same API.
  • Provide an end-user registration facility to dozens of chemists who would be registering up to 10 new compounds per day.
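
The "salts and mixtures" requirement can be illustrated with a deliberately simple sketch. Assuming compounds arrive as SMILES strings in which disconnected components are separated by dots, the Python below takes the component with the most explicitly written atoms as the parent and reports the rest as counterions or other components. This largest-fragment heuristic is a stand-in chosen for illustration, not the normalization DayCart® or Infinity actually applies; production systems typically rely on toolkit normalization rules and curated salt lists. `split_salt` and `atom_count` are hypothetical names.

```python
from __future__ import annotations

import re

# One SMILES atom: a bracket atom, a two-letter organic-subset element (Br, Cl),
# or a one-letter organic-subset element (aliphatic or aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(smiles: str) -> int:
    """Count explicitly written atoms in one SMILES fragment (implicit H ignored)."""
    return len(ATOM_PATTERN.findall(smiles))


def split_salt(smiles: str) -> tuple[str, list[str]]:
    """Split a dot-disconnected SMILES into (parent, other components).

    Toy heuristic: the first fragment with the largest atom count is the parent.
    """
    fragments = smiles.split(".")
    idx = max(range(len(fragments)), key=lambda i: atom_count(fragments[i]))
    return fragments[idx], fragments[:idx] + fragments[idx + 1:]


if __name__ == "__main__":
    # Nicotine hydrochloride: parent plus a chloride component
    print(split_salt("CN1CCC[C@H]1c1cccnc1.Cl"))
    # Sodium acetate: the counterion is written first
    print(split_salt("[Na+].CC(=O)[O-]"))
```

For a mixture of two similar-sized components the heuristic breaks down, which is one reason a registration system needs to record mixtures explicitly rather than silently discarding components.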

THE SOLUTION

At Infinity we explored and implemented a number of third-party solutions, all of which failed to meet our requirements, before we turned to Daylight Chemical Information Systems, Inc. for the software and tool suites that would underpin our informatics infrastructure. Daylight is known for enabling the customized development of sophisticated cheminformatics systems, and it provides tools that enable its customers to deploy the most advanced registration systems in use today. By providing an open architecture, Daylight gives its customers the flexibility to build systems that meet unique requirements and give users control over a broad range of variables.

A key part of Daylight's technology is DayCart®, the chemistry cartridge that extends an Oracle database environment with the comprehensive chemical intelligence of the Daylight Toolkits. The critical value Infinity sees in DayCart® is the flexibility to design its own data model. Unlike other chemistry cartridges designed with specific application interfaces in mind, DayCart® is built to allow direct access to regular Oracle tables. This means there are no preset limitations on the design of the data model.

"Daylight gives us the control to do it the way we want," says John Walker, an Informatics Analyst at Infinity. "No other companies can do that."

Infinity requires a registration system that tracks specific samples of compounds through the Infinity development process. Registering samples enables Infinity's informaticians and scientists to accurately link biological information collected at all stages of the process directly back to the actual samples and their requisite structures. Designing a data model that is sample-centric rather than structure-centric can only be done using DayCart®.
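
The sample-centric idea, together with the alias and structure-correction requirements listed under THE CHALLENGE, can be sketched as a small relational schema. In this sketch SQLite stands in for Oracle and DayCart®, so an exact match on the submitted SMILES string stands in for the cartridge's normalized structure lookup; all table, column, and function names are hypothetical and do not describe Infinity's actual schema. Biological results reference a sample ID, each sample references a compound, and a structure correction rewrites only the compound's structure, leaving the keys that results depend on unchanged.

```python
import sqlite3

# Hypothetical sample-centric schema; SQLite stands in for Oracle + DayCart®.
SCHEMA = """
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    smiles      TEXT NOT NULL          -- current, correctable structure
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,   -- stable key that biological results point to
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    source      TEXT NOT NULL          -- e.g. notebook page, library plate, purified lot
);
CREATE TABLE alias (
    alias       TEXT PRIMARY KEY,      -- each alias names exactly one compound
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id)
);
CREATE TABLE result (
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    assay       TEXT NOT NULL,
    value       REAL NOT NULL
);
CREATE TABLE structure_history (
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    old_smiles  TEXT NOT NULL
);
"""


def register_sample(db, smiles, source, aliases=()):
    """Register one physical sample, creating its compound record on first sight."""
    # Exact string match is a placeholder for a normalized structure lookup.
    row = db.execute("SELECT compound_id FROM compound WHERE smiles = ?",
                     (smiles,)).fetchone()
    if row:
        compound_id = row[0]
    else:
        compound_id = db.execute("INSERT INTO compound (smiles) VALUES (?)",
                                 (smiles,)).lastrowid
    for alias in aliases:
        db.execute("INSERT INTO alias (alias, compound_id) VALUES (?, ?)",
                   (alias, compound_id))
    return db.execute("INSERT INTO sample (compound_id, source) VALUES (?, ?)",
                      (compound_id, source)).lastrowid


def correct_structure(db, compound_id, new_smiles):
    """Fix a structure in place, keeping the old one; sample and result keys are untouched."""
    db.execute("INSERT INTO structure_history (compound_id, old_smiles) "
               "SELECT compound_id, smiles FROM compound WHERE compound_id = ?",
               (compound_id,))
    db.execute("UPDATE compound SET smiles = ? WHERE compound_id = ?",
               (new_smiles, compound_id))


def results_for_alias(db, alias):
    """Return (sample_id, assay, value, current_smiles) rows reachable from an alias."""
    return db.execute(
        """SELECT s.sample_id, r.assay, r.value, c.smiles
           FROM alias a
           JOIN compound c ON c.compound_id = a.compound_id
           JOIN sample s   ON s.compound_id = c.compound_id
           JOIN result r   ON r.sample_id = s.sample_id
           WHERE a.alias = ?
           ORDER BY s.sample_id""",
        (alias,)).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    s1 = register_sample(db, "CCO", "notebook page", aliases=("INF-0001",))
    s2 = register_sample(db, "CCO", "purified lot")  # second sample of the same compound
    db.executemany("INSERT INTO result VALUES (?, ?, ?)",
                   [(s1, "assay-A", 1.5), (s2, "assay-A", 1.2)])
    cid = db.execute("SELECT compound_id FROM sample WHERE sample_id = ?",
                     (s1,)).fetchone()[0]
    correct_structure(db, cid, "NCCO")  # e.g. a missing nitrogen is added
    print(results_for_alias(db, "INF-0001"))
```

Because results join to samples rather than to structures, the corrected structure appears on every existing result without any result rows being rewritten, and the previous structure is kept in `structure_history` for auditing.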

Another aspect of DayCart® that is important to Infinity is the set of structure normalization tools built into the cartridge. Simply stated, these tools allow users to define structure conventions for storage and retrieval, so scientists can enter a structure in a variety of ways and DayCart® still recognizes it as the same structure. Furthermore, the SMILES™ and SMARTS® languages and the Daylight tools make it possible to build a hierarchy of relationships between structures, linking compounds together in a meaningful way that makes their analysis more efficient. This level of control over structure conventions and relationships enables Infinity to capture information at a much deeper level than other cartridges allow.

Also important to Infinity is the unparalleled speed and scalability of DayCart®. Infinity's use of chemical-genetic screening strategies means that it will need to register an extraordinarily large number of compounds and samples. DayCart® is uniquely capable of handling the largest possible databases without compromising performance, and it remains robust over data sets of millions of substances. A substructure search on a database of 22 million structures has been benchmarked at 2 seconds, and more complex searches are also routinely performed with the speed characteristic of DayCart®. This level of performance sets DayCart® apart from other chemistry cartridges as best-of-breed; it is also the high performance that Infinity requires for its registration system.

"My experience with Daylight goes back many years and I've always been impressed with them. They provide us with the ability to structure our informatics systems the way we want rather than constraining our development processes and limiting our performance," said Dennis Underwood, Infinity's VP of Computational Sciences and Discovery Informatics. "Daylight tools are not only a central component of our compound registration system but also enable us to build flexible searching tools that integrate chemical data with biological data and data from other process-oriented systems such as library production, analytics and purification."

In its current form, Infinity's Registry System has four interfaces for entering data into the system:
  • Compound Registration: a facile user interface, integrated with the e-notebook system, which allows chemists to input compound submissions.
  • Library Registration: a bulk structure loading tool available to the informatics team.
  • Purification: all purified samples are registered with structures based on the analytical results.
  • Structure Correction: a tool suite, called by a variety of applications, which allows for single and bulk structure correction.
Each of these provides a different entry to the system, but each is accessing the same tables within Oracle. Infinity can continue to develop custom interfaces to the registration system, or can tie in third party applications such as visualization or chemometric tools.

On the other side of the registration system are several ways to retrieve information. Applications have been deployed which use the registration data for physical management of samples, management of biological results, and third-party document management, reaction planning, and e-lab notebook systems. Again, this is just the beginning. Infinity plans to continue development of the system by adding more clustering tools, property calculators, and the means to register molecular building blocks to add even greater speed to the QSAR analysis.

The value derived from having a customized, high-performance registration system at Infinity is clear to both its scientists and management. Infinity's new registration system is at the core of our entire drug discovery process: from the synthesis of novel, natural-product-like compounds, through the high throughput screening of those compounds in biological assays, to the analoging of new compounds around validated hits.

A registration system that both allows chemists to rapidly register analogs and keeps track of all compounds active against a particular biological target is of paramount importance in determining the limits of patentable intellectual property as quickly as possible.

"A year ago I was the only person who could register compounds; fortunately for Infinity, with the exception of four analogs our entire collection was from DOS libraries," said Molly Wasserman, Informatics Analyst at Infinity. "Within a few months of deploying our Daylight-based registration system, we had over one hundred analogs as well as several thousand purified samples. It would have been impossible to implement tactical solutions without the chemists registering their own compounds."

By customizing its data model, distributing registration, and centralizing data quality control, Infinity has built a registration system that frees it to pursue a unique scientific strategy without being limited by the capabilities of a third-party registration system.

Contact:
Peter Nielsen
Daylight CIS, Inc.
(802) 223-9831
peter@daylight.com