Euromug '96
Synopsis
The project goal is to implement a medium-to-high performance chemical
information server and use it to deploy large chemical databases
on the Internet.
Status
Design and initial negotiations with data vendors are underway.
Human resources and computer infrastructure could be in place mid-1997.
Databases and pre-production software could be in place by
the end of 1997.
People
- Dave Weininger (lead and system design)
- Daylight employee to be announced later (interfaces and integration)
- Jeremy Yang and Norah Shemultuskis (database design)
- Jack Delany (reaction databases)
- Yosef Taitz (business and legal)
Description
The Mjollnir project was spawned by an analysis of the sociological
aspects of scientific information exchange by Howard Winant about
16 years ago (not only pre-Daylight, but pre-MedChem!).
One of the conclusions of this analysis was that
of the three sociologically-evolved methods of scientific information
exchange (reviewed journal, library, forum), only the forum is suitable
for use on a global network in a fully distributed manner.
In an ideal forum, participants must identify themselves,
may freely (or at least equally) have access to all information,
and may make contributions without prior review or censorship
(including comments about other people's contributions).
Most scientific forums take the form of conferences, but the forum has
no requirement for immediacy, e.g., usenet is a good example of a forum.
The Thor database system was designed with the forum concept in mind,
although it is not (yet) used this way.
Universal structure-based indexing, constant time retrieval and
strict ID-data distinction make Thor particularly suitable for use
in a chemical information forum.
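To make these three properties concrete, here is a minimal sketch of a structure-keyed store. It is an illustration only and not Thor's actual storage layout: the class `StructureStore`, its method names, and the example fields are all hypothetical, and the structure strings are assumed to be already canonical. Data is attached to the structure itself, identifiers are kept separately as pointers to a structure, and lookups are hash-table lookups (constant time on average).

```python
# Hypothetical illustration only; this is not Thor's actual storage layout.


class StructureStore:
    """Toy store keyed by a structure string (assumed already canonical)."""

    def __init__(self):
        self._data_by_structure = {}   # structure key -> {field: value}
        self._structure_by_id = {}     # external identifier -> structure key

    def add(self, structure, identifiers=(), **data):
        # Data hangs off the structure itself, never off an identifier.
        record = self._data_by_structure.setdefault(structure, {})
        record.update(data)
        # Identifiers are only pointers to a structure.
        for ident in identifiers:
            self._structure_by_id[ident] = structure

    def by_structure(self, structure):
        # Average-case constant-time hash lookup.
        return self._data_by_structure.get(structure)

    def by_identifier(self, ident):
        # Resolve the identifier to a structure, then fetch that structure's data.
        structure = self._structure_by_id.get(ident)
        if structure is None:
            return None
        return self._data_by_structure[structure]


store = StructureStore()
store.add("CCO", identifiers=["ethanol", "64-17-5"], source="vendor A")
store.add("CCO", identifiers=["ethyl alcohol"], note="added by a second contributor")
```

Because both contributions are filed under the same structure, a lookup by any of the three identifiers returns the merged record, which is the behaviour a forum with many independent contributors would need.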
An early version of Mjollnir, which implements (only) universal access
to low-performance data retrieval via e-mail, has been operational for
several years. It is primarily used by academics with poor access to
information. Although this system makes a lot of data available
(the medchem, tsca, spresipreps, and wdi databases), its very low
performance makes it immune to wholesale database dumping and limits its use
to delivering data to desperately poor students and
marketing databases to better-funded researchers (e.g., database demos).
The current Mjollnir project is aimed at raising the system's performance
to a more usable level and introducing some forum features.
Key design features include:
- Very high performance database server capable of handling
tens to hundreds of Merlin search requests per minute,
i.e., the combined educational database needs in the US.
- Integrated interfaces which reduce per-user bandwidth to allow
data delivery at typical internet speeds.
- Multiple large databases totalling over 7 million
structures and reactions are on the table.
- Free access.
- Users' identities and search requests are available as data:
- Users find out who else is interested in similar data (the carrot).
- Not suitable for proprietary queries (the stick).
- Allows users to examine databases before subscribing to them or
buying them for high-performance, in-house use.
- Restrictions against database dumping will be automatically enforced.
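Two of the features above, public search requests and automatic restrictions against dumping, can be sketched together. The following is a minimal sketch under stated assumptions, not the Mjollnir design: the class `ForumGate`, its parameters, and the sliding-window quota policy are all hypothetical, chosen only to show one way such rules could be enforced mechanically. Time is passed in explicitly so the example is deterministic.

```python
from collections import defaultdict, deque


class ForumGate:
    """Toy front end: public query log plus a per-user anti-dumping quota."""

    def __init__(self, max_records, window_seconds):
        self.max_records = max_records         # hypothetical quota per window
        self.window_seconds = window_seconds   # hypothetical window length
        self.public_log = []                   # (user, query) pairs, visible to all
        self._delivered = defaultdict(deque)   # user -> deque of (time, n_records)

    def request(self, user, query, n_records, now):
        """Log the request; return True if the records may be delivered."""
        self.public_log.append((user, query))
        history = self._delivered[user]
        # Forget deliveries that have aged out of the sliding window.
        while history and now - history[0][0] >= self.window_seconds:
            history.popleft()
        already = sum(n for _, n in history)
        if already + n_records > self.max_records:
            return False
        history.append((now, n_records))
        return True

    def who_else(self, user, query):
        """Other identified users who issued the same query (the carrot)."""
        return sorted({u for u, q in self.public_log if q == query and u != user})


gate = ForumGate(max_records=1000, window_seconds=3600)
ok_small = gate.request("alice", "c1ccccc1", n_records=200, now=0)   # within quota
ok_dump = gate.request("alice", "*", n_records=900, now=10)          # exceeds quota
ok_later = gate.request("alice", "*", n_records=900, now=3700)       # window has passed
gate.request("bob", "c1ccccc1", n_records=50, now=20)
peers = gate.who_else("alice", "c1ccccc1")
```

In this sketch a refused request is still logged, so an attempted dump is itself visible as data. A real policy would need to decide that point, along with the quota size and how users are identified.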
Given the realities of how most chemical information is collected
(expensively) and disseminated (expensively and not that universally),
the Mjollnir approach seems to be to everyone's benefit:
- Academic chemists and chemistry students
-- Academic chemists (and libraries) are being pinched by severe
budget cuts in many colleges and universities. Research monies
which used to take up the slack are not so plentiful anymore.
Mjollnir should provide such chemists with free access to data.
- Industrial chemists
-- Mjollnir will provide a mechanism for industrial chemists with
Internet access to explore many sources of chemical information
quickly and easily.
For all the advances in modern chemical informatics, this is
something which hasn't reached most bench chemists.
It is unlikely that industrial chemists will be able to use the
public Mjollnir server for their day-to-day work since it is not
possible to ask proprietary questions.
If the service is something they really need, the assumption is
that they have the resources to obtain it (e.g., obtain the
database for in-house use).
- IS/IT specialists
-- This system provides try-before-you-buy functionality for both
software and databases.
Current data acquisition decisions are often made for historical
reasons or based on "blurbs".
Mjollnir should allow better-informed decision-making
at low cost in time and money.
- Database vendors
-- Publishing an up-to-date chemical information database is
intrinsically expensive and any workable system must ensure that
database vendors get a good return for their efforts.
The keys to selling data are quality (data that people actually need)
and volume (multiplies the effect of the effort).
The Internet provides a very effective mechanism to reach potential
customers and to allow them to become familiar with the product.
In this context, "free academic use" is an advantage: most chemical
databases are way too expensive for academics anyway (no real loss)
and having students be trained to use a database as one of their
tools is a great advantage in the long term.
The assumption that those that can buy it are the same people as
those with proprietary questions is arguable, but eminently testable.
One nice assurance for the vendors is that the data is maintained
in a central place with provisions against dumping -- if it seems
that the system is being abused, they can "pull the plug".
- Online services
-- Once it proves itself, the Mjollnir server will be made available
as a normal Daylight product.
Our intention is that Mjollnir should provide a cost-effective
delivery system for existing and would-be online services,
whether public/free, commercial (charge by subscription or usage),
or private (in-house, secure).
There don't appear to be existing products of this type available,
especially for small services.
- Daylight
-- Mjollnir represents both a commercial product and a step in
Daylight's mission to bring chemical informatics to all chemists.
Since Daylight sells the databases, the underlying database servers
and Mjollnir itself, it is expected to serve as a marketing tool.
We hope that companies will say, "Gee, if you can deliver eight
million structures across the Internet you can surely handle our
few hundred thousand structures on our local network".
The only way to build a reliable and high-performance system is
to actually maintain one in active use: Mjollnir will give us the
opportunity to do so in a big way. It also serves as part of our
continuing efforts to support chemical education, which is now in
crisis.
Hardware
We expect to start with a Sun Enterprise 4000, which Sun has loaned us
for server development.
Enterprise machines are great for servers such as ours:
the architecture scales up to 30x500 MHz CPUs, 30 GB RAM, and 6 TB disk.
They claim that the new asynchronous memory manager can keep up with all
those CPU cycles (the 500 MHz CPUs aren't shipping yet, we'll see...)
As a bonus, they have very robust (fault-tolerant) features.
Physical environment
We are building an office with provisions for uninterruptible power
and a T1 line which will serve as a stable environment for the service.
It should be ready by mid-1997.
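As a rough, back-of-envelope check on why per-user bandwidth matters, the T1 line can be related to the request rates named in the design features. This is a sketch under stated assumptions: the nominal T1 line rate of 1.544 Mbit/s is used (usable payload is slightly lower), protocol overhead is ignored, the link is assumed to be shared evenly, and 10 and 100 requests per minute are taken as illustrative endpoints. The function name is hypothetical.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate; payload is slightly lower


def bytes_per_request(requests_per_minute, link_bits_per_second=T1_BITS_PER_SECOND):
    """Average bytes available per request if the link is fully and evenly used."""
    bits_per_minute = link_bits_per_second * 60
    return bits_per_minute / requests_per_minute / 8


budget_at_10 = bytes_per_request(10)    # bytes per request at 10 requests/minute
budget_at_100 = bytes_per_request(100)  # bytes per request at 100 requests/minute
```

Under these assumptions the link allows roughly 1.16 MB per request at 10 requests per minute and roughly 116 kB at 100 requests per minute, so compact responses become important as the load approaches the upper end of the target range.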
Human resources
Two additional research office staff will come on board in early 1997:
one primarily for research support,
the other a network/java-interface guru.
Both will spend some of their time on this project.
As always, we are looking for input and feedback from our users.
If you have ideas about the role, design, or implementation of
Mjollnir, let us know. Soon!
Daylight Chemical Information Systems, Inc.
info@daylight.com