Synopsis
The project goal is to implement a medium- to high-performance chemical
information server and use it to deploy large chemical databases
on the Internet.
Status
- Human resources are in place.
- Design of interfaces and data system is complete.
- User interface components are being prototyped.
- Negotiations with data vendors are in progress.
- Pre-production software on track for 1Q98.
- Production facilities won't be ready until 3Q98.
The experiment
Given today's technological, academic, commercial and political realities,
can we create a viable public system which delivers "full-strength" access
to chemical information?
People
- Dave Weininger (lead & system design)
- Norah Shemultuskis (interface design & support)
- Ragu Bharadwaj (Java interfaces & integration)
- Jack Delany (reaction databases & interface)
- Jeremy Yang (database design & management)
- Yosef Taitz (business & legal)
Mjollnir's background
Three sociologically evolved methods of scientific information exchange
exist {1}: the reviewed journal, the centralized library, and the forum.
Of these, only the forum is suitable for use over a global network in a
fully distributed manner.
In an idealized forum, participants must identify themselves, have free
(or at least equal) access to all relevant information, and may make
contributions without prior review or censorship (including comments about
contributions made by other participants).
Most scientific forums take the form of conferences where all participants
are physically present (such as this meeting). However, the forum per se
has no requirement for proximity or immediacy; Usenet, for example, is a
forum.
The Mjollnir project represents an attempt to build a practical forum
for chemical information exchange.
Earlier systems: Mjollnir-I
The Thor database system {2} was designed with the forum concept in mind,
although it is not (yet) used this way.
Universal structure-based indexing, constant time retrieval and
strict ID-data distinction make Thor particularly suitable for use
in a chemical information forum.
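Two of the Thor properties named above, identifier-keyed retrieval and the strict separation of identifiers from data, can be illustrated with a toy in-memory store. This is a minimal Python sketch, not Thor's implementation: it assumes each structure's identifier is a canonical SMILES string produced elsewhere (canonicalization is not shown) and uses a Python dict, which is a hash table, for average constant-time lookup. The class name, method names, and the "NAM"/"MW" datatype tags are hypothetical.

```python
from collections import defaultdict


class ToyThorStore:
    """Hypothetical sketch: identifiers (canonical SMILES) map to data items.

    The identifier is kept strictly separate from the data attached to it,
    and a dict (hash table) gives average constant-time retrieval by ID.
    """

    def __init__(self):
        # identifier -> list of (datatype, value) pairs
        self._trees = defaultdict(list)

    def add(self, smiles_id, datatype, value):
        # Data never becomes part of the key; only the identifier indexes it.
        self._trees[smiles_id].append((datatype, value))

    def lookup(self, smiles_id):
        # Average O(1) dict lookup; .get() does not create empty entries.
        return list(self._trees.get(smiles_id, []))

    def ids(self):
        return list(self._trees)


store = ToyThorStore()
store.add("CCO", "NAM", "ethanol")
store.add("CCO", "MW", "46.07")
print(store.lookup("CCO"))       # [('NAM', 'ethanol'), ('MW', '46.07')]
print(store.lookup("c1ccccc1"))  # []
```

In this toy model, any source that produces the same canonical identifier reaches the same data, which is one way to read "universal structure-based indexing".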
An early version of Mjollnir (now called Mjollnir-I) implements universal
access to large amounts of chemical data. This low-performance data
retrieval system operates via e-mail and has been operational for several
years {3}. It is primarily used by academics with poor access to
information.
Although Mjollnir-I makes a lot of data available (the complete medchem,
tsca, spresipreps, and wdi databases), its very low performance limits its
use to delivering data to resource-poor academics and to marketing
databases to better-funded researchers (database demos). The same low
performance makes it immune to "database-dumping".
Key design features
- Very high-performance database server capable of handling
  tens to hundreds of requests per minute
  (combined educational database needs in the US)
- Java interfaces reduce per-user bandwidth to support high volumes
  at typical Internet speeds
- Multiple large databases totalling over 7 million structures and
  reactions are under consideration.
- Free access to the public Mjollnir server via the Internet.
- User identities and requests are available as data:
  - Users can find others who are interested in similar information
    (the carrot).
  - Not suitable for proprietary queries (the stick).
- Allows corporate users to examine databases before subscribing to
  them or buying them for high-performance, in-house use.
- Restrictions against database dumping will be automatically enforced.
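How the anti-dumping restrictions will be enforced is not specified above. One plausible mechanism, shown here purely as a hypothetical Python sketch, is a per-user sliding-window limit on requests; the class name and the limits used are assumptions, not Mjollnir parameters.

```python
from collections import defaultdict, deque


class DumpGuard:
    """Hypothetical per-user sliding-window limit on retrieval requests."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests    # assumed cap per window
        self.window = window_seconds        # assumed window length (seconds)
        self._history = defaultdict(deque)  # user -> times of accepted requests

    def allow(self, user, now):
        """Accept and record a request from `user` at time `now` (seconds),
        or reject it if the user already has max_requests in the window."""
        times = self._history[user]
        # Forget requests that have slid out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False  # rejected requests are not recorded
        times.append(now)
        return True


guard = DumpGuard(max_requests=3, window_seconds=60.0)
print([guard.allow("alice", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(guard.allow("bob", 3))     # True: limits are tracked per user
print(guard.allow("alice", 61))  # True: the requests at t=0 and t=1 have expired
```

Because user identities are already recorded with each request, a per-user limit fits naturally; a production system would presumably also need to detect one party using many identities, which this sketch does not attempt.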
User interfaces
- Simple web-based interfaces
  - Common-sense interface (Internet, ChemDraw)
  - Minimal learning curve
  - Daylight/database expertise is not required
- Graphical orientation
  - Graphical input of structures (Grins)
  - Result visualization tools (spreadsheets, trees)
  - Links to additional data and resources
- Based on Java applets / servlets
  - Automatic downloads and upgrades
  - Platform independent
  - Java 1.1, servlet-capable web server
  - Browsers: Netscape! HotJava. Explorer?
Natural language interface
- User types in words or phrases
  - much like web search engines (e.g., AltaVista)
- Graphical or text structure entry
  - Grins molecular editor is a Java "bean"
  - SMILES may be typed or pasted in
  - Other formats (e.g., MOL) accepted via file names
- What you know / what you need
  - two separate entry areas are provided
- Interpreter translates to query
  - primary search criterion is structure, if available
  - all query components contribute to scoring
- Datatypes & databases deduced
  - text and values are translated to datatypes
  - databases with relevant data are queried
  - query/result recorded in transaction database
Language interface example
- "What you know":
- "What you need":
  abstracts of german patents for similar structures
- Interpretation
  - Structure is interpreted from Java Grins
  - find "similar" "structures" (both are keywords)
  - datatypes are deduced:
    - patents: PAT datatype
    - abstracts: ABS field of PAT datatype
    - german: keyword, CO field of PAT datatype
- Method is robust
  - e.g., same as:
    find similar structures then show German patent abstracts
- System learns from user queries
  - Trains on local knowledge and language
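The interpretation steps above can be mimicked with a small keyword table. This Python sketch is hypothetical: the vocabulary entries come from the example, but the dictionary layout, the function name, and the phenol SMILES standing in for the drawn "What you know" structure are assumptions, and it does no learning from user queries. It does show why such a method can be robust to rephrasing: the two wordings in the example yield the same query.

```python
import re

# Hypothetical vocabulary taken from the example above:
# word -> (datatype, field, role); "show" fields are returned,
# "match" fields restrict which records qualify.
VOCABULARY = {
    "patent": ("PAT", None, None),
    "patents": ("PAT", None, None),
    "abstracts": ("PAT", "ABS", "show"),
    "german": ("PAT", "CO", "match"),
}
SEARCH_KEYWORDS = {"similar", "structures"}


def interpret(need_text, structure=None):
    """Translate a "what you need" phrase into a simple query description.

    Word order is ignored, so differently phrased requests that use the
    same recognized words produce the same query.
    """
    words = re.findall(r"[a-z]+", need_text.lower())
    query = {
        "structure": structure,  # primary search criterion, if supplied
        "search": sorted(SEARCH_KEYWORDS.intersection(words)),
        "datatypes": set(),
        "show": set(),
        "match": {},
    }
    for word in words:
        if word not in VOCABULARY:
            continue  # e.g. "of", "for", "find", "then", "show"
        datatype, field, role = VOCABULARY[word]
        query["datatypes"].add(datatype)
        if role == "show":
            query["show"].add(field)
        elif role == "match":
            query["match"][field] = word
    return query


q1 = interpret("abstracts of german patents for similar structures", "c1ccccc1O")
q2 = interpret("find similar structures then show German patent abstracts", "c1ccccc1O")
print(q1 == q2)     # True
print(q1["match"])  # {'CO': 'german'}
```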
POV: Academic chemists and students
Academic chemists (and libraries) are being pinched by severe budget cuts
in many colleges and universities. Research monies which used to take up
the slack are not so plentiful anymore. Mjollnir should provide such
chemists with free access to data.
POV: Industrial chemists
Mjollnir will provide a mechanism for
industrial chemists with Internet access to explore many
sources of chemical information quickly and easily. For
all the advances in modern chemical informatics, this is
something which hasn't reached most bench chemists. It is
unlikely that industrial chemists will be able to use the
public Mjollnir server for their day-to-day work, since
requests there are visible to other users and so cannot
be proprietary. If the service is something they really
need, the assumption is that they have the resources to
obtain it (e.g., by acquiring the database for in-house use).
POV: Database vendors
Publishing an up-to-date chemical
information database is intrinsically expensive, and any
workable system must ensure that database vendors get
a good return for their efforts. The keys to selling data
are quality (data that people actually need) and volume
(which multiplies the effect of the effort).
The Internet provides a very effective mechanism to reach
potential customers and to allow them to become familiar
with the product. In this context, "free academic use" is
an advantage: most chemical databases are way too
expensive for academics anyway (no real loss), and having
students trained to use a database as one of their
tools is a great long-term advantage. The
assumption that those who can buy it are the same people
as those with proprietary questions is arguable, but
eminently testable.
One nice assurance for the vendors is that the data is
maintained in a central place with provisions against
dumping; if the system appears to be abused,
they can "pull the plug".
POV: IS/IT specialists
This system provides convenient "try before you buy" functionality for
both software and databases. Current data acquisition decisions are
often made for historical reasons or based on "blurbs".
Mjollnir should allow better-informed decision-making
at low cost in time and money.
POV: Online services
Once it proves itself, the Mjollnir server will be made
available as a normal Daylight product. Our intention is
that Mjollnir should provide a cost-effective delivery
system for existing and would-be online services,
whether public (actually free), commercial (charge by
subscription or usage), or private (in-house, secure).
There do not appear to be any existing products of this type,
especially for small services.
POV: Daylight
Mjollnir represents both a commercial product and
a step in Daylight's mission to bring chemical informatics
to all chemists.
Since Daylight sells the databases, the underlying
database servers and Mjollnir itself, it is expected to
serve as a marketing tool. We hope that companies will
say, "Gee, if you can deliver eight million structures
across the Internet you can surely handle our few hundred
thousand structures on our local network".
The only way to build a reliable and high-performance
system is to actually maintain one in active use: Mjollnir
will give us the opportunity to do so in a big way. It
also serves as part of our continuing efforts to support
chemical education, which is now in crisis.
Actualization
- Hardware
To achieve our performance goals, we need one or
more reliable, big-memory, multi-CPU production servers.
"Big memory" in this case means more than 3-6 GB, which
implies a requirement for 64-bit addressing. Fortunately,
current server technology is up to the task. We will
implement Mjollnir on three machines: a DEC AlphaServer,
an SGI Origin, and a Sun Enterprise UltraServer.
- Internet connection
The initial Internet connection is expected to
be a full-time T1 via a peer provider with no hops to the
backbone. If the service is successful, an upgrade to a T3
might be required within 1-2 years. We are also considering
the possibility of mirror sites in Europe and Japan.
- Infrastructure
Daylight is building a new research office in Santa Fe, NM,
with a purpose-built machine room that will provide a
stable environment for the Mjollnir server. Siting and
zoning this facility is the only part of the project
that is behind schedule; it should be completed by 3Q98.
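Two back-of-the-envelope figures behind the Hardware and Internet connection items above. The first shows why memory beyond about 4 GB forces 64-bit addressing: a 32-bit address can name at most 2^32 bytes (on many 32-bit systems the usable per-process space is smaller still). The second bounds the average payload per request over a T1, assuming the nominal T1 line rate of 1.544 Mbit/s, the upper design figure of 100 requests per minute, and no protocol overhead, so the real budget is smaller; this is one reason the low-bandwidth Java interfaces matter.

```latex
\begin{align*}
2^{32}\ \text{bytes} &= 4\,294\,967\,296\ \text{bytes} = 4\ \text{GiB} \\
\frac{1.544\times 10^{6}\ \text{bit/s} \times 60\ \text{s/min}}{100\ \text{requests/min}}
  &= 926\,400\ \text{bit/request} \approx 115.8\ \text{kB/request}
\end{align*}
```

A T3 (44.736 Mbit/s) would raise that per-request budget by a factor of about 29.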
References
- {1} Dr. Howard Winant, discussions beginning 1979 in San Francisco;
now at the Sociology Department, Temple University, Philadelphia, PA.
- {2} MedChem Software Manual, Release 3.51, Medicinal Chemistry Project,
Pomona College, Claremont, CA; 1987.
- {3} mjollnir@daylight.com, operating since 1993 from the Research Office
of Daylight Chemical Information Systems, Inc., Santa Fe, NM.
As always, we are looking for input and feedback from our users.
If you have ideas about the role, design, or implementation of
Mjollnir, let us know. Soon!
Daylight Chemical Information Systems, Inc.
info@daylight.com