MUG '97 -- 11th Annual Daylight User Group Meeting -- 26 February 1997
Database building from text
John Bradshaw, GlaxoWellcome, Stevenage, Herts SG1 2NY, UK
The database was built from the "Handbook of Enzyme Inhibitors" (H. Zollner, VCH, 1993).
Entries are keyed on the EC number (EC#), which is a true classifier, like the Dewey Decimal System for books.
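The Dewey comparison holds because an EC number has four dot-separated levels (class, subclass, sub-subclass, serial number), so any prefix names a well-defined group of enzymes. The sketch below groups EC numbers by a chosen prefix level. It is an illustration of that hierarchy only, not part of the database build described here, and the EC numbers in it are arbitrary examples.

```python
from collections import defaultdict


def ec_levels(ec: str) -> list[str]:
    """Return the successive prefixes of an EC number,
    e.g. '3.4.21.5' -> ['3', '3.4', '3.4.21', '3.4.21.5']."""
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    return [".".join(parts[: i + 1]) for i in range(4)]


def group_by_level(ecs: list[str], level: int) -> dict[str, list[str]]:
    """Group EC numbers under their shared prefix at `level` (1 = class, 4 = full number)."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    groups: dict[str, list[str]] = defaultdict(list)
    for ec in ecs:
        groups[ec_levels(ec)[level - 1]].append(ec)
    return dict(groups)


# Illustrative EC numbers only.
enzymes = ["3.4.21.5", "3.4.21.4", "3.4.23.15", "2.7.1.1"]
print(group_by_level(enzymes, 2))
# {'3.4': ['3.4.21.5', '3.4.21.4', '3.4.23.15'], '2.7': ['2.7.1.1']}
```

Grouping at level 3 would separate the 3.4.21 entries from 3.4.23, just as a finer Dewey number narrows the shelf.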
Use sources such as SWISSPROT and the Brookhaven Protein Data Bank to find enzymes for which a 3D structure exists.
Extract the Brookhaven code for each such structure.
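As one way to do the extraction step, the sketch below pulls Brookhaven (PDB) ID codes out of the database cross-reference lines of a SWISS-PROT flat-file entry. It assumes cross-references appear as lines of the form `DR   PDB; <code>; ...` and that a code is a digit followed by three alphanumerics; the entry text is made up for illustration, and the talk does not say how the codes were actually abstracted.

```python
import re

# Assumed SWISS-PROT cross-reference line shape: "DR   PDB; <code>; ..."
PDB_DR = re.compile(r"^DR\s+PDB;\s*([0-9][A-Za-z0-9]{3});")


def pdb_codes(entry_text: str) -> list[str]:
    """Return the distinct PDB codes cited in one entry, in order of first appearance."""
    codes: list[str] = []
    for line in entry_text.splitlines():
        m = PDB_DR.match(line)
        if m:
            code = m.group(1).upper()
            if code not in codes:
                codes.append(code)
    return codes


# Hypothetical entry text, for illustration only.
sample = """ID   HYPO_HUMAN     STANDARD;      PRT;   100 AA.
DR   EMBL; X00001; G000001.
DR   PDB; 1ABC; 01-JAN-95.
DR   PDB; 2xyz; 01-JAN-96.
DR   PDB; 1ABC; 01-JAN-95.
"""
print(pdb_codes(sample))  # ['1ABC', '2XYZ']
```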
Experimented with storing GIF images of the binding site.
Manually look up information on these EC numbers and create datatrees rooted at $INH<> identifiers.
Load these into Thor, which merges all the trees correctly.
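The merge Thor performs can be pictured as collecting every record that shares a root identifier into one tree. The toy sketch below does that for simple TDT-style text: each `|`-terminated record is read as a root dataitem followed by data items, and records with the same root are merged. It is not Thor's implementation; the `EC` dataitem tag and the `INH-1`/`INH-2` names are hypothetical, and the parser assumes values contain no `<` or `>` characters.

```python
import re

# A dataitem is TAG<value>; identifier tags start with '$' (e.g. $INH, $SMI).
ITEM = re.compile(r"(\$?[A-Za-z0-9_]+)<([^<>]*)>")


def parse_tdt(text: str):
    """Yield (root_tag, root_value, data_items) for each '|'-terminated record."""
    for record in text.split("|"):
        items = ITEM.findall(record)
        if items:
            (root_tag, root_val), data = items[0], items[1:]
            yield root_tag, root_val, data


def merge(text: str) -> dict[tuple[str, str], list[tuple[str, str]]]:
    """Merge records that share a root, dropping duplicate data items."""
    merged: dict[tuple[str, str], list[tuple[str, str]]] = {}
    for root_tag, root_val, data in parse_tdt(text):
        node = merged.setdefault((root_tag, root_val), [])
        for item in data:
            if item not in node:
                node.append(item)
    return merged


def to_tdt(merged) -> str:
    """Write the merged trees back out, one record per line."""
    lines = []
    for (root_tag, root_val), data in merged.items():
        body = "".join(f"{tag}<{val}>" for tag, val in data)
        lines.append(f"{root_tag}<{root_val}>{body}|")
    return "\n".join(lines)


tdt = """$INH<INH-1>EC<3.4.21.4>|
$INH<INH-1>EC<3.4.22.2>|
$INH<INH-2>EC<3.4.23.15>|
"""
print(to_tdt(merge(tdt)))
# $INH<INH-1>EC<3.4.21.4>EC<3.4.22.2>|
# $INH<INH-2>EC<3.4.23.15>|
```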
Use nam2smi (a contributed program) to create trees of the form $SMI<>$INH<>| and merge them in Thor.
Caveat (posh name for bug): before version 4.5, Thor does not merge on all ambiguous names. Where there is a many-to-one relationship between $SMI and $INH, the merge needs to add the subtree to every new $SMI root.
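The behaviour the caveat asks for can be sketched in a few lines: when one inhibitor name resolves to several SMILES, the $INH subtree is copied under each $SMI root rather than attached to only one of them, and names that resolve to no SMILES stay as $INH roots. This is a plain-Python illustration of the intended result, not Thor code or the workaround used in the talk; the `EC` tag and `INH-n` names are hypothetical, and the two SMILES (a pair of stereoisomers) stand in for the output of an ambiguous name.

```python
# nam2smi-style output as (SMILES, inhibitor name) pairs. Both SMILES here
# are stereoisomers resolved from one ambiguous name (illustrative data).
pairs = [
    ("C[C@H](N)C(=O)O", "INH-1"),
    ("C[C@@H](N)C(=O)O", "INH-1"),
]

# $INH-rooted subtrees built by hand (hypothetical EC dataitem tag).
inh_trees = {
    "INH-1": [("EC", "3.4.21.4")],
    "INH-2": [("EC", "3.4.23.15")],
}


def attach_to_all_roots(pairs, inh_trees):
    """Return {smiles: {inh_name: subtree}}, copying each $INH subtree
    under every $SMI root that maps to that name."""
    smi_trees: dict[str, dict[str, list[tuple[str, str]]]] = {}
    for smi, name in pairs:
        branches = smi_trees.setdefault(smi, {})
        # Copy the subtree so each $SMI root owns an independent branch.
        branches[name] = list(inh_trees.get(name, []))
    return smi_trees


def unresolved_names(pairs, inh_trees):
    """Names with no SMILES; these remain $INH roots."""
    resolved = {name for _, name in pairs}
    return sorted(set(inh_trees) - resolved)


smi_trees = attach_to_all_roots(pairs, inh_trees)
for smi, branches in smi_trees.items():
    print(smi, branches)
print("still $INH roots:", unresolved_names(pairs, inh_trees))
# C[C@H](N)C(=O)O {'INH-1': [('EC', '3.4.21.4')]}
# C[C@@H](N)C(=O)O {'INH-1': [('EC', '3.4.21.4')]}
# still $INH roots: ['INH-2']
```

Copying rather than sharing the list keeps later edits to one structure's branch from silently appearing under its isomer.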
The resulting roots are now:
    $SMI<> with a real structure
    $INH<> because:
        no structure is defined (OK);
        the structure is generic, e.g. FATTY ACIDS (can be dealt with by *, the SMILES wildcard atom);
        the name can't be found. Other sources can be prohibitively expensive or impossible to use: many online facilities such as SciFinder are not geared to handle lists, and even if they were, it costs $5 per connection table.
Reactions/transforms could also be added as data about the enzyme.
dt_wish()
Virtual databases
A nam2smi that works better
Effective merging of in-house data in a maintainable fashion
Information extraction tools for building databases from literature sources
A large, preferably public, database of chemical synonyms