The Global Pillage
Roger Sayle
Bioinformatics Group, Metaphorics LLC,
Santa Fe, New Mexico, USA.
Overview
Availability of Biological Data
Rate of Increase of Biological Data
Scientific Importance of Biological Data
Economic Importance of Biological Data
Paradigm Change #1: Intelligent Algorithms
Paradigm Change #2: Active Database Searching
Example Application - Rational Drug Design
Final Remarks
Availability of Biological Data
Sequence Databases:
Protein: SwissProt, PIR, OWL, GenPept, TREMBL, NRL3D.
Nucleic Acid: GenBank, EMBL, DDBJ, dbEST, UniGene.
Nucleic Features: EPD, CPGISLE, REPBASE, VecBase.
Structure Databases: PDB, NDB, UCSD PDB, PDBSelect.
Protein Domains and Folds: SCOP, CATH, See3D, SBase.
Taxonomic Databases: NCBI Taxonomy, ATCC, ICTV.
Organism Databases: AceDB, YEAST, MycDB, FlyBase.
Genetic Loci and Mapping: GDB, RHDb, Chromosome AceDBs.
Genetic Diseases: Online Mendelian Inheritance in Man (OMIM).
Enzymes and Pathways: Enzyme, EMP, KEGG, *Cyc, REBASE.
Alignment Databases: PFAM, FSSP, ProDom, SwissDom.
Mutation Databases: HGMD, Protein Mutation Database, MITOMAP.
2D-Gel Electrophoresis: Swiss-2DPage, YEPD, 2DHEART.
Carbohydrate Modifications: CarbBase.
Motif Databases: Prosite, Prints.
Rate of Increase of Biological Data
Exponential growth of sequence and structure databases.
Doubling time of nucleic acid databases is 18 months.
Approximately 10 new protein structures added to PDB daily.
Most major sequence databases updated nightly.
Functional analysis of Haemophilus influenzae genome.
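The 18-month doubling time quoted above can be turned into simple arithmetic. The following is a minimal sketch that assumes purely exponential growth at a constant doubling time; the function names are illustrative and not part of any existing tool.

```python
import math


def growth_factor(elapsed_months: float, doubling_months: float = 18.0) -> float:
    """Factor by which a database grows over elapsed_months.

    Assumes idealized exponential growth with a fixed doubling time
    (18 months is the figure quoted for nucleic acid databases).
    """
    return 2.0 ** (elapsed_months / doubling_months)


def months_to_grow(factor: float, doubling_months: float = 18.0) -> float:
    """Months needed for the database to grow by the given factor."""
    return doubling_months * math.log2(factor)


three_year_factor = growth_factor(36.0)   # two doublings
tenfold_months = months_to_grow(10.0)     # time to reach ten times the size
```

Under these assumptions a database quadruples in three years and grows tenfold in roughly five years, which is why any analysis run once against a static snapshot goes stale quickly.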
Scientific Importance of Biological Data
Protein sequence of isozyme Cyclooxygenase COX-2.
Human cytogenetic locus of breast cancer gene BRCA1.
Protein structure of HIV (or other viral) protease.
Protein structure of apoptosis protein BCL-2.
Economic Importance of Biological Data
Helicobacter pylori genome sequence ($8M).
Merck/WashU and TIGR EST sequences ($1M).
Paradigm Change #1: Intelligent Algorithms
Blurring of distinction between algorithm and database.
Continual reparameterization with training set growth.
Knowledge based selection of appropriate algorithm.
Integration of computational methods - meta-algorithms.
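One way to picture "continual reparameterization with training set growth" is a model whose parameters are re-estimated every time new sequences arrive, so that the algorithm and the database it was trained on are no longer separate things. The sketch below uses a deliberately simple residue-composition model as a stand-in; the class name and methods are hypothetical and do not describe any particular Metaphorics implementation.

```python
from collections import Counter
from typing import Iterable


class CompositionModel:
    """Toy residue-frequency model re-estimated as the training set grows."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def update(self, sequences: Iterable[str]) -> None:
        """Fold newly released sequences into the running counts."""
        for seq in sequences:
            self.counts.update(seq)
            self.total += len(seq)

    def frequency(self, residue: str) -> float:
        """Current estimate of the background frequency of a residue."""
        if self.total == 0:
            raise ValueError("model has not seen any training data yet")
        return self.counts[residue] / self.total


model = CompositionModel()
model.update(["AAAG"])            # initial training set
before = model.frequency("A")
model.update(["GGGG"])            # a later database update
after = model.frequency("A")
```

The same pattern applies to richer parameter sets (substitution matrices, profile weights, secondary-structure propensities): the estimate is a function of the current database contents rather than a constant fixed at publication time.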
Paradigm Change #2: Active Database Searching
Traditional interactive database queries.
Automatic database updates (weekly, nightly or hourly).
Database of user queries and update trigger procedures.
Results returned to user/project by e-mail, news or WWW.
Processing of updates to generate derived data.
Automatic maintenance of links/relationships.
Incremental (and stable) query processing.
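The points above can be sketched as a database that stores user queries and re-runs them against each update, rather than waiting for users to ask again. This is a minimal in-memory illustration; the class names, the `apply_update` method and the example addresses and accessions are all assumptions made for the sketch, and delivery by e-mail, news or WWW is reduced to a returned dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Record:
    accession: str
    description: str


@dataclass
class StoredQuery:
    owner: str                          # user or project that receives results
    name: str
    matches: Callable[[Record], bool]   # predicate applied to each new record


@dataclass
class ActiveDatabase:
    records: Dict[str, Record] = field(default_factory=dict)
    queries: List[StoredQuery] = field(default_factory=list)

    def register(self, query: StoredQuery) -> None:
        """Add a query to the database of user queries."""
        self.queries.append(query)

    def apply_update(self, batch: List[Record]) -> Dict[str, List[str]]:
        """Load an update and run stored queries against new records only.

        Returns accessions to report, grouped by query owner. Records
        already present are skipped, so earlier hits are not re-reported.
        """
        new = [r for r in batch if r.accession not in self.records]
        for r in new:
            self.records[r.accession] = r
        outbox: Dict[str, List[str]] = {}
        for q in self.queries:
            hits = [r.accession for r in new if q.matches(r)]
            if hits:
                outbox.setdefault(q.owner, []).extend(hits)
        return outbox


db = ActiveDatabase()
db.register(StoredQuery("alice@example.org", "proteases",
                        lambda r: "protease" in r.description.lower()))
first = db.apply_update([Record("X1", "HIV-1 protease"),
                         Record("X2", "Cytochrome c")])
second = db.apply_update([Record("X1", "HIV-1 protease"),
                          Record("X3", "Aspartic protease")])
```

Because each update is matched only against records not seen before, the cost of a nightly run scales with the size of the update rather than the size of the database, and a user is notified about each hit once.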
Example Application - Rational Drug Design
Determination of sequence for enzyme class or family.
Discovery of family member or receptor subtype for target.
Structure for a fold class, for homology modelling, or for the target itself.
Structure with improved resolution or alternate conformation.
Structure with bound ligand or novel ligand-receptor interactions.
Sequence of isozyme, clinical mutant or human polymorphism.
Final Remarks
Metaphorics infrastructure for collection, integration and analysis.
Obvious benefits for both data providers and data consumers.
roger@metaphorics.com