| Making Tables from Trees | EuroMUG'01 -- 13-14 Sept 2001  -- Cambridge, UK | Jeremy Yang | 
"Making Tables from Trees"
Building DayCart/Oracle tables from Daylight/Thor databases
Jeremy Yang
Daylight CIS
Abstract: The different data models and datatypes of Thor vs.
Oracle should guide the interconversion of datasets between these two
DBMS's.  These issues will be described and related to an example
conversion of WDI to DayCart-ready format.
Daylight data structure
  
  
    - Each server -> multiple DBs of related content.
    
     
- Each DB -> N SMILES-rooted Thor Data Trees (other primary IDs also ok).
    
     
- Each TDT -> multiple xref-ID-rooted subtrees.
    
     
- Subtrees may correspond with isomers.
    
     
- Each ID may have associated dataitems.
    
     
- Each dataitem may consist of multiple datafields.
    
     
- Each field is typed (text, numeric, binary, etc.).
    
   
Example TDT
RDB data structure
  
  
    - Each server -> multiple schema.
    
     
- Each server -> multiple tables.
    
     
- Related content (i.e., one "DB") typically contained in multiple tables.
    
     
- Each table 2D array of distinct rows of typed columns.
    
     
- Primary key may uniquely identify each row.
    
     
- "Database" typically means an Oracle instance w/ data.
    
   
Considerations for read-only compound databases (e.g. WDI, ACD)
  
  
    - Data curators (Derwent, MDLIS Inc.) have already defined chemical
        entities and corresponding primary IDs (DXRN, ACD#).
    
     
- Read-only so transaction-controls are not needed.
  
Supporting ALLNAMES queries
  
  
    - WDI has many name datatypes (Derwent name, trade name, international
name, approved name, etc.)
    
     
- Search-all-names query is desireable (like Thor $NAM feature).
    
     
- Oracle VIEW defined for ALLNAMES queries.
    
 
CREATE VIEW WDI_ALLNAMES AS (
        SELECT DXRN,DERWENT_NAME "NAME" FROM WDI_DRN
        UNION SELECT DXRN,US_ADOPTED_NAME FROM WDI_USAN
        UNION SELECT DXRN,NAM FROM WDI_NAM
        UNION SELECT DXRN,APPR_NAME FROM WDI_APN
        UNION SELECT DXRN,TRADENAME FROM WDI_TN
        );
Conversion plan / table design
  
  
    - One main table using existing primary key.  
    
     
- For each datatype where not more than one occurs per primary id
        create a column in the main table.  For N-field datatypes 
        create N columns.
    
     
- For each datatype where at least two occurances correspond with
        a single primary id value, create an auxillary table using
        the main table primary key as foreign key.  Both columns are
        be non-unique.
    
   
WDI TDT summary
  WDI tables
  
  ACD TDT summary
  ACD tables
Demos:
  
  
Open issues
  
  
    - Performance tuning: indices, table-clustering
    
     
- Normalization, constraints, and triggers for supportability and
consistency
    
     
- Organizing tables by schema objects.  Standardizing table,
        column, schema names.
    
     
- Datatypes: user-defined, objects (e.g., multi-field types may
        map to one complex RDB type instead of many simple)
    
     
- DBs without existing primary key.  Use SMILES?
    
     
- Thor indirect references.
    
     
- Defining standard, efficient queries.
    
     
- Transaction processing for read/write DBs.
    
     
- Security
    
     
- Parent structures (not a conversion issue really)
    
     
- DBs which have chemical structures but are not essentially
        compound databases (e.g., NCI bio + chem datasets).
    
     
- Application issues: parsing SQL output, handling CLOBs, binary
        data, atom-N-tuple data, etc.
    
   
Thanks!
  
  
    - Jack Delany - DayCart, Oracle know-how
    
- Mick Kappler - ACD conversion and web apps
  
EuroMUG'01 -- 13-14 Sept 2001  -- Cambridge, UK
 Daylight Chemical Information Systems Inc.
Daylight Chemical Information Systems Inc.
support@daylight.com