We're converting our vendor databases into Oracle format and making them available. These are being updated on the regular update schedule with the Thor versions of these databases. Others will be made available as we do the conversions. What are available now are:
Each database has the following common tables:
MAIN table - The main table, linked to the other tables by the primary key REG_NO. The REG_NO is defined by the vendor: for ACD it's the ACD Number, for WDI it's the Derwent External Registry Name, for WOMBAT it's the WOMBAT ID. DBINFO table - A table which describes the databases. This contains the creation date, version number, copyright message. CLUSTER table - The table which contains the Jarvis-Patrick clustering information. Linked with the main table by smi2cansmi(smi, 0).
Each database also has tables which are unique, based on the particular vendors content:
WDI041 ----------------------------------------------------------------------------- WDI_CAS - CAS Registry number; linked to main table by reg_no (32483) WDI_CLUSTER - Jarpat cluster data (55700) WDI_COMB_PREP - Combination/prep; linked to main table by reg_no (34659) WDI_COMMENT - Comments; linked to main table by reg_no (28150) WDI_KEYWORD - Keywords; linked to main table by reg_no (721960) WDI_KEYWORD_DEF - Definition of Keyword codes; linked to wdi_keyword table by keyword_cd WDI_MAIN - Main table (75502) WDI_NAME - Names; linked to main table by reg_no (226102) WDI_NAME_DEF - Definition of Name codes; linked to wdi_name table by name_cd WDI_REFERENCE - References; linked to main table by reg_no (30829) WDI_TRADE - Trade names; linked to main table by reg_no (116583) ACD041 ----------------------------------------------------------------------------- ACD_CAS - CAS Registry number data (79222) ACD_CATALOG - Supplier catalog data (1516569) ACD_CLUSTER - Jarpat cluster data (168415) ACD_MAIN - Main table (202768) ACD_NAME - Names (1263977) ACD_SUPPLIER - Supplier information (659) ACD_SUPPLIER_ADDRESS - Supplier contact information (5046) WOMBAT041 ----------------------------------------------------------------------------- WOMBAT_ACTIVITY - Activity table; linked to main table by reg_no and activity_id (143090) WOMBAT_ACT_PROTOCOL - Activity protocol table; linked to wombat_activity by pro_id (8029) WOMBAT_CLUSTER - Jarpat cluster data (61498) WOMBAT_GNAM - Generic name table; linked to main table by reg_no (6990) WOMBAT_KEYWORD - Keyword table; linked to main table by reg_no (90605) WOMBAT_MAIN - Main table (79165) WOMBAT_PROP - Properties table linked to main table by reg_no (79165) WOMBAT_REFERENCE - Reference table; linked to main table by reg_no (3039) SPRESI95 ----------------------------------------------------------------------------- SPRESI_BOILINGPNT - Boiling Point table; linked to main table by reg_no (182697) SPRESI_CLUSTER - Jarvis-Patrick clustering results (2349778) SPRESI_DECOMPOSITION - Decomposition table; linked to main table by reg_no (15278) SPRESI_DENSITY - Density table; linked to main table by reg_no (26456) SPRESI_DISSOCIATION - Dissociation table; linked to main table by reg_no(2606) SPRESI_JOURNAL - Journal data table; linked to main table by reg_no (2960290) SPRESI_KEYWORD - Keyword table; linked to SPRESI_PATENT and SPRESI_JOURNAL tables by reg_no and document_id (8391060) SPRESI_MAIN - Main data table; one record per reg_no (3177717) SPRESI_MELTINGPNT - Melting Point table; linked to main table by reg_no (961436) SPRESI_MUTAROTATION - Mutarotation table; linked to main table by reg_no(6) SPRESI_PATENT - Patent table; linked to main table by reg_no (1020361) SPRESI_REFRACTIVE_INDEX - Refractive Index table; linked to main table by reg_no (70546) SPRESI_ROTATION - Optical Rotation table; linked to main table by reg_no (64049) SPRESI_SUBLIMATION - Sublimation table; linked to main table by reg_no (2577)
In the WDI Database WDI_MAIN is linked to WDI_CAS, WDI_NAME, WDI_KEYWORD,
WDI_TRADE, WDI_COMB_PREP, WDI_COMMENT, and WDI_REFERENCE by the unique
identifier REG_NO. WDI_KEYWORD_DEF is linked to WDI_KEYWORD by the
identifier keyword_cd and WDI_NAME_DEF is linked to WDI_NAME by the
identifier name_cd. WDI_MAIN is linked to WDI_CLUSTER by using the
smi2cansmi function on the column SMI.
In WDI the following THOR datatypes correspond to the
following columns.
ORACLE SQL> select * from wdi_main where preferred_name='CAFFEINE'; REG_NO ISOMER PREFERRED_NAME -------------- ------------------------------- ------------------ CAFFEINE Cn1cnc2n(C)c(=O)n(C)c(=O)c12 CAFFEINE D2D_COORDS -------------------------------------------------------------------------------------------------------- 1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,-0.51,-0.28,-0.93,-0.28,-1.75,-0.99,-0.51,-1.71,-0.93.... AMW MOLWEIGHT UPDCODE MOL_FORM --------- -------------- --------------- ------------------- 194.192 194.192 1990.1-2001.4 C8H10N4O2 Thor > thorlookup -INCLUDE_DATATYPES "DMF DPN AMW 2D DYQ" wdi041%@laluz:thor:thor% Cn1cnc2n(C)c(=O)n(C)c(=O)c12 $SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12> $DXRN<CAFFEINE> DPN<CAFFEINE> 2D<1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,-0.51,-0.28,-0.93,-0.28,-1.75,-0.99,-0.51,-1.71,-0.93..;> DMF<C8 H10 N4 O2> DYQ<1990.1-2001.4> AMW<194.192> | $DXRN is the Derwent external registry name which is the primary key REG_NO in the WDI_TABLES. Otherwise, the correspondance of columns to Thor datatypes is pretty clear. One exception is the molecular weight; in some cases the vendor supplies a molecular weight with their record; these are kept as a column in the MAIN table. We populate the AMW column with the result from smi2amw(isomer). The other change worth understanding is how we've mapped names and keywords into the Oracle tables. For searching efficiency we've used tables with multiple type codes for wdi_name and wdi_keyword. The keyword types are described in the wdi_keyword_def table, and the name types are described in the wdi_name_def table.
SQL> select * from wdi_keyword_def; KEY DEFINITION --- ---------------------------------------- PT ACTIVITY CLASS SS SUBSTRUCTURE KEYWORDS IU INDICATIONS AND USAGE PW PRECAUTIONS AND WARNINGS CI CONTRA-INDICATIONS IA INTERACTIONS AE ADVERSE EFFECTS MA MECHANISM OF ACTION SQL> select * from wdi_name_def; NAME_CD DEFINITION ------- ---------------------------------------- RN DERWENT REGISTRY NAME INN INTERNATIONAL NON_PROPRIETARY NAME USAN UNITED STATED ADOPTED NAME APP APPROVED NAMES OTHER SYNONYMS OF UNKNOWN STATUS TP MANUFACTURERS CODE NUMBERS ORACLE
SQL> select * from wdi_keyword where reg_no='CAFFEINE'; REG_NO KEYWORD_CD TEXT ---------------- ----------- ---------------------- CAFFEINE SS IMIDE,CYCLIC CAFFEINE SS UREA,CYCLIC CAFFEINE SS PURINE CAFFEINE IU MISC: analgesia CAFFEINE IU MISC: drowsiness CAFFEINE PW hypertension ... Thor > thorlookup -INCLUDE_DATATYPES "SSK IU PW" wdi041%@laluz:thor:thor% Cn1cnc2n(C)c(=O)n(C)c(=O)c12 $SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12> $DXRN<CAFFEINE> SSK<IMIDE,CYCLIC; UREA,CYCLIC; PURINE> IU<MISC: analgesia;MISC: drowsiness; ...> PW<hypertension; ... > ... |
|