Daycart / Oracle Databases



Introduction

We have made the following vendor databases available in Oracle:
  1. General DB Structure

  2. ACD
  3.     •    MDL's Available Chemicals Directory
  4. WOMBAT
  5.     •    World of Molecular Bioactivity
  6. WDI
  7.     •    Derwent's World Drug Index
  8. SPRESI
  9.     •    SPeicherung und Recherche Strukturchemischer Information
  10. NCI
  11.     •    National Cancer Institute Public Screening Data
  12. WDI Oracle / Thor Compare
  13.     •    Thor and Oracle Comparison
  14. A SPRESI Journal Example
  15.     •    Example of Journal references from SPRESI
  16. Oracle / Thor Definitions
  17.     •    Mapping Name and Keyword Definitions in Thor and Oracle



General Database Structure


Each database has the following common tables:

	MAIN table    - The main table, linked to the other tables by
	                the primary key REG_NO.  The REG_NO is defined by the vendor:
	                for ACD it's the ACD Number, for WDI it's the Derwent 
	                External Registry Name, for WOMBAT it's the WOMBAT ID.

	DBINFO table  - A table which describes the databases.  This contains
	                the creation date, version number, copyright message.

	CLUSTER table - The table which contains the Jarvis-Patrick clustering
	                information.  Linked with the main table by smi2cansmi(smi, 0).

Most of the Databases refer to:
	CAS number    - The Chemistry Abstract Registry number.


Each database also has tables which are unique, based on the particular vendors content:



Database Structure - ACD Database Organization


		ACD043
		-----------------------------------------------------------------------------
		ACD_CATALOG            - Supplier Compound data  (1516569)
		ACD_NAME               - Names (1263977)
		ACD_SUPPLIER           - Supplier information (659)
		ACD_SUPPLIER_ADDRESS   - Supplier contact information (5046)


Database Structure - WOMBAT Database Organization


		WOMBAT041
		-----------------------------------------------------------------------------
		WOMBAT_ACTIVITY        - Activity table (143090) 
		WOMBAT_ACT_PROTOCOL    - Activity protocol table (8029) 
		WOMBAT_GNAM            - Generic name table (6990)
		WOMBAT_KEYWORD         - Keyword table (90605)
		WOMBAT_PROP            - Properties table (79165)
		WOMBAT_REFERENCE       - Reference table (3039)


Database Structure - WDI Database Organization

  


		WDI043
		-----------------------------------------------------------------------------
		WDI_COMB_PREP          - Combination/prep (34659)
		WDI_COMMENT            - Comments (28150)
		WDI_KEYWORD            - Keywords (721960)
		WDI_KEYWORD_DEF        - Definition of Keyword codes table by keyword_cd
		WDI_NAME               - Names (226102)
		WDI_NAME_DEF           - Definition of Name codes
		WDI_REFERENCE          - References (30829)
		WDI_TRADE              - Trade names (116583)


Database Structure - SPRESI Database Organization


		SPRESI00
		-----------------------------------------------------------------------------
		SPRESI_BOILINGPNT       - Boiling Point table (182697)
		SPRESI_DECOMPOSITION    - Decomposition table (15278)
		SPRESI_DENSITY          - Density table (26456)
		SPRESI_DISSOCIATION     - Dissociation table (2606)
		SPRESI_JOURNAL          - Journal data table (2960290)
		SPRESI_KEYWORD          - Keyword table (8391060)
		SPRESI_MELTINGPNT       - Melting Point table (961436)
		SPRESI_MUTAROTATION     - Mutarotation table (6)
		SPRESI_PATENT           - Patent table (1020361)
		SPRESI_REFRACTIVE_INDEX - Refractive Index table (70546)
		SPRESI_ROTATION         - Optical Rotation table (64049)
		SPRESI_SUBLIMATION      - Sublimation table (2577)


Database Structure - NCI Database Organization


		NCI041
		-----------------------------------------------------------------------------
		NCI_AIDS_CONC          - Aids screening results (43850)
		NCI_AIDS_EC50          - EC50 Results; The concentrations necessary to see a protective
		                         effect on infected cells (39365)
		NCI_AIDS_IC50          - IC50 Results; The concentrations necessary to inhibit the
		                         growth of uninfected cells (38424)
		NCI_GI50               - The concentration that causes 50% inhibition of cell growth
		                         emphasizing the correction for cell grown at time zero. (2473096)
		NCI_LC50               - The concentration that signifies the cytotoxic effect 2478506)
		NCI_TGI                - The concentration that signifies the cytostatic effect (2479327)







Comparison between WDI datatypes in THOR and ORACLE

This table is a mapping between the returned value and the Oracle and Thor datatypes:

WDI Values Oracle Thor
CAFFEINE REG_NO DATATYPE   $DXRN
Cn1cnc2n(C)c(=O)n(C)c(=O)c12 ISOMER (Value used for Lookup)
CAFFEINE PREFERRED_NAME DATATYPE   DPN
Cn1cnc2n(C)c(=O)n(C)c(=O)c12 SMI (From WDI_Cluster Table) SMI
194.192 AMW DATATYPE   AMW
1990.1-2001.4 UPDCODE DATATYPE   DYQ
C8H10N4O2 MOL_FORM DATATYPE   DMF (C8 H10 N4 O2)
1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,... D2D_COORDS DATATYPE   2D


An example of requesting data from Oracle:

Oracle Select:
	SQL> select reg_no,isomer,amw from wdi_main where preferred_name='CAFFEINE';

	REG_NO          ISOMER                           AMW
	--------------  -------------------------------  ------------------
	CAFFEINE        Cn1cnc2n(C)c(=O)n(C)c(=O)c12     194.192


An example of requesting data from Thor:

thorlookup:
	> thorlookup -INCLUDE_DATATYPES "DMF DPN AMW 2D DYQ" wdi043%@laluz:thor:thor%
	Cn1cnc2n(C)c(=O)n(C)c(=O)c12
	$SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12>
	$DXRN<CAFFEINE>
	DPN<CAFFEINE>
	2D<1.48,1.35,1.22,0.57,1.70,-0.11,1.22,-0.77,0.44,-0.51,-0.28,-0.93,-0.28,-1.75,-0.99,-0.51,-1.71,-0.93..;>
	DMF<C8 H10 N4 O2>
	DYQ<1990.1-2001.4>
	AMW<194.192>
	
WDI Thor DataTypes

More information about Thor:

$DXRN is the Derwent external registry name which is the primary key REG_NO in the Oracle WDI_TABLES.
Otherwise, the correspondance of columns to Thor datatypes is pretty clear.

One exception is the molecular weight; in some cases the vendor supplies a molecular weight
with their record; these are kept as a column in the MAIN table.

We populate the AMW column with the result from smi2amw(isomer).



SPRESI Journal Search Example from Oracle


	SQL> select j.article,j.authors,j.citation,j.year,j.document_id,j.language,n.name 
	  2  from spresi_main m, spresi_journal j, spresi_name n
	  3  where m.reg_no=n.reg_no and m.reg_no=j.reg_no
	  4  and n.name='BUTYLLITHIUM'; 



	ARTICLE
	-------------------------------------
	AUTHORS
	-------------------------------------
	CITATION
	-------------------------------------
	      YEAR DOCUMENT_I LANGUAGE
	---------- ---------- -------------------------------------------------------------
	PROTONATION OF ACYLLITHIUM REAGENTS BY ACETONITRILE: SYNTHESIS OF BETA-HYDROXY NITRILES
	LI NANSHENG, YU SU, KABALKA GEORGE
	J. ORG. CHEM., (1995), Vol. 60, Num. 18, pg. 5973-5974
	      1995 02X0101 96 ENG

	LITHOGRAPHIC EVALUATION OF POLY(METHYL METHACRYLATE (-G-POLY(DIMETHYL SILOXANE) COPOLYMERS AS POSITIVE RESISTS
	BOWDEN M., GOZDZ A., KLAUSNER C., MCGRATH J., SMITH S.
	(1986), Vol. 1986, pg. 298
	      1986 10H0253 87 ENG

	LI(,X)MO(,2)O(,4) A NEW PROMISING CATHODE MATERIAL FOR SECONDARY LITHIUM CELLS
	TARASCON, J.
	J. ELECTROCHEM. SOC., (1987), Vol. 134, Num. 6, pg. 1345-1351
	      1987 01L0243 88 ENG

	SPEICHERUNG VON SILAN- UND GERMANIMINEN ME(,2)E=NR (E=SI, GE) IN FORM VON SILA- UND GERMADIHYDROTETRAZOLEN
	WIBERG N., KARAMPATSES P., KIM CHUNGKYUN
	CHEM. BER., (1987), Vol. 120, Num. 7, pg. 1213
	      1987 23X0376 87 GER

	ETHYL ISOCROTONATE
	TASCHNER M., ROSEN T., HEATHCOCK C., BOLTON J., FLEMING I.
	(1986), pg. 108
	      1986 24X0144 87 ENG

	SYNTHESE, STRUKTUR UND EINIGE REAKTIONEN VON TRIS- UND TETRAKIS(DIPHENYLPHOSPHINO)ALLEN
	SCHMIDBAUR H., POLLOK T., REBER G., MULLER G.
	CHEM. BER., (1987), Vol. 120, Num. 8, pg. 1403-1412
	      1987 02X0444 88 GER

	SINTEZ I NEKOTORYE REAKTSII BROMIDA (TRIFENILFOSFONIO)METILDITIOKARBAMATA
	ISHIDA MASARU, MIMURA TADASHI, KATO SHINZI
	NIPPON KAGAKU KAISHI (J. CHEM. SOC. JPN., CHEM. IND. CHEM.), (1987), Num. 7, pg. 1317-1322
	      1987 02X0445 88 JAP

	THERMAL REARRANGEMENT OF TERT-BUTYL SUBSTITUTED 9,10- AND 11,12-ALLENIC RETINOIDS: 11-CIS-ISOMERS OF 19,19,19- AND 20,20
	,20-TRIMETHYLRETINOIDS
	SHEN GIINYUAN, DELERA ANGEL, NORMAN THEA, HACES ALBERTO, OKAMURA WILLIAM
	TETRAHEDRON LETT., (1987), Vol. 28, Num. 26, pg. 2917-2920
	      1987 04E0125 88 ENG

	LIGNIN AND RELATED COMPOUNDS. X. THE SYNTHESIS OF DIMERIC-TYPE COMPOUNDS DERIVED FROM LIGNIN
	RAHMAN M., ABRAMS G., PEPPER J.
	(1987), Vol. 7, Num. 2, pg. 187-196
	      1987 07P0019 88 ENG

	ASYMMETRIC ADDITIONS TO CHIRAL NAPHTHALENES. 4. AN ASYMMETRIC SYNTHESIS OF THE AB-RING OF AKLAVINONE
	MEYERS A., HIGASHIYAMA KIMIO
	J. ORG. CHEM., (1987), Vol. 52, Num. 20, pg. 4592-4597
	      1987 08E0184 88 ENG

	HIGHLY EFFICIENT TOTAL SYNTHESIS OF MANOALIDE AND SECO-MANOALIDE VIA PD(0) CATALYZED COUPLING OF ALLYLHALIDE WITH CO AND
	 2-SILYL-4-STANNYLFURAN
	KATSUMURA SHIGEO, FUJIWARA SHINYA, ISOE SACHIHIKO
	TETRAHEDRON LETT., (1988), Vol. 29, Num. 10, pg. 1173-1176
	      1988 17E0044 88 ENG

	PREPARATION OF BRIDGED BICYCLOALKANES VIA INTRAMOLECULAR <2+2> CYCLOADDITIONS OF KETENES A SHORT TOTAL SYNTHESIS OF ('+-
	')-CLOVENE
	FUNK RAYMOND, NOVAK PERRY, ABELMAN MATTHEW
	TETRAHEDRON LETT., (1988), Vol. 29, Num. 13, pg. 1493-1496
	      1988 17E0062 88 ENG

	SYNTHESIS OF 3'ALPHA'-HYDROXY-5'BETA',10'BETA'-EPOXYCHILIOLIDE, AN ISOLABDANE DERIVATIVE FROM CHILIOTRICHIUM ROSMARINIF
	OLIUM
	HARDE C., BOHLMANN F.
	TETRAHEDRON, (1988), Vol. 44, Num. 1, pg. 81-90
	      1988 17E0074 88 ENG

	ACETYLENIC CHOLESTERYL DERIVATIVES AS IRREVERSIBLE INHIBITORS OF ECDYSONE BIOSYNTHESIS
	BURGER ALAIN, COLOBERT FRANCOISE, HETRU CHARLES, LUU BANG
	TETRAHEDRON, (1988), Vol. 44, Num. 4, pg. 1141-1152
	      1988 17E0115 88 ENG

	SYNTHESIS OF FLUORINE ANALOGUES OF VITAMIN E
	KUMADAKI ITSUMARO, TAMURA MIHOKO, ANDO AKIRA, NANAI TAKABUMI, KOYAMA MAYUMI, MIKI TAKUICHI
	CHEM. PHARM. BULL., (1988), Vol. 36, Num. 2, pg. 515-520
	      1988 17E0162 88 ENG

	SYNTHESIS OF SPECIFICALLY DEUTERIATED 9- AND 13-DEMETHYLRETINALS
	BERG ELLEN, BENT ARIE, LUGTENBURG JOHAN
	RECL. TRAV. CHIM. PAYS-BAS, (1990), Vol. 109, pg. 160-167
	      1990 20E0124 90 ENG

	GENERATION AND OLIGOMERIZATION OF BICYCLO<2.2.2>OCTUNE AND PROPERTIES OF TRIS(BICYCLO<2.2.2>OCTENO)BENZENE OBTAINED FROM
	 THE LINEAR TRIMER
	KOMATSU KOICHI, AONUMA SHUJI, JINBU YASUHISA, TSUJI RYOTARO, HIROSAWA CHITARU, TAKEUCHI KENICHI
	J. ORG. CHEM., (1991), Vol. 56, pg. 195-203
	      1991 13X0096 91 ENG

	CONSTRUCTION OF CHIRAL QUARTERNARY CARBON CENTERS USING 3-SUBSTITUTED 5-TRIMETHYLSILYL-2-CYCLOHEXENONES: SYNTHESIS OF (+
	)-'ALPHA'-CUPARENONE
	ASAOKA MORIO, TAKENOUCHI KAZUYA, TAKEI HISASHI
	TETRAHEDRON LETT., (1988), Vol. 29, Num. 3, pg. 325-328
	      1988 17E0039 88 ENG



Name and Keyword Definitions Mapping

The other change worth understanding is how we've mapped names and keywords into the Oracle tables.
For searching efficiency we've used tables with multiple type codes for wdi_name and wdi_keyword.

The keyword types are described in the wdi_keyword_def table, and the name types are
described in the wdi_name_def table.

Oracle WDI Definitions:

	SQL> select * from wdi_keyword_def;

	KEY DEFINITION
	--- ----------------------------------------
	PT  ACTIVITY CLASS
	SS  SUBSTRUCTURE KEYWORDS
	IU  INDICATIONS AND USAGE
	PW  PRECAUTIONS AND WARNINGS
	CI  CONTRA-INDICATIONS
	IA  INTERACTIONS
	AE  ADVERSE EFFECTS
	MA  MECHANISM OF ACTION


	SQL> select * from wdi_name_def;

	NAME_CD  DEFINITION
	-------  ----------------------------------------
	RN       DERWENT REGISTRY NAME
	INN      INTERNATIONAL NON_PROPRIETARY NAME
	USAN     UNITED STATED ADOPTED NAME
	APP      APPROVED NAMES
	OTHER    SYNONYMS OF UNKNOWN STATUS
	TP       MANUFACTURERS CODE NUMBERS

Now, match the Keywords to the Example:

In Oracle

	SQL> select * from wdi_keyword where reg_no='CAFFEINE';

	REG_NO            KEYWORD_CD   TEXT
	----------------  -----------  ----------------------
	CAFFEINE          SS           IMIDE,CYCLIC
	CAFFEINE          SS           UREA,CYCLIC
	CAFFEINE          SS           PURINE
	CAFFEINE          IU           MISC: analgesia
	CAFFEINE          IU           MISC: drowsiness
	CAFFEINE          PW           hypertension
	...



In Thor

	> thorlookup -INCLUDE_DATATYPES "SSK IU PW" wdi043%@laluz:thor:thor%
	Cn1cnc2n(C)c(=O)n(C)c(=O)c12
	$SMI<Cn1cnc2n(C)c(=O)n(C)c(=O)c12>
	$DXRN<CAFFEINE>
	SSK<IMIDE,CYCLIC; UREA,CYCLIC; PURINE>
	IU<MISC: analgesia;MISC: drowsiness; ...>
	PW<hypertension; ... >
	...