The Daycart TCM interface is a simple CGI interface which uses a combination of normal CGI and Oracle Call Interface code to provide a web interface to the Traditional Chinese Medicines data. The interface can be accessed HERE. The web interface shows, in the source for each HTML page, the SQL query or queries which were used to retrieve the data. You will query the TCM data from the Web interface and then will perform the same queries from SQLPlus and get the results.
Starting SQLPLUS:
telnet day0, login mug/coffee
sqlplus mug/coffee
The main table of interest is: tcm01_mol. It has chemical structures.
The other four tables (tcm01_plant, tcm01_plant_mol,
tcm01_ref, tcm01_mol_ref) contain data about plants and references which
are associated with the molecules.
You can get information about the columns in the tables with the 'describe'
command. For example:
describe tcm01_mol;
You can either draw a structure using JavaGrins or cut-and-paste the SMILES. After viewing the Web results, try the SQLPlus query on day0.
The SQL command is:
select pcn, smi, tcmm from tcm01_mol where exact(smi, smi2cansmi('NCCc1cc(O)c(O)cc1', 0)) = 1
Note that the SQLPlus query returns the TCMM field, but this isn't displayed in the web page. Where is it used?
Follow one of the links and view the query which is used. Note that
the tcmm field is used as a query value to retrieve additional molecular data:
select smi, ism, pcn, mf, mw, src, rem, cas, tcmm from tcm01_mol
where tcmm = 'M1831'
The SQL command is:
Note that Oracle, as part of the SQL query, is used to order the results. Try
the query without the "order by ..." clause.
select pcn, smi, tcmm, tanimoto(smi, 'NCCc1ccc(S)cc1') from tcm01_mol
where tanimoto(smi, 'NCCc1ccc(S)cc1') > 0.6
order by tanimoto(smi, 'NCCc1ccc(S)cc1') desc
Modify the SQL query to display the molecular weight of each hit in the
results.
The SQL command is:
select pcn, smi, tcmm, tanimoto(smi, 'NCCc1ccc(S)cc1'), smi2amw(smi)
from tcm01_mol where tanimoto(smi, 'NCCc1ccc(S)cc1') > 0.6
order by tanimoto(smi, 'NCCc1ccc(S)cc1') desc
Modify the SQL query to only display the hits with a molecular weight of
greater than 150.
The SQL command is:
select pcn, smi, tcmm, tanimoto(smi, 'NCCc1ccc(S)cc1'), smi2amw(smi)
from tcm01_mol where tanimoto(smi, 'NCCc1ccc(S)cc1') > 0.6
and smi2amw(smi) > 150
order by tanimoto(smi, 'NCCc1ccc(S)cc1') desc
We'll use a variation the 'CREATE TABLE' command, which allows you to get the data from a select statement. The 'rownum' pseudocolumn allows us to only get the first 100 rows.
create table tcm_jjd as select * from tcm01_mol where rownum <= 100;
We use the 'ALTER TABLE' command to add the column, then use the
Daycart function smi2amw(smi) to add the data.
To populate the columns:
alter table tcm_jjd add new_mw number;
desc tcm_jjd;
update tcm_jjd
set new_mw=smi2amw(smi);
We use a select query. The SQL function "ABS()" returns the absolute
value of a number. It might come in handy.
Note that we can also do the validation by calculating the molecular weight
on the fly:
select smi, mw, new_mw from tcm_jjd
where abs(mw - new_mw) > 1.0;
select smi, mw, smi2amw(smi) from tcm_jjd
where abs(mw - smi2amw(smi)) > 1.0;
We can use a regular join query. Since the SMILES in the two tables have
been canonicalized and are stored as simple VARCHAR2 data, the following
simple query will work:
select count(1) from tcm01_mol, nci
where tcm01_mol.smi = nci.smi;
There are several ways to answer this query. A nested query will work:
The above query finds all SMILES which match the contains() query, and are
also found in the list of SMILES in NCI.
An intersection query will also work.
Note that the two queries above give different answers. In fact, the
INTERSECT set operation removes duplicates, so the second query has less
answers. The query below gives the same answers as the INTERSECT-based
query.
select smi from tcm01_mol
where contains(smi, 'NCCc1ccccc1') = 1
and smi in (select smi from nci)
select smi from tcm01_mol
where contains(smi, 'NCCc1ccccc1') = 1
intersect
select smi from nci
select distinct(smi) from tcm01_mol
where contains(smi, 'NCCc1ccccc1') = 1
and smi in (select smi from nci)
Building on top of the previous queries we can find all rows with common
SMILES and then further prune based on CAS registry number. First, simply
print out both CAS registry number columns with the common SMILES.
Next, lets skip the NULL cas numbers...
Finally, add the clause to compare the two cas registry number columns:
select nci.smi, tcm01_mol.cas, nci.cas_rn from tcm01_mol, nci
where tcm01_mol.smi = nci.smi;
select nci.smi, tcm01_mol.cas, nci.cas_rn from tcm01_mol, nci
where tcm01_mol.smi = nci.smi
and nci.cas_rn is not null
and tcm01_mol.cas is not null
select nci.smi, tcm01_mol.cas, nci.cas_rn from tcm01_mol, nci
where tcm01_mol.smi = nci.smi
and nci.cas_rn is not null
and tcm01_mol.cas is not null
and nci.cas_rn != tcm01_mol.cas