PDB: Cruft to Content

(Perception of Molecular Connectivity from 3D Coordinates)

Roger Sayle
Bioinformatics Group,
Metaphorics LLC,
Santa Fe, New Mexico.

1. Abstract

An automated method for extracting small molecule ligands, including assigning hybridization states and bond orders, from the PDB is presented. This processing proceeds in two distinct phases. The first phase results in the bonded framework of each ligand, giving just the element and 3D co-ordinates of each atom. The second phase involves the assignment of bond orders and hydrogen counts, based upon recognition of functional groups and conjugated ring systems, and on an analysis of bond angles and bond lengths.

The main steps of the algorithm are:

2. Determination of Real Atoms

In most molecular file formats, the number of real atoms in a molecule can be trivially determined from both the count line in the header, and the corresponding number of atom lines in the following connection table. The PDB file format on the other hand contains numerous pseudo atoms, dummy atoms, and alternate locations. The following rules were used in parsing PDB files:

2a. Each PDB file is considered internally delimited by both "END" and "ENDM" records. This allows processing of multiple models in NMR files, and of concatenated PDB files. A command line option controls whether just the first or all of the models/structures within a file are processed.

2b. All atom records containing a character other than " ", "A" or "1" in the alternate location column (column 17) were ignored. The convention on alternate locations is that the column be blank if the atom is unambiguously located, or contain the sequence "A", "B", "C" etc. for each potential conformation for ambiguous atoms. Some PDB files use digits instead of letters, for example pdb code 1icn.

2c. All atom records containing " Q" as the first two characters of the atom name were ignored. The element " Q" is commonly used in NMR processing to represent pseudo atoms used in refinement.

2d. The residue name "DUM" is officially used to denote dummy atoms, such as in pdbcodes 1b3o and 1som, to represent unexplained electron density. All atom records with residue name "DUM" are ignored.

2e. All atoms with X, Y and Z co-ordinates equal to 9999.000 are also ignored. These dummy atoms are created by XPLOR when the actual atom is not used in the structure refinement.

Occasionally even the above rules are insufficient. For example, pdbcode 1a34 contains the same atom, O3* of U1D, twice: first as atom serial number 2960 and again as 2973, both with identical co-ordinates.
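Rules 2b to 2e can be sketched as a single record filter. This is only an illustration, not the author's code: the function name `is_real_atom` is hypothetical, the column positions follow the standard fixed-width ATOM/HETATM layout, and rejecting records with unparsable co-ordinates is an added assumption.

```python
def is_real_atom(line):
    """Return True if an ATOM/HETATM record survives rules 2b to 2e."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return False
    line = line.ljust(54)
    atom_name = line[12:16]   # columns 13-16
    alt_loc = line[16]        # column 17
    res_name = line[17:20]    # columns 18-20
    if alt_loc not in " A1":            # rule 2b: keep blank, "A" or "1"
        return False
    if atom_name.startswith(" Q"):      # rule 2c: NMR pseudo atoms
        return False
    if res_name == "DUM":               # rule 2d: dummy atoms
        return False
    try:
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
    except ValueError:
        return False                    # assumption: skip malformed records
    if x == y == z == 9999.0:           # rule 2e: XPLOR dummy co-ordinates
        return False
    return True
```

Duplicate atoms such as the one in 1a34 are not caught by a per-record filter like this and would need a separate pass.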

3. Determination of Atomic Number

One of the hardest steps in processing PDB files is determining the appropriate element for each atom. This was acknowledged by the PDB with revision 2.0 of their file format specification, which introduced a two character atomic symbol field to each ATOM and HETATM record. Unfortunately, this field is often more misleading than helpful. Many files contain invalid atomic symbols (G in 1e8b) or incorrect atomic symbols (Nd in 1d00, U in 1bt6, 1fgg, 1hh6, 3bcc). Combined with the slow uptake of this field and the requirement to process pre-version 2.0 files, it is better to just process the atom name, which together with the residue name typically provides enough contextual information to determine the atomic number.

3a. The atom name " UNK" is interpreted as an atom of unknown atomic number, and is assigned element number 0, represented as "*" in SMILES.

3b. If the first character of the atom name is blank, and the third character is lower case, the PDB file was generated by CONCORD and the atomic symbol is taken from the 2nd and 3rd characters. As PDB files are normally only uppercase, CONCORD's incorrect column alignment is easy to recognize.

3c. If the first character is blank, and residue name denotes a hetero group that must have a prefix, and the third character is "H", "C", "N", "O", "P" or "S", the appropriate atomic number is used. If the third character is not one of these, the 2nd character is used as the atomic symbol. The current hetero groups that require a prefix are "GPC", "NAD" and "NDP" (which fixes 1rds).

3d. If the first character is blank, the second character is assumed to be the atomic symbol. If this character isn't recognized as an atomic symbol, and the third character contains "H", "C", "N", "O", "P" or "S", then the appropriate atomic number is used.

3e. If the first character is a digit, the second character is assumed to be the atomic symbol.

3f. If the first character is "H", and the residue is a recognised amino acid, nucleic acid or special hetero group, then the atom is assumed to be hydrogen. For remaining groups, if the first two characters are a recognized atomic symbol, this is the element, or hydrogen if the first two characters are not a recognized atomic symbol. This fixes the Holmiums "Ho" and Mercuries "Hg" in numerous files (including 1f85).

The recognized amino acid residues are "ACE", "ALA", "ARG", "ASN", "ASP", "ASX", "CYS", "FOR", "GLN", "GLU", "GLX", "GLY", "HIS", "HYP", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "PCA", "SER", "THR", "TRP", "TYR", "UNK" and "VAL".

The recognized nucleic acid residues are " A", " C", " G", " T", " U", " +U", " YG", "1MA", "1MG", "2MG", "5MC", "5MU", "7MG", "H2U", "M2G", "OMC", "OMG" and "PSU".

The recognized special hetero groups are "101", "12A", "1AR", "1GL", "2AS", "2GL", "3AA", "3AT", "3DR", "3PO", "6HA", "6HC", "6HG", "6HT", "A26", "AA6", "ABD", "AC1", "ACO", "AIR", "AMU", "AMX", "AP5", "AMG", "APU", "B9A", "BCA", "BNA", "CAA", "CBS", "CGS", "CMC", "CND", "CO8", "COA", "COF", "COS", "DCA", "DGD", "FAB", "FAD", "FAG", "FAM", "FDA", "GPC", "IB2", "NAD", "NAH", "NAI", "NAL", "NAP", "NBD", "NDP", "PAD", "SAD", "SAE", "T5A", "tRE", "UP5" and "ZID". Although it shouldn't be required, including "BU1" fixes 1bk9.

3g. If the first character is one of the symbols """, "'" or "*" then the second character is treated as the atomic symbol.

3h. If the residue name indicates a hetero group that uses a suffix, the first character of the atom name is treated as the atomic symbol. The current hetero groups that require a suffix are "AGF", "COT" and "FVF" (which fixes 1cjw).

3i. If the residue name is one of the special hetero groups listed in 3f, the second character of the atom name is treated as the atomic symbol.

3j. The default, all remaining cases, treats the first two characters of the atom name as the atomic symbol. If the first two characters aren't a valid element, then the second character is treated as the atomic symbol.

3k. An exception to the above rule 3j is that "Nd" (atomic number 60) doesn't occur in any of the amino acids listed in section 3f, and occurrences are corrected to nitrogen. This fixes pdbcode 1d00.

3l. Finally, exceptions to all of the above rules are handled explicitly. Currently, the only exception is the atom name "NSE1" which occurs in residues "SAD" and "SAE". This atom name actually represents a selenium atom, encoded in the 2nd and 3rd columns! (which handles 1adg and 1b3o).
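The general shape of this rule cascade can be sketched as follows. It is a partial illustration under stated assumptions, not the author's implementation: it covers only rules 3a, 3b, 3d, 3e, 3f, 3g and 3j, omits the residue-specific rules 3c, 3h, 3i, 3k and 3l, uses an abbreviated stand-in (`HYDROGEN_RESIDUES`) for the residue lists of 3f, and returns "*" whenever the rules as written leave the element undetermined. All names are hypothetical.

```python
SYMBOLS = ("H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn "
           "Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag "
           "Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm "
           "Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa "
           "U").split()
ATOMIC_NUMBER = {sym: i + 1 for i, sym in enumerate(SYMBOLS)}

# Assumption: a short stand-in for the full residue lists given in rule 3f.
HYDROGEN_RESIDUES = {"ALA", "GLY", "SER", "THR", "NAD"}


def _symbol(text):
    """Normalise 'CL' or 'C ' to 'Cl' or 'C'; None if not an element."""
    sym = text.strip().capitalize()
    return sym if sym in ATOMIC_NUMBER else None


def guess_element(atom_name, res_name):
    """Guess an element symbol from a 4-character PDB atom name."""
    name = atom_name.ljust(4)[:4]
    c1, c2, c3 = name[0], name[1], name[2]
    if name == " UNK":                        # rule 3a: unknown atom
        return "*"
    if c1 == " " and c3.islower():            # rule 3b: CONCORD alignment
        return _symbol(c2 + c3) or "*"
    if c1 == " ":                             # rule 3d
        return _symbol(c2) or (c3 if c3 in "HCNOPS" else "*")
    if c1.isdigit() or c1 in "\"'*":          # rules 3e and 3g
        return _symbol(c2) or "*"
    if c1 == "H":                             # rule 3f
        if res_name in HYDROGEN_RESIDUES:
            return "H"
        return _symbol(c1 + c2) or "H"
    return _symbol(c1 + c2) or _symbol(c2) or "*"   # rule 3j: default
```

For example, `guess_element(" CA ", "ALA")` gives carbon while `guess_element("CA  ", " CA")` gives calcium, which is the distinction the column alignment of the atom name is meant to carry.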

4. Determination of Atomic Connectivity

Given a set of real atoms together with their atomic numbers and 3D co-ordinates, the next step is to determine their covalent bonding. Although the PDB format prescribes the use of explicit "CONECT" and "LINK" records, these are not universally used, and when they are, they typically only contain connectivity information for one of two residues. The necessity to robustly handle PDB files without such records means that connectivity is currently entirely determined using covalent bonding radii.

4a. Metal atoms are prevented from forming covalent bonds. It appears that the software used for protein x-ray crystallography (such as XPLOR and CNS) is poorly parameterized for inorganic compounds when compared with small molecule crystallography software (such as SHEL-X). The false positive rate when using proximity based connectivity perception is much higher in the PDB than in the Cambridge Structural Database (CSD).

Unfortunately, this constraint means that the algorithm fails for some of the organometallics in the PDB, including the ferrocene in 1a3l and the uridine vanadate in 6rsa.

There is a remarkable diversity of metallic elements in the PDB:

Element      Symbol  Number  Example PDB Code
Lithium      Li        3     1e5k
Beryllium    Be        4     4ukd
Magnesium    Mg       12     1dpl
Aluminium    Al       13     1xlm
Argon        Ar       18     1c6i
Vanadium     V        23     1dkt
Chromium     Cr       24     1cf6
Manganese    Mn       25     1nls
Cobalt       Co       27     1b6a
Copper       Cu       29     1mfm
Zinc         Zn       30     1dzv
Gallium      Ga       31     1cfw
Arsenic      As       33     3cao
Krypton      Kr       36     1c6g
Rubidium     Rb       37     460d
Strontium    Sr       38     434d
Yttrium      Y        39     1dde
Molybdenum   Mo       42     1g8k
Silver       Ag       47     1aoo
Cadmium      Cd       48     1mfm
Indium       In       49     1ind
Antimony     Sb       51     1f48
Tellurium    Te       52     1el7
Xenon        Xe       54     1c6e
Caesium      Cs       55     1av2
Barium       Ba       56     284d
Lanthanum    La       57     1djg
Cerium       Ce       58     1ak8
Samarium     Sm       62     1a3c
Europium     Eu       63     1qsl
Gadolinium   Gd       64     2hhm
Terbium      Tb       65     1ncz
Holmium      Ho       67     1psr
Ytterbium    Yb       70     2bop
Lutetium     Lu       71     1e8x
Tungsten     W        74     6fit
Rhenium      Re       75     1b0q
Osmium       Os       76     1qa6
Iridium      Ir       77     1c1k
Platinum     Pt       78     1qbi
Gold         Au       79     1a8d
Mercury      Hg       80     1cc8
Thallium     Tl       81     1fpj
Lead         Pb       82     1xxa
Uranium      U        92     1b5j

4b. Two atoms are considered bonded if they are closer than the sum of their covalent radii plus a small tolerance. The values of covalent radii are those published by the Cambridge Crystallographic Data Center and are given in the table below. Although the CCDC recommends a tolerance of 0.4 Angstroms, the current algorithm uses the larger value of 0.45 Angstroms used by Baber and Hodgkin, and by Hendlich et al. This value is lower than the 0.56 Angstrom tolerance used by the molecular graphics program RasMol. A lower bound of 0.4 Angstroms for bond lengths is also imposed.

Element      Symbol  Number  Radius (A)
Hydrogen     H         1     0.23
Boron        B         5     0.83
Carbon       C         6     0.68
Nitrogen     N         7     0.68
Oxygen       O         8     0.68
Fluorine     F         9     0.64
Silicon      Si       14     1.20
Phosphorus   P        15     1.05
Sulfur       S        16     1.02
Chlorine     Cl       17     0.99
Arsenic      As       33     1.21
Selenium     Se       34     1.22
Bromine      Br       35     1.21
Tellurium    Te       52     1.47
Iodine       I        53     1.40
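The distance test of rule 4b can be sketched directly from the radii above. The function name `is_bonded` and its argument layout are hypothetical; the radii, the 0.45 Angstrom tolerance and the 0.4 Angstrom lower bound are taken from the text. `math.dist` requires Python 3.8 or later.

```python
import math

# Covalent radii in Angstroms, as tabulated for rule 4b.
COVALENT_RADIUS = {
    "H": 0.23, "B": 0.83, "C": 0.68, "N": 0.68, "O": 0.68, "F": 0.64,
    "Si": 1.20, "P": 1.05, "S": 1.02, "Cl": 0.99, "As": 1.21,
    "Se": 1.22, "Br": 1.21, "Te": 1.47, "I": 1.40,
}


def is_bonded(elem_a, xyz_a, elem_b, xyz_b, tolerance=0.45, min_length=0.4):
    """Rule 4b: bonded if closer than the radius sum plus a tolerance."""
    distance = math.dist(xyz_a, xyz_b)
    limit = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b] + tolerance
    return min_length <= distance < limit
```

With these values two carbons are bonded below 0.68 + 0.68 + 0.45 = 1.81 Angstroms, so a 1.54 Angstrom contact is accepted and a 1.90 Angstrom contact is not.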

4c. Bonds can only be formed between atoms with the same chain identifier and that are not separated by a TER (chain terminator) record in the PDB file. Unfortunately the TER record in PDB file 1atl is incorrectly placed, cleaving a methyl group.

4d. All solvent atoms are prevented from forming covalent bonds other than to the same residue. Close contacts with waters are a frequent source of false positive bonds. Recognizing the residue names for water ("HOH", "H20", "WAT", "TIP", "SOL", "DOD" and "D20") and for other solvents ("EOH", "MOH", "PER", "PO4", "SO4" and "SUL"), and explicitly excluding these atoms from inter-residue bonding improves the resulting topologies.

4e. Any hydrogen that is bonded to two or more atoms by the above procedure is then fixed: all of its bonds, except the one to the closest non-hydrogen atom in the same residue, are deleted.
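The hydrogen clean-up of rule 4e amounts to a nearest-neighbour selection. A minimal sketch, assuming atoms are indexed by integers and that `elements`, `residues` and `coords` are parallel sequences (all hypothetical names):

```python
import math


def bond_to_keep(h, neighbors, elements, residues, coords):
    """Rule 4e: pick the single neighbour an over-bonded hydrogen keeps.

    Returns the index of the closest non-hydrogen neighbour in the same
    residue as hydrogen `h`, or None if there is no such neighbour.
    """
    candidates = [n for n in neighbors
                  if elements[n] != "H" and residues[n] == residues[h]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: math.dist(coords[h], coords[n]))
```

What to do when no neighbour qualifies is not specified in the text; returning None here simply leaves that decision to the caller.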

4f. Occasionally, atoms with identical co-ordinates are present in a PDB file. If two atoms in different PDB groups have the same co-ordinates and are bonded to a common third atom, all bonds between these two groups are broken. If two atoms with the same co-ordinates remain bonded to a common third atom, the second atom is deleted.

4g. If any atom is bonded to more than a maximum number of neighbors for that element, then bonds between the PDB group containing that atom and groups connected via that atom are broken. This handles multiple occupancy problems such as 1abe. The maximum neighbor count for each element is four unless tabulated below.

Element      Symbol  Number  Max Neighbors
Hydrogen     H         1     1
Boron        B         5     3
Oxygen       O         8     2
Fluorine     F         9     1
Bromine      Br       35     1
Iodine       I        53     3
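The neighbor limit of rule 4g reduces to a table lookup with a default of four. The sketch below only detects the offending atoms; breaking the inter-group bonds is left out, and the names `max_neighbors` and `over_connected` are hypothetical.

```python
# Exceptions to the default limit of four neighbours (rule 4g).
MAX_NEIGHBORS = {"H": 1, "B": 3, "O": 2, "F": 1, "Br": 1, "I": 3}


def max_neighbors(symbol):
    """Maximum number of bonded neighbours allowed for an element."""
    return MAX_NEIGHBORS.get(symbol, 4)


def over_connected(elements, adjacency):
    """Return the atoms whose neighbour count exceeds the element limit.

    `elements` maps atom index to symbol and `adjacency` maps atom index
    to the set of bonded atom indices.
    """
    return sorted(atom for atom, nbrs in adjacency.items()
                  if len(nbrs) > max_neighbors(elements[atom]))
```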

5. Extraction of Small Molecule Ligands

The next task is to determine which parts of a structure are small molecule ligands. This is complicated by several factors, including covalently bound ligands and peptide or peptide-like ligands. Conventionally, small molecule ligands are denoted by HETATM records rather than ATOM records, but this rule is not applicable to ligands containing amino acids (such as 1ela) and is not always honored (pdbcodes 1dxd, 1qlb, 5ana and 1sdg).

5a. All connected components containing more than 100 heavy atoms are considered to be protein (or nucleic acid). If the entire component originated from ATOM records, it is ignored. Otherwise, all ATOM atoms are deleted, except for those covalently bonded directly to HETATM atoms. These retained atoms are each converted to an asterisk to indicate an attachment point, and bonds between pairs of asterisks are deleted. This should have the effect of eliminating all large proteins and nucleic acids, after cleaving all covalently bound ligands and post-translational modifications.

5b. The next step is a fragment size filter. All remaining connected components (after step 5a) are ignored if they contain more than 100 or fewer than six heavy atoms (the small fragments are typically metals, waters, sulphates and phosphates).

5c. The final optional step is to remove all protein and nucleic acid fragments. These are all connected components (after 5b) that contain no HETATM records. If the PDB file contained a connected component of more than 100 atoms (a protein or nucleic acid), an exception is made for components whose chain identifiers occurred in no other components. This correctly identifies peptide ligands (such as chain "I" in 4hvp, or chain "C" in 1hsl). Unfortunately, the lysine in 1lst is still not correctly perceived as a ligand.
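The size-based part of this filtering can be sketched with a breadth-first search over the bond list. This is a simplified illustration: the cleaving of covalently bound ligands in 5a and the chain identifier exception in 5c are omitted, and the function names and labels are hypothetical.

```python
from collections import deque


def connected_components(n_atoms, bonds):
    """Group atom indices 0..n_atoms-1 into connected components."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = [False] * n_atoms
    components = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nbr in adjacency[atom]:
                if not seen[nbr]:
                    seen[nbr] = True
                    queue.append(nbr)
        components.append(component)
    return components


def classify_component(component, elements, is_hetatm, large=100, small=6):
    """Label a component using the thresholds of steps 5a to 5c."""
    heavy = sum(1 for atom in component if elements[atom] != "H")
    if heavy > large:
        return "macromolecule"        # step 5a (without ligand cleaving)
    if heavy < small:
        return "too small"            # step 5b
    if not any(is_hetatm[atom] for atom in component):
        return "polymer fragment"     # step 5c (without chain exception)
    return "ligand"
```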

6. Hybridization State Determination

The next step is to assign a geometrical or putative hybridization state to each non-terminal atom.

6a. Initial assignment of hybridization state is on the basis of average bond angle. Two-connected atoms with a bond angle of greater than 155 degrees are typed as sp-hybridized. Remaining atoms with an average bond angle greater than 115 degrees are typed as sp2-hybridized and those with an average bond angle less than or equal to 115 degrees as sp3-hybridized.
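Step 6a can be sketched as follows, using the 155 and 115 degree thresholds given above. The function names are hypothetical, and returning None for terminal atoms (which have no bond angle) is an assumption made for the example.

```python
import math


def _angle(center, a, b):
    """Angle a-center-b in degrees."""
    v1 = [a[i] - center[i] for i in range(3)]
    v2 = [b[i] - center[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.sqrt(sum(p * p for p in v1)) * math.sqrt(sum(q * q for q in v2))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return math.degrees(math.acos(cosine))


def initial_hybridization(center, neighbors):
    """Step 6a: type a non-terminal atom from its average bond angle."""
    if len(neighbors) < 2:
        return None                      # terminal atom: no angle to use
    angles = [_angle(center, neighbors[i], neighbors[j])
              for i in range(len(neighbors))
              for j in range(i + 1, len(neighbors))]
    mean = sum(angles) / len(angles)
    if len(neighbors) == 2 and mean > 155.0:
        return "sp"
    if mean > 115.0:
        return "sp2"
    return "sp3"
```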

6b. Average bond angles are unable to discriminate the correct hybridization state of two-connected atoms in five membered rings. For example, in an aromatic ring such as pyrrole or furan, the sp2 carbons have bond angles near 108 degrees! A second pass sets the hybridization of all two-connected atoms in a five membered ring as sp2-hybridized when the average in-ring torsion is less than 7.5 degrees. A similar planarity test is used for six membered rings, where a 12 degree threshold is used.
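The planarity test of step 6b can be sketched as the mean absolute torsion around the ring. This is an illustration under assumptions: the text does not say whether signed or absolute torsions are averaged (absolute values are used here), and the helper names are hypothetical. Ring atoms are expected in bonded order.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    m1 = _cross(n1, b1)
    b1_len = math.sqrt(_dot(b1, b1))
    return math.degrees(math.atan2(_dot(m1, n2) / b1_len, _dot(n1, n2)))


def ring_is_planar(ring_xyz):
    """Step 6b: mean |torsion| below 7.5 (5-ring) or 12 (6-ring) degrees."""
    n = len(ring_xyz)
    limit = {5: 7.5, 6: 12.0}[n]      # only five and six membered rings
    total = sum(abs(torsion(*(ring_xyz[(i + k) % n] for k in range(4))))
                for i in range(n))
    return total / n < limit
```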

6c. A final 'antialiasing' pass is used to detect and correct misassigned hybridization states. If an sp-hybridized atom doesn't have an sp-hybridized or terminal neighbor with unfilled valence, it is reassigned as sp2-hybridized. Similarly, if an sp2-hybridized atom doesn't have an sp2-hybridized or terminal neighbor with unfilled valence, it is reassigned as sp3-hybridized.

7. Functional Group Recognition

Once hybridization states have been assigned, the program performs pattern matching to identify commonly occurring chemical motifs or functional groups that have fixed bond orders. These patterns explicitly cover all of the cases where a central atom may have more than one incident multiple bond, including azides, nitros and sulphones.

Because the recognition is applied after hybridization state assignment, the patterns can make use of geometry at each pattern position. Typically, however, hybridization information is only used for ring atoms. This correctly handles not only distorted carboxylic acid groups (such as in 1tdr) but also difficult systems like 2ada.

When multiple patterns are applicable, heuristics based upon atomic electronegativity are used to place the double bond. A double bond to a terminal oxygen is chosen over a double bond to a terminal sulphur. For guanidine groups, if the central carbon is in a ring, a ring nitrogen is chosen over a non-ring nitrogen; otherwise a terminal nitrogen is chosen over a non-terminal nitrogen.

8. Aromatic Ring Perception

All five and six membered rings of sp2-hybridized atoms are checked for aromaticity and assigned bond orders before all other bonds. The first pass marks potentially aromatic rings. All atoms in the five and six membered rings containing only sp2-hybridized atoms are typed. The patterns for each atom type, indicating allowable aromatic atoms, are tabulated below. If any atom in the ring is not amongst the patterns listed, the ring is rejected from further analysis.

Under each of the atom types above are listed the number of electrons each contributes to the ring for the Hückel 4n+2 aromaticity calculation. Carbons typically contribute one, except when doubly bonded to a terminal oxygen atom. Oxygen, sulphur and selenium contribute two. Pyridine and N-oxide nitrogens contribute one, and pyrrole nitrogens contribute two.

At this point, there are two ambiguous cases. The first is that a carbon of unfilled valence can double bond to the oxygen or the ring, resulting in zero or one electrons respectively. The second is that a two-connected nitrogen of unfilled valence could be double bonded to the ring to become pyridine-like, or gain an implicit hydrogen to become pyrrole-like, contributing one and two electrons respectively.

The heuristic used by the algorithm is that a sp2-hybridized ring should be made aromatic if possible. This is done in the following steps.

8a. First, a test is applied to the ambiguous cases (*-[C](O)-* and *-[N]-*) to see whether they can be resolved by their neighbours. If both neighbours have full valences or incident multiple bonds, both ring bonds must be single and their unique types are reassigned (*-C(=O)-* and *-[NH]-* respectively).

8b. If the count of electrons in the ring modulo four is one, and the ring contains an ambiguous nitrogen, it is retyped pyrrole-like (with an implicit hydrogen).

8c. If the count of electrons in the ring modulo four is three, and the ring contains an ambiguous carbon, it is retyped with an exo double bond to the oxygen.

8d. If the count of electrons in the ring modulo four is three, and the ring contains an uncharged pyrrole-like nitrogen, the nitrogen is charged.

8e. If, after all the above reassignments, the count of electrons in the ring modulo four is two (the Hückel condition), all the atoms and bonds are marked as potentially aromatic.
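The electron counting in steps 8b, 8c and 8e can be sketched as below. This is a reduced illustration, not the author's routine: steps 8a and 8d are omitted, the atom type labels are invented for the example, and it is assumed (inferred from 8b and 8c, not stated in the text) that an ambiguous carbon or nitrogen is initially counted as contributing one electron.

```python
# Electrons contributed to the ring by each (hypothetical) atom type label.
# "C?O" and "N?" are the two ambiguous cases, assumed to start at one.
ELECTRONS = {
    "C": 1, "C=O": 0, "C?O": 1,
    "N_pyridine": 1, "N_oxide": 1, "N_pyrrole": 2, "N?": 1,
    "O": 2, "S": 2, "Se": 2,
}


def resolve_ring(types):
    """Apply steps 8b, 8c and 8e; return (potentially_aromatic, types)."""
    types = list(types)
    count = sum(ELECTRONS[t] for t in types)
    if count % 4 == 1 and "N?" in types:        # step 8b
        types[types.index("N?")] = "N_pyrrole"
    elif count % 4 == 3 and "C?O" in types:     # step 8c
        types[types.index("C?O")] = "C=O"
    count = sum(ELECTRONS[t] for t in types)
    return count % 4 == 2, types                # step 8e: 4n+2 electrons
```

For a pyrrole-like ring `["N?", "C", "C", "C", "C"]` the initial count is five, so the nitrogen is retyped as pyrrole-like, giving six electrons and satisfying the 4n+2 condition.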

After the above processing has been applied to all rings in a molecule, the molecule is passed to a Kekulé form assignment routine that attempts to provide a Kekulé form of alternating single and double bonds. The actual Kekulé form assignment algorithm is complex and described in detail elsewhere (parts of the algorithm were presented in the tautomerism talk at EuroMug99).

9. Bond Order Assignment

After aromatic ring system perception, the remaining bond orders are assigned.

9a. The first stage of final bond order assignment is to mark all ring bonds between sp2-hybridized atoms as aromatic, and call the Kekulé assignment algorithm a second time. This should assign alternating single and double bonds to conjugated ring systems including aromatic rings with more than seven atoms.

9b. The next step is to process any unsatisfied sp2-hybridized atom by checking each neighboring atom for a terminal oxygen, and when found testing the bond length against the table of bond lengths below. This pass preferentially selects the keto over the enol forms of conjugated molecules.

9c. The very last step is to check each bond using distance criteria. Only bonds between atoms with unfilled valences and without an incident multiple bond are considered. Bonds between sp-hybridized atoms or between sp-hybridized and terminal atoms are tested against triple bond lengths, and bonds between sp2-hybridized atoms or between sp2-hybridized atoms and terminal atoms are tested against double bond lengths. Bonds between a pair of terminal atoms are tested against both double and triple bond lengths.

Bond   Distance (A)  Example
C#C    1.25          1nco
C#N    1.22          2cgr
C=C    1.38          1rbp
C=O    1.28          3cpp
C=S    1.70          8cpp
N=N    1.32          1srj

9d. All remaining unfilled valences are filled with implicit hydrogen atoms. The atomic valences are the standard values used by the Daylight toolkit.
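The distance test used in steps 9b and 9c can be sketched as a table lookup. Two assumptions are made here that the text does not spell out: the tabulated distances are treated as upper limits (a bond passes when it is no longer than the listed value), and element pairs missing from the table never pass. The function name is hypothetical.

```python
# Tabulated distances in Angstroms, keyed by (element, element, bond order)
# with the two element symbols in sorted order.
MAX_LENGTH = {
    ("C", "C", 3): 1.25, ("C", "N", 3): 1.22,
    ("C", "C", 2): 1.38, ("C", "O", 2): 1.28,
    ("C", "S", 2): 1.70, ("N", "N", 2): 1.32,
}


def passes_length_test(elem_a, elem_b, length, order):
    """True if a bond is short enough for the given order (2 or 3).

    Assumption: the table values act as upper limits on bond length.
    """
    a, b = sorted((elem_a, elem_b))
    limit = MAX_LENGTH.get((a, b, order))
    return limit is not None and length <= limit
```

As step 9c describes, the hybridization states of the two atoms decide which order is tested, so a short bond alone is not enough to produce a triple bond.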

10. Results

The current implementation of the above algorithm is able to process all 14596 files in the PDB in just under eight hours. Of these, 6941 were identified as containing one or more ligands. Removing duplicate ligands from each PDB file results in 10561 ligand/pdbfile pairs. These 10561 pairs correspond to 3501 unique small molecules.

The most common ligands are heme and cytochrome (456+242 occurrences), followed by N-acetyl-D-glucosamine "NAG" (341 occurrences), followed by glycerol "GOL" (222 occurrences). 2426 ligands occur only once, with the remaining 1075 occurring multiple times (135 >10, 39 >25, 23 >50 and 12 >100).

On test set 1 of protein ligands from the review by Ricketts et al., the algorithm gets all 18 of the ligands correct. Back in 1996 the author reviewed existing techniques on a subset of 17 of these files. The results are shown in the table below. The column labelled "All Single" shows the structures that are correct when all bonds are left single. The "Covalent" method (inspired by Pauling's "The Nature of the Chemical Bond") assigns a triple bond to lengths less than 81% of the sum of the attached covalent radii, a double bond between 81% and 87%, and a single bond above 87%. The "SMARTS" method used simple pattern matching of terminal groups (similar to the functional group recognition in section 7). The "Cambridge" and "Oxbridge" algorithms are based on the algorithm of Baber and Hodgkin. The "Cambridge" column contains the results using just bond lengths for CSD, and "Oxbridge" is the full algorithm using bond lengths and bond angles. The "IDATM" column represents the algorithm of Meng and Lewis as encoded in Babel v1.4. The "COBRA" column is Andrew Leach's perception code as used by Oxford Molecular's COBRA package. The "BONDAGE" column contains the results of the algorithm by Blaney, Dixon and Swanson in the DGEOM95 package.

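No.     PDB Code  All Single  Covalent  SMARTS  Cambridge  Oxbridge  IDATM  COBRA  Bondage  This work
1       1cla      N           N         N       N          N         N      N      N        Y
2       2aat      N           N         N       N          N         N      N      N        Y
3       2gbp      Y           Y         Y       Y          Y         Y      Y      Y        Y
4       3cpp      N           Y         N       Y          Y         Y      N      Y        Y
5       2trm      N           N         N       N          Y         N      N      N        Y
6       3ptb      N           N         N       N          Y         N      N      Y        Y
7       5xia      Y           Y         Y       Y          Y         Y      Y      Y        Y
8       4xia      Y           Y         Y       Y          Y         Y      Y      Y        Y
9       8ldh      N           N         Y       Y          N         N      Y      Y        Y
10      8atc      N           N         N       N          N         N      Y      N        Y
11      8rsa      N           N         N       N          Y         Y      N      N        Y
12      1fcb      N           N         N       N          N         N      N      N        Y
13      1fx1      N           N         N       N          N         N      N      N        Y
14      1gox      N           N         N       N          N         N      N      N        Y
15      2dhf      N           N         N       N          N         N      Y      N        Y
16      4dfr      N           N         N       N          N         N      N      N        Y
17      7dfr      N           N         N       N          N         N      N      N        Y
Totals            3           4         4       5          7         5      6      6        17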

Note these results are slightly biased, as these results were used in the design and development of the algorithm presented above. Also note that the table is actually presented in approximate order of the quality of results, and although Baber and Hodgkin's Oxbridge algorithm looks impressive, on the structures it got wrong it performed far worse than the methods to its right.
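The "Covalent" baseline in the table above is simple enough to state in a few lines. The 81% and 87% cut-offs come from the text; the text does not say which covalent radii were used, so the radii are left as arguments, and the boundary handling (exactly 87% counted as double) is an assumption.

```python
def covalent_method_order(length, radius_a, radius_b):
    """Bond order from length as a fraction of the covalent radius sum."""
    ratio = length / (radius_a + radius_b)
    if ratio < 0.81:
        return 3     # triple bond
    if ratio <= 0.87:
        return 2     # double bond
    return 1         # single bond
```

As a usage example under an assumed single-bond carbon radius of 0.77 Angstroms, a 1.54 Angstrom carbon-carbon bond has a ratio of 1.00 and is typed single, 1.33 Angstroms gives about 0.86 and is typed double, and 1.20 Angstroms gives about 0.78 and is typed triple.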

On the DockIt test set of 10 PDB files, the program gets all 10 ligands correct. The original FORTRAN BONDAGE algorithm gets 8/10, and CCDC's ReliBase, which is hand curated, also gets 8/10. ReliBase doesn't consider the peptide inhibitor of HIV protease in 4hvp to be a ligand, and gets the element typing in 1rds wrong.

Hendlich reports in his paper that the BALI program gets the biotin in 1bib wrong due to an abnormal bond length of 1.133A, which is typed as a triple bond. The described algorithm gets 1bib correct, as a triple bond requires supporting sp-hybridization evidence.

Finally, the 133 pdb codes below are taken from the extended GOLD benchmark suite of structures. The "This" column represents the algorithm described here, the "Gold" column represents the "ligand.mol2" structures distributed with GOLD, and the "Reli" column represents the structures in Relibase v4.0.

No.     PDB Code  This  Gold  Reli
1       1aaq      Y     Y     Y
2       1abe      Y     Y     Y
3       1acj      Y     N     Y
4       1ack      Y     Y     -
5       1acm      Y     Y     ?
6       1aco      Y     Y     ?
7       1aec      Y     N     ?
8       1aha      Y     Y     Y
9       1apt      Y     Y     N
10      1ase      Y     N     N
11      1atl      N     Y     N
12      1azm      Y     Y     Y
13      1baf      Y     Y     Y
14      1bbp      N     Y     Y
15      1blh      Y     Y     Y
16      1bma      Y     Y     Y
17      1byb      Y     Y     Y
18      1cbs      Y     Y     Y
19      1cbx      Y     Y     Y
20      1cdg      Y     Y     N
21      1cil      Y     Y     Y
22      1com      Y     N     Y
23      1coy      Y     Y     Y
24      1cps      Y     Y     Y
25      1ctr      Y     Y     Y
26      1dbb      Y     Y     Y
27      1dbj      Y     Y     Y
28      1did      Y     Y     Y
29      1die      Y     Y     Y
30      1dr1      Y     Y     Y
31      1dwd      Y     Y     Y
32      1eap      Y     Y     Y
33      1eed      Y     ?     Y
34      1epb      Y     Y     Y
35      1eta      Y     Y     Y
36      1etr      Y     N     Y
37      1fen      N     Y     Y
38      1fkg      Y     Y     Y
39      1fki      Y     Y     Y
40      1frp      Y     Y     Y
41      1ghb      Y     Y     Y
42      1glp      Y     N     ?
43      1glq      Y     Y     N
44      1hdc      N     Y     Y
45      1hdy      Y     Y     Y
46      1hef      Y     Y     Y
47      1hfc      Y     Y     Y
48      1hri      Y     Y     Y
49      1hsl      Y     Y     Y
50      1hyt      Y     N     Y
51      1icn      Y     N     ?
52      1ida      Y     Y     Y
53      1igj      Y     ?     Y
54      1imb      Y     Y     Y
55      1ive      Y     Y     Y
56      1lah      Y     Y     ?
57      1lcp      Y     Y     ?
58      1ldm      Y     Y     ?
59      1lic      Y     N     Y
60      1lmo      Y     Y     Y
61      1lna      Y     Y     Y
62      1lpm      Y     N     Y
63      1lst      -     Y     -
64      1mcr      Y     Y     Y
65      1mdr      Y     Y     Y
66      1mmq      Y     Y     Y
67      1mrg      Y     Y     Y
68      1mrk      Y     Y     Y
69      1mup      N     Y     Y
70      1nco      N     Y     Y
71      1nis      Y     Y     ?
72      1pbd      Y     Y     Y
73      1pha      Y     Y     Y
74      1phd      Y     Y     Y
75      1phg      Y     Y     Y
76      1poc      Y     Y     Y
77      1rds      Y     Y     N
78      1rne      Y     Y     Y
79      1rob      Y     Y     Y
80      1slt      Y     Y     Y
81      1snc      Y     N     Y
82      1srj      Y     Y     Y
83      1stp      Y     Y     Y
84      1tdb      Y     Y     Y
85      1tka      N     Y     Y
86      1tmn      N     Y     Y
87      1tng      Y     Y     Y
88      1tni      Y     Y     Y
89      1tnl      Y     Y     Y
90      1tph      Y     N     Y
91      1tpp      Y     ?     Y
92      1trk      N     N     Y
93      1tyl      Y     Y     Y
94      1ukz      Y     Y     Y
95      1ulb      Y     Y     Y
96      1wap      Y     Y     Y
97      1xid      Y     Y     Y
98      1xie      Y     Y     Y
99      2ada      Y     Y     Y
100     2ak3      Y     Y     Y
101     2cgr      Y     N     Y
102     2cht      Y     Y     Y
103     2cmd      Y     Y     ?
104     2ctc      Y     Y     Y
105     2dbl      N     Y     Y
106     2gbp      Y     Y     Y
107     2lgs      Y     Y     ?
108     2mcp      Y     Y     ?
109     2mth      Y     Y     -
110     2phh      Y     Y     Y
111     2pk4      Y     Y     ?
112     2plv      N     Y     Y
113     2r07      Y     Y     N
114     2sim      Y     Y     Y
115     2yhx      Y     Y     Y
116     3aah      N     Y     -
117     3cla      Y     Y     Y
118     3cpa      Y     Y     Y
119     3gch      N     ?     Y
120     3hvt      Y     Y     Y
121     3ptb      Y     Y     Y
122     3tpi      Y     Y     ?
123     4cts      Y     Y     ?
124     4dfr      Y     Y     Y
125     4est      N     ?     Y
126     4fab      Y     N     Y
127     4phv      Y     Y     Y
128     5p2p      Y     Y     ?
129     6abp      Y     Y     Y
130     6rnt      N     Y     Y
131     6rsa      N     ?     Y
132     7tim      Y     Y     ?
133     8gch      Y     Y     Y
Totals            116   112   105

The files 1ack, 2mth and 3aah are obsolete and have been replaced in the PDB by corrected versions. Relibase consistently represents nitro groups incorrectly. The lysine in 1lst is not perceived as a ligand in this work or Relibase. The vanadium atom in 6rsa causes problems for the described algorithm and GOLD (where it is replaced by a phosphorus). 1bbp is a particularly tricky chromophore. There's a misplaced "TER" record in 1atl.

11. Acknowledgements

I'd like to thank Jeff Blaney and Scott Dixon for developing the FORTRAN program BONDAGE and contributing it to the CEX project, Matt Stahl and Pat Walters both for developing Babel (including an implementation of Meng and Lewis' IDATM) and for providing me the CSD benchmark set, Andrew Leach for running COBRA's connectivity perception on the Ricketts1 set, Edward Hodgkin for providing the source code to the Oxbridge algorithm, Jack Delany for merlin administration and finally SGI and Compaq for providing hardware.

12. Bibliography

  1. D.M.F. van Aalten, R. Bywater, J.B.C. Findlay, M. Hendlich, R.W.W. Hooft and G. Vriend, "PRODRG: A Program for Generating Molecular Topologies and Unique Molecular Descriptors from Coordinates of Small Molecules", Journal of Computer-Aided Molecular Design, Vol. 10, pp. 255-262, 1996.
  2. Frank H. Allen, Olga Kennard and David G. Watson, "Tables of Bond Lengths Determined by X-Ray and Neutron Diffraction. Part 1: Bond Lengths in Organic Compounds", Journal of the Chemical Society, Perkin Transactions II, pp. S1-S19, 1987.
  3. Jon C. Baber and Edward E. Hodgkin, "Automatic Assignment of Chemical Connectivity to Organic Molecules in the Cambridge Structural Database", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 32, No. 5, pp. 401-406, 1992.
  4. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, "The Protein Data Bank", Nucleic Acids Research, Vol. 28, pp. 235-242, 2000.
  5. F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr., M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi and M. Tasumi, "The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures", Journal of Molecular Biology, Vol. 112, pp. 535-542, 1977.
  6. Richard A. Engh and Robert Huber, "Accurate Bond and Angle Parameters for X-ray Protein Structure Refinement", Acta Crystallographica, Vol. A47, pp. 392-400, 1991.
  7. M. Hendlich, F. Rippmann and G. Barnickel, "BALI: Automatic Assignment of Bond and Atom Types for Protein Ligands in the Brookhaven Protein Databank", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 37, No. 4, pp. 774-778, 1997.
  8. G.J. Kleywegt and T.A. Jones, "Databases in Protein Crystallography", Acta Crystallographica, Vol. D54, (CCP4 Proceedings) pp. 1119-1131, 1998.
  9. Elaine C. Meng and Richard A. Lewis, "Determination of Molecular Topology and Atomic Hybridization States from Heavy Atom Coordinates", Journal of Computational Chemistry, Vol. 12, No. 7, pp. 891-898, 1991.
  10. Eleanor M. Ricketts, John Bradshaw, Mike Hann, Fiona Hayes, Neil Tanna and David M. Ricketts, "Comparison of Conformations of Small Molecule Structures from the Protein Data Bank with those Generated by Concord, Cobra, ChemDBS-3D and Convertor and those Extracted from the Cambridge Structural Database", Journal of Chemical Information and Computer Sciences (JCICS), Vol. 33, No. 6, pp. 905-925, 1993.
info@metaphorics.com