---- User Talk ----
This talk describes the efficient implementation of an algorithm to recognise polypeptides and nucleic acids from simple atomic connectivity. This "chains" algorithm was originally developed as part of the CEX initiative for the exchange of chemical information between (bio)chemistry software. Many molecular graphics programs, such as RasMol, treat the same molecule differently depending on the file format it is read from. For example, a SMILES string or MDL .mol file contains bond order information that often cannot be represented in a Brookhaven PDB file. Similarly, the residue naming, residue numbering, atom naming, chain identifiers and HETATM flags within a PDB file cannot be represented in a SMILES string or MDL .mol file.

The first step towards solving this problem within CEX was the contribution of the "Bondage" program, which can determine bond order and atom hybridization from 3D co-ordinates. The remaining step was to perform graph matching on 2D structures to find macromolecular backbones and correctly identify amino acids and nucleic acid bases. This talk describes the efficient implementation of such an algorithm, which is capable of correctly identifying large proteins and so allows efficient "chemically intelligent" file format conversion.
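To make the idea of backbone recognition from connectivity concrete, here is a minimal Python sketch. It is not the "chains" algorithm itself, whose details are not given in this abstract. It is an illustration under several assumptions: atoms are given as a list of element symbols for heavy atoms only, bonds as index pairs without bond orders, and a peptide backbone unit is taken to be an N-CA-C triple whose carbon C carries one terminal oxygen plus either a second terminal oxygen or a nitrogen. The function names `find_backbone_units` and `order_chains` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(bonds):
    """Return a neighbour-set map for an undirected bond list."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_backbone_units(elements, adj):
    """Find (N, CA, C) index triples that look like peptide backbone units.

    Assumption of this sketch: the carbonyl carbon has exactly two heavy
    neighbours besides CA, at least one of them a terminal oxygen and the
    rest nitrogen. A C-terminus lacking its second oxygen is therefore missed.
    """
    units = []
    for n, element in enumerate(elements):
        if element != "N":
            continue
        for ca in adj[n]:
            if elements[ca] != "C":
                continue
            for c in adj[ca]:
                if c == n or elements[c] != "C":
                    continue
                others = adj[c] - {ca}
                oxygens = {o for o in others
                           if elements[o] == "O" and len(adj[o]) == 1}
                rest = others - oxygens
                if (len(others) == 2 and oxygens
                        and all(elements[r] == "N" for r in rest)):
                    units.append((n, ca, c))
    return units


def order_chains(units, adj):
    """Link units through C-N peptide bonds, N-terminus first.

    Cyclic peptides have no starting unit and are not reported here.
    """
    by_nitrogen = {u[0]: u for u in units}
    successor, has_predecessor = {}, set()
    for u in units:
        for neighbour in adj[u[2]]:
            if neighbour in by_nitrogen:
                successor[u] = by_nitrogen[neighbour]
                has_predecessor.add(by_nitrogen[neighbour])
    chains = []
    for u in units:
        if u in has_predecessor:
            continue
        chain, seen = [u], {u}
        while chain[-1] in successor and successor[chain[-1]] not in seen:
            chain.append(successor[chain[-1]])
            seen.add(chain[-1])
        chains.append(chain)
    return chains


# Heavy atoms of a Gly-Ala dipeptide (indices are arbitrary for this example):
# Gly: 0 N, 1 CA, 2 C, 3 O;  Ala: 4 N, 5 CA, 6 C, 7 O, 8 CB, 9 OXT
elements = ["N", "C", "C", "O", "N", "C", "C", "O", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5),
         (5, 6), (6, 7), (5, 8), (6, 9)]

adj = build_adjacency(bonds)
chains = order_chains(find_backbone_units(elements, adj), adj)
print(chains)  # [[(0, 1, 2), (4, 5, 6)]]
```

Once the backbone is ordered in this way, each residue's side chain can be matched against templates for the standard amino acids to recover the residue names, numbering and atom names that a PDB file requires. The sketch also shows why bond orders help: without them, a C-terminal carbon missing its second oxygen cannot be told apart from a serine CB carrying a hydroxyl oxygen.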