Dave Weininger
Metaphorics LLC
When the activity is a reaction rate or equilibrium constant, the QSAR is known as a physicochemical QSAR. An example of a physicochemical QSAR is the correlation of a dissociation constant (e.g., pKa) with σ (sigma, the electron-withdrawing potential of a substituent), as in the Hammett equation.
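For reference, the Hammett equation in its usual form is shown below. K_H is the equilibrium constant for the unsubstituted parent compound, K_X is that for the compound carrying a meta or para substituent X, σ_X is the substituent constant, and ρ is the reaction constant, which measures how sensitive the reaction is to substituent effects. ρ is set to 1 for the reference reaction, the ionization of benzoic acids in water at 25 °C. Rewritten in terms of pKa, the equation is a single-parameter linear correlation: with ρ > 0, electron-withdrawing substituents (σ > 0) lower the pKa.

```latex
\log_{10}\frac{K_X}{K_H} = \rho\,\sigma_X
\qquad\Longleftrightarrow\qquad
\mathrm{p}K_{a,X} = \mathrm{p}K_{a,H} - \rho\,\sigma_X
```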
When the activity is an observed biological endpoint or is measured in a biological system, the QSAR is known as a biological QSAR.
QSARs are most commonly derived for structurally related compounds (e.g., a congeneric series), but can also be derived for miscellaneous compounds using molecular parameters.
A number of QSAR models are used, the most common being a "single-parameter linear model" (i.e., a simple correlation). More complex, non-linear multiparameter models can be supported when enough data are available.
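As a minimal sketch in plain Python (no external libraries), both kinds of model can be fitted by ordinary least squares on a table of parameter values. The helper names `solve`, `fit`, and `r_squared` are hypothetical and the data are synthetic. Least squares as written handles models that are non-linear in a parameter (here, a squared term) as long as they stay linear in the fitted coefficients; a model that is non-linear in its coefficients would need iterative fitting instead.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: terms are collinear or there are too few data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients, one per column of `rows` (append 1.0 to each row for a constant)."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations: (X^T X) b = X^T y


def r_squared(rows: Sequence[Sequence[float]], y: Sequence[float], coef: Sequence[float]) -> float:
    """Fraction of the variance in y explained by the fitted model."""
    pred = [sum(c * v for c, v in zip(coef, row)) for row in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]  # synthetic parameter values
    # Single-parameter linear model: y = a*x + c
    y_lin = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(fit([[xi, 1.0] for xi in x], y_lin))  # approximately [2.0, 1.0]
    # Multiparameter model, non-linear in x: y = a*x**2 + b*x + c
    y_par = [1.0, 2.5, 3.0, 2.5, 1.0]
    rows_par = [[xi * xi, xi, 1.0] for xi in x]
    coef = fit(rows_par, y_par)
    print(coef, r_squared(rows_par, y_par, coef))  # approximately [-0.5, 2.0, 1.0] and 1.0
```

The `ValueError` for a singular system is a reminder of why more complex models need more data: with too few compounds, or with parameters that move together, the coefficients cannot be determined.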
Two important QSAR research techniques are comparative QSAR (comparison of QSARs for similar activities in different systems, or for different kinds of structures in the same system) and lateral validation of biological activities with physicochemical reactivities (IMHO, a bit of a misnomer, but the name has stuck).
It is important to understand that a QSAR is simply an observation of a correlation between one or more molecular parameters and an observed property. By themselves, QSARs do not establish cause-and-effect relationships between parameters and properties. However, QSARs can be predictive in the statistical sense.
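One common way to check whether a correlation is predictive in the statistical sense is cross-validation. The sketch below computes the leave-one-out q² for a single-parameter linear model: each compound is predicted by a line fitted to all the other compounds, and q² = 1 - PRESS/SS_tot, where PRESS is the sum of squared prediction errors. This standard statistic is chosen here for illustration only (the text does not say which validation statistics the database records); the function names and data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Goodness of fit on the same data used to build the model."""
    slope, icpt = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + icpt)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x: Sequence[float], y: Sequence[float]) -> float:
    """Leave-one-out q^2: every point is predicted by a model fitted without it."""
    n = len(y)
    my = sum(y) / n
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        slope, icpt = fit_line(xs, ys)
        press += (y[i] - (slope * x[i] + icpt)) ** 2
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]  # synthetic parameter values
    y = [1.1, 1.9, 3.2, 3.8, 5.1]  # synthetic observed values
    print(f"r2 = {r2(x, y):.3f}, q2 = {q2_loo(x, y):.3f}")
```

For a least-squares fit, q² never exceeds r², so the gap between the two is a rough measure of how much the fit flatters itself. A high q² supports the claim that a QSAR predicts, but, like r², it says nothing about cause and effect.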
The qsar database contains about 20,000 QSAR relationships, about evenly divided between physicochemical (45%) and biological (55%) QSARs. A huge amount of data are represented: 265,785 molecular structures with 2,022,106 total parameter values recorded (of which 51,755 are used in final QSARs). These QSAR data were obtained from 567 journals and 50 books, representing 20,032 specific citations written by 24,700 different authors.
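Taken at face value (treating "about 20,000" as exactly 20,000), these figures imply roughly 9,000 physicochemical and 11,000 biological QSARs, a naive average of about 7.6 recorded parameter values per structure, and that only about 2.6% of recorded parameter values end up in a final equation. A short Python check of that arithmetic, using only the numbers quoted above:

```python
# Figures quoted in the paragraph above.
total_qsars = 20_000        # "about 20,000" relationships
structures = 265_785
parameter_values = 2_022_106
used_in_final = 51_755

physicochemical = round(0.45 * total_qsars)             # roughly 9,000 QSARs
biological = round(0.55 * total_qsars)                  # roughly 11,000 QSARs
values_per_structure = parameter_values / structures    # naive average, about 7.6
fraction_used = used_in_final / parameter_values        # about 0.026

print(f"{physicochemical} physicochemical, {biological} biological")
print(f"{values_per_structure:.1f} parameter values per structure")
print(f"{fraction_used:.1%} of recorded parameter values appear in final QSARs")
```

The per-structure figure is only an average over the structure count; it does not say how parameters are distributed across individual structures or QSARs.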
This database represents the bulk of what has been observed to date about the effects of molecular structure on chemical reactivity and biological activity (discounting proprietary information). For anyone involved in any kind of molecular discovery, access to these data should be the "ante": before setting out to discover something new, one should have access to what has already been noticed.
Existing programs that provide access to QSAR data (e.g., C-QSAR) can be used effectively only by dedicated, well-trained specialists. Part of the problem is that QSAR information is not as "flat" as most data: each entry in a QSAR database represents a relationship with a wealth of underlying information. QSARs also appear complex because a large number of molecular parameters have evolved over the last 30-40 years, many of which are poorly documented and thus poorly understood. Additionally, QSARs have been usefully applied to many sub-fields, but QSAR remains on the fringe of each.
The QSAR dataset is therefore a perfect candidate for inclusion in a federated database system such as fedora. The qsar fedora service implements a primary object database in which each entry is a QSAR relationship with component objects such as molecular structures, observed properties, molecular parameters, equations, and references. Internal object databases are maintained for QSAR classification, molecular parameters, enzyme functionality, publications, and authors.
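The entry-with-component-objects design can be pictured as a nested record. The Python dataclasses below are a hypothetical sketch, not the service's actual schema: all class and field names are invented for illustration, structures are assumed to be stored as SMILES strings, and each equation is assumed to be linear in its terms. Each compound keeps every available parameter (not just those in the final equation) and a flag for structures omitted from the fit, in line with the design goals listed for the service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Reference:
    authors: List[str]
    publication: str  # journal or book title
    year: int


@dataclass
class Compound:
    smiles: str                   # structure, assumed here to be a SMILES string
    observed: Optional[float]     # observed property (dependent variable); None if unmeasured
    parameters: Dict[str, float]  # every available molecular parameter, used or not
    used_in_fit: bool = True      # False for structures omitted from the final QSAR


@dataclass
class Equation:
    coefficients: Dict[str, float]  # parameter (term) name -> fitted coefficient
    intercept: float

    def evaluate(self, parameters: Dict[str, float]) -> float:
        """Worked-out computation of the (assumed linear) equation for one structure."""
        return self.intercept + sum(c * parameters[name] for name, c in self.coefficients.items())


@dataclass
class QsarEntry:
    classification: str  # e.g. "physicochemical" or "biological"
    system: str          # reaction or biological system studied
    compounds: List[Compound] = field(default_factory=list)
    equation: Optional[Equation] = None
    references: List[Reference] = field(default_factory=list)

    def terms(self) -> List[str]:
        """Parameters appearing in the final equation, usable as an index by term."""
        return sorted(self.equation.coefficients) if self.equation else []

    def fit_set(self) -> List[Compound]:
        """Structures actually used to derive the QSAR."""
        return [c for c in self.compounds if c.used_in_fit]


if __name__ == "__main__":
    # Illustrative values only, shaped like the Hammett example: pKa = 4.20 - 1.0 * sigma.
    entry = QsarEntry(
        classification="physicochemical",
        system="ionization of benzoic acids in water",
        compounds=[
            Compound("OC(=O)c1ccccc1", 4.20, {"sigma": 0.0}),
            Compound("OC(=O)c1ccc(cc1)[N+](=O)[O-]", None, {"sigma": 0.78}, used_in_fit=False),
        ],
        equation=Equation({"sigma": -1.0}, 4.20),
    )
    print(entry.terms(), len(entry.fit_set()))
    for c in entry.compounds:
        print(c.smiles, entry.equation.evaluate(c.parameters))
```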
The overall design goals of the qsar fedora service are divided into four levels:
Level 1: All QSAR structures (both the structures used to derive a QSAR and those omitted). All observed values (dependent variables). All available molecular parameters (independent variables, whether or not they are used in the final QSAR). Complete references, indexed by publication and author. Automatic structure cross-referencing with other fedora servers.
Level 2: QSAR classification and system. The QSAR equation, indexed by parameter (term). Regression statistics, including confidence limits on the coefficient of each term and on predicted values. A worked-out computation for each structure. Enrichment with related enzyme functionality assignments, and automatic enzyme cross-referencing with other fedora servers.
Level 3: Evaluation of the structural scope and parametric scope (applicability) of each QSAR.
Level 4: Given the applicability evaluation above, calculate all required molecular parameters and estimate the predicted value and its error of prediction. Note that, by their nature, not all QSARs are predictive.
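The regression statistics, applicability check, and prediction step build on one another, and can be sketched together for the simplest case, a single-parameter linear QSAR. This is a minimal sketch under stated assumptions, not the service's implementation: `LineFit`, `fit_line`, `coefficient_limits`, and `predict` are hypothetical names; "parametric scope" is approximated as the observed range of the one parameter in the fitted data; structural scope is not modeled; and the caller supplies the two-sided Student's t critical value, because the Python standard library has no t-distribution quantile function. The interval formulas are the standard ordinary-least-squares ones.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class LineFit:
    slope: float
    intercept: float
    s: float       # residual standard deviation, sqrt(RSS / (n - 2))
    n: int
    x_mean: float
    sxx: float     # sum of squared deviations of x from its mean
    x_min: float   # observed parameter range, used as the parametric scope
    x_max: float


def fit_line(x: Sequence[float], y: Sequence[float]) -> LineFit:
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual error")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return LineFit(slope, intercept, math.sqrt(rss / (n - 2)), n, mx, sxx, min(x), max(x))


def coefficient_limits(f: LineFit, t_crit: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """(low, high) confidence limits on slope and intercept; t_crit has n - 2 degrees of freedom."""
    d_slope = t_crit * f.s / math.sqrt(f.sxx)
    d_icpt = t_crit * f.s * math.sqrt(1.0 / f.n + f.x_mean ** 2 / f.sxx)
    return (f.slope - d_slope, f.slope + d_slope), (f.intercept - d_icpt, f.intercept + d_icpt)


def predict(f: LineFit, x0: float, t_crit: float) -> Optional[Tuple[float, float]]:
    """Predicted value and half-width of its prediction interval, or None outside the parametric scope."""
    if not (f.x_min <= x0 <= f.x_max):
        return None  # extrapolation: decline to predict
    y0 = f.intercept + f.slope * x0
    se_pred = f.s * math.sqrt(1.0 + 1.0 / f.n + (x0 - f.x_mean) ** 2 / f.sxx)
    return y0, t_crit * se_pred


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]  # synthetic parameter values
    y = [1.1, 2.9, 5.2, 6.8, 9.1]  # synthetic observed values
    f = fit_line(x, y)
    t95 = 3.182  # two-sided 95% t value for n - 2 = 3 degrees of freedom
    print(coefficient_limits(f, t95))
    print(predict(f, 2.5, t95))   # inside the fitted range
    print(predict(f, 10.0, t95))  # None: outside the parametric scope
```

The prediction interval widens as x0 moves away from the mean of the fitted data. Declining to predict outside the fitted range is one simple, conservative policy; it does not by itself decide whether a given QSAR is predictive at all.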