Spring Meeting 2007

Alderley Park, UK


  • Eleanor Gardiner (University of Sheffield, UK)
    Cluster representation using reduced graphs [Slides] [Abstract]
  • Claire Gavaghan (Safety Assessment, AstraZeneca, UK)
    Practical considerations in using QSARs in pharmaceutical safety assessment [Slides] [Abstract]
  • Peter Hunt (Novartis Institutes for BioMedical Research, UK)
    Transformation vectors in drug optimisation [Slides] [Abstract]
  • Phil Jewsbury (Director of Medicinal Chemistry, AstraZeneca, UK)
    Welcome and introduction
  • Johannes Reynisson (ICR, UK)
    Benchmarking the reliability of QikProp. Correlation between experimental and predicted values [Slides] [Abstract]
  • Mike Sutcliffe (Manchester Interdisciplinary Biocentre & School of Chemical Engineering and Analytical Science, University of Manchester, UK)
    Modelling the hERG potassium channel: activation, inactivation and drug binding [Slides] [Abstract]
  • Marcel Verdonk (Astex Therapeutics, UK)
    A diverse and high-quality test set for the validation of protein-ligand docking performance [Abstract]
  • David Wood (University of Sheffield, UK)
    The use of kernel discrimination algorithms in virtual screening [Slides] [Abstract]


  • Tim Cheeseright (Cresset BioMolecular Discovery Ltd., Letchworth, UK)
    A General Method for Defining the 3D Active Site Requirements of GPCRs: CCK2 Ligands Example. [Abstract]
  • J.C. Dearden (School of Pharmacy and Chemistry, Liverpool John Moores University, UK)
    A comparison of commercially available software for the prediction of pKa values [Abstract]
  • A. Demco (ISIS Group, School of Electronics and Computer Science, University of Southampton, Southampton, UK)
    Graph Kernels for Molecular and Reduced Graphs [Abstract]
  • Mark Hewitt (Liverpool John Moores University, UK)
    (Q)SAR Models in Reproductive Toxicity: Prediction of Drug Transport Across the Human Placenta [Abstract]
  • Robert W. Stanforth (IDBS, UK; Birkbeck College, UK)
    Extending k-means clustering for non-linear QSAR modelling [Abstract]


Presentation: Eleanor Gardiner, David Cosgrove, Valerie Gillet, Peter Willett
Cluster representation using reduced graphs

University of Sheffield, UK [Slides]

Chemical databases are routinely clustered, with the aim of grouping molecules which share similar structural features. However, when molecules are clustered using fingerprints it may be difficult to decipher the structural commonalities which are present. Previously we have used reduced graphs, where each node corresponds to a generalized functional group, as topological molecular descriptors for virtual screening. Here we represent the molecules of a cluster as reduced graphs. By repeated application of a Maximum Common Substructure (MCS) algorithm we obtain one or more reduced graph cluster representatives. The sparsity of the reduced graphs means that the MCS calculations can be performed in real time. The reduced graph cluster representatives are readily interpretable in terms of functional activity and can be mapped directly back to the molecules to which they correspond, giving the chemist a rapid means of assessing potential activities contained within the cluster.

Presentation: Claire Gavaghan
Practical considerations in using QSARs in pharmaceutical safety assessment

Safety Assessment, AstraZeneca, UK [Slides]

Modelling of safety endpoints has been an active area nearly since the beginning of formal QSAR and some valuable insights into mechanisms of toxicity have come from these studies. Depending on the quality of the data and the choice of descriptors and modelling methods, models for safety endpoints have proven to be reasonably useful in pharmaceutical safety assessment in both regards. One important aspect of making QSAR models is the exploration of the data prior to modelling and the subsequent derivation of appropriate descriptors from this exercise. The learnings from both this pre-treatment of the data and the QSAR modelling can then be used in a variety of other forms to enhance the overall safety of new compounds, such as the construction of semi-automatic guides for compound design that take safety aspects into account. With respect to predictions from QSAR models, many practical steps can be taken to improve predictive performance, such as using associative libraries and monitoring the ‘drift’ of new chemistry away from the chemical space, or applicability domain, of the original model. Examples of both of these aspects will be presented. Finally, safety data sometimes cannot be modelled, but need to be presented to the chemist in some form. SAR for endpoints such as reactive metabolite formation can be put into a form that is useful, but that does not involve any formal prediction. Thus for safety endpoints, the data, the actual endpoint and the end user must all be taken into account when constructing SAR/QSAR tools. When all three aspects are considered, no two approaches are the same.

Presentation: Peter Hunt
Transformation vectors in drug optimisation

Novartis Institutes for BioMedical Research, UK [Slides]

Transformation vectors are generated by the sequential, pair-wise comparison of two molecular fingerprints for all members of a data set. These vectors are themselves fingerprint descriptions which contain positive and negative descriptor frequencies, and similarities between vectors can be determined by cosine similarity. From such similarity matrices, clustering of transformations is possible and hence consistent or partially consistent trends in local QSAR can be determined from any data set. These transformations are also directions of molecular change which can be applied to a molecule of interest to suggest new molecules for synthesis to improve the desired endpoint. Methods to produce these new molecules, both theoretical and practical, will be discussed and their application to drug optimisation highlighted.
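As an illustration of the idea described above, the following minimal Python sketch builds transformation vectors as signed differences between count fingerprints and compares them by cosine similarity. The fingerprint keys, the molecule names and the helper functions `transformation_vector` and `cosine_similarity` are hypothetical and chosen only for this example; the fingerprints and the exact differencing scheme used in the talk are not specified in the abstract.

```python
from itertools import combinations
from math import sqrt


def transformation_vector(fp_a, fp_b):
    """Signed descriptor-count change on going from molecule A to molecule B."""
    keys = set(fp_a) | set(fp_b)
    diff = {k: fp_b.get(k, 0) - fp_a.get(k, 0) for k in keys}
    # Keep only descriptors that actually change (positive or negative).
    return {k: v for k, v in diff.items() if v != 0}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical count fingerprints (descriptor name -> frequency).
fingerprints = {
    "mol1": {"aromatic_ring": 1, "Cl": 1},
    "mol2": {"aromatic_ring": 1, "F": 1},
    "mol3": {"aromatic_ring": 2, "Cl": 1, "amide": 1},
    "mol4": {"aromatic_ring": 2, "F": 1, "amide": 1},
}

# One transformation vector per pair of molecules in the data set.
vectors = {
    (a, b): transformation_vector(fingerprints[a], fingerprints[b])
    for a, b in combinations(sorted(fingerprints), 2)
}

# The Cl -> F exchange appears in two different molecular contexts.
same_change = cosine_similarity(vectors[("mol1", "mol2")], vectors[("mol3", "mol4")])
unrelated = cosine_similarity(vectors[("mol1", "mol2")], vectors[("mol1", "mol3")])
```

In this toy data set the Cl to F exchange gives the same vector in both contexts, so the two transformations would fall into the same cluster, whereas a transformation that shares no changed descriptors scores zero. A similarity matrix built this way could then be passed to any standard clustering method.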

Presentation: Phil Jewsbury
Welcome and introduction

Director of Medicinal Chemistry, AstraZeneca, UK


Presentation: Johannes Reynisson
Benchmarking the reliability of QikProp. Correlation between experimental and predicted values

ICR, UK [Slides]

The predictive power of the software package QikProp was tested. This was achieved by comparing experimentally known results to the predicted values and investigating the statistical distribution of a collection of marketed orally bio-available drug compounds (~470). For the simpler molecular descriptors, such as Log P, Log S, dipole moment and ionisation potentials, a good correlation was obtained. For the more ambitious ADME prediction modules, such as the cancer cell permeability using Caco-2 and MDCK cell line models, hERG+ interaction and blood-brain-barrier penetration, ambiguous results were obtained, suggesting that more development work is needed.

Presentation: Mike Sutcliffe, Phillip J. Stansfeld, Peter Gedeck, John S. Mitcheson
Modelling the hERG potassium channel: activation, inactivation and drug binding

Manchester Interdisciplinary Biocentre & School of Chemical Engineering and Analytical Science, University of Manchester, UK [Slides]

The human ether-a-go-go related gene (hERG) potassium ion channel is a voltage-dependent protein, found in the membranes of cardiac cells and neurons. Many commonly used, structurally diverse, drugs block the hERG channel to cause acquired long QT syndrome (LQTS), which can lead to sudden death via lethal cardiac arrhythmias. This undesirable side effect is a major hurdle in the development of safe drugs. To gain insight into the structure of hERG, and the nature of drug block, we have produced structural models of the channel pore domain, into each of which we have docked a set of 20 hERG blockers. In the absence of an experimentally determined 3-dimensional structure of hERG, each of the models was validated against site-directed mutagenesis data. First, hERG models were produced of the open and closed channel states, based on homology with the prokaryotic K+ channel crystal structures. The modelled complexes were in partial agreement with the mutagenesis data. To improve agreement with mutagenesis data, a KcsA-based model was refined by rotating the four copies of the S6 transmembrane helix half a residue position towards the C-terminus, so as to place all residues known to be involved in drug binding in positions lining the central cavity. This model produces complexes which are consistent with mutagenesis data for smaller, but not larger, ligands. Larger ligands could be accommodated following refinement of this model by enlarging the cavity using the inherent flexibility about the glycine hinge (Gly648) in S6, to produce results consistent with the experimental data for the majority of ligands tested. The hERG channel undergoes rapid C-type inactivation – involving gating via the selectivity filter. Although mutations within hERG are known to remove this process, a structural basis for the inactivation mechanism has yet to be characterised.
Using MD simulations coupled with homology modelling, we observe in the wild-type hERG channel that the carbonyl of the filter aromatic – Phe627 – swiftly rotates away from the conduction axis, thus transferring the channel into a non-conducting state. In contrast, in non-inactivating mutant channels this conformational change occurs less frequently. In these mutant channels, interactions of a water molecule located behind the selectivity filter are critical to the enhanced stability of the conducting state. The simulations suggest a mechanism for regulating K+ ion efflux through the selectivity filter, which is likely to be shared by other members of the K+ channel family.

Presentation: Marcel Verdonk
A diverse and high-quality test set for the validation of protein-ligand docking performance

Astex Therapeutics, UK

Validation of the methodologies used in structure-based design is nearly as hard as applying the techniques successfully. One source of uncertainty is the X-ray structures used to validate structure-based design techniques. We will describe the development of a new validation set that contains 85 complexes, each of a unique drug target, for which the ligand is a drug-like compound and has unambiguous electron density.

This new test set will then be used to assess how the performance of a protein-ligand docking program may be influenced by the protocols used to treat the protein and ligand structures. It turns out that relatively unbiased protocols give success rates of approximately 80% for re-docking into native structures, but it is possible to get success rates of over 90% with some protocols.

Presentation: David Wood, Beining Chen, Rob Harrison, Peter Willett, Xiao Lewell
The use of kernel discrimination algorithms in virtual screening

University of Sheffield, UK [Slides]

Machine learning methods can be applied to virtual screening by analysing the structural and physical characteristics of a set of compounds of known activity. From this, models can be constructed that predict the likelihood of activity of unscreened compounds. The work here presents a Kernel-Discrimination (KD) machine learning method that allows the processing of compounds represented by multivariate binary, integer and real-valued descriptors. The KD method is applied to a selection of activity classes of the MDL Drug Data Report database of drugs, with a range of commonly used descriptors including binary fingerprints, holograms, molconnZ and a set of physicochemical property descriptors. The resulting enrichments of test set compounds are found to be consistently better than those achieved with a Support Vector Machine. The models of activity generated by the method are interpretable and can provide useful insights into the common structural and physical properties of the compounds that are active against the biological target in question. The n-dimensional model can be converted into n 1-dimensional models, enabling the relationships between the individual descriptor components and the likelihood of activity to be visualised.

Poster: Tim Cheeseright, Mark Mackey, Sally Rose and Andy Vinter
A General Method for Defining the 3D Active Site Requirements of GPCRs: CCK2 Ligands Example.

Cresset BioMolecular Discovery Ltd., Letchworth, UK

Structure-based drug design is widely accepted as a valuable tool to aid lead optimisation for targets with an X-ray structure. However, designing ligands for GPCRs presents a greater challenge due to the lack of good 3D structural data on the targets. Certain groups have found homology models based on rhodopsin useful, but there can be large errors in these models. We have therefore approached this problem from the ligand’s viewpoint.

Cresset’s molecular fields model the binding characteristics of a ligand. They have been extensively validated for virtual screening against a variety of targets. We have now applied fields to create a predictive ligand bioactive conformation model. Our hypothesis is that if a set of diverse ligands can adopt a conformation in which they all display the same field pattern, then this aligned template must be a hypothesis for the bioactive conformation.

We will show how the 2D structure of 3 diverse CCK2 ligands can be used to create a 3D field template for activity using the FieldTemplater software. To validate the model, a further set of diverse CCK2 ligands was fitted to the template using our FieldAlign software and a similarity value calculated. This similarity was found to correlate with activity, so providing further evidence to support the use of fields to model binding.

Poster: J.C. Dearden, M.T.D. Cronin, D.C. Lappin
A comparison of commercially available software for the prediction of pKa values

School of Pharmacy and Chemistry, Liverpool John Moores University, UK

There are now a number of commercially available software programs for the prediction of pKa values, but to date there has been no published comparison of their performance. Using a large test-set of 653 compounds, which included a large number of drugs and about 40 tautomeric compounds, we have examined the performance of ten such programs.

We found that the predictive ability of the programs varied widely, with the best having a mean absolute error (MAE) of prediction of 0.32 pKa units, and the worst having a MAE of 1.48 units. Only two of the programs were able to make predictions for all 653 compounds; the lowest number predicted was 610 compounds.
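For readers unfamiliar with the statistic, the short Python sketch below shows how a mean absolute error and the number of compounds predicted could be computed for one program. The pKa values are invented for illustration, and restricting the MAE to the compounds a program could actually predict is an assumption; the abstract does not state how unpredicted compounds were handled.

```python
def mae_and_coverage(experimental, predicted):
    """Return (MAE over predicted compounds, number of compounds predicted).

    A prediction of None means the program gave no value for that compound.
    Assumption: such compounds are left out of the MAE.
    """
    pairs = [(e, p) for e, p in zip(experimental, predicted) if p is not None]
    if not pairs:
        raise ValueError("no predictions to evaluate")
    mae = sum(abs(p - e) for e, p in pairs) / len(pairs)
    return mae, len(pairs)


# Hypothetical experimental and predicted pKa values for four compounds.
experimental_pka = [4.76, 9.25, 7.20, 10.33]
predicted_pka = [4.50, 9.75, None, 10.33]  # third compound not predicted

mae, n_predicted = mae_and_coverage(experimental_pka, predicted_pka)
```

Reporting the two numbers together matters when programs are compared, because a program that declines to predict difficult compounds can show a lower MAE than one that attempts all of them.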

Poster: A. Demco, C. Saunders
Graph Kernels for Molecular and Reduced Graphs

ISIS Group, School of Electronics and Computer Science, University of Southampton, Southampton, UK

The development of structured kernel methods has had a significant impact on a number of domains and allows for a wide range of analysis including classification, clustering and ranking. In particular, defining kernel functions between graphs provides the basis of an efficient algorithm to compare and classify molecular structures. Molecular data is naturally represented as a graph where nodes represent atom types and edges represent bond types.

Furthermore, reduced graphs can be derived from molecular graphs, and present an alternative graph-based representation. Graph kernels can be constructed in many different ways, e.g. using features such as walks, cycles and trees. Here we develop two new graph kernels which extend walk-based graph kernels and allow soft matching and gaps to be considered. These extensions have particular application to molecular data because walks which do not have matching atom-labels but contain matching topological pharmacophore labels can be included but down-weighted appropriately. Here we test these kernels using known extensions such as coloring and non-tottering. Both of our extensions increase the flexibility and the applicability of graph kernels to structured data. Specifically, we show here that the classification performance for both extensions on the MuTag and NCI-HIV datasets outperforms standard walk-based graph kernels.
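To make the baseline concrete, here is a minimal Python sketch of a standard walk-based kernel with exact label matching: it counts, for two labelled graphs, the pairs of walks (up to a fixed number of edges) that share the same sequence of atom labels. It does not implement the soft-matching or gap extensions proposed in the poster, whose details are not given in the abstract. Bond labels are ignored for brevity, and the function names and example graphs are hypothetical.

```python
from collections import Counter


def walk_label_counts(labels, adjacency, max_edges):
    """Count the label sequences of all walks with 0..max_edges edges."""
    counts = Counter()
    # Each frontier entry is (current node, label sequence of the walk so far).
    frontier = [(node, (labels[node],)) for node in adjacency]
    for step in range(max_edges + 1):
        for _, seq in frontier:
            counts[seq] += 1
        if step < max_edges:
            frontier = [
                (nbr, seq + (labels[nbr],))
                for node, seq in frontier
                for nbr in adjacency[node]
            ]
    return counts


def walk_kernel(graph_a, graph_b, max_edges=2):
    """Number of walk pairs (one per graph) with identical label sequences."""
    counts_a = walk_label_counts(*graph_a, max_edges)
    counts_b = walk_label_counts(*graph_b, max_edges)
    return sum(counts_a[seq] * counts_b[seq] for seq in counts_a.keys() & counts_b.keys())


# Heavy-atom graphs as (node labels, adjacency lists); hydrogens omitted.
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})         # C-C-O
dimethyl_ether = ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})  # C-O-C

k_cross = walk_kernel(ethanol, dimethyl_ether, max_edges=1)
k_self = walk_kernel(ethanol, ethanol, max_edges=1)
```

Enumerating walks explicitly is only practical for short walks on small graphs; it is used here because it keeps the feature space visible. A soft-matching variant would replace the exact comparison of label sequences with a weighted comparison, which is where matching pharmacophore labels could contribute a down-weighted score.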

Poster: Mark Hewitt, Judith C. Madden, Philip H. Rowe, Mark T. D. Cronin
(Q)SAR Models in Reproductive Toxicity: Prediction of Drug Transport Across the Human Placenta

Liverpool John Moores University, UK

The replacement of animal testing for endpoints including reproductive toxicity is a long-term goal. Techniques that combine in silico and in vitro approaches to form integrated testing strategies are likely to be a productive alternative. This study describes the possibilities of using simple (quantitative) structure-activity relationships ((Q)SARs) to predict whether a molecule may cross the placental membrane. The concept is straightforward: if a molecule is not able to cross the placental membrane, then it will not be a toxicant to the developing foetus. Such a model could be placed at the start of any integrated testing strategy. To develop these models the literature was reviewed to obtain data relating to the transfer of molecules across the placenta. Clearance or transfer indices data were sought due to their ability to eliminate inter-placental variation by standardising drug clearance to the reference compound antipyrine. Modelling of the permeability data indicates that significant (Q)SARs can be developed for the ability of molecules to cross the placental membrane. Modelling indicated that molecular size, hydrophobicity and hydrogen-bonding ability are significant molecular properties governing the ability of a molecule to cross the placental barrier. Funding from the EU 6th Framework ReProTect Project (LSHB-CT-2004-503257) is gratefully acknowledged.

Poster: Robert W. Stanforth, Evgueni Kolossov and Boris Mirkin
Extending k-means clustering for non-linear QSAR modelling

IDBS, UK; Birkbeck College, UK

K-means clustering can be used to generate a non-parametric model of the shape of a QSAR training dataset in descriptor space. This cluster model is of proven use in applicability domain estimation and in the extraction of a representative test set.

In traditional K-means clustering each cluster is represented by its gravity centre (mean) in descriptor space. We extend this by incorporating the training set’s experimental activity values in the clustering. In so-called regression-wise K-means, the cluster representation involves a linear regression on the cluster, promoting coherent clusters within which activity has linear dependence on the descriptors.

We have investigated two ways of applying regression-wise clustering to QSAR modelling. Firstly, using regression-wise K-means directly will partition descriptor space into regions in which different linear descriptor-activity relationships apply. The ensuing ‘piecewise linear’ model can achieve a good fit of the training data with fewer descriptors than a global linear model, and still maintains predictive quality. The clustering also has the potential to provide insight into regions of descriptor space in which the activity results from different biological mechanisms.

Secondly, adjusting the regression-wise K-means algorithm to operate within certain constraints was discovered to be equivalent to augmenting the linear model with extra descriptors. These additional descriptors introduce non-linear components into the model, and have a satisfactory interpretation as capturing localised phenomena in descriptor space.
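The first, direct use of regression-wise K-means described above can be sketched as follows. This is a simplified illustration with a single descriptor, in which each cluster is represented by an ordinary least-squares line and each compound is reassigned to the cluster whose line predicts its activity best. The assignment criterion (squared residual only), the initial partition and the data are assumptions made for this example and may differ from the authors' algorithm.

```python
def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept for one cluster."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Degenerate cluster (one point or constant x): fall back to the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_wise_kmeans(points, initial_labels, n_clusters, max_iter=100):
    """Alternate between fitting one line per cluster and reassigning points."""
    labels = list(initial_labels)
    models = [(0.0, 0.0)] * n_clusters
    for _ in range(max_iter):
        # Update step: refit the regression of every non-empty cluster.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                models[c] = fit_line(members)
        # Assignment step: choose the cluster with the smallest squared residual.
        new_labels = [
            min(range(n_clusters),
                key=lambda c: (y - (models[c][0] * x + models[c][1])) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # converged: models were fitted on the final labels
        labels = new_labels
    return labels, models


# Hypothetical data: activity rises with the descriptor in one region
# (y = 2x) and falls in another (y = 20 - x).
points = [(float(x), 2.0 * x) for x in range(4)] + \
         [(float(x), 20.0 - x) for x in range(4, 12)]
initial_labels = [0] * 6 + [1] * 6  # deliberately misplaced boundary

labels, models = regression_wise_kmeans(points, initial_labels, n_clusters=2)
```

On this toy data the two misassigned compounds move to the second cluster after one iteration and the fitted lines recover the two underlying relationships, giving a piecewise linear model. With several descriptors the per-cluster fit would become a multiple linear regression, and in practice the result depends on the initial partition.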