Spring Meeting 2010

Newhouse, Scotland

The event started with the chance to attend a conference dinner and a suitably thought-provoking lecture from Prof. Andrew Hopkins, University of Dundee, on the evening of 12th May 2010.

The main session was held on 13th May, with Prof. Graeme Milligan from the University of Glasgow presenting approaches for GPCR drug discovery and insights into his latest research, followed by Ben Tehan from Heptares talking about how stabilised GPCRs can be used for structure-based drug design and fragment-based screening.

Prof. Malcolm Walkinshaw spoke about finding and filling allosteric pockets and the programs developed at the University of Edinburgh to support this work.

For those of a small-molecule mind, Dr Scott Cockroft, also from Edinburgh, gave a talk on non-covalent interactions; Rob Brown and Willem van Hoorn, both now at Accelrys, spoke about QSAR model applicability and Pareto-based library design; and John Mitchell (St Andrews) reviewed available methods for solubility prediction and compared their relative performance.


  • Robert Brown (Accelrys, UK and USA)
    Quantifying Model Errors Using Similarity to Training Data [Slides] [Abstract]
  • Scott L. Cockroft (University of Edinburgh, UK)
    The interplay of geometry and solvation on non-covalent interactions [Abstract]
  • Willem van Hoorn (Pfizer, Sandwich, UK; Accelrys, UK)
    Fast multi-objective design of combinatorial libraries using Pareto ranking [Abstract]
  • Andrew Hopkins (University of Dundee, UK)
    The future of drug design? What can computational chemistry contribute to drug innovation [Abstract]
  • Graeme Milligan (University of Glasgow, UK)
  • John Mitchell (University of St. Andrews, UK)
    Computational Prediction of Aqueous Solubility [Slides] [Abstract]
  • Ben Tehan (Heptares Therapeutics, UK)
    SBDD and Biophysical Screening for GPCRs [Slides] [Abstract]
  • Malcolm Walkinshaw (University of Edinburgh, UK)
    Finding and Filling Allosteric Pockets [Abstract]


Presentation: Robert Brown, Dana Honeycutt, Sarah Aaron
Quantifying Model Errors Using Similarity to Training Data

Accelrys, UK and USA [Slides]

When making a prediction with a statistical model, it is not sufficient to know that the model is “good”, in the sense that it is able to make accurate predictions on test data. It is also important to ask: how good is the model for a specific sample whose properties we wish to predict? Stated another way: is the sample within the model’s domain of applicability, and to what degree? A variety of studies have been done on determining appropriate measures to address this question [1-4].

In this talk we focus on a derivative question: Can we determine an applicability domain measure suitable for deriving quantitative error bars – that is, error bars which accurately reflect the expected error when making predictions for specified values of the domain measure? Such a measure could then be used to provide an indication of the confidence in a given prediction (i.e. the likely error in a prediction, based on the degree to which the test compound lies within the model’s domain of applicability).

Ideally, we wish such a measure to

  • be simple to calculate and to understand,
  • apply to models of all types, including classification and regression models for both molecular and non-molecular data, and
  • be free of adjustable parameters.


Consistent with recent work by others [5-6], the measures we have seen that best meet these criteria are distances to individual samples in the training data. In this talk we will describe our attempts to construct a recipe for deriving quantitative error bars from these distances.
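One way such a recipe could look is sketched below: compute each test compound's distance to its nearest training sample, bin the test set by that distance, and take the RMS prediction error within each bin as the quantitative error bar for that range of the domain measure. This is an illustrative reconstruction only; the function names, the nearest-neighbour choice and the equal-width binning are assumptions, not the authors' actual recipe.

```python
import math

def nearest_neighbor_distance(query, training_set):
    """Distance from a query sample to its nearest training sample
    in descriptor space (Euclidean, for illustration)."""
    return min(math.dist(query, x) for x in training_set)

def error_bars_by_distance(test_samples, errors, training_set, n_bins=3):
    """Bin test samples by distance to the training data and report the
    RMS prediction error within each bin (the 'quantitative error bar').
    Returns one RMS value per bin, or None for empty bins."""
    dists = [nearest_neighbor_distance(q, training_set) for q in test_samples]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all dists equal
    bins = [[] for _ in range(n_bins)]
    for d, e in zip(dists, errors):
        i = min(int((d - lo) / width), n_bins - 1)  # clamp the max distance
        bins[i].append(e)
    return [math.sqrt(sum(e * e for e in b) / len(b)) if b else None
            for b in bins]
```

On synthetic data where prediction error grows with distance from the training set, the per-bin RMS values rise across the bins, which is exactly the behaviour a useful domain measure should exhibit.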


  1. “Methods for Reliability and Uncertainty Assessment and for Applicability Evaluations of Classification- and Regression-Based QSARs,” L. Eriksson, J. Jaworska, A.P. Worth, M.T.D. Cronin, R.M. McDowell, P. Gramatica, Environmental Health Perspectives, 111, 1361 (2003).
  2. “The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models,” A. Tropsha, P. Gramatica, V.K. Gombar, QSAR Comb. Sci., 22, 69 (2003).
  3. “QSAR applicability domain estimation by projection of the training set descriptor space: a review,” J. Jaworska, N. Nikolova-Jeliazkova, T. Aldenberg, Altern. Lab Anim. 33, 445-59 (2005).
  4. “A Measure of Domain of Applicability for QSAR Modelling Based on Intelligent K-Means Clustering,” R.W. Stanforth, E. Kolossov, B. Mirkin, QSAR & Combinatorial Science, 26, 837 (2007).
  5. “Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR,” R.P. Sheridan, B.P. Feuston, V.N. Maiorov, and S.K. Kearsley, J. Chem. Inf. Comput. Sci., 44, 1912 (2004).
  6. “Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models,” H. Dragos, M. Gilles, V. Alexandre, J. Chem. Inf. Comput. Sci., 49, 1762 (2009).


Presentation: Scott L. Cockroft
The interplay of geometry and solvation on non-covalent interactions

University of Edinburgh, UK

Supramolecular systems provide unique opportunities for studying chemical and biological processes.

Detailed study of protein-ligand interactions in biological systems is often hindered by the presence of complicated arrays of interactions featuring multiple molecular contacts and solvent molecules. Furthermore, the precise geometry of an interaction of interest is hard to determine in conformationally dynamic biomolecules. Such complications can be side-stepped using the tools provided by organic synthesis and physical organic chemistry [1]. The use of synthetic supramolecular complexes [2] and two-state folding molecules [3,4] for the systematic quantification of non-covalent interactions will be described. I will show how the information gleaned from these studies can be used to develop our theoretical understanding of non-covalent interactions and solvation.


    1. Chemical double-mutant cycles: Dissecting non-covalent interactions. S. L. Cockroft, C. A. Hunter. Chem. Soc. Rev. (2007) 36, 172-188
    2. Electrostatic control of aromatic stacking interactions. S. L. Cockroft, C. A. Hunter, K. R. Lawson, J. Perkins, C. J. Urch. J. Am. Chem. Soc. (2005) 127, 8594-8595
    3. Desolvation tips the balance: Solvent effects on aromatic interactions. S. L. Cockroft, C. A. Hunter. Chem. Commun. (2006) 36, 3806-3808 (cover article)
    4. The interplay of solvent and substituent effects on edge-to-face aromatic interactions. S. L. Cockroft, C. A. Hunter (2008) submitted in response to F. Diederich et al. Chem. Commun. (2008)
    5. Modular multi-level circuits from immobilized DNA-based logic gates. B. M. Frezza, S. L. Cockroft, M. R. Ghadiri. J. Am. Chem. Soc. (2007), 129, 14875-14879. Featured in: Nature
    6. A single-molecule nanopore device detects DNA polymerase activity with single-nucleotide resolution. S. L. Cockroft, J. Chu, M. Amorin, M. R. Ghadiri. J. Am. Chem. Soc. (2008), 130, 818-820. Featured in: Nature | Nature Nanotechnology | ACS Chemical Biology


Presentation: Willem van Hoorn
Fast multi-objective design of combinatorial libraries using Pareto ranking

Pfizer, Sandwich, UK; Accelrys, UK

Pareto sorting is a well-established method for problems that require a trade-off between conflicting objectives. A typical example from medicinal chemistry is that an increase in the potency of a series of compounds is often accompanied by decreasing solubility and/or metabolic stability. In practice the ideal compound often does not exist: there is no compound that is very potent and very soluble and very stable. Pareto ranking can help identify the compound or compounds that are the best compromises between the conflicting attributes. Pareto ranking is based on the concept of dominance: if compound A is more potent and more soluble and more stable than compound B, it dominates compound B. If two compounds are equipotent, and one is more soluble but the other is more stable, they do not dominate each other. The compounds that are not dominated by any other compound are the best compromises, and together they form the Pareto frontier. The compounds that are dominated only by the frontier compounds form the next front, and so on. Pareto ranking is an N² problem: doubling the number of compounds to rank increases the length of the calculation by a factor of four, so it becomes intractable for large N.
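The dominance relation and the front-by-front peeling described above can be sketched in a few lines. This is the naive N² formulation, for illustration only (the speaker's speed-ups are not shown); treating every objective as "larger is better" is an assumption.

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every objective and
    strictly better in at least one (larger is better for all objectives)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_fronts(compounds):
    """Naive O(N^2) Pareto ranking over objective tuples, e.g.
    (potency, solubility, stability). Peels off successive
    non-dominated fronts: fronts[0] is the Pareto frontier."""
    remaining = list(compounds)
    fronts = []
    while remaining:
        front = [c for c in remaining
                 if not any(dominates(o, c) for o in remaining if o is not c)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts
```

For example, with compounds scored as (potency, solubility, stability) tuples, a compound that beats another in all three objectives pushes it out of the frontier, while two compounds that trade solubility against stability sit on the same front.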

Combinatorial chemistry is an efficient method for making many compounds from a limited number of reagents. For instance, 100 amines and 20 acids can form 2000 different amides. This collection of 2000 amides forms a virtual library: compounds that could be made given the available starting material and a validated reaction protocol. The size of a virtual library can easily run into the tens of millions; it is impossible to make, purify, store and put all of these through a biological assay. At Pfizer, as at most other pharmaceutical companies, models have been generated to predict activity, solubility, metabolic stability, etc. This presentation shows how these models can be applied together with Pareto ranking to identify the best (predicted) compromises in large virtual libraries. Two approaches will be shown that speed up the Pareto ranking so that even very large libraries can be processed.
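The enumeration arithmetic above (100 amines × 20 acids = 2000 amides) amounts to a Cartesian product over the reagent lists; a minimal sketch with hypothetical reagent labels:

```python
from itertools import product

# hypothetical reagent labels standing in for real amine and acid structures
amines = [f"amine_{i}" for i in range(100)]
acids = [f"acid_{j}" for j in range(20)]

# every (amine, acid) pair corresponds to one virtual amide product
virtual_library = [(am, ac) for am, ac in product(amines, acids)]
```

The same product over, say, 5000 reagents on each side already yields 25 million virtual compounds, which is why the predicted-property filtering and fast Pareto ranking described above matter.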

Presentation: Andrew Hopkins
The future of drug design? What can computational chemistry contribute to drug innovation

University of Dundee, UK

The pharmaceutical industry has witnessed a decline of $593 billion (65%) in its market capitalization over the past 8 years. With further falls in revenues expected as the patents on many key blockbuster drugs expire over the next few years, the industry’s response has been cost cutting, job losses and a move towards biological therapeutics. Given this financial environment, how can computational chemistry and chemoinformatics contribute to innovation when efficiency and effectiveness are of increasing importance? We will open the conference with a lecture that explores the scientific challenges facing computational chemistry and the opportunities presented by recent advances. In particular, we will focus on how computational approaches can work in tandem with experimental methods to improve the efficiency and effectiveness of drug discovery.


Presentation: Graeme Milligan

University of Glasgow, UK


Presentation: John Mitchell
Computational Prediction of Aqueous Solubility

University of St. Andrews, UK [Slides]

This talk will discuss both empirical informatics methods, such as QSPR and machine learning, and also physics-based theoretical approaches involving quantum chemistry, thermodynamic cycles and molecular simulation. The presentation will contrast the two kinds of approach, discussing the strengths and weaknesses of each, and their different areas of applicability. Also included is a discussion of how successful current methods really are.

Presentation: Ben Tehan
SBDD and Biophysical Screening for GPCRs

Heptares Therapeutics, UK [Slides]

G protein-coupled receptors (GPCRs) constitute a significant part of the pharmaceutical industry portfolio; unfortunately, GPCRs as a target class are notoriously difficult to prosecute. The fact that they are membrane bound, combined with their conformational flexibility, heterogeneity and instability in detergents, makes them particularly hard to crystallize and limits their use in screening applications. The current scarcity of GPCR crystal structures also limits the range and reliability of the in silico techniques that can be applied.

At Heptares we have developed a proprietary technology that expedites the study of GPCRs by dramatically stabilizing these important receptors outside of the cell membrane. This technology enables us to make purified, stabilized and functional GPCRs known as StaRs (Stabilized Receptors).

The StaR construction technology and its uses will be presented, along with a case study demonstrating applicability in both biophysical screening and in silico design.

Presentation: Malcolm Walkinshaw, Douglas Houston, Kun-Yi Hsin, Wissam Mehio, Steven R. Shave, Paul Taylor
Finding and Filling Allosteric Pockets

University of Edinburgh, UK

We have developed a set of computational tools and procedures that aid ligand discovery. Our binding-site identification algorithm (STP) uses surface atom properties to identify additional ligand binding sites or allosteric pockets. The central relational database EDULISS holds structural, physicochemical and pharmacophoric properties of over 5 million commercially available compounds, and its versatile web interface allows easy access.

In addition to routine compound selection, we can select compounds on the basis of a geometrical distribution of atom or group types. Additional shape and distance descriptors are held in pre-calculated bit strings, which permit fast and efficient searching, and our efficient shape similarity program (UFSRAT: Ultrafast Searching with Atom Types) has been integrated within the database system. Our docking and mining program (LIDAEUS) uses a multiconformer version of the database and, with an efficient parallelized implementation, can dock and rank 5 million compounds in under 10 hours. Examples of the use of these programs in various ligand discovery projects will be presented.
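The fast searching over pre-calculated bit strings mentioned above can be illustrated with a generic Tanimoto comparison; the actual UFSRAT descriptors and encoding are not described here, so the integer-bit-string representation and function names below are stand-in assumptions, not the EDULISS implementation.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two descriptor bit strings stored as
    Python integers (one bit per pre-calculated feature): bits set in
    both, divided by bits set in either."""
    both = bin(a & b).count("1")
    either = bin(a | b).count("1")
    return both / either if either else 0.0

def rank_by_similarity(query, library):
    """Return library bit strings sorted by decreasing similarity to the
    query; bitwise AND/OR on packed integers is what makes such screens
    fast over millions of compounds."""
    return sorted(library, key=lambda fp: tanimoto(query, fp), reverse=True)
```

For example, a query fingerprint sharing two of four set bits with one library entry and one of four with another ranks the first entry higher.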