Autumn Meeting 2008

Dotmatics and Cresset BioMolecular Discovery, UK
BioPark Hertfordshire, Welwyn Garden City, Herts., AL7 3AX, UK

Dotmatics and Cresset look forward to welcoming you to the Autumn 2008 UK QSAR and Chemoinformatics meeting in November. We have an excellent line-up of speakers, so it promises to be an interesting and lively day.


  • Craig Bruce (University of Nottingham)
    QSAR modeller seeks meaningful relationship [Slides] [Abstract]
  • Ijen Chen (Vernalis Ltd., Cambridge, UK)
    Design of fragment libraries – Status and challenges [Slides] [Abstract]
  • Richard Jackson (Institute of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK)
    A new probabilistic model for binding site similarity analysis: Applications in understanding ligand cross-reactivity and the functional classification of the protein kinase family [Slides] [Abstract]
  • James Lumley (Arrow Therapeutics)
    NMR and Virtual Screening for Hepatitis-C NS5b Polymerase Inhibitors [Slides] [Abstract]
  • Mark Mackey (Cresset BioMolecular Discovery)
    Bioisosteric replacement using molecular fields [Slides] [Abstract]
  • Chris Murray (Astex Therapeutics)
    Group efficiencies: A simple way of monitoring the optimisation of fragments into leads [Abstract]
  • Dan Ormsby (Dotmatics Limited, 1-2 Thorley Hall Stables, Bishop’s Stortford, Herts. CM23 4BE)
    Informatics tools for SAR analysis [Abstract]
  • Martin Slater (BioFocus DPI)
    Field based library design for kinase and ion channel targets [Abstract]
  • Mike Sternberg (Imperial College)
    Logic-based drug discovery [Abstract]


Presentation: Craig Bruce, S.D. Pickett, J.D. Hirst
QSAR modeller seeks meaningful relationship

University of Nottingham [Slides]

A good QSAR model comprises several components. Predictive accuracy is paramount, but it is not the only important aspect. In addition, one should apply robust and appropriate statistical tests to the models to assess their significance or the significance of any apparent improvements. The real impact of a QSAR, however, perhaps lies in its chemical insight and interpretation, an aspect which is often overlooked.

Any insight into the relationship between descriptors and structure can be used to further our understanding, but obtaining this insight is not always as straightforward as calculating predictive accuracy. Interpretation depends on the classifier. For example, a decision tree is simple to interpret but does not generally produce the most predictive models. Conversely, support vector machines offer excellent predictive capability but generate models that are difficult to interpret.

Random forests offer predictive capability comparable to that of support vector machines, but their interpretation is complicated by the ensemble of trees. Therefore, to obtain useful interpretation from a random forest we have employed a selection of tools. We present some of these tools applied to literature data sets.

Presentation: Ijen Chen, Rod Hubbard
Design of fragment libraries – Status and challenges

Vernalis Ltd., Cambridge, UK [Slides]

Fragment-based methods have become established over the past ten years as a powerful approach in structure-based lead discovery, with a number of compounds now entering clinical trials (1-3). These successes have led to the methods being adopted to varying degrees within most pharmaceutical companies. The development of fragment-based methods has relied upon two innovations. The first is the improvement of (usually biophysical) methods for detecting the weak, often millimolar, binding of compounds to a target. The second is the development of hit-expansion methods that evolve such weak hits into compounds potent enough to be on scale in target binding assays.

As with any screening approach, the design of the fragment library is crucial: if the library does not contain appropriate compounds, the screen will yield no hits. However, the design is also constrained by the methods used to detect binding and by how the fragments are going to be used. This presentation will review the general issues associated with designing libraries for fragment screening, including an overview of the approaches adopted in different companies (4). In addition, I will briefly summarise our experience of utilising fragments for drug discovery projects.


  1. Hubbard et al., Curr Top Med Chem. 2007, 7(16):1568-81
  2. Hajduk and Greer, Nat Rev Drug Discov. 2007, 6(3):211-9
  3. Congreve et al., J Med Chem. 2008, 51(13):3661-80
  4. Hubbard et al., Curr Opin Drug Discov Devel. 2007, 10(3):289-97


Presentation: Richard Jackson
A new probabilistic model for binding site similarity analysis: Applications in understanding ligand cross-reactivity and the functional classification of the protein kinase family

Institute of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK [Slides]

The large-scale comparison of protein-ligand binding sites is problematic, in that measures of structural similarity are difficult to quantify and are not easily understood in terms of statistical similarity that can ultimately be related to structure and function. We present a binding site matching score, the Poisson Index (PI), based upon a well-defined statistical model (1). PI requires only the number of matching atoms between two sites and the size of each site, the same information used by the Tanimoto Index (TI), a comparable and widely used measure of molecular similarity. Despite the difficulty of determining a biological ‘ground truth’ for binding site similarity, we conclude that PI is a suitable measure of binding site similarity and could form the basis for a binding site classification scheme comparable to existing protein domain classification schemes.
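To make the comparison with the Tanimoto Index concrete, the sketch below computes TI from exactly the three numbers the abstract says both scores use: the matched-atom count and the two site sizes. It does not implement the Poisson Index itself, whose statistical model is given in reference (1); the function name and the example counts are illustrative only.

```python
def tanimoto_index(n_match: int, n_a: int, n_b: int) -> float:
    """Tanimoto Index for two binding sites.

    n_match: number of atoms matched between site A and site B
    n_a, n_b: number of atoms in site A and site B
    TI = matches / (size of the union) = n_match / (n_a + n_b - n_match)
    """
    if not 0 <= n_match <= min(n_a, n_b):
        raise ValueError("n_match must lie between 0 and the smaller site size")
    union = n_a + n_b - n_match
    # Two empty sites have no atoms in common; treat their similarity as 0.
    return n_match / union if union else 0.0


# Illustrative counts: sites of 40 and 50 atoms sharing 30 matched atoms.
print(tanimoto_index(30, 40, 50))  # 30 / 60 = 0.5
```

Because TI depends only on the overlap fraction, it cannot say how likely a given number of matches is by chance for sites of those sizes; a probabilistic score such as PI is intended to address that.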

We have recently undertaken a large-scale comparison of protein kinase ATP-binding sites. This has allowed us to discover binding site similarities between protein kinase sub-families that are not evident from sequence similarity alone. We propose a classification of the protein kinase family based on the similarity of their binding sites. Not only does this classification highlight features that are important for the potency and selectivity of kinase inhibitors, but it also allows us to rationalise cross-reactivity among the protein kinases.

  1. Davies JR, Jackson RM, Mardia KV, Taylor CC. The Poisson Index: a new probabilistic model for protein ligand binding site similarity. Bioinformatics, 2007, 23:3001-3008.


Presentation: James Lumley
NMR and Virtual Screening for Hepatitis-C NS5b Polymerase Inhibitors

Arrow Therapeutics [Slides]

Applying molecular modelling techniques early on novel biological targets that have limited apo protein structure data but little or no ligand information can be challenging. Results are presented from a small 1D NMR screen run in early 2004 against such a target, the isolated HCV NS5b polymerase protein. Ligand information gained from this NMR experiment in 2004 is contrasted with the 2008 patent literature to highlight the power of the NMR technique for selecting novel chemistry. The various computational methods used to design and exploit the NMR experiment are discussed, with an emphasis on the complexities of trying to predict protein-ligand binding modes algorithmically.

Presentation: Mark Mackey
Bioisosteric replacement using molecular fields

Cresset BioMolecular Discovery [Slides]

The use of molecular fields for comparing and scoring molecules and for virtual screening is a well-established technique for finding new series of active molecules. However, in many cases it would be useful to get suggestions on ways to modify existing series. Tools for finding ring replacements or bioisosteres have often focussed on matching fragment properties, yet a fragment's properties often depend on its environment. We have solved this problem by developing a method for locating bioisosteric replacements and scoring them in product space using molecular field similarity. The effectiveness of this method will be demonstrated on sets of COX-2 and CCR5 actives.

Presentation: Chris Murray
Group efficiencies: A simple way of monitoring the optimisation of fragments into leads

Astex Therapeutics

Kuntz et al. were the first to suggest that normalising potency with respect to molecular size was a good way of monitoring fragment or hit progression through the discovery process. Hopkins et al. subsequently applied the method in a drug discovery context and coined the term ligand efficiency. At Astex, we routinely use ligand efficiency to compare fragments and assess their progressability. Problems and issues associated with the use of ligand efficiency are discussed.
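As an illustration of size-normalised potency, the sketch below assumes the commonly quoted definition of ligand efficiency, LE = -ΔG / N_heavy with ΔG = RT ln(Kd), expressed in kcal/mol per heavy atom at 298.15 K. The abstract does not state which exact convention Astex uses, so the temperature, units and the example compounds here are assumptions for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol: dG = RT ln(Kd), negative for Kd < 1 M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Ligand efficiency: -dG divided by the number of heavy (non-hydrogen) atoms."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -binding_free_energy(kd_molar, temp_k) / heavy_atoms


# Hypothetical compounds: a 1 mM fragment (12 heavy atoms) and a 10 nM lead (30 heavy atoms).
print(round(ligand_efficiency(1e-3, 12), 2))  # 0.34
print(round(ligand_efficiency(1e-8, 30), 2))  # 0.36
```

The example shows why the measure is useful for fragments: a millimolar binder can be almost as efficient per atom as a nanomolar lead, even though its raw potency is five orders of magnitude weaker.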

Recently Verdonk and Rees have proposed the use of group efficiencies to judge the value of particular changes made to molecules. The approach relies on deriving “normalised” group contributions between pairs of molecules (i.e., a normalised Free-Wilson analysis) and assessing whether the changes are useful. The method has proved popular with medicinal chemists in assessing the “value” of disparate changes to the structure. At Astex we typically generate hundreds of experimentally determined protein-ligand complexes on targets of interest. Recently we have used these structures together with the associated ligand potencies to automatically derive group efficiencies and provide a molecular visualisation for medicinal chemists. The approach adopted will be discussed.
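A minimal sketch of a group efficiency calculation for one matched pair follows, assuming the definition GE = -ΔΔG / ΔN_heavy, where ΔΔG is the change in binding free energy and ΔN_heavy the number of heavy atoms added between a parent and an analogue. The function names and example values are hypothetical, and the sketch only handles additions (ΔN_heavy > 0); it is not the automated, structure-based pipeline described in the abstract.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol: dG = RT ln(Kd)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def group_efficiency(kd_parent: float, ha_parent: int,
                     kd_analogue: float, ha_analogue: int,
                     temp_k: float = 298.15) -> float:
    """Free-energy gain per added heavy atom for a parent -> analogue change."""
    delta_ha = ha_analogue - ha_parent
    if delta_ha <= 0:
        raise ValueError("this sketch only handles changes that add heavy atoms")
    ddg = delta_g(kd_analogue, temp_k) - delta_g(kd_parent, temp_k)
    return -ddg / delta_ha


# Hypothetical pair: a 20-heavy-atom parent (Kd = 10 uM) gains a 3-heavy-atom
# group and improves to Kd = 100 nM.
print(round(group_efficiency(1e-5, 20, 1e-7, 23), 2))  # 0.91
```

Because the efficiency is attributed to the added group alone, it can be compared directly with the ligand efficiency of the parent to judge whether a particular change was worth its extra atoms.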

Presentation: Dan Ormsby
Informatics tools for SAR analysis

Dotmatics Limited, 1-2 Thorley Hall Stables, Bishop’s Stortford, Herts. CM23 4BE

Despite the development of many computational techniques for analysing activity relationships within a series of compounds, there remains a need to relay these results to teams of medicinal chemists as simple visualisations. A new platform for presenting chemistry-rich data will be presented.

This software allows the interactive display of chemical structures, physicochemical properties and categorical information. Existing datasets can be further enriched with properties calculated by built-in or external methods such as web services. We will demonstrate the use of interpretable fingerprints to elucidate the structural underpinning of activity relationships. Novel visualisation and analysis methods can be implemented via the Python scripting interface within the software.

Presentation: Martin Slater
Field based library design for kinase and ion channel targets

BioFocus DPI

GPCRs, ion channels and kinases have been key areas of focus for library design at BioFocus DPI. Protein-family-specific SAR, chemogenomic information and, where applicable, X-ray structure data have been employed to enable the rational design of ligands. We have recently applied Cresset’s field-based technology, in this case a ligand-centric approach, to complement our existing technologies. Two library design case studies will be described: the design of the FieldFocus™ Ion Channel (FFI™) library, and the evolution of the FieldFocus™ Kinase (FFK™) library design concept.

Presentation: Mike Sternberg, Ata Amini, David Gough, Paul Shrimpton, Huma Lodhi, Stephen H Muggleton
Logic-based drug discovery

Imperial College

This talk will present several applications of logic-based machine learning to drug discovery. Over the last few years we have developed an approach that combines inductive logic programming (ILP), with its benefit of comprehensible rules, with regression to derive quantitative structure-activity relationships. The first set of studies combined ILP with support vector machines in an approach known as SVILP. Studies using SVILP include developing a quantitative structure-activity relationship for thermolysin inhibitors, quantitative toxicology prediction (1) and the generation of a system-specific scoring function for protein-ligand interactions (2). These concepts formed the basis of INDDEx, the logic-based drug discovery approach being used at Equinox Pharma. At Equinox, INDDEx has been applied to ligand-based in silico screening to identify chemically novel hits against two GPCRs. Benefits of the INDDEx approach include the capacity to exploit information from a large number of positive and negative examples and the lack of a requirement for molecular superposition. INDDEx also provides chemically understandable rules that explain the basis for activity, which could assist in a hit-to-lead programme.


  1. Amini, A., Muggleton, S.H., Lodhi, H. and Sternberg, M.J. (2007) A novel logic-based approach for quantitative toxicology prediction, J Chem Inf Model, 47, 998-1006.
  2. Amini, A., Shrimpton, P.J., Muggleton, S.H. and Sternberg, M.J. (2007) A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming, Proteins, 69, 823-831.