Autumn Meeting 2010

Hinxton, Cambridge

The UKQSAR and ChemoInformatics Group Autumn 2010 Meeting was held on 12th October at the Wellcome Trust Conference Centre, Genome Campus at Hinxton near Cambridge.

We have a diverse and interesting programme arranged. One session will be on “not for profit” initiatives and includes a talk by Tom Heightman from SGC on the building of chemogenomic platforms for epigenetic protein families, a talk on workflow tools by Mike Bodkin from Lilly, and a talk from EBI on uses of chemogenomics databases. In addition, there will be talks on the application of QSAR and pharmacophore methods from GSK and Lilly, and on a comparison of fingerprint methods carried out at Schrodinger. Andreas Bender from the Unilever Centre will talk about modelling using proteochemometric approaches.


  • Nick Barton (GSK, UK)
    A Sparse Approach to Lead Optimisation [Slides] [Abstract]
  • Andreas Bender (Unilever Centre, University of Cambridge)
    From Single-Target Models to Multiple-Target Models – Extrapolating in Target Space Using Proteochemometrics Approaches [Slides] [Abstract]
  • Patricia Bento (EMBL-EBI, UK)
    Exploring the property space of bioactive peptides [Slides] [Abstract]
  • Jas Bhachoo (Schrodinger, UK)
    Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments [Slides] [Abstract]
  • Mike Bodkin (Lilly, UK)
    A Stitch in KNIME Saves Nine: Strategies for Design in Medicinal Chemistry [Slides] [Abstract]
  • Tom Heightman (Structural Genomics Consortium, Oxford University)
    Drugging the Epigenome: Protein Family Systematic Chemical Biology [Abstract]
  • Anne Hersey (EMBL-EBI, UK)
    ChEMBLDB – A Resource for Drug Discovery [Slides] [Abstract]
  • Juliette Pradon (Lilly, UK)
    A New Approach for Chemotype Enrichment in Ligand Based Design [Slides] [Abstract]


Presentation: Nick Barton
A Sparse Approach to Lead Optimisation

GSK, UK [Slides]

Experimental Design (DOE) is a well-established technique for the efficient optimisation of many types of processes. Here, the application of a particular form of DOE to design ‘Sparse’ chemical arrays in the context of a Lead Optimisation campaign will be explored. The programme background, along with the rationale for the design and synthesis of a Sparse two-dimensional array, will be described. Analysis of the resulting measured biological data led to the generation of a robust QSAR model and high-quality predictions for additional follow-up compounds. The Sparse array enabled the programme to conclude that the chemical space had been thoroughly exploited and to support project transition for the series.

Presentation: Andreas Bender
From Single-Target Models to Multiple-Target Models – Extrapolating in Target Space Using Proteochemometrics Approaches

Unilever Centre, University of Cambridge [Slides]

The early phases of drug discovery often employ in silico models to rationalize structure-activity relationships and to predict the activity of novel compounds. However, the predictive performance of these models is not always acceptable, and the reliability of prospective predictions – both for novel compounds and for related protein targets – is in many cases limited. In this work we present a large-scale prospective validation of a proteochemometric model on inhibitors of 14 HIV reverse transcriptase (HIV RT) mutants. A second model was built on ligands of the adenosine receptor subtypes and validated on a smaller dataset.

We will show that activities against novel mutants can be predicted successfully in many cases, and discuss approaches that use information gained from the model to devise chemicals with the optimal activity profile against a desired set of protein sequences. The approach is generally applicable to a set of closely related targets, and so it can also be applied to a variety of novel datasets.

Presentation: Patricia Bento
Exploring the property space of bioactive peptides

EMBL-EBI, UK [Slides]

Peptides are the most important single chemical class of bioactive compounds. They represent a significant fraction of the drug market and are used therapeutically in a wide range of diseases including diabetes, obesity, cancer, hypertension and neurodegenerative diseases, and as antibiotics and antiviral agents. In this talk, we will explore the property space of bioactive peptides as well as of their constituent amino acids, and map the diversity of a large set of synthetically accessible peptides.

Presentation: Jas Bhachoo, Madhavi Sastry, Jeffrey F. Lowrie, Steven L. Dixon, Woody Sherman
Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments

Schrodinger, UK [Slides]

A systematic virtual screening study on eleven pharmaceutically relevant targets has been conducted to investigate the interrelation between eight 2D fingerprinting methods, 13 atom-typing schemes, 13 bit scaling rules, and 12 similarity metrics using the new cheminformatics package Canvas. In total, 157,872 virtual screens were performed to assess the ability of each combination of parameters to identify actives in a database screen.

In general, fingerprint methods such as MOLPRINT2D, Radial, and Dendritic that encode information about local environment beyond simple linear paths outperformed other fingerprint methods. Atom-typing schemes with more specific information turned out to be generally superior to more generic atom-typing schemes. Enrichment factors across all targets were improved considerably with the best settings, although no single set of parameters performed optimally on all targets. The size of the addressable bit space for the fingerprints was also explored and it was found to have a substantial impact on enrichments.
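As a rough illustration of what a similarity metric does in a fingerprint-based screen, the sketch below ranks a small library against a query using the Tanimoto coefficient. It is not the Canvas implementation: the fingerprints are represented simply as sets of "on" bit positions, and the names `tanimoto`, `rank_by_similarity` and the toy library are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[str]:
    """Return library compound names, most similar to the query first."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Toy data: bit positions are made up for illustration only.
query_fp = {1, 2, 3}
toy_library = {
    "cmpd_A": {2, 3, 4},   # shares 2 of 4 distinct bits with the query
    "cmpd_B": {1, 2, 3},   # identical to the query
    "cmpd_C": {7, 8, 9},   # shares no bits with the query
}
ranking = rank_by_similarity(query_fp, toy_library)
```

In a study of the kind described, the fingerprint type, atom typing, bit scaling and similarity metric would each be varied; here only the metric step is shown.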

Presentation: Mike Bodkin
A Stitch in KNIME Saves Nine: Strategies for Design in Medicinal Chemistry

Lilly, UK [Slides]

The talk provides an introduction to the application of the KNIME workflow tool through a spectrum of computational medicinal chemistry scenarios. Given a literature dataset, how similar are the compounds? How can I build and cross-validate QSAR models? Execute a simple matched-pair analysis? Perform molecular dockings, virtual library enumeration, de novo and multi-objective design? KNIME provides a drag & drop open-source workbench to bring together best-of-breed tools from pharma, academia and the software houses in a common environment suitable for use by computational and medicinal chemists alike.

Presentation: Tom Heightman
Drugging the Epigenome: Protein Family Systematic Chemical Biology

Structural Genomics Consortium, Oxford University

Working within a family of related proteins allows scientists to apply learnings from one member to accelerate chemistry and biology research on related targets. This approach has enhanced efficiency in a number of protein families including kinases, where the accumulation of protein structures, small molecule structure-activity relationships and cellular assay techniques over more than a decade provides a platform for the rapid generation of new leads, an understanding of selectivity determinants, and an increasingly integrated understanding of cellular signalling pathways.

Many of the epigenetic proteins that effect or detect post-translational modifications on histones have only recently been identified, and so a body of biological and chemical data has not yet accumulated. This presents an opportunity for high-throughput protein production and crystallography techniques to be applied in concert with chemistry and cellular biology to rapidly and systematically explore new epigenetic protein families. In this talk I will review our progress on building chemogenomic platforms for bromodomains and histone lysine demethylases, including structural biology, development of biophysical and biochemical assays, knowledge-based hit identification chemistry, and cellular assays.

Presentation: Anne Hersey
ChEMBLDB – A Resource for Drug Discovery

EMBL-EBI, UK [Slides]

Large amounts of data are now available to researchers in publicly accessible databases such as the ChEMBL database (chembldb), which brings together chemical and bioactivity data. This enables researchers to easily identify compounds tested on specific biological targets, or the bioactivity data on compounds with specific structures or substructures. The focus of this talk will be an analysis of the information that can be obtained from ChEMBL, including potency, selectivity and ADMET properties.

Presentation: Juliette Pradon
A New Approach for Chemotype Enrichment in Ligand Based Design

Lilly, UK [Slides]

Literature suggests that pharmacophore methods can often be used in virtual screening for the discovery of novel scaffolds in a cost-effective manner. However, just how well such programs can identify ‘actives’ of novel chemotypes, and how this can be modulated, is poorly characterised. In this presentation two commercially available pharmacophore programs, Phase from Schrodinger and MOE, were employed for a retrospective virtual screening study on the DUD (Directory of Useful Decoys) [1] and MUV (Maximum Unbiased Validation) [2] datasets. Two metrics, the enrichment factor and the Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) [3], were used to compare the performance of these pharmacophore programs in addressing the issues of early enrichment as well as chemotype enrichment. A novel strategy for chemotype enrichment will be reported.


    1. N. Huang et al., J. Med. Chem., 2006, 49 (23), 6789-6801.
    2. S. G. Rohrer and K. Baumann, J. Chem. Inf. Model., 2009, 49 (2), 169-184.
    3. J.-F. Truchon and C. I. Bayly, J. Chem. Inf. Model., 2007, 47, 488-508.
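
For readers unfamiliar with the enrichment factor mentioned in the last abstract, a minimal sketch follows. It assumes the common definition (hit rate in the top-ranked fraction divided by the hit rate in the whole database) and is not taken from the presentation; the function name, the ceiling rule for the cutoff and the toy ranking are all illustrative assumptions.

```python
import math
from typing import Sequence


def enrichment_factor(ranked_labels: Sequence[int], fraction: float) -> float:
    """Enrichment factor for the top `fraction` of a ranked screen.

    ranked_labels: 1 for an active, 0 for a decoy, ordered best score first.
    fraction: portion of the ranked list inspected, e.g. 0.01 for the top 1%.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Assumed convention: round the cutoff up so at least one compound is inspected.
    n_top = max(1, math.ceil(n_total * fraction))
    hits_top = sum(ranked_labels[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


# Toy ranking of 10 compounds containing 3 actives, two of them ranked first.
toy_ranking = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
ef_top20 = enrichment_factor(toy_ranking, 0.2)
ef_all = enrichment_factor(toy_ranking, 1.0)
```

An enrichment factor of 1 corresponds to random selection, which is why the value over the whole list is always 1. The metric counts actives only, so a chemotype-aware variant would need some additional grouping of actives by scaffold.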