UKQSAR Spring 2017 Meeting – Talk and Poster Abstracts

Talk Abstracts

Innovation in Small-Molecule-Druggable Chemical Space: Where are the Original Modulators of New Targets and New Bioactive Molecular Frameworks Published? A Comparison between Patents and Scientific Literature (Stephanie Ashenden, University of Cambridge)

The publication of novel small molecules, along with their associated targets, has understandably increased over time due to the discovery of a greater number of druggable targets, as well as constantly improving technology such as high-throughput screening (HTS). In this study we have analysed bioactive structures and their associated targets, comparing them between patents and scientific literature that are accessible through the ChEMBL and GOSTAR databases. To assess whether the target class or year has affected when and where target annotations (a bioactive compound and its associated target) are first published, the data has been binned. It was found that in the majority of cases, the first bioactive compound associated with a particular target is published in scientific literature prior to being published in patents. Furthermore, these compound-target annotations are often published in both scientific literature and patents (44%) or only in scientific literature (53%) rather than in patents only. The novelty of compounds and their frameworks of varying complexities has also been investigated. This shows that structures tend to be primarily published in patents. Additionally, they tend to be published solely in scientific literature (49%) or only in patents (47%) rather than in both scientific literature and patents. The same trends have been found when analysing the molecular and topological frameworks.

The publication of novel small molecules, along with their associated targets, has understandably increased over time due to the discovery of a greater number of druggable targets, as well as constantly improving technology such as high-throughput screening (HTS). In this study we have analysed bioactive structures and their associated targets, comparing them between patents and scientific literature that are accessible through the ChEMBL and GOSTAR databases, to observe where the original modulators of new targets and new bioactive molecular frameworks are first published. A systematic analysis of the sources of novel bioactive compounds over time, as well as their associated molecular scaffolds, may aid the design of new biologically targeted chemical libraries. This is because the design and synthesis of analogues of existing bioactive compounds is often used as a strategy to maintain a compound collection.
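The comparison described in this abstract can be illustrated with a minimal sketch. It assumes each record is a (compound, target, source, year) tuple; the record layout, the function names and the data are hypothetical and are not taken from ChEMBL, GOSTAR or the authors' workflow.

```python
from collections import Counter

# Hypothetical annotation records: (compound, target, source, year).
records = [
    ("cmpd_A", "target_1", "literature", 2005),
    ("cmpd_A", "target_1", "patent", 2007),
    ("cmpd_B", "target_1", "patent", 2009),
    ("cmpd_C", "target_2", "patent", 2010),
    ("cmpd_D", "target_3", "literature", 2012),
]


def source_coverage(records):
    """Count compound-target annotations found in both sources or in one only."""
    sources = {}
    for compound, target, source, _year in records:
        sources.setdefault((compound, target), set()).add(source)
    counts = Counter()
    for found in sources.values():
        if found == {"literature", "patent"}:
            counts["both"] += 1
        elif found == {"literature"}:
            counts["literature only"] += 1
        else:
            counts["patent only"] += 1
    return counts


def first_source_per_target(records):
    """Return the source of the earliest record for each target.

    If two records share the earliest year, the one listed first is kept.
    """
    earliest = {}
    for _compound, target, source, year in records:
        if target not in earliest or year < earliest[target][0]:
            earliest[target] = (year, source)
    return {target: source for target, (_year, source) in earliest.items()}


print(source_coverage(records))
print(first_source_per_target(records))
```

Dividing each count by the total number of annotations gives percentages of the kind quoted in the abstract. Binning by target class or year would add one more grouping key.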

Damn the compass, full steam ahead: Decision theory and drug discovery (Jack Scannell, UBS / Edinburgh University / CASMI)

There is an ugly contrast at the heart of the modern biomedical research and development effort. The technologies that most people believe are important for drug discovery have become hundreds, thousands, in some cases billions of times more cost efficient. On the other hand, the amount spent by the drug industry and academia per new drug brought to market has increased nearly 100-fold since 1950, in real terms. Furthermore, drugs are more likely to fail in clinical trials now than in the 1970s. There are two broad classes of explanation for the ugly contrast. Either (1) there is a “low-hanging fruit” problem (which is progressive and intractable), or (2) the industry has progressively adopted less productive R&D methods. We have used decision theory to represent drug R&D as a formal search process. We think this sheds light on the contrast. R&D efficiency appears surprisingly sensitive to the validity of screening and disease models (analogous to the quality of a boat’s compass) and surprisingly insensitive to brute-force efficiency (analogous to the size of a boat’s engine). We suspect that the validity of the stock of screening and disease models has declined over time for two reasons. First, the best models yield cures, so are retired. This leaves the less valid models for the as-yet-untreated diseases. Second, there has been uncritical enthusiasm for highly reductionist models that lack validity. The modern biomedical research effort is therefore like a boat with a huge engine but a bad compass. It spends too much time heading at great speed in the wrong direction.

Overcoming Psychological Barriers to Good Decision-Making in Drug Discovery (Matt Segall, Optibrium)

Better individual and team decision-making could enhance drug discovery performance. Reproducible biases affecting human decision-making, known as cognitive biases, have been understood by psychologists for at least half a century. These threaten objectivity and balance and so are credible causes for continuing unpleasant surprises in late development and high operating costs of compound discovery. We will consider the risks to R&D decision-making for four of the most common and insidious cognitive biases: confirmation bias, poor calibration, availability bias and an excess focus on certainty. We will suggest approaches for overcoming these, such as strategies adapted from evidence-based medicine and computational tools that seek to guide the decision-making process. These include methods for multi-parameter optimisation that encourage objective consideration of all of the available information and explicit consideration of the impact of uncertainty in drug discovery.

Impact of Physicochemical Properties on Oral Drug Dose & Hepatotoxicity (Paul Leeson, Paul Leeson Consulting)

The importance of dose as a compound quality metric is examined. The effect of physicochemical properties on oral drug dose is surveyed, showing that acids have a higher mean dose than other ion classes. Compounds with doses in the upper 20-40 percent range show significantly reduced size and lipophilicity versus lower doses, whereas other properties such as polar surface area or Fsp3 are not influenced by dose. In the upper 20% dose range, only 9/369 drugs have Mol Wt >400 and cLogP >4, emphasising the need to achieve low doses in candidate drugs in this space, which is commonly employed in pharmaceutical patents. The most hepatotoxic oral drugs (from Chen et al, Drug Disc Today 2016, 21, 648) were found to be differentiated from the least hepatotoxic by dose, ion class (where acids are most hepatotoxic), lipophilicity and, surprisingly, Fsp3. A model derived from recursive partitioning successfully predicts 18/21 candidates which failed in development due to human hepatotoxicity. Finally, dose is used in combination with lipophilicity and solubility criteria in GSK’s proposed candidate quality guidance (Bayliss et al, Drug Disc Today 2016, 21, 1719).

Rational Design of Multi-target ligands at A1R, A2AR and PDE10A with Therapeutic Potential for Neurodegenerative Diseases (Leen Kalash, University of Cambridge)

The adenosine neuromodulation system (A1R and A2AR) has been identified as a key target for the management of neurodegenerative diseases [1,2], and recent findings suggest that PDE10A also plays a role in Parkinson’s disease, Huntington’s disease, and schizophrenia [3-5]. Therefore, there is therapeutic potential for multi-target ligands against A1R, A2AR and PDE10A in neurodegenerative diseases. We describe the design of A1R/A2AR/PDE10A multi-target ligands following a retrosynthetic approach employing in silico target prediction and docking. Triazoloquinazolines were predicted to show activity at A1R, A2AR and PDE10A, and were validated experimentally as A1R/A2AR-PDE10A multi-target ligands. Six known PDE10A inhibitors were initially evaluated pharmacologically using a yeast screening platform, as well as CHO-K1 cells expressing the adenosine receptors. This identified three as being selective for the A2AR. They have been shown to result in elevation of cellular cAMP levels with pEC50 values of 7.6 ±0.2, 7.2 ±0.4, and 6.4 ±0.5 (n=5), as well as mediating ERK1/2 activation in an A2AR-dependent manner. Hence, we have developed a computational strategy for designing multi-target ligands, which has been validated experimentally, and may be generally applicable to multi-target compound design at other disease/target classes.
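To put the reported pEC50 values on a concentration scale, a minimal sketch follows. It assumes the usual convention that pEC50 is the negative base-10 logarithm of the EC50 in mol/L; the function name is illustrative only.

```python
def pec50_to_ec50_nm(pec50):
    """Convert pEC50 (assumed to be -log10 of EC50 in mol/L) to EC50 in nM."""
    return 10 ** (-pec50) * 1e9


# pEC50 values quoted in the abstract for the three A2AR-selective compounds.
for pec50 in (7.6, 7.2, 6.4):
    print(f"pEC50 {pec50} -> EC50 of about {pec50_to_ec50_nm(pec50):.0f} nM")
```

Under that assumption the three values correspond to EC50s of roughly 25 nM, 63 nM and 400 nM. The quoted uncertainties are on the log scale, so they translate into asymmetric ranges in nM.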

Mapping the 3D Structures of Small Molecule Binding Sites (Josh Meyers, ICR)

Effective sampling of chemical space is crucial to the future of small molecule drug discovery. Compound library design has historically focused on chemical structure analysis. However, the increasing availability of structural data, together with new protein binding site analysis tools, has enabled more analysis of protein binding sites than ever before. A shift in the way we navigate chemical space towards a more binding site centric approach will enable us to find small molecule hits for a wider range of targets in future screening campaigns (Pertot et al, Drug Discov. Today 2010, 15 (15-16), 656–667).

We have developed a method for generating a similarity ‘map’ of biologically-relevant binding site space (Meyers et al, J. Cheminf. 2016, 8 (70)). Application of the geometric pocket detection tool, fpocket, enables consideration of known binding sites and also potential binding sites that have not, as yet, been shown to bind small molecule ligands. Binding site comparison (BSC) utilising both structure and sequence-based methods calculates the degree of 3D similarity between a pair of binding sites. Implementation of a clustering technique that has been developed to identify noise in the data decreases the prevalence of non-conserved binding sites. Finally, maps of binding site space are visualised and interrogated on intuitive circular plots.

Here, we present a map of the 3D structures of potential binding sites derived from a dataset of therapeutically relevant protein targets curated from the PDB. We have analysed the resultant map of small molecule binding sites along with associated experimental ligand-binding data. We propose that such protein binding site maps, coupled with ligand-binding data, will be useful for hit discovery, medicinal chemistry design and further building our understanding of ligand polypharmacology. Furthermore, this analysis highlights regions of protein binding site space that are not satisfied by current compound screening libraries and will provide a direction for future library optimisation and chemical diversity projects.

Fragment Hotspot Maps: A CSD-derived Method for Hotspot identification (Chris Radoux, CCDC)

Locating a ligand-binding site is an important first step in structure-guided drug discovery, but current methods do little to suggest which interactions within a pocket are the most important for binding. We recently published a method that samples atomic hotspots with simple molecular probes to produce Fragment Hotspot Maps (Radoux et al, J Med Chem, 2016, 59, 4314–4325). These maps specifically highlight fragment-binding sites and their corresponding pharmacophores. For ligand-bound structures, they provide an intuitive visual guide within the binding site, directing medicinal chemists where to grow the molecule and alerting them to suboptimal interactions within the original hit.

The method was validated using experimental binding positions of 21 fragments and subsequent lead molecules. The ligands are found in high scoring areas of the fragment hotspot maps, with fragment atoms having a median percentage rank of 97%.

We have concentrated on making Fragment Hotspot Maps more accessible through the development of a freely available web server, and on integrating them with existing computational techniques. Fragment Hotspot Maps can now be used to analyse molecular dynamics trajectories, create pharmacophores and improve docking by suggesting which hydrogen bond(s) should be used in constraints.

Optimisation of shape fingerprints for protein-ligand systems (Joanna Zarnecka, Liverpool John Moores University)

Shape is one of the most important properties that dictate whether a molecule is likely to be an effective drug. Molecules similar in shape are more likely to show similar activity towards the same target protein. The pharmaceutical industry uses this concept in virtual screening and in isosteres that improve the properties of potential drugs.

One of the methods that might be used to quantify molecular shape is shape fingerprints – binary bit strings that encode the shape of compounds. They involve fast calculations and low storage needs. Shape is measured indirectly by alignment to a database of standard molecular shapes – the reference shapes.

The aim of the investigation was to establish an optimum set of reference shapes and settings that are suitable for identifying ligands that are likely to bind at the same target.

There are 3 stages of calculations: 1) defining a set of reference shapes, 2) generating shape fingerprints and 3) comparing shape fingerprints. The shapes of all molecules that have ever been reported in complex with a protein were downloaded and filtered using an algorithm described by Haigh et al. (J. Chem. Inf. Model. (2005) 45, 673–684); this identified several alternative sets of reference shapes.

A test set described by Taylor et al. was used to evaluate the performance of the sets of reference shapes. The test set consists of 10 groups of protein-ligand complexes from the PDB. In order to investigate whether shape fingerprints provide a useful means to identify such groups, the shape fingerprint for every molecule in the test set was compared to those for every other molecule. Comparisons were made using crystal structures and conformations generated from SMILES.

The method is able to distinguish subsets of compounds with good levels of accuracy, given that only the shape of the molecules is taken into account; no other features are represented in these calculations. This suggests that shape is an extremely strong influence on biological activity, as envisaged by the lock-and-key concept. Shape fingerprints are a useful method to apply this concept and are able to group compounds that are likely to share biological activity.
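The third stage, comparing shape fingerprints, can be sketched as below. The abstract does not state which similarity measure was used, so the Tanimoto coefficient here is an assumption, and the 8-bit fingerprints and ligand names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two equal-length binary strings such as '1101'."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = int(fp_a, 2)
    b = int(fp_b, 2)
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two all-zero fingerprints: similarity defined as 0 here
    return bin(a & b).count("1") / union


def all_pairs(fingerprints):
    """Compare every fingerprint with every other one (each pair once)."""
    names = sorted(fingerprints)
    return {
        (first, second): tanimoto(fingerprints[first], fingerprints[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


# Hypothetical fingerprints; each bit would mark a match to one reference shape.
fingerprints = {
    "lig_1": "11010010",
    "lig_2": "11010000",
    "lig_3": "00101101",
}

for pair, similarity in all_pairs(fingerprints).items():
    print(pair, round(similarity, 2))
```

In this toy set lig_1 and lig_2 share three of four set bits, while lig_3 shares none with either, so a similarity threshold would group the first two together.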

Poster Abstracts

The use of public data to improve proprietary ADME models (Maria-Anna Trapotsi, University of Hertfordshire)

ADME (Absorption, Distribution, Metabolism and Elimination) properties are important factors in the drug discovery pipeline. Literature ADME data are often collected in large chemical databases like ChEMBL (Bento A, et al., Nucleic Acids Research. 2013;42:1083-1090; Gaulton A, et al., Nucleic Acids Research. 2016;45:945-954), which might help to improve the prediction of ADME properties. Pharmaceutical companies build ADME QSAR models using proprietary data, and thus the inclusion of literature data might be a valuable source for the development of predictive models. The current study is investigating whether merging literature and proprietary data can improve the predictive ability of proprietary models and enlarge their applicability domain (AD).

Literature data for the Caco-2 A to B permeability assay were downloaded from ChEMBL and a filtering process was applied. Three permeability models were developed, each with a different training set: the first model was built with Evotec proprietary compounds, the second with literature compounds (downloaded from ChEMBL) and the third with both proprietary and literature (ChEMBL) compounds. Three different methods were used for QSAR building: 1. Partial Least Squares (PLS), 2. Random Forest (RF) and 3. Support Vector Regression (SVR) with a radial basis function (rbf) kernel. Temporal test sets (i.e. including experimental measurements published after the models were built) of public and proprietary data were used to assess the models. In addition, four distance-to-model metrics were used to assess the applicability domain of the models: 1. Mahalanobis distance (MD), 2. Leverage, 3. k-NN (k-Nearest Neighbour) with Euclidean distance (ED) and 4. k-NN with Manhattan distance (ManhD).

The results showed that the merged model, which represents the inclusion of both proprietary and literature data, was able to improve the prediction of temporal proprietary compounds (with RF) and significantly improve the prediction of temporal literature compounds compared to the existing Evotec model. The models’ AD was investigated by estimating the distance of test set compounds from the training set, and distance thresholds were additionally applied to establish the percentage of test compounds within the AD. A greater percentage of Evotec and literature temporal test set compounds were within the AD of the new merged model compared to the existing Evotec model.

In conclusion, this study demonstrates that the inclusion of public compounds in proprietary models can be beneficial in enlarging the applicability domain and can improve model performance.
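Two of the distance-to-model metrics named in this abstract, k-NN with Euclidean distance and k-NN with Manhattan distance, can be sketched as follows. This is an illustration only: the use of the mean distance to the k nearest training compounds, the threshold value, the function names and the two-descriptor data are all assumptions and do not come from the study.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance between two descriptor vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


def knn_distance(query, training, k, metric):
    """Mean distance from a query compound to its k nearest training compounds."""
    distances = sorted(metric(query, compound) for compound in training)
    return sum(distances[:k]) / k


def fraction_in_domain(test, training, k, metric, threshold):
    """Fraction of test compounds whose k-NN distance is within the threshold."""
    inside = [knn_distance(q, training, k, metric) <= threshold for q in test]
    return sum(inside) / len(inside)


# Hypothetical two-descriptor vectors for training and temporal test compounds.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
test = [(0.5, 0.5), (3.0, 3.0)]

for name, metric in (("Euclidean", euclidean), ("Manhattan", manhattan)):
    share = fraction_in_domain(test, training, k=2, metric=metric, threshold=1.5)
    print(f"{name}: {share:.0%} of test compounds inside the AD")
```

Adding compounds to the training set can only shorten nearest-neighbour distances, which is consistent with the merged model covering a larger share of the test compounds.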

Putting electrostatics and water at the center of structure-based drug design (Susana Tomasio, Cresset)

The electrostatics of protein active sites can inform ligand design and SAR. However, no calculation of electrostatic potentials in the active site can be complete without also considering whether water molecules are tightly bound and contributing to the potentials and ligand binding. In this poster we will explore the effect of including water molecules shown to be energetically favourable on the electrostatic potential of individual proteins. We will present a new application to enable rapid and accurate calculation of water stability and protein interaction potentials. Combining these analyses with the popular Waterswap technique yields in-depth knowledge of protein targets and ligand-protein binding.

Curation and modelling of solubility data to support digital design of unit operations (R.L. Marchese Robinson, Leeds University)

Computational prediction of solubility is a key aim of the pharmaceutical industry. Many computational models and, more specifically, quantitative structure-property relationships (QSPRs) have been developed to predict aqueous solubility as an indicator of active pharmaceutical ingredient (API) bioavailability (Skyner et al., Phys.Chem.Chem.Phys., 2015 (17), p.6174). In contrast, the work reported in this presentation has a different focus: to support the digital design of unit operations in pharmaceutical manufacturing, e.g. batch crystallization and wet granulation. Hence, data are being curated and used to develop and validate QSPR-like models to predict temperature-dependent solubility profiles of molecular crystals, e.g. APIs or excipients. This work builds upon the limited number of QSPR studies which have sought to model temperature-dependent solubility profiles (Avdeef, ADMET & DMPK, 2015 (3), p.298; Klimenko et al., J. Comput. Chem., 2016 (37), p.2045). For example, the use of descriptors derived from crystal structures is being investigated. This work is being carried out in the context of the UK ADDoPT (Advanced Digital Design of Pharmaceutical Therapeutics) project. This presentation will situate the work within the context of ADDoPT, then discuss the data curation and initial modelling results which have been obtained to date.

Can we assess phototoxicity of non-steroidal anti-inflammatory drugs through in-silico calculations? (Neus Aguilera-Porta, GSK)

Pharmaceuticals need to be assessed for photosafety as outlined in the ICH S10 guidance. The ICH S10 guidance (and the associated ICH M3 guidance) suggests that an initial assessment of the phototoxicity potential should be conducted based on “photochemical properties and pharmacological/chemical class”. The characterization of the UV-visible absorption spectrum is recommended as the initial assessment because it can obviate any further requirements for photosafety evaluation.

However, beyond that, there is no discussion of whether a molecule’s intrinsic properties can provide an indication of its potential phototoxicity. Thus, it is crucial to investigate the photophysics and photoreactive paths that are activated within a molecule upon photon absorption (Tønnesen H.H, Photostability of Drugs and Drug Formulations, New York: CRC Press, 2004).

We have analysed a set of non-steroidal anti-inflammatory drugs (NSAIDs) by simulating their exposure to light and mapping the deactivation funnels to lower-lying electronic states by exploring the excited state minimum energy paths and locating the main stationary points.

On the way to the ground state, the excited system could go through interstate crossings that act as funnels to allow deactivation. In some cases, an interstate crossing will imply a change of spin configuration, i.e. an intersystem crossing (ISC) (Fig.1), allowing the population to transfer to manifolds of different multiplicity than that of the ground state.

The object of our study is to determine whether characterisation of the ISC can be used to test the hypothesis that the excited electronic state lifetime is causally related to a molecule’s potential phototoxicity.

The translation of different deactivation mechanisms and new indicators would allow the generation of a phototoxicity model that would predict the potential photosafety of new drugs by analysing their photophysical properties.

Solubility vs hERG, how to get one while avoiding the other (Ed Griffen, Medchemica)

Increasing solubility and avoiding hERG activity represent two opposing challenges in the design of bioactive compounds. Design strategies based around incorporating amines into molecules may generate soluble compounds but risk binding to the hERG ion channel, with concomitant cardiac toxicity risks. We demonstrate how, using advanced matched molecular pair analysis on public data and extraction of statistically significant rules, we can identify other chemical approaches that can be used to modulate each property selectively.

The First Proteochemometric Model for the Bromodomain Family: Insights into Selectivity (Kathryn A Giblin, Cambridge University)

Recent efforts are beginning to deliver selective probe molecules for bromodomain proteins to elucidate key roles in oncological, cardiovascular and immuno-inflammatory disorders at the individual protein level. Despite this, the design of selective bromodomain inhibitors remains a challenge. We present here the first modelling of bromodomain bioactivity using global proteochemometric (PCM) modelling via integrating data made available at AstraZeneca and public data, with the aim to understand the key features that contribute to selectivity within the protein family. Generation of small molecule descriptors and amino acid sequence alignment-based descriptors of the bromodomain active site has afforded the training of a predictive random forest classification model, validated on an external test set (balanced accuracy > 0.85). The PCM models have been further validated using leave-one-group-out analyses and benchmarked against other methods including QSAR. Feature importance is used to highlight key residues in the binding site, presenting as selectivity “hotspots”. The final model is currently being experimentally validated.
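The balanced accuracy quoted for the external test set can be computed as in the sketch below, which assumes binary labels (1 = active, 0 = inactive) and the common definition as the mean of sensitivity and specificity. The labels are hypothetical and are not the study's data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced test set: 2 actives, 8 inactives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.4f}")
```

In this toy case plain accuracy is 0.80 while balanced accuracy is 0.6875, which shows why the balanced form is the more informative figure when inactives outnumber actives.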

EGFR: A Case Study in Irreversible Binding Mode Prediction with POSIT (Maya Beano, AstraZeneca)

The formation of a covalent bond between drug molecule and protein is an attractive way to inhibit protein function. However, prediction of binding mode for compounds that irreversibly bind to their target presents some challenges not met by traditional ligand docking software. Some solutions that exist at present allow for protein flexibility and typically take a relatively long time to produce results. Here we present a modified version of POSIT (OpenEye software) that performs covalent docking much more quickly by making use of protein and ligand information. We assess its performance looking at EGFR as a case study and for binding of the potent EGFR inhibitor Tagrisso and related molecules.

Automatic extraction of bioactivity data from patents (Daniel Lowe, NextMove)

Structure-Activity Relationship (SAR) analysis is important for the development of novel small molecule drugs. Such analyses rely on bioactivity data either from in-house or published sources, with data from the latter currently being extracted manually at great expense. Here we report on an entirely automated system for extracting bioactivity data that we are developing, initially targeting US patents.

The system relies on combining the results of many technologies: chemical entity recognition, chemical name to structure, table processing, chemical compound number resolution, chemical sketch interpretation, and even in some cases reconstitution of molecules from a generic core and R-group definitions. Where possible, the target and the assay description are also identified.

To assess the precision/recall of our system we compare our results with those manually extracted from US patents by BindingDB. We also compare the data we’ve extracted with the data present in ChEMBL from journal articles, to analyse whether there are significant differences between activity data in journal articles and patents, e.g. differences in targets of interest.
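The precision/recall comparison against a manually curated reference can be sketched as below. It assumes each data point is reduced to a comparable tuple of (patent, compound, target, activity); the tuple layout, the identifiers and the values are hypothetical and do not reflect BindingDB's or the authors' actual formats.

```python
def precision_recall(extracted, reference):
    """Precision and recall of extracted records against a reference set."""
    extracted = set(extracted)
    reference = set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical records: (patent, compound, target, activity in nM).
reference = {
    ("patent_1", "cmpd_1", "target_X", 12.0),
    ("patent_1", "cmpd_2", "target_X", 150.0),
    ("patent_1", "cmpd_3", "target_X", 3.4),
    ("patent_2", "cmpd_1", "target_Y", 800.0),
    ("patent_2", "cmpd_2", "target_Y", 45.0),
}
extracted = {
    ("patent_1", "cmpd_1", "target_X", 12.0),
    ("patent_1", "cmpd_2", "target_X", 150.0),
    ("patent_2", "cmpd_1", "target_Y", 800.0),
    ("patent_2", "cmpd_2", "target_Y", 54.0),  # wrong value counts as an error
}

precision, recall = precision_recall(extracted, reference)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Exact tuple matching is the strictest choice. A real comparison would probably need to normalise structures, units and target names before matching, which this sketch leaves out.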