UK QSAR Newsletter Autumn 2021

UK-QSAR Autumn 2021
The UK-QSAR and ChemoInformatics Group
Welcome to the Autumn 2021 UK-QSAR Newsletter!
Although some aspects of life are starting to resemble something like normality, our upcoming Autumn Meeting will again be a virtual event, taking place over half a day on the afternoon of Thursday 14th October 2021. The meeting will be (virtually) hosted by Charles River Laboratories and will cover a range of topics aligned to what we’ve come to think of as “traditional” computer-aided drug design. More details on the meeting are below. Registration is now open. Abstracts and references supplied by the speakers are also provided below.
Since our last meeting, one of the major developments for computational chemistry has been the release of the AlphaFold set of AI-computed protein structures for use in research. An article on the data available, with some initial observations on the structures, is below.
The format for the Spring 2022 Meeting is still under discussion and will be reviewed in due course.
You’ll also find the regular articles on Jobs and Upcoming Meetings.
As ever, please send any feedback or suggestions you have for future newsletters to Susan Boyd at

Autumn Meeting Information

The virtual meeting will be held on the afternoon of Thursday 14th October 2021. As ever, the meeting is free to attend, although delegates will need to register before 11th October. For this meeting, hosted by Charles River Laboratories, we return to more traditional computational chemistry and structure-based drug design (SBDD). Topics include managing hydrogen-bond donors in drug design, molecular dynamics simulations to help characterise likely binding interactions, and some case studies of inhibitor design. Poster abstracts can be submitted here. Speakers include Peter Kenny (who will be known to many of you!), Ying-Chih Chiang from the Kobilka Institute of Innovative Drug Discovery at The Chinese University of Hong Kong, Shenzhen, Michael Bodnarchuk from AZ, Danielle Newby from the University of Oxford, and Julien Michel from the University of Edinburgh. The provisional agenda is:

1145   Open
1150   Welcome, Nicole Hamblin, Senior Director, CRL
1155   Welcome from Meeting Organiser, David Clark, CRL

Session 1      Chair: David Clark, CRL
1200-1230   Hydrogen Bond Donors in Drug Design, Peter Kenny
1230-1300   Molecular Dynamics Simulations of Penicillin-Binding Protein 2a (PBP2a) with Ceftaroline at the Allosteric Site, Ying-Chih Chiang, Kobilka Institute of Innovative Drug Discovery, School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen

1300-1400   Lunch & Posters   

Session 2   Chair: Pascal Savy, CRL
1400-1430   Allosteric Covalent Inhibitors of the Mutant GTPase KRAS, Michael Bodnarchuk, AZ
1430-1500   Can you teach old drugs new tricks to prevent dementia?, Danielle Newby, Department of Psychiatry, Warneford Hospital, University of Oxford

1500-1530  Tea break

Session 3   Chair: Emanuela Gancia, CRL
1530-1600  Energetics of protein disorder-order transitions in small molecule binding, Julien Michel, University of Edinburgh
1600-1630  Bag of Features: A Multi-Instance Perspective on QSAR, Emmanuel Noutahi, Valence Discovery
1630  Closing remarks, announcement of poster winner  

AlphaFold:  Plenty of Structures to Conjure with!
Susan Boyd, CompChem Solutions Ltd.

Following on from their much-publicised success in last year’s CASP14 competition, DeepMind first published the detailed methodology they used to correctly predict 88% of protein structures (C-alpha traces) to within 4 Å RMSD. Then, jointly with the European Bioinformatics Institute (EMBL-EBI), they released the AlphaFold Protein Structure Database, which contains DeepMind’s predicted structures for 20,000 human proteins plus proteins for 20 other key organisms, to accelerate scientific research.
This is a great adjunct to the current PDB, and UniProt has already cross-referenced the AlphaFold structures, with links directly from the relevant protein sequence pages. Another player quick to incorporate the new AlphaFold data is the team from CRUK’s ICR, who have already started work to incorporate the human AlphaFold models into the CanSAR knowledgebase.
How good are the models? DeepMind’s own data show high correspondence with experimental structures (perhaps with the exception of NMR structures) in most cases. In my own experience so far, I have found the models to be pretty good. Some protein structures in the PDB only cover part of the full-length sequence, and in some examples I have explored, the AlphaFold models not only cover the PDB residues pretty well but also extend the coverage towards the termini of the sequence in a way which appears reasonable. The confidence level of each residue’s prediction is stored in the B-factor field of the downloadable PDB file, so it is possible to see at a glance which regions of a given structure should be treated with more caution than others.

AlphaFold structure of human HTRA2 protease, with blue indicating areas of strong confidence and yellow/orange indicating regions of lower confidence.
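
For readers who want the per-residue confidence as numbers rather than colours, the following is a minimal sketch of reading it from the B-factor columns (61-66) of the ATOM records in a PDB-format file. AlphaFold reports this score as pLDDT on a 0-100 scale. The function names, the two-residue sample and the cutoff of 70 are illustrative assumptions, not part of any AlphaFold tooling.

```python
def read_plddt(pdb_text):
    """Return {(chain, residue_number): score} taken from C-alpha ATOM records.

    Uses the fixed PDB columns: atom name 13-16, chain 22,
    residue number 23-26, B-factor 61-66 (1-based).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def low_confidence(scores, cutoff=70.0):
    """List residues whose score falls below an (assumed) cutoff."""
    return sorted(key for key, value in scores.items() if value < cutoff)


def atom_line(serial, name, resname, chain, resnum, x, y, z, bfactor):
    """Build a PDB ATOM record with correct column alignment (demo data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resnum:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Hypothetical two-residue fragment; coordinates and scores are made up.
sample = "\n".join([
    atom_line(1, " N", "MET", "A", 1, -9.5, 3.2, 0.1, 35.42),
    atom_line(2, " CA", "MET", "A", 1, -8.9, 4.1, -0.6, 35.42),
    atom_line(3, " CA", "ALA", "A", 2, -5.2, 4.8, 0.3, 91.50),
])

scores = read_plddt(sample)
flagged = low_confidence(scores)
print(scores)
print(flagged)
```

For a real model, the text of the downloaded file would be passed to `read_plddt` in place of `sample`.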

DeepMind’s predictions are hindered by the prevalence of intrinsically disordered regions in proteins, by conformational flexibility within proteins (e.g. DFG-in/DFG-out kinases) and by conformational changes which may occur when complexes are formed with additional proteins. The team continue to work on these areas for future development, and they hope to increase their number of protein predictions from the initial ca. 400k to 130 million (50% of known proteins) by the end of this year, so one to watch!
All in all, the development of AlphaFold and the decision to release these structures for free public access can only be a positive for those of us involved in structure-based drug design. There is still plenty for the DeepMind team to work on, and we await their future developments with eager anticipation.

Abstracts & Pre-Reading Material

The speakers have provided the following abstracts and references which might be of interest ahead of their talks.

Peter Kenny, Hydrogen bond donors in drug design
Hydrogen bond donors (HBDs) are sometimes seen as a greater ADME liability than hydrogen bond acceptors (HBAs). Two factors shape our perception of HBDs in drug design. First, HBAs are more numerous than HBDs in molecular structures of interest to drug designers. Second, the presence of an HBD in a molecular structure usually requires that an HBA also be present. Analysis of alkane/water partition coefficients shows that HBAs are typically more strongly hydrated than HBDs. The HBA/HBD balance is also discussed as a determinant of aqueous solubility and I will conclude the presentation with some recommendations for managing HBDs in design.
Reference: Borges et al (2017) The influence of hydrogen bonding on partition coefficients. JCAMD 31:163-181.

Ying-Chih Chiang, Molecular Dynamics Simulations of Penicillin-Binding Protein 2a (PBP2a) with Ceftaroline at the Allosteric Site
Methicillin-resistant Staphylococcus aureus (MRSA) tolerates β-lactam antibiotics by two different strategies. One is to produce β-lactamases to decompose the β-lactams, and the other is to bring in penicillin-binding protein 2a (PBP2a) for cell wall synthesis. PBP2a is known to keep its biological function in the presence of most β-lactams. This ability has been attributed to PBP2a’s active site being protected to reduce the probability of ligand binding.
Previous crystallographic studies suggested that this protected active site opens for reaction once a native substrate or a good analog binds at an allosteric domain of PBP2a. This mechanism has been employed to explain ceftaroline’s activity against MRSA infections, i.e. by binding to the allosteric site, thereby opening the active site for inhibition by ceftaroline. In this work, we investigate the binding of ceftaroline at this proposed allosteric site using molecular dynamics simulations. Unstable binding was observed using the major force fields CHARMM36 and Amber ff14SB, and free energy calculations were unable to confirm a strong allosteric effect. Our study suggests that the allosteric effect induced by ceftaroline is probably weak.
Reference: Ying-Chih Chiang, Mabel T. Y. Wong, Jonathan W. Essex, Israel Journal of Chemistry, 60(7), 754-763 (2020).

Michael S. Bodnarchuk, Allosteric Covalent Inhibitors of the Mutant GTPase KRAS (G12C)
Of all human cancers, 20% have a mutation in GTPase KRAS with a high frequency being found in pancreatic, colorectal and non-small cell lung cancer (NSCLC). Indeed, a glycine to cysteine mutation at codon 12 is the most frequent mutation found in NSCLC, rendering KRAS constitutively active and driving cell proliferation, survival and differentiation. Covalent targeting of this cysteine residue offers the potential for selective inhibitors of the G12C mutant isoform and an allosteric mode of action potentially negates the impact of high nucleotide binding affinity for the GTPase. A knowledge- and structure-based design approach was utilised to derive diverse series of covalent inhibitors that interact directly with this cysteine mutation, causing a shift in the switch II region and rendering KRAS inactive. Here I explore and assess the impact of computational chemistry upon this program and highlight how different techniques have accelerated the development of these compounds.

Danielle Newby, Can you teach old drugs new tricks to prevent dementia?
There is no cure for dementia and recent advances in treatments of dementia are controversial. Potentially, 40% of new dementia cases could still be preventable through targeting modifiable risk factors. Therefore, there is still growing interest in identifying drug treatments which prevent dementia from occurring in the first place. Due to the vast number of drugs on the market, it is possible to repurpose “old drugs” currently used to treat other diseases for “new” uses in dementia prevention. In this talk, I will describe my research focussing on drug treatments used to treat cardiovascular, metabolic and inflammatory diseases and how they are associated with reduced dementia risk and cognitive decline using a variety of real world datasets. I will then go on to describe the next steps and future work on how we can take these results further to understand why certain drugs reduce dementia risk and how we could use machine learning to identify combinations of drugs to prevent dementia.
Reference: Newby, D., Prieto-Alhambra, D., Duarte-Salles, T., Ansell, D., Pedersen, L., Van Der Lei, J., Mosseveld, M., Rijnbeek, P., James, G., Alexander, M., Egger, P., Podhorna, J., Stewart, R., Perera, G., Avillach, P., Grosdidier, S., Lovestone, S. & Nevado-Holgado, A. J. Methotrexate and relative risk of dementia amongst patients with rheumatoid arthritis: A multi-national multi-database case-control study. Alzheimer’s Res. Ther. 12, (2020).

Julien Michel, Energetics of protein disorder-order transitions in small molecule binding
Disorder-order transitions of intrinsically disordered regions (IDRs) in proteins are commonly used in Nature to tune the affinity and selectivity of protein-protein complexes. How such a strategy could be exploited for the rational design of small molecule ligands is uncertain.
Our group has carried out isothermal titration calorimetry measurements to investigate the binding energetics of the ligands Nutlin-3a and AM-7209 to variants of the protein MDM2 that include or lack an IDR ‘lid’ region. We observe that AM-7209 shows a remarkable 250-fold loss of affinity for an MDM2 variant where the lid IDR is truncated, whereas the potency of Nutlin-3a is unaffected.
Clarification of the observed binding energetics was pursued by combining alchemical free energy calculations and enhanced sampling molecular dynamics simulations with calorimetry and NMR measurements on a panel of MDM2 mutants.
Our findings provide a rationale for the AM-7209 selective disorder-order transition in MDM2 and delineate a framework for exploiting protein disorder in rational drug design.
Reference: “Elucidation of Ligand-Dependent Modulation of Disorder-Order Transitions in the Oncoprotein MDM2”, Bueren-Calabuig, J. A.; Michel, J. PLoS Comput. Biol., 11(6): e1004282, 2015. doi:10.1371/journal.pcbi.1004282.
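
As a rough guide to the size of the effect described in the abstract above, a fold change in affinity can be converted to a binding free-energy difference using ΔΔG = RT ln(fold change). The sketch below assumes that the 250-fold figure is a ratio of dissociation constants and that the temperature is 298.15 K; neither is stated in the abstract, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_to_ddg(fold_change, temperature_k=298.15):
    """Free-energy difference (kcal/mol) for a given ratio of dissociation constants."""
    return R_KCAL * temperature_k * math.log(fold_change)


# Assumed inputs: 250-fold loss of affinity at 298.15 K.
ddg = fold_change_to_ddg(250.0)
print(f"{ddg:.2f} kcal/mol")  # roughly 3.3 kcal/mol under these assumptions
```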

Emmanuel Noutahi, Bag of Features: A Multi-Instance Perspective on QSAR
Modern QSAR approaches regularly take advantage of recent advances in machine learning to improve molecular property prediction models used for predicting bioactivity computationally prior to experimental testing.
However, despite improvements in methodology, a major bottleneck of recent methods remains the representational power of the molecule featurizer, which often either lacks faithfulness to the true molecular structure and/or relevancy to the predictive task of interest. To address these shortcomings, while also mitigating the impact of representation choice on predictive performance, we formulate the task of predicting biological properties of molecules as a multi-instance learning problem in which a single label is assigned to a bag of descriptors all derived from the same 2D structure. The proposed approach relies on the hypothesis that the QSAR is better captured through several molecular representation perspectives instead of only a single perspective.
We explore various architectures of the proposed multi-instance learning framework, including an attention-based aggregator that helps provide insight into the contribution of each representation (instance) in the bag towards the measured properties. We further show that our formulation achieves similar or higher accuracy compared to classical machine learning approaches on common molecular benchmark tasks without sacrificing interpretability. In particular, we highlight one potential application of our framework in the prediction of biological activity that incorporates 3D molecular information while simultaneously providing insights on plausible bioactive conformers.
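
To illustrate the general idea of an attention-based aggregator over a bag of instances, here is a minimal sketch in plain Python. It is not the speaker's architecture: it assumes each representation has already been embedded into a vector of the same length, it uses a single hypothetical weight vector `w` in place of learned parameters, and all numbers are made up.

```python
import math


def attention_pool(bag, w):
    """Combine a bag of equal-length instance vectors into one vector.

    Each instance gets a score w . tanh(instance); a softmax over the scores
    gives attention weights, which are then used for a weighted average.
    """
    scores = [sum(wi * math.tanh(xi) for wi, xi in zip(w, inst)) for inst in bag]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * inst[d] for a, inst in zip(weights, bag)) for d in range(dim)]
    return pooled, weights


# Hypothetical bag: three embedded representations of the same molecule.
bag = [[0.2, 0.1, 0.0], [1.5, -0.3, 0.8], [0.0, 0.0, 0.1]]
w = [1.0, 0.5, -0.2]  # stand-in for learned attention parameters

pooled, weights = attention_pool(bag, w)
print(weights)  # one weight per instance, summing to 1
```

The attention weights are what make this kind of aggregator inspectable: a larger weight means that instance contributed more to the pooled vector passed on to the property predictor.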

Jobs
Senior Computational Chemist, Sygnature Discovery, Nottingham or Alderley Park, UK
Chemoinformatics Data Scientist, CCDC, Cambridge, UK
Multiple positions, Exscientia, Oxford, UK
Application Scientist (and other roles), Chemical Computing Group, Cambridge, UK
Computational Chemist, Domainex, Cambridge, UK
Senior Scientist CADD, Charles River Laboratories, Cambridge, UK
Research Leader CADD, Charles River Laboratories, Cambridge, UK
Various positions, Cresset, Litlington, UK
Bioinformaticians, Data Analysts, Curators (and others), Healx, Cambridge/Remote, UK
Various positions,, London, UK 

Upcoming Meetings
The following meetings may be of interest to our readers:
UK-QSAR Spring 2022 Meeting, Details TBC
3rd RSC/SCI Symposium on Anti-Bacterial Drug Discovery, 15-16th November 2021
Automated Synthesis Forum, 15-16th November 2021
Cambridge Cheminformatics Meeting, 24th November 2021
MGMS Young Modellers’ Forum, February 2022 (TBC)