UK-QSAR Spring 2022 Meeting | UK QSAR and Cheminformatics Group

Our Spring 2022 meeting will be held on Tuesday, April 26th, at Downing College, Cambridge. Please see the agenda and abstracts below.

Registration is now open: https://www.eventbrite.com/e/uk-qsar-spring-2022-meeting-tickets-276008447697

Agenda

9:00 Open registration, coffee/tea

9:45 Welcome and Opening remarks, Geoff Skillman, CSO, OpenEye

Session 1 Chair: Christopher Bayly, OpenEye

10:00 Jonathan Essex, University of Southampton – Is the sampling of water in protein-ligand systems a solved problem?

10:30 David Hahn, Janssen – Large Scale Free Energy Calculations in Drug Discovery

11:00 – 11:30 Break

Session 2 Chair: Gunther Stahl, OpenEye

11:30 Francois Berenger, University of Tokyo – Lean-Docking: Exploiting Ligands’ Predicted Docking Scores to Accelerate Molecular Docking (abstract)

12:00 Christopher Bayly, OpenEye Scientific Software – Binding Free Energies in Orion: A Parallel Universe

12:30 – 14:00 Lunch

Session 3 Chair: TBA

14:00 Bert de Groot, MPI Goettingen – High throughput relative and absolute non-equilibrium binding free energies with pmx (abstract)

14:30 Hannah Bruce Macdonald, MSD – Application of free energy methods for lead optimisation

15:00 – 15:30 Break

Session 4 Chair: Christopher Bayly, OpenEye

15:30 Peter Coveney, UCL – Assembling an arsenal to achieve reliable free energy calculations (abstract)

16:00 Martin Packer, AstraZeneca – Impact of FEP in prospective molecular design – moving from single edges to large virtual libraries (abstract)

16:30 Concluding remarks, poster winner announcement and close, Geoff Skillman, OpenEye

Abstracts & Pre-Reading Material

Some speakers have provided the following abstracts and references which might be of interest ahead of their talks:

Jonathan Essex, University of Southampton
Is the sampling of water in protein-ligand systems a solved problem?
Molecular simulations are becoming increasingly widely used in drug discovery, particularly in the context of free energy calculations to optimise ligands for affinity. The two main factors affecting precision and accuracy in this context are ensuring adequate sampling, and an accurate representation of the intermolecular interactions. One significant sampling problem relates to the modelling of water diffusion from bulk solvent into a protein-ligand binding site, particularly when the water molecules are buried deeply within the protein-ligand interface. Under such circumstances it can be difficult to ensure that all the appropriate water binding sites are occupied, and even more difficult for the hydration state of the protein-ligand system to change rapidly as part of an alchemical perturbation.
To address this sampling problem, we have been investigating the use of grand canonical simulation methods. In grand canonical Monte Carlo (GCMC), rather than keeping the number of water molecules in the simulation cell fixed, trial water insertion and deletion moves are attempted, subject to a chemical potential constraint. In doing so, waters may enter or leave the protein-ligand binding site without having to physically diffuse through the protein-ligand interface. Here, the methodology of GCMC will be briefly reviewed, followed by a discussion regarding its use in locating water binding sites identified by X-ray crystallography. The ability of GCMC to explore the thermodynamics of water networks will then be described, together with the coupling of GCMC moves with alchemical perturbations to ensure water relaxation. Finally, the advantages of a new approach based on slowly growing water molecules in situ will be discussed, particularly regarding the ability of this method to drive ligand and/or protein conformational change.
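
As an illustration of the trial insertion and deletion moves described above, the following minimal Python sketch shows acceptance probabilities in the commonly used Adams formulation of GCMC. The abstract does not state which formulation the speaker uses, so this is an assumption; the function names, the parameter `b_param` (the Adams B value, which encodes the chemical potential) and the numbers in the usage lines are all hypothetical.

```python
import math


def insertion_acceptance(b_param, n_waters, delta_u, beta):
    """Probability of accepting a trial insertion (N -> N + 1 waters).

    Adams-style criterion: min(1, exp(B) * exp(-beta * dU) / (N + 1)).
    delta_u is the potential energy change caused by the insertion.
    """
    exponent = b_param - beta * delta_u - math.log(n_waters + 1)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def deletion_acceptance(b_param, n_waters, delta_u, beta):
    """Probability of accepting a trial deletion (N -> N - 1 waters).

    Adams-style criterion: min(1, N * exp(-B) * exp(-beta * dU)).
    delta_u is the potential energy change caused by the deletion.
    """
    if n_waters == 0:
        return 0.0  # nothing to delete
    exponent = math.log(n_waters) - b_param - beta * delta_u
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


# Toy numbers (assumed, in arbitrary consistent units).
p_insert = insertion_acceptance(b_param=1.0, n_waters=3, delta_u=2.0, beta=0.5)
p_delete = deletion_acceptance(b_param=1.0, n_waters=4, delta_u=-2.0, beta=0.5)
```

In this sketch the insertion move and its reverse deletion move have acceptance ratios that are reciprocals of each other, which is what allows the number of waters to fluctuate around the value set by the chemical potential.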

Recommended Reading:
M.L. Samways, R.D. Taylor, H.E. Bruce Macdonald, J.W. Essex “Water Molecules at Protein-Drug Interfaces: Computational Prediction and Analysis Methods” Chemical Society Reviews 50, 2021, 9104-9120
C. Barillari, J. Taylor, R. Viner, J.W. Essex “Classification of Water Molecules in Protein Binding Sites” Journal of the American Chemical Society 129, 2007, 2577-2587

David F. Hahn, Janssen
Large Scale Free Energy Calculations in Drug Discovery
Free energy (FE) calculations are rapidly becoming a key component in the drug design process to estimate the binding affinities of drug candidates to protein targets and prioritize syntheses. There have been significant efforts over the years to improve methods and parameters for free energy calculations.
Retrospective calculations and benchmarks will be presented showing that free energy calculations reach acceptable accuracy to be applied in a drug discovery project.
Next, I will discuss how the lessons from retrospective benchmark studies are transferred to drug discovery applications. The challenges faced in prospective applications will be discussed, with a focus on the incorporation into drug discovery processes and on increasing the throughput. Additionally, I will present potential areas where FE calculation workflows can be improved and extended.

Recommended Reading:
Pérez-Benito, L.; et al. “Predicting Activity Cliffs with Free-Energy Perturbation.” J. Chem. Theory Comput. 2019, 15 (3), 1884–1895. https://doi.org/10.1021/acs.jctc.8b01290.
Gapsys, V.; et al. “Large Scale Relative Protein Ligand Binding Affinities Using Non-Equilibrium Alchemy.” Chem. Sci. 2020, 11 (4), 1140–1152. https://doi.org/10.1039/C9SC03754C.
Schindler, C. E. M.; et al. “Large-Scale Assessment of Binding Free Energy Calculations in Active Drug Discovery Projects.” J. Chem. Inf. Model. 2020, 60 (11), 5457–5474. https://doi.org/10.1021/acs.jcim.0c00900.
Hahn, D. F.; et al. “Best Practices for Constructing, Preparing, and Evaluating Protein-Ligand Binding Affinity Benchmarks.” arXiv:2105.06222 [physics, q-bio] 2021. https://doi.org/10.48550/arXiv.2105.06222
Gapsys, V.; et al. “Pre-Exascale Computing of Protein–Ligand Binding Free Energies with Open Source Software for Drug Design.” J. Chem. Inf. Model. 2022, 62 (5), 1172–1177. https://doi.org/10.1021/acs.jcim.1c01445.
Kutzner, C.; et al. “GROMACS in the Cloud: A Global Supercomputer to Speed Up Alchemical Drug Design.” J. Chem. Inf. Model. 2022, 62 (7), 1691–1711. https://doi.org/10.1021/acs.jcim.2c00044.

Francois Berenger, University of Tokyo
Lean-Docking: Exploiting Ligands’ Predicted Docking Scores to Accelerate Molecular Docking
In structure-based virtual screening (SBVS), a binding site on a protein structure is used to search for ligands with favorable nonbonded interactions. Because it is computationally difficult, docking is time-consuming and any docking user will eventually encounter a chemical library that is too big to dock. This problem might arise because there is not enough computing power or because preparing and storing so many three-dimensional (3D) ligands requires too much space. In this study, however, we show that quality regressors can be trained to predict docking scores from molecular fingerprints. Although typical docking has a screening rate of less than one ligand per second on one CPU core, our regressors can predict about 5800 docking scores per second. This approach allows us to focus docking on the portion of a database that is predicted to have docking scores below a user-chosen threshold. Herein, usage examples are shown, where only 25% of a ligand database is docked, without any significant virtual screening performance loss. We call this method “lean-docking”. To validate lean-docking, a massive docking campaign using several state-of-the-art docking software packages was undertaken on an unbiased data set, with only wet-lab tested active and inactive molecules. Although regressors allow the screening of a larger chemical space, even at a constant docking power, it is also clear that significant progress in the virtual screening power of docking scores is desirable.
https://www.researchgate.net/publication/350940167_Lean-Docking_Exploiting_Ligands’_Predicted_Docking_Scores_to_Accelerate_Molecular_Docking
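
The filtering idea in the abstract above (predict docking scores from fingerprints, then dock only the ligands predicted to score below a user-chosen threshold) can be sketched in a few lines of Python. This is a hedged toy illustration, not the authors' implementation: a k-nearest-neighbour regressor with Tanimoto similarity stands in for whatever regressors the study trained, fingerprints are plain sets of on-bits, and `fake_dock`, the ligand names and all scores are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def predict_score(fp, training, k=2):
    """Predict a docking score as the mean score of the k most similar training ligands."""
    nearest = sorted(training, key=lambda item: tanimoto(fp, item[0]), reverse=True)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def lean_dock(library, training, dock, threshold, k=2):
    """Dock only the ligands whose predicted score is below the threshold."""
    results = {}
    for name, fp in library.items():
        if predict_score(fp, training, k) < threshold:
            results[name] = dock(name)  # the expensive step, run on a subset only
    return results


# Toy training set: (fingerprint, docking score already computed by real docking).
training = [
    (frozenset({1, 2, 3}), -9.0),
    (frozenset({1, 2, 4}), -8.5),
    (frozenset({7, 8, 9}), -4.0),
    (frozenset({7, 8, 10}), -4.5),
]
library = {"lig_a": frozenset({1, 2, 5}), "lig_b": frozenset({7, 8, 11})}

docked_names = []


def fake_dock(name):
    """Hypothetical stand-in for a call to a real docking program."""
    docked_names.append(name)
    return -8.8


results = lean_dock(library, training, fake_dock, threshold=-7.0)
```

Here only `lig_a` is predicted to score below the threshold, so only it is passed to the (stand-in) docking call; `lig_b` is skipped without ever being docked.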

Christopher Bayly, OpenEye Scientific Software
Binding Free Energies in Orion: A Parallel Universe
The massive parallelism possible in OpenEye’s Orion platform encourages OpenEye to re-think our science to come up with more highly parallelized methods and workflows. This motivated us to leave the traditional FEP and TI approaches to Relative Binding Free Energy (RBFE) calculations, heading instead in a new direction with Non-Equilibrium Switching (NES), which has greater parallelism potential. Here we will show how our MD-based tools use NES to enable fast and efficient RBFE calculations.

Bert de Groot, MPI Goettingen
High throughput relative and absolute non-equilibrium binding free energies with pmx
Alchemical free energy calculations have come of age. Based on rigorous first principles of statistical mechanics, these calculations explore physical paths not experimentally accessible and provide unprecedented accuracy in the prediction of processes as diverse as protein thermostability and ligand binding free energies. Based on the pmx framework coupled to the GROMACS molecular dynamics engine, results of high-throughput relative as well as absolute ligand binding free energies are presented.
Suggested reading:
https://doi.org/10.1039/C9SC03754C
https://doi.org/10.1039/D1SC03472C
https://doi.org/10.1021/acs.jcim.1c01445
https://doi.org/10.48550/arXiv.2201.06372

Hannah Bruce Macdonald, MSD
Application of free energy methods for lead optimisation
For free energy methods to be applied effectively in drug lead optimisation, their success needs to be assessed on appropriate datasets of sufficient size and quality, with a sufficient dynamic range that remains within the limits of the assay. Difficulties arise when applying the methods to ‘live’ drug design projects. Additionally, potency is not the only factor of importance in drug design, and efforts to capture lipophilicity in free energy calculations will be discussed. Results using Perses, an open source single topology relative free energy software package, will be shown for both acyclic and macrocyclic compounds.

Recommended reading:
Cournia, Zoe, Bryce Allen, and Woody Sherman. “Relative binding free energy calculations in drug discovery: recent advances and practical considerations.” Journal of chemical information and modeling 57.12 (2017): 2911-2937.
Lee, Tai-Sung, et al. “Alchemical binding free energy calculations in AMBER20: Advances and best practices for drug discovery.” Journal of chemical information and modeling 60.11 (2020): 5595-5623.
Barros, Emilia P., et al. “Recent developments in multiscale free energy simulations.” Current opinion in structural biology 72 (2022): 55-62.

Peter Coveney, UCL
Assembling an arsenal to achieve reliable free energy calculations
The ability to rapidly and accurately predict binding affinities of ligands to a target protein of interest would greatly facilitate drug discovery programs by enabling researchers to focus on the compounds with a high probability of being pharmacologically active. Both machine learning (ML) and physics-based (PB) methods are increasingly used for free energy predictions in drug development projects. The methods individually have their own advantages and limitations, which fortunately complement each other. We have coupled ML and PB methods into a coherent scientific workflow, bringing together several methods, some of which have already been applied in drug discovery while others are relatively new to the field and yet to be adopted. Such a coupled approach creates synergies between PB and ML methods and can significantly improve the outcomes, in terms of both the accuracy of the predictions and the coverage of chemical space. The workflow can be applied to the entire early stage of drug discovery, which involves hit discovery, hit to lead, lead optimization, and evaluation of potential side effects and toxicities. A very large number of compounds can be generated and evaluated, which comes down to performing several independent calculations concurrently at large scale to increase the throughput. The ensemble computing pattern, which employs a high-throughput “embarrassingly” parallel workload, is ideal for such scenarios. This workflow is a suite of applications that collectively are able to scale up to exascale machines. We have demonstrated that the innovative, iterative and interactive heterogeneous workflow has the potential to accelerate the existing drug discovery process.

Recommended reading:
A. Al Saadi, et al., “IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads”, in 50th International Conference on Parallel Processing, Aug. 2021, Article No.: 40, pp. 1–12. DOI: 10.1145/3472456.3473524
A. P. Bhati, et al., “Pandemic Drugs at Pandemic Speed: Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers”, Interface Focus 11, 20210018 (2021). DOI: 10.1098/rsfs.2021.0018
S. Wan, A. P. Bhati, S. J. Zasada and P. V. Coveney, “Rapid, accurate, precise and reproducible ligand–protein binding free energy prediction”, Interface Focus 10, 20200007 (2020). DOI: 10.1098/rsfs.2020.0007

Martin Packer, AstraZeneca
Impact of FEP in prospective molecular design – moving from single edges to large virtual libraries
Free energy perturbation (FEP) models provide precise and accurate predictions for protein-ligand binding affinity. Currently available GPU hardware enables us to generate data for a single ligand with compute times of a few hours. Given the potential accuracy of FEP models, it is very desirable to apply them to every ligand that we design, but compute time then becomes a severely limiting factor. Active learning FEP combines FEP data with machine learning algorithms to generate FEP-based structure-activity models using rationally selected subsets from a virtual library. We iterate between FEP and machine learning models until we judge that simple models are predictive for FEP, so that we can spare further detailed computation.
In 2020 we applied this approach to a set of 16 active drug design projects within AstraZeneca. We used three approaches to design large virtual libraries and saw positive impact across a diverse set of protein targets. Over the course of 9 months we were able to generate 165,000 FEP data points and used those to prioritise synthesis of 445 molecules. We also used the models to generate detailed SAR maps for new hit series, which we exemplify here using a previously published series for the kinase EphB4.
Active learning FEP takes us a step closer towards a design environment in which every virtual molecule is assessed on its predicted affinity and its ADME properties, to focus synthesis and test activities on molecules most likely to meet multiple endpoints required for successful drug design.
Suggested reading:
https://doi.org/10.1021/acs.jcim.9b00367
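
The iteration between FEP and machine learning described in the abstract above can be sketched as a simple active learning loop. The Python below is a toy illustration under stated assumptions, not the AstraZeneca workflow: each molecule is reduced to a single hypothetical numeric descriptor, a one-dimensional least-squares line stands in for the machine learning model, `toy_fep` stands in for a real FEP calculation, and the batch size, tolerance and selection rule (pick the molecules predicted to bind best) are illustrative choices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def active_learning_fep(pool, run_fep, batch_size=3, tol=0.1, max_rounds=10):
    """Alternate FEP and model fitting until the model predicts new FEP results."""
    labelled = {}              # descriptor -> FEP result
    model = None
    batch = pool[:batch_size]  # seed batch (arbitrary choice in this sketch)
    for _ in range(max_rounds):
        # Predict the batch before running FEP, so the check below is a fair test.
        predicted = [predict(model, x) for x in batch] if model is not None else None
        computed = [run_fep(x) for x in batch]  # the expensive step
        labelled.update(zip(batch, computed))
        model = fit_line(list(labelled.keys()), list(labelled.values()))
        if predicted is not None:
            mae = sum(abs(p - c) for p, c in zip(predicted, computed)) / len(batch)
            if mae < tol:
                break  # the simple model is judged predictive for FEP
        remaining = [x for x in pool if x not in labelled]
        if not remaining:
            break
        # Select the molecules predicted to have the most favourable values.
        batch = sorted(remaining, key=lambda x: predict(model, x))[:batch_size]
    return model, labelled


fep_calls = []


def toy_fep(x):
    """Hypothetical stand-in for an FEP calculation (exactly linear toy data)."""
    fep_calls.append(x)
    return 2.0 * x - 10.0


pool = [0.5 * i for i in range(10)]  # ten hypothetical molecules
model, labelled = active_learning_fep(pool, toy_fep)
```

With this exactly linear toy data the loop stops after two rounds, having run the stand-in FEP calculation on six of the ten molecules; the remaining four are covered by the fitted model alone.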