
UK-QSAR Spring 2024

The UK-QSAR and Cheminformatics Group

Welcome to the Spring 2024 UK-QSAR Newsletter!

Charm Therapeutics and Chemical Computing Group look forward to welcoming you to the Spring 2024 UK-QSAR meeting, to be held on 11th April 2024 at the Babraham Research Campus near Cambridge:

The Petersfield Lecture Theatre,
The Cambridge Building,
Babraham Research Campus,
Babraham, Cambridge CB22 3AT

The theme for the meeting is “The Nuts and Bolts of AI: Model Derivation and Application”. Artificial Intelligence (AI) and Machine Learning (ML) are phrases we hear a lot these days; they bring together techniques that our community has been using for decades. There is a danger that “AI” may be treated as a “black box”, with a great deal of trust placed in the results of its application, even though the way models are derived and trained has a huge influence on their output. We must therefore consider how models are produced when defining their applications, and how much weight we give to their conclusions. The talks at the meeting will focus on these issues.

As ever, the meeting is free of charge to attend, but registration in advance is essential – “walk-up” attendance is not possible.

Please note that paper copies of the agenda and abstracts will not be provided onsite, so we recommend that you bookmark this page on your device or screenshot the agenda section.

In case of any questions, please contact Steve Maginn at

Our newsletter contains more details of the meeting below, including registration, the draft agenda, travel and accommodation information, information for poster presenters and talk abstracts/pre-reading material, as well as the usual sections on job openings and upcoming meetings that may be of interest. We also have information on the upcoming MGMS meeting and on the EMBL-EBI Industry Partnerships meeting, which features some familiar faces in the speaker list.

Sadly, we have recently lost a highly respected and talented pillar of our professional community, Richard Hall. His colleague and UK-QSAR Committee member, Marcel Verdonk, has penned a few words below.

Our Autumn Meeting is scheduled for 17th October and will be held in Oxford, sponsored by Exscientia.

Spring Meeting Information


You can register online at . Please ensure that you are able to attend the event in person before registering, as it will not be streamed online or recorded for later access.

Registration closes at midnight UK time on April 4th, 2024.


The provisional agenda is:

09:00-10:00 Registration
10:00-10:15 Welcome Remarks
10:15-10:45 Gian Marco Ghiandoni and Lewis Mervin (AstraZeneca) – “Qptuna: Enabling high-quality production predictive AI/ML at AstraZeneca”
10:45-11:15 Michael Parker (Optibrium) – “Rapid AI generation of optimised compound designs guided by user interactions”
11:15-11:45 Rachael Skyner (OMass Therapeutics) – “Forget about the model – what about the data?”
11:45-12:15 Damjan Krstajic (Serbian Research Centre for Cheminformatics) – “Can predictive models admit when they don’t know?”
13:45-14:15 Madeleine Taylor (Strathclyde University) – “Development of molecular descriptors for quantitative structure-retention relationships”
14:15-14:45 Markus Kossner (Chemical Computing Group) – “Reverse Fingerprinting: Application to Motif Detection and Pharmacophore Query Generation”
15:15-15:45 Finlay MacLean (Charm Therapeutics) – “Seedling: a scoring and generation framework for protein-ligand co-folding”
15:45-16:15 Fernanda Duarte (Oxford University) – Title TBA


The Babraham Research Campus is located 9 km south-east of Cambridge city centre, next to the A1307 Cambridge to Haverhill road and just off the A11.

For those arriving at Cambridge’s central railway station, we have organised a return shuttle bus, which will leave from near the station for the Babraham Campus at around 09:00 on the morning of April 11th and return at the end of the event. Details will be sent to those who indicate, when registering, that they may wish to use this service. Spaces on the shuttle will be limited, so we recommend that locally based attendees use the number 13 bus, which runs every half hour from Cambridge (Drummer Street bus station) to Haverhill, with stops close to where Station Road joins Hills Road and close to the access-road roundabout for the Babraham Research Campus.

A taxi to or from Cambridge central station will cost around £20 each way.

There is ample car parking on site; drivers should provide their vehicle registration number when registering for the event. There is also a dedicated cycleway from central Cambridge alongside the A1307, all the way out to Babraham.


There are numerous hotels in central Cambridge. Close to the main station is the Ibis, with the Travelodge a short walk away.

For a more authentic Cambridge college experience, University Rooms may have availability in college accommodation.


You will have the opportunity to submit a poster abstract during the registration process. Posters will be displayed in the foyer outside the Petersfield Lecture Theatre for viewing during breaks. The boards measure 120 × 150 cm and will therefore accommodate A0 posters in either landscape or portrait format.

The deadline for receipt of poster abstracts is March 15th. You will be notified soon after that date if your poster has been accepted. Priority will be given to posters presented by younger researchers, i.e. PhD students and postdocs.

In addition to the usual prize of an invitation to present a talk at the next UK-QSAR meeting, there will be a small additional prize for the poster judged the best. Only posters presented by students and other young researchers are eligible for the prize, but poster submissions are welcome from everyone.

Additional information on posters:

  • Abstracts can only be submitted during the registration process, or afterwards if you declared an intention to do so when registering.
  • The abstract submission deadline is March 15th; this will not be extended.
  • Posters should be mounted between 09:00 and 10:00 on the day of the meeting.

Richard Hall

Marcel Verdonk, Astex

With great sadness we would like to acknowledge the loss of a pillar of the cheminformatics community and a very familiar face at UK-QSAR meetings, Richard Hall, who died on the 15th of December. Rich was a great scientist and innovator, a genuine, warm man and a loyal friend who will be sorely missed. There will be an oral tribute to Rich and his legacy to this community at the autumn UK-QSAR meeting.

Abstracts & Pre-Reading Material

Some speakers have provided the following abstracts and references which might be of interest ahead of their talks.

Qptuna: Enabling high-quality production predictive AI/ML at AstraZeneca

Lewis Mervin¹, Gianmarco Ghiandoni²

¹Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK

²Augmented DMTA Engineering, R&D IT, R&D, AstraZeneca, Cambridge, UK

Model development and engineering are tightly coupled phases of predictive AI/ML for drug discovery, and both are critical to its uptake in medicinal chemistry projects, which require accuracy as well as performance when dealing with real-world data. Here we present Qptuna, an automated model-building framework for molecular property prediction that we have built in-house at AstraZeneca, and describe how models are deployed, consumed and combined with other software in practice (e.g., for de novo design using REINVENT). Impact examples are also discussed to show how our tools are used to progress compounds in projects. To conclude, we provide an outlook on the future of predictive AI/ML for enabling actionable decision-making in drug discovery.
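The abstract does not describe Qptuna's interface, so the sketch below only illustrates the general idea behind automated model building: try several hyperparameter settings, score each by cross-validation and keep the best. The helpers (`knn_predict`, `cv_rmse`, `auto_build`), the k-nearest-neighbour regressor and the toy data are all illustrative assumptions, not Qptuna's API; a real framework would search many more algorithms, descriptors and settings.

```python
import math
import random
from statistics import mean


def knn_predict(train_X, train_y, x, k):
    """Predict a property for x as the mean target of its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda item: math.dist(item[0], x))
    return mean(y for _, y in ranked[:k])


def cv_rmse(X, y, k, n_folds=3, seed=0):
    """Root-mean-square error of k-NN regression under n-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    sq_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        for i in fold:
            pred = knn_predict([X[j] for j in train], [y[j] for j in train], X[i], k)
            sq_errors.append((pred - y[i]) ** 2)
    return math.sqrt(mean(sq_errors))


def auto_build(X, y, k_grid=(1, 3, 5)):
    """Score every candidate k by cross-validated RMSE and keep the best one."""
    scores = {k: cv_rmse(X, y, k) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy data: one made-up descriptor per compound and a noiseless linear property.
X = [(float(i),) for i in range(12)]
y = [2.0 * i + 1.0 for i in range(12)]
best_k, scores = auto_build(X, y)
print(best_k, scores)
```

Cross-validating each candidate, rather than scoring on the training data, is what keeps the selection from simply favouring the most flexible setting.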


Rapid AI generation of optimised compound designs guided by user interactions

Michael Parker, Optibrium, Cambridge, UK

We present a novel AI approach for generative chemistry, rapidly generating new compound designs with improved properties. We pair a generative transformer model with a Bayesian optimisation algorithm to identify desirable property changes from user interactions and generate new compound ideas meeting those criteria. We show that the model can identify user goals within a multi-dimensional parameter space within a few interactions, and successfully generate relevant, optimised compound designs meeting multi-parameter goals. This powerful combination allows chemists to obtain new AI generated compounds quickly and easily, tailored to their project goals, without having to spend time defining complex filters and multi-parameter property criteria.
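As a rough illustration of inferring a user's goals from interactions, the sketch below keeps a Bayesian posterior over a small grid of hypothetical property weightings and updates it each time a user prefers one design over another, using a logistic choice model. This is a deliberately simplified stand-in, not the Bayesian optimisation algorithm or transformer described in the talk; the two properties, the weight grid and the `beta` sharpness parameter are assumptions.

```python
import math


def weight_grid(steps=10):
    """Hypotheses about the user's goal: weights (w_potency, w_solubility) summing to 1."""
    return [(i / steps, 1 - i / steps) for i in range(steps + 1)]


def utility(w, props):
    """Weighted sum of (already normalised) property scores."""
    return sum(wi * pi for wi, pi in zip(w, props))


def update(posterior, chosen, rejected, beta=5.0):
    """Bayes update after the user prefers `chosen` over `rejected`.

    The likelihood is a logistic choice model: the larger the utility gap under
    weight vector w, the more likely the user was to make this choice.
    """
    new = {}
    for w, p in posterior.items():
        gap = utility(w, chosen) - utility(w, rejected)
        new[w] = p / (1 + math.exp(-beta * gap))
    total = sum(new.values())
    return {w: p / total for w, p in new.items()}


def expected_utility(posterior, props):
    """Posterior-averaged utility, used here to rank newly generated designs."""
    return sum(p * utility(w, props) for w, p in posterior.items())


grid = weight_grid()
posterior = {w: 1 / len(grid) for w in grid}  # uniform prior over weightings
# Two interactions in which the user picks the more soluble of two designs.
posterior = update(posterior, chosen=(0.2, 0.9), rejected=(0.9, 0.2))
posterior = update(posterior, chosen=(0.3, 0.8), rejected=(0.8, 0.3))
designs = {"design_A": (0.95, 0.10), "design_B": (0.10, 0.95)}
ranked = sorted(designs, key=lambda d: expected_utility(posterior, designs[d]), reverse=True)
print(ranked)
```

Only two clicks are needed here for the posterior to lean towards solubility, which mirrors the abstract's point that goals can be identified within a few interactions.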


Forget about the model – what about the data?

Rachael Skyner¹, Elliot Nelson¹ and Dominga Evangelista²

¹OMass Therapeutics, Building 4000, Chancellor Court, John Smith Drive, Oxford Business Park, ARC, Oxford, OX4 2GX

²Department of Pharmacy and Biotechnology, University of Bologna, Via Belmeloro 6, 40126, Bologna, Italy

In cheminformatics, the focus is often on refining predictive models and applying new predictive techniques to unsolved challenges. However, the quality and appropriateness of the underlying data are critical for robust and meaningful outcomes. We will delve into the intricacies of data utilisation, challenging the conventional use of benchmarking datasets straight out of the box.

First, we explore the pitfalls of adopting benchmarking datasets without scrutiny. We discuss effective techniques to identify and interrogate data, shedding light on known issues with widely used benchmarks. By advocating for a critical examination of datasets, we hope to help others to consider and identify problems in pre-prepared datasets and therefore enhance the reliability of their results.
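One basic interrogation step for a benchmark is checking for repeated compounds and conflicting labels. The sketch below assumes each record is already keyed by a canonical identifier (computing that, e.g. with a cheminformatics toolkit, is not shown); the `interrogate` helper, the `max_spread` threshold of 0.5 log units and the median aggregation are illustrative choices, not recommendations from the talk.

```python
from collections import defaultdict
from statistics import median


def interrogate(records, max_spread=0.5):
    """Group measurements by compound and flag duplicates and conflicting labels.

    records: (key, value) pairs; key is assumed to be a canonical identifier
    (e.g. an InChIKey computed beforehand), value e.g. a pIC50.
    """
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    duplicates = {k: v for k, v in groups.items() if len(v) > 1}
    conflicts = {k: v for k, v in duplicates.items() if max(v) - min(v) > max_spread}
    # Keep one value per compound (the median) unless its measurements disagree.
    clean = {k: median(v) for k, v in groups.items() if k not in conflicts}
    return clean, duplicates, conflicts


# Placeholder keys and made-up values.
records = [("KEY-A", 6.1), ("KEY-A", 6.3), ("KEY-B", 5.0), ("KEY-B", 7.2), ("KEY-C", 4.4)]
clean, duplicates, conflicts = interrogate(records)
print(sorted(duplicates), sorted(conflicts), clean)
```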

Navigating novel or less-investigated prediction problems without a pre-curated dataset can be especially challenging. The second part of this talk presents case studies and general guidelines for building datasets, using examples such as dataset curation from ChEMBL. Through these insights, we hope to provide valuable solutions and starting points for cases where a more tailored dataset is required.
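When building a dataset from ChEMBL, a common first pass is to keep only exact measurements of a single activity type in consistent units and convert them to a log scale. The sketch below assumes rows have already been retrieved as dictionaries using ChEMBL's activity field names (`molecule_chembl_id`, `standard_type`, `standard_relation`, `standard_value`, `standard_units`), with `standard_value` already converted to a number; the `curate_ic50` helper and its filter choices are hypothetical, and the compound IDs are placeholders.

```python
import math


def curate_ic50(rows, target_type="IC50"):
    """Keep exact, nM-unit measurements of one activity type and convert them to pIC50.

    rows: dicts using ChEMBL-style activity fields; how they are fetched is not shown.
    """
    curated = []
    for row in rows:
        if row.get("standard_type") != target_type:
            continue
        if row.get("standard_relation") != "=":  # drop censored values, e.g. relation ">"
            continue
        if row.get("standard_units") != "nM":
            continue
        value = row.get("standard_value")
        if value is None or value <= 0:
            continue
        # pIC50 = -log10(IC50 in mol/L) = 9 - log10(IC50 in nM)
        curated.append((row["molecule_chembl_id"], 9 - math.log10(value)))
    return curated


rows = [
    {"molecule_chembl_id": "CPD-1", "standard_type": "IC50", "standard_relation": "=",
     "standard_value": 100.0, "standard_units": "nM"},
    {"molecule_chembl_id": "CPD-2", "standard_type": "IC50", "standard_relation": ">",
     "standard_value": 10000.0, "standard_units": "nM"},
    {"molecule_chembl_id": "CPD-3", "standard_type": "Ki", "standard_relation": "=",
     "standard_value": 5.0, "standard_units": "nM"},
]
curated = curate_ic50(rows)
print(curated)
```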

The discussion extends to diverse approaches for preparing data to ensure unbiased training and testing. By examining various methods, we aim to equip researchers with a toolkit to enhance the fairness and generalisability of their models.
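A frequent source of optimistic test scores is close analogues appearing in both training and test sets. One way to reduce this, sketched below under the assumption that each compound already carries a group label such as a chemical series or scaffold, is to hold out whole groups. `group_split` is a hypothetical helper, and the talk may well use different splitting strategies.

```python
import random


def group_split(ids, groups, test_fraction=0.2, seed=0):
    """Hold out whole groups (e.g. chemical series or scaffolds) so none spans train and test.

    test_fraction is applied to the number of groups, not compounds.
    """
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(test_fraction * len(unique)))
    test_groups = set(unique[:n_test])
    train = [i for i, g in zip(ids, groups) if g not in test_groups]
    test = [i for i, g in zip(ids, groups) if g in test_groups]
    return train, test


ids = [f"cpd{n}" for n in range(10)]
groups = ["series1", "series1", "series2", "series2", "series3",
          "series3", "series4", "series4", "series5", "series5"]
train, test = group_split(ids, groups)
print(train, test)
```

Because whole series are held out, the test score estimates performance on unseen chemotypes rather than on near-duplicates of training compounds.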

In conclusion, this talk aims to emphasise the significance of thoughtful data curation, interrogation and processing, providing practical guidelines and shifting the focus from model-centric to data-centric thinking as a starting point for model building in cheminformatics.


Can predictive models admit when they don’t know?

Damjan Krstajic, Director, Research Centre for Cheminformatics, Belgrade, Serbia

We are of the opinion that during the design of a binary classifier one ought to consider adding an “I don’t know” answer. We make the case for introducing this third category when a human needs to make a decision based on the answer from a binary classifier, and we discuss several approaches that may be used in this scenario. A procedure for defining “I don’t know” predictions in binary classifiers, called all leave-one-out models (ALOOM), is presented as a proof of concept.
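A minimal sketch of the ALOOM idea as stated in the abstract: build one model per left-out training compound and answer “I don’t know” whenever those models disagree. The choice of a 3-nearest-neighbour classifier, the one-dimensional toy data and the helper names are assumptions made only to keep the example self-contained.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """Majority vote among the k nearest training points; train is [(features, label), ...]."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def aloom_predict(train, x, k=3):
    """All leave-one-out models: build one model per omitted training point.

    If every leave-one-out model gives the same class, return it; otherwise "I don't know".
    """
    votes = {knn_classify(train[:i] + train[i + 1:], x, k) for i in range(len(train))}
    return votes.pop() if len(votes) == 1 else "I don't know"


train = [((0.0,), 0), ((3.0,), 0), ((5.0,), 0),
         ((10.0,), 1), ((14.0,), 1), ((40.0,), 1)]
print(aloom_predict(train, (1.0,)))  # deep inside class 0
print(aloom_predict(train, (8.0,)))  # near the class boundary
```

At x = 8 the full model still answers class 0, but leaving out the training point at 3 flips the vote to class 1, so ALOOM declines to answer.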


Development of molecular descriptors for quantitative structure-retention relationships

Madeleine Taylorᵃ, Roman Szucsᵇ, Lucy Morganᶜ, Roland Brownᶜ, David Palmerᵃ

ᵃPure and Applied Chemistry, University of Strathclyde, G1 1XL, UK; ᵇDepartment of Analytical Chemistry, Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; ᶜPfizer Global R&D, Sandwich, UK.

QSRR models are widely used in the pharmaceutical industry to help identify unknown compounds in HPLC screening experiments [1]. These models rely on high-quality, relevant descriptors. Traditional descriptors focus on solute features, but chromatographic retention is a phenomenon governed by solvation and partition interactions. We are therefore developing new molecular descriptors that describe solvation structure using the reference interaction site model (RISM). The usefulness of these descriptors has previously been demonstrated for predicting solvation free energy [2], entropy of solvation and enthalpy of solvation [3]. They are adapted for chromatography by modelling the chromatographic conditions, including the mobile phases and an analogue of the stationary phase. Together, these describe the dynamic equilibrium in the column.

Datasets provided by Pfizer have been used to validate these descriptors. 1D RISM equations were solved for the analyte molecules in various solvents with the pyRISM solver software [4]. Compared with a PLS model using Mordred descriptors alone, adding RISM descriptors for solutes in methanol increased R² from 0.515 to 0.717 and decreased the RMSD from 1.01 to 0.77 min. Additionally, for an outlier with an atypical chemical structure, adding these physics-based descriptors reduced the percentage error from 97% to 34%. The two descriptor sets display synergy, which suggests that the information provided by RISM descriptors is complementary to that provided by standard 2D descriptors. QSRR methods that use these descriptors are also being developed, including more advanced machine learning algorithms and a procedure to predict differences in retention times.

[1] P. R. Haddad, M. Taraji and R. Szücs, Anal. Chem., 2021, 93, 228–256.

[2] D. S. Palmer, M. Mišin, M. V. Fedorov and A. Llinas, Mol. Pharm., 2015, 12, 3420–3432.

[3] D. J. Fowles and D. S. Palmer, Phys. Chem. Chem. Phys., 2023, 25, 6944–6954.

[4] A. Ahmad, 2AUK/pyRISM, DOI: 10.5281/zenodo.7783600, 2023.
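For readers less familiar with the statistics quoted in this abstract, the sketch below shows how R², RMSD and a single-compound percentage error are typically computed from observed and predicted retention times. It assumes R² is the coefficient of determination (the abstract does not specify); the data are made up and do not reproduce the reported values.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmsd(observed, predicted):
    """Root-mean-square deviation, in the units of the data (minutes for retention times)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def percentage_error(observed, predicted):
    """Absolute error of a single prediction as a percentage of the observed value."""
    return abs(predicted - observed) / abs(observed) * 100


# Made-up retention times (min), purely to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed, predicted), rmsd(observed, predicted), percentage_error(2.0, 3.0))
```

A constant 1 min offset gives an RMSD of exactly 1 min but an R² of only about 0.2, which is why the two statistics are usually reported together.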


Reverse Fingerprinting: Application to Motif Detection and Pharmacophore Query Generation

Markus Kossner, Principal Scientist & Scientific Services Manager, Chemical Computing Group, Köln, Germany

‘Reverse Fingerprinting’ is a method that uses feature list fingerprints to detect differentiating structural elements in small molecule and protein datasets. This talk covers the theory of reverse fingerprinting and presents examples of its application to detecting important structural motifs, coloring atoms by activity contribution, generating 3D pharmacophore queries and identifying liability regions in protein structures.
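The abstract does not give the algorithm, but the general idea of using feature-list fingerprints to find differentiating elements can be sketched as a per-feature enrichment between actives and inactives, with atom scores obtained by summing over the features that cover each atom. This is a generic, hypothetical illustration, not Chemical Computing Group's implementation; the feature IDs, the pseudocount and the feature-to-atom mapping are invented.

```python
import math
from collections import Counter


def feature_scores(actives, inactives, pseudo=1.0):
    """Log-odds enrichment of each fingerprint feature in actives versus inactives.

    actives / inactives: lists of sets of feature IDs (one feature-list fingerprint
    per molecule). Positive scores mark features over-represented in actives.
    """
    act = Counter(f for fp in actives for f in fp)
    inact = Counter(f for fp in inactives for f in fp)
    scores = {}
    for f in set(act) | set(inact):
        p_act = (act[f] + pseudo) / (len(actives) + 2 * pseudo)
        p_inact = (inact[f] + pseudo) / (len(inactives) + 2 * pseudo)
        scores[f] = math.log(p_act / p_inact)
    return scores


def atom_contributions(feature_atoms, scores):
    """Sum the scores of all features covering each atom, e.g. for colouring a molecule."""
    contrib = Counter()
    for f, atoms in feature_atoms.items():
        for a in atoms:
            contrib[a] += scores.get(f, 0.0)
    return dict(contrib)


actives = [{1, 2}, {1, 3}, {1}]
inactives = [{2, 3}, {3}, {2}]
scores = feature_scores(actives, inactives)
# Hypothetical mapping of features to atom indices in one query molecule.
contrib = atom_contributions({1: [0, 1], 2: [1, 2]}, scores)
print(scores, contrib)
```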


Seedling: a scoring and generation framework for protein-ligand co-folding

Finlay MacLean, Charm Therapeutics, Cambridge, UK

Rather than designing molecules manually one by one, chemists can use generative chemistry approaches such as REINVENT, which promise to help them probe the chemical space associated with their design hypotheses. While much work has been done on developing de novo molecular generation algorithms based on machine learning, current approaches rely on simple scoring functions such as QSAR models and docking, and on basic properties such as QED and logP. This greatly limits the power of such methods.

At Charm we have built a state-of-the-art molecule optimisation platform, Seedling, that incorporates molecular dynamics as well as our proprietary protein-ligand co-folding algorithm, DragonFold, to perform structure-based molecular generation.

In Seedling, a suite of generators ‘grow’ chemically reasonable designs based on “seed” molecules. Using a distributed platform to perform thousands of computations in parallel, these designs are then scored by an arsenal of physics-based simulations and machine learning models. Molecules are selected for expensive simulations via an active learning strategy, allowing us to efficiently search chemical space for the most promising molecules. This platform enables our experts to formulate complex hypotheses and return the next day to evaluate the most promising ideas.
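The active-learning step described above can be illustrated with a toy loop: a cheap surrogate predicts scores and uncertainties for unscored designs, and only the batch with the highest upper confidence bound is sent to the expensive scorer each round. Everything here is a hypothetical stand-in (the one-dimensional “designs”, the nearest-neighbour surrogate, the `kappa` exploration weight and the quadratic “simulation”); Seedling's actual models and acquisition strategy are not described in the abstract.

```python
import math


def surrogate(scored, x):
    """Crude stand-in for an ML model: mean score of the 2 nearest scored designs,
    with the distance to the nearest one as an uncertainty estimate."""
    nearest = sorted(scored, key=lambda item: math.dist(item[0], x))[:2]
    mean = sum(score for _, score in nearest) / len(nearest)
    return mean, math.dist(nearest[0][0], x)


def ucb(scored, x, kappa):
    """Upper confidence bound: favour designs predicted to be good or poorly explored."""
    mean, uncertainty = surrogate(scored, x)
    return mean + kappa * uncertainty


def active_learning(candidates, expensive_score, seeds, rounds=5, batch=2, kappa=1.0):
    """Spend the expensive scorer only on the batch with the highest UCB each round."""
    scored = [(x, expensive_score(x)) for x in seeds]
    pool = [x for x in candidates if x not in seeds]
    for _ in range(rounds):
        if not pool:
            break
        chosen = sorted(pool, key=lambda x: ucb(scored, x, kappa), reverse=True)[:batch]
        scored += [(x, expensive_score(x)) for x in chosen]
        pool = [x for x in pool if x not in chosen]
    return max(scored, key=lambda item: item[1])


calls = []


def expensive_score(x):
    """Hypothetical 'simulation' with its optimum at x = 13."""
    calls.append(x)
    return -(x[0] - 13.0) ** 2


candidates = [(float(i),) for i in range(21)]
best = active_learning(candidates, expensive_score, seeds=[(0.0,), (20.0,)])
print(best, len(calls))
```

Only 12 of the 21 candidates are ever sent to the expensive scorer; raising `kappa` shifts the balance from exploiting good predictions towards exploring unfamiliar regions.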

MGMS Meeting: Adaptive Immune Receptors

Steve Maginn, Chemical Computing Group Inc

The MGMS (Molecular Graphics and Modelling Society) is pleased to announce that registrations are now open for the “Adaptive Immune Receptors: Structural modelling and immunoinformatics” one-day conference, which will be hosted at the University of Oxford on Friday April 5th 2024. 

Conference website:


Abstract Submission: 

 Currently accepted speakers include: 

– Professor Peter Tessier (University of Michigan, MI; Keynote) 

– Professor Charlotte Deane (University of Oxford; Keynote) 

– Dr. Paula Dobrinic (Immunocore) 

– Dr. Joseph Watson (University of Washington, WA) 

– Dr. Monica Fernández-Quintero (Scripps Institute, CA) 

– Dr. Pietro Sormanni (University of Cambridge)

Attendees are encouraged to submit a poster abstract; abstracts submitted before the 9th February deadline may be selected for a talk.

“Early-bird” registration fees are available until February 23rd.

Target to Patient: Creating Tomorrow’s Drug Discovery Toolbox (EMBL-EBI Industry Partnerships)

Lucie Smith, EBI Hinxton


29 – 30 April 2024

This meeting will focus on ground-breaking advances in genomics for target choice, in vitro technologies, novel modalities, risk mitigation, data analysis, AI/ML, digital health, and how they could radically increase the discovery of disease relevant biological targets and the development of new drugs.

In-person and virtual audiences will hear presentations from our esteemed panel of speakers, participate in live Q&A and panel discussion sessions, and take advantage of networking opportunities, poster sessions, vendor exhibitions and on-demand materials. Keynote speakers include Melanie Lee CBE and Professor Sir Munir Pirmohamed. The full speaker agenda is available on the meeting website.

Participants from industry, academia and government interested in challenging the existing approaches to target selection and translation are encouraged to attend and help set the agenda for the next era of drug discovery.

Hinxton Hall Conference Centre, Wellcome Genome Campus, Hinxton, CB10 1RQ, UK, or online

Registration is open.

Early bird discount ends: 26 February 2024. Student and academic discounts available.

Abstracts for consideration for poster presentations can be submitted during registration.

Sponsors: EMBL-EBI, GSK, AstraZeneca 

Supported by: ELRIG, SMR



Upcoming Meetings

The following meetings may be of interest to our readers:
