Matched Molecular Pairs | UK QSAR and Cheminformatics Group

What is a Matched Molecular Series, and why you should care

Noel O’Boyle, NextMove Software

The concept of the Matched Molecular Pair (MMP), two molecules with the same scaffold but different R groups at the same position, has become very popular in recent years for rationalising trends in SAR. Matched Molecular Pair Analysis (MMPA) succeeds because relative changes in property values are easier to predict than absolute values. Predictions based on MMPA work well for physicochemical properties, as well as for biological activities that correlate highly with such properties.

However, MMPA does not in general work well for predicting R groups that improve biological activity. The reason is simple: in one binding site environment, changing group A to group B may increase activity, while in another it may decrease activity. While attempts have been made to address this problem, for example by focusing on MMPs from just the target of interest or with a particular atom environment, the underlying problem remains.

Enter the Matched Molecular Series (MMS), a concept introduced by Bajorath in 2011. This is simply a generalisation of the MMP concept to a series of any length, that is, N molecules with the same scaffold but different R groups at the same position. With MMPA, we are asking the question “Will changing B to C increase the activity?”; in contrast, with an MMS of length 3, we are asking “Will changing B to C increase the activity given that B is more active than A?” In other words, using a longer series introduces more context, and this context represents a particular binding site environment.
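To make the definition concrete, here is a minimal sketch of how matched series could be assembled from data that has already been fragmented into a scaffold and an R group. The record layout, the scaffold labels and the function names are illustrative assumptions, not part of any published implementation; in practice the fragmentation step would be done by a cheminformatics toolkit, and the scaffold key would also encode the attachment position.

```python
from collections import defaultdict

# Hypothetical, pre-fragmented records: (scaffold, R group, pIC50).
# The scaffold label is assumed to identify both the core and the
# attachment position.
records = [
    ("scaffold_1", "methyl", 5.1),
    ("scaffold_1", "ethyl", 6.3),
    ("scaffold_1", "propyl", 5.8),
    ("scaffold_2", "methyl", 7.0),
    ("scaffold_2", "ethyl", 6.2),
]


def build_matched_series(records):
    """Group records by scaffold; each group of two or more is a matched series."""
    by_scaffold = defaultdict(dict)
    for scaffold, r_group, activity in records:
        by_scaffold[scaffold][r_group] = activity
    return {s: groups for s, groups in by_scaffold.items() if len(groups) >= 2}


def activity_order(series):
    """Return the R groups of one series, sorted from most to least active."""
    return sorted(series, key=series.get, reverse=True)


series = build_matched_series(records)
orders = {s: activity_order(groups) for s, groups in series.items()}
```

Under these assumptions a matched pair is just a series of length 2 (scaffold_2 here), while scaffold_1 gives a series of length 3 whose activity order carries the extra context described above.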

Building on this idea in collaboration with AstraZeneca Mölndal, we have developed the “Matsy” algorithm (from “Matched Series”), which uses an existing source of activity data (e.g. ChEMBLdb, or an internal database) to predict which R group is likely to increase activity, given an observed activity order for the R groups already measured. Given a query matched series, the algorithm searches an activity database for all R groups that have been measured along with the query, and calculates the percentage of times each R group increased the activity beyond that of the most active R group in the query. The R groups with the highest percentages are presented as the most likely candidates to try next. For example, given an observed pIC50 order of ethyl > propyl > methyl, the top prediction is tert-butyl, on the basis of 23 observations in ChEMBLdb, of which 39% increased the activity.
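The counting step described above can be sketched in a few lines. This is an illustration only, not the Matsy implementation: the function name, the in-memory database layout and the toy activity values are all assumptions, and the sketch assumes that a database series counts only if it contains every query R group in the same strict activity order as the query.

```python
from collections import defaultdict


def predict_next_r_groups(query_order, database):
    """Rank candidate R groups for a query matched series.

    query_order: R groups listed from most to least active.
    database: list of dicts, one per matched series, mapping R group -> activity.
    Returns (r_group, observations, percent_improved) tuples, best first.
    """
    counts = defaultdict(lambda: [0, 0])  # r_group -> [observations, improvements]
    for series in database:
        # The series must contain every R group in the query.
        if not all(r in series for r in query_order):
            continue
        acts = [series[r] for r in query_order]
        # Assumption: keep only series with the same strict order as the query.
        if any(a <= b for a, b in zip(acts, acts[1:])):
            continue
        best = acts[0]  # activity of the query's most active R group
        for r_group, activity in series.items():
            if r_group in query_order:
                continue
            counts[r_group][0] += 1
            if activity > best:
                counts[r_group][1] += 1
    results = [(r, n, 100.0 * k / n) for r, (n, k) in counts.items()]
    results.sort(key=lambda t: (-t[2], -t[1], t[0]))
    return results


# Toy data with invented pIC50 values, for illustration only.
database = [
    {"methyl": 5.0, "propyl": 5.5, "ethyl": 6.0, "tert-butyl": 6.8, "phenyl": 5.2},
    {"methyl": 4.1, "propyl": 4.9, "ethyl": 5.3, "tert-butyl": 5.0},
    {"methyl": 6.5, "propyl": 6.0, "ethyl": 5.8, "tert-butyl": 7.0},  # different order
    {"methyl": 5.5, "ethyl": 6.1, "phenyl": 6.4},  # propyl not measured
]
ranked = predict_next_r_groups(["ethyl", "propyl", "methyl"], database)
```

With this toy data only the first two series match the query order, so tert-butyl is ranked first (2 observations, 1 improvement) ahead of phenyl (1 observation, no improvement). Reporting the number of observations alongside the percentage matters, since a high percentage from very few observations is weak evidence.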

In summary, using Matched Molecular Series you can overcome the limitations of MMPA for activity prediction by implicitly incorporating information on the binding site environment. The technique may be used to guide a medicinal chemistry programme, as a hypothesis generator, or simply as a way to navigate existing SAR data.

For further details, please see our publication in J. Med. Chem. (http://dx.doi.org/10.1021/jm500022q) or the summary in a recent talk (http://www.slideshare.net/NextMoveSoftware/using-matched-molecular-series-as-a-predictive-tool-to-optimize-biological-activity).