Developing muscarinic receptor M1 classification models utilizing transfer learning and generative AI techniques

Abstract

Muscarinic receptor subtype 1 (M1) is a G protein-coupled receptor (GPCR) and a key pharmacological target for peripheral neuropathy, chronic obstructive pulmonary disease, nerve agent exposure, and cognitive disorders. Screening for and identifying compounds that may interact with M1 will aid rational drug design for these disorders. In this work, we developed machine learning-based M1 classification models using publicly available bioactivity data. Because inactive compounds are rarely reported in the literature, the resulting datasets were imbalanced. We investigated two strategies to overcome this bottleneck: 1) transfer learning and 2) using generative models to oversample the inactive class. Our analysis shows that these approaches reduced misclassification of the inactive class not only for M1 but also for other GPCR targets. Overall, we have developed classification models for the M1 receptor that will enable rapid screening of large chemical databases and advance drug discovery.
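When one class dominates, overall accuracy can hide poor performance on the minority class. A minimal sketch of how misclassification of inactives can be quantified, assuming labels are encoded as 1 = active and 0 = inactive; the helper names are hypothetical, and these are not necessarily the metrics reported in this study:

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of majority-class count to minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def class_metrics(y_true, y_pred, positive=1):
    """Sensitivity (recall on actives), specificity (recall on inactives)
    and balanced accuracy (their mean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# 9 actives and 1 inactive; a model that calls everything active is 90% accurate
y_true = [1] * 9 + [0]
y_pred = [1] * 10
print(imbalance_ratio(y_true))        # 9.0
print(class_metrics(y_true, y_pred))  # {'sensitivity': 1.0, 'specificity': 0.0, 'balanced_accuracy': 0.5}
```

Specificity is the number that exposes the problem: the inactive compound is always misclassified even though accuracy looks high.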

Contents

Data folder containing all the data used in this study. It is divided into four sub-sections:
- Input: Contains the M1 bioactivities from ChEMBL and BindingDB, the combined public dataset, and the lists of compounds generated by a recurrent neural network (RNN) and by REINVENT4.
- Training: Contains the files used to train the models for M1 and five additional GPCR targets. Files with RNN- and REINVENT4-generated compounds added are also provided. The deep neural network (DNN) models use Morgan fingerprints for training, and those files are labelled “_FPs”.
- Test: Contains the scaffold-split-based, high-throughput screening (HTS), and DrugBank datasets, following the same naming conventions as Training. Fingerprints for the HTS dataset were too large to include and are not provided.
- Results: Contains all the result files generated by the scripts for M1 and the five additional GPCR targets.
- The TOML file used to sample compounds with REINVENT4 is also provided.
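A scaffold split keeps all compounds that share a core scaffold on the same side of the split, so the test set measures generalization to chemotypes not seen during training. A minimal sketch of one common greedy scheme, assuming scaffolds have already been computed (for example with RDKit's MurckoScaffold module) and using a hypothetical scaffold_split helper; curate_data.py may split differently:

```python
from collections import defaultdict


def scaffold_split(records, test_fraction=0.2):
    """Split (compound_id, scaffold) pairs so no scaffold appears in both sets.

    Larger scaffold groups fill the training set first; groups that no longer
    fit under the training cutoff go to the test set.
    """
    groups = defaultdict(list)
    for compound_id, scaffold in records:
        groups[scaffold].append(compound_id)

    # Largest groups first; ties broken by scaffold string for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    train_cutoff = (1 - test_fraction) * len(records)

    train, test = [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cutoff:
            train.extend(members)
        else:
            test.extend(members)
    return train, test
```

Because whole groups move together, the realized test fraction is only approximately test_fraction.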
Scripts folder containing four Python scripts, each of which can be run with python script.py:
- curate_data.py combines data from the public databases and creates the training and scaffold-split-based test sets
- add_gen.py augments the training set with the RNN/DEG-generated inactives
- pred_sklearn.py runs the Naïve Bayes (NB), random forest (RF) and XGBoost (XG) models
- pred_dnn_w_transfer_learning.py runs the standard DNN and the transfer learning models
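Oversampling the inactive class with generated compounds amounts to appending generated molecules, labelled inactive, to the training set. A minimal sketch, assuming the training set is a list of (SMILES, label) pairs with 1 = active and 0 = inactive, that SMILES are already canonicalized so string comparison detects duplicates, and that augmentation stops at a target inactive/active ratio; the function and its parameters are hypothetical, not the interface of add_gen.py:

```python
def augment_with_generated_inactives(train, generated_smiles, target_ratio=1.0):
    """Append generated compounds as inactives (label 0) to a training set.

    train: list of (smiles, label) pairs, 1 = active, 0 = inactive.
    generated_smiles: candidate inactives from a generative model.
    target_ratio: desired inactive/active ratio after augmentation.
    Generated SMILES already present in the training set are skipped.
    """
    n_active = sum(1 for _, label in train if label == 1)
    n_inactive = len(train) - n_active
    n_needed = max(0, int(target_ratio * n_active) - n_inactive)

    seen = {smiles for smiles, _ in train}
    added = []
    for smiles in generated_smiles:
        if len(added) >= n_needed:
            break
        if smiles not in seen:  # avoid duplicating known compounds
            seen.add(smiles)
            added.append((smiles, 0))
    return train + added
```

Capping the number of added compounds keeps synthetic inactives from swamping the experimentally measured ones.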
Model folder containing the Naïve Bayes, random forest, XGBoost and various DNN models

Setting up the anaconda environment

To run the Python scripts, you will first need Anaconda installed. From an Anaconda prompt, set up a new environment using the following commands:

conda create -n imbalanced_m1 python=3.9

conda activate imbalanced_m1

Next, navigate to this repository's folder and enter the following command to install dependencies:

pip install -r requirements.txt
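After installation, it can help to confirm that every package in requirements.txt is visible in the active environment. A minimal standard-library sketch using importlib.metadata; it only understands simple lines such as name==version or name>=version, and the helper names are hypothetical:

```python
import re
from importlib import metadata


def required_packages(path="requirements.txt"):
    """Return distribution names listed in a pip requirements file."""
    names = []
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()  # drop comments
            if not line or line.startswith("-"):
                continue  # skip blank lines and pip options such as -r or -e
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
            if match:
                names.append(match.group(0))
    return names


def missing_packages(names):
    """Names whose distribution metadata cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository folder, with the imbalanced_m1 environment active, print(missing_packages(required_packages())) should print an empty list if the install succeeded.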

Running the scripts

Combining data and creating training and test sets

python curate_data.py
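Combining ChEMBL and BindingDB means reconciling compounds reported in both sources. A minimal sketch of one way to merge them, assuming each record has already been reduced to (compound_key, label), where compound_key is a standardized identifier such as an InChIKey and the label is already binarized; dropping compounds with conflicting labels is our illustrative choice, not necessarily what curate_data.py does:

```python
from collections import defaultdict


def combine_sources(*sources):
    """Merge (compound_key, label) records from several databases.

    Duplicate records with the same label collapse to one entry; compounds
    whose labels disagree between records are dropped as ambiguous.
    """
    labels = defaultdict(set)
    for source in sources:
        for key, label in source:
            labels[key].add(label)
    return {key: next(iter(found)) for key, found in labels.items() if len(found) == 1}
```

For example, a compound labelled active in one source and inactive in the other is excluded, while a compound labelled active in both appears once.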

Running the NB/RF/XG models

python pred_sklearn.py
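Naïve Bayes on binary fingerprint bits is a useful baseline for seeing how class imbalance enters a model: the class prior term favours whichever class dominates the training data. The script name suggests scikit-learn implementations (whose BernoulliNB is the closest analogue), but which NB variant pred_sklearn.py uses is not stated here. A minimal pure-Python Bernoulli NB sketch with Laplace smoothing, assuming each row is a list of 0/1 fingerprint bits; the class is hypothetical:

```python
import math
from collections import defaultdict


class BernoulliNaiveBayes:
    """Naive Bayes over binary fingerprint bits with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_bits = len(X[0])
        counts = defaultdict(int)
        bit_counts = defaultdict(lambda: [0] * n_bits)
        for row, label in zip(X, y):
            counts[label] += 1
            for j, bit in enumerate(row):
                bit_counts[label][j] += bit
        self.classes_ = sorted(counts)
        # Class prior: this term shifts predictions toward the majority class.
        self.log_prior_ = {c: math.log(counts[c] / len(y)) for c in self.classes_}
        # P(bit_j = 1 | class), smoothed so no probability is exactly 0 or 1.
        self.p_on_ = {
            c: [(bit_counts[c][j] + self.alpha) / (counts[c] + 2 * self.alpha)
                for j in range(n_bits)]
            for c in self.classes_
        }
        return self

    def predict(self, X):
        preds = []
        for row in X:
            scores = {}
            for c in self.classes_:
                score = self.log_prior_[c]
                for bit, p in zip(row, self.p_on_[c]):
                    score += math.log(p if bit else 1.0 - p)
                scores[c] = score
            preds.append(max(scores, key=scores.get))
        return preds
```

Working in log space avoids underflow when fingerprints have thousands of bits.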

Running the DNN and transfer learning models

python pred_dnn_w_transfer_learning.py
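The core idea of transfer learning is to pretrain on a data-rich related task and then initialize the target model from those weights before fine-tuning on the small target dataset. This README does not describe which source data or layers pred_dnn_w_transfer_learning.py uses, so the sketch below swaps the DNN for plain logistic regression and shows only the simplest form, warm-start fine-tuning; in a DNN a common variant also freezes early layers. All names here are hypothetical:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, weights=None, epochs=200, lr=0.5):
    """Full-batch gradient descent for logistic regression.

    Passing `weights` warm-starts training from a previously trained model,
    which is the transfer step. The last entry of the weight list is the bias.
    """
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, target in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row)) + w[-1]
            err = sigmoid(z) - target  # gradient of log-loss w.r.t. z
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    return [int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1]) >= threshold)
            for row in X]


# 1) Pretrain on a larger "source" dataset (e.g. a related target).
source_X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 1]] * 5
source_y = [1, 1, 1, 0, 0, 0] * 5
pretrained = train_logistic(source_X, source_y, epochs=300)

# 2) Fine-tune briefly on the small "target" dataset, starting from pretrained weights.
target_X = [[1, 0, 1], [0, 1, 1]]
target_y = [1, 0]
finetuned = train_logistic(target_X, target_y, weights=pretrained, epochs=20)
```

Starting from pretrained weights means the few target examples only need to adjust an already reasonable model rather than learn one from scratch.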

BHSAI/imbalanced_data_M1
