Skip to content

molML/Nano_Particles_Active_Learning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

repo version python version license Static Badge Static Badge

Machine learning-guided high throughput nanoparticle design

Ana Ortiz-Perez1, Derek van Tilborg1, Roy van der Meel, Francesca Grisoni*, Lorenzo Albertazzi*
1These authors contributed equally to this work.
*Corresponding authors: [email protected], [email protected].

Abstract
Designing nanoparticles with desired properties is a challenging endeavor, due to the large combinatorial space and complex structure-function relationships. High throughput methodologies and machine learning approaches are attractive and emergent strategies to accelerate nanoparticle composition design. To date, how to combine nanoparticle formulation, screening, and computational decision-making into a single effective workflow is underexplored. In this study, we showcase the integration of three key technologies, namely microfluidic-based formulation, high content imaging, and active machine learning. As a case study, we apply our approach for designing PLGA-PEG nanoparticles with high uptake in human breast cancer cells. Starting from a small set of nanoparticles for model training, our approach led to an increase in uptake from ~5-fold to ~15-fold in only two machine learning guided iterations, taking one week each. To the best of our knowledge, this is the first time that these three technologies have been successfully integrated to optimize a biological response through nanoparticle composition. Our results underscore the potential of the proposed platform for rapid and unbiased nanoparticle optimization.

Figure 1 Figure 1. Conceptual overview of the proposed iterative nanoparticle design pipeline. (a) the three key integrated technologies: (1) nanoparticles are formulated by microfluidics-assisted nanoprecipitation by controlling different formulation variables xi, (2) the formulations are screened with high content imaging (HCI) to determine their properties yi (e.g., their uptake in MDA-MB-468 cells, as in this proof of concept), and (3) a machine learning model learns the relationship between nanoparticle formulations (x) and their corresponding property (y), and is used to guide the next cycle. (b) Overview of experimental cycle: from microfluidic formulation to formulation selection for the following cycle in five days.

Prerequisites

The following Python packages are required to run this codebase. Tested on macOS 13.3.1

Installation

Install dependencies from the provided env.yaml file. This typically takes a couple of minutes.

conda env create -f env.yaml

Manual installation of requirements (tested on macOS 13.3.1):

conda create -n np_al python=3.9
conda activate np_al
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip3 install pyro-ppl==1.8.4 pandas==1.5.3 numpy==1.23.5 xgboost==1.7.3 scikit-learn==1.2.1 scikit-optimize==0.9.0 tqdm

Content

This repository is structured in the following way:

  • data: contains all data required to replicate the study
    • cycle_0: the starting data set
    • cycle_1: the data of cycle 0 + experimental data of the exploratively acquired nanoparticle formulations
    • cycle_2: the data of cycle 0, 1 + experimental data of the exploitatively acquired nanoparticle formulations
    • cycle_3: the data of cycle 0, 1, 2 + experimental data of the 'validation' nanoparticle formulations
    • screen_library.csv: file containing all nanoparticle formulations of our design space
    • all_samples.csv: file containing all data and predictions, used for data visualization in the paper
  • experiments: contains all Python scripts required to replicate the study
  • figures: contains the R script to recreate our figures
  • models: all BNN and XGB models (pickled)
  • nano: all Python functions, from models to evaluation
  • results: all of our results (predictions, sample acquisitions, etc)

How to cite

You can currently cite our pre-print:

Ortiz-Perez et al. (2023). Machine learning-guided high throughput nanoparticle design. ChemRxiv.

License

This codebase is under MIT license. For use of specific models, please refer to the model licenses found in the original packages.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published