ReSimNet

A PyTorch implementation of the paper

ReSimNet: Drug Response Similarity Prediction using Siamese Neural Networks
Jeon and Park et al., 2018

Abstract

Traditional drug discovery approaches identify a target for a disease and find a compound that binds to the target. In this approach, structures of compounds are considered as the most important features because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates. However, even though compounds are not structural analogs, they may achieve the desired response. A new drug discovery method based on drug response, which can complement the structure-based methods, is needed.

We implemented ReSimNet, a Siamese neural network that takes two chemical compounds as input and predicts their CMap score, which we use to measure the transcriptional response similarity of the two compounds. ReSimNet learns the embedding vector of a chemical compound in a transcriptional response space. ReSimNet is trained to minimize the difference between the cosine similarity of the embedding vectors of the two compounds and the CMap score of the two compounds. ReSimNet can find pairs of compounds that are similar in response even though they may have dissimilar structures. In our quantitative evaluation, ReSimNet outperformed the baseline machine learning models. The ReSimNet ensemble model achieves a Pearson correlation of 0.518 and a precision@1% of 0.989. In addition, in the qualitative analysis, we tested ReSimNet on the ZINC15 database and showed that ReSimNet successfully identifies chemical compounds that are relevant to a prototype drug whose mechanism of action is known.
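The training objective described above can be illustrated with a minimal, dependency-free sketch. It assumes the "difference" is measured as a squared error, which the abstract does not specify, and it uses hypothetical 3-dimensional embeddings in place of the vectors the trained network would produce. The function names are illustrative and are not taken from the repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(emb_a, emb_b, cmap_score):
    """Squared error between the predicted similarity and the CMap score.

    The squared-error form is an assumption made for this sketch.
    """
    return (cosine_similarity(emb_a, emb_b) - cmap_score) ** 2


# Hypothetical embeddings; real ones come from the trained Siamese network.
emb_a = [1.0, 0.0, 0.0]
emb_b = [1.0, 1.0, 0.0]

predicted = cosine_similarity(emb_a, emb_b)  # equals 1 / sqrt(2)
loss = pair_loss(emb_a, emb_b, cmap_score=0.9)
```

Because both compounds pass through the same embedding function, the predicted score is symmetric in its two inputs.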

Pipeline

Full Pipeline

Requirements

Git Clone & Initial Setting

Clone the source code and create the folders for your data and models.

# clone the source code on your directory
$ git clone https://github.com/dmis-lab/ReSimNet
$ cd ReSimNet

# make folder to save and load your data
$ cd tasks
$ mkdir -p data

# make folder to save and load your model (back in the repository root)
$ cd ..
$ mkdir -p results

Download Files You Need to Run ReSimNet

Dataset for Training

Pre-Trained Models

All 10 Models for Ensemble

Example Input Pairs

  • examples.csv (244 bytes)
    Save this file to ./ReSimNet/tasks/data/pairs/examples.csv

Click the link "Download the Fingerprint Representation".

Training the ReSimNet

# Train a new model.
$ bash train.sh

# Train new ensemble models.
$ bash train_ensemble.sh

CMap Score Prediction using ReSimNet

ReSimNet predicts a CMap score for each of your own fingerprint pairs. Running download.sh and predict.sh first downloads the pretrained ReSimNet model with sample datasets, then saves a result file of predicted CMap scores.

# Save scores of sample pair data
$ bash predict_example.sh

The input fingerprint pair file must be a .csv file in which every row has two columns, one for each fingerprint of the pair. Place your files under './tasks/data/pairs/'.

# Sample Fingerprints (./tasks/data/pairs/examples.csv)
id1,id2
BRD-K43164539,BRD-A45333398
BRD-K83289131,BRD-K82484965
BRD-K06817181,BRD-A41112154
BRD-K06817181,BRD-K67977190
BRD-K06817181,BRD-A87125127
BRD-K68095457,BRD-K38903228
BRD-K68095457,BRD-K01902415
BRD-K68095457,BRD-K06817181
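Before running a prediction, it can help to check that a pair file matches the two-column layout shown above. The following is a minimal sketch using only the standard library. The helper name `read_pairs` is hypothetical, and the requirement that the header be exactly `id1,id2` is an assumption based on the sample file rather than a documented rule.

```python
import csv
import io


def read_pairs(handle):
    """Return a list of (id1, id2) tuples from a two-column pair CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    if header != ["id1", "id2"]:  # assumed header, taken from the sample file
        raise ValueError(f"unexpected header: {header}")
    pairs = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # tolerate blank lines
            continue
        if len(row) != 2 or not all(cell.strip() for cell in row):
            raise ValueError(
                f"line {line_no}: expected two non-empty columns, got {row}"
            )
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# The first two rows of the sample file above.
sample = io.StringIO(
    "id1,id2\n"
    "BRD-K43164539,BRD-A45333398\n"
    "BRD-K83289131,BRD-K82484965\n"
)
pairs = read_pairs(sample)
```

For a file on disk, pass an open handle instead, for example `open("./tasks/data/pairs/examples.csv", newline="")`.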

Predicted CMap scores are saved, one per row, to the file './results/input-pair-file.model-name.csv'.

# Sample results (./results/examples.csv.ReSimNet7.csv)
prediction
0.9146181344985962
0.9301251173019409
0.8519644737243652
0.9631381034851074
0.7272981405258179
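The result file name follows the pattern `input-pair-file.model-name.csv`, so a small helper can locate and load the predictions for a given input file. This is a sketch under that naming assumption. The helper names `result_path` and `read_predictions` are hypothetical and not part of the repository.

```python
import csv
import io
import os


def result_path(pair_file, model_name, results_dir="./results"):
    """Build '<results_dir>/<input-pair-file>.<model-name>.csv'."""
    return f"{results_dir}/{os.path.basename(pair_file)}.{model_name}.csv"


def read_predictions(handle):
    """Parse the single 'prediction' column into a list of floats."""
    return [float(row["prediction"]) for row in csv.DictReader(handle)]


path = result_path("./tasks/data/pairs/examples.csv", "ReSimNet7")

# The first two rows of the sample results above.
sample = io.StringIO(
    "prediction\n"
    "0.9146181344985962\n"
    "0.9301251173019409\n"
)
scores = read_predictions(sample)
```

Here `path` reproduces the sample result file name shown above, `./results/examples.csv.ReSimNet7.csv`.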

CMap Score Prediction of ZINC using ReSimNet

# Save scores of sample pair data
$ bash predict_zinc.sh

Click the link "Download the ZINC files".

  • zinc-test.zip (8KB)
    Save this file to ./ReSimNet/tasks/data/pairs_zinc/zinc-test.zip and unzip it.

# Sample ZINC file (./tasks/data/pairs_zinc/zinc-test/AACA.csv)
,smiles,zinc_id,inchikey,mwt,logp,reactive,purchasable,tranche_name,features,fingerprint
17,CC1NNC(=S)NN1,ZINC000018204142,BYIXAEICDPEBOP-UHFFFAOYSA-N,132.192,-1.181,10,50,AACA,,000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000010000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Click the link "Download the example pairings".

  • example_drugs.csv (7KB)
    Save this file to ./ReSimNet/tasks/data/pairs_zinc/example_drugs.csv

# Sample example file (./tasks/data/pairs_zinc/example_drugs.csv)
pair,fp
ZINC18279871,0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000010000000000000000000000000000000000000000000000000010000000000001001000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000001000000000000000000000000000000000000000000000000000000000000000000000010000000000010000000000000000000000000000000000001000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000100000000000000000000000000000000000001000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000001000000000100000000000000000000000000000000000000000000000100000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000100000000000000000100000000000000001000000000000000000000000000000000000000000000000000000000000000000000000010000000000000100000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000
ZINC3938668,00000100000000000000000000000100000000000000000000000000000000000000000000100000100000001000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000100010000010000000000000000000000000000000000000000000000000000000000000100001001000000000000000000000000000101000010000000010000000000000000000000000000000001000000000000000000000000000000001000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000010000000000000000000000001000100000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000101000000000100000000001000000000000000000000000000000000000010000010000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000100000000000000000100000000100000000000000010000100000000000000000100000000000000000000000000000100000000000000100000000100000000001000000000000000001001000000000000000000000000000100000001000000000000000001010000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000001000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000010000000000000000000000100000000000000000010100000000000000000000000000000000000000000000000000010001000000100000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000001000000001000000010000000010000000000000000000000000000000000000010000000000000000000000100001000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000100000
000000000010000000000000000000000000000000000010000000000000
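The two inputs above can be combined into candidate pairs. The sketch below assumes that every compound in a ZINC file is paired with every prototype drug in example_drugs.csv, which matches the sample results but is not stated explicitly. The column names (`zinc_id`, `fingerprint`, `pair`, `fp`) come from the sample files. The 8-bit fingerprints are shortened placeholders for illustration only, and the helper names are hypothetical.

```python
import csv
import io
from itertools import product


def load_fingerprints(handle, id_col, fp_col):
    """Map each identifier to its fingerprint bit string."""
    return {row[id_col]: row[fp_col] for row in csv.DictReader(handle)}


def to_bits(fingerprint):
    """Convert a '0101...' string into a list of ints."""
    if set(fingerprint) - {"0", "1"}:
        raise ValueError("fingerprint must contain only 0 and 1")
    return [int(ch) for ch in fingerprint]


# Placeholder fingerprints; the real bit strings are far longer.
zinc_csv = io.StringIO(
    "zinc_id,fingerprint\n"
    "ZINC000018204142,00010010\n"
)
drug_csv = io.StringIO(
    "pair,fp\n"
    "ZINC18279871,01000001\n"
    "ZINC3938668,10000100\n"
)

zinc = load_fingerprints(zinc_csv, "zinc_id", "fingerprint")
drugs = load_fingerprints(drug_csv, "pair", "fp")

# Every ZINC compound against every prototype drug (assumed pairing rule).
candidate_pairs = list(product(zinc, drugs))
```

Each entry of `candidate_pairs` holds a ZINC compound ID followed by a prototype drug ID.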

Predicted CMap scores are saved, one per row, to the file './results/input-pair-file.model-name.csv'.

# Sample results (./results/AACA.csv.ReSimNet7.csv)
pair1,pair2,prediction
ZINC000018204142,ZINC18279871,0.90729403
ZINC000018204142,ZINC3938668,0.91043824
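A result file like the one above can be sorted to bring the pairs with the highest predicted CMap scores to the top. This is a minimal sketch that assumes the three-column layout shown in the sample. The helper name `rank_by_prediction` is hypothetical.

```python
import csv
import io


def rank_by_prediction(handle):
    """Return (pair1, pair2, prediction) rows, highest prediction first."""
    rows = [
        (row["pair1"], row["pair2"], float(row["prediction"]))
        for row in csv.DictReader(handle)
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)


# The sample results shown above.
sample = io.StringIO(
    "pair1,pair2,prediction\n"
    "ZINC000018204142,ZINC18279871,0.90729403\n"
    "ZINC000018204142,ZINC3938668,0.91043824\n"
)
ranked = rank_by_prediction(sample)
top_pair = ranked[0]  # the pair with ZINC3938668
```

When scanning many ZINC files, the same ranking can be applied to the concatenated rows to shortlist compounds predicted to respond similarly to a prototype drug.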

License

Apache License 2.0