A PyTorch implementation of the paper
ReSimNet: Drug Response Similarity Prediction using Siamese Neural Networks
Jeon and Park et al., 2018
Traditional drug discovery approaches identify a target for a disease and find a compound that binds to the target. In this approach, structures of compounds are considered as the most important features because it is assumed that similar structures will bind to the same target. Therefore, structural analogs of the drugs that bind to the target are selected as drug candidates. However, even though compounds are not structural analogs, they may achieve the desired response. A new drug discovery method based on drug response, which can complement the structure-based methods, is needed.
We implemented a Siamese neural network called ReSimNet that takes two chemical compounds as input and predicts the CMap score of the pair, which we use to measure the transcriptional response similarity of the two compounds. ReSimNet learns the embedding vector of a chemical compound in a transcriptional response space. It is trained to minimize the difference between the cosine similarity of the two compounds' embedding vectors and their CMap score. ReSimNet can therefore find pairs of compounds that are similar in response even when their structures are dissimilar. In our quantitative evaluation, ReSimNet outperformed the baseline machine learning models: the ReSimNet ensemble model achieves a Pearson correlation of 0.518 and a precision@1% of 0.989. In the qualitative analysis, we tested ReSimNet on the ZINC15 database and showed that it successfully identifies chemical compounds that are relevant to a prototype drug whose mechanism of action is known.
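The training objective described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names `cosine_similarity` and `pair_loss` are hypothetical, the "difference" is assumed to be a squared error, and the CMap score is assumed to be scaled to the same range as a cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pair_loss(emb1, emb2, cmap_score):
    """Squared difference between the predicted similarity and the target score.

    Assumption: the squared-error form is chosen for illustration only.
    """
    return (cosine_similarity(emb1, emb2) - cmap_score) ** 2


# Hypothetical two-dimensional embeddings, used only to exercise the functions.
emb_a = [1.0, 0.0]
emb_b = [0.0, 1.0]
loss_when_target_is_zero = pair_loss(emb_a, emb_b, 0.0)
```

In this toy case the two embeddings are orthogonal, so the loss vanishes when the target score is 0 and grows as the target moves away from 0.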
- Install cuda-8.0
- Install cudnn-v5.1
- Install PyTorch 0.3.0
- Install NumPy 1.61.1
- Python version >= 3.4.3 is required
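The Python requirement can be checked before installing anything else. The following is a small sketch; the helper name `python_is_supported` is an assumption made for this example and is not part of ReSimNet.

```python
import sys

MIN_PYTHON = (3, 4, 3)  # minimum version stated in the requirements list


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the given (or running) interpreter meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    print("Python OK" if python_is_supported() else "Python is too old")
```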
Clone the source code and create the folders for your data and models.
# clone the source code into your directory
$ git clone https://github.com/dmis-lab/ReSimNet
$ cd ReSimNet
# make folder to save and load your data
$ cd tasks
$ mkdir -p data
# make folder to save and load your model
$ cd ..
$ mkdir -p results
- ReSimNet-Dataset.pkl (43MB)
Save this file to ./ReSimNet/tasks/data/ReSimNet-Dataset.pkl
- ReSimNet-model-best.zip (12MB)
Save this file to ./ReSimNet/results/ReSimNet-models-best.zip and unzip it.
- ReSimNet-models-ensemble.zip (117MB)
Save this file to ./ReSimNet/results/ReSimNet-model-ensemble.zip and unzip it.
- examples.csv (244 bytes)
Save this file to ./ReSimNet/tasks/data/pairs/examples.csv
- pertid2fingerprint.pkl (10MB)
Save this file to ./ReSimNet/tasks/data/pertid2fingerprint.pkl
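Before training or predicting, it can help to confirm that the downloaded files are where the scripts expect them. The sketch below checks only the non-archive files from the list above. The helper name `missing_files` is hypothetical, and the paths are assumed to be relative to the directory that contains `ReSimNet/`.

```python
import os

# Paths copied from the download list (the zip archives are left out because
# their unpacked contents are not described here).
EXPECTED_FILES = [
    "ReSimNet/tasks/data/ReSimNet-Dataset.pkl",
    "ReSimNet/tasks/data/pairs/examples.csv",
    "ReSimNet/tasks/data/pertid2fingerprint.pkl",
]


def missing_files(base_dir=".", expected=EXPECTED_FILES):
    """Return the expected paths that do not exist as files under base_dir."""
    return [
        path for path in expected
        if not os.path.isfile(os.path.join(base_dir, path))
    ]


if __name__ == "__main__":
    for path in missing_files():
        print("missing: {}".format(path))
```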
# Train a new model.
$ bash train.sh
# Train new ensemble models.
$ bash train_ensemble.sh
ReSimNet can predict a CMap score for each of your own fingerprint pairs. Running download.sh and predict.sh first downloads the pretrained ReSimNet along with sample datasets, then saves a result file containing the predicted CMap scores.
# Save scores of sample pair data
$ bash predict_example.sh
The input fingerprint pair file must be a .csv file in which every row has two columns, one for each fingerprint of the pair. Place your files under './tasks/data/pairs/'.
# Sample Fingerprints (./tasks/data/pairs/examples.csv)
id1,id2
BRD-K43164539,BRD-A45333398
BRD-K83289131,BRD-K82484965
BRD-K06817181,BRD-A41112154
BRD-K06817181,BRD-K67977190
BRD-K06817181,BRD-A87125127
BRD-K68095457,BRD-K38903228
BRD-K68095457,BRD-K01902415
BRD-K68095457,BRD-K06817181
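A pair file in this layout can be validated before it is passed to the prediction script. The following is a minimal sketch using only the standard library; `read_pairs` is a hypothetical helper, and the strict check for an `id1,id2` header is an assumption based on the sample above.

```python
import csv
import io


def read_pairs(handle):
    """Read (id1, id2) tuples from a two-column CSV with an 'id1,id2' header."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header != ["id1", "id2"]:
        raise ValueError("expected header 'id1,id2', got {!r}".format(header))
    pairs = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(
                "line {}: expected 2 columns, got {}".format(line_no, len(row))
            )
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# First two rows of the sample file, inlined so the sketch runs on its own.
sample = io.StringIO(
    "id1,id2\n"
    "BRD-K43164539,BRD-A45333398\n"
    "BRD-K83289131,BRD-K82484965\n"
)
pairs = read_pairs(sample)
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_pairs`.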
Predicted CMap scores are saved one per row in the file './results/input-pair-file.model-name.csv'.
# Sample results (./results/examples.csv.ReSimNet7.csv)
prediction
0.9146181344985962
0.9301251173019409
0.8519644737243652
0.9631381034851074
0.7272981405258179
# Save scores of sample pair data
$ bash predict_zinc.sh
- zinc-test.zip (8KB)
Save this file to ./ReSimNet/tasks/data/pairs_zinc/zinc-test.zip and unzip.
# Sample Zinc files (./tasks/data/pairs_zinc/zinc-test/AACA.csv)
,smiles,zinc_id,inchikey,mwt,logp,reactive,purchasable,tranche_name,features,fingerprint
17,CC1NNC(=S)NN1,ZINC000018204142,BYIXAEICDPEBOP-UHFFFAOYSA-N,132.192,-1.181,10,50,AACA,,000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000010000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
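The `fingerprint` column stores each compound as a long string of `0` and `1` characters. The sketch below shows one way to turn such a string into a list of bits and to list the positions that are set. Both helper names are hypothetical and no particular fingerprint length is assumed.

```python
def fingerprint_to_bits(fp):
    """Convert a '0'/'1' string into a list of ints, ignoring whitespace."""
    fp = "".join(fp.split())  # drop spaces and line breaks from wrapped text
    if not fp or set(fp) - {"0", "1"}:
        raise ValueError("fingerprint must contain only '0' and '1' characters")
    return [int(ch) for ch in fp]


def on_bit_indices(bits):
    """Return the positions of the bits that are set to 1."""
    return [index for index, bit in enumerate(bits) if bit]


# Toy fingerprint, wrapped over two lines to mimic a line-broken display.
toy_bits = fingerprint_to_bits("0100\n10")
toy_on = on_bit_indices(toy_bits)
```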
- example_drugs.csv (7KB)
Save this file to ./ReSimNet/tasks/data/pairs_zinc/example_drugs.csv
# Sample example files (./tasks/data/pairs_zinc/example_drugs.csv)
pair,fp
ZINC18279871,0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000010000000000000000000000000000000000000000000000000010000000000001001000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000001000000000000000000000000000000000000000000000000000000000000000000000010000000000010000000000000000000000000000000000001000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000100000000000000000000000000000000000001000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000001000000000100000000000000000000000000000000000000000000000100000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000100000000000000000100000000000000001000000000000000000000000000000000000000000000000000000000000000000000000010000000000000100000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000100000000000000000000000000000100100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000
ZINC3938668,00000100000000000000000000000100000000000000000000000000000000000000000000100000100000001000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000100010000010000000000000000000000000000000000000000000000000000000000000100001001000000000000000000000000000101000010000000010000000000000000000000000000000001000000000000000000000000000000001000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000010000000000000000000000001000100000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000101000000000100000000001000000000000000000000000000000000000010000010000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000100000000000000000100000000100000000000000010000100000000000000000100000000000000000000000000000100000000000000100000000100000000001000000000000000001001000000000000000000000000000100000001000000000000000001010000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000001000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000010000000000000000000000100000000000000000010100000000000000000000000000000000000000000000000000010001000000100000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000001000000001000000010000000010000000000000000000000000000000000000010000000000000000000000100001000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000100000
000000000010000000000000000000000000000000000010000000000000
Predicted CMap scores are saved one per row in the file './results/input-pair-file.model-name.csv'.
# Sample results (./results/AACA.csv.ReSimNet7.csv)
pair1,pair2,prediction
ZINC000018204142,ZINC18279871,0.90729403
ZINC000018204142,ZINC3938668,0.91043824
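Since the CMap score is used here as a measure of response similarity, a natural next step is to sort a result file by its `prediction` column. This is a minimal sketch, with `rank_predictions` as a hypothetical helper; the column names are taken from the sample results above.

```python
import csv
import io


def rank_predictions(handle):
    """Return (pair1, pair2, prediction) rows sorted by prediction, highest first."""
    reader = csv.DictReader(handle)
    rows = [
        (row["pair1"], row["pair2"], float(row["prediction"]))
        for row in reader
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# The two sample result rows, inlined so the sketch runs on its own.
sample = io.StringIO(
    "pair1,pair2,prediction\n"
    "ZINC000018204142,ZINC18279871,0.90729403\n"
    "ZINC000018204142,ZINC3938668,0.91043824\n"
)
ranked = rank_predictions(sample)
```

With the sample rows, the pair containing ZINC3938668 comes first because its predicted score is higher.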
Apache License 2.0