This README contains the guidelines and information for running the code of the following paper:
- Paper Name: Ensembling of Gene Clusters utilizing Deep Learning and Protein-protein Interaction Information
- Authors: Pratik Dutta (1), Sriparna Saha (1), Saraansh Chopra (1) and Varnika Miglani (2)
- Affiliation: (1) Indian Institute of Technology Patna, India; (2) Samsung R&D Institute India-Noida
- Accepted (8th May, 2019): IEEE/ACM Transactions on Computational Biology and Bioinformatics
- Corresponding Author: Pratik Dutta ([email protected] )
If you find this code or the article useful, consider citing our work:
@article{dutta2019ensembling,
title={Ensembling of Gene Clusters utilizing Deep Learning and Protein-protein Interaction Information},
author={Dutta, Pratik and Saha, Sriparna and Chopra, Saraansh and Miglani, Varnika},
journal={IEEE/ACM transactions on computational biology and bioinformatics},
year={2019},
publisher={IEEE}
}
This folder contains five preprocessed datasets which are used as the input of the proposed MOO-based clustering algorithm.
This folder contains the Python code of the proposed MOO-based clustering. Open a terminal (for Linux users) and go to the '1. MOO-based clustering' folder. Then run the code with the following commands:
cd examples
Set the path of the dataset on line 27 of main.py.
python main.py <initial_population_size> <number_of_generation>
Output: Generates a file named non_dominated_solutions.txt that contains all the cluster information.
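For readers unfamiliar with the term, the following is a minimal sketch of what "non-dominated" means in multi-objective optimization. It is not the code in main.py: the function names, the example objective values, and the assumption that every objective is minimized are all hypothetical and chosen only for illustration.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b.

    Assumption: every objective is minimized. a dominates b when it is
    no worse in all objectives and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical objective vectors with two objectives each.
population = [(0.2, 0.9), (0.4, 0.4), (0.5, 0.5), (0.9, 0.1)]
front = non_dominated(population)
print(front)
```

Here (0.5, 0.5) is dropped because (0.4, 0.4) is better in both objectives; the remaining three solutions are mutually incomparable trade-offs.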
This folder contains .ipynb (Jupyter Notebook) files for creating a set of disconnected walks, which are further used to generate the labelled dataset. This labelled dataset is used as the training dataset for the proposed neural network models. The main components of the folder are:
- BCLL_FuLL_Labels: labels of all the non-dominated solutions for the B-CLL dataset.
- algorithm1.ipynb: This Jupyter notebook file takes all non-dominated solutions as input and produces a weighted coincidence matrix. This coincidence matrix is fed to algorithm 1, which gives a set of disconnected walks. The set of disconnected walks is saved in disconnected_walk.txt.
- create_train_test.ipynb: This Jupyter notebook generates labeled_file.txt and unlabeled_file.txt.
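As a rough illustration of the coincidence matrix idea, the sketch below counts how often each pair of genes is placed in the same cluster across several partitionings. It is not the code in algorithm1.ipynb: the function name, the toy partitions, and the weighting scheme (uniform weights by default, normalized to sum to 1) are assumptions made for this example, since the weights used in the paper are not described here.

```python
def coincidence_matrix(partitions, weights=None):
    """Weighted fraction of partitions in which items i and j share a cluster.

    partitions: list of label lists, one per clustering solution; all
                lists label the same items in the same order.
    weights:    optional per-solution weights (assumed uniform if omitted).
    """
    n_items = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    total = sum(weights)

    matrix = [[0.0] * n_items for _ in range(n_items)]
    for labels, weight in zip(partitions, weights):
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight
    # Normalize so that every entry lies in [0, 1].
    return [[value / total for value in row] for row in matrix]


# Three hypothetical clustering solutions over four genes.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = coincidence_matrix(partitions)
for row in matrix:
    print([round(value, 2) for value in row])
```

Entries close to 1 mark gene pairs that almost every solution groups together, and entries close to 0 mark pairs that are almost always separated.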
This folder contains .ipynb files for training the models that are used to generate the final consensus partitionings for approach 2. For convenience, you can use Jupyter Notebook to run the files. The developed deep learning models and related files are:
- NN Model.ipynb: PyTorch implementation of the proposed multi-layer perceptron with two hidden layers.
- CNN Model.ipynb: PyTorch implementation of the proposed convolutional neural network.
- Label Script.ipynb is used to combine the originally labeled gene expressions and the model-labeled gene expressions into one file for further metric evaluations (BHI and BSI).
- BHI_labels_CNN.txt and BHI_labels_NN_2Hidden.txt contain the labels assigned to the unlabeled gene expressions by the trained models, plus the originally labeled gene expression profiles.
- trained_model_10000_epochs.pt, trained_model_10000_epochs_2.pt, trained_model_10000_epochs_3.pt, trained_model_CNN_10000_epochs_1.pt and trained_model_CNN_10000_epochs_2.pt contain the weight and bias matrices obtained after training the above models.
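The label-combining step can be pictured with the small sketch below. It is not the code in Label Script.ipynb: the function names, the gene identifiers, and the tab-separated output layout are hypothetical, because the actual format of the label files is not specified in this README.

```python
def merge_labels(original, predicted):
    """Combine two {gene_id: cluster_label} mappings into one.

    Assumption: if a gene appears in both, the original label is kept.
    """
    merged = dict(predicted)
    merged.update(original)
    return merged


def format_labels(labels):
    """Render the labels as 'gene_id<TAB>label' lines (hypothetical layout)."""
    return "\n".join(f"{gene}\t{label}"
                     for gene, label in sorted(labels.items()))


# Hypothetical inputs: labels from the disconnected walks and from a model.
original = {"gene_a": 0, "gene_b": 1}
predicted = {"gene_c": 1, "gene_d": 0}

combined = merge_labels(original, predicted)
print(format_labels(combined))
```

The combined mapping covers every gene exactly once, which is what a downstream metric such as BHI or BSI would need as input.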
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.