This codebase is built on top of MDLM (Sahoo et al., 2024).
conda create -n sedd python=3.9.18
conda activate sedd
bash env.sh
# Install gReLU
git clone https://github.com/Genentech/gReLU.git
cd gReLU
pip install .
All data and model weights can be downloaded from this link:
Save the downloaded file in BASE_PATH.
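After the download is saved, a quick check can confirm that the files this README refers to are where the scripts expect them. The sketch below assumes the download unpacks so that the relative paths named in this README (for example mdlm/gosai_data and mdlm/outputs_gosai/pretrained.ckpt) sit directly under BASE_PATH. The helper name `missing_artifacts` and the `EXPECTED` list are hypothetical and not part of the codebase.

```python
from pathlib import Path

# Hypothetical list: relative paths taken from this README, assumed to live under BASE_PATH.
EXPECTED = [
    "mdlm/gosai_data",
    "mdlm/gosai_data/binary_atac_cell_lines.ckpt",
    "mdlm/outputs_gosai/pretrained.ckpt",
    "mdlm/outputs_gosai/lightning_logs/reward_oracle_ft.ckpt",
    "mdlm/outputs_gosai/lightning_logs/reward_oracle_eval.ckpt",
    "mdlm/reward_bp_results_final/finetuned.ckpt",
]


def missing_artifacts(base_path):
    """Return the expected relative paths that do not exist under base_path (hypothetical helper)."""
    base = Path(base_path)
    return [rel for rel in EXPECTED if not (base / rel).exists()]
```

For example, `missing_artifacts("/path/to/BASE_PATH")` returns an empty list when every file is in place; any entries it returns point to files that still need to be downloaded or moved.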
The enhancer dataset used for this experiment is provided in BASE_PATH/mdlm/gosai_data.
python main_gosai.py
The pretrained model weights are provided in BASE_PATH/mdlm/outputs_gosai/pretrained.ckpt.
python train_oracle.py
The oracle for fine-tuning is provided in BASE_PATH/mdlm/outputs_gosai/lightning_logs/reward_oracle_ft.ckpt; the oracle for evaluation is provided in BASE_PATH/mdlm/outputs_gosai/lightning_logs/reward_oracle_eval.ckpt.
The oracle for binary classification on chromatin accessibility (ATAC-Acc) is provided in BASE_PATH/mdlm/gosai_data/binary_atac_cell_lines.ckpt.
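The ATAC-Acc oracle is a binary classifier, so its raw outputs have to be turned into an accessible/not-accessible call before they can be summarized. As an illustration only, the sketch below assumes ATAC-Acc is reported as the fraction of sequences the classifier predicts to be accessible, that there is one logit per sequence for a single cell line, and that the decision threshold is 0.5 on the sigmoid probability. Loading binary_atac_cell_lines.ckpt is not shown, and the function `atac_acc` is hypothetical.

```python
import math
from typing import Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def atac_acc(logits: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of sequences whose predicted accessibility probability exceeds threshold.

    Hypothetical helper: assumes one classifier logit per sequence for a single cell line.
    """
    if not logits:
        raise ValueError("need at least one prediction")
    probs = [_sigmoid(z) for z in logits]
    return sum(p > threshold for p in probs) / len(probs)
```

The stable sigmoid branch matters only for extreme logits; with the default threshold of 0.5 the result is equivalent to counting positive logits.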
python finetune_reward_bp.py --name test
The fine-tuned model weights are provided in BASE_PATH/mdlm/reward_bp_results_final/finetuned.ckpt.
For evaluation, see eval.ipynb.
Change base_path in dataloader_gosai.py, finetune_reward_bp.py, oracle.py, train_oracle.py, and eval.ipynb to BASE_PATH; this path is used for saving data and models.
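Editing the same setting in several scripts by hand is error-prone, so a small script can do it for the Python files. This is a sketch under an unverified assumption: that each script sets base_path with a plain string assignment at the start of a line, such as `base_path = '/old/path'`. If a file builds the path differently, the helper leaves it unchanged and reports zero replacements. The function name `set_base_path` is hypothetical, and eval.ipynb (a JSON notebook) is left out and is easier to edit in Jupyter.

```python
import re
from pathlib import Path

# Matches a line like:  base_path = '/some/old/path'  (single or double quotes).
# Assumption: base_path is assigned a plain string literal at the start of a line.
_ASSIGN = re.compile(r"^(\s*base_path\s*=\s*)([\"'])(.*?)\2", re.MULTILINE)


def set_base_path(py_file, new_base):
    """Rewrite every base_path string assignment in py_file; return how many were changed.

    Hypothetical helper, not part of the codebase.
    """
    path = Path(py_file)
    text = path.read_text()
    # A function replacement keeps backslashes in new_base from being read as regex escapes.
    new_text, count = _ASSIGN.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{new_base}{m.group(2)}", text
    )
    if count:
        path.write_text(new_text)
    return count
```

For example, calling `set_base_path(name, "/path/to/BASE_PATH")` for each of dataloader_gosai.py, finetune_reward_bp.py, oracle.py, and train_oracle.py updates the assignments in place; a return value of 0 flags a file that needs a manual edit.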
- The original dataset is provided by Gosai et al., 2023.
- The trained oracle is based on gReLU.