hdpy

Repository for "HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations" by Derek Jones, Jonathan E. Allen, Xiaohua Zhang, Behnam Khaleghi, Jaeyoung Kang, Weihong Xu, Niema Moshiri, Tajana S. Rosing

  • ecfp/: contains implementations of ECFP encoding algorithms

  • molehd/: contains implementations of the MoleHD (Ma et al.) SMILES-based encoding algorithms

  • prot_lig/: contains implementations of HDC encoding for protein-drug interactions

  • selfies/: contains implementations of encoding algorithms for SELFIES strings

  • configs/: contains configuration files for the various HDC models

  • argparser.py: contains logic for the arguments used to drive the programs in this project

  • data_utils.py: contains logic for dataloading

  • encode_utils.py: contains general encoding logic

  • main.py: driver program for HDBind experiments

  • metrics.py: contains logic for the various metrics used in the work

  • model.py: contains logic for the HDC model implementations themselves

  • run_timings.py: contains logic to estimate timing information for various processes such as ECFP computation

  • sdf_to_smiles.py: utility script to convert collections of molecules from SDF to SMILES

  • utils.py: additional utility functions
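
For readers new to hyperdimensional computing (HDC), the following is a minimal, generic sketch of the kind of pipeline the encoding and model modules above deal with: a binary fingerprint is projected into a bipolar hypervector, hypervectors of the same class are bundled into a prototype, and a query is assigned to the most similar prototype. It is not the code in this repository. All function names are hypothetical, the toy fingerprints are made up, and the use of random projection is an assumption (suggested only by the `rp` in config names such as `hdbind-rp-ecfp-1024-1.yml`).

```python
import math
import random


def make_projection(n_bits, dim, seed=0):
    """Draw a random bipolar (+1/-1) matrix with `dim` rows and `n_bits` columns."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_bits)] for _ in range(dim)]


def encode(fingerprint, projection):
    """Map a 0/1 fingerprint to a bipolar hypervector via the sign of the projection."""
    hypervector = []
    for row in projection:
        total = sum(weight for weight, bit in zip(row, fingerprint) if bit)
        hypervector.append(1 if total >= 0 else -1)  # ties go to +1
    return hypervector


def bundle(hypervectors):
    """Sum hypervectors element-wise to form a class prototype."""
    return [sum(column) for column in zip(*hypervectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(hypervector, prototypes):
    """Return the label whose prototype is most similar to the hypervector."""
    return max(prototypes, key=lambda label: cosine(hypervector, prototypes[label]))


if __name__ == "__main__":
    projection = make_projection(n_bits=32, dim=1024)
    # Toy fingerprints (made up for illustration, not real ECFP output).
    actives = [[1] * 16 + [0] * 16, [1] * 12 + [0] * 20]
    inactives = [[0] * 16 + [1] * 16, [0] * 20 + [1] * 12]
    prototypes = {
        "active": bundle([encode(fp, projection) for fp in actives]),
        "inactive": bundle([encode(fp, projection) for fp in inactives]),
    }
    query = encode([1] * 14 + [0] * 18, projection)
    print(classify(query, prototypes))
```

The sketch uses only the standard library so it can be run without the project's dependencies; a practical implementation would use vectorized tensor operations instead of Python lists.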

Getting started

To install the required dependencies, first install Anaconda or Miniconda.

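To install hdpy (from the repository root), create the environment from the YAML file, activate it, and install the package:

conda env create --name hdpy --file hdpy_env_release.yml

conda activate hdpy

python -m pip install .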

Installing dependencies separately

Alternatively, you can install the dependencies one at a time:

Install the DeepChem library. The conda-forge package does not currently work, so use the nightly (pre-release) pip build; refer to the DeepChem docs for your specific setup.

# conda install -c conda-forge deepchem  (currently not working)

pip install --pre deepchem

Next, install PyTorch. This project does not use torchvision or torchaudio, so they are omitted here (feel free to install them if you are so inclined).

conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

Next, install rdkit

conda install -c conda-forge rdkit

Next, install Ray

pip install ray==2.7.0rc0

Next, install SmilesPE

pip install SmilesPE

Next, install SELFIES

pip install selfies
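
After installing, it can help to confirm that each package is visible to the active environment. The following is a small sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the import names in the list are assumed to be the usual ones for these packages (PyTorch is imported as `torch`).

```python
import importlib.util

# Import names for the packages installed above (assumed; PyTorch is "torch").
REQUIRED_MODULES = ["deepchem", "torch", "rdkit", "ray", "SmilesPE", "selfies"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`find_spec` only locates a top-level module without importing it, so the check is fast, but it does not verify that the installed version is compatible or that CUDA is usable.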

Running the benchmarks

To run the MoleculeNet training and testing script:

python main_molnet.py --dataset bbbp --split-type scaffold --n-trials 10 --random-state 5 --batch-size 128 --num-workers 8 --config configs/hdbind-rp-ecfp-1024-1.yml

To run the LIT-PCBA training and testing script:

python main_litpcba.py --dataset lit-pcba --split-type ave --n-trials 10 --random-state 5 --batch-size 128 --num-workers 8 --config configs/hdbind-rp-ecfp-1024-1.yml
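
The flags in the two commands above can be summarized with a small `argparse` sketch. This is not the repository's `argparser.py`: the flag names are taken from the commands, while the types, defaults, and help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the benchmark commands."""
    parser = argparse.ArgumentParser(description="Sketch of the benchmark CLI flags")
    parser.add_argument("--dataset", required=True,
                        help="dataset name, e.g. bbbp or lit-pcba")
    parser.add_argument("--split-type", required=True,
                        help="data split strategy, e.g. scaffold or ave")
    # The meanings below are assumed from the flag names.
    parser.add_argument("--n-trials", type=int, default=1,
                        help="number of repeated trials (assumed)")
    parser.add_argument("--random-state", type=int, default=0,
                        help="random seed (assumed)")
    parser.add_argument("--batch-size", type=int, default=128,
                        help="batch size for data loading (assumed)")
    parser.add_argument("--num-workers", type=int, default=0,
                        help="number of data-loading workers (assumed)")
    parser.add_argument("--config", required=True,
                        help="path to a model configuration file in configs/")
    return parser


if __name__ == "__main__":
    # Parse the MoleculeNet example from the text instead of sys.argv.
    example = ("--dataset bbbp --split-type scaffold --n-trials 10 --random-state 5 "
               "--batch-size 128 --num-workers 8 "
               "--config configs/hdbind-rp-ecfp-1024-1.yml").split()
    print(build_parser().parse_args(example))
```

Note that `argparse` exposes hyphenated flags with underscores, so `--split-type` is read as `args.split_type`.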

Getting Involved

Contact Derek Jones with any questions or to collaborate on expanding the project: [email protected], [email protected]

Citation

Jones, D., Allen, J. E., Zhang, X., Khaleghi, B., Kang, J., Xu, W., Moshiri, N., & Rosing, T. S. (2023, March 27). HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations. arXiv. http://arxiv.org/abs/2303.15604