Repository for "HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations" by Derek Jones, Jonathan E. Allen, Xiaohua Zhang, Behnam Khaleghi, Jaeyoung Kang, Weihong Xu, Niema Moshiri, Tajana S. Rosing

- ecfp/: contains implementations of ECFP encoding algorithms
- molehd/: contains implementations of the MoleHD (Ma et al.) SMILES-based encoding algorithms
- prot_lig/: contains implementations of HDC encoding for protein-drug interactions
- selfies/: contains implementations of encoding algorithms for SELFIES strings
- configs/: contains configuration files for the various HDC models
- argparser.py: contains the command-line argument definitions used to drive the programs in this project
- data_utils.py: contains data-loading logic
- encode_utils.py: contains general encoding logic
- main.py: driver program for HDBind experiments
- metrics.py: contains logic for the various metrics used in the work
- model.py: contains the HDC model implementations
- run_timings.py: contains logic to estimate timing information for various processes, such as ECFP computation
- sdf_to_smiles.py: utility script to convert collections of molecules from SDF to SMILES format
- utils.py: additional utility functions

To install the required dependencies, first install Anaconda or Miniconda.
To install hdpy (from the repository root directory):
conda env create --name hdpy --file hdpy_env_release.yml
conda activate hdpy
python -m pip install .
Alternatively, you can install the dependencies individually.
First, install the DeepChem library. The conda-forge package (conda install -c conda-forge deepchem) does not currently work, so install the nightly build with pip; refer to the DeepChem documentation for your specific setup:
pip install --pre deepchem
Next, install PyTorch. This project does not use torchvision or torchaudio, so they are omitted here (feel free to install them as well if you wish):
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
Next, install RDKit:
conda install -c conda-forge rdkit
Next, install Ray:
pip install ray==2.7.0rc0
Next, install SmilesPE:
pip install SmilesPE
Finally, install SELFIES:
pip install selfies
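
After installation, it can help to confirm that the packages listed above are importable from the active environment. The following is a minimal sketch and is not part of the repository; the helper `find_missing` is hypothetical, and the import names (`deepchem`, `torch`, `rdkit`, `ray`, `SmilesPE`, `selfies`) are assumed to match the packages installed above.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = ["deepchem", "torch", "rdkit", "ray", "SmilesPE", "selfies"]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat lookup failures the same as a missing package.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that each package can be located; it does not verify versions or GPU support.
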
To run the MoleculeNet training and testing script:
python main_molnet.py --dataset bbbp --split-type scaffold --n-trials 10 --random-state 5 --batch-size 128 --num-workers 8 --config configs/hdbind-rp-ecfp-1024-1.yml
To run the LIT-PCBA training and testing script:
python main_litpcba.py --dataset lit-pcba --split-type ave --n-trials 10 --random-state 5 --batch-size 128 --num-workers 8 --config configs/hdbind-rp-ecfp-1024-1.yml
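
To repeat a run with several random seeds, the commands above can be assembled programmatically. This is a hedged sketch rather than a script shipped with the repository: the helpers `build_command` and `run_seeds` and the choice of seeds are hypothetical, and only the flags shown in the commands above are used. By default it only prints the commands (a dry run).

```python
import subprocess
import sys


def build_command(script, dataset, split_type, random_state,
                  config="configs/hdbind-rp-ecfp-1024-1.yml",
                  n_trials=10, batch_size=128, num_workers=8):
    """Assemble the argument list for one run (hypothetical helper)."""
    return [
        sys.executable, script,
        "--dataset", dataset,
        "--split-type", split_type,
        "--n-trials", str(n_trials),
        "--random-state", str(random_state),
        "--batch-size", str(batch_size),
        "--num-workers", str(num_workers),
        "--config", config,
    ]


def run_seeds(seeds, dry_run=True):
    """Print one MoleculeNet command per seed; execute it if dry_run is False."""
    commands = [build_command("main_molnet.py", "bbbp", "scaffold", seed)
                for seed in seeds]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Must be run from the repository root so the script is found.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Seeds are illustrative; pass dry_run=False to actually launch the runs.
    run_seeds([5, 6, 7])
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` stops the loop if a run exits with an error.
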
Contact Derek Jones with any questions or to collaborate on expanding the project: [email protected], [email protected]
Jones, D., Allen, J. E., Zhang, X., Khaleghi, B., Kang, J., Xu, W., Moshiri, N., & Rosing, T. S. (2023, March 27). HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations. arXiv. http://arxiv.org/abs/2303.15604