How to featurize docking poses from already docked result files to use Open-ComBind? #38

Open
Sowmya-R-Krishnan opened this issue Apr 4, 2024 · 1 comment

Comments

@Sowmya-R-Krishnan

Dear team,

Thank you very much for providing Open-ComBind as a command-line tool for docking pose selection. I have results from a previous docking job using GNINA with CNN-scoring. I have 10 proteins (PDB files already prepared) and 10 docking poses for each ligand. I would like to use Open-ComBind to finalize the docking pose for further analysis.
From the help text of each module, I gather that the tool expects a fixed directory layout such as structures/proteins, structures/ligands, etc. I tried the featurize module on my docking result (an SDF file), and it confirmed my fears: I cannot work out how to arrange my paths and filenames to match the convention Open-ComBind follows. Given that I have the following data in hand, could you kindly help me with the path and filename settings needed to run featurization and pose selection?

  1. PDB files of 10 proteins (already prepared for docking with GNINA).
  2. PDB files of crystal ligands separated from the co-crystal structures for grid box setting.
  3. Multi-SDF files for several ligands with 10 poses per file.

Also, while trying to resolve the featurization error, I saw that in one of the source files (features/ifp.py) the protein filename is built as shown below:

prot_bname = input_file.split('-to-')[-1]
prot_fname = re.sub('-docked.*\.sdf(\.gz)?','_prot.pdb',prot_bname)
prot_file = f"structures/proteins/{prot_fname}"

Here, the input filename is expected to contain a -to- phrase, the docking result file is not expected to have any preceding file path (since the next line uses structures/proteins/ as the hard-coded path to the protein file), and the docking output file itself should end with the suffix -docked.sdf or -docked.sdf.gz. Would it be possible to provide a detailed README or usage manual, so that these requirements are known beforehand and Open-ComBind can be used effectively? Re-running all my docking jobs through this pipeline will not be possible for me, so it would be great if there were a way to use the existing results directly. Thank you for taking the time to read this; I hope to hear from the team soon.
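To make the convention concrete, here is a minimal sketch that wraps the three lines quoted from features/ifp.py in a function and runs them on a few filenames. The function names `expected_protein_path` and `docked_name_for`, and the example names `LIG1`, `1ABC` and `gnina_out.sdf`, are hypothetical and used only for illustration; the sketch does not call Open-ComBind itself and says nothing about where the docked files should be stored.

```python
import re


def expected_protein_path(input_file: str) -> str:
    """Reproduce the protein-path derivation quoted from features/ifp.py."""
    # Keep only what follows the last "-to-" (the whole string if absent).
    prot_bname = input_file.split('-to-')[-1]
    # Swap the "-docked...sdf[.gz]" tail for "_prot.pdb" (no-op if no match).
    prot_fname = re.sub(r'-docked.*\.sdf(\.gz)?', '_prot.pdb', prot_bname)
    return f"structures/proteins/{prot_fname}"


def docked_name_for(ligand: str, protein: str, gz: bool = False) -> str:
    """Hypothetical helper: build a filename that matches the pattern above."""
    return f"{ligand}-to-{protein}-docked.sdf" + (".gz" if gz else "")


if __name__ == "__main__":
    examples = [
        "LIG1-to-1ABC-docked.sdf",
        "LIG1-to-1ABC-docked.sdf.gz",
        "results/LIG1-to-1ABC-docked.sdf",
        "results/gnina_out.sdf",  # no "-to-" and no "-docked" suffix
    ]
    for name in examples:
        print(f"{name} -> {expected_protein_path(name)}")
```

For these three lines taken in isolation, a leading directory is discarded whenever the name contains -to-, while a name with neither -to- nor -docked is passed through unchanged and ends up appended to structures/proteins/. Other parts of the featurizer may impose further requirements that this sketch does not cover.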

[Screenshot: error from the featurizer when a path was prefixed to the docking output filename]

With regards,
Sowmya

@drewnutt
Owner

I have added the ability to pass additional keyword arguments during featurization (9493a73), which allow you to predefine the docking protein and the protein file directory.

When the docking protein is defined, it no longer assumes anything about your docked file naming.

This can all be specified in the CLI or with the Python API.

This basic implementation is limited to only 1 docking protein per featurization.

Let me know if this does not satisfy your constraints.
