Add support for handling PDBs with multiple models #101

Merged
merged 9 commits into from
May 11, 2022


a-r-j
Contributor

@a-r-j a-r-j commented Apr 27, 2022

Description

Adds a collection of methods to extract and label models from PDB files containing multiple models.
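For context, a multi-model PDB file (typically an NMR ensemble) wraps each model's coordinate records between a `MODEL` record and an `ENDMDL` record. The sketch below finds those ranges in the raw file lines and returns a start/end table with `model_idx`, `start_idx` and `end_idx` columns, similar in spirit to the `get_model_start_end` output discussed later in this thread. It is a minimal illustration under that record-layout assumption: `find_model_ranges` is a hypothetical name, not part of the biopandas API, and models are numbered by order of appearance rather than by the serial number on the `MODEL` line.

```python
from typing import List

import pandas as pd


def find_model_ranges(lines: List[str]) -> pd.DataFrame:
    """Return one row per model with the 0-based line indices of its MODEL/ENDMDL records.

    Hypothetical helper: it only relies on MODEL lines opening a model and
    ENDMDL lines closing it, as in the PDB format.
    """
    rows = []
    start = None
    for idx, line in enumerate(lines):
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "MODEL":
            start = idx
        elif record == "ENDMDL" and start is not None:
            # Number models 1, 2, ... in order of appearance.
            rows.append({"model_idx": len(rows) + 1, "start_idx": start, "end_idx": idx})
            start = None
    return pd.DataFrame(rows, columns=["model_idx", "start_idx", "end_idx"])


# Made-up, heavily truncated records; only the record names matter here.
pdb_lines = [
    "MODEL        1",
    "ATOM      1  N   MET A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1",
    "ENDMDL",
    "END",
]
print(find_model_ranges(pdb_lines))
```

A file without any `MODEL` records produces an empty table, which is exactly the situation that needs special handling for single-model structures.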

Related issues or pull requests

#50 #100

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./biopandas -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./biopandas

@pep8speaks

pep8speaks commented Apr 27, 2022

Hello @a-r-j! Thanks for updating this PR.

Line 237:13: W503 line break before binary operator
Line 238:13: W503 line break before binary operator

Line 250:13: W503 line break before binary operator
Line 251:13: W503 line break before binary operator
Line 315:22: E701 multiple statements on one line (colon)
Line 315:22: E231 missing whitespace after ':'
Line 315:23: E225 missing whitespace around operator
Line 315:23: E999 SyntaxError: invalid syntax
Line 325:17: W503 line break before binary operator
Line 326:17: W503 line break before binary operator
Line 327:17: W503 line break before binary operator
Line 332:17: W503 line break before binary operator
Line 333:17: W503 line break before binary operator
Line 334:17: W503 line break before binary operator
Line 382:60: E203 whitespace before ':'
Line 627:89: E501 line too long (96 > 88 characters)
Line 654:21: W503 line break before binary operator
Line 669:21: W503 line break before binary operator
Line 683:21: W503 line break before binary operator

Comment last updated at 2022-05-11 23:21:47 UTC

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Member

@rasbt rasbt left a comment


Thanks a lot for the PR, looks great!

@@ -5,23 +5,27 @@
# License: BSD 3 clause
# Project Website: http://rasbt.github.io/biopandas/
# Code Repository: https://github.com/rasbt/biopandas
from __future__ import annotations
Member


I don't think this is necessary. I think type annotations were introduced in Python 3.6. If someone has an older version of Python, they won't be able to run it anyways because of the f-strings?

Member


No worries, I can take care of removing it.

df.df["ANISOU"] = df.df["ANISOU"].loc[df.df["ANISOU"]["model_id"] == model_index]
return df

def get_models(self, model_indices: List[int]) -> PandasPdb:
Contributor Author


I think the `from __future__ import annotations` is required for this type hint (I didn't want to hassle with defining a TypeVar). If you remove the import, I think this needs to be removed too.
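Why the import matters here: a method of a class that is annotated as returning that same class refers to a name that does not exist yet while the class body is executing. The sketch below uses a hypothetical `ToyPdb` class (not biopandas code) to show the two usual options: postponed evaluation via `from __future__ import annotations` (PEP 563, Python 3.7+), or quoting the forward reference. Run it as its own module, since a `__future__` import must be the first statement in the file.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

from copy import deepcopy
from typing import List


class ToyPdb:
    """Hypothetical stand-in for a class whose methods return new instances of itself."""

    def __init__(self, model_ids: List[int]) -> None:
        self.model_ids = model_ids

    # On the Python versions current at the time of this PR (3.7-3.10), this
    # unquoted annotation raises NameError without the __future__ import,
    # because the name ToyPdb is not bound until the class body finishes.
    def get_model(self, model_index: int) -> ToyPdb:
        new = deepcopy(self)  # leave the original object untouched
        new.model_ids = [m for m in new.model_ids if m == model_index]
        return new

    # Alternative that needs no import: quote the forward reference.
    def get_models(self, model_indices: List[int]) -> "ToyPdb":
        new = deepcopy(self)
        new.model_ids = [m for m in new.model_ids if m in model_indices]
        return new


toy = ToyPdb([1, 1, 2, 3])
print(toy.get_model(1).model_ids)        # [1, 1]
print(toy.get_models([2, 3]).model_ids)  # [2, 3]
```

Quoting is the lighter-weight option when only a few annotations need a forward reference; the `__future__` import avoids quoting every self-referencing annotation.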

Member


I see. Let's leave it then

self.df["ANISOU"]["model_id"] = idx_map
return self

def get_model(self, model_index: int) -> PandasPdb:
Contributor Author


Same here

@a-r-j
Contributor Author

a-r-j commented May 1, 2022

Ooh, before you merge, it might be worth looking into how to provide a more informative error for structures that only contain a single model.

Eg

```python
from biopandas.pdb import PandasPdb

df = PandasPdb().fetch_pdb('3eiy')
a = df.get_model_start_end()
```

returns an empty df, and so `df.get_model(1)` results in:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/atj39/github/biopandas/dev.ipynb Cell 3' in <cell line: 1>()
----> 1 df.get_model(1)

File ~/github/biopandas/biopandas/pdb/pandas_pdb.py:647, in PandasPdb.get_model(self, model_index)
    634 """Returns a new PandasPDB object with the dataframes subset to the given model index.
    635 
    636 Parameters
   (...)
    643 pandas_pdb.PandasPdb : A new PandasPdb object containing the structure subsetted to the given model.
    644 """
    646 df = deepcopy(self)
--> 647 df.label_models()
    649 if "ATOM" in df.df.keys():
    650     df.df["ATOM"] = df.df["ATOM"].loc[df.df["ATOM"]["model_id"] == model_index]

File ~/github/biopandas/biopandas/pdb/pandas_pdb.py:614, in PandasPdb.label_models(self)
    612 if "ATOM" in self.df.keys():
    613     pdb_df = self.df["ATOM"]
--> 614     idx_map = np.piecewise(
    615         np.zeros(len(pdb_df)),
    616         [(pdb_df.line_idx.values >= start_idx) & (pdb_df.line_idx.values <= end_idx) for start_idx, end_idx in zip(idxs.start_idx.values, idxs.end_idx.values)], idxs.model_idx)
    617     self.df["ATOM"]["model_id"] = idx_map
    618 # LABEL HETATMS

File <__array_function__ internals>:180, in piecewise(*args, **kwargs)

File ~/anaconda3/envs/biopandas/lib/python3.8/site-packages/numpy/lib/function_base.py:708, in piecewise(x, condlist, funclist, *args, **kw)
    704 n2 = len(funclist)
    706 # undocumented: single condition is promoted to a list of one condition
    707 if isscalar(condlist) or (
--> 708         not isinstance(condlist[0], (list, ndarray)) and x.ndim != 0):
    709     condlist = [condlist]
    711 condlist = asarray(condlist, dtype=bool)

IndexError: list index out of range
```
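The traceback bottoms out inside `np.piecewise`: with no `MODEL`/`ENDMDL` ranges, the list comprehension that builds one boolean mask per model produces an empty list, and `piecewise` reads `condlist[0]` before it ever checks the length. A minimal reproduction independent of biopandas (the variable names mirror the traceback, the data are made up):

```python
import numpy as np

line_idx = np.arange(5)      # hypothetical line indices of five ATOM records
start_idxs: list = []        # no MODEL/ENDMDL ranges were found
end_idxs: list = []
model_idxs: list = []

# One boolean mask per model range; empty when the file has no MODEL records.
condlist = [(line_idx >= s) & (line_idx <= e) for s, e in zip(start_idxs, end_idxs)]

error = None
try:
    np.piecewise(np.zeros(len(line_idx)), condlist, model_idxs)
except IndexError as exc:
    error = exc  # raised by condlist[0] on the empty list

print(type(error).__name__, error)
```

This is also why `df.get_model(0)` cannot help: the failure happens while labeling, before any model is selected.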

@rasbt
Member

rasbt commented May 1, 2022

So one question I have: model_idx is an integer column, but model_id is a float column. Is the float type intentional, so that people can use model IDs like 1.0, 1.1, 1.2, 2.0, 2.1, ...?


@a-r-j
Contributor Author

a-r-j commented May 1, 2022

Good spot! I think they should all be integers; it's an oversight on my part.
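A likely source of the float dtype, inferred from the `label_models` code in the traceback rather than confirmed in the thread: `np.piecewise` returns an array with the same dtype as its first argument, and `np.zeros(n)` defaults to `float64`. The sketch below uses made-up line indices and model ranges to show that passing an integer array fixes the dtype.

```python
import numpy as np

line_idx = np.arange(6)           # hypothetical line indices of ATOM records
ranges = [(0, 2, 1), (3, 5, 2)]   # (start_idx, end_idx, model_idx) per model

condlist = [(line_idx >= s) & (line_idx <= e) for s, e, _ in ranges]
funclist = [m for _, _, m in ranges]  # constants are written into the matching slots

# Output dtype follows the first argument.
as_float = np.piecewise(np.zeros(len(line_idx)), condlist, funclist)
as_int = np.piecewise(np.zeros(len(line_idx), dtype=int), condlist, funclist)

print(as_float)  # [1. 1. 1. 2. 2. 2.]
print(as_int)    # [1 1 1 2 2 2]
```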

@rasbt
Member

rasbt commented May 1, 2022

> Eg
>
> from biopandas.pdb import PandasPdb
> df = PandasPdb().fetch_pdb('3eiy')
> a = df.get_model_start_end()
>
> returns an empty df, and so df.get_model(1) results in:
>
> ...


Does df.get_model(0) work in this case? I mean we could provide a custom error message for these cases, but I think it might be fine to just have the default pandas indexing errors here.

@a-r-j
Contributor Author

a-r-j commented May 1, 2022

It doesn't, no, as the labelling function breaks before any selections are made.

@rasbt
Member

rasbt commented May 1, 2022

I see. The question is: do we want this to work? E.g., always having model_idx=1, regardless of whether the file contains one model or more?

@a-r-j
Contributor Author

a-r-j commented May 1, 2022

Easy fix: it now handles single-model structures by mapping all the lines in the PDB file to model_idx 1.
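One way to implement the fallback described here, sketched with made-up data: when no `MODEL`/`ENDMDL` ranges exist, every record gets model ID 1; otherwise each record is labeled with the range that contains its line index, using an integer output array. The helpers `label_model_ids` and `select_models` are hypothetical names for illustration, not the code merged in this PR.

```python
import numpy as np
import pandas as pd


def label_model_ids(line_idx: np.ndarray, ranges: pd.DataFrame) -> np.ndarray:
    """Map each record's line index to an integer model ID (hypothetical helper).

    `ranges` has columns model_idx, start_idx, end_idx (one row per model).
    """
    if ranges.empty:
        # Single-model file: no MODEL/ENDMDL records, so everything is model 1.
        return np.ones(len(line_idx), dtype=int)
    condlist = [
        (line_idx >= start) & (line_idx <= end)
        for start, end in zip(ranges["start_idx"], ranges["end_idx"])
    ]
    # Integer output array, so the model IDs come out as ints rather than floats.
    return np.piecewise(
        np.zeros(len(line_idx), dtype=int), condlist, ranges["model_idx"].tolist()
    )


def select_models(df: pd.DataFrame, model_indices) -> pd.DataFrame:
    """Keep only the rows whose model_id is in model_indices (hypothetical helper)."""
    return df.loc[df["model_id"].isin(model_indices)]


atoms = pd.DataFrame({"atom_name": ["N", "CA", "N", "CA"], "line_idx": [1, 2, 5, 6]})
multi = pd.DataFrame({"model_idx": [1, 2], "start_idx": [0, 4], "end_idx": [3, 7]})

atoms["model_id"] = label_model_ids(atoms["line_idx"].to_numpy(), multi)
print(atoms["model_id"].tolist())                                               # [1, 1, 2, 2]
print(label_model_ids(atoms["line_idx"].to_numpy(), multi.iloc[0:0]).tolist())  # [1, 1, 1, 1]
print(select_models(atoms, [2])["atom_name"].tolist())                          # ['N', 'CA']
```

Handling the empty case before calling `np.piecewise` sidesteps its `condlist[0]` access entirely, so single-model files never reach the code path that raised the `IndexError`.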

@a-r-j
Contributor Author

a-r-j commented May 8, 2022

@rasbt Anything you need from me to get these PRs merged? 😁

@rasbt
Member

rasbt commented May 8, 2022

Ahh, sorry, I was traveling and forgot to follow up. Let me check this again properly once I am home tomorrow at my main computer!

@rasbt
Member

rasbt commented May 11, 2022

This all looked good to me! I just reformatted biopandas with black to fix the style issues, hence so many changes. I only wanted to do it for the new file, but then I thought, why not do it for the whole library?

@rasbt rasbt merged commit 45ce75e into BioPandas:main May 11, 2022