Add support for handling PDBs with multiple models #101

a-r-j · 2022-04-27T16:16:40Z

Description

Adds a collection of methods to extract and label models from PDB files containing multiple models.

Related issues or pull requests

#50 #100

Pull Request Checklist

Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
Modified documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
Ran PYTHONPATH='.' pytest ./biopandas -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
Checked for style issues by running flake8 ./biopandas

pep8speaks · 2022-04-27T16:16:44Z

Hello @a-r-j! Thanks for updating this PR.

In the file biopandas/mol2/pandas_mol2.py:

Line 237:13: W503 line break before binary operator
Line 238:13: W503 line break before binary operator

In the file biopandas/pdb/pandas_pdb.py:

Line 250:13: W503 line break before binary operator
Line 251:13: W503 line break before binary operator
Line 315:22: E701 multiple statements on one line (colon)
Line 315:22: E231 missing whitespace after ':'
Line 315:23: E225 missing whitespace around operator
Line 315:23: E999 SyntaxError: invalid syntax
Line 325:17: W503 line break before binary operator
Line 326:17: W503 line break before binary operator
Line 327:17: W503 line break before binary operator
Line 332:17: W503 line break before binary operator
Line 333:17: W503 line break before binary operator
Line 334:17: W503 line break before binary operator
Line 382:60: E203 whitespace before ':'
Line 627:89: E501 line too long (96 > 88 characters)
Line 654:21: W503 line break before binary operator
Line 669:21: W503 line break before binary operator
Line 683:21: W503 line break before binary operator

Comment last updated at 2022-05-11 23:21:47 UTC

review-notebook-app · 2022-04-28T14:04:48Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

rasbt

Thanks a lot for the PR, looks great!

rasbt · 2022-05-01T14:35:05Z

biopandas/pdb/pandas_pdb.py

@@ -5,23 +5,27 @@
 # License: BSD 3 clause
 # Project Website: http://rasbt.github.io/biopandas/
 # Code Repository: https://github.com/rasbt/biopandas
+from __future__ import annotations


I don't think this is necessary. I think type annotations were introduced in Python 3.6. If someone has an older version of Python, they won't be able to run it anyways because of the f-strings?

No worries, I can take care of removing it.

a-r-j · 2022-05-01T14:39:21Z

biopandas/pdb/pandas_pdb.py

+            df.df["ANISOU"] = df.df["ANISOU"].loc[df.df["ANISOU"]["model_id"] == model_index]
+        return df
+
+    def get_models(self, model_indices: List[int]) -> PandasPdb:


I think the from __future__ import annotations line is required for this type hint (I didn't want the hassle of defining a TypeVar). If you remove the import, I think the type hint needs to be removed too.

I see. Let's leave it then
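For context on the exchange above: get_model is defined inside the PandasPdb class and annotates its return type as PandasPdb, a name that is not yet bound while the class body is still executing. The following is a minimal sketch of that situation. The Structure class and the define helper are hypothetical stand-ins, not BioPandas code. It shows that with the future import the annotation is stored as a string instead of being evaluated, so the definition succeeds. On Python 3.8 (the interpreter visible in the traceback in this thread), the same source without the import fails with a NameError; interpreters that evaluate annotations lazily behave differently, so the sketch makes no claim about that case.

```python
# Hypothetical stand-in for a class whose method returns its own type.
CLASS_SOURCE = '''
class Structure:
    def copy(self) -> Structure:
        return Structure()
'''


def define(source: str) -> dict:
    """Execute source in a fresh namespace and return that namespace."""
    namespace: dict = {}
    # dont_inherit=True keeps this module's own __future__ flags out of the compiled code.
    exec(compile(source, "<sketch>", "exec", dont_inherit=True), namespace)
    return namespace


# With the future import, annotations are kept as plain strings (PEP 563).
postponed = define("from __future__ import annotations\n" + CLASS_SOURCE)
return_annotation = postponed["Structure"].copy.__annotations__["return"]
print(return_annotation)
```

The alternatives mentioned in the thread (a TypeVar, or quoting the annotation as "PandasPdb") would avoid the import at the cost of slightly noisier signatures.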

a-r-j · 2022-05-01T14:39:34Z

biopandas/pdb/pandas_pdb.py

+            self.df["ANISOU"]["model_id"] = idx_map
+        return self
+
+    def get_model(self, model_index: int) -> PandasPdb:


a-r-j · 2022-05-01T14:51:05Z

Ooh, before you merge, it might be worth checking how to provide a more informative error for PDB files that contain only a single model.

E.g.,

from biopandas.pdb import PandasPdb

df = PandasPdb().fetch_pdb('3eiy')

a = df.get_model_start_end()

returns an empty df

and so df.get_model(1)

results in:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/atj39/github/biopandas/dev.ipynb Cell 3' in <cell line: 1>()
----> 1 df.get_model(1)

File ~/github/biopandas/biopandas/pdb/pandas_pdb.py:647, in PandasPdb.get_model(self, model_index)
    634 """Returns a new PandasPDB object with the dataframes subset to the given model index.
    635
    636 Parameters
   (...)
    643 pandas_pdb.PandasPdb : A new PandasPdb object containing the structure subsetted to the given model.
    644 """
    646 df = deepcopy(self)
--> 647 df.label_models()
    649 if "ATOM" in df.df.keys():
    650     df.df["ATOM"] = df.df["ATOM"].loc[df.df["ATOM"]["model_id"] == model_index]

File ~/github/biopandas/biopandas/pdb/pandas_pdb.py:614, in PandasPdb.label_models(self)
    612 if "ATOM" in self.df.keys():
    613     pdb_df = self.df["ATOM"]
--> 614     idx_map = np.piecewise(
    615         np.zeros(len(pdb_df)),
    616         [(pdb_df.line_idx.values >= start_idx) & (pdb_df.line_idx.values <= end_idx) for start_idx, end_idx in zip(idxs.start_idx.values, idxs.end_idx.values)], idxs.model_idx)
    617     self.df["ATOM"]["model_id"] = idx_map
    618 # LABEL HETATMS

File <__array_function__ internals>:180, in piecewise(*args, **kwargs)

File ~/anaconda3/envs/biopandas/lib/python3.8/site-packages/numpy/lib/function_base.py:708, in piecewise(x, condlist, funclist, *args, **kw)
    704 n2 = len(funclist)
    706 # undocumented: single condition is promoted to a list of one condition
    707 if isscalar(condlist) or (
--> 708         not isinstance(condlist[0], (list, ndarray)) and x.ndim != 0):
    709     condlist = [condlist]
    711 condlist = asarray(condlist, dtype=bool)

IndexError: list index out of range
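The traceback above ends at condlist[0]: with no MODEL records, the list of conditions built from the start/end indices is empty, so indexing its first element fails. One way to get the "more informative error" would be to check for that case before calling np.piecewise. The sketch below assumes that approach; label_lines and its error message are hypothetical and are not the BioPandas implementation.

```python
import numpy as np


def label_lines(line_idx, start_idx, end_idx, model_idx):
    """Assign a model id to each line index (hypothetical helper, not the BioPandas method)."""
    if len(start_idx) == 0:
        # Guard: an empty condition list is what triggers the IndexError above.
        raise ValueError("No MODEL records found; cannot label models.")
    line_idx = np.asarray(line_idx)
    conditions = [
        (line_idx >= start) & (line_idx <= end)
        for start, end in zip(start_idx, end_idx)
    ]
    # Non-callable entries in the function list are used as constant fill values.
    return np.piecewise(np.zeros(len(line_idx)), conditions, list(model_idx))


labels = label_lines([1, 2, 3, 5, 6, 7], [1, 5], [3, 7], [1, 2])
```

Raising early keeps the message about the PDB file rather than about NumPy internals.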

rasbt · 2022-05-01T14:52:26Z

So one question I have: model_idx is an integer column and model_id is a float column. Is the float type intentional, so that people can use model IDs like 1.0, 1.1, 1.2, 2.0, 2.1, ...?

a-r-j · 2022-05-01T14:55:01Z

Good spot! I think they should all be integers and it's an oversight from me
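The float column is consistent with the labelling code shown in the traceback: np.zeros produces a float64 array by default, and np.piecewise returns an array with the same dtype as its input. The sketch below illustrates two possible ways to end up with integer labels. It uses made-up line indices, and it is an assumption that the eventual fix took either of these routes.

```python
import numpy as np

line_idx = np.array([1, 2, 3, 5, 6, 7])
conditions = [
    (line_idx >= 1) & (line_idx <= 3),
    (line_idx >= 5) & (line_idx <= 7),
]
model_idx = [1, 2]

# np.zeros defaults to float64, and np.piecewise keeps the dtype of its input.
float_labels = np.piecewise(np.zeros(len(line_idx)), conditions, model_idx)

# Option 1: cast the result afterwards.
cast_labels = float_labels.astype(int)

# Option 2: start from an integer array so the output is integer from the outset.
int_labels = np.piecewise(np.zeros(len(line_idx), dtype=int), conditions, model_idx)
```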

rasbt · 2022-05-01T15:17:30Z

Eg

from biopandas.pdb import PandasPdb

df = PandasPdb().fetch_pdb('3eiy')

a = df.get_model_start_end()

returns an empty df

and so df.get_model(1)

results in:

...

Does df.get_model(0) work in this case? I mean we could provide a custom error message for these cases, but I think it might be fine to just have the default pandas indexing errors here.

a-r-j · 2022-05-01T15:20:26Z

It doesn't, no, as the labelling function breaks before any selections are made.

rasbt · 2022-05-01T15:25:49Z

I see. The question is, do we want this to work? E.g., always having model_idx=1, regardless of whether the file contains one model or more?

a-r-j · 2022-05-01T15:38:45Z

Easy fix: it now handles single-model structures by mapping all the lines in the PDB file to model_idx 1.
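The behaviour described above can be sketched in plain Python, independent of the BioPandas data frames. The function names are hypothetical, the MODEL serial is read with a simple split instead of fixed columns, and labelling lines outside any model as 0 is an assumption made only for this illustration.

```python
from typing import List, Tuple


def model_ranges(lines: List[str]) -> List[Tuple[int, int, int]]:
    """Return (model_idx, start_line, end_line) for each MODEL/ENDMDL pair."""
    ranges = []
    current = None  # (model_idx, start_line) of the model being read
    for i, line in enumerate(lines):
        if line.startswith("MODEL"):
            current = (int(line.split()[1]), i)
        elif line.startswith("ENDMDL") and current is not None:
            ranges.append((current[0], current[1], i))
            current = None
    return ranges


def label_all_lines(lines: List[str]) -> List[int]:
    """Map every line to a model id, defaulting to 1 when no MODEL records exist."""
    ranges = model_ranges(lines)
    if not ranges:
        # Single-model fallback: every line belongs to model 1.
        return [1] * len(lines)
    labels = [0] * len(lines)  # assumption: 0 marks lines outside any model
    for model_idx, start, end in ranges:
        for i in range(start, end + 1):
            labels[i] = model_idx
    return labels
```

With this fallback, selecting model 1 from a single-model file returns the whole structure instead of failing during labelling.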

a-r-j · 2022-05-08T13:59:52Z

@rasbt Anything you need from me to get these PRs merged? 😁

rasbt · 2022-05-08T20:55:50Z

Ahh, sorry, I was traveling and forgot to follow up. Let me check this again properly once I am home tomorrow at my main computer!

rasbt · 2022-05-11T23:22:34Z

This all looked good to me! I reformatted biopandas with black to fix the style issues, hence so many changes. I originally wanted to do it only for the new file, but then I thought, why not do it for the whole library?

Add support for handling PDBs with multiple models

67abce1

a-r-j added 2 commits April 27, 2022 17:23

update changelog

1f9a8f5

lint tests

cb08997

resolve merge conflicts

5f967d9

rasbt approved these changes May 1, 2022


a-r-j commented May 1, 2022


pep8 fixed and documentation link

c1ac012

use integer model ids in df labelling

ca781bc

handle indexing structures containing 1 model

2132d3b

rasbt added 2 commits May 11, 2022 18:21

add updated pep8 speaks

6559dc5

format biopandas with black

24172cd

rasbt merged commit 45ce75e into BioPandas:main May 11, 2022
