Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VariantUtils and TestVariantUtils #272

Merged
merged 14 commits into from
Aug 17, 2023
Merged

VariantUtils and TestVariantUtils #272

merged 14 commits into from
Aug 17, 2023

Conversation

TomDuraisingh
Copy link
Collaborator

Functions that help filter down data from CGAP.

Copy link
Member

@willronchetti willronchetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Releasing some feedback you can work on in background - feel free to reply/reach out with any questions or clarifications

Comment on lines +16 to +17
# Uncomment this if needed
# self.health = get_health_page(key=self.creds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think good to remove at this point

Comment on lines 9 to 11
SEARCH_RARE_VARIANTS_BY_GENE = '/search/?samplegeno.samplegeno_role=proband&type=VariantSample\
&variant.csq_gnomadg_af_popmax.from=0&variant.csq_gnomadg_af_popmax.to=0.001\
&variant.genes.genes_most_severe_gene.display_title='
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer f-string for multi-line strings ie:

SEARCH_RARE_VARIANTS_BY_GENE = (f'/search/?samplegeno.samplegeno_role=proband&type=VariantSample'
                                f'&variant.csq_gnomadg_af_popmax.from=0&'
                                f'variant.csq_gnomadg_af_popmax.to=0.001&'
                                f'variant.genes.genes_most_severe_gene.display_title=')

Comment on lines 39 to 41
def sort_dict_in_descending_order(unsorted_dict):
"""sorts dictionary in descending value order"""
sorted_list = sorted(unsorted_dict.items(), key=lambda x: x[1], reverse=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort dict how? answer: by value (key=lambda x: x[1]), so better name is sort_dict_by_value_descending, or better yet make the descending fact a default argument that can be changed so you can just have sort_dict_by_value. You can also make this one line by calling dict directly on line 41.

return dict(sorted_list)

def create_dict_of_mutations(self, gene):
"""cretes dictionary of specified gene and 10+ occuring positions with their number of variants"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you're creating a dict like this with specific structure, add info on the docstring on what the return value actually looks like.

return {gene: self.sort_dict_in_descending_order({k: v for k, v in mutation_dict.items() if v >= 10})}

@staticmethod
def return_json(file_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it is useful to use type annotations ie:

def return_json(file_name: str) -> dict:
...

See here for more info. This applies to all functions in this file.

Comment on lines 77 to 78
def create_url(self, gene):
"""returns a url to the variants at the most commonly mutated position of a gene"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_url isn't the best name I'd say... What you have in the docstring should probably be reflected in the name. As a general convention, for basic usage the function name should be as self explanatory as possible.


def create_url(self, gene):
"""returns a url to the variants at the most commonly mutated position of a gene"""
d = self.create_dict_from_json_file('10+sorted_msa_genes_and_mutations.json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded filename likely not desired

Comment on lines 117 to 118
expected_result = vu.SEARCH_RARE_VARIANTS_BY_GENE + mock_gene + ('&variant.POS.from={pos}\
&variant.POS.to={pos}&sort=-DP')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely to be much cleaner with f-strings ie:

expected_result = (f'{vu.SEARCH_RARE_VARIANTS_BY_GENE}{mock_gene}'
                   f'&variant.POS.from={pos}'
                   f'&variant.POS.to={pos}&sort=-DP')

It's also not clear this is working as desired, seems like you may want pos actually injected and that isn't happening as written I don't think?

expected_result = vu.SEARCH_RARE_VARIANTS_BY_GENE + mock_gene + ('&variant.POS.from={pos}\
&variant.POS.to={pos}&sort=-DP')
assert result == expected_result
mock_create_dict_from_json_file.assert_called_once_with('10+sorted_msa_genes_and_mutations.json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded file path again - will break tests in the workflows

mock_create_dict_from_json_file.assert_called_once_with('10+sorted_msa_genes_and_mutations.json')

@patch('dcicutils.variant_utils.VariantUtils.return_json')
def test_create_list_of_als_park_genes(self, mock_return_json):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests looks very similar to another above... See if you can figure out how to parametrize them together as one

@coveralls
Copy link

coveralls commented Aug 14, 2023

Pull Request Test Coverage Report for Build 5894264393

  • 46 of 54 (85.19%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.04%) to 77.519%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dcicutils/variant_utils.py 45 53 84.91%
Totals Coverage Status
Change from base Build 5825640827: 0.04%
Covered Lines: 8169
Relevant Lines: 10538

💛 - Coveralls

@TomDuraisingh TomDuraisingh merged commit 3972a56 into master Aug 17, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants