Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

88 - Quantile Normalize Default Inconsistency #89

Merged
merged 2 commits into from
Aug 21, 2024

Conversation

avrohomgottlieb
Copy link
Contributor

@avrohomgottlieb avrohomgottlieb commented Aug 15, 2024

This PR is related to Issue #88

Context

There are five steps associated with downloading Datasets in pyrefinebio:

  1. creating the Dataset
    -> dataset = Dataset(...)
  2. processing it
    -> dataset.process()
  3. waiting for it to finish
    -> dataset.check()
  4. downloading it
    -> dataset.download()
  5. extracting it (optional)
    ->dataset.extract()

There are three workflows which enable the downloading of datasets with pyrefinebio (after a token has been created):

  1. Using the CLI - refinebio download-dataset ...
    -> This takes care of all of the steps in one commands.
  2. Using the exposed high level functions - pyrefinebio.download_dataset(...)
    -> This is the recommended way to use the library programmatically, taking care of all steps in one command.
  3. Using the exposed classes manually - dataset = pyrefinebio.Dataset(), dataset.add_samples(...), dataset.process(), ...
    -> This is called the Advanced Dataset Usage in the documentation, where each of the five steps must be performed manually.

Conclusion

Workflows 1 and 2 both have download commands which call high_level_functions::download_dataset under the hood. The first thing that these commands do is build a Dataset object by calling the class' constructor with the arguments passed into high_level_function::download_dataset by the user.

high_level_functions::download_dataset has the default parameter of skip_quantile_normalization=False. When the Dataset instance is created, this parameter is negated to properly set the class' quantile_normalize attribute, as follows quantile_normalize=(not skip_quantile_normalization). If --skip-quantile-normalization is passed to the CLI command, or skip_quantile_normalization=True is passed to pyrefinebio.download_dataset(...), then the default is overridden, and the value passed is as follows Dataset(..., quantile_normalization=False, ...).

When using workflow 3, however, the constructor assigns all parameters with a default None value (including quantile_normalize=None), ultimately making quantile_normalize falsy.

In conclusion, there is an inconsistency between these workflows, namely that quantile_normalize defaults to True in workflows 1 and 2, and defaults to None (falsy) in workflow 3.

This can be addressed by either:

  • Updating the documentation to clarify this.
  • Changing how defaults to defined parameters are assigned in the Dataset constructor.

In the meantime I've made a few edits to documentation to clarify how skip_quantile_normalization works with workflows 1 and 2.

@avrohomgottlieb avrohomgottlieb changed the title update skip-quantile-normalization documentation to improve clarity 88 - Quantile Normalize Default Inconsistency Aug 15, 2024
@avrohomgottlieb avrohomgottlieb self-assigned this Aug 15, 2024
Copy link
Contributor

@davidsmejia davidsmejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@avrohomgottlieb avrohomgottlieb linked an issue Aug 21, 2024 that may be closed by this pull request
@avrohomgottlieb
Copy link
Contributor Author

Re: initial PR comment

On the refinebio side, QN is automatically performed unless it is explicitly told to not be performed. As such, when Datasets are built manually in workflow 3, even if QN is not explicitly indicated, it is still performed by default, meaning that no logical error exists.

@avrohomgottlieb avrohomgottlieb merged commit 770d3f6 into master Aug 21, 2024
5 checks passed
@avrohomgottlieb avrohomgottlieb deleted the avrohom/88-possible-qn-skipping-bug branch August 21, 2024 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate bug with QN being skipped
2 participants