88 - Quantile Normalize Default Inconsistency #89
This PR is related to Issue #88
Context
There are five steps associated with downloading `Datasets` in `pyrefinebio`: `Dataset` -> `dataset = Dataset(...)` -> `dataset.process()` -> `dataset.check()` -> `dataset.download()` -> `dataset.extract()`
There are three workflows which enable the downloading of datasets with `pyrefinebio` (after a token has been created):

1. `refinebio download-dataset ...` -> This takes care of all of the steps in one command.
2. `pyrefinebio.download_dataset(...)` -> This is the recommended way to use the library programmatically, taking care of all steps in one command.
3. `dataset = pyrefinebio.Dataset()`, `dataset.add_samples(...)`, `dataset.process()`, ... -> This is called the Advanced Dataset Usage in the documentation, where each of the five steps must be performed manually.

Conclusion
Workflows 1 and 2 both have `download` commands which call `high_level_functions::download_dataset` under the hood. The first thing these commands do is build a `Dataset` object by calling the class constructor with the arguments the user passed into `high_level_functions::download_dataset`. `high_level_functions::download_dataset` has the default parameter `skip_quantile_normalization=False`. When the `Dataset` instance is created, this parameter is negated to properly set the class's `quantile_normalize` attribute, as follows: `quantile_normalize=(not skip_quantile_normalization)`. If `--skip-quantile-normalization` is passed to the CLI command, or `skip_quantile_normalization=True` is passed to `pyrefinebio.download_dataset(...)`, then the default is overridden and the constructor is called as `Dataset(..., quantile_normalize=False, ...)`.

When using workflow 3, however, the constructor assigns all parameters a default value of `None` (including `quantile_normalize=None`), ultimately making `quantile_normalize` falsy.

In conclusion, there is an inconsistency between these workflows: `quantile_normalize` defaults to `True` in workflows 1 and 2, and defaults to `None` (falsy) in workflow 3.

This can be addressed by either:

`Dataset` constructor.

In the meantime I've made a few edits to documentation to clarify how `skip_quantile_normalization` works with workflows 1 and 2.
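
For reviewers, here is a minimal sketch of the default mismatch described above. `DatasetSketch` and `download_dataset_sketch` are hypothetical stand-ins written only for this illustration. They mirror the behaviour described in this PR and are not the actual `pyrefinebio` code.

```python
# Hypothetical stand-ins that mimic the behaviour described in this PR.
# They are NOT the real pyrefinebio classes or functions.


class DatasetSketch:
    def __init__(self, quantile_normalize=None):
        # Workflow 3: constructor parameters default to None.
        self.quantile_normalize = quantile_normalize


def download_dataset_sketch(skip_quantile_normalization=False):
    # Workflows 1 and 2: the skip flag is negated before it reaches
    # the constructor, so the effective default is True.
    return DatasetSketch(quantile_normalize=(not skip_quantile_normalization))


via_helper = download_dataset_sketch()   # workflows 1 and 2
via_constructor = DatasetSketch()        # workflow 3

print(via_helper.quantile_normalize)       # True
print(via_constructor.quantile_normalize)  # None (falsy)
```

With no arguments supplied, the helper path ends up with `True` while direct construction ends up with `None`, which is the inconsistency this PR documents.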