Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move scrublet out of external #2703

Merged
merged 54 commits into from
Feb 12, 2024
Merged

Move scrublet out of external #2703

merged 54 commits into from
Feb 12, 2024

Conversation

flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Oct 23, 2023

  • Release notes not necessary because:

Docs:

How to review this PR

I made tests quantitative before this PR, so note that the only change that modified tests is 42143d8.

In that PR, I make it so there are no longer any duplicate simulated doublets being created. This is necessary to be able to support any neighborhood detection algorithm. I also feel like it makes more sense. This is the only algorithmic change to upstream. Please use your own judgement to check if this makes sense to you.

TODO:

  • remove unused utils (plotting, preprocessing)
  • figure out what remaining utils to replace with ours
    • PCA/SVD:
      # Do PCA. Scrublet fits to the observed matrix and decomposes both observed
      # and simulated based on that fit, so we'll just let it do its thing rather
      # than trying to use Scanpy's PCA wrapper of the same functions.
    • mean_center, normalize_variance, zscore: small enough to be left alone I think
    • get_knn_graph: no need to have multiple implementations here, but our current implementation automatically calculates connectivities, which this doesn’t need Use our neighbors for scrublet #2723
  • refactor so the class API matches the way we use it
  • switch to dataclass for centralized attr defs
  • use non-deprecated random state style
  • use our logging instead of print statements

@flying-sheep flying-sheep added this to the 1.10.0 milestone Oct 23, 2023
@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (8986921) 72.85% compared to head (5e14231) 74.08%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2703      +/-   ##
==========================================
+ Coverage   72.85%   74.08%   +1.22%     
==========================================
  Files         111      115       +4     
  Lines       12383    12616     +233     
==========================================
+ Hits         9022     9346     +324     
+ Misses       3361     3270      -91     
Files Coverage Δ
scanpy/_utils/__init__.py 67.20% <100.00%> (+0.35%) ⬆️
scanpy/external/pl.py 32.63% <100.00%> (-17.74%) ⬇️
scanpy/external/pp/__init__.py 100.00% <100.00%> (ø)
scanpy/get/get.py 92.59% <ø> (ø)
scanpy/logging.py 95.12% <ø> (ø)
scanpy/neighbors/__init__.py 80.75% <100.00%> (+3.28%) ⬆️
scanpy/neighbors/_common.py 64.91% <100.00%> (ø)
scanpy/neighbors/_types.py 77.27% <100.00%> (ø)
scanpy/plotting/__init__.py 100.00% <100.00%> (ø)
scanpy/preprocessing/__init__.py 100.00% <100.00%> (ø)
... and 7 more

... and 2 files with indirect coverage changes

@flying-sheep
Copy link
Member Author

some preliminary stats about the three versions: https://gist.github.com/flying-sheep/ecd20674bdcd1839d0bb7c16f67794cf#file-scrublet-bench-ipynb

Copy link
Contributor

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I have lots of comments, but a lot of the code is copy-paste and I think that this approach is not a bad idea. I think a follow-up refactor is needed. I also am not a fan of the original design of editing things in place. So I would be happy to review a follow up PR but I think that can be done separately.

scanpy/preprocessing/_scrublet/pipeline.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/__init__.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/core.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/__init__.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/core.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/__init__.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_scrublet/__init__.py Show resolved Hide resolved
scanpy/preprocessing/_scrublet/__init__.py Outdated Show resolved Hide resolved
@flying-sheep flying-sheep merged commit c091999 into master Feb 12, 2024
11 checks passed
@flying-sheep flying-sheep deleted the add-scrublet branch February 12, 2024 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Doublet filtering function
2 participants