Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader for seqFISH data #53

Merged
merged 26 commits into from
Jun 16, 2024
Merged

Reader for seqFISH data #53

merged 26 commits into from
Jun 16, 2024

Conversation

LLehner
Copy link
Member

@LLehner LLehner commented Jun 19, 2023

Add reader for seqFISH data.

@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2023

Codecov Report

Merging #53 (9f41cbe) into main (755d475) will increase coverage by 0.68%.
Report is 22 commits behind head on main.
The diff coverage is 44.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
+ Coverage   41.92%   42.60%   +0.68%     
==========================================
  Files          16       17       +1     
  Lines         854      946      +92     
==========================================
+ Hits          358      403      +45     
- Misses        496      543      +47     
Files Changed Coverage Δ
src/spatialdata_io/readers/merscope.py 25.00% <8.33%> (-2.03%) ⬇️
src/spatialdata_io/readers/seqfish.py 33.96% <33.96%> (ø)
src/spatialdata_io/__init__.py 100.00% <100.00%> (ø)
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)

Copy link
Member

@timtreis timtreis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor changes

src/spatialdata_io/__init__.py Outdated Show resolved Hide resolved
src/spatialdata_io/readers/seqfish.py Outdated Show resolved Hide resolved
src/spatialdata_io/readers/seqfish.py Outdated Show resolved Hide resolved
src/spatialdata_io/readers/seqfish.py Outdated Show resolved Hide resolved
@LLehner LLehner requested a review from timtreis June 20, 2023 12:31
@giovp
Copy link
Member

giovp commented Jun 20, 2023

remember to add it to docs/api.md and also add codex there ( I think it was missing from #34 ). Also please delete the "coming soon" section in the index.md . You can delete this whole thing

image

EDIT: and same thing to the README.md of the repo, please check that the index.md and the readme.md always correspond, thank you!

@timtreis
Copy link
Member

It seems impossible for me to download that 85 GB monster of seqFISh data to actually validate that reader, @giovp could you maybe try it out?

@LucaMarconato
Copy link
Member

@LLehner please test with napari before merging. I suggest to write to Zarr and read again the sdata object in case in which the performance are bad (since the first lazy representation is reading from a non-performant disk storage, while after you save and read Zarr and Parquet are used).

@LLehner
Copy link
Member Author

LLehner commented Jun 22, 2023

@LucaMarconato
seems to work, here is an example of points visualization:

sdata = seqfish(path=path)
interactive = Interactive(sdata)
interactive.run()

then selecting global>transcripts_1 results in:

image

@LucaMarconato
Copy link
Member

LucaMarconato commented Jul 15, 2023

Great work @LLehner!

A few comments:

  • As Liang Ding reported, there was a bug in the indexing, I fixed it (you were using the last value of enumerate() which was giving len - 1).
  • I noticed that ImageModel.parse() was not using scale_factors even if the images are big. This leads to poor performance because it doesn't compute intermediate resolutions. I fixed this in my last commit.
  • Similar to this, no chunking was used for the images (in this case the images are so big that it was not possible saving them to disk). You can see this in the screenshot below.
    I will make this point clearer in the docs/notebooks because it is not obvious that one should pay attention to the chunk shape/size of the data by using chunks parameter in the parser. Please notice that when using scale_factors, the chunking is automatically computed to get an efficient representation (also this is not obvious from the docs).
image
  • I think that in the napari screenshot you showed you were didn't save the data to disk first. To harness the chunked and multiscale representation of the data one needs first to save to Zarr (sdata.write()) and then read again. Otherwise the visualization of the images would be very inefficient and napari would hang. I will update the docs/notebooks to make this clearer as it is not straightforward.

I will now try visualizing the data with napari.

@LucaMarconato
Copy link
Member

LucaMarconato commented Jul 18, 2023

@LLehner I downloaded the data on my machine and used napari to view it. There are some bugs that I ask you to fix please, but it's almost there, they are all very quick to address.

  • the key names for images and labels are swapped
  • the "cells" (shapes) should be also per section. So cells_1, cells_2, cells_3, instead of one element will all the cells. Otherwise when they are plot they overlap in space.
  • the cells should have a radius that is deduced from the area, not a constant radius. We do this also for xenium(), you can copy the code from there
  • finally, here we should not have a unique coordinate system called global, but we should have three coordinate systems, one per sample/fov. Please check the mibitof/to_zarr.py file in the sandbox, or cosmx() and I think also steinbock() to see how to do that.

@LucaMarconato
Copy link
Member

LucaMarconato commented Jul 18, 2023

I also noticed that

  • the gene expression and obs from the table can't be plotted

Not sure why, I get a two different weird errors, one for obs and one for expression. When the rest is addressed we should look into this. To do this I usually run napari/vscode from PyCharm and go with breakpoints. I can also check into this if you want, please keep me posted.

@LLehner LLehner changed the title Reader for seqFISH files Reader for seqFISH data Jul 27, 2023
@LLehner LLehner marked this pull request as draft August 20, 2023 11:41
@giovp
Copy link
Member

giovp commented Feb 5, 2024

is this still planned to be included? @LucaMarconato @LLehner ?

@LucaMarconato
Copy link
Member

Yes, it's in the todo list; still didn't have time to test.

@LLehner LLehner mentioned this pull request Jun 12, 2024
2 tasks
@LucaMarconato LucaMarconato marked this pull request as ready for review June 16, 2024 15:08
@LucaMarconato
Copy link
Member

@LLehner I addressed all the task items from this conversation that were still open. I have also added a converter script in the sandbox and tested the reader with napari-spatialdata. Finally, I added extra arguments to be able to parse a subset of the elements and of the sections (useful when debugging).

Thanks again for the work on this PR, ready to merge now (the failing tests are due to the fact that we need to make a new release in spatialdata and are unrelated to this PR).

@LucaMarconato LucaMarconato merged commit af90e05 into scverse:main Jun 16, 2024
2 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants