Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ROI tables, dimension separator & build_omero_channel_metadata changes #32

Merged
merged 42 commits into from
Sep 14, 2023

Conversation

jluethi
Copy link
Contributor

@jluethi jluethi commented Jun 8, 2023

@tibuch @imagejan Super cool library for converting MD images into OME-Zarrs! 👏🏻

I would like to add Fractal support to use this library. That means 2 things:

  1. Add ROI tables to the MD conversion (because Fractal relies on ROI tables for downstream processing)
  2. Add actual Fractal wrapper tasks

This PR contains a proposal to add both things to this repository, as well as a few minor changes I'll go in detail about below (and that we can also move to a different PR, if that's preferred).


ROI tables

We typically rely on 2 types of ROI tables being created by the OME-Zarr converter: Whole-well ROI (trivial, but allows us to use ROI-based loading in a general manner) and FOV ROIs (describing which region of the full image belong to which field of view from the microscope).

To add FOV tables, this needs to be integrated into the core calculations of loading the images (because that's when the decision is made on how to place the FOVs in the well image). Thus I have integrated this into the core functionality in MetaSeriesUtils and now a few additional functions return a roi_tables dictionary (keys: str = table_name, values: pd.DataFrame actual table).

The actual table writing is optional, the parsing functions now just return a dictionary with tables that can be written to an AnnData table in the OME-Zarr. I also included a helper function to actually do this in Zarr.py as write_roi_table.

Unfortunately, this is does mean that it's a breaking change to the write_cyx_image_to_well & write_czyx_image_to_well API in that it now returns an additional output (the roi_tables dictionary). I have updated all tests, examples and usage of this function in this package. But if e.g. the prefect tasks depend on this repository, it would require a small adaption in those scripts (e.g. just adding an , - to the outputs of write_cyx_image_to_well & write_czyx_image_to_well to ignore those roi_tables.

All other changes should be transparent to the usage of this library, as far as I can tell.

Including the ROI tables writing in the main package (and being able to read them in the tests) does mean that anndata would become a core dependency.


Fractal tasks

I included 2 Fractal tasks in a subpackage. These tasks allow users to parse 2D, 3D or mixed MD images into an OME-Zarr & write the ROI tables.

We could also have these tasks in a separate package if you prefer. It should not interfere with anything in the main package though.

The tasks are in their sub package and running them in Fractal comes with an extra dependency on fractal-tasks-core (we're just refactoring this to be much lighter) & pydantic. This dependency is only needed when installing it to run in Fractal. Thus, they are in an extra fractal-tasks.

In addition, there is a small json manifest for the tasks, some tests to check their output & a small example in the examples folder using your example data from resources.


Minor suggested changes

a) Changes to build_omero_channel_metadata: I added a check that start and end are not the same value (leads to issues in napari otherwise). Also, proposing to switch to 0.999 percentile rescaling to make the visualization a bit less harsh by default (alternatively, we could expose the percentiles as an option the user can set)
b) dimension_separator: I hard-coded the dimension_separator = “/” now. I don’t fully understand how it picks the defaults. But sometimes, it picks “.” as a dimension separator for me, and that does not work with napari-ome-zarr for viewing the images. We can also make this an option of course to parse through everything.


Summary

I'm proposing:

  1. Modify the API of write_cyx_image_to_well & write_czyx_image_to_well slightly to also output ROI tables
  2. Add a anndata dependency to the package
  3. Add 2 fractal tasks to a subpackage
  4. Add an extra for fractal-tasks with their additional dependencies
  5. Change the behavior of build_omero_channel_metadata slightly
  6. Hard-code the dimension_separator to /

@jluethi
Copy link
Contributor Author

jluethi commented Jun 8, 2023

Fixed the pre-commit issues now. It looks like one of my new test for the Fractal tasks seems to fail in the CI, but it's running locally. Will need to look into this afterwards.

@tibuch
Copy link
Contributor

tibuch commented Jun 9, 2023

Hi @jluethi,

Thank you for opening this PR. Extracting the information of the individual fields of view during the data parsing and handing it back makes a lot of sense. I also like adding support to write tables to AnnData following the emerging NGFF spec.

However, I would like to keep the core processing functionality strictly separated from any workflow orchestration tool.

Could you refactor the PR such that only core functionality is added to faim-hcs? We could also schedule a pair-programming session where we go over the PR together.

Cheers!

@jluethi
Copy link
Contributor Author

jluethi commented Jun 9, 2023

Hey @tibuch

Thanks for the feedback. Yes, I can refactor it that way and make a new repository for the actual task, which will depend on the faim-hcs repository. Are there plans to deploy this repository to pypi for easier dependency handling?

I'll ping you for a pair-programming session to go over the refactored PR once I have it ready.

@tibuch
Copy link
Contributor

tibuch commented Jun 9, 2023

Currently there are no plans, but if this will make development easier on you side we can put it on pypi. We are using it like this at the moment:
setup.py:

install_requires =
    faim-hcs@git+https://github.com/fmi-faim/[email protected]

@jluethi
Copy link
Contributor Author

jluethi commented Jun 23, 2023

Hey @tibuch

I went through all the suggested changes from our meeting on Monday (cleanup, using the output of compute_z_sampling for the Z extent of the ROIs, parsing dimension separator & pyramid settings through all the functions, refactoring the ROI table creation & writing a bit to reduce redundancies).

I have one additional suggestion to add to this PR (see below) that came up. Other than that, it would be ready from my side.


During the work on pyramid level specification, I noticed that it's hard to solve ome/napari-ome-zarr#84 with just the two existing parameters. Thus, I'm suggesting we add an additional parameter there:

Add a lowest_res_target parameter to the chunk size calculation in _compute_chunk_size_cyx to replace the check on whether the calculated resolution should be the lowest resolution created.
Reason: Chunk size of 2048 is a good target for most resolutions. But with the current setup, a 384 well plate would get chunks of 1024 each at the lowest resolution pyramid level. For 24 wells, that leads to a plate image of 24'576 => to large for most GPUs to display in napari (see ome/napari-ome-zarr#84)

⇒ A user should be able to overwrite this, i.e. set it to 512 or 256 to enforce that low resolution pyramids are created and it shouldn’t relate to optimal chunk sizes at higher resolutions.

By default, the behavior will remain as it was before (I think that’s a reasonable default for anything below 384 well plates)

There is some interaction with max_levels (as before): Whichever condition is hit first limits the number of pyramid levels. That can be a bit confusing, but also desired flexibility. I currently still kept it in (behavior as before).

Copy link
Member

@imagejan imagejan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a few comments.

My main questions:

  • Shouldn't the column naming (x_micrometer etc.) be a bit more flexible and not hard-code the unit?
  • Why did you remove the optional inclusion of the z_position from the metadata here? That's unrelated to the topic of this pull request, isn't it?

src/faim_hcs/MetaSeriesUtils.py Outdated Show resolved Hide resolved
src/faim_hcs/MetaSeriesUtils.py Outdated Show resolved Hide resolved
src/faim_hcs/MetaSeriesUtils.py Outdated Show resolved Hide resolved
tests/test_MetaSeriesUtils.py Outdated Show resolved Hide resolved
Update docstring

Co-authored-by: Jan Eglinger <[email protected]>
@jluethi
Copy link
Contributor Author

jluethi commented Jul 4, 2023

Thanks @imagejan ! I'll update the PR to address these docstring & pytest suggestions.

Regarding the main questions:

Shouldn't the column naming (x_micrometer etc.) be a bit more flexible and not hard-code the unit?

Yes, this should eventually become more general. The first version of Fractal ROIs has these hard-coded column names though and this PR adds support for such Fractal ROIs. There should be more generic ROIs eventually, but I'm not fully sure what direction they'd go (maybe the shapes variant of spatial data regions, see https://spatialdata.scverse.org/en/latest/design_doc.html#regions-of-interest ? Or some other broader approach?).

Why did you remove the optional inclusion of the z_position from the metadata here? That's unrelated to the topic of this pull request, isn't it?

I assume you mean include_z_position in get_well_image_CYX?
Agreed, I didn't remove it at first. During a review with @tibuch , we noticed that the variable is not used anymore in the get_well_image_CYX function since ba0f7ff (it was left in the docstring, but removed from the actual function). Plus, I'm not sure what the z_position would be in 2D only images anyway. @tibuch suggested removing it as part of cleaning up this function. We can also add it back if there's a case to be made for what z_position should be in the 2D case and if we add back use of the include_z_position variable within the get_well_image_CYX function.

@jluethi
Copy link
Contributor Author

jluethi commented Jul 11, 2023

@imagejan I updated the docstrings as suggested and moved the column names in the tests into a fixture. From my side, this PR is ready to be merged (upon rerunning the github automations on the latest commit, which needs to be approved by a maintainer).

Copy link
Member

@imagejan imagejan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look at this pull request and added a few more nitpicky comments. I hope that @tibuch and I can finish reviewing this next week and finally merge.

Comment on lines 135 to 138
Given that Fractal ROI tables are always 3D, but we only stitch the xy
planes here, the z starting position is always 0 and the
z extent is set to 1. This is overwritten downsteam if the 2D planes are
assembled into a 3D stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

faim-hcs doesn't depend on or link to Fractal, so I would rephrase this sentence to not be fractal-specific. Are (field-of-view and well) ROIs always 3D as of this implementation? Or does it make sense to implement them as 2D here, and switch to 3D depending on context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't fully make this Fractal independent, as the ROI tables are created in the Fractal ROI table format. We are working towards having a specification page for v1 of our ROI tables and I'll propose it as an addition then. I updated the wording slightly to reflect that, but it still contains a Fractal reference.
Our v1 ROI tables are always 3D. If a future version of them is more flexible on dimensionality (& column names), I'll be happy to provide an update PR.

adata = ad.AnnData(X=df_roi)
adata.obs_names = roi_table.index
adata.var_names = list(map(str, roi_table.columns))
ad._io.specs.write_elem(group_tables, table_name, adata)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging from the _, the _io package is meant to be private. Is it safe to use it here? I mean, is it likely to break with future updates of anndata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. There isn't a more stable table implementation available at the moment though. Given that we're only using the anndata dependency for the table handling & writing: Are you ok with fixing the anndata dependency at a given version (0.8.0 works, I can test the most recent 0.9.2 to ensure it's still fine) and only manually updating it until the table writing gets a stable home?

src/faim_hcs/Zarr.py Outdated Show resolved Hide resolved
@jluethi
Copy link
Contributor Author

jluethi commented Sep 3, 2023

Would be great if you could run the tests in the github CI again. Had some issues running them locally with the test data (yay Airport wifi? Didn't manage to debug before boarding)

I think I've addressed all the issues mentioned above. Let me know if you'd be happy with fixing the AnnData version for the time being, given that the write_elem import comes from the experimental _io subpackage.

@jluethi
Copy link
Contributor Author

jluethi commented Sep 4, 2023

Tl;dr: It wasn't a local CI issue. The same bug gets reproduced locally as in the Github CI. It seems to be an fsspec issue (see below for my investigation on it).
Looks like this affects ome-zarr-py as well, see: ome/ome-zarr-py#302
Their error messages are very similar to ours here.
I suggest we fix fsspec to fsspec==2023.6.0 (the one before the fsspec==2023.9.0 that breaks ome-zarr-py atm) until we know how to address this.


The very confusing thing: Commit c471dd1, which passed the CI above (the one before my latest changes) now also fails when I run the test this way. Even the current main branch seems to fail with the same error as above, at least in my local testing. Thus, it appears that some dependency changed since to trigger this.

I checked ome-zarr (I had it working in 0.7.1 before, now it's at 0.8.0. But tests in main still fail with it).

Also checked Zarr (I had it working with zarr-2.13.6, now it's at 2.16.1. But test still fail with 2.13.6)

Also checked fsspec (I had it working with fsspec==2023.5.0, now it's at 2023.9.0) => downgrading fsspec to 2023.5.0 made tests pass again locally. Independent of the ome-zarr & zarr version above

@jluethi
Copy link
Contributor Author

jluethi commented Sep 4, 2023

The most recent commit fixes fsspec<=2023.6.0 and limits anndata to anndata>=0.8.0,<=0.9.1.

Would be great to run the online CI again with this, locally it's working again. And that would stop potential anndata changes to interfere with faim-hcs for the moment

@tibuch
Copy link
Contributor

tibuch commented Sep 14, 2023

@jluethi is there somewhere a Fractal ROI Table spec to which we could link? Maybe even a versioned spec?

Otherwise this looks good to me. I think we can merge it.

@imagejan is there something you would like to see addressed?

@jluethi
Copy link
Contributor Author

jluethi commented Sep 14, 2023

We're working on a formalized Fractal ROI table spec, it's planned for October. I made another reminder issue here for it: #41

@imagejan imagejan merged commit 6a1cd2b into fmi-faim:main Sep 14, 2023
5 checks passed
@jluethi jluethi deleted the fractal-roi-tables branch September 15, 2023 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants