Add ROI tables, dimension separator & build_omero_channel_metadata changes #32

jluethi · 2023-06-08T09:40:04Z

@tibuch @imagejan Super cool library for converting MD images into OME-Zarrs! 👏🏻

I would like to add Fractal support to use this library. That means 2 things:

Add ROI tables to the MD conversion (because Fractal relies on ROI tables for downstream processing)
Add actual Fractal wrapper tasks

This PR contains a proposal to add both things to this repository, as well as a few minor changes I'll go in detail about below (and that we can also move to a different PR, if that's preferred).

ROI tables

We typically rely on 2 types of ROI tables being created by the OME-Zarr converter: a whole-well ROI (trivial, but it allows us to use ROI-based loading in a general manner) and FOV ROIs (describing which region of the full image belongs to which field of view from the microscope).

To add FOV tables, this needs to be integrated into the core calculations of loading the images (because that's when the decision is made on how to place the FOVs in the well image). Thus I have integrated this into the core functionality in MetaSeriesUtils and now a few additional functions return a roi_tables dictionary (keys: str = table_name, values: pd.DataFrame actual table).

Writing the tables is optional: the parsing functions now just return a dictionary of tables that can be written as AnnData tables in the OME-Zarr. I also included a helper function for this in Zarr.py, write_roi_table.
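
To make the shape of the returned roi_tables dictionary concrete, here is a minimal sketch. The table names, every column name other than x_micrometer, and all coordinate values are assumptions for illustration; this is not the code in MetaSeriesUtils.

```python
import pandas as pd

# Assumed column layout: start position and extent per axis, in micrometers.
ROI_COLUMNS = [
    "x_micrometer", "y_micrometer", "z_micrometer",
    "len_x_micrometer", "len_y_micrometer", "len_z_micrometer",
]


def build_roi_tables(fov_origins, fov_size_xy):
    """Build FOV and whole-well ROI tables from FOV origins (x, y)."""
    len_x, len_y = fov_size_xy
    # 2D planes: z starts at 0 and the z extent is a placeholder of 1.
    fov_rows = {
        f"FOV_{i + 1}": [x, y, 0.0, len_x, len_y, 1.0]
        for i, (x, y) in enumerate(fov_origins)
    }
    fov_table = pd.DataFrame.from_dict(
        fov_rows, orient="index", columns=ROI_COLUMNS
    )

    # The whole-well ROI is the bounding box of all FOVs.
    x0 = fov_table["x_micrometer"].min()
    y0 = fov_table["y_micrometer"].min()
    x1 = (fov_table["x_micrometer"] + fov_table["len_x_micrometer"]).max()
    y1 = (fov_table["y_micrometer"] + fov_table["len_y_micrometer"]).max()
    well_table = pd.DataFrame(
        [[x0, y0, 0.0, x1 - x0, y1 - y0, 1.0]],
        index=["well_1"],
        columns=ROI_COLUMNS,
    )
    # keys: table name (str), values: the actual table (pd.DataFrame)
    return {"FOV_ROI_table": fov_table, "well_ROI_table": well_table}


roi_tables = build_roi_tables(
    fov_origins=[(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)],
    fov_size_xy=(100.0, 100.0),
)
print(roi_tables["well_ROI_table"])
```

Each value is an ordinary DataFrame, so a caller can inspect or filter the ROIs before deciding whether to write them.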

Unfortunately, this does mean a breaking change to the write_cyx_image_to_well & write_czyx_image_to_well API, in that they now return an additional output (the roi_tables dictionary). I have updated all tests, examples and usages of these functions in this package. But if e.g. the prefect tasks depend on this repository, those scripts would need a small adaptation (e.g. just adding a ", _" to the outputs of write_cyx_image_to_well & write_czyx_image_to_well to ignore the roi_tables).
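
For callers, the adaptation amounts to unpacking one more return value. The sketch below uses a hypothetical stand-in function rather than the real write_cyx_image_to_well; what the real functions return besides roi_tables is an assumption here.

```python
def write_image_to_well_stub():
    """Hypothetical stand-in for the changed writer functions.

    Assumption: the previous outputs come first and the new roi_tables
    dictionary is appended as the last return value.
    """
    previous_outputs = ["output_channel_0", "output_channel_1"]
    roi_tables = {"FOV_ROI_table": "table placeholder"}
    return previous_outputs, roi_tables


# Callers that do not need the ROI tables discard them with an underscore.
previous_outputs, _ = write_image_to_well_stub()

# Callers that want them bind the extra value.
previous_outputs, roi_tables = write_image_to_well_stub()
print(sorted(roi_tables))
```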

All other changes should be transparent to the usage of this library, as far as I can tell.

Including the ROI tables writing in the main package (and being able to read them in the tests) does mean that anndata would become a core dependency.

Fractal tasks

I included 2 Fractal tasks in a subpackage. These tasks allow users to parse 2D, 3D or mixed MD images into an OME-Zarr & write the ROI tables.

We could also have these tasks in a separate package if you prefer. It should not interfere with anything in the main package though.

The tasks are in their sub package and running them in Fractal comes with an extra dependency on fractal-tasks-core (we're just refactoring this to be much lighter) & pydantic. This dependency is only needed when installing it to run in Fractal. Thus, they are in an extra fractal-tasks.

In addition, there is a small json manifest for the tasks, some tests to check their output & a small example in the examples folder using your example data from resources.

Minor suggested changes

a) Changes to build_omero_channel_metadata: I added a check that start and end are not the same value (this leads to issues in napari otherwise). I'm also proposing to switch to 0.999 quantile rescaling to make the visualization a bit less harsh by default (alternatively, we could expose the quantiles as an option the user can set).
b) dimension_separator: I hard-coded dimension_separator = "/" for now. I don't fully understand how the default is picked, but sometimes "." gets picked as the dimension separator for me, and that does not work with napari-ome-zarr for viewing the images. We can of course also make this an option that is passed through everything.
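
The rescaling change in a) can be sketched roughly as follows. This is an illustration under assumptions, not the body of build_omero_channel_metadata: the function name, the lower quantile default, and the choice to widen a zero-width window by one intensity unit are all hypothetical.

```python
import numpy as np


def display_window(image, lower_quantile=0.001, upper_quantile=0.999):
    """Return a (start, end) display range with start strictly below end."""
    start = float(np.quantile(image, lower_quantile))
    end = float(np.quantile(image, upper_quantile))
    if start == end:
        # A constant image would give a zero-width window, which viewers
        # handle poorly; widen it by one intensity unit (assumed policy).
        end = start + 1.0
    return start, end


rng = np.random.default_rng(0)
noisy = rng.integers(0, 4096, size=(64, 64))
flat = np.full((64, 64), 120)

print(display_window(noisy))
print(display_window(flat))  # (120.0, 121.0)
```

Exposing the two quantiles as parameters, as suggested above, would let users tighten or relax the default contrast without touching the guard.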

Summary

I'm proposing:

Modify the API of write_cyx_image_to_well & write_czyx_image_to_well slightly to also output ROI tables
Add an anndata dependency to the package
Add 2 fractal tasks to a subpackage
Add an extra for fractal-tasks with their additional dependencies
Change the behavior of build_omero_channel_metadata slightly
Hard-code the dimension_separator to /
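
For context on the last point: in Zarr v2, the dimension separator decides how chunk indices are joined into store keys, which is why a reader expecting one layout cannot find chunks written with the other. A small pure-Python illustration (the helper name chunk_key is hypothetical and not part of zarr):

```python
def chunk_key(chunk_index, dimension_separator):
    """Join chunk indices into the key under which a Zarr v2 chunk is stored."""
    return dimension_separator.join(str(i) for i in chunk_index)


index = (0, 2, 5)  # chunk position along (c, y, x)
print(chunk_key(index, "."))  # 0.2.5 : flat layout, one entry per chunk
print(chunk_key(index, "/"))  # 0/2/5 : nested layout, one directory per index
```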

jluethi · 2023-06-08T09:56:41Z

Fixed the pre-commit issues now. It looks like one of my new tests for the Fractal tasks fails in the CI, although it passes locally. I will need to look into this afterwards.

tibuch · 2023-06-09T09:13:33Z

Hi @jluethi,

Thank you for opening this PR. Extracting the information of the individual fields of view during the data parsing and handing it back makes a lot of sense. I also like adding support to write tables to AnnData following the emerging NGFF spec.

However, I would like to keep the core processing functionality strictly separated from any workflow orchestration tool.

Could you refactor the PR such that only core functionality is added to faim-hcs? We could also schedule a pair-programming session where we go over the PR together.

Cheers!

jluethi · 2023-06-09T09:21:35Z

Hey @tibuch

Thanks for the feedback. Yes, I can refactor it that way and make a new repository for the actual task, which will depend on the faim-hcs repository. Are there plans to deploy this repository to pypi for easier dependency handling?

I'll ping you for a pair-programming session to go over the refactored PR once I have it ready.

tibuch · 2023-06-09T09:27:09Z

Currently there are no plans, but if this makes development easier on your side we can put it on pypi. We are using it like this at the moment:
setup.py:

install_requires =
    faim-hcs@git+https://github.com/fmi-faim/[email protected]

jluethi · 2023-06-23T08:14:55Z

Hey @tibuch

I went through all the suggested changes from our meeting on Monday (cleanup, using the output of compute_z_sampling for the Z extent of the ROIs, passing the dimension separator & pyramid settings through all the functions, refactoring the ROI table creation & writing a bit to reduce redundancies).

I have one additional suggestion to add to this PR (see below) that came up. Other than that, it would be ready from my side.

During the work on pyramid level specification, I noticed that it's hard to solve ome/napari-ome-zarr#84 with just the two existing parameters. Thus, I'm suggesting we add an additional parameter there:

Add a lowest_res_target parameter to the chunk size calculation in _compute_chunk_size_cyx to replace the check on whether the calculated resolution should be the lowest resolution created.
Reason: A chunk size of 2048 is a good target for most resolutions. But with the current setup, a 384-well plate would get chunks of 1024 each at the lowest resolution pyramid level. With 24 wells per row, that leads to a plate image 24'576 pixels wide (24 × 1024), which is too large for most GPUs to display in napari (see ome/napari-ome-zarr#84)

⇒ A user should be able to override this, i.e. set it to 512 or 256 to enforce that lower-resolution pyramid levels are created, and it shouldn't be tied to the optimal chunk size at higher resolutions.

By default, the behavior will remain as it was before (I think that’s a reasonable default for anything below 384 well plates)

There is some interaction with max_levels (as before): whichever condition is hit first limits the number of pyramid levels. That can be a bit confusing, but it also gives useful flexibility. For now I have kept it in (behavior as before).
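
The interaction between the two limits can be sketched as below. This is a hypothetical helper, not the actual _compute_chunk_size_cyx; the starting well size of 8192 pixels, the factor-of-2 downsampling and the default values are assumptions chosen to reproduce the 1024-per-well case described above.

```python
def plan_pyramid(well_size, lowest_res_target=1024, max_levels=4):
    """Return the well size per pyramid level, halving until a limit is hit."""
    sizes = [well_size]
    # Stop as soon as either condition is met: the level is small enough,
    # or the maximum number of levels has been reached.
    while sizes[-1] > lowest_res_target and len(sizes) < max_levels:
        sizes.append(sizes[-1] // 2)
    return sizes


WELLS_PER_ROW = 24  # a 384-well plate has 16 rows x 24 columns

for target in (1024, 256):
    sizes = plan_pyramid(8192, lowest_res_target=target, max_levels=10)
    plate_width = WELLS_PER_ROW * sizes[-1]
    print(target, sizes, plate_width)

# With a low target but few allowed levels, max_levels is hit first.
print(plan_pyramid(8192, lowest_res_target=256, max_levels=4))
```

With a target of 1024 the coarsest plate image is 24'576 pixels wide; lowering the target to 256 adds two more levels and brings it down to 6'144 pixels.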

imagejan · 2023-07-04

I just added a few comments.

My main questions:

Shouldn't the column naming (x_micrometer etc.) be a bit more flexible and not hard-code the unit?
Why did you remove the optional inclusion of the z_position from the metadata here? That's unrelated to the topic of this pull request, isn't it?

jluethi · 2023-07-04T09:37:59Z

Thanks @imagejan ! I'll update the PR to address these docstring & pytest suggestions.

Regarding the main questions:

Shouldn't the column naming (x_micrometer etc.) be a bit more flexible and not hard-code the unit?

Yes, this should eventually become more general. The first version of Fractal ROIs has these hard-coded column names though and this PR adds support for such Fractal ROIs. There should be more generic ROIs eventually, but I'm not fully sure what direction they'd go (maybe the shapes variant of spatial data regions, see https://spatialdata.scverse.org/en/latest/design_doc.html#regions-of-interest ? Or some other broader approach?).

Why did you remove the optional inclusion of the z_position from the metadata here? That's unrelated to the topic of this pull request, isn't it?

I assume you mean include_z_position in get_well_image_CYX?
Agreed, I didn't remove it at first. During a review with @tibuch , we noticed that the variable is not used anymore in the get_well_image_CYX function since ba0f7ff (it was left in the docstring, but removed from the actual function). Plus, I'm not sure what the z_position would be in 2D only images anyway. @tibuch suggested removing it as part of cleaning up this function. We can also add it back if there's a case to be made for what z_position should be in the 2D case and if we add back use of the include_z_position variable within the get_well_image_CYX function.

jluethi · 2023-07-11T11:51:24Z

@imagejan I updated the docstrings as suggested and moved the column names in the tests into a fixture. From my side, this PR is ready to be merged (upon rerunning the github automations on the latest commit, which needs to be approved by a maintainer).

imagejan · 2023-08-22

I had another look at this pull request and added a few more nitpicky comments. I hope that @tibuch and I can finish reviewing this next week and finally merge.

imagejan · 2023-08-22T16:52:57Z

src/faim_hcs/MetaSeriesUtils.py

+    Given that Fractal ROI tables are always 3D, but we only stitch the xy
+    planes here, the z starting position is always 0 and the
+    z extent is set to 1. This is overwritten downsteam if the 2D planes are
+    assembled into a 3D stack.


faim-hcs doesn't depend on or link to Fractal, so I would rephrase this sentence to not be fractal-specific. Are (field-of-view and well) ROIs always 3D as of this implementation? Or does it make sense to implement them as 2D here, and switch to 3D depending on context?

jluethi · 2023-09-03

We can't make this fully Fractal-independent, as the ROI tables are created in the Fractal ROI table format. We are working towards having a specification page for v1 of our ROI tables and I'll propose it as an addition then. I updated the wording slightly to reflect that, but it still contains a Fractal reference.
Our v1 ROI tables are always 3D. If a future version of them is more flexible on dimensionality (& column names), I'll be happy to provide an update PR.
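
The docstring under review describes 2D planes getting a z start of 0 and a placeholder z extent of 1 that is overwritten once the planes are assembled into a stack. A minimal sketch of that overwrite step, assuming the Z extent is the number of planes times the Z spacing; the function name and the column names other than x_micrometer are hypothetical:

```python
import pandas as pd


def set_z_extent(roi_table, n_planes, z_spacing_um):
    """Return a copy of a 2D-derived ROI table with a real Z extent."""
    updated = roi_table.copy()
    updated["z_micrometer"] = 0.0
    # Assumed relation: extent of the stack = number of planes x spacing.
    updated["len_z_micrometer"] = n_planes * z_spacing_um
    return updated


plane_rois = pd.DataFrame(
    {
        "x_micrometer": [0.0, 100.0],
        "y_micrometer": [0.0, 0.0],
        "z_micrometer": [0.0, 0.0],
        "len_x_micrometer": [100.0, 100.0],
        "len_y_micrometer": [100.0, 100.0],
        "len_z_micrometer": [1.0, 1.0],  # placeholder for a single plane
    },
    index=["FOV_1", "FOV_2"],
)

stack_rois = set_z_extent(plane_rois, n_planes=10, z_spacing_um=2.5)
print(stack_rois["len_z_micrometer"].tolist())  # [25.0, 25.0]
```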

imagejan · 2023-08-22T16:57:48Z

src/faim_hcs/Zarr.py

+    adata = ad.AnnData(X=df_roi)
+    adata.obs_names = roi_table.index
+    adata.var_names = list(map(str, roi_table.columns))
+    ad._io.specs.write_elem(group_tables, table_name, adata)


Judging from the _, the _io package is meant to be private. Is it safe to use it here? I mean, is it likely to break with future updates of anndata?

jluethi · 2023-09-03

Good point. There isn't a more stable table implementation available at the moment though. Given that we're only using the anndata dependency for the table handling & writing: Are you ok with pinning the anndata dependency to a given version (0.8.0 works, I can test the most recent 0.9.2 to ensure it's still fine) and only updating it manually until the table writing gets a stable home?

jluethi · 2023-09-03T17:31:24Z

Would be great if you could run the tests in the github CI again. Had some issues running them locally with the test data (yay Airport wifi? Didn't manage to debug before boarding)

I think I've addressed all the issues mentioned above. Let me know if you'd be happy with pinning the AnnData version for the time being, given that the write_elem import comes from the experimental _io subpackage.

jluethi · 2023-09-04T11:00:39Z

Tl;dr: It wasn't just a local issue. The same bug shows up locally and in the Github CI. It seems to be an fsspec issue (see below for my investigation).
Looks like this affects ome-zarr-py as well, see: ome/ome-zarr-py#302
Their error messages are very similar to ours here.
I suggest we pin fsspec to fsspec==2023.6.0 (the release before fsspec==2023.9.0, which breaks ome-zarr-py atm) until we know how to address this.

The very confusing thing: Commit c471dd1, which passed the CI above (the one before my latest changes), now also fails when I run the tests this way. Even the current main branch fails with the same error as above, at least in my local testing. Thus, it appears that some dependency has changed since then and triggers this.

I checked ome-zarr (I had it working with 0.7.1 before, now it's at 0.8.0. But the tests in main still fail with 0.7.1).

I also checked Zarr (I had it working with zarr-2.13.6, now it's at 2.16.1. But the tests still fail with 2.13.6).

I also checked fsspec (I had it working with fsspec==2023.5.0, now it's at 2023.9.0) => downgrading fsspec to 2023.5.0 made the tests pass again locally, independent of the ome-zarr & zarr versions above.

jluethi · 2023-09-04T11:06:48Z

The most recent commit pins fsspec to fsspec<=2023.6.0 and limits anndata to anndata>=0.8.0,<=0.9.1.

It would be great to run the online CI again with this; locally it's working again. This would also stop potential anndata changes from interfering with faim-hcs for the moment.

tibuch · 2023-09-14T08:34:39Z

@jluethi is there a Fractal ROI table spec somewhere that we could link to? Maybe even a versioned spec?

Otherwise this looks good to me. I think we can merge it.

@imagejan is there something you would like to see addressed?

jluethi · 2023-09-14T08:45:54Z

We're working on a formalized Fractal ROI table spec, it's planned for October. I made another reminder issue here for it: #41

jluethi and others added 24 commits May 24, 2023 16:06

Add write_roi_table function

f316614

Add ROI table calculation for 2D planes

4233cb2

Add basic scripts to process some test data (prototype for Fractal tasks)

8afc267

Cleanup MetaSeriesUtils

a12a6c6

Add ROI table generation to montage_stage_pos_image_YX and update tests

a719be3

Update Zarr tests to also expect roi_tables

5fac806

Add check to build_omero_channel_metadata for start & end values, change default upper quantile to 0.999

1e816b3

Improve 3D saving in Fractal tasks

cf3cdb4

Set the dimension_separator to / to make it work in napari

d36951d

Change Fractal tasks to use mode instead of is_2D flag

76ebb7b

Improve table metadata updates

f30dfa4

Change create ome zarr into a Fractal task

0d21214

Change fractal task filenames, organize them as actual task functions

0dff70b

Add main functions, update dependencies, add Fractal manifest

c917103

Add pydantic dependency for fractal-tasks extra

dfb04d0

Add tasks manually to Fractal

1cd635f

Make Fractal tasks collectable by Fractal server

e9211f0

Fix bug: Channels have different z len => pick largest for ROI table

d534795

Add tests to check ROI table content

57ba33c

Update examples with roi tables

726eb5f

Update Fractal example to have relative paths

ecb62bf

Add tests for Fractal tasks

8487322

[pre-commit.ci] auto fixes from pre-commit.com hooks

c26fa4c

for more information, see https://pre-commit.ci

pre-commit cleanup

8bcd1e3

jluethi added 2 commits June 13, 2023 17:52

Remove all Fractal task parts from faim-hcs

7c8a262

Update anndata dependency & import from io, not experimental

0c2103b

jluethi and others added 9 commits June 22, 2023 11:12

Change Z step calculation

0c3084e

Update expected Z length in ROI table in MetaSeries tests

c6e3214

Parse the dimension_separator through all levels

3b107ec

Add lowest_res_target to _compute_chunk_size_cyx

f0e7780

Expose max_levels, max_size, lowest_res_target in top-level functions (with defaults)

356f96b

Simplify write_roi_table

5409d19

Simplify montage_grid_image_YX

0ca98ef

Merge branch 'main' into fractal-roi-tables

4acb8a9

Refactor ROI table functions to have column names in a central place

1e4fa68

imagejan reviewed Jul 4, 2023

Update src/faim_hcs/MetaSeriesUtils.py

0dee855

Update docstring Co-authored-by: Jan Eglinger <[email protected]>

jluethi added 2 commits July 11, 2023 13:48

Improve docstrings

f57be3f

Use column names as fixture in tests

c471dd1

jluethi mentioned this pull request Jul 11, 2023

Generalized wavelength extraction & fix xy mixup in ROI tables #36

Merged

imagejan reviewed Aug 22, 2023

jluethi added 2 commits September 3, 2023 19:11

Improve montage_stage_pos_image_YX docstring

45cae48

Switch to kwargs for passing args for _compute_chunk_size_cyx

1e076c3

Fix fsspec & anndata dependencies

25bdf0d

Increase compatible anndata version

723208d

jluethi mentioned this pull request Sep 14, 2023

ROI tables: Link to specification once available #41

Closed

imagejan merged commit 6a1cd2b into fmi-faim:main Sep 14, 2023
5 checks passed

jluethi deleted the fractal-roi-tables branch September 15, 2023 08:22
