
Two questions about converting larger than memory ND data into ome-zarr #255

Open
dpshepherd opened this issue Feb 25, 2023 · 4 comments

@dpshepherd

Hi all,

Thanks for the hard work on this package and overall on ome-ngff. We are very excited to learn that Dask arrays are now supported!

We have 4D data of shape CZYX, where typically c=17 and dtype=np.uint16. The data is generated by iterative multiplexed light-sheet imaging. The 'zyx' dimension is the same for each channel and is usually large (ranging from [256,50000,50000] to [1000,100000,100000]). The full resolution data for each channel is stored as a Zarr array on disk and can be stacked together using Dask.
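For context, the stacking step might look like the sketch below (the on-disk paths are hypothetical; dummy arrays stand in for the real zarr channels so the snippet is self-contained):

```python
import dask.array as da

# In practice each channel would be opened lazily from its zarr store, e.g.
#   channels = [da.from_zarr(f"/data/round_{c}/channel.zarr") for c in range(17)]
# (paths hypothetical). Dummy arrays stand in here.
channels = [da.zeros((8, 64, 64), chunks=(4, 32, 32), dtype="uint16") for _ in range(17)]

# Stack along a new leading axis to get the CZYX array described above.
stacked = da.stack(channels, axis=0)
print(stacked.shape)  # (17, 8, 64, 64)
```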

Two questions regarding converting this data to ome-zarr:

  1. Should we pre-calculate the multiscale data on our own given the large size? Looking through a few issues and PRs, it isn't clear to us if the Scaler() function in ome-zarr-py performs lazy down-sampling.
  2. Is there a concrete example of how to construct the metadata dictionary that contains the channel names and colors for each channel? We've found good examples on the axes and transformations, but were a bit unsure about channels. Sorry if we missed something obvious.

Thanks!

@will-moore
Member

write_image() should be able to handle a dask array and perform lazy downsampling, but we (OME) haven't tested it with data of the size you're working with, although others may have.

The Scaler class only has one way of downsampling dask arrays, which uses resize from https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/dask_utils.py#L11 to downscale, and then writes the data to disk in _write_dask_image().
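As an illustration of what "lazy downsampling" means here, a pyramid of dask arrays can be built without computing anything until each level is written. This is only a sketch using mean-pooling via dask.array.coarsen, not the resize-based approach ome-zarr-py's Scaler actually uses:

```python
import numpy as np
import dask.array as da

def lazy_pyramid(arr, levels=2):
    """Build a lazy multiscale pyramid by 2x mean-pooling the y/x axes of a CZYX array.

    Nothing is loaded or computed until a level is actually written out.
    (Illustrative sketch only; ome-zarr-py's Scaler uses resize internally.)
    """
    pyramid = [arr]
    for _ in range(levels):
        prev = pyramid[-1]
        # mean-pool 2x2 blocks over the y and x axes, trimming odd edges
        down = da.coarsen(np.mean, prev, {2: 2, 3: 2}, trim_excess=True)
        pyramid.append(down.astype(arr.dtype))
    return pyramid

full = da.zeros((2, 8, 64, 64), chunks=(1, 4, 16, 16), dtype=np.uint16)
print([level.shape for level in lazy_pyramid(full)])
# [(2, 8, 64, 64), (2, 8, 32, 32), (2, 8, 16, 16)]
```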

There was some discussion on the logic for that on the PR: #192 (comment)

There is a PR currently open to fix a bug with the resizing of the edge tiles in a dask array at #244.

There's also an issue raised about this at #237.

No, there are no helper methods for constructing channel metadata, just the example at https://ngff.openmicroscopy.org/latest/#omero-md. Apologies for the minimal docs there.
The schema (see https://github.com/ome/ngff/blob/ee4d5dab677636a28f1f65c248a751e279a0d1fe/0.4/schemas/image.schema#L97) specifies that only window and color are required. window.min and window.max give the full range of pixel values, while start and end set the rendering range from black (start) to saturated (end).
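A minimal "omero" block following the spec example linked above might look like this; the label and rendering values are placeholders to adapt for your own 17 channels:

```python
# Only "window" and "color" are required per channel by the schema; the
# label and rendering numbers here are placeholders, not defaults.
omero = {
    "channels": [
        {
            "label": "DAPI",       # optional display name
            "color": "0000FF",     # RGB hex string
            "active": True,
            "window": {
                "min": 0,          # full pixel-value range (uint16 here)
                "max": 65535,
                "start": 100,      # rendering black point
                "end": 20000,      # rendering saturation point
            },
        },
        # ...one entry per channel, in axis order
    ],
}
```

This dict would then be stored under the image group's "omero" attribute key alongside "multiscales".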

@toloudis
Contributor

Coincidentally, I have an immediate need to parameterize the order parameter, which we left at order=1 for the dask skimage rescale function:
https://github.com/ome/ome-zarr-py/blob/master/ome_zarr/scale.py#L153
Interestingly, for visualizing raw microscopy intensities, order>1 preserves detail well, but for segmentations/labels we need a low order to avoid interpolating label values.
I'll probably PR something soon on that. It could also be interesting to allow providing one's own external Scaler implementation - I can't remember if that was ever a thing.
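A tiny demonstration of why the order matters for labels (upsampling a toy label image for clarity; the same blending happens when rescaling down):

```python
import numpy as np
from skimage.transform import rescale

# A toy "label" image: background 0 with one object labelled 7.
labels = np.zeros((4, 4), dtype=np.uint8)
labels[2:, 2:] = 7

# order=0 (nearest) only ever returns existing label values;
# order=1 (linear) blends across the object boundary, producing
# fractional values that belong to no object.
nearest = rescale(labels, 2, order=0, preserve_range=True, anti_aliasing=False)
linear = rescale(labels, 2, order=1, preserve_range=True, anti_aliasing=False)

print(np.unique(nearest))  # only 0 and 7
print(np.unique(linear))   # includes intermediate values at the boundary
```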

@dpshepherd
Author

Hi all,

Thank you both for the info. We are trying with some smaller data first and hit a few technical snags. We'll work on them on our own and come back with more questions.

Thanks again!

@dpshepherd
Author

Hi all,

We ended up writing lazy downsampling code for these large datasets, as the current state of this project attempts to load the entire full-resolution array into memory to calculate the downsamples.

Because we generate the data on our own microscopes and are now doing the downsampling ourselves, it makes more sense to re-arrange the existing zarr store and then add the various OME format attributes. Otherwise, we are needlessly copying data between two zarr stores. On that note, addressing issue #258 would help us a lot, because we could then validate the result.
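For anyone following the same route, the attributes added to the re-arranged group might be built like this. It is a sketch following the NGFF 0.4 "multiscales" layout, and assumes the pyramid levels live at paths "0", "1", ... with isotropic 2x downsampling of z/y/x per level and micrometer units; all of those are placeholders to match what your own downsampling actually did:

```python
# Sketch of NGFF 0.4 "multiscales" attributes for an existing pyramid.
# Assumptions (placeholders): level arrays at paths "0", "1", ...;
# isotropic 2x z/y/x downsampling per level; micrometer units.
def multiscales_attrs(n_levels, name="image"):
    return [{
        "version": "0.4",
        "name": name,
        "axes": [
            {"name": "c", "type": "channel"},
            {"name": "z", "type": "space", "unit": "micrometer"},
            {"name": "y", "type": "space", "unit": "micrometer"},
            {"name": "x", "type": "space", "unit": "micrometer"},
        ],
        "datasets": [
            {
                "path": str(level),
                "coordinateTransformations": [
                    {"type": "scale",
                     "scale": [1.0, 2.0 ** level, 2.0 ** level, 2.0 ** level]},
                ],
            }
            for level in range(n_levels)
        ],
    }]

# This list would go into the group's .zattrs under the "multiscales" key.
print(multiscales_attrs(3)[0]["datasets"][-1]["path"])  # "2"
```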

Thanks for the guidance! I'll try to find a place to host the completed ome-zarr to see how viewing such a large dataset remotely works once everything is working.
