OME Zarr chunking #572

Closed · tischi opened this issue Jan 19, 2022 · 33 comments
Labels: bug (Something isn't working)

tischi (Contributor) commented Jan 19, 2022

@K-Meech @constantinpape

I have a feeling that the default chunking for OME.Zarr is not ideal.

It takes a long time to load with intermediates like this:

[screenshot: intermediate state while loading]

tischi added the enhancement label on Jan 19, 2022
constantinpape (Contributor) commented:

ome.zarr itself does not prescribe a default chunking, so this probably depends on the library that writes the data. Did you use the Java/MoBIE one or Python for this dataset? What are the chunks?
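
One quick way to answer the chunk question is to read the "chunks" entry of each scale level's .zarray metadata file. A minimal sketch (the dataset path is the one shared later in this thread):

import json
from pathlib import Path

# Print the array shape and chunk shape of every scale level in the pyramid.
root = Path("/g/cba/exchange/kimberly/SXAA03648.ome.zarr")
for zarray in sorted(root.rglob(".zarray")):
    meta = json.loads(zarray.read_text())
    print(zarray.parent.name, "shape:", meta["shape"], "chunks:", meta["chunks"])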

K-Meech (Collaborator) commented Jan 24, 2022

@tischi Do you have this dataset somewhere I could access, so I can figure out what the chunks etc. are?

tischi (Contributor, Author) commented Jan 24, 2022

Running:

-bash-4.2$ cp -r SXAA03648.ome.zarr /g/cba/exchange/kimberly/

tischi (Contributor, Author) commented Jan 25, 2022

@K-Meech @constantinpape @KateMoreva

I am now pretty sure that the issue is that, for the lower resolutions, the data at the image borders is corrupted:

[screenshot: corrupted data at the image borders]

Maybe some issue with the gzip reader or writer?

Example data is here: /g/cba/exchange/kimberly/SXAA03648.ome.zarr

tischi added the bug label and removed the enhancement label on Jan 25, 2022
K-Meech (Collaborator) commented Jan 25, 2022

Strange. I'll take a look.

K-Meech (Collaborator) commented Jan 25, 2022

@tischi - if I recall correctly, the ome-zarr writing (from within Fiji) uses the same libraries as the n5 writing. Could you try writing the same dataset as n5? Does it have the same problem?

K-Meech (Collaborator) commented Jan 25, 2022

Actually, if you just put the raw data in the same folder, I can play around with it myself :)

tischi (Contributor, Author) commented Jan 25, 2022

Should be there.

K-Meech (Collaborator) commented Jan 25, 2022

Ok - so I looked into this a bit more. It doesn't happen with n5, so it's a specific problem with the ome-zarr writing. I suspect it's something going wrong with how chunks are padded at the edges of the dataset in the lower resolution levels - probably around here: https://github.com/mobie/mobie-io/blob/main/src/main/java/org/embl/mobie/io/ome/zarr/writers/N5OMEZarrWriter.java#L227. I'll keep looking.
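
To illustrate the suspected failure mode (a sketch of the arithmetic, not the writer's actual code): the last chunk along each axis is usually smaller than the nominal chunk size, so a writer has to crop or zero-pad the block it stores at the border:

import math

# At the dataset border the last chunk only partially overlaps the data,
# so a writer must crop (or zero-pad) the block it stores there.
def border_chunk_extent(dim_size, chunk_size):
    n_chunks = math.ceil(dim_size / chunk_size)
    valid_in_last = dim_size - (n_chunks - 1) * chunk_size
    return n_chunks, valid_in_last

# e.g. a 1000-pixel axis with 96-pixel chunks: the 11th chunk holds only
# 40 valid pixels; writing a full 96-pixel block there would smear
# garbage past the border, which would look like the artifacts above.
print(border_chunk_extent(1000, 96))  # -> (11, 40)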

K-Meech (Collaborator) commented Jan 25, 2022

@constantinpape Is there an easy way to open v0.3 ome-zarr in python to inspect individual chunks? e.g. is it supported by z5py?

constantinpape (Contributor) commented:

Yes, there is some functionality to access chunks directly: https://github.com/constantinpape/z5/blob/master/src/python/module/z5py/dataset.py#L477-L529

As an example you could use it like this to check if all the chunks in a scale level of an ome.zarr file exist:

import z5py

with z5py.File("my-file.ome.zarr", "r") as f:
    ds = f["s0"]  # the name of scale level zero
    # assuming a 2d dataset here, extension to 3d is trivial
    for i in range(ds.chunks_per_dimension[0]):
        for j in range(ds.chunks_per_dimension[1]):
            chunk_id = (i, j)
            print("Have chunk", chunk_id, ":", ds.chunks_exists(chunk_id))

Hope this helps / let me know if you run into any issues.
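
For what it's worth, the "extension to 3d" mentioned in the snippet can be written for any dimensionality with itertools.product, reusing the same chunks_per_dimension / chunks_exists calls from the example above (a sketch, assuming that API):

import itertools
import z5py

# Same existence check as above, generalized to any number of dimensions.
with z5py.File("my-file.ome.zarr", "r") as f:
    ds = f["s0"]
    for chunk_id in itertools.product(*(range(n) for n in ds.chunks_per_dimension)):
        print("Have chunk", chunk_id, ":", ds.chunks_exists(chunk_id))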

tischi (Contributor, Author) commented Jan 26, 2022

I think that some chunks at the image boundary are corrupt (probably for all resolutions, but one sees it best at the low resolutions because there are fewer chunks and the effect in the rendering is more evident).

K-Meech (Collaborator) commented Jan 26, 2022

Thanks for the links @constantinpape! I'm having a few issues though. There are some slight differences in how Java writes the metadata vs. Python, which are causing problems. E.g. in each dataset's .zarray file, Java puts these lines:

"fill_value": "0",
"filters": [],

These cause errors when trying to open the dataset in z5py. Deleting the "filters" line and changing the fill value to 0 (rather than "0") fixes this. Should I change how this is written from the Java code, or could you make z5py accept these options too?

After fixing this, I can run the code you put above. But it returns false for every chunk, which seems unlikely! So perhaps there are some other metadata differences...

constantinpape (Contributor) commented:

@K-Meech what exactly are you using for writing the data? Is it based on https://github.com/saalfeldlab/n5-zarr or on something else?

> These cause errors when trying to open the dataset in z5py. Deleting the "filters" line and changing the fill value to 0 (rather than "0") fixes this. Should I change how this is written from the Java code, or could you make z5py accept these options too?

The fill value "0" is wrong. It should not be a string. The filters should actually be ok, but maybe I have just never encountered this in z5py.

> After fixing this, I can run the code you put above. But it returns false for every chunk, which seems unlikely! So perhaps there are some other metadata differences...

Ok, I know why. This is due to a recent change to the dimension separator that I don't support yet.
Could you maybe send me the link to one of these zarr files?

K-Meech (Collaborator) commented Jan 26, 2022

@constantinpape yes - it's based on https://github.com/saalfeldlab/n5-zarr, with very slight differences.
There's an ome-zarr file on: /g/cba/exchange/KIMBER~1/SXAA03648.ome.zarr

constantinpape (Contributor) commented:

> There's an ome-zarr file on: /g/cba/exchange/KIMBER~1/SXAA03648.ome.zarr

@K-Meech something with the filepath is not right.
But I think I can fix most of these things without it, will give it a try now.

constantinpape (Contributor) commented:

@K-Meech I have updated z5py so that it can deal with "filters": [] and can also read zarr with nested chunks (this is why all chunks came back empty for you before). You need to upgrade to z5py 2.0.12; it's available on conda-forge.

The string fill_value is a bug on the java side; it should be a number.
For now you can fix it manually and maybe create an issue in n5-zarr about it.
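
A quick sanity check that the upgrade took effect (assuming z5py exposes a __version__ attribute, as most Python packages do):

import z5py

# Should print 2.0.12 or newer; assumes z5py exposes __version__.
print(z5py.__version__)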

K-Meech (Collaborator) commented Jan 27, 2022

Thanks @constantinpape! I'll try again.

K-Meech (Collaborator) commented Jan 27, 2022

Ok - now I get a new error:

Traceback (most recent call last):
  File "C:/Users/meechan/Documents/Repos/general_image_analysis/check_chunks.py", line 9, in <module>
    ds = f["s0"]  # the name of scale level zero
  File "C:\Users\meechan\Anaconda3\envs\image_analysis_general\lib\site-packages\z5py\group.py", line 78, in __getitem__
    return Dataset._open_dataset(self, name.lstrip('/'))
  File "C:\Users\meechan\Anaconda3\envs\image_analysis_general\lib\site-packages\z5py\dataset.py", line 241, in _open_dataset
    ds = _z5py.open_dataset(ghandle, name)
IndexError: invalid map<K, T> key

I copied the file into my folder: /g/schwab/Kimberly/temp/SXAA03648.ome.zarr

constantinpape (Contributor) commented:

> I copied the file into my folder: /g/schwab/Kimberly/temp/SXAA03648.ome.zarr

Ok, I can access it; will check it out later.

tischi (Contributor, Author) commented Jan 27, 2022

I have already saved quite a few big files with this bug.
Can I go in and fix this manually in a single place, or do I need to rewrite all the voxel data?

K-Meech (Collaborator) commented Jan 27, 2022

You can just fix it in the metadata files, so you don't need to re-write the voxel data. You would need to adapt each '.zarray' file (inside each dataset) and change the fill_value from "0" to 0. The filters line is fine - you can leave that as is.
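
A minimal Python sketch of that metadata fix, walking the pyramid and coercing any string fill_value to a number (dataset name taken from this thread):

import json
from pathlib import Path

# Rewrite each .zarray so that fill_value is a number instead of a string.
root = Path("SXAA03648.ome.zarr")
for zarray in root.rglob(".zarray"):
    meta = json.loads(zarray.read_text())
    if isinstance(meta.get("fill_value"), str):
        meta["fill_value"] = int(meta["fill_value"])
        zarray.write_text(json.dumps(meta, indent=2))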

tischi (Contributor, Author) commented Jan 27, 2022

Ok, there are quite a lot of them because of all the resolution levels, but I will figure out some Linux sed magic to do this...

tischi (Contributor, Author) commented Jan 27, 2022

Took me some time but that did it 😓
find . -path './*/*/.zarray' -exec grep -i ': "0"' -l {} \; -exec sed -i 's/: "0"/: 0/g' {} \;

constantinpape (Contributor) commented:

> Ok, there are quite a lot of them because of all the resolution levels, but I will figure out some Linux sed magic to do this...

But keep in mind that this is not only a metadata issue. I also think that there's an issue with the border chunks.

constantinpape (Contributor) commented:

@K-Meech I can reproduce the error you see. I am investigating it now and will ping you when I know more.

constantinpape (Contributor) commented Jan 27, 2022

Ok, the dataset can't be opened in z5py because z5py does not support data stored in big endian in zarr (this is not very common, but seems to be the default way of writing it in n5-zarr). I will fix this on my side.

@K-Meech in the meantime you can just use the zarr python library to read the data. It does not contain convenience functions to read individual chunks, but you can just view the data in napari (see code snippet below). I did this for your data and I can't find any issues. So maybe the issue with the boundary chunks is not in writing but in reading them?

import zarr
import napari

# zarr.open returns a group/array directly; it is not a context manager
f = zarr.open("./SXAA03648.ome.zarr", mode="r")
data = f["s0"][:]

v = napari.Viewer()
v.add_image(data)
napari.run()
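
For reference, whether a level was written big endian can be read off the "dtype" field of its .zarray: a leading '>' means big endian in the numpy convention that zarr uses. A small sketch:

import json
from pathlib import Path

# A '>' prefix (e.g. ">u2") marks big-endian data, '<' little-endian.
for zarray in sorted(Path("SXAA03648.ome.zarr").rglob(".zarray")):
    print(zarray.parent.name, json.loads(zarray.read_text())["dtype"])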

K-Meech (Collaborator) commented Jan 27, 2022

Thanks @constantinpape! So - it turns out I see the same issues opening with python and the zarr library. Datasets s0 and s1 look fine, but s2 and s3 show issues at the edges. E.g. for dataset s3 in napari, you see weird bands at the right side.
[screenshot: dataset s3 in napari, showing bands of corrupted data at its right edge]
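
One way to localize such corruption (a rough sketch, assuming each level is a 2x downsampling of the previous one in y and x): compare a level against a naive stride-2 subsampling of the level above it; large differences concentrated at the border would match the padding suspicion above.

import numpy as np
import zarr

f = zarr.open("./SXAA03648.ome.zarr", mode="r")
s1 = f["s1"][:]
s2 = f["s2"][:]

# Naive stride-2 subsampling of the finer level as a reference.
approx = s1[..., ::2, ::2]

# Crop both arrays to their common shape before comparing.
shape = tuple(min(a, b) for a, b in zip(approx.shape, s2.shape))
sl = tuple(slice(0, n) for n in shape)
diff = np.abs(approx[sl].astype(np.float64) - s2[sl].astype(np.float64))
print("max abs difference:", diff.max())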

constantinpape (Contributor) commented:

I see @K-Meech. Then it looks like an issue with writing the data that only occurs for the higher scales.

K-Meech (Collaborator) commented Jan 27, 2022

I'll look into this some more, but I imagine it's an issue coming from upstream in the n5-zarr library. I didn't change anything in the downsampling etc. code for the version in mobie-io.

constantinpape (Contributor) commented:

> I'll look into this some more, but I imagine it's an issue coming from upstream in the n5-zarr library. I didn't change anything in the downsampling etc. code for the version in mobie-io.

Yeah, I also have the feeling that we can't fully trust n5-zarr in writing the data yet. It should be added to zarr-implementations to ensure that it really conforms to the zarr standard: zarr-developers/zarr_implementations#54.
I will add big endian support in z5 in the meantime so that we can also read it in there: constantinpape/z5#196.

K-Meech (Collaborator) commented Jan 27, 2022

Alright - I think I've got it now! This was actually a problem with the modifications I made to code from BigDataViewer for writing the different scale levels. Here: https://github.com/mobie/mobie-io/blob/develop/src/main/java/org/embl/mobie/io/n5/util/ExportScalePyramid.java#L152 there's a 'loopBack' where previously written levels are accessed (but only when writing very downsampled levels!). I hadn't updated the reading code here, so this was misbehaving. I'll check this tomorrow, but I think it should be an easy fix.

K-Meech (Collaborator) commented Mar 2, 2022

This is fixed now.
