OME Zarr chunking #572

Closed · tischi opened this issue Jan 19, 2022 · 33 comments
Labels: bug (Something isn't working)

tischi (Contributor) commented Jan 19, 2022

@K-Meech @constantinpape

I have a feeling that the default chunking for OME.Zarr is not ideal.

It takes a long time to load with intermediates like this:

[screenshot: intermediate state while loading]

tischi added the enhancement label on Jan 19, 2022
constantinpape (Contributor) commented:

ome.zarr itself does not prescribe a default chunking, so this probably depends on the library that writes the data. Did you use the Java/MoBIE one or Python for this dataset? What are the chunks?
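
One quick way to answer the chunk question is to read the "chunks" entry of each scale level's .zarray metadata file. A minimal sketch (the dataset path is the one shared later in this thread):

import json
from pathlib import Path

# Print the array shape and chunk shape of every scale level in the pyramid.
root = Path("/g/cba/exchange/kimberly/SXAA03648.ome.zarr")
for zarray in sorted(root.rglob(".zarray")):
    meta = json.loads(zarray.read_text())
    print(zarray.parent.name, "shape:", meta["shape"], "chunks:", meta["chunks"])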

K-Meech (Collaborator) commented Jan 24, 2022

@tischi Do you have this dataset somewhere I could access, so I can figure out what the chunks etc. are?

tischi (Contributor, Author) commented Jan 24, 2022

Running:

-bash-4.2$ cp -r SXAA03648.ome.zarr /g/cba/exchange/kimberly/

tischi (Contributor, Author) commented Jan 25, 2022

@K-Meech @constantinpape @KateMoreva

I am now pretty sure that the issue is that, for the lower resolutions, the data at the image borders is corrupted:

[screenshot: corrupted data at the image borders]

Maybe some issue with the gzip reader or writer?

Example data is here: /g/cba/exchange/kimberly/SXAA03648.ome.zarr

tischi added the bug label and removed the enhancement label on Jan 25, 2022
K-Meech (Collaborator) commented Jan 25, 2022

Strange. I'll take a look.

K-Meech (Collaborator) commented Jan 25, 2022

@tischi - if I recall correctly, the ome-zarr writing (from within Fiji) uses the same libraries as the n5 writing. Could you try writing the same dataset as n5? Does it have the same problem?

K-Meech (Collaborator) commented Jan 25, 2022

Actually, if you just put the raw data in the same folder, I can play around with it myself :)

tischi (Contributor, Author) commented Jan 25, 2022

Should be there.

K-Meech (Collaborator) commented Jan 25, 2022

Ok - so I looked into this a bit more. It doesn't happen with n5, so it's a specific problem with the ome-zarr writing. I suspect it's something going wrong with how chunks are padded at the edges of the dataset in the lower resolution levels - probably around here: https://github.com/mobie/mobie-io/blob/main/src/main/java/org/embl/mobie/io/ome/zarr/writers/N5OMEZarrWriter.java#L227. I'll keep looking.
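
To illustrate the suspected failure mode (a sketch of the arithmetic, not the writer's actual code): the last chunk along each axis is usually smaller than the nominal chunk size, so a writer has to crop or zero-pad the block it stores at the border:

import math

# At the dataset border the last chunk only partially overlaps the data,
# so a writer must crop (or zero-pad) the block it stores there.
def border_chunk_extent(dim_size, chunk_size):
    n_chunks = math.ceil(dim_size / chunk_size)
    valid_in_last = dim_size - (n_chunks - 1) * chunk_size
    return n_chunks, valid_in_last

# e.g. a 1000-pixel axis with 96-pixel chunks: the 11th chunk holds only
# 40 valid pixels; writing a full 96-pixel block there would smear
# garbage past the border, which would look like the artifacts above.
print(border_chunk_extent(1000, 96))  # -> (11, 40)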

K-Meech (Collaborator) commented Jan 25, 2022

@constantinpape Is there an easy way to open v0.3 ome-zarr in python to inspect individual chunks? e.g. is it supported by z5py?

constantinpape (Contributor) commented:

Yes, there is some functionality to access chunks directly: https://github.com/constantinpape/z5/blob/master/src/python/module/z5py/dataset.py#L477-L529

As an example you could use it like this to check if all the chunks in a scale level of an ome.zarr file exist:

import z5py

with z5py.File("my-file.ome.zarr", "r") as f:
    ds = f["s0"]  # the name of scale level zero
    # assuming a 2d dataset here, extension to 3d is trivial
    for i in range(ds.chunks_per_dimension[0]):
        for j in range(ds.chunks_per_dimension[1]):
            chunk_id = (i, j)
            print("Have chunk", chunk_id, ":", ds.chunks_exists(chunk_id))

Hope this helps / let me know if you run into any issues.
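
For what it's worth, the "extension to 3d" mentioned in the snippet can be written for any dimensionality with itertools.product, reusing the same chunks_per_dimension / chunks_exists calls from the example above (a sketch, assuming that API):

import itertools
import z5py

# Same existence check as above, generalized to any number of dimensions.
with z5py.File("my-file.ome.zarr", "r") as f:
    ds = f["s0"]
    for chunk_id in itertools.product(*(range(n) for n in ds.chunks_per_dimension)):
        print("Have chunk", chunk_id, ":", ds.chunks_exists(chunk_id))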

tischi (Contributor, Author) commented Jan 26, 2022

I think that some chunks at the image boundary are corrupt (probably for all resolutions, but one sees it best at the low resolutions because there are fewer chunks and the effect in the rendering is more evident).

K-Meech (Collaborator) commented Jan 26, 2022

Thanks for the links @constantinpape! I'm having a few issues though. There are some slight differences in how Java writes the metadata vs. Python, which are causing problems. E.g. in each dataset's .zarray file, Java puts these lines:

"fill_value": "0",
"filters": [],

These cause errors when trying to open the dataset in z5py. Deleting the "filters" line and changing the fill value to 0 (rather than "0") fixes this. Should I change how this is written from the Java code, or could you make z5py accept these options too?

After fixing this, I can run the code you put above. But it returns false for every chunk, which seems unlikely! So perhaps there are some other metadata differences...

constantinpape (Contributor) commented:

@K-Meech what exactly are you using for writing the data? Is it based on https://github.com/saalfeldlab/n5-zarr or on something else?

> These cause errors when trying to open the dataset in z5py. Deleting the "filters" line and changing the fill value to 0 (rather than "0") fixes this. Should I change how this is written from the Java code, or could you make z5py accept these options too?

The fill value "0" is wrong. It should not be a string. The filters should actually be ok, but maybe I have just never encountered this in z5py.

> After fixing this, I can run the code you put above. But it returns false for every chunk, which seems unlikely! So perhaps there are some other metadata differences...

Ok, I know why. This is due to a recent change to the dimension separator that I don't support yet.
Could you maybe send me the link to one of these zarr files?

K-Meech (Collaborator) commented Jan 26, 2022

@constantinpape yes - it's based on https://github.com/saalfeldlab/n5-zarr, with very slight differences.
There's an ome-zarr file on: /g/cba/exchange/KIMBER~1/SXAA03648.ome.zarr

constantinpape (Contributor) commented:

> There's an ome-zarr file on: /g/cba/exchange/KIMBER~1/SXAA03648.ome.zarr

@K-Meech something with the filepath is not right.
But I think I can fix most of these things without it, will give it a try now.

constantinpape (Contributor) commented:

@K-Meech I have updated z5py so that it can deal with "filters": [] and can also read zarr with nested chunks (this is why all chunks came back empty for you before). You need to upgrade to z5py 2.0.12; it's available on conda-forge.

The string fill_value is a bug on the java side; it should be a number.
For now you can fix it manually and maybe create an issue in n5-zarr about it.
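
A quick sanity check that the upgrade took effect (assuming z5py exposes a __version__ attribute, as most Python packages do):

import z5py

# Should print 2.0.12 or newer; assumes z5py exposes __version__.
print(z5py.__version__)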

K-Meech (Collaborator) commented Jan 27, 2022

Thanks @constantinpape! I'll try again.

K-Meech (Collaborator) commented Jan 27, 2022

Ok - now I get a new error:

Traceback (most recent call last):
  File "C:/Users/meechan/Documents/Repos/general_image_analysis/check_chunks.py", line 9, in <module>
    ds = f["s0"]  # the name of scale level zero
  File "C:\Users\meechan\Anaconda3\envs\image_analysis_general\lib\site-packages\z5py\group.py", line 78, in __getitem__
    return Dataset._open_dataset(self, name.lstrip('/'))
  File "C:\Users\meechan\Anaconda3\envs\image_analysis_general\lib\site-packages\z5py\dataset.py", line 241, in _open_dataset
    ds = _z5py.open_dataset(ghandle, name)
IndexError: invalid map<K, T> key

I copied the file into my folder: /g/schwab/Kimberly/temp/SXAA03648.ome.zarr

constantinpape (Contributor) commented:

> I copied the file into my folder: /g/schwab/Kimberly/temp/SXAA03648.ome.zarr

Ok, I can access it; will check it out later.

tischi (Contributor, Author) commented Jan 27, 2022

I have already saved quite a few big files with this bug.
Can I go in and fix this manually in a single place, or do I need to rewrite all the voxel data?

K-Meech (Collaborator) commented Jan 27, 2022

You can just fix it in the metadata files, so you don't need to re-write the voxel data. You would need to adapt each '.zarray' file (inside each dataset) and change the fill_value from "0" to 0. The filters line is fine - you can leave that as is.
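
A minimal Python sketch of that metadata fix, walking the pyramid and coercing any string fill_value to a number (dataset name taken from this thread):

import json
from pathlib import Path

# Rewrite each .zarray so that fill_value is a number instead of a string.
root = Path("SXAA03648.ome.zarr")
for zarray in root.rglob(".zarray"):
    meta = json.loads(zarray.read_text())
    if isinstance(meta.get("fill_value"), str):
        meta["fill_value"] = int(meta["fill_value"])
        zarray.write_text(json.dumps(meta, indent=2))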

tischi (Contributor, Author) commented Jan 27, 2022

Ok, there are quite a lot of them because of all the resolution levels, but I will figure out some Linux sed magic to do this...

tischi (Contributor, Author) commented Jan 27, 2022

Took me some time but that did it 😓
find . -path './*/*/.zarray' -exec grep -i ': "0"' -l {} \; -exec sed -i 's/: "0"/: 0/g' {} \;

constantinpape (Contributor) commented:

> Ok, there are quite a lot of them because of all the resolution levels, but I will figure out some Linux sed magic to do this...

But keep in mind that this is not only a metadata issue. I also think that there's an issue with the border chunks.

constantinpape (Contributor) commented:

@K-Meech I can reproduce the error you see. I am investigating it now and will ping you when I know more.

constantinpape (Contributor) commented Jan 27, 2022

Ok, the dataset can't be opened in z5py because z5py does not support data stored in big endian in zarr (this is not very common, but seems to be the default way of writing it in n5-zarr). I will fix this on my side.

@K-Meech in the meantime you can just use the zarr python library to read the data. It does not contain convenience functions to read individual chunks, but you can just view the data in napari (see code snippet below). I did this for your data and I can't find any issues. So maybe the issue with the boundary chunks is not in writing but in reading them?

import zarr
import napari

# zarr.open returns a group/array directly; it is not a context manager
f = zarr.open("./SXAA03648.ome.zarr", mode="r")
data = f["s0"][:]

v = napari.Viewer()
v.add_image(data)
napari.run()
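
For reference, whether a level was written big endian can be read off the "dtype" field of its .zarray: a leading '>' means big endian in the numpy convention that zarr uses. A small sketch:

import json
from pathlib import Path

# A '>' prefix (e.g. ">u2") marks big-endian data, '<' little-endian.
for zarray in sorted(Path("SXAA03648.ome.zarr").rglob(".zarray")):
    print(zarray.parent.name, json.loads(zarray.read_text())["dtype"])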

K-Meech (Collaborator) commented Jan 27, 2022

Thanks @constantinpape! So - it turns out I see the same issues opening with python and the zarr library. Datasets s0 and s1 look fine, but s2 and s3 show issues at the edges. E.g. for dataset s3 in napari, you see weird bands at the right side.
[screenshot: dataset s3 in napari, showing bands of corrupted data at its right edge]
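
One way to localize such corruption (a rough sketch, assuming each level is a 2x downsampling of the previous one in y and x): compare a level against a naive stride-2 subsampling of the level above it; large differences concentrated at the border would match the padding suspicion above.

import numpy as np
import zarr

f = zarr.open("./SXAA03648.ome.zarr", mode="r")
s1 = f["s1"][:]
s2 = f["s2"][:]

# Naive stride-2 subsampling of the finer level as a reference.
approx = s1[..., ::2, ::2]

# Crop both arrays to their common shape before comparing.
shape = tuple(min(a, b) for a, b in zip(approx.shape, s2.shape))
sl = tuple(slice(0, n) for n in shape)
diff = np.abs(approx[sl].astype(np.float64) - s2[sl].astype(np.float64))
print("max abs difference:", diff.max())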

constantinpape (Contributor) commented:

I see @K-Meech. Then it looks like an issue with writing the data that only occurs for the higher scales.

K-Meech (Collaborator) commented Jan 27, 2022

I'll look into this some more, but I imagine it's an issue coming from upstream in the n5-zarr library. I didn't change anything in the downsampling etc. code for the version in mobie-io.

constantinpape (Contributor) commented:

> I'll look into this some more, but I imagine it's an issue coming from upstream in the n5-zarr library. I didn't change anything in the downsampling etc. code for the version in mobie-io.

Yeah, I also have the feeling that we can't fully trust n5-zarr in writing the data yet. It should be added to zarr-implementations to ensure that it really conforms to the zarr standard: zarr-developers/zarr_implementations#54.
I will add big endian support in z5 in the meantime so that we can also read it in there: constantinpape/z5#196.

K-Meech (Collaborator) commented Jan 27, 2022

Alright - I think I've got it now! This was actually a problem with the modifications I made to code from BigDataViewer for writing the different scale levels. Here: https://github.com/mobie/mobie-io/blob/develop/src/main/java/org/embl/mobie/io/n5/util/ExportScalePyramid.java#L152 there's a 'loopBack' where previously written levels are accessed (but only when writing very downsampled levels!). I hadn't updated the reading code here, so this was misbehaving. I'll check this tomorrow, but I think it should be an easy fix.

K-Meech (Collaborator) commented Mar 2, 2022

This is fixed now.
