Skip to content

Commit

Permalink
fix: Zarr V2 empty filter serialisation (#147)
Browse files Browse the repository at this point in the history
It is no longer possible to create non-conformant Zarr V2 filter metadata by setting filters to `Some(Vec![])`. Zarr V2 arrays with `filters: []` can still be read. `zarr-python` also shared this bug.
  • Loading branch information
LDeakin authored Feb 17, 2025
1 parent 4775e84 commit 2d162fb
Show file tree
Hide file tree
Showing 3 changed files with 52 additions and 2 deletions.
4 changes: 4 additions & 0 deletions doc/correctness_issues.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
## Correctness Issues with Past Versions
- Prior to `zarrs_metadata` [v0.3.5](https://github.com/LDeakin/zarrs/releases/tag/zarrs_metadata-v0.3.5) (`zarrs` <= 0.19), it was possible for a user to create non-conformant Zarr V2 metadata with `filters: []`
- Empty filters now always correctly serialise to `null`
- `zarrs` will indefinitely support reading Zarr V2 data with `filters: []`
- `zarr-python` shared this bug (see https://github.com/zarr-developers/zarr-python/issues/2842)
- Prior to zarrs [v0.11.5](https://github.com/LDeakin/zarrs/releases/tag/v0.11.5), arrays that used the `crc32c` codec have invalid chunk checksums
- These arrays will fail to be read by Zarr implementations if they validate checksums
- These arrays can be read by zarrs if the [validate checksums](crate::config::Config#validate-checksums) global configuration option is disabled or the relevant codec option is set explicitly
Expand Down
3 changes: 3 additions & 0 deletions zarrs_metadata/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- Ensure that Zarr V2 array metadata with empty `filters` is serialised as `null` instead of `[]`

## [0.3.4] - 2025-02-13

### Added
Expand Down
47 changes: 45 additions & 2 deletions zarrs_metadata/src/v2/array.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
use derive_more::{derive::From, Display};
use serde::{Deserialize, Serialize};
use serde::{Deserialize, Serialize, Serializer};
use thiserror::Error;

use crate::{
Expand Down Expand Up @@ -77,7 +77,7 @@ pub struct ArrayMetadataV2 {
/// Either “C” or “F”, defining the layout of bytes within each chunk of the array.
pub order: ArrayMetadataV2Order,
/// A list of JSON objects providing codec configurations, or null if no filters are to be applied.
#[serde(default)]
#[serde(default, serialize_with = "serialize_v2_filters")]
pub filters: Option<Vec<MetadataV2>>,
/// If present, either the string "." or "/" defining the separator placed between the dimensions of a chunk.
#[serde(default = "chunk_key_separator_default_zarr_v2")]
Expand All @@ -92,6 +92,21 @@ pub struct ArrayMetadataV2 {
pub additional_fields: AdditionalFields,
}

#[allow(clippy::ref_option)]
fn serialize_v2_filters<S>(
filters: &Option<Vec<MetadataV2>>,
serializer: S,
) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let filters = filters.as_ref().filter(|v| !v.is_empty());
match filters {
Some(filters) => serializer.collect_seq(filters),
None => serializer.serialize_none(),
}
}

impl ArrayMetadataV2 {
/// Create Zarr V2 array metadata.
///
Expand All @@ -108,6 +123,7 @@ impl ArrayMetadataV2 {
compressor: Option<MetadataV2>,
filters: Option<Vec<MetadataV2>>,
) -> Self {
let filters = filters.filter(|v| !v.is_empty());
Self {
zarr_format: monostate::MustBe!(2u64),
shape,
Expand Down Expand Up @@ -303,3 +319,30 @@ pub enum ArrayMetadataV2Order {
/// Column-major order. The first dimension varies fastest.
F,
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn empty_filters() {
let array = ArrayMetadataV2 {
zarr_format: monostate::MustBe!(2u64),
shape: vec![10000, 10000],
chunks: vec![1000, 1000].try_into().unwrap(),
dtype: DataTypeMetadataV2::Simple("<f8".to_string()),
compressor: None,
fill_value: FillValueMetadataV2::String("NaN".to_string()),
order: ArrayMetadataV2Order::C,
filters: Some(vec![]),
dimension_separator: ChunkKeySeparator::Dot,
attributes: serde_json::Map::new(),
additional_fields: Default::default(),
};
let serialized = serde_json::to_string(&array).unwrap();
assert_eq!(
serialized,
r#"{"node_type":"array","zarr_format":2,"shape":[10000,10000],"chunks":[1000,1000],"dtype":"<f8","compressor":null,"fill_value":"NaN","order":"C","filters":null,"dimension_separator":"."}"#
);
}
}

0 comments on commit 2d162fb

Please sign in to comment.