Extend NCZarr to support unlimited dimensions.
The existing NCZarr extensions to Zarr are modified to support unlimited dimensions.
NCZarr extends the Zarr metadata for the ".zgroup" object to include netcdf-4 model extensions. This information is stored in ".zgroup" as a dictionary named "_nczarr_group".
Inside "_nczarr_group", there is a key named "dims" that stores information about netcdf-4 named dimensions. The value of "dims" is a dictionary whose keys are the names of the dimensions. The value associated with each dimension name has one of two forms:
1. An integer representing the size of the dimension, which is used for simple named dimensions.
2. A dictionary with the following keys and values:
   * "size" with an integer value representing the (current) size of the dimension.
   * "unlimited" with a value of either "1" or "0" to indicate whether this dimension is an unlimited dimension.

Form 1 is a special case of form 2 and is kept for backward compatibility. Whenever a new file is written, it uses form 2.
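As a rough illustration of how a reader might treat the two forms uniformly, the sketch below normalizes either form into a single size/unlimited pair. The types `DimEntry` and `DimInfo` and the function `dim_normalize` are hypothetical names invented for this example and are not part of the NCZarr code; it also assumes that form 1 always describes a fixed-size dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of one value in "dims" (not an NCZarr type). */
typedef enum { DIM_FORM_INT = 1, DIM_FORM_DICT = 2 } DimForm;

typedef struct DimEntry {
    DimForm form;            /* which of the two forms was read */
    unsigned long long size; /* form 1: the integer itself; form 2: "size" */
    int unlimited;           /* form 2 only: the "unlimited" flag (1 or 0) */
} DimEntry;

/* Normalized description of a dimension, equivalent to form 2. */
typedef struct DimInfo {
    unsigned long long size;
    int unlimited;
} DimInfo;

/* Convert either form into the normalized description.
   Assumption: a form 1 entry is never unlimited, so any flag value
   carried alongside it is ignored. */
static DimInfo dim_normalize(const DimEntry* e)
{
    DimInfo d;
    assert(e != NULL);
    d.size = e->size;
    d.unlimited = (e->form == DIM_FORM_DICT && e->unlimited) ? 1 : 0;
    return d;
}
```

Normalizing on read keeps the rest of a reader independent of which form was stored, which is one plausible way to retain backward compatibility with older files.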

For unlimited dimensions, the size is initially zero; as variables extend the length of that dimension, the size value for the dimension increases.
That dimension size is shared by all arrays referencing that dimension, so if one array extends an unlimited dimension, it is implicitly extended for all other arrays that reference that dimension.
This is the standard semantics for unlimited dimensions.
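The shared-size semantics can be sketched in a few lines of C. This is a toy model under stated assumptions, not the NCZarr implementation: `Dim`, `Array`, `array_write`, and `array_len` are hypothetical names, and arrays are one-dimensional for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a named dimension (hypothetical, not an NCZarr type). */
typedef struct Dim {
    size_t size;   /* current size; starts at 0 for an unlimited dimension */
    int unlimited; /* 1 if the dimension may grow */
} Dim;

/* Toy one-dimensional array that references a shared dimension. */
typedef struct Array {
    Dim* dim;
} Array;

/* Write `count` elements starting at `start`.
   Returns 0 on success, -1 if the write would exceed a fixed dimension. */
static int array_write(Array* a, size_t start, size_t count)
{
    size_t end = start + count;
    assert(a != NULL && a->dim != NULL);
    if (end > a->dim->size) {
        if (!a->dim->unlimited)
            return -1;
        /* Growing the dimension grows it for every array that shares it. */
        a->dim->size = end;
    }
    return 0;
}

/* The length of an array along its dimension is the dimension's size. */
static size_t array_len(const Array* a)
{
    return a->dim->size;
}
```

If two arrays point at the same unlimited `Dim`, a write of five elements through one of them makes `array_len` report 5 for both, which is the implicit extension described above.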

## Related changes.
Adding unlimited dimensions required a number of other changes to the NCZarr code-base. These included the following.
* Partially refactored the slice handling code in zwalk.c to clean it up.
* Added a number of tests for unlimited dimensions, derived from the corresponding tests in nc_test4.
* Added several NCZarr-specific unlimited tests; more are needed.
* Added a test of endianness.

## Misc. Other changes
* Fixed an obscure memory leak in ncdump.
* Removed some obsolete unit testing code and test cases.
* Uncovered a bug in the netcdf-c handling of big-endian floats and doubles; it has not been fixed yet. See tst_h5_endians.c.
* Renamed some nczarr_tests testcases to avoid name conflicts with nc_test4.
DennisHeimbigner committed Aug 15, 2023
1 parent 032b910 commit 0f6a00e
Showing 68 changed files with 4,756 additions and 757 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/run_tests_ubuntu.yml
@@ -346,6 +346,9 @@ jobs:
shell: bash -l {0}
run: |
cd build
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"
cat nczarr_test/run_specific_filters.sh
echo ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} ctest --output-on-failure -j 12 .
if: ${{ success() }}

6 changes: 4 additions & 2 deletions .github/workflows/run_tests_win_mingw.yml
@@ -27,7 +27,7 @@ jobs:
with:
msystem: MINGW64
update: true
install: git mingw-w64-x86_64-toolchain automake libtool autoconf make cmake mingw-w64-x86_64-hdf5 unzip mingw-w64-x86_64-libxml2
install: git mingw-w64-x86_64-toolchain automake libtool autoconf make cmake mingw-w64-x86_64-hdf5 unzip mingw-w64-x86_64-libxml2 mingw-w64-x86_64-ntldd

###
# Configure and build
@@ -48,7 +48,9 @@ jobs:
run: cat libnetcdf.settings

- name: (Autotools) Build Library and Utilities
run: make -j 8 LDFLAGS="-Wl,--export-all-symbols"
run: |
make -j 8 LDFLAGS="-Wl,--export-all-symbols"
find . -name '*.dll'
if: ${{ success() }}

- name: Check for plugins
1 change: 1 addition & 0 deletions RELEASE_NOTES.md
@@ -7,6 +7,7 @@ This file contains a high-level description of this package's evolution. Release

## 4.9.3 - TBD

* Extend NCZarr to support unlimited dimensions. See [Github #????](https://github.com/Unidata/netcdf-c/pull/????).
* Fix default parameters for caching of NCZarr. See [Github #2734](https://github.com/Unidata/netcdf-c/pull/2734).
* Introducing configure-time options to disable various filters, even if the required libraries are available on the system, in support of [GitHub #2712](https://github.com/Unidata/netcdf-c/pull/2712).
* Fix memory leak WRT unreclaimed HDF5 plist. See [Github #2752](https://github.com/Unidata/netcdf-c/pull/2752).
4 changes: 2 additions & 2 deletions docs/nczarr.md
@@ -53,6 +53,7 @@ are also [supported](./md_filters.html "filters").
Specifically, the model supports the following.
- "Atomic" types: char, byte, ubyte, short, ushort, int, uint, int64, uint64, string.
- Shared (named) dimensions
- Unlimited dimensions
- Attributes with specified types -- both global and per-variable
- Chunking
- Fill values
@@ -65,7 +66,6 @@ Specifically, the model supports the following.
With respect to full netCDF-4, the following concepts are
currently unsupported.
- User-defined types (enum, opaque, VLEN, and Compound)
- Unlimited dimensions
- Contiguous or compact storage

Note that contiguous and compact are not actually supported
@@ -375,7 +375,7 @@ Currently it contains the following key(s):
_\_nczarr_group\__ -- this key appears in every _.zgroup_ object.
It contains any netcdf specific group information.
Specifically it contains the following keys:
* "dims" -- the name and size of shared dimensions defined in this group.
* "dims" -- the name and size of shared dimensions defined in this group, as well as an optional flag indicating if the dimension is UNLIMITED.
* "vars" -- the name of variables defined in this group.
* "groups" -- the name of sub-groups defined in this group.
These lists allow walking the NCZarr dataset without having to use the potentially costly search operation.
3 changes: 1 addition & 2 deletions lib_flags.am
@@ -6,14 +6,13 @@
# libraries for netCDF-4.
#

AM_CPPFLAGS = -I$(top_builddir)/include -I$(top_srcdir)/include
AM_CPPFLAGS = -I$(top_builddir)/include -I$(top_srcdir)/include -I$(top_builddir) -I$(top_srcdir)
AM_LDFLAGS =

if USE_DAP
AM_CPPFLAGS += -I${top_srcdir}/oc2
endif


if ENABLE_NCZARR
AM_CPPFLAGS += -I${top_srcdir}/libnczarr
endif
2 changes: 1 addition & 1 deletion libnczarr/zarr.h
@@ -89,7 +89,7 @@ EXTERNL int NCZ_copy_fill_value(NC_VAR_INFO_T* var, void** dstp);
EXTERNL int NCZ_get_maxstrlen(NC_OBJ* obj);
EXTERNL int NCZ_fixed2char(const void* fixed, char** charp, size_t count, int maxstrlen);
EXTERNL int NCZ_char2fixed(const char** charp, void* fixed, size_t count, int maxstrlen);
EXTERNL int NCZ_copy_data(NC_FILE_INFO_T* file, NC_TYPE_INFO_T* xtype, const void* memory, size_t count, int nofill, void* copy);
EXTERNL int NCZ_copy_data(NC_FILE_INFO_T* file, NC_VAR_INFO_T* var, const void* memory, size_t count, int noclear, void* copy);
EXTERNL int NCZ_iscomplexjson(NCjson* value, nc_type typehint);

/* zwalk.c */
1 change: 1 addition & 0 deletions libnczarr/zcache.h
@@ -67,5 +67,6 @@ extern size64_t NCZ_cache_size(NCZChunkCache* cache);
extern int NCZ_buildchunkpath(NCZChunkCache* cache, const size64_t* chunkindices, struct ChunkKey* key);
extern int NCZ_ensure_fill_chunk(NCZChunkCache* cache);
extern int NCZ_reclaim_fill_chunk(NCZChunkCache* cache);
extern int NCZ_chunk_cache_modify(NCZChunkCache* cache, const size64_t* indices);

#endif /*ZCACHE_H*/
28 changes: 22 additions & 6 deletions libnczarr/zchunking.c
@@ -10,7 +10,7 @@
static int pcounter = 0;

/* Forward */
static int compute_intersection(const NCZSlice* slice, const size64_t chunklen, NCZChunkRange* range);
static int compute_intersection(const NCZSlice* slice, const size64_t chunklen, unsigned char isunlimited, NCZChunkRange* range);
static void skipchunk(const NCZSlice* slice, NCZProjection* projection);
static int verifyslice(const NCZSlice* slice);

@@ -20,30 +20,44 @@ static int verifyslice(const NCZSlice* slice);
absolute position) of the first chunk that intersects the slice
and the index of the last chunk that intersects the slice.
In practice, the count = last - first + 1 is stored instead of the last index.
Note that this n-dim array of indices may have holes in it if the slice stride
is greater than the chunk length.
@param rank variable rank
@param slices the complete set of slices |slices| == R
@param ncr (out) the vector of computed chunk ranges.
@return NC_EXXX error code
*/
int
NCZ_compute_chunk_ranges(
int rank, /* variable rank */
struct Common* common,
const NCZSlice* slices, /* the complete set of slices |slices| == R*/
const size64_t* chunklen, /* the chunk length corresponding to the dimensions */
NCZChunkRange* ncr)
{
int stat = NC_NOERR;
int i;
int rank = common->rank;

for(i=0;i<rank;i++) {
if((stat = compute_intersection(&slices[i],chunklen[i],&ncr[i])))
if((stat = compute_intersection(&slices[i],common->chunklens[i],common->isunlimited[i],&ncr[i])))
goto done;
}

done:
return stat;
}

/**
@param Compute chunk range for a single slice.
@param chunklen size of the chunk
@param isunlimited if corresponding dim is unlimited
@param range (out) the range of chunks covered by this slice
@return NC_EXX error code
*/
static int
compute_intersection(
const NCZSlice* slice,
const size64_t chunklen,
size64_t chunklen,
unsigned char isunlimited,
NCZChunkRange* range)
{
range->start = floordiv(slice->start, chunklen);
@@ -53,6 +67,9 @@ compute_intersection(

/**
Compute the projection of a slice as applied to n'th chunk.
A projection defines the set of grid points touched within a
chunk by a slice. This set of points is the "projection"
of the slice onto the chunk.
This is somewhat complex because:
1. for the first projection, the start is the slice start,
but after that, we have to take into account that for
@@ -295,4 +312,3 @@ clearallprojections(NCZAllProjections* nap)
}
}
#endif
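
The comment on NCZ_compute_chunk_ranges describes finding the first chunk that intersects a slice and storing the count = last - first + 1 rather than the last index. The sketch below works through that arithmetic for a single dimension. It is a simplified illustration, not the code from zchunking.c: `ChunkSpan` and `chunk_span` are hypothetical names, the slice stop is assumed to be exclusive, and neither stride holes nor unlimited dimensions are modeled.

```c
#include <assert.h>

typedef unsigned long long u64;

/* Hypothetical result type: first intersecting chunk and number of chunks. */
typedef struct ChunkSpan {
    u64 first; /* index of the first chunk touched by the slice */
    u64 count; /* last - first + 1 */
} ChunkSpan;

/* Compute the span of chunks touched by the half-open range [start, stop).
   Assumptions: chunklen > 0 and the range is non-empty. */
static ChunkSpan chunk_span(u64 start, u64 stop, u64 chunklen)
{
    ChunkSpan s;
    u64 last;
    assert(chunklen > 0 && stop > start);
    s.first = start / chunklen;     /* floor division */
    last = (stop - 1) / chunklen;   /* chunk holding the final element */
    s.count = last - s.first + 1;
    return s;
}
```

For example, with a chunk length of 4, the range [3, 11) touches chunks 0, 1, and 2, so the span is first = 0 and count = 3.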

24 changes: 12 additions & 12 deletions libnczarr/zchunking.h
@@ -8,8 +8,6 @@

#include "ncexternl.h"

/* Callback functions so we can use with unit tests */

typedef int (*NCZ_reader)(void* source, size64_t* chunkindices, void** chunkdata);
struct Reader {void* source; NCZ_reader read;};

@@ -29,6 +27,10 @@ typedef struct NCZSlice {
size64_t len; /* full dimension length */
} NCZSlice;

/* A projection defines the set of grid points
for a given set of slices as projected onto
a single chunk.
*/
typedef struct NCProjection {
int id;
int skip; /* Should this projection be skipped? */
@@ -54,30 +56,31 @@ typedef struct NCZSliceProjections {
the chunk */
} NCZSliceProjections;

/* Combine some values to simplify internal argument lists */
/* Combine some values to avoid having to pass long argument lists*/
struct Common {
NC_FILE_INFO_T* file;
NC_VAR_INFO_T* var;
struct NCZChunkCache* cache;
int reading; /* 1=> read, 0 => write */
int rank;
int scalar; /* 1 => scalar variable */
size64_t* dimlens;
size64_t* chunklens;
size64_t* memshape;
size64_t dimlens[NC_MAX_VAR_DIMS];
unsigned char isunlimited[NC_MAX_VAR_DIMS];
size64_t chunklens[NC_MAX_VAR_DIMS];
size64_t memshape[NC_MAX_VAR_DIMS];
void* memory;
size_t typesize;
size64_t chunkcount; /* computed product of chunklens; warning indices, not bytes */
int swap; /* var->format_info_file->native_endianness == var->endianness */
size64_t shape[NC_MAX_VAR_DIMS]; /* shape of the output hyperslab */
NCZSliceProjections* allprojections;
/* Parametric chunk reader so we can do unittests */
/* Parametric chunk reader */
struct Reader reader;
};

/**************************************************/
/* From zchunking.c */
EXTERNL int NCZ_compute_chunk_ranges(int rank, const NCZSlice*, const size64_t*, NCZChunkRange* ncr);
EXTERNL int NCZ_compute_chunk_ranges(struct Common*, const NCZSlice*, NCZChunkRange* ncr);
EXTERNL int NCZ_compute_projections(struct Common*, int r, size64_t chunkindex, const NCZSlice* slice, size_t n, NCZProjection* projections);
EXTERNL int NCZ_compute_per_slice_projections(struct Common*, int rank, const NCZSlice*, const NCZChunkRange*, NCZSliceProjections* slp);
EXTERNL int NCZ_compute_all_slice_projections(struct Common*, const NCZSlice* slices, const NCZChunkRange*, NCZSliceProjections*);
@@ -94,10 +97,7 @@ EXTERNL size64_t NCZ_computelinearoffset(size_t, const size64_t*, const size64_t
/* Special entry points for unit testing */
struct Common;
struct NCZOdometer;
EXTERNL int NCZ_projectslices(size64_t* dimlens,
size64_t* chunklens,
NCZSlice* slices,
struct Common*, struct NCZOdometer**);
EXTERNL int NCZ_projectslices(struct Common*, NCZSlice* slices, struct NCZOdometer**);
EXTERNL int NCZ_chunkindexodom(int rank, const NCZChunkRange* ranges, size64_t*, struct NCZOdometer** odom);
EXTERNL void NCZ_clearsliceprojections(int count, NCZSliceProjections* slpv);
EXTERNL void NCZ_clearcommon(struct Common* common);
23 changes: 10 additions & 13 deletions libnczarr/zdim.c
@@ -83,8 +83,8 @@ NCZ_def_dim(int ncid, const char *name, size_t len, int *idp)
if ((stat = nc4_check_name(name, norm_name)))
return stat;

/* Since unlimited is not supported, len > 0 */
if(len <= 0)
/* Since unlimited is supported, len >= 0 */
if(len < 0)
return NC_EDIMSIZE;

/* For classic model: dim length has to fit in a 32-bit unsigned
@@ -110,10 +110,14 @@ NCZ_def_dim(int ncid, const char *name, size_t len, int *idp)
if ((stat = nc4_dim_list_add(grp, norm_name, len, -1, &dim)))
return stat;

/* Create struct for NCZ-specific dim info. */
if (!(dim->format_dim_info = calloc(1, sizeof(NCZ_DIM_INFO_T))))
return NC_ENOMEM;
((NCZ_DIM_INFO_T*)dim->format_dim_info)->common.file = h5;
{
NCZ_DIM_INFO_T* diminfo = NULL;
/* Create struct for NCZ-specific dim info. */
if (!(diminfo = calloc(1, sizeof(NCZ_DIM_INFO_T))))
return NC_ENOMEM;
dim->format_dim_info = diminfo;
diminfo->common.file = h5;
}

/* Pass back the dimid. */
if (idp)
@@ -269,10 +273,3 @@ NCZ_rename_dim(int ncid, int dimid, const char *name)

return NC_NOERR;
}

int
NCZ_inq_unlimdims(int ncid, int *ndimsp, int *unlimdimidsp)
{
if(ndimsp) *ndimsp = 0;
return NC_NOERR;
}
4 changes: 2 additions & 2 deletions libnczarr/zdispatch.c
@@ -38,7 +38,7 @@ static const NC_Dispatch NCZ_dispatcher = {
NCZ_def_dim,
NCZ_inq_dimid,
NCZ_inq_dim,
NCZ_inq_unlimdim,
NC4_inq_unlimdim,
NCZ_rename_dim,

NCZ_inq_att,
@@ -65,7 +65,7 @@ static const NC_Dispatch NCZ_dispatcher = {
NCZ_def_var_fill,

NCZ_show_metadata,
NCZ_inq_unlimdims,
NC4_inq_unlimdims,

NCZ_inq_ncid,
NCZ_inq_grps,
37 changes: 29 additions & 8 deletions libnczarr/zodom.c
@@ -25,8 +25,8 @@ nczodom_new(int rank, const size64_t* start, const size64_t* stop, const size64_
odom->properties.start0 = 1; /* assume */
for(i=0;i<rank;i++) {
odom->start[i] = (size64_t)start[i];
odom->stop[i] = (size64_t)stop[i];
odom->stride[i] = (size64_t)stride[i];
odom->stop[i] = (size64_t)stop[i];
odom->len[i] = (size64_t)len[i];
if(odom->start[i] != 0) odom->properties.start0 = 0;
if(odom->stride[i] != 1) odom->properties.stride1 = 0;
@@ -131,11 +131,11 @@ buildodom(int rank, NCZOdometer** odomp)
if((odom = calloc(1,sizeof(NCZOdometer))) == NULL)
goto done;
odom->rank = rank;
if((odom->start=malloc(sizeof(size64_t)*rank))==NULL) goto nomem;
if((odom->stop=malloc(sizeof(size64_t)*rank))==NULL) goto nomem;
if((odom->stride=malloc(sizeof(size64_t)*rank))==NULL) goto nomem;
if((odom->len=malloc(sizeof(size64_t)*rank))==NULL) goto nomem;
if((odom->index=malloc(sizeof(size64_t)*rank))==NULL) goto nomem;
if((odom->start=calloc(1,(sizeof(size64_t)*rank)))==NULL) goto nomem;
if((odom->stop=calloc(1,(sizeof(size64_t)*rank)))==NULL) goto nomem;
if((odom->stride=calloc(1,(sizeof(size64_t)*rank)))==NULL) goto nomem;
if((odom->len=calloc(1,(sizeof(size64_t)*rank)))==NULL) goto nomem;
if((odom->index=calloc(1,(sizeof(size64_t)*rank)))==NULL) goto nomem;
*odomp = odom; odom = NULL;
}
done:
@@ -168,7 +168,6 @@ nczodom_skipavail(NCZOdometer* odom)
odom->index[odom->rank-1] = odom->stop[odom->rank-1];
}

#if 0
size64_t
nczodom_laststride(const NCZOdometer* odom)
{
@@ -182,4 +181,26 @@ nczodom_lastlen(const NCZOdometer* odom)
assert(odom != NULL && odom->rank > 0);
return odom->len[odom->rank-1];
}
#endif

void
nczodom_print(const NCZOdometer* odom)
{
size_t i;
fprintf(stderr,"odom{rank=%d offset=%llu avail=%llu",odom->rank,nczodom_offset(odom),nczodom_avail(odom));
fprintf(stderr," start=(");
for(i=0;i<odom->rank;i++) {fprintf(stderr,"%s%llu",(i==0?"":" "),(unsigned long long)odom->start[i]);}
fprintf(stderr,")");
fprintf(stderr," stride=(");
for(i=0;i<odom->rank;i++) {fprintf(stderr,"%s%llu",(i==0?"":" "),(unsigned long long)odom->stride[i]);}
fprintf(stderr,")");
fprintf(stderr," stop=(");
for(i=0;i<odom->rank;i++) {fprintf(stderr,"%s%llu",(i==0?"":" "),(unsigned long long)odom->stop[i]);}
fprintf(stderr,")");
fprintf(stderr," len=(");
for(i=0;i<odom->rank;i++) {fprintf(stderr,"%s%llu",(i==0?"":" "),(unsigned long long)odom->len[i]);}
fprintf(stderr,")");
fprintf(stderr," index=(");
for(i=0;i<odom->rank;i++) {fprintf(stderr,"%s%llu",(i==0?"":" "),(unsigned long long)odom->index[i]);}
fprintf(stderr,")");
fprintf(stderr,"}\n");
}
5 changes: 4 additions & 1 deletion libnczarr/zodom.h
@@ -11,8 +11,8 @@ struct NCZSlice;
typedef struct NCZOdometer {
int rank; /*rank */
size64_t* start;
size64_t* stop; /* start + (count*stride) */
size64_t* stride;
size64_t* stop; /* start + (count*stride) */
size64_t* len; /* for computing offset */
size64_t* index; /* current value of the odometer*/
struct NCZOprop {
@@ -34,5 +34,8 @@ extern void nczodom_reset(NCZOdometer* odom);
extern void nczodom_free(NCZOdometer*);
extern size64_t nczodom_avail(const NCZOdometer*);
extern void nczodom_skipavail(NCZOdometer* odom);
extern size64_t nczodom_laststride(const NCZOdometer* odom);
extern size64_t nczodom_lastlen(const NCZOdometer* odom);
extern void nczodom_print(const NCZOdometer* odom);

#endif /*ZODOM_H*/
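
The NCZOdometer struct carries per-dimension start, stride, stop, and index vectors. The general odometer idea can be sketched as below: the rightmost index advances fastest and carries into the dimension to its left when it reaches its stop. This is a generic illustration, not the code in zodom.c; `Odom`, `odom_init`, `odom_more`, `odom_next`, and `odom_count` are hypothetical names, the rank is capped by a made-up constant, stops are assumed exclusive, and every dimension is assumed to have a non-empty range.

```c
#include <assert.h>
#include <stddef.h>

#define ODOM_MAX_RANK 8 /* hypothetical cap to avoid dynamic allocation */

typedef struct Odom {
    int rank;
    size_t start[ODOM_MAX_RANK];
    size_t stride[ODOM_MAX_RANK];
    size_t stop[ODOM_MAX_RANK];  /* exclusive upper bound */
    size_t index[ODOM_MAX_RANK]; /* current position */
} Odom;

/* Set up the odometer and place the index at the start position. */
static void odom_init(Odom* o, int rank, const size_t* start,
                      const size_t* stop, const size_t* stride)
{
    int i;
    assert(o != NULL && rank > 0 && rank <= ODOM_MAX_RANK);
    o->rank = rank;
    for (i = 0; i < rank; i++) {
        assert(stride[i] > 0 && start[i] < stop[i]);
        o->start[i] = start[i];
        o->stride[i] = stride[i];
        o->stop[i] = stop[i];
        o->index[i] = start[i];
    }
}

/* The odometer is exhausted once the leftmost index passes its stop. */
static int odom_more(const Odom* o)
{
    return o->index[0] < o->stop[0];
}

/* Advance the rightmost index; on overflow, reset it and carry left. */
static void odom_next(Odom* o)
{
    int i;
    for (i = o->rank - 1; i >= 0; i--) {
        o->index[i] += o->stride[i];
        if (o->index[i] < o->stop[i])
            return;
        if (i == 0)
            return; /* leave index[0] >= stop[0] to signal exhaustion */
        o->index[i] = o->start[i];
    }
}

/* Count the positions visited from the current index to exhaustion. */
static size_t odom_count(Odom* o)
{
    size_t n = 0;
    for (; odom_more(o); odom_next(o))
        n++;
    return n;
}
```

With start = (0, 1), stop = (2, 6), and stride = (1, 2), the odometer visits (0,1), (0,3), (0,5), (1,1), (1,3), (1,5), for six positions in total.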