Bgen to zarr (v2) #22

eric-czech · 2020-09-22T20:45:52Z

This is a second iteration on #21.

Notes:

The conversion to zarr is done via rechunker instead of through custom code and an intermediate store.
This depends on Xarray support in rechunker, so it won't really be ready until pangeo-data/rechunker#52 (Add rechunking for Xarray datasets) is merged. For now, the build points at my fork.
Two functions are added: bgen_to_zarr and rechunk_bgen. I expect bgen_to_zarr to be less useful than rechunk_bgen, but it's included for consistency with sgkit-vcf.vcf_reader. The main advantage of working with datasets rather than paths is that it's easier to attach custom sample/variant metadata (as is usually needed with plink/bgen) and have it go through the same rechunking flow.
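Rechunking matters here because data is read in one chunk layout and written to Zarr in another. rechunker plans that copy (including any intermediate store) under a memory limit. As a rough, hypothetical illustration only (this is not rechunker's algorithm or part of sgkit), the helper below computes, along a single axis, which source chunks each target chunk overlaps. That overlap is the read pattern a naive direct copy would incur.

```python
from typing import List, Tuple


def chunk_bounds(length: int, chunk: int) -> List[Tuple[int, int]]:
    """Return [start, stop) bounds of regular chunks covering `length` elements."""
    if length < 0 or chunk <= 0:
        raise ValueError("length must be >= 0 and chunk must be > 0")
    return [(start, min(start + chunk, length)) for start in range(0, length, chunk)]


def source_chunks_for_target(length: int, source_chunk: int, target_chunk: int) -> List[List[int]]:
    """For each target chunk along one axis, list the indexes of the source chunks it overlaps.

    Hypothetical illustration: rechunker itself plans copies across all
    dimensions at once and may route them through an intermediate store.
    """
    result = []
    for start, stop in chunk_bounds(length, target_chunk):
        first = start // source_chunk       # source chunk holding the first element
        last = (stop - 1) // source_chunk   # source chunk holding the last element
        result.append(list(range(first, last + 1)))
    return result


# 10 variants read in chunks of 3 and written in chunks of 4:
print(source_chunks_for_target(10, 3, 4))  # → [[0, 1], [1, 2], [2, 3]]
```

When source and target chunks are misaligned, every target chunk touches several source chunks. Across two dimensions this fan-out grows quickly, which is the problem rechunker is designed to handle.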

ravwojdyla · 2020-10-01T09:50:45Z

I suggest we review/merge this before https://github.com/pystatgen/sgkit/issues/256. @eric-czech is this good for review?

eric-czech · 2020-10-01T11:44:41Z

> @eric-czech is this good for review?

Yep, there are still a couple of things that will need to change after pangeo-data/rechunker#52, but otherwise it's ready to go. I haven't tested this at scale as I did with the old version, but I'll leave any problems there for later.

I also want to add an import guard for rechunker and make it an optional dependency, but I'll follow the other examples we have for it when I get there.
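An import guard for an optional dependency like rechunker could look roughly like the sketch below. This is a minimal, hypothetical helper (`optional_import` and its `feature` argument are illustrative names, not the pattern actually used in sgkit). It imports the module if it is available. Otherwise it raises an `ImportError` that names the missing package and the feature that needs it.

```python
import importlib
from types import ModuleType


def optional_import(module_name: str, feature: str) -> ModuleType:
    """Import `module_name`, or raise an ImportError explaining why it is needed.

    Hypothetical helper: `feature` is only used to make the error message clearer.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"{module_name!r} is an optional dependency required by {feature}; "
            f"install it with `pip install {module_name}`"
        ) from e


# Example: only fail when the rechunking path is actually used.
# rechunker = optional_import("rechunker", "rechunk_bgen")
```

Calling the helper lazily, inside the function that needs rechunker, keeps the base package importable without it. A module-level `try`/`except ImportError` is the simpler alternative when the whole module depends on the optional package.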

eric-czech · 2020-10-05T17:36:59Z

Hey @tomwhite, can you review this when you get a chance? I'd like to merge it here so that I can then work on merging the whole repo (as @ravwojdyla suggested).

tomwhite

This looks good to me.

eric-czech added 2 commits September 3, 2020 17:52

bgen_to_zarr implementation sgkit-dev#16

4ee4ee2

Bgen to zarr implementation sgkit-dev#16

bbdbb56

eric-czech mentioned this pull request Sep 22, 2020

Bgen to zarr (v1) #21

Closed

Pin bgen-reader version

499280c

eric-czech force-pushed the bgen_to_zarr_rechunker branch from 540eb48 to 499280c September 22, 2020 22:33

ravwojdyla mentioned this pull request Oct 1, 2020

Move sgkit-bgen to main sgkit repo sgkit-dev/sgkit#256

Closed

Change dependency to rechunker master

a55b1cc

eric-czech force-pushed the bgen_to_zarr_rechunker branch from 208e1e4 to a55b1cc October 5, 2020 17:34

tomwhite approved these changes Oct 6, 2020


eric-czech merged commit 8a87e82 into sgkit-dev:master Oct 6, 2020
