This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Bgen to zarr (v2) #22

Merged
merged 4 commits into from
Oct 6, 2020


eric-czech
Collaborator

@eric-czech eric-czech commented Sep 22, 2020

#16

This is a second iteration on #21.

Notes:

  • The conversion to zarr is done via rechunker instead of through custom code and an intermediate store.
  • This expects Xarray integration in rechunker, so it won't really be ready until pangeo-data/rechunker#52 ("Add rechunking for Xarray datasets") is done. For now I pointed the build at my fork.
  • Two functions are added: bgen_to_zarr and rechunk_bgen. I'd expect bgen_to_zarr to be less useful than rechunk_bgen, but it's there for better consistency with sgkit-vcf.vcf_reader. The main advantage of working with datasets rather than paths is that it's easier to attach custom sample/variant metadata (as is usually needed with plink/bgen) and have it run through the same rechunking flow.
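The split between a path-based and a dataset-based entry point can be sketched roughly as below. This is an illustration of the design idea only, not the code in this PR: `read_dataset`, `rechunk_dataset`, and `path_to_store` are hypothetical names, plain dicts stand in for Xarray datasets, and the chunking parameters and their defaults are assumptions.

```python
from typing import Any, Dict

# Stand-in for an xarray.Dataset; a plain dict keeps the sketch dependency-free.
Dataset = Dict[str, Any]


def read_dataset(path: str) -> Dataset:
    """Hypothetical reader: stands in for loading a BGEN file as a dataset."""
    return {"source": path, "variables": {"genotype_probability": None}}


def rechunk_dataset(
    ds: Dataset, output: str, chunk_length: int = 10_000, chunk_width: int = 1_000
) -> Dict[str, Any]:
    """Hypothetical dataset-based entry point.

    A real implementation would hand the dataset to rechunker here; this
    sketch only records what would be written and with which chunk shape.
    """
    return {
        "output": output,
        "chunks": (chunk_length, chunk_width),
        "variables": sorted(ds["variables"]),
    }


def path_to_store(path: str, output: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical path-based convenience wrapper around the dataset flow."""
    return rechunk_dataset(read_dataset(path), output, **kwargs)


# Dataset-based use: attach custom sample metadata before rechunking,
# so it goes through the same flow as the genotype data.
ds = read_dataset("example.bgen")
ds["variables"]["sample_phenotype"] = [0.1, 0.7, 0.3]
with_metadata = rechunk_dataset(ds, "example.zarr")

# Path-based use: shorter, but there is no point at which to add metadata.
paths_only = path_to_store("example.bgen", "example.zarr")
```

In this sketch the path-based wrapper is a thin layer over the dataset-based function, which is why the dataset-based one is the more flexible of the two.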

@eric-czech eric-czech mentioned this pull request Sep 22, 2020
@eric-czech eric-czech force-pushed the bgen_to_zarr_rechunker branch from 540eb48 to 499280c Compare September 22, 2020 22:33
@ravwojdyla
Collaborator

I suggest we review/merge this before https://github.com/pystatgen/sgkit/issues/256. @eric-czech is this good for review?

@eric-czech
Collaborator Author

> @eric-czech is this good for review?

Yep. A couple of things will still need to change after pangeo-data/rechunker#52, but otherwise it is ready to go. I haven't tested this at scale like the old version, so I'll defer any problems that come up there until later.

I also want to add an import guard for rechunker and make it an optional dependency, but I'll follow the other examples we have for it when I get there.
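One common shape for such an import guard is sketched below. It is a generic pattern under stated assumptions, not necessarily the one used elsewhere in this project: `require_optional` and `rechunk_entry_point` are hypothetical names, and the install hint is illustrative.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or raise an informative ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install {install_hint}"
        ) from err


def rechunk_entry_point() -> ModuleType:
    """Hypothetical function that needs rechunker only when it is called."""
    # Deferred import: the package itself still imports without rechunker,
    # and the error appears only when the rechunking feature is used.
    return require_optional("rechunker", "rechunker")
```

Deferring the import to call time keeps rechunker out of the hard requirements while still giving users a clear message about what to install.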

@eric-czech eric-czech force-pushed the bgen_to_zarr_rechunker branch from 208e1e4 to a55b1cc Compare October 5, 2020 17:34
@eric-czech
Collaborator Author

Hey @tomwhite, can you review this when you get a chance? I'd like to merge it here so that I can then work on merging the whole repo (as @ravwojdyla suggested).

Collaborator

@tomwhite tomwhite left a comment


This looks good to me.

@eric-czech eric-czech merged commit 8a87e82 into sgkit-dev:master Oct 6, 2020

3 participants