

API for working with labeled multidimensional data? #236

Open
aiqc opened this issue Jul 22, 2021 · 2 comments


aiqc commented Jul 22, 2021

Hi there,

I'm extremely excited about the potential of Dagger, and that its contributor base is growing.

In the long term, are there any plans for an API for working with labeled multidimensional data, along the lines of xarray's core data structures?
http://xarray.pydata.org/en/stable/getting-started-guide/why-xarray.html#core-data-structures

  • User-defined names for columns and channels (higher dimensions).
  • Attributes like shape, index, and dtype for each dimension.
  • group_by across dimensions.
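To make the three requested features concrete, here is a minimal, hypothetical sketch in plain Python (standard library only). `LabeledArray`, `sel`, and `groupby_mean` are illustrative names, not Dagger, xarray, YAXArrays, or DimensionalData APIs. It assumes exactly two dimensions and skips per-dimension dtypes; it only shows how named dimensions and coordinate labels replace positional indexing.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LabeledArray:
    """Hypothetical minimal 2-D labeled array (illustration only, not a real API)."""

    dims: tuple    # user-defined dimension names, e.g. ("time", "channel")
    coords: dict   # dimension name -> list of labels (the "index") along it
    values: list   # nested list of rows: values[i][j]
    attrs: dict = field(default_factory=dict)  # free-form metadata

    @property
    def shape(self):
        # Size of each dimension, in the order given by `dims`.
        return tuple(len(self.coords[d]) for d in self.dims)

    def sel(self, **labels):
        # Look up one element by label instead of by integer position.
        i = self.coords[self.dims[0]].index(labels[self.dims[0]])
        j = self.coords[self.dims[1]].index(labels[self.dims[1]])
        return self.values[i][j]

    def groupby_mean(self, dim, key):
        # Bucket the labels along `dim` by key(label), then average each
        # bucket while keeping the other dimension intact.
        axis = self.dims.index(dim)
        other = self.dims[1 - axis]
        groups = defaultdict(list)
        for pos, label in enumerate(self.coords[dim]):
            groups[key(label)].append(pos)
        result = {}
        for group, positions in groups.items():
            result[group] = {
                other_label: mean(
                    self.values[p][k] if axis == 0 else self.values[k][p]
                    for p in positions
                )
                for k, other_label in enumerate(self.coords[other])
            }
        return result


arr = LabeledArray(
    dims=("time", "channel"),
    coords={
        "time": ["2021-07-01", "2021-07-02", "2021-08-01"],
        "channel": ["red", "green"],
    },
    values=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    attrs={"units": "counts"},
)

print(arr.shape)                                    # (3, 2)
print(arr.sel(time="2021-07-02", channel="green"))  # 4.0
# Group the "time" labels by month and average within each month.
print(arr.groupby_mean("time", key=lambda t: t[:7]))
# {'2021-07': {'red': 2.0, 'green': 3.0}, '2021-08': {'red': 5.0, 'green': 6.0}}
```

A real implementation would need to generalize to N dimensions and, for Dagger, presumably keep `values` as a distributed or out-of-core array rather than a nested list.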

It looks like these projects started but stopped:

@jpsamaroo (Member)

Hey! I think the approach I'd like to take with Dagger is to have two different options for arrays:

  1. Use a Dagger.DArray to wrap other arrays
  2. Let packages implement their array functionality on top of Dagger or the DArray

For xarray-style arrays (https://github.com/meggart/YAXArrays.jl seems to be the active Julia implementation; check the commit log), I think the latter option makes the most sense, since xarray has extra semantics and operations that regular AbstractArray subtypes don't implement. Option 1 is better for "simple" arrays like GPU arrays or the various LinearAlgebra array types. There was discussion in JuliaIO/DiskArrays.jl#34 about integrating DiskArrays with Dagger, which I'd like to help support. Since DiskArrays is a dependency of YAXArrays, adding Dagger support there should also bring those features to YAXArrays.


rafaqz commented Jul 24, 2021

DimensionalData.jl and GeoData.jl are equally active and more widely used than YAXArrays.jl.

But integrating DiskArrays.jl is the best option, as it's a dependency of both GeoData.jl and YAXArrays.jl.

As you say, these wrapper arrays mostly need to be the outermost wrapper.
