Native support for cuPy/cuDF backed Anndata #355

Open
cjnolet opened this issue Apr 21, 2020 · 5 comments

Comments

@cjnolet

cjnolet commented Apr 21, 2020

This would be immensely useful for the GPU data science community as it would start to enable pipelines fully on GPU.

@ivirshup
Member

This would be great. I'm not sure how we'd run it on CI though. AFAIK cupy and cudf don't have a "mock gpu" backend at the moment, right?
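One common workaround for the lack of a mock GPU backend is to skip GPU-only tests on runners that have no usable device. The following is a minimal sketch of that idea, not something proposed in this thread: `gpu_available` is a hypothetical helper name, and the only cupy call it relies on is `cupy.cuda.runtime.getDeviceCount()`.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True only if cupy imports cleanly and reports at least one CUDA device.

    `gpu_available` is a hypothetical helper, not part of anndata or cupy.
    """
    # Avoid importing cupy at all when it is not installed.
    if importlib.util.find_spec("cupy") is None:
        return False
    try:
        import cupy

        # Raises a CUDA runtime error on machines without a usable driver or device.
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:
        # Covers a broken install, a missing driver, or no visible device.
        return False


# In a test module this could gate GPU-only tests, for example:
#   needs_gpu = pytest.mark.skipif(not gpu_available(), reason="no CUDA device")
```

This only keeps CPU-only CI green. It does not exercise the GPU code paths, so a GPU-equipped runner would still be needed for real coverage.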

@ivirshup
Member

Looks like the uarray project could be helpful in implementing the cupy side of this.

As for dataframes, this conversation on the ossdata discourse is probably worth following.

@daxiongshu

Any update on this issue?

@ivirshup ivirshup mentioned this issue Jul 27, 2023
2 tasks
@ivirshup
Member

ivirshup commented Jul 27, 2023

I've opened #1080 to track cupy support. @Intron7, what do you think about cuDF support? Is it a high priority?

I'm kinda eyeing the dataframe-api, which maybe we could leverage for more dataframe types. If that pans out we could go for cuDF support via that.

@Intron7
Member

Intron7 commented Jul 27, 2023

@cjnolet @ivirshup I think in general, it's a good idea to support cudf in the long term. cudf is insanely fast when it comes to correlations (significantly faster than cupy) and other math-related tasks. However, as far as I know, it still has some issues with apply and categorical data. I attempted to use cudf for the GPU port of squidpy's ligrec, but it lacked some key features. What would be advantageous in the future is a fully dynamic anndata where you can seamlessly switch everything in and out of VRAM, and cudf has the potential to assist with that.

It would be beneficial to load data directly to the GPU from text files. Unfortunately, h5 files are not yet accelerated. Currently, I'm also concerned about VRAM in general. There isn't much heavy math computation going on in dataframes for anndata.

The area where I foresee the most immediate benefit is likely the creation of .X, .layers, etc. from cudf.DataFrame objects.
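As a rough illustration of building a matrix for .X from a dataframe without leaving the GPU, here is a minimal duck-typed sketch. `frame_to_matrix` is a hypothetical helper, not an anndata API. It assumes the input exposes either `to_cupy()` (as `cudf.DataFrame` does) or `to_numpy()` (as `pandas.DataFrame` does), and it does not handle categorical or non-numeric columns.

```python
def frame_to_matrix(frame):
    """Convert a dataframe-like object to a 2-D array, staying on the GPU when possible.

    Hypothetical helper for illustration only.
    """
    if hasattr(frame, "to_cupy"):
        # cudf.DataFrame.to_cupy() returns a cupy.ndarray held in device memory.
        return frame.to_cupy()
    if hasattr(frame, "to_numpy"):
        # pandas.DataFrame.to_numpy() returns a host-side numpy.ndarray.
        return frame.to_numpy()
    raise TypeError(
        f"{type(frame).__name__} has neither to_cupy() nor to_numpy()"
    )
```

The result could then be passed as `X` when constructing an `AnnData` object. Whether a cupy-backed array is accepted there depends on the cupy support tracked in #1080.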
