
Gds support? #2

Open
zeronewb opened this issue Apr 13, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@zeronewb

NVIDIA Open GPU Kernel Modules Version

NONE

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

None

Kernel Release

None

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

None

Describe the bug

Howdy! Thank you so much for this work!
Kind of a stupid question, but could we use the same hack for GDS support, for weights offloading?
Thanks!

To Reproduce

None

Bug Incidence

Once

nvidia-bug-report.log.gz

None

More Info

No response

@zeronewb zeronewb added the bug Something isn't working label Apr 13, 2024
@johnnynunez

Yes, GDS would be nice, because direct storage (https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html) currently only works with GPU DALI (https://github.com/NVIDIA/DALI).

@geohot

geohot commented Apr 15, 2024

I don't know much about this, but the same idea should work. Would merge clean working GDS.

@johnnynunez

johnnynunez commented Apr 15, 2024

I don't know much about this, but the same idea should work. Would merge clean working GDS.

gpu-dali is for all GPU cards, but NVIDIA GDS (direct storage, now branded Magnum IO) is officially only for professional GPUs...
so it should be compatible, because if gpu-dali is working, Magnum IO should work too. It is a little bit confusing because they are similar, but...

NVIDIA DALI:

DALI is a library that accelerates data loading and preprocessing in deep learning applications. It is designed to improve input/output and data processing efficiency by shifting these tasks to the GPU, thereby freeing CPU resources for other operations.
It enables a variety of preprocessing operations such as image decoding, transformations, and data augmentation directly on the GPU, which can be extremely useful in computer vision and image processing workflows.
It facilitates integration with popular deep learning frameworks such as TensorFlow and PyTorch.
NVIDIA Magnum IO GPUDirect Storage:

GPUDirect Storage is part of NVIDIA's Magnum IO suite of technologies designed to optimize and accelerate data transfer between storage and GPUs.
It enables applications to read and write directly to GPU memory from storage, avoiding bottlenecks associated with data transfer through CPU and system memory. This is crucial for applications that handle large data sets such as simulations, big data analytics, and other high-performance tasks.
It reduces latency and increases performance by enabling faster and more direct transfers of large volumes of data to and from GPUs.
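For reference, the flow GPUDirect Storage exposes to applications is the cuFile API. Below is a minimal sketch of a direct storage-to-GPU read, assuming a GDS-capable driver, libcufile from the CUDA toolkit, and a hypothetical file path; error handling is omitted, so this is illustrative rather than production-ready:

```c
// Sketch: read a file directly into GPU memory via cuFile (GPUDirect Storage).
// Requires a GDS-enabled driver and libcufile; "/mnt/nvme/weights.bin" is a
// hypothetical path used only for illustration.
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    cuFileDriverOpen();                               // initialize the GDS driver

    int fd = open("/mnt/nvme/weights.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);            // register the fd with cuFile

    size_t size = 1 << 20;                            // 1 MiB
    void *devPtr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);               // optional: pin GPU buffer for DMA

    // DMA directly from storage into GPU memory, bypassing the CPU bounce buffer.
    ssize_t n = cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*dev_offset=*/0);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return n < 0;
}
```

The key point is the `cuFileRead` call: the destination is a device pointer, so the transfer never stages through a CPU-side buffer, which is exactly the bottleneck GDS is meant to remove.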

@cyberluke

Hm, at first I was confused by this NVIDIA marketing stuff, but it seems to be outdated: GPUDirect Storage should now work on all cards and any NVMe drive. The only thing you need is cuFile for Linux and Windows.

By the way, Microsoft has its own implementation compatible with NV GDS; they call it DirectStorage (without the "GPU") and it is aimed at games as well. Right now I'm looking at Windows 11 settings with an RTX 4090, and Windows 11 reports that GPU Direct is enabled out of the box and fully supported.

Regarding NV, I guess they updated the GPUDirect Storage implementation in the drivers, so it is enabled without their Magnum NVMe. Also, what if we do it Windows 95 style, like old-era PCs: create a RAM disk out of RAM and present it as SSD or NVMe storage (Microsoft DirectStorage also supports SSDs, not only NVMe)? With a ramdisk we should have an even faster beast, because RAM is faster than NVMe, right? It would use the DirectStorage API and be treated like a real NVMe device, so it should not use the CPU.

Currently I'm looking for cuFile for Windows, because I cannot find it. On GitHub you have several GDS and GPUDirect implementations, and the only dependency is cuFile - the CUDA interface that is able to leverage file storage - which is also how DALI, Triton, DeepSpeed, and GPUDirect Storage work internally. DALI just adds a pool of buffers for streaming on top of cuFile, and they claim it is faster than plain cuFile. But that could just be an NV marketing toy, because no one else on the market does it this way and nobody has shown any kind of benchmark.


4 participants