
Gds support? #2

Open
zeronewb opened this issue Apr 13, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@zeronewb

NVIDIA Open GPU Kernel Modules Version

NONE

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

None

Kernel Release

None

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

None

Describe the bug

Howdy! Thank you so much for this work!
Kind of a stupid question, but could we use the same hack for GDS support, for weights offloading?
Thanks!

To Reproduce

None

Bug Incidence

Once

nvidia-bug-report.log.gz

None

More Info

No response

@zeronewb zeronewb added the bug Something isn't working label Apr 13, 2024
@johnnynunez

Yes, GDS would be nice, because direct storage (https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html) currently only works with GPU DALI (https://github.com/NVIDIA/DALI).

@geohot

geohot commented Apr 15, 2024

I don't know much about this, but the same idea should work. Would merge clean working GDS.

@johnnynunez

johnnynunez commented Apr 15, 2024

I don't know much about this, but the same idea should work. Would merge clean working GDS.

gpu-dali is for all GPU cards, but NVIDIA GDS (direct storage, now branded Magnum IO) is officially only for professional GPUs...
so it should be compatible, because if gpu-dali is working, Magnum IO should work too. It is a little bit confusing because they are similar, but...

NVIDIA DALI:

DALI is a library that accelerates data loading and preprocessing in deep learning applications. It is designed to improve input/output and data processing efficiency by shifting these tasks to the GPU, thereby freeing CPU resources for other operations.
It enables a variety of preprocessing operations such as image decoding, transformations, and data augmentation directly on the GPU, which can be extremely useful in computer vision and image processing workflows.
It facilitates integration with popular deep learning frameworks such as TensorFlow and PyTorch.
NVIDIA Magnum IO GPUDirect Storage:

GPUDirect Storage is part of NVIDIA's Magnum IO suite of technologies designed to optimize and accelerate data transfer between storage and GPUs.
It enables applications to read and write directly to GPU memory from storage, avoiding bottlenecks associated with data transfer through CPU and system memory. This is crucial for applications that handle large data sets such as simulations, big data analytics, and other high-performance tasks.
It reduces latency and increases performance by enabling faster and more direct transfers of large volumes of data to and from GPUs.
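For reference, the flow GPUDirect Storage exposes to applications is the cuFile API. Below is a minimal sketch of a direct storage-to-GPU read, assuming a GDS-capable driver, libcufile from the CUDA toolkit, and a hypothetical file path; error handling is omitted, so this is illustrative rather than production-ready:

```c
// Sketch: read a file directly into GPU memory via cuFile (GPUDirect Storage).
// Requires a GDS-enabled driver and libcufile; "/mnt/nvme/weights.bin" is a
// hypothetical path used only for illustration.
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    cuFileDriverOpen();                               // initialize the GDS driver

    int fd = open("/mnt/nvme/weights.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);            // register the fd with cuFile

    size_t size = 1 << 20;                            // 1 MiB
    void *devPtr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);               // optional: pin GPU buffer for DMA

    // DMA directly from storage into GPU memory, bypassing the CPU bounce buffer.
    ssize_t n = cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*dev_offset=*/0);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return n < 0;
}
```

The key point is the `cuFileRead` call: the destination is a device pointer, so the transfer never stages through a CPU-side buffer, which is exactly the bottleneck GDS is meant to remove.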

@cyberluke

Hm, at first I was confused by this NVIDIA marketing stuff, but it seems to be outdated: GPUDirect Storage should now work on all cards and any NVMe drive. The only thing you need is cuFile for Linux and Windows.

By the way, Microsoft has its own implementation compatible with NV GDS; they call it DirectStorage (without the "GPU") and it is aimed at games as well. Right now I'm looking at Windows 11 settings with an RTX 4090, and Windows 11 reports that GPU Direct is enabled out of the box and fully supported.

Regarding NV, I guess they updated the GPUDirect Storage implementation in the drivers, so it is enabled without their Magnum NVMe. Also, what if we do it Windows 95 style, like old-era PCs: create a RAM disk out of RAM and present it as SSD or NVMe storage (Microsoft DirectStorage also supports SSDs, not only NVMe)? With a ramdisk we should have an even faster beast, because RAM is faster than NVMe, right? It would use the DirectStorage API and be treated like a real NVMe device, so it should not use the CPU.

Currently I'm looking for cuFile for Windows, because I cannot find it. On GitHub you have several GDS and GPUDirect implementations, and the only dependency is cuFile - the CUDA interface that is able to leverage file storage - which is also how DALI, Triton, DeepSpeed, and GPUDirect Storage work internally. DALI just adds a pool of buffers for streaming on top of cuFile, and they claim it is faster than plain cuFile. But that could just be an NV marketing toy, because no one else on the market does it this way and nobody has shown any kind of benchmark.


4 participants