Small BAR Size Support? #13

Open
Qubitium opened this issue Jun 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Qubitium
Qubitium commented Jun 12, 2024

NVIDIA Open GPU Kernel Modules Version

550.90.07

Operating System and Version

Ubuntu 22.04

Kernel Release

6.8.9

Hardware: GPU

4090

Describe the bug

We have tested the modified kernel modules on two systems: 1x Intel desktop (full BAR = 32 GB) and 1x AMD server (no Resizable BAR support, BAR = 256 MB / 512 MB).

On the Intel desktop, with the full 32 GB BAR on both 4090s, the NCCL/P2P tests pass with the modded driver.

However, on the AMD server, whose BIOS doesn't support Resizable BAR, nvidia-smi shows only 256 MB and 512 MB BAR sizes for the two 4090s. On this server, even with this modded NVIDIA driver, the NCCL/P2P tests failed. The AMD server also has many PCIe devices, so it may be running out of PCIe address space to assign the large 32 GB BARs that the 4090 supports.

So my question is: does the current P2P + 4090 code only work if the BAR size is >= the 4090's full VRAM size? Thank you!
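To make the question concrete, a quick way to compare a GPU's BAR size against its VRAM on Linux is to read the device's sysfs `resource` file, which lists one region per line as three hex fields (start, end, flags). The sketch below is hypothetical: it assumes, as is typical for NVIDIA GPUs, that the VRAM aperture nvidia-smi reports as "BAR1" sits at resource index 1, and it treats the 4090's 24 GB as 24 GiB. Whether P2P actually needs full coverage is exactly what this issue asks; the code only reports the ratio.

```python
from pathlib import Path

MiB = 1024 ** 2
GiB = 1024 ** 3


def parse_sysfs_resource(text: str) -> list[int]:
    """Size in bytes of each region in a sysfs `resource` file.

    Each line holds three hex fields (start, end, flags). Unused slots, such as
    the upper half of a 64-bit BAR, are all zeros and are reported as size 0.
    """
    sizes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end = int(fields[0], 16), int(fields[1], 16)
        sizes.append(end - start + 1 if end else 0)
    return sizes


def bar1_size(bdf: str) -> int:
    """Hypothetical helper: BAR1 size for a device such as bdf="0000:01:00.0".

    Assumes BAR1 (the VRAM aperture on NVIDIA GPUs) is resource index 1.
    """
    text = Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text()
    return parse_sysfs_resource(text)[1]


def vram_coverage(bar_bytes: int, vram_bytes: int) -> float:
    """Fraction of VRAM the BAR aperture can expose at once (capped at 1.0)."""
    return min(1.0, bar_bytes / vram_bytes)


if __name__ == "__main__":
    # The BAR sizes mentioned in this report, against a 24 GiB card.
    for bar in (256 * MiB, 512 * MiB, 32 * GiB):
        print(f"{bar // MiB} MiB BAR -> {vram_coverage(bar, 24 * GiB):.2%} of VRAM")
```

On the Intel desktop the 32 GB BAR covers all of VRAM, while a 256 MB BAR exposes only 1/96 of it at a time.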

@Qubitium added the bug (Something isn't working) label Jun 12, 2024
@Qubitium changed the title from "Small Bar Size" to "Small BAR Size Support?" Jun 12, 2024
@puppetm4st3r
same question here =)

@ex3ndr
ex3ndr commented Jun 30, 2024

Which AMD motherboard is it? I just updated the BIOS on mine and it gained Resizable BAR support.

@adonishong
Same question here. Here is our configuration:

Dual EPYC 7742, Supermicro H12DSG system

Here is the lspci output:

01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] GA102 [GeForce RTX 3090]
	Flags: bus master, fast devsel, latency 0, IRQ 402, NUMA node 0
	Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 40090000000 (64-bit, prefetchable) [size=256M]
	Memory at 400a0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 1000 [size=128]
	Expansion ROM at f7000000 [virtual] [disabled] [size=512K]
	Capabilities:
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
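The "Memory at ..." lines in the lspci output above carry the BAR sizes directly. A minimal parsing sketch, assuming the `lspci -v` line format shown here (the regex tolerates an extra tag such as `[disabled]` before `[size=...]`, but other lspci versions may differ), picks out the largest 64-bit prefetchable region, which on this card is the 256M VRAM aperture. The function names are hypothetical.

```python
import re

MiB = 1024 ** 2
GiB = 1024 ** 3
UNITS = {"": 1, "K": 1024, "M": MiB, "G": GiB, "T": 1024 * GiB}

# Matches e.g. "Memory at 40090000000 (64-bit, prefetchable) [size=256M]".
MEM_RE = re.compile(
    r"Memory at (?P<addr>[0-9a-f]+) "
    r"\((?P<bits>32|64)-bit, (?P<kind>non-prefetchable|prefetchable)\)"
    r".*?\[size=(?P<num>\d+)(?P<unit>[KMGT]?)\]"
)


def parse_lspci_memory(text: str) -> list[dict]:
    """Collect address, width, prefetchability and size of each memory region."""
    regions = []
    for m in MEM_RE.finditer(text):
        regions.append({
            "addr": int(m["addr"], 16),
            "bits": int(m["bits"]),
            "prefetchable": m["kind"] == "prefetchable",
            "size": int(m["num"]) * UNITS[m["unit"]],
        })
    return regions


def largest_prefetchable(regions: list[dict]) -> int:
    """Size of the biggest 64-bit prefetchable region (typically the VRAM BAR)."""
    return max(r["size"] for r in regions if r["prefetchable"] and r["bits"] == 64)
```

Applied to the output above, this yields regions of 16M, 256M and 32M; the 256 MiB aperture is 96 times smaller than the RTX 3090's 24 GiB of VRAM.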

Here is the error from the simpleP2P test:
Enabling peer access between GPU0 and GPU1...
CUDA error at simpleP2P.cu:129 code=205(cudaErrorMapBufferObjectFailed) "cudaDeviceEnablePeerAccess(gpuid[1], 0)"


4 participants