Getting RuntimeError: CUDA error: an illegal memory access was encountered with 3090s #4
Comments
FWIW, others are experiencing similar problems. I'm seeing the same error in text-generation-webui at inference (using the ExLlamaV2 model loader) after enabling resizable BAR on two 3090s, but before installing this driver. Could it be a problem in text-generation-webui? In any case, I'm following for a solution.
If you're referring to that post on y-combinator, that is me. I got this error after installing this driver.
This is only tested on 4090s, no idea if it works on anything else. Though if you don't have large BAR on your 3090s, I can confirm it won't work.
I did check with lspci and all my GPUs show the 32G line. Not sure why I'm getting this error. I'm on a fresh Ubuntu install. I don't have IOMMU enabled in the Ubuntu GRUB settings, but I don't think I disabled it in my BIOS. Will try that and see if that is the problem. Edit: I disabled IOMMU in the BIOS but still see this error.
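For anyone who wants to script the lspci check described above, here is a rough sketch. It only parses the `Memory at ... [size=...]` lines that lspci prints in verbose mode and reports the largest prefetchable region. The function names and the sample text are made up for illustration (the sample is not output from the reporter's machine), and the `lspci -v -d 10de:` invocation assumes you want to filter on the NVIDIA PCI vendor ID.

```python
import re
import subprocess

# Size suffixes as printed by lspci, e.g. [size=32G]
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
REGION_RE = re.compile(r"Memory at \S+ \(([^)]*)\).*?\[size=(\d+)([KMGT]?)\]")

# Illustrative sample only, NOT taken from the issue reporter's system.
SAMPLE = """\
01:00.0 VGA compatible controller: (illustrative device line)
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 6000000000 (64-bit, prefetchable) [size=32G]
	Memory at 6800000000 (64-bit, prefetchable) [size=32M]
"""


def prefetchable_region_sizes(lspci_text):
    """Return the sizes in bytes of all prefetchable memory regions found."""
    sizes = []
    for flags, number, unit in REGION_RE.findall(lspci_text):
        # Split the flags so "non-prefetchable" is not mistaken for "prefetchable".
        if "prefetchable" in [f.strip() for f in flags.split(",")]:
            sizes.append(int(number) * UNITS[unit])
    return sizes


def largest_bar_gib(lspci_text):
    """Largest prefetchable region in GiB, or 0.0 if none was found."""
    sizes = prefetchable_region_sizes(lspci_text)
    return max(sizes) / (1 << 30) if sizes else 0.0


if __name__ == "__main__":
    try:
        # Assumption: 10de is the NVIDIA vendor ID; -d filters by vendor.
        text = subprocess.run(
            ["lspci", "-v", "-d", "10de:"], capture_output=True, text=True
        ).stdout
    except FileNotFoundError:
        text = SAMPLE  # fall back to the illustrative sample
    print("largest prefetchable region (GiB):", largest_bar_gib(text))
```

This reports the largest region across all matched devices combined, so on a multi-GPU box you would still want to read the per-device lines to confirm that every card shows the large region.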
This is working for me with 3090s. I didn't have to do anything except enable resizable BAR in the BIOS. Ensure you have the correct driver version installed. The low performance here is probably due to the motherboard.
nvidia-smi
p2pBandwidthLatencyTest
NCCL
vs. NCCL_P2P_DISABLE=1
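To reproduce the kind of NCCL comparison listed above (with P2P versus with NCCL_P2P_DISABLE=1), something like the following could be used. This is only a sketch: `nccl_env` and `run_both` are hypothetical helper names, and the benchmark command is whatever you pass on the command line. The only thing taken from the comment is the NCCL_P2P_DISABLE environment variable itself.

```python
import os
import subprocess
import sys


def nccl_env(disable_p2p, base=None):
    """Copy an environment and set or clear NCCL_P2P_DISABLE."""
    env = dict(os.environ if base is None else base)
    if disable_p2p:
        env["NCCL_P2P_DISABLE"] = "1"
    else:
        env.pop("NCCL_P2P_DISABLE", None)
    return env


def run_both(cmd):
    """Run cmd once with P2P allowed and once with it disabled.

    Returns the exit code of each run; output goes to the terminal so the
    bandwidth numbers printed by the benchmark can be compared by eye.
    """
    results = {}
    for label, disable in (("p2p", False), ("no_p2p", True)):
        print(f"--- {label} ---", flush=True)
        proc = subprocess.run(cmd, env=nccl_env(disable))
        results[label] = proc.returncode
    return results


if __name__ == "__main__":
    # Usage (the benchmark command is up to you):
    #   python compare_p2p.py <your NCCL benchmark command and arguments>
    if len(sys.argv) > 1:
        print(run_both(sys.argv[1:]))
```

If the run with P2P enabled crashes while the run with NCCL_P2P_DISABLE=1 completes, that at least narrows the problem down to the peer-to-peer path.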
Hi @brthor, how did you enable large BAR1 on your 3090s? Could you share your method if you don't mind? Or is there a tutorial or set of instructions anywhere? Thank you!
Your GPU will have it if your motherboard supports it and you have it turned on.
Do you mean turning it on in the motherboard BIOS? Which motherboard are you using? Does the GPU vBIOS or firmware need to be updated?
Yes, you just turn it on in the BIOS. Make sure you have Above 4G Decoding and ReBAR support enabled. My TR Zenith II Extreme has it and the GPUs show large BAR support. I have an EPYC Supermicro H12SSLi that doesn't have ReBAR in the BIOS, so the 3090s don't show it when checked.
It helps a lot, thank you!
@t13m Resizable BAR must first of all be supported in the GPU's vBIOS; this has been the case with the 3090s I have. If you don't have motherboard support, you may be able to use ReBarUEFI (https://github.com/xCuri0/ReBarUEFI). You can also try setting
I'm perplexed as to why this isn't more popular. Another question: could I mix a 4090 with a 3090? What would be the drawbacks? I would like to trade some performance for more memory. Is performance the only downside of running a 3090/4090 combo?
Yes, if you have a 4090 and just want more memory, a 3090 will do that. However, you would be stuck at the 3090's performance level. I would personally prefer to have 3x 3090s vs 1x 4090 and 1x 3090.
NVIDIA Open GPU Kernel Modules Version
this one
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Ubuntu 22.04.4 LTS
Kernel Release
6.5.0-27-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
all (4x 3090)
Describe the bug
I installed this driver, and torch.cuda.can_device_access_peer(a, b) returns True for all GPU pairs.
I get the following error when textgenwebui tries to load a model:
Aphrodite also crashes when loading any model.
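For reference, the peer-access check described above can be run over every ordered GPU pair with something like the sketch below. `torch.cuda.can_device_access_peer` is the call named in the report; the helper names (`peer_access_report`, `blocked_pairs`, `peer_copy_ok`) are hypothetical. The small copy test is an assumption on my part about how to exercise a device-to-device transfer outside of a model loader; whether that copy actually takes the P2P path depends on PyTorch and the driver.

```python
from itertools import permutations


def peer_access_report(num_devices, can_access):
    """Map every ordered pair (a, b), a != b, to the reported peer-access flag."""
    return {
        (a, b): bool(can_access(a, b))
        for a, b in permutations(range(num_devices), 2)
    }


def blocked_pairs(report):
    """Ordered pairs for which peer access was reported as unavailable."""
    return sorted(pair for pair, ok in report.items() if not ok)


def peer_copy_ok(torch, src, dst, n=1 << 20):
    """Copy a tensor from GPU src to GPU dst and compare the data on the host.

    A CUDA failure during the copy would surface as a RuntimeError here.
    """
    x = torch.arange(n, device=f"cuda:{src}", dtype=torch.float32)
    y = x.to(f"cuda:{dst}")
    torch.cuda.synchronize(dst)
    return bool(torch.equal(x.cpu(), y.cpu()))


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        n = torch.cuda.device_count()
        report = peer_access_report(n, torch.cuda.can_device_access_peer)
        print("pairs reporting no peer access:", blocked_pairs(report))
        for (a, b), ok in sorted(report.items()):
            if ok:
                print(f"{a} -> {b} copy ok:", peer_copy_ok(torch, a, b))
```

A True result from can_device_access_peer only says the driver advertises peer access. As this report shows, the actual transfers can still fail, which is why the sketch also attempts a copy.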
To Reproduce
I installed this driver on Ubuntu.
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
No response