"out of memory" on V100 #15
Comments
I have the same problem and I am not using Docker:
(Does that just mean the test failed?)
@RenaKunisaki We never tested the OpenCL version of cuda_memtest.
Also take care if your X server is running on the same device.
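One quick way to check for this is a minimal sketch like the one below (not part of cuda_memtest; it only assumes the standard CUDA runtime API): it prints how much memory is actually free on each visible device before the test starts, so an X server or another container already holding memory shows up immediately.

```cpp
// Sketch: report free vs. total memory per device, e.g. to spot an X
// server or another process already holding memory on the same GPU.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);
        std::printf("device %d: %zu MiB free of %zu MiB\n",
                    dev, freeBytes >> 20, totalBytes >> 20);
    }
    return 0;
}
```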
I installed it from the Arch package (AUR) and I don't seem to have
Oh, if you are using the AUR package (here?), it will pull in the legacy SourceForge version. We haven't seen much activity on that one in years, so we update and fix our own forked CUDA version here. If you find updates to the OpenCL version, we will gladly review and merge pull requests.
cuda_memtest seems to abort with "out of memory" (line 148 in cuda_memtest.cu) when run in a container (nvidia-docker1 and 2) on V100 GPUs.
The problem might be a general one or just triggered in PIConGPU. Needs investigation. Maybe the device is just assigned multiple times from mpiInfo.
Occurred with a 4 & 8 GPU PIConGPU lwfa example on a DGX-1.
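For illustration only, here is a minimal sketch of that suspected failure mode (not the actual cuda_memtest or PIConGPU code; localRank is a hypothetical stand-in for whatever the MPI layer reports): if two ranks end up selecting the same device, the second near-full-size allocation fails with exactly this kind of "out of memory" error.

```cpp
// Sketch of the suspected failure mode: two ranks pick the same device id,
// so the second large cudaMalloc on that device fails with "out of memory".
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Hypothetical: in the real setup the local rank would come from the
    // MPI layer (what the issue text calls mpiInfo). If it is wrong, two
    // processes map onto the same GPU.
    int localRank = 0;
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    cudaSetDevice(localRank % deviceCount);

    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // Try to grab most of the reported free memory, as a memory test does;
    // if another process on the same device already did, this fails.
    void* buf = nullptr;
    cudaError_t err = cudaMalloc(&buf, freeBytes / 10 * 9);
    if (err != cudaSuccess)
        std::printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
    else
        cudaFree(buf);
    return 0;
}
```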