
"out of memory" on V100 #15

Open
ax3l opened this issue Mar 12, 2018 · 5 comments

@ax3l
Member

ax3l commented Mar 12, 2018

cuda_memtest seems to abort with "out of memory" (line 148 in cuda_memtest.cu) when run in a container (nvidia-docker1 and 2) on V100 GPUs.

The problem might be a general one or only triggered by PIConGPU; needs investigation. Maybe the GPU is just assigned multiple times from mpiInfo...

Occurred with a 4 & 8 GPU PIConGPU lwfa example on a DGX-1.

@RenaKunisaki

I have the same problem and I'm not using Docker:

~> ocl_memtest 
hostname is guilmon
CL_PLATFORM_NAME: 	NVIDIA CUDA
CL_PLATFORM_VERSION: 	OpenCL 1.2 CUDA 10.2.120
                  	Device 0 is CL_DEVICE_TYPE_GPU, "GeForce GTX 950"
allocated 340 Mbytes from device 0
[05/17/2019 15:33:40][guilmon][0]:Test0 [Walking 1 bit]
[05/17/2019 15:33:40][guilmon][0]:Test0: global walk test
ERROR: opencl call failed with rc(-5), line 39, file ocl_tests.cpp
Error: Out of resources

(Does that just mean the test failed?)

@psychocoderHPC
Member

@RenaKunisaki We never tested the OpenCL version of cuda_memtest.
Depending on the driver version, OpenCL is not able to allocate 100% of the main GPU memory.
Could you rerun your test with cuda_memtest?
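
(A minimal sketch, assuming the rc(-5) above is CL_OUT_OF_RESOURCES and that the limit being hit is the per-buffer allocation cap; the host code below is illustrative, not ocl_memtest itself.)

// Minimal sketch, not ocl_memtest: compare total device memory with the
// largest single buffer OpenCL reports it can allocate. In CL/cl.h,
// CL_OUT_OF_RESOURCES is -5, i.e. the rc(-5) shown above.
#include <cstdio>
#include <CL/cl.h>

int main() {
    cl_platform_id platform = nullptr;
    cl_device_id device = nullptr;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_ulong global_mem = 0, max_alloc = 0;
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, nullptr);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, nullptr);

    std::printf("global mem: %llu MiB, max single allocation: %llu MiB\n",
                (unsigned long long)(global_mem >> 20),
                (unsigned long long)(max_alloc >> 20));
    // On many drivers the max single allocation is only a fraction of the
    // total (the spec only requires >= 1/4 of global memory), so a test that
    // tries to grab nearly everything in one buffer can fail.
    return 0;
}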

@ax3l
Member Author

ax3l commented May 17, 2019 via email

@RenaKunisaki

I installed it from the Arch package (AUR) and I don't seem to have a cuda_memtest binary. I will try without X running, though.

@ax3l
Member Author

ax3l commented May 20, 2019

Oh, if you are using the AUR package (here?), it will pull in the legacy SourceForge version. We haven't seen much activity on that one in years, which is why we update and fix our own forked CUDA version here.

If you find updates to the OpenCL version we will gladly review and merge pull requests.
