"out of resource" error on certain nvidia GPUs #3

Open
Jimmy-Z opened this issue Mar 29, 2018 · 11 comments

Comments

@Jimmy-Z
Owner

Jimmy-Z commented Mar 29, 2018

Related:

zoogie/seedminer#16
https://gbatemp.net/posts/7851408/
https://gbatemp.net/posts/7879961/
https://gbatemp.net/posts/7826386/
https://gbatemp.net/posts/7851868/
https://gbatemp.net/posts/7884725/
https://gbatemp.net/posts/7881547/

I'll need testers and reports that include the following info:

your GPU model, GPU RAM size, OS version, and driver version
bfcl info output
does seedminer's GPU mode throw an "out of resources" error for you? (yes, I also need successful reports)
if it does, try the following build with the two test commands below: does it also say "out of resources"?

Test build:
bfCL-test-reduced-work-size-msky-lfcs-20.zip

Two test commands:
bfcl lfcs 00000007 0000 17f5c00d8b581e5e
bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0

Techy stuff

Despite what it looks like, this doesn't mean your GPU isn't powerful or big enough: this program works on Intel iGPUs and uses only a few KB of GPU RAM. To me it looks more like an OpenCL runtime bug from nvidia.

A reduced work size (from a little above 100,000,000 to 1,000,000) helped this guy with a GTX 980, so I guess this is the problem.
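A reduced work size means the same search space has to be covered by more, smaller launches. Here is a minimal sketch of that idea in plain C. It is not bfCL's actual code: `launch_fn`, `batch_count`, `run_in_batches`, and `count_items` are hypothetical names, and the callback only stands in for whatever would enqueue the kernel (for example a wrapper around clEnqueueNDRangeKernel).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: launches one batch of `count` work-items starting
 * at global offset `offset`. Returns 0 on success, nonzero on error. */
typedef int (*launch_fn)(uint64_t offset, size_t count, void *ctx);

/* Number of launches needed to cover `total` items with at most `batch`
 * items per launch (ceiling division). */
static uint64_t batch_count(uint64_t total, size_t batch)
{
    assert(batch > 0);
    return total / batch + (total % batch != 0);
}

/* Cover [0, total) in launches of at most `batch` items each.
 * Stops at the first failing launch and returns its error code. */
static int run_in_batches(uint64_t total, size_t batch,
                          launch_fn launch, void *ctx)
{
    assert(batch > 0);
    uint64_t offset = 0;
    while (offset < total) {
        uint64_t remaining = total - offset;
        size_t count = remaining < batch ? (size_t)remaining : batch;
        int err = launch(offset, count, ctx);
        if (err != 0)
            return err;
        offset += count;
    }
    return 0;
}

/* Stand-in launcher for illustration: only adds up how many items it
 * was asked to process. `ctx` must point to a uint64_t counter. */
static int count_items(uint64_t offset, size_t count, void *ctx)
{
    (void)offset;
    *(uint64_t *)ctx += count;
    return 0;
}
```

With the numbers above, `batch_count(100000000, 1000000)` is 100 launches instead of one. Each launch is smaller, which also fits the roughly 10% slowdown reported further down the thread, though the sketch itself says nothing about performance.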

From the OpenCL SDK documentation:

global_work_size
Points to an array of work_dim unsigned values that describe the number of global work-items in work_dim dimensions that will execute the kernel function. The total number of global work-items is computed as global_work_size[0] * ... * global_work_size[work_dim - 1].

The values specified in global_work_size cannot exceed the range given by the sizeof(size_t) for the device on which the kernel execution will be enqueued. The sizeof(size_t) for a device can be determined using CL_DEVICE_ADDRESS_BITS in the table of OpenCL Device Queries for clGetDeviceInfo. If, for example, CL_DEVICE_ADDRESS_BITS = 32, i.e. the device uses a 32-bit address space, size_t is a 32-bit unsigned integer and global_work_size values must be in the range 1 .. 2^32 - 1. Values outside this range return a CL_OUT_OF_RESOURCES error.

The nvidia runtime reports address bits = 64 for the GTX 980, and 100,000,000 is nowhere near that limit.
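To make the range argument concrete, here is a small sketch of the documented limit as a plain C check. The function names are made up for illustration, and the address-bits value is passed in by hand instead of being queried with clGetDeviceInfo.

```c
#include <stdint.h>

/* Largest global_work_size value the quoted text allows for a device
 * reporting the given CL_DEVICE_ADDRESS_BITS: 2^bits - 1. */
static uint64_t max_global_work_size(unsigned address_bits)
{
    if (address_bits >= 64)
        return UINT64_MAX;              /* avoid shifting by 64 */
    return ((uint64_t)1 << address_bits) - 1;
}

/* Nonzero if `n` lies in the documented range 1 .. 2^bits - 1. */
static int work_size_in_range(uint64_t n, unsigned address_bits)
{
    return n >= 1 && n <= max_global_work_size(address_bits);
}
```

A work size of 100,000,000 passes this check for both 32 and 64 address bits, so the documented range alone does not explain the error.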

@dgc1980

dgc1980 commented Mar 29, 2018

1 platform(s) found:
=== 0x0283b480 ===
name : NVIDIA CUDA
vendor : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.75
1 device(s) found:
=== 0x0283b890 ===
name : GeForce GT 730
vendor : NVIDIA Corporation
version : OpenCL 1.1 CUDA
C version : OpenCL C 1.1
max compute units : 2
max work group size : 1024
type : GPU
available : yes
compiler available : yes
endian : little
frequency : 1400
global memory : 2147483648
local memory : 49152

Since you wanted it here:
I was able to brute-force the msky test with no problem, it's just slow as fuck.
I cancelled the mii brute-force after it reached offset 1,
but the out of resources problem seems to be fixed for that at least :)

@dgc1980

dgc1980 commented Mar 29, 2018

I also tried this version on my 1060; it lowered the speed by about 10%.
OCed, I get about 700 M/s;
now I get 630 M/s with this test build.

@A7F

A7F commented Mar 30, 2018

Hi! I got the "out of resources" error running seedminer gpu, but I'm quite out of the loop in the 3DS hacking scene, and I'm not really into these things... However, I'm glad to help by providing as much information as possible!

What I got running your test build exe:

selected device GeForce GT 545 on platform NVIDIA CUDA
mbed TLS 2.7.0, AES-NI supported
self-test/benchmark mode
AES Key: 0d0b8bd02564dd0351d7e415e6f23f36
randomize source buffer using AES OFB
0.119 seconds for preparing test data, 562.03 MB/s
0.006 seconds for OpenCL compiling
0.031 seconds for data upload, 2195.25 MB/s
# sha1_16_test on 64 MB
0.047 seconds for OpenCL, 1419.57 MB/s
0.033 seconds for data download, 2059.82 MB/s
0.630 seconds for reference C(single thread), 106.49 MB/s
sha1_16_test: succeed
# aes_enc_128_test on 64 MB
0.339 seconds for OpenCL, 198.01 MB/s
0.019 seconds for data download, 3495.44 MB/s
0.202 seconds for reference C(single thread), 332.98 MB/s
aes_enc_128_test: succeed
# aes_dec_128_test on 64 MB
0.385 seconds for OpenCL, 174.09 MB/s
0.018 seconds for data download, 3667.35 MB/s
aes_dec_128_test: succeed
Press any key to continue . . .

seedminer gpu command output:

GPU selected
New3DS msed
LFCS      : 0x3d835e8
msed3 est : 0x80c4e550
Error est : -3516
ID0 hash 0: 199aa39d36207269e63a7d4402b97d32
Hash total: 1
movable_part2.sed generation success
bfcl msky e835d803020000000000000050e5c480 199aa39d36207269e63a7d4402b97d32 00000000
selected device GeForce GT 545 on platform NVIDIA CUDA
0.011 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
        clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

My current setup:

Microsoft Windows 10 (10.0) Professional 64-bit   (Build 16299)
Intel i7 2600 @3.40GHz
14GB DDR3 RAM dual channel  
DirectX Version 12.0
NVIDIA GeForce GT 545, 3 GB DDR3
GPU Manufacturer: Micro-Star International Co., Ltd. (MSI)
Driver version 390.77
API Direct3D version 11.2
144 CUDA Cores
Win32_VideoController		DriverVersion = 23.21.13.9077
Win32_VideoController		DriverDate = 01/23/2018

If you want me to test something, or if you need further information, just let me know. :)

@Jimmy-Z
Owner Author

Jimmy-Z commented Mar 31, 2018

@A7F thanks, but you should run that test build with the two test commands I gave in the OP.

@R1884

R1884 commented Mar 31, 2018

Operating System: Windows 10 Pro, 64-bit
GPU: GeForce GT 750M
GPU RAM: 2048 MB GDDR5
Driver version: 381.65

bfcl info:
name : NVIDIA CUDA
vendor : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 8.0.0
1 device(s) found:
=== 0x00141430 ===
name : GeForce GT 750M
vendor : NVIDIA Corporation
version : OpenCL 1.2 CUDA
C version : OpenCL C 1.2
max compute units : 2
max work group size : 1024
type : GPU
available : yes
compiler available : yes
endian : little
frequency : 967
global memory : 2147483648
local memory : 49152

py -3 seedminer_launcher3.py gpu:
selected device GeForce GT 750M on platform NVIDIA CUDA
0.015 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0:
selected device GeForce GT 750M on platform NVIDIA CUDA
0.290 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
24.48 seconds, 78.88 M/s

bfcl lfcs 00000007 0000 17f5c00d8b581e5e:
How long should I expect this one to take? It hasn't thrown the "out of resources" error but it's taking a while.

@A7F

A7F commented Mar 31, 2018

bfcl info

1 platform(s) found:
=== 0x0011f270 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.84
        1 device(s) found:
        === 0x0011e8c0 ===
        name : GeForce GT 545
        vendor : NVIDIA Corporation
        version : OpenCL 1.1 CUDA
        C version : OpenCL C 1.1
        max compute units : 3
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1440
        global memory : 3221225472
        local memory : 49152

bfcl msky c27164f2e0994db8000000007dd5c901 afcb0cc132bd2aeb8e0a6b6a841c51c0

selected device GeForce GT 545 on platform NVIDIA CUDA
0.230 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
38.61 seconds, 50.00 M/s

The first command doesn't say "out of resources"; it only shows this:

selected device GeForce GT 545 on platform NVIDIA CUDA
0.003 seconds for OpenCL compiling
local work size: 1024
0

Am I supposed to wait? It sat at that output for something like 20 minutes.

@Jimmy-Z
Owner Author

Jimmy-Z commented Mar 31, 2018

@A7F Sorry, I should have added that if the test command runs for a few seconds without an "out of resources" error, it's safe to cancel it with Ctrl-C.

@knight-ryu12

knight-ryu12 commented Apr 14, 2018

This happens with OC'd GPU cards.
Mine is a GTX 960 card with 4 GB GDDR5, overclockable.

1 platform(s) found:
=== 0x007aa420 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.1.84
        1 device(s) found:
        === 0x007a97a0 ===
        name : GeForce GTX 960
        vendor : NVIDIA Corporation
        version : OpenCL 1.2 CUDA
        C version : OpenCL C 1.2
        max compute units : 8
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1253
        global memory : 0
        local memory : 49152

@NeroReflex

I could NOT accomplish my task with an nVidia 920M (yes, this is a laptop GPU).

Windows 10 Home 64-bit
16 GB of DDR3 RAM
2048MB GDDR3
nVidia 384.94
384 CUDA Cores

The error is:

selected device GeForce 920M on platform NVIDIA CUDA
0.018 seconds for OpenCL compiling
local work size: 1024
ocl_assert: ocl_brute.c, function ocl_brute_msky, line 383
        clEnqueueReadBuffer(command_queue, mem_out, CL_TRUE, 0, sizeof(cl_uint), &out, 0, NULL, NULL)
error: out of resources

So, I have downloaded the test build and issued a few commands:

> bfcl

selected device GeForce 920M on platform NVIDIA CUDA
mbed TLS 2.7.0, AES-NI supported
self-test/benchmark mode
AES Key: 0d0b8bd02564dd0351d7e415e6f23f36
randomize source buffer using RDRAND
1.000 seconds for preparing test data, 67.09 MB/s
0.451 seconds for OpenCL compiling
0.061 seconds for data upload, 1104.31 MB/s
# sha1_16_test on 64 MB
0.031 seconds for OpenCL, 2161.66 MB/s
0.057 seconds for data download, 1180.31 MB/s
0.631 seconds for reference C(single thread), 106.35 MB/s
sha1_16_test: succeed
# aes_enc_128_test on 64 MB
0.532 seconds for OpenCL, 126.17 MB/s
0.048 seconds for data download, 1402.16 MB/s
0.251 seconds for reference C(single thread), 266.87 MB/s
aes_enc_128_test: succeed
# aes_dec_128_test on 64 MB
0.533 seconds for OpenCL, 125.80 MB/s
0.048 seconds for data download, 1400.84 MB/s
aes_dec_128_test: succeed
> bfcl info

2 platform(s) found:
=== 0x026d43c0 ===
name    : Intel(R) OpenCL
vendor  : Intel(R) Corporation
profile : FULL_PROFILE
version : OpenCL 1.2
        2 device(s) found:
        === 0x02701e00 ===
        name : Intel(R) HD Graphics 4400
        vendor : Intel(R) Corporation
        version : OpenCL 1.2
        C version : OpenCL C 1.2
        max compute units : 20
        max work group size : 512
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1000
        global memory : 1708759450
        local memory : 65536
        === 0x026ed8c0 ===
        name : Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
        vendor : Intel(R) Corporation
        version : OpenCL 1.2 (Build 10094)
        C version : OpenCL C 1.2
        max compute units : 4
        max work group size : 8192
        type : CPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 1700
        global memory : 4211548160
        local memory : 32768
=== 0x0272f260 ===
name    : NVIDIA CUDA
vendor  : NVIDIA Corporation
profile : FULL_PROFILE
version : OpenCL 1.2 CUDA 9.0.125
        1 device(s) found:
        === 0x0272f300 ===
        name : GeForce 920M
        vendor : NVIDIA Corporation
        version : OpenCL 1.2 CUDA
        C version : OpenCL C 1.2
        max compute units : 2
        max work group size : 1024
        type : GPU
        available : yes
        compiler available : yes
        endian : little
        frequency : 954
        global memory : 2147483648
        local memory : 49152

Of course, this is a laptop, so the Intel integrated GPU is also available, but bfcl ignores it, as it should.

bfcl msky ...............................

selected device GeForce 920M on platform NVIDIA CUDA
0.289 seconds for OpenCL compiling
local work size: 1024
got a hit: c27164f2e0994db82e3d14737dd5c901
36.93 seconds, 52.27 M/s

@zoogie

zoogie commented May 1, 2018

@Jimmy-Z - Would it be possible for you to push the commits for the test build? It is greatly needed! Thanks!

@Jimmy-Z
Owner Author

Jimmy-Z commented May 29, 2018

Sorry for the delay, I just committed the changes, @zoogie.


7 participants