Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVIDIA driver 551.86 new "Warning: Function [...] is a kernel, so overriding noinline attribute" #5456

Open
illsk1lls opened this issue Mar 31, 2024 · 18 comments

Comments

@illsk1lls
Copy link

illsk1lls commented Mar 31, 2024

Getting an error displaying on screen on the latest Win11 with jumbo when working on 7z, zip (tested)

Win11 - 10.0.22631.3296

Error text: Build log: (): Warning: Function zip is a kernel, so overriding noinline attribute. The function may be inlined when called. (): Warning: Function zip_final is a kernel, so overriding noinline attribute. The function may be inlined when called.

Example:

Using default input encoding: UTF-8
Loaded 1 password hash (ZIP-opencl, WinZip [PBKDF2-SHA1 OpenCL])
Cost 1 (HMAC size) is 4 for all loaded hashes
Build log: (): Warning: Function zip is a kernel, so overriding noinline attribute. The function may be inlined when called.
(): Warning: Function zip_final is a kernel, so overriding noinline attribute. The function may be inlined when called.


Build log: (): Warning: Function zip is a kernel, so overriding noinline attribute. The function may be inlined when called.
(): Warning: Function zip_final is a kernel, so overriding noinline attribute. The function may be inlined when called.


LWS=32 GWS=73728 (2304 blocks)
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor
testpassword#    (winzip-AES-256-testpassword#.zip/test.txt)
1g 0:00:01:03 DONE (2024-03-30 23:16) 0.01568g/s 1677Kp/s 1677Kc/s 1677KC/s 129456#..villena#
Use the "--show" option to display all of the cracked passwords reliably
Session completed.

Since the output for john.exe is using stderr instead of stdout for job status info, this information cannot be suppressed.

Version: 1.9.0-jumbo-1+bleeding-76cc11476a 2024-03-20 18:57:49 +0100
Build: cygwin 64-bit x86_64 AVX2 AC OMP OPENCL
SIMD: AVX2, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
CPU tests: AVX2
CPU fallback binary: john-avx-omp
OMP fallback binary: john-avx2
$JOHN is
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 11.4.0
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 0300000d0
OpenSSL 3.0.13 30 Jan 2024
GMP library version: 6.3.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 1000
Using times(2) for timers, resolution 1 ms
HR timer: QueryPerformanceCounter(), resolution 100 ns
Total physical host memory: 32617 MiB
Available physical host memory: 27139 MiB
Terminal locale string: C.UTF-8
Parsed terminal locale: UTF-8

Command used:
john --wordlist="password.lst" --rules=single,all --format=ZIP-opencl "hashfile"

Hashfile contents:
winzip-AES-256-testpassword#.zip/test.txt:$zip2$*0*3*0*e3bd6c1a4c4950d0c35c1b0ca2bd5e84*061f*4*9c276325*38408eb4a32fe6220bb6*$/zip2$:test.txt:winzip-AES-256-testpassword#.zip:C:\john-samples-main\ZIP\winzip-AES-256-testpassword#.zip

@illsk1lls illsk1lls changed the title Latest Win11 build+bleeding displaying new error "Build log: (): Warning: Function zip is a kernel, so overriding noinline attribute. The function may be inlined when called. (): Warning: Function zip_final is a kernel, so overriding noinline attribute. The function may be inlined when called." Latest Win11 build+bleeding displaying new error "overriding noinline attribute. The function may be inlined when called" Mar 31, 2024
@claudioandre-br
Copy link
Member

Thanks for reporting.

This is a warning, not an error, john is working and cracking for you. So, you really don't need to worry about it too much.

Anyway, I haven't found anything recent that could be causing this, so the root cause is probably related to a driver update (or your previous version of John was kind of out of date).

Can you see the same error when using other formats? Could you also add the john --list=opencl-devices output?

@illsk1lls
Copy link
Author

illsk1lls commented Mar 31, 2024

I only get the error with OpenCL, but for all formats and I did recently update the NVIDIA driver.

Output of: john --list=opencl-devices

Platform #0 name: NVIDIA CUDA, version: OpenCL 3.0 CUDA 12.4.125
    Device #0 (1) name:     NVIDIA GeForce RTX 2070
    Device vendor:          NVIDIA Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 3.0 CUDA
    OpenCL version support: OpenCL C 1.2
    Driver version:         551.86 [recommended]
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          8191 MiB
    Global Memory Cache:    1152 KiB
    Local Memory:           48 KiB (Local)
    Constant Buffer size:   64 KiB
    Max memory alloc. size: 2047 MiB
    Max clock (MHz):        1440
    Profiling timer res.:   1000 ns
    Max Work Group Size:    1024
    Parallel compute cores: 36
    CUDA INT32 cores:       2304  (36 x 64)
    Speed index:            3317760
    Warp size:              32
    Max. GPRs/work-group:   65536
    Compute capability:     7.5 (sm_75)
    Kernel exec. timeout:   yes
    PCI device topology:    01:00.0

Platform #1 name: Intel(R) OpenCL, version: OpenCL 2.1
    Device #0 (2) name:     Intel(R) UHD Graphics 630
    Device vendor:          Intel(R) Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 2.1 NEO
    OpenCL version support: OpenCL C 2.0
    Driver version:         26.20.100.6911
    Native vector widths:   char 16, short 8, int 4, long 1
    Preferred vector width: char 16, short 8, int 4, long 1
    Global Memory:          13047 MiB
    Global Memory Cache:    512 KiB
    Local Memory:           64 KiB (Local)
    Constant Buffer size:   4095 MiB
    Max memory alloc. size: 4095 MiB
    Max clock (MHz):        1150
    Profiling timer res.:   83 ns
    Max Work Group Size:    256
    Parallel compute cores: 24
    Stream processors:      192  (24 x 8)
    Speed index:            220800

    Device #1 (3) name:     Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
    Device vendor:          Intel(R) Corporation
    Device type:            CPU (LE)
    Device version:         OpenCL 2.1 (Build 0)
    OpenCL version support: OpenCL C 2.0
    Driver version:         7.6.0.0228
    Native vector widths:   char 32, short 16, int 8, long 4
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          32617 MiB
    Global Memory Cache:    256 KiB
    Local Memory:           32 KiB (Global)
    Constant Buffer size:   128 KiB
    Max memory alloc. size: 8154 MiB
    Max clock (MHz):        2600
    Profiling timer res.:   100 ns
    Max Work Group Size:    8192
    Parallel compute cores: 12
    Speed index:            124800

I am grabbing the latest Jumbo pragmatically at each run with this: https://github.com/illsk1lls/ZipRipper
John is doing it's job correctly despite the warning..

Is there a way to suppress that specific warning and still get other output? verbosity=2 suppresses it but also the password output, 2>nul suppresses it, but that suppresses all status output because they also use the STDERR stream. I didn't think of the driver, that's actually extremely likely.

Most importantly, what does the warning mean?

@solardiz solardiz changed the title Latest Win11 build+bleeding displaying new error "overriding noinline attribute. The function may be inlined when called" NVIDIA driver 551.86 new "Warning: Function [...] is a kernel, so overriding noinline attribute" Mar 31, 2024
@solardiz
Copy link
Member

Thank you for reporting this @illsk1lls! We do not explicitly use noinline anywhere. So my current best guess is it's an NVIDIA driver bug, which I guess they'll fix soon. However, if they later make a stable general release with the bug still not fixed, then we may want to do something about it from our side.

I notice they released 551.86 as "Game Ready". I don't know what exactly this means, but it's probably kind of a public alpha or beta version, for public testing. They have a feedback thread here, which you may add to - https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/539523/geforce-grd-55186-feedback-thread-released-31924/ - although I suspect they mostly care about feedback on gaming issues there, given the target audience and specific features of this release.

@claudioandre-br
Copy link
Member

If it were me, since (your) testing environment is very good, I would try patches like this:

You can safely try editing the .cl file.

diff --git a/run/opencl/zip_kernel.cl b/run/opencl/zip_kernel.cl
index d181eb2f3e..40b570eb35 100644
--- a/run/opencl/zip_kernel.cl
+++ b/run/opencl/zip_kernel.cl
@@ -238,7 +238,7 @@ inline uint prepare(__global const uchar *pwbuf, __global const uint *buf_idx, u
 
 #define ITERATIONS 1000 /* salt->iterations */
 
-__kernel void zip(__global const uchar *pwbuf,
+__kernel inline void zip(__global const uchar *pwbuf,
                   __global const uint *buf_idx,
                   __constant zip_salt *salt,
                   volatile __global uint *crack_count_ret,
@@ -264,7 +264,7 @@ __kernel void zip(__global const uchar *pwbuf,
        }
 }
 
-kernel void zip_final(__global const uchar *pwbuf,
+kernel inline void zip_final(__global const uchar *pwbuf,
                       __global const uint *buf_idx,
                       __constant zip_salt *salt,
                       __global const uchar *saltdata,

Or (needs a new build):

diff --git a/src/opencl_common.c b/src/opencl_common.c
index 0370c649cd..4f9c08f822 100644
--- a/src/opencl_common.c
+++ b/src/opencl_common.c
@@ -1297,7 +1297,7 @@ void opencl_build(int sequential_id, const char *opts, int save, const char *fil
                HANDLE_CLERROR(build_code, "clBuildProgram");
        }
        // Nvidia may return a single '\n' that we ignore
-       else if (options.verbosity >= LOG_VERB && strlen(build_log) > 1)
+       else if (options.verbosity > LOG_VERB && strlen(build_log) > 1)
                fprintf(stderr, "Build log: %s\n", build_log);
 
        MEM_FREE(build_log);

But the first needs to be well tested and the second is probably not the best choice.

@illsk1lls
Copy link
Author

illsk1lls commented Mar 31, 2024

@claudioandre-br I'll give it a shot, ty

@solardiz NVIDIA GameReady drivers are the current public stable release for GTX and RTX cards

@solardiz
Copy link
Member

solardiz commented Apr 1, 2024

NVIDIA GameReady drivers are the current public stable release for GTX and RTX cards

OK, thanks @illsk1lls. Then it's more of an issue. Did you install just this driver or also separately install CUDA? If you did not separately install CUDA, then can you please report this driver issue to NVIDIA, such as in the feedback thread I found? Chances are it affects most uses of OpenCL, also by other projects.

@solardiz
Copy link
Member

solardiz commented Apr 1, 2024

Another thought I have is that maybe - just maybe - this is somehow exposed by our usage of -cl-std=CL1.2. I guess NVIDIA does more testing with their current defaults than with OpenCL spec version override options like that. If so, this does not necessarily "affects most uses of OpenCL, also by other projects."

It could be helpful to test with other OpenCL projects, or to try removing this option from ours. It's in opencl_common.c function get_build_opts, unfortunately probably cannot be overridden from john.conf as we apply those options first.

Maybe that options order is something for us to change (regardless of this specific issue) - apply externally provided options last, not first. What do you think @claudioandre-br @magnumripper? When the same option (with different parameter values) is specified more than once, compilers typically use the last one - but I don't know if that's also guaranteed or at least is typical with OpenCL backends, is it?

@claudioandre-br
Copy link
Member

I have a feeling that the -cl-std=CL1.2 option is too invasive as it stands now. We can easily add a john.conf attribute to prevent it (or, on the contrary, enable it only if required).

We can't do much testing of NVIDIA anyway, so we (I) don't have much of an opinion on the matter.

@claudioandre-br
Copy link
Member

claudioandre-br commented Apr 1, 2024

It's simple as we can see below, but what will happen to the "array arguments" in the new driver version when we disable it, and what effect will the OpenCL 1.2 headers have on these new versions?

From ce9493f558c81c1900c653f08eeaba05e2edc20f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Claudio=20Andr=C3=A9?= <[email protected]>
Date: Mon, 1 Apr 2024 14:56:14 -0300
Subject: [PATCH] OpenCL: Add a flag to control when to use OpenCL 1.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Claudio André <[email protected]>
---
 doc/CHANGES         | 2 ++
 doc/README-OPENCL   | 3 +++
 run/john.conf       | 5 +++++
 src/opencl_common.c | 9 ++++++++-
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/doc/CHANGES b/doc/CHANGES
index 8ff614bd85..4383f13a55 100644
--- a/doc/CHANGES
+++ b/doc/CHANGES
@@ -383,3 +383,5 @@ OpenBSD/PowerPC, OpenBSD/PA-RISC, OpenBSD/VAX, NetBSD/VAX, Solaris/SPARC64,
 Mac OS X (PowerPC and x86), SCO, BeOS.
 * Bug and portability fixes, and new bugs.
 * Bonus: "Strip" cracker included in the default john.conf (john.ini).
+* Add a global bool "ForceOpenCLVersion" in john.conf that requires OpenCL
+compilers to use the OpenCL version 1.2 standard in the compilation process.
diff --git a/doc/README-OPENCL b/doc/README-OPENCL
index 6ad572ba88..4c947d2911 100644
--- a/doc/README-OPENCL
+++ b/doc/README-OPENCL
@@ -222,3 +222,6 @@ achieve that is "make kernel-cache-clean"
 AMD drivers are hopeless.  Every bugfix implements two new bugs.  Try using
 the latest and greatest.  nvidia drivers are a whole lot more stable but if
 seeing problems, first thing to try is updating drivers.
+
+You can set ForceOpenCLVersion=Y in john.conf to force the compiler to use
+OpenCL 1.2 standards in the OpenCL compilation process.
\ No newline at end of file
diff --git a/run/john.conf b/run/john.conf
index d26c3558c4..7db7b522ca 100644
--- a/run/john.conf
+++ b/run/john.conf
@@ -457,6 +457,11 @@ ForceScalar = N
 # concatenated to this.
 GlobalBuildOpts = -cl-mad-enable
 
+# Global build option to use OpenCL 1.2 standard.
+# Some recent nvidia drivers complains about array arguments in kernel functions
+# unless explicitly reverting to OpenCL 1.2. Add -cl-std=CL1.2 when applicable.
+ForceOpenCLVersion = N
+
 # Initial local work-size for auto-tune (CPU devices excepted).
 # 0 means let the OpenCL implementation pick a suitable value.
 # 1 means query for "best multiple" (usually corresponds to "warp size").
diff --git a/src/opencl_common.c b/src/opencl_common.c
index 0370c649cd..d93c7ccbfd 100644
--- a/src/opencl_common.c
+++ b/src/opencl_common.c
@@ -1137,6 +1137,13 @@ static void print_device_info(int sequential_id)
 	log_event("Device %d: %s%s", sequential_id + 1, device_name, board_name);
 }
 
+static char *get_force_opencl_version(int sequential_id)
+{
+	if (cfg_get_bool(SECTION_OPTIONS, SUBSECTION_OPENCL, "ForceOpenCLVersion", 0))
+		return get_device_version(sequential_id) >= 200 ? "-cl-std=CL1.2 " : "";
+	return "";
+}
+
 static char *get_build_opts(int sequential_id, const char *opts)
 {
 	char *build_opts = mem_alloc(LINE_BUFFER_SIZE);
@@ -1150,7 +1157,7 @@ static char *get_build_opts(int sequential_id, const char *opts)
 	snprintf(build_opts, LINE_BUFFER_SIZE,
 	         "-I opencl %s %s%s%s%s%s%d %s%d %s -D_OPENCL_COMPILER %s",
 	        global_opts,
-	        get_device_version(sequential_id) >= 200 ? "-cl-std=CL1.2 " : "",
+	        get_force_opencl_version(sequential_id),
 #ifdef __APPLE__
 	        "-D__OS_X__ ",
 #else
-- 
2.40.1

By all means, it's a good tool for testing.

@CryptoCoinLab
Copy link

If it were me, since (your) testing environment is very good, I would try patches like this:

You can safely try editing the file..cl

diff --git a/run/opencl/zip_kernel.cl b/run/opencl/zip_kernel.cl
index d181eb2f3e..40b570eb35 100644
--- a/run/opencl/zip_kernel.cl
+++ b/run/opencl/zip_kernel.cl
@@ -238,7 +238,7 @@ inline uint prepare(__global const uchar *pwbuf, __global const uint *buf_idx, u
 
 #define ITERATIONS 1000 /* salt->iterations */
 
-__kernel void zip(__global const uchar *pwbuf,
+__kernel inline void zip(__global const uchar *pwbuf,
                   __global const uint *buf_idx,
                   __constant zip_salt *salt,
                   volatile __global uint *crack_count_ret,
@@ -264,7 +264,7 @@ __kernel void zip(__global const uchar *pwbuf,
        }
 }
 
-kernel void zip_final(__global const uchar *pwbuf,
+kernel inline void zip_final(__global const uchar *pwbuf,
                       __global const uint *buf_idx,
                       __constant zip_salt *salt,
                       __global const uchar *saltdata,

Or (needs a new build):

diff --git a/src/opencl_common.c b/src/opencl_common.c
index 0370c649cd..4f9c08f822 100644
--- a/src/opencl_common.c
+++ b/src/opencl_common.c
@@ -1297,7 +1297,7 @@ void opencl_build(int sequential_id, const char *opts, int save, const char *fil
                HANDLE_CLERROR(build_code, "clBuildProgram");
        }
        // Nvidia may return a single '\n' that we ignore
-       else if (options.verbosity >= LOG_VERB && strlen(build_log) > 1)
+       else if (options.verbosity > LOG_VERB && strlen(build_log) > 1)
                fprintf(stderr, "Build log: %s\n", build_log);
 
        MEM_FREE(build_log);

But the first needs to be well tested and the second is probably not the best choice.

1、kernel inline void zip_final is not work for me.
new error: 0: OpenCL CL_INVALID_KERNEL_NAME (-46) error in xxxx.c:294 - Error creating kernel
The reason is that the it cann't find right kernel name.

2、opencl_common.c => else if (options.verbosity > LOG_VERB
Yes, it works !!!

Thank you !

@solardiz
Copy link
Member

@claudioandre-br @magnumripper I see we define LOG_VERB differently based on JTR_RELEASE_BUILD. I suggest that we drop this logic - always define LOG_VERB the way we did for JTR_RELEASE_BUILD.

+++ b/src/opencl_common.c
@@ -61,13 +61,9 @@
 
 #define LOG_SIZE 1024*16
 
-// If we are a release build, only output OpenCL build log if
-// there was a fatal error (or --verbosity was increased).
-#ifdef JTR_RELEASE_BUILD
+/* Only output OpenCL build log if there was a fatal error
+ * (or --verbosity was increased). */
 #define LOG_VERB VERB_LEGACY
-#else
-#define LOG_VERB VERB_DEFAULT
-#endif
 
 /* Common OpenCL variables */
 int platform_id;
diff --git a/src/params.h b/src/params.h
index 83afed9eb..74726debd 100644
--- a/src/params.h
+++ b/src/params.h
@@ -30,9 +30,7 @@
 
 /*
  * Define this for release tarballs. It affects the version reporting (will
- * be the string above and below and never a Git hash) as well as some other
- * details. Eg. it mutes output of OpenCL run-time build log unless the build
- * failed.
+ * be the string above and below and never a Git hash).
  */
 //#define JTR_RELEASE_BUILD 1
 

OK with you?

@claudioandre-br
Copy link
Member

OK with you?

Since bleeding is no longer a place for experimentation, that's okay.

In fact, this even helps with a possible policy change to allow frequent (source only) releases. It's frustrating to see people using and complaining about old software.

solardiz added a commit to solardiz/john that referenced this issue Apr 28, 2024
solardiz added a commit that referenced this issue Apr 28, 2024
@magnumripper
Copy link
Member

I'm seeing this ridiculous inline warning with 560.35.03 / CUDA 12.6.65. I'm also seeing what I think (as I haven't seen them before) are worse driver bugs. For example, cryptosafe-opencl suddenly requires me to declare gidx in the kernel as volatile, which doesn't make any sense at all. I'm currently trying to finish a new pdf-opencl but I'm stumped by similar weird behavior. My PDF kernel is similar to cryptosafe in the cracked_count_ret part in the end and in host code, but that volatile workaround doesn't help there. I also tried some semi random barriers (that also shouldn't be needed) to no avail.

Nvidia used to pass the all-OpenCL-formats test but I didn't try that for some time. Perhaps I should. Oh BTW I should try dropping that -cl-std=CL1.2 thing. Oddly enough, the only good google hit for this weird inline warning is this very issue!

@claudioandre-br
Copy link
Member

I should try dropping that -cl-std=CL1.2

I think reverting and testing this (and using a header that is compatible with OpenCL 3.0) is sensible and/or will be necessary sooner or later.

@magnumripper
Copy link
Member

I should try dropping that -cl-std=CL1.2

I think reverting and testing this (and using a header that is compatible with OpenCL 3.0) is sensible and/or will be necessary sooner or later.

Just dropping it didn't make any difference. What would OpenCL 3.0 buy us? Wouldn't 2.0 help us a lot?

@solardiz
Copy link
Member

I'm seeing this ridiculous inline warning with 560.35.03 / CUDA 12.6.65.

I guess you're only seeing it at non-default verbosity? I think we've suppressed it by default (along with the rest of build log).

@magnumripper
Copy link
Member

I guess you're only seeing it at non-default verbosity? I think we've suppressed it by default (along with the rest of build log).

Yes, I had to add Verbosity = 4 in john-local.conf after that change. Unlike most users I really need the build warnings. Except for this bogus one of course.

@magnumripper
Copy link
Member

565.57.01 (OpenCL 3.0 CUDA 12.7.33) still emits those warnings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants