argon2-opencl fix for macOS #5420

magnumripper · 2024-01-08T01:23:01Z

The Apple driver doesn't let us use inline assembler.

With this patch:

Device 3: AMD Radeon Pro Vega 20 Compute Engine
Benchmarking: argon2-opencl [Blake2 OpenCL]... |
Trying to compute 62 hashes at a time using 992 of 4080 MiB device memory
LWS=[64-192] GWS=[1984-5952] ([10-93] blocks) => Mode: LOCAL_MEMORY
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 4096, cost 3 (p) of 1, cost 4 (type [0:Argon2d 1:Argon2i 2:Argon2id]) of 0 and 1
Raw:	1099 c/s real, 15942 c/s virtual

There are other problems I haven't looked into yet, with other devices:

Device 1: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Testing: argon2-opencl [Blake2 OpenCL]... 0: OpenCL CL_INVALID_BUFFER_SIZE (-61) error in opencl_argon2_fmt_plug.c:608 - Error creating memory buffer

Device 2: Intel(R) UHD Graphics 630
Testing: argon2-opencl [Blake2 OpenCL]... Binary Build log: <program source>:68:7: warning: no previous prototype for function 'u64_shuffle'
ulong u64_shuffle(ulong v, uint thread_src, uint thread, __local ulong *buf)
      ^
<program source>:115:7: warning: no previous prototype for function 'block_th_get'
ulong block_th_get(const struct block_th *b, uint idx)
      ^
<program source>:125:6: warning: no previous prototype for function 'block_th_set'
void block_th_set(struct block_th *b, uint idx, ulong v)
     ^
<program source>:133:7: warning: no previous prototype for function 'mul_wide_u32'
ulong mul_wide_u32(ulong a, ulong b)
      ^
<program source>:149:6: warning: no previous prototype for function 'g'
void g(struct block_th *block)
     ^
<program source>:171:6: warning: no previous prototype for function 'apply_shuffle_shift2'
uint apply_shuffle_shift2(uint thread, uint idx)
     ^
<program source>:182:6: warning: no previous prototype for function 'shuffle_block'
void shuffle_block(struct block_th *block, uint thread, __local ulong *buf)
     ^
<program source>:260:6: warning: no previous prototype for function 'next_addresses'
void next_addresses(struct block_th *addr, uint thread_input, uint thread, __local ulong *buf)
     ^
<program source>:295:6: warning: no previous prototype for function 'blake2b_block'
void blake2b_block(ulong m[16], ulong out_len, ulong message_size)
     ^

FAILED (cmp_one(10))

The Apple driver doesn't let us use inline assembler.

solardiz

Overall, this is reasonable, but I think changes are needed - made comments.

solardiz · 2024-01-08T08:59:38Z

run/opencl/argon2_kernel.cl

@@ -68,7 +68,7 @@ ulong u64_shuffle_warp(ulong v, uint thread_src)
 ulong u64_shuffle(ulong v, uint thread_src, uint thread, __local ulong *buf)
 #endif
 {
-#if USE_WARP_SHUFFLE && gpu_nvidia(DEVICE_INFO) && SM_MAJOR >= 3
+#if USE_WARP_SHUFFLE && !__OS_X__ && gpu_nvidia(DEVICE_INFO) && SM_MAJOR >= 3


If we can't use inline asm even on NVIDIA, then the equivalent of this change should be on host, because the #else path this triggers here requires local memory and we conditionally provide its allocation size from host (on NVIDIA, we currently don't).

solardiz · 2024-01-08T09:01:11Z

run/opencl/argon2_kernel.cl

@@ -97,7 +97,7 @@ ulong u64_shuffle(ulong v, uint thread_src, uint thread, __local ulong *buf)
 	// TODO: Test on other device types to add support
 #if !gpu_nvidia(DEVICE_INFO) && !gpu_amd(DEVICE_INFO)
 	barrier(CLK_LOCAL_MEM_FENCE);
-#elif gpu_amd(DEVICE_INFO) && DEV_VER_MAJOR < 2500
+#elif !__OS_X__ && gpu_amd(DEVICE_INFO) && DEV_VER_MAJOR < 2500


I am surprised the asm here was reached for you - we only reach it on ancient AMD driver. Maybe the real issue is we somehow fail to set DEV_VER_MAJOR on macOS? Anyway, I don't mind this change.

solardiz · 2024-01-08T09:14:16Z

Thank you for testing this on macOS, and I'm happy to see you're active with the project again.

The Apple driver doesn't let us use inline assembler.

Maybe that's a property of the AMD driver? e.g. AMDGPU-Pro on super doesn't let us use inline asm either (but manages fine without a compiler memory barrier), which is why we have the version check here.

There are other problems I haven't looked into yet, with other devices:

That's interesting. This format is known to fail on CPU devices and Intel HD Graphics, but for me it fails differently - with FAILED (cmp_one(1)), see #5417.

FAILED (cmp_one(10))

That's the very last test vector, meaning the code almost worked for you on HD Graphics.

magnumripper · 2024-01-09T11:05:31Z

The Apple drivers never seemed to share a single line of code with Windows/Linux. I've only used macOS with older nvidia but I'm sure it never had inline asm. Also, none of the proprietary other things ever worked, such as temp/fan sensors, querying compute capability (unless CUDA), cl_amd_media_ops, nothing of the sorts. Also there were never any similarity whatsoever in version numbers. The driver version number is just 1.2 (for OpenCL 1.2) or sometimes 1.1, and a date. This means things like DEV_VER_MAJOR < 2500 will always match...

Anyway I'll have a look at that host side thing, and perhaps also see if I can find some old nvidia macbook, just because why not. And a look-see at the Intel UHD Grahpics thing, in case it's close to fully working.

solardiz · 2024-01-09T14:29:40Z

FWIW, I just noticed this in hashcat, could also be relevant to getting this format fully working on macOS:

            // work around, for some reason apple opencl can't have buffers larger 2^31
            // typically runs into trap 6
            // maybe 32/64 bit problem affecting size_t?
            // this seems to affect global memory as well no just single allocations
            if ((device_param->opencl_platform_vendor_id == VENDOR_ID_APPLE) && (device_param->is_metal == false))
            {
              const size_t undocumented_single_allocation_apple = 0x7fffffff;
              if (((c + 1 + 1) * MAX_ALLOC_CHECKS_SIZE) >= undocumented_single_allocation_apple) break;
            }

@magnumripper Perhaps you can try getting it to allocate more than 2 GiB on your laptop and see what happens? For this experiment, I'd try increasing the 6 here in opencl_argon2_fmt_plug.c:

        // Use almost all GPU memory by default
        unsigned int warps = 6, limit, target;

or do a real cracking run with a hash requiring more memory (e.g., one of our test vectors that use 16 MiB, or you can also increase the value of m in there - no need for the hash to remain crackable).

solardiz · 2024-04-06T18:16:59Z

This was sitting here for 3 months with no progress, so I went ahead and merged it as-is, then made my own follow-up commit c042fa3 to fix the NVIDIA host part (untested). Consistent with how we do it elsewhere in the codebase, on host I check for __APPLE__. I notice in OpenCL we check for __OS_X__. I hope this is right, and will match on the same systems, @magnumripper? Is there a good way for us to make this more reliable in case of future changes? For example, if Apple releases another OS where it sets __APPLE__, but does not set __OS_X__ in OpenCL? BTW, I have no idea what macros they set in iOS, which we don't currently support here.

I did not try addressing the 2G single allocation limit mentioned above. I think this needs testing first.

claudioandre-br · 2024-04-06T19:23:25Z

I notice in OpenCL we check for OS_X. I hope this is right, and will match

This is something we created and control at:

john/src/opencl_common.c

Line 1154 in c042fa3

#ifdef __APPLE__

Plus:

john/src/opencl_common.c

Line 1155 in c042fa3

"-D__OS_X__ ",

argon2-opencl fix for macOS

f00b22b

The Apple driver doesn't let us use inline assembler.

magnumripper requested a review from solardiz January 8, 2024 01:23

solardiz requested changes Jan 8, 2024

View reviewed changes

solardiz merged commit 3b26d3b into openwall:bleeding-jumbo Apr 6, 2024
31 of 32 checks passed

solardiz mentioned this pull request Apr 6, 2024

argon2-opencl fails on CPU and MIC #5417

Open

magnumripper deleted the jan2024 branch July 17, 2024 22:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

argon2-opencl fix for macOS #5420

argon2-opencl fix for macOS #5420

magnumripper commented Jan 8, 2024

solardiz left a comment

solardiz Jan 8, 2024

solardiz Jan 8, 2024

solardiz commented Jan 8, 2024

magnumripper commented Jan 9, 2024

solardiz commented Jan 9, 2024

solardiz commented Apr 6, 2024

claudioandre-br commented Apr 6, 2024

argon2-opencl fix for macOS #5420

argon2-opencl fix for macOS #5420

Conversation

magnumripper commented Jan 8, 2024

solardiz left a comment

Choose a reason for hiding this comment

solardiz Jan 8, 2024

Choose a reason for hiding this comment

solardiz Jan 8, 2024

Choose a reason for hiding this comment

solardiz commented Jan 8, 2024

magnumripper commented Jan 9, 2024

solardiz commented Jan 9, 2024

solardiz commented Apr 6, 2024

claudioandre-br commented Apr 6, 2024