argon2-opencl fix for macOS #5420
Conversation
The Apple driver doesn't let us use inline assembler.
Overall, this is reasonable, but I think changes are needed - made comments.
```diff
@@ -68,7 +68,7 @@ ulong u64_shuffle_warp(ulong v, uint thread_src)
 ulong u64_shuffle(ulong v, uint thread_src, uint thread, __local ulong *buf)
 #endif
 {
-#if USE_WARP_SHUFFLE && gpu_nvidia(DEVICE_INFO) && SM_MAJOR >= 3
+#if USE_WARP_SHUFFLE && !__OS_X__ && gpu_nvidia(DEVICE_INFO) && SM_MAJOR >= 3
```
If we can't use inline asm even on NVIDIA, then the equivalent of this change should be made on host as well, because the `#else` path this triggers here requires local memory, and we conditionally provide its allocation size from host (on NVIDIA, we currently don't).
```diff
@@ -97,7 +97,7 @@ ulong u64_shuffle(ulong v, uint thread_src, uint thread, __local ulong *buf)
 // TODO: Test on other device types to add support
 #if !gpu_nvidia(DEVICE_INFO) && !gpu_amd(DEVICE_INFO)
 	barrier(CLK_LOCAL_MEM_FENCE);
-#elif gpu_amd(DEVICE_INFO) && DEV_VER_MAJOR < 2500
+#elif !__OS_X__ && gpu_amd(DEVICE_INFO) && DEV_VER_MAJOR < 2500
```
I am surprised the `asm` here was reached for you - we only reach it on an ancient AMD driver. Maybe the real issue is that we somehow fail to set `DEV_VER_MAJOR` on macOS? Anyway, I don't mind this change.
Thank you for testing this on macOS, and I'm happy to see you're active with the project again.
Maybe that's a property of the AMD driver? e.g. AMDGPU-Pro on super doesn't let us use inline asm either (but manages fine without a compiler memory barrier), which is why we have the version check here.
That's interesting. This format is known to fail on CPU devices and Intel HD Graphics, but for me it fails differently - with
That's the very last test vector, meaning the code almost worked for you on HD Graphics.
The Apple drivers never seemed to share a single line of code with Windows/Linux. I've only used macOS with older NVIDIA, but I'm sure it never had inline asm. Also, none of the other proprietary things ever worked: temp/fan sensors, querying compute capability (unless using CUDA), cl_amd_media_ops, nothing of the sort. There was also never any similarity whatsoever in version numbers - the driver version number is just 1.2 (for OpenCL 1.2), or sometimes 1.1, and a date. This means things like

Anyway, I'll have a look at that host-side thing, and perhaps also see if I can find some old NVIDIA MacBook, just because why not. And a look-see at the Intel UHD Graphics thing, in case it's close to fully working.
FWIW, I just noticed this in hashcat, could also be relevant to getting this format fully working on macOS:

```c
// work around, for some reason apple opencl can't have buffers larger 2^31
// typically runs into trap 6
// maybe 32/64 bit problem affecting size_t?
// this seems to affect global memory as well no just single allocations
if ((device_param->opencl_platform_vendor_id == VENDOR_ID_APPLE) && (device_param->is_metal == false))
{
  const size_t undocumented_single_allocation_apple = 0x7fffffff;

  if (((c + 1 + 1) * MAX_ALLOC_CHECKS_SIZE) >= undocumented_single_allocation_apple) break;
}
```

@magnumripper Perhaps you can try getting it to allocate more than 2 GiB on your laptop and see what happens? For this experiment, I'd try increasing the

```c
// Use almost all GPU memory by default
unsigned int warps = 6, limit, target;
```

or do a real cracking run with a hash requiring more memory (e.g., one of our test vectors that use 16 MiB, or you can also increase the value of
This was sitting here for 3 months with no progress, so I went ahead and merged it as-is, then made my own follow-up commit c042fa3 to fix the NVIDIA host part (untested). Consistent with how we do it elsewhere in the codebase, on host I check for

I did not try addressing the 2G single allocation limit mentioned above. I think this needs testing first.
With this patch:
There are other problems I haven't looked into yet, with other devices: