Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGCN inefficient long add with constant #237

Closed
preda opened this issue Dec 20, 2024 · 6 comments · Fixed by llvm/llvm-project#122049
Closed

AMDGCN inefficient long add with constant #237

preda opened this issue Dec 20, 2024 · 6 comments · Fixed by llvm/llvm-project#122049
Labels
generic Build error, or some other issue not caused by an LLVM bug Under Investigation

Comments

@preda
Copy link

preda commented Dec 20, 2024

Consider this OpenCL kernel:

 kernel void testAdd(global long* io) {
  long C = ((long) 1) << 50;
  io[get_global_id(0)] = C + io[get_global_id(0)];
}

This ISA is generated for the long add:

	v_mov_b32_e32 v4, 0x40000
	v_add_co_u32_e32 v2, vcc, 0, v2
	v_addc_co_u32_e32 v3, vcc, v3, v4, vcc

As you see, the above code is.. un-necessary. In particular, v_add_co_u32_e32 v2, vcc, 0, v2 does not change the value of v2, and can not produce carry-out.

@preda
Copy link
Author

preda commented Dec 20, 2024

Expected would be something like:

v_mov_b32_e32 v4, 0x40000
v_add_co_u32_e32 v3, vcc, v3, v4

@preda
Copy link
Author

preda commented Dec 20, 2024

And a small observation: the same code is generated and the problem is easier to see if "long" is replaced with "unsigned long" in the sample kernel.

@ppanchad-amd ppanchad-amd added generic Build error, or some other issue not caused by an LLVM bug Under Investigation labels Dec 23, 2024
@ppanchad-amd
Copy link

Hi @preda. Internal ticket has been created to investigate your issue. Thanks!

@sohaibnd
Copy link

sohaibnd commented Jan 3, 2025

Hi @preda, what options did you use to compile that code?

@preda
Copy link
Author

preda commented Jan 4, 2025

Hi @preda, what options did you use to compile that code?

@sohaibnd

-cl-finite-math-only -cl-std=CL2.0

arsenm added a commit to llvm/llvm-project that referenced this issue Jan 8, 2025
If one of the inputs has all 0 bits, the low part cannot
carry and we can just pass through the original value.

Add case: https://alive2.llvm.org/ce/z/TNc7hf
Sub case: https://alive2.llvm.org/ce/z/AjH2-J

We could do this in the general case with computeKnownBits,
but add is so common this could be potentially expensive for
something which will fire infrequently.

One potential concern is this could break the 64-bit add
we expect to see for addressing mode matching, but these
constants shouldn't appear often in addressing expressions.
One test for large offset expressions changes but isn't worse.

Fixes ROCm#237
@sohaibnd
Copy link

sohaibnd commented Jan 8, 2025

@preda This is an optimization bug, thanks for pointing it out. See llvm#122049 for the fix being put in.

swift-ci pushed a commit to swiftlang/llvm-project that referenced this issue Jan 8, 2025
If one of the inputs has all 0 bits, the low part cannot
carry and we can just pass through the original value.

Add case: https://alive2.llvm.org/ce/z/TNc7hf
Sub case: https://alive2.llvm.org/ce/z/AjH2-J

We could do this in the general case with computeKnownBits,
but add is so common this could be potentially expensive for
something which will fire infrequently.

One potential concern is this could break the 64-bit add
we expect to see for addressing mode matching, but these
constants shouldn't appear often in addressing expressions.
One test for large offset expressions changes but isn't worse.

Fixes ROCm#237
shenhanc78 pushed a commit to shenhanc78/llvm-project that referenced this issue Jan 8, 2025
If one of the inputs has all 0 bits, the low part cannot
carry and we can just pass through the original value.

Add case: https://alive2.llvm.org/ce/z/TNc7hf
Sub case: https://alive2.llvm.org/ce/z/AjH2-J

We could do this in the general case with computeKnownBits,
but add is so common this could be potentially expensive for
something which will fire infrequently.

One potential concern is this could break the 64-bit add
we expect to see for addressing mode matching, but these
constants shouldn't appear often in addressing expressions.
One test for large offset expressions changes but isn't worse.

Fixes ROCm#237
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
generic Build error, or some other issue not caused by an LLVM bug Under Investigation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants