Add atomic intrinsics #79

wangpc-pp · 2024-06-06T06:38:28Z

We need this when optimizing some lock implementations, because inline assembly blocks some optimizations. For example, with inline assembly the compiler doesn't know that `lr.w` sign-extends its result.

ARM provides similar intrinsics, such as `__builtin_arm_ldaex`.

For now, this PR only adds intrinsics for Zawrs.

wangpc-pp · 2024-06-06T06:43:39Z

I'm not sure whether we should add intrinsics for the AMO instructions, since there are already some "standard" C builtins for AMOs.

wangpc-pp · 2024-07-16T04:33:47Z

Ping. :-)

cmuellner · 2025-01-07T11:20:10Z

@wangpc-pp is there a GCC or LLVM patch that adds support for this?
I see the referenced LLVM PR (which has been closed), which also included LR and SC.

wangpc-pp · 2025-01-07T12:44:21Z

> @wangpc-pp is there a GCC or LLVM patch that adds support for this? I see the referenced LLVM PR (which has been closed), which also included LR and SC.

I almost forgot this PR :-)
If we have consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I proceed?

cmuellner · 2025-01-07T12:53:14Z

> I almost forgot this PR :-) If we have consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I proceed?

From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.

I'm fine with this PR, but we need the OK from the GCC and LLVM community.

I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

wangpc-pp · 2025-01-07T13:15:34Z

> > I almost forgot this PR :-) If we have consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I proceed?
>
> From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.
>
> I'm fine with this PR, but we need the OK from the GCC and LLVM community.
>
> I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

Aha! I have done it (llvm/llvm-project#96283) before, but I forgot it. :-)

jrtc27 · 2025-01-16T15:21:49Z

How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

wangpc-pp · 2025-01-16T15:29:53Z

> How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

I have the same question, which is why I haven't merged the implementation. Maybe we can provide a higher-level abstraction that generates LR/SC+WRS directly?

aswaterman · 2025-01-16T23:07:40Z

We probably don't want to add intrinsics that expose LR and SC individually because of the lack of a forward-progress guarantee for unconstrained LR/SC loops. Exposing higher-level abstractions that use LR/SC under the hood avoids this pitfall.

This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

We could define e.g. `wrs_sto_until(addr, value)` to emit something like `1: lr.d t0, (addr); bne t0, value, 1f; wrs.sto; j 1b; 1:`, and `wrs_sto_while` that inverts the branch condition, but unfortunately there might be a proliferation of these for more complex loop-exit conditions. I guess at some point you just need to write assembly code.
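To make the shape of such a helper concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions, not a proposed API: `wait_while_equal` is a hypothetical name, and it follows the quoted sequence literally, so it returns as soon as the loaded value differs from `value`. The RISC-V path keeps the whole LR/WRS loop inside one asm statement so the compiler cannot place a spill between the `lr.d` and the `wrs.sto`. That path assumes RV64 with A and Zawrs enabled and has not been validated on hardware. Other targets get a plain spin loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not part of any existing API): block while *addr
 * still holds `value`, and return the first differing value observed.
 * No acquire ordering is implied; callers add fences as needed. */
static inline uint64_t wait_while_equal(_Atomic uint64_t *addr, uint64_t value)
{
    uint64_t seen;
#if defined(__riscv) && (__riscv_xlen == 64) && \
    defined(__riscv_atomic) && defined(__riscv_zawrs)
    /* Assumption: the toolchain accepts wrs.sto when Zawrs is enabled.
     * The loop lives in a single asm statement, so no compiler-generated
     * memory access can land between the LR and the WRS. */
    __asm__ volatile(
        "1: lr.d %0, (%1)\n\t"   /* load and register a reservation      */
        "   bne  %0, %2, 2f\n\t" /* value changed: leave the loop        */
        "   wrs.sto\n\t"         /* stall until reservation loss/timeout */
        "   j    1b\n\t"
        "2:"
        : "=&r"(seen)
        : "r"(addr), "r"(value)
        : "memory");
#else
    /* Portable fallback: same exit condition, plain busy-wait. */
    while ((seen = atomic_load_explicit(addr, memory_order_relaxed)) == value) {
        /* spin */
    }
#endif
    return seen;
}
```

A caller waiting for a lock word to change from its locked value would call `wait_while_equal(&lock_word, LOCKED)`, where both names are placeholders. Any loop-exit condition richer than "differs from `value`" would need another helper of this kind, which is the proliferation concern raised above.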

jrtc27 · 2025-01-16T23:15:22Z

> This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

But if there's an LR that comes in between for whatever reason then you may have a valid reservation for a different location. Waiting with a timeout will then risk waiting a bit too long, but waiting without a timeout will risk waiting indefinitely even though the intended location has been modified. Probably there will be no LR added by the compiler, but who's to say for sure...

aswaterman · 2025-01-16T23:28:17Z

Indeed. In any case, the upshot is that we shouldn't go down the route of adding intrinsics for the primitives. Adding intrinsics for entire WRS loops, maybe, if we think we can capture the important use cases without a proliferation of new intrinsics.

wangpc-pp mentioned this pull request Jun 6, 2024

[RISCV] Add riscv_atomic.h and Zawrs/Zalrsc builtins llvm/llvm-project#94578

Closed

wangpc-pp mentioned this pull request Jun 21, 2024

[RISCV] Add riscv_atomic.h and Zawrs builtins llvm/llvm-project#96283

Open

topperc reviewed Jun 22, 2024

topperc reviewed Jul 16, 2024

topperc reviewed Jul 16, 2024

wangpc-pp force-pushed the main branch from c8b2e50 to 767e3ed Compare July 16, 2024 13:59

wangpc-pp force-pushed the main branch from 767e3ed to 66cf1c5 Compare January 7, 2025 13:25
