Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add atomic intrinsics #79

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

wangpc-pp
Copy link
Contributor

@wangpc-pp wangpc-pp commented Jun 6, 2024

We need this when optimizing some locks' implementation, but using inline assembly will stop some optimizations. For example, if we use inline assembly, the compiler doesn't know lr.w will sign-extend the result.

ARM provides similar intrinsics like __builtin_arm_ldaex.

Currently, we only add intrinsics for Zawrs.

@wangpc-pp
Copy link
Contributor Author

I don't know if we should add intrinsics for AMO instructions as we have some "standard" C builtins for AMO.

riscv-c-api.md Outdated Show resolved Hide resolved
@wangpc-pp
Copy link
Contributor Author

Ping. :-)

riscv-c-api.md Outdated Show resolved Hide resolved
riscv-c-api.md Outdated Show resolved Hide resolved
@cmuellner
Copy link
Collaborator

@wangpc-pp is there a GCC or LLVM patch that adds support for this?
I see the referenced LLVM PR (which has been closed), which also included LR and SC.

@wangpc-pp
Copy link
Contributor Author

@wangpc-pp is there a GCC or LLVM patch that adds support for this? I see the referenced LLVM PR (which has been closed), which also included LR and SC.

I almost forgot this PR :-)
If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

@cmuellner
Copy link
Collaborator

I almost forgot this PR :-) If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.

I'm fine with this PR, but we need the OK from the GCC and LLVM community.

I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

@wangpc-pp
Copy link
Contributor Author

I almost forgot this PR :-) If we have the consensus on accepting this, I can create another PR with only Zawrs. WDYT? Should I preceed on?

From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API.

I'm fine with this PR, but we need the OK from the GCC and LLVM community.

I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly.

Aha! I have done it (llvm/llvm-project#96283) before, but I forgot it. :-)

We need this when optimizing some locks' implementation, but using
inline assembly will stop some optimizations. For example, if we use
inline assembly, the compiler doesn't know `lr.w` will sign-extend the
result.

ARM provides similar intrinsics like `__builtin_arm_ldaex`.

Currently, we only add intrinsics for `Zawrs`.
@jrtc27
Copy link

jrtc27 commented Jan 16, 2025

How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

@wangpc-pp
Copy link
Contributor Author

How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely?

I have the question as well so I haven't merged the implementation. Maybe we can provide a higher abstract layer that generates LR/SC+WRS directly?

@aswaterman
Copy link
Contributor

aswaterman commented Jan 16, 2025

We probably don't want to add intrinsics that expose LR and SC individually because of the lack of a forward-progress guarantee for unconstrained LR/SC loops. Exposing higher-level abstractions that use LR/SC under the hood avoids this pitfall.

This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

We could define e.g. wrs_sto_until(addr, value) to emit something like 1: lr.d t0, (addr); bne t0, value, 1f; wrs.sto; j 1b; 1:, and wrs_sto_while that inverts the branch condition, but unfortunately there might be a proliferation of these for more complex loop-exit conditions. I guess at some point you just need to write assembly code.

@jrtc27
Copy link

jrtc27 commented Jan 16, 2025

This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too.

But if there's an LR that comes in between for whatever reason then you may have a valid reservation for a different location. Waiting with a timeout will then risk waiting a bit too long, but waiting without a timeout will risk waiting indefinitely even though the intended location has been modified. Probably there will be no LR added by the compiler, but who's to say for sure...

@aswaterman
Copy link
Contributor

Indeed. In any case, the upshot is that we shouldn't go down the route of adding intrinsics for the primitives. Adding intrinsics for entire WRS loops, maybe, if we think we can capture the important use cases without a proliferation of new intrinsics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants