-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add atomic intrinsics #79
base: main
Are you sure you want to change the base?
Conversation
I don't know if we should add intrinsics for AMO instructions as we have some "standard" C builtins for AMO. |
Ping. :-) |
@wangpc-pp is there a GCC or LLVM patch that adds support for this? |
I almost forgot this PR :-) |
From a comment in the referenced LLVM PR (llvm/llvm-project#94578 (comment)), I can see that this was discussed, and there is acceptance for Zawrs intrinsics, but not for a particular API. I'm fine with this PR, but we need the OK from the GCC and LLVM community. I suggest you create an LLVM PR that adds this API (nothing else). Given that this would be a subset of your previous PR, it should be done relatively quickly. |
Aha! I have done it (llvm/llvm-project#96283) before, but I forgot it. :-) |
We need this when optimizing some locks' implementation, but using inline assembly will stop some optimizations. For example, if we use inline assembly, the compiler doesn't know `lr.w` will sign-extend the result. ARM provides similar intrinsics like `__builtin_arm_ldaex`. Currently, we only add intrinsics for `Zawrs`.
How are you supposed to use these? WRS refers to the current reservation, but the only way to have control over that is with LR, which means it has to be in assembly already, surely? |
I have the question as well so I haven't merged the implementation. Maybe we can provide a higher abstract layer that generates LR/SC+WRS directly? |
We probably don't want to add intrinsics that expose LR and SC individually because of the lack of a forward-progress guarantee for unconstrained LR/SC loops. Exposing higher-level abstractions that use LR/SC under the hood avoids this pitfall. This is slightly less of a concern for LR/WRS since there's no functional bug if the reservation yields before the WRS is executed, but it is still not ideal. Some cores will eagerly yield the reservation as soon as they hit another memory access (whether or not it's an SC), so if the compiler slips a register spill in between the LR and the WRS, such cores won't ever enter a lower-power state. So I think we want higher-level abstractions in this case, too. We could define e.g. |
But if there's an LR that comes in between for whatever reason then you may have a valid reservation for a different location. Waiting with a timeout will then risk waiting a bit too long, but waiting without a timeout will risk waiting indefinitely even though the intended location has been modified. Probably there will be no LR added by the compiler, but who's to say for sure... |
Indeed. In any case, the upshot is that we shouldn't go down the route of adding intrinsics for the primitives. Adding intrinsics for entire WRS loops, maybe, if we think we can capture the important use cases without a proliferation of new intrinsics. |
We need this when optimizing some locks' implementation, but using inline assembly will stop some optimizations. For example, if we use inline assembly, the compiler doesn't know
lr.w
will sign-extend the result.ARM provides similar intrinsics like
__builtin_arm_ldaex
.Currently, we only add intrinsics for
Zawrs
.