Contribute ARC feedback on Zilsd #45
Changes from 2 commits
@@ -87,42 +87,27 @@ LD instructions with destination `x0` are processed as any other load, but the r

When using `x0` as `src` of SD or C.SDSP, the entire 64-bit operand is zero; that is, register `x1` is not accessed.
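
To make the `x0` rule concrete, the following C sketch shows how a simulator might form the 64-bit store operand of SD or C.SDSP on RV32. The `hart_regs` type and `sd_operand` function are hypothetical, and the sketch assumes that the even register supplies the low word and its odd partner the high word. The point it illustrates is that a source of `x0` yields zero without reading `x1`, even when `x1` holds a nonzero value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RV32 register file: 32 registers of 32 bits each. */
typedef struct {
    uint32_t x[32];
} hart_regs;

/* Form the 64-bit store operand from the even register rs2 and rs2+1.
 * Assumption: rs2 holds the low word and rs2+1 the high word.
 * When rs2 is x0, the whole operand is zero and x1 is never read. */
static uint64_t sd_operand(const hart_regs *r, unsigned rs2) {
    assert(rs2 < 32 && rs2 % 2 == 0);   /* odd register numbers are reserved */
    if (rs2 == 0) {
        return 0;                        /* x0 source: operand is all zeros */
    }
    return ((uint64_t)r->x[rs2 + 1] << 32) | r->x[rs2];
}
```

For example, with `x1` set to `0xdeadbeef`, `sd_operand(&r, 0)` still returns 0.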

=== Fault Handling

In implementations that crack Zilsd instructions for sequential execution, correct execution requires that the accessed memory be idempotent, because the hart must be able to handle traps detected during the sequence. The entire sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence.

[NOTE]
====
It is implementation-defined whether interrupts can also be taken during the sequence execution.
====
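
The following C sketch illustrates why re-execution requires idempotent memory. It models a hypothetical memory-mapped FIFO (`fifo_dev`) in which every 32-bit read consumes an entry, and a cracked LD (`cracked_ld`) performed as two sequential reads. The trap on the first attempt and the retry loop in `ld_with_retry` stand in for trap handling; they are assumptions of the sketch, not part of the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory-mapped FIFO: every 32-bit read pops one entry,
 * so reading the same address twice returns different data. */
typedef struct {
    uint32_t entries[8];
    unsigned head;
} fifo_dev;

static uint32_t fifo_read32(fifo_dev *d) {
    assert(d->head < 8);
    return d->entries[d->head++];        /* side effect: entry consumed */
}

/* A cracked LD executed as two sequential 32-bit reads. If
 * trap_before_second is set, a trap is taken after the first read and
 * the function returns false without writing the register pair. */
static bool cracked_ld(fifo_dev *d, uint32_t *rd, uint32_t *rd1,
                       bool trap_before_second) {
    uint32_t lo = fifo_read32(d);
    if (trap_before_second) {
        return false;                    /* rd/rd+1 untouched; lo is lost */
    }
    uint32_t hi = fifo_read32(d);
    *rd = lo;                            /* commit both halves together */
    *rd1 = hi;
    return true;
}

/* Execute the LD; if it traps, re-execute the whole sequence after the
 * (simulated) trap handler returns. Only the first attempt traps here. */
static void ld_with_retry(fifo_dev *d, uint32_t *rd, uint32_t *rd1) {
    bool first_attempt = true;
    while (!cracked_ld(d, rd, rd1, first_attempt)) {
        first_attempt = false;
    }
}
```

With entries 1, 2, 3, and so on, the retried LD returns 2 and 3 rather than 1 and 2: the entry consumed by the interrupted attempt is lost, which is exactly the hazard the idempotency requirement rules out.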

=== Software view of the load/store pair sequence

From a software perspective, load/store pair instructions appear as:

* load instructions:
** A sequence of one or more loads reading the bytes of the doubleword without updating rd or rd+1
*** If the effective address is 4-byte aligned:
**** The bytes are grouped into word accesses.
**** The words may be loaded in any order.
**** The words may be grouped into doubleword accesses.
**** Any of the words may be loaded multiple times.
*** Else:
**** The bytes may be loaded in any order.
**** The bytes may be grouped into larger accesses.
**** Any of the bytes may be loaded multiple times.
** An atomic write of the load result into rd and rd+1
* store instructions:
** A sequence of one or more stores writing the bytes of the doubleword
*** If the effective address is 4-byte aligned:
**** The bytes are grouped into word accesses.
**** The words may be stored in any order.
**** The words may be grouped into doubleword accesses.
**** Any of the words may be stored multiple times.
*** Else:
**** The bytes may be stored in any order.
**** The bytes may be grouped into larger accesses.
**** Any of the bytes may be stored multiple times.
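
As a sketch of the load rules above, the C function below performs an RV32 LD byte by byte in a caller-supplied order, allowing repeats, and writes the register pair only once all eight bytes have been read. Little-endian memory, the low word landing in rd, and the name `ld_bytes_in_order` are assumptions; a real implementation would typically group bytes into larger accesses, which the list also permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the eight bytes of the doubleword at mem[ea..ea+7] in the order
 * given by order[0..n-1] (indices 0..7, repeats allowed), then write
 * rd and rd+1 in a single final step. */
static void ld_bytes_in_order(const uint8_t *mem, size_t ea,
                              const unsigned *order, size_t n,
                              uint32_t *rd, uint32_t *rd1) {
    uint8_t buf[8] = {0};
    unsigned seen = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned k = order[i];
        assert(k < 8);
        buf[k] = mem[ea + k];            /* re-reading a byte is allowed */
        seen |= 1u << k;
    }
    assert(seen == 0xFFu);               /* every byte read at least once */

    /* Assemble little-endian words; registers are untouched until here. */
    uint32_t lo = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
                  (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
    uint32_t hi = (uint32_t)buf[4] | (uint32_t)buf[5] << 8 |
                  (uint32_t)buf[6] << 16 | (uint32_t)buf[7] << 24;
    *rd = lo;
    *rd1 = hi;
}
```

Because the register pair is written only after the loop, a trap during the reads leaves rd and rd+1 unchanged, and for memory that is not modified in the meantime any read order yields the same result.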

=== Exception Handling

For the purposes of RVWMO and exception handling, LD and SD instructions are
considered to be misaligned loads and stores, with one additional constraint:
an LD or SD instruction whose effective address is a multiple of 4 gives rise
to two 4-byte memory operations.
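
The decomposition constraint can be sketched in C as a helper that lists the memory operations a splitting implementation might generate. The `mem_op` type and `split_ld_sd` name are hypothetical, and byte-sized operations in the misaligned case are only one of the choices a misaligned access permits; a single 8-byte operation would also be allowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;   /* byte address of the operation */
    unsigned size;   /* access size in bytes */
} mem_op;

/* Hypothetical split of an LD/SD at effective address ea.
 * 4-byte-aligned ea: exactly two 4-byte operations (their order is
 * unconstrained; low word listed first). Otherwise: treated as a
 * misaligned access, here split into eight 1-byte operations. */
static size_t split_ld_sd(uint32_t ea, mem_op ops[8]) {
    size_t n = 0;
    if (ea % 4 == 0) {
        ops[n++] = (mem_op){ea, 4};
        ops[n++] = (mem_op){ea + 4, 4};
    } else {
        for (uint32_t i = 0; i < 8; i++) {
            ops[n++] = (mem_op){ea + i, 1};
        }
    }
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += ops[i].size;
    }
    assert(total == 8);   /* the operations cover exactly the doubleword */
    return n;
}
```

An effective address such as `0x1004` is 4-byte aligned but not 8-byte aligned; it still yields two 4-byte operations, since only 4-byte alignment is architecturally visible for these instructions.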

NOTE: This definition permits LD and SD instructions giving rise to exactly one
memory access, regardless of alignment.
It also permits decomposing instructions with 4-byte-aligned effective
addresses into two operations, with no constraint on the order in which the
operations are performed.

christian-herber-nxp marked this conversation as resolved.

This seems redundant with the normative definition.

I can get behind deleting this NOTE if you think it's unhelpfully redundant.

I have tried to update the note to be less repetitive and also clarify that the accesses are atomic. Maybe this helps with @tovine's comment.

aswaterman marked this conversation as resolved.

These decomposed sequences are interruptible.
An exception might be raised on a later operation of the sequence, making the
effects of earlier operations within the same instruction visible.

NOTE: Software should make no assumptions about the number or order of
accesses these instructions might give rise to, beyond the 4-byte constraint
mentioned above.
For example, an interrupted store might overwrite the same bytes upon return
from the interrupt handler.
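
The following C sketch models the interrupted-store case: a 4-byte-aligned SD split into two word stores. The `sw` and `split_sd` helpers, little-endian byte order, and storing the low word first are assumptions for illustration; the text places no constraint on the order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store one 32-bit word into a little-endian byte array. */
static void sw(uint8_t *mem, uint32_t addr, uint32_t val) {
    for (unsigned i = 0; i < 4; i++) {
        mem[addr + i] = (uint8_t)(val >> (8 * i));
    }
}

/* A 4-byte-aligned SD split into two word stores, low word first.
 * If interrupt_after_first is set, the sequence is interrupted after the
 * first store and returns false; the low word is already in memory. */
static bool split_sd(uint8_t *mem, uint32_t ea, uint32_t rs2, uint32_t rs2p1,
                     bool interrupt_after_first) {
    assert(ea % 4 == 0);
    sw(mem, ea, rs2);
    if (interrupt_after_first) {
        return false;
    }
    sw(mem, ea + 4, rs2p1);
    return true;
}
```

If the first attempt is interrupted, the low word is already visible to other observers while the high word still holds its old value; re-executing the SD from the start writes the low-word bytes a second time before completing the high word.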

<<<

@@ -300,4 +285,4 @@ Stores a 64-bit value from registers `rs2'` and `rs2'+1`.
It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'.
It expands to `sd rs2', offset(rs1')`.

Included in: <<zclsd>>

I think this text is a bit confusing - this only applies in case the access is not 8-byte aligned, right? Right now it can read as if LD and SD are always considered to be misaligned accesses.

It's deliberate, though. 8-byte alignment offers no special benefit for these instructions. (Of course, the uarch is likely to optimize them, but that's not the point here.) The only architecturally visible question is whether they're 4-byte aligned.

So what you're saying is that implementations are free to consider them misaligned (and trap) regardless of actual alignment?
I'd very much hope that compilers will (or at least can) be instructed to make them 8-byte aligned, otherwise they bring almost none of the originally intended benefits over a pair of normal LW/SW.
One such intended benefit is to allow low-end MCU-class cores to better utilize the bus if they have one that is wider than the native machine width, but this is only possible if the wider access is then aligned, and those cores don't typically have the same kind of macro-op fusion capabilities as larger ones.

This is just a rhetorical tactic to simplify the description of the semantics, not a means to recommend what implementations should do. Obviously it's a good thing for implementations to implement these ops as wide bus xacts when possible. That doesn't run afoul of anything in the spec.