Contribute ARC feedback on Zilsd #45

Merged
merged 5 commits into from
Sep 18, 2024
Changes from 2 commits
59 changes: 22 additions & 37 deletions zilsd.adoc
@@ -87,42 +87,27 @@ LD instructions with destination `x0` are processed as any other load, but the r

When using `x0` as `src` of SD or C.SDSP, the entire 64-bit operand is zero — i.e., register `x1` is not accessed.

=== Fault Handling

In implementations that crack Zilsd instructions for sequential execution, correct execution requires addressing idempotent memory, because the hart must be able to handle traps detected during the sequence. The entire sequence is re-executed after returning from the trap handler, and multiple traps are possible during the sequence.

[NOTE]
====
It is implementation defined whether interrupts can also be taken during the sequence execution.
====

=== Software view of the load/store pair sequence

From a software perspective, load/store pair instructions appear as:

* load instructions:
** A sequence of one or more loads reading the bytes of the doubleword without updating rd or rd+1
*** If the effective address is 4B aligned:
**** The bytes are grouped into word accesses.
**** The words may be loaded in any order.
**** The words may be grouped into doubleword accesses.
**** Any of the words may be loaded multiple times.
*** Else:
**** The bytes may be loaded in any order.
**** The bytes may be grouped into larger accesses.
**** Any of the bytes may be loaded multiple times.
** An atomic write of the load result into rd and rd+1
* store instructions:
** A sequence of one or more stores writing the bytes of the doubleword
*** If the effective address is 4B aligned:
**** The bytes are grouped into word accesses.
**** The words may be stored in any order.
**** The words may be grouped into doubleword accesses.
**** Any of the words may be stored multiple times.
*** Else:
**** The bytes may be stored in any order.
**** The bytes may be grouped into larger accesses.
**** Any of the bytes may be stored multiple times.
=== Exception Handling

For the purposes of RVWMO and exception handling, LD and SD instructions are
considered to be misaligned loads and stores, with one additional constraint:
Collaborator

I think this text is a bit confusing - this only applies in case the access is not 8-byte aligned, right? Right now it can read as if LD and SD are always considered to be misaligned accesses.

Member Author

@aswaterman aswaterman Sep 18, 2024

It's deliberate, though. 8-byte alignment offers no special benefit for these instructions. (Of course, the uarch is likely to optimize them, but that's not the point here.) The only architecturally visible question is whether they're 4-byte aligned.

Collaborator

@tovine tovine Sep 18, 2024

So what you're saying is that implementations are free to consider them misaligned (and trap) regardless of actual alignment?
I'd very much hope that compilers will (or at least can) be instructed to make them 8-byte aligned, otherwise they bring almost none of the originally intended benefits over a pair of normal LW/SW.
One such intended benefit is to allow low-end MCU-class cores to better utilize the bus if they have one that is wider than the native machine width, but this is only possible if the wider access is then aligned, and those cores don't typically have the same kind of macro-op fusion capabilities as larger ones.

Member Author

@aswaterman aswaterman Sep 18, 2024

This is just a rhetorical tactic to simplify the description of the semantics, not a means to recommend what implementations should do. Obviously it's a good thing for implementations to implement these ops as wide bus xacts when possible. That doesn't run afoul of anything in the spec.

an LD or SD instruction whose effective address is a multiple of 4 gives rise
to two 4-byte memory operations.

NOTE: This definition permits LD and SD instructions giving rise to exactly one
memory access, regardless of alignment.
It also permits decomposing instructions with 4-byte-aligned effective
Collaborator

This seems redundant with the normative definition.

Member Author

I can get behind deleting this NOTE if you think it's unhelpfully redundant.

Collaborator

I have tried to update the note to be less repetitive and also clarify that the accesses are atomic. Maybe this helps with @tovine's comment.

addresses into two operations, with no constraint on the order in which the
operations are performed.
These decomposed sequences are interruptible.
Exceptions might occur on subsequent operations, making the effects of previous
operations within the same instruction visible.

NOTE: Software should make no assumptions about the number or order of
accesses these instructions might give rise to, beyond the 4-byte constraint
mentioned above.
For example, an interrupted store might overwrite the same bytes upon return
from the interrupt handler.
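
Illustration (not part of the diff): the NOTE's example of an interrupted store can be sketched as a small Python model. Everything here is an assumption made for illustration: the names `sd_decomposed`, `run_with_trap`, and `TrapTaken` are hypothetical, memory is a little-endian `bytearray`, and the trap point is chosen by hand. It shows only one permitted behaviour, namely an SD to a 4-byte-aligned address performed as two 4-byte stores, interrupted after the first one and then re-executed.

```python
class TrapTaken(Exception):
    """Hypothetical stand-in for an interrupt or exception taken mid-instruction."""


def sd_decomposed(mem, log, addr, value, order=(0, 1), trap_before_step=None):
    """Model SD at a 4-byte-aligned address as two 4-byte stores.

    order            -- which word is stored first (0 = low word, 1 = high word);
                        the spec text places no constraint on this order.
    trap_before_step -- if set, raise TrapTaken before that step (illustrative only).
    """
    assert addr % 4 == 0, "this sketch only covers the 4-byte-aligned case"
    words = [value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF]
    for step, w in enumerate(order):
        if trap_before_step == step:
            raise TrapTaken(step)
        a = addr + 4 * w
        mem[a:a + 4] = words[w].to_bytes(4, "little")
        log.append((a, 4))  # record (address, size) of each store performed


def run_with_trap(addr, value):
    """Execute the store once with a trap after the first word, then re-execute it."""
    mem = bytearray(16)
    log = []
    try:
        sd_decomposed(mem, log, addr, value, trap_before_step=1)
    except TrapTaken:
        pass  # handler returns; the whole instruction is executed again
    sd_decomposed(mem, log, addr, value)
    return mem, log
```

In this model the low word is written twice, yet the final memory contents match an uninterrupted store. That is harmless for ordinary memory, but it is why software cannot count on a fixed number of accesses.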

<<<

@@ -300,4 +285,4 @@ Stores a 64-bit value from registers `rs2'` and `rs2'+1`.
It computes an effective address by adding the zero-extended offset, scaled by 8, to the base address in register rs1'.
It expands to `sd rs2', offset(rs1')`.

Included in: <<zclsd>>
Included in: <<zclsd>>
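
Illustration (not part of the diff): the effective-address rule in the hunk above ("zero-extended offset, scaled by 8") can be sketched as follows. The function name `c_sd_effective_address` is hypothetical, and the 5-bit width of the unscaled offset field is an assumption taken from the standard C.SD immediate format rather than from the text shown here. Addresses wrap at 32 bits because Zilsd targets RV32.

```python
XLEN = 32  # Zilsd/Zclsd are RV32 extensions


def c_sd_effective_address(base, uimm_field):
    """Return the effective address for a compressed store-doubleword.

    base       -- value of the base register rs1'
    uimm_field -- unscaled, unsigned offset field (assumed to be 5 bits wide)
    """
    if not 0 <= uimm_field < 32:
        raise ValueError("offset field is assumed to be a 5-bit unsigned value")
    offset = uimm_field << 3  # zero-extended, scaled by 8: 0, 8, ..., 248
    return (base + offset) & ((1 << XLEN) - 1)
```

For example, a base of `0x1000` with a field value of 3 gives address `0x1018`. Because the offset is always a multiple of 8, the alignment of the access is determined entirely by the base register.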