This repository has been archived by the owner on Mar 20, 2024. It is now read-only.
POST V1.0 - support .vvm variants for slide1up/slide1down #358
Labels
Resolve after v1.0
Does not need to be resolved for v1.0 draft
This variant is a simple, low-cost extension, consistent with the merge instructions:
Description:
This .vvm variant uses the vs1 vector register group as the source for the non-masked elements, replacing rd for this purpose. All tail elements are copied over from vs1.
As with the .vx variants, the destination cannot overlap either source.
The .vv non-masked variant should be reserved, as it is equivalent to the .vx non-masked variant.
Its cost is then comparable to that of other variants of base instructions.
To be consistent with the slideup/slidedown instructions, when the relevant mask bit is set (v0[0] for slide1up and v0[vl-1] for slide1down), binary zero is the replacement value of the "shifted out element".
Equivalences:
Similarly, vslide1down.vvm is equivalent to the corresponding two-instruction sequence.
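The equivalence can be sketched with a small Python model. This is only one reading of the description above, not a definitive specification. It assumes that "non-masked" means elements whose v0 bit is clear, that the two-instruction sequence is a vector move from vs1 followed by a masked slide1up with a zero scalar, and that the move copies the whole register group so that tail elements match. All function names are hypothetical.

```python
def vslide1up_vvm(vs2, vs1, v0, vl):
    """Hypothetical model of the proposed vslide1up.vvm."""
    vd = list(vs1)  # inactive and tail elements come from vs1
    for i in range(vl):
        if v0[i]:
            # Active element: take the neighbour below; element 0 gets zero.
            vd[i] = vs2[i - 1] if i > 0 else 0
    return vd


def vslide1up_vx_masked(vd, vs2, x, v0, vl):
    """Model of a masked vslide1up.vx; inactive and tail elements keep vd."""
    out = list(vd)
    for i in range(vl):
        if v0[i]:
            out[i] = vs2[i - 1] if i > 0 else x
    return out


def two_instruction_sequence(vs2, vs1, v0, vl):
    """Assumed equivalent: copy vs1 into vd, then masked slide1up with x = 0."""
    vd = list(vs1)  # stands in for the vmv (assumed to copy the whole group)
    return vslide1up_vx_masked(vd, vs2, 0, v0, vl)


vs2 = [10, 11, 12, 13, 14, 15, 16, 17]
vs1 = [20, 21, 22, 23, 24, 25, 26, 27]
v0 = [1, 0, 1, 1, 0, 1, 0, 0]
single = vslide1up_vvm(vs2, vs1, v0, vl=6)
paired = two_instruction_sequence(vs2, vs1, v0, vl=6)
print(single == paired)
```

On real hardware, whether the tail matches depends on the tail policy and on how the move is performed, so the whole-group copy here is a simplification.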
Addresses design restriction:
To allow slide1up/slide1down to be restartable, the destination cannot overlap the source vector register groups.
As a result, without the .vvm variant, a vmv or equivalent is required to address this restriction when it arises.
Known application:
The DUPH operation in the FFTW3 library can be implemented in a single instruction with a mask of (0,1,0,1,...).
Similarly, the DUPL operation can be implemented in a single instruction (with the complement mask in v0).
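The pairwise duplication can be illustrated with a small Python model, under some assumptions. It takes "mask bit set" to mean the element receives the slid value, lists masks with element 0 first, and uses the same register as both vs1 and vs2. The function names are hypothetical. Which result is DUPH and which is DUPL depends on mask ordering and polarity conventions that the issue does not spell out, so the results are named only by which elements they duplicate.

```python
def slide1up_vvm(vs2, vs1, v0, vl):
    """Hypothetical model: active elements take vs2[i-1] (zero at i = 0)."""
    vd = list(vs1)  # inactive and tail elements come from vs1
    for i in range(vl):
        if v0[i]:
            vd[i] = vs2[i - 1] if i > 0 else 0
    return vd


def slide1down_vvm(vs2, vs1, v0, vl):
    """Hypothetical model: active elements take vs2[i+1] (zero at i = vl-1)."""
    vd = list(vs1)
    for i in range(vl):
        if v0[i]:
            vd[i] = vs2[i + 1] if i < vl - 1 else 0
    return vd


a = [1, 2, 3, 4, 5, 6, 7, 8]     # e.g. interleaved (low, high) pairs
mask = [0, 1, 0, 1, 0, 1, 0, 1]  # the (0,1,0,1,...) mask
complement = [1 - m for m in mask]

# Each odd-indexed element is overwritten by its lower neighbour.
dup_even = slide1up_vvm(a, a, mask, vl=8)
# Each even-indexed element is overwritten by its upper neighbour.
dup_odd = slide1down_vvm(a, a, complement, vl=8)
print(dup_even, dup_odd)
```

Because the zero replacement only applies to v0[0] for slide1up and v0[vl-1] for slide1down, and both of those bits are clear in the masks used here, no zero is introduced in either result.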
Alternatives:
The two-instruction equivalence above could be optimized via fusion or chaining. However, the RISC-V synchronous interrupt requirement imposes considerable constraints on vector implementation designs. In particular, register renaming, re-buffering or deferring interrupts is required.
If the V synchronous processing requirement were relaxed, numerous chaining and, especially, fusing opportunities would be available to in-order non-speculative implementations. It would be my preference to prescribe such relaxation when it is only visible from interrupt context.
Even so, this slide1 .vvm case would still weigh toward inclusion: it requires minimal additional gates and would therefore be available to even the simplest implementations.