POST V1.0 - support .vvm variants for slide1up/slide1down #358

David-Horner · 2020-01-02T06:52:29Z

This variant is a simple, low-cost extension that is consistent with the merge instructions:

Description:
This .vvm variant uses the vs1 vector register group as the source for the non-masked elements, replacing vd for this purpose. All tail elements are copied over from vs1.
As with the .vx variants, the destination cannot overlap either source.

The .vv non-masked variant should be reserved, as it is equivalent to the non-masked .vx variant.

It is a natural extension to the instruction and it parallels the .vvm variant of merge.

Its cost, then, is comparable to that of other variants of base instructions.

To be consistent with the slideup/slidedown instructions, when the relevant mask bit is set (v0[0] for slide1up and v0[vl-1] for slide1down), binary zero is the replacement value for the "shifted out element".
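The description above can be read as the following element-wise model. This is only a sketch of the proposed semantics as I understand them, not spec text: the function names, the list-based register representation, and the treatment of every element at index vl or above as "tail" are assumptions made for illustration.

```python
def vslide1up_vvm(vs2, vs1, mask, vl):
    """Hypothetical model of the proposed vslide1up.vvm vd, vs2, vs1, v0."""
    vd = list(vs1)  # non-masked body elements and all tail elements come from vs1
    for i in range(vl):
        if mask[i]:
            # Element 0 has no lower neighbour, so binary zero is used instead.
            vd[i] = 0 if i == 0 else vs2[i - 1]
    return vd


def vslide1down_vvm(vs2, vs1, mask, vl):
    """Hypothetical model of the proposed vslide1down.vvm vd, vs2, vs1, v0."""
    vd = list(vs1)
    for i in range(vl):
        if mask[i]:
            # Element vl-1 has no upper neighbour, so binary zero is used instead.
            vd[i] = 0 if i == vl - 1 else vs2[i + 1]
    return vd


vs2 = [10, 20, 30, 40, 50, 60]
vs1 = [1, 2, 3, 4, 5, 6]
mask = [1, 0, 1, 1, 0, 0]

print(vslide1up_vvm(vs2, vs1, mask, vl=4))    # [0, 2, 20, 30, 5, 6]
print(vslide1down_vvm(vs2, vs1, mask, vl=4))  # [20, 2, 40, 0, 5, 6]
```

With vl=4, elements 4 and 5 are tail elements and come straight from vs1 in both cases.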

Equivalences:

vslide1up.vvm vd, vs2, vs1, v0 replaces the two-instruction sequence:

    vmv.v.v vd, vs1
    vslide1up.vx vd, vs2, x0, v0.t

Similarly, vslide1down.vvm is equivalent to the corresponding two-instruction sequence.
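The claimed equivalence for slide1up can be checked against a small model. The sketch below assumes that the move copies the whole register group (so tail elements also come from vs1) and that the masked .vx form leaves masked-off and tail elements of vd undisturbed. All function names are hypothetical.

```python
from itertools import product


def vmv(vs1):
    """Assumed model of the move: copy the whole register group."""
    return list(vs1)


def vslide1up_vx_masked(vd, vs2, x, mask, vl):
    """Assumed model of a masked vslide1up.vx: inactive and tail elements keep vd."""
    out = list(vd)
    for i in range(vl):
        if mask[i]:
            out[i] = x if i == 0 else vs2[i - 1]
    return out


def vslide1up_vvm(vs2, vs1, mask, vl):
    """Hypothetical model of the proposed single instruction."""
    vd = list(vs1)
    for i in range(vl):
        if mask[i]:
            vd[i] = 0 if i == 0 else vs2[i - 1]
    return vd


def check_equivalence(vlmax=4):
    """Compare both forms over every mask and every vl up to vlmax."""
    vs2 = [10 * (i + 1) for i in range(vlmax)]
    vs1 = [i + 1 for i in range(vlmax)]
    for vl in range(vlmax + 1):
        for mask in product([0, 1], repeat=vlmax):
            two_step = vslide1up_vx_masked(vmv(vs1), vs2, 0, mask, vl)  # x0 reads as 0
            if two_step != vslide1up_vvm(vs2, vs1, mask, vl):
                return False
    return True


print(check_equivalence())
```

The check only confirms that the two models agree under the stated assumptions. It says nothing about hardware cost or restartability.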

Addresses design restriction:

To allow slide1up/slide1down to be restartable, the destination cannot overlap the source vector register group.
As a result, without the .vvm variant, a vmv or equivalent is required to work around this restriction when it arises.

Known application:

The DUPH operation in the FFTW3 library can be implemented in a single instruction with a mask of (0,1,0,1,....)

    vslide1up.vvm  vd, vs1, vs1, v0

Similarly, the DUPL operation can be implemented in a single instruction (with the complementary mask in v0).
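To make the pair-duplication pattern concrete, here is a small model of the instruction above applied to interleaved data. It assumes the mask (0,1,0,1,...) is listed from element 0 upward, and it assumes the companion operation uses vslide1down with the complementary mask, which the text above does not state. Which of the two results corresponds to FFTW's DUPH and which to DUPL depends on the lane-numbering convention, so the helper names below are deliberately neutral and hypothetical.

```python
def vslide1up_vvm(vs2, vs1, mask, vl):
    """Hypothetical model of the proposed vslide1up.vvm."""
    vd = list(vs1)
    for i in range(vl):
        if mask[i]:
            vd[i] = 0 if i == 0 else vs2[i - 1]
    return vd


def vslide1down_vvm(vs2, vs1, mask, vl):
    """Hypothetical model of the proposed vslide1down.vvm."""
    vd = list(vs1)
    for i in range(vl):
        if mask[i]:
            vd[i] = 0 if i == vl - 1 else vs2[i + 1]
    return vd


def dup_even(v):
    """vslide1up.vvm vd, vs1, vs1, v0 with mask (0,1,0,1,...)."""
    mask = [i % 2 for i in range(len(v))]
    return vslide1up_vvm(v, v, mask, len(v))


def dup_odd(v):
    """Assumed companion: vslide1down.vvm with the complementary mask (1,0,1,0,...)."""
    mask = [1 - (i % 2) for i in range(len(v))]
    return vslide1down_vvm(v, v, mask, len(v))


data = ["a0", "b0", "a1", "b1"]
print(dup_even(data))  # ['a0', 'a0', 'a1', 'a1']
print(dup_odd(data))   # ['b0', 'b0', 'b1', 'b1']
```

For an even vl, neither case touches the zero-fill element, because the mask bit for element 0 (slide1up) or element vl-1 (slide1down) is clear.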

Alternatives:

The two-instruction equivalence above could be optimized via fusion or chaining. However, the RISC-V synchronous interrupt requirement imposes considerable constraints on vector implementations. In particular, register renaming, re-buffering, or deferring interrupts is required.

If the V synchronous processing requirement were relaxed, numerous chaining, and especially fusing, opportunities would be available to in-order non-speculative implementations. My preference would be to permit such relaxation where its effects are visible only from interrupt context.

However, even then the slide1 .vvm case would still merit inclusion: it needs minimal additional gates and is therefore available to even the simplest implementations.


kasanovic · 2020-01-23T00:01:42Z

I can see the appeal, but OTOH this requires that a mask register be set up, and it might only make sense for even-odd interleaving, which could end up being handled with EDIV?

David-Horner · 2020-06-28T18:30:24Z

This does not need to be decided before V1.0.
Deferring also defers the consideration of reserving the non-mask variant as a duplicate of another instruction.

David-Horner changed the title ~~support .vv variants for masked slide1up/slide1down~~ support .vvm variants for slide1up/slide1down Jan 2, 2020

David-Horner changed the title ~~support .vvm variants for slide1up/slide1down~~ POST V1.0 - support .vvm variants for slide1up/slide1down Jun 28, 2020

kasanovic added the "Resolve after v1.0" label (does not need to be resolved for v1.0 draft) Jun 28, 2020
