This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

POST V1.0 - support .vvm variants for slide1up/slide1down #358

Open
David-Horner opened this issue Jan 2, 2020 · 2 comments
Labels
Resolve after v1.0: Does not need to be resolved for v1.0 draft

Comments

@David-Horner
Contributor

David-Horner commented Jan 2, 2020

This variant is a simple, low-cost extension, consistent with the merge instructions:

Description:
This .vvm variant uses the vs1 vector register group as the source for the elements whose mask bit is clear, replacing vd for this purpose. All tail elements are copied over from vs1.
As with the .vx variants, the destination cannot overlap either source.

The .vv non-masked variant should be reserved, as it is equivalent to the non-masked .vx variant.

It is a natural extension to the instruction and it parallels the .vvm variant of merge.

Its cost, then, is comparable to that of other variants of base instructions.

To be consistent with the slideup/slidedown instructions, when the relevant mask bit is set (v0[0] for slide1up, v0[vl-1] for slide1down), binary zero is the replacement value of the "shifted out element".
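The description above can be sketched as a small reference model. This is only an illustration under stated assumptions, not a specification: registers are modeled as Python lists, `v0` as a list of 0/1 mask bits, and the function names `vslide1up_vvm` and `vslide1down_vvm` are hypothetical.

```python
def vslide1up_vvm(vs2, vs1, v0, vl):
    """Hypothetical model of the proposed vslide1up.vvm vd, vs2, vs1, v0."""
    vd = list(vs1)  # elements with a clear mask bit, and all tail elements, come from vs1
    for i in range(vl):
        if v0[i]:
            # Element 0 has no lower neighbour, so binary zero is used instead.
            vd[i] = vs2[i - 1] if i > 0 else 0
    return vd


def vslide1down_vvm(vs2, vs1, v0, vl):
    """Hypothetical model of the proposed vslide1down.vvm vd, vs2, vs1, v0."""
    vd = list(vs1)
    for i in range(vl):
        if v0[i]:
            # Element vl-1 has no upper neighbour within vl, so binary zero is used.
            vd[i] = vs2[i + 1] if i < vl - 1 else 0
    return vd


vs2 = [10, 11, 12, 13, 14, 15]
vs1 = [20, 21, 22, 23, 24, 25]
v0 = [1, 0, 1, 1, 0, 0]

up = vslide1up_vvm(vs2, vs1, v0, vl=4)      # [0, 21, 11, 12, 24, 25]
down = vslide1down_vvm(vs2, vs1, v0, vl=4)  # [11, 21, 13, 0, 24, 25]
print(up)
print(down)
```

With `vl=4`, the last two elements are tail elements and are taken from vs1 in both cases.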

Equivalences:

vslide1up.vvm vd, vs2, vs1, v0 replaces the two-instruction sequence:

    vmv.v.v vd, vs1

    vslide1up.vx vd, vs2, x0, v0.t

Similarly, vslide1down.vvm is equivalent to the corresponding two-instruction sequence.
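As a rough check of the slide1up equivalence, the two-instruction sequence can itself be modeled: a register copy followed by a masked slide1up whose scalar operand is x0 (zero). The sketch assumes that elements with a clear mask bit keep their previous destination value, and it uses vl equal to the full register length so that tail handling does not come into play. All helper names are hypothetical.

```python
def vector_copy(vd, vs1, vl):
    """Model of the register move: copy the first vl elements of vs1 into vd."""
    out = list(vd)
    out[:vl] = vs1[:vl]
    return out


def masked_slide1up_scalar(vd, vs2, x, v0, vl):
    """Model of a masked slide1up with scalar x: only active elements are written."""
    out = list(vd)
    for i in range(vl):
        if v0[i]:
            out[i] = x if i == 0 else vs2[i - 1]
    return out


def two_instruction_sequence(vd, vs2, vs1, v0, vl):
    vd = vector_copy(vd, vs1, vl)                      # first instruction: vd <- vs1
    return masked_slide1up_scalar(vd, vs2, 0, v0, vl)  # second: masked slide1up with x0


old_vd = [99, 99, 99, 99]   # previous destination contents, fully overwritten
vs2 = [10, 11, 12, 13]
vs1 = [20, 21, 22, 23]
v0 = [1, 0, 1, 0]

result = two_instruction_sequence(old_vd, vs2, vs1, v0, vl=4)

# Per-element reading of the proposed single instruction:
# v0[i] set -> vs2[i-1] (or zero for element 0), v0[i] clear -> vs1[i].
expected = [(0 if i == 0 else vs2[i - 1]) if v0[i] else vs1[i] for i in range(4)]
print(result, result == expected)
```

Under these assumptions the sequence and the per-element formula agree, and the previous contents of vd never reach the result.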

Addresses design restriction:

To allow slide1up/slide1down to be restartable, the destination cannot overlap the source vector register group.
As a result, without the .vvm variant, a vmv or equivalent is required to address this restriction when it arises.

Known application:

The DUPH operation in the FFTW3 library can be implemented in a single instruction with a mask of (0,1,0,1,...):

    vslide1up.vvm  vd, vs1, vs1, v0

Similarly, the DUPL operation can be implemented in a single instruction (with the complemented mask in v0).
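A minimal sketch of the duplication pattern follows, using a list-based model of the proposed semantics with vl equal to the register length. Two points are assumptions rather than statements from this issue: the complement-mask case is modeled with slide1down, and which of the two results corresponds to DUPL versus DUPH depends on the element-ordering convention, which is not spelled out here. The function names are hypothetical.

```python
def slide1up_vvm(vs2, vs1, v0):
    """Proposed slide1up .vvm semantics, vl == len(vs1): mask set -> lower neighbour."""
    return [(vs2[i - 1] if i > 0 else 0) if v0[i] else vs1[i]
            for i in range(len(vs1))]


def slide1down_vvm(vs2, vs1, v0):
    """Proposed slide1down .vvm semantics, vl == len(vs1): mask set -> upper neighbour."""
    n = len(vs1)
    return [(vs2[i + 1] if i < n - 1 else 0) if v0[i] else vs1[i]
            for i in range(n)]


v = [10, 11, 12, 13]
mask = [0, 1, 0, 1]
complement = [1 - m for m in mask]

# Both source operands name the same register, as in the instruction above.
dup_even = slide1up_vvm(v, v, mask)          # [10, 10, 12, 12]
dup_odd = slide1down_vvm(v, v, complement)   # [11, 11, 13, 13]
print(dup_even)
print(dup_odd)
```

In both cases the mask bit for the boundary element (element 0 for slide1up, the last element for slide1down) is clear, so the zero replacement value is never selected.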

Alternatives:

The two-instruction equivalence above could be optimized via fusion or chaining. However, the RISC-V synchronous interrupt requirement imposes considerable constraints on vector implementation designs. In particular, register renaming, re-buffering, or deferring interrupts is required.

If the V synchronous processing requirement were relaxed, numerous chaining, and especially fusing, opportunities would be available to in-order non-speculative implementations. It would be my preference to prescribe such relaxation when it is only visible from interrupt context.

However, the slide1 .vvm case would still weigh toward inclusion: it needs minimal additional gates and would thus be available to even the simplest implementations.

@David-Horner changed the title from "support .vv variants for masked slide1up/slide1down" to "support .vvm variants for slide1up/slide1down" on Jan 2, 2020
@kasanovic
Collaborator

I can see the appeal, but OTOH this requires that a mask register be set up, and it might only make sense for even-odd interleaving, which could end up being handled with EDIV?

@David-Horner
Contributor Author

This does not need to be decided before V1.0.
Deferring also defers the consideration of reserving the non-mask variant as a duplicate of another instruction.

@David-Horner changed the title from "support .vvm variants for slide1up/slide1down" to "POST V1.0 - support .vvm variants for slide1up/slide1down" on Jun 28, 2020
@kasanovic added the "Resolve after v1.0" label (Does not need to be resolved for v1.0 draft) on Jun 28, 2020