This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

Suggestion for signed to unsigned saturation vnclip variant (code attached) #287

Open
ZPedro opened this issue Sep 5, 2019 · 5 comments
Labels
Resolve after v1.0 Does not need to be resolved for v1.0 draft

Comments

@ZPedro

ZPedro commented Sep 5, 2019

This issue suggests adding a vnclip variant that performs signed-to-unsigned saturation. The motivation is signal processing applications where both underflow and overflow need to be guarded against, and where the canonical output format is unsigned. The best workaround I could come up with is (assuming 16->8 bit narrowing and an immediate scale factor):

  • vsadd.vx v8, v8, x31 # (-128)<<scale
  • vnclip.vi v4, v8, scale
  • vxor.vx v4, v4, x30 # 0x80

This workaround ties up two scalar registers to hold constants, and costs one or two additional instructions per vnclip (in some cases, adding (-128)<<scale can be folded into an earlier addition).
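To make the arithmetic behind the three-instruction workaround easier to check, here is a minimal scalar sketch in Python that models one element. It is not taken from the attached code. The function names are hypothetical, the narrowing shift is assumed to truncate (vxrm set to round-down), the vxsat flag is ignored, and scale is assumed to lie in 0..8 so that (-128)<<scale still fits in 16 bits.

```python
def sat(value, lo, hi):
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def clip_s16_to_u8_reference(x, scale):
    """Intended result: arithmetic shift right, then saturate to 0..255."""
    return sat(x >> scale, 0, 255)


def clip_s16_to_u8_workaround(x, scale):
    """Scalar model of the vsadd / vnclip / vxor sequence (assumed truncating)."""
    # vsadd.vx: saturating signed 16-bit add of the bias (-128) << scale
    biased = sat(x + (-128 << scale), -32768, 32767)
    # vnclip.vi: arithmetic shift right, then saturate to signed 8 bits
    narrowed = sat(biased >> scale, -128, 127)
    # vxor.vx with 0x80: flip the sign bit, which adds 128 modulo 256
    return (narrowed & 0xFF) ^ 0x80


def mismatches(scales):
    """List every (x, scale) where the workaround differs from the reference."""
    return [
        (x, scale)
        for scale in scales
        for x in range(-32768, 32768)
        if clip_s16_to_u8_workaround(x, scale) != clip_s16_to_u8_reference(x, scale)
    ]
```

Under these assumptions the bias commutes with the shift because it is a multiple of 2^scale, so the signed clip to -128..127 corresponds to an unsigned clip to 0..255 once the sign bit is flipped.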

This suggestion is the main takeaway from an attempt to implement planar YUV 420 to array-of-structures RGB conversion using the vector extension. I came up with two versions (both attached): one that relies on segmented loads, and one that relies on vrgather to undo chroma subsampling. The latter can conceptually scale to a subsampling factor of 3 (or another non-power-of-two factor). Coefficients are taken from Poynton.

Writing this code brought additional insights, but those will have to be elaborated in a separate issue.

YUV420p2RGBX8888-vlseg2b.txt

YUV420p2RGBX8888-vrgather.txt

@kasanovic
Collaborator

I could see adding vnclipsu and vnclipus to handle both possible changes in signedness, but this would chew up some more encoding space and probably require existing encodings to move.

@kasanovic
Collaborator

kasanovic commented Dec 7, 2019

If the vxsat flag is not important, then

  • vmax.vi v8, v8, 0 # Clip negative to zero
  • vnclipu.vi v4, v8, scale

will perform scaled clip of signed to unsigned without using scalar registers.
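As a rough illustration of why clamping at zero first is enough, the following Python sketch models the two-instruction sequence for a single 16-bit element. The function names are hypothetical, the narrowing shift is assumed to truncate (vxrm set to round-down), and vxsat is not modeled, in line with the condition stated above.

```python
def clip_s16_to_u8_reference(x, scale):
    """Intended result: arithmetic shift right, then saturate to 0..255."""
    return max(0, min(255, x >> scale))


def clip_s16_to_u8_via_max(x, scale):
    """Scalar model of the max-then-vnclipu sequence (assumed truncating)."""
    # Signed max against zero: negative inputs become 0
    non_negative = max(x, 0)
    # vnclipu: logical shift right, then saturate to unsigned 8 bits.
    # The value is already non-negative, so only the upper bound matters.
    return min(non_negative >> scale, 255)


def mismatches(scales):
    """List every (x, scale) where the sequence differs from the reference."""
    return [
        (x, scale)
        for scale in scales
        for x in range(-32768, 32768)
        if clip_s16_to_u8_via_max(x, scale) != clip_s16_to_u8_reference(x, scale)
    ]
```

Because the source is non-negative after the max step, reading it as unsigned in the narrowing clip does not change its value, which is what lets the unsigned instruction stand in for a signed-to-unsigned one.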

@ZPedro
Author

ZPedro commented Dec 10, 2019

Good one; while I could see some applications (e.g. studio transcoding) caring about vxsat, the applications I have worked on definitely don't. I am still not used to thinking with readily available min and max instructions (however, I believe you mean vmax.vx v8, v8, x0).

I trust you will find an appropriate balance between encoding space usage, implementation constraints, and software needs. For reference, in the vlseg2b variant of planar YUV 420 to array-of-structures RGB conversion, the main loop dynamically executes 65 non-vsetvli vector instructions regardless of which workaround is used, whereas it would only require 53 if vnclipsu.vi were available.

@kasanovic
Collaborator

Yes, it should have been vmax.vx v8, v8, x0. The immediate max/min forms were not included because, given the small immediate field available, they are not very useful except for the value zero, which is already supported through x0.

@ZPedro
Author

ZPedro commented Jun 7, 2020

I updated my code for v0.9 of the spec:

  • Switched to vwmul and friends following the removal of vwsmacc. This gives a net decrease in instruction count, but many of the removed instructions were vmv.v.i vx, 0, so in terms of meaningful instructions the net result is rather an increase.
  • Switched to the new EEW load and store instructions. No impact.
  • Adopted the newer vsetvli idiom for keeping vl as it was.
  • Miscellaneous fixes.

YUV420p2RGBX8888-vlseg2b.txt
YUV420p2RGBX8888-vrgather.txt

@kasanovic kasanovic added the Resolve after v1.0 Does not need to be resolved for v1.0 draft label Jun 28, 2020