Suggestion for signed to unsigned saturation vnclip variant (code attached) #287

ZPedro · 2019-09-05T14:38:57Z

This issue suggests adding a vnclip variant that performs signed-to-unsigned saturation. The motivation is signal processing applications where both underflow and overflow must be guarded against and where the canonical output format is unsigned. The best workaround I could come up with is (assuming 16->8 bit narrowing and an immediate scale factor):

vsadd.vx v8, v8, x31 # (-128)<<scale
vnclip.vi v4, v8, scale
vxor.vx v4, v4, x30 # 0x80

This workaround ties up two scalar registers for holding constants and requires one or two additional instructions per vnclip (in some cases, adding (-128)<<scale can be folded into an earlier addition).
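To make the arithmetic behind this workaround easier to check, here is a minimal scalar sketch in Python of what a single 16-bit lane goes through. It is only a model, not the attached code: the function names and SCALE are hypothetical, rounding is assumed to truncate (vxrm set to round-down), and the scale is assumed to be at most 8 so that (-128)<<scale still fits in 16 bits.

```python
def saturate(x, lo, hi):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def reference_clip(v, scale):
    """Desired result: arithmetic shift right, then saturate to unsigned 8 bits."""
    return saturate(v >> scale, 0, 255)


def biased_workaround(v, scale):
    """Scalar model of the vsadd.vx / vnclip.vi / vxor.vx sequence on one lane."""
    bias = -128 << scale                      # constant held in x31
    t = saturate(v + bias, -32768, 32767)     # vsadd.vx: signed saturating add
    n = saturate(t >> scale, -128, 127)       # vnclip.vi: signed narrowing clip
    return (n & 0xFF) ^ 0x80                  # vxor.vx with 0x80 held in x30


SCALE = 6  # hypothetical scale factor, chosen only for illustration

# Compare the two models over every signed 16-bit input.
mismatches = [
    v for v in range(-32768, 32768)
    if biased_workaround(v, SCALE) != reference_clip(v, SCALE)
]
print("mismatching inputs:", len(mismatches))
```

The bias shifts the signed range so that the signed clip saturates exactly where the unsigned one should, and the final XOR flips the top bit to move [-128, 127] back onto [0, 255].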

This suggestion is the main takeaway from an attempt to implement planar YUV 420 to array-of-structures RGB conversion using the vector extension. I came up with two versions (both attached): one that relies on segmented loads, and one that relies on vrgather to undo chroma subsampling. The latter can conceptually scale to a subsampling factor of 3 (or another non-power-of-two factor). Coefficients are taken from Poynton.
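As an illustration of why a gather-based approach is not tied to power-of-two factors, here is a small Python sketch of undoing horizontal chroma subsampling with an index vector. It assumes plain sample replication (nearest neighbour), which may differ from what the attached code does; vrgather and upsample_indices are hypothetical helper names, with vrgather modelling the vrgather.vv rule that out-of-range indices yield 0.

```python
def vrgather(src, idx):
    """Model of vrgather.vv: out[i] = src[idx[i]], or 0 if the index is out of range."""
    return [src[j] if 0 <= j < len(src) else 0 for j in idx]


def upsample_indices(n_out, factor):
    """Index vector that repeats each source element `factor` times."""
    return [i // factor for i in range(n_out)]


chroma = [10, 20, 30, 40]  # made-up subsampled chroma samples

# Factor 2 (as in 4:2:0 horizontally) and factor 3 use the same gather,
# only the index vector changes.
by2 = vrgather(chroma, upsample_indices(8, 2))
by3 = vrgather(chroma, upsample_indices(12, 3))
print(by2)
print(by3)
```

Only the index vector depends on the subsampling factor, which is what lets the same gather handle a factor of 3.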

Writing this code brought additional insights, but those will have to be elaborated in a separate issue.

YUV420p2RGBX8888-vlseg2b.txt

YUV420p2RGBX8888-vrgather.txt

kasanovic · 2019-12-07T01:05:56Z

I could see adding vnclipsu and vnclipus to handle both possible changes in signedness, but this would chew up some more encoding space and probably require existing encodings to move.

kasanovic · 2019-12-07T01:21:12Z

If the vxsat flag is not important, then
vmax.vi v8, v8, 0 # Clip negative to zero
vnclipu.vi v4, v8, scale
will perform a scaled clip of signed to unsigned without using scalar registers.
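The vxsat caveat can be sketched with a scalar Python model of one 16-bit lane. This is an illustration under assumptions, not spec text: rounding is assumed to truncate (vxrm set to round-down), and direct_signed_to_unsigned_clip models a hypothetical instruction that flags saturation on both the low and the high side. All function names are made up for the sketch.

```python
def vnclipu(v_unsigned, scale):
    """Model of vnclipu on one lane: logical shift, saturate to 8 bits.

    Returns (result, saturated), where `saturated` stands in for vxsat.
    """
    shifted = v_unsigned >> scale
    result = min(shifted, 255)
    return result, result != shifted


def vmax_then_vnclipu(v, scale):
    """Model of vmax with zero followed by vnclipu, for a signed input v."""
    t = max(v, 0)               # clip negative to zero; vxsat is not touched here
    return vnclipu(t, scale)


def direct_signed_to_unsigned_clip(v, scale):
    """Hypothetical single instruction: flags saturation on either side."""
    shifted = v >> scale        # arithmetic shift of the signed value
    result = max(0, min(255, shifted))
    return result, result != shifted


for v in (-5, 100, 1000):
    print(v, vmax_then_vnclipu(v, 0), direct_signed_to_unsigned_clip(v, 0))
```

In this model the two paths always produce the same value; they differ only in the saturation flag, and only for negative inputs, where the vmax step silently clamps to zero.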

ZPedro · 2019-12-10T11:40:39Z

Good one; while I could see some applications (e.g. studio transcoding) caring about vxsat, the applications I have worked on definitely don't. I am still not used to thinking with readily available min and max instructions (however, I believe you mean vmax.vx v8, v8, x0).

I trust you will find an appropriate balance between encoding space usage, implementation constraints, and software needs. For reference, in the vlseg2b variant of planar YUV 420 to array-of-structures RGB conversion, the main loop dynamically executes 65 non-vsetvli vector instructions regardless of the workaround used, while the loop would only require 53 if vnclipsu.vi were available.

kasanovic · 2019-12-12T19:21:23Z

Yes, it should have been vmax.vx v8, v8, x0. The immediate max/min forms were not included because, given the small immediate field available, they are not very useful except for the value zero, which is supported through x0.

ZPedro · 2020-06-07T15:20:54Z

I updated my code for v0.9 of the spec:

- Switched to vwmul and friends following removal of vwsmacc; results in a net decrease in instruction count, but a lot of those were vmv.v.i vx, 0, so in terms of meaningful instructions the net result is rather an increase.
- Switched to new EEW load and store instructions. No impact.
- Adopted modern vsetvli method for keeping vl as it was.
- Miscellaneous fixes.

YUV420p2RGBX8888-vlseg2b.txt
YUV420p2RGBX8888-vrgather.txt

ZPedro mentioned this issue Apr 20, 2020

Change SEW be the "largest element width" #425

Closed

David-Horner mentioned this issue Apr 21, 2020

For V1.0 - Make unsigned scalar integer in widening instructions 2 * SEW #427

Closed

kasanovic added the "Resolve after v1.0" label (Does not need to be resolved for v1.0 draft) Jun 28, 2020
