This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

POST V1.0 Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location #421

Open
David-Horner opened this issue Apr 15, 2020 · 1 comment
Labels
Resolve after v1.0 Does not need to be resolved for v1.0 draft

Comments


David-Horner commented Apr 15, 2020

The new field, vfill, fulfills two distinct purposes: fractional cluster order (fill) and selection (element location).
The corresponding mask segments are active for each selected cluster.
Both purposes apply to three cases of fractional data:

  1. For one vector operand instructions it provides the fill degree and order.
    Examples:
    load/store
    vclstr/vdclstr
    mask ordinal
    narrowing

  2. For two operand single-SEW instructions it determines the participating clusters.
    Examples:
    vadd.vv vadd.vi vfadd.vv
    vmseq.vv vmseq.vx

  3. For two operand widening instructions it determines the participating clusters.
    Examples:
    vwadd.vv vwadd.vx vwadd.wv

The structure and values are chosen to provide backward compatibility with LMUL>=1 and to minimize vfill state changes in typical code sequences.
Conceptually, the lmul field is superseded by a pair of fields: vlvl (identical to the current lmul in size, values and location) and vfill (a new 2-bit field).
When vfill is zero, vlvl determines the LMUL level exactly as the lmul field does, and all non-fractional functionality works as before.
When vfill is non-zero, additional fractional LMUL functionality is in effect.
Specifically, the fractional levels 1/2, 1/4 and 1/8 are selected by a non-zero vfill combined with vlvl values 1, 2 and 3, respectively.
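Table:

| vlvl (prev. lmul) | vfill | One vector operand (odd:even) | Two vector operands (odd:even) | Widening (odd:even) | Comment |
|---|---|---|---|---|---|
| 00 | 00 | X0 | X0 | N/A | LMUL=1 : Note 1 |
| 01 | 00 | N/A | N/A | N/A | LMUL=2 |
| 10 | 00 | N/A | N/A | N/A | LMUL=4 |
| 11 | 00 | N/A | N/A | N/A | LMUL=8 |
| 00 | 01 | ~ | ~ | ~ | reserved |
| 00 | 1x | ~ | ~ | ~ | reserved |
| 01 | 01 | - : X1 | - : X1 | - : X1 | LMUL=1/2 |
| 01 | 10 | X1 : - | X1 : - | X1 : - | LMUL=1/2 |
| 01 | 11 | ~ | Y1 : X1 | W1 : X1 | LMUL=1/2 : Note 3&4 |
| 10 | 01 | - : X2 | - : X2 | - : X2 | LMUL=1/4 |
| 10 | 10 | X2 : - | X2 : - | X2 : - | LMUL=1/4 |
| 10 | 11 | ~ | Y2 : X2 | W2 : X2 | LMUL=1/4 |
| 11 | 01 | - : X4 | - : X4 | - : X4 | LMUL=1/8 |
| 11 | 10 | X4 : - | X4 : - | X4 : - | LMUL=1/8 |
| 11 | 11 | ~ | Y4 : X4 | W4 : X4 | LMUL=1/8 |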
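
As an illustration of the encoding in the table, here is a minimal Python sketch that decodes a (vlvl, vfill) pair into an LMUL value. The function name `decode_lmul` and the use of `None` for reserved encodings are assumptions made for this example, not part of the proposal.

```python
from fractions import Fraction


def decode_lmul(vlvl, vfill):
    """Return LMUL as a Fraction, or None for a reserved encoding.

    Hypothetical helper: follows the table above, nothing more.
    """
    if not (0 <= vlvl <= 3 and 0 <= vfill <= 3):
        raise ValueError("vlvl and vfill are 2-bit fields")
    if vfill == 0:
        # vfill=00: vlvl behaves exactly like the old lmul field (1, 2, 4, 8).
        return Fraction(1 << vlvl)
    if vlvl == 0:
        # vlvl=00 with a non-zero vfill is reserved.
        return None
    # Non-zero vfill with vlvl=1,2,3 selects LMUL=1/2, 1/4, 1/8.
    return Fraction(1, 1 << vlvl)
```

Note that the table also marks vfill=11 as invalid for one vector operand instructions; this sketch decodes only the LMUL level and does not check the instruction class.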

Notes:
1 – LMUL=1 is transitional for fractional LMUL.
The structure is compatible in the limiting case for both clustered and striped.
2 – For vfill=11 two operand, vl counts the pairs of operations.
3 – Consider: use vfill=11 single operand to process double the iterations of vfill=01.

Legend:

N/A not applicable to fractional operations

~ not a valid combination (reserved)

"-" gap of size equal to LMUL=1/2

X0 consecutively numbered elements (clusters with no gaps)
[i+n-1] .... [i+2] [i+1] [i+0] where n is 2 * CLSTR
and i is determined by the two-cluster boundary.

X1/2/4 can occupy even or odd sides of gap/cluster pair.

X1 consecutively numbered elements (clusters with equal size gap)
[i+n-1] .... [i+2] [i+1] [i+0] where n is the number of elements in a cluster
and i is determined by the cluster boundary.

X2 same as X1 except effective cluster size is CLSTR / 2

X4 same as X1 except effective cluster size is CLSTR / 4

Y1/2/4 equivalent to X1/2/4 but occupy odd cluster location only.
These odd clusters are processed in tandem with the X even clusters, such that vl * 2 operations are performed.

W1/2/4 equivalent to X1/2/4 but occupy odd cluster location only.
For widening ops, vs1 is sourced from this odd cluster location
(while vs2 is sourced from the even cluster location).
When vs1 = vs2 a single physical register sources both operands.

One vector operand instructions:
Load exemplifies the processing. Either the odd or even cluster in the gap/cluster, cluster/gap pair is chosen by vfill.

For even clusters, elements are filled from the lower bits until the cluster is filled, the gap is skipped and the next cluster filled, etc. until vl is exhausted.

For odd clusters, the initial gap of CLSTR bytes is skipped, the cluster is filled, the rest (if any) of the CLSTR bytes is skipped up to the next CLSTR gap/cluster pair, and the process is repeated until vl is exhausted.

Note: the corresponding bits in v0 are used to mask elements for instructions with vm=0.

The same element numbering derived for load applies to store and to all other one vector register instructions.
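
The fill order described above can be sketched as a mapping from element index to byte offset within a register. This is only a sketch under stated assumptions: the name `element_byte_offset` and its parameters are hypothetical, a gap/cluster pair is taken to span 2 * CLSTR bytes, the effective cluster size is CLSTR, CLSTR/2 or CLSTR/4 for vlvl=1, 2, 3 (per the legend), and vfill=01 selects the even cluster while vfill=10 selects the odd one (per the table).

```python
def element_byte_offset(i, sew_bytes, clstr_bytes, vlvl, vfill):
    """Byte offset of element i for a one vector operand fractional fill.

    Hypothetical model: masking and vl bounds are not handled here.
    """
    if vlvl not in (1, 2, 3) or vfill not in (1, 2):
        raise ValueError("expects fractional vlvl (1..3) and vfill 01 or 10")
    # Effective cluster size shrinks with the LMUL level (X1, X2, X4).
    effective_bytes = clstr_bytes >> (vlvl - 1)
    per_cluster = effective_bytes // sew_bytes
    if per_cluster == 0:
        raise ValueError("SEW does not fit in the effective cluster")
    # Which gap/cluster pair, and which element within its cluster.
    pair, within = divmod(i, per_cluster)
    base = pair * 2 * clstr_bytes
    if vfill == 2:
        # Odd cluster: skip the initial gap of CLSTR bytes.
        base += clstr_bytes
    return base + within * sew_bytes
```

For example, with CLSTR = 8 bytes and SEW = 1 byte at LMUL=1/2, element 8 lands at byte 16 for the even fill and at byte 24 for the odd fill, since each filled cluster is followed or preceded by an 8-byte gap.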

For two operand single-SEW instructions:
The same element numbering derived for load applies to each vector and to the corresponding mask bits, whether selected from the even or odd clusters.

For vfill=01 or 10, both operands for the instruction are selected from either even or odd clusters, respectively, one from each of the two registers vs1 and vs2. The result is stored in the corresponding element in the even or odd cluster of vd, respectively.

For vfill=11, two operations occur for each value of vl. The even ( X ) elements are processed as described for vfill=01, with the result written to the element of the even vd cluster. The odd ( Y ) elements are processed as described for vfill=10, with the result written to the element of the odd vd cluster.

In all cases the corresponding bits in v0 for each used cluster element are in effect.
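
A minimal Python model of the two operand single-SEW behaviour, restricted to LMUL=1/2 for brevity. Registers are modelled as lists of SEW-sized slots, `n` is the number of elements per cluster, and the names `slot` and `vadd_vv` are hypothetical. Masking via v0 and arithmetic wrap-around are deliberately left out.

```python
def slot(i, n, odd):
    """Slot index of element i in the even or odd cluster of its pair."""
    pair, within = divmod(i, n)
    return pair * 2 * n + (n if odd else 0) + within


def vadd_vv(vd, vs2, vs1, vl, n, vfill):
    """Model of vadd.vv at LMUL=1/2 for vfill = 1 (01), 2 (10) or 3 (11)."""
    # vfill=01 uses even clusters, vfill=10 odd clusters, vfill=11 both.
    sides = {1: [False], 2: [True], 3: [False, True]}[vfill]
    for i in range(vl):
        for odd in sides:
            s = slot(i, n, odd)
            # Sources and destination share the same cluster position.
            vd[s] = vs2[s] + vs1[s]
    return vd
```

With vfill=11 each value of vl drives two additions, one in the even cluster and one in the odd cluster, which matches Note 2's statement that vl counts pairs of operations.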

For two operand widening instructions:

For vfill=01 or 10, widening instructions select cluster source elements the same way as two operand single-SEW instructions. However, the corresponding vd is in the next higher LMUL level. This odd/even selection works for LMUL=1/8 and 1/4, with vd using the correspondingly larger odd/even 1/4 and 1/2 clusters.
When LMUL=1/2 (vlvl=1) the vd result is always in the 2 * CLSTR sized group of elements.

For vfill=11 and LMUL=1/8 and 1/4, the two widening operations (even and odd) with the same vl value occur as described for single-SEW instructions (replacing ( Y ) with ( W )).

For vfill=11 and LMUL=1/2 only one widening operation occurs. As with vfill=01 or 10, vd is written into the 2 * CLSTR sized group of elements. However, the sources are chosen from both even and odd clusters. The element from vs2 ( X ) is selected from the even cluster, the element from vs1 ( W ) is selected from the odd cluster. This allows a single register to source both the elements for the widening operation when vs1 = vs2.
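
The LMUL=1/2, vfill=11 widening case can be sketched as follows. This is an illustrative model only: the function name `vwadd_vv_half` is hypothetical, source registers are lists of SEW-sized slots with `n` elements per cluster, and vd is returned as a list of 2*SEW-wide elements, on the assumption that wide element i fills the 2 * CLSTR group of its source pair. Masking is not modelled.

```python
def vwadd_vv_half(vs2, vs1, vl, n):
    """Model of vwadd.vv at LMUL=1/2 with vfill=11.

    One widening add per value of vl: vs2 ( X ) comes from the even
    cluster and vs1 ( W ) from the odd cluster of the same pair.
    """
    vd = []
    for i in range(vl):
        pair, within = divmod(i, n)
        even = pair * 2 * n + within   # X element in the even cluster
        odd = even + n                 # W element in the odd cluster
        vd.append(vs2[even] + vs1[odd])
    return vd
```

Passing the same list for both sources, as in `vwadd_vv_half(r, r, vl, n)`, models the vs1 = vs2 case, where a single register supplies both operands of each widening add.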

@David-Horner

This needs rework for a post V1.0 era.

Please flag as post v1.0.

@David-Horner David-Horner changed the title Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location POST V1.0 Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location Jun 28, 2020
@kasanovic kasanovic added the Resolve after v1.0 Does not need to be resolved for v1.0 draft label Jul 3, 2020