This repository has been archived by the owner on Mar 20, 2024. It is now read-only.
POST V1.0 Fractional vtype field vfill – Fractional Fill order and Fractional Instruction eLement Location #421
The new field, vfill, fulfills two distinct purposes: fractional cluster order (fill) and selection (element location).
The corresponding mask segments are active for each selected cluster.
These purposes apply to three cases of fractional data:
For one vector operand instructions it provides the fill degree and order.
Examples:
load/store
vclstr/vdclstr
mask ordinal
narrowing
For two operand single-SEW instructions it determines the participating clusters.
Examples:
vadd.vv vadd.vi vfadd.vv
vmseq.vv vmseq.vx
For two operand widening instructions it determines the participating clusters.
Examples:
vwadd.vv vwadd.vx vwadd.wv
The structure and values are chosen to provide backward compatibility with LMUL>=1, and to minimize vfill state changes in typical code sequences.
Conceptually, the field lmul is superseded by a pair of fields: vlvl (identical to the current lmul in size, values and location) and vfill (a new 2 bit field).
When vfill is zero, vlvl determines the LMUL level exactly as the lmul field does, and all non-fractional functionality works as before.
When vfill is non-zero, additional fractional LMUL functionality is in effect.
Specifically, the fractional levels 1/2, 1/4 and 1/8 are selected by a non-zero vfill together with vlvl values 1, 2 and 3, respectively.
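The decoding described above could be sketched as follows. This is only an illustration, not proposed spec text: the function name `decode_lmul` is hypothetical, and the sketch assumes that vlvl is a 2 bit field using the existing lmul encoding (0, 1, 2, 3 for LMUL = 1, 2, 4, 8). It does not model which (vlvl, vfill) combinations are reserved.

```python
from fractions import Fraction


def decode_lmul(vlvl: int, vfill: int) -> Fraction:
    """Return the LMUL implied by a (vlvl, vfill) pair (hypothetical helper)."""
    if not 0 <= vlvl <= 3 or not 0 <= vfill <= 3:
        raise ValueError("vlvl and vfill are assumed to be 2 bit fields")
    if vfill == 0:
        # Assumption: vlvl keeps the lmul encoding, so 0,1,2,3 -> LMUL 1,2,4,8.
        return Fraction(2 ** vlvl)
    # Non-zero vfill: vlvl 1,2,3 -> LMUL 1/2, 1/4, 1/8.
    # vlvl 0 stays at LMUL=1, which note 1 calls transitional.
    return Fraction(1, 2 ** vlvl)
```

For example, `decode_lmul(2, 0)` gives 4 (unchanged non-fractional behaviour), while `decode_lmul(2, 1)` gives 1/4.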
table:
Notes:
1 – LMUL=1 is transitional for fractional LMUL.
The structure is compatible in the limiting case for both clustered and striped.
2 – For vfill=11 two operand instructions, vl counts pairs of operations.
3 – Consider: use vfill=11 single operand to process double the iterations of vfill=01.
Legend:
N/A not applicable to fractional operations
~ not a valid combination (reserved)
"-" gap of size equal to LMUL=1/2
X0 consecutively numbered elements (clusters with no gaps)
[i+n-1] .... [i+2] [i+1] [i+0] where n is 2 * CLSTR
and i is determined by the two-cluster boundary.
X1/2/4 can occupy even or odd sides of gap/cluster pair.
X1 consecutively numbered elements (clusters with equal size gap)
[i+n-1] .... [i+2] [i+1] [i+0] where n is the number of elements in a cluster
and i is determined by the cluster boundary.
X2 same as X1 except effective cluster size is CLSTR / 2
X4 same as X1 except effective cluster size is CLSTR / 4
Y1/2/4 equivalent to X1/2/4 but occupy odd cluster location only.
These odd clusters are processed in tandem with the X even clusters, such that vl * 2 operations are performed.
W1/2/4 equivalent to X1/2/4 but occupy odd cluster location only.
For widening ops, vs1 is sourced from this odd cluster location
(while vs2 is sourced from even cluster location).
When vs1 = vs2 a single physical register sources both operands.
One vector operand instructions:
Load exemplifies the processing. Either the odd or even cluster in the gap/cluster, cluster/gap pair is chosen by vfill.
For even clusters, elements are filled from the lower bits until the cluster is filled, the gap is skipped and the next cluster filled, etc. until vl is exhausted.
For odd clusters, the initial gap of CLSTR bytes is skipped, the cluster is filled, the rest (if any) of the CLSTR bytes is skipped to reach the next CLSTR gap/cluster pair, and the process is repeated until vl is exhausted.
Note: the corresponding bits in v0 are used to mask elements for instructions with vm=0.
The same element numbering derived for load applies to store and to all other one vector register instructions.
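The fill order described for load can be sketched as a mapping from element number to byte offset within the register. This is a minimal illustration under stated assumptions: the name `cluster_byte_offsets` and its parameters are hypothetical, sizes are in bytes, and the gap/cluster pair is assumed to stay 2 * CLSTR bytes wide when the effective cluster size is CLSTR / 2 or CLSTR / 4.

```python
def cluster_byte_offsets(vl, sew_bytes, clstr, odd, eff=None):
    """Byte offset in the register of each of the vl elements.

    clstr: CLSTR in bytes.
    odd:   False for even clusters (vfill=01), True for odd clusters (vfill=10).
    eff:   effective cluster size in bytes (CLSTR, CLSTR/2 or CLSTR/4);
           defaults to CLSTR.
    """
    eff = clstr if eff is None else eff
    per_cluster = eff // sew_bytes      # elements that fit in one cluster
    if per_cluster == 0:
        raise ValueError("element does not fit in the effective cluster")
    pair = 2 * clstr                    # assumed width of one gap/cluster pair
    start = clstr if odd else 0         # odd clusters skip the initial CLSTR-byte gap
    offsets = []
    for i in range(vl):
        k, j = divmod(i, per_cluster)   # k: which pair, j: position in the cluster
        offsets.append(k * pair + start + j * sew_bytes)
    return offsets
```

With CLSTR = 4 bytes, one-byte elements and vl = 6, even clusters give offsets 0, 1, 2, 3, 8, 9 and odd clusters give 4, 5, 6, 7, 12, 13.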
For two operand single-SEW instructions:
The same element numbering derived for load applies to each vector and to the corresponding mask bits, whether selected from the even or the odd clusters.
For vfill=01 or 10, both operands for the instruction are selected from the even or odd clusters, respectively, one from each of the two registers vs1 and vs2. The result is stored in the corresponding element in the even or odd cluster of vd, respectively.
For vfill=11, two operations occur for each value of vl. The even ( X ) elements are processed as described for vfill=01, with the result written to the element of the even vd cluster. The odd ( Y ) elements are processed as described for vfill=10, with the result written to the element of the odd vd cluster.
In all cases the corresponding bits in v0 for each used cluster element are in effect.
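A small model of the three vfill settings for a single-SEW add may help here. It is a sketch, not a definition: registers are modelled as Python lists of element slots, `slot` and `vadd_vv` are hypothetical names, n is the number of elements per cluster, and the mask v0 is assumed to be indexed by the same slot position as the element it guards.

```python
def slot(i, n, odd):
    """Slot of element i when clusters of n elements alternate with equal gaps."""
    k, j = divmod(i, n)
    return k * 2 * n + (n if odd else 0) + j


def vadd_vv(vd, vs1, vs2, vl, n, vfill, v0=None):
    """Model of vadd.vv for fractional vfill values; returns the operation count."""
    # vfill=01 uses even clusters (X), 10 uses odd clusters, 11 uses both (X and Y).
    parities = {0b01: (False,), 0b10: (True,), 0b11: (False, True)}[vfill]
    ops = 0
    for i in range(vl):
        for odd in parities:
            s = slot(i, n, odd)
            if v0 is None or v0[s]:     # assumption: mask bit shares the slot index
                vd[s] = vs1[s] + vs2[s]
                ops += 1
    return ops


vd = [0] * 8
count = vadd_vv(vd, list(range(8)), [10] * 8, vl=3, n=2, vfill=0b11)
print(vd, count)  # [10, 11, 12, 13, 14, 0, 16, 0] 6
```

With vfill=11 and vl=3 the model performs six additions, three in even clusters and three in odd clusters, matching the vl * 2 operations described in the legend.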
For two operand widening instructions
For vfill=01 or 10, widening instructions select cluster source elements the same way as two operand single-SEW instructions. However, the corresponding vd is in the next higher LMUL level. This odd/even scheme works for LMUL=1/8 and 1/4, with correspondingly larger odd/even 1/4 and 1/2 clusters.
When LMUL=1/2 (vlvl=1) the vd result is always in the 2 * CLSTR sized group of elements.
For vfill=11 and LMUL=1/8 or 1/4, two widening operations (even and odd) with the same vl value occur, as described for single-SEW instructions (replacing ( Y ) with ( W )).
For vfill=11 and LMUL=1/2 only one widening operation occurs. As with vfill=01 or 10, vd is written into the 2 * CLSTR sized group of elements. However, the sources are chosen from both even and odd clusters. The element from vs2 ( X ) is selected from the even cluster, the element from vs1 ( W ) is selected from the odd cluster. This allows a single register to source both the elements for the widening operation when vs1 = vs2.
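The vfill=11, LMUL=1/2 widening case can be illustrated with a short model in which one register supplies both operands. As before, this is a sketch under assumptions: registers are lists of element slots, `slot` and `vwadd_vv_half` are hypothetical names, n is the number of source elements per cluster, and the wide results are assumed to be numbered consecutively in vd (the X0 layout). Python integers do not overflow, so the widening itself is implicit.

```python
def slot(i, n, odd):
    """Slot of element i when clusters of n elements alternate with equal gaps."""
    k, j = divmod(i, n)
    return k * 2 * n + (n if odd else 0) + j


def vwadd_vv_half(vs2, vs1, vl, n):
    """Model of vwadd.vv with vfill=11 at LMUL=1/2: one widening add per element.

    vs2 is read from even clusters (X) and vs1 from odd clusters (W).
    """
    return [vs2[slot(i, n, False)] + vs1[slot(i, n, True)] for i in range(vl)]


# One register holds both operands: even clusters 1,2 / 3,4 and odd clusters 10,20 / 30,40.
reg = [1, 2, 10, 20, 3, 4, 30, 40]
print(vwadd_vv_half(reg, reg, vl=4, n=2))  # [11, 22, 33, 44]
```

Passing the same list as both vs2 and vs1 corresponds to the vs1 = vs2 case, where a single physical register sources both operands.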