Shall we have insert instructions? #326

HanKuanChen · 2019-11-13T09:28:26Z

I noticed that #276 and #318 may require insert instructions.

In addition, if a vinsert instruction like the following were supported, the problem below could be solved easily.

vinsert.vv vd, vs2, vs1, vm  # vd[vs2[i]] = vs1[i]
vinsert.vx vd, vs2, rs1, vm  # vd[vs2[i]] = x[rs1]
vinsert.vi vd, vs2, uimm, vm # vd[vs2[i]] = uimm
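
The comments above describe the proposed instruction as an indexed write (a scatter) inside a single vector register. A minimal Python sketch of that behaviour follows. The function names are hypothetical, and the handling of masked-off elements, out-of-range indices, and duplicate indices is an assumption, since the proposal does not specify them.

```python
def vinsert_vv(vd, vs2, vs1, vl, mask=None):
    """Model of the proposed vinsert.vv: vd[vs2[i]] = vs1[i] for i < vl."""
    out = list(vd)
    for i in range(vl):
        # Assumption: masked-off elements leave vd untouched.
        if mask is not None and not mask[i]:
            continue
        idx = vs2[i]
        # Assumption: indices outside the register are ignored.
        if 0 <= idx < len(out):
            # Assumption: with duplicate indices, the highest i wins.
            out[idx] = vs1[i]
    return out


def vinsert_vx(vd, vs2, x, vl, mask=None):
    """Model of the proposed vinsert.vx: vd[vs2[i]] = x for i < vl."""
    return vinsert_vv(vd, vs2, [x] * vl, vl, mask)


if __name__ == "__main__":
    print(vinsert_vv([0, 0, 0, 0], [2, 0], [7, 9], vl=2))
    print(vinsert_vx([0, 0, 0, 0], [3], 5, vl=1))
```

With vl = 1, vinsert_vx writes exactly one element at the position given by element 0 of the index register, which is how the last loop in this comment uses it.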

for(i=0;i!=n;++i)
{
    # compute fn
    fsw fn, 0(buffer)
    addi buffer, buffer, 4
}
# f0, f1, f2, ..., fn
vlw.v v0, (buffer)
vfmul.vv v0, coefficient, v0
vsw.v v0, (des)

With this approach (option 1), the scalar stores and the vector load reduce performance if memory is slow.

for(i=0;i!=n;++i)
{
    # compute xn
    fmv.x.w xn, fn
    vslide1down.vx v0, v0, xn
}
vfmul.vv v0, coefficient, v0
vsw.v v0, (des)

With this approach (option 2), vslide1down.vx reduces performance if vl is very large, because every iteration shifts the whole vector.

li one, 1
for(i=0;i!=n;++i)
{
    # compute xn
    fmv.x.w xn, fn
    vmv.s.x vindex, i
    vsetvli x0, one, e32, m8
    vinsert.vx v0, vindex, xn
    # set vl and vtype
}
vfmul.vv v0, coefficient, v0
vsw.v v0, (des)

This approach (option 3) solves both problems: no memory transfer is required, and vl is always 1.
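
To make the cost argument concrete, here is a rough Python model of the vslide1down loop and the vinsert loop above. It counts element writes as a crude proxy for work. This is only a sketch under that assumption, not a timing model of any real implementation, and the function names are hypothetical.

```python
def build_with_slide1down(values, vl):
    """Fill a vl-element register by sliding down one element per scalar."""
    reg = [0.0] * vl
    element_writes = 0
    for x in values:
        # vslide1down.vx: reg[i] = reg[i + 1], and the last element gets x.
        reg = reg[1:] + [x]
        element_writes += vl  # every element of the register is rewritten
    return reg, element_writes


def build_with_insert(values, vl):
    """Fill a vl-element register with one indexed write per scalar."""
    reg = [0.0] * vl
    element_writes = 0
    for i, x in enumerate(values):
        # vinsert.vx with vl = 1 and index i: only reg[i] changes.
        reg[i] = x
        element_writes += 1
    return reg, element_writes


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    print(build_with_slide1down(values, vl=4))
    print(build_with_insert(values, vl=4))
```

Both loops end with the same register contents when the number of scalars equals vl. In this model the slide version performs n * vl element writes and the insert version performs n. Hardware that must copy the whole register on a partial write would not see this difference.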

aswaterman · 2019-11-13T13:18:13Z

We have considered this instruction. This is what we wrote in 9a4d92e:

NOTE: The complementary vins.v.x instruction, which allows a write
to any element in a vector register, has been removed. This
instruction would be the only instruction (apart from vsetvl) that
requires two integer source operands, and also would be slow to
execute in an implementation with vector register renaming, relegating
its main use to debugger modifications to state. The alternative and
more generally useful vslide1up and vslide1down instructions can
be used to update vector register state in place over a debug link
without accessing memory.

For your example, option 1 is the best answer. If fsw -> vlw performance is poor, you're fucked anyway :)

HanKuanChen · 2019-11-14T01:54:27Z

Why?
Suppose the buffer is located in DRAM because the system has very limited SRAM. Loads and stores to DRAM then cause performance issues.
But options 2 and 3 only perform register operations, with no DRAM access, so their speed depends purely on the CPU frequency (which is faster than memory access), doesn't it?

kasanovic · 2019-11-15T07:40:26Z

You don't put the buffer in DRAM. If you can't spare one vector register's worth of SRAM space, then you're in a strangely configured system (big vector registers but tiny SRAM). Also note that using a memory buffer allows you to unroll and remove index updates (e.g., fsw f0, (a0); fsw f1, 4(a0); fsw f2, 8(a0),...).

kasanovic added the "Resolve after v1.0" label (Does not need to be resolved for v1.0 draft) · Jun 28, 2020
