This repository has been archived by the owner on Dec 22, 2021. It is now read-only.
Unlike the float case, where the fused-vs-unfused issue creates complications (PR #79), in the integer case there is no downside to using single-instruction multiply-add. These instructions are vital to getting above 50% of peak performance in key use cases such as matrix multiplication.
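To make the "no downside" point concrete, here is a minimal scalar sketch of lane-wise 32-bit integer multiply-add, assuming wrapping two's-complement arithmetic. The function names are hypothetical and are not taken from any proposal. Because integer arithmetic is exact modulo 2^32, wrapping after the multiply and again after the add gives the same lanes as wrapping once at the end, whereas in floating point the fused form rounds once and the separate form rounds twice.

```python
def wrap_i32(x):
    """Reduce an unbounded Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def mul_then_add(acc, a, b):
    """Two separate lane-wise steps: wrapping multiply, then wrapping add."""
    prod = [wrap_i32(x * y) for x, y in zip(a, b)]
    return [wrap_i32(c + p) for c, p in zip(acc, prod)]


def mul_add(acc, a, b):
    """One combined lane-wise step: acc + a * b, wrapped once at the end."""
    return [wrap_i32(c + x * y) for c, x, y in zip(acc, a, b)]


# Illustrative lanes, including values whose products overflow 32 bits.
acc = [1, -2, 2**31 - 1, -2**31]
a = [3, 70000, 65536, -1]
b = [5, 70000, 65536, 2**31 - 1]

# Both formulations agree lane for lane, overflow included.
assert mul_then_add(acc, a, b) == mul_add(acc, a, b)
```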
In general, these instructions will support different combinations of bit-widths for the accumulator versus the multiplication operands.
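As one illustration of a mixed-width combination, the sketch below models 16-bit operands accumulating into 32-bit lanes. The name `widening_mul_add` and the choice of four lanes are assumptions made for this example only; the issue does not specify which width combinations or lane layouts an instruction would use.

```python
I16_MIN, I16_MAX = -2**15, 2**15 - 1


def wrap_i32(x):
    """Reduce an unbounded Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def widening_mul_add(acc, a, b):
    """Hypothetical lane-wise op: acc[i] + a[i] * b[i], i16 operands, i32 accumulator."""
    for v in list(a) + list(b):
        if not I16_MIN <= v <= I16_MAX:
            raise ValueError("multiply operands must fit in 16 bits")
    return [wrap_i32(c + x * y) for c, x, y in zip(acc, a, b)]


acc = [0, 0, 5, 1]
a = [32767, -32768, 100, -1]
b = [32767, -32768, 200, 1]

# The first two products do not fit in 16 bits but fit in the 32-bit accumulator.
print(widening_mul_add(acc, a, b))  # [1073676289, 1073741824, 20005, 0]
```

Keeping the accumulator wider than the operands is what lets a kernel sum many products before overflow becomes a concern.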
The dot-product instructions discussed in PR #127 are a variant of this. We need both those dot-product instructions and general element-wise integer multiply-add.
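For contrast with the element-wise form, a dot-product-style accumulate reduces across neighbouring lanes. The sketch below assumes one plausible shape: eight 16-bit lanes per input, with adjacent pairs of products summed into four 32-bit accumulator lanes. This pairing and the name `dot_add_i16x8` are assumptions for illustration and are not claimed to match the semantics proposed in PR #127.

```python
def wrap_i32(x):
    """Reduce an unbounded Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def dot_add_i16x8(acc, a, b):
    """Hypothetical op: acc[i] + a[2i] * b[2i] + a[2i+1] * b[2i+1], wrapped to i32."""
    if not (len(a) == len(b) == 2 * len(acc)):
        raise ValueError("inputs need twice as many lanes as the accumulator")
    return [
        wrap_i32(c + a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
        for i, c in enumerate(acc)
    ]


acc = [0, 10, 0, -1]
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 2, 2, 3, 3, 4, 4]

# Each output lane combines two adjacent products with its accumulator lane.
print(dot_add_i16x8(acc, a, b))  # [3, 24, 33, 59]
```

The element-wise form keeps every product in its own lane, which is why the two forms are not interchangeable and a kernel may need either one.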
Note that these instructions are often used in kernels that use nearly all available SIMD registers. That is why not exposing multiply-add instructions in WebAssembly, and instead relying on the compiler to transform code to use them, would often result in unwanted register spills. In fact, the source code is often tailored to use a specific number of SIMD registers in the first place; not offering a multiply-add instruction to the source, and requiring it to use separate Mul and Add with intermediate registers, would hinder that.