CUDA offers many functions:
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html
for working with multiple 1-byte and 2-byte values packed into the native 4-byte integers.
We should offer explicit access to these, in a form that is better structured than a heap of idiosyncratic names (perhaps via the `kat::array` type? some other way?).

We should also check our existing code to see where specializations are in order, so that we actually benefit from these instructions (e.g. in sequence operations or collaboration primitives).
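
To make the "better structured" idea concrete, here is a minimal sketch of one possible shape for it, not a proposal for the final API. The names `sketch::packed_u8x4`, `pack`, `get` and the `SKETCH_HD` macro are all hypothetical; only `__vadd4` (per-byte wrap-around addition) comes from the CUDA SIMD intrinsics. In device code the operator forwards to the intrinsic; elsewhere it falls back to a plain bit-twiddling implementation with the same result, so the type stays usable on the host.

```cpp
#include <cstdint>

// Hypothetical helper macro: mark functions host+device only when compiling with nvcc.
#ifdef __CUDACC__
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

namespace sketch {

// Hypothetical wrapper: four unsigned bytes packed into one native 32-bit word.
struct packed_u8x4 {
    std::uint32_t bits;
};

// Pack four bytes; b0 ends up in the least significant byte.
SKETCH_HD inline packed_u8x4 pack(std::uint8_t b0, std::uint8_t b1,
                                  std::uint8_t b2, std::uint8_t b3)
{
    return packed_u8x4{
        static_cast<std::uint32_t>(b0)
        | (static_cast<std::uint32_t>(b1) << 8)
        | (static_cast<std::uint32_t>(b2) << 16)
        | (static_cast<std::uint32_t>(b3) << 24)};
}

// Extract byte i (0 = least significant).
SKETCH_HD inline std::uint8_t get(packed_u8x4 v, unsigned i)
{
    return static_cast<std::uint8_t>((v.bits >> (8u * i)) & 0xFFu);
}

// Per-byte addition with wrap-around (no carry between bytes).
SKETCH_HD inline packed_u8x4 operator+(packed_u8x4 a, packed_u8x4 b)
{
#ifdef __CUDA_ARCH__
    // Device path: use the SIMD-in-a-word intrinsic.
    return packed_u8x4{__vadd4(a.bits, b.bits)};
#else
    // Host fallback: add the low 7 bits of each byte (cannot carry into the
    // neighbouring byte), then restore each byte's top bit with XOR.
    const std::uint32_t low7 = 0x7F7F7F7Fu;
    const std::uint32_t sum = (a.bits & low7) + (b.bits & low7);
    return packed_u8x4{sum ^ ((a.bits ^ b.bits) & ~low7)};
#endif
}

} // namespace sketch
```

With something like this, `sketch::pack(250, 1, 128, 255) + sketch::pack(10, 2, 128, 1)` yields the bytes 4, 3, 0, 0, and the remaining intrinsics (saturating variants, 2-byte lanes, comparisons) could hang off the same small set of types instead of separate function names.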