When using TileLang, layout transformations such as swizzling or padding are applied implicitly. This approach is efficient and powerful, but it can sometimes cause vectorization to crash. Consider dequantize GEMM on Volta:
On Volta, applying a swizzle adjusts the memory layout so that elements are contiguous in groups of 4 instead of 8. This improves memory coalescing and data locality, but a vectorized access that spans 8 consecutive elements no longer maps to one contiguous chunk of memory.
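To make the failure concrete, here is a minimal sketch in plain Python (not TileLang code). It assumes a hypothetical XOR swizzle, `swizzle_4`, that permutes 4-element groups within a 32-element row; this is an illustrative stand-in, not TileLang's actual Volta layout. The hypothetical helper `max_vector_width` returns the widest vector width for which every aligned chunk of logical elements stays contiguous in physical memory.

```python
from typing import Callable, Iterable, Sequence

IndexMap = Callable[[int, int], int]


def swizzle_4(row: int, col: int, row_width: int = 32) -> int:
    """Hypothetical XOR swizzle that permutes 4-element groups within a row."""
    groups_per_row = row_width // 4  # must be a power of two for XOR to stay in range
    group, lane = divmod(col, 4)
    return row * row_width + (group ^ (row % groups_per_row)) * 4 + lane


def is_contiguous(offsets: Sequence[int]) -> bool:
    """True if the physical offsets are consecutive, i.e. one vector access."""
    return all(b == a + 1 for a, b in zip(offsets, offsets[1:]))


def max_vector_width(index_map: IndexMap, rows: int, cols: int,
                     candidates: Iterable[int] = (8, 4, 2, 1)) -> int:
    """Widest width whose aligned logical chunks are physically contiguous in every row."""
    for width in candidates:
        if cols % width:
            continue
        if all(
            is_contiguous([index_map(r, c) for c in range(start, start + width)])
            for r in range(rows)
            for start in range(0, cols, width)
        ):
            return width
    return 1


if __name__ == "__main__":
    print(max_vector_width(swizzle_4, rows=8, cols=32))               # 4
    print(max_vector_width(lambda r, c: r * 32 + c, rows=8, cols=32))  # 8
```

Here the swizzled layout only admits width 4, while an unswizzled row-major layout admits width 8. Row 0 alone would pass at width 8 because XOR with 0 is the identity, which is why every row has to be checked. A real vector load also needs a suitably aligned address, which this sketch does not check.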
We should enhance the lower vectorize pass to automatically convert the vectorized stage into:
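One possible shape of that conversion, sketched under assumptions: split each 8-wide vectorized access into a serial outer loop over 4-wide vector chunks, each of which is contiguous in the swizzled layout. The code below simulates this with Python lists; `swizzle_4` and `vectorized_store` are hypothetical names, the XOR swizzle is an illustrative stand-in for the real Volta layout, and a slice assignment stands in for one vector store.

```python
from typing import Callable, List

IndexMap = Callable[[int, int], int]


def swizzle_4(row: int, col: int, row_width: int = 32) -> int:
    """Hypothetical XOR swizzle that permutes 4-element groups within a row."""
    groups_per_row = row_width // 4
    group, lane = divmod(col, 4)
    return row * row_width + (group ^ (row % groups_per_row)) * 4 + lane


def vectorized_store(dst: List[int], src: List[int], index_map: IndexMap,
                     row: int, col_start: int, width: int, legal_width: int) -> None:
    """Lower one `width`-wide logical store into `legal_width`-wide vector stores."""
    if width % legal_width:
        raise ValueError("width must be a multiple of legal_width")
    for outer in range(width // legal_width):  # serial outer loop
        col = col_start + outer * legal_width
        base = index_map(row, col)
        # A chunk is only a single vector store if it is contiguous physically.
        if any(index_map(row, col + i) != base + i for i in range(legal_width)):
            raise ValueError(f"chunk at col {col} is not contiguous")
        dst[base:base + legal_width] = src[col:col + legal_width]  # one vector store


if __name__ == "__main__":
    shared = [0] * 64  # two 32-element rows
    row_values = list(range(100, 132))  # logical contents of row 1
    for start in range(0, 32, 8):
        vectorized_store(shared, row_values, swizzle_4, 1, start, 8, 4)
    print(shared[32:40])  # [104, 105, 106, 107, 100, 101, 102, 103]
```

With `legal_width=4` every element of the row lands at its swizzled position, while `legal_width=8` raises `ValueError` on row 1 because the two 4-element halves are swapped in memory. Lowering to the narrower width keeps part of the vectorization benefit rather than dropping to scalar accesses.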