Description:
Re-implement the loader to issue cp.async directly instead of using CuTe.
Re-implement the storer to issue the corresponding PTX for storing, avoiding CuTe.
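The two items above could be sketched roughly as follows. This is a minimal illustration, not the project's actual loader or storer. The helper names (`cp_async_16B`, `cp_async_commit_and_wait_all`, `st_global_16B`, `roundtrip_kernel`, `run_roundtrip`) are hypothetical. The 16-byte transfer size and the use of `st.global.v4.f32` for the store are assumptions, since the issue does not say which store instruction is intended. `cp.async` requires compute capability 8.0 or newer, so the sketch falls back to a plain vectorized copy on older targets.

```cuda
#include <cassert>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // threads per block (illustrative choice)

// Hypothetical loader primitive: asynchronously copy 16 bytes from global
// to shared memory. Both pointers must be 16-byte aligned.
__device__ __forceinline__ void cp_async_16B(void* smem_dst, const void* gmem_src) {
#if __CUDA_ARCH__ >= 800
    // cp.async expects a 32-bit shared-state-space address for the destination.
    unsigned dst = static_cast<unsigned>(__cvta_generic_to_shared(smem_dst));
    asm volatile("cp.async.cg.shared.global [%0], [%1], 16;\n"
                 :: "r"(dst), "l"(gmem_src) : "memory");
#else
    // Fallback for pre-sm_80 targets: ordinary synchronous 16-byte copy.
    *reinterpret_cast<float4*>(smem_dst) = *reinterpret_cast<const float4*>(gmem_src);
#endif
}

// Commit all outstanding cp.async operations and wait for them to finish.
__device__ __forceinline__ void cp_async_commit_and_wait_all() {
#if __CUDA_ARCH__ >= 800
    asm volatile("cp.async.commit_group;\n" ::: "memory");
    asm volatile("cp.async.wait_group 0;\n" ::: "memory");
#endif
}

// Hypothetical storer primitive (assumed instruction): one 16-byte
// vectorized store to global memory.
__device__ __forceinline__ void st_global_16B(void* gmem_dst, const float4& v) {
    asm volatile("st.global.v4.f32 [%0], {%1, %2, %3, %4};\n"
                 :: "l"(gmem_dst), "f"(v.x), "f"(v.y), "f"(v.z), "f"(v.w)
                 : "memory");
}

// Each thread moves four floats: global -> shared (cp.async) -> global (st).
__global__ void roundtrip_kernel(const float* __restrict__ src, float* __restrict__ dst) {
    __shared__ __align__(16) float tile[kThreads * 4];
    const int tid = threadIdx.x;
    const int base = (blockIdx.x * kThreads + tid) * 4;

    cp_async_16B(&tile[tid * 4], &src[base]);
    cp_async_commit_and_wait_all();
    __syncthreads();  // required in general, when threads read each other's data

    const float4 v = *reinterpret_cast<const float4*>(&tile[tid * 4]);
    st_global_16B(&dst[base], v);
}

// Host-side check: copies a buffer through the kernel and compares the result.
bool run_roundtrip() {
    const int blocks = 4;
    const int n = blocks * kThreads * 4;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    // 16-byte accesses need 16-byte-aligned base pointers.
    assert(reinterpret_cast<std::uintptr_t>(d_in) % 16 == 0);

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    roundtrip_kernel<<<blocks, kThreads>>>(d_in, d_out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && (cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[i]);
    return ok;
}
```

The helpers take raw pointers only, so a caller would compute offsets with whatever layout type the codebase already uses, without converting to a CuTe layout first. Calling `run_roundtrip()` from a small `main` and compiling with `nvcc -arch=sm_80` should exercise the `cp.async` path.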
Benefits:
This refactor should yield a cleaner codebase. Currently, CUTLASS requires its own Layout, so parameters must be wrapped into that Layout in various places before they can be passed to CUTLASS. The resulting code is messy and difficult to maintain. Replacing CuTe with direct PTX instructions removes those wrappers, which simplifies the implementation and makes the code clearer.