Our project currently uses DLPack to convert Torch tensors to/from TVM runtime arguments. However, this approach introduces noticeable runtime overhead, as discussed here: Strange Overhead of TVM Runtime NDArray from DLPack.
For reference, some projects such as BitBLAS implement custom CUDA-based solutions with ctypes wrappers to avoid this overhead. However, such approaches often lack comprehensive handling of tensor attributes such as shape, strides, and data type.
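A rough sketch of why the ctypes-style route loses attribute handling (the `scale_inplace` function below is hypothetical, standing in for a ctypes-wrapped CUDA kernel launcher): the callee only ever receives a raw pointer, so shape, strides, and dtype must be tracked manually by the caller.

```python
import ctypes

def scale_inplace(ptr, numel):
    # A ctypes-wrapped kernel sees only an address; it cannot recover
    # shape/strides/dtype from it, so the caller must pass them or assume them.
    arr = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_float))
    for i in range(numel):
        arr[i] *= 2.0

buf = (ctypes.c_float * 6)(*[float(i) for i in range(6)])
ptr = ctypes.cast(buf, ctypes.c_void_p).value

# Metadata lives outside the pointer: the caller carries it by hand,
# which is the bookkeeping DLPack normally does for free.
shape, strides_bytes = (2, 3), (12, 4)
scale_inplace(ptr, shape[0] * shape[1])
```

Any mismatch between this hand-carried metadata and the actual buffer is silent memory corruption, which is the fragility being traded for speed.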
A more appropriate and efficient solution would be to use Torch's C++ extension mechanism to bridge Torch tensors and TVM NDArray objects directly, avoiding the overhead of the DLPack conversion path while still preserving tensor attributes.
We should explore the feasibility of implementing this integration with Torch’s C++ extensions to mitigate the current performance bottleneck.
Pull Request #12 introduced a JIT (just-in-time) component for TileLang, streamlining its functionality. A C++ extension can be built on top of this component to further enhance its capabilities.