[Feature Request] Lazy tvm tensor intrin registration is required to save import time #256

LeiWang1999 · 2024-12-03T11:33:13Z

BitBLAS uses Apache TVM for high-performance code generation. In our TensorIR + schedule based templates, we have to register tensor intrinsics (for CUDA Tensor Cores, AMD Matrix Cores, and lop3). Hundreds of intrinsics need to be registered at runtime, and each registration crosses the C++/Python boundary via ctypes, so the whole process can take several minutes.

This results in a suboptimal user experience. Even with our tilelang backend, we still have to go through the registration when getting hardware-aware configs. Some refactoring is needed to alleviate this overhead.
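To illustrate the kind of refactoring meant here, below is a minimal sketch of lazy registration in plain Python. `LazyIntrinRegistry`, `declare`, and `get` are hypothetical names chosen for this example and are not BitBLAS or TVM APIs; the builder callables stand in for the expensive registration step.

```python
from typing import Callable, Dict


class LazyIntrinRegistry:
    """Hypothetical registry: declaring an intrin is cheap, building is deferred."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[], object]] = {}
        self._cache: Dict[str, object] = {}
        self.build_count = 0  # number of builders actually executed

    def declare(self, name: str, builder: Callable[[], object]) -> None:
        # Only remember how to build the intrin; do not build it yet.
        self._builders[name] = builder

    def get(self, name: str) -> object:
        # Run the builder on first use, then serve the cached result.
        if name not in self._cache:
            self._cache[name] = self._builders[name]()
            self.build_count += 1
        return self._cache[name]


registry = LazyIntrinRegistry()

# Declaring hundreds of intrins at import time costs only dictionary inserts.
for i in range(300):
    registry.declare(f"intrin_{i}", lambda i=i: {"id": i})

# Only the intrin that is actually requested gets built.
first = registry.get("intrin_7")
```

With this shape, import time scales with the number of declarations rather than with the cost of each registration, and a backend that never asks for an intrin never pays for one.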

LeiWang1999 · 2024-12-08T11:41:24Z

Pull Request #255 made some modifications related to this issue: the lop3 intrin registration is now done as a lazy import. However, there is still some registration overhead for the tl backend (even though it doesn't require any intrin to be registered) when we enable hardware-aware tuning. This can be improved by replacing get_mma_intrin_group with a customized, faster one.

bitblas/gpu/matmul_analysis.py::650

if target.kind.name == "cuda" and check_sm_version(target.arch) >= 70:
    # TODO(lei): we should consider the dtype of the input a and b
    # instead of assuming both a and b share the same dtype.
    # As the tensorcore may support e4m3_float8 * e5m2_float8
    in_dtype, out_dtype = get_in_out_dtypes(block_stmt)
    try:
        _ = get_mma_intrin_group(
            a_dtype=in_dtype,
            b_dtype=in_dtype,
            out_dtype=out_dtype,
        )
    except Exception:
        logger.debug("Cannot find the corresponding mma intrin group")
        return func, None
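
Since the snippet above only uses the call to decide whether a matching intrin group exists, one possible shape for a "customized and fast" replacement is a pure table lookup that never touches the registry. The sketch below is an assumption for illustration: `has_mma_intrin_group` is a hypothetical helper, and the dtype combinations in `_SUPPORTED_MMA_DTYPES` are placeholders, not the actual list BitBLAS supports.

```python
from typing import FrozenSet, Tuple

# Placeholder table of (a_dtype, b_dtype, out_dtype) combinations.
# These entries are illustrative only, not the real BitBLAS support matrix.
_SUPPORTED_MMA_DTYPES: FrozenSet[Tuple[str, str, str]] = frozenset({
    ("float16", "float16", "float16"),
    ("float16", "float16", "float32"),
    ("int8", "int8", "int32"),
})


def has_mma_intrin_group(a_dtype: str, b_dtype: str, out_dtype: str) -> bool:
    """Hypothetical check: answer from a static table, register nothing."""
    return (a_dtype, b_dtype, out_dtype) in _SUPPORTED_MMA_DTYPES


def can_use_tensorcore(in_dtype: str, out_dtype: str) -> bool:
    # Mirrors the snippet's assumption that a and b share the same dtype.
    return has_mma_intrin_group(in_dtype, in_dtype, out_dtype)
```

A check like this returns a boolean instead of raising, so the caller could drop the try/except and bail out with `return func, None` when it returns `False`.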

LeiWang1999 · 2024-12-13T16:49:54Z

Closed as recent pull request #262 has covered this issue :)

LeiWang1999 added the enhancement (New feature or request) label Dec 3, 2024

LeiWang1999 self-assigned this Dec 3, 2024

LeiWang1999 closed this as completed Dec 13, 2024
