
(WIP) bitnet and t-mac #23540

Open · wants to merge 6 commits into main
Conversation

liqunfu
Contributor

@liqunfu liqunfu commented Jan 30, 2025

Preparation for 2-bit T-MAC and ternary BitNet implementation.

  1. The 2-bit T-MAC implementation is to be added at onnxruntime/core/mlas/lib/sqnbitgemm_bitnet_kernel_avx2.cpp and qnbitgemm_kernel_neon.cpp (Q2BitGemmXXX).
  • Q2BitGemmPackQuantBDataSize returns the size of the packed quantized weights so that MLAS can allocate memory for them.
  • SQ2BitGemmPackQuantBData packs the quantized weights. See SQ4BitGemmPackQuantBData for reference.
  • Q2BitGemmPerGemmWorkspaceSize returns the size of the workspace needed for the activation. It is likely the same as for 4-bit.
  • SQ2BitGemmKernel_CompInt8 does the matmul compute: it takes quantA and quantB and computes the output. SQ2BitGemmKernel_CompInt8 shall be called from SQ2BitGemm_CompInt8, which needs to be implemented too; see SQ4BitGemm_CompInt8 for reference.
  2. The BitNet implementation can be added later.
  3. Tests for the MLAS functions are at onnxruntime\test\mlas\unittest\test_sqnbitgemm.cpp; enable them by uncommenting SQNBitGemmShortExecuteTest<2, blklen>::RegisterShortExecuteTests();.
  4. The MatMulNBits kernel implementation is at onnxruntime\contrib_ops\cpu\quantization\matmul_nbits.cc.
  5. Tests for the MatMulNBits kernel are at onnxruntime\test\contrib_ops\matmul_4bits_test.cc; enable them by enabling DISABLED_Float32_Accuracy4_Q2.
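The packing helpers described in step 1 can be sketched roughly as follows. This is an illustrative sketch, not the actual MLAS implementation: DivRoundup stands in for MlasDivRoundup, and the packing layout (four 2-bit values per byte, lowest bits first) is an assumption, not confirmed by the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring MlasDivRoundup from the source tree.
inline size_t DivRoundup(size_t a, size_t b) { return (a + b - 1) / b; }

// In the spirit of Q2BitGemmPackQuantBDataSize: N columns, K rows,
// 2 bits per element, so 4 elements per byte.
size_t Q2BitPackedDataSize(size_t N, size_t K) {
    return N * DivRoundup(K, 4);  // bytes needed for 2-bit packed weights
}

// In the spirit of SQ2BitGemmPackQuantBData: pack four 2-bit values
// (each 0..3) into one byte, lowest-order pair first.
std::vector<uint8_t> PackQuant2Bit(const std::vector<uint8_t>& q) {
    std::vector<uint8_t> packed(DivRoundup(q.size(), 4), 0);
    for (size_t i = 0; i < q.size(); ++i) {
        packed[i / 4] |= static_cast<uint8_t>((q[i] & 0x3) << ((i % 4) * 2));
    }
    return packed;
}
```

The real kernels also interleave scales and reorder blocks for SIMD-friendly access; this sketch covers only the bit packing itself.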

@liqunfu liqunfu requested a review from a team as a code owner January 30, 2025 03:12
@jywu-msft
Member

Can you add description/context?

@liqunfu liqunfu changed the title bitnet and t-mac (WIP) bitnet and t-mac Jan 31, 2025
@@ -402,7 +402,8 @@
struct BlockwiseQuantizer {
// To support other qbits, need to add bit packing code for
// storing to dst and zero points
static_assert(qbits == 4, "Only 4b block quantization is supported!");
static_assert(qbits == 4 || qbits == 2, "Only 4b block quantization is supported!");
//static_assert(qbits != 2 || Columnwise, "Only support Columnwise in qbits == 2 case.");

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
Signed-off-by: Liqun Fu <[email protected]>
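The widened static_assert in the diff above can be exercised with a minimal stand-in. BlockwiseQuantizerSketch below is hypothetical, not the real class; it only illustrates how the qbits template parameter gates instantiation to 2 or 4 bits and determines how many elements fit per byte.

```cpp
// Simplified stand-in for the quantizer template touched by the diff.
// Note the assert message is updated to match the widened condition.
template <typename ElementT, int qbits, bool Columnwise>
struct BlockwiseQuantizerSketch {
    static_assert(qbits == 4 || qbits == 2,
                  "Only 4b and 2b block quantization are supported!");
    // Elements packed per byte follow directly from the bit width.
    static constexpr int kElemsPerByte = 8 / qbits;
};
```

Instantiating the template with any other qbits value fails at compile time, which is the point of the guard.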
Comment on lines +63 to +74
switch (ComputeType) {
case SQNBIT_CompInt8: {
// workspace buffer is used for block quantization of A to int8
const size_t BlockCountK = MlasDivRoundup(K, BlkLen);
// QuantData + Scale
const size_t PerGemmWorkspaceSize = M * BlockCountK * Q8BlkSize(BlkLen);
return PerGemmWorkspaceSize;
}
default: {
return 0;
}
}

Check notice

Code scanning / CodeQL

No trivial switch statements Note

This switch statement should either handle more cases, or be rewritten as an if statement.
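The workspace computation in the snippet above can be restated as a standalone sketch. The assumption, suggested by the "QuantData + Scale" comment but not confirmed by the source, is that each int8 block stores one float scale followed by BlkLen quantized values; Q8BlkSizeSketch and PerGemmWorkspaceSizeSketch are illustrative names, not the MLAS helpers.

```cpp
#include <cstddef>

// Stand-in for MlasDivRoundup.
inline size_t RoundUpDiv(size_t a, size_t b) { return (a + b - 1) / b; }

// Assumed block layout: one float scale, then BlkLen int8 values.
inline size_t Q8BlkSizeSketch(size_t BlkLen) {
    return sizeof(float) + BlkLen;
}

// Per-GEMM workspace for SQNBIT_CompInt8: M rows of A, each quantized
// blockwise along the K dimension.
size_t PerGemmWorkspaceSizeSketch(size_t M, size_t K, size_t BlkLen) {
    const size_t BlockCountK = RoundUpDiv(K, BlkLen);
    return M * BlockCountK * Q8BlkSizeSketch(BlkLen);
}
```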
Signed-off-by: Liqun Fu <[email protected]>
Signed-off-by: Liqun Fu <[email protected]>
}
const uint8_t vi1 = weights[j * q_rows + i / 2] >> 4;
const float v1 = (static_cast<float>(vi1) - zp1) * scale1;
dst[j * rows + (i + 1)] = static_cast<ElementT>(v1);
}
}
Contributor

maybe separate the template specializations for qnbits so the code would be cleaner?
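For comparison with the 4-bit unpacking shown in the diff (two weights per byte, extracted with a shift), a 2-bit dequantization loop might look like the sketch below. Dequant2Bit is hypothetical and simplifies zero-point and scale handling to a single value each; the real code tracks them per block.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 2-bit analogue of the 4-bit dequantization above: four weights per
// byte, each extracted with a shift and a 0x3 mask.
std::vector<float> Dequant2Bit(const std::vector<uint8_t>& packed,
                               size_t count, float zp, float scale) {
    std::vector<float> dst(count);
    for (size_t i = 0; i < count; ++i) {
        const uint8_t vi = (packed[i / 4] >> ((i % 4) * 2)) & 0x3;
        dst[i] = (static_cast<float>(vi) - zp) * scale;
    }
    return dst;
}
```

Separate template specializations per qbits, as the reviewer suggests, would keep the 2-bit and 4-bit extraction paths from sharing one loop full of conditionals.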

@fajin-corp
Contributor

                    range2scale<Tin, 4, signed_quant>(vmin_t[i + 1], vmax_t[i + 1], scale1_tt);

this might be wrong


Refers to: onnxruntime/core/mlas/lib/q4_dq.cpp:983 in b4aad01. [](commit_id = b4aad01, deletion_comment = False)
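The concern appears to be that the bit width 4 is hardcoded where a 2-bit path may also run. A typical range-to-scale mapping depends on the number of quantization levels, 2^qbits − 1, as this illustrative formula shows; RangeToScaleSketch is not the actual range2scale implementation.

```cpp
// Illustrative asymmetric range-to-scale mapping over [0, 2^qbits - 1].
// A hardcoded qbits of 4 would compute the wrong scale on a 2-bit path.
template <int qbits>
float RangeToScaleSketch(float vmin, float vmax) {
    constexpr float levels = static_cast<float>((1 << qbits) - 1);
    return (vmax - vmin) / levels;
}
```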


template <typename Tin, bool signed_quant>
struct BlockwiseQDQQuantizer<Tin, 4, signed_quant> {
struct BlockwiseQDQQuantizer {
Contributor

would it be better to separate the specializations for different qbits?

}

size_t
SQ2BitGemmKernel_CompInt8_avx2(
Contributor

SQ2BitGemmKernel_CompInt8_avx2

avx2 kernel should not appear in this file

@fajin-corp
Contributor

typedef size_t(SQ4BitGemmKernel_CompInt8_Fn)(

rename to SQNBitGemmKernel_CompInt8_Fn?


Refers to: onnxruntime/core/mlas/lib/qnbitgemm.h:338 in b4aad01. [](commit_id = b4aad01, deletion_comment = False)

@fajin-corp
Contributor

typedef size_t(Q4BitGemmPackQuantBDataSize_Fn)(

rename to QNBitGemmPackQuantBDataSize_Fn?


Refers to: onnxruntime/core/mlas/lib/qnbitgemm.h:94 in b4aad01. [](commit_id = b4aad01, deletion_comment = False)

@@ -113,6 +120,7 @@ struct MLAS_QNBIT_GEMM_DISPATCH {

Q4BitGemmPackQuantBData_Fn* SQ4BitGemmPackQuantBData = nullptr;
Q4BitGemmPackQuantBData_Fn* HQ4BitGemmPackQuantBData = nullptr;
Q4BitGemmPackQuantBData_Fn* SQ2BitGemmPackQuantBData = nullptr;
Contributor

Q4BitGemmPackQuantBData_Fn

rename to QNBitGemmPackQuantBData_Fn
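The rename suggestions above concern a dispatch table of function pointers shared across bit widths: one typedef serves 4-bit, 2-bit, and future kernels, hence the proposed QNBit prefix. A minimal sketch of that pattern, with all names as illustrative stand-ins rather than the real MLAS_QNBIT_GEMM_DISPATCH members:

```cpp
#include <cstddef>

// One shared signature for all bit widths (the motivation for the rename).
typedef size_t (QNBitGemmPackQuantBDataSizeSketch_Fn)(size_t N, size_t K);

// Per-width implementations: 2 elements/byte at 4 bits, 4 at 2 bits.
size_t Pack4BitSizeSketch(size_t N, size_t K) { return N * ((K + 1) / 2); }
size_t Pack2BitSizeSketch(size_t N, size_t K) { return N * ((K + 3) / 4); }

// Dispatch struct with one slot per bit width, filled in per platform.
struct QNBitGemmDispatchSketch {
    QNBitGemmPackQuantBDataSizeSketch_Fn* SQ4BitPackSize = nullptr;
    QNBitGemmPackQuantBDataSizeSketch_Fn* SQ2BitPackSize = nullptr;
};
```

Each platform backend (AVX2, NEON, ...) populates its own instance of the struct, so the shared typedef name should not mention a specific bit width.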
