Add 8 bit to quantization data type #3213

ZiyueXu77 · 2025-02-07T20:10:07Z

Fixes FLARE-2374.

Description

The QA report for this issue involves an 8-bit input. 8-bit was previously not included among the valid data types, but now that 4-bit quantization schemes are supported, it makes sense to add 8-bit to the original list of data types.
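A minimal sketch of the kind of check this change affects. It assumes the quantization code validates an incoming tensor's data type against a fixed allow-list. The names `VALID_SOURCE_DATA_TYPES` and `validate_source_dtype` are hypothetical and do not come from this PR. The point is only that 8-bit types must appear in the list for an 8-bit input to pass validation.

```python
# Hypothetical allow-list of input data types accepted for quantization.
# Names are illustrative only; they are not NVFlare's actual constants.
VALID_SOURCE_DATA_TYPES = {
    "float64",
    "float32",
    "float16",
    "bfloat16",
    "uint8",  # 8-bit entries: without these, an 8-bit input is rejected
    "int8",
}


def validate_source_dtype(dtype_name: str) -> str:
    """Return the normalized dtype name, or raise if it is not allowed."""
    name = dtype_name.lower()
    if name not in VALID_SOURCE_DATA_TYPES:
        raise ValueError(f"unsupported data type: {dtype_name!r}")
    return name
```

With the 8-bit entries present, `validate_source_dtype("INT8")` returns `"int8"` instead of raising.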

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

ZiyueXu77 · 2025-02-07T20:13:56Z

/build

ZiyueXu77 added 3 commits February 6, 2025 10:24

fix qa bug

3d8c6eb

Merge branch 'NVIDIA:main' into quant_bug_fix

2ec91e2

add 8bit as valid datatype

09d5ce2

ZiyueXu77 requested a review from holgerroth February 7, 2025 20:10

Merge branch 'main' into quant_bug_fix

6885121

ZiyueXu77 enabled auto-merge (squash) February 7, 2025 20:14

chesterxgchen approved these changes Feb 7, 2025


ZiyueXu77 merged commit 50e3400 into NVIDIA:main Feb 7, 2025
20 checks passed

ZiyueXu77 deleted the quant_bug_fix branch February 7, 2025 22:00
