Add 8 bit to quantization data type #3213

Merged 4 commits on Feb 7, 2025

ZiyueXu77
Collaborator

Fixes FLARE-2374.

Description

This QA report uses an 8-bit input, which we had not previously included in the valid data types. Now that we have 4-bit schemes, it makes sense to add 8 bit to the original data types.
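
To make the idea concrete, here is a minimal sketch of what accepting an 8-bit source type in a validity check could look like. It is not the code from this PR: `BASE_DTYPES`, `EXTENDED_DTYPES`, `check_dtype`, and the `"uint8"` label are all hypothetical names chosen for illustration, and the real constants in the repository may differ.

```python
# Illustrative sketch only: every name here is hypothetical, not NVFlare's API.

# Assumed set of source data types accepted before the change.
BASE_DTYPES = frozenset({"float64", "float32", "float16", "bfloat16"})

# Assumed effect of the change: an 8-bit type joins the accepted set.
# The label "uint8" is an assumption made for this example.
EXTENDED_DTYPES = BASE_DTYPES | {"uint8"}


def check_dtype(name: str, valid: frozenset = EXTENDED_DTYPES) -> str:
    """Return the normalized dtype name, or raise if it is not accepted."""
    normalized = name.lower()
    if normalized not in valid:
        raise ValueError(
            f"Unsupported data type {name!r}; expected one of {sorted(valid)}"
        )
    return normalized
```

With the extended set, `check_dtype("UINT8")` passes, while the same call against `BASE_DTYPES` raises `ValueError`, which is the kind of rejection an 8-bit input would hit if the type were missing from the valid list.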

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@ZiyueXu77 ZiyueXu77 requested a review from holgerroth February 7, 2025 20:10
@ZiyueXu77
Collaborator Author

/build

@ZiyueXu77 ZiyueXu77 enabled auto-merge (squash) February 7, 2025 20:14
@ZiyueXu77 ZiyueXu77 merged commit 50e3400 into NVIDIA:main Feb 7, 2025
20 checks passed
@ZiyueXu77 ZiyueXu77 deleted the quant_bug_fix branch February 7, 2025 22:00
2 participants