Opened by pommedeterresautee
Description
Quantization requires the ability to perform an int-8 matmul on GPU with a bias and a scale factor (symmetric quantization). Right now, PyTorch has no support for those operations, but Triton should work.
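To make the request concrete, here is a minimal sketch of what such a kernel could look like in Triton: an int8 GEMM that accumulates in int32, then dequantizes with a single scale (symmetric quantization) and adds a float bias in the epilogue. This is not code from the issue; the function names (`int8_matmul`, `_int8_matmul_kernel`), the single combined `scale` argument, and the block sizes are all illustrative assumptions, and the kernel is untuned.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def _int8_matmul_kernel(
    a_ptr, b_ptr, bias_ptr, c_ptr,
    M, N, K,
    stride_am, stride_ak,
    stride_bk, stride_bn,
    stride_cm, stride_cn,
    scale,  # combined symmetric scale, e.g. scale_a * scale_b (assumption)
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr,
):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)

    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)

    a_ptrs = a_ptr + offs_m[:, None] * stride_am + offs_k[None, :] * stride_ak
    b_ptrs = b_ptr + offs_k[:, None] * stride_bk + offs_n[None, :] * stride_bn

    # Accumulate in int32 so the int8 x int8 products cannot overflow.
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.int32)
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptrs,
                    mask=(offs_m[:, None] < M) & (offs_k[None, :] + k < K),
                    other=0)
        b = tl.load(b_ptrs,
                    mask=(offs_k[:, None] + k < K) & (offs_n[None, :] < N),
                    other=0)
        acc += tl.dot(a, b)
        a_ptrs += BLOCK_K * stride_ak
        b_ptrs += BLOCK_K * stride_bk

    # Symmetric quant epilogue: one scale for the whole matmul, then the bias.
    bias = tl.load(bias_ptr + offs_n, mask=offs_n < N, other=0.0)
    c = acc.to(tl.float32) * scale + bias[None, :]

    c_ptrs = c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn
    tl.store(c_ptrs, c, mask=(offs_m[:, None] < M) & (offs_n[None, :] < N))


def int8_matmul(a, b, bias, scale):
    # a: (M, K) int8, b: (K, N) int8, bias: (N,) float32, scale: float
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    _int8_matmul_kernel[grid](
        a, b, bias, c, M, N, K,
        a.stride(0), a.stride(1),
        b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
        scale,
        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32,
    )
    return c
```

The point of fusing the dequantization and bias into the matmul epilogue is that the int32 accumulator never round-trips through global memory; the kernel writes the final float output directly, which is exactly the fused int8-matmul-with-bias-and-scale operation the issue asks for.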
Is this request already on the discussion forum?
No
Motivation
Quantization would be a very useful feature.
Have you tried to implement it?
No response