Difference between W8A8BFP32OFP32LinearWithQuantScale and W8A8BFP32OFP32Linear #10

Hongbosherlock · 2024-02-02T13:39:55Z

for example here:
https://github.com/AniZpZ/AutoSmoothQuant/blob/main/autosmoothquant/models/llama.py#L89

int8_module.q_proj = W8A8BFP32OFP32Linear.from_float(module.q_proj, attn_input_scale, 
int8_module.o_proj = W8A8BFP32OFP32LinearWithQuantScale.from_float(
            module.o_proj, out_input_scale, act_quant=int8_module.o_quant_type)

Is the difference whether it involves quant_scale or not?

quant_scale is for the activation x and dequant_scale is for the weight, right?


Hongbosherlock · 2024-02-02T13:43:06Z

Can I have your WeChat for further discussion? I have some exciting concepts in mind for this project. I'm also willing to make a small contribution.
My WeChat ID is the lowercase of my GitHub nickname.

AniZpZ · 2024-02-04T02:29:22Z

Scales have been statically applied to the weights (offline). The differences between W8A8BFP32OFP32LinearWithQuantScale and W8A8BFP32OFP32Linear are engineering-related, not algorithmic.
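
To make the roles of the two scales in the question concrete, here is a minimal pure-Python sketch of generic per-tensor W8A8 arithmetic. It is not taken from AutoSmoothQuant: the helper names (`quantize`, `int8_linear`, `LinearWithQuantScale`), the sample values, and the convention that `dequant_scale` is the product of the activation scale and the weight scale are all assumptions made for illustration. Under those assumptions, a layer that receives already-quantized input and a layer that stores `quant_scale` and quantizes its own input compute the same result, which is consistent with the difference being engineering-related rather than algorithmic.

```python
def quantize(values, scale):
    """Symmetric int8 quantization: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_linear(x_q, w_q, dequant_scale):
    """Integer dot product per output row, then a single float rescale."""
    return [dequant_scale * sum(a * b for a, b in zip(x_q, row)) for row in w_q]


class LinearWithQuantScale:
    """Hypothetical layer that keeps quant_scale and quantizes its own input."""

    def __init__(self, w_q, weight_scale, quant_scale):
        self.w_q = w_q
        self.quant_scale = quant_scale
        # Assumed convention: output rescale = activation scale * weight scale.
        self.dequant_scale = quant_scale * weight_scale

    def __call__(self, x):
        x_q = quantize(x, self.quant_scale)
        return int8_linear(x_q, self.w_q, self.dequant_scale)


# Made-up float activation and weight (2 output features, 3 input features).
x = [0.5, -1.0, 0.25]
w = [[0.2, -0.4, 0.1], [0.05, 0.3, -0.2]]

# Static per-tensor scales; the weight is quantized once, offline.
quant_scale = max(abs(v) for v in x) / 127
weight_scale = max(abs(v) for row in w for v in row) / 127
w_q = [quantize(row, weight_scale) for row in w]

# Variant A: the caller quantizes x, the layer only sees int8 values.
y_a = int8_linear(quantize(x, quant_scale), w_q, quant_scale * weight_scale)

# Variant B: the layer quantizes x itself using its stored quant_scale.
y_b = LinearWithQuantScale(w_q, weight_scale, quant_scale)(x)

# Float reference for comparison.
y_ref = [sum(a * b for a, b in zip(x, row)) for row in w]
```

In this sketch `y_a` and `y_b` are identical, and both stay close to `y_ref`; the only thing that changes between the variants is where the activation is quantized.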
