Inference error after quantizing llama2-7b-chat #24

Open
AlexMa0 opened this issue Jul 9, 2024 · 5 comments
@AlexMa0

AlexMa0 commented Jul 9, 2024

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:42<00:00, 21.33s/it]
Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.0.mlp.down_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.30.mlp.down_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale']

  • This IS expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
Traceback (most recent call last):
  File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 60, in <module>
    main()
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 54, in main
    output_ids = model.generate(**inputs, max_new_tokens=20)
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 1527, in generate
    result = self._greedy_search(
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
    outputs = self(
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Int8LlamaModel' object has no attribute '_update_causal_mask'
```
What is causing this error at inference time, and how can it be fixed?
@AniZpZ
Owner

AniZpZ commented Jul 9, 2024

This is probably a transformers version issue. Try version 4.36.2, or modify the llama.py code to adapt it to your transformers library.
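
For reference, the version pin is just `pip install transformers==4.36.2`. The llama.py route could look like the following minimal sketch; the import path for `Int8LlamaModel` is an assumption (adjust it to wherever llama.py defines the class), and the delegation assumes the installed transformers' `LlamaModel` defines `_update_causal_mask`, which holds for the releases whose `forward` calls it:

```python
# Hypothetical adaptation, not AutoSmoothQuant's official fix: reuse the
# upstream mask helper so self._update_causal_mask resolves on the Int8 model.
from transformers.models.llama.modeling_llama import LlamaModel

from autosmoothquant.models.llama import Int8LlamaModel  # path assumed

# Borrow the method added in newer transformers releases.
Int8LlamaModel._update_causal_mask = LlamaModel._update_causal_mask
```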

@AlexMa0
Author

AlexMa0 commented Jul 9, 2024

Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.30.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.0.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale']

  • This IS expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    ["something to say I'keyvalton met im kyaswe, I'keyvalton met New L"]
With version 4.36.2, this output doesn't look normal.
Also, if I don't switch the transformers version, how should I modify the llama.py file?

@AlexMa0
Author

AlexMa0 commented Jul 9, 2024

Asking 1+1=? gives this answer:
['1+1=?"- c- c- m.Ъ, I'keyval\n wop.Ъ,']

@AniZpZ
Owner

AniZpZ commented Jul 9, 2024

  1. Check whether the calibration data and process are normal.
  2. Use per-token rather than per-tensor quantization (a minimal sketch follows below).
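
To make the second point concrete, here is an illustrative sketch of the difference (my own simplification, not AutoSmoothQuant's actual kernels): per-tensor shares one scale across the whole activation, so a single outlier token shrinks the resolution of every other row, while per-token gives each row its own scale:

```python
import torch

def quantize_int8(x: torch.Tensor, per_token: bool):
    """Symmetric int8 quantization with per-token or per-tensor scales."""
    if per_token:
        # One scale per row/token: robust to per-token outliers.
        scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    else:
        # One scale for the whole tensor: an outlier degrades all rows.
        scale = x.abs().amax() / 127.0
    scale = scale.clamp(min=1e-8)  # avoid division by zero on all-zero input
    q = torch.clamp((x / scale).round(), -128, 127).to(torch.int8)
    return q, scale

# Dequantize with q.float() * scale to inspect the quantization error.
```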

@AlexMa0
Author

AlexMa0 commented Jul 10, 2024

:) Hi,
1. The calibration dataset is mit-han-lab/pile-val-backup/val.jsonl.zst. What is the criterion for judging whether the calibration process is normal?
2. I changed the quant config during quantization to:

    {
      "qkv": "per-token",
      "out": "per-token",
      "fc1": "per-token",
      "fc2": "per-token"
    }

but inference is still wrong: ['1+1=?\n Unterscheidung zwischen 1+1 und 1+2\n\n1+1=2\n']
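
On the criterion question, one common sanity check (my assumption, not a standard stated by the maintainer) is to compare the quantized model's perplexity on a held-out slice of the calibration data against the FP16 baseline; a large gap points at a calibration problem. A minimal sketch, assuming the Int8 model keeps the usual causal-LM forward signature with `labels`:

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text, device="cuda"):
    enc = tokenizer(text, return_tensors="pt").to(device)
    # labels=input_ids makes the model return the average cross-entropy loss
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Usage sketch: fp16_model and int8_model are the baseline and quantized
# models; a perplexity gap of more than a few points suggests bad scales.
# print(perplexity(fp16_model, tokenizer, sample_text))
# print(perplexity(int8_model, tokenizer, sample_text))
```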
