Inference error after quantizing llama2-7b-chat #24

Open
AlexMa0 opened this issue Jul 9, 2024 · 5 comments
@AlexMa0

AlexMa0 commented Jul 9, 2024

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:42<00:00, 21.33s/it]
Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.0.mlp.down_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.30.mlp.down_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale']

  • This IS expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
Traceback (most recent call last):
  File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 60, in <module>
    main()
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/AutoSmoothQuant/autosmoothquant/examples/test_model.py", line 54, in main
    output_ids = model.generate(**inputs, max_new_tokens=20)
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 1527, in generate
    result = self._greedy_search(
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
    outputs = self(
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/conda_env/base/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
  File "/data/conda_env/base/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Int8LlamaModel' object has no attribute '_update_causal_mask'
```
What is causing this error at inference time, and how can it be fixed?
@AniZpZ
Owner

AniZpZ commented Jul 9, 2024

This is probably a transformers version issue. Try version 4.36.2, or modify the llama.py code to adapt it to your transformers library.
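
For reference, the version pin is just `pip install transformers==4.36.2`. The llama.py route could look like the following minimal sketch; the import path for `Int8LlamaModel` is an assumption (adjust it to wherever llama.py defines the class), and the delegation assumes the installed transformers' `LlamaModel` defines `_update_causal_mask`, which holds for the releases whose `forward` calls it:

```python
# Hypothetical adaptation, not AutoSmoothQuant's official fix: reuse the
# upstream mask helper so self._update_causal_mask resolves on the Int8 model.
from transformers.models.llama.modeling_llama import LlamaModel

from autosmoothquant.models.llama import Int8LlamaModel  # path assumed

# Borrow the method added in newer transformers releases.
Int8LlamaModel._update_causal_mask = LlamaModel._update_causal_mask
```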

@AlexMa0
Author

AlexMa0 commented Jul 9, 2024

Some weights of the model checkpoint at /data/AutoSmoothQuant/quantized_model/llama2-7b-chat/Llama-2-7b-chat-hf-smoothquant were not used when initializing Int8LlamaForCausalLM: ['model.layers.30.mlp.down_proj.quant_scale', 'model.layers.6.self_attn.o_proj.quant_scale', 'model.layers.5.self_attn.o_proj.quant_scale', 'model.layers.12.mlp.down_proj.quant_scale', 'model.layers.29.mlp.down_proj.quant_scale', 'model.layers.19.self_attn.o_proj.quant_scale', 'model.layers.26.self_attn.o_proj.quant_scale', 'model.layers.3.self_attn.o_proj.quant_scale', 'model.layers.10.mlp.down_proj.quant_scale', 'model.layers.23.self_attn.o_proj.quant_scale', 'model.layers.8.mlp.down_proj.quant_scale', 'model.layers.4.mlp.down_proj.quant_scale', 'model.layers.5.mlp.down_proj.quant_scale', 'model.layers.24.self_attn.o_proj.quant_scale', 'model.layers.27.mlp.down_proj.quant_scale', 'model.layers.18.mlp.down_proj.quant_scale', 'model.layers.29.self_attn.o_proj.quant_scale', 'model.layers.28.self_attn.o_proj.quant_scale', 'model.layers.30.self_attn.o_proj.quant_scale', 'model.layers.15.mlp.down_proj.quant_scale', 'model.layers.20.self_attn.o_proj.quant_scale', 'model.layers.21.mlp.down_proj.quant_scale', 'model.layers.9.self_attn.o_proj.quant_scale', 'model.layers.22.mlp.down_proj.quant_scale', 'model.layers.10.self_attn.o_proj.quant_scale', 'model.layers.28.mlp.down_proj.quant_scale', 'model.layers.23.mlp.down_proj.quant_scale', 'model.layers.25.mlp.down_proj.quant_scale', 'model.layers.14.mlp.down_proj.quant_scale', 'model.layers.9.mlp.down_proj.quant_scale', 'model.layers.7.self_attn.o_proj.quant_scale', 'model.layers.27.self_attn.o_proj.quant_scale', 'model.layers.16.mlp.down_proj.quant_scale', 'model.layers.1.self_attn.o_proj.quant_scale', 'model.layers.14.self_attn.o_proj.quant_scale', 'model.layers.31.self_attn.o_proj.quant_scale', 'model.layers.16.self_attn.o_proj.quant_scale', 'model.layers.11.mlp.down_proj.quant_scale', 'model.layers.20.mlp.down_proj.quant_scale', 'model.layers.2.self_attn.o_proj.quant_scale', 'model.layers.24.mlp.down_proj.quant_scale', 'model.layers.18.self_attn.o_proj.quant_scale', 'model.layers.8.self_attn.o_proj.quant_scale', 'model.layers.26.mlp.down_proj.quant_scale', 'model.layers.17.self_attn.o_proj.quant_scale', 'model.layers.17.mlp.down_proj.quant_scale', 'model.layers.2.mlp.down_proj.quant_scale', 'model.layers.22.self_attn.o_proj.quant_scale', 'model.layers.6.mlp.down_proj.quant_scale', 'model.layers.0.mlp.down_proj.quant_scale', 'model.layers.13.self_attn.o_proj.quant_scale', 'model.layers.4.self_attn.o_proj.quant_scale', 'model.layers.11.self_attn.o_proj.quant_scale', 'model.layers.0.self_attn.o_proj.quant_scale', 'model.layers.1.mlp.down_proj.quant_scale', 'model.layers.21.self_attn.o_proj.quant_scale', 'model.layers.7.mlp.down_proj.quant_scale', 'model.layers.12.self_attn.o_proj.quant_scale', 'model.layers.3.mlp.down_proj.quant_scale', 'model.layers.19.mlp.down_proj.quant_scale', 'model.layers.25.self_attn.o_proj.quant_scale', 'model.layers.31.mlp.down_proj.quant_scale', 'model.layers.13.mlp.down_proj.quant_scale', 'model.layers.15.self_attn.o_proj.quant_scale']

  • This IS expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing Int8LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    ["something to say I'keyvalton met im kyaswe, I'keyvalton met New L"]
With version 4.36.2, this output doesn't look normal.
Also, if I don't switch the transformers version, how should I modify the llama.py file?

@AlexMa0
Author

AlexMa0 commented Jul 9, 2024

Asking 1+1=? gives this answer:
['1+1=?"- c- c- m.Ъ, I'keyval\n wop.Ъ,']

@AniZpZ
Owner

AniZpZ commented Jul 9, 2024

  1. Check whether the calibration data and process are normal.
  2. Use per-token rather than per-tensor quantization (a minimal sketch follows below).
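
To make the second point concrete, here is an illustrative sketch of the difference (my own simplification, not AutoSmoothQuant's actual kernels): per-tensor shares one scale across the whole activation, so a single outlier token shrinks the resolution of every other row, while per-token gives each row its own scale:

```python
import torch

def quantize_int8(x: torch.Tensor, per_token: bool):
    """Symmetric int8 quantization with per-token or per-tensor scales."""
    if per_token:
        # One scale per row/token: robust to per-token outliers.
        scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    else:
        # One scale for the whole tensor: an outlier degrades all rows.
        scale = x.abs().amax() / 127.0
    scale = scale.clamp(min=1e-8)  # avoid division by zero on all-zero input
    q = torch.clamp((x / scale).round(), -128, 127).to(torch.int8)
    return q, scale

# Dequantize with q.float() * scale to inspect the quantization error.
```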

@AlexMa0
Author

AlexMa0 commented Jul 10, 2024

:) Hi,
1. The calibration dataset is mit-han-lab/pile-val-backup/val.jsonl.zst. What is the criterion for judging whether the calibration process is normal?
2. I changed the quant config during quantization to:

    {
      "qkv": "per-token",
      "out": "per-token",
      "fc1": "per-token",
      "fc2": "per-token"
    }

but inference is still wrong: ['1+1=?\n Unterscheidung zwischen 1+1 und 1+2\n\n1+1=2\n']
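
On the criterion question, one common sanity check (my assumption, not a standard stated by the maintainer) is to compare the quantized model's perplexity on a held-out slice of the calibration data against the FP16 baseline; a large gap points at a calibration problem. A minimal sketch, assuming the Int8 model keeps the usual causal-LM forward signature with `labels`:

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text, device="cuda"):
    enc = tokenizer(text, return_tensors="pt").to(device)
    # labels=input_ids makes the model return the average cross-entropy loss
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

# Usage sketch: fp16_model and int8_model are the baseline and quantized
# models; a perplexity gap of more than a few points suggests bad scales.
# print(perplexity(fp16_model, tokenizer, sample_text))
# print(perplexity(int8_model, tokenizer, sample_text))
```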
