Unable to export model by using --device npu --provider QNNExecutionProvider #1595

Open
huanji-sun-007 opened this issue Feb 5, 2025 · 1 comment

huanji-sun-007 commented Feb 5, 2025

Describe the bug
Hi,
I am trying to quantize and export a fine-tuned microsoft/Phi-3.5-mini-instruct model.
I was able to export it using --device cpu --provider CPUExecutionProvider.
However, when I try to export it using --device npu --provider QNNExecutionProvider, I get the following error:

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 2729203422

To Reproduce
Steps to reproduce the behavior.

olive quantize \
   --model_name_or_path microsoft/Phi-3.5-mini-instruct \
   --trust_remote_code \
   --algorithm awq \
   --output_path outputs/models/awq \
   --log_level 1

olive auto-opt \
   --model_name_or_path outputs/models/awq/model \
   --output_path outputs/models/onnx-quant \
   --device npu \
   --provider QNNExecutionProvider \
   --dynamic-to-fixed-shape-dim-param batch \
   --dynamic-to-fixed-shape-dim-value 1 \
   --precision int4 \
   --batch_size 1 \
   --use_ort_genai \
   --log_level 1
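
For reference, the step that fails is Olive's dynamic-to-fixed-shape pass. From the traceback below, it ends up calling onnxruntime.tools.onnx_model_utils.fix_output_shapes; the dimension pinning itself presumably uses something like onnxruntime's make_dim_param_fixed utility. A simplified sketch of that flow (not Olive's actual code; the model path is hypothetical):

import onnx
from onnxruntime.tools.onnx_model_utils import make_dim_param_fixed, fix_output_shapes

# Load the converted model (hypothetical path) and pin the "batch" dim_param
# to 1, mirroring --dynamic-to-fixed-shape-dim-param/-value.
model = onnx.load("outputs/models/onnx-quant/intermediate/model.onnx")
make_dim_param_fixed(model.graph, "batch", 1)

# Re-infer the graph output shapes; this is the call that fails for models
# whose serialized proto exceeds 2 GB.
fix_output_shapes(model)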

Expected behavior
Be able to export the ONNX model using --device npu --provider QNNExecutionProvider.

Olive config

olive-ai[ort-genai,auto-opt]==0.7.1.1
autoawq==0.2.7.post2
auto-gptq==0.7.1
transformers==4.44.2
optimum==1.23.1
peft==0.13.2
accelerate==1.1.1
scipy==1.14.1
onnxruntime-genai==0.5.0
torchvision==0.18.1
tabulate==0.9.0
onnxruntime-genai-cuda==0.5.0
torch==2.3.1

Olive logs

Traceback (most recent call last):
  File "/opt/conda/envs/ptca/bin/olive", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/cli/launcher.py", line 62, in main
    service.run()
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/cli/auto_opt.py", line 183, in run
    olive_run(run_config)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/workflows/run/run.py", line 317, in run
    return run_engine(package_config, run_config)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/workflows/run/run.py", line 259, in run_engine
    engine.run(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/engine/engine.py", line 252, in run
    run_result = self.run_accelerator(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/engine/engine.py", line 330, in run_accelerator
    output_footprint = self.run_no_search(input_model_config, input_model_id, accelerator_spec, output_dir)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/engine/engine.py", line 400, in run_no_search
    should_prune, signal, model_ids = self._run_passes(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/engine/engine.py", line 664, in _run_passes
    model_config, model_id = self._run_pass(
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/engine/engine.py", line 764, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, output_model_path, pass_search_point)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/systems/local.py", line 30, in run_pass
    output_model = the_pass.run(model, output_model_path, point)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/passes/olive_pass.py", line 245, in run
    output_model = self._run_for_config(model, config, output_model_path)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/olive/passes/onnx/dynamic_to_fixed_shape.py", line 81, in _run_for_config
    fix_output_shapes(onnx_model)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/onnxruntime/tools/onnx_model_utils.py", line 242, in fix_output_shapes
    m2 = onnx.shape_inference.infer_shapes(model)
  File "/opt/conda/envs/ptca/lib/python3.10/site-packages/onnx/shape_inference.py", line 45, in infer_shapes
    model_str = model if isinstance(model, bytes) else model.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 2729203422
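
For context, protobuf caps a single message at 2 GB, and onnx.shape_inference.infer_shapes serializes the entire ModelProto in memory (model.SerializeToString() above) before inferring, so any model larger than that fails at this point regardless of the pass settings. A minimal sketch of the usual workaround for large models outside of Olive (not the fix itself) is to keep the weights as external data and use the path-based shape-inference API, which never builds a single >2 GB message; file names here are hypothetical:

import onnx

# The path-based variant reads and writes the model on disk (weights kept as
# external data), avoiding the in-memory SerializeToString() round trip.
onnx.shape_inference.infer_shapes_path(
    "model/model.onnx",           # hypothetical path to the exported model
    "model/model_inferred.onnx",  # shapes-inferred model written here
)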

Other information

  • OS: Linux
  • Olive version: 0.7.1.1
  • ONNXRuntime package and version: onnxruntime-genai-cuda==0.5.0
  • Transformers package version: 4.44.2

Additional context
I found a similar issue, #1165, which is also an ONNX conversion failure when the model size exceeds 2 GB.

jambayk commented Feb 6, 2025

Thanks for reporting the bug! I created #1600 to fix this.

Please note that the --provider QNNExecutionProvider option currently doesn't produce a model compatible with the QNN EP. That requires a more involved workflow that we are actively testing. I will let you know once we have a working example; it will take some more time after that to bring those changes into the auto-opt tool.

jambayk added a commit that referenced this issue Feb 10, 2025
…1600)

## Describe your changes
- `onnxruntime.tools.onnx_model_utils.fix_output_shapes` cannot handle
large models (#1595), so we use the ORT shape-inference helper and handle
the logic ourselves. This also means the pass can now handle models with
contrib operators (a rough sketch of the idea follows below this list).
- Allow passing 0 as dim_value. This case is possible when creating a
prompt-processing model from a dynamic-shaped LLM where we want to make
the past KV cache empty.
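
Not the actual change from this PR, just a rough sketch of the idea in the first bullet, assuming the helper in question is onnxruntime's SymbolicShapeInference: run shape inference on the in-memory model (it copies the proto rather than serializing it into a single message) and copy the inferred output shapes back onto the graph outputs. The function name is hypothetical:

import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

def fix_output_shapes_for_large_model(model: onnx.ModelProto) -> None:
    # Symbolic shape inference works on the ModelProto directly, so it does
    # not hit the 2 GB protobuf serialization limit.
    inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)

    # Copy the inferred output shapes back onto the original graph outputs.
    inferred_outputs = {o.name: o for o in inferred.graph.output}
    for output in model.graph.output:
        if output.name in inferred_outputs:
            output.type.CopyFrom(inferred_outputs[output.name].type)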

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this
change to be included in the release notes.
- [ ] Is this PR including examples changes? If yes, please remember to
update [example
documentation](https://github.com/microsoft/Olive/blob/main/docs/source/examples.md)
in a follow-up PR.

## (Optional) Issue link