How to export ppyoloe_seg in PaddleDetection to ONNX and convert it with trtexec #1483

zjykzj opened this issue Jan 21, 2025 · 1 comment

zjykzj commented Jan 21, 2025

Please fill in the information below completely so that we can resolve your issue quickly. Thank you!

Hello, and thank you for open-sourcing the PP-series frameworks. I am using PaddleDetection's ppyoloe_seg algorithm. After training, I would like to export the model to ONNX and then run trtexec conversion and benchmarking on a Jetson device, just as with the ppyoloe algorithm (https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8/configs/ppyoloe/README_cn.md):

# Export the model
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True

# Convert to ONNX format
paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx

# Benchmark speed, FP16, batch_size=1
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16

# Benchmark speed, FP16, batch_size=32
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16

# With the scripts above, on a T4 with TensorRT 7.2, PP-YOLOE-plus-s runs at:
# batch_size=1, 2.80ms, 357fps
# batch_size=32, 67.69ms, 472fps

However, I could not find similar instructions in the README.md of the ppyoloe_seg algorithm (https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8/configs/ppyoloe_seg/README.md), so I would like to know how to convert ppyoloe_seg correctly. Many thanks!

Problem description
Please describe the error in detail here

I attempted the ONNX conversion myself, and the export to ONNX appears to succeed:

# Export the model
python tools/export_model.py -c configs/ppyoloe_seg/ppyoloe_seg_s_80e_xfy.yml -o exclude_nms=True
# Convert to ONNX format
paddle2onnx --model_dir output_inference/ppyoloe_seg_s_80e_xfy --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_seg_s_80e_xfy.onnx

[Screenshot: paddle2onnx reports a successful ONNX export]
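
As a sanity check before trtexec, here is a minimal sketch (assuming the onnx and onnxruntime Python packages are installed; the input names image and scale_factor are the same ones passed to trtexec --shapes below):

# Validate the exported graph and run one dummy inference on CPU.
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
onnx.checker.check_model(model)  # raises if the graph is structurally invalid
print("opset:", [(imp.domain, imp.version) for imp in model.opset_import])

sess = ort.InferenceSession("ppyoloe_seg_s_80e_xfy.onnx",
                            providers=["CPUExecutionProvider"])
feeds = {
    "image": np.random.rand(1, 3, 640, 640).astype(np.float32),
    "scale_factor": np.ones((1, 2), dtype=np.float32),
}
for out_info, out in zip(sess.get_outputs(), sess.run(None, feeds)):
    print(out_info.name, out.shape)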

But the final step, trtexec, fails:

nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
[01/21/2025-13:49:02] [I] === Model Options ===
[01/21/2025-13:49:02] [I] Format: ONNX
[01/21/2025-13:49:02] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[01/21/2025-13:49:02] [I] Output:
[01/21/2025-13:49:02] [I] === Build Options ===
[01/21/2025-13:49:02] [I] Max batch: explicit
[01/21/2025-13:49:02] [I] Workspace: 1024 MB
[01/21/2025-13:49:02] [I] minTiming: 1
[01/21/2025-13:49:02] [I] avgTiming: 8
[01/21/2025-13:49:02] [I] Precision: FP32+FP16
[01/21/2025-13:49:02] [I] Calibration:
[01/21/2025-13:49:02] [I] Safe mode: Disabled
[01/21/2025-13:49:02] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:02] [I] Load engine:
[01/21/2025-13:49:02] [I] Builder Cache: Enabled
[01/21/2025-13:49:02] [I] NVTX verbosity: 0
[01/21/2025-13:49:02] [I] Inputs format: fp32:CHW
[01/21/2025-13:49:02] [I] Outputs format: fp32:CHW
[01/21/2025-13:49:02] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640
[01/21/2025-13:49:02] [I] Input build shape: scale_factor=1x2+1x2+1x2
[01/21/2025-13:49:02] [I] Input calibration shapes: model
[01/21/2025-13:49:02] [I] === System Options ===
[01/21/2025-13:49:02] [I] Device: 0
[01/21/2025-13:49:02] [I] DLACore:
[01/21/2025-13:49:02] [I] Plugins:
[01/21/2025-13:49:02] [I] === Inference Options ===
[01/21/2025-13:49:02] [I] Batch: Explicit
[01/21/2025-13:49:02] [I] Input inference shape: scale_factor=1x2
[01/21/2025-13:49:02] [I] Input inference shape: image=1x3x640x640
[01/21/2025-13:49:02] [I] Iterations: 10
[01/21/2025-13:49:02] [I] Duration: 3s (+ 200ms warm up)
[01/21/2025-13:49:02] [I] Sleep time: 0ms
[01/21/2025-13:49:02] [I] Streams: 1
[01/21/2025-13:49:02] [I] ExposeDMA: Disabled
[01/21/2025-13:49:02] [I] Spin-wait: Disabled
[01/21/2025-13:49:02] [I] Multithreading: Disabled
[01/21/2025-13:49:02] [I] CUDA Graph: Disabled
[01/21/2025-13:49:02] [I] Skip inference: Disabled
[01/21/2025-13:49:02] [I] Inputs:
[01/21/2025-13:49:02] [I] === Reporting Options ===
[01/21/2025-13:49:02] [I] Verbose: Disabled
[01/21/2025-13:49:02] [I] Averages: 1000 inferences
[01/21/2025-13:49:02] [I] Percentile: 99
[01/21/2025-13:49:02] [I] Dump output: Disabled
[01/21/2025-13:49:02] [I] Profile: Disabled
[01/21/2025-13:49:02] [I] Export timing to JSON file:
[01/21/2025-13:49:02] [I] Export output to JSON file:
[01/21/2025-13:49:02] [I] Export profile to JSON file:
[01/21/2025-13:49:02] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[01/21/2025-13:49:08] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
nvidia@linux:~/zj/paddle$
nvidia@linux:~/zj/paddle$
nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:27] [I] === Model Options ===
[01/21/2025-13:49:27] [I] Format: ONNX
[01/21/2025-13:49:27] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[01/21/2025-13:49:27] [I] Output:
[01/21/2025-13:49:27] [I] === Build Options ===
[01/21/2025-13:49:27] [I] Max batch: 1
[01/21/2025-13:49:27] [I] Workspace: 16 MB
[01/21/2025-13:49:27] [I] minTiming: 1
[01/21/2025-13:49:27] [I] avgTiming: 8
[01/21/2025-13:49:27] [I] Precision: FP32
[01/21/2025-13:49:27] [I] Calibration:
[01/21/2025-13:49:27] [I] Safe mode: Disabled
[01/21/2025-13:49:27] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:27] [I] Load engine:
[01/21/2025-13:49:27] [I] Builder Cache: Enabled
[01/21/2025-13:49:27] [I] NVTX verbosity: 0
[01/21/2025-13:49:27] [I] Inputs format: fp32:CHW
[01/21/2025-13:49:27] [I] Outputs format: fp32:CHW
[01/21/2025-13:49:27] [I] Input build shapes: model
[01/21/2025-13:49:27] [I] Input calibration shapes: model
[01/21/2025-13:49:27] [I] === System Options ===
[01/21/2025-13:49:27] [I] Device: 0
[01/21/2025-13:49:27] [I] DLACore:
[01/21/2025-13:49:27] [I] Plugins:
[01/21/2025-13:49:27] [I] === Inference Options ===
[01/21/2025-13:49:27] [I] Batch: 1
[01/21/2025-13:49:27] [I] Input inference shapes: model
[01/21/2025-13:49:27] [I] Iterations: 10
[01/21/2025-13:49:27] [I] Duration: 3s (+ 200ms warm up)
[01/21/2025-13:49:27] [I] Sleep time: 0ms
[01/21/2025-13:49:27] [I] Streams: 1
[01/21/2025-13:49:27] [I] ExposeDMA: Disabled
[01/21/2025-13:49:27] [I] Spin-wait: Disabled
[01/21/2025-13:49:27] [I] Multithreading: Disabled
[01/21/2025-13:49:27] [I] CUDA Graph: Disabled
[01/21/2025-13:49:27] [I] Skip inference: Disabled
[01/21/2025-13:49:27] [I] Inputs:
[01/21/2025-13:49:27] [I] === Reporting Options ===
[01/21/2025-13:49:27] [I] Verbose: Disabled
[01/21/2025-13:49:27] [I] Averages: 10 inferences
[01/21/2025-13:49:27] [I] Percentile: 99
[01/21/2025-13:49:27] [I] Dump output: Disabled
[01/21/2025-13:49:27] [I] Profile: Disabled
[01/21/2025-13:49:27] [I] Export timing to JSON file:
[01/21/2025-13:49:27] [I] Export output to JSON file:
[01/21/2025-13:49:27] [I] Export profile to JSON file:
[01/21/2025-13:49:27] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[01/21/2025-13:49:29] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
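
A plausible cause (an assumption on my part, not verified against this exact model): the export uses opset 13, where the axes of ReduceSum/Squeeze/Unsqueeze moved from a node attribute to a node input, while the TensorRT 7.x ONNX parser shipped with older JetPacks still looks for the attribute and aborts with exactly this "Attribute not found: axes" message. A small sketch to list the nodes that would be affected:

# List opset-13 reduce/squeeze ops whose "axes" is an input, not an attribute.
import onnx

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
for node in model.graph.node:
    if node.op_type in {"ReduceSum", "Squeeze", "Unsqueeze"}:
        axes_as_attr = any(attr.name == "axes" for attr in node.attribute)
        print(node.op_type, node.name or "<unnamed>",
              "| axes as attribute:", axes_as_attr,
              "| inputs:", list(node.input))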

zjykzj commented Feb 6, 2025

I asked a similar question in the PaddleDetection repo: https://github.com/PaddlePaddle/PaddleDetection/issues/9289.

Previously, the training/evaluation/export of PaddleDetection's ppyoloe_seg instance segmentation model on a GPU server was all done under paddle 2.6.2. After some experiments, using the paddle 3.0 container (paddlepaddle/paddle:3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5) I can now convert to ONNX format successfully:

λ b2a8e6f217f3 /data/zj/paddle/PaddleDetection python tools/export_model.py -c configs/ppyoloe_seg/ppyoloe_seg_s_80e_xfy.yml -o exclude_nms=True
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
[02/05 10:20:18] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_seg_s_80e_xfy/model_final.pdparams
loading annotations into memory...
Done (t=1.22s)
creating index...
index created!
[02/05 10:20:19] ppdet.engine INFO: Export inference config file to output_inference/ppyoloe_seg_s_80e_xfy/infer_cfg.yml
I0205 10:20:24.771257   104 program_interpreter.cc:243] New Executor is Running.
[02/05 10:20:25] ppdet.engine INFO: Export model and saved in output_inference/ppyoloe_seg_s_80e_xfy


λ b2a8e6f217f3 /data/zj/paddle/PaddleDetection paddle2onnx --model_dir output_inference/ppyoloe_seg_s_80e_xfy --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 13 --save_file ppyoloe_seg_s_80e_xfy.onnx
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[Paddle2ONNX] Start to parse PaddlePaddle model...
[Paddle2ONNX] Model file path: output_inference/ppyoloe_seg_s_80e_xfy/model.pdmodel
[Paddle2ONNX] Parameters file path: output_inference/ppyoloe_seg_s_80e_xfy/model.pdiparams
[Paddle2ONNX] Start to parsing Paddle model...
[Paddle2ONNX] [reduce_mean: mean_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_2.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_3.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [multiclass_nms3: multiclass_nms3_0.tmp_1] Requires the minimal opset version of 10.
[Paddle2ONNX] [reduce_sum: sum_0.tmp_0] Requires the minimal opset version of 13.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] [gather: gather_0.tmp_0] While rank of index is 2, Requires the minimal opset version of 11.
[Paddle2ONNX] [range: range_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [range: range_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [round: round_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [round: round_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [slice: bilinear_interp_v2_1.tmp_0_slice_0] While has input StartsTensorList/EndsTensorListStridesTensorList, Requires the minimal opset version of 10.
[Paddle2ONNX] Use opset_version = 13 for ONNX export.
[WARN][Paddle2ONNX] [multiclass_nms3: multiclass_nms3_0.tmp_1] [WARNING] Due to the operator multiclass_nms3, the exported ONNX model will only supports inference with input batch_size == 1.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
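
Incidentally, the warning above shows multiclass_nms3 is still in the exported graph even though exclude_nms=True was passed to export_model.py. A quick check on the ONNX side (assuming, as is typical, that Paddle's multiclass_nms3 lowers to the ONNX NonMaxSuppression op):

# Check whether NMS survived the export; per the warning it forces batch_size == 1.
import onnx

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
ops = {node.op_type for node in model.graph.node}
print("NonMaxSuppression in graph:", "NonMaxSuppression" in ops)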

But converting ONNX -> TensorRT on the NVIDIA Xavier edge device still fails. Any advice would be much appreciated!

nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
[02/05/2025-18:28:15] [I] === Model Options ===
[02/05/2025-18:28:15] [I] Format: ONNX
[02/05/2025-18:28:15] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[02/05/2025-18:28:15] [I] Output:
[02/05/2025-18:28:15] [I] === Build Options ===
[02/05/2025-18:28:15] [I] Max batch: explicit
[02/05/2025-18:28:15] [I] Workspace: 1024 MB
[02/05/2025-18:28:15] [I] minTiming: 1
[02/05/2025-18:28:15] [I] avgTiming: 8
[02/05/2025-18:28:15] [I] Precision: FP32+FP16
[02/05/2025-18:28:15] [I] Calibration:
[02/05/2025-18:28:15] [I] Safe mode: Disabled
[02/05/2025-18:28:15] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[02/05/2025-18:28:15] [I] Load engine:
[02/05/2025-18:28:15] [I] Builder Cache: Enabled
[02/05/2025-18:28:15] [I] NVTX verbosity: 0
[02/05/2025-18:28:15] [I] Inputs format: fp32:CHW
[02/05/2025-18:28:15] [I] Outputs format: fp32:CHW
[02/05/2025-18:28:15] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640
[02/05/2025-18:28:15] [I] Input build shape: scale_factor=1x2+1x2+1x2
[02/05/2025-18:28:15] [I] Input calibration shapes: model
[02/05/2025-18:28:15] [I] === System Options ===
[02/05/2025-18:28:15] [I] Device: 0
[02/05/2025-18:28:15] [I] DLACore:
[02/05/2025-18:28:15] [I] Plugins:
[02/05/2025-18:28:15] [I] === Inference Options ===
[02/05/2025-18:28:15] [I] Batch: Explicit
[02/05/2025-18:28:15] [I] Input inference shape: scale_factor=1x2
[02/05/2025-18:28:15] [I] Input inference shape: image=1x3x640x640
[02/05/2025-18:28:15] [I] Iterations: 10
[02/05/2025-18:28:15] [I] Duration: 3s (+ 200ms warm up)
[02/05/2025-18:28:15] [I] Sleep time: 0ms
[02/05/2025-18:28:15] [I] Streams: 1
[02/05/2025-18:28:15] [I] ExposeDMA: Disabled
[02/05/2025-18:28:15] [I] Spin-wait: Disabled
[02/05/2025-18:28:15] [I] Multithreading: Disabled
[02/05/2025-18:28:15] [I] CUDA Graph: Disabled
[02/05/2025-18:28:15] [I] Skip inference: Disabled
[02/05/2025-18:28:15] [I] Inputs:
[02/05/2025-18:28:15] [I] === Reporting Options ===
[02/05/2025-18:28:15] [I] Verbose: Disabled
[02/05/2025-18:28:15] [I] Averages: 1000 inferences
[02/05/2025-18:28:15] [I] Percentile: 99
[02/05/2025-18:28:15] [I] Dump output: Disabled
[02/05/2025-18:28:15] [I] Profile: Disabled
[02/05/2025-18:28:15] [I] Export timing to JSON file:
[02/05/2025-18:28:15] [I] Export output to JSON file:
[02/05/2025-18:28:15] [I] Export profile to JSON file:
[02/05/2025-18:28:15] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[02/05/2025-18:28:18] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
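
If the opset-13 axes change is indeed the culprit, one direction worth trying (a hedged sketch, untested here; onnx's version converter is known to fail on some graphs) is to downgrade the exported model to opset 12 before feeding it to the Jetson's trtexec, so that axes returns to attribute form:

# Attempt an opset downgrade so older TensorRT ONNX parsers can read the graph.
import onnx
from onnx import version_converter

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
downgraded = version_converter.convert_version(model, 12)  # may raise on unsupported ops
onnx.checker.check_model(downgraded)
onnx.save(downgraded, "ppyoloe_seg_s_80e_xfy_opset12.onnx")

Alternatively, upgrading the Jetson's TensorRT via a newer JetPack, whose ONNX parser understands opset 13, would sidestep the conversion entirely.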
