How to export ppyoloe_seg in PaddleDetection to ONNX and convert it with trtexec #1483

zjykzj opened this issue Jan 21, 2025 · 1 comment

zjykzj commented Jan 21, 2025

Please fill in the information below completely so that we can resolve your issue quickly. Thank you!

Hello, and thank you for open-sourcing the PP-series frameworks. I am using PaddleDetection's ppyoloe_seg algorithm. After training, I would like to export the model to ONNX and then run trtexec conversion and benchmarking on a Jetson device, just as with the ppyoloe algorithm (https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8/configs/ppyoloe/README_cn.md):

# Export the model
python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True

# Convert to ONNX format
paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx

# Benchmark speed, FP16, batch_size=1
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16

# Benchmark speed, FP16, batch_size=32
trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16

# With the scripts above, on a T4 with TensorRT 7.2, PP-YOLOE-plus-s runs at:
# batch_size=1, 2.80ms, 357fps
# batch_size=32, 67.69ms, 472fps

However, I could not find similar instructions in the README.md of the ppyoloe_seg algorithm (https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.8/configs/ppyoloe_seg/README.md), so I would like to know how to convert ppyoloe_seg correctly. Many thanks!

Problem description
Please describe the error in detail here

I attempted the ONNX conversion myself, and the export to ONNX appears to succeed:

# Export the model
python tools/export_model.py -c configs/ppyoloe_seg/ppyoloe_seg_s_80e_xfy.yml -o exclude_nms=True
# Convert to ONNX format
paddle2onnx --model_dir output_inference/ppyoloe_seg_s_80e_xfy --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_seg_s_80e_xfy.onnx

[Screenshot: paddle2onnx reports a successful ONNX export]
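
As a sanity check before trtexec, here is a minimal sketch (assuming the onnx and onnxruntime Python packages are installed; the input names image and scale_factor are the same ones passed to trtexec --shapes below):

# Validate the exported graph and run one dummy inference on CPU.
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
onnx.checker.check_model(model)  # raises if the graph is structurally invalid
print("opset:", [(imp.domain, imp.version) for imp in model.opset_import])

sess = ort.InferenceSession("ppyoloe_seg_s_80e_xfy.onnx",
                            providers=["CPUExecutionProvider"])
feeds = {
    "image": np.random.rand(1, 3, 640, 640).astype(np.float32),
    "scale_factor": np.ones((1, 2), dtype=np.float32),
}
for out_info, out in zip(sess.get_outputs(), sess.run(None, feeds)):
    print(out_info.name, out.shape)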

But the final step, trtexec, fails:

nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
[01/21/2025-13:49:02] [I] === Model Options ===
[01/21/2025-13:49:02] [I] Format: ONNX
[01/21/2025-13:49:02] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[01/21/2025-13:49:02] [I] Output:
[01/21/2025-13:49:02] [I] === Build Options ===
[01/21/2025-13:49:02] [I] Max batch: explicit
[01/21/2025-13:49:02] [I] Workspace: 1024 MB
[01/21/2025-13:49:02] [I] minTiming: 1
[01/21/2025-13:49:02] [I] avgTiming: 8
[01/21/2025-13:49:02] [I] Precision: FP32+FP16
[01/21/2025-13:49:02] [I] Calibration:
[01/21/2025-13:49:02] [I] Safe mode: Disabled
[01/21/2025-13:49:02] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:02] [I] Load engine:
[01/21/2025-13:49:02] [I] Builder Cache: Enabled
[01/21/2025-13:49:02] [I] NVTX verbosity: 0
[01/21/2025-13:49:02] [I] Inputs format: fp32:CHW
[01/21/2025-13:49:02] [I] Outputs format: fp32:CHW
[01/21/2025-13:49:02] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640
[01/21/2025-13:49:02] [I] Input build shape: scale_factor=1x2+1x2+1x2
[01/21/2025-13:49:02] [I] Input calibration shapes: model
[01/21/2025-13:49:02] [I] === System Options ===
[01/21/2025-13:49:02] [I] Device: 0
[01/21/2025-13:49:02] [I] DLACore:
[01/21/2025-13:49:02] [I] Plugins:
[01/21/2025-13:49:02] [I] === Inference Options ===
[01/21/2025-13:49:02] [I] Batch: Explicit
[01/21/2025-13:49:02] [I] Input inference shape: scale_factor=1x2
[01/21/2025-13:49:02] [I] Input inference shape: image=1x3x640x640
[01/21/2025-13:49:02] [I] Iterations: 10
[01/21/2025-13:49:02] [I] Duration: 3s (+ 200ms warm up)
[01/21/2025-13:49:02] [I] Sleep time: 0ms
[01/21/2025-13:49:02] [I] Streams: 1
[01/21/2025-13:49:02] [I] ExposeDMA: Disabled
[01/21/2025-13:49:02] [I] Spin-wait: Disabled
[01/21/2025-13:49:02] [I] Multithreading: Disabled
[01/21/2025-13:49:02] [I] CUDA Graph: Disabled
[01/21/2025-13:49:02] [I] Skip inference: Disabled
[01/21/2025-13:49:02] [I] Inputs:
[01/21/2025-13:49:02] [I] === Reporting Options ===
[01/21/2025-13:49:02] [I] Verbose: Disabled
[01/21/2025-13:49:02] [I] Averages: 1000 inferences
[01/21/2025-13:49:02] [I] Percentile: 99
[01/21/2025-13:49:02] [I] Dump output: Disabled
[01/21/2025-13:49:02] [I] Profile: Disabled
[01/21/2025-13:49:02] [I] Export timing to JSON file:
[01/21/2025-13:49:02] [I] Export output to JSON file:
[01/21/2025-13:49:02] [I] Export profile to JSON file:
[01/21/2025-13:49:02] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[01/21/2025-13:49:08] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
nvidia@linux:~/zj/paddle$
nvidia@linux:~/zj/paddle$
nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:27] [I] === Model Options ===
[01/21/2025-13:49:27] [I] Format: ONNX
[01/21/2025-13:49:27] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[01/21/2025-13:49:27] [I] Output:
[01/21/2025-13:49:27] [I] === Build Options ===
[01/21/2025-13:49:27] [I] Max batch: 1
[01/21/2025-13:49:27] [I] Workspace: 16 MB
[01/21/2025-13:49:27] [I] minTiming: 1
[01/21/2025-13:49:27] [I] avgTiming: 8
[01/21/2025-13:49:27] [I] Precision: FP32
[01/21/2025-13:49:27] [I] Calibration:
[01/21/2025-13:49:27] [I] Safe mode: Disabled
[01/21/2025-13:49:27] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[01/21/2025-13:49:27] [I] Load engine:
[01/21/2025-13:49:27] [I] Builder Cache: Enabled
[01/21/2025-13:49:27] [I] NVTX verbosity: 0
[01/21/2025-13:49:27] [I] Inputs format: fp32:CHW
[01/21/2025-13:49:27] [I] Outputs format: fp32:CHW
[01/21/2025-13:49:27] [I] Input build shapes: model
[01/21/2025-13:49:27] [I] Input calibration shapes: model
[01/21/2025-13:49:27] [I] === System Options ===
[01/21/2025-13:49:27] [I] Device: 0
[01/21/2025-13:49:27] [I] DLACore:
[01/21/2025-13:49:27] [I] Plugins:
[01/21/2025-13:49:27] [I] === Inference Options ===
[01/21/2025-13:49:27] [I] Batch: 1
[01/21/2025-13:49:27] [I] Input inference shapes: model
[01/21/2025-13:49:27] [I] Iterations: 10
[01/21/2025-13:49:27] [I] Duration: 3s (+ 200ms warm up)
[01/21/2025-13:49:27] [I] Sleep time: 0ms
[01/21/2025-13:49:27] [I] Streams: 1
[01/21/2025-13:49:27] [I] ExposeDMA: Disabled
[01/21/2025-13:49:27] [I] Spin-wait: Disabled
[01/21/2025-13:49:27] [I] Multithreading: Disabled
[01/21/2025-13:49:27] [I] CUDA Graph: Disabled
[01/21/2025-13:49:27] [I] Skip inference: Disabled
[01/21/2025-13:49:27] [I] Inputs:
[01/21/2025-13:49:27] [I] === Reporting Options ===
[01/21/2025-13:49:27] [I] Verbose: Disabled
[01/21/2025-13:49:27] [I] Averages: 10 inferences
[01/21/2025-13:49:27] [I] Percentile: 99
[01/21/2025-13:49:27] [I] Dump output: Disabled
[01/21/2025-13:49:27] [I] Profile: Disabled
[01/21/2025-13:49:27] [I] Export timing to JSON file:
[01/21/2025-13:49:27] [I] Export output to JSON file:
[01/21/2025-13:49:27] [I] Export profile to JSON file:
[01/21/2025-13:49:27] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[01/21/2025-13:49:29] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
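
A plausible cause (an assumption on my part, not verified against this exact model): the export uses opset 13, where the axes of ReduceSum/Squeeze/Unsqueeze moved from a node attribute to a node input, while the TensorRT 7.x ONNX parser shipped with older JetPacks still looks for the attribute and aborts with exactly this "Attribute not found: axes" message. A small sketch to list the nodes that would be affected:

# List opset-13 reduce/squeeze ops whose "axes" is an input, not an attribute.
import onnx

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
for node in model.graph.node:
    if node.op_type in {"ReduceSum", "Squeeze", "Unsqueeze"}:
        axes_as_attr = any(attr.name == "axes" for attr in node.attribute)
        print(node.op_type, node.name or "<unnamed>",
              "| axes as attribute:", axes_as_attr,
              "| inputs:", list(node.input))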

zjykzj commented Feb 6, 2025

I asked a similar question in the PaddleDetection repo: https://github.com/PaddlePaddle/PaddleDetection/issues/9289.

Previously, the training/evaluation/export of PaddleDetection's ppyoloe_seg instance segmentation model on a GPU server was all done under paddle 2.6.2. After some experiments, using the paddle 3.0 container (paddlepaddle/paddle:3.0.0b1-gpu-cuda11.8-cudnn8.6-trt8.5) I can now convert to ONNX format successfully:

λ b2a8e6f217f3 /data/zj/paddle/PaddleDetection python tools/export_model.py -c configs/ppyoloe_seg/ppyoloe_seg_s_80e_xfy.yml -o exclude_nms=True
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): `pip install numba==0.56.4`
Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
[02/05 10:20:18] ppdet.utils.checkpoint INFO: Finish loading model weights: output/ppyoloe_seg_s_80e_xfy/model_final.pdparams
loading annotations into memory...
Done (t=1.22s)
creating index...
index created!
[02/05 10:20:19] ppdet.engine INFO: Export inference config file to output_inference/ppyoloe_seg_s_80e_xfy/infer_cfg.yml
I0205 10:20:24.771257   104 program_interpreter.cc:243] New Executor is Running.
[02/05 10:20:25] ppdet.engine INFO: Export model and saved in output_inference/ppyoloe_seg_s_80e_xfy


λ b2a8e6f217f3 /data/zj/paddle/PaddleDetection paddle2onnx --model_dir output_inference/ppyoloe_seg_s_80e_xfy --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 13 --save_file ppyoloe_seg_s_80e_xfy.onnx
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[Paddle2ONNX] Start to parse PaddlePaddle model...
[Paddle2ONNX] Model file path: output_inference/ppyoloe_seg_s_80e_xfy/model.pdmodel
[Paddle2ONNX] Parameters file path: output_inference/ppyoloe_seg_s_80e_xfy/model.pdiparams
[Paddle2ONNX] Start to parsing Paddle model...
[Paddle2ONNX] [reduce_mean: mean_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_2.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [reduce_mean: mean_3.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [multiclass_nms3: multiclass_nms3_0.tmp_1] Requires the minimal opset version of 10.
[Paddle2ONNX] [reduce_sum: sum_0.tmp_0] Requires the minimal opset version of 13.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] Detected there's control flow op('conditional_block/select_input') in your model, this requires the minimal opset version of 11.
[Paddle2ONNX] [gather: gather_0.tmp_0] While rank of index is 2, Requires the minimal opset version of 11.
[Paddle2ONNX] [range: range_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [range: range_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [round: round_0.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [round: round_1.tmp_0] Requires the minimal opset version of 11.
[Paddle2ONNX] [slice: bilinear_interp_v2_1.tmp_0_slice_0] While has input StartsTensorList/EndsTensorListStridesTensorList, Requires the minimal opset version of 10.
[Paddle2ONNX] Use opset_version = 13 for ONNX export.
[WARN][Paddle2ONNX] [multiclass_nms3: multiclass_nms3_0.tmp_1] [WARNING] Due to the operator multiclass_nms3, the exported ONNX model will only supports inference with input batch_size == 1.
[Paddle2ONNX] PaddlePaddle model is exported as ONNX format now.
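
Incidentally, the warning above shows multiclass_nms3 is still in the exported graph even though exclude_nms=True was passed to export_model.py. A quick check on the ONNX side (assuming, as is typical, that Paddle's multiclass_nms3 lowers to the ONNX NonMaxSuppression op):

# Check whether NMS survived the export; per the warning it forces batch_size == 1.
import onnx

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
ops = {node.op_type for node in model.graph.node}
print("NonMaxSuppression in graph:", "NonMaxSuppression" in ops)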

But converting ONNX -> TensorRT on the NVIDIA Xavier edge device still fails. Any advice would be much appreciated!

nvidia@linux:~/zj/paddle$ trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=ppyoloe_seg_s_80e_xfy.onnx --saveEngine=ppyoloe_seg_s_80e_xfy.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16
[02/05/2025-18:28:15] [I] === Model Options ===
[02/05/2025-18:28:15] [I] Format: ONNX
[02/05/2025-18:28:15] [I] Model: ppyoloe_seg_s_80e_xfy.onnx
[02/05/2025-18:28:15] [I] Output:
[02/05/2025-18:28:15] [I] === Build Options ===
[02/05/2025-18:28:15] [I] Max batch: explicit
[02/05/2025-18:28:15] [I] Workspace: 1024 MB
[02/05/2025-18:28:15] [I] minTiming: 1
[02/05/2025-18:28:15] [I] avgTiming: 8
[02/05/2025-18:28:15] [I] Precision: FP32+FP16
[02/05/2025-18:28:15] [I] Calibration:
[02/05/2025-18:28:15] [I] Safe mode: Disabled
[02/05/2025-18:28:15] [I] Save engine: ppyoloe_seg_s_80e_xfy.engine
[02/05/2025-18:28:15] [I] Load engine:
[02/05/2025-18:28:15] [I] Builder Cache: Enabled
[02/05/2025-18:28:15] [I] NVTX verbosity: 0
[02/05/2025-18:28:15] [I] Inputs format: fp32:CHW
[02/05/2025-18:28:15] [I] Outputs format: fp32:CHW
[02/05/2025-18:28:15] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640
[02/05/2025-18:28:15] [I] Input build shape: scale_factor=1x2+1x2+1x2
[02/05/2025-18:28:15] [I] Input calibration shapes: model
[02/05/2025-18:28:15] [I] === System Options ===
[02/05/2025-18:28:15] [I] Device: 0
[02/05/2025-18:28:15] [I] DLACore:
[02/05/2025-18:28:15] [I] Plugins:
[02/05/2025-18:28:15] [I] === Inference Options ===
[02/05/2025-18:28:15] [I] Batch: Explicit
[02/05/2025-18:28:15] [I] Input inference shape: scale_factor=1x2
[02/05/2025-18:28:15] [I] Input inference shape: image=1x3x640x640
[02/05/2025-18:28:15] [I] Iterations: 10
[02/05/2025-18:28:15] [I] Duration: 3s (+ 200ms warm up)
[02/05/2025-18:28:15] [I] Sleep time: 0ms
[02/05/2025-18:28:15] [I] Streams: 1
[02/05/2025-18:28:15] [I] ExposeDMA: Disabled
[02/05/2025-18:28:15] [I] Spin-wait: Disabled
[02/05/2025-18:28:15] [I] Multithreading: Disabled
[02/05/2025-18:28:15] [I] CUDA Graph: Disabled
[02/05/2025-18:28:15] [I] Skip inference: Disabled
[02/05/2025-18:28:15] [I] Inputs:
[02/05/2025-18:28:15] [I] === Reporting Options ===
[02/05/2025-18:28:15] [I] Verbose: Disabled
[02/05/2025-18:28:15] [I] Averages: 1000 inferences
[02/05/2025-18:28:15] [I] Percentile: 99
[02/05/2025-18:28:15] [I] Dump output: Disabled
[02/05/2025-18:28:15] [I] Profile: Disabled
[02/05/2025-18:28:15] [I] Export timing to JSON file:
[02/05/2025-18:28:15] [I] Export output to JSON file:
[02/05/2025-18:28:15] [I] Export profile to JSON file:
[02/05/2025-18:28:15] [I]
----------------------------------------------------------------
Input filename:   ppyoloe_seg_s_80e_xfy.onnx
ONNX IR version:  0.0.7
Opset version:    13
Producer name:
Producer version:
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[02/05/2025-18:28:18] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: axes
Aborted
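
If the opset-13 axes change is indeed the culprit, one direction worth trying (a hedged sketch, untested here; onnx's version converter is known to fail on some graphs) is to downgrade the exported model to opset 12 before feeding it to the Jetson's trtexec, so that axes returns to attribute form:

# Attempt an opset downgrade so older TensorRT ONNX parsers can read the graph.
import onnx
from onnx import version_converter

model = onnx.load("ppyoloe_seg_s_80e_xfy.onnx")
downgraded = version_converter.convert_version(model, 12)  # may raise on unsupported ops
onnx.checker.check_model(downgraded)
onnx.save(downgraded, "ppyoloe_seg_s_80e_xfy_opset12.onnx")

Alternatively, upgrading the Jetson's TensorRT via a newer JetPack, whose ONNX parser understands opset 13, would sidestep the conversion entirely.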
