Unable to generate seg engine trained with Ultralytics version 8.0.183 #27

Open · YumainOB opened this issue Oct 31, 2023 · 6 comments

YumainOB commented Oct 31, 2023

Hello, and thank you for your great work bringing YOLOv8 to the TensorRT C++ side.

I would like to help if possible, but for now I'm facing an issue with engine creation for a segmentation model. If I'm reading the logs correctly, there seems to be no implementation available for "ConvTranspose_178 (CaskDeconvolution)".

I'm running the code on a TX2 board (on the feat/jetson-tx2 branch, of course).
Here is the Jetson environment:
$ jetson_release
Software part of jetson-stats 4.2.3 - (c) 2023, Raffaello Bonghi
Model: quill - Jetpack 4.6.4 [L4T 32.7.4]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • P-Number: p3310-1000
  • Module: NVIDIA Jetson TX2
Platform:
  • Distribution: Ubuntu 18.04 Bionic Beaver
  • Release: 4.9.337-tegra
jtop:
  • Version: 4.2.3
  • Service: Active
Libraries:
  • CUDA: 10.2.300
  • cuDNN: 8.2.1.32
  • TensorRT: 8.2
  • VPI: 1.2.3
  • Vulkan: 1.2.70
  • OpenCV: 4.8.0 - with CUDA: YES

Here is the command I use:
./benchmark --model yolov8n_seg.onnx --input ~/workspace/ppanto_yolo/test_ressources --precision FP16 --class-names class1 class2

Here is the relevant part of the logs:

--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 8192000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Float(409600,1,5120,64) -> Float(1638400,1,10240,64) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
GemmDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(409600,6400,80,1) -> Half(1638400,25600,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(204800,6400:2,80,1) -> Half(819200,25600:2,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
Deleting timing cache: 1496 entries, 2612 hits
10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_178.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)
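
(The last error line suggests raising the TensorRT log severity to kVERBOSE. For reference, here is a minimal sketch of such a logger, assuming the TensorRT 8.x C++ API; VerboseLogger is an illustrative name, not the Logger class actually used in engine.cpp.)

// Minimal sketch of a verbose TensorRT logger (TensorRT 8.x C++ API assumed;
// VerboseLogger is an illustrative name, not the class used in engine.cpp).
#include <NvInfer.h>
#include <iostream>

class VerboseLogger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char* msg) noexcept override {
        // Print everything down to and including kVERBOSE so the builder's
        // per-tactic autotuning decisions (like the ConvTranspose_178 lines above) show up.
        if (severity <= Severity::kVERBOSE) {
            std::cerr << msg << std::endl;
        }
    }
};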

Do you have an idea of what I can do to get the model working? What I don't understand is that I can export to an engine using the Ultralytics export and trtexec. Do you have a clue?

Best regards


HXB-1997 commented Nov 4, 2023

I also ran into the same issue:
nvidia@ubuntu:~/Desktop/HXB/11-4/YOLOv8-TensorRT-CPP/build$ ./detect_object_image --model /home/nvidia/Desktop/HXB/11-4/yolov8n-seg_sim.onnx --input ./bus2.jpg

Searching for engine file with name: yolov8n-seg_sim.engine.NVIDIATegraX2.fp16.1.1
Engine not found, generating. This could take a while...
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Model only supports fixed batch size of 1

10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_177.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'

  what():  Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)

Did you solve it? @YumainOB @cyrusbehr


YumainOB commented Nov 6, 2023

Sorry, I still have no clue about this issue.

@cyrusbehr, do you have any ideas?


HXB-1997 commented Nov 6, 2023

> Sorry, I still have no clue about this issue.
>
> @cyrusbehr, do you have any ideas?

I think ConvTranspose is supported by TensorRT 8.4, but the current JetPack TensorRT version is 8.2. How can I upgrade TensorRT to 8.4 without upgrading JetPack? @YumainOB


YumainOB commented Nov 6, 2023

As far as I know, it is not possible to update TensorRT without upgrading JetPack.

On the other hand, the Ultralytics repo (specifically "yolo export ...") works with this JetPack/TensorRT without any issue, so I doubt that updating them is the only way to solve this.

Best regards 


4e4o commented Nov 27, 2023

I've got the same issue. Here are extra logs from TensorRT:
11.txt


YumainOB commented Dec 12, 2023

I found a way to get the engine fully generated, thanks to this post: https://forums.developer.nvidia.com/t/convtranspose-onnx-to-tensorrt-conversion-fail/181720/2. To apply the idea, I added the following line in engine.cpp, right after the IBuilderConfig is created and checked:

config->setMaxWorkspaceSize(30);
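
For context, here is a minimal sketch of where such a call sits in a TensorRT 8.x build path. The function and variable names are illustrative, not the exact engine.cpp code, and the value used here is an illustrative 1 GiB (the argument is in bytes):

// Minimal sketch (TensorRT 8.x C++ API assumed; buildConfig() and the names
// below are illustrative, not the repo's exact engine.cpp code).
#include <NvInfer.h>
#include <memory>
#include <stdexcept>

std::unique_ptr<nvinfer1::IBuilderConfig> buildConfig(nvinfer1::IBuilder& builder) {
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder.createBuilderConfig());
    if (!config) {
        throw std::runtime_error("Failed to create IBuilderConfig");
    }
    // Give the tactic-timing phase some scratch memory to work with; with none
    // available, GemmDeconvolution logs "Scratch requested: ..., available: 0"
    // and the ConvTranspose node is left with no usable implementation.
    // setMaxWorkspaceSize takes bytes (deprecated in favour of setMemoryPoolLimit from TensorRT 8.4).
    config->setMaxWorkspaceSize(1U << 30);  // e.g. 1 GiB; adjust to what the TX2 can spare
    return config;
}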

That's good progress.

But I'm facing another issue later with a runtime failure... Here are the logs:
CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model yolov8n_seg.onnx --input image.jpg
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp16.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 301, GPU 7174 (MiB)
Loaded engine size: 13 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +169, now: CPU 475, GPU 7350 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +252, now: CPU 725, GPU 7602 (MiB)
Deserialization required 2294905 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Total per-runner device persistent memory is 12509184
Total per-runner host persistent memory is 137424
Allocated activation device memory of size 14695424
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +26, now: CPU 0, GPU 38 (MiB)
1: [reformat.cu::NCHHW2ToNCHW::1049] Error Code 1: Cuda Runtime (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)

A similar message appears with precision set to FP32:
CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model ~/workspace/ppanto_yolo/yolov8n_seg.onnx --input image.jpg --precision FP32
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp32.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 315, GPU 6966 (MiB)
Loaded engine size: 27 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +170, now: CPU 489, GPU 7143 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +251, now: CPU 739, GPU 7394 (MiB)
Deserialization required 2305892 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +26, now: CPU 0, GPU 26 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Total per-runner device persistent memory is 27359232
Total per-runner host persistent memory is 129312
Allocated activation device memory of size 22171136
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +47, now: CPU 0, GPU 73 (MiB)
1: [pointWiseV2Helpers.h::launchPwgenKernel::546] Error Code 1: Cuda Driver (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)

@cyrusbehr Do you have any clue?
