Unable to generate seg engine trained with Ultralytics version 8.0.183 #27

Open · YumainOB opened this issue Oct 31, 2023 · 6 comments

YumainOB commented Oct 31, 2023

Hello, and thank you for your great work bringing YOLOv8 to the TensorRT C++ side.

I would like to help if possible, but for now I'm facing an issue with engine creation for a segmentation model. If I'm reading the logs correctly, there seems to be no implementation available for "ConvTranspose_178 (CaskDeconvolution)".

I'm running the code on a TX2 board (on the feat/jetson-tx2 branch, of course).
Here is the Jetson environment:
$ jetson_release
Software part of jetson-stats 4.2.3 - (c) 2023, Raffaello Bonghi
Model: quill - Jetpack 4.6.4 [L4T 32.7.4]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
  • P-Number: p3310-1000
  • Module: NVIDIA Jetson TX2
Platform:
  • Distribution: Ubuntu 18.04 Bionic Beaver
  • Release: 4.9.337-tegra
jtop:
  • Version: 4.2.3
  • Service: Active
Libraries:
  • CUDA: 10.2.300
  • cuDNN: 8.2.1.32
  • TensorRT: 8.2
  • VPI: 1.2.3
  • Vulkan: 1.2.70
  • OpenCV: 4.8.0 - with CUDA: YES

Here is the command I use:
./benchmark --model yolov8n_seg.onnx --input ~/workspace/ppanto_yolo/test_ressources --precision FP16 --class-names class1 class2

Here is the relevant part of the logs:

--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 8192000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Float(409600,1,5120,64) -> Float(1638400,1,10240,64) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
GemmDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(409600,6400,80,1) -> Half(1638400,25600,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(204800,6400:2,80,1) -> Half(819200,25600:2,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
Deleting timing cache: 1496 entries, 2612 hits
10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_178.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)
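
(The last error line suggests raising the TensorRT log severity to kVERBOSE. For reference, here is a minimal sketch of such a logger, assuming the TensorRT 8.x C++ API; VerboseLogger is an illustrative name, not the Logger class actually used in engine.cpp.)

// Minimal sketch of a verbose TensorRT logger (TensorRT 8.x C++ API assumed;
// VerboseLogger is an illustrative name, not the class used in engine.cpp).
#include <NvInfer.h>
#include <iostream>

class VerboseLogger : public nvinfer1::ILogger {
public:
    void log(Severity severity, const char* msg) noexcept override {
        // Print everything down to and including kVERBOSE so the builder's
        // per-tactic autotuning decisions (like the ConvTranspose_178 lines above) show up.
        if (severity <= Severity::kVERBOSE) {
            std::cerr << msg << std::endl;
        }
    }
};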

Do you have an idea of what I can do to get the model working? What I don't understand is that I can export to an engine using the Ultralytics export and trtexec. Do you have a clue?

Best regards


HXB-1997 commented Nov 4, 2023

I also ran into the same issue:
nvidia@ubuntu:~/Desktop/HXB/11-4/YOLOv8-TensorRT-CPP/build$ ./detect_object_image --model /home/nvidia/Desktop/HXB/11-4/yolov8n-seg_sim.onnx --input ./bus2.jpg

Searching for engine file with name: yolov8n-seg_sim.engine.NVIDIATegraX2.fp16.1.1
Engine not found, generating. This could take a while...
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Model only supports fixed batch size of 1

10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_177.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'

  what():  Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)

Did you solve it? @YumainOB @cyrusbehr


YumainOB commented Nov 6, 2023

Sorry, I still have no clue about this issue.

@cyrusbehr, do you have any ideas?


HXB-1997 commented Nov 6, 2023

> Sorry, I still have no clue about this issue.
>
> @cyrusbehr, do you have any ideas?

I think ConvTranspose is supported by TensorRT 8.4, but the current JetPack TensorRT version is 8.2. How can I upgrade TensorRT to 8.4 without upgrading JetPack? @YumainOB


YumainOB commented Nov 6, 2023

As far as I know, it is not possible to update TensorRT without upgrading JetPack.

On the other hand, the Ultralytics repo (specifically "yolo export ...") works with this JetPack/TensorRT without any issue, so I doubt that updating them is the only way to solve this.

Best regards 


4e4o commented Nov 27, 2023

I've got the same issue. Here are extra logs from TensorRT:
11.txt


YumainOB commented Dec 12, 2023

I found a way to get the engine fully generated, thanks to this post: https://forums.developer.nvidia.com/t/convtranspose-onnx-to-tensorrt-conversion-fail/181720/2. To apply the idea, I added the following line in engine.cpp, right after the IBuilderConfig is created and checked:

config->setMaxWorkspaceSize(30);
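
For context, here is a minimal sketch of where such a call sits in a TensorRT 8.x build path. The function and variable names are illustrative, not the exact engine.cpp code, and the value used here is an illustrative 1 GiB (the argument is in bytes):

// Minimal sketch (TensorRT 8.x C++ API assumed; buildConfig() and the names
// below are illustrative, not the repo's exact engine.cpp code).
#include <NvInfer.h>
#include <memory>
#include <stdexcept>

std::unique_ptr<nvinfer1::IBuilderConfig> buildConfig(nvinfer1::IBuilder& builder) {
    auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder.createBuilderConfig());
    if (!config) {
        throw std::runtime_error("Failed to create IBuilderConfig");
    }
    // Give the tactic-timing phase some scratch memory to work with; with none
    // available, GemmDeconvolution logs "Scratch requested: ..., available: 0"
    // and the ConvTranspose node is left with no usable implementation.
    // setMaxWorkspaceSize takes bytes (deprecated in favour of setMemoryPoolLimit from TensorRT 8.4).
    config->setMaxWorkspaceSize(1U << 30);  // e.g. 1 GiB; adjust to what the TX2 can spare
    return config;
}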

That's good progress.

But I'm facing another issue later with a runtime failure... Here are the logs:
CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model yolov8n_seg.onnx --input image.jpg
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp16.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 301, GPU 7174 (MiB)
Loaded engine size: 13 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +169, now: CPU 475, GPU 7350 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +252, now: CPU 725, GPU 7602 (MiB)
Deserialization required 2294905 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +12, now: CPU 0, GPU 12 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 725, GPU 7602 (MiB)
Total per-runner device persistent memory is 12509184
Total per-runner host persistent memory is 137424
Allocated activation device memory of size 14695424
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +26, now: CPU 0, GPU 38 (MiB)
1: [reformat.cu::NCHHW2ToNCHW::1049] Error Code 1: Cuda Runtime (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)

A similar message appears with precision set to FP32:
CUDA_LAUNCH_BLOCKING=1 ./detect_object_image --model ~/workspace/ppanto_yolo/yolov8n_seg.onnx --input image.jpg --precision FP32
Searching for engine file with name: yolov8n_seg.engine.NVIDIATegraX2.fp32.1.1
Engine found, not regenerating...
[MemUsageChange] Init CUDA: CPU +266, GPU +0, now: CPU 315, GPU 6966 (MiB)
Loaded engine size: 27 MiB
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +167, GPU +170, now: CPU 489, GPU 7143 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +250, GPU +251, now: CPU 739, GPU 7394 (MiB)
Deserialization required 2305892 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +26, now: CPU 0, GPU 26 (MiB)
Using cublas as a tactic source
[MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Using cuDNN as a tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 739, GPU 7394 (MiB)
Total per-runner device persistent memory is 27359232
Total per-runner host persistent memory is 129312
Allocated activation device memory of size 22171136
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +47, now: CPU 0, GPU 73 (MiB)
1: [pointWiseV2Helpers.h::launchPwgenKernel::546] Error Code 1: Cuda Driver (unspecified launch failure)
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to run inference.
Aborted (core dumped)

@cyrusbehr Do you have any clue?
