
When I try to run rt-detr model in C++ libtorch i face the given error #725

Open
PranavhShetty opened this issue Jan 30, 2025 · 4 comments


@PranavhShetty

🐛 Bug

Please reproduce using our template Colab and post here the link

To Reproduce

⚠️ We cannot help you without you sharing reproducible code. Do not ignore this part :)
Steps to reproduce the behavior:

1. Export an RT-DETR model from Ultralytics in TorchScript format and run inference with LibTorch on CUDA.
2. The model runs fine on CPU, and a YOLO model runs fine on CUDA; the error occurs only when the RT-DETR model is inferred on CUDA.
3. The RT-DETR model was exported using PyTorch version 2.6.0+cu118.

Expected behavior

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
  • PyTorch Version (e.g., 1.0): LibTorch 1.13.0, used from C++14 (due to restrictions this cannot be changed)
  • OS (e.g., Linux): Windows
  • How you installed PyTorch (conda, pip, source): prebuilt LibTorch archive
  • Build command you used (if compiling from source):
  • Python version: 3.10
  • CUDA/cuDNN version: 11.7
  • GPU models and configuration: NVIDIA T550
  • Any other relevant information:

Additional context

Error output:
Error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/ultralytics/nn/tasks.py", line 85, in forward
_33 = (_7).forward(act1, (_6).forward(act1, _32, ), )
_34 = (_10).forward((_9).forward((_8).forward(_33, ), ), )
_35 = (_12).forward(act0, (_11).forward(_34, ), )
~~~~~~~~~~~~ <--- HERE
_36 = (_15).forward((_13).forward(_35, ), (_14).forward(_33, ), )
_37 = (_17).forward(act0, (_16).forward(act0, act, _36, ), )
File "code/torch/ultralytics/nn/modules/transformer.py", line 39, in forward
pos_dim = torch.div(embed_dim, CONSTANTS.c0, rounding_mode="trunc")
_7 = torch.arange(annotate(number, pos_dim), dtype=6, layout=0, device=torch.device("cpu"), pin_memory=False)
_8 = torch.div(_7, pos_dim)
~~~~~~~~~ <--- HERE
_9 = torch.to(CONSTANTS.c1, torch.device("cpu"), 6)
_10 = torch.reciprocal(torch.pow(torch.detach(_9), _8))

Traceback of TorchScript, original code (most recent call last):
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\nn\modules\transformer.py(109): build_2d_sincos_position_embedding
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\nn\modules\transformer.py(96): forward
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\nn\modules\module.py(1090): _slow_forward
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\nn\modules\module.py(1102): _call_impl
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\nn\tasks.py(587): predict
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\nn\tasks.py(112): forward
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\nn\modules\module.py(1090): _slow_forward
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\nn\modules\module.py(1102): _call_impl
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\jit_trace.py(958): trace_module
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\torch\jit_trace.py(741): trace
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\engine\exporter.py(434): export_torchscript
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\engine\exporter.py(141): outer_func
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\engine\exporter.py(355): call
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ultralytics\engine\model.py(737): export
C:\Users\Vijay M\AppData\Local\Temp\ipykernel_16012\1332778321.py(1):
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\interactiveshell.py(3508): run_code
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\interactiveshell.py(3448): run_ast_nodes
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\interactiveshell.py(3269): run_cell_async
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\async_helpers.py(129): _pseudo_sync_runner
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\interactiveshell.py(3064): _run_cell
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\IPython\core\interactiveshell.py(3009): run_cell
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\zmqshell.py(549): run_cell
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\ipkernel.py(449): do_execute
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\kernelbase.py(778): execute_request
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\ipkernel.py(362): execute_request
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\kernelbase.py(437): dispatch_shell
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\kernelbase.py(534): process_one
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\kernelbase.py(545): dispatch_queue
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\asyncio\events.py(81): _run
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\asyncio\base_events.py(1859): _run_once
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\asyncio\base_events.py(570): run_forever
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\tornado\platform\asyncio.py(205): start
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel\kernelapp.py(739): start
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\traitlets\config\application.py(1075): launch_instance
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\site-packages\ipykernel_launcher.py(18):
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\runpy.py(87): _run_code
c:\Users\Public\miniconda\envs\pytorch110-cu10.2\lib\runpy.py(194): _run_module_as_main
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

This is the error I face when I try to run inference with the model in C++ on CUDA.
Kindly help me solve this issue.
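The traceback points at `torch.arange(..., device=torch.device("cpu"))` inside the serialized graph: `torch.jit.trace` bakes the device a tensor was created on at export time into the TorchScript program, so a positional-embedding constant built on CPU during export stays on CPU even when the module and inputs later live on CUDA. A minimal sketch of this pitfall and of the device-aware fix (the module names here are hypothetical, not the actual RT-DETR code):

```python
import torch

class Baked(torch.nn.Module):
    # Mimics the pitfall: torch.arange without device= defaults to CPU,
    # and torch.jit.trace bakes that CPU device into the serialized graph.
    def forward(self, x):
        pe = torch.arange(x.shape[-1], dtype=torch.float32)
        return x + pe

class DeviceAware(torch.nn.Module):
    # Fix: create the constant on the input's device, so the traced graph
    # follows wherever the input lives.
    def forward(self, x):
        pe = torch.arange(x.shape[-1], dtype=x.dtype, device=x.device)
        return x + pe

x = torch.zeros(4)
bad = torch.jit.trace(Baked(), x)         # works on CPU; mixing with CUDA inputs fails
good = torch.jit.trace(DeviceAware(), x)  # constant follows the input's device
print(good(x))  # tensor([0., 1., 2., 3.])
```

Given the export in the traceback was traced with the model on CPU, one workaround to try (an assumption, not verified here) is re-exporting the model with `device=0` so the baked-in constants are `cuda:0`, or running inference on CPU with this export.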

@PranavhShetty
Author

#include <torch/torch.h>
#include <torch/cuda.h>
#include <torch/script.h>
#include <iostream>
#include <Windows.h> // For HMODULE and basic Windows types
#include <psapi.h>
#include <opencv2/opencv.hpp>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

using namespace std;

int main() {
HMODULE torchCudaDll = LoadLibraryA("torch_cuda.dll");
try {
    std::cout << "LibTorch version: " << TORCH_VERSION << std::endl;
    std::cout << "LibTorch major version: " << TORCH_VERSION_MAJOR << std::endl;
    std::cout << "LibTorch minor version: " << TORCH_VERSION_MINOR << std::endl;
    std::cout << "LibTorch patch version: " << TORCH_VERSION_PATCH << std::endl;

    if (!torch::cuda::is_available()) {
        std::cerr << "CUDA is not available!" << std::endl;
        return -1;
    }
    else {
        std::cout << "CUDA is available\n";
    }


    std::string model_path = "C:/Users/prana/Downloads/rt-detr-v1.4.1.torchscript";
    torch::jit::script::Module model;
    try {
        torch::NoGradGuard no_grad;
        model = torch::jit::load(model_path, torch::kCUDA);
    }
    catch (const c10::Error& e) {
        std::cerr << "Error loading the model: " << e.what() << std::endl;
        return -1;
    }
   /* model.to(torch::kCUDA);*/
    model.eval();

    cv::Mat image = cv::imread("C:/Users/prana/Desktop/bhavith/images/Img_008_12108(0) (1)_316.png");
    if (image.empty()) {
        std::cerr << "Error loading the image" << std::endl;
        return -1;
    }

    cv::imshow("image", image);
    /*cv::waitKey(0);*/
    cv::Mat input_image;
    cv::cvtColor(image, input_image, cv::COLOR_BGR2RGB);

    torch::Tensor image_tensor = torch::from_blob(input_image.data, { input_image.rows, input_image.cols, 3 }, torch::kByte);
    image_tensor = image_tensor.toType(torch::kFloat32).div(255);
    image_tensor = image_tensor.permute({ 2, 0, 1 });
    image_tensor = image_tensor.unsqueeze(0);
    image_tensor = image_tensor.to(torch::kCUDA);
    std::vector<torch::jit::IValue> inputs{ image_tensor };

    //try {
    ////torch::NoGradGuard no_grad; // Disable gradient calculation
    torch::Tensor output = model.forward(inputs).toTensor();
    output = output.to(torch::kCPU);

    std::cout << output.slice(1, 0, 10) << std::endl;
    //}
    //catch (const c10::Error& e) {
    //std::cerr << "Error during model inference: " << e.what() << std::endl;
    //return -1;
    //}

    return 0;
}
catch (const std::exception& e) {
    std::cerr << "Error: " << e.what() << std::endl;
    return -1;
}
}

This is the code causing the above error.
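For reference, here is a path-free Python mirror of the preprocessing done in the C++ snippet above (the synthetic 640x640 image is an assumption; the real input comes from `cv::imread`), which may make the issue easier for others to reproduce:

```python
import numpy as np
import torch

# Synthetic HxWx3 uint8 image standing in for cv::imread + cvtColor(BGR2RGB)
img = np.random.randint(0, 256, size=(640, 640, 3), dtype=np.uint8)

t = torch.from_numpy(img)             # analogous to torch::from_blob
t = t.to(torch.float32).div(255)      # scale to [0, 1]
t = t.permute(2, 0, 1).unsqueeze(0)   # HWC -> 1x3xHxW
device = "cuda" if torch.cuda.is_available() else "cpu"
t = t.to(device)                      # analogous to image_tensor.to(torch::kCUDA)
print(tuple(t.shape))  # (1, 3, 640, 640)
```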

@iden-kalemaj
Contributor

Hi there, I want to make sure you are posting this issue in the right forum. Are you using the Opacus library for differentially private training? I did not see that in the code snippet you provided.

@PranavhShetty
Author

Sorry, I did post it in the wrong forum; I will post this issue in the right forum as well. But if you could still help me with this, it would mean a lot.

@iden-kalemaj
Contributor

Apologies, but I don't have the right expertise here. When posting in the other forum, I'd recommend creating a reproducible example of the code; currently it involves local file paths, making it difficult for someone else to run and debug.
