PyTorch model with Dictionary[Key,Tensor] output #7765

Open
cesumilo opened this issue Nov 5, 2024 · 0 comments

cesumilo commented Nov 5, 2024

Hi community, apologies if this isn't the right place; I couldn't figure out where else to ask a question about Triton Inference Server.

Description
I have a PyTorch model deployed with the following Triton model configuration:

name: "my_super_model"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ -1, 160, 160, 3 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "output__1"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "output__2"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "output__3"
    data_type: TYPE_FP32
    dims: [ -1 ]
  },
  {
    name: "output__4"
    data_type: TYPE_FP32
    dims: [ -1, 1024 ]
  }
]
response_cache {
  enable: true
}
instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

The output of the model is a dictionary mapping string keys to lists of tensors.
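
For illustration, the forward's return type has roughly the following shape (the key names are placeholders, not the real ones; each entry is meant to line up with one of the output__N entries in the config above):

import torch
from typing import Dict, List

class MySuperModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> Dict[str, List[torch.Tensor]]:
        # Placeholder body: the real model computes five outputs keyed
        # by string, one per entry in the config's `output` list.
        n = x.shape[0]
        return {
            "detections": [torch.zeros(n)],        # e.g. -> output__1
            "embeddings": [torch.zeros(n, 1024)],  # e.g. -> output__4
        }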

When I try to run an inference with the following script, I get the error below.

import asyncio
import tritonclient.grpc.aio as grpcclient
import numpy as np
import time

URL = "localhost:8001"

async def infer_torch():
    # Create gRPC stub for communicating with the server
    triton_client = grpcclient.InferenceServerClient(
        url=URL, verbose=False
    )

    model_name = f"my_super_model"
    print(f"Running {model_name}...")

    # Infer
    nb_objects = 600
    objects = np.random.rand(nb_objects, 160, 160, 3).astype(np.float32)

    m1_input = grpcclient.InferInput("input0", [nb_objects, 160, 160, 3], "FP32")
    m1_input.set_data_from_numpy(objects)

    output = grpcclient.InferRequestedOutput("output__0")

    t1 = time.time()
    results = await triton_client.infer(
        model_name=model_name,
        inputs=[m1_input],
        outputs=[output],
    )
    print(f"Inference time: {time.time() - t1}s")

    statistics = await triton_client.get_inference_statistics(model_name=model_name)
    print(statistics)

    output = results.get_output("output__0")

    print(f"output: {output}")


async def main():
    await asyncio.gather(infer_torch())

if __name__ == '__main__':
    asyncio.run(main())

Error:

PyTorch execute failure: output must be of type Tensor, List[str] or Tuple containing one of these two types. It should not be a List / Dictionary of Tensors or a Scalar

Triton Information
Image: nvcr.io/nvidia/tritonserver:24.09-py3

Expected behavior
I was expecting to be able to retrieve each output tensor from the dictionary by describing the dictionary's structure in the model configuration, but I couldn't figure out from the documentation how to make this work.
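
If I'm reading the error right, the backend only accepts a Tensor, a List[str], or a Tuple of those, so the fallback would be to re-export the model behind a wrapper that unpacks the dict into a tuple in config order. A rough sketch of what I mean (same placeholder keys as above, not the real ones):

import torch
from typing import Tuple

class DictToTupleWrapper(torch.nn.Module):
    # Hypothetical workaround: unpack the dict into a tuple whose order
    # matches output__0 .. output__4 in config.pbtxt.
    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        out = self.model(x)
        return (out["detections"][0], out["embeddings"][0])

# scripted = torch.jit.script(DictToTupleWrapper(model))
# scripted.save("model.pt")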

Any idea how to make this work? 🙏
