Segmentation Fault error when crafting pb_utils.Tensor object Triton BLS model #7953

Open
carldomond7 opened this issue Jan 18, 2025 · 1 comment

carldomond7 commented Jan 18, 2025

We are using Triton to serve a BLS model. The model.py file for this BLS model contains a function that uses the Triton gRPC client to query another model hosted on the server. That call works correctly; the problem arises in the execute function. After extracting the final output array, I wrap it in a pb_utils.Tensor object and pass it to pb_utils.InferenceResponse as documented. A segmentation fault occurs while the pb_utils.Tensor object is being created.

Triton Inference Server Docker image: 24.07-py3
CUDA: 12.5

Error Stack

Output tensor has been extracted (1536, 1536)
Final output type: <class 'numpy.ndarray'>
Final output shape: (1536, 1536), dtype: uint8
Signal (11) received.
 0# 0x00005C1039DE580D in tritonserver
 1# 0x0000758CBE932520 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x0000758CB535CBD5 in /opt/tritonserver/backends/python/libtriton_python.so
 3# 0x0000758CB53604F2 in /opt/tritonserver/backends/python/libtriton_python.so
 4# 0x0000758CB5360943 in /opt/tritonserver/backends/python/libtriton_python.so
 5# 0x0000758CB533DFF7 in /opt/tritonserver/backends/python/libtriton_python.so
 6# TRITONBACKEND_ModelInstanceExecute in /opt/tritonserver/backends/python/libtriton_python.so
 7# 0x0000758CBD311944 in /opt/tritonserver/bin/../lib/libtritonserver.so
 8# 0x0000758CBD311CBB in /opt/tritonserver/bin/../lib/libtritonserver.so
 9# 0x0000758CBD42D23D in /opt/tritonserver/bin/../lib/libtritonserver.so
10# 0x0000758CBD3160F4 in /opt/tritonserver/bin/../lib/libtritonserver.so
11# 0x0000758CBF01A253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
12# 0x0000758CBE984AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
13# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

I've confirmed that the dtype and tensor shape declared in the BLS model's config.pbtxt match what the execute function returns.

name: "Tile_Blend_BLS"
backend: "python"
max_batch_size: 0

input [
  {
    name: "image_in"
    data_type: TYPE_FP64
    dims: [ 5, 1536, 1536 ]
  },
  {
    name: "feature_in"
    data_type: TYPE_FP64
    dims: [ 1536, 1536 ]
  }
]

output [
  {
    name: "Tile_Blend_Output"
    data_type: TYPE_UINT8
    dims: [ 1536, 1536 ]
  }
]


instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0] 
  }
]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "/xxxxx/tile_blending.tar.gz"}
}

Here is the execute function that is throwing the error in model.py:

def execute(self, requests):
      """
      Called for each batch of inference requests.
      """
      responses = []
      for request in requests:
          # Step 1: Extract inputs necessary for getUpsampledDirect
          image_in = pb_utils.get_input_tensor_by_name(request, "image_in").as_numpy()
          feature_in = pb_utils.get_input_tensor_by_name(request, "feature_in").as_numpy()
          print(f"Received image_in: shape={image_in.shape}, dtype={image_in.dtype}")
          print(f"Received feature_in: shape={feature_in.shape}, dtype={feature_in.dtype}")
          
          output = self.tf_manager.getUpsampledDirect(image_in=image_in, feature_in=feature_in)
          final_tileblend_output = output[0]
          print(f"Output tensor has been extracted {final_tileblend_output.shape}")
          print(f"Final output type: {type(final_tileblend_output)}")
          print(f"Final output shape: {final_tileblend_output.shape}, dtype: {final_tileblend_output.dtype}")
          
          # Based on the logs, the segmentation fault appears to occur on the next line.
          output_tensor = pb_utils.Tensor("Tile_Blend_Output", final_tileblend_output)
          responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
      return responses
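
The logs above confirm the dtype and shape of the output array, but not its memory layout. A minimal sketch of a pre-flight check that could be called just before pb_utils.Tensor is constructed is shown below. The helper name check_output_array is hypothetical, and whether layout plays any role in this crash is an assumption, not something the logs establish. The helper only reads attributes that numpy.ndarray exposes (dtype, shape, flags.c_contiguous).

```python
def check_output_array(arr, expected_dtype, expected_shape):
    """Return a list of mismatches between arr and the declared output.

    Hypothetical helper: arr is expected to be a numpy.ndarray, but only
    its dtype, shape, and flags.c_contiguous attributes are read.
    """
    problems = []
    if str(arr.dtype) != expected_dtype:
        problems.append(f"dtype is {arr.dtype}, expected {expected_dtype}")
    if tuple(arr.shape) != tuple(expected_shape):
        problems.append(f"shape is {tuple(arr.shape)}, expected {tuple(expected_shape)}")
    if not arr.flags.c_contiguous:
        problems.append("array is not C-contiguous")
    return problems
```

In execute, this could be called as check_output_array(final_tileblend_output, "uint8", (1536, 1536)) and the result printed. If it reports a non-contiguous array, np.ascontiguousarray(final_tileblend_output) would produce a contiguous copy to pass to pb_utils.Tensor instead.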

My model repository is as shown:

|-- G1L_ONNX
|   |-- 1
|   |   `-- model.onnx
|   `-- config.pbtxt
|-- G1R_ONNX
|   |-- 1
|   |   `-- model.onnx
|   `-- config.pbtxt
`-- Tile_Blend_BLS
    |-- 1
    |   |-- __pycache__
    |   |-- aug_utils.py
    |   |-- debug_plot.py
    |   |-- histogram.py
    |   |-- histogram_inference.py
    |   |-- model.py
    |   `-- util.py
    |-- config.pbtxt
    `-- triton_python_backend_stub

I reference my custom execution environment (the tarball) in config.pbtxt, and I use a custom triton_python_backend_stub. Could you help me find the source of the error?
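
Because both the execution environment and the stub are custom, it may help to log which interpreter and package versions the stub process actually runs with, for example from the model's initialize method. The sketch below uses only the Python standard library. The function name runtime_report is hypothetical, and a version mismatch between the stub and the execution environment is only one possible cause, not a confirmed one.

```python
import platform
import sys
from importlib import metadata


def runtime_report(packages=("numpy",)):
    """Collect interpreter and package versions for the current process."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not visible to this interpreter.
            report[name] = "not installed"
    return report


print(runtime_report())
```

The reported Python version can then be compared with the version the custom triton_python_backend_stub was built against and with the version packed into the tarball.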

carldomond7 commented Jan 20, 2025

After debugging with GDB, I obtained the following backtrace, which shows where the crash occurs:

 #0  0x000076cf6215cbd5 in boost::intrusive::bstree_impl<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, void, void, unsigned long, true, (boost::intrusive::algo_types)5, void>::erase(boost::intrusive::tree_iterator<boost::intrusive::bhtraits<boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::block_ctrl, boost::intrusive::rbtree_node_traits<boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, true>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 3u>, true>) ()
   from /opt/tritonserver/backends/python/libtriton_python.so
#1  0x000076cf621604f2 in boost::interprocess::rbtree_best_fit<boost::interprocess::null_mutex_family, boost::interprocess::offset_ptr<void, long, unsigned long, 0ul>, 0ul>::priv_deallocate(void*) () from /opt/tritonserver/backends/python/libtriton_python.so
#2  0x000076cf62160943 in std::_Function_handler<void (triton::backend::python::ResponseBatch*), triton::backend::python::SharedMemoryManager::WrapObjectInUniquePtr<triton::backend::python::ResponseBatch>(triton::backend::python::ResponseBatch*, triton::backend::python::AllocatedShmOwnership*, long const&)::{lambda(triton::backend::python::ResponseBatch*)#1}>::_M_invoke(std::_Any_data const&, triton::backend::python::ResponseBatch*&&) () from /opt/tritonserver/backends/python/libtriton_python.so
#3  0x000076cf6213dff7 in triton::backend::python::ModelInstanceState::ProcessRequests(TRITONBACKEND_Request**, unsigned int, std::vector<std::unique_ptr<triton::backend::python::InferRequest, std::default_delete<triton::backend::python::InferRequest> >, std::allocator<std::unique_ptr<triton::backend::python::InferRequest, std::default_delete<triton::backend::python::InferRequest> > > >&, triton::backend::python::PbMetricReporter&) () from /opt/tritonserver/backends/python/libtriton_python.so
#4  0x000076cf6213e34a in TRITONBACKEND_ModelInstanceExecute ()
   from /opt/tritonserver/backends/python/libtriton_python.so

From my understanding, the segmentation fault occurs after inference: frames #1 to #3 show the backend deallocating the ResponseBatch from shared memory inside ProcessRequests. I also noticed via ps aux that one Triton backend stub process is created when tritonserver starts, and another is created when the server is queried.
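
Since the crash is in the shared-memory deallocator, a rough estimate of how much tensor data each request moves through shared memory may be useful context. The sketch below computes it from the dims and data types in config.pbtxt. The names ITEMSIZE, TENSORS, and tensor_nbytes are illustrative, and whether the size of the pool is related to the crash is an assumption.

```python
from math import prod

# Bytes per element for the Triton data types used in config.pbtxt.
ITEMSIZE = {"TYPE_FP64": 8, "TYPE_UINT8": 1}

# Tensor name -> (data_type, dims), copied from config.pbtxt.
TENSORS = {
    "image_in": ("TYPE_FP64", (5, 1536, 1536)),
    "feature_in": ("TYPE_FP64", (1536, 1536)),
    "Tile_Blend_Output": ("TYPE_UINT8", (1536, 1536)),
}


def tensor_nbytes(data_type, dims):
    """Size in bytes of a dense tensor with the given type and dims."""
    return ITEMSIZE[data_type] * prod(dims)


sizes = {name: tensor_nbytes(*spec) for name, spec in TENSORS.items()}
for name, nbytes in sizes.items():
    print(f"{name}: {nbytes} bytes ({nbytes / 2**20:.2f} MiB)")
print(f"total: {sum(sizes.values())} bytes")
```

This gives 90 MiB for image_in, 18 MiB for feature_in, and 2.25 MiB for Tile_Blend_Output per request. If pool size turns out to matter, the Python backend's shm-default-byte-size and shm-growth-byte-size backend-config options and the container's --shm-size setting would be the settings to examine.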

Does this extra context assist with debugging the issue?
