
Segmentation fault crash due to a race condition in request cancellation (with fix proposal) #8034

Open
lunwang-ttd opened this issue Feb 25, 2025 · 2 comments
Labels: bug (Something isn't working)

lunwang-ttd commented Feb 25, 2025

Description
There is a race condition that causes a segmentation fault crash when a request is cancelled.

Triton Information
r25.01

Are you using the Triton container or did you build it yourself?
I built a RelWithDebInfo build for debugging; the issue can also be reproduced with the official Triton container.

To Reproduce
Start the Triton server under GDB with the simple model (docs/examples/model_repository/simple/config.pbtxt); suppose the model repository is /models:

#!/bin/bash

# Delays to simulate the race condition
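# (These TRITONSERVER_DELAY_* variables are internal Triton test hooks that
# inject artificial delays at the gRPC notification/enqueue/process steps;
# they are used here only to widen the race window so the crash reproduces
# reliably.)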
export TRITONSERVER_DELAY_GRPC_NOTIFICATION=5
export TRITONSERVER_DELAY_GRPC_ENQUEUE=2
export TRITONSERVER_DELAY_GRPC_PROCESS=2

gdb -ex=r --args tritonserver \
--log-verbose=0 \
--model-repository=/models \
--grpc-infer-allocation-pool-size=8

Send requests to the server with the following client script.

#!/usr/bin/env python
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import sys
import time
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

if __name__ == "__main__":
    # 2 ms cancellation timeout
    client_timeout = 0.002
    url = "localhost:8001"

    try:
        triton_client = grpcclient.InferenceServerClient(
            url=url
        )
    except Exception as e:
        print("context creation failed: " + str(e))
        sys.exit()

    model_name = "simple"

    # Infer
    inputs = []
    outputs = []
    inputs.append(grpcclient.InferInput("INPUT0", [1, 16], "INT32"))
    inputs.append(grpcclient.InferInput("INPUT1", [1, 16], "INT32"))

    # Create the data for the two input tensors. Initialize the first
    # to unique integers and the second to all ones.
    input0_data = np.arange(start=0, stop=16, dtype=np.int32)
    input0_data = np.expand_dims(input0_data, axis=0)
    input1_data = np.ones(shape=(1, 16), dtype=np.int32)

    # Initialize the data
    inputs[0].set_data_from_numpy(input0_data)
    inputs[1].set_data_from_numpy(input1_data)

    outputs.append(grpcclient.InferRequestedOutput("OUTPUT0"))
    outputs.append(grpcclient.InferRequestedOutput("OUTPUT1"))

    # Define the callback function. Note the last two parameters should be
    # result and error. InferenceServerClient would provide the results of an
    # inference as grpcclient.InferResult in result. For successful
    # inference, error will be None, otherwise it will be an object of
    # tritonclientutils.InferenceServerException holding the error details
    def callback(user_data, result, error):
        if error:
            user_data.append(error)
        else:
            user_data.append(result)

    # list to hold the results of inference.
    user_data = []

    # Inference call
    for i in range(20):
        triton_client.async_infer(
            model_name=model_name,
            inputs=inputs,
            callback=partial(callback, user_data),
            outputs=outputs,
            client_timeout=client_timeout,
        )
        time.sleep(0.01)

    # Wait until the results are available in user_data
    time_out = 20
    while (len(user_data) == 0) and time_out > 0:
        time_out = time_out - 1
        time.sleep(1)

    print("results: ", len(user_data))

    # Display and validate the available results
    count = 1
    for result_data in user_data:
        # Check for the errors
        if type(result_data) == InferenceServerException:
            print("request -", count, result_data.message())
        else:
            print("request -", count, "PASS")
        count += 1

    print("PASS: Async infer")

A SIGSEGV with the following stack trace should be observed.

(gdb) bt
#0  ___pthread_mutex_lock (mutex=0x20) at ./nptl/pthread_mutex_lock.c:80
#1  0x0000564daff72377 in __gthread_mutex_lock (__mutex=0x20) at /usr/include/x86_64-linux-gnu/c++/13/bits/gthr-default.h:749
#2  __gthread_recursive_mutex_lock (__mutex=0x20) at /usr/include/x86_64-linux-gnu/c++/13/bits/gthr-default.h:811
#3  std::recursive_mutex::lock (this=0x20) at /usr/include/c++/13/mutex:120
#4  std::lock_guard<std::recursive_mutex>::lock_guard (__m=..., this=<synthetic pointer>)
    at /usr/include/c++/13/bits/std_mutex.h:249
#5  triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::Context::IsCancelled (this=0x0) at /workspace/src/grpc/infer_handler.h:677
#6  triton::server::grpc::InferHandlerState<grpc::ServerAsyncResponseWriter<inference::ModelInferResponse>, inference::ModelInferRequest, inference::ModelInferResponse>::IsGrpcContextCancelled (this=0x7f1918001960) at /workspace/src/grpc/infer_handler.h:1120
#7  triton::server::grpc::ModelInferHandler::Process (this=0x564db1fcf070, state=0x7f1918001960, rpc_ok=true,
    is_notification=true) at /workspace/src/grpc/infer_handler.cc:702
#8  0x0000564daff69127 in _ZZN6triton6server4grpc12InferHandlerIN9inference20GRPCInferenceService26WithAsyncMethod_ServerLiveINS4_27WithAsyncMethod_ServerReadyINS4_26WithAsyncMethod_ModelReadyINS4_30WithAsyncMethod_ServerMetadataINS4_29WithAsyncMethod_ModelMetadataINS4_26WithAsyncMethod_ModelInferINS4_32WithAsyncMethod_ModelStreamInferINS4_27WithAsyncMethod_ModelConfigINS4_31WithAsyncMethod_ModelStatisticsINS4_31WithAsyncMethod_RepositoryIndexINS4_35WithAsyncMethod_RepositoryModelLoadINS4_37WithAsyncMethod_RepositoryModelUnloadINS4_40WithAsyncMethod_SystemSharedMemoryStatusINS4_42WithAsyncMethod_SystemSharedMemoryRegisterINS4_44WithAsyncMethod_SystemSharedMemoryUnregisterINS4_38WithAsyncMethod_CudaSharedMemoryStatusINS4_40WithAsyncMethod_CudaSharedMemoryRegisterINS4_42WithAsyncMethod_CudaSharedMemoryUnregisterINS4_28WithAsyncMethod_TraceSettingINS4_27WithAsyncMethod_LogSettingsINS4_7ServiceEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEN4grpc25ServerAsyncResponseWriterINS3_18ModelInferResponseEEENS3_17ModelInferRequestES1C_E5StartEvENKUlvE_clEv (__closure=0x564db1fd41c8) at /workspace/src/grpc/infer_handler.h:1462
#9  0x00007f1d92a96db4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007f1d9274baa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#11 0x00007f1d927d8a34 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

Expected behavior
The crash should not happen when requests are cancelled.

Root cause analysis
For a single inference request, three threads are involved:

  • gRPC handler thread A
  • gRPC handler thread B
  • Inference framework thread C, which calls InferResponseComplete when an inference finishes in the framework

There is a race condition that can cause the state object to be released while the cancellation event is still being processed (a minimal sketch of this interleaving follows the timeline):

  • T0: The cancellation event is handled by thread A: the received-notification flag is set to true (code)
  • T1: Inference is finished by thread C: the step changes to CANCELLED and a new event is enqueued (code)
  • T2: The event enqueued at T1 is processed by thread B: the state object is released (code)
  • T3: Thread A continues processing the cancellation event with the now-dangling state object, and the segfault happens (code)
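
For intuition, here is a minimal, standalone C++ sketch of the same interleaving. This is not Triton code: the names (State, step_mtx_, received_notification_) merely mirror the Triton ones, and the sleeps play the role of the TRITONSERVER_DELAY_* hooks above. Thread A writes the notification flag outside the mutex, thread B observes the flag and frees the state, and thread A then touches the freed state:

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

// Toy stand-in for Triton's per-request state; not the real class.
struct State {
  std::mutex step_mtx_;
  bool received_notification_ = false;
};

int main() {
  State* state = new State();

  // Thread A: handles the cancellation notification (T0, then T3).
  std::thread a([state]() {
    // T0: the flag is written OUTSIDE step_mtx_ -- this is the bug.
    state->received_notification_ = true;
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    // T3: by now B may have freed the state; locking its mutex is a
    // use-after-free, analogous to frame #5 (IsCancelled on this=0x0).
    std::lock_guard<std::mutex> lk(state->step_mtx_);
    std::cout << "thread A still using state" << std::endl;
  });

  // Thread B: finishes the request and releases the state (T1/T2).
  std::thread b([state]() {
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    bool notified;
    {
      std::lock_guard<std::mutex> lk(state->step_mtx_);
      notified = state->received_notification_;
    }
    if (notified) {
      delete state;  // T2: frees the state while A still holds a pointer.
    }
  });

  a.join();
  b.join();
  return 0;
}

The proposed fix below corresponds to performing the T0 write inside the same step_mtx_-protected section as the subsequent cancellation handling, closing the window in which B can free the state between A's write and A's later use.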

Fix proposal
The call to SetReceivedNotification(), which marks a state as having received the notification, should also be protected by state->step_mtx_ to avoid the race condition described above.

So the fix should be fairly simple:

diff --git a/src/grpc/infer_handler.cc b/src/grpc/infer_handler.cc
index 9c7eef48..f95922e2 100644
--- a/src/grpc/infer_handler.cc
+++ b/src/grpc/infer_handler.cc
@@ -697,6 +697,13 @@ ModelInferHandler::Process(
         std::chrono::milliseconds(state->delay_process_ms_));
   }

+  // SetReceivedNotification must be protected by state->step_mtx_ to avoid
+  // a race where the state is released before the cancellation event is processed.
+  if (is_notification) {
+    state->context_->SetReceivedNotification(true);
+  }
+
   // Handle notification for cancellation which can be raised
   // asynchronously if detected on the network.
   if (state->IsGrpcContextCancelled()) {
diff --git a/src/grpc/infer_handler.h b/src/grpc/infer_handler.h
index 86428a51..017fb5f0 100644
--- a/src/grpc/infer_handler.h
+++ b/src/grpc/infer_handler.h
@@ -1437,7 +1437,6 @@ InferHandler<
       if (state->step_ == Steps::WAITING_NOTIFICATION) {
         State* state_wrapper = state;
         state = state_wrapper->state_ptr_;
-        state->context_->SetReceivedNotification(true);
         is_notification = true;
         LOG_VERBOSE(1) << "Received notification for " << Name() << ", "
                        << state->unique_id_;

I have verified that the above fix works well.

lunwang-ttd (Author) commented Feb 25, 2025

CC @yinggeh since you worked on a similar PR: #7840

yinggeh (Contributor) commented Feb 25, 2025

@lunwang-ttd Thanks for reporting this bug. There have been some ongoing gRPC enhancements from the Triton team. Stay tuned!
cc @indrajit96

yinggeh added the bug label Feb 25, 2025