[Bug] Temporalio.Exceptions.RpcException:operation was canceled #395

Open
pauldotknopf opened this issue Jan 20, 2025 · 3 comments
Labels
bug Something isn't working

What are you really trying to do?

Get info about a namespace, using version 1.4.0 of the .NET SDK.

Describe the bug

public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken)
{
    var stopwatch = new Stopwatch();

    try
    {
        stopwatch.Start();
        
        var client = await _rcmClientConnectionProvider.OpenClient();
        
        // Error gets thrown here.
        var namespaceInfo = await client.WorkflowService.DescribeNamespaceAsync(
            new Temporalio.Api.WorkflowService.V1.DescribeNamespaceRequest { Namespace = "default" },
            new RpcOptions
            {
                CancellationToken = cancellationToken
            }
        );

        stopwatch.Stop();

        if (namespaceInfo == null)
        {
            return HealthCheckResult.Unhealthy(
                $"Temporal namespace is unreachable: elapsed: {stopwatch.Elapsed}");
        }

        return HealthCheckResult.Healthy();
    }
    catch (Exception ex)
    {
        stopwatch.Stop();
        return HealthCheckResult.Unhealthy($"Temporal client is unreachable: elapsed: {stopwatch.Elapsed}", ex);
    }
}

The _rcmClientConnectionProvider variable is a singleton service that maintains a single ITemporalClient (obtained through OpenClient), which is used throughout the application for creating workflows and subscribing to task queues. Its code looks like this:

public async Task<ITemporalClient> OpenClient()
{
    if (_client != null)
    {
        return _client;
    }

    await _semaphoreSlim.WaitAsync();
    
    try
    {
        if (_client == null)
        {
            var options = new TemporalClientConnectOptions
            {
                TargetHost = $"{_options.HostName}:{_options.Port}",
                Namespace = _options.Namespace
            };
            if (serviceProvider.GetService(typeof(ILoggerFactory)) is ILoggerFactory loggerFactory)
            {
                options.LoggerFactory = loggerFactory;
            }

            try
            {
                _client = await TemporalClient.ConnectAsync(options);
            }
            catch(InvalidOperationException e)
            {
                if (e.Message.StartsWith("Connection failed: Server connection error"))
                {
                    var message = $"Failed to connect to Temporal server at {_options.HostName}:{_options.Port}";
                    if (_options.HostName == "localhost")
                    {
                        message += "\nA local instance of Temporal can be started by running 'temporal server start-dev'";
                    }
                    throw new InvalidOperationException(message, e);
                }

                throw;
            }
        }

        return _client;
    }
    finally
    {
        _semaphoreSlim.Release();
    }
}

Minimal Reproduction

It only happens in one environment, so I fear a minimal repro may be hard to produce.

Environment/Versions

Temporal helm chart v0.54.0.
Temporal .NET SDK 1.4.0 (nuget)
Azure App Service (for Linux)

Additional context

This is the complete error, being reported by App Insights.

[
  {
    "severityLevel": "Error",
    "outerId": "0",
    "message": "operation was canceled",
    "type": "Temporalio.Exceptions.RpcException",
    "id": "11222409",
    "parsedStack": [
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 0
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 1
      },
      {
        "assembly": "Temporalio, Version=1.4.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Temporalio.Bridge.Client+<CallAsync>d__14`1.MoveNext",
        "level": 2
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 3
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 4
      },
      {
        "assembly": "Temporalio, Version=1.4.0.0, Culture=neutral, PublicKeyToken=null",
        "method": "Temporalio.Client.TemporalConnection+<InvokeRpcAsync>d__42`1.MoveNext",
        "level": 5
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
        "level": 6
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
        "level": 7
      },
      {
        "assembly": "System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
        "method": "System.Runtime.CompilerServices.TaskAwaiter`1.GetResult",
        "level": 8
      },
      {
        "assembly": "AutomatedActions.Services, Version=0.2.86.0, Culture=neutral, PublicKeyToken=null",
        "method": "AutomatedActions.Services.HealthChecks.TemporalConnectionHealthCheck+<CheckHealthAsync>d__2.MoveNext",
        "level": 9,
        "line": 26,
        "fileName": "/agent/_work/1/s/src/Workflow/src/AutomatedActions.Services/HealthChecks/TemporalConnectionHealthCheck.cs"
      }
    ]
  }
]

One thing worth mentioning is that I don't think this is actually a timeout issue, because the exception is thrown almost immediately: Temporal client is unreachable: elapsed: 00:00:00.0022905.
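Since the failure is near-instant, one thing that could be ruled out is the health check's own CancellationToken being canceled before or during the call. The following is only a diagnostic sketch under that assumption, not a confirmed cause. The `CancellationProbe` class and `ProbeAsync` method are hypothetical names, not part of the Temporal SDK, and only BCL types are used.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationProbe
{
    // Hypothetical helper: runs an awaited call and reports whether a failure
    // coincides with the caller's token being canceled.
    public static async Task<string> ProbeAsync(
        Func<CancellationToken, Task> call, CancellationToken cancellationToken)
    {
        // A token that is already canceled would explain an immediate failure
        // regardless of the server's state.
        if (cancellationToken.IsCancellationRequested)
        {
            return "token already canceled before the call";
        }

        var stopwatch = Stopwatch.StartNew();
        try
        {
            await call(cancellationToken);
            return "completed";
        }
        catch (Exception) when (cancellationToken.IsCancellationRequested)
        {
            // The caller's token was canceled while the call was in flight.
            return $"caller token canceled during the call after {stopwatch.Elapsed}";
        }
        catch (Exception ex)
        {
            // The failure did not come from the caller's token.
            return $"failed without caller cancellation after {stopwatch.Elapsed}: {ex.GetType().Name}";
        }
    }
}
```

In the health check above, the DescribeNamespaceAsync call could be wrapped as `await CancellationProbe.ProbeAsync(ct => client.WorkflowService.DescribeNamespaceAsync(request, new RpcOptions { CancellationToken = ct }), cancellationToken)`, where `request` stands for the DescribeNamespaceRequest shown earlier, and the returned string logged alongside the elapsed time.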

pauldotknopf added the bug label Jan 20, 2025

cretz (Member) commented Jan 21, 2025

> It only happens in one environment, so I fear a minimal repro may be hard to produce.

Is this only an issue on Azure or can anyone replicate it with the dev server? I fear if only one environment has the issue it may be specific to the environment. Ideally we can have a replication we can run in CI to know it has been fixed and continues to remain fixed.

pauldotknopf (Author) commented

Update: it's happening in every environment now.

pauldotknopf (Author) commented

Considering this timeout exception is being thrown immediately, should I not be sharing my TemporalClient across threads?

This particular method is a "health check" that's done periodically.

It doesn't look like TemporalClient implements IDisposable. Is it safe to create them and let the GC handle them? Will connection resources become starved?
