[Bug] Operation was canceled when start_workflow #639

Open
duy-nguyen-ts opened this issue Sep 10, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@duy-nguyen-ts

duy-nguyen-ts commented Sep 10, 2024

What are you really trying to do?

  • Hi team, I am having an issue when calling start_workflow and signal_workflow.

Describe the bug

  • It happens when I call the start_workflow method. Perhaps the client cannot connect to Temporal to create the workflow; the call fails with temporal_sdk_bridge.RPCError: (1, 'operation was canceled', b'')
  • I started 10 workflows: 6 succeeded and 4 failed with the canceled error.
  • I want to know why this happens. Is it due to the network or something else? How can I fix it, e.g. by adding a retry policy when calling start_workflow?
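A minimal sketch of the kind of client-side retry asked about above, assuming the failure is transient. The names TransientRPCError, is_transient and retry_async are hypothetical and not part of the Temporal SDK. In real code the predicate would probably check temporalio.service.RPCError and its status instead, but that mapping is an assumption here, especially since the error in this report surfaced as temporal_sdk_bridge.RPCError.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientRPCError(Exception):
    """Hypothetical stand-in for an SDK RPC error with a retryable status."""


def is_transient(err: BaseException) -> bool:
    # Assumption: only the stand-in error is retryable. Real code might instead check
    # isinstance(err, temporalio.service.RPCError) and inspect err.status.
    return isinstance(err, TransientRPCError)


async def retry_async(
    op: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    is_retryable: Callable[[BaseException], bool] = is_transient,
) -> T:
    """Await op(), retrying retryable failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            # op() must build a fresh awaitable each time; a coroutine cannot be awaited twice.
            return await op()
        except Exception as err:
            if not is_retryable(err) or attempt == attempts:
                raise  # non-retryable, or out of attempts: surface the original error
            # "Full jitter": sleep a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With the client from this issue, a call might look like retry_async(lambda: temporal_client.start_workflow(workflow, args=[arg], id="workflow_id", task_queue="task_queue")). One caveat: if a failed attempt actually reached the server, a retry with the same workflow ID may fail with WorkflowAlreadyStartedError instead, so that case probably needs explicit handling. A retry like this also only hides the symptom; it does not explain why the RPC was canceled.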

Environment/Versions

  • OS and processor: Mac M2
  • Temporal Version: ^1.6.0
  • Are you using Docker or Kubernetes or building Temporal from source: Using Docker

Additional context

duy-nguyen-ts added the bug (Something isn't working) label Sep 10, 2024
duy-nguyen-ts changed the title from [Bug] FILL_TITLE_HERE to [Bug] Operation was canceled when start_workflow Sep 10, 2024
@duy-nguyen-ts
Author

I checked my logs again; this error also happens when I send a signal to a workflow.

@duy-nguyen-ts
Author

After tracing this issue, I saw that it happens at this line, possibly an error when the SDK makes an RPC call to Temporal:
[Screenshot 2024-09-10 at 11:25:47, image not available]

@cretz
Member

cretz commented Sep 10, 2024

Can you replicate this reliably? If so, can you alter a sample to show how to replicate it? And is it against Temporal Cloud or a self-hosted server? We are releasing a fix in the next couple of days for a similar error at temporalio/sdk-core#807, but we believe that only affected 1.7.0.

@duy-nguyen-ts
Author

duy-nguyen-ts commented Sep 11, 2024

Hi @cretz, thanks for your reply. I am using a self-hosted Temporal server, and I can't replicate the error reliably: sometimes it happens and sometimes it doesn't. From my investigation, I assume it originates at the point shown in the screenshot above. I have now added a retry around start_workflow; the error still happens, but less often than before. My code is essentially this:

  • Create a client with Client.connect:
    temporal_client = await Client.connect(target_host=..., namespace=...)
  • Call start_workflow (possibly many calls at the same time):
    handle = await temporal_client.start_workflow(workflow, args=[arg], id="workflow_id", task_queue="task_queue")
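To make "many calls at the same time" observable, a sketch like the following starts one workflow per ID concurrently and tallies outcomes by exception type, so a 6-of-10 result shows up as counts instead of scattered log lines. start_many and the start_one callable are hypothetical helpers, not SDK APIs; the concurrency cap is an assumption added to limit in-flight RPCs. Note that each call gets its own workflow ID here; reusing one ID such as "workflow_id" for concurrent starts would typically make all but one fail with a different error (WorkflowAlreadyStartedError).

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Iterable


async def start_many(
    start_one: Callable[[str], Awaitable[object]],
    workflow_ids: Iterable[str],
    *,
    max_concurrency: int = 10,
) -> Counter:
    """Start one workflow per ID concurrently and tally outcomes by exception type."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(wf_id: str) -> object:
        # Cap the number of in-flight start calls so a burst does not flood the connection.
        async with sem:
            return await start_one(wf_id)

    # return_exceptions=True keeps one failure from hiding the other results.
    results = await asyncio.gather(*(guarded(i) for i in workflow_ids), return_exceptions=True)
    # Count "ok" for successes, otherwise the exception class name.
    return Counter(
        type(r).__name__ if isinstance(r, BaseException) else "ok" for r in results
    )
```

Against the client above, start_one could hypothetically be lambda wf_id: temporal_client.start_workflow(workflow, args=[arg], id=wf_id, task_queue="task_queue"), called with IDs like [f"my-workflow-{n}" for n in range(10)].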

@duy-nguyen-ts
Author

I am using version 1.6.0, so maybe it is not the same issue as temporalio/sdk-core#807.

@cretz
Copy link
Member

cretz commented Sep 11, 2024

I am using Temporal as self-hosted server and I can't always replicate it, sometime it happened and not

Even if it takes a minute to replicate, any replication would help us debug.

I am afraid there's not much to go on here. We have many samples and users starting hundreds or thousands of workflows on self-hosted servers without any issues. Can you make sure you're not doing something like accidentally blocking the thread inside an async def, which would stop the asyncio event loop from working properly?
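The blocking-thread suggestion can be illustrated with a small self-contained experiment. It does not prove that blocking causes "operation was canceled"; it only shows that a synchronous call inside an async def stops every other coroutine on the loop (which, in an asyncio client, would include any in-flight RPC handling). ticker and count_ticks are hypothetical names, and time.sleep stands in for any blocking work such as a synchronous HTTP or database call. asyncio.to_thread requires Python 3.9+.

```python
import asyncio
import time


async def ticker(stop: asyncio.Event, ticks: list) -> None:
    """Record a timestamp roughly every 10 ms until told to stop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def count_ticks(blocking: bool) -> int:
    """Do 0.2 s of blocking work and count how many ticks the loop managed meanwhile."""
    stop, ticks = asyncio.Event(), []
    task = asyncio.create_task(ticker(stop, ticks))
    await asyncio.sleep(0)  # let the ticker record its first tick
    start = len(ticks)
    if blocking:
        time.sleep(0.2)  # BAD: blocks the event loop thread; no other coroutine can run
    else:
        await asyncio.to_thread(time.sleep, 0.2)  # OK: the work runs in a worker thread
    during = len(ticks) - start
    stop.set()
    await task
    return during
```

Running asyncio.run(count_ticks(blocking=True)) returns 0 because the ticker never gets a turn, while the to_thread variant lets it keep ticking. Python's asyncio debug mode (asyncio.run(..., debug=True)) can also log callbacks that hold the loop for too long, which may help locate such calls.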

@duy-nguyen-ts
Author

OK @cretz, thank you for your response. I will continue to monitor it 😄
