Nextflow to gracefully handle Google gRPC API call failures #5703

Closed
robnewman opened this issue Jan 23, 2025 · 4 comments

Comments

@robnewman
Contributor

New feature

Can Nextflow support retrying failed gRPC API calls, specifically for the following status codes:

  • UNAVAILABLE
  • DEADLINE_EXCEEDED
  • RESOURCE_EXHAUSTED
  • UNKNOWN

Usage scenario

When running Nextflow pipelines on Google Batch, jobs can fail because a resource is not available (e.g. the requested VM could not start because the Google project quota was exceeded). In these cases Nextflow did not receive any signal from the failed jobs, so the Nextflow job did not stop and kept consuming resources.

Suggest implementation

Could you make Nextflow support retrying of gRPC API failures based on failure response codes?
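
To make the request concrete, here is a minimal sketch of retrying based on the failure code. It is not Nextflow's or Google's actual code: `GrpcRetrySketch`, `ApiCallException`, `isRetryable` and `callWithRetry` are hypothetical names, the status code is carried as a plain string so the example has no dependencies, and the doubling backoff is just one reasonable choice.

```java
import java.util.Set;
import java.util.function.Supplier;

// Illustrative only: hypothetical helper, not part of Nextflow or any Google client library.
class GrpcRetrySketch {

    // Status code names taken from the list in this issue.
    static final Set<String> RETRYABLE_CODES =
            Set.of("UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED", "UNKNOWN");

    // Hypothetical stand-in for an API exception that exposes its gRPC status code name.
    static class ApiCallException extends RuntimeException {
        final String code;

        ApiCallException(String code) {
            super("gRPC call failed with status " + code);
            this.code = code;
        }
    }

    // A failure is retried only if it carries one of the codes listed above.
    static boolean isRetryable(Throwable t) {
        return t instanceof ApiCallException
                && RETRYABLE_CODES.contains(((ApiCallException) t).code);
    }

    // Runs the call up to maxAttempts times, doubling the wait after each retryable failure.
    // Non-retryable failures, and the last retryable one, are rethrown to the caller.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (!isRetryable(e) || attempt >= maxAttempts) {
                    throw e;
                }
                sleep(delayMs);
                delayMs *= 2;
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            // Restore the interrupt flag and stop retrying.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }
}
```

A caller would wrap each API request, e.g. `GrpcRetrySketch.callWithRetry(() -> describeJob(jobName), 5, 1000L)`, where `describeJob` is a placeholder for whatever performs the request. Capping the attempts matters for the quota case above: once retries are exhausted the error is surfaced, so the run can fail instead of waiting indefinitely.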

@robnewman changed the title from "Nextflow to gracefully handle Google gRPC API call failure" to "Nextflow to gracefully handle Google gRPC API call failures" on Jan 23, 2025
@pditommaso
Member

Duplicate of #4537

@pditommaso marked this as a duplicate of #4537 on Jan 23, 2025
@pditommaso
Member

I believe this is already implemented

if( t instanceof UnavailableException )
    return true

@robnewman
Contributor Author

robnewman commented Jan 23, 2025

> Duplicate of #4537

Thanks @pditommaso. Per @jorgee's comment, it didn't look like his PR resolved all the call failure modes:
#5690 (comment)

@bentsherman
Member

Closing until we get a more specific (reproducible) bug report.

@bentsherman closed this as not planned on Feb 6, 2025