[PENG-2342] Jobbergate Agent continually resubmits the same job if the job status update fails #607
What
Create a local cache of job submissions for the jobbergate-agent.
Why
We've found a situation where the jobbergate-agent repeatedly resubmits a job to Slurm even though the job has already been submitted successfully.
After the Jobbergate Agent successfully submits a pending job submission to Slurm, it attempts to update the status of the job in the Jobbergate API to indicate that the job was submitted. However, if the call to update the job submission in the Jobbergate API fails for any reason, the job_submission is left in the pending status. On its next cycle, the Jobbergate Agent still sees the job as if it had never been submitted and tries to submit it again.
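The failure mode above, and how a local cache can break the loop, can be illustrated with a minimal sketch. This is not the code in this PR: `SubmissionCache`, `process_pending`, `submit_to_slurm`, and `mark_submitted` are hypothetical names, and the one-JSON-file-per-submission layout is an assumption made only for illustration.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class SubmissionCache:
    """Hypothetical on-disk record of submissions already sent to Slurm."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, job_submission_id: int) -> Path:
        # Assumed layout: one small JSON file per job submission.
        return self.cache_dir / f"{job_submission_id}.json"

    def get(self, job_submission_id: int) -> Optional[int]:
        path = self._path(job_submission_id)
        if not path.exists():
            return None
        return json.loads(path.read_text())["slurm_job_id"]

    def put(self, job_submission_id: int, slurm_job_id: int) -> None:
        payload = json.dumps({"slurm_job_id": slurm_job_id})
        self._path(job_submission_id).write_text(payload)

    def discard(self, job_submission_id: int) -> None:
        self._path(job_submission_id).unlink(missing_ok=True)


def process_pending(
    job_submission_id: int,
    cache: SubmissionCache,
    submit_to_slurm: Callable[[int], int],
    mark_submitted: Callable[[int, int], None],
) -> int:
    """Submit at most once, then retry only the status update on later cycles."""
    slurm_job_id = cache.get(job_submission_id)
    if slurm_job_id is None:
        # First time this pending submission is seen: submit and remember it.
        slurm_job_id = submit_to_slurm(job_submission_id)
        cache.put(job_submission_id, slurm_job_id)
    # If this call raises, the cache entry survives, so the next cycle
    # skips the Slurm submission and only retries the status update.
    mark_submitted(job_submission_id, slurm_job_id)
    cache.discard(job_submission_id)
    return slurm_job_id
```

In this sketch the cache entry is written before the API update and removed only after the update succeeds, so a failed update leaves enough local state to avoid a second submission to Slurm.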
Task
https://app.clickup.com/t/18022949/PENG-2342
Peer Review
Please follow the upstream omnivector documentation concerning
peer-review guidelines.