retryDeleteJob causes restart job to be unscheduled #3968

GhangZh · 2025-01-14T07:03:15Z

Deleting the PodGroup first and then the Pods triggered retryDeleteJob several times, and each retry was penalized with a rate-limit delay of a few seconds.
During that delay, deleteJob was called directly rather than through the queue, which removed the job from the job cache.
At that point the job was retried, or a job with the same UID was re-created.
A few seconds later the old job came out of the rate-limit queue. Because JobTerminated evaluated to true for it, the job was deleted from the job cache a second time.

As a result, the restarted job keeps existing but is never scheduled.

Expected: the job should not be deleted from the cache a second time in this scenario.
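To make the sequence concrete, here is a minimal sketch of the race in plain Go. It is not Volcano's actual code: `jobInfo`, `jobCache`, `deleteRequest`, `processNaive`, `processGuarded`, and `simulate` are hypothetical names, and the sketch assumes the rate-limit queue holds a reference to the old job object while the cache is keyed by UID. `processGuarded` shows one possible guard (an identity check before deleting), offered only as an illustration of the expected behavior.

```go
package main

import "fmt"

// jobInfo is a hypothetical, stripped-down stand-in for a cached job.
type jobInfo struct {
	uid        string
	terminated bool
}

// jobCache is a toy model of a job cache keyed by UID.
type jobCache struct {
	jobs map[string]*jobInfo
}

// deleteRequest models an item waiting in a rate-limited retry queue.
// Assumption: it keeps a pointer to the job object that was enqueued.
type deleteRequest struct {
	job *jobInfo
}

// processNaive checks only the stale object and then deletes by UID,
// so it also removes a newer job that reuses the same UID.
func (c *jobCache) processNaive(r deleteRequest) {
	if r.job.terminated {
		delete(c.jobs, r.job.uid)
	}
}

// processGuarded deletes only if the cache still holds the very same
// object that was enqueued; a re-created job is left alone.
func (c *jobCache) processGuarded(r deleteRequest) {
	current, ok := c.jobs[r.job.uid]
	if !ok || current != r.job {
		return // entry is gone or has been replaced: nothing to clean up
	}
	if r.job.terminated {
		delete(c.jobs, r.job.uid)
	}
}

// simulate replays the reported sequence and reports whether the
// restarted job is still in the cache afterwards.
func simulate(guarded bool) bool {
	c := &jobCache{jobs: map[string]*jobInfo{}}

	old := &jobInfo{uid: "job-uid-1"}
	c.jobs[old.uid] = old

	old.terminated = true            // PodGroup and Pods were deleted
	stale := deleteRequest{job: old} // retry is still waiting out its delay
	delete(c.jobs, old.uid)          // direct delete removes the cache entry

	// The job restarts and is re-created with the same UID.
	c.jobs["job-uid-1"] = &jobInfo{uid: "job-uid-1"}

	// The delayed retry finally fires.
	if guarded {
		c.processGuarded(stale)
	} else {
		c.processNaive(stale)
	}

	_, present := c.jobs["job-uid-1"]
	return present
}

func main() {
	fmt.Println("naive: restarted job still cached:", simulate(false))
	fmt.Println("guarded: restarted job still cached:", simulate(true))
}
```

In this toy model the naive path drops the restarted job from the cache, which matches the "exists but is never scheduled" symptom, while the guarded path leaves it in place. Whether an identity check, a generation counter, or dropping the pending retry is the right fix for the real cache is an open design question.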

Volcano 1.9

No response


aryasoni98 · 2025-02-06T19:13:18Z

@GhangZh I would love to work on this issue. Could you please assign it to me?

GhangZh added the kind/bug label Jan 14, 2025
