-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug where retrying an export RQ job may break scheduling #8584
Conversation
`_patched_retry` tries to schedule a copy of the current job. In particular, it copies the dependencies of the old job using `current_rq_job.dependency_ids`. Unfortunately, `dependency_ids` does not return IDs of dependency jobs, as one might expect. It actually returns the Redis _keys_ corresponding to those jobs, as bytestrings. The RQ job creation code does not support bytestrings as dependency specifiers, so it unintentionally treats them as sequences, saving the individual bytes (as integers) as the dependency job IDs. But since IDs have to be strings, the scheduler quickly crashes when it tries to use those integer "IDs" to construct Redis keys. Thankfully, we don't actually need to get the dependency IDs. `_patched_retry` is only used inside running jobs, and if a job is running, it means that all its dependencies are already completed. Thus, the newly scheduled job doesn't need to have any dependencies at all.
9da6484
to
daf91c5
Compare
WalkthroughThis update addresses a bug in the scheduling of export RQ jobs, ensuring that retrying a job does not disrupt the scheduling of new jobs. Key modifications include the removal of the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Scheduler
participant ExportJob
participant Cache
User->>Scheduler: Request export job
Scheduler->>ExportJob: Enqueue job
ExportJob->>Cache: Check lock availability
alt Lock available
ExportJob->>Cache: Remove cache file
Cache-->>ExportJob: Cache file removed
ExportJob-->>Scheduler: Job completed
else Lock not available
ExportJob->>ExportJob: Retry job after interval
ExportJob->>Cache: Check lock availability again
end
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
changelog.d/20241022_191618_roman_fix_integer_dependencies.md (1)
1-5
: LGTM with a minor suggestion for clarity.The changelog entry accurately reflects the bug fix described in the PR objectives and follows good practices for changelog entries. It's concise, informative, and includes the correct pull request link.
To slightly improve clarity, consider adding a brief mention of the root cause or the impact of the bug. For example:
### Fixed - Fixed a bug where an export RQ job being retried may break scheduling - of new jobs + of new jobs due to incorrect handling of dependency IDs (<https://github.com/cvat-ai/cvat/pull/8584>)This addition provides a bit more context about the nature of the bug without going into excessive technical detail.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- changelog.d/20241022_191618_roman_fix_integer_dependencies.md (1 hunks)
- cvat/apps/dataset_manager/views.py (0 hunks)
💤 Files with no reviewable changes (1)
- cvat/apps/dataset_manager/views.py
🧰 Additional context used
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #8584 +/- ##
===========================================
- Coverage 74.29% 74.28% -0.02%
===========================================
Files 403 403
Lines 43287 43287
Branches 3914 3914
===========================================
- Hits 32162 32154 -8
- Misses 11125 11133 +8
|
Motivation and context
_patched_retry
tries to schedule a copy of the current job. In particular, it copies the dependencies of the old job usingcurrent_rq_job.dependency_ids
.Unfortunately,
dependency_ids
does not return IDs of dependency jobs, as one might expect. It actually returns the Redis keys corresponding to those jobs, as bytestrings. The RQ job creation code does not support bytestrings as dependency specifiers, so it unintentionally treats them as sequences, saving the individual bytes (as integers) as the dependency job IDs. But since IDs have to be strings, the scheduler quickly crashes when it tries to use those integer "IDs" to construct Redis keys.Thankfully, we don't actually need to get the dependency IDs.
_patched_retry
is only used inside running jobs, and if a job is running, it means that all its dependencies are already completed. Thus, the newly scheduled job doesn't need to have any dependencies at all.How has this been tested?
Checklist
develop
branch[ ] I have updated the documentation accordingly[ ] I have added tests to cover my changes[ ] I have linked related issues (see GitHub docs)[ ] I have increased versions of npm packages if it is necessary(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
Bug Fixes
Improvements
Documentation