
Limit max_instances to 1 for the scheduler to prevent conflicts #529

Open · wants to merge 1 commit into base: main
Conversation

djwooten

PR Checklist

  • This comment contains a description of changes (with reason)
  • Referenced issue is linked
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated
  • docs/changelog.md is updated

Description of changes

Fixes #526

MegaQC ingests reports via an asynchronous scheduler job. If multiple scheduler jobs run concurrently, they can create conflicts. This change prevents that from happening by setting APScheduler's max_instances parameter to 1.
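For illustration, the effect of max_instances=1 can be sketched with a plain threading.Lock guard. This is not MegaQC or APScheduler code; the class name and wrapper are invented to show the semantics: if a run is still in progress when the next trigger fires, the new run is skipped instead of overlapping.

```python
import threading

class SingleInstanceJob:
    """Wrap a callable so at most one invocation runs at a time."""

    def __init__(self, func):
        self._func = func
        self._lock = threading.Lock()
        self.skipped = 0  # count of invocations skipped due to overlap

    def __call__(self, *args, **kwargs):
        # Non-blocking acquire: a second concurrent call bails out,
        # mirroring what max_instances=1 does for a scheduled job.
        if not self._lock.acquire(blocking=False):
            self.skipped += 1
            return None
        try:
            return self._func(*args, **kwargs)
        finally:
            self._lock.release()
```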

@multimeric
Collaborator

Thanks for the issue and the PR. However, just limiting the number of workers doesn't seem like the ideal solution to me. I wonder if we could instead edit the processing logic to use transactions and/or to skip rows that already have the TREATED status, so that the conflict doesn't happen while retaining the multiple-workers approach?
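One way to sketch the "skip already-claimed rows" idea, using stdlib sqlite3 in place of flask-sqlalchemy (the table name and status values here are assumptions for illustration, not MegaQC's actual schema): each worker atomically claims one queued row inside a transaction, and a row claimed by another worker in the meantime is simply skipped.

```python
import sqlite3

def claim_next_upload(conn: sqlite3.Connection):
    """Atomically flip one queued row to IN TREATMENT and return its id."""
    with conn:  # wraps both statements in a transaction
        row = conn.execute(
            "SELECT id FROM uploads WHERE status = 'NOT TREATED' LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause re-checks the status, so a row another
        # worker already claimed is skipped rather than reprocessed.
        updated = conn.execute(
            "UPDATE uploads SET status = 'IN TREATMENT' "
            "WHERE id = ? AND status = 'NOT TREATED'",
            (row[0],),
        ).rowcount
        return row[0] if updated else None
```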

@djwooten
Author

djwooten commented Jul 5, 2024

I can see how setting max_instances=1 doesn't necessarily scale well: if there are so many reports that they can't all be processed within the 30-second interval, then having multiple workers process them could help speed things up.

I don't really know enough about databases in general, or flask-sqlalchemy in particular, to say whether some edge cases could still slip through with a transaction model, nor whether you would still get the full benefit of the parallel workers.

Perhaps one option would be to set every row's status to IN TREATMENT before processing any of the individual reports, since I think the report processing and uploading is the slow part. Something like:

# Mark all current rows as "IN TREATMENT" to prevent other workers from processing these rows
for row in queued_uploads:
    row.status = "IN TREATMENT"
    db.session.add(row)
db.session.commit()

# Upload the contents of each report to the database
for row in queued_uploads:
    # ... continue with the remainder of the original loop

I would expect that most realistic scenarios would be able to finish the first loop before another worker kicks off 30 seconds later, though it's not a strict guarantee.
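To make that a strict guarantee, the claim step itself could run in a single transaction so another worker can never see a half-claimed batch. A minimal sketch with stdlib sqlite3 (table name and status values are assumptions for illustration):

```python
import sqlite3

def claim_queued_uploads(conn: sqlite3.Connection) -> list:
    """Claim every queued row in one transaction and return their ids."""
    with conn:  # BEGIN ... COMMIT around both statements
        ids = [
            r[0]
            for r in conn.execute(
                "SELECT id FROM uploads WHERE status = 'NOT TREATED'"
            )
        ]
        # All matching rows flip to IN TREATMENT before the commit,
        # so a concurrent worker sees either all of them claimed or none.
        conn.execute(
            "UPDATE uploads SET status = 'IN TREATMENT' "
            "WHERE status = 'NOT TREATED'"
        )
    return ids
```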

Successfully merging this pull request may close these issues:

Bug report - Overlapping scheduler jobs cause conflict