Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Revert "Use create_or_find_by to reduce activity status deadlocks"" #4671

Closed
wants to merge 3 commits into from

Conversation

jorg-vr
Copy link
Contributor

@jorg-vr jorg-vr commented May 23, 2023

Reverts #4670

This reintroduces all changes made in #4547 and thus closes #4479

The only change compared to the original pr is that I added reconnect: true to the production DB settings.
I hope that because a roll backed query broke the connection, just reconnecting and continuing should result in an error free flow. (But we'll have to test it to know for sure)

@jorg-vr jorg-vr added the bug Something isn't working label May 23, 2023
@jorg-vr jorg-vr self-assigned this May 23, 2023
@bmesuere
Copy link
Member

I would put this on hold until after the exams.

@bmesuere
Copy link
Member

For future reference:

  • some submissions failed with internal error, but worked fine when rejudging
  • we could not reproduce the problem on naos
  • the error occurred "at random", but more with higher traffic. We could trigger it by doing a mass rejudge from the rails console
  • the error was triggered by setting the submission status to "running". Upon inspection, it seems that something goes wrong when creating an activity_status which causes a rollback, which causes the mysql connection to crash/drop?

@jorg-vr jorg-vr changed the base branch from develop to main June 5, 2023 07:33
@jorg-vr
Copy link
Contributor Author

jorg-vr commented Aug 10, 2023

i have been investigating this more, some conclusions:
These rollbacks happen almost always. This pr runs a create before a find, the create is often only required when the status is set to running by the first submission.
=> only a very select subset of these rollbacks causes an issue, DB connection issue
The requests are also always small and fast.
We do not match any of the common reasons why database connection is lost: https://dev.mysql.com/doc/refman/8.0/en/gone-away.html
If I remember correctly we only got an error on the first db request after the rollback.

A potential 'solution' to accept that the connection might fail sometimes, but force an immediate reconnect.
This can be achieved by simply setting 'reconnect: true' in the database yaml
This 'solution' might not last long, as the option has recently been deprecated by mysql: brianmario/mysql2#1322 But it is still unclear how the mysql2 gem will resolve this

@jorg-vr jorg-vr marked this pull request as ready for review August 10, 2023 09:30
@jorg-vr jorg-vr requested a review from a team as a code owner August 10, 2023 09:30
@jorg-vr jorg-vr requested review from bmesuere and chvp and removed request for a team August 10, 2023 09:30
Copy link
Member

@bmesuere bmesuere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this fix. It is a deprecated workaround of a workaround.

In the original code, we do a single retry before raising an error. What if we try more?

@jorg-vr jorg-vr closed this Aug 17, 2023
@jorg-vr jorg-vr deleted the revert-4670-revert-4547-fix/activity_status_for branch April 22, 2024 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid duplicate key error for activity statuses
3 participants