
fix: duty scheduler - indices change synchronization #1724

Closed
olegshmuelov opened this issue Sep 10, 2024 · 3 comments · May be fixed by #1725
Labels: alan · bug (Something isn't working) · critical (Needs immediate attention)

@olegshmuelov (Contributor)
Describe the bug
An indices change can cause a bug where duties are reset while committee duties are executing.

To Reproduce
Steps to reproduce the behavior:

  1. slot n: an indices change is triggered
  2. slot n+1: committee ticker event fires
  3. slot n+1: attester ticker event fires
  4. the attester handler executes aggregator duties
  5. the current epoch duties are reset
  6. the committee handler executes duties and finds none (bug)
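The sequence above can be replayed with a toy duty container. This is an illustrative sketch only; all names (`committeeDutiesAfterReset`, the map-based container) are hypothetical and not the actual ssv scheduler types:

```go
package main

import "fmt"

// committeeDutiesAfterReset replays the reported event order with a toy
// per-epoch duty container. All names here are illustrative.
func committeeDutiesAfterReset() int {
	const epoch = uint64(100)
	duties := map[uint64][]string{epoch: {"attester", "aggregator", "committee"}}

	// slot n: indices change triggered; the reset is deferred until the
	// attester handler's next tick.
	resetPending := true

	// slot n+1: the attester ticker fires, executes aggregator duties,
	// then resets the current epoch duties because of the pending change.
	_ = duties[epoch]
	if resetPending {
		delete(duties, epoch)
	}

	// slot n+1: the committee ticker fires afterwards and finds the
	// container empty -- the reported bug.
	return len(duties[epoch])
}

func main() {
	fmt.Printf("committee sees %d duties\n", committeeDutiesAfterReset())
}
```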
olegshmuelov added the bug label · Sep 10, 2024
olegshmuelov self-assigned this · Sep 10, 2024
olegshmuelov added the alan and critical labels · Sep 10, 2024
@y0sher (Contributor) commented Sep 10, 2024

We have essentially known about this bug for a while: we didn't anticipate this exact failure, but we did discuss making the Committee handler in the scheduler able to wait for a signal that the Duties container is ready. The same issue can probably occur on a reorg event as well.

Here are two ways this could be solved:

  1. Ensure the committee handler always runs after the attester and sync committee handlers in the scheduler, provided everything runs synchronously and changes are applied on the same event (i.e., not waiting for the next epoch to fetch duties again, but doing it inside the indices-change event, without waiting for the next slot as the current code does). One complication is that we use Feed for this, and Feed adds every subscriber as a select case, so no ordering is enforced between subscribers.
  2. Make the Duties containers lock whenever an event happens, and only release readers after all updates are done. This should work even if everything runs asynchronously.
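Option 2 could look roughly like the following sketch: a container that holds a write lock across the whole reset-and-refetch, so readers such as the committee handler never observe the half-reset state. The type and method names are assumptions, not the real ssv code:

```go
package main

import (
	"fmt"
	"sync"
)

// dutiesContainer write-locks across a whole indices-change update so
// readers never see the container between reset and repopulation.
type dutiesContainer struct {
	mu     sync.RWMutex
	duties map[uint64][]string
}

func newDutiesContainer() *dutiesContainer {
	return &dutiesContainer{duties: make(map[uint64][]string)}
}

// ApplyIndicesChange resets and re-fetches under a single write lock.
func (c *dutiesContainer) ApplyIndicesChange(epoch uint64, fetch func() []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.duties, epoch)   // reset ...
	c.duties[epoch] = fetch() // ... and repopulate before releasing readers
}

// Get returns the duties for an epoch under a read lock.
func (c *dutiesContainer) Get(epoch uint64) []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.duties[epoch]
}

func main() {
	c := newDutiesContainer()
	c.ApplyIndicesChange(100, func() []string { return []string{"d1", "d2"} })
	fmt.Println("duties after update:", len(c.Get(100)))
}
```

With this shape, a reader blocked during `ApplyIndicesChange` always observes the repopulated container, never the emptied one.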

@olegshmuelov (Contributor, Author)
We prefer not to synchronize slot ticker events between the duty scheduler handlers, as that would add complexity and make the code harder to maintain. Instead, I propose triggering all the blocking functions of both the Attester and SyncCommittee handlers directly from the Committee handler, which should streamline the process without the added overhead.
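This proposal can be sketched as follows. The Committee handler itself invokes the blocking fetch functions of the other handlers before executing, so correct ordering no longer depends on feed subscriber order. Everything here (`dutyFetcher`, `HandleIndicesChange`) is a hypothetical stand-in, not the actual implementation:

```go
package main

import "fmt"

// dutyFetcher stands in for a handler (attester or sync committee) whose
// blocking FetchDuties must complete before committee duties run.
type dutyFetcher struct {
	name  string
	ready bool
}

// FetchDuties is a stub for the handler's blocking duty fetch.
func (f *dutyFetcher) FetchDuties() { f.ready = true }

type committeeHandler struct {
	attester      *dutyFetcher
	syncCommittee *dutyFetcher
}

// HandleIndicesChange re-fetches dependencies before executing, instead of
// waiting for each handler's own ticker on the next slot.
func (c *committeeHandler) HandleIndicesChange() bool {
	c.attester.FetchDuties()      // blocks until attester duties are ready
	c.syncCommittee.FetchDuties() // blocks until sync committee duties are ready
	return c.attester.ready && c.syncCommittee.ready
}

func main() {
	h := &committeeHandler{
		attester:      &dutyFetcher{name: "attester"},
		syncCommittee: &dutyFetcher{name: "sync-committee"},
	}
	fmt.Println("dependencies ready before committee execution:", h.HandleIndicesChange())
}
```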

olegshmuelov added this to the Alan milestone · Sep 12, 2024
@moshe-blox (Contributor)

Closed by #1741
