Error recovery in the node #493

Open
Tracked by #483
th7nder opened this issue Oct 31, 2024 · 1 comment
Labels
tech debt Flags technical debt
th7nder (Contributor) commented Oct 31, 2024

Error reporting: we should log errors and eventually add them to RocksDB.

About the retries, it depends on where the error happened. If it was a disk failure, retrying might not be worth it.

Originally posted by @jmg-duarte in #483 (comment)
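
To make the "it depends where the error happened" point concrete, here is a minimal sketch of classifying errors before deciding on a retry. The `AddPieceError` enum, its variants, and the choice of which I/O error kinds count as transient are all assumptions for illustration, not the node's actual error type.

```rust
use std::io;

/// Hypothetical error type for a failed AddPiece call.
/// The variants are illustrative assumptions, not the node's real errors.
#[derive(Debug)]
pub enum AddPieceError {
    /// The task was cancelled, e.g. because the node was shutting down.
    Cancelled,
    /// An I/O error surfaced while reading or writing piece data.
    Io(io::Error),
}

impl AddPieceError {
    /// Returns `true` when retrying has a reasonable chance of succeeding.
    pub fn is_retryable(&self) -> bool {
        match self {
            // A cancelled call did not fail on its own, so it is worth retrying.
            AddPieceError::Cancelled => true,
            // Only transient I/O conditions are retried; anything else
            // (such as a failing disk) is assumed to fail again.
            AddPieceError::Io(e) => matches!(
                e.kind(),
                io::ErrorKind::Interrupted | io::ErrorKind::TimedOut
            ),
        }
    }
}
```

Either way the error would still be logged; the classification only decides whether a retry is scheduled.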

For example, if a piece was being added and the operation was cancelled (e.g. by a node shutdown), then on boot we need to find the pieces whose addition was cancelled and retry them.

jmg-duarte (Contributor) commented Nov 4, 2024

More details:

But how do we know we should retry it? When cancelling, we need to place some state in the database that indicates that AddPiece with certain parameters was cancelled. Or better yet, add it at the beginning and remove it when add_piece finishes.
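
A minimal sketch of the second option (write a marker before the call, clear it only on success). `PieceKey`, `PendingPieces`, and `add_piece_tracked` are hypothetical names, and the in-memory `BTreeSet` only stands in for whatever persistent store ends up being used.

```rust
use std::collections::BTreeSet;

/// Hypothetical key identifying one AddPiece invocation.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PieceKey {
    pub deal_id: u64,
    pub piece_cid: String,
}

/// In-memory stand-in for the persistent store of in-flight pieces.
/// A real node would persist this so the markers survive a restart.
#[derive(Default)]
pub struct PendingPieces {
    inner: BTreeSet<PieceKey>,
}

impl PendingPieces {
    /// Records that AddPiece has started for `key`.
    pub fn begin(&mut self, key: PieceKey) {
        self.inner.insert(key);
    }

    /// Clears the marker once AddPiece has finished successfully.
    pub fn finish(&mut self, key: &PieceKey) {
        self.inner.remove(key);
    }

    /// Everything that was started but never finished.
    pub fn unfinished(&self) -> Vec<PieceKey> {
        self.inner.iter().cloned().collect()
    }
}

/// Wraps an AddPiece-like operation with the marker protocol.
/// If the operation fails or never returns, the marker stays behind.
pub fn add_piece_tracked<E>(
    pending: &mut PendingPieces,
    key: PieceKey,
    add_piece: impl FnOnce(&PieceKey) -> Result<(), E>,
) -> Result<(), E> {
    pending.begin(key.clone());
    add_piece(&key)?;
    pending.finish(&key);
    Ok(())
}
```

Writing the marker up front means no extra work is needed on the cancellation path: an interrupted call simply never reaches `finish`, and the leftover entry is what tells us a retry is due.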

Also, is AddPiece idempotent? If retrying is fine, that suggests it may be.
https://github.com/eigerco/polka-storage/pull/483/files#r1824853789

You're right, AddPiece is not idempotent at this stage, as it'll create a new sector each time it is called.
It is cancellation safe though, because when interrupted it does not leave the state corrupted.
It won't add the deal to a sector, the SP will be slashed, and that's the end of the story.

About retries, right... We don't have a mechanism for that yet.
We'd need to store the state of the piece somewhere (modify the deal storage, because AddPiece can be called again solely based on published deal data) and have some scanner, run periodically or on startup, that checks whether something was in progress but not finished, cleans it up, and calls AddPiece again.

Maybe this can be done as part of #493?
https://github.com/eigerco/polka-storage/pull/483/files#r1825948465

As a takeaway, we'll need to track the pieces in some persistent storage (like RocksDB) and perform checks and retries.
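
A rough sketch of what the check-and-retry pass on startup could look like. `PieceState`, `recover_on_startup`, and the `cleanup` / `retry_add_piece` callbacks are hypothetical, and the `BTreeMap` keyed by deal id is only an in-memory stand-in for the persistent storage.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-piece state kept in persistent storage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PieceState {
    InProgress,
    Done,
}

/// Runs once on boot: every piece still `InProgress` is cleaned up and retried.
/// Returns the deal ids whose retry failed again, so they can be logged.
pub fn recover_on_startup<E>(
    pieces: &mut BTreeMap<u64, PieceState>,
    mut cleanup: impl FnMut(u64),
    mut retry_add_piece: impl FnMut(u64) -> Result<(), E>,
) -> Vec<u64> {
    // Collect first so the map can be updated while iterating over the ids.
    let unfinished: Vec<u64> = pieces
        .iter()
        .filter(|(_, state)| **state == PieceState::InProgress)
        .map(|(deal_id, _)| *deal_id)
        .collect();

    let mut still_failing = Vec::new();
    for deal_id in unfinished {
        // AddPiece is not idempotent, so leftovers are removed before retrying.
        cleanup(deal_id);
        match retry_add_piece(deal_id) {
            Ok(()) => {
                pieces.insert(deal_id, PieceState::Done);
            }
            Err(_) => still_failing.push(deal_id),
        }
    }
    still_failing
}
```

Pieces that fail again stay `InProgress`, so a later pass (or an operator) can pick them up instead of them being silently dropped.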

@jmg-duarte jmg-duarte added this to the Phase 3 milestone Nov 14, 2024
@jmg-duarte jmg-duarte added the tech debt Flags technical debt label Nov 15, 2024