
spec: Consensus Write-Ahead Log (WAL) #469

Open · 3 tasks · Tracked by #578
cason opened this issue Oct 16, 2024 · 9 comments
Labels: spec (Related to specifications), synchronization (Nodes' synchronization (different rounds, heights) issues)

Comments

@cason (Contributor) commented Oct 16, 2024

In order to support the crash-recovery failure model (#578), the consensus implementation should persist all relevant events that have led it to its current state. When recovering from a crash, the implementation is initialized to the initial state of the latest active height H, and then has to replay all the information persisted before the crash.

The valid events processed by the consensus implementation are therefore typically persisted in a Write-Ahead Log (WAL): an append-only log, originally conceived to ensure atomicity of transactions in databases.
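For concreteness, a minimal append-only WAL could be sketched as follows. This is an illustration only; the names (`Wal`, `append`, `flush`, `replay`) are hypothetical and not Malachite's actual API.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};

/// Minimal append-only log: events are appended in order and made durable
/// with an explicit flush; on restart, all entries are replayed in order.
struct Wal {
    file: File,
}

impl Wal {
    fn open(path: &str) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { file })
    }

    /// Append one serialized event. Durable only after `flush`.
    fn append(&mut self, event: &str) -> io::Result<()> {
        writeln!(self.file, "{event}")
    }

    /// Force all appended entries onto stable storage (fsync).
    fn flush(&mut self) -> io::Result<()> {
        self.file.sync_data()
    }

    /// Read back all persisted events, in append order.
    fn replay(path: &str) -> io::Result<Vec<String>> {
        BufReader::new(File::open(path)?).lines().collect()
    }
}
```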

Definition of Done

  • Define exactly which events should be persisted to the WAL
  • Define the strategy for writing data to the WAL (e.g., synchronously versus asynchronously)
  • Define the procedure for replaying the content of the WAL when a node is (re)started
@cason added the spec (Related to specifications) and work in progress labels Oct 16, 2024
@cason changed the title from "spec: Consensus Write-Ahead Log (WAL)" to "spec: consensus Write-Ahead Log (WAL)" Oct 16, 2024
@cason added the synchronization (Nodes' synchronization (different rounds, heights) issues) label and removed the work in progress label Nov 19, 2024
@cason (Contributor, Author) commented Nov 20, 2024

> Define exactly which events should be persisted to the WAL

We need to persist all events, received from external components, that may lead to a state transition in the consensus logic.

More specifically (the sketch after this list illustrates the corresponding entry types):

  • Valid consensus messages: PROPOSAL, PREVOTE, and PRECOMMIT
    • We don't plan to store full proposed values v in the consensus WAL
    • So the PROPOSAL messages carry id(v) instead of v
    • Vote messages always carry id(v)
  • Expired timeouts: timeout_propose, timeout_prevote, and timeout_precommit
    • The implementation may, as an optimization, cancel scheduled timeouts when they become useless
    • Notice that storing a timeout expiration event that did not produce any effect is not a problem at all
    • The timeouts are going to be scheduled again during the replay procedure; storing their expiration events speeds up recovery
  • Events associated with the production or the receipt of proposed values
    • Full proposed values are not stored in the WAL, but by the dissemination logic
    • Once a value to be proposed by a process is produced (getValue()), the consensus logic is notified
    • Once a proposed value is received, processed, and validated (valid(v)), the consensus logic is notified
    • The consensus state machine reacts to those notifications or events, which must therefore be persisted to the WAL
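As an illustration, the persisted entries could be modeled along these lines; the types and field names are hypothetical, not the actual Malachite definitions:

```rust
/// id(v): an identifier (e.g., a hash) of the proposed value v.
type ValueId = [u8; 32];

/// Hypothetical WAL entry types mirroring the events listed above.
enum WalEntry {
    // Valid consensus messages; PROPOSAL carries id(v) instead of the full v.
    Proposal  { height: u64, round: u32, value_id: ValueId },
    Prevote   { height: u64, round: u32, value_id: Option<ValueId> }, // None = nil vote
    Precommit { height: u64, round: u32, value_id: Option<ValueId> },
    // Expired timeouts; replaying them avoids waiting for them again.
    TimeoutPropose   { height: u64, round: u32 },
    TimeoutPrevote   { height: u64, round: u32 },
    TimeoutPrecommit { height: u64, round: u32 },
    // Value-related notifications; the full value lives with the dissemination logic.
    ValueProduced { height: u64, round: u32, value_id: ValueId }, // getValue() returned
    ValueReceived { height: u64, round: u32, value_id: ValueId }, // valid(v) passed
}
```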

@cason (Contributor, Author) commented Nov 20, 2024

> Define the strategy for writing data to the WAL (e.g., synchronously versus asynchronously)

All received events can be persisted to the WAL asynchronously, in a best-effort manner. But once a set of input events leads to a state transition, with the potential production of an output, the WAL must be persisted synchronously.

We define the following list of actions before which the WAL should be flushed (synchronously persisted):

  • A message is sent by the node
  • The process switches to a new round of consensus
  • The process decides a value in a height of consensus

The actions listed above are the result of receiving a number of events (inputs), which can be asynchronously written to the WAL. But once the resulting action is produced, we must be sure that all events that have led to the action are persisted. In this way, when recovering from a crash, the node is able to produce, based on the same inputs, exactly the same actions.
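A sketch of this strategy, reusing the hypothetical `Wal` type from the first sketch above (the `Action` variants are likewise illustrative):

```rust
/// Externally observable actions; the WAL must be flushed before each of them.
enum Action {
    SendMessage(Vec<u8>), // a message is sent by the node
    NewRound(u32),        // the process switches to a new round
    Decide(Vec<u8>),      // the process decides a value in this height
}

/// Append the event asynchronously (best effort); flush only when the event
/// triggers an observable action, so that a crash-recovery replay sees every
/// input that contributed to that action.
fn handle_event(wal: &mut Wal, event: &str, action: Option<Action>) -> std::io::Result<()> {
    wal.append(event)?; // buffered, no fsync yet
    if let Some(a) = action {
        wal.flush()?; // make all contributing inputs durable first
        perform(a);   // only then produce the observable output
    }
    Ok(())
}

fn perform(_a: Action) { /* send the message / enter the round / commit the decision */ }
```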

@cason (Contributor, Author) commented Nov 20, 2024

> Define the procedure for replaying the content of the WAL when a node is (re)started

When a height of consensus is (re)started, the consensus state machine should:

  1. Consume all the events present in the WAL that refer to the current height. (If the WAL is from a lower height, it is reset/deleted and a new WAL for the current height is created; otherwise, we continue with the points below.)
  2. Produce all actions resulting from processing the persisted events
  3. Then start consuming external inputs, for instance, coming from the broadcast/gossip network

A relevant observation is that a height of consensus must be restarted only after all the committed blocks are applied (see #580) and after the storage for produced or received full values has been opened and restored (see #579).
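Putting the three steps together, the (re)start procedure could look roughly like this; `wal_height`, `apply`, and the other helpers are hypothetical stand-ins:

```rust
fn restart_height(current_height: u64, wal_path: &str) -> std::io::Result<()> {
    match wal_height(wal_path) {
        // 1 + 2. The WAL refers to the current height: consume every persisted
        //        event and re-produce the resulting actions.
        Some(h) if h == current_height => {
            for event in Wal::replay(wal_path)? {
                apply(&event); // drives the consensus state machine
            }
        }
        // The WAL is absent or from a lower height: reset it.
        _ => {
            let _ = std::fs::remove_file(wal_path);
            Wal::open(wal_path)?; // fresh, empty WAL for the current height
        }
    }
    // 3. Only now start consuming external inputs (network, timers).
    start_external_inputs();
    Ok(())
}

// Hypothetical helpers, elided here.
fn wal_height(_path: &str) -> Option<u64> { None }
fn apply(_event: &str) {}
fn start_external_inputs() {}
```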

@josef-widder (Member) commented

The WAL should ensure that a recovered process has the following behavior:

  1. It reaches a state that it had been in when it crashed (or shortly before)
  2. While processing the WAL, the process should not send messages that are in conflict with messages that were sent before the crash (no double sign)

The only differences from a correct process are that

  1. a recovering process may send the same message multiple times (typically not a problem)
  2. while being down, it might have missed some incoming messages, so there are corner cases where vote sync is needed in order to ensure progress.

@josef-widder (Member) commented

Observe that this is based on the fact that once a process has locally persisted a blockstore entry (block and commit) for height h, it may ignore all messages from heights less than or equal to h from this point on. Persisting such an entry is thus a big synchronization event, while flushing the WAL before sending messages is a smaller synchronization event.
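In other words (a hypothetical guard, not actual Malachite code):

```rust
/// Once (block, commit) for height h is durably in the blockstore, anything
/// at height <= h is obsolete and can be dropped; the WAL entries for those
/// heights can be discarded as well.
fn should_process(msg_height: u64, last_persisted_height: u64) -> bool {
    msg_height > last_persisted_height
}
```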

@cason (Contributor, Author) commented Nov 21, 2024

> The WAL should ensure that a recovered process has the following behavior:
>
> 1. It reaches a state that it had been in when it crashed (or shortly before)
> 2. While processing the WAL, the process should not send messages that are in conflict with messages that were sent before the crash (no double sign)

A more precise definition of "shortly before" can be derived from item 2.

Namely, if an action was produced before crashing, consider the latest action produced. The state of the process after recovering must be the same as when it produced that latest action, or a later (successor) state. The internal state transitions that do not produce actions might be lost: the events that triggered them might not have been synchronously persisted. This is not a problem as long as the "lost" events did not produce any externally observable action.

@josef-widder (Member) commented

For the WAL we need to make sure that the driver is deterministic. So we need to review everything, in particular folds in Quint. Also, pendingInputs should be transformed from a set into a list.
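The pendingInputs point translates directly: iterating a set is not order-deterministic, while a list preserves insertion order. A Rust analogy of the Quint concern:

```rust
use std::collections::HashSet;

fn main() {
    // As a set: iteration order is unspecified, so two replays of the same
    // WAL could drive the state machine along different paths.
    let pending_set: HashSet<&str> = HashSet::from(["proposal", "prevote", "precommit"]);
    for input in &pending_set {
        println!("set order (non-deterministic): {input}");
    }

    // As a list: iteration follows insertion order, so replay is deterministic.
    let pending_list = vec!["proposal", "prevote", "precommit"];
    for input in &pending_list {
        println!("list order (deterministic): {input}");
    }
}
```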

@cason (Contributor, Author) commented Nov 21, 2024

> 2. while being down, it might have missed some incoming messages. So there are corner cases where, in order to ensure progress, vote sync is needed.

By vote sync we mean the protocol drafted in #576.

@romac changed the title from "spec: consensus Write-Ahead Log (WAL)" to "spec: Consensus Write-Ahead Log (WAL)" Dec 19, 2024
@cason (Contributor, Author) commented Jan 6, 2025

Besides better documenting the solution, can we consider it solved?
