Skip to content

Replication

Enrico Olivelli edited this page Sep 3, 2020 · 4 revisions

In a clustered setup for each TableSpace it is possible to design a node as 'leader' and a number of nodes as 'replicas'. The leader is the only node which accepts "remote" queries for the TableSpace and it is the only which is able to change transaction data and metadata in the scope of the TableSpace.

For each data/metadata mutation the TableSpace leader writes an entry on the replicated log (implemented using Apache BookKeeper) and then updates the local copy of the data. Replica nodes continuously "tail" the transaction log (this is a facility of Apache BookKeeper) and apply the same updates to their local copy of the data. The role of the leader is decided quite statically, that is that there is no continuous "leader election". The administrator usually designs a node as the leader and it will be in charge of the tablespace. If a tablespace is underreplicated (there is a configuration parameter for each tablespace which tells the minimum number of replicas for it) that the whole group decides to add new replica to the group. If the leader is not available for quite a long time the system can automatically promote an existing replica to the role of the leader.

When a replica boots for the first time it has to contact the actual leader for the tablespace and download a consistent snapshot of the whole data and metadata for the TableSpace.

Beware that usually a node is leader for some tablespaces and replica for other tablespaces, this way the load due to the role of leadearship can be balanced among the whole cluster. Another very simple two machines setup can be to keep a master node as leader for all the tablespaces and a second node as replica. However this is not the common case for HerdDB, which is designed to handle hundres of tablespaces and it is better to spread the load among all the nodes.

See also Data storage in order to understand how we are storing data in Diskless cluster mode, that is when you want to store data entirely on BookKeeper

Clone this wiki locally