Set f as a configuration parameter #547

hesusruiz · 2023-05-21T15:07:40Z

First, congratulations for a fantastic library (that for some reason went under my radar until recently).

I wanted to make a PR for some functionality, but first I would like to know your opinion to see if it makes sense and it does not affect safety or liveness. Thanks in advance for that.

All PBFT-based implementations that I know derive f (max. faulty replicas) from n (total number of replicas), and this library is not different (in computeQuorum(n)). This is fine if you want to maximise resiliency against failures for a given cost of operating replicas (a given n), or alternatively minimise the cost of operating replicas for a given resiliency against byzantine failures.

But in our use case, we have many volunteer organisations (both private and public administrations), willing to operate each one a replica, and cost of operation is not a problem. We would like to set the desired resiliency f independently of n (except for the relationship n >= 3f + 1).

So for example, if we have a total of 21 replicas, instead of supporting 6 bft failures we would like to support "only" 3, but we would like to still operate 21 replicas. In our configuration supporting 3 bft failures, a quorum would require 7 replicas (2f+1) to ensure that two quorums overlap at least in a correct one. By the way, this configuration would support 10 CFT failures instead of just 6 of the current library.

The changes to enable the configuration of f and modify computeQuorum(n) seem easy, but the important thing is if this would affect the implementation in some wrong way.

If you are curious about why on earth we would want to "reduce" the resiliency of the network if we already are assuming the cost of operating a big number of replicas, I could elaborate. But in summary, we have been operating in production for 3 years a BFT decentralised network which may become soon much bigger (national scale). We want to have a reasonable BFT resiliency, big CFT resiliency (which is the most common failure we have experienced), and a big number of different entities operating the network collaboratively (there should not be any operator at all).

So, our limit for n would be based on the protocol overhead, not on the desired level of BFT resiliency.

The text was updated successfully, but these errors were encountered:

yacovm · 2024-03-07T09:17:50Z

So for example, if we have a total of 21 replicas, instead of supporting 6 bft failures we would like to support "only" 3, but we would like to still operate 21 replicas. In our configuration supporting 3 bft failures, a quorum would require 7 replicas (2f+1) to ensure that two quorums overlap at least in a correct one.

How do they overlap if they are of size 7? The 2f+1 formula no longer holds in such a case. How can a quorum be a third of the total number of nodes?

yacovm · 2024-03-07T09:27:04Z

If you set f to be less than a third you will still get a quorum that is at least half of the nodes, by the definition of a quorum.

I am not in favor of your suggestion, because the fundamental design principle of this library is that its consumer can use it without understanding the underlying protocol. Letting the user control f violates this principle because the user needs to understand the minimum and maximum range for f.

hesusruiz · 2024-03-09T23:30:14Z

If you set f to be less than a third you will still get a quorum that is at least half of the nodes, by the definition of a quorum.

If we use the "traditional" definition of a quorum in the BFT literature, then I agree. This definition assumes implicitly that all participants in the execution of the consensus algorithm are pre-defined and typically only change via a voting of the majority of the current participants.

But with our approach, the requirement for a quorum is more alligned with how electronic voting is performed in real-world associations, which can be expresed as:

"A quorum is the minimum number of members of a group necessary to constitute the group at a meeting, while protecting the group against totally unrepresentative action in the name of the body by an unduly small number of members."

Translating that language to technical terms in the consensus algorithm:

The group is the total number of nodes which can potentially participate in the consensus algorithm (the voting), which is n
The "unduly small number of members" is the number of byzantine nodes: f
The quorum is the minimum number of members which are required to be "present" in a given e-meeting to vote and consider the voting as valid representation of the group: q. While in many associations this is set to a majority of the total number of members, in many other groups, especially big ones, the requirement is relaxed and defined in their bylaws. As an extreme case, in the elections of a country, it is never necessary a majority of the citizens able to vote.

In our case, the definition of quorum would be q = 2f + 1, where f is the desired protection level against byzantine nodes, because this ensures that in every voting round the correct nodes are always a majority versus the byzantine nodes.

I am not in favor of your suggestion, because the fundamental design principle of this library is that its consumer can use it without understanding the underlying protocol. Letting the user control f violates this principle because the user needs to understand the minimum and maximum range for f.

I understand your reasoning, and I am thinking on creating a different (forked) library exactly for this reason.

But anyway, the users of the library have to understand the subtleties of byzantine faults and the differences with respect CFT algorithms like Raft, which are the most prevalent in distributed systems right now.

For example, regarding the level of protection against byzantine failures:

Users of your library must know how to calculate it from the total number of nodes, and in my experience, practitioners (not computer scientists) have some problems to get the number correctly without resorting to looking up the formula somewhere.
Users of the new library would understand the byzantine protection very easily, because it is a parameter they set: 'f', which is independent from the total number of nodes (assuming the number of nodes is enough to achieve the level of protection they desire).

What I mean is that in the current blockchain deployments I know today, most users have no idea of the real protection they have, or they have to calculate it, even if that is one of the most important properties of the system they use.

Anyway, thanks for your answer, and we would probably go ahead with a fork of this library, to avoid any possible confussion to the users, as you mention.

yacovm · 2024-03-09T23:43:49Z

Well, this library is a library for classical voting based consensus, where a quorum is defined to be the smallest set size such that each such two sets intersect at at least one honest node.

If your use case is huge networks (thousands of participants) then I suggest you look at other consensus protocols, namely ones used in the permissionless blockchain space.

If you have a softer assumption on the number of byzantine nodes, then maybe you can try to select a random committee and have that committee run SmartBFT consensus?

This formula gives you the committee size as a function of all nodes, the assumption on the actual byzantine number out of all nodes and the "unsafety probability" (how safe you want the random selection to not pick more than a third of the nodes to be Byzantine.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set f as a configuration parameter #547

Set f as a configuration parameter #547

hesusruiz commented May 21, 2023 •

edited

Loading

yacovm commented Mar 7, 2024

yacovm commented Mar 7, 2024

hesusruiz commented Mar 9, 2024

yacovm commented Mar 9, 2024 •

edited

Loading

Set f as a configuration parameter #547

Set f as a configuration parameter #547

Comments

hesusruiz commented May 21, 2023 • edited Loading

yacovm commented Mar 7, 2024

yacovm commented Mar 7, 2024

hesusruiz commented Mar 9, 2024

yacovm commented Mar 9, 2024 • edited Loading

hesusruiz commented May 21, 2023 •

edited

Loading

yacovm commented Mar 9, 2024 •

edited

Loading