The Stellar Consensus Protocol (SCP) is the consensus algorithm underlying the Stellar Network. It is a provably safe construction of Federated Byzantine Agreement (FBA). Like other cryptocurrencies, Stellar uses distributed fault tolerance to run a financial network, but its mechanisms differ in some distinct ways.
SCP derives from the concept of Byzantine Agreement (BA) and is tailored to a decentralized, permissionless network through quorums and quorum slices. Understanding SCP requires a brief history of BA and how it compares to FBA, followed by a description of quorums and quorum slices, the federated voting model, and finally the commit/abort ballot system of SCP itself.
Byzantine Agreements and Federated Byzantine Agreements
Byzantine Agreement is the property that lets a distributed computing system come to consensus despite arbitrary behavior from a fraction of the nodes in the network; such a system is said to be Byzantine fault tolerant. BA consensus makes no assumptions about how faulty nodes behave. Practical Byzantine Fault Tolerance (pBFT) is the prototypical model for Byzantine agreement, and it can reach consensus quickly and efficiently while decoupling consensus from resources (e.g., financial stake in PoS or electricity in PoW).
However, BA as implemented in pBFT does not scale well, because it requires heavy communication overhead among all the participating nodes. Further, the system needs unanimous agreement on the membership of the network to mitigate Sybil attacks.
Federated Byzantine Agreement was introduced by the SCP white paper and explicitly addresses the limitations of BA with a consensus protocol designed to provide the following:
- Decentralized Control
- Flexible Trust
- Low Latency
- Asymptotic Security
One of the primary consequences of FBA compared to BA is that an FBA system is open to nodes joining in a permissionless setting rather than through a closed (permissioned) membership list.
FBA comes to agreement on state updates by identifying each update with a unique slot, from which dependencies between updates can be inferred. Nodes must agree on the update for each slot in every round of consensus. However, since the system is open to nodes joining and leaving the network at will, a majority-based quorum consensus mechanism will not work. Instead, FBA in SCP employs quorum slices, which are subsets of quorums that are capable of convincing particular nodes of an agreement.
According to the Stellar blog:
“The key difference between a Byzantine agreement system and a federated Byzantine agreement system (FBAS) is that in FBA each node chooses its own quorum slices.”
Quorums and quorum slices will be discussed in more detail below, but the major takeaway here is that individual nodes can independently decide which other nodes (participants) they trust for information. Therefore, SCP is the first BA protocol to give each participant maximum freedom in selecting whom to trust.
Quorums and Quorum Slices
A quorum is defined as a set of nodes sufficient to reach an agreement in a distributed system. When nodes attempt to reach an agreement, they communicate with each other (assuming no messages are forged, which is where cryptography comes in) and concur that an update on the state is valid once a specific threshold of nodes in agreement is met.
A quorum slice is a subset of a quorum that is capable of convincing one particular node of an agreement. A node can depend on several slices for information, so it can rely on multiple sets of nodes asserting statements, and this trust can be based on information from outside of the system. Notably, trust is set up within the node’s config file, allowing for the dynamic formation of quorum slices and subsequent decentralization.
As an example:
Suppose Node A decides that it does not trust banks. Node A then needs another quorum slice, made up of nodes it does trust, through which it can come to an agreement with the banks. Once an agreement is reached, a quorum is formed. The graphic below illustrates this example: Nodes 7 and 8 play the role of Node A, which does not trust banks.
Image Credit – David Mazieres Presentation at Google
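To make the relationship between slices and quorums concrete, here is a minimal Python sketch. The four-node network, the node names, and the `is_quorum` helper are all hypothetical, chosen only for illustration. It assumes the usual FBA definition: a non-empty set of nodes is a quorum if it contains at least one quorum slice for each of its members.

```python
from typing import Dict, FrozenSet, List, Set

Slices = Dict[str, List[FrozenSet[str]]]

# Hypothetical network: each node declares its own quorum slices.
# In this sketch every slice includes the node that declared it.
SLICES: Slices = {
    "A": [frozenset({"A", "B", "C"})],
    "B": [frozenset({"B", "C", "D"})],
    "C": [frozenset({"B", "C", "D"})],
    "D": [frozenset({"B", "C", "D"})],
}


def is_quorum(candidate: Set[str], slices: Slices) -> bool:
    """Return True if `candidate` holds at least one slice of every member."""
    if not candidate:
        return False
    return all(
        any(s <= candidate for s in slices[node])
        for node in candidate
    )


print(is_quorum({"B", "C", "D"}, SLICES))       # True
print(is_quorum({"A", "B", "C"}, SLICES))       # False: B and C also need D
print(is_quorum({"A", "B", "C", "D"}, SLICES))  # True
```

Note that `{"A", "B", "C"}` is a slice for A but not a quorum: B and C each depend on D, so A can only reach agreement as part of the larger set.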
Traditional BA requires that all nodes accept the same slices, rather than discerning sources of trusted information for themselves. As such, there is no distinction between slices and quorums, which requires closed, permissioned membership of the network.
The FBA model relies on individual nodes to choose their own sets of quorum slices, effectively enabling the organic and more decentralized formation of quorums that rely on individual decisions, hence the name “federated.” In discussing safety and liveness in the FBA protocol, we need to evaluate quorum intersection and disjoint quorums.
According to the SCP white paper:
“A protocol can guarantee agreement only if the quorum slices represented by function Q satisfy a validity property we call quorum intersection.”
Quorums intersect if they share at least one node. Good quorums share nodes and therefore overlap. Nodes are responsible for ensuring that their selection of quorum slices does not violate quorum intersection, which typically requires selecting conservative slices that lead to large quorums.
When quorums do not intersect, they are known as disjoint quorums. Disjoint quorums are bad quorums that can lead to contradictory statements that undermine consensus. To ensure a proper slice selection process, nodes need to balance safety and liveness.
Nodes lack safety when they externalize values that contradict other nodes. Nodes lack liveness when they are blocked on the way to agreement. The Federated Voting model plays a critical role in the nodes coming to agreement on a statement.
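The difference between intersecting and disjoint quorums can be checked mechanically on small examples. The sketch below is a brute-force illustration, not how a production node would do it: it enumerates every subset of nodes, so it is only practical for toy networks. Both networks and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Slices = Dict[str, List[FrozenSet[str]]]


def is_quorum(candidate: FrozenSet[str], slices: Slices) -> bool:
    """A non-empty set is a quorum if it holds a slice of every member."""
    return bool(candidate) and all(
        any(s <= candidate for s in slices[node]) for node in candidate
    )


def all_quorums(slices: Slices) -> List[FrozenSet[str]]:
    """Enumerate every quorum by brute force (exponential in node count)."""
    nodes = sorted(slices)
    return [
        frozenset(combo)
        for size in range(1, len(nodes) + 1)
        for combo in combinations(nodes, size)
        if is_quorum(frozenset(combo), slices)
    ]


def disjoint_pairs(slices: Slices) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Return every pair of quorums that share no node."""
    return [(a, b) for a, b in combinations(all_quorums(slices), 2) if not a & b]


def has_quorum_intersection(slices: Slices) -> bool:
    return not disjoint_pairs(slices)


# Hypothetical network where every quorum contains B, C, and D.
HEALTHY: Slices = {
    "A": [frozenset({"A", "B", "C"})],
    "B": [frozenset({"B", "C", "D"})],
    "C": [frozenset({"B", "C", "D"})],
    "D": [frozenset({"B", "C", "D"})],
}

# Hypothetical network that splits into two camps that ignore each other.
SPLIT: Slices = {
    "A": [frozenset({"A", "B"})],
    "B": [frozenset({"A", "B"})],
    "C": [frozenset({"C", "D"})],
    "D": [frozenset({"C", "D"})],
}

print(has_quorum_intersection(HEALTHY))  # True
print(has_quorum_intersection(SPLIT))    # False
```

In the `SPLIT` network, `{A, B}` and `{C, D}` are both quorums, so each camp could agree on a different value without ever hearing from the other.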
Federated Voting
Federated Voting is the method by which nodes running SCP agree on statements made by participants. Overall, there are two sets of messages exchanged between nodes, and the two message rounds can be subdivided into agreement states of unknown, accepted, and confirmed. Notably, voting in a federated environment must accommodate open membership, which makes the process more complicated than in a closed system.
The federated voting process consists of 4 phases:
- Initial Voting
- Acceptance
- Ratification
- Confirmation
In initial voting, a node votes for a specific statement, asserting that the statement is valid and promising not to vote for any contradictory statement. However, this still leaves open the possibility for the node to later accept a different statement if enough of the other participating nodes that it trusts vote for another valid statement. Votes in this stage are technically preliminary votes.
Acceptance is the stage where a node accepts a statement. A node accepts a statement only if it has not already accepted a contradictory statement and one of two conditions holds: either every member of a quorum that includes the node has voted for or accepted the statement, or every member of a v-blocking set has accepted it. A v-blocking set for a node is a set of nodes that overlaps every one of that node's quorum slices, so the node cannot reach agreement without at least one of its members.
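The acceptance rule can be sketched in a few lines of Python. This is a simplified illustration that follows the definitions in the SCP paper as I understand them; the network, the function names, and the `accepted_contradiction` flag are hypothetical, and real message handling is omitted.

```python
from typing import Dict, FrozenSet, List, Set

Slices = Dict[str, List[FrozenSet[str]]]

# Hypothetical network; every slice includes the node that declared it.
SLICES: Slices = {
    "A": [frozenset({"A", "B", "C"})],
    "B": [frozenset({"B", "C", "D"})],
    "C": [frozenset({"B", "C", "D"})],
    "D": [frozenset({"B", "C", "D"})],
}


def is_v_blocking(group: Set[str], node: str, slices: Slices) -> bool:
    """True if `group` overlaps every one of `node`'s quorum slices."""
    return all(bool(s & group) for s in slices[node])


def largest_quorum_within(members: Set[str], slices: Slices) -> Set[str]:
    """Repeatedly drop nodes with no slice inside the set until it is stable.

    The result is the largest quorum contained in `members`, or an empty set.
    """
    current = set(members)
    while True:
        keep = {n for n in current if any(s <= current for s in slices[n])}
        if keep == current:
            return current
        current = keep


def can_accept(node: str, voted: Set[str], accepted: Set[str],
               accepted_contradiction: bool, slices: Slices) -> bool:
    """Decide whether `node` may accept a statement.

    voted / accepted: nodes known to have voted for / accepted the statement.
    accepted_contradiction: True if `node` already accepted a conflicting one.
    """
    if accepted_contradiction:
        return False
    via_quorum = node in largest_quorum_within(voted | accepted, slices)
    via_blocking = is_v_blocking(accepted, node, slices)
    return via_quorum or via_blocking


# A whole quorum containing A voted for the statement.
print(can_accept("A", {"A", "B", "C", "D"}, set(), False, SLICES))  # True
# A, B, C voted, but B and C need D, so no quorum backs the statement yet.
print(can_accept("A", {"A", "B", "C"}, set(), False, SLICES))       # False
# A never voted, but B (which blocks A's only slice) has accepted.
print(can_accept("A", set(), {"B"}, False, SLICES))                 # True
```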
Ratification is where all members of a quorum vote to accept a statement. If they do, then the statement is ratified by the nodes. Going back to the Node A that does not trust banks, if the nodes that Node A shares a quorum slice with in addition to other nodes that it trusts vote to accept a statement, then it is ratified by Node A.
Confirmation is system-wide agreement on a statement. The system agrees on a particular statement once a sufficient threshold of messages is processed across the network. Nodes propagate acceptance messages across the network from nodes within their quorum. These messages can influence other nodes to accept the message even if they had accepted a different initial message. Finally, a round of confirmation messages is broadcast to confirm the message, concluding the round of voting.
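The way acceptance spreads and then turns into confirmation can be illustrated with a toy simulation. The sketch below makes several simplifying assumptions that are not part of the text above: a single statement, no faulty nodes, no node having accepted a contradictory statement, and instant message delivery. The network and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Slices = Dict[str, List[FrozenSet[str]]]

# Hypothetical network; every slice includes the node that declared it.
SLICES: Slices = {
    "A": [frozenset({"A", "B", "C"})],
    "B": [frozenset({"B", "C", "D"})],
    "C": [frozenset({"B", "C", "D"})],
    "D": [frozenset({"B", "C", "D"})],
}


def largest_quorum_within(members: Set[str], slices: Slices) -> Set[str]:
    """Drop nodes with no slice inside the set until nothing changes."""
    current = set(members)
    while True:
        keep = {n for n in current if any(s <= current for s in slices[n])}
        if keep == current:
            return current
        current = keep


def run_round(voted: Set[str], slices: Slices) -> Tuple[Set[str], Set[str]]:
    """Spread acceptance until it stabilizes, then compute who confirms."""
    accepted: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for node in sorted(slices):
            if node in accepted:
                continue
            # Rule 1: a quorum containing the node voted for or accepted it.
            via_quorum = node in largest_quorum_within(voted | accepted, slices)
            # Rule 2: accepting nodes overlap every slice of this node.
            via_blocking = all(bool(s & accepted) for s in slices[node])
            if via_quorum or via_blocking:
                accepted.add(node)
                changed = True
    # A node confirms once a quorum containing it has accepted.
    confirmed = largest_quorum_within(accepted, slices)
    return accepted, confirmed


# A voted for something else; B, C, and D voted for the statement.
accepted, confirmed = run_round({"B", "C", "D"}, SLICES)
print(sorted(accepted))   # ['A', 'B', 'C', 'D']
print(sorted(confirmed))  # ['A', 'B', 'C', 'D']
```

Here B, C, and D accept through their shared quorum, and A is then pulled along because the accepting nodes block its only slice, even though A never voted for the statement.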
The voting mechanism is complicated, but Stellar offers some excellent resources on how to map it out more effectively. They provide a “Galactic Consensus” graphic for a broader overview as well as a useful blog post using the Lunchtime Example. For a technical deep dive, you can read the Federated Voting section of the SCP Paper.
The Stellar Consensus Protocol
SCP is an implementation of Federated Byzantine Agreement designed to minimize the instances of blocked agreement and to neutralize them through a ballot system. SCP comprises two primary sub-protocols: the nomination protocol and the ballot protocol.
For each consensus slot, the nomination protocol produces candidate values. Eventually, every node can deterministically generate the same convergence value for each slot. However, nodes cannot know when convergence has occurred, and malicious nodes may be able to reset the nomination process a finite number of times.
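The idea of deterministically generating a value from a set of candidates can be shown with a very small sketch. The combining rule used here (pick the lexicographically largest candidate) and the candidate names are assumptions made purely for illustration; the point is only that the result depends on the set of candidates, not on the order in which a node heard about them.

```python
from typing import Iterable


def combine(candidates: Iterable[str]) -> str:
    """Hypothetical combining rule: pick the lexicographically largest value.

    Any rule works as long as every node applies the same one and the
    result does not depend on arrival order.
    """
    return max(candidates)


# Two nodes that have seen the same candidates in a different order.
node_1_view = ["tx-set-b", "tx-set-a", "tx-set-c"]
node_2_view = ["tx-set-c", "tx-set-b", "tx-set-a"]
print(combine(node_1_view) == combine(node_2_view))  # True

# Before convergence, a node with fewer candidates may compute another value.
node_3_view = ["tx-set-a", "tx-set-b"]
print(combine(node_3_view))  # tx-set-b
```

The last case is why nodes cannot tell locally whether nomination has converged: a node never knows whether another candidate is still on its way.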
The ballot protocol is executed once nodes believe that the nomination protocol has converged. In the ballot protocol, a ballot is tied to a candidate value, and a node must commit or abort the value tied to that ballot. To avoid blocked agreement, nodes can abort certain ballots and move on to another. Conversely, nodes can vote to commit a ballot, which externalizes the value associated with that ballot for the consensus slot.
At a high level, SCP treats each slot independently, much like single-slot consensus in Paxos, just with many separate instances.
Given quorum intersection, there are no blocked states in SCP. Befouled nodes (nodes which rely heavily on bad nodes) can even be bypassed through a dispensable set mechanism, where good nodes can ratify statements without the cooperation of befouled nodes. Befouled nodes also cannot undermine the consensus.
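A ballot can be modeled as a small value type. The sketch below follows the SCP paper's description as I understand it, where a ballot pairs a counter with a value and ballots are ordered by counter first; the counter is not discussed in this article, and the class and function names are hypothetical. It shows which earlier ballots would have to be aborted before a node could safely commit a given ballot.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True, order=True)
class Ballot:
    """A (counter, value) pair; ordering compares counter, then value."""
    counter: int
    value: str


def compatible(a: Ballot, b: Ballot) -> bool:
    """Two ballots are compatible when they carry the same value."""
    return a.value == b.value


def ballots_to_abort(target: Ballot, seen: List[Ballot]) -> List[Ballot]:
    """Lower ballots with a different value must be aborted before
    `target` can be committed."""
    return [b for b in seen if b < target and not compatible(b, target)]


seen = [Ballot(1, "x"), Ballot(2, "y"), Ballot(4, "y")]
target = Ballot(3, "x")

print(ballots_to_abort(target, seen))
# [Ballot(counter=2, value='y')]
```

`Ballot(1, "x")` is left alone because it carries the same value as the target, and `Ballot(4, "y")` is left alone because it is higher than the target.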
Both the nomination protocol and ballot protocol contain some highly complex details for specific scenarios such as split votes. These details are available in the SCP paper as well.
One of the limitations of SCP is that it can only guarantee safety if nodes choose adequate quorum slices. Additionally, federated systems raise security concerns, such as the possibility of widely trusted nodes leveraging their positions for unethical advantages. For instance, if banks are relied on by a vast swathe of nodes, then they may have an information advantage not available to other nodes in the network.
Conclusion
Overall, the SCP is the first provably safe consensus protocol that can provide decentralized control, low latency, flexible trust, and asymptotic security. Different forms of consensus all come with their trade-offs, but the SCP maintains a high level of effectiveness for quickly coming to a consensus in a distributed, permissionless network without sacrificing safety.