Skip to main content

Byzantine Fault Tolerance (BFT)

Most consensus algorithms (like Paxos and Raft) assume Crash-Stop failures: nodes either work correctly or stop completely. But what if a node lies? What if it’s compromised by a hacker and sends conflicting messages to different peers? This is the Byzantine Generals Problem.
Module Duration: 10-12 hours
Key Topics: Byzantine Generals Problem, PBFT, Tendermint, HotStuff, Proof of Work vs. BFT
Interview Focus: Security in distributed systems, blockchain fundamentals, quorum math for BFT

1. The Byzantine Generals Problem

Imagine three generals surrounding a city. They must agree to either Attack or Retreat. If they don’t all do the same thing, they will be defeated.
  • One general might be a traitor and send “Attack” to one peer and “Retreat” to another.
  • The Magic Number: To survive ff Byzantine (malicious) nodes, you need at least 3f+13f + 1 total nodes.
    • For f=1f=1, you need 4 nodes. A 3-node system cannot reach consensus if one is a traitor.

2. Practical Byzantine Fault Tolerance (PBFT)

The first practical algorithm for BFT, introduced by Castro and Liskov in 1999.

2.1 The Three-Phase Protocol

PBFT uses a primary (leader) and backup nodes.
  1. Pre-Prepare: Primary sends a proposal to all backups.
  2. Prepare: Each node sends its own “Prepare” message if the proposal is valid. This proves the node saw the primary’s message.
  3. Commit: Once a node has 2f+12f + 1 “Prepare” messages, it sends a “Commit”. Once it has 2f+12f + 1 “Commit” messages, it executes the operation.

2.2 BASE: Separation of Agreement and Execution

Modern BFT architectures (like BASE - BFT with Agreement and State machine Execution) decouple the consensus process from the execution of the state machine. Why decouple?
  • Heterogeneity: You can run different implementations of the service (to avoid common-mode software bugs) while using a unified agreement protocol.
  • Performance: Agreement is O(N2)O(N^2), but execution might be O(1)O(1). Decoupling allows you to pipeline these stages.
  • Privacy: The nodes agreeing on the order of transactions don’t necessarily need to see the transaction content (if using Zero-Knowledge Proofs or Encryption).

2.3 Why 2f+12f + 1?

In a system of 3f+13f + 1 nodes:
  • A quorum of 2f+12f + 1 ensures that even if ff nodes are malicious and ff nodes are slow/down, we still have f+1f+1 honest nodes responding.
  • Any two quorums of size 2f+12f + 1 are guaranteed to overlap on at least f+1f+1 nodes. Since at most ff are malicious, there is at least one honest node in the intersection to maintain safety.

3. Modern BFT Deep Dive

PBFT is the foundation, but its O(N2)O(N^2) communication overhead (every node talks to every other node) makes it unsuitable for large networks (100+ nodes). Modern protocols solve this.

3.1 Tendermint: The Blockchain Workhorse

Used by the Cosmos Network, Tendermint simplifies BFT into a two-stage voting process integrated with a rotating leader.
  • The Process:
    1. Propose: A designated leader proposes a block.
    2. Prevote: Nodes vote on whether the proposal is valid.
    3. Precommit: Nodes vote again. If a node sees 2/32/3 prevotes, it precommits.
    4. Commit: Once 2/32/3 precommits are gathered, the block is finalized.
  • Key Innovation: Locking Mechanism. If a node precommits a block, it is “locked” into that block and cannot vote for anything else unless it sees a higher-round proof. This prevents split-brain finality.

3.2 HotStuff: The Three-Phase Chained BFT

HotStuff (powering Facebook’s Diem and Aptos) was a breakthrough because it achieved Linear Communication Complexity (O(N)O(N)) in the common case and unified the protocol for both normal operation and “View Changes” (leader rotation).

The Secret: Chained Quorum Certificates (QCs)

In PBFT, every node talks to every other node in each phase (O(N2)O(N^2)). HotStuff replaces this with a star-shaped topology where all nodes send their votes to a single collector (usually the Next Leader).
  1. Vote Collection: Each replica sends a partially signed vote to the leader.
  2. QC Generation: The leader aggregates these into a single Quorum Certificate (QC).
  3. Chaining: Instead of doing multiple rounds for one block, HotStuff “chains” them. Each QC confirms the previous one.
    • QC1 (Prepare): Proves that 2f+12f+1 nodes saw the proposal.
    • QC2 (Pre-Commit): Proves that 2f+12f+1 nodes saw the Prepare QC.
    • QC3 (Commit): Proves that 2f+12f+1 nodes saw the Pre-Commit QC.

Linear View Change

In PBFT, rotating a failing leader is expensive (O(N3)O(N^3)) because every node must prove to every other node what the “last stable state” was. In HotStuff, the Pacemaker module handles timeouts. Because the protocol is “responsive” (it moves at the speed of the network, not a fixed timer), a new leader can simply gather the highest QC from 2f+12f+1 nodes and start the next round. This is O(N)O(N).

4. Scaling BFT: Threshold Signatures (BLS)

The biggest bottleneck in BFT is the size of the messages. If 1,000 nodes each send a 64-byte signature, the aggregated message is 64KB. In a large network, this kills throughput.

4.1 What are Threshold Signatures?

Threshold signatures (specifically BLS - Boneh-Lynn-Shacham) allow NN nodes to each have a “key share.”
  • Aggregation: Any 2f+12f+1 signatures can be mathematically combined into a single constant-sized signature.
  • Verification: Anyone can verify this single signature using a single Public Key for the entire cluster.
  • Benefit: Communication complexity drops from O(N2)O(N^2) bytes to O(N)O(N) bytes.

4.2 Threshold Cryptography vs. Multisig

FeatureMultisig (e.g., Bitcoin)Threshold Signature (BLS)
SizeGrows with number of signersConstant size (e.g., 48 or 96 bytes)
PrivacyYou know exactly who signedIndividual signers are hidden
ComplexitySimpleRequires advanced elliptic curve math (Pairings)

4.3 Comparison of BFT Protocols

ProtocolLeader RotationCommon Message complexityView Change ComplexityFinality
PBFTOnly on failureO(N2)O(N^2)O(N3)O(N^3)2-Phase
TendermintEvery blockO(N2)O(N^2)O(N2)O(N^2)2-Phase
HotStuffEvery blockO(N)O(N)O(N)O(N)3-Phase (Chained)

5. Advanced Attack Vectors in BFT

Even with 3f+13f+1 nodes, BFT systems are vulnerable to sophisticated attacks:

5.1 The Nothing-at-Stake Problem

In “Long-Range Attacks,” an adversary who gains access to old private keys could create an alternative chain from the genesis block. BFT systems solve this through Check-pointing and Social Consensus (weak subjectivity), where nodes refuse to reorganize past a certain depth.

5.2 Censorship & Front-running

A malicious leader might not lie about the order of transactions, but they might “ignore” transactions from specific users or reorder them to profit from MEV (Maximal Extractable Value).
  • Solution: Threshold Encryption. Users encrypt their transactions with the cluster’s public key. The nodes agree on the order of the encrypted blobs. Only after the order is finalized is the decryption key revealed.

6. BFT vs. Proof of Work (Nakamoto Consensus)

FeatureBFT (PBFT/Tendermint)Proof of Work (Bitcoin)
MembershipPermissioned (known nodes)Permissionless (anyone can join)
FinalityAbsolute (immediate)Probabilistic (wait for 6 blocks)
ScalabilityHigh throughput, low nodesLow throughput, high nodes
EnergyMinimalMassive

5. When do you need BFT?

You probably don’t need BFT if:
  • All nodes are inside your private, secured VPC.
  • You trust your administrators.
  • Use Raft or Paxos instead (they are faster).
You do need BFT if:
  • You are building a decentralized system (Blockchain).
  • You are building a high-security system where internal compromise is a threat.
  • Different organizations are sharing a single state (Consortium networks).

6. Interview Questions

Answer: In Raft, we assume nodes are honest—if they send a message, it’s true. We only need to survive crashes, so a simple majority (f+1f+1 out of 2f+12f+1) is enough. In BFT, a node can lie. We need enough honest nodes to “outvote” the liars. The math shows that with 3f+13f+1, any two quorums of 2f+12f+1 will have at least f+1f+1 overlapping nodes, ensuring at least one honest node is there to preserve the truth.
Answer: A malfunctioning NIC that corrupts bits in a way that passes checksums, or a “gray failure” where a node responds to heartbeats but fails to process actual data correctly. BFT protects against these “silent” corruptions.

7. Key Takeaways

Trust No One

BFT assumes nodes can be malicious or compromised.

The 3f+1 Rule

The fundamental threshold for reaching agreement in an untrusted environment.