tags : Two Phase Locking (2PL) & Two Phase Commit (2PC), Distributed Systems, Concurrency Consistency Models, Data Replication, Eventual Consistency, Zookeeper, Data Replication
OG Blogposts - https://medium.com/@adamprout/categorizing-how-distributed-databases-utilize-consensus-algorithms-492c8ff9e916 đ - Consensus Resources - Durability and the Art of Consensus by Joran Dirk Greef - YouTube
âmaking sure participants come to the same conclusion about something and nobody has the wrong answerâ
FAQ
Partial Quorum
Consensus is NOT for Scaling
- consensus is not for scaling
- consensus impedes scaling
- An intuition for distributed consensus in OLTP systems | notes.eatonphil.com
- Consensus protocols like RAFT primarily solve high availability with strong consistency guarantees (linearizability), not horizontal scaling. Their main purpose is to ensure that a system remains available and consistent even when some nodes fail.
- Consensus actually impedes scaling due to several factors:
- The requirement for majority agreement creates coordination overhead
- Adding more nodes to a consensus group increases the likelihood of slow network responses
- The centralized leadership model (in protocols like RAFT) creates a bottleneck for writes
- True horizontal scaling requires sharding, where:
- Data is partitioned across multiple independent consensus groups
- Each shard has its own consensus cluster for that subset of data
- This allows the system to distribute load while maintaining consistency within each shard
Systems make different trade-offs:
- Systems like etcd, Consul focus on strong consistency via a single consensus group but donât scale horizontally for writes
- Some systems support âhorizontal scalingâ for reads by allowing stale (non-linearizable) reads
- Truly horizontally scalable systems like Cockroach, Spanner use consensus within shards but distribute data across many shards
The fundamental tension is clear: consensus
provides the critical property of linearizable consistency
withhigh availability
, but this comes at the cost of scalability. System designers must carefully consider their requirements when choosing between strong consistency and horizontal scalability.
Approaches
Paxos
Variants
-
Replicated State Machine (RSM)
- chain replication type advanced atomic storage protocols
Raft
See Raft
Viewstamped Replication Protocol
VR vs Raft
- Viewstamped Replication relies on Message Passing, while RAFT relies on RPC
- âVSR is also described in terms of message passing, whereas Raft took VSRâs original message passing and coupled it to RPCâshutting out things like multipath routing and leaving the logical networking protocol misaligned to the underlying physical network fault model.â - Joran
- Comparing the 2012 VSR and 2014 Raft papers, they are remarkably similar.
- VSR better in prod than raft (opinion)
- Itâs all the little things. All the quality, clear thinking and crisp terminology coming from Liskov, Oki and Cowling.
- Okiâs VSR was literally the first to pioneer consensus in â88, so itâs well aged, and the â12 revision again by Liskov and Cowling is a great vintage!
Resources
- Paper: VR Revisited - An analysis with TLA+ â Jack Vanlightly
- Viewstamped Replication explained
- https://github.com/tigerbeetle/viewstamped-replication-made-famous
- Implementing Viewstamped Replication protocol â Distributed Computing Musings
- Paper #74. Viewstamped Replication Revisited - YouTube