tags : Two Phase Locking (2PL) & Two Phase Commit (2PC), Distributed Systems, Concurrency Consistency Models, Data Replication, Eventual Consistency, Zookeeper

OG Blogposts:

  • Categorizing How Distributed Databases Utilize Consensus Algorithms - https://medium.com/@adamprout/categorizing-how-distributed-databases-utilize-consensus-algorithms-492c8ff9e916 🌟
  • Consensus Resources
  • Durability and the Art of Consensus by Joran Dirk Greef - YouTube

“making sure participants come to the same conclusion about something and nobody has the wrong answer”

FAQ

Partial Quorum

Consensus is NOT for Scaling

  • Consensus protocols like Raft primarily solve high availability with strong consistency guarantees (linearizability), not horizontal scaling. Their main purpose is to ensure that a system remains available and consistent even when some nodes fail.
  • Consensus actually impedes scaling due to several factors:
    • The requirement for majority agreement creates coordination overhead
    • Adding more nodes to a consensus group increases the likelihood of slow network responses
    • The single-leader model (in protocols like Raft) funnels every write through one node, creating a bottleneck
  • True horizontal scaling requires sharding (see the sketch after this list), where:
    • Data is partitioned across multiple independent consensus groups
    • Each shard has its own consensus cluster for that subset of data
    • This allows the system to distribute load while maintaining consistency within each shard
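
A minimal sketch of the sharding idea above, using hypothetical types rather than any real database's API: the keyspace is partitioned across independent shards, each shard runs its own consensus group, and a write coordinates only with the majority of its own shard rather than the whole cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// RaftGroup stands in for one consensus cluster (leader + followers).
// A real implementation would replicate each entry to a majority before committing.
type RaftGroup struct {
	ID  int
	log []string
}

// Propose appends a command through this shard's consensus group only.
func (g *RaftGroup) Propose(cmd string) {
	g.log = append(g.log, cmd) // placeholder for replicate-to-majority + commit
}

// Cluster partitions the keyspace across independent consensus groups.
type Cluster struct {
	shards []*RaftGroup
}

// shardFor routes a key to a shard by hash; range-based routing is equally common.
func (c *Cluster) shardFor(key string) *RaftGroup {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[int(h.Sum32())%len(c.shards)]
}

// Write goes through consensus only within the owning shard.
func (c *Cluster) Write(key, value string) {
	c.shardFor(key).Propose(fmt.Sprintf("SET %s=%s", key, value))
}

func main() {
	c := &Cluster{shards: []*RaftGroup{{ID: 0}, {ID: 1}, {ID: 2}}}
	c.Write("user:42", "alice") // coordinates with one shard's majority, not the whole cluster
	c.Write("user:99", "bob")
	for _, g := range c.shards {
		fmt.Println("shard", g.ID, g.log)
	}
}
```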

Systems make different trade-offs:

  • Systems like etcd and Consul focus on strong consistency via a single consensus group but don’t scale horizontally for writes
  • Some systems support “horizontal scaling” for reads by allowing stale (non-linearizable) reads (see the etcd example after this list)
  • Truly horizontally scalable systems like CockroachDB and Spanner use consensus within each shard but distribute data across many shards
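
As one concrete illustration of the read-side trade-off, etcd's Go client (go.etcd.io/etcd/client/v3) exposes both options: the default Get is linearizable and goes through the leader/quorum, while WithSerializable() allows a possibly stale read answered from whichever member the client is talking to. The endpoint and key below are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Linearizable read: served through the consensus protocol, so it reflects
	// every committed write, but pays a round of coordination.
	strong, err := cli.Get(ctx, "config/feature-flag")
	if err != nil {
		log.Fatal(err)
	}

	// Serializable read: answered from the local member's state; it scales with
	// the number of members but may lag behind the latest committed write.
	stale, err := cli.Get(ctx, "config/feature-flag", clientv3.WithSerializable())
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("linearizable revision:", strong.Header.Revision)
	fmt.Println("serializable revision:", stale.Header.Revision)
}
```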

The fundamental tension is clear: consensus provides the critical property of linearizable consistency with high availability, but this comes at the cost of scalability. System designers must carefully consider their requirements when choosing between strong consistency and horizontal scalability.

Approaches

Paxos

Variants

  • Chain Replication and similar advanced atomic storage protocols

Raft

See Raft

Viewstamped Replication Protocol

VR vs Raft

  • Viewstamped Replication relies on message passing, while Raft relies on RPCs (see the sketch after this list)
  • “VSR is also described in terms of message passing, whereas Raft took VSR’s original message passing and coupled it to RPC—shutting out things like multipath routing and leaving the logical networking protocol misaligned to the underlying physical network fault model.” - Joran
  • Comparing the 2012 VSR and 2014 Raft papers, they are remarkably similar.
  • VSR is arguably better in production than Raft (opinion)
    • It’s all the little things. All the quality, clear thinking and crisp terminology coming from Liskov, Oki and Cowling.
    • Oki’s VSR was literally the first to pioneer consensus in ‘88, so it’s well aged, and the ‘12 revision again by Liskov and Cowling is a great vintage!
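
A toy contrast of the two interaction styles Joran describes, using hypothetical types rather than the actual VSR or Raft wire formats: in the message-passing style the reply is just another message that can arrive by any path, while in the RPC style the request and response are coupled into a single round trip.

```go
package main

import "fmt"

// --- VSR style: one-way messages; the reply is a separate message ---

type Prepare struct{ View, Op int }
type PrepareOK struct{ View, Op, Replica int }

type Replica struct {
	id int
}

// OnPrepare handles a Prepare and emits a PrepareOK as a separate message,
// which the transport is free to route however it likes (e.g. multipath).
func (r *Replica) OnPrepare(p Prepare, to chan<- any) {
	to <- PrepareOK{View: p.View, Op: p.Op, Replica: r.id}
}

// --- Raft style: coupled request/response RPC ---

type AppendEntriesArgs struct{ Term, LeaderID int }
type AppendEntriesReply struct {
	Term    int
	Success bool
}

// AppendEntries returns its reply on the same call, binding the exchange
// to a single request/response round trip.
func AppendEntries(args AppendEntriesArgs) AppendEntriesReply {
	return AppendEntriesReply{Term: args.Term, Success: true}
}

func main() {
	// Message passing: send, then consume the reply whenever it arrives.
	leaderInbox := make(chan any, 1)
	r := &Replica{id: 1}
	r.OnPrepare(Prepare{View: 3, Op: 7}, leaderInbox)
	fmt.Printf("VSR-style reply: %+v\n", <-leaderInbox)

	// RPC: the caller blocks on the paired response.
	fmt.Printf("Raft-style reply: %+v\n", AppendEntries(AppendEntriesArgs{Term: 3, LeaderID: 1}))
}
```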

Resources

Election