tags : Representing Time and Date, Distributed Systems, crdt, Consensus Protocols
Overview
- Time & State: https://queue.acm.org/detail.cfm?id=2745385 (There is No Now)
- Logical clocks, vector clocks, hybrid logical clocks, snapshots
- They form the basis of
- crdt
- version vectors in NoSQL databases
- snapshot reads and commits in distributed SQL databases build upon.
- They form the basis of
FAQ
About state and Time
To read
- “relativity means there is no such thing as simultaneity” used as an argument that synchronized clocks cannot exist.
- This is a misunderstanding
- the equations of special and general relativity provide exact equations for time transformations, and it is possible to define any number of sensible, globally-synchronized time clocks.
- Consistency constraints that refer to these clocks will depend on the choice of clock
- e.g. depending on one’s reference frame, a system might or might not provide linearizability. See Concurrency Consistency Models
- The good news is that
- For all intents and purposes, clocks on earth are so close to each other’s velocities and accelerations that errors are much smaller than side-channel latencies
- Many of the algorithms for ensuring real-time bounds in asynchronous networks use causal messages to enforce a real-time order, and the resulting order is therefore invariant across all reference frames.
- This is a misunderstanding
- GPS clocks are changing databases (again) - YouTube
Monotonic?
Atomic clocks?
Lamport Clocks (Total order)
Vector Clocks (Partial order)
Timestamps and Databases
- I can’t find the original post now, but there’s a good suggestion that booleans should usually be timestamps
- UPDATE orders SET deletion_timestamp = CURRENT_TIMESTAMP() WHERE deletion_timestamp IS NULL
- See Database Locks
- See Bug story: Sorting by timestamp | Hacker News
TODO Learning Syllabus
- Time: Abstract
- Clocks: Implementation
Part 1: Time - Abstract Concept in Distributed Systems
Fundamental Time Concepts
- Time as a partial ordering of events in distributed systems
- Formal definition of concurrent events: events that are neither “happened before” nor “happened after” each other
- Synchrony models and their precise definitions:
- Synchronous: bounded message delay and bounded process execution speed
- Asynchronous: unbounded message delay and/or process execution speed
- Partially synchronous: eventually bounded message delay and process execution speed
- The impossibility of perfect global time in distributed systems (theoretical proof)
Causality and Ordering
- Lamport’s happens-before relation (→): formal definition and properties
- If a and b are events in the same process, and a occurs before b, then a → b
- If a is the sending of a message and b is the receipt of that message, then a → b
- If a → b and b → c, then a → c (transitivity)
- Concurrent events: a ∥ b ⟺ ¬(a → b) ∧ ¬(b → a)
- Causal history: C(e) = {e’ | e’ → e} ∪ {e}
- Formal proof that causality establishes a partial, not total, order
Consistency Models Based on Time Ordering
- Linearizability: formal definition as a history H that can be extended to a history H’ such that:
- Complete(H’) is equivalent to some legal sequential execution S
- If op1 completes before op2 begins in H, then op1 appears before op2 in S
- Sequential consistency: formal definition and distinction from linearizability
- Causal consistency: formal definition using happens-before relation
- PRAM consistency (Pipelined RAM): formal definition
- Eventual consistency: formal definition including convergence, conflict resolution
Time in Coordination Problems
- Time’s role in the Two Generals Problem (formal impossibility proof)
- Time assumptions in the Byzantine Generals Problem
- Time’s role in distributed deadlock detection
- Formal treatment of timeouts as imperfect failure detectors
- Time and leader election (impossibility in purely asynchronous systems)
The FLP Impossibility Result
- Precise statement of the FLP result: In an asynchronous system with even one faulty process, there is no deterministic algorithm that solves consensus
- Detailed analysis of time’s role in this impossibility
- Implications for real-world distributed system design
- Connection to other impossibility results (CAP theorem, Two Generals)
Part 2: Clocks - Mechanisms for Tracking Time
Physical Clock Systems
- Oscillator-based physical clocks: crystal oscillators, atomic clocks
- Precise definitions of clock terminology:
- Clock drift: deviation of a clock from perfect time
- Clock skew: difference between two clocks at a given moment
- Time-scale offset: constant difference between two clocks
- Frequency offset: difference in the rate of clock advancement
- Hardware clocks:
- Timer interrupts and their limitations
- High-precision timers (HPET)
- Time Stamp Counter (TSC) in modern CPUs
- Temperature and aging effects on physical clocks (quantitative analysis)
- Formal specification of clock synchronization problem:
- External synchronization: alignment with an external standard
- Internal synchronization: alignment among system clocks
Clock Synchronization Protocols
- Network Time Protocol (NTP):
- Stratum hierarchy and algorithm details
- Error estimation techniques
- Security vulnerabilities
- Precision Time Protocol (PTP/IEEE 1588):
- Hardware vs. software timestamping
- Best master clock algorithm
- Boundary and transparent clocks
- Marzullo’s algorithm for combining multiple time sources
- Optimal clock synchronization algorithms under byzantine conditions
- Cristian’s algorithm: mathematical formulation and error bounds
- Berkeley algorithm: detailed protocol steps and coordinator election
- Reference broadcast synchronization (RBS) for wireless networks
Logical Clocks
- Lamport clocks:
- Formal definition: C(a) < C(b) if a → b
- Clock update rules:
- Before executing an event, process Pi increments Ci: Ci = Ci + 1
- When sending a message m, process Pi sets m’s timestamp ts(m) = Ci
- Upon receiving message m, process Pj sets Cj = max(Cj, ts(m)) + 1
- Limitations: possibility of C(a) < C(b) when a ∥ b
- Lamport timestamps as an implementation of monotonic time
- Mathematical proof that Lamport clocks capture causal precedence but not concurrency
Vector Clocks
- Vector clock formal definition: VC(e) = [t₁, t₂, …, tₙ] where tᵢ represents local logical time at process i
- Vector clock rules:
- Initially, VC[j] = 0 for all j
- Before each event at process i: VC[i] = VC[i] + 1
- When sending a message m, process i includes current VC with m
- When process j receives message m from process i with vector timestamp VC_m:
- VC[k] = max(VC[k], VC_m[k]) for all k ≠ j
- VC[j] = VC[j] + 1
- Comparison of vector timestamps: VC(a) < VC(b) iff ∀k: VC(a)[k] ≤ VC(b)[k] and ∃k: VC(a)[k] < VC(b)[k]
- Mathematical proof that vector clocks accurately capture both happens-before and concurrent relationships
- Space and message overhead: O(n) where n is the number of processes
- Optimizations: compression techniques, relevant subset tracking
Advanced Logical Clock Systems
- Version vectors:
- Distinction from vector clocks: only track object dependencies, not process events
- Update rules for read/write operations
- Conflict detection algorithm
- Dotted version vectors: extension for tracking causal history in dynamo-style systems
- Interval tree clocks: O(log n) space complexity
- Matrix clocks: tracking transitive dependencies
- Plausible clocks: probabilistic approaches to causality tracking
- Hybrid logical clocks (HLC): combining physical and logical time
- Formal definition: HLC(e) = [l.e, c.e] where l.e is the logical component and c.e is the physical component
- Update rules integrating physical clock readings
- Error bounds and formal guarantees
Modern Clock Systems in Industry
- Google’s TrueTime API:
- Interval-based time representation: [earliest, latest]
- GPS and atomic clock integration
- Error bound guarantees and wait-out-uncertainty approach
- Amazon Time Sync Service:
- Architecture and failover mechanisms
- Integration with AWS services
- CockroachDB’s hybrid clock implementation:
- Integration with transaction ordering
- Bounded uncertainty handling
- Spanner’s implementation of external consistency using TrueTime
- Facebook’s time synchronization architecture for datacenters
Part 3: Applications and Advanced Topics
Distributed Snapshots
- Chandy-Lamport snapshot algorithm:
- Formal definition of a consistent global state
- Marker sending rules
- State and message recording rules
- Proof of correctness
- Mattern’s algorithm using vector clocks
- Lai-Yang algorithm for non-FIFO channels
- Alagar-Venkatesan algorithm using logical time
- Snapshot for termination detection
Time in Consensus Protocols
- Paxos and time:
- Unbounded delays vs. liveness
- Round numbers as logical time
- Multi-Paxos and leader leases
- Raft:
- Leader election timeouts
- Term numbers as logical clock values
- Heartbeats and time assumptions
- Viewstamped Replication:
- View changes and logical time
- Recovery protocol time assumptions
- Byzantine Fault Tolerance protocols:
- PBFT view changes and timeout mechanisms
- Honey Badger BFT’s asynchronous approach
Implementing Causal Broadcast
- Schiper-Eggli-Sandoz (SES) causal broadcast protocol using vector clocks
- Birman-Schiper-Stephenson protocol
- Optimized causal broadcast using dependency tracking
- Implementation challenges and optimizations
- Formal verification of causal broadcast protocols
Time in Distributed Databases
- Timestamp-based concurrency control:
- Thomas write rule: discard writes with older timestamps
- Multiversion concurrency control (MVCC) using timestamps
- Transaction ordering using hybrid clocks
- Causal consistency implementation using vector clocks
- Last-writer-wins conflict resolution with vector clocks
- Read-your-writes consistency implementation
Time and Failure Detection
- Unreliable failure detectors: completeness and accuracy properties
- ◊S (Eventually Strong) failure detector
- Heartbeat-based failure detection with adaptive timeouts
- Phi Accrual failure detector: probabilistic approach
- Gossip-based failure detection
- The relationship between failure detection and consensus
Advanced Research Topics
- Quantum clock synchronization algorithms
- Self-stabilizing clock synchronization
- Non-blocking atomic commit protocols based on logical time
- Time in large-scale geo-distributed systems
- Machine learning approaches to adaptive timeout calculation
- Time in Byzantine environments with adversarial nodes
Practical Projects and Implementation Exercises
- Implement and visualize vector clocks in a distributed simulator
- Measure clock drift and skew in a real multi-node system
- Build a causally consistent distributed key-value store
- Implement the Chandy-Lamport distributed snapshot algorithm
- Create a logical time-based debugging tool for distributed systems
- Develop a custom clock synchronization protocol for a specific network topology
- Implement a Byzantine-tolerant clock synchronization system
- Build a consensus protocol with configurable time assumptions