tags : System Design, Systems, Inter Process Communication, Database, Concurrency, CALM, Clocks

Theory

Predicate Logic

See Logic

  • safety properties (next, stable, invariant)
  • liveness properties (transient, ensures, leads-to, variant functions)

Formal tooling

See Formal Methods

What/Intro

A “distributed system” is simply a system that splits a problem across multiple machines, solving it in a way that is better, more efficient, or even possible at all, compared to a single machine.

Use of the distribution

Making use of the distribution that a distributed system provides.

FAQ

What is Eventual Consistency?

See Eventual Consistency

How to Design Distributed Systems?

  • A correct distributed program achieves a (nontrivial) distributed property X
  • Some tricky questions before we start coding:
    • Is X even attainable?
    • What is the cheapest protocol that gets me X?
    • Can I use something existing, or do I have to invent a new one?
    • How should I implement it?
  • We depend on the different entities participating in the distributed system having some local knowledge

Thinking frameworks

Meta

  • CAP and PACELC help us know what to be aware of when designing a system, but they’re less helpful in actually designing it.
  • CALM provides a way to reason through the behavior of systems under Eventual Consistency.

CAP

  • Systems can provide only 2 of the 3, e.g. abandon (C)onsistency but maintain (A)vailability and (P)artition tolerance.
  • When talking about CAP we have to make a difference between
    • The original CAP Conjecture
    • The subsequent CAP Theorem

The pillars

  • Consistency
    • Requires a total ordering of events in which each operation looks as if it completed instantaneously.
    • Linearizability is what the CAP Theorem calls Consistency. (See Concurrency Consistency Models; a toy checker is sketched right after this list.)
  • Availability
    • All requests eventually return a successful response.
    • There’s no bound on how long a process can wait.
    • Brewer: If a given consumer of the data can always reach some replica
    • Gilbert & Lynch: If a given consumer of the data can always reach any replica
  • Partition Tolerance
    • A partition means some subset of nodes in the system is incapable of communicating with another subset of nodes.
    • All messages between the two sides are lost.
    • Partition Tolerance is the ability to keep operating in this situation.
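
A brute-force way to make “linearizable” concrete: a history is linearizable if some total order of its operations respects real-time order and is a legal sequential history. A minimal Python sketch for a single register at toy scale (the `(invocation, response, kind, value)` encoding and all names are illustrative assumptions, not from any real checker):

```python
from itertools import permutations

# Brute-force linearizability check for a single register, at toy scale.
# An operation is (invocation_time, response_time, kind, value), with
# kind "w" (write) or "r" (read). A history is linearizable if some total
# order of its operations (1) respects real-time order -- if op a responded
# before op b was invoked, a must come first -- and (2) is a legal
# sequential register history: every read returns the latest write.

def legal(seq):
    current = None
    for _inv, _res, kind, value in seq:
        if kind == "w":
            current = value
        elif value != current:   # a read must see the latest write
            return False
    return True

def respects_real_time(seq):
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            if b[1] < a[0]:      # b finished before a even started
                return False
    return True

def linearizable(history):
    return any(respects_real_time(p) and legal(p)
               for p in permutations(history))

# w(1) completes; then w(2) runs concurrently with a read returning 2:
# the read can be ordered after w(2), so this is linearizable.
h1 = [(0, 1, "w", 1), (2, 5, "w", 2), (3, 4, "r", 2)]
# Here the read of 2 finishes before w(2) is even invoked: no valid order.
h2 = [(0, 1, "w", 1), (2, 3, "r", 2), (4, 5, "w", 2)]
print(linearizable(h1), linearizable(h2))   # True False
```

Real checkers (e.g. Knossos, Porcupine) use much smarter search; permutation checking like this is only viable for tiny histories.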

Case of “total availability”

  • Total availability is not something that most datacenter applications care about.
  • Mobile, IoT, and other frequently-disconnected applications do care about it.

Real World

  • CA

    • Partitions (P) WILL ALWAYS BE THERE.
    • You can’t really be CA. Even though CAP says we can only choose 2, in the real world we can’t ignore partition tolerance, because the network can always fail.
    • So, since P is not optional in reality, systems trade off between C and A, i.e. improve Consistency but weaken Availability, and vice versa.

PACELC

  • If Partitioned: trade off (Availability, Consistency); Else: trade off (Latency, Consistency)
  • PACELC is a more nuanced framework than CAP for thinking about this stuff
  • Definition
    • If there is a partition (P)
      • How does the system trade off availability and consistency (A and C)?
    • Else (E), when the system is running normally in the absence of partitions
      • How does the system trade off latency (L) and consistency (C)?
  • CAP does not really make us choose C or A outright; it only describes the tradeoff under partitions
  • PACELC adds that, even with no partition, we choose between Consistency and Latency
  • Eg. Cassandra is a PA/EL system: it favors Availability under partitions and Latency otherwise.

CALM

Hellerstein’s inequality

  • CALM = Consistency As Logical Monotonicity: a program has a consistent, coordination-free distributed implementation if and only if it is monotonic (sketched below).
  • Useful when the SQL queries stay the same but the backend keeps getting fine-tuned (adding indexes and so on). This way of developing applications is even more useful in cloud applications, where the environment changes all the time.
  • Distributed language requirements
    • Something unthinkable: need to represent time-varying state
    • Something unpredictable: need to represent uncertainty (i.e. nondeterminism in ordering and failure)
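
A minimal sketch of the monotone/non-monotone distinction, assuming nothing beyond the standard library: a monotone query (set union) converges regardless of delivery order or duplication, while a non-monotone query (absence) can be invalidated by later input and so needs coordination:

```python
from functools import reduce

# Monotone query: "which items have been seen so far?" Union only grows,
# and it is associative, commutative, and idempotent, so replicas that
# receive the same updates in any order (even with duplicates) agree
# without any coordination.
updates = [{"a"}, {"b"}, {"a", "c"}]
replica1 = reduce(set.union, updates, set())
replica2 = reduce(set.union, list(reversed(updates)) + [{"a"}], set())
assert replica1 == replica2 == {"a", "b", "c"}

# Non-monotone query: "is 'd' absent?" An early answer may have to be
# retracted when more input arrives, so answering safely requires
# coordination (e.g. knowing the input is complete).
partial = updates[0]
print("d" not in partial)            # True -- but only provisionally
print("d" not in (partial | {"d"}))  # False once a late update lands
```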

Impossibility Results

We can check the impossibility results for each of the different dist sys problems. Eg. the CAP theorem considers the coordinated attack model for the atomic storage problem and shows that, with arbitrarily unreliable channels, you cannot solve the atomic storage problem either.

Results

coordinated attack

  • Two Generals’ Problem - Wikipedia
  • The coordinated attack result says that if the communication channels can drop messages, you cannot solve distributed consensus with a deterministic protocol in a finite number of rounds (proof idea sketched below).
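
The standard induction argument behind this result, paraphrased as a sketch (not quoted from any one source):

```latex
\textbf{Sketch.} Suppose a deterministic protocol always reaches agreement
using at most $n$ messages. Take an execution $E$ that uses exactly $n$
messages and drop the last one to get $E'$. The sender of that last message
observes the same history in $E$ and $E'$, so it decides the same way, and
agreement forces the other general to match. The $n$-th message was
therefore unnecessary, so an $(n-1)$-message protocol exists. By induction,
a $0$-message protocol exists: the generals agree without communicating at
all, which is impossible when the decision must depend on both inputs.
```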

FLP impossibility results

  • Paper summary: Perspectives on the CAP theorem
  • Assumes reliable channels, or channels that are eventually reliable for a sufficiently long period.
  • Under an asynchronous model, you cannot solve distributed consensus using a deterministic protocol in finite rounds, in the presence of a single crash failure.

Solutions to impossibility results

These allow us to circumvent (not beat) the impossibility results:

consensus

fault-tolerance

Time and Clocks

See Clocks

Problems in Distributed Systems

Synchronization

See Database Transactions, see Synchronization

2PC

  • See Two Phase Locking (2PL) & Two Phase Commit (2PC)
  • The leader writes a durable transaction record indicating a cross-shard transaction.
  • Participants write a durable record of their willingness to commit and notify the leader.
  • The leader commits the transaction by updating the durable transaction record after receiving all responses.
    • It aborts the transaction if any participant fails to respond or votes no.
  • Participants can expose the new state after the leader announces the commit decision.
    • They delete the staged state if the leader aborts the transaction.
  • The whole flow is sketched in code right after this list.
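
A minimal in-memory sketch of this flow. The names (`Coordinator`, `Participant`, `prepare`/`commit`/`abort`) are illustrative, and durability, retries, and coordinator failure are deliberately elided:

```python
# Toy sketch of Two Phase Commit (2PC): prepare (vote), then commit/abort.

class Participant:
    def __init__(self, name):
        self.name = name
        self.staged = None   # stands in for the durable "willing to commit" record
        self.state = {}      # state visible to readers

    def prepare(self, txn):
        self.staged = dict(txn)   # stage the writes, record willingness
        return True               # vote yes; a real participant may vote no

    def commit(self):
        self.state.update(self.staged)   # expose the new state
        self.staged = None

    def abort(self):
        self.staged = None   # delete the staged state

class Coordinator:
    def run(self, participants, txn):
        # Phase 1: write a durable transaction record (elided), collect votes.
        votes = [p.prepare(txn) for p in participants]
        # Phase 2: decide, update the durable record (elided), notify everyone.
        if all(votes):
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.abort()
        return "aborted"

shards = [Participant("shard-a"), Participant("shard-b")]
print(Coordinator().run(shards, {"x": 1}))   # committed
print(shards[0].state)                       # {'x': 1}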

Real time sync

atomic storage problem

  • This is the problem the CAP theorem is framed around (see Impossibility Results above)

Replication

Consensus

See Consensus Protocols

Distributed Snapshots

Distributed snapshots try to do as little work as possible to get a consistent view of the distributed computation, without forcing the heavy cost of consensus onto it. For example, if node A is sending a message to node B, we don’t care whether we capture:

  • 1: A before it sends the message, B before it receives the message
  • 2: A after it has sent the message, the message, and B before it receives the message
  • 3: A after it has sent the message, B after it has received the message

No matter which of those states we restore, the computation will continue correctly.
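
A toy sketch of why any of those three capture points works, assuming a single message m and an illustrative snapshot encoding (sent, in_flight, received):

```python
# Resume the computation from a captured snapshot of (A, channel, B)
# and show that all three legitimate capture points converge to the
# same final state. Names and encoding are illustrative.

def run(snapshot):
    sent, in_flight, received = snapshot
    if not sent:                        # A still has to send m
        in_flight = in_flight + ("m",)
        sent = True
    if in_flight:                       # B drains the channel
        received = received + in_flight
        in_flight = ()
    return (sent, in_flight, received)

snapshots = [
    (False, (), ()),     # 1: A before send, channel empty, B before receive
    (True, ("m",), ()),  # 2: A after send, m in flight, B before receive
    (True, (), ("m",)),  # 3: A after send, channel empty, B after receive
]
assert len({run(s) for s in snapshots}) == 1   # all resume identically
print(run(snapshots[0]))                        # (True, (), ('m',))
```

What a snapshot protocol must never capture is the inconsistent cut (False, (), ("m",)): B has received m while A’s state says it was never sent.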

Distributed Locks

Resources

Service Discovery and Peer to Peer communication

Notes from LK class

Class 1

  • Dist sys
    • Running on several nodes, connected by a network
    • Characterized by partial failures
    • System with partial failures + unbounded latency
  • Partial Failures
    • Eg. Cloud computing vs HPC
      • Cloud: Work around partial failures
      • HPC: Treat partial failures as total failure. Uses check-pointing.
  • Byzantine Faults
    • If you send a request to another node and don’t receive a response, it’s impossible to know why (without a global knowledge system).
    • Byzantine faults can be seen as a subset of partial failures
    • Eg.
      • There could be issues in M1 -> M2
        • Network issues
        • Queue congestion
        • M1 breaking
      • There could be issues in M2 -> M1
        • M2 Lying
        • Cosmic rays
    • Solutions (none of these address all kinds of uncertainty)
      • Timeouts and Retries
        • The issue with timeouts is that they don’t work well when the message causes a side effect, e.g. incrementing some counter (see the sketch after this list)
      • Predict max delay
        • Eg. for M1 -> M2, time out after 2d + r (if d is the one-way delay between M1 and M2 and r is M2’s processing time); with d = 10 ms and r = 5 ms, that’s 2(10) + 5 = 25 ms
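
A toy sketch of the side-effect problem, assuming an illustrative server with an increment operation: a lost reply makes the client time out and retry, double-applying the increment. Deduplicating by request id (an idempotency key) is one standard fix:

```python
# Why timeout+retry is unsafe for non-idempotent requests, and how an
# idempotency key helps. All names here are illustrative.

class Server:
    def __init__(self):
        self.counter = 0
        self.seen = set()

    def increment(self, request_id=None):
        if request_id is not None:
            if request_id in self.seen:   # duplicate request: don't re-apply
                return self.counter
            self.seen.add(request_id)
        self.counter += 1
        return self.counter

def call_with_retry(server, request_id=None, reply_lost_once=True):
    lost = reply_lost_once
    while True:
        result = server.increment(request_id)  # side effect happens here
        if lost:
            lost = False   # the reply was dropped -> client times out...
            continue       # ...and retries the same request
        return result

naive = Server()
call_with_retry(naive)                  # retried increment double-counts
print(naive.counter)                    # 2  (the client asked for one!)

dedup = Server()
call_with_retry(dedup, request_id="req-1")
print(dedup.counter)                    # 1  (idempotency key dedupes)
```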