tags : Database, Concurrency, Concurrency Consistency Models, Distributed Systems

A transaction may contain arbitrary code, but it is fundamentally single-threaded.

FAQ

Are SELECT statements also transactions?

  • Yes. Since โ€œReadโ€ is subject to Isolation level, it needs to be a transaction. Eg. What is there that it can see depends on isolation level.

Commits and Rollbacks

  • Commit
    • When: If a transaction completes
    • Change is made permanent in the DB
  • Rollback
    • When: If a transaction aborts/fails
    • Undo all changes by the transaction

Distributed transaction vs normal DB transaction

  • Transaction Protocol = Concurrency Control(CC) + Atomic Commit Protocol
  • Distributed Transaction Protocol = Transaction Protocol + Replication Protocol

Normal DB transaction

  • Single DB system
  • Set of operations that we want to perform on our data
  • Typically, a transaction is an all-or-nothing affair
  • ACID properties can be implemented directly

Distributed transaction

  • Multiple systems/DB systems interacting. i.e Commit happens to more than one piece of hardware.
  • ACID properties over the network(handle network partitions)
  • Involves more coordination, messaging, logging etc. than normal transactions
  • Needs mechanisms like Two Phase Commit (2PC)

Normal Database Transactions

This is done differently for different DBs, following is PostgreSQLโ€™s approach

How to do transaction?

  • Start: BEGIN or START TRANSACTION
  • End: COMMIT or ROLLBACK
  • i.e You canโ€™t have multiple COMMIT in a transaction, COMMIT / ROLLBACK marks the end of the transaction. (Nested transactions behave little differently)
  • You can have multiple ROLLBACK if you use SAVEPOINT
  • Sometimes they can ABORT
  • SQL statements outside of explicit transactions(BEGIN/COMMIT) automatically use single-statement transactions.

Lifecycle & Ordering

  • T1 (started first) and T2 (started later) both are RW transaction
  • T1 does a lot of reads and then writes
  • T2 writes and then does some reads
  • In this case, even if T1 started before, it does not guarantee that writes from T1 would go in first.

Distributed Transactions

Approaches

Two Phase Commit

Active-passive replicas

Multi-active distributed transactions

  • This is the case where Sharding is handled for us by the database (Eg. CockroachDB)
  • This is similar to Two Phase Commit (2PC) + Some Replication protocol

Doing it in PostgreSQL directly

Long running transactions (Saga)