tags : Database, Concurrency, Concurrency Consistency Models, Distributed Systems
A transaction may contain arbitrary code, but it is fundamentally single-threaded.
FAQ
Are SELECT
statements also transactions?
- Yes. Since โReadโ is subject to Isolation level, it needs to be a transaction. Eg. What is there that it can see depends on isolation level.
Commits and Rollbacks
- Commit
- When: If a transaction completes
- Change is made permanent in the DB
- Rollback
- When: If a transaction aborts/fails
- Undo all changes by the transaction
Distributed transaction vs normal DB transaction
Transaction Protocol = Concurrency Control(CC) + Atomic Commit Protocol
Distributed Transaction Protocol = Transaction Protocol + Replication Protocol
Normal DB transaction
- Single DB system
- Set of operations that we want to perform on our data
- Typically, a transaction is an all-or-nothing affair
- ACID properties can be implemented directly
Distributed transaction
- Multiple systems/DB systems interacting. i.e Commit happens to more than one piece of hardware.
- ACID properties over the network(handle network partitions)
- Involves more coordination, messaging, logging etc. than normal transactions
- Needs mechanisms like Two Phase Commit (2PC)
Normal Database Transactions
This is done differently for different DBs, following is PostgreSQLโs approach
How to do transaction?
- Start:
BEGIN
orSTART TRANSACTION
- End:
COMMIT
orROLLBACK
- i.e You canโt have multiple
COMMIT
in a transaction,COMMIT
/ROLLBACK
marks the end of the transaction. (Nested transactions behave little differently) - You can have multiple
ROLLBACK
if you useSAVEPOINT
- Sometimes they can
ABORT
- SQL statements outside of explicit transactions(BEGIN/COMMIT) automatically use single-statement transactions.
Lifecycle & Ordering
T1
(started first) andT2
(started later) both are RW transactionT1
does a lot of reads and then writesT2
writes and then does some reads- In this case, even if
T1
started before, it does not guarantee that writes fromT1
would go in first.
Distributed Transactions
- See Distributed Systems, Data Replication
- See Using Atomic Transactions to Power an Idempotent API
- See CAP
- This is
Atomicity v/s Partial Failures
Transaction Protocol = Concurrency Control(CC) + Atomic Commit Protocol
Distributed Transaction Protocol = Transaction Protocol + Replication Protocol
Approaches
Two Phase Commit
- Two Phase Commit (2PC)
- We need to use this when we have physically Sharded Databases
Active-passive replicas
Multi-active distributed transactions
- This is the case where Sharding is handled for us by the database (Eg. CockroachDB)
- This is similar to Two Phase Commit (2PC) + Some Replication protocol
Doing it in PostgreSQL directly
- How to process distributed transaction within postgresql? - Stack Overflow
- Distributed transaction on PostgreSQL - Stack Overflow
- Atomic Commit of Distributed Transactions - PostgreSQL wiki
Long running transactions (Saga)
- See Microservices
- It operates differently comparerd to ACID but achieves similar things at business levels applied correctly
- Long-running transaction - Wikipedia
- Pattern: Saga