tags : Distributed Systems, Database Locks, Database Transactions, Concurrency Consistency Models
FAQ
2PC vs 2PL
2PL
- Solves for Isolation
- Locking mechanism implemented within a single database instance to achieve
Strict Serializability
2PC
- Solves for atomicity
- 2 phase commit is an algorithm for implementing distributed transaction(Database Transactions) across multiple database instances to ensure all nodes either commit or abort the transaction.
Paxos/Raft/Consensus/Replication Protocols vs 2PC
- Raft (Availability)
- Getting high availability by replicating data on multiple participants
- Keep being operational even if some of the participants have crashed
- All the participants are doing the same thing (Data Replication)
- 2PC (Atomicity)
- All the participants are doing the different thing specific to the participant
- There’s no notion of some, all the participants must be involved
- It has pretty low availability, any participant crashing will lead to the whole thing not working.
Combing both
By combing Raft + 2PC, we get both Availability+Atomicity
Eventual Consistency vs 2PC
Sagas vs 2PC
- cloud - 2PC vs Sagas (distributed transactions) - Stack Overflow
- SAGA vs 2PC: An Exhaustive Exploration of Distributed Transaction Protocols |…
crdt vs 2PC
2PL (Isolation)
- It’s a lock
- See Database Locks
- 2PL protocol defines a lock management strategy for ensuring
Strict Serializability
- It’s similar to RWMutex but more used in Database Transactions context
- With 2PL
Read
: requires a shared lock acquisition (others can read but not write)Write
: operation requires taking an exclusive lock. (others cannot read, cannot write)
Phases
Following two phases for each transaction
Lock / Expand / Grow
- Each query, Wants to
write
: Acquire a write/exclusive lock. - Each query, Wants to
read
: Acquire a read/shared lock. - Don’t release lock immediately after each query
- Hold the lock till the end of the transaction(commit or abort)
- While the transaction executes, no. of locks held by the transaction expand/grow.
Unlock / Shrink
- Once the transaction is complete
- Release all locks (shrink)
- No more locks can be acquired in this phase, only release
Issues
- Very easy to end up in a deadlock because locks won’t be released until transaction end (but these are detected by the DBMS usually)
2PC (Atomicity)
- It is a
Atomic commit protocol
(See Database Transactions) - 2 phase commit is an algorithm for implementing distributed transaction across multiple database instances to ensure all nodes either commit or abort the transaction.
- It works by having coordinator(could be a separate service or library within the application initiating the transaction) issue two requests - PREPARE to all nodes in phase 1 and COMMIT(if all nodes returned OK in PREPARE phase) or ABORT(if any node returned NOT OK in PREPARE PHASE) to all nodes in phase 2.
- PostgreSQL allows allows multiple distributed systems to work together in a transactional manner using
prepared transactions
Components
Transaction Coordinator/Manager
- Assume one computer that’s actually running one transaction
- Sends messages to other computers/paricipants that are involved in the transaction
- There can me multiple transaction coordinators; There can be multiple transaction from one transaction coordinator
Participants
- Does not make any* decision (Really depends on the situation)
- Just waits on instructions from the TC. Will Block if it doesn’t receive instructions from TC.
Phases
Prepare
- TC: sends
prepare
messages to allparticipants
with atxid
- Participants: (each)
- Will decide whether it can do the transaction/commit or not (YES/NO)
- Log the
decision:txid
persistently (so that can be used in recovery) - Hold a lock for for
txid:resource_record
- Respond with (YES/NO) to the TC
Commit
- TC: If all participant respond
YES
forprepare
, sendcommit
to eachparticipant
- Participants: (each)
- Log the
commitoccured:txid
persistently (so that can be used in recovery) - Send
acknowledgment
back to TC - Participants should be ready to handle duplicated
commit
messages and respond maintaining idempotency
- Log the
- TC: If any
participant
didn’t acknowledge,abort
transaction
Failure modes
- Messages might get dropped
- Need to think what to do here
- Participant crashes before saying
YES
- TC never sends
commit
- If TC ever sends
commit
without the participant sayingYES
, the participant will simply not do thecommit
- TC never sends
- Participant crashes after saying
YES
- TC sends
commit
- But the participant recovers about saying
YES
to thattxid
from the recovery log - Things go on as expected
- TC sends
- Participant crashes after processing the
commit
but before sendingacknowledgment
- Again when participant comes back up, it’ll recover that it did commit
- And if there’s another ask from the
TC
about whether it was committed, it can simply send back the transaction without doing any processing
- Participant got a
prepare
, sent aYES
, holding a lock, but not getting acommit
- In this case the
commit
message from TC must have dropped - Or the TC for some reason could not send the
commit
message - Participant MUST BLOCK
- i.e participant must keep waiting infinitely.
- Because TC might have sent
commit
to other participants and they might have already processed things. - Participant can’t/shouldn’t decide to
abort
orcommit
on its own after it has sentYES
and have not received acommit
message.
- 2PC implementation usually try to make this failure mode real fast so that the block is not long.
- In this case the
- TC crashes before sending
commit
- Not a problem
- TC crashes after sending
commit
- TC needs to log that it is going to send a
commit
- TC needs to log that it sent the
commit
- Depends on implementation whether we do it after sending all participants or after each participant
- After the crash, once the TC comes back up, it’ll replay the log and re-send the
commit
- And wait for the
acks
fromparticipant
- TC needs to log that it is going to send a
- QUESTION: Participant has no log for a
txid
, we get atxid
txid
might have been committed, aborted or never even sent for a prepare- Do we do then?
2PC Optimizations
- Issues with 2PC
- 2PC is slow because it has so many messages going in and around
- Lot of disk io (writing to logs)
- Locks are held by participants on
record:txid
afterprepare
, which can be an issue in busy system - Because of its limitations, in practice 2PC is not used to do distributed transactions where
participants
are geographical isolated etc. Might be used in small systems etc. Esp because of the blocking/locking issue.
TODO CHATGPT
Making it fast, Relazing rules, Specializing usecase etc.
- Single-Phase Commit (Optimized 2PC): This is a simplified version of 2PC used when there’s only a single resource manager or when it’s known in advance that the transaction will likely succeed. The prepare phase is skipped, and the transaction is directly committed, reducing the protocol to a single phase in these cases.
- Presumed Abort and Presumed Commit: These are optimizations of the 2PC protocol. In Presumed Abort, it’s assumed that transactions will abort by default. This reduces the logging overhead in the case of aborts. Conversely, in Presumed Commit, it’s assumed that transactions will commit, optimizing for scenarios where commits are more common.
- Coordinator Log Avoidance: An optimization where the coordinator does not write a log record during the prepare phase if all participants voted “Yes”. This reduces disk I/O, improving performance but at the cost of some increase in recovery complexity.
- Read-Only Optimization: If a participant in the transaction has only read data and not made any changes, it can inform the coordinator during the prepare phase. This allows the coordinator to exclude the participant from the commit or abort phases, reducing overhead.
- Early Prepare: This variation allows participants to send their vote (commit or abort) as soon as they are ready, rather than waiting for the coordinator to request votes from all participants. This can improve the performance of the protocol in some cases.
- Distributed 2PC: In a distributed environment, the 2PC protocol can be extended to support transactions across multiple distributed resource managers. This requires additional coordination and can introduce more complexity, especially in handling failures and network partitions.
- Hierarchical 2PC: This is a variant where participants are arranged in a hierarchical structure. The top-level coordinator communicates with lower-level coordinators, who in turn manage their respective participants. This can improve scalability and manageability in large distributed systems.
- Timeout-based 2PC: To handle the problem of blocking in case of failures, timeouts can be introduced. If the coordinator or a participant does not receive expected messages within a timeout period, it can unilaterally decide to abort or commit (based on the protocol’s assumption).
3PC (Three Phase Commit)
- An extension of 2PC, 3PC introduces an additional phase to reduce the chances of blocking. The new “PreCommit” phase comes between the voting and commit phases, allowing nodes to agree on the outcome even if the coordinator fails. However, 3PC is more complex and can still be vulnerable to certain network partition scenarios.