MVCC

tags : Distributed Systems, Database Locks, Concurrency

What?

MVCC is not a Concurrency control method, It’s a technique to achieve Reader/Writer non-locking behavior, we operate on multiple versions of the same record
MVCC itself is an umbrella term for its variants (types). MVCC can be combined with concurrency control mechanisms which can be implemented.
The DBMS maintains multiple physical versions of a single logical object in the database:
If a DB decides to use MVCC, then it’ll dictate how the DB handles transactions
Can support time-travel queries

Main Idea of MVCC

Readers don’t block Writers
Writers don’t block Readers
When a txn writes to an object, the DBMS creates a new version of that object.
When a txn reads an object, it reads the newest version that existed when the txn started.
If 2 txns update the same object/row, then first writer wins. Other rollsback.

History

First proposed in 1978 MIT PhD dissertation.
First implementation was InterBase (Firebird).
Used in almost every new DBMS in last 10 years.

Types of MVCC

Timestamp Ordering (TO / `MVTO`)

Assign txns timestamps that determine serial order.
Considered to be original MVCC protocol
This is what PostgreSQL uses

Optimistic Concurrency Control (OCC / `MVOCC`)

See Database Locks

Two Phase Locking (2PL / `MV2PL`)

See Two Phase Locking (2PL)

Version Storage

Append-Only Storage

New versions are appended to the same table space. (What PostgreSQL uses)

Version Chain Order

Basically a singly linked-list.
- Oldest-to-Newest (O2N)
  - Just append new version to end of the chain.
  - Have to traverse chain on look-ups.
  - PostgreSQL uses O2N
  - See The part of PostgreSQL we hate the most | OtterTune
- Newest-to-Oldest (N2O)
  - Have to update index pointers for every new version.
  - Don’t have to traverse chain on look ups.

Time-Travel Storage

Old versions are copied to separate table space.

Delta Storage

The original values of the modified attributes are copied into a separate delta record space
Txns can recreate old versions by applying the delta in reverse order

Indexes

See Database Indexing
Now with the version chain order, we can get to the version we need.
But indexes need to point to the tuple aswell.
Handling indexes is again involves thinking how MVCC is implemented

Primary key index

PKey indexes always point to version chain head.
How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated.
If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary Indexes

Logical Pointers
- Use a fixed identifier per tuple that does not change.
- Requires an extra indirection layer.
- Primary Key vs. Tuple Id
Physical Pointers
- Use the physical address to the version chain head.

Different Implementations

Implementations differ in Types/Storage/Version chain order etc etc.

Oracle
- undo log
- maintained separately
PostgreSQL, Spanner use MVCC to see a snapshot(an older version of the database)

MVCC Implementation in PostgreSQL

See Database Transactions and MVCC
In PG due to its MVCC implementation, UPDATE to one field in the row means updating the whole row. See 16: 73.7. Heap-Only Tuples (HOT)
PG handles transaction isolation by using MVCC to create a concept called snapshots
One logical row, but can have like multiple physical versions
Even if a row is DELETE ed, MVCC will store the older version in the same table. Just we won’t be able to see it. Later Vaccum will clean it.
PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level. (See Concurrency Consistency Models)

Lifecycle

A query(r/w) starts with a transaction
Sees snapshot of the current state of the database. (visibility depends on isolation level, following is read committed lifecycle)
- If read query
  - There will be no txid, only vtxid
  - Sees only data committed before the query began. If data was deleted by another transaction concurrently, it’ll still see that data because snapshot
  - Never sees either uncommitted data/changes committed by concurrent transactions
  - Does see the effects of previous updates executed within its own transaction (committed/uncommitted)
- If write query
  - Now the transaction is assigned a txid
  - Record that’s being deleted/updated is still retained an version
  - When trying to write the data it’ll either commit or rollback based on xmin, xmax values and other hints.

Visibility

-- think of this implicit WHERE on every row/tuple
-- every row will have xmin
WHERE xmin <= txid_current() AND (xmax = 0 OR txid_current() < xmax)

Transactions

	Read-Only	Read-Write	Wrap Around Issue
vtxid	Present	Present	No
xid	NULL	Present	Yes, Needs VACUMM

Virtual Transaction ID (vtxid / vxid)

Temporary identifier assigned to each session or backend that starts a transaction.
- backendID+localXID : 4/1234
Useful in deadlock detection.
Does not need vacuum :)

Transaction ID (txid / xid)

Assigned to a transaction only where there is “write”
Earlier transactions having smaller txids(so can be compared)
xid are used as the basis for PostgreSQL’s MVCC concurrency mechanism

Circular
- 32bit: 2^32-1 : ~4.2bn
- 2bn in the past are visible, 2bn in the future are not visible

0,1,2 are reserved
- 0: Invalid/Nothing happened
- 1: Initialization of cluster
- 2: Frozen txid

System Columns
- xmin
  - txid that INSERT‘ed the tuple
  - In case of UPDATE (UPDATE = DELETE+INSERT)
    - xmin will be of the txid that UPDATED‘ed the tuple
    - xmax will be set to 0
- xmax
  - If we’re trying to write and we see xmax is non-zero: we ROLLBACK
    - i.e in the example, if 519 tries to write in that row, it’ll have to ROLLBACK 519, unless we find that the previous version was rolled back.
    - i.e If 2 txns update the same object/row, then first writer wins. Other rollsback.
  - txid that UPDATE/DETELE
  - txid that UPDATE/DETELE + rollback
  - 0 if nothing happened

GID (Global Transaction Identifiers)

prepared transactions are also assigned (GID) (See Two Phase commit (2PC))
NOTE: prepared transaction is different from prepared query (SQL)

🐏 mogoz

Table of Contents

MVCC

What?

Main Idea of MVCC

History

Types of MVCC

Timestamp Ordering (TO / `MVTO`)

Optimistic Concurrency Control (OCC / `MVOCC`)

Two Phase Locking (2PL / `MV2PL`)

Version Storage

Append-Only Storage

Time-Travel Storage

Delta Storage

Indexes

Primary key index

Secondary Indexes

Different Implementations

MVCC Implementation in PostgreSQL

Lifecycle

Visibility

Transactions

Virtual Transaction ID (vtxid / vxid)

Transaction ID (txid / xid)

GID (Global Transaction Identifiers)

Graph View

Backlinks

🐏 mogoz

Table of Contents

MVCC

What? §

Main Idea of MVCC §

History §

Types of MVCC §

Timestamp Ordering (TO / MVTO) §

Optimistic Concurrency Control (OCC / MVOCC) §

Two Phase Locking (2PL / MV2PL) §

Version Storage §

Append-Only Storage §

Time-Travel Storage §

Delta Storage §

Indexes §

Primary key index §

Secondary Indexes §

Different Implementations §

MVCC Implementation in PostgreSQL §

Lifecycle §

Visibility §

Transactions §

Virtual Transaction ID (vtxid / vxid) §

Transaction ID (txid / xid) §

GID (Global Transaction Identifiers) §

Graph View

Backlinks

What?

Main Idea of MVCC

History

Types of MVCC

Timestamp Ordering (TO / `MVTO`)

Optimistic Concurrency Control (OCC / `MVOCC`)

Two Phase Locking (2PL / `MV2PL`)

Version Storage

Append-Only Storage

Time-Travel Storage

Delta Storage

Indexes

Primary key index

Secondary Indexes

Different Implementations

MVCC Implementation in PostgreSQL

Lifecycle

Visibility

Transactions

Virtual Transaction ID (vtxid / vxid)

Transaction ID (txid / xid)

GID (Global Transaction Identifiers)