tags : Distributed Systems, Database Locks, Concurrency

What?

  • MVCC is not a Concurrency control method, It’s a technique to achieve Reader/Writer non-locking behavior, we operate on multiple versions of the same record
  • MVCC itself is an umbrella term for its variants (types). MVCC can be combined with concurrency control mechanisms which can be implemented.
  • The DBMS maintains multiple physical versions of a single logical object in the database:
  • If a DB decides to use MVCC, then it’ll dictate how the DB handles transactions
  • Can support time-travel queries

Main Idea of MVCC

  • Readers don’t block Writers
  • Writers don’t block Readers
  • When a txn writes to an object, the DBMS creates a new version of that object.
  • When a txn reads an object, it reads the newest version that existed when the txn started.
  • If 2 txns update the same object/row, then first writer wins. Other rollsback.

History

  • First proposed in 1978 MIT PhD dissertation.
  • First implementation was InterBase (Firebird).
  • Used in almost every new DBMS in last 10 years.

Types of MVCC

Timestamp Ordering (TO / MVTO)

  • Assign txns timestamps that determine serial order.
  • Considered to be original MVCC protocol
  • This is what PostgreSQL uses

Optimistic Concurrency Control (OCC / MVOCC)

Two Phase Locking (2PL / MV2PL)

Version Storage

Append-Only Storage

New versions are appended to the same table space. (What PostgreSQL uses)

  • Version Chain Order

    Basically a singly linked-list.

Time-Travel Storage

Old versions are copied to separate table space.

Delta Storage

  • The original values of the modified attributes are copied into a separate delta record space
  • Txns can recreate old versions by applying the delta in reverse order

Indexes

  • See Database Indexing
  • Now with the version chain order, we can get to the version we need.
  • But indexes need to point to the tuple aswell.
  • Handling indexes is again involves thinking how MVCC is implemented

Primary key index

  • PKey indexes always point to version chain head.
  • How often the DBMS has to update the pkey index depends on whether the system creates new versions when a tuple is updated.
  • If a txn updates a tuple’s pkey attribute(s), then this is treated as an DELETE followed by an INSERT.

Secondary Indexes

  • Logical Pointers
    • Use a fixed identifier per tuple that does not change.
    • Requires an extra indirection layer.
    • Primary Key vs. Tuple Id
  • Physical Pointers
    • Use the physical address to the version chain head.

Different Implementations

Implementations differ in Types/Storage/Version chain order etc etc.

  • Oracle
    • undo log
    • maintained separately
  • PostgreSQL, Spanner use MVCC to see a snapshot(an older version of the database)

MVCC Implementation in PostgreSQL

  • See Database Transactions and MVCC
  • In PG due to its MVCC implementation, UPDATE to one field in the row means updating the whole row. See 16: 73.7. Heap-Only Tuples (HOT)
  • PG handles transaction isolation by using MVCC to create a concept called snapshots
  • One logical row, but can have like multiple physical versions
  • Even if a row is DELETE ed, MVCC will store the older version in the same table. Just we won’t be able to see it. Later Vaccum will clean it.
  • PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level. (See Concurrency Consistency Models)

Lifecycle

  • A query(r/w) starts with a transaction
  • Sees snapshot of the current state of the database. (visibility depends on isolation level, following is read committed lifecycle)
    • If read query
      • There will be no txid, only vtxid
      • Sees only data committed before the query began. If data was deleted by another transaction concurrently, it’ll still see that data because snapshot
      • Never sees either uncommitted data/changes committed by concurrent transactions
      • Does see the effects of previous updates executed within its own transaction (committed/uncommitted)
    • If write query
      • Now the transaction is assigned a txid
      • Record that’s being deleted/updated is still retained an version
      • When trying to write the data it’ll either commit or rollback based on xmin, xmax values and other hints.

Visibility

-- think of this implicit WHERE on every row/tuple
-- every row will have xmin
WHERE xmin <= txid_current() AND (xmax = 0 OR txid_current() < xmax)

Transactions

Read-OnlyRead-WriteWrap Around Issue
vtxidPresentPresentNo
xidNULLPresentYes, Needs VACUMM

Virtual Transaction ID (vtxid / vxid)

  • Temporary identifier assigned to each session or backend that starts a transaction.
    • backendID+localXID : 4/1234
  • Useful in deadlock detection.
  • Does not need vacuum :)

Transaction ID (txid / xid)

  • Assigned to a transaction only where there is “write”
  • Earlier transactions having smaller txids(so can be compared)
  • xid are used as the basis for PostgreSQL’s MVCC concurrency mechanism
  • Circular

    • 32bit: 2^32-1 : ~4.2bn
    • 2bn in the past are visible, 2bn in the future are not visible
  • 0,1,2 are reserved

    • 0: Invalid/Nothing happened
    • 1: Initialization of cluster
    • 2: Frozen txid
  • System Columns

    • xmin

      • txid that INSERT‘ed the tuple
      • In case of UPDATE (UPDATE = DELETE+INSERT)
        • xmin will be of the txid that UPDATED‘ed the tuple
        • xmax will be set to 0
    • xmax

      • If we’re trying to write and we see xmax is non-zero: we ROLLBACK
        • i.e in the example, if 519 tries to write in that row, it’ll have to ROLLBACK 519, unless we find that the previous version was rolled back.
        • i.e If 2 txns update the same object/row, then first writer wins. Other rollsback.
      • txid that UPDATE/DETELE
      • txid that UPDATE/DETELE + rollback
      • 0 if nothing happened

GID (Global Transaction Identifiers)

  • prepared transactions are also assigned (GID) (See Two Phase commit (2PC))
  • NOTE: prepared transaction is different from prepared query (SQL)