tags : SQL, Distributed Systems, PostgreSQL, Database Transactions

Database types

Based on Relational (SQL) vs Non-relational (NoSQL)

  • SQL: Usually columnar storage or row storage
  • NoSQL: Usually K/V, Wide Column, Document, Graph etc.
    • Wide-column and columnar storage are totally different!!

Relational

Non-Relational / NoSQL

  • Access paths
    • The relational model lets you do “point” queries, i.e. you can get “any” or “all” of something. Eg. any or all comments, any or all users. (see the sketch below)
    • With non-relational models (usually tree-shaped), you need to worry about the access path.
      • Find any comment? You need to know which post it is attached to.
      • Find all comments? You have to traverse all posts to reach all comments.
      • If the path changes, you have to go update application logic.
  • Start node
    • Finding start nodes is what SQL especially excels at.
    • Once we do have a start node (a comment, post, user, etc.), a Graph query language is a better fit for application queries than SQL.
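
A minimal SQL sketch of the relational side; the comments table and its columns are illustrative:

```sql
-- No access path to worry about: declare what you want, not how to reach it
SELECT * FROM comments WHERE id = 42;        -- “any” one comment, no parent post needed
SELECT * FROM comments;                      -- “all” comments, no traversal of posts
SELECT * FROM comments WHERE post_id = 7;    -- scoping by post is still easy when wanted
```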

A few words by tantaman

  • Schemaless
  • It should have been named NoRDBMS instead, because the “no” isn’t really about SQL. In fact, NoSQL DBs can support SQL-like queries etc.

Based on CAP

  • While most Relational DBs guarantee ACID, NoSQL DBs give you BASE
  • There are distributed SQL databases as well, eg. CockroachDB, Yugabyte etc., which try to give you ACID guarantees while being distributed. (hard problem)

Based on Data model

These data models can be implemented in both relational and non-relational types. Eg. PostgreSQL can also be turned into a column store or a JSON data store with indexes, extensions etc. But a database built for a specific data model will usually be a better solution than making up for it with extensions etc.

NOTE: Typical data-model taxonomy pictures miss “column oriented” databases; “wide column” != column oriented.

Eg. KV, Graph, Document/Object, Column-family(wide column), Columnar, Row oriented, Array/Matrix/Vectors etc.

KV

  • Redis, DynamoDB etc.
  • See LSMT in Trees

Document/Object based

  • MongoDB, DocumentDB etc.

Row store oriented

Column store oriented

Column-store databases are usually used in OLAP systems

  • AWS Redshift, DuckDB, ClickHouse (SQL OLAP DB)
  • Data warehouses are usually built on columnar storage, see Data Engineering

Wide Column

Naming confusion

  • The wide-column data model is usually used in NoSQL databases.
  • Wide column: (Partition Key, Clustering Key, Value)
  • Examples: ScyllaDB, Cassandra, HBase (see the sketch below)
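
A hedged CQL sketch of the (Partition Key, Clustering Key, Value) shape; table and column names are made up:

```sql
-- CQL (Cassandra/ScyllaDB): the partition key locates the partition,
-- the clustering key orders rows within it
CREATE TABLE sensor_readings (
    sensor_id  text,       -- partition key
    reading_ts timestamp,  -- clustering key
    value      double,
    PRIMARY KEY ((sensor_id), reading_ts)
);
```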

Based on Query

  • Both SQL and NoSQL databases can be OLAP, OLTP, or both, depending on what the database has to offer
  • As mentioned previously:
    • SQL: Usually columnar storage or row storage
    • NoSQL: Usually K/V, Wide Column, Document, Graph etc.
      • Wide-column and columnar storage are totally different!!

Differences

  • For example, ScyllaDB (OLTP) is unable to run a select sum() … group by over a multi-billion-row table with millions of group keys
  • And vice versa, ClickHouse (OLAP) is super slow on a point lookup like select value … where key = … (see the sketch below)
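
The two query shapes side by side, using the placeholder names from above:

```sql
-- OLTP shape: point lookup by key (fast on ScyllaDB, slow on ClickHouse)
SELECT value FROM multibillion_table WHERE key = ?;

-- OLAP shape: full-scan aggregate (struggles on ScyllaDB, fast on ClickHouse)
SELECT millions_key, sum(value)
FROM multibillion_table
GROUP BY millions_key;
```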

OLAP

  • Analytics workload
  • ClickHouse, Snowflake, DuckDB, Apache Druid

OLTP

  • Transactional workload
  • PostgreSQL (SQL, row-based), ScyllaDB (NoSQL)

Operational

  • “automated interventions in the business”
  • Think of an OLTP workload on an OLAP database

  • Operational is slightly different from transactional & analytical, somewhere in the middle
  • Usually involves maintenance workloads etc. Sometimes queries need to run on big datasets.
  • Operational workloads have fundamental requirements that are diametrically opposite to those of analytical systems
  • If an operational system goes down for the day, there are people who will be unable to do their job
  • The similarity with analytics workloads is that both share SQL queries that are complex, join-heavy and multi-level.
  • Solutions
    • Reverse ETL (See Data Engineering) for this purpose
    • Cache results (but now need to worry about cache invalidation)

Time Series DB

Relational Model basics

Meta

  • Access path independence: Decouple “what data do I want to retrieve” from “how do I retrieve it.”

Components

  • tables = relations
    • An unordered set of tuples describing relationships among attributes that represent entities.
    • A table for an n-ary relation = a table with n columns
  • attributes = columns (each has a type/domain)
  • tuple = row = a set of attribute values
  • Values
    • Normally atomic/scalar
    • NULL is a member of every domain if allowed
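
A tiny sketch mapping this vocabulary onto SQL; the table and columns are illustrative:

```sql
CREATE TABLE users (            -- a relation; 3-ary, so 3 columns
    id    bigint PRIMARY KEY,   -- attribute with domain bigint
    name  text   NOT NULL,      -- attribute with domain text, NULL excluded
    email text                  -- NULL allowed here, so NULL is a member of its domain
);

-- each inserted row is a tuple of attribute values
INSERT INTO users VALUES (1, 'ada', 'ada@example.com');
```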

Modeling of Databases

See DB Data Modeling

Partitioning

  • See Scaling Databases
  • A rather general concept that can be applied in many contexts.
  • Partitioning could be across
    • Different databases inside MySQL on the same server
    • Different tables
    • Different column values in a single table

Horizontal/Row based

  • Splitting one table into multiple tables, each with the same table structure.
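
A minimal PostgreSQL sketch of a horizontal split; the table and ranges are illustrative:

```sql
-- Every partition has the same structure; rows are routed by logged_at
CREATE TABLE measurements (
    id        bigint      NOT NULL,
    logged_at timestamptz NOT NULL,
    value     numeric
) PARTITION BY RANGE (logged_at);

CREATE TABLE measurements_2024_01 PARTITION OF measurements
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE measurements_2024_02 PARTITION OF measurements
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```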

Vertical/Column based

  • Splitting one table into multiple tables; each table will have a different table structure.
  • This is similar to the idea of Normalization
    • Usually vertical partitioning refers to a physical optimization
    • Whereas normalization is an optimization on the conceptual level
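
A sketch of a vertical split with illustrative names: wide, rarely-read columns move to a side table sharing the key:

```sql
-- Hot columns stay in the core table
CREATE TABLE users_core (
    id    bigint PRIMARY KEY,
    name  text,
    email text
);

-- Cold/wide columns live in a side table, joined back via the shared key
CREATE TABLE users_profile (
    id     bigint PRIMARY KEY REFERENCES users_core (id),
    bio    text,
    avatar bytea
);
```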

Important Features

Caching

External

Migrations

Migration FAQ

  • Migration version issue

    • Sequential versions vs Timestamps
      • 0005_first_developer_account_changes.sql vs 2017…08_first_developer_account_changes.sql
    • See this thread for proper explanation
    • Solution is to use a hybrid approach.
      • If using goose, it has the fix command which is suggested to be run in CI. Goose will also create a table goose_db_version in the same postgres schema.
      • Timestamps
        • in Local/Dev
        • Avoid annoying file renaming & conflicts on rebase
      • Sequential
        • in staging/prod
        • Rename those timestamp-based filenames to serial numbers
  • Online/Live/Realtime Migrations

    • Migrating from one database to another without downtime and compromising data correctness

Tips

  • No direct table access

    Never expose real tables to an application. Create an “API” schema which contains only views, functions, procedures, and only allow applications to use this schema. This gives you a layer of indirection on the DB side such that you can nearly eliminate the dance of coordinating application changes and database migrations. (see the first sketch after these tips)

  • Get comfortable with forward-only migrations

    Rollbacks work fine for easy changes, but you can’t rollback the hard ones (like deleting a field), and you’re better off getting comfortable with forward-only migrations

  • Application code and Migrations

    • Usually recommended to decouple schema rollout from the application
    • Whether you can run migrations before application startup is something to be decided based on provisioned infra. Eg. in k8s, some ppl run migrations in init containers, don’t do rollbacks.
    • It isn’t possible to atomically deploy both the migration and the application code that uses it.
    • This means if you want to avoid a few seconds/minutes of errors
      • You need to deploy the migration first in a way that doesn’t break existing code
      • Then the application change
      • And then often a cleanup step to complete the migration in a way that won’t break. (see the second sketch after these tips)
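
A minimal sketch of the “no direct table access” tip, assuming an internal schema, an api schema and an app_role; all names are illustrative:

```sql
CREATE SCHEMA api;

-- Applications query this view, never the underlying table
CREATE VIEW api.active_users AS
    SELECT id, name, email
    FROM internal.users
    WHERE deleted_at IS NULL;

-- The application role can only touch the api schema
GRANT USAGE  ON SCHEMA api TO app_role;
GRANT SELECT ON api.active_users TO app_role;
REVOKE ALL   ON SCHEMA internal FROM app_role;
```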
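
And a hedged expand/contract sketch of the migration-then-application-then-cleanup sequence, for a hypothetical rename of users.name to users.full_name:

```sql
-- Step 1 (expand): a migration that doesn’t break existing code
ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 2: deploy application code that reads/writes full_name

-- Step 3 (contract): cleanup once no running instance uses the old column
ALTER TABLE users DROP COLUMN name;
```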

Cursors

Functions

Triggers

Stored Procedures

Deferred Constraints

Application code constraints

Query Planner

  • Also see Query Engines. The terms query planner and query optimizer are synonymous.
  • There can be multiple ways to compute the result of a DB query. The query planner’s role is to determine which strategy is the best option.
    • Eg. choosing which indexes to scan for a given query,
    • Eg. choosing the order in which to perform joins, and
    • Eg. choosing the join algorithm to use for any given join
  • It analyzes queries and optimizes them before running them.
  • Some query planners offer escape hatches for the end user to fiddle with. (see the sketch below)
  • The life of a query begins and ends in the database itself.
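
A common escape hatch for looking over the planner’s shoulder is EXPLAIN; a PostgreSQL sketch with illustrative tables:

```sql
-- Prints the chosen plan: index usage, join order, join algorithm, row estimates
EXPLAIN ANALYZE
SELECT u.name, count(*)
FROM users u
JOIN comments c ON c.user_id = u.id
GROUP BY u.name;
```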

Components of the Query Planner

These are not actual component names but abstract ideas for understanding.

  • Compiler

    • A query is lexed, parsed, typechecked, translated into an intermediate representation, and “optimized”
    • Finally it is translated directly into a form more suitable for execution.
  • Domain knowledge extractor

    • Approximates domain knowledge that it can exploit to make queries run as fast as possible.
    • This is done by making approximations over collected statistics. Choosing a strategy would otherwise need human intervention, but the query planner tries its best to detect which way to go from the statistics (how big each table is, how many distinct values each column contains, etc). See the sketch below.
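
In PostgreSQL those statistics are gathered by ANALYZE and exposed in pg_stats; the table name here is illustrative:

```sql
ANALYZE users;  -- refresh the statistics the planner approximates from

-- Per-column statistics: distinct-value estimates, NULL fraction, etc.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'users';
```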

TODO Plans

  • Postgres Specific

    • Generic plan
    • Custom plan
  • Physical Plan

    • The plan that gets executed

Others

ACID

What are these?

  • Loose description instead of a strict implementation standard.
  • Properties database transactions need to guarantee to their users for validity even in the event of crash, error, hardware failures and similar.
  • These guide application developers on what their responsibility is vs what the database provides

ACID can have multiple meanings

  • There are tradeoffs in implementing ACID capabilities/properties. So if a DB claims to be ACID compliant, there might still be edge cases
  • Each property has its own spectrum of tradeoffs; an implementation of the property will fall somewhere on that spectrum based on the tradeoffs the DB developers took
  • Implementing some capabilities is more expensive than others; consistency and isolation are pretty expensive.

The properties

Atomicity

  • The database’s ordering of operations can differ from what the application dev sees in their code. If you need two operations to execute in order as one unit, wrap them in a single database transaction. (see the sketch below)
  • ROLLBACK helps with atomicity
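
A minimal sketch (the accounts table is hypothetical): both updates take effect as one unit or not at all:

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- if anything fails before COMMIT, a ROLLBACK leaves no partial effects
COMMIT;
```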

Consistency

  • Atomicity+Isolation leads to consistency
  • Consistency has different meanings based on context. (See Concurrency Consistency Models)
  • When it comes to ACID in databases, it simply means satisfying application specific constraints. Think of maintaining referential integrity etc.

Isolation

  • Question to ask: “Is it serializable?”
  • Question to ask: “Whether or not concurrent transactions can see/affect each other’s changes”
  • Helps avoid race conditions
  • Isolation is achieved using Database Transactions
  • Weak and Strong

    • Weak isolation = faster + more data races
      • Weak doesn’t mean unsafe
    • Strong isolation = slower + fewer data races
      • Eg. Cockroach and SQLite have effectively serializable transactions. But they don’t catch all bugs either.
  • Read Phenomena/Issues/Anomaly

    NOTE: See PostgreSQL: Documentation: 16: 13.2. Transaction Isolation for better summary

    • Non-repeatable reads

      In the same thread, you read the same data twice and get different values because another thread committed in between. That’s non-repeatable! i.e. you’re not getting consistent results

    • Dirty reads

      In the same thread, you read some data; another thread changes the same data but does not even commit. You read the data again, and now you’re seeing uncommitted data in your thread.

    • Phantom reads

      • Similar to non-repeatable reads, but for search conditions/aggregates: rows matching a condition appear or disappear between reads, so previous results of the search/aggregation differ from current ones
  • SQL standard isolation levels

    There are more levels than what the SQL standard offers. Eg. Google’s Spanner guarantees external consistency (strict serializability) using clock synchronization: stricter isolation, but not in the SQL standard (see Concurrency Consistency Models)

    | Name             | Description                                                                                                                           | Strength (1-4) | Availability      |
    |------------------|---------------------------------------------------------------------------------------------------------------------------------------|----------------|-------------------|
    | Serializable     | As if execution was sequential                                                                                                          | 4              | Unavailable       |
    | Repeatable reads | Changes by other transactions aren’t visible, but a transaction can see its own uncommitted changes.                                   | 3              | Unavailable       |
    | Read committed   | Only committed changes are visible, no uncommitted reads. New committed changes by others can be seen (potential for phantom reads).   | 2              | Totally Available |
    | Read uncommitted | Allows dirty reads; transactions can see changes by others that aren’t yet committed.                                                  | 1              | Totally Available |
    • The isolation levels only define what must not happen, not what must happen. (i.e. BTNH, see Correctness criteria)
    • Read Committed is the default in PostgreSQL; phantom reads are still possible there. (PostgreSQL’s Repeatable Read, going beyond the standard, does prevent phantom reads.) See the sketch after this list.
    • Most DBs don’t even implement Read Uncommitted, as it’s not very useful
  • Databases and their implementation of isolation

    • Martin Kleppmann’s Hermitage is an excellent resource on this.
    • DBs provide different isolation levels based on tradeoffs; the SQL standard has 4, and eg. PostgreSQL only provides 3 distinct behaviors.
    • DBs sometimes say one thing and mean another relative to the Concurrency Consistency Models terminology
    • See ept/hermitage
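
A PostgreSQL-flavored sketch of picking an isolation level per transaction (the accounts table is illustrative):

```sql
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- a re-read in this transaction returns the same value even if another
-- session commits an update in between (no non-repeatable read)
SELECT balance FROM accounts WHERE id = 1;
COMMIT;
```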

Durability

  • This is about what happens after COMMIT
  • When I hit save, things should go to non-volatile storage, so that if the system crashes, I can still read the data
  • Some sacrifice durability for performance.
  • This is again different for different DBs, eg. SQLite only acknowledges a transaction when it is safely persisted to the underlying storage.
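
One concrete durability-vs-performance dial is PostgreSQL’s synchronous_commit; a sketch:

```sql
-- With off, a crash may lose the last few acknowledged commits
-- (no corruption, just lost recent transactions): durability traded for speed
SET synchronous_commit = off;
```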

Database Administration

See Infrastructure, System Design

Sessions, Connections and Transactions

Read this: How to calculate max_connections for PostgreSQL and default_pool_size for pgbouncer?

Connection

| DB         | Default connection limit      |
|------------|-------------------------------|
| SQLite     | N/A, limited by single-writer |
| PostgreSQL | 100 (configurable)            |
| MySQL      | 151 (configurable)            |
  • A Connection is established when a client (like an application or a user) connects to the PostgreSQL server.
  • Each connection runs in its own process in the server.
  • Idle connections consume resources. pgbouncer has server_idle_timeout on which it’ll close a connection to the db if it has been idle for more than the specified duration.
  • In terms of PostgreSQL, each connection is then handled by a worker backend
  • “To get more storage, you buy SSD, and to get more connections, you buy more RAM & networking hardware”

Session

  • A Session begins when a connection is successfully established.
  • In PostgreSQL, a session is essentially synonymous with a connection.
  • PostgreSQL can handle multiple connection/sessions concurrently

Transaction

  • Transaction is a sequence of one or more SQL operations treated as a unit.
  • A session can have multiple transactions
  • All queries run within a transaction
  • Life cycle: starts, runs the queries it contains, and ends

Pools

  • pgbouncer

    Since pgbouncer is single-threaded, you can run multiple instances of it (eg. on the same port via the so_reuseport option)

    • Pooling types

      • Session pooling

        • Pgbouncer merely acts as a connection broker
        • We still get the 1-1 connection-session relationship, with an added queue
      • Transaction pooling

        • Can assign the connection to a different client
          • Once the current transaction is completed
          • Even if the session is not yet complete
        • i.e. the same connection can be used for multiple sessions
        • i.e. connections can be shared among different sessions
        • There are limitations when working with Prepared Statements.
        • We must make sure that we do not depend on session-level state (like temporary tables, session variables)
    • Configuring pgbouncer

      • Auth (2 ways)

        I prefer the auth_query method. But even when using the auth_query method, we’ll still need an auth_file (userlist.txt) for the auth_user password. (yeah, it’s damn confusing)

        • Adding the SCRAM verifiers to the userlist.txt file that the pgbouncer.ini references
        • Setting up an auth_query referenced from pgbouncer.ini (see the sketch below)
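
A commonly seen shape for the auth_query side is a SECURITY DEFINER lookup function; a hedged sketch with illustrative schema/function names:

```sql
-- Runs with its owner’s privileges, so pgbouncer’s auth_user
-- doesn’t need direct access to pg_shadow
CREATE OR REPLACE FUNCTION pgbouncer.get_auth(p_usename text)
RETURNS TABLE (usename text, passwd text) AS $$
    SELECT usename::text, passwd::text
    FROM pg_catalog.pg_shadow
    WHERE usename = p_usename;
$$ LANGUAGE sql SECURITY DEFINER;

-- pgbouncer.ini would then point at it, roughly:
--   auth_query = SELECT usename, passwd FROM pgbouncer.get_auth($1)
```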

Edge and Databases

  • Usually edge environments don’t support making arbitrary TCP outbound connections
  • Only HTTP and WebSockets are allowed.
  • DBs usually use a custom TCP-based network protocol. (now that’s a conflict)
  • You will need to use an HTTP-to-TCP proxy of some sort
  • Some edge-ready DBs now support
    • HTTP fetch (best for single-shot queries)
    • WebSocket (for sessions and transactions)

Management Processes

See PostgreSQL

ORMs

They are simply an abstraction with a quintessential pro and con: they abstract away some visibility and occasionally incur some performance hits.

  • Produce
    • ORMs: produce objects (a directed graph)
    • DB tables: bidirectionally linked data via shared keys (an undirected graph)
  • See Sketch of a Post-ORM

Notes on various database software out there

LevelDB

  • Uses LSMT (See Trees)
  • Progression: BigTable > LevelDB
  • Embedded key-value store (a library inside an application)
  • I/O
    • Heavy write throughput workloads
    • NOT great at random reads
  • Designed for disk
    • Merges using bulk reads and writes: they minimize expensive seeks

RocksDB