tags : Database

FAQ

How does Supabase work?

What’s PostgREST?

  • Also see REpresentational State Transfer; strictly speaking, PostgREST is not really REST.
  • PostgREST is based on the idea that if you access the database as a particular user, Row Level Security (RLS) will ensure you only have access to the data you should.
    • Users need to be authenticated and authorized.
    • Since this effectively becomes the backend, some kind of connection pool is needed
  • Its API looks like a mixture of RESTful and GraphQL.

How to use EXPLAIN?

  • Can give you the cost beforehand
  • Can estimate how many rows it’ll return, etc.
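A minimal sketch (the `users` table and the plan output shown are made up for illustration):

```sql
-- EXPLAIN shows the planner's estimated cost and row count
-- without actually running the query:
EXPLAIN SELECT * FROM users WHERE id = 42;
--                     QUERY PLAN (illustrative)
-- Index Scan using users_pkey on users  (cost=0.29..8.31 rows=1 width=72)
--   Index Cond: (id = 42)

-- EXPLAIN ANALYZE actually executes the query and adds real timings/row counts:
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 42;
```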

How to use ANALYZE?

  • It collects the statistics that the query planner uses, which is also what feeds the estimates we see when we run EXPLAIN
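A sketch, assuming a hypothetical `users` table:

```sql
-- Collect statistics for the whole database:
ANALYZE;
-- Or for one table (optionally specific columns):
ANALYZE users;
ANALYZE users (email);
-- The gathered stats land in pg_statistic and are exposed via the pg_stats view:
SELECT attname, n_distinct FROM pg_stats WHERE tablename = 'users';
```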

Postgres by default runs in read committed, so transactions are not serializable. How is this acceptable?

  • With read committed, a query in a transaction
    • Sees only data committed before the query began;
    • Never sees uncommitted data
    • Never sees changes committed by concurrent transactions during the query’s execution
  • This is slightly different for writes (see MVCC)
  • The partial transaction isolation provided by Read Committed mode is adequate for many applications
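The snapshot behaviour above can be seen with two concurrent sessions (illustrative; the `accounts` table is made up):

```sql
-- Session A                          -- Session B
BEGIN;
SELECT amount FROM accounts
  WHERE account_id = 1;  -- returns 1000
                                      BEGIN;
                                      UPDATE accounts SET amount = 500
                                        WHERE account_id = 1;
                                      COMMIT;
-- Under READ COMMITTED each new query takes a fresh snapshot,
-- so A's next query sees B's committed change:
SELECT amount FROM accounts
  WHERE account_id = 1;  -- returns 500
COMMIT;
```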

Postgres Schema and Database Objects

Meaning of schema

  • The concept of a schema is not very consistent between DBMSs.
    • In MySQL, “schema” is just a synonym for “database”; there’s no separate schema level inside a database.
    • Schemas are the implementation in postgres of the namespace concept. It’s a collection of logical structures of data.
  • This is different from the general idea of database schema
  • SHOW search_path;

Hierarchy of “things” in postgres

  • A single machine/host can have multiple clusters; each cluster listens on its own port
  • A cluster can have many databases, roles/users, and other cluster-level objects. Roles and databases are both cluster-level, but the relationship isn’t exclusive: a single role can be the owner of multiple databases.
  • Then we have database level objects
    • Belonging to some schema (namespaced)
      • Eg. Tables, views, sequences, functions, procedures, types, domains, etc.
      • Eg. Indexes and triggers are always created in the same schema as the table they are for.
    • Not belonging to some schema (no namespace)
      • Eg. Extensions
  • There can be schemas with the same name in different databases (but not in the same database).
  • Objects from one database cannot interact with objects from another one (catalogs are kept separate). Eg. we can’t join across two databases. Use schemas instead.
  • A database can have multiple schemas

Understanding usecase of schema and search_path in postgres

  • Organization
    • Think of schema like directories and database objects as files. You can do without directories but it helps organize.
    • Eg. In a single database, you can have schemas like ecommerce, auth etc.
    • Helpful in dumping, selecting all tables in a schema etc.
  • Access control
    • This has roots in the use case where a database has multiple users. In our web-application world, we usually have one user accessing it, so we set the search path in the connection string.
    • Grant privileges to roles at the schema level; for object-level access across a whole schema, use GRANT ... ON ALL TABLES IN SCHEMA (and ALTER DEFAULT PRIVILEGES for future objects).
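A sketch of schema-level access control; the schema and role names are made up:

```sql
CREATE SCHEMA reporting;
CREATE ROLE analyst NOLOGIN;

-- Let analyst resolve names inside the schema:
GRANT USAGE ON SCHEMA reporting TO analyst;
-- Read access to everything currently in it:
GRANT SELECT ON ALL TABLES IN SCHEMA reporting TO analyst;
-- And to tables created in it later:
ALTER DEFAULT PRIVILEGES IN SCHEMA reporting
  GRANT SELECT ON TABLES TO analyst;
```

Note that GRANT ON SCHEMA only covers USAGE/CREATE on the namespace itself; privileges on the contained objects need the ALL TABLES / DEFAULT PRIVILEGES forms above.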

When to create schema, when to create database?

  • If projects/users should be separate and be unaware of each other: databases
  • If projects/users are interrelated, should be able to use each other’s resources: schemas

search path and schema

search path

  • When using schemas, you need to use “qualified” names (schema.object). Alternatively, you can rely on the search path.
  • If an object is referenced without schema qualification, the search path is traversed until a matching object is found.
  • The search path is a list of schemas; whichever match is found first is used.
    • The first schema in the search path that “exists” is the default location for creating new objects.
    • That is the reason that by default objects are created in the public schema.
  • Eg. the following shows tables from all schemas
      \dt *.*
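How the search path affects both lookup and creation (schema names are hypothetical):

```sql
SHOW search_path;               -- default: "$user", public
SET search_path TO ecommerce, public;

-- Unqualified names are now looked up in ecommerce first, then public:
SELECT * FROM orders;           -- resolves to ecommerce.orders if it exists
-- New objects land in the first existing schema on the path:
CREATE TABLE carts (id int);    -- creates ecommerce.carts
```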

Builtin schema

  • public schema: There’s nothing special about the public schema except that it exists by default. It can be dropped, too. It’s simply there when you create a new database, so if you create objects without specifying a schema, they land in the public schema of that database.
  • pg_catalog schema: Contains the system tables and all the built-in data types, functions, and operators. pg_catalog is always effectively and implicitly part of the search path.

PUBLIC role / keyword

  • It’s not exactly a role but a keyword that can be a placeholder for role
  • This keyword usually means “everyone”
  • This is different from public schema. eg. REVOKE CREATE ON SCHEMA public FROM PUBLIC; (first “public” is the schema, the second “public” means “every user”)
  • Historically, PUBLIC had create permission on the public schema regardless of what grants were at play.
    • USAGE privilege granted to PUBLIC: Allows anyone to refer to objects in the public schema.
    • CREATE privilege granted to PUBLIC: Allows anyone to create objects in the public schema.
      • This is problematic!
  • Postgres 15 onwards there is a fix for this issue (building on the pg_database_owner role added in 14)
    • New role pg_database_owner implicitly has the actual owner of the current database as a member.
    • public schema is owned by pg_database_owner now.
    • public schema no longer grants CREATE to PUBLIC.

Postgres Syntax notes

JSONB

  • Prefer jsonb over json if possible

Operators & Functions

  • Checks
    • exists: ? , ?| (any), ?& (all)
      • eg. SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';
      • Not nested
      • works only with keys when objects
      • works with array items
    • containment
      • @> (left contains right)
        • Can operate on nested elements
        • Eg. SELECT doc->'site_name' FROM websites WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';
      • <@ (left is contained in right)
  • Access
    • subscripting [x] (Extract & Modify jsonb)
      • Result of subscript always a jsonb
      • Array indexes start at 0; negative indexes are supported
      • Can access nested element by successive subscripting
      • UPDATE & SET combined will work with subscripts
        • Eg. UPDATE table_name SET jsonb_column_name['a']['b']['c'] = '1';
        • This will create nested parent elements if they don’t exist
        • Unlike jsonb_set, subscripting fills in missing intermediate parts, treating a NULL column as {} (or [] for an integer subscript), etc.
    • ->
      • Array(index) and object(key) access
    • ->>
      • Array(index) and object(key) access as text
    • #> (path)
      • Array(index) and object(key) access
    • #>> (path)
      • Array(index) and object(key) access as text
  • Manipulation
    • || : concat objects
    • - : subtract keys
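A few of the operators above in one place (values are illustrative):

```sql
SELECT '{"a": {"b": 1}, "tags": ["x"]}'::jsonb ? 'a';    -- true  (top-level key exists)
SELECT '{"a": {"b": 1}}'::jsonb @> '{"a": {"b": 1}}';    -- true  (containment, nested)
SELECT '{"a": {"b": 1}}'::jsonb -> 'a';                  -- {"b": 1}  (as jsonb)
SELECT '{"a": {"b": 1}}'::jsonb #>> '{a,b}';             -- 1  (path access, as text)
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;           -- {"a": 1, "b": 2}  (concat)
SELECT '{"a": 1, "b": 2}'::jsonb - 'b';                  -- {"a": 1}  (subtract key)
```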

Updating jsonb values

  • Ways to do it
    • UPDATE & SET
    • jsonb_set
      • Returns target with the item designated by path replaced by new_value, or with new_value added
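Both update styles sketched side by side; the table and column names (`docs`, `body`) are made up:

```sql
-- Subscripting (Postgres 14+):
UPDATE docs SET body['a']['b'] = '1';
-- jsonb_set(target, path, new_value, create_if_missing):
UPDATE docs SET body = jsonb_set(body, '{a,b}', '1', true);
```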

Indexes on jsonb

  • To use (GIN) indexes, apply the operator to the entire jsonb column; picking out a specific field won’t use a simple index on the column. For a specific field inside the document, use an expression index instead.
  • Supported operators: ?, ?|, ?&, @>, jsonpath operators @?, @@
  • Types

    • jsonb_path_ops
      • Supports fewer operators (only @>, @? and @@) but is smaller and faster for those than jsonb_ops
    • jsonb_ops (the default)
      • simple & expression
  • Expression indexes vs Simple indexes (jsonb_ops)

    • Simple index
      • Eg. CREATE INDEX idxgin ON api USING GIN (jdoc);
      • Will store copies of every key and value in the jdoc column
      • More flexible, supports any key but bigger in size and might be slower
    • Expression index
      • Eg. CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));
      • Stores only data found under the tags key in the example
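Querying against those two index shapes, reusing the `api`/`jdoc` names from the examples above:

```sql
-- Can use the simple GIN index (operator applied to the whole column):
SELECT * FROM api WHERE jdoc @> '{"company": "Magnafone"}';
-- Can use the expression index (operator applied to the indexed expression):
SELECT * FROM api WHERE jdoc -> 'tags' ? 'qui';
-- Verify index usage with:
EXPLAIN SELECT * FROM api WHERE jdoc @> '{"company": "Magnafone"}';
```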

Management/Admin

See Operations cheat sheet - PostgreSQL wiki

Postmaster & Backend workers & Backend processes

  • postmaster is the primary daemon process for a PostgreSQL server.
  • The term isn’t used much these days; the primary daemon process is now simply called postgres.
  • Responsibilities
    • Initializes the database system, prepares it for incoming connections, and ensures that the system is in a consistent state.
    • Manages connections (spawning one backend process per connection)
    • Logging
  • Backend processes
    • Config: max_connections
    • Backend processes communicate with each other and with other processes of the instance using semaphores and shared memory to ensure data integrity throughout concurrent data access.
    • i.e. connection = backend process = session (unless using pools)
  • Background workers
    • Config: max_worker_processes
    • The backend processes spin up background workers for actual work if parallelism is enabled.
    • This is different from the background writer, which is an auxiliary process.

Shared Memory

  • As we read things from disk etc., we put them in memory
  • Everything that is shared lives here: pages, WAL records, etc.
  • Every backend process and backend worker and everything else has access to it.
  • Related terms: shared_buffers, buffer pools
  • Provisioned by mmap (?)

Auxiliary Processes

  • background writer: Periodically writes dirty pages out to the OS. This frees up shared memory in the postgres process, but it doesn’t ensure the data actually reached disk; that’s up to the OS, which may keep it in the filesystem cache.
  • checkpointer: Flushes pages + WAL to disk and ensures that things have actually reached the disk.
  • Autovacuum Launcher
    • config: autovacuum_max_workers
    • It launches autovacuum workers
  • wal
    • archiver: Archives WAL segments (can be used for replay/PITR)
    • receiver: Runs where we want to receive the WAL changes (i.e. on a replica).
    • senders: Senders of the WALs to replicas
    • writer: Writing the WAL records from shared memory to disk. Controlled by wal_writer_delay
  • startup process
    • Uses the last checkpoint plus the WAL for REDO (roll-forward recovery).
    • Doesn’t accept any incoming connections until it has replayed the WAL into shared memory, redoing the changes made after the last checkpoint
    • Then you’re back at the same point as before the crash happened.

Cluster

What it’s NOT

  • It is NOT what we popularly understand as a cluster (a compute cluster). This tripped me up for years, man.
    • If you were to do that, you’d end up creating a cluster of postgres clusters. (This is basically bad naming for historic reasons)
  • It’s unrelated to the keyword CLUSTER which allows you to organize a table.

What it is

  • A postmaster / primary postgres process and a group of subsidiary processes
  • These processes manage a shared data directory that contains one or more databases.
  • It’s created for you when you do initdb
  • It’s unusual for a basic or intermediate user to ever need to create or manage multiple clusters, as one cluster can manage multiple DBs
  • Having multiple clusters means having more than one postmaster / primary postgres process running on different ports. This is usually not needed.
  • See PostgreSQL: Documentation: 16: 19.2. Creating a Database Cluster

Replication

See Data Replication

System Metadata

System Catalogs

System Views

Users and Roles

  • ALTER: Modify a role’s attributes
  • GRANT/REVOKE: Groups/memberships leading to inheritance of privileges

Roles

  • select * from pg_roles; same as \du
  • PostgreSQL manages database access permissions using the concept of roles
  • They have nothing to do with OS user.
  • ROLE = user / group of users. (Any role can act as a user, a group, or both)
  • Some attributes: SUPERUSER,CREATEDB,CREATEROLE,INHERIT,LOGIN,REPLICATION,BYPASSRLS (from the CREATE ROLE command)
  • Can own
    • databases
    • objects in that database
  • Can have
    • privileges on objects
  • Can give/grant
    • access to other role(s) on the owned objects
    • membership in its role to another role (once two roles share membership, they can share privileges)
  • Predefined roles

    • Predefined roles were introduced in pg14.
    • Useful for things like granting readonly access etc.
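E.g. read-only access via a predefined role; the `reporting_user` name is made up, while pg_read_all_data is a real predefined role since PG 14:

```sql
CREATE ROLE reporting_user LOGIN PASSWORD 'secret';
-- Predefined role granting SELECT on all tables/views in all schemas:
GRANT pg_read_all_data TO reporting_user;
-- Others include pg_write_all_data and pg_monitor; list them with:
SELECT rolname FROM pg_roles WHERE rolname LIKE 'pg\_%';
```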

Privileges

  • objects can require certain privilege to be accessed.
  • This privilege can be granted to roles
    • Indirectly: from other roles
    • Directly: to objects
    • See GRANT for more info.
  • Inheritance

    • Whether the privileges of the granted role should be “inherited” by the new member.
    • NOTE: It does not apply to the special role attributes set by CREATE ROLE and ALTER ROLE.
      • Eg. Being a member of a role with CREATEDB privilege does not immediately grant the ability to create databases, even if INHERIT attribute is set
    • Controlled in 2 places
      • Role level: Role can have the INHERIT attribute to inherit.
      • Command level: GRANT has WITH INHERIT to explicitly inherit.
  • ALTER

    • Assigned attributes to a ROLE can be modified with ALTER ROLE
    • ALTER DEFAULT PRIVILEGES allows you to set the privileges that will be applied to objects created in the future. (It does not affect privileges assigned to already-existing objects.)

GRANT / REVOKE

  • Used for mutating memberships and privilege to objects
  • GRANT and REVOKE can also be done by a role that is not the owner of the affected object
    • If it is a member of the role that owns the object
    • If it is a member of a role that holds privileges WITH GRANT OPTION on the object.
  • Can be GRANT’ed on 2 things
    • objects
      • GRANT INSERT ON films TO PUBLIC;
      • GRANT ALL PRIVILEGES ON kinds TO manuel;
      • One of: SELECT,INSERT ,UPDATE ,DELETE ,TRUNCATE ,REFERENCES ,TRIGGER ,CREATE ,CONNECT ,TEMPORARY ,EXECUTE ,USAGE ,SET ,ALTER SYSTEM and ALL PRIVILEGES
    • roles
      • Eg. Grant membership in role admins to user joe: GRANT admins TO joe;

Features

Temporary Tables

  • Implemented like regular tables, but uses temp_buffers and uses disk for storing metadata and overflows etc.
  • Unlike regular tables, they are visible only to the session that created them, are not subject to cross-session locking, are not written to WAL, etc.
  • Metadata lives in the PostgreSQL system tables. In addition, for each table one or more files are created on disk (by default, in the same folder as the files for regular tables).
  • Allows different sessions to use the same temporary table name for different purposes
  • Automatically dropped at the end of a session, or optionally at the end of the current transaction based on ON COMMIT
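A sketch of the ON COMMIT variants (table names are made up):

```sql
-- Dropped at session end (the default):
CREATE TEMP TABLE scratch (id int);
-- Rows cleared at the end of each transaction:
CREATE TEMP TABLE per_tx (id int) ON COMMIT DELETE ROWS;
-- Table dropped when the creating transaction commits:
BEGIN;
CREATE TEMP TABLE one_shot (id int) ON COMMIT DROP;
COMMIT;
```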

Issues

  • Autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables.
  • Need to be purged
    • DELETE is bad because MVCC is used for temporary tables as well, and deleting records is slow (See Concurrency)
    • So we TRUNCATE: TRUNCATE simply creates a new file on disk and does an UPDATE of the pg_class table.

Read more

UPSERT

-- canonical
INSERT INTO table_name (col1, col2, col3)
VALUES (val1, val2, val3)
ON CONFLICT conflict_target conflict_action;
-- example
INSERT INTO employees (id, name, email)
VALUES (2, 'Dennis', 'dennisp@weyland.corp')
ON CONFLICT (id) DO UPDATE
  SET name = EXCLUDED.name, email = EXCLUDED.email;
-- conflict_target: (id)
-- conflict_action: DO UPDATE SET ...

Granular Access Control

Row Level Security (RLS)

  • It needs to be enabled on per-table basis
  • Access control is fully modeled inside the database
  • It doesn’t matter anymore who you connect as
  • As long as your request can be appropriately authenticated and mapped to a database role, the policies decide which rows you can access.
  • Row security policies can be specific to commands, or to roles, or to both.
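A minimal sketch; the table, policy, and role names (`documents`, `own_docs`, `app_user`) are made up:

```sql
CREATE TABLE documents (id int, owner text, body text);
-- RLS must be enabled per table:
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each role only sees rows it owns (current_user is the connected role):
CREATE POLICY own_docs ON documents
  USING (owner = current_user);

-- Policies can be restricted to specific commands and roles:
CREATE POLICY insert_own ON documents
  FOR INSERT TO app_user            -- app_user: hypothetical role
  WITH CHECK (owner = current_user);
```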

Nested/Sub Transactions

BEGIN; -- Start the main transaction
-- Perform some operations
INSERT INTO accounts (account_id, amount) VALUES (1, 1000);
-- Create a savepoint
SAVEPOINT my_savepoint;
-- Perform operations within the nested transaction
INSERT INTO transactions (trans_id, account_id, amount) VALUES (101, 1, 100);
UPDATE accounts SET amount = amount - 100 WHERE account_id = 1;
-- Decide to roll back the nested transaction
ROLLBACK TO SAVEPOINT my_savepoint;
-- The nested transaction's changes are undone, but the main transaction is still in progress
-- Continue with other operations
INSERT INTO logs (message) VALUES ('Nested transaction was rolled back.');
-- Finally, commit the main transaction
COMMIT;
  • See Database Transactions
  • We want to avoid such nested transactions if possible.
  • Let the application logic create the transaction and put things in. If possible don’t try to handle this at the DB level.
  • Subtransactions can commit or abort without affecting their parent transactions, allowing parent transactions to continue.
  • Started using SAVEPOINT (or implicitly via PL/pgSQL EXCEPTION blocks)
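In PL/pgSQL, a block with an EXCEPTION clause implicitly creates a savepoint, i.e. a subtransaction. A sketch (function and table names are made up):

```sql
CREATE OR REPLACE FUNCTION safe_insert() RETURNS void AS $$
BEGIN
  -- Entering a block with an EXCEPTION clause starts a subtransaction:
  BEGIN
    INSERT INTO accounts (account_id, amount) VALUES (1, 1000);
  EXCEPTION WHEN unique_violation THEN
    -- Only the subtransaction is rolled back; the outer transaction continues
    RAISE NOTICE 'duplicate, skipping';
  END;
END;
$$ LANGUAGE plpgsql;
```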

WAL

  • Main idea: Changes to data files (where tables and indexes reside) must be written only after those changes have been logged
  • With the WAL, we will be able to recover the database so we need not flush pages on every transaction
  • When recovering from the WAL after a crash, we replay forward from the last checkpoint created by the checkpointer

Synchronous v/s Async commit

  • It is possible to have both synchronous and asynchronous commit transactions running concurrently
  • Certain transactions are always synchronous, eg. DROP TABLE, Two Phase Commit (2PC) prepared transaction etc.
  • Synchronous

    Usually commits are synchronous: if there’s a commit, there’s a WAL flush, and the commit succeeds only if the flush succeeded. (This is the default)

  • Asynchronous

    • PostgreSQL allows you to do asynchronous commits as well.
    • This introduces the risk of data loss. (not data corruption)
    • Should not be used if the client will take external actions relying on the assumption that the transaction will be remembered.
    • But for things like Event Logging, maybe this would be fine
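Asynchronous commit is toggled per session or per transaction rather than globally. A sketch (`event_log` is a hypothetical table):

```sql
-- For one transaction only:
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO event_log (message) VALUES ('non-critical event');
COMMIT;  -- returns before the WAL is flushed; a crash may lose this row
```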

WAL flush v/s Data flush

  • The WAL file is written sequentially, and so the cost of syncing the WAL is much less than the cost of flushing the data pages.
  • If DB is handling many small concurrent transactions, one fsync of the WAL file may suffice to commit many transactions. commit_delay can be used to increase the window of transactions to be flushed.

Journaling filesystems

  • If the underlying filesystem has journaling support (cf. the SQLite VFS interface), we can probably skip database-level journaling and gain some performance, but this is usually not needed and unnecessarily complex. The reverse also holds: we can disable filesystem journaling instead.

MVCC Implementation

See MVCC

VACUUM

See A Detailed Understanding of MVCC and Autovacuum Internals in PostgreSQL 14

Terms

  • Live Tuples: Tuples that are inserted or up-to-date; they can be read or modified.
  • Dead Tuples: Tuples that were changed (updated/deleted) and are unavailable to any future transactions. (This is what VACUUM clears)

Usecase

  • MVCC Version Cleanup

    • VACUUM is like a Garbage collector for older versions of rows created by MVCC
    • transaction id is a 32-bit integer
      • the VACUUM process is responsible for making sure that the id does not overflow.
      • Never disable VACUUM, or you risk transaction ID wraparound
    • See Database Transactions
  • Run ANALYZE on tables
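A sketch of the common manual invocations (`orders` is a made-up table name):

```sql
-- Reclaim dead tuples and refresh planner stats in one go:
VACUUM (ANALYZE, VERBOSE) orders;
-- Check dead-tuple counts and last (auto)vacuum times:
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```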

Freezing

autovaccum conditions

Internals

Pages

  • AKA Data blocks
  • Page number starts from 0

Major updates and Replication

Major Upgrades

Other extra info on Postgres

Relevant Notes

  • Initializing a postgres data directory: After logging into the postgres user, the initdb command creates a directory in the filesystem with the necessary postgres files; you can then start a postgres server pointing it at that directory.
  • Tablespaces: All tables are by default created in pg_default tablespace, creating in a tablespace does not affect the logical SQL schema.

Tools

Routing

Others

Extension

Ecosystem

Other comments

  • tw1

    Some PostgreSQL footguns:

    • default configuration
    • long transactions mixed with OLTP workload
    • repmgr (and other HA tools not based on consensus algorithms)
    • LISTEN/NOTIFY
    • “pg_dump is a utility for backing up a PostgreSQL database” says the official doc
    • moving/copying PGDATA and ignoring glibc version change
    • hot_standby_feedback on/off dilemma
    • partitioning of tables having high number of indexes and receiving QPS >> 1k
    • “setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions” says the official doc
    • using replication slots w/o setting max_slot_wal_keep_size
    • relying only on statement_timeout & idle_in_transaction_session_timeout and thinking it’s enough (lack of transaction_timeout)
    • data types “money”, “enum”, “timestamp” (3 different cases)
    • int4 PK
    • massive DELETE
    • attempts to use transactional version of DDL (e.g., CREATE INDEX) under heavy loads
    • subtransactions
    • DDL under load without lock_timeout