tags : Database
FAQ
How does Supabase work?
- Supabase is more of a BaaS (Backend-as-a-Service) than a Database service
- Every table, view, and stored procedure is automatically mapped to a RESTful API.
- Supabase uses PostgREST
- See
What’s PostgREST?
- Also see REpresentational State Transfer; PostgREST is not really REST.
- PostgREST is based on the idea that if you are accessing the database as a particular user, Row Level Security (RLS) will ensure that you have access to the data you need.
- Users need to be authenticated and authorized.
- As this acts like the backend now, some kind of connection pool would be needed
- Its API looks like a mixture of RESTful and GraphQL.
How to use EXPLAIN?
- Can give you the cost beforehand
- Can estimate how many rows it’ll return, etc.
How to use ANALYZE?
- It collects the statistics that the query planner uses, which are what we see when we run EXPLAIN
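A minimal illustration of both (the `users` table and `email` column are hypothetical):

```sql
-- Show the planner's estimated cost and row count (no execution)
EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';

-- Actually run the query, showing estimates next to real timings/rows
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'a@example.com';

-- Refresh the statistics the planner uses, for a single table
ANALYZE users;
```

Note that `ANALYZE` (statistics collection) and the `ANALYZE` option of `EXPLAIN` are different things; the latter executes the query for real.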
Postgres by default runs in read committed, so transactions are not serializable. How is this acceptable?
- With read committed, a query in a transaction:
  - Sees only data committed before the query began
  - Never sees uncommitted data
  - Never sees changes committed by concurrent transactions during the query’s execution
- This is slightly different for writes (See MVCC)
- The partial transaction isolation provided by Read Committed mode is adequate for many applications
Postgres Schema and Database Objects
Meaning of schema
- The concept of a schema is not very consistent between DBMSs.
  - MySQL calls schemas “databases”: a MySQL “database” is effectively a single schema, with no extra namespace level inside it.
- Schemas are the implementation in postgres of the namespace concept. A schema is a collection of logical structures of data.
- This is different from the general idea of a “database schema”
- `SHOW SEARCH_PATH;`
Hierarchy of “things” in postgres
- A single machine/host can have multiple `clusters`; each cluster runs on a port
- A `cluster` can have many `databases`, `roles`/users and more cluster-level `objects`
  - i.e `role` and `database` are couples, but `role` keeps cheating, so a single `role` can be the owner of multiple `databases`.
- Then we have database-level `objects`
  - Belonging to some `schema` (namespaced)
    - Eg. tables, views, sequences, functions, procedures, types, domains, etc.
    - Eg. indexes and triggers are always created in the same schema as the table they are for.
  - Not belonging to some schema (no namespace)
    - Eg. Extensions
- There can be `schemas` with the same name in different `databases` (but not in the same `database`). `objects` from one `database` cannot interact with `objects` from another one (catalogs are kept separated). Eg. we can’t join across two databases; use `schemas` instead.
- A `database` can have multiple `schemas`
Understanding usecase of schema and search_path in postgres
- Organization
  - Think of a `schema` like a directory and database objects as files. You can do without directories, but they help organize.
  - Eg. in a single database, you can have schemas like `ecommerce`, `auth` etc.
  - Helpful in dumping, selecting all tables in a schema, etc.
- Access control
  - This has roots in the usecase where a database has multiple users. In our web application world, we usually have one user accessing it, so we set the `search_path` in the connection string.
  - Grant `privileges` to `roles` at the `schema` level, so that all `objects` inside that `schema` get those accesses.
When to create a schema, when to create a database?
- If projects/users should be separate and unaware of each other: `databases`
- If projects/users are interrelated and should be able to use each other’s resources: `schemas`
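A sketch of the two options (database, schema, and table names are made up):

```sql
-- Unrelated projects: separate databases (no cross-database joins)
CREATE DATABASE analytics;

-- Interrelated projects: separate schemas in one database
CREATE SCHEMA ecommerce;
CREATE SCHEMA auth;

-- cross-schema joins work fine:
-- SELECT * FROM ecommerce.orders o JOIN auth.users u ON u.id = o.user_id;
```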
search path and schema
search path
- When using a schema, you need to use “qualified” names. Alternatively you can use the `search_path`.
- If an `object` is referenced without `schema` qualification, the `search_path` is traversed until a matching object is found. The `search_path` is a list of schemas; whichever match is found first is used.
  - The first schema in the `search_path` that exists is the default location for creating new objects.
  - That is the reason that, by default, objects are created in the `public` schema.
- Eg. the following shows tables from all schemas: `\dt *.*`
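A quick demo of the resolution rules above (the `ecommerce` schema and `app_user` role are hypothetical):

```sql
SHOW search_path;                      -- typically: "$user", public

SET search_path TO ecommerce, public;  -- per-session override
CREATE TABLE orders (id int);          -- lands in ecommerce (first existing schema)
SELECT * FROM orders;                  -- resolved via search_path, no qualification

-- persist a default search_path for one role
ALTER ROLE app_user SET search_path TO ecommerce, public;
```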
Builtin schemas
- `public` schema: There’s nothing special about the public schema except that it exists by default. It can be dropped, too. It’s just there when you create a new `database`, so if you create `objects` without specifying any schema, they land in the `public` schema of the `database`.
- `pg_catalog` schema: Contains the system tables and all the built-in data types, functions, and operators. `pg_catalog` is always effectively and implicitly part of the `search_path`.
Related access control notes
PUBLIC
role / keyword
- It’s not exactly a
role
but a keyword that can be a placeholder forrole
- This keyword usually means “everyone”
- This is different from
public
schema. eg.REVOKE CREATE ON SCHEMA public FROM PUBLIC;
(first “public” is the schema, the second “public” means “every user”) - It used to gives create permission on the
public schema
to “all” users despite what grant is at play.USAGE
privilege granted toPUBLIC
: Allows anyone to refer to objects in the public schema.CREATE
privilege granted toPUBLIC
: Allows anyone to create objects in the public schema.- This is problematic!
- Postgrs14 onwards there is a fix for this issue
- new role
pg_database_owner
, implicitly has the actual owner of the current database as a member. - public schema is owned by
pg_database_owner
now. - public schema does not have
CREATE
privileges anymore.
- new role
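On older versions the same hardening can be applied by hand; a common sketch:

```sql
-- Pre-15 hardening: stop everyone from creating objects in public
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- Optionally keep lookup access for everyone
GRANT USAGE ON SCHEMA public TO PUBLIC;
```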
Postgres Syntax notes
JSONB
- Prefer jsonb over json if possible
Operators & Functions
- Checks
  - exists: `?`, `?|` (any), `?&` (all)
    - eg. `SELECT '["foo", "bar", "baz"]'::jsonb ? 'bar';`
    - Not nested
    - works with array items
    - works only with keys when items in an array are objects
  - containment: `@>`, `<@`
    - Can operate on nested elements
    - Eg. `SELECT doc->'site_name' FROM websites WHERE doc @> '{"tags":[{"term":"paris"}, {"term":"food"}]}';`
- Access
  - subscripting `[x]` (extract & modify jsonb)
    - The result of a subscript is always `jsonb`
    - Array indexes start at 0, -ve indexes supported
    - Can access nested elements by successive subscripting
  - `->`: array (index) and object (key) access
  - `->>`: array (index) and object (key) access, as text
  - `#>` (path): access via a path
  - `#>>` (path): access via a path, as text
- Manipulation
  - `||`: concat objects
  - `-`: subtract keys
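The access and manipulation operators above, on literal values:

```sql
SELECT ('{"a": {"b": [10, 20]}}'::jsonb)['a']['b'][1];  -- 20 (jsonb)
SELECT '{"a": 1}'::jsonb -> 'a';                        -- 1 (jsonb)
SELECT '{"a": 1}'::jsonb ->> 'a';                       -- 1 (text)
SELECT '{"a": {"b": 2}}'::jsonb #>> '{a,b}';            -- 2 (text)
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;          -- {"a": 1, "b": 2}
SELECT '{"a": 1, "b": 2}'::jsonb - 'b';                 -- {"a": 1}
```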
Updating jsonb values
It’s important to note that in each of these examples, you’re not actually updating a single field of the JSON data. Instead, you’re creating a temporary, modified version of the data, and assigning that modified version back to the column. In practice, the result should be the same, but keeping this in mind should make complex updates, like the last example, more understandable. SO answer
- Ways to do it
  - `UPDATE` & `SET`
    - `UPDATE` & `SET` combined will work with subscripts
    - Eg. `UPDATE table_name SET jsonb_column_name['a']['b']['c'] = '1';`
      - This will create nested parent elements if they don’t exist
      - Unlike(?) `jsonb_set`, it’ll try to fill things with null, and consider a null column as `{}` etc.
    - When using this, we can also use `||` for a “merge”-like operation; this is useful when doing ON CONFLICT UPDATE for jsonb. See How can I merge an existing jsonb field when upserting a record in Postgres?
      - But this merge is only shallow; it doesn’t work with nested JSON.
      - For nested JSON check these:
        - json - Merging JSONB values in PostgreSQL? - Stack Overflow (Vanilla PG, has nice answers)
        - Use a separate Procedural Language add-on like PLV8
        - https://gist.github.com/phillip-haydon/54871b746201793990a18717af8d70dc (function using plv8)
  - `jsonb_set`
    - Returns the target with the item designated by `path` replaced by new_value, or with new_value added.
    - Useful when you know the structure of the json.
    - It can also do merging if you specify the path, but it’s not very useful when you’re unaware of the structure.
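Both approaches in one place, reusing the `websites`/`doc` example from above (the `id` column is assumed):

```sql
-- Replace (or add) a value at a known path
UPDATE websites
SET doc = jsonb_set(doc, '{site_name}', '"New name"')
WHERE id = 1;

-- Shallow merge on upsert via ||
INSERT INTO websites (id, doc)
VALUES (1, '{"tags": []}')
ON CONFLICT (id) DO UPDATE
  SET doc = websites.doc || EXCLUDED.doc;
```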
Indexes on jsonb
- To use (GIN) indexes, we need to apply the operator on the entire column of jsonb type; picking out a specific field will not use a simple index. Alternatively, we can use an expression index for a specific field in the json document.
- Supported operators: `?`, `?|`, `?&`, `@>`, jsonpath operators `@?`, `@@`
- Types
  - `jsonb_path_ops`
    - Seems to have some advantage over `jsonb_ops` that I don’t care about at the moment
  - simple & expression
- Expression indexes vs simple indexes (jsonb_ops)
  - Simple index
    - Eg. `CREATE INDEX idxgin ON api USING GIN (jdoc);`
    - Will store copies of every key and value in the jdoc column
    - More flexible, supports any key, but bigger in size and might be slower
  - Expression index
    - Eg. `CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));`
    - Stores only data found under the `tags` key in the example
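Continuing the `api`/`jdoc` example, the difference shows up in which queries can use which index:

```sql
CREATE INDEX idxgin ON api USING GIN (jdoc);

-- Can use idxgin: operator applied to the whole column
SELECT * FROM api WHERE jdoc @> '{"company": "Magnafone"}';

-- Cannot use idxgin: operator applied to an extracted field
SELECT * FROM api WHERE jdoc -> 'tags' ? 'qui';

-- An expression index makes that second shape indexable
CREATE INDEX idxgintags ON api USING GIN ((jdoc -> 'tags'));
```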
Management/Admin
See Operations cheat sheet - PostgreSQL wiki
Postmaster & Backend processes & Background workers
- `postmaster` is the primary daemon process for a PostgreSQL server.
  - The term isn’t used much these days; the primary daemon process is simply called `postgres` now.
  - Responsibilities
    - Initializes the database system, prepares it for incoming connections, and ensures that the system is in a consistent state.
    - Manages connections (spun-off `backend processes`)
    - Logging
- Backend processes
  - Config: `max_connections`
  - Backend processes communicate with each other and with other processes of the instance using semaphores and shared memory, to ensure data integrity throughout concurrent data access.
  - i.e `connection` = `backend process` = `session` (unless using `pools`)
- Background workers
  - Config: `max_worker_processes`
  - The `backend processes` spin up `background workers` for actual work if parallelism is enabled.
  - This is different from the `background writer`, which is an auxiliary process.
Shared Memory
- As we read things from the disk etc., we put them in memory
- Everything that is shared lives here: pages, WAL records etc.
- Every `backend process` and `background worker` and everything else has access to it.
- Related terms: `shared_buffers`, buffer pools
- Provisioned by mmap (?)
Auxiliary Processes
- `background writer`: Writes dirty pages to the OS periodically. It frees up shared memory in the `postgres` process but doesn’t ensure that the data actually got to disk; that’s up to the OS, whether it keeps it in the fs cache or not.
- `checkpointer`: Flushes pages+WAL to disk. Ensures that things have reached the disk.
- autovacuum launcher
  - Config: `autovacuum_max_workers`
  - It launches the `autovacuum` workers
- `wal` processes
  - `archiver`: Archives all the WAL (can be used for replay)
  - `receiver`: Runs where we want to receive the WAL changes.
  - `senders`: Senders of the WALs to replicas
  - `writer`: Writes the WAL records from shared memory to disk. Controlled by `wal_writer_delay`
- `startup` process
  - This combines `checkpointer` and `wal` for REDO (roll-forward recovery).
  - Doesn’t let any incoming connection in until it has loaded the WAL into shared memory and replayed the changes since the last `checkpoint`, dirtying that data.
  - Now you’re back at the same point before the crash happened.
Cluster
What it’s NOT
- It is NOT what we popularly understand as a cluster (a compute cluster). This tripped me up for years man.
  - If you were to do that, you’d end up creating a cluster of postgres clusters. (This is basically bad naming for historic reasons)
- It’s unrelated to the keyword `CLUSTER`, which lets you physically reorder a table.
What it is
- A `postmaster` / primary postgres process and a group of subsidiary processes
- These processes manage a shared data directory that contains one or more databases.
- It’s created for you when you do `initdb`
- It’s unusual for a basic or intermediate user to ever need to create clusters or manage multiple clusters, as one cluster can manage multiple DBs
- Having multiple clusters means having >1 `postmaster` / primary `postgres` processes running on different ports. This is usually not needed.
- See PostgreSQL: Documentation: 16: 19.2. Creating a Database Cluster
Replication
See Data Replication
System Metadata
System Catalogs
System Views
Users and Roles
- ALTER: Modify a role’s attributes
- GRANT/REVOKE: Groups/memberships leading to inheritance of privileges
Roles
- `select * from pg_roles;` same as `\du`
- PostgreSQL manages database access permissions using the concept of `roles`
- They have nothing to do with OS users.
- `ROLE` = user / group of users. (Any role can act as a user, a group, or both)
- Some attributes: SUPERUSER, CREATEDB, CREATEROLE, INHERIT, LOGIN, REPLICATION, BYPASSRLS (from the `CREATE ROLE` command)
- Can own
  - `databases`
  - `objects` in that `database`
- Can have `privileges` on `objects`
- Can give/grant
  - `access` to other `role`(s) on the owned `objects`
  - `membership` in its `role` to another `role` (once two roles share `membership`, they can share `privileges`)
- Predefined roles
  - Predefined roles were introduced in pg14.
  - Useful for things like granting readonly access etc.
- “postgres” role
  - The default `postgres` role doesn’t have an “empty password”; it literally has no password, which means one cannot connect using a password-based method.
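A small sketch of these pieces (the `app_user` and `readers` role names are made up):

```sql
-- A login role (a "user") with some attributes
CREATE ROLE app_user LOGIN PASSWORD 'secret' CREATEDB;

-- A group role, and membership in it
CREATE ROLE readers NOLOGIN;
GRANT readers TO app_user;

-- Inspect roles and their attributes
SELECT rolname, rolcanlogin, rolcreatedb FROM pg_roles;
```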
Privileges
- `objects` can require certain `privileges` to be accessed.
- These `privileges` can be granted to `roles`
  - Indirectly: from other roles
  - Directly: on `objects`
  - See GRANT for more info.
- Inheritance
  - Whether the `privileges` of the granted `role` should be “inherited” by the new member.
  - NOTE: It does not apply to the special role attributes set by CREATE ROLE and ALTER ROLE.
    - Eg. being a member of a role with the CREATEDB attribute does not immediately grant the ability to create databases, even if INHERIT is set
  - Controlled in 2 places
    - Role level: a role can have the `INHERIT` attribute to inherit.
    - Command level: `GRANT` has `WITH INHERIT` to explicitly inherit.
- ALTER
  - Attributes assigned to a `ROLE` can be modified with `ALTER ROLE`
  - `ALTER DEFAULT PRIVILEGES` allows you to set the privileges that will be applied to objects created in the future. (It does not affect privileges assigned to already-existing objects.)
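For example (role, schema names hypothetical):

```sql
-- Modify a role attribute
ALTER ROLE app_user CREATEDB;

-- Future tables created by app_user in schema ecommerce will be
-- readable by role readers (existing tables are unaffected)
ALTER DEFAULT PRIVILEGES FOR ROLE app_user IN SCHEMA ecommerce
  GRANT SELECT ON TABLES TO readers;
```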
GRANT / REVOKE
- Used for mutating `memberships` and privileges on objects
- GRANT and REVOKE can also be done by a `role` that is not the owner of the affected object
  - If it is a member of the role that owns the object
  - If it is a member of a role that holds privileges `WITH GRANT OPTION` on the `object`.
- Can be GRANT’ed on 2 things
  - `objects`
    - `GRANT INSERT ON films TO PUBLIC;`
    - `GRANT ALL PRIVILEGES ON kinds TO manuel;`
    - One of: SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, EXECUTE, USAGE, SET, ALTER SYSTEM and `ALL PRIVILEGES`
  - `roles`
    - Eg. grant membership in role `coolbois` to user `joe`: `GRANT coolbois TO joe;`
Links
- Application users vs. Row Level Security - 2ndQuadrant | PostgreSQL
- PostgreSQL: Documentation: 8.1: Database Roles and Privileges
- PostgreSQL: Documentation: 8.1: Role Attributes
Vertically Scaling Postgres (Finetuning with ZFS)
- https://people.freebsd.org/%7Eseanc/postgresql/scale15x-2017-postgresql_zfs_best_practices.pdf
- Everything I’ve seen on optimizing Postgres on ZFS
- Templates/References
TLS/SSL for postgres
- TLS/SSL in a trusted network is not really necessary for postgres
- But you still would want to have SCRAM even in trusted networks
Features
Temporary Tables
- Implemented like regular tables, but use `temp_buffers` and use disk for storing metadata and overflows etc.
- Unlike regular tables, they cannot be used by multiple connections at the same time, need less locking, are not written to the WAL, etc.
- Located in PostgreSQL system tables. In addition, for each table, one or more files are created on disk (by default, in the same folder as the files for regular tables).
- Allows different sessions to use the same temporary table name for different purposes
- Automatically dropped at the end of a session, or optionally at the end of the current transaction based on `ON COMMIT`
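A typical transaction-scoped usage (table/column names made up):

```sql
BEGIN;
CREATE TEMP TABLE staging (id int, payload jsonb) ON COMMIT DROP;
INSERT INTO staging VALUES (1, '{"x": 1}');
-- ... use staging within this transaction ...
COMMIT;  -- the table is dropped here
```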
Issues
- The autovacuum daemon cannot access and therefore cannot vacuum or analyze temporary tables.
- They need to be purged
  - `DELETE` (of all rows) is bad because MVCC is used for temporary tables as well and deleting records is slow (See Concurrency)
  - So we `TRUNCATE`: `TRUNCATE` simply creates a new file on disk and does an UPDATE of the `pg_class` table.
Read more
UPSERT
-- canonical
INSERT INTO table_name (col1, col2, col3)
VALUES (val1, val2, val3)
ON CONFLICT conflict_target conflict_action;

-- example
INSERT INTO employees (id, name, email)
VALUES (2, 'Dennis', 'dennisp@weyland.corp')
ON CONFLICT (id) DO UPDATE
  SET name = EXCLUDED.name, email = EXCLUDED.email;
-- conflict_target: (id)
-- conflict_action: DO UPDATE SET ...
Granular Access Control
Row Level Security (RLS)
- It needs to be enabled on a per-table basis
- Access control is fully modeled inside the database
  - It doesn’t matter who you connect as anymore
  - As long as your request can be authenticated into an appropriate database role, the policies on the table decide which rows you get.
- Row security policies can be specific to commands, or to roles, or to both.
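A minimal policy sketch (table and policy names are made up):

```sql
CREATE TABLE documents (id int, owner text, body text);
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each role sees only its own rows; applies to all commands by default
CREATE POLICY per_owner ON documents
  USING (owner = current_user);

-- RLS filters rows; normal privileges are still needed on the table
GRANT SELECT, INSERT, UPDATE, DELETE ON documents TO PUBLIC;
```

Note that table owners and superusers bypass RLS unless `FORCE ROW LEVEL SECURITY` is set.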
Nested/Sub Transactions
BEGIN; -- Start the main transaction
-- Perform some operations
INSERT INTO accounts (account_id, amount) VALUES (1, 1000);
-- Create a savepoint
SAVEPOINT my_savepoint;
-- Perform operations within the nested transaction
INSERT INTO transactions (trans_id, account_id, amount) VALUES (101, 1, 100);
UPDATE accounts SET amount = amount - 100 WHERE account_id = 1;
-- Decide to roll back the nested transaction
ROLLBACK TO SAVEPOINT my_savepoint;
-- The nested transaction's changes are undone, but the main transaction is still in progress
-- Continue with other operations
INSERT INTO logs (message) VALUES ('Nested transaction was rolled back.');
-- Finally, commit the main transaction
COMMIT;
- See Database Transactions
- We want to avoid such nested transactions if possible.
- Let the application logic create the transaction and put things in. If possible don’t try to handle this at the DB level.
- Subtransactions can commit or abort without affecting their parent transactions, allowing parent transactions to continue.
- Started using `SAVEPOINT` (SQL) / `EXCEPTION` (PL/pgSQL)
WAL
- Main idea: Changes to data files (where tables and indexes reside) must be written only after those changes have been logged
- With the WAL, we will be able to recover the database so we need not flush pages on every transaction
- When restoring the WAL after a crash, we replay from the last `checkpoint` created by the `checkpointer`
Synchronous v/s Async commit
- It is possible to have both synchronous and asynchronous commit transactions running concurrently
- Certain transactions are always synchronous, eg. DROP TABLE, Two Phase Commit (2PC) prepared transaction etc.
- Synchronous
  - Usually commits are synchronous, i.e. if there’s a commit, there’ll be a WAL flush, and the commit is successful only if the flush was successful. (This is the default)
- Asynchronous
  - PostgreSQL allows you to do asynchronous commits as well.
  - This introduces the risk of data loss (not data corruption).
  - Should not be used if the client will take external actions relying on the assumption that the transaction will be remembered.
  - But for things like event logging, this may be fine
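Async commit can be opted into per transaction (the `event_log` table is hypothetical):

```sql
-- Per-transaction opt-in: small data-loss window, no corruption risk
BEGIN;
SET LOCAL synchronous_commit TO off;
INSERT INTO event_log (msg) VALUES ('something happened');
COMMIT;  -- may return before the WAL record is flushed to disk
```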
WAL flush v/s Data flush
- The WAL file is written sequentially, and so the cost of syncing the WAL is much less than the cost of flushing the data pages.
- If DB is handling many small concurrent transactions, one fsync of the WAL file may suffice to commit many transactions.
- `commit_delay` can be used to widen that window so more transactions get flushed together.
Journaling filesystems
- If the underlying filesystem has journaling support, one could in principle skip some of the database’s own journaling for a performance gain; but this is usually not needed and unnecessarily complex. It can also be vice versa: we can disable filesystem journaling instead.
MVCC Implementation
See MVCC
VACUUM
See A Detailed Understanding of MVCC and Autovacuum Internals in PostgreSQL 14
Terms
- Live Tuples : Tuples that are Inserted or up-to-date or can be read or modified.
- Dead Tuples : Tuples that are changed (Updated/Deleted) and unavailable to be used for any future transactions. (This is what VACUUM clears)
Usecase
- MVCC version cleanup
  - VACUUM is like a garbage collector for older versions of rows created by MVCC
  - `transaction id` is a 32-bit integer
    - the `VACUUM` process is responsible for making sure that the id does not overflow.
    - Never disable VACUUM, else transaction id wraparound
  - See Database Transactions
- Run ANALYZE on tables
Freezing
autovacuum conditions
Views & Materialized views
See SQL for more info on “views” in general.
Maintaining views
- Theory
  - See pgivm docs
- Implementation
  - `REFRESH MATERIALIZED VIEW`
    - complete re-computation of contents
  - Incremental View Maintenance (IVM) (pg extension)
    - https://github.com/sraoss/pg_ivm
    - more efficient compared to `REFRESH`
Internals
Pages
- AKA Data blocks
- Page number starts from 0
Major updates and Replication
Major Upgrades
Other extra info on Postgres
Psycopg2 notes
Core Concepts
Connection > Transaction (each `with conn` block) > Cursor
- Connection
  - Encapsulates a database session
  - Thread safe and can be shared among many threads.
    - There’s a big problem here. (See my psycopg2 FAQ section in this note)
  - Closing a connection without committing the changes first will cause any pending change to be discarded
- Transactions
  - Transactions are handled by the `connection` class
    - The `connection` is responsible for terminating its transaction, calling either `commit()` or `rollback()`
    - If `close()` is called or something goes wrong in the middle of a transaction, it is dropped by the db server
  - autocommit mode
    - A `connection` can be set to autocommit mode
    - no rollback possible
    - All the commands executed will be immediately committed
  - Transactions are per-session, i.e. per-connection.
- Cursor
  - Not thread safe
  - Client side cursor
    - Allows Python code to execute PostgreSQL commands in a database session (`connection` -> `transaction`)
    - Bound to a connection for its entire lifetime
  - Server side cursor
    - When creating a cursor, if a name is specified, it’ll be a server side cursor
    - By default a named cursor is declared without `SCROLL` and `WITHOUT HOLD`
Context manager (with)
|  | after `with` exit, is it closed? | after `with` exit, is the transaction committed? |
|---|---|---|
| connection | NO | YES, if no exception (i.e. the transaction is terminated) |
| cursor | YES, any resources are released | NO, there is no notion of a transaction here |
- `with` can be used with both `connection` and `cursor`, and the `with` exit action only applies to the object it’s used on. Eg. having a `connection` but only using `with` on the `cursor` will NOT close the `connection`.
  - In a way, using `with` on a `connection` is conceptually more like using a context manager for the `transaction`
- A `connection` can be used in more than one `with` statement (i.e. more than one context)
  - Each with block is effectively wrapped in a separate `transaction`
- Since the context manager will not call `connection.close()` for you, if you don’t want a long running database session, make sure to call it wherever it seems fit.
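A sketch of these semantics (assumes a reachable server; `your_dsn` is a placeholder connection string, and the `logs` table is made up):

```python
import psycopg2

conn = psycopg2.connect(your_dsn)  # your_dsn: hypothetical DSN

with conn:                      # commits on clean exit, rolls back on exception
    with conn.cursor() as cur:  # closes the cursor; says nothing about the txn
        cur.execute("INSERT INTO logs (message) VALUES (%s)", ("hello",))

# the connection is still open and reusable; each `with conn` is a new transaction
with conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM logs")
        print(cur.fetchone())

conn.close()  # `with` never does this for you
```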
psycopg2 FAQ
- In the docs there are mentions of `command`; it simply means an SQL query. Got me confused.
- Do we need to commit SELECT(s)?
  - A SELECT runs inside a transaction too
  - You can get by without committing it, but it’s always good to commit (context managers will do it for you)
- `.commit()` vs SQL `COMMIT`
  - See python - Postgres: cursor.execute(“COMMIT”) vs. connection.commit() - Stack Overflow
  - TLDR: prefer whatever the api provides unless you have other reasons
- On closing transactions and connections
  - https://github.com/psycopg/psycopg2/issues/1044
    - Leaving a transaction open is a worse outcome than keeping a connection open
    - “I prefer to have 300 leaking connections open than 3 transactions forgot open on a busy server.”
  - Is it required to close a Psycopg2 connection at the end of a script?
    - It’s good practice to close the connection as soon as you don’t need it
    - Not closing the connection might cause weird errors in the logs on the pg side
- What about using psycopg2 in a multi-threaded setting?
  - Multiple concerns
    - According to the docs, `connection` objects are thread-safe and shareable among threads; `cursors` are not.
      - But cursors created from the same context (connection) are not isolated!
    - On top of that, it’s not super clear what a `connection` object being shareable among threads actually means.
    - You need to be very careful when sharing a connection across threads, because commits and rollbacks are performed for the entire connection. If you have a multistep transaction and another thread performs a commit in the middle, you may not like the result.
  - Given all this general confusion, go with the general rule: each thread should have its own database connection.
  - Now we have 2 options
    - Create a new connection in each thread and close it
      - Something like: `with psycopg2.connect(your_dsn) as conn:`
    - Use a connection pool
      - With psycopg2, the context manager is not supported with pools, just something to be aware of.
      - We should be releasing the connection instead of closing it
      - Some more context: Transaction Handling with Psycopg2 | Libelli (might not be 100% correct)
- Long running connections
  - The difficulty with long-lived connections is that you might not be entirely sure that they’re still there.
  - Generally, try to close the connection asap if possible
  - If using something like a webserver, re-creating connections can be expensive. This is when you’d want to use connection pools, server-side or client-side based on the architecture of your application
To read (TODO)
- https://www.psycopg.org/docs/usage.html
- https://www.psycopg.org/docs/advanced.html#connection-and-cursor-factories
- https://www.psycopg.org/docs/faq.html
- https://discuss.python.org/t/stuck-with-psycopg2-multithreading/39918/3
- https://www.timescale.com/blog/when-and-how-to-use-psycopg2/
Relevant Notes
- Initializing a postgres data directory: after logging in as the `postgres` user, you can create a directory with the necessary postgres files with the `initdb` command. It creates a directory on the file system, and then you can start a postgres server pointing it at that directory.
command. It creates a directory in the file system and then you can start a postgres server directing it to that directory. - Tablespaces: All tables are by default created in
pg_default
tablespace, creating in a tablespace does not affect the logical SQL schema.
Tools
Routing
Others
- Ways to capture changes in Postgres | Hacker News
- https://github.com/zalando/patroni
- https://github.com/pganalyze/libpg_query
- https://github.com/wal-g/wal-g
- https://github.com/pgbackrest/pgbackrest
- https://github.com/cybertec-postgresql/pgwatch2
- https://github.com/citusdata/pg_cron
- https://github.com/cybertec-postgresql/pg_timetable
Extension
- Postgres Extensions Overview | Supabase Docs
- Pg_jsonschema – JSON Schema Support for Postgres | Hacker News
- pgvector: Embeddings and vector similarity | Supabase Docs
Ecosystem
- psql: Psql Tips | Hacker News | https://github.com/okbob/pspg
- pgcli: CLI for Postgres with auto-completion and syntax highlighting. (Can be used instead of psql)
- gunnarmorling/pgoutput-cli: Examining the output of logical replication slots using pgoutput encoding
- https://github.com/dalibo/pg_activity : htop for pg
- https://gitlab.com/dmfay/pdot
Links
- Types of Indexes in PostgreSQL
- https://twitter.com/samokhvalov/status/1713094683159941508
- https://twitter.com/samokhvalov/status/1713094683159941508
- https://twitter.com/samokhvalov/status/1702812327261950130
- Explain Guide
- https://github.com/allaboutapps/integresql
- Five Tips For a Healthier Postgres Database in the New Year
- Making PostgreSQL tick: New features in pg_cron | Hacker News
- Show HN: Light implementation of Event Sourcing using PostgreSQL as event store | Hacker News
- https://blog.peerdb.io/real-time-change-data-capture-from-postgres-16-read-replicas
- Postgres WAL Files and Sequence Numbers
- Operating on a minimal two-core Postgres instance: Query optimization insight…
- An automatic indexing system for Postgres | Hacker News
- Postgres schema changes are still a PITA | Lobsters
Other comments
-
tw1
Some PostgreSQL footguns:
- default configuration
- long transactions mixed with OLTP workload
- repmgr (and other HA tools not based on consensus algorithms)
- LISTEN/NOTIFY
- “pg_dump is a utility for backing up a PostgreSQL database” says the official doc
- moving/copying PGDATA and ignoring glibc version change
- hot_standby_feedback on/off dilemma
- partitioning of tables having high number of indexes and receiving QPS >> 1k
- “setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions” says the official doc
- using replication slots w/o setting max_slot_wal_keep_size
- relying only on statement_timeout & idle_in_transaction_session_timeout and thinking it’s enough (lack of transaction_timeout)
- data types “money”, “enum”, “timestamp” (3 different cases)
- int4 PK
- massive DELETE
- attempts to use transactional version of DDL (e.g., CREATE INDEX) under heavy loads
- subtransactions
- DDL under load without lock_timeout