PostgreSQL · The Platform

PostgreSQL is the default database of this curriculum because it refuses to lose your data and refuses to let it go inconsistent. This track moves from “create a table” to transactions, isolation levels, indexes you can prove are used, JSONB, and vector search — tagged by level so you can read only as deep as you need.

Install Postgres and open psql

Beginner

Install PostgreSQL, then connect to it with the psql command-line client.

What psql is, and why start here

psql is the official interactive terminal for Postgres. Every concept in this track — transactions, EXPLAIN, locks — is something you can drive and observe directly in psql before any application code exists. Learning the database from its own shell means you understand what your driver is doing later, instead of treating it as a black box. The default superuser and database are both named postgres.

Connect and check the version

Run these in your terminal / editor

# macOS (Homebrew): brew install postgresql@16 && brew services start postgresql@16
# Debian/Ubuntu:   sudo apt-get install -y postgresql
# Docker (any OS):  docker run --name pg -e POSTGRES_PASSWORD=pw -p 5432:5432 -d postgres:16

# Connect (defaults to the 'postgres' user and database)
psql -h localhost -U postgres

-- inside psql:
SELECT version();
\l      -- list databases
\q      -- quit

Create a table with real constraints

Beginner

Create a products table where the database itself rejects bad rows — no nulls, no duplicates, no negative prices.

Constraints are correctness you can't forget to write

A constraint is a rule the database enforces on every write, no matter which client or service performs it. PRIMARY KEY makes a row uniquely addressable; NOT NULL forbids missing values; UNIQUE blocks duplicates; CHECK enforces a predicate like price >= 0. Putting these in the schema means the rule cannot be bypassed by a buggy service or a hand-typed query — the database is the last line of defence, and it never gets tired. Store money as numeric or integer minor units (cents), never float: binary floating point cannot represent 0.10 exactly and will silently drift.

A table that defends itself

Run these in your terminal / editor

CREATE TABLE products (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku         text        NOT NULL UNIQUE,
    name        text        NOT NULL,
    unit_price  integer     NOT NULL CHECK (unit_price >= 0), -- price in cents
    stock       integer     NOT NULL DEFAULT 0 CHECK (stock >= 0)
);

INSERT INTO products (sku, name, unit_price, stock)
VALUES ('WIDGET-1', 'Widget', 1999, 100);

-- This is rejected by the CHECK constraint:
INSERT INTO products (sku, name, unit_price) VALUES ('BAD', 'Bad', -5);
-- ERROR:  new row for relation "products" violates check constraint "products_unit_price_check"

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: PostgreSQL teacher. The reader has no database access here — return complete SQL.
Task: Design two tables for an order system — `customers` and `orders` — with proper constraints and a foreign key.
Requirements:
- customers: identity PK, email text NOT NULL UNIQUE, created_at timestamptz NOT NULL DEFAULT now().
- orders: identity PK, customer_id bigint NOT NULL REFERENCES customers(id), total_cents integer NOT NULL CHECK (total_cents >= 0), status text NOT NULL CHECK (status IN ('pending','paid','cancelled')).
- Money is integer cents; timestamps are timestamptz; no float anywhere.
Tests / acceptance (describe, since no DB):
- Inserting an order whose customer_id has no matching customer must be rejected by the FK.
- Inserting status 'shipped' must be rejected by the CHECK.
Output: the complete CREATE TABLE statements, no commentary.

Query with parameters, not string concatenation

Beginner

Run a SELECT that filters on a value, and learn why you pass the value as a parameter rather than building the SQL string.

Parameters are the SQL-injection fix and a speed-up

When you concatenate user input into a SQL string, a value like '; DROP TABLE products; -- becomes executable SQL — that’s injection. Parameterised queries send the SQL and the values over separate channels, so a value is always treated as data, never as code. In psql you can prototype with \set-style variables; in application code your driver exposes placeholders ($1, $2). As a bonus the planner can reuse a prepared plan across calls. Make “always parameterise” a reflex, not a judgement call.

Filter safely

Run these in your terminal / editor

-- psql: prepared statement makes the placeholder explicit
PREPARE find_sku (text) AS
    SELECT id, name, unit_price FROM products WHERE sku = $1;

EXECUTE find_sku('WIDGET-1');

-- ordering, limiting, projecting only what you need
SELECT name, unit_price
FROM products
WHERE unit_price < 5000
ORDER BY unit_price ASC
LIMIT 10;

Wrap multi-step writes in a transaction

Intermediate

Group related writes inside BEGIN … COMMIT so they all succeed or all roll back together.

ACID, and why all-or-nothing is the whole point

A transaction is a unit of work that is atomic (all statements commit or none do), consistent (it moves the database from one valid state to another, constraints intact), isolated (concurrent transactions don’t see each other’s half-done work), and durable (once committed, it survives a crash). The classic example is moving stock and recording an order: if the order insert succeeds but the stock decrement fails, you’ve sold something you don’t have. BEGIN starts the transaction, COMMIT makes it permanent, and ROLLBACK (or any error) throws the whole thing away. This is the single most important reason to choose a relational database for a system of record.

Atomic order + stock decrement

Run these in your terminal / editor

BEGIN;

UPDATE products
SET stock = stock - 1
WHERE id = 1 AND stock >= 1;   -- the WHERE guards against overselling

-- imagine an INSERT INTO orders ... here

-- If anything above failed or returned 0 rows, undo everything:
-- ROLLBACK;

COMMIT;

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer working against this Postgres schema.
Context: Tables products(id, stock, unit_price) and orders(id, product_id, qty) exist. Connection via psql or any driver; money is integer cents.
Task: Write a single SQL transaction that reserves `qty` units of one product and records an order, atomically.
Requirements:
- Wrap the work in BEGIN ... COMMIT.
- Decrement products.stock only WHERE stock >= qty so it cannot go negative.
- If the UPDATE affects 0 rows (insufficient stock), ROLLBACK and surface an error instead of inserting the order.
- Use parameters/placeholders for product_id and qty; no string interpolation.
Tests / acceptance:
- Running it for a product with stock 0 leaves both tables unchanged (verify with SELECT before/after).
- Running it for a product with enough stock decrements stock by exactly qty and inserts exactly one order row.
Output: a SQL script (or function) plus a one-paragraph explanation of the rollback path.

Add indexes and prove they are used

Intermediate

Create an index for a slow query, then run EXPLAIN ANALYZE to confirm the planner actually uses it.

The planner decides — EXPLAIN ANALYZE shows the truth

An index is a sorted side structure that lets Postgres find rows without scanning the whole table. The default B-tree index serves equality and range queries on ordinary columns. But adding an index is a hypothesis, not a fact — the query planner is cost-based and may still choose a sequential scan if the table is tiny or the filter is unselective. EXPLAIN shows the planned strategy; EXPLAIN ANALYZE actually runs the query and reports real row counts and timings. Read it bottom-up: look for Index Scan (good) versus Seq Scan over a large table (a clue you may be missing an index). A partial index (WHERE …) covers only the rows you query, staying small and cheap to maintain.

Index, then verify

Run these in your terminal / editor

-- Equality/range index on a frequently filtered column
CREATE INDEX idx_products_sku ON products (sku);

-- A partial index: only the rows your hot query touches
CREATE INDEX idx_products_in_stock ON products (id) WHERE stock > 0;

-- Did the planner use it? Read bottom-up; look for "Index Scan".
EXPLAIN ANALYZE
SELECT id, name FROM products WHERE sku = 'WIDGET-1';

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior database engineer working against this Postgres database.
Context: A `products` table has grown to hundreds of thousands of rows. A query filtering on `sku` is slow.
Task: Diagnose the slow query and add the right index, proving the improvement with EXPLAIN ANALYZE.
Requirements:
- Run EXPLAIN ANALYZE on the query BEFORE any change and capture the plan (note Seq Scan vs Index Scan and the actual time).
- Add a B-tree index on the filtered column; if the query only ever filters in-stock rows, prefer a partial index WHERE stock > 0.
- Run EXPLAIN ANALYZE AFTER and confirm the plan now uses an Index Scan and the time dropped.
- Do not add indexes that the plan does not use — justify each one with its plan.
Tests / acceptance:
- The before/after EXPLAIN ANALYZE output is included; the after-plan shows an Index Scan on the new index.
- Reported execution time is lower after the index.
Output: the CREATE INDEX statement(s), the two EXPLAIN ANALYZE outputs, and a one-paragraph read of them.

Store and query JSON with JSONB

Intermediate

Add a jsonb column for flexible attributes, then query inside it and index it with GIN.

JSONB gives you document flexibility without leaving Postgres

jsonb stores JSON in a decomposed binary form that is fast to query and index — unlike json, which keeps the raw text. You get document-style flexibility for genuinely variable data (product attributes, event payloads) while keeping your relational columns, constraints, and transactions for the structured core. Query with the -> (returns json) and ->> (returns text) accessors, test containment with the @> operator, and accelerate containment queries with a GIN index — the index type built for “does this document contain that key/value”. The lesson: you rarely have to choose between relational and document — Postgres does both, in one ACID engine.

A jsonb column, queried and indexed

Run these in your terminal / editor

ALTER TABLE products ADD COLUMN attributes jsonb NOT NULL DEFAULT '{}';

UPDATE products
SET attributes = '{"color":"blue","tags":["clearance","new"]}'
WHERE sku = 'WIDGET-1';

-- Extract a field (->> returns text)
SELECT name, attributes ->> 'color' AS color FROM products;

-- Containment: rows whose attributes include color = blue
SELECT name FROM products WHERE attributes @> '{"color":"blue"}';

-- GIN index makes the @> containment query fast
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: PostgreSQL teacher. The reader has no database access — return complete SQL.
Task: Show how to model semi-structured product attributes in a jsonb column and query them efficiently.
Requirements:
- Add a jsonb column `attributes` defaulting to '{}'.
- Provide a query that returns products containing a given tag, using the @> containment operator (e.g. attributes @> '{"tags":["clearance"]}').
- Add a GIN index that accelerates that containment query.
- Explain in one line when to use ->> (text) versus -> (json).
Tests / acceptance (describe, since no DB):
- EXPLAIN on the containment query should be able to use the GIN index (Bitmap Index Scan).
- The same query without the index would fall back to a Seq Scan.
Output: the complete SQL (ALTER, sample UPDATE, the containment SELECT, the GIN index), no commentary.

Enforce integrity across tables with foreign keys

Intermediate

Link orders to products with a foreign key so an order can never reference a product that doesn’t exist.

Referential integrity is a guarantee, not a convention

A FOREIGN KEY constraint ties a column in one table to a primary key in another and makes the database enforce the relationship on every insert, update, and delete. Without it, an application bug can leave orphan rows — an order pointing at a deleted product — that quietly corrupt reports and break joins. With it, the database rejects the bad write, and ON DELETE actions (RESTRICT, CASCADE, SET NULL) let you declare exactly what should happen to children when a parent goes away. This is integrity you get for free, forever, instead of integrity you hope every service remembers to check.

A foreign key with an explicit delete policy

Run these in your terminal / editor

CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id  bigint  NOT NULL REFERENCES products (id) ON DELETE RESTRICT,
    qty         integer NOT NULL CHECK (qty > 0),
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Rejected: no product with id 999999 exists
INSERT INTO orders (product_id, qty) VALUES (999999, 1);
-- ERROR:  insert or update on table "orders" violates foreign key constraint

-- Index the FK column: joins and ON DELETE checks both benefit
CREATE INDEX idx_orders_product_id ON orders (product_id);

Prevent oversell with SELECT ... FOR UPDATE

Advanced

Use a row lock so two concurrent checkouts can’t both buy the last unit in stock.

MVCC, row locks, and the lost-update problem

Postgres uses MVCC (Multi-Version Concurrency Control): readers see a consistent snapshot and don’t block writers. That’s great for throughput, but it means two transactions can both read stock = 1, both decide it’s fine, and both decrement — overselling. SELECT ... FOR UPDATE takes a row-level lock: the first transaction to grab the row makes the second wait until the first commits, so the second sees the already-decremented value and correctly refuses. This pessimistic lock is the simplest correct fix for “reserve the last unit” style contention. Keep the locked section short — you are serialising everyone who wants that row.

Lock the row, then decide

Run these in your terminal / editor

BEGIN;

-- Lock this product's row; a concurrent checkout blocks here until we COMMIT
SELECT stock FROM products WHERE id = 1 FOR UPDATE;

-- Now we hold the row exclusively: safe to check and decrement
UPDATE products SET stock = stock - 1 WHERE id = 1 AND stock >= 1;

COMMIT;

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer working against this Postgres database.
Context: products(id, stock) with stock = 1. Two checkout requests arrive at the same time for the same product.
Task: Implement a checkout transaction that uses SELECT ... FOR UPDATE so exactly one of two concurrent buyers wins the last unit.
Requirements:
- BEGIN; SELECT stock FROM products WHERE id = $1 FOR UPDATE; then conditionally UPDATE stock = stock - 1 WHERE stock >= 1; COMMIT.
- If stock is insufficient after acquiring the lock, ROLLBACK and report "out of stock" — do not let stock go negative.
- Parameterise the product id.
Tests / acceptance:
- Open two psql sessions. In session A run BEGIN + the FOR UPDATE select but do NOT commit; observe session B's same SELECT FOR UPDATE blocks.
- After A commits its decrement (stock -> 0), B unblocks, sees stock 0, and its checkout fails cleanly.
- Final stock is exactly 0, and exactly one order was recorded.
Output: the transaction SQL plus a step-by-step of the two-session test you ran.

Reach for SERIALIZABLE and retry on 40001

Advanced

For the strictest correctness, run the transaction at SERIALIZABLE isolation and retry when Postgres reports a serialization failure.

Isolation levels, and why SERIALIZABLE needs a retry loop

Isolation level controls how much concurrent transactions can affect each other. The Postgres default is READ COMMITTED: each statement sees rows committed before it started, which still allows some write-skew anomalies. SERIALIZABLE is the strongest level — it guarantees the result is as if transactions ran one at a time, catching anomalies that row locks alone miss. The trade-off: instead of blocking, Postgres may abort a transaction at commit time with a serialization failure, SQLSTATE 40001. That is not a bug — it’s the database telling you to retry the whole transaction. So SERIALIZABLE always comes with a bounded retry loop in application code. Use it when correctness under concurrency matters more than squeezing out the last bit of throughput.

Serializable, and the error you must handle

Run these in your terminal / editor

BEGIN ISOLATION LEVEL SERIALIZABLE;

-- ... your reads and writes that must be anomaly-free ...

COMMIT;
-- If this transaction conflicted, COMMIT raises:
-- ERROR:  could not serialize access due to read/write dependencies among transactions
-- SQLSTATE 40001  ->  the application should retry the whole transaction from BEGIN.

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer adding correctness-critical logic against this Postgres database.
Context: A balance-transfer (or inventory-reservation) operation must be free of write-skew under concurrency. Use any driver/language already in the repo.
Task: Implement the operation at ISOLATION LEVEL SERIALIZABLE wrapped in a bounded retry loop on serialization failures.
Requirements:
- Begin each attempt with BEGIN ISOLATION LEVEL SERIALIZABLE.
- On commit, detect SQLSTATE 40001 (serialization_failure) and retry the ENTIRE transaction from the start.
- Cap retries (e.g. 5 attempts with small backoff); after the cap, surface a clear error to the caller.
- Do NOT swallow other SQL errors — only 40001 triggers a retry; everything else propagates.
Tests / acceptance:
- A test launches two concurrent transactions that conflict; one commits, the other gets 40001 and is retried, and the final state is correct (no double-spend / no oversell).
- A unit test asserts that a non-40001 error is NOT retried.
Output: a unified diff plus a one-paragraph explanation of why a retry loop is mandatory at this level.

Add semantic search with pgvector

Advanced

Install the pgvector extension, store embedding vectors in a vector column, and find nearest neighbours by distance.

Why vectors live next to your rows

An embedding is a fixed-length array of floats that captures the meaning of text or an image; similar items have nearby vectors. pgvector is an official Postgres extension that adds a vector type and distance operators — <-> (L2), <=> (cosine), <#> (inner product) — so “find the most similar products” becomes an ORDER BY embedding <=> $1 LIMIT k query. Keeping vectors in Postgres means your semantic search shares the same transactions, constraints, and backups as the rest of your data — no second database to keep in sync. For large tables, an approximate index (HNSW or IVFFlat) trades a little recall for a large speed-up. You generate the embeddings with a model; Postgres stores and searches them.

Vectors and nearest-neighbour search

Run these in your terminal / editor

-- The extension ships with most managed Postgres; install it once per database
CREATE EXTENSION IF NOT EXISTS vector;

ALTER TABLE products ADD COLUMN embedding vector(768);  -- dimension must match your model

-- Approximate-nearest-neighbour index using cosine distance
CREATE INDEX idx_products_embedding
    ON products USING hnsw (embedding vector_cosine_ops);

-- The k nearest products to a query embedding (<=> is cosine distance)
SELECT id, name
FROM products
ORDER BY embedding <=> $1
LIMIT 5;

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior engineer adding semantic search to a product catalogue on this Postgres database.
Context: pgvector is available (CREATE EXTENSION vector succeeds). Embeddings are produced by an external model that returns 768-dimension vectors. Use the repo's existing language/driver.
Task: Add a vector(768) column to products, index it for cosine similarity with HNSW, and implement a "find similar products" query.
Requirements:
- CREATE EXTENSION IF NOT EXISTS vector, then ALTER TABLE ... ADD COLUMN embedding vector(768).
- Build an HNSW index with vector_cosine_ops; the search query must ORDER BY embedding <=> $1 LIMIT k and use a parameter for the query vector.
- The embedding dimension in the column type must equal the model's output dimension (768) — assert/document this.
- Do not fabricate the embedding model API; treat embeddings as input the caller supplies.
Tests / acceptance:
- After inserting rows with known embeddings, a query with the embedding of one row returns that row first (distance ~0).
- EXPLAIN on the search query shows the HNSW index being used once the table is large enough.
Output: a unified diff plus a one-paragraph note on the recall/speed trade-off of the ANN index.

Operate it: backups, VACUUM, and connection limits

Advanced

Schedule logical backups with pg_dump, understand why VACUUM reclaims space, and front Postgres with a connection pooler.

The three operational facts that keep a Postgres healthy

First, backups: pg_dump produces a consistent logical snapshot you can restore with pg_restore — schedule it and test the restore, because an untested backup is a rumour. Second, VACUUM: because MVCC keeps old row versions until no transaction can see them, deleted/updated rows leave “dead tuples”. Autovacuum reclaims them and updates the planner’s statistics; if you ever see tables bloating or plans going stale, autovacuum tuning is where you look. Third, connections: each Postgres connection is a real backend process, so thousands of direct app connections will exhaust memory. Put a pooler (PgBouncer, or your managed provider’s built-in pooler) in front, and size the pool to the database, not the app fleet. These three habits separate a database that survives production from one that surprises you at 3am.

Backup, restore, and vacuum

Run these in your terminal / editor

# Logical backup of one database (custom format -> use pg_restore)
pg_dump -h localhost -U postgres -Fc shop > shop.dump

# Restore into a fresh database (always test this path before you need it)
createdb -h localhost -U postgres shop_restored
pg_restore -h localhost -U postgres -d shop_restored shop.dump

-- Reclaim dead tuples and refresh planner statistics for one table
VACUUM (ANALYZE) products;

-- See how many dead tuples are waiting and when autovacuum last ran
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Where to take it next

Make these transactions load-bearing in Aurora Commerce, where Postgres is the source of truth for orders, inventory, and checkout under concurrency.
Add an in-memory cache or queue alongside it — see Redis, which complements Postgres rather than replacing it.
Weighing a document store for a schemaless corner of the system? Compare with MongoDB.