← Back to the course

This is the production spec — the contract the course builds toward. The guided course teaches you to reach exactly this runnable result. Skim it if you'd rather build straight from the target.

Project Spec — Concord

Single source of truth for the Concord course (src/content/projects/concord.mdx). The course must teach toward exactly this runnable system. Spotlight: Rust CRDT collaboration (tier: demanding ⇒ backends are Go + Rust only). Both backends implement the same wire contract and reach the same running result — Go is the honest hand-rolled alternative, not a stub.


1. Overview & definition of done

Concord is the sync core of a Google-Docs-style collaborative text editor. Many people type into one shared document at once; every keystroke from every collaborator merges correctly and converges to the same final text on every replica, regardless of order, duplicates, or a collaborator who went offline and came back. There is no lock, no “resolve conflict” dialog, and no lost keystrokes — that guarantee is the whole project, delivered by a CRDT (Conflict-free Replicated Data Type).

Definition of done — the learner ends with a running system they can see work, for $0, locally:

  1. docker compose up -d brings up Redis + Postgres.
  2. The chosen backend (go or rust) builds and runs a WebSocket sync server on :8080.
  3. The headline demo: open two WebSocket clients to the same room (two browser tabs of the bundled demo.html, or two websocat shells). Type in both at the same position at the same time — both converge to one agreed document. Disconnect one client, keep typing in it (ops queue locally), reconnect — its queued edits replay and both replicas end byte-equal. The learner watches this happen.
  4. A convergence property test passes (go test -run TestConvergence -race / cargo test convergence): the same op set applied in many random orders, with duplicates and out-of-order arrivals, yields one text.
  5. A room survives a server restart (snapshot in Postgres → reload → byte-equal).
  6. The native editor (Jetpack Compose / Flutter / SwiftUI) renders the live document with remote cursors.

Everything runs on free, local infrastructure: Docker (Redis + Postgres), the Go or Rust toolchain, a mobile emulator/simulator, and — only for the optional AI feature modules — a free Google AI Studio key.


2. Architecture (diagram-in-prose)

   client A ─┐                                         ┌─ client B
 (tab/app)   │   WebSocket  GET /rooms/{id}/ws         │  (tab/app)
             ▼                                          ▼
        ┌─────────────────────┐   pub/sub      ┌─────────────────────┐
        │  sync server inst. 1 │◄──room:{id}───►│  sync server inst. 2 │
        │  ┌───────────────┐  │   (Redis)       │  ┌───────────────┐  │
        │  │ Room (in-mem) │  │                 │  │ Room (in-mem) │  │
        │  │  CRDT Doc     │  │                 │  │  CRDT Doc     │  │
        │  └───────────────┘  │                 │  └───────────────┘  │
        └─────────┬───────────┘                 └──────────┬──────────┘
                  │ snapshot (BYTEA) every N ops / T sec    │
                  └──────────────► Postgres ◄───────────────┘
                                (snapshots, versions, doc_embeddings)
  presence: presence:{room} sorted set (member=user, score=expiry) + presence pub/sub messages
  • Transport: each client holds one WebSocket to GET /rooms/{id}/ws. The server upgrades, sends the newcomer a one-time sync message (the current document state), then streams live op/presence frames.
  • Merge: every room owns one in-memory CRDT Doc. A local edit is applied to the Doc and the resulting change is broadcast; a remote change is applied to the Doc. The CRDT guarantees convergence: apply the same changes in any order, with duplicates, and every Doc reaches the same text.
  • Fan-out: with more than one server instance, a client on instance 1 must see edits from a client on instance 2. Each instance publishes every applied change to the Redis channel room:{id} and subscribes to it, so all instances (and all their clients) stay in sync. Redis is the realtime backbone.
  • Durability: periodically (every N ops or T seconds) each room serializes its convergent state to a snapshots row in Postgres. Restart = load the latest snapshot, replay any later ops. This compaction bounds growth (an append-only op log grows forever; tombstones accumulate).
  • Presence: who’s online and where their cursor is, is ephemeral — never persisted into the document. A per-room Redis sorted set presence:{room} (member = user id, score = expiry timestamp) tracks membership with a heartbeat-refreshed TTL semantics; cursor moves broadcast as presence pub/sub messages. (Use a sorted set + ZADD/ZRANGEBYSCORE, never KEYS — see the Redis track’s SCAN lesson.)

3. Runnable structure (the repo the learner ends with)

Both backends share the layout idea; the engine differs. The app entrypoint composes everything: config from env, the Redis client, the Postgres pool, the room registry, the HTTP router with the WS route, and a graceful shutdown.

3.1 Go (backend="go") — github.com/you/concord

concord/
  docker-compose.yml
  db/schema.sql                 # snapshots, versions, doc_embeddings (features)
  web/demo.html                 # bundled 2-client demo page (served at GET /)
  cmd/server/main.go            # ENTRYPOINT: compose router + WS route + Redis + pgx + shutdown
  internal/crdt/rga.go          # hand-rolled RGA: Doc, Elem, ID, ApplyInsert, ApplyDelete, String
  internal/crdt/rga_test.go     # convergence property test
  internal/crdt/codec.go        # Doc <-> []byte (gob): elements + pending buffer (Snapshot/Load)
  internal/room/room.go         # Room: per-room Doc + mutex, ApplyLocal, ApplyRemote, Snapshot
  internal/room/registry.go     # Rooms map[string]*Room (Arc<Mutex>-equivalent ownership)
  internal/wire/wire.go         # the shared wire message types (Msg, Op, ID) — one canonical shape
  internal/hub/redis.go         # Publish(room, payload) + Subscribe(room) over go-redis
  internal/store/snapshots.go   # SaveSnapshot/LoadSnapshot (pgx); versions + embeddings (features)
  internal/server/server.go     # handleWS: upgrade, sync, read loop, presence; HTTP handlers

Entrypoint contract — cmd/server/main.go (composes the whole thing):

  • reads PORT (default 8080), REDIS_URL, DATABASE_URL from env,
  • opens redis.Client and pgxpool.Pool (one each, closed on shutdown),
  • builds the room.Registry (lazy-loads each room from its latest Postgres snapshot on first access),
  • mounts GET /rooms/{id}/ws, GET / (serves web/demo.html), GET /healthz, and the feature endpoints,
  • http.Server with Shutdown(ctx) on SIGINT/SIGTERM (closes rooms’ subscriber goroutines).

Key types (Go): Doc and Room are concrete structs; the method sets below are what the course builds on them (*Doc, *Room). Hub and Store are the seams kept as interfaces.

// internal/crdt
type ID struct{ Site string; Counter uint64 }
type Elem struct{ ID, After ID; Value rune; Deleted bool }

// concrete struct; key methods on *Doc:
//   ApplyInsert(e Elem)     idempotent; buffers if e.After unseen (causal readiness)
//   ApplyDelete(id ID)      tombstone; idempotent
//   String() string         visible text, tombstones skipped
//   Snapshot() ([]byte, error) / Load(b []byte) error   gob: placed elements + pending buffer
type Doc struct{ /* elems, index, pending, pendDel */ }

// internal/room
// concrete struct; key methods on *Room:
//   ApplyLocal(op wire.Op) (broadcast wire.Op)  apply to Doc, return op to publish
//   ApplyRemote(op wire.Op)                      apply a peer's op (idempotent)
//   SyncState() []byte                            current full state for a newcomer
//   Text() string
type Room struct{ /* doc *crdt.Doc + mutex */ }

// internal/hub
type Hub interface {
    Publish(ctx context.Context, room string, payload []byte) error
    Subscribe(ctx context.Context, room string) (<-chan []byte, error)
}

// internal/store
type Store interface {
    SaveSnapshot(ctx context.Context, room string, state []byte, opSeq int64) error
    LoadSnapshot(ctx context.Context, room string) (state []byte, opSeq int64, err error)
}

3.2 Rust (backend="rust") — concord crate

concord/
  docker-compose.yml
  db/schema.sql
  web/demo.html
  Cargo.toml
  src/main.rs                   # ENTRYPOINT: axum Router + WS route + Redis + sqlx pool + shutdown
  src/wire.rs                   # serde wire types (Msg) — same JSON contract as Go
  src/room.rs                   # Room { doc: yrs::Doc }, apply_local, apply_remote, sync_state
  src/registry.rs               # Rooms: Arc<Mutex<HashMap<String, Arc<Mutex<Room>>>>>
  src/hub.rs                    # redis pub/sub publish + subscribe
  src/store.rs                  # sqlx: save/load snapshot; versions + embeddings (features)
  src/server.rs                 # ws handler: upgrade, sync, recv loop, presence
  tests/convergence.rs          # convergence property test

Entrypoint contract — src/main.rs: #[tokio::main] builds the axum::Router with GET /rooms/{id}/ws (axum 0.8 {id} path syntax), GET / (demo), GET /healthz, and feature routes; shares a Registry via axum::extract::State (the room state is the canonical Arc<Mutex<…>> shared-mutable-state problem from the Rust track); axum::serve(...).with_graceful_shutdown(...).

Key types (Rust): Room { doc: yrs::Doc } with apply_local(edit) -> Vec<u8> (the encode_update_v2() bytes), apply_remote(&[u8]) -> Result<()> (decode + apply_update), and sync_state() -> Vec<u8> (encode_state_as_update_v2(&StateVector::default())).


4. Data model

Local Postgres (postgres:16, or pgvector/pgvector:pg16 once the semantic-search feature is on).

4.1 Base build

-- the durable convergent truth: one latest snapshot per room
CREATE TABLE snapshots (
  room_id    TEXT PRIMARY KEY,
  state      BYTEA       NOT NULL,   -- encoded CRDT state (yrs update bytes / Go gob)
  op_seq     BIGINT      NOT NULL,   -- last op sequence included
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Prerequisite/seed rows: none for the base build — a room is created lazily on first WebSocket connect (no FK behind it), and its first snapshot is written on the first compaction cadence. The demo page targets room demo, which exists implicitly the moment a client connects to /rooms/demo/ws. (The versions and doc_embeddings tables below have room_id TEXT columns but deliberately no FK to snapshots so a version/embedding can be captured before the first snapshot row exists.)

4.2 Feature: history

CREATE TABLE versions (
  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  room_id    TEXT        NOT NULL,
  label      TEXT        NOT NULL,            -- "before the rewrite"
  author     TEXT        NOT NULL,
  state      BYTEA       NOT NULL,            -- FULL encoded state at this point
  op_seq     BIGINT      NOT NULL,            -- last op covered
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX versions_room_time ON versions (room_id, created_at DESC);

-- optional persisted op-log (if you compact a table rather than an in-memory log)
CREATE TABLE ops (
  room_id TEXT   NOT NULL,
  seq     BIGINT NOT NULL,
  op      BYTEA  NOT NULL,
  PRIMARY KEY (room_id, seq)
);
CREATE EXTENSION IF NOT EXISTS vector;          -- needs pgvector/pgvector:pg16 image
CREATE TABLE doc_embeddings (
  room_id    TEXT PRIMARY KEY,
  embedding  vector(768) NOT NULL,              -- MUST match the model output dimension
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX doc_embeddings_hnsw
  ON doc_embeddings USING hnsw (embedding vector_cosine_ops);

Accuracy invariant (embeddings): Gemini pre-normalizes only the 3072-dim output; for vector(768) the server must L2-normalize the embedding to unit length before storing/comparing so cosine distance is meaningful. (Link the embeddings docs; don’t pin a model id.)


5. API & event contract (canonical — every step, client, and test shares this)

5.1 WebSocket: GET /rooms/{id}/ws

On upgrade the server immediately sends one sync. Then both directions exchange op and presence. The op payload is one canonical envelope for BOTH backends; only the inside of op.data differs by engine — this is the contract that resolves the baseline’s wire-protocol contradiction:

// server → client, once on connect (full current state, base64 of the engine's encoding):
{ "t": "sync", "state": "<base64>" }

// either direction, per edit. The SAME envelope for Go and Rust:
//  - Go  (hand-rolled RGA): op.data is the JSON RGA op (kind/id/after/value)
//  - Rust (yrs):            op.data is base64 of encode_update_v2() bytes
{ "t": "op", "seq": 41, "data": { "kind": "insert", "id": {"site":"a3f9","counter":42},
                                  "after": {"site":"a3f9","counter":41}, "value": "h" } }   // Go
{ "t": "op", "seq": 41, "data": { "update": "<base64 yrs update>" } }                        // Rust

// ephemeral; broadcast, never persisted:
{ "t": "presence", "user": "kerim", "cursor": 128, "color": "#2DD4BF" }

Rules shared by both engines:

  • t{sync, op, presence}; unknown t is ignored (forward-compatible).
  • op is idempotent and order-independent — re-applying or reordering ops converges (the CRDT guarantee). A client that reconnects re-applies any locally-queued ops; duplicates are harmless.
  • sync.state is base64 of the engine’s full-state encoding (Go: gob of the element list + pending buffer; Rust: encode_state_as_update_v2(&StateVector::default())).
  • The frontend contract is engine-aware: a Go client emits/receives RGA ops keyed by element id; a yrs client round-trips opaque update bytes. The course states this per backend so the client matches.

5.2 HTTP endpoints

MethodPathBodyResponseCodesModule
GET/healthzok200base
GET/web/demo.html200base
GET/rooms/{id}/ws(WS upgrade)WS stream101, 400 (not WS)base
POST/rooms/{id}/copilot{ "selection": "...", "mode": "rewrite|continue" }{ "suggestion": "..." }200, 400 (bad mode), 502 (upstream)ai-copilot
POST/rooms/{id}/versions{ "label": "...", "author": "..." }{ "id": 12, "op_seq": 41 }201, 400history
GET/rooms/{id}/versions[{ "id", "label", "author", "op_seq", "created_at" }]200history
GET/rooms/{id}/diff?from=A&to=B{ "added": [...], "removed": [...], "moved": [...] }200, 400history
POST/rooms/{id}/restore{ "version_id": 7 }{ "applied": true }200, 404history
GET/rooms/{id}/search?q=...&k=5[{ "room_id", "similarity" }]200semantic-search
GET/rooms/{id}/backlinks?k=5[{ "room_id", "similarity" }]200semantic-search
POST/rooms/{id}/ai{ "instruction": "..." }{ "edits": [...pending...], "summary": "..." }200, 400ai-commands

/restore, /copilot-accept, and /ai-apply do not write document state directly — they emit ordinary CRDT ops through the same local-apply + broadcast path as a keystroke (no privileged write path).

5.3 Redis channels & keys

  • room:{id} — pub/sub channel; every applied op (and every presence) is published here; every instance subscribes. Payload = the op/presence JSON envelope above.
  • presence:{room}sorted set, member = user id, score = unix-ms expiry. Refresh on heartbeat (ZADD presence:{room} <now+ttl> <user>); list live members with ZRANGEBYSCORE presence:{room} <now> +inf and sweep expired with ZREMRANGEBYSCORE presence:{room} -inf <now>. Never KEYS.

6. Build order (dependency-ordered; each step’s prerequisites already exist)

  1. Concept — why CRDTs beat locks & OT (no code).
  2. Infradocker compose up -d (Redis + Postgres) + the language-toolchain / websocat prereqs + how to get a GEMINI_API_KEY (only needed by AI features).
  3. Model — the sequence-CRDT element shape (id, after, value, tombstone).
  4. Wire contract — the canonical sync/op/presence envelope, with the per-engine op.data caveat (Go = RGA op JSON; Rust = base64 yrs update). Everything downstream references this.
  5. Backend scaffold (go|rust) — WS server: upgrade GET /rooms/{id}/ws, send hello, echo (transport only).
  6. ★ Merge (go|rust) — the engine.
    • Rust: hold a yrs Doc; apply_local/apply_remote; unit-test convergence.
    • Go: bridge step first — a compiling, converging in-order RGA (ApplyInsert no-buffer case + String()), then the AgentPrompt extends it to full causal-readiness buffering + tie-break.
  7. Doc serialization (go) — define the gob codec once (elements + pending buffer); Rust uses encode_state_as_update_v2. Consumed by sync, snapshot, history.
  8. App entrypoint / composition (go|rust) — wire router + WS route + Redis client + Postgres pool + room registry + graceful shutdown into main. This is the step that makes everything runnable.
  9. Redis fan-out (go|rust) — publish each op to room:{id}; subscribe + apply; send newcomers sync.
  10. Presence (common) — sorted-set membership + presence pub/sub (placed next to fan-out, before the UI that consumes presence). No KEYS.
  11. Headline demo (common) — serve web/demo.html; open two clients, watch concurrent edits converge.
  12. Offline reconnect proof (common) — disconnect a client, queue edits, reconnect, replay, assert byte-equal. The promised payoff, built and tested.
  13. Snapshot + compaction (go|rust) — persist state to Postgres; restart → byte-equal.
  14. Convergence property test (go|rust) — the one property you must prove.
  15. Frontend (compose|flutter|swiftui) — editor screen; includes the index↔element-id bridge so a flat text widget’s (offset, len) change maps to engine ops and back.
  16. Deploy (common) — Cloud Run + Cloud SQL + Memorystore; reconnect-and-resync tied to the sync path.
  17. Features (off by default) — ai-copilot, history, semantic-search, ai-commands.

7. Backends — parity points (Go default + Rust spotlight, both complete)

MilestoneGo (hand-rolled)Rust (yrs)Same contract
WS scaffoldcoder/websocket, r.PathValue("id")axum 0.8 {id}, WebSocketUpgradeGET /rooms/{id}/ws, hello frame
MergeRGA: ApplyInsert/ApplyDelete/String, buffer + tie-breakyrs Doc: text.insert/remove_range, encode_update_v2both converge; op envelope §5.1
Serialize stategob: elements + pending bufferencode_state_as_update_v2(&StateVector::default())sync.state = base64 of it
Fan-outgo-redis Publish/Subscriberedis crate pub/subchannel room:{id}
Snapshotgob into snapshots.stateupdate bytes into snapshots.statesame table, same restart guarantee
Property testtesting/quick / seeded loop, -raceproptest / seeded loopidentical-final-text assertion
Entrypointcmd/server/main.gosrc/main.rssame routes, same shutdown discipline

Honest parity note (the Go≠stub guarantee): the Go path ships a running RGA, its own serialization (buffer included), the same Redis/Postgres/presence wiring, and the same demo + offline-reconnect proof. Go scores 3 not because it’s incomplete but because you feel the missing CRDT ecosystem — you own correctness yrs gives Rust for free. That felt cost (more code, hot-path GC pressure on every keystroke) is the lesson behind the techFit rust 5 / go 3 rating.


8. Optional feature modules (off by default; each extends, never rewrites, the base)

  • ai-copilotPOST /rooms/{id}/copilot asks Gemini for a rewrite/continuation of a selection; accepting it emits ordinary CRDT ops (delete the selection’s ids + insert the new text). Server-side key; the AI never writes directly. (2 steps.)
  • historyversions table; capture/list/diff(by stable id)/restore (non-destructively, as ops); compact the op-log behind the lowest retained version. Generalizes the base snapshot into event-sourced named versions. (5 steps.)
  • semantic-search — pgvector + HNSW; embed each persisted snapshot with Gemini (L2-normalize 768-dim before storing), re-embed on every snapshot, search + backlinks via the <=> cosine operator. (5 steps: three common + one forked per backend for the pgvector binding.)
  • ai-commands — typed editor tools (find_all/insert_at/replace_range) as Gemini function declarations; a bounded agentic loop; apply results as CRDT ops; a final “safe, observable, convergent” step (preview-confirm + provenance via history + range validation). (5 steps: three common + one forked per backend for the apply path.)

Cross-feature dependency to surface on the surface instruction: the ai-commands apex’s “capture a version before applying” assumes history is also enabled — state it plainly.


9. Free-to-complete ($0)

  • Redis + Postgres: local Docker (redis:7, postgres:16; swap to pgvector/pgvector:pg16 for semantic-search). Cost nothing.
  • Toolchain: Go (go.dev/dl) or Rust (rustup) — free. websocat/wscat for the CLI demo — free.
  • Demo client: the bundled web/demo.html the server serves itself — no build, no install.
  • Mobile: Android emulator / iOS simulator — free.
  • Gemini (features only): a free Google AI Studio key (https://aistudio.google.com/apikey) on the free tier covers Copilot, embeddings, and /ai for this course. No billing account required; quotas are small and change — check AI Studio for live limits.
  • Cloud deploy is optional (Cloud Run + Cloud SQL + Memorystore); the entire learning result runs locally for $0. Cloud Run caps request duration, so clients reconnect and re-sync — which the CRDT makes painless (duplicates and gaps converge).