Project Spec — Concord
Single source of truth for the Concord course (src/content/projects/concord.mdx). The course must teach toward exactly this runnable system. Spotlight: Rust CRDT collaboration (tier: demanding ⇒ backends are Go + Rust only). Both backends implement the same wire contract and reach the same running result; Go is the honest hand-rolled alternative, not a stub.
1. Overview & definition of done
Concord is the sync core of a Google-Docs-style collaborative text editor. Many people type into one shared document at once; every keystroke from every collaborator merges correctly and converges to the same final text on every replica, regardless of order, duplicates, or a collaborator who went offline and came back. There is no lock, no “resolve conflict” dialog, and no lost keystrokes — that guarantee is the whole project, delivered by a CRDT (Conflict-free Replicated Data Type).
Definition of done — the learner ends with a running system they can see work, for $0, locally:
- docker compose up -d brings up Redis + Postgres.
- The chosen backend (go or rust) builds and runs a WebSocket sync server on :8080.
- The headline demo: open two WebSocket clients to the same room (two browser tabs of the bundled demo.html, or two websocat shells). Type in both at the same position at the same time; both converge to one agreed document. Disconnect one client, keep typing in it (ops queue locally), then reconnect: its queued edits replay and both replicas end byte-equal. The learner watches this happen.
- A convergence property test passes (go test -run TestConvergence -race / cargo test convergence): the same op set applied in many random orders, with duplicates and out-of-order arrivals, yields one text.
- A room survives a server restart (snapshot in Postgres → reload → byte-equal).
- The native editor (Jetpack Compose / Flutter / SwiftUI) renders the live document with remote cursors.
Everything runs on free, local infrastructure: Docker (Redis + Postgres), the Go or Rust toolchain, a mobile emulator/simulator, and — only for the optional AI feature modules — a free Google AI Studio key.
2. Architecture (diagram-in-prose)
client A ─┐                                       ┌─ client B
(tab/app) │     WebSocket GET /rooms/{id}/ws      │ (tab/app)
          ▼                                       ▼
┌─────────────────────┐    pub/sub     ┌─────────────────────┐
│ sync server inst. 1 │◄──room:{id}───►│ sync server inst. 2 │
│  ┌───────────────┐  │    (Redis)     │  ┌───────────────┐  │
│  │ Room (in-mem) │  │                │  │ Room (in-mem) │  │
│  │   CRDT Doc    │  │                │  │   CRDT Doc    │  │
│  └───────────────┘  │                │  └───────────────┘  │
└─────────┬───────────┘                └──────────┬──────────┘
          │ snapshot (BYTEA) every N ops / T sec  │
          └─────────────► Postgres ◄──────────────┘
            (snapshots, versions, doc_embeddings)
presence: presence:{room} sorted set (member=user, score=expiry) + presence pub/sub messages
- Transport: each client holds one WebSocket to GET /rooms/{id}/ws. The server upgrades, sends the newcomer a one-time sync message (the current document state), then streams live op/presence frames.
- Merge: every room owns one in-memory CRDT Doc. A local edit is applied to the Doc and the resulting change is broadcast; a remote change is applied to the Doc. The CRDT guarantees convergence: apply the same changes in any order, with duplicates, and every Doc reaches the same text.
- Fan-out: with more than one server instance, a client on instance 1 must see edits from a client on instance 2. Each instance publishes every applied change to the Redis channel room:{id} and subscribes to it, so all instances (and all their clients) stay in sync. Redis is the realtime backbone.
- Durability: periodically (every N ops or T seconds) each room serializes its convergent state to a snapshots row in Postgres. Restart = load the latest snapshot, replay any later ops. This compaction bounds growth (an append-only op log grows forever; tombstones accumulate).
- Presence: who’s online and where their cursor sits is ephemeral state, never persisted into the document. A per-room Redis sorted set presence:{room} (member = user id, score = expiry timestamp) tracks membership with heartbeat-refreshed TTL semantics; cursor moves are broadcast as presence pub/sub messages. (Use a sorted set + ZADD/ZRANGEBYSCORE, never KEYS; see the Redis track’s SCAN lesson.)
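The "every N ops or T seconds" durability cadence can be sketched as a small policy check. This is a minimal illustration only: the snapshotPolicy type, its field names, and the example thresholds are hypothetical and not part of the spec.

```go
package main

import (
	"fmt"
	"time"
)

// snapshotPolicy is a hypothetical helper; the spec only fixes the rule
// "snapshot every N ops or T seconds", not this shape.
type snapshotPolicy struct {
	MaxOps int           // N: ops applied since the last snapshot
	MaxAge time.Duration // T: time elapsed since the last snapshot
}

// Due reports whether a room should write a snapshot now.
func (p snapshotPolicy) Due(opsSince int, elapsed time.Duration) bool {
	if opsSince == 0 {
		return false // idle room: nothing new to persist
	}
	return opsSince >= p.MaxOps || elapsed >= p.MaxAge
}

func main() {
	p := snapshotPolicy{MaxOps: 100, MaxAge: 5 * time.Second} // example values
	fmt.Println(p.Due(100, time.Second))  // op threshold reached
	fmt.Println(p.Due(3, 6*time.Second))  // age threshold reached
	fmt.Println(p.Due(0, time.Hour))      // idle room
	fmt.Println(p.Due(3, time.Second))    // below both thresholds
}
```

Skipping idle rooms is a design choice of this sketch; it avoids rewriting an unchanged snapshots row on every timer tick.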
3. Runnable structure (the repo the learner ends with)
Both backends share the layout idea; the engine differs. The app entrypoint composes everything: config from env, the Redis client, the Postgres pool, the room registry, the HTTP router with the WS route, and a graceful shutdown.
3.1 Go (backend="go") — github.com/you/concord
concord/
  docker-compose.yml
  db/schema.sql                  # snapshots, versions, doc_embeddings (features)
  web/demo.html                  # bundled 2-client demo page (served at GET /)
  cmd/server/main.go             # ENTRYPOINT: compose router + WS route + Redis + pgx + shutdown
  internal/crdt/rga.go           # hand-rolled RGA: Doc, Elem, ID, ApplyInsert, ApplyDelete, String
  internal/crdt/rga_test.go      # convergence property test
  internal/crdt/codec.go         # Doc <-> []byte (gob): elements + pending buffer (Snapshot/Load)
  internal/room/room.go          # Room: per-room Doc + mutex, ApplyLocal, ApplyRemote, Snapshot
  internal/room/registry.go      # Rooms map[string]*Room (Arc<Mutex>-equivalent ownership)
  internal/wire/wire.go          # the shared wire message types (Msg, Op, ID), one canonical shape
  internal/hub/redis.go          # Publish(room, payload) + Subscribe(room) over go-redis
  internal/store/snapshots.go    # SaveSnapshot/LoadSnapshot (pgx); versions + embeddings (features)
  internal/server/server.go      # handleWS: upgrade, sync, read loop, presence; HTTP handlers
Entrypoint contract — cmd/server/main.go (composes the whole thing):
- reads PORT (default 8080), REDIS_URL, DATABASE_URL from env,
- opens redis.Client and pgxpool.Pool (one each, closed on shutdown),
- builds the room.Registry (lazy-loads each room from its latest Postgres snapshot on first access),
- mounts GET /rooms/{id}/ws, GET / (serves web/demo.html), GET /healthz, and the feature endpoints,
- runs http.Server with Shutdown(ctx) on SIGINT/SIGTERM (closes rooms’ subscriber goroutines).
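The first bullet of the entrypoint contract (config from env) might look like the sketch below. The Config struct and loadConfig function are hypothetical names, and treating REDIS_URL and DATABASE_URL as required is an assumption; the spec only states that PORT defaults to 8080.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config is a hypothetical shape for the three settings the entrypoint reads.
type Config struct {
	Port        string
	RedisURL    string
	DatabaseURL string
}

// loadConfig reads settings through getenv so a test can pass a fake lookup.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		Port:        getenv("PORT"),
		RedisURL:    getenv("REDIS_URL"),
		DatabaseURL: getenv("DATABASE_URL"),
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // default from the spec
	}
	// Assumption: both URLs are mandatory; the spec does not define defaults.
	if cfg.RedisURL == "" {
		return Config{}, errors.New("REDIS_URL is required")
	}
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("would listen on :" + cfg.Port)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the composition step testable without touching the process environment.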
Key types (Go): Doc and Room are concrete structs; the method sets below are what the course
builds on them (*Doc, *Room). Hub and Store are the seams kept as interfaces.
// internal/crdt
type ID struct{ Site string; Counter uint64 }
type Elem struct{ ID, After ID; Value rune; Deleted bool }

// concrete struct; key methods on *Doc:
//   ApplyInsert(e Elem)                               idempotent; buffers if e.After unseen (causal readiness)
//   ApplyDelete(id ID)                                tombstone; idempotent
//   String() string                                   visible text, tombstones skipped
//   Snapshot() ([]byte, error) / Load(b []byte) error gob: placed elements + pending buffer
type Doc struct{ /* elems, index, pending, pendDel */ }

// internal/room
// concrete struct; key methods on *Room:
//   ApplyLocal(op wire.Op) (broadcast wire.Op)  apply to Doc, return op to publish
//   ApplyRemote(op wire.Op)                     apply a peer's op (idempotent)
//   SyncState() []byte                          current full state for a newcomer
//   Text() string
type Room struct{ /* doc *crdt.Doc + mutex */ }

// internal/hub
type Hub interface {
	Publish(ctx context.Context, room string, payload []byte) error
	Subscribe(ctx context.Context, room string) (<-chan []byte, error)
}

// internal/store
type Store interface {
	SaveSnapshot(ctx context.Context, room string, state []byte, opSeq int64) error
	LoadSnapshot(ctx context.Context, room string) (state []byte, opSeq int64, err error)
}
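To make the Doc method set above concrete, here is a minimal RGA sketch with causal-readiness buffering, an ID tie-break, and a shuffled-delivery convergence check. It is an illustration under stated assumptions, not the course's rga.go: it tracks deletes in a tombstone set instead of Elem.Deleted, uses a sentinel element for "start of document", and assumes every element's Counter is greater than its After element's Counter (a Lamport-style counter). demoOps and applyShuffled are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math/rand"
)

type ID struct{ Site string; Counter uint64 }
type Elem struct{ ID, After ID; Value rune }

type Doc struct {
	elems   []Elem        // document order; elems[0] is a sentinel for "start of document"
	seen    map[ID]bool   // IDs already placed, which makes ApplyInsert idempotent
	pending map[ID][]Elem // inserts buffered until their After element is placed
	tomb    map[ID]bool   // deleted IDs, recorded even before the element arrives
}

func NewDoc() *Doc {
	return &Doc{elems: []Elem{{}}, seen: map[ID]bool{{}: true}, pending: map[ID][]Elem{}, tomb: map[ID]bool{}}
}

// less is the tie-break order: by Counter, then by Site.
func less(a, b ID) bool {
	return a.Counter < b.Counter || (a.Counter == b.Counter && a.Site < b.Site)
}

// ApplyInsert places e after e.After, or buffers it until e.After is known.
func (d *Doc) ApplyInsert(e Elem) {
	if d.seen[e.ID] {
		return // duplicate delivery
	}
	if !d.seen[e.After] {
		d.pending[e.After] = append(d.pending[e.After], e) // not causally ready yet
		return
	}
	i := 0
	for d.elems[i].ID != e.After {
		i++
	}
	i++ // first slot to the right of the anchor
	for i < len(d.elems) && less(e.ID, d.elems[i].ID) {
		i++ // skip concurrent inserts with a greater ID, and their subtrees
	}
	d.elems = append(d.elems, Elem{})
	copy(d.elems[i+1:], d.elems[i:])
	d.elems[i] = e
	d.seen[e.ID] = true
	for _, w := range d.pending[e.ID] {
		d.ApplyInsert(w) // flush inserts that were waiting on e
	}
	delete(d.pending, e.ID)
}

func (d *Doc) ApplyDelete(id ID) { d.tomb[id] = true } // idempotent tombstone

func (d *Doc) String() string {
	var out []rune
	for _, e := range d.elems[1:] {
		if !d.tomb[e.ID] {
			out = append(out, e.Value)
		}
	}
	return string(out)
}

// demoOps: site a types "hi!", site b concurrently types "yo" at the start, then 'o' is deleted.
func demoOps() []func(*Doc) {
	ins := func(site string, n uint64, after ID, v rune) func(*Doc) {
		return func(d *Doc) { d.ApplyInsert(Elem{ID{site, n}, after, v}) }
	}
	return []func(*Doc){
		ins("a", 1, ID{}, 'h'),
		ins("a", 2, ID{"a", 1}, 'i'),
		ins("a", 3, ID{"a", 2}, '!'),
		ins("b", 1, ID{}, 'y'),
		ins("b", 2, ID{"b", 1}, 'o'),
		func(d *Doc) { d.ApplyDelete(ID{"b", 2}) },
	}
}

func applyShuffled(ops []func(*Doc), seed int64) string {
	all := append(append([]func(*Doc){}, ops...), ops...) // every op is delivered twice
	rand.New(rand.NewSource(seed)).Shuffle(len(all), func(i, j int) { all[i], all[j] = all[j], all[i] })
	d := NewDoc()
	for _, apply := range all {
		apply(d)
	}
	return d.String()
}

func main() {
	for seed := int64(0); seed < 1000; seed++ {
		if got := applyShuffled(demoOps(), seed); got != "yhi!" {
			panic(fmt.Sprintf("seed %d diverged: %q", seed, got))
		}
	}
	fmt.Println("1000 shuffled, duplicated deliveries all produced yhi!")
}
```

The linear scan keeps the sketch short; it costs O(n) per insert, which is one place where the real implementation's index would matter.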
3.2 Rust (backend="rust") — concord crate
concord/
  docker-compose.yml
  db/schema.sql
  web/demo.html
  Cargo.toml
  src/main.rs              # ENTRYPOINT: axum Router + WS route + Redis + sqlx pool + shutdown
  src/wire.rs              # serde wire types (Msg), same JSON contract as Go
  src/room.rs              # Room { doc: yrs::Doc }, apply_local, apply_remote, sync_state
  src/registry.rs          # Rooms: Arc<Mutex<HashMap<String, Arc<Mutex<Room>>>>>
  src/hub.rs               # redis pub/sub publish + subscribe
  src/store.rs             # sqlx: save/load snapshot; versions + embeddings (features)
  src/server.rs            # ws handler: upgrade, sync, recv loop, presence
  tests/convergence.rs     # convergence property test
Entrypoint contract — src/main.rs: #[tokio::main] builds the axum::Router with
GET /rooms/{id}/ws (axum 0.8 {id} path syntax), GET / (demo), GET /healthz, and feature routes;
shares a Registry via axum::extract::State (the room state is the canonical Arc<Mutex<…>>
shared-mutable-state problem from the Rust track); axum::serve(...).with_graceful_shutdown(...).
Key types (Rust): Room { doc: yrs::Doc } with apply_local(edit) -> Vec<u8> (the
encode_update_v2() bytes), apply_remote(&[u8]) -> Result<()> (decode + apply_update), and
sync_state() -> Vec<u8> (encode_state_as_update_v2(&StateVector::default())).
4. Data model
Local Postgres (postgres:16, or pgvector/pgvector:pg16 once the semantic-search feature is on).
4.1 Base build
-- the durable convergent truth: one latest snapshot per room
CREATE TABLE snapshots (
  room_id    TEXT PRIMARY KEY,
  state      BYTEA NOT NULL,       -- encoded CRDT state (yrs update bytes / Go gob)
  op_seq     BIGINT NOT NULL,      -- last op sequence included
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Prerequisite/seed rows: none for the base build — a room is created lazily on first WebSocket connect
(no FK behind it), and its first snapshot is written on the first compaction cadence. The demo page targets
room demo, which exists implicitly the moment a client connects to /rooms/demo/ws. (The versions and
doc_embeddings tables below have room_id TEXT columns but deliberately no FK to snapshots so a
version/embedding can be captured before the first snapshot row exists.)
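Lazy room creation, as described above, can be sketched as a registry that builds a room on first access and seeds it from a snapshot when one exists. The Room stand-in, the Loader signature, and NewRegistry are hypothetical; the real registry would load through the Store interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Room is a stand-in for the real room type; only the loaded state matters here.
type Room struct{ State []byte }

// Loader fetches the latest snapshot for a room; ok is false when none exists yet.
// Hypothetical signature, standing in for Store.LoadSnapshot.
type Loader func(roomID string) (state []byte, ok bool)

type Registry struct {
	mu    sync.Mutex
	rooms map[string]*Room
	load  Loader
}

func NewRegistry(load Loader) *Registry {
	return &Registry{rooms: make(map[string]*Room), load: load}
}

// Get returns the room for id, creating it on first access.
// A room with no snapshot row starts empty, so no seed data is needed.
func (r *Registry) Get(id string) *Room {
	r.mu.Lock()
	defer r.mu.Unlock()
	if room, ok := r.rooms[id]; ok {
		return room
	}
	room := &Room{}
	if state, ok := r.load(id); ok {
		room.State = state
	}
	r.rooms[id] = room
	return room
}

func main() {
	reg := NewRegistry(func(string) ([]byte, bool) { return nil, false })
	first := reg.Get("demo")  // created implicitly, like /rooms/demo/ws on first connect
	second := reg.Get("demo") // same instance on every later access
	fmt.Println(first == second)
}
```

Holding the mutex during the load is the simplest correct choice for a sketch; it serializes first access across rooms, which a production registry might avoid with per-room locking.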
4.2 Feature: history
CREATE TABLE versions (
  id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  room_id    TEXT NOT NULL,
  label      TEXT NOT NULL,        -- "before the rewrite"
  author     TEXT NOT NULL,
  state      BYTEA NOT NULL,       -- FULL encoded state at this point
  op_seq     BIGINT NOT NULL,      -- last op covered
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX versions_room_time ON versions (room_id, created_at DESC);

-- optional persisted op-log (if you compact a table rather than an in-memory log)
CREATE TABLE ops (
  room_id TEXT NOT NULL,
  seq     BIGINT NOT NULL,
  op      BYTEA NOT NULL,
  PRIMARY KEY (room_id, seq)
);
4.3 Feature: semantic-search
CREATE EXTENSION IF NOT EXISTS vector; -- needs pgvector/pgvector:pg16 image
CREATE TABLE doc_embeddings (
  room_id    TEXT PRIMARY KEY,
  embedding  vector(768) NOT NULL, -- MUST match the model output dimension
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX doc_embeddings_hnsw
  ON doc_embeddings USING hnsw (embedding vector_cosine_ops);
Accuracy invariant (embeddings): Gemini pre-normalizes only the 3072-dim output; for vector(768)
the server must L2-normalize the embedding to unit length before storing/comparing so cosine distance is
meaningful. (Link the embeddings docs; don’t pin a model id.)
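The normalization step in the accuracy invariant is short enough to show in full. This is a sketch; l2Normalize is a hypothetical helper name, and returning a zero vector unchanged is a choice made here, not something the spec defines.

```go
package main

import (
	"fmt"
	"math"
)

// l2Normalize returns a copy of v scaled to unit length.
// A zero vector has no direction, so it is returned unchanged.
func l2Normalize(v []float32) []float32 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	out := make([]float32, len(v))
	if norm == 0 {
		copy(out, v)
		return out
	}
	for i, x := range v {
		out[i] = float32(float64(x) / norm)
	}
	return out
}

func main() {
	// A 2-dim stand-in for the 768-dim embedding: (3, 4) has length 5.
	fmt.Println(l2Normalize([]float32{3, 4})) // → [0.6 0.8]
}
```

Accumulating the sum of squares in float64 keeps rounding error small across 768 components before the result is stored as float32.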
5. API & event contract (canonical — every step, client, and test shares this)
5.1 WebSocket: GET /rooms/{id}/ws
On upgrade the server immediately sends one sync. Then both directions exchange op and presence. The
op payload is one canonical envelope for BOTH backends; only the inside of op.data differs by engine —
this is the contract that resolves the baseline’s wire-protocol contradiction:
// server → client, once on connect (full current state, base64 of the engine's encoding):
{ "t": "sync", "state": "<base64>" }
// either direction, per edit. The SAME envelope for Go and Rust:
// - Go (hand-rolled RGA): op.data is the JSON RGA op (kind/id/after/value)
// - Rust (yrs): op.data is base64 of encode_update_v2() bytes
{ "t": "op", "seq": 41, "data": { "kind": "insert", "id": {"site":"a3f9","counter":42},
"after": {"site":"a3f9","counter":41}, "value": "h" } } // Go
{ "t": "op", "seq": 41, "data": { "update": "<base64 yrs update>" } } // Rust
// ephemeral; broadcast, never persisted:
{ "t": "presence", "user": "kerim", "cursor": 128, "color": "#2DD4BF" }
Rules shared by both engines:
- t ∈ {sync, op, presence}; unknown t is ignored (forward-compatible).
- op is idempotent and order-independent: re-applying or reordering ops converges (the CRDT guarantee). A client that reconnects re-applies any locally-queued ops; duplicates are harmless.
- sync.state is base64 of the engine’s full-state encoding (Go: gob of the element list + pending buffer; Rust: encode_state_as_update_v2(&StateVector::default())).
- The frontend contract is engine-aware: a Go client emits/receives RGA ops keyed by element id; a yrs client round-trips opaque update bytes. The course states this per backend so the client matches.
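A receiver following these rules could decode the envelope once and leave op.data undecoded until the engine handles it. The sketch below assumes a flat Msg struct with these Go field names; the actual wire.go shape is not shown in this spec, and classify is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Msg is an assumed flat envelope covering sync, op, and presence frames.
type Msg struct {
	T      string          `json:"t"`
	State  string          `json:"state,omitempty"` // sync: base64 full state
	Seq    int64           `json:"seq,omitempty"`   // op
	Data   json.RawMessage `json:"data,omitempty"`  // op: engine-specific, left undecoded here
	User   string          `json:"user,omitempty"`  // presence
	Cursor int             `json:"cursor"`          // presence
	Color  string          `json:"color,omitempty"` // presence
}

// classify decodes one frame and reports how a receiver would route it.
// Unknown t values are not errors; they are ignored for forward compatibility.
func classify(frame []byte) (string, error) {
	var m Msg
	if err := json.Unmarshal(frame, &m); err != nil {
		return "", err
	}
	switch m.T {
	case "sync", "op", "presence":
		return m.T, nil
	default:
		return "ignored", nil
	}
}

func main() {
	frames := []string{
		`{"t":"sync","state":"AAEC"}`,
		`{"t":"op","seq":41,"data":{"update":"AAEC"}}`,
		`{"t":"presence","user":"kerim","cursor":128,"color":"#2DD4BF"}`,
		`{"t":"some-future-type"}`,
	}
	for _, f := range frames {
		kind, err := classify([]byte(f))
		fmt.Println(kind, err)
	}
}
```

Keeping Data as json.RawMessage is what lets one envelope serve both engines: the Go path decodes it as an RGA op, while a yrs-backed peer would read the base64 update field.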
5.2 HTTP endpoints
| Method | Path | Body | Response | Codes | Module |
|---|---|---|---|---|---|
| GET | /healthz | — | ok | 200 | base |
| GET | / | — | web/demo.html | 200 | base |
| GET | /rooms/{id}/ws | (WS upgrade) | WS stream | 101, 400 (not WS) | base |
| POST | /rooms/{id}/copilot | { "selection": "...", "mode": "rewrite\|continue" } | { "suggestion": "..." } | 200, 400 (bad mode), 502 (upstream) | ai-copilot |
| POST | /rooms/{id}/versions | { "label": "...", "author": "..." } | { "id": 12, "op_seq": 41 } | 201, 400 | history |
| GET | /rooms/{id}/versions | — | [{ "id", "label", "author", "op_seq", "created_at" }] | 200 | history |
| GET | /rooms/{id}/diff?from=A&to=B | — | { "added": [...], "removed": [...], "moved": [...] } | 200, 400 | history |
| POST | /rooms/{id}/restore | { "version_id": 7 } | { "applied": true } | 200, 404 | history |
| GET | /rooms/{id}/search?q=...&k=5 | — | [{ "room_id", "similarity" }] | 200 | semantic-search |
| GET | /rooms/{id}/backlinks?k=5 | — | [{ "room_id", "similarity" }] | 200 | semantic-search |
| POST | /rooms/{id}/ai | { "instruction": "..." } | { "edits": [...pending...], "summary": "..." } | 200, 400 | ai-commands |
/restore, /copilot-accept, and /ai-apply do not write document state directly — they emit ordinary
CRDT ops through the same local-apply + broadcast path as a keystroke (no privileged write path).
5.3 Redis channels & keys
- room:{id} is a pub/sub channel; every applied op (and every presence) is published here; every instance subscribes. Payload = the op/presence JSON envelope above.
- presence:{room} is a sorted set, member = user id, score = unix-ms expiry. Refresh on heartbeat (ZADD presence:{room} <now+ttl> <user>); list live members with ZRANGEBYSCORE presence:{room} <now> +inf and sweep expired with ZREMRANGEBYSCORE presence:{room} -inf <now>. Never KEYS.
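The presence semantics can be checked without Redis by modelling the sorted set as a map from member to score. This is an in-memory stand-in written for illustration, not a Redis client; the presence type and its method names are hypothetical, and each method notes the command it mirrors.

```go
package main

import (
	"fmt"
	"sort"
)

// presence models one presence:{room} sorted set: member → score (unix-ms expiry).
type presence struct{ expiry map[string]int64 }

func newPresence() *presence { return &presence{expiry: map[string]int64{}} }

// Heartbeat mirrors ZADD presence:{room} <now+ttl> <user>.
func (p *presence) Heartbeat(user string, nowMs, ttlMs int64) { p.expiry[user] = nowMs + ttlMs }

// Live mirrors ZRANGEBYSCORE presence:{room} <now> +inf (inclusive bounds),
// returning members by ascending score, then by name.
func (p *presence) Live(nowMs int64) []string {
	var users []string
	for u, exp := range p.expiry {
		if exp >= nowMs {
			users = append(users, u)
		}
	}
	sort.Slice(users, func(i, j int) bool {
		a, b := users[i], users[j]
		if p.expiry[a] != p.expiry[b] {
			return p.expiry[a] < p.expiry[b]
		}
		return a < b
	})
	return users
}

// Sweep mirrors ZREMRANGEBYSCORE presence:{room} -inf <now> and returns the removed count.
func (p *presence) Sweep(nowMs int64) int {
	n := 0
	for u, exp := range p.expiry {
		if exp <= nowMs {
			delete(p.expiry, u)
			n++
		}
	}
	return n
}

func main() {
	p := newPresence()
	p.Heartbeat("kerim", 1_000, 30_000) // expires at 31_000
	p.Heartbeat("ada", 1_000, 5_000)    // expires at 6_000
	fmt.Println(p.Live(10_000))         // → [kerim]
	fmt.Println(p.Sweep(10_000))        // → 1
}
```

Because membership is decided by score at read time, a user who stops sending heartbeats drops out of Live immediately, even before Sweep removes the entry.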
6. Build order (dependency-ordered; each step’s prerequisites already exist)
- Concept: why CRDTs beat locks & OT (no code).
- Infra: docker compose up -d (Redis + Postgres) + the language-toolchain / websocat prereqs + how to get a GEMINI_API_KEY (only needed by AI features).
- Model: the sequence-CRDT element shape (id, after, value, tombstone).
- Wire contract: the canonical sync/op/presence envelope, with the per-engine op.data caveat (Go = RGA op JSON; Rust = base64 yrs update). Everything downstream references this.
- Backend scaffold (go|rust): WS server that upgrades GET /rooms/{id}/ws, sends hello, and echoes (transport only).
- ★ Merge (go|rust): the engine.
  - Rust: hold a yrs Doc; apply_local / apply_remote; unit-test convergence.
  - Go: bridge step first, a compiling, converging in-order RGA (ApplyInsert no-buffer case + String()); then the AgentPrompt extends it to full causal-readiness buffering + tie-break.
- Doc serialization (go): define the gob codec once (elements + pending buffer); Rust uses encode_state_as_update_v2. Consumed by sync, snapshot, history.
- App entrypoint / composition (go|rust): wire router + WS route + Redis client + Postgres pool + room registry + graceful shutdown into main. This is the step that makes everything runnable.
- Redis fan-out (go|rust): publish each op to room:{id}; subscribe + apply; send newcomers sync.
- Presence (common): sorted-set membership + presence pub/sub (placed next to fan-out, before the UI that consumes presence). No KEYS.
- Headline demo (common): serve web/demo.html; open two clients, watch concurrent edits converge.
- Offline reconnect proof (common): disconnect a client, queue edits, reconnect, replay, assert byte-equal. The promised payoff, built and tested.
- Snapshot + compaction (go|rust): persist state to Postgres; restart → byte-equal.
- Convergence property test (go|rust): the one property you must prove.
- Frontend (compose|flutter|swiftui): editor screen; includes the index↔element-id bridge so a flat text widget’s (offset, len) change maps to engine ops and back.
- Deploy (common): Cloud Run + Cloud SQL + Memorystore; reconnect-and-resync tied to the sync path.
- Features (off by default): ai-copilot, history, semantic-search, ai-commands.
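The fan-out step in this build order can be exercised before Redis is wired in, because Hub is an interface. The sketch below is an in-memory stand-in (memHub, newMemHub, and demo are hypothetical names); dropping a message for a slow subscriber is a simplification of this sketch, not a description of how Redis behaves.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Hub matches the interface from the Go key types in this spec.
type Hub interface {
	Publish(ctx context.Context, room string, payload []byte) error
	Subscribe(ctx context.Context, room string) (<-chan []byte, error)
}

// memHub is an in-memory stand-in for the Redis-backed hub.
type memHub struct {
	mu   sync.Mutex
	subs map[string][]chan []byte
}

func newMemHub() *memHub { return &memHub{subs: map[string][]chan []byte{}} }

func (h *memHub) Publish(ctx context.Context, room string, payload []byte) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs[room] {
		select {
		case ch <- payload:
		default: // sketch simplification: drop instead of blocking on a slow subscriber
		}
	}
	return nil
}

func (h *memHub) Subscribe(ctx context.Context, room string) (<-chan []byte, error) {
	ch := make(chan []byte, 16)
	h.mu.Lock()
	h.subs[room] = append(h.subs[room], ch)
	h.mu.Unlock()
	go func() {
		<-ctx.Done() // unsubscribe when the caller cancels
		h.mu.Lock()
		defer h.mu.Unlock()
		list := h.subs[room]
		for i, c := range list {
			if c == ch {
				h.subs[room] = append(list[:i], list[i+1:]...)
				break
			}
		}
		close(ch) // closed under the lock, so Publish never sends on a closed channel
	}()
	return ch, nil
}

// demo publishes one op and returns what two subscribed "instances" received.
func demo() (string, string) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	var hub Hub = newMemHub()
	inst1, _ := hub.Subscribe(ctx, "room:demo")
	inst2, _ := hub.Subscribe(ctx, "room:demo")
	_ = hub.Publish(ctx, "room:demo", []byte(`{"t":"op","seq":41}`))
	return string(<-inst1), string(<-inst2)
}

func main() {
	a, b := demo()
	fmt.Println(a == b, a)
}
```

Since ops are idempotent, an instance that also receives its own published op can simply apply it again; the stand-in does not need to filter by sender.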
7. Backends — parity points (Go default + Rust spotlight, both complete)
| Milestone | Go (hand-rolled) | Rust (yrs) | Same contract |
|---|---|---|---|
| WS scaffold | coder/websocket, r.PathValue("id") | axum 0.8 {id}, WebSocketUpgrade | GET /rooms/{id}/ws, hello frame |
| Merge | RGA: ApplyInsert/ApplyDelete/String, buffer + tie-break | yrs Doc: text.insert/remove_range, encode_update_v2 | both converge; op envelope §5.1 |
| Serialize state | gob: elements + pending buffer | encode_state_as_update_v2(&StateVector::default()) | sync.state = base64 of it |
| Fan-out | go-redis Publish/Subscribe | redis crate pub/sub | channel room:{id} |
| Snapshot | gob into snapshots.state | update bytes into snapshots.state | same table, same restart guarantee |
| Property test | testing/quick / seeded loop, -race | proptest / seeded loop | identical-final-text assertion |
| Entrypoint | cmd/server/main.go | src/main.rs | same routes, same shutdown discipline |
Honest parity note (the Go≠stub guarantee): the Go path ships a running RGA, its own serialization (buffer included), the same Redis/Postgres/presence wiring, and the same demo + offline-reconnect proof. Go scores 3 not because it’s incomplete but because you feel the missing CRDT ecosystem: you own the correctness that yrs gives Rust for free. That felt cost (more code, hot-path GC pressure on every keystroke) is the lesson behind the techFit rust 5 / go 3 rating.
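The Go path's own serialization (buffer included) amounts to a gob round trip plus base64 for sync.state. The sketch below assumes a docState struct holding the placed elements and the pending buffer; that struct and the encodeState/decodeState names are hypothetical, while ID and Elem follow the key types in section 3.1.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/gob"
	"fmt"
)

type ID struct{ Site string; Counter uint64 }
type Elem struct{ ID, After ID; Value rune; Deleted bool }

// docState is an assumed snapshot shape: placed elements plus the pending buffer.
type docState struct {
	Elems   []Elem
	Pending []Elem
}

// encodeState gob-encodes the state and wraps it in base64 for the sync frame.
func encodeState(s docState) (string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodeState reverses encodeState.
func decodeState(b64 string) (docState, error) {
	var s docState
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return s, err
	}
	err = gob.NewDecoder(bytes.NewReader(raw)).Decode(&s)
	return s, err
}

func main() {
	in := docState{
		Elems:   []Elem{{ID: ID{"a", 1}, Value: 'h'}},
		Pending: []Elem{{ID: ID{"b", 2}, After: ID{"b", 1}, Value: 'o'}}, // waits for b:1
	}
	wire, err := encodeState(in)
	if err != nil {
		panic(err)
	}
	out, err := decodeState(wire)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out.Elems), len(out.Pending), string(out.Elems[0].Value))
}
```

Including the pending buffer matters: an insert that was still waiting for its anchor at snapshot time would otherwise be lost on restart.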
8. Optional feature modules (off by default; each extends, never rewrites, the base)
- ai-copilot: POST /rooms/{id}/copilot asks Gemini for a rewrite/continuation of a selection; accepting it emits ordinary CRDT ops (delete the selection’s ids + insert the new text). Server-side key; the AI never writes directly. (2 steps.)
- history: versions table; capture/list/diff (by stable id)/restore (non-destructively, as ops); compact the op-log behind the lowest retained version. Generalizes the base snapshot into event-sourced named versions. (5 steps.)
- semantic-search: pgvector + HNSW; embed each persisted snapshot with Gemini (L2-normalize 768-dim before storing), re-embed on every snapshot, search + backlinks via the <=> cosine operator. (5 steps: three common + one forked per backend for the pgvector binding.)
- ai-commands: typed editor tools (find_all / insert_at / replace_range) as Gemini function declarations; a bounded agentic loop; apply results as CRDT ops; a final “safe, observable, convergent” step (preview-confirm + provenance via history + range validation). (5 steps: three common + one forked per backend for the apply path.)
Cross-feature dependency to surface in the instructions: the ai-commands apex’s “capture a version before applying” assumes history is also enabled; state it plainly.
9. Free-to-complete ($0)
- Redis + Postgres: local Docker (redis:7, postgres:16; swap to pgvector/pgvector:pg16 for semantic-search). Costs nothing.
- Toolchain: Go (go.dev/dl) or Rust (rustup) is free. websocat / wscat for the CLI demo is free.
- Demo client: the bundled web/demo.html is served by the server itself; no build, no install.
- Mobile: Android emulator / iOS simulator, both free.
- Gemini (features only): a free Google AI Studio key (https://aistudio.google.com/apikey) on the free tier covers Copilot, embeddings, and /ai for this course. No billing account required; quotas are small and change, so check AI Studio for live limits.
- Cloud deploy is optional (Cloud Run + Cloud SQL + Memorystore); the entire learning result runs locally for $0. Cloud Run caps request duration, so clients reconnect and re-sync, which the CRDT makes painless (duplicates and gaps converge).