This is the production spec — the contract the course builds toward. The guided course teaches you to reach exactly this runnable result. Skim it if you'd rather build straight from the target.

Catalens — Project Spec

Single source of truth for the Catalens course (src/content/projects/catalens.mdx). The course must teach toward exactly this runnable project. Spotlight: MongoDB Atlas Vector Search. Backends: Go (default) + TypeScript, both implementing the same contract. Free to complete ($0): Atlas M0 + a free Google AI Studio key + local runtime + an emulator.

1. Overview & definition of done

Catalens is a visual product-recognition service. A shopper photographs a product; the backend turns the photo into a typed descriptor (Gemini Vision), embeds that descriptor (Gemini embeddings), and matches it against a live catalog with one MongoDB Atlas $vectorSearch aggregation — vector similarity and a category pre-filter in a single query over heterogeneous documents — then ranks the results by score and cuts them off at a calibrated confidence threshold.

Definition of done (the runnable result a learner ends with):

An Atlas M0 cluster holding a products collection (~15 seeded products across ≥3 categories), each with a stored embedding and embeddingHash, behind a products_vec Atlas Vector Search index.
A backend (Go or TypeScript) exposing POST /recognize that, given a photo, returns { descriptor, matches:[{...product, score}], noMatch } — the spotlight pipeline end to end.
A mobile app (Compose, Flutter, or SwiftUI) that captures/picks a photo, calls /recognize, and shows ranked matches with their scores, the Vision descriptor + which pre-filter fired, a live threshold slider, and a first-class no-match state.
An integration test (per backend) proving the two outcomes against the real Atlas index: a known product photo matches above the threshold T; an out-of-catalog photo returns a clean noMatch.

How the learner SEES it run locally, for $0: start the backend against Atlas M0 with a free Gemini key, run the mobile app on an emulator/simulator, tap a bundled sample photo, and watch a ranked match list with scores appear — or a clean “no confident match” for an unstocked item. No cloud deploy is required; Cloud Run is an optional extra.

The spotlight is load-bearing: remove MongoDB Atlas Vector Search and the project’s core (similarity + metadata pre-filter in one query over a schema-volatile catalog) cannot exist. The backend language is a swappable shell; the match is the database.

2. Architecture (components and how they connect)

            ┌─────────────┐  multipart {image, category?}   ┌──────────────────────────┐
 Mobile app │ Compose /   │ ──────────────────────────────▶ │ Backend  POST /recognize │
 (camera or │ Flutter /   │ ◀────────────────────────────── │ (Go default | TypeScript)│
  gallery)  │ SwiftUI     │   { descriptor, matches[], noMatch }                        │
            └─────────────┘                                  └─────────┬────────────────┘
                                                                       │
                              1. Vision (image → typed descriptor)     │
                                ┌────────────────────────────────────▶ │
                                │  Gemini Vision  (generateContent,     │
                                │  responseSchema, category enum)       │
                                │                                       ▼
                                │  2. Embed (descriptor text → vector)  embeddingText(descriptor)
                                │  Gemini embeddings (gemini-embedding-001, outputDimensionality D)
                                │  → L2-normalize when D < 3072 ────────┐
                                │                                       ▼
                                │                            3. $vectorSearch (one aggregation)
                                │                            MongoDB Atlas:  embedding NN
                                └─────────────────────────── + category pre-filter + $meta score
                                                                       │
                                                                       ▼
                                                            4. threshold T → ranked matches | noMatch

Mobile app never calls Gemini or Mongo directly. It only calls your backend. The Gemini key and the Mongo URI live server-side.
Gemini does two jobs: Vision (photo → descriptor) and embeddings (text → vector). Same key, free tier.
MongoDB Atlas does the match: nearest-neighbour over embedding plus the metadata pre-filter, in one $vectorSearch aggregation, returning each candidate’s vectorSearchScore.
The ingest pass (run once, and again whenever a product’s content changes) embeds every product and stores the vector on its document, behind the same products_vec index the query reads.

Why descriptor-text embeddings (the road not taken)

We embed the text of a Vision descriptor for both catalog and query — not the raw image. The honest reason: it keeps the project on one free embedding model for catalog and query, yields a human-readable descriptor you can debug, and makes the match explainable. The invariant this creates: catalog and query must be embedded the same way (same model, same dimensions, same normalization), because we are comparing embed(text-of-catalog-product) against embed(text-of-Vision-descriptor-of-photo). A learner must seed products they can actually photograph (or generate matching images), because the comparison is descriptor-text vs descriptor-text — not image-pixels vs image-pixels.
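
The "embedded the same way" invariant is easiest to keep when one function builds the text for both sides. The following is a minimal sketch, not the course's builder: the `Fields` struct, the field order, and the `" | "` separator are all hypothetical choices made for illustration. The only property taken from the spec is that a catalog product and a Vision descriptor go through the same function.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Fields is a hypothetical common shape: a catalog product or a Vision
// descriptor is mapped onto it before the embedding text is built.
type Fields struct {
	Brand      string
	Category   string
	Name       string   // empty for a Vision descriptor
	Attributes []string // e.g. "red", "leather"
}

// EmbeddingText builds the string that gets embedded. Attributes are trimmed,
// lowercased and sorted so the same content always yields the same text.
func EmbeddingText(f Fields) string {
	attrs := make([]string, 0, len(f.Attributes))
	for _, a := range f.Attributes {
		if a = strings.ToLower(strings.TrimSpace(a)); a != "" {
			attrs = append(attrs, a)
		}
	}
	sort.Strings(attrs)

	parts := []string{}
	for _, p := range []string{f.Brand, f.Name, f.Category} {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	if len(attrs) > 0 {
		parts = append(parts, strings.Join(attrs, ", "))
	}
	return strings.Join(parts, " | ")
}

func main() {
	product := Fields{Brand: "Northpeak", Category: "sneakers", Name: "Trailblazer Low",
		Attributes: []string{"red", "leather"}}
	descriptor := Fields{Brand: "Northpeak", Category: "sneakers",
		Attributes: []string{"Leather", " red "}}
	fmt.Println(EmbeddingText(product))
	fmt.Println(EmbeddingText(descriptor))
}
```

Sorting and lowercasing are a design choice in this sketch: they keep the text (and therefore its hash) stable when attribute order or casing changes but the content does not.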

3. Runnable structure (the repo the learner ends with)

Both backends share the same module layout in spirit; names differ by language. The app entrypoint composes everything: it opens one Mongo client, builds the Gemini client, registers routes + middleware, and shuts down cleanly.

Go (default)

catalens/
  go.mod                       # module github.com/you/catalens
  cmd/api/main.go              # ENTRYPOINT: load config, open Mongo client, build deps,
                               #   register routes, http.Server + graceful shutdown
  internal/config/config.go    # env: MONGODB_URI, MONGODB_DB, GEMINI_API_KEY,
                               #   GEMINI_VISION_MODEL, GEMINI_EMBED_MODEL, EMBED_DIM (D), THRESHOLD (T), PORT
  internal/catalog/store.go    # Store: Search(ctx, q) / Upsert / collection handles
  internal/gemini/gemini.go    # Vision (descriptor) + Embed (vector) + L2-normalize helper
  internal/recognize/service.go # RecognizeService: orchestrates Vision→embed→Search→threshold
  internal/recognize/handler.go # POST /recognize HTTP handler (multipart in, JSON out)
  internal/embedtext/text.go   # embeddingText(doc|descriptor) — the SHARED builder (ingest == query)
  cmd/seed/main.go             # seed ~15 products + create _worker_state doc (idempotent upsert)
  cmd/ingest/main.go           # embed every product, store embedding + embeddingHash
  cmd/worker/main.go           # (feature: dynamic-embeddings) change-stream re-embed worker
  testdata/                    # bundled sample product photos (match seeded products) + a true-negative
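
As an illustration of what `internal/config/config.go` might contain, here is a hedged sketch of a loader for the env vars listed above. The variable names come from the layout; everything else is an assumption: the `Load` signature, the choice of which variables are required, and the fallback values (768 and 0.75 echo the examples elsewhere in this spec, and port 8080 is arbitrary).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the env vars named in the repo layout.
type Config struct {
	MongoURI    string
	MongoDB     string
	GeminiKey   string
	VisionModel string
	EmbedModel  string
	EmbedDim    int     // D
	Threshold   float64 // T
	Port        string
}

// Load reads config through getenv (os.Getenv in production, a stub in tests).
// The fallbacks for EMBED_DIM, THRESHOLD and PORT are illustrative assumptions.
func Load(getenv func(string) string) (Config, error) {
	for _, key := range []string{"MONGODB_URI", "MONGODB_DB", "GEMINI_API_KEY",
		"GEMINI_VISION_MODEL", "GEMINI_EMBED_MODEL"} {
		if getenv(key) == "" {
			return Config{}, fmt.Errorf("missing required env var %s", key)
		}
	}
	cfg := Config{
		MongoURI:    getenv("MONGODB_URI"),
		MongoDB:     getenv("MONGODB_DB"),
		GeminiKey:   getenv("GEMINI_API_KEY"),
		VisionModel: getenv("GEMINI_VISION_MODEL"),
		EmbedModel:  getenv("GEMINI_EMBED_MODEL"),
		EmbedDim:    768,
		Threshold:   0.75,
		Port:        "8080",
	}
	if v := getenv("EMBED_DIM"); v != "" {
		d, err := strconv.Atoi(v)
		if err != nil || d <= 0 {
			return Config{}, fmt.Errorf("EMBED_DIM must be a positive integer, got %q", v)
		}
		cfg.EmbedDim = d
	}
	if v := getenv("THRESHOLD"); v != "" {
		t, err := strconv.ParseFloat(v, 64)
		if err != nil || !(t >= 0 && t <= 1) {
			return Config{}, fmt.Errorf("THRESHOLD must be in [0,1], got %q", v)
		}
		cfg.Threshold = t
	}
	if v := getenv("PORT"); v != "" {
		cfg.Port = v
	}
	return cfg, nil
}

func main() {
	cfg, err := Load(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config:", err)
		os.Exit(1)
	}
	fmt.Printf("D=%d T=%.2f port=%s\n", cfg.EmbedDim, cfg.Threshold, cfg.Port)
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment. Model ids are required rather than defaulted, in line with the rule that neither backend hard-codes a model id.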

TypeScript

catalens/
  package.json
  src/server.ts                # ENTRYPOINT: MongoClient.connect, build deps, Hono routes, serve
  src/config.ts                # same env vars as Go
  src/catalog/store.ts         # CatalogStore: search()/upsert()/collection handles
  src/gemini.ts                # vision()/embed()/l2normalize()
  src/recognize/service.ts     # orchestrates Vision→embed→search→threshold
  src/recognize/handler.ts     # POST /recognize Hono handler
  src/embedText.ts             # embeddingText() — the SHARED builder (ingest == query)
  src/seed.ts                  # seed ~15 products + _worker_state doc (idempotent)
  src/ingest.ts                # embed every product, store embedding + embeddingHash
  src/worker.ts                # (feature) change-stream re-embed worker
  testdata/                    # bundled sample photos + a true-negative

Key interfaces (named explicitly — same contract, both languages)

Store / CatalogStore — the only thing the recognise service knows about persistence:

Search(ctx, query) -> []Match where query = { queryVector: float[D], categoryHint?: string, numCandidates: int, limit: int, exact?: bool, inStockOnly?: bool }. Runs the $vectorSearch aggregation and returns ranked Match{ id, name, brand, category, attributes, price, inStock, score }.

The Store.Search seam is a named contract, not a mandatory file: the course teaches the $vectorSearch pipeline inline in the /recognize handler for clarity (one place to read the spotlight end to end). Extracting it behind a Store.Search method is an idiomatic refactor, not a missing piece.

Upsert(ctx, product) -> id (seed + ingest).
SetEmbedding(ctx, id, vector, hash) (ingest + worker).
LogMiss(ctx, miss) (feature: no-match-analytics; writes to a separate search_misses collection).

Vision + Embed (the gemini package):

Vision(ctx, imageBytes, mime) -> Descriptor — generateContent with responseMimeType:"application/json" and a responseSchema whose category is an enum of the catalog’s known categories.
Embed(ctx, text) -> float[D] — embeds, then L2-normalizes when D < 3072 (see §4).

RecognizeService — the spotlight orchestration, language-agnostic in shape: Recognize(ctx, imageBytes, mime, categoryHint?) -> RecognizeResponse. It: (1) Vision → descriptor, (2) embeddingText(descriptor) → Embed → query vector, (3) Store.Search with the category pre-filter (falling back to no filter when the filtered result is empty — see §6 “category gap”), (4) apply threshold T, (5) on no-match optionally LogMiss. Returns the canonical response shape in §5.
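
The orchestration above can be sketched with stubbed dependencies. This is a simplified illustration, not the course's service: the function-typed `Deps` fields, the reduced `Descriptor` and `Match` types, and the `runDemo` helper are hypothetical, and step (5) (`LogMiss`) is left out. It shows the control flow only: filtered search first, unfiltered retry on an empty result, then the threshold cut.

```go
package main

import (
	"context"
	"fmt"
)

type Descriptor struct{ Brand, Category string }

type Match struct {
	Name  string
	Score float64
}

// Deps holds the seams Recognize calls. Function-typed fields are a hypothetical
// simplification of the Vision, Embed and Store.Search contracts.
type Deps struct {
	Vision func(ctx context.Context, image []byte, mime string) (Descriptor, error)
	Embed  func(ctx context.Context, text string) ([]float32, error)
	// Search returns candidates best-first; an empty category means "no pre-filter".
	Search    func(ctx context.Context, vec []float32, category string) ([]Match, error)
	Threshold float64 // T
}

type Result struct {
	Descriptor    Descriptor
	FilterApplied *string // nil when the search ran unfiltered
	Matches       []Match
	NoMatch       bool
}

func Recognize(ctx context.Context, d Deps, image []byte, mime, hint string) (Result, error) {
	desc, err := d.Vision(ctx, image, mime) // (1) photo -> descriptor
	if err != nil {
		return Result{}, fmt.Errorf("vision: %w", err)
	}
	// (2) the string concatenation is a stand-in for embeddingText(descriptor)
	vec, err := d.Embed(ctx, desc.Brand+" "+desc.Category)
	if err != nil {
		return Result{}, fmt.Errorf("embed: %w", err)
	}
	category := hint
	if category == "" {
		category = desc.Category
	}
	res := Result{Descriptor: desc, Matches: []Match{}}
	found, err := d.Search(ctx, vec, category) // (3) search with the pre-filter
	if err != nil {
		return Result{}, fmt.Errorf("search: %w", err)
	}
	switch {
	case category != "" && len(found) > 0:
		res.FilterApplied = &category
	case category != "": // category gap: retry without the pre-filter
		if found, err = d.Search(ctx, vec, ""); err != nil {
			return Result{}, fmt.Errorf("unfiltered search: %w", err)
		}
	}
	for _, m := range found { // (4) threshold cut keeps best-first order
		if m.Score >= d.Threshold {
			res.Matches = append(res.Matches, m)
		}
	}
	res.NoMatch = len(res.Matches) == 0
	return res, nil
}

// runDemo wires stub dependencies: filtered is returned for a filtered search,
// unfiltered for the fallback search.
func runDemo(filtered, unfiltered []Match, threshold float64) (Result, error) {
	d := Deps{
		Vision: func(context.Context, []byte, string) (Descriptor, error) {
			return Descriptor{Brand: "Northpeak", Category: "sneakers"}, nil
		},
		Embed: func(context.Context, string) ([]float32, error) { return []float32{0.6, 0.8}, nil },
		Search: func(_ context.Context, _ []float32, category string) ([]Match, error) {
			if category != "" {
				return filtered, nil
			}
			return unfiltered, nil
		},
		Threshold: threshold,
	}
	return Recognize(context.Background(), d, []byte("fake image bytes"), "image/jpeg", "")
}

func main() {
	res, err := runDemo([]Match{{Name: "Trailblazer Low", Score: 0.88}}, nil, 0.75)
	fmt.Println(res.Matches, res.NoMatch, err)
}
```

Keeping the dependencies behind small seams means the fallback and threshold logic can be exercised without Gemini or Atlas, while the integration tests cover the real index.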

4. Data model

Collection `products` (the catalog — heterogeneous documents)

Common fields every match relies on, plus per-category attributes:

field	type	notes
`_id`	ObjectId	generated
`name`	string	required
`brand`	string	required
`category`	string	required; must be one of the catalog’s known categories (drives the Vision enum + pre-filter)
`attributes`	object	per-category (sneakers: `colour/material/sizes`; tea: `flavour/caffeine/grams`; …)
`price`	int	cents
`inStock`	bool	used by the `substitutes` feature filter
`sku` / `barcode`	string	optional; unique index for the `barcode` feature fast-path
`imageRef`	string	optional; pointer to the product image (used by the change-stream worker)
`embedding`	float[D]	added at ingest; length === `numDimensions` of the index; L2-normalized when D < 3072
`embeddingHash`	string	sha256 of `embeddingText(doc)`; the idempotency guard for re-embedding

Atlas Vector Search index `products_vec` (on `products`)

{
  "fields": [
    { "type": "vector", "path": "embedding", "numDimensions": 768, "similarity": "cosine" },
    { "type": "filter", "path": "category" },
    { "type": "filter", "path": "brand" },
    { "type": "filter", "path": "inStock" }
  ]
}

numDimensions must equal the embedding length D you ingest with. A mismatch breaks the build/query — the single most common setup error.
Only fields declared type:"filter" can appear in a $vectorSearch filter. inStock is declared up front so the substitutes feature works without an index change.
Embedding-normalization invariant (load-bearing): gemini-embedding-001 returns embeddings that are only pre-normalized at the full 3072 dimensions. At any smaller outputDimensionality (e.g. 768 or 1536) the vectors carry varying magnitude that distorts cosine similarity, so you must L2-normalize every vector — catalog and query — before storing/searching, or use D = 3072. (Confirmed: https://ai.google.dev/gemini-api/docs/embeddings — manual normalization is required for non-3072 dims; gemini-embedding-2 auto-normalizes truncated dims, so pairing the model id with the normalize rule keeps a model swap correct.)
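
The normalization rule is a few lines of arithmetic: divide every component by the vector's Euclidean length. A minimal sketch follows; the helper names (`L2Normalize`, `PrepareEmbedding`, `norm`) are hypothetical, and the toy two-dimensional vector only stands in for a real embedding.

```go
package main

import (
	"fmt"
	"math"
)

// fullDim is the dimension at which gemini-embedding-001 output is already unit length.
const fullDim = 3072

// norm returns the Euclidean (L2) length of v.
func norm(v []float32) float64 {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	return math.Sqrt(sum)
}

// L2Normalize returns a copy of v scaled to unit length. A zero vector is
// returned unchanged so the function never divides by zero.
func L2Normalize(v []float32) []float32 {
	out := make([]float32, len(v))
	n := norm(v)
	if n == 0 {
		copy(out, v)
		return out
	}
	for i, x := range v {
		out[i] = float32(float64(x) / n)
	}
	return out
}

// PrepareEmbedding applies the spec's rule: normalize whenever D < 3072.
// It must run on both catalog vectors (ingest) and query vectors (recognize).
func PrepareEmbedding(v []float32) []float32 {
	if len(v) < fullDim {
		return L2Normalize(v)
	}
	return v
}

func main() {
	raw := []float32{3, 4} // toy vector of length 5
	unit := PrepareEmbedding(raw)
	fmt.Println(unit, norm(unit))
}
```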

Collection `_worker_state` (prerequisite for the change-stream worker)

A single document { _id: "embeddings-worker", resumeToken: <BSON resume token | null> }, seeded by the seed step so the worker has a row to read/update from the start (the FK-row analogue). On each handled event the worker writes the latest resume token here; on startup it reads it back via resumeAfter/SetResumeAfter.

Collection `search_misses` (feature: no-match-analytics)

{
  "at": "ISODate",
  "descriptor": { "category": "...", "brand": "...", "colour": "...", "form": "...", "visibleText": "...", "attributes": ["..."] },
  "nearMisses": [ { "name": "...", "score": 0.62 } ],
  "threshold": 0.75
}

Separate collection so analytics writes never touch the catalog the recognise path reads. No raw image is stored (descriptor + scores only — privacy-aware default).

Migrations / seed order (prerequisites first)

Create products (lazily on first insert) and seed ~15 products across ≥3 categories — idempotent upsert by (brand, name).
Seed the _worker_state document (so the worker prerequisite exists before the feature).
Run ingest to populate embedding + embeddingHash on every product.
Create the products_vec index (Atlas UI or API) with numDimensions === D.
(features) create the unique index on sku/barcode; search_misses is lazily created on first miss.

There are no foreign keys (document store), but _worker_state is the explicit prerequisite row the worker path needs, and every product must have an embedding of length D before the products_vec index is usable — ingest is a hard prerequisite of the recognise step.

5. API & event contract (the one canonical shape)

Every step, client, and test shares exactly these shapes.

`POST /recognize`

Request: multipart/form-data
- image (file, required) — a product photo (JPEG/PNG).
- category (string, optional) — a category hint; normally omitted (the descriptor supplies it).

Response 200 — match:

{
  "descriptor": { "brand": "Northpeak", "category": "sneakers", "colour": "red",
                  "form": "low-top", "visibleText": "", "attributes": ["leather"] },
  "filterApplied": "sneakers",
  "matches": [
    { "id": "…", "name": "Trailblazer Low", "brand": "Northpeak", "category": "sneakers",
      "attributes": { "colour": "red", "material": "leather" }, "price": 8900, "inStock": true,
      "score": 0.88 }
  ],
  "noMatch": false
}

Response 200 — no confident match: { "descriptor": {…}, "filterApplied": null, "matches": [], "noMatch": true } (an honest “I don’t know”, not an error status). The no-match branch still carries descriptor and filterApplied — same envelope as a match, only matches is empty — so the client can keep showing what Vision saw and which pre-filter ran (or that it fell back). filterApplied is typically null on a no-match because a true out-of-catalog photo reaches the threshold step only after the unfiltered fallback (§6).
Status / error codes:
- 200 — match or no-match (both are success).
- 400 — image part missing or unreadable ({ "error": "image required" }).
- 415 — unsupported media type (not JPEG/PNG), optional.
- 502 — upstream Gemini call failed after retries ({ "error": "vision unavailable" }).
- 500 — unexpected server error.

Field contract (Match): id (string), name, brand, category (strings), attributes (object), price (int cents), inStock (bool), score (number in [0,1]). Matches are ordered best-first. score is the Atlas vectorSearchScore. filterApplied echoes which category pre-filter fired (or null if the search ran unfiltered) so the UI can show the spotlight at work.
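
One way to pin the field contract down in Go is with struct tags. The sketch below is illustrative: the type names and the `NewResponse` and `Encode` helpers are hypothetical, while the JSON field names are the ones in the contract above. It also shows the two encoding details that are easy to get wrong: `matches` must serialize as `[]` rather than `null`, and `filterApplied` must serialize as `null` when no filter ran.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Descriptor struct {
	Brand       string   `json:"brand"`
	Category    string   `json:"category"`
	Colour      string   `json:"colour"`
	Form        string   `json:"form"`
	VisibleText string   `json:"visibleText"`
	Attributes  []string `json:"attributes"`
}

type Match struct {
	ID         string         `json:"id"`
	Name       string         `json:"name"`
	Brand      string         `json:"brand"`
	Category   string         `json:"category"`
	Attributes map[string]any `json:"attributes"`
	Price      int            `json:"price"` // cents
	InStock    bool           `json:"inStock"`
	Score      float64        `json:"score"` // Atlas vectorSearchScore, in [0,1]
}

type RecognizeResponse struct {
	Descriptor    Descriptor `json:"descriptor"`
	FilterApplied *string    `json:"filterApplied"` // a nil pointer encodes as null
	Matches       []Match    `json:"matches"`
	NoMatch       bool       `json:"noMatch"`
}

// NewResponse makes sure "matches" encodes as [] rather than null and derives
// noMatch from the number of matches.
func NewResponse(d Descriptor, filter *string, matches []Match) RecognizeResponse {
	if matches == nil {
		matches = []Match{}
	}
	return RecognizeResponse{
		Descriptor:    d,
		FilterApplied: filter,
		Matches:       matches,
		NoMatch:       len(matches) == 0,
	}
}

// Encode returns the JSON body for a response.
func Encode(r RecognizeResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	body, err := Encode(NewResponse(Descriptor{Category: "tea", Attributes: []string{}}, nil, nil))
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Println(body)
}
```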

Score semantics (load-bearing). For cosine similarity Atlas maps the raw cosine [-1,1] into [0,1] as (1 + cosine) / 2. So an unrelated (orthogonal) photo’s nearest stranger still scores ~0.5, not 0 — 0.5 is the “no real similarity” floor, and real matches for a clean photo sit well above it. A naive low threshold like T=0.3 is therefore meaningless; calibrate T above the ~0.5 floor. (Confirmed: MongoDB normalizes cosine as (1+cosine)/2.)
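
The mapping is simple enough to check by hand. The sketch below, with hypothetical function names, applies (1 + cosine) / 2 and its inverse so a score can be read back as a raw cosine while calibrating.

```go
package main

import "fmt"

// AtlasCosineScore maps a raw cosine similarity in [-1,1] to the [0,1]
// vectorSearchScore reported for a cosine index: (1 + cosine) / 2.
func AtlasCosineScore(cosine float64) float64 { return (1 + cosine) / 2 }

// RawCosine inverts the mapping: cosine = 2*score - 1.
func RawCosine(score float64) float64 { return 2*score - 1 }

func main() {
	for _, c := range []float64{-1, 0, 0.5, 1} {
		fmt.Printf("cosine %+.2f -> score %.2f\n", c, AtlasCosineScore(c))
	}
	fmt.Printf("score 0.88 -> cosine %.2f\n", RawCosine(0.88))
	fmt.Printf("score 0.30 -> cosine %.2f\n", RawCosine(0.30))
}
```

Read this way, a threshold of T=0.3 corresponds to a raw cosine of -0.4, a vector pointing away from the query, which is why it filters nothing useful.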

Feature endpoints (off by default — see §8)

GET /products/{id}/substitutes → { matches:[{...product, score}] } (in-stock only, lower threshold).
POST /recognize/shelf → [ { box, matches:[{...product, score}], noMatch } ] (multi-shelf fan-out).
GET /analytics/top-misses?since=<ISO> → [ { category, brand, requests, avgNearMiss, lastRequested } ].

Wire/event message — change-stream event (feature: dynamic-embeddings)

The worker consumes MongoDB change-stream events on products opened with fullDocument:"updateLookup" and a $match pipeline that only lets content edits through:

operationType ∈ {insert, update, replace}
AND ( insert | replace  OR  updateDescription.updatedFields has one of:
      name | brand | category | attributes | imageRef )

Per event: text = embeddingText(fullDocument); hash = sha256(text); skip if hash == doc.embeddingHash (idempotent); else Embed(text) → SetEmbedding(id, vector, hash). The write-back touches only embedding/embeddingHash, which the $match excludes — so it never re-triggers the worker. After each handled event, persist resumeToken to _worker_state; on startup read it back as resumeAfter.
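
The two decisions the worker makes per event (does this event count as a content edit, and has the embedding text actually changed) can be expressed as pure functions. This is a sketch in plain Go rather than the `$match` pipeline itself; the function names are hypothetical, hex encoding of the sha256 digest is an assumption, and so is the handling of dotted paths such as `attributes.colour`, which is how nested updates appear in `updatedFields`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// contentFields are the fields whose edits should trigger a re-embed.
var contentFields = []string{"name", "brand", "category", "attributes", "imageRef"}

// IsContentEdit mirrors the filter rule: inserts and replaces always pass;
// updates pass only when an updated field is a content field or a nested path
// under one (for example "attributes.colour").
func IsContentEdit(operationType string, updatedFields []string) bool {
	switch operationType {
	case "insert", "replace":
		return true
	case "update":
		for _, f := range updatedFields {
			for _, c := range contentFields {
				if f == c || strings.HasPrefix(f, c+".") {
					return true
				}
			}
		}
	}
	return false
}

// EmbeddingHash is the hex-encoded sha256 of the embedding text.
func EmbeddingHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// NeedsReembed is the idempotency guard: re-embed only when the text built from
// the current document no longer matches the stored hash.
func NeedsReembed(text, storedHash string) bool {
	return EmbeddingHash(text) != storedHash
}

func main() {
	stored := EmbeddingHash("Northpeak | Trailblazer Low | sneakers")
	fmt.Println(IsContentEdit("update", []string{"price"}))                     // price is not a content field
	fmt.Println(IsContentEdit("update", []string{"attributes.colour"}))         // nested content edit
	fmt.Println(IsContentEdit("update", []string{"embedding", "embeddingHash"})) // the worker's own write-back
	fmt.Println(NeedsReembed("Northpeak | Trailblazer Low | sneakers", stored))
}
```

The hash check is a second line of defence: even if an event slips through the filter, an unchanged embedding text costs no Gemini call.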

6. Build order (dependency-ordered; each step’s prerequisites already exist)

(1) Prerequisites & local tooling — Go toolchain (or Node), mongosh, curl/base64, a sample image.
(2) Atlas M0 — cluster + MONGODB_URI, MONGODB_DB (Vector Search is Atlas-only).
(3) Gemini key — confirm Vision + embeddings respond (portable base64; name sample.jpg).
(4) Model the catalog + seed — products (~15, ≥3 categories) + the _worker_state doc; idempotent.
(5) Shared embeddingText builder + normalize helper — defined once, reused by ingest, query, worker.
(6) Ingest — embed each product (fetch with find(), iterate by doc._id), L2-normalize when D < 3072, store embedding + embeddingHash.
(7) Create products_vec index — numDimensions === D; category/brand/inStock as filters; cosine.
(8) Design the recognise pipeline + Vision responseSchema — category as an enum of the known categories (closes the Vision-vs-catalog gap), descriptor → embeddable text.
(9) Scaffold the API + /recognize skeleton + a Gemini-from-code worked example (per backend) — full import block / package install; the entrypoint that composes client + routes + shutdown.
(10) ★ Recognise end to end (per backend) — Vision → embed (normalized) → $vectorSearch (category pre-filter, fall back to unfiltered when the filtered result is empty) → ranked matches + scores.
(11) Threshold or no-match — apply T; clean noMatch.
(12) Calibrate T — score known/unknown photos; the ~0.5 cosine floor; no-match UX.
(13) Frontend (per platform) — capture/pick → /recognize → ranked matches + scores + descriptor/filter panel + live threshold slider + nearest-below-threshold no-match UX.
(14) Integration tests (per backend) — known photo matches above T; unknown → noMatch.
(15) Optional deploy — Cloud Run, free-tier eligible.
(16) Feature modules (off by default) — §8.

Each step depends only on earlier ones: the index (7) needs ingest (6); ingest needs the shared builder (5) and the seed (4); recognise (10) needs the index (7), the responseSchema (8), and the scaffold (9); the worker feature needs _worker_state (4) and the shared builder (5).

7. Backends — Go (default) + TypeScript, same contract

Parity points (both must hold):

Same response shape (§5) byte-for-byte in field names and types; matches best-first; score in [0,1]; filterApplied echoed; noMatch is a 200.
Same $vectorSearch stage: index:"products_vec", path:"embedding", queryVector length D, numCandidates (~20× limit, must be ≥ limit), limit, optional filter { category:{$eq:hint} }, optional exact:true (ENN baseline). $project adds score:{$meta:"vectorSearchScore"}.
Same embedding rule: same model + same outputDimensionality D + L2-normalize when D < 3072 for both catalog and query.
Same category-gap handling: Vision category constrained to the known enum; on an empty filtered result, retry the search unfiltered before declaring no-match.
Same fire-and-forget miss logging (feature) into a separate search_misses collection; no raw image.
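
The shared `$vectorSearch` stage can be sketched as plain Go maps, which keeps the example free of driver imports. The `SearchQuery` and `BuildPipeline` names are hypothetical and the projected field list is an assumption; the stage keys, the index name, and the 20× starting point for `numCandidates` are the ones listed above. In this sketch `numCandidates` is left out when `exact` is true, on the understanding that it applies to approximate search only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SearchQuery follows the shape of the Store.Search query contract.
type SearchQuery struct {
	QueryVector   []float32
	CategoryHint  string // empty means no pre-filter
	NumCandidates int    // 0 means "use 20 x limit"
	Limit         int
	Exact         bool // true selects the ENN baseline
}

// BuildPipeline returns the $vectorSearch and $project stages as plain maps.
func BuildPipeline(q SearchQuery) []map[string]any {
	stage := map[string]any{
		"index":       "products_vec",
		"path":        "embedding",
		"queryVector": q.QueryVector,
		"limit":       q.Limit,
	}
	if q.Exact {
		stage["exact"] = true
	} else {
		n := q.NumCandidates
		if n == 0 {
			n = 20 * q.Limit
		}
		if n < q.Limit { // numCandidates must be >= limit
			n = q.Limit
		}
		stage["numCandidates"] = n
	}
	if q.CategoryHint != "" {
		stage["filter"] = map[string]any{"category": map[string]any{"$eq": q.CategoryHint}}
	}
	project := map[string]any{
		"name": 1, "brand": 1, "category": 1, "attributes": 1, "price": 1, "inStock": 1,
		"score": map[string]any{"$meta": "vectorSearchScore"},
	}
	return []map[string]any{{"$vectorSearch": stage}, {"$project": project}}
}

func main() {
	p := BuildPipeline(SearchQuery{QueryVector: []float32{0.6, 0.8}, CategoryHint: "sneakers", Limit: 5})
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	fmt.Println(string(out))
}
```

With the Go driver the same structure would typically be written with its `bson` document types; the printed JSON is also handy for pasting into mongosh when comparing the Go and TypeScript backends.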

Go specifics (verified):

Module go.mongodb.org/mongo-driver/v2; import all three sub-packages used: go.mongodb.org/mongo-driver/v2/mongo, .../v2/mongo/options, .../v2/bson. One go get go.mongodb.org/mongo-driver/v2/mongo pulls the whole module; the import lines must list the sub-packages. bson.ObjectID is the v2 type name (was primitive.ObjectID in v1). mongo.Connect(options.Client().ApplyURI(uri)) is the v2 signature (no context arg).
Gemini Go SDK google.golang.org/genai: genai.NewClient(ctx, &genai.ClientConfig{APIKey: key, Backend: genai.BackendGeminiAPI}); Vision via client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{ResponseMIMEType:"application/json", ResponseSchema: …}) with &genai.Blob{Data: imageBytes, MIMEType:"image/jpeg"} as an inline Part; embeddings via client.Models.EmbedContent(ctx, model, contents, &genai.EmbedContentConfig{OutputDimensionality: &d}).

TypeScript specifics:

Official mongodb driver; new MongoClient(uri) + await client.connect(); collection.aggregate(pipeline).toArray(); collection.watch(pipeline, { fullDocument:"updateLookup", resumeAfter }) (async-iterable).
Gemini via the REST API (x-goog-api-key, v1beta, generateContent/embedContent) or @google/genai.

Neither backend hard-codes a model id: read GEMINI_VISION_MODEL / GEMINI_EMBED_MODEL from config and link the official model list (https://ai.google.dev/gemini-api/docs/models). The only current free-tier embedding model needing normalization is gemini-embedding-001, so the normalize rule is the safe default.

8. Optional feature modules (off by default; each extends, never rewrites, the spec)

hybrid — fuse $vectorSearch with Atlas Search $search over brand/name/visibleText via RRF (or the Atlas $rankFusion stage where available); one confidence cut on the fused score; no-match path preserved.
substitutes — GET /products/{id}/substitutes: the product’s own embedding as the query vector, $vectorSearch with filter:{inStock:true} excluding the product, a lower threshold (recall over precision). Uses the inStock filter field already in the index.
multi-shelf — POST /recognize/shelf: Vision returns an array of items (array responseSchema) or detect-then-recognise per crop; reuse the per-item pipeline; return [{box, matches[], noMatch}].
barcode — exact sku/barcode lookup (unique index) before the vector fallback; cheap/deterministic path first, vector path only on a miss.
dynamic-embeddings — the change-stream re-embed worker (§5 event shape). Prerequisite: the _worker_state doc (§4) and the shared embeddingText builder. Go (mongo.ChangeStream + SetFullDocument(options.UpdateLookup) + SetResumeAfter) and TS (watch(...)) parity.
performance — make numCandidates configurable; sweep it against an exact:true ENN baseline (ENN is the ground truth for catalogs under ~10k docs) and pick the smallest value within recall tolerance; numCandidates ≥ limit, ~20× limit as the documented starting point.
no-match-analytics — fire-and-forget LogMiss into search_misses on the no-match branch (Go/TS parity), plus a common GET /analytics/top-misses aggregation ranking unmet demand.
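
For the performance module, the sweep reduces to two small calculations: recall of an approximate result against the exact (ENN) result for the same query, and the smallest `numCandidates` that stays within tolerance. A minimal sketch follows; the names `RecallAtK`, `Trial`, and `SmallestWithin` and the sample numbers are hypothetical.

```go
package main

import "fmt"

// RecallAtK is the fraction of the exact (ENN) top-k ids that the approximate
// (ANN) result also returned.
func RecallAtK(ann, enn []string) float64 {
	if len(enn) == 0 {
		return 1
	}
	seen := make(map[string]bool, len(ann))
	for _, id := range ann {
		seen[id] = true
	}
	hits := 0
	for _, id := range enn {
		if seen[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(enn))
}

// Trial is one measurement from the sweep.
type Trial struct {
	NumCandidates int
	Recall        float64
}

// SmallestWithin returns the smallest numCandidates whose recall is at least
// minRecall; ok is false when no trial qualifies.
func SmallestWithin(trials []Trial, minRecall float64) (best int, ok bool) {
	for _, t := range trials {
		if t.Recall >= minRecall && (!ok || t.NumCandidates < best) {
			best, ok = t.NumCandidates, true
		}
	}
	return best, ok
}

func main() {
	enn := []string{"a", "b", "c"}
	fmt.Printf("recall = %.2f\n", RecallAtK([]string{"a", "b", "d"}, enn))

	trials := []Trial{{NumCandidates: 20, Recall: 0.80}, {NumCandidates: 50, Recall: 1.0}, {NumCandidates: 100, Recall: 1.0}}
	if n, ok := SmallestWithin(trials, 0.95); ok {
		fmt.Println("numCandidates =", n)
	}
}
```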

9. Free-to-complete ($0)

Need	Free option	First-appears note
Vector DB	MongoDB Atlas M0 (no card; a real replica set; Vector Search + Atlas Search + change streams all run on it; local `mongod` cannot build the vector index)	“Costs nothing” on the Atlas step
AI (Vision + embeddings)	Google AI Studio free-tier key (one key, both jobs)	“Costs nothing” on the Gemini step
Backend runtime	Local Go toolchain or Node	—
Mobile	Android emulator / iOS Simulator + bundled sample photos (so every scan is free + reproducible)	—
Deploy (optional)	Cloud Run free monthly allotment (scales to zero); key in Secret Manager	“Costs nothing” on the deploy step

Confirm current free-tier limits on the official docs; nothing in the default path requires a paid service.