Catalens

Technology	Fit	Role	Why
MongoDB spotlight	5/5	The match itself — flexible catalog plus Atlas Vector Search.	Vector similarity and a metadata pre-filter run in one aggregation over heterogeneous documents.
Go	5/5	Default API running the recognise pipeline.	Small, fast service with the official Mongo driver and one-binary deploys.
Gemini API	4/5	The eyes and the vectors — Vision descriptor plus embeddings.	Essential to read the photo and embed it, but the matching engine is MongoDB, not the model.
TypeScript	4/5	Node alternative API with the same pipeline.	One typed language down to the mobile client, with a clean async Mongo driver.
Jetpack Compose	4/5	Android capture-and-results UI with CameraX.	Declarative UI ships the scan, ranked-results, and threshold-slider flow quickly.
Google Cloud	3/5	Optional free-tier host for the API (Cloud Run).	Serverless containers fit the stateless recognise endpoint; never required to see it work.
PostgreSQL	2/5	Alternative primary store (not chosen).	pgvector exists, but a per-category attribute pre-filter over a flexible catalog fights a relational schema — the inverse of Aurora.

Pick your backend (Go or TypeScript) and frontend (Compose, Flutter, or SwiftUI) above — the steps below adapt. Watch the spotlight: by the ★ recognise step, a photo becomes a query vector and a single $vectorSearch aggregation finds every catalog product that clears a confidence threshold — vector similarity and a category pre-filter in one query over documents that each carry their own attributes. The backend language is a shell around that store; the match is MongoDB.

How this differs from Aurora Commerce. Same retail world, opposite database lesson. Aurora owns transactional truth: checkout is one ACID transaction in which Postgres makes overselling impossible. Catalens owns the flexible catalog and the vector match: a read-mostly, schema-volatile collection where the win is similarity plus a document pre-filter in one aggregation. That is why Postgres scores 5 there and 2 here, and MongoDB the reverse.

How this differs from Helix Assistant. Helix is text RAG: it retrieves chunks over pgvector and grounds a prose answer. Catalens is multimodal: it reads a photo into a structured descriptor and matches it to product records over Atlas Vector Search. That is entity matching with scores and a threshold, not a chatbot.

Install the local toolchain you'll need

Beginner

Install the few tools every step assumes — your backend runtime, the Mongo shell, and a sample image — so the on-ramp never breaks midway. Three quick setup steps stand between you and the first by-hand $vectorSearch match — knock them out once.

New in this step

mongosh

The MongoDB Shell — a command-line client for running commands and queries against a cluster, used here for the connection smoke-test.

base64

A way to turn binary bytes (like an image) into plain text so they can ride inside a JSON request body to Gemini.

What this project needs on your machine — and it costs nothing

Nothing here is exotic, but two early steps assume tools that aren’t always present, so install them once now:

Your backend runtime — the Go toolchain (1.22+, for the default path) or Node (20+, for the TypeScript path). Only the one you pick.
mongosh, the MongoDB Shell, for the connection smoke-test. Install it from the official download, or skip the install and use the built-in shell in the Atlas UI (Cluster → “Connect” → “MongoDB Shell”).
curl and base64 (both ship with macOS and Linux) for the Gemini smoke-test.
A sample product JPEG you can match later — grab any product photo and save it as sample.jpg in your working directory. Pick products you can actually photograph (or screenshot), because the match compares a text descriptor of your photo against a text descriptor of the catalog (more on that at ingest).

Costs nothing. Every tool here is free and open-source; no account or card is needed for this step.

Confirm the tools respond

Run these in your terminal / editor

go version        # default backend — OR: node --version  (TypeScript backend)
mongosh --version # or use the Atlas UI's built-in shell instead
curl --version
base64 --help 2>&1 | head -1   # present on macOS and Linux
ls sample.jpg     # a product photo you saved (any product you can also photograph)

Stand up a free MongoDB Atlas M0 cluster

Beginner

Create a free Atlas M0 cluster and copy its connection string — Atlas Vector Search runs on the free tier, and a local mongod cannot build the index you need.

New in this step

MongoDB Atlas

MongoDB’s fully managed cloud database, and where this project builds the Vector Search index it needs (a plain local mongod cannot build it on its own).

M0 free tier

Atlas’s no-card free cluster; small but a real cluster, and Vector Search runs on it, so the whole project costs nothing.

Atlas Vector Search

The Lucene-backed index that finds documents by vector similarity — the spotlight feature the entire build leans on.

replica set

A small group of synced database copies; Atlas runs one even on M0, which is what enables features like change streams later.

SRV connection string

The mongodb+srv://... address your code connects with; it carries the user, password, and cluster host in one line.

Why Atlas, not local mongod — and it costs nothing

The whole project hinges on Atlas Vector Search, a Lucene-backed vector index that runs as a separate search process alongside the core database engine rather than inside it. A plain local mongod (or the stock Docker image) can store documents and even hold an embedding array, but on its own it cannot build the Atlas Vector Search index that $vectorSearch queries, so for this build the database lives in Atlas from day one. The M0 free tier is a real replica set, and Atlas Vector Search and Atlas Search both run on it, so the entire project is free to complete.

Costs nothing. M0 is free (no card), and the vector + text search indexes it supports are part of that tier — confirm the current limits on the Atlas docs.

Set your connection string

Run these in your terminal / editor

# From the Atlas UI: create a free M0 cluster, add a database user, allow your IP,
# then copy the SRV connection string into your environment:
export MONGODB_URI="mongodb+srv://USER:PASS@cluster0.xxxx.mongodb.net/?retryWrites=true&w=majority"
export MONGODB_DB="catalens"

# Ping it with mongosh (installed earlier) — or use the Atlas UI's built-in shell:
mongosh "$MONGODB_URI" --eval 'db.runCommand({ ping: 1 })'   # -> { ok: 1 }

What success looks like

The ping returns a document containing ok: 1 (Atlas may add fields such as $clusterTime and operationTime), which confirms that the SRV string is valid and your IP is on the allow-list:

{ ok: 1 }

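An error here tells you what to fix: bad auth means the user or password is wrong, getaddrinfo ENOTFOUND (or a querySrv error) means the cluster host in the string is wrong, and a server-selection timeout usually means your IP is not on the allow-list. Fix it now, because every later step needs this connection.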

Get a free Gemini key and confirm Vision + embeddings respond

Beginner

Get a free Google AI Studio key and make two quick calls: one that reads an image into JSON, and one that returns an embedding vector.

New in this step

Gemini Vision

Google’s multimodal model reading an image and answering about it — here, turning a product photo into a structured description.

embedding vector

A list of numbers a model produces from text so that similar meanings land near each other — what makes similarity search possible.

outputDimensionality

How many numbers the embedding has (e.g. 768); you must fix one value and use it everywhere so all vectors are comparable.

L2-normalize

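Scaling a vector to length 1. Cosine similarity already ignores length, so this mainly keeps catalog and query vectors on the same footing when the model does not return unit-length vectors; it becomes mandatory if you ever score with dotProduct.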

cosine similarity

A measure of how aligned two vectors are, ignoring their length — the similarity metric this catalog match uses.

Two Gemini capabilities, one key — and it costs nothing

Catalens uses Gemini for two different jobs, both on the free tier of a single Google AI Studio key: Vision (turn a photo into a structured descriptor) and embeddings (turn text into a vector). Confirm both respond before building on them. Keep the key server-side — the mobile app always calls your backend, never Gemini directly. Don’t hard-code a model id in your app: ids change, so read the current ones from the official model list and pin them in config.

Costs nothing. A Google AI Studio key has a free tier that covers Vision and embeddings for this project. Docs: image understanding — https://ai.google.dev/gemini-api/docs/image-understanding · embeddings — https://ai.google.dev/gemini-api/docs/embeddings · models — https://ai.google.dev/gemini-api/docs/models · Go SDK — https://pkg.go.dev/google.golang.org/genai

Smoke-test both capabilities (REST, portable across macOS + Linux)

Run these in your terminal / editor

export GEMINI_API_KEY="...your AI Studio key..."   # server-side only
# B64 of your sample image, portably: `base64 -w0` is GNU-only and fails on macOS/BSD,
# so strip newlines instead — this works everywhere:
IMG_B64=$(base64 < sample.jpg | tr -d '\n')

# 1) Vision: read an image into JSON. MODEL = a current vision model (see the models docs).
curl -s "https://generativelanguage.googleapis.com/v1beta/models/MODEL:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" -H 'Content-Type: application/json' -d '{
    "contents":[{"parts":[
      {"inlineData":{"mimeType":"image/jpeg","data":"'"$IMG_B64"'"}},
      {"text":"Describe this product as JSON: brand, category, colour."}
    ]}],
    "generationConfig":{"responseMimeType":"application/json"}
  }'

# 2) Embeddings: turn text into a vector. EMBED_MODEL = a current embedding model.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/EMBED_MODEL:embedContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" -H 'Content-Type: application/json' -d '{
    "content":{"parts":[{"text":"red leather low-top sneaker"}]},
    "outputDimensionality":768
  }'

What success looks like

Call 1 returns JSON in the response’s text part ({"brand":"…","category":"…","colour":"…"}), not prose; that is responseMimeType:"application/json" doing its job. Call 2 returns an embedding.values array of exactly 768 floats (the outputDimensionality you pinned). If your chosen model is gemini-embedding-001, its 768-dim output is unnormalized (only its full 3072 dims auto-normalize); other models may differ, so check the embeddings docs. An HTTP 400 reporting an invalid API key, or a 403, points at the key, and an empty candidates array on call 1 means the model returned no answer for the image (check the request body and the image encoding). Both calls must be clean before you build on either capability.
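
On the backend, the same two request bodies can be built in code instead of by shell quoting. The following is a minimal TypeScript sketch, assuming a Node-compatible runtime; the function names visionRequestBody and embedRequestBody are hypothetical, and the field names simply mirror the curl calls above.

```typescript
import { Buffer } from "node:buffer";

// One part of a generateContent request: inline image bytes or a text prompt.
type VisionPart =
  | { inlineData: { mimeType: string; data: string } }
  | { text: string };

interface VisionRequestBody {
  contents: { parts: VisionPart[] }[];
  generationConfig: { responseMimeType: string };
}

interface EmbedRequestBody {
  content: { parts: { text: string }[] };
  outputDimensionality: number;
}

// Hypothetical helper: builds the Vision body from raw JPEG bytes.
function visionRequestBody(jpegBytes: Uint8Array, prompt: string): VisionRequestBody {
  // Buffer's base64 output has no line breaks, so no newline stripping is needed.
  const data = Buffer.from(jpegBytes).toString("base64");
  return {
    contents: [{ parts: [{ inlineData: { mimeType: "image/jpeg", data } }, { text: prompt }] }],
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Hypothetical helper: builds the embedContent body with a pinned dimension.
function embedRequestBody(text: string, outputDimensionality: number): EmbedRequestBody {
  return { content: { parts: [{ text }] }, outputDimensionality };
}
```

Serialising either result with JSON.stringify gives the payload to POST with the x-goog-api-key header, which stays on the server.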

Model the flexible product catalog and seed it

Beginner

Design each product as a MongoDB document whose attributes depend on its category, then seed a small sample catalog to match against — the products every later step will try to recognise.

New in this step

document

MongoDB’s record: a JSON-like object that can hold nested fields, so one product can carry whatever its category needs.

heterogeneous documents

Documents in one collection with different shapes (a sneaker vs a tea) — the flexibility that makes a relational schema fight back.

descriptor

A short structured summary of a product (brand, category, attributes) that both the catalog and the photo get reduced to before matching.

idempotent upsert

An insert-or-update keyed on (brand, name) so re-running the seed updates existing products instead of duplicating them.

Heterogeneous documents are the point

A real catalog isn’t uniform: a sneaker has a size run and a material, a tea has a flavour and a caffeine level, a chair has dimensions and a finish. In Postgres that is a wide sparse table, an EAV side-table, or a jsonb column you’ve stopped validating. In MongoDB each product is just a document with the fields its category needs, plus a few common fields every match relies on — category, brand, name, attributes, and (added next) an embedding vector. The shared fields are what you pre-filter on; the per-category attributes are what make the catalog flexible. This heterogeneity is precisely why the document store, not a relational schema, is load-bearing here.

Seed products you can actually photograph. The match (next steps) compares a text descriptor of your photo against a text descriptor of each catalog product — not pixels against pixels — so for the demo to return a real match, the seeded products must be things you can take (or screenshot/generate) a recognisable photo of. Pick a handful of clearly distinct, photographable items.

Why a text embedding of a descriptor matches a photo at all

A fair question: why not embed the image directly? Two honest roads to visual search exist — (1) a multimodal image embedding (compare photo-pixels to catalog-image-pixels), or (2) turn the photo into words with Vision, then compare those words to the catalog’s words. Catalens takes road (2) on purpose: it keeps the whole project on one free embedding model for catalog and query, gives you a human-readable descriptor you can debug, and makes every match explainable. The cost is the invariant this creates — you must build the same descriptor text and embed it the same way for both the catalog (ingest) and the photo (query), or the vectors aren’t comparable. That shared builder is the next step.

Two products, two shapes (same collection)

// products collection — each doc carries only the attributes its category needs
{
  name: "Trailblazer Low",
  brand: "Northpeak",
  category: "sneakers",
  attributes: { colour: "red", material: "leather", style: "low-top", sizes: [7, 8, 9, 10] },
  price: 8900,            // cents
  inStock: true
  // embedding: [ ... ] added at ingest
}
{
  name: "Sencha Green",
  brand: "LeafAndCo",
  category: "tea",
  attributes: { flavour: "grassy", caffeine: "medium", grams: 100, origin: "Uji" },
  price: 1200,
  inStock: true
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: MongoDB Atlas M0 reachable via MONGODB_URI; database MONGODB_DB="catalens"; collection "products".
Task: Write a seed script that inserts ~15 sample products across at least 3 categories with per-category attributes.
Requirements:
- Every doc has common fields: name, brand, category, attributes (object), price (int cents), inStock (bool).
- Categories differ in their attributes (e.g. sneakers: colour/material/sizes; tea: flavour/caffeine/grams).
- Pick visually-distinct, photographable products (a learner must be able to take/screenshot a matching photo).
- Keep `category` values to a small known set (e.g. sneakers, tea, chair) — a later step constrains Vision to this exact set.
- Leave room for an `embedding` field added by the ingest step; do not invent it here.
- Idempotent: a re-run upserts products by (brand, name); it does not duplicate.
Tests / acceptance:
- After running, the products collection has ~15 docs spanning 3+ categories; querying by category returns the right shapes.
Output: a unified diff plus the seed command to run it.

What success looks like

db.products.countDocuments() returns ~15 and db.products.distinct("category") lists your ≥3 categories (e.g. [ "sneakers", "tea", "chair" ]). Each doc carries name, brand, category, attributes, price, and inStock, but no embedding yet (ingest adds it). Re-running the seed leaves the count unchanged, because the (brand, name) upsert updates existing products instead of duplicating them.
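
The idempotency requirement comes down to one upsert per product, keyed on (brand, name). The following is a minimal TypeScript sketch that only builds the operations as plain objects; the helper name toUpsertOps is hypothetical, and the result is assumed to be passed to the driver's bulkWrite on the products collection.

```typescript
// Shape of a seeded product: common fields plus free-form per-category attributes.
interface SeedProduct {
  name: string;
  brand: string;
  category: string;
  attributes: Record<string, unknown>;
  price: number; // integer cents
  inStock: boolean;
}

// Hypothetical helper: one updateOne-with-upsert operation per product.
// A second run matches the same (brand, name) filters, so it updates rather than inserts.
function toUpsertOps(products: SeedProduct[]) {
  return products.map((p) => ({
    updateOne: {
      filter: { brand: p.brand, name: p.name },
      update: { $set: p },
      upsert: true,
    },
  }));
}

// Usage with the two sample documents shown earlier in this step.
const seedOps = toUpsertOps([
  {
    name: "Trailblazer Low", brand: "Northpeak", category: "sneakers",
    attributes: { colour: "red", material: "leather", style: "low-top", sizes: [7, 8, 9, 10] },
    price: 8900, inStock: true,
  },
  {
    name: "Sencha Green", brand: "LeafAndCo", category: "tea",
    attributes: { flavour: "grassy", caffeine: "medium", grams: 100, origin: "Uji" },
    price: 1200, inStock: true,
  },
]);
```

Because the update uses $set and never touches embedding, a later re-seed would leave vectors written by ingest in place.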

Write the shared embedding-text builder and the normalize helper

Intermediate

Define two tiny functions once — embeddingText(...) that turns a product (or a photo descriptor) into one canonical string, and l2normalize(...) — because ingest, query, and the live worker must all embed the exact same way, or the catalog and query vectors stop being comparable.

One builder + one normalizer, reused everywhere — the comparability contract

The whole match rests on an invariant: the catalog side and the query side must be embedded identically. The cleanest way to guarantee that is to write the descriptor-to-text logic once and call it from every place that embeds — the ingest pass, the /recognize query, and (later) the change-stream worker. Put it in a small module (embeddingText in the spec’s internal/embedtext / src/embedText.ts). It takes the fields that describe a product — brand, name, category, and the attribute values — and joins them into one short string, the same way for a catalog document and for a Vision descriptor.

The second function is l2normalize. Some embedding models only return unit-length vectors at their full dimensions, so anything you embed at a smaller D (like 768 or 1536) may come back with a magnitude other than 1. Cosine similarity compares direction only, so an index with similarity set to cosine ranks the same either way, but dividing each vector by its own magnitude keeps the catalog and query sides on the same footing and is required if you ever switch the metric to dotProduct. Check your model’s documentation; if its output at D is not already normalized, wrap that logic in l2normalize(vector) and call it on every vector you produce, catalog and query, right after the embed call. With both helpers shared, “embed the same way” stops being a comment you hope everyone honours and becomes a function everyone calls.

The two shared helpers (language-agnostic)

embeddingText(p):                      # p is a product doc OR a Vision descriptor
  # attributes differ in shape: catalog = object {colour:"red", ...}; descriptor = array ["red", ...].
  # Normalize BOTH to the VALUES list first, so the two sides build the same string:
  values = isArray(p.attributes) ? p.attributes : valuesOf(p.attributes)
  attrs  = comma-join values         # sneakers: "red, leather, low-top"; tea: "grassy, medium"
  title  = space-join non-empty [p.brand, p.name]          # a descriptor may omit brand/name
  return comma-join non-empty [title, p.category, attrs]   # no stray separators when a part is empty

l2normalize(v):                        # call on EVERY embedding if D < full dims and model needs it
  m = sqrt(sum(x*x for x in v))
  return v if m == 0 else [x / m for x in v]
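
In TypeScript, the two helpers above might look like the following minimal sketch. Two choices here are assumptions of the sketch rather than part of the spec: empty parts are dropped so a descriptor without brand or name yields no stray separators, and only string attribute values are kept (so size runs or gram counts are ignored). The Describable type name is hypothetical.

```typescript
// Input accepted by embeddingText: a catalog document or a Vision descriptor.
interface Describable {
  brand?: string;
  name?: string;
  category: string;
  attributes?: Record<string, unknown> | string[];
}

const nonEmpty = (s: unknown): s is string => typeof s === "string" && s.trim() !== "";

// Builds one canonical string for both shapes of `attributes`.
function embeddingText(p: Describable): string {
  // Catalog docs carry an object (use its values); descriptors carry an array (use as-is).
  const raw: unknown[] = Array.isArray(p.attributes)
    ? p.attributes
    : Object.values(p.attributes ?? {});
  const attrs = raw.filter(nonEmpty).join(", ");
  const title = [p.brand, p.name].filter(nonEmpty).join(" ");
  return [title, p.category, attrs].filter(nonEmpty).join(", ");
}

// Divides a vector by its L2 magnitude; a zero vector is returned unchanged.
function l2normalize(v: number[]): number[] {
  const m = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return m === 0 ? v : v.map((x) => x / m);
}
```

Both functions are pure, so ingest, the /recognize query, and the change-stream worker can import them without pulling in any I/O.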

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: Backend engineer. The reader has no repo here — return complete code in the selected backend (Go or TypeScript).
Context: Products and Vision descriptors share brand, name (optional for descriptors), category, and attribute values. The embedding model is read from config; the dimension D is a single config constant.
Task: Implement a shared module exporting embeddingText(input) and l2normalize(vector), to be imported by ingest, the /recognize query, and the change-stream worker.
Requirements:
- embeddingText builds one deterministic string from brand, name, category, and the attribute VALUES (stable order), tolerant of empty brand/name (a descriptor may omit them). Reconcile the two attribute shapes — a catalog doc's `attributes` is an object (use its values), a Vision descriptor's is a string array (use as-is) — so both sides produce the same string; otherwise comparability silently breaks.
- l2normalize divides the vector by its L2 magnitude; returns the input unchanged if the magnitude is zero.
- No I/O, no model call — pure functions, unit-testable.
Tests / acceptance:
- embeddingText of a {brand,name,category,attributes} doc and of an equivalent descriptor produce the same string when the fields match.
- l2normalize output has magnitude ~1.0 (within 1e-6) for any non-zero input.
Output: the complete module, no commentary.

What success looks like

The two unit tests pass, which is the comparability contract made checkable. First, embeddingText of the seeded {brand:"Northpeak", name:"Trailblazer Low", category:"sneakers", attributes:{colour:"red",material:"leather",style:"low-top"}} and of the equivalent Vision descriptor {brand:"Northpeak", name:"Trailblazer Low", category:"sneakers", attributes:["red","leather","low-top"]} build the same string ("Northpeak Trailblazer Low, sneakers, red, leather, low-top"); if the two strings ever diverge, the catalog and query sides stop being comparable and every later match is silently wrong. Second, l2normalize returns a vector whose magnitude is 1.0 ± 1e-6.

Embed each product and store the vector on its document

Intermediate

For every product, build one short descriptive text, embed it with Gemini, and write the resulting vector back onto the document’s embedding field — this is what fills the searchable side of the match.

New in this step

ingest pass

The one-time (and re-runnable) job that embeds every product up front, so search compares against vectors that already exist.

embeddingHash

A fingerprint of the text a product was embedded from; if it is unchanged, the ingest can safely skip re-embedding that product.

sha256

A standard hash that turns any text into a fixed 64-character fingerprint — used here as the embeddingHash value.

Ingest builds the searchable side of the match

Vector search compares vectors, so each product needs one. Use the shared embeddingText(...) builder from the previous step, send its output to the Gemini embedding model, then l2normalize the result and store it on embedding. Three rules make or break the match later: (1) the embedding’s length must equal the numDimensions you’ll set on the index, so choose an outputDimensionality D and keep it fixed for the whole catalog; (2) you must embed the query the same way you embed the catalog (same model, same dimensions, same normalization), or the vectors aren’t comparable; (3) if your chosen model does not auto-normalize at smaller dimensions, L2-normalize every vector with the helper you just wrote, so both sides are unit length and scores stay consistent whichever similarity metric the index uses. Also store an embeddingHash (sha256 of the embedding text) so the ingest and the later live worker can skip a product whose descriptive text hasn’t changed. Keep the API key server-side.

Iterate products → text → embed → normalize → store (language-agnostic)

for doc in db.products.find():                  # iterate the real docs so _id is in scope
  text   = embeddingText(doc)                    # SAME shared builder as query + worker
  hash   = sha256(text)
  if doc.embeddingHash == hash: continue         # unchanged → skip (idempotent re-run)
  vector = gemini.embed(text, outputDimensionality = D)   # D fixed for the whole catalog
  if model_needs_norm: vector = l2normalize(vector)       # REQUIRED for valid cosine if not auto-normalized
  db.products.updateOne({ _id: doc._id }, { $set: { embedding: vector, embeddingHash: hash } })
# Use the SAME model + D + normalize step when you embed the query photo's descriptor later.
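
The skip-if-unchanged loop above can be exercised without a database or an API key. The following is a minimal TypeScript sketch under stated assumptions: an in-memory array stands in for the products collection, the embedder is injected (in the real script it would call Gemini), and the names ingest, Embeddable, and Embedder are hypothetical. A real implementation would replace the in-place assignment with an updateOne and $set.

```typescript
import { createHash } from "node:crypto";

// Minimal view of a stored product for the purposes of ingest.
interface Embeddable {
  _id: string;
  embedding?: number[];
  embeddingHash?: string;
}

// Injected embedding call; assumed to return a vector of the configured length.
type Embedder = (text: string) => Promise<number[]>;

const sha256Hex = (s: string): string =>
  createHash("sha256").update(s, "utf8").digest("hex");

// Embeds every doc whose text changed since the last run and reports the counts.
async function ingest<T extends Embeddable>(
  docs: T[],
  textOf: (doc: T) => string, // the shared embeddingText builder in the real script
  embed: Embedder,
  dims: number,
): Promise<{ embedded: number; skipped: number }> {
  let embedded = 0;
  let skipped = 0;
  for (const doc of docs) {
    const hash = sha256Hex(textOf(doc));
    if (doc.embeddingHash === hash) { skipped++; continue; } // unchanged text: no model call
    const raw = await embed(textOf(doc));
    if (raw.length !== dims) throw new Error(`expected ${dims} dims, got ${raw.length}`);
    const m = Math.sqrt(raw.reduce((sum, x) => sum + x * x, 0));
    doc.embedding = m === 0 ? raw : raw.map((x) => x / m); // always normalize in this sketch
    doc.embeddingHash = hash;
    embedded++;
  }
  return { embedded, skipped };
}
```

Checking the returned length against dims at ingest time surfaces a dimension mismatch before it reaches the index.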

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: Gemini + MongoDB ingest engineer. The reader has no repo here — return complete code.
Context: products collection in Atlas (MONGODB_URI, MONGODB_DB); GEMINI_API_KEY server-side; the selected backend (Go or TypeScript); the shared embeddingText(...) + l2normalize(...) helpers already exist.
Task: Implement an ingest pass that embeds every product and stores the vector + content hash on its document.
Requirements:
- Iterate db.products.find() so each doc's _id is in scope; build the embedding text with the shared embeddingText(doc).
- Call the current Gemini embedding model (read GEMINI_EMBED_MODEL from config) with a fixed outputDimensionality D.
- L2-normalize the returned vector if your chosen model does not auto-normalize at D, using the shared helper; store the float array on `embedding`.
- Compute hash = sha256(embeddingText); store it on `embeddingHash`; skip docs whose stored embeddingHash already equals the new hash (idempotent).
- D is a single config constant reused at query time; document that ingest and query MUST share model + D + normalization.
- Batch or rate-limit politely; keep the key server-side; link the official embeddings docs instead of hard-coding a model id.
Tests / acceptance (describe):
- After a run, every product has an `embedding` array of length D and a non-empty `embeddingHash`.
- Each stored embedding has L2 magnitude ~1.0 (normalized if the model required it).
- Re-running without data changes performs no re-embedding (hashes match).
Output: the complete ingest script, no commentary.

What success looks like

After one run, every product carries an embedding array of length D (768) and a 64-char hex embeddingHash: db.products.findOne() shows both, and db.products.countDocuments({ embedding: { $exists: false } }) returns 0. Each vector’s magnitude is ~1.0 (you ran l2normalize), so the catalog side is unit length just as the query side will be. Re-running prints “skipped” for every product (hashes match) and makes no Gemini calls; the embeddingHash guard is what makes ingest idempotent.

Create the Atlas Vector Search index

Intermediate

Define a vectorSearch index on the products collection: the embedding field as a vector, plus the metadata fields you’ll pre-filter on declared as filter — without this index there is nothing for $vectorSearch to query.

New in this step

vector search index

A special Atlas index over your embedding field that makes nearest-neighbour lookups fast; $vectorSearch only works once it exists.

numDimensions

The vector length the index expects; it must equal the embedding length you ingested with, or the query fails — the top setup mistake.

filter field

A metadata field (here category/brand/inStock) declared in the index so it can be used to pre-filter inside the vector search.

$vectorSearch

The aggregation stage that runs the nearest-neighbour search; it carries the query vector, the candidate pool, the limit, and any filter.

numCandidates

How many vectors the approximate search examines before returning the best limit; bigger means more accurate but slower (~20× limit to start).

vectorSearchScore

The per-result similarity score, pulled into your output with $meta: "vectorSearchScore"; it is what you later threshold on.

The index is what makes vector search possible

$vectorSearch needs a special index, created in the Atlas UI or via the API on your collection. The definition is a fields array: one entry of type: "vector" for the embedding path — with numDimensions equal to your embedding length and similarity set to cosine (the usual choice for text/descriptor embeddings; euclidean and dotProduct are the alternatives) — and one type: "filter" entry for each metadata field you want to pre-filter on (category, brand, and inStock — the last one for the Visual Substitutes feature later). Only fields declared as filter can be used in the stage’s filter. Name the index products_vec — you reference that name in $vectorSearch. numDimensions must match the model output you chose at ingest; a mismatch is the single most common setup error. Docs: https://www.mongodb.com/docs/atlas/atlas-vector-search/

Vector Search index definition (JSON)

{
  "fields": [
    { "type": "vector", "path": "embedding", "numDimensions": 768, "similarity": "cosine" },
    { "type": "filter", "path": "category" },
    { "type": "filter", "path": "brand" },
    { "type": "filter", "path": "inStock" }
  ]
}

See it by hand in mongosh (early payoff)

Run these in your terminal / editor

# Query the index directly to prove it works before building the API.
# (Wait a minute or two for the Atlas index build to finish first).
# Use a real vector array from your own ingest output.
mongosh "$MONGODB_URI" --eval '
  db.getSiblingDB("catalens").products.aggregate([
    {
      $vectorSearch: {
        index: "products_vec",
        path: "embedding",
        queryVector: [0.1, 0.2, -0.1], // STUB — replace with a real D-dim (768) vector from your ingest output; a length mismatch errors
        numCandidates: 60,
        limit: 3,
        filter: { category: { $eq: "sneakers" } }
      }
    },
    { $project: { _id: 0, name: 1, brand: 1, score: { $meta: "vectorSearchScore" } } }
  ])
'

The spotlight, proven by hand before any backend exists

In the Atlas UI the products_vec index shows Status: Active (it takes a minute to build). With a real 768-float vector from your ingest output as queryVector, the aggregation returns up to 3 in-category docs, each with a score — best-first — straight from $meta:"vectorSearchScore":

[ { name: 'Trailblazer Low', brand: 'Northpeak', score: 0.91 },
  { name: 'Court Classic',   brand: 'Northpeak', score: 0.79 } ]

A queryVector length ≠ 768 errors with Path 'embedding' needs ... dimensions; an empty result means the index is still building or the category filter excluded everything. This scored list — no app yet — is the spotlight.
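
The same aggregation can be assembled in backend code as plain objects before it is handed to the driver's aggregate call. The following is a minimal TypeScript sketch; the helper name buildVectorSearchPipeline and the SearchOptions type are hypothetical, and the 20× numCandidates multiplier is just the starting point suggested above, not a tuned value.

```typescript
interface SearchOptions {
  limit: number;
  category?: string;  // omit to search without a pre-filter
  indexName?: string; // defaults to the index name used in this step
}

// Hypothetical helper: builds the $vectorSearch + $project pipeline shown above.
function buildVectorSearchPipeline(
  queryVector: number[],
  opts: SearchOptions,
): Record<string, unknown>[] {
  const stage: Record<string, unknown> = {
    index: opts.indexName ?? "products_vec",
    path: "embedding",
    queryVector,
    numCandidates: opts.limit * 20, // starting ratio from the text; tune for your data
    limit: opts.limit,
  };
  if (opts.category !== undefined) {
    // Only fields declared as type "filter" in the index can appear here.
    stage["filter"] = { category: { $eq: opts.category } };
  }
  return [
    { $vectorSearch: stage },
    { $project: { _id: 0, name: 1, brand: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}
```

Leaving the filter key out entirely when no category is given lets one builder serve both the pre-filtered search and an unfiltered one.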

Design the recognise pipeline and the Vision descriptor schema

Intermediate

Sketch the end-to-end flow and lock down the responseSchema that turns a photo into a typed product descriptor — the contract the rest of the pipeline depends on.

New in this step

structured output

Telling Gemini to answer in JSON (responseMimeType: "application/json") instead of prose, so the result is parseable, not a caption.

responseSchema

A schema you pass with the request that pins the exact fields and types Gemini must return — here the typed product descriptor.

enum field

A schema field restricted to a fixed list of allowed values; constraining category to your seeded categories keeps Vision from inventing synonyms.

category pre-filter

Narrowing the vector search to one category before comparing vectors, inside the same query — faster and more precise, but it must match the catalog’s exact category names.

A typed descriptor keeps the AI honest

The recognise flow is a short pipeline: photo → Gemini Vision (structured descriptor) → embedding → $vectorSearch (with a pre-filter) → threshold → ranked matches or no-match. The fragile link is the first hop — a free-text image caption is unusable downstream. So constrain Vision with responseMimeType: "application/json" and a responseSchema: ask for brand, category, attributes, visibleText, colour, and form as typed fields. A typed descriptor gives you two things at once: a clean string to embed, and structured fields (category, brand) to drive the $vectorSearch pre-filter. Treat the output as best guess, not identification — the model can be wrong, which is exactly why the threshold and the no-confident-match path exist.

Constrain category to your catalog’s known values. The pre-filter does an exact $eq on category, so if Vision answers “shoe” or “footwear” while your catalog says “sneakers”, a correct pre-filter silently excludes the right product — a false no-match that looks like “not in catalog”. Close that gap two ways, both used: (1) make category an enum in the responseSchema listing exactly the categories you seeded, so the model must pick one of them; and (2) at query time, fall back to an unfiltered search if the category-filtered search returns nothing, so a mislabel degrades to “search everything” rather than a silent miss. The pre-filter stays a win — it just can’t be allowed to be a silent loss.
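
The fallback and the threshold described above fit in one small function. The following is a minimal TypeScript sketch, assuming the actual $vectorSearch call is injected as a function so the logic can be tested without Atlas; the names matchWithFallback, SearchFn, and Match are hypothetical.

```typescript
interface Match {
  name: string;
  brand: string;
  score: number;
}

// Injected search: runs the vector search, pre-filtered when a category is given.
type SearchFn = (category?: string) => Promise<Match[]>;

interface MatchResult {
  matches: Match[];      // empty means "no confident match"
  usedFallback: boolean; // true when the category-filtered search came back empty
}

async function matchWithFallback(
  search: SearchFn,
  category: string | undefined,
  threshold: number,
): Promise<MatchResult> {
  let results = await search(category);
  let usedFallback = false;
  if (category !== undefined && results.length === 0) {
    // A mislabelled category degrades to "search everything" instead of a silent miss.
    results = await search(undefined);
    usedFallback = true;
  }
  const matches = results
    .filter((m) => m.score >= threshold)
    .sort((a, b) => b.score - a.score);
  return { matches, usedFallback };
}
```

Returning usedFallback alongside the matches makes it possible to log how often Vision's category disagreed with the catalog.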

The Vision responseSchema (category constrained to the catalog enum)

Run these in your terminal / editor

{
  "type": "object",
  "properties": {
    "brand":       { "type": "string" },
    "category":    { "type": "string", "enum": ["sneakers", "tea", "chair"] },
    "colour":      { "type": "string" },
    "form":        { "type": "string" },
    "visibleText": { "type": "string" },
    "attributes":  { "type": "array", "items": { "type": "string" } }
  },
  "required": ["category"]
}
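
One way to keep the enum from drifting away from the catalog is to generate the schema from the category list instead of typing it by hand. The sketch below is a minimal illustration in plain TypeScript, not part of the build: visionResponseSchema and the JsonSchema type are hypothetical names, and it emits the plain JSON Schema shape shown above. In a real service the category list might come from something like the driver's distinct("category") on the products collection; that wiring is an assumption and is left out so the block runs on its own.

```typescript
// Hypothetical helper (not from the tutorial): build the Vision responseSchema
// from the seeded category list so the enum cannot drift from the catalog.
type JsonSchema = {
  type: "object" | "string" | "array";
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
  enum?: string[];
  required?: string[];
};

function visionResponseSchema(categories: readonly string[]): JsonSchema {
  // De-duplicate while keeping the first-seen order.
  const unique = categories.filter((c, i) => categories.indexOf(c) === i);
  if (unique.length === 0) {
    throw new Error("at least one catalog category is required");
  }
  return {
    type: "object",
    properties: {
      brand: { type: "string" },
      category: { type: "string", enum: unique },
      colour: { type: "string" },
      form: { type: "string" },
      visibleText: { type: "string" },
      attributes: { type: "array", items: { type: "string" } },
    },
    required: ["category"],
  };
}

// Example: a duplicate in the input collapses to a single enum entry.
const exampleSchema = visionResponseSchema(["sneakers", "tea", "chair", "tea"]);
console.log(JSON.stringify(exampleSchema.properties?.["category"]));
// prints {"type":"string","enum":["sneakers","tea","chair"]}
```

Refusing an empty list is a deliberate choice in this sketch: an enum with no values would leave the model nothing valid to pick, so failing at startup is easier to diagnose than a failed Vision call later.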

Scaffold the Go API and connect to Atlas

Go Beginner

Create a Go module, open a MongoDB client with the official v2 driver, and expose a POST /recognize skeleton that accepts an uploaded image — the entrypoint every later Go step builds on.

New in this step

go mod init

Creates the module (the versioned root every package imports from); github.com/you/catalens becomes your import path.

go modules go mod init track ↗ docs ↗

mongo-driver v2

The official Go MongoDB driver; one go get of its mongo package pulls the whole module, but you import mongo, options, and bson by their full paths.

mongodb go driver v2 track ↗ docs ↗

bson.ObjectID

The v2 type for a Mongo _id (renamed from v1’s primitive.ObjectID); you convert it to a string for the API response.

mongodb go driver v2 bson ObjectID docs ↗

http.ServeMux

Go’s standard request router; POST /recognize patterns route by method and path with no third-party framework.

go net/http ServeMux method routing track ↗ docs ↗

multipart/form-data

The HTTP body format for uploading a file (the photo) plus optional fields (a category hint) in one request.

http multipart form-data file upload

Why the official v2 driver — and the three sub-packages you import

go.mongodb.org/mongo-driver/v2 is the current official driver. One go get of the module’s mongo sub-package pulls the whole module, but the code uses three of its packages, so all three go in your import block: .../v2/mongo (the client + mongo.Connect, mongo.Pipeline), .../v2/mongo/options (options.Client().ApplyURI(...)), and .../v2/bson (bson.D, bson.ObjectID — note v2 renamed primitive.ObjectID to bson.ObjectID). Open one mongo.Client at startup and reuse it; pass a context on every call. This step is also the app entrypoint that the spec calls cmd/api/main.go: it loads config, opens the client, registers routes, and shuts the client down. /recognize takes a multipart/form-data image (plus an optional category hint) and, for now, just decodes it and returns 200 — you fill in the pipeline next.

Set up the module (one go get pulls the whole module)

Run these in your terminal / editor

go mod init github.com/you/catalens
go get go.mongodb.org/mongo-driver/v2/mongo   # brings mongo, mongo/options, and bson
go get google.golang.org/genai                # Gemini Go SDK (Vision + embeddings), used next step

Client + /recognize skeleton (full import block)

Run these in your terminal / editor

// cmd/api/main.go (essentials) — the app entrypoint that composes everything
package main

import (
	"context"
	"io"
	"log"
	"net/http"
	"os"

	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
	// bson is imported in the recognise step where the aggregation is built:
	// "go.mongodb.org/mongo-driver/v2/bson"
)

func main() {
	client, err := mongo.Connect(options.Client().ApplyURI(os.Getenv("MONGODB_URI")))
	if err != nil { log.Fatal(err) }
	defer func() { _ = client.Disconnect(context.Background()) }()
	products := client.Database(os.Getenv("MONGODB_DB")).Collection("products")

	mux := http.NewServeMux()
	mux.HandleFunc("POST /recognize", func(w http.ResponseWriter, r *http.Request) {
		file, _, err := r.FormFile("image")
		if err != nil { http.Error(w, "image required", http.StatusBadRequest); return }
		defer file.Close()
		img, err := io.ReadAll(file)
		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
		categoryHint := r.FormValue("category") // optional pre-filter hint
		_ = img; _ = categoryHint; _ = products
		// next step: Vision descriptor -> embed -> $vectorSearch -> threshold
		w.WriteHeader(http.StatusOK)
	})

	port := os.Getenv("PORT")
	if port == "" { port = "8080" }
	log.Fatal(http.ListenAndServe(":"+port, mux))
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior Go engineer in this repo.
Context: Atlas via MONGODB_URI + MONGODB_DB; module go.mongodb.org/mongo-driver/v2. The three sub-packages used across the build are mongo, mongo/options, and bson (one `go get .../v2/mongo` pulls the whole module). Gemini SDK is google.golang.org/genai.
Task: Scaffold cmd/api/main.go as the app entrypoint: a Mongo client and a POST /recognize that accepts a multipart image and an optional category field.
Requirements:
- Import the sub-packages you actually use (mongo, mongo/options; bson lands in the recognise step) with their full paths so the file compiles as-is.
- Connect once at startup with options.Client().ApplyURI(MONGODB_URI); Disconnect on shutdown; every DB call takes r.Context().
- POST /recognize reads the "image" file part (reject with 400 if missing) and an optional "category" form value; returns 200 for now.
- Read the listen port from PORT (default 8080); compose routes in main; no business logic yet.
Tests / acceptance:
- `go build ./...` passes (no undefined options/bson symbols); posting a multipart image to /recognize returns 200; posting none returns 400.
Output: a unified diff plus a one-line note on client reuse.

What success looks like

go build ./... compiles with the imported sub-packages resolved (no undefined: mongo or undefined: options; bson joins the import block in the recognise step). With the server running, curl -F image=@sample.jpg localhost:8080/recognize returns 200 (empty body for now), and curl -X POST localhost:8080/recognize with no file returns 400 with image required. The skeleton accepts the multipart upload and rejects a missing part before any pipeline exists.

Scaffold the TypeScript API and connect to Atlas

TypeScript Beginner

Create a Node service with the official mongodb driver and a POST /recognize skeleton that accepts an uploaded image — the entrypoint every later TypeScript step builds on.

New in this step

mongodb driver

The official Node package for talking to MongoDB; you create one client at startup and reuse it everywhere.

mongodb node driver track ↗ docs ↗

MongoClient

The driver’s connection object; new MongoClient(uri) then await connect() opens the pooled connection once.

mongodb node MongoClient connect docs ↗

Hono

A small, typed web framework used here for routing and parsing the uploaded form.

hono web framework track ↗ docs ↗

parseBody

Hono’s helper that reads a multipart/form-data request into fields and files — how you pull the image part out.

hono parseBody multipart form

multipart/form-data

The HTTP body format for uploading a file (the photo) plus optional fields (a category hint) in one request.

http multipart form-data file upload

One typed language to the client

The mongodb package is the official Node driver; create one MongoClient at startup and reuse it. Use a small web framework (Hono here) for routing and multipart parsing. /recognize accepts a multipart/form-data image plus an optional category hint and returns 200 for now — the pipeline lands in the next step. Keeping the backend in TypeScript means the same types describe the API the mobile client calls.

Set up the project

Run these in your terminal / editor

npm init -y
npm add mongodb hono @hono/node-server @google/genai

Client + /recognize skeleton

Run these in your terminal / editor

// src/server.ts
import { Hono } from "hono";
import { serve } from "@hono/node-server";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();
const products = client.db(process.env.MONGODB_DB).collection("products");

const app = new Hono();

app.post("/recognize", async (c) => {
  const body = await c.req.parseBody();
  const image = body["image"];
  if (!(image instanceof File)) return c.json({ error: "image required" }, 400);
  const bytes = Buffer.from(await image.arrayBuffer());
  const categoryHint = typeof body["category"] === "string" ? body["category"] : undefined;
  void bytes; void categoryHint; void products;
  // next step: Vision descriptor -> embed -> $vectorSearch -> threshold
  return c.body(null, 200);
});

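// Listen on PORT when the host provides one (Cloud Run does), else 8080.
const port = Number(process.env.PORT ?? 8080);
serve({ fetch: app.fetch, port });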

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior TypeScript engineer in this repo.
Context: Atlas via MONGODB_URI + MONGODB_DB; official mongodb driver; Hono for routing.
Task: Scaffold src/server.ts with a MongoClient and POST /recognize accepting a multipart image and optional category.
Requirements:
- Create and connect the MongoClient once at startup; reuse the products collection handle.
- POST /recognize parses the "image" file part (400 if missing) and an optional "category" field; returns 200 for now.
- Port from PORT (default 8080); no business logic yet.
Tests / acceptance:
- The server starts; posting a multipart image to /recognize returns 200, posting none returns 400.
Output: a unified diff plus the run command.

What success looks like

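Starting the server (for example with npm start, once you add a start script that runs src/server.ts) connects the MongoClient once at startup, with no per-request reconnect. curl -F image=@sample.jpg localhost:8080/recognize returns 200 (empty body), and a POST with no file returns 400 { "error": "image required" }. The status codes match the Go skeleton, which is the parity the two backends hold to. Note that the Go skeleton as written sends image required as plain text rather than JSON, so align the two error bodies if you want the responses to be identical.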
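
The curl -F call can also be expressed from a TypeScript client or test. This is a minimal sketch assuming a runtime with the standard FormData and Blob globals (Node 18 or later, or a browser); buildRecognizeForm is a hypothetical helper name and the filename photo.jpg is arbitrary. The field names image and category are the ones the skeleton reads.

```typescript
// Hypothetical client-side helper: builds the multipart body that
// POST /recognize expects ("image" file part, optional "category" field).
function buildRecognizeForm(image: Blob, category?: string): FormData {
  const form = new FormData();
  form.append("image", image, "photo.jpg"); // filename is arbitrary
  if (category) {
    form.append("category", category); // optional pre-filter hint
  }
  return form;
}

// Usage against a running server (not executed here):
//   const res = await fetch("http://localhost:8080/recognize", {
//     method: "POST",
//     body: buildRecognizeForm(photoBlob),
//   });
//   // res.status is 200 for the skeleton, 400 when the image part is missing

const exampleForm = buildRecognizeForm(
  new Blob(["abc"], { type: "image/jpeg" }),
  "sneakers",
);
console.log(exampleForm.has("image"), exampleForm.get("category"));
```

Passing the form as the fetch body lets the runtime set the multipart boundary header itself, which is why the sketch never sets Content-Type by hand.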

Extract the Vision descriptor from a photo with Gemini

Intermediate

Call the Gemini Vision model from your backend, passing the photo and the responseSchema to get back the structured JSON descriptor.

New in this step

generateContent

The Gemini call that sends content (your image + prompt) and returns the model’s answer — here the typed descriptor.

gemini api generateContent track ↗ docs ↗

inline image data

Sending the photo’s bytes inside the request (a Blob/inlineData part) instead of a URL, so no upload step is needed for a one-shot scan.

gemini api inline image data blob

Constraining the model with responseSchema in code

You designed the JSON schema earlier; now you pass it to the SDK. By enforcing responseMimeType: "application/json" and translating the schema into the SDK’s type system, you guarantee Gemini returns a typed descriptor rather than prose. Don’t hard-code a model id — read the current Vision model from config (GEMINI_VISION_MODEL) and pick it from the official model list, since ids change.

Vision call (Go)

Run these in your terminal / editor

// Using the google.golang.org/genai SDK
// import "google.golang.org/genai"

client, err := genai.NewClient(ctx, &genai.ClientConfig{
	APIKey:  os.Getenv("GEMINI_API_KEY"),
	Backend: genai.BackendGeminiAPI,
})
if err != nil { /* handle */ }

model := os.Getenv("GEMINI_VISION_MODEL") // a current vision model id, from config

schema := &genai.Schema{
	Type: genai.TypeObject,
	Properties: map[string]*genai.Schema{
		"brand":       {Type: genai.TypeString},
		"category":    {Type: genai.TypeString, Enum: []string{"sneakers", "tea", "chair"}},
		"colour":      {Type: genai.TypeString},
		"form":        {Type: genai.TypeString},
		"visibleText": {Type: genai.TypeString},
		"attributes":  {Type: genai.TypeArray, Items: &genai.Schema{Type: genai.TypeString}},
	},
	Required: []string{"category"},
}

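// imgBytes is the byte slice of the uploaded image. "image/jpeg" is assumed below;
// for PNG or WebP uploads pass the real type, e.g. http.DetectContentType(imgBytes).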
contents := []*genai.Content{
	genai.NewContentFromParts([]*genai.Part{
		{InlineData: &genai.Blob{Data: imgBytes, MIMEType: "image/jpeg"}},
		genai.NewPartFromText("Describe this product as JSON: brand, category, colour."),
	}, genai.RoleUser),
}

resp, err := client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{
	ResponseMIMEType: "application/json",
	ResponseSchema:   schema,
})
if err != nil { /* handle */ }
// resp.Candidates[0].Content.Parts[0].Text contains the JSON string

Vision call (TypeScript)

Run these in your terminal / editor

// Using the @google/genai SDK
// import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// imgBuffer is the Buffer of the uploaded image. "image/jpeg" is assumed below;
// for PNG or WebP uploads pass the uploaded File's type instead.
const response = await ai.models.generateContent({
  model: process.env.GEMINI_VISION_MODEL!, // a current vision model id, from config
  contents: [
    { inlineData: { mimeType: "image/jpeg", data: imgBuffer.toString("base64") } },
    "Describe this product as JSON: brand, category, colour."
  ],
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        brand:       { type: Type.STRING },
        category:    { type: Type.STRING, enum: ["sneakers", "tea", "chair"] },
        colour:      { type: Type.STRING },
        form:        { type: Type.STRING },
        visibleText: { type: Type.STRING },
        attributes:  { type: Type.ARRAY, items: { type: Type.STRING } },
      },
      required: ["category"],
    }
  }
});
// response.text contains the JSON string (text is a getter, not a method)

What success looks like

The Vision call returns the typed descriptor as a JSON string (parse it into your struct), not a caption — responseSchema guarantees the shape and forces category to one of your enum values:

{ "brand": "Northpeak", "category": "sneakers", "colour": "red",
  "form": "low-top", "visibleText": "", "attributes": ["leather"] }

The responseSchema constrains category to your enum, so the model picks one of sneakers/tea/chair rather than a synonym like "shoe" or "footwear" — so the next step’s $eq pre-filter fires reliably. An unrecognisable photo still returns the shape, just with vaguer values; treat it as best-guess, not identification.
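
Because the descriptor arrives as a JSON string, a small parsing layer is a natural place to re-check the category before it reaches the pre-filter. The following is a minimal sketch, not the tutorial's implementation: Descriptor, KNOWN_CATEGORIES, and parseDescriptor are hypothetical names, and defaulting the optional fields to empty values is an assumption that follows from category being the only required field in the schema.

```typescript
// Hypothetical validation layer; field names come from the responseSchema.
interface Descriptor {
  brand: string;
  category: string;
  colour: string;
  form: string;
  visibleText: string;
  attributes: string[];
}

// Assumption: the same list you put in the schema's enum.
const KNOWN_CATEGORIES: readonly string[] = ["sneakers", "tea", "chair"];

function parseDescriptor(jsonText: string): Descriptor {
  const raw: unknown = JSON.parse(jsonText); // throws SyntaxError on non-JSON text
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("descriptor must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;

  const category = obj["category"];
  if (typeof category !== "string" || KNOWN_CATEGORIES.indexOf(category) === -1) {
    throw new Error("descriptor.category is not a known catalog category");
  }

  // Only category is required by the schema, so default everything else.
  const text = (key: string): string => {
    const value = obj[key];
    return typeof value === "string" ? value : "";
  };
  const rawAttributes = obj["attributes"];
  const attributes: string[] = Array.isArray(rawAttributes)
    ? rawAttributes.filter((a): a is string => typeof a === "string")
    : [];

  return {
    brand: text("brand"),
    category,
    colour: text("colour"),
    form: text("form"),
    visibleText: text("visibleText"),
    attributes,
  };
}

const parsed = parseDescriptor(
  '{ "brand": "Northpeak", "category": "sneakers", "colour": "red",' +
    ' "form": "low-top", "visibleText": "", "attributes": ["leather"] }',
);
console.log(parsed.category, parsed.attributes); // prints sneakers [ 'leather' ] under Node
```

Throwing on an unknown category is one option; a handler could instead treat it as "no category" and go straight to the unfiltered search.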

★ Recognise a photo end to end (Go)

Go Advanced

Fill in /recognize: send the image to Gemini Vision for a structured descriptor, embed the descriptor, then run one $vectorSearch aggregation — pre-filtered by category — that returns ranked matches with their scores.

New in this step

ANN search

Approximate nearest-neighbour: $vectorSearch finds very likely closest vectors fast instead of scanning every one — the speed/accuracy trade numCandidates tunes.

approximate nearest neighbor search ANN docs ↗

$project

The aggregation stage that selects and reshapes output fields; here it drops _id, adds the string id, and surfaces the score.

mongodb aggregation $project stage track ↗ docs ↗

$toString

Converts the BSON _id to a plain string id in the response, so clients (and the substitutes feature) can reference a product stably.

mongodb $toString aggregation operator docs ↗

(1+cosine)/2 score

Atlas maps raw cosine [-1,1] into [0,1] as (1 + cosine) / 2, so an unrelated item floors near 0.5, not 0 — which is why a naive low threshold is meaningless.

atlas vector search cosine score normalization docs ↗

This is the spotlight: one aggregation does the match

Everything converges here. Vision returns the typed descriptor; you build the same embedding text you used at ingest and embed it (same model, same dimensions); that query vector goes straight into a $vectorSearch stage. The stage carries five things: the index name, the path (embedding), the queryVector, a numCandidates (set it roughly 20× your limit for accuracy), and a limit. The filter — { category: { $eq: descriptor.category } } — is a pre-filter: Atlas narrows to the right category before the vector comparison, in the same pipeline, and it doesn’t distort the score. Source the category from the Vision descriptor, not the optional category form value (which is normally omitted), so the pre-filter actually fires on a real scan. A $project stage pulls each document plus its similarity via { $meta: "vectorSearchScore" } and renames _id to a string id (the substitutes feature looks products up by it). Note what isn’t here: no join, no second query, no app-side re-ranking — the heterogeneous catalog and the vector index do the work together. That co-location is why MongoDB scores 5 on this build.

The category gap: pre-filter, then fall back to unfiltered

The pre-filter is a win that must not become a silent loss. If Vision labels a photo "footwear" while your catalog says "sneakers", or picks a category that no indexed product currently carries, the category-filtered $vectorSearch returns zero documents, which looks identical to “not in catalog”. (When the category does have products, the filtered search still returns its nearest neighbours, however distant, and the threshold step judges them.) So when the filtered search comes back empty, retry the same search with no filter before declaring a no-match: a mislabel degrades to “search everything”, and a truly out-of-catalog photo reaches the threshold step with only distant neighbours for it to reject. Report which path ran via filterApplied, which is the category string when the filtered search produced the matches or null when you fell back to unfiltered, so the client can show the spotlight pre-filter at work (and show it stepping aside when it had to).

The $vectorSearch pipeline (v2 driver)

Run these in your terminal / editor

// recognize: descriptor -> queryVector already computed (same model+dims as ingest)
// search runs the aggregation once; filterCategory == "" means no pre-filter.
search := func(filterCategory string) ([]bson.M, error) {
	vs := bson.D{
		{Key: "index", Value: "products_vec"},
		{Key: "path", Value: "embedding"},
		{Key: "queryVector", Value: queryVector}, // []float32 length D
		{Key: "numCandidates", Value: 100},        // ~20x limit
		{Key: "limit", Value: 5},
	}
	if filterCategory != "" {
		vs = append(vs, bson.E{Key: "filter", Value: bson.D{{Key: "category", Value: bson.D{{Key: "$eq", Value: filterCategory}}}}})
	}
	pipeline := mongo.Pipeline{
		bson.D{{Key: "$vectorSearch", Value: vs}},
		bson.D{{Key: "$project", Value: bson.D{
			{Key: "_id", Value: 0},                                         // suppress ObjectId so only string id appears
			{Key: "id", Value: bson.D{{Key: "$toString", Value: "$_id"}}}, // string id for /substitutes etc.
			{Key: "name", Value: 1}, {Key: "brand", Value: 1}, {Key: "category", Value: 1},
			{Key: "attributes", Value: 1}, {Key: "price", Value: 1}, {Key: "inStock", Value: 1},
			{Key: "score", Value: bson.D{{Key: "$meta", Value: "vectorSearchScore"}}},
		}}},
	}
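	cur, err := products.Aggregate(r.Context(), pipeline)
	if err != nil { return nil, err }
	var out []bson.M
	// Decode before returning: in `return out, cur.All(...)` Go does not specify
	// whether out is read before or after the call that fills it.
	if err := cur.All(r.Context(), &out); err != nil { return nil, err }
	return out, nil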
}

// Pre-filter on the DESCRIPTOR'S category (not the optional form value, which is usually empty).
filterApplied := descriptor.Category
matches, err := search(filterApplied)
if err != nil { http.Error(w, err.Error(), 500); return }
if len(matches) == 0 && filterApplied != "" { // category gap: a mislabel degrades to "search everything"
	filterApplied = "" // record that we fell back to unfiltered
	if matches, err = search(""); err != nil { http.Error(w, err.Error(), 500); return }
}
// matches are ranked best-first, each with a "score" in [0,1] and a string "id"; threshold them next.
// filterApplied is the category string when the pre-filter produced these, or "" on fallback.
// A plain Go string encodes as "", not null: copy it into a *string (nil when empty) for the JSON envelope.

Agent prompt — paste into an agent with repo access

Before you run it: you photograph a seeded red sneaker. Which value drives the pre-filter — the descriptor's `category` or the (empty) `category` form field? And with Atlas's (1+cosine)/2 mapping, will the right product's `score` land nearer 0.5 or 0.9?

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior Go engineer integrating Gemini + Atlas Vector Search in this repo.
Context: net/http API; go.mongodb.org/mongo-driver/v2 (mongo, bson, options); products collection has an `embedding` field and a "products_vec" Vector Search index; GEMINI_API_KEY server-side.
Task: Implement POST /recognize: image -> Gemini Vision descriptor (responseSchema) -> embed (same model+dims as ingest) -> $vectorSearch (category pre-filter, with an unfiltered fallback) -> ranked matches with scores.
Requirements:
- Vision call uses inline image data + responseMimeType "application/json" + a responseSchema for {brand, category, colour, form, visibleText, attributes[]}.
- Embed the descriptor text with the SAME model and outputDimensionality used at ingest; do not hard-code a model id — read it from config and link the official docs.
- Build the pipeline: $vectorSearch { index, path:"embedding", queryVector, numCandidates ~20x limit, limit, filter category $eq <descriptor.category> } then $project the fields plus id:{$toString:"$_id"} and score:{$meta:"vectorSearchScore"}.
- Source the pre-filter category from the Vision DESCRIPTOR'S category, not the optional `category` form value (which is normally omitted).
- Category gap: if the category-filtered search returns ZERO docs, retry the SAME search WITHOUT the filter before declaring anything; set filterApplied to the category when the filtered search produced the matches, or null after a fallback.
- Use r.Context() throughout; return JSON { descriptor, filterApplied, matches:[{id, ...product, score}] } ordered best-first.
- Frame results as best-match-by-similarity, never identification.
Tests / acceptance:
- A photo of a seeded product returns that product among the top matches with a string `id` and a score in [0,1]; filterApplied echoes its category.
- A mislabelled or empty-filtered category retries unfiltered (filterApplied:null) and still returns matches, not an error — only the threshold step decides no-match.
Output: a unified diff plus a one-paragraph note on why the match needs no join or second query.

What success looks like

curl -F image=@testdata/sneaker.jpg localhost:8080/recognize | jq returns the full envelope: the Vision descriptor, filterApplied:"sneakers" (the descriptor’s category drove the pre-filter), and matches ordered best-first, each with a string id and a score:

{ "descriptor": { "category": "sneakers", "colour": "red", "...": "..." },
  "filterApplied": "sneakers",
  "matches": [ { "id": "6630…", "name": "Trailblazer Low", "score": 0.88 } ],
  "noMatch": false }

The right product sits well above 0.5. One aggregation did the match — no join, no second query. If the category-filtered search returned nothing, filterApplied comes back null (the unfiltered fallback ran) but you still get ranked matches — only the threshold step decides no-match.

★ Recognise a photo end to end (TypeScript)

TypeScript Advanced

Implement the same /recognize pipeline with the Node driver: Vision descriptor → embed → one $vectorSearch aggregation, pre-filtered by category, returning ranked matches and scores.

New in this step

ANN search

Approximate nearest-neighbour: $vectorSearch finds very likely closest vectors fast instead of scanning every one — the speed/accuracy trade numCandidates tunes.

approximate nearest neighbor search ANN docs ↗

$project

The aggregation stage that selects and reshapes output fields; here it drops _id, adds the string id, and surfaces the score.

mongodb aggregation $project stage track ↗ docs ↗

$toString

Converts the BSON _id to a plain string id in the response, so clients (and the substitutes feature) can reference a product stably.

mongodb $toString aggregation operator docs ↗

(1+cosine)/2 score

Atlas maps raw cosine [-1,1] into [0,1] as (1 + cosine) / 2, so an unrelated item floors near 0.5, not 0 — which is why a naive low threshold is meaningless.

atlas vector search cosine score normalization docs ↗

Same pipeline, Node driver

The shape is identical to the Go path — only the syntax differs. Build the descriptor with a Vision call, embed it with the same model and dimensions as ingest, and pass the query vector into a $vectorSearch stage with index, path, queryVector, numCandidates (~20× limit), limit, and a filter pre-filter on the descriptor’s category. A $project renames _id to a string id and adds score: { $meta: "vectorSearchScore" }, and aggregate(...).toArray() returns the ranked matches. As in Go, if the category-filtered search returns nothing, retry it unfiltered before the threshold step sees anything, and report which path ran via filterApplied. Keeping the two backends behaviourally identical is the parity lesson: swap the language, the document-plus-vector store still carries the match.

The $vectorSearch pipeline (mongodb driver)

Run these in your terminal / editor

// recognize: queryVector already computed (same model+dims as ingest)
// search runs the aggregation once; pass "" to skip the pre-filter.
async function search(filterCategory: string) {
  const pipeline: object[] = [
    {
      $vectorSearch: {
        index: "products_vec",
        path: "embedding",
        queryVector,                 // number[] length D
        numCandidates: 100,          // ~20x limit
        limit: 5,
        ...(filterCategory ? { filter: { category: { $eq: filterCategory } } } : {}),
      },
    },
    {
      $project: {
        _id: 0,                      // suppress ObjectId so only string id appears
        id: { $toString: "$_id" },   // string id for /substitutes etc.
        name: 1, brand: 1, category: 1, attributes: 1, price: 1, inStock: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
  return products.aggregate(pipeline).toArray();
}

// Pre-filter on the DESCRIPTOR'S category (not the optional form value, which is usually empty).
let filterApplied: string | null = descriptor.category ?? null;
let matches = await search(filterApplied ?? "");
if (matches.length === 0 && filterApplied) {   // category gap: degrade to "search everything"
  filterApplied = null;                          // record that we fell back to unfiltered
  matches = await search("");
}
// matches are ranked best-first, each with a string `id` and a numeric `score` in [0,1]; threshold them next.
// filterApplied is the category when the pre-filter produced these, or null after a fallback.

Agent prompt — paste into an agent with repo access

You post the SAME sneaker photo to the Go handler and this one. Which fields of the JSON envelope must be identical, and which one (`score`) may differ slightly and why?

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior TypeScript engineer integrating Gemini + Atlas Vector Search in this repo.
Context: Hono/Node API; official mongodb driver; products collection has `embedding` and a "products_vec" Vector Search index; GEMINI_API_KEY server-side.
Task: Implement POST /recognize matching the Go version one-for-one: image -> Vision descriptor (responseSchema) -> embed -> $vectorSearch (category pre-filter, with an unfiltered fallback) -> ranked matches with scores.
Requirements:
- Vision call uses inline image data + responseMimeType "application/json" + a responseSchema for {brand, category, colour, form, visibleText, attributes[]}.
- Embed with the SAME model + outputDimensionality as ingest; do not hard-code a model id — read it from config and link the official docs.
- Pipeline: $vectorSearch { index, path:"embedding", queryVector, numCandidates ~20x limit, limit, filter category $eq <descriptor.category> } then $project id:{$toString:"$_id"} + fields + score:{$meta:"vectorSearchScore"}; aggregate(...).toArray().
- Source the pre-filter category from the Vision DESCRIPTOR'S category, not the optional `category` form value (which is normally omitted).
- Category gap: if the category-filtered search returns ZERO docs, retry the SAME search WITHOUT the filter before declaring anything; set filterApplied to the category when the filtered search produced the matches, or null after a fallback.
- Return JSON { descriptor, filterApplied, matches:[{id, ...product, score}] } best-first; frame as similarity, not identification.
Tests / acceptance:
- A seeded product's photo appears among the top matches with a string `id` and a score in [0,1]; filterApplied echoes its category; responses match the Go handler.
- A mislabelled or empty-filtered category retries unfiltered (filterApplied:null) and still returns matches, not an error — only the threshold step decides no-match.
Output: a unified diff plus a note on keeping Go and TypeScript responses identical.

What success looks like

The same curl -F image=@testdata/sneaker.jpg localhost:8080/recognize returns the same envelope as the Go handler — same field names, same filterApplied:"sneakers", the same product first, noMatch:false. The score may differ by a hair (floating-point in the embed/normalize path), but every field name and the ordering match: swap the language, the document-plus-vector store still carries the match. That parity is exactly what the integration tests later assert.
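
A parity assertion of that kind could be sketched as below. This is an illustration under stated assumptions, not the tutorial's integration test: sameEnvelope, RankedMatch, and Envelope are hypothetical names, the 0.001 score tolerance is an arbitrary choice, and the descriptor is left out of the comparison on the assumption that you compare it separately.

```typescript
// Hypothetical parity check between the Go and TypeScript responses.
interface RankedMatch {
  id: string;
  score: number;
}

interface Envelope {
  filterApplied: string | null;
  matches: RankedMatch[];
  noMatch: boolean;
}

// True when both envelopes agree on everything except tiny score drift.
function sameEnvelope(a: Envelope, b: Envelope, scoreTolerance = 1e-3): boolean {
  if (a.filterApplied !== b.filterApplied || a.noMatch !== b.noMatch) return false;
  if (a.matches.length !== b.matches.length) return false;
  return a.matches.every((m, i) => {
    const other = b.matches[i];
    return (
      other !== undefined &&
      m.id === other.id && // same product in the same rank
      Math.abs(m.score - other.score) <= scoreTolerance
    );
  });
}

// Made-up ids and scores, purely for illustration.
const fromGo: Envelope = {
  filterApplied: "sneakers",
  matches: [{ id: "p1", score: 0.88 }, { id: "p2", score: 0.8 }],
  noMatch: false,
};
const fromNode: Envelope = {
  filterApplied: "sneakers",
  matches: [{ id: "p1", score: 0.8803 }, { id: "p2", score: 0.8001 }],
  noMatch: false,
};
console.log(sameEnvelope(fromGo, fromNode)); // prints true
```

Comparing ids position by position makes the check sensitive to ordering, which is the property the two handlers are meant to share.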

Apply the confidence threshold or return no match

Intermediate

Filter the ranked matches to those whose score clears a threshold T; if none do, return a clean “no confident match” instead of a wrong guess.

New in this step

confidence threshold T

The score cutoff that separates a real match from the nearest stranger; results at or above T are shown, below it is a no-match.

similarity score threshold cutoff

no-match as 200

Returning an honest empty result with HTTP 200 (not a 404 or error), so the client can still show what Vision saw — “I looked and found nothing confident” is success, not failure.

rest api return empty result not 404

The score is a confidence, so make a decision with it

$vectorSearch always returns up to limit nearest neighbours — even for a photo of something not in the catalog, the “nearest” is just less near. The vectorSearchScore (0 to 1 for cosine, higher is closer) is how you tell a real match from the nearest stranger. Apply a threshold T: keep matches with a score at or above T, ranked; if the list is empty, return a structured noMatch so the UI shows “no confident match” rather than a misleading product. Expose the raw score in the response so the client can show it and so you can tune T. Carry the descriptor and the filterApplied value from the recognise step into both branches of the response — the client shows what Vision saw and which pre-filter fired (or that it fell back) whether or not anything cleared T. This single decision — return ranked confident matches, or admit you don’t know — is what keeps the feature honest.
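
To build intuition for where scores land, the cosine mapping can be worked through on toy vectors. This is a minimal sketch assuming the (1 + cosine) / 2 normalisation this guide describes for cosine indexes; cosine and atlasCosineScore are hypothetical helper names, and Atlas computes the real score server-side, so the code is for reasoning about T only.

```typescript
// Raw cosine similarity in [-1, 1].
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must share a non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper mirroring the normalised score: (1 + cosine) / 2.
function atlasCosineScore(a: number[], b: number[]): number {
  return (1 + cosine(a, b)) / 2;
}

console.log(atlasCosineScore([1, 0], [1, 0]));  // prints 1   (same direction)
console.log(atlasCosineScore([1, 0], [0, 1]));  // prints 0.5 (unrelated, orthogonal)
console.log(atlasCosineScore([1, 0], [-1, 0])); // prints 0   (opposite direction)
```

The middle case is the one that matters for T: an unrelated item scores about 0.5, so any threshold at or below that value would accept everything.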

Threshold the ranked matches (language-agnostic)

Run these in your terminal / editor

results   = vectorSearch(...)                 # ranked, each with score in [0,1]; filterApplied set by recognise
confident = [ m for m in results if m.score >= T ]
if confident is empty:
    return { descriptor, filterApplied, matches: [], noMatch: true }     # HTTP 200, an honest "I don't know"
else:
    return { descriptor, filterApplied, matches: confident, noMatch: false }
# filterApplied is the category the pre-filter used, or null after an unfiltered fallback.
# expose each match's score so the client can display it and the slider can re-threshold.
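
The pseudocode above could look like this in TypeScript. It is a minimal sketch rather than the tutorial's handler: applyThreshold, Match, and RecognizeResponse are hypothetical names, the sample ids and scores are made up, and the envelope fields follow the ones used in this step.

```typescript
// Hypothetical types; field names follow the envelope in this step.
interface Match {
  id: string;
  score: number; // vectorSearchScore in [0,1]
  [field: string]: unknown; // other projected product fields
}

interface RecognizeResponse<D> {
  descriptor: D;
  filterApplied: string | null;
  matches: Match[];
  noMatch: boolean;
}

function applyThreshold<D>(
  descriptor: D,
  filterApplied: string | null,
  ranked: Match[],
  threshold: number,
): RecognizeResponse<D> {
  // filter keeps the incoming best-first order.
  const confident = ranked.filter((m) => m.score >= threshold);
  // Both branches carry descriptor and filterApplied; only matches/noMatch differ.
  return { descriptor, filterApplied, matches: confident, noMatch: confident.length === 0 };
}

const hit = applyThreshold(
  { category: "sneakers" },
  "sneakers",
  [{ id: "a", score: 0.88 }, { id: "b", score: 0.74 }],
  0.75,
);
const miss = applyThreshold({ category: "tea" }, null, [{ id: "c", score: 0.61 }], 0.75);
console.log(hit.matches.length, hit.noMatch);   // prints 1 false
console.log(miss.matches.length, miss.noMatch); // prints 0 true
```

Deriving noMatch from the filtered list, instead of setting it in two branches, keeps the flag and the matches array from disagreeing.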

Agent prompt — paste into an agent with repo access

You photograph an item with NO match in the catalog. With T=0.75 and Atlas's (1+cosine)/2 mapping, roughly what score do the nearest items get — and does the API return a 404, an error, or a 200? What is in the body?

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize returns ranked matches each with a `score` in [0,1], plus the Vision `descriptor` and a `filterApplied` value (the pre-filter category, or null after an unfiltered fallback).
Task: Add a confidence threshold T and a no-match path.
Requirements:
- Keep only matches with score >= T (T from config, default e.g. 0.75); preserve best-first order.
- If none clear T, return { descriptor, filterApplied, matches: [], noMatch: true } (HTTP 200, not an error); otherwise { descriptor, filterApplied, matches, noMatch: false }.
- Carry descriptor AND filterApplied into BOTH branches so the client can render them whether or not anything matched.
- Always include each match's numeric score in the JSON so the client can display it and re-threshold locally.
Tests / acceptance:
- A clear in-catalog photo returns at least one match above T with noMatch:false and filterApplied echoing the category.
- A photo of something absent returns matches:[] and noMatch:true, still carrying descriptor + filterApplied (no exception, no wrong product).
Output: a unified diff plus where T is configured.

What success looks like

With T=0.75, the in-catalog sneaker photo returns noMatch:false and at least one match whose score ≥ 0.75. A photo of an out-of-catalog item returns an HTTP 200 (not an error) with matches:[] and noMatch:true, still carrying the descriptor and filterApplied — same envelope, empty matches:

{ "descriptor": { "category": "tea", "...": "..." },
  "filterApplied": null, "matches": [], "noMatch": true }

The honest “I don’t know” is a 200 with the descriptor preserved, never a 404 or a low-confidence wrong product.

Calibrate the threshold: precision, recall, and the no-match UX

Intermediate

Pick T deliberately by looking at scores for known matches and known non-matches, and design what the learner sees when nothing clears it.

New in this step

precision

Of the matches you show, the fraction that are actually right; a high T raises precision but can hide real matches.

recall

Of the right matches that exist, the fraction you actually show; a low T raises recall but lets strangers slip in.

precision-recall trade-off

Raising T trades recall for precision and vice-versa; you calibrate T to sit in the gap between your known-match and known-stranger score clusters.

Threshold tuning is the real skill here

A threshold trades two errors against each other. Set T too high and real matches fall below it — high precision, low recall, lots of false “no match”. Set it too low and strangers sneak in — high recall, low precision, confident-looking wrong answers. Calibrate empirically: score a handful of photos you know are in the catalog and a handful you know aren’t, and put T in the gap between the two score clusters. Because cosine scores aren’t absolute across embedding models, re-calibrate if you change the model or dimensions. Then design the no-match experience as a first-class state, not an error: “No confident match — try a clearer photo or pick a category.” A slider that lets the learner move T live (the next screens add one) makes this trade-off something you can see, not just reason about.
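
A minimal Go sketch of that calibration, assuming you have already collected two small score samples by hand. The function names and the midpoint rule are illustrative assumptions: the midpoint is only one simple way to place T in the gap, and the text above does not prescribe it.

```go
package main

import (
	"errors"
	"sort"
)

// calibrate places T halfway between the highest known-stranger score and
// the lowest known-match score. It returns an error when the two clusters
// overlap, because then no threshold separates them cleanly.
func calibrate(known, strangers []float64) (float64, error) {
	if len(known) == 0 || len(strangers) == 0 {
		return 0, errors.New("need at least one score in each set")
	}
	k := append([]float64(nil), known...) // copy so the caller's slices stay unsorted
	s := append([]float64(nil), strangers...)
	sort.Float64s(k)
	sort.Float64s(s)
	lowestKnown, highestStranger := k[0], s[len(s)-1]
	if highestStranger >= lowestKnown {
		return 0, errors.New("clusters overlap: no clean gap for T")
	}
	return (lowestKnown + highestStranger) / 2, nil
}

// precisionRecall reports, for a candidate threshold t, the precision and
// recall over the two labelled samples: known scores that clear t are true
// positives, stranger scores that clear t are false positives.
func precisionRecall(known, strangers []float64, t float64) (precision, recall float64) {
	tp, fp := 0, 0
	for _, v := range known {
		if v >= t {
			tp++
		}
	}
	for _, v := range strangers {
		if v >= t {
			fp++
		}
	}
	if tp+fp > 0 {
		precision = float64(tp) / float64(tp+fp)
	}
	if len(known) > 0 {
		recall = float64(tp) / float64(len(known))
	}
	return precision, recall
}
```

Calling precisionRecall at a few candidate values of T shows the trade-off numerically: a cutoff above the gap drops recall, a cutoff below it drops precision.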

The ~0.5 floor, seen in the scores themselves

Score your known and unknown photos and the two clusters separate visibly. In-catalog photos land well above the floor (often ≈0.8+) and cluster together; a genuinely out-of-catalog photo’s nearest stranger lands near 0.50, because Atlas maps cosine to (1 + cosine) / 2, so an orthogonal (unrelated) vector scores (1 + 0) / 2 = 0.5 rather than 0. Treat 0.5 as a practical floor, not a hard one: a negative cosine would score below it, and how close unrelated items actually sit to 0.5 depends on the embedding model. The real lesson is that those two clusters are visibly separate, and T=0.75 sits cleanly in the gap between the floor and the in-catalog band, which is what makes the threshold meaningful. If you swap the embedding model or D, the clusters shift, so re-score before trusting the old T.
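
The mapping can be checked by hand with a short Go sketch. The helper names (cosine, atlasScore) are illustrative; the (1 + cosine) / 2 formula is the one stated above.

```go
package main

import "math"

// cosine returns the cosine similarity of a and b. It returns 0 when the
// lengths differ or either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// atlasScore maps a cosine similarity in [-1, 1] onto [0, 1] using the
// (1 + cosine) / 2 normalisation described above.
func atlasScore(cos float64) float64 {
	return (1 + cos) / 2
}
```

Identical directions map to 1, orthogonal directions to 0.5, and opposite directions to 0, which is why unrelated items cluster around the middle of the range instead of near zero.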

Capture a photo and show ranked matches (Jetpack Compose)

Jetpack Compose Intermediate

Build the scan screen: capture with CameraX or pick from the gallery, POST the image to /recognize, and render ranked matches with their scores and a live threshold slider.

New in this step

CameraX

Android’s camera library for capturing the product photo; pair it with gallery-pick so it also works on a camera-less emulator.

multipart upload

Sending the image bytes as multipart/form-data to /recognize from the app — the client side of the upload the backend already accepts.

LazyColumn

Compose’s scrolling list that renders only visible rows; used to show the ranked matches, keyed by each match’s id.

Slider

The threshold control; dragging it re-filters the already-returned matches client-side, making precision-vs-recall visible.

The payoff screen — and why a slider

This is the “something to see”. Capture a photo (CameraX) or pick one from the gallery, upload it as multipart/form-data to /recognize, and show the ranked results: each product with its details and a match score. Add a threshold slider that filters the already-returned list client-side, so dragging it shows precision versus recall move in real time — matches appear and disappear as the learner raises or lowers the cutoff. Above the list, render a small descriptor + filter panel: what Vision saw and the filterApplied value — the category the pre-filter used, or “searched all categories” when the backend fell back to an unfiltered search — so the spotlight is visible, not hidden in the response JSON. When the confident list is empty, show the no-confident-match state. Emulator cameras are limited, so ship a few bundled sample product photos alongside gallery-pick so every scan is free and reproducible.

Post the image, hold the threshold in state (Compose)

Run these in your terminal / editor

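// Imports assume Material 3 (ListItem with headlineContent / supportingContent).
import androidx.compose.foundation.layout.*
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.material3.ListItem
import androidx.compose.material3.Slider
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

// One ranked product from /recognize; score is the vectorSearchScore in [0, 1].
// Descriptor (brand, colour, category) is the Vision descriptor type from the recognise step.
data class Match(
    val id: String, val name: String, val brand: String,
    val category: String, val score: Double,
)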

@Composable
fun ScanResults(
    descriptor: Descriptor, filterApplied: String?,
    matches: List<Match>, threshold: Float, onThreshold: (Float) -> Unit,
) {
    Column(Modifier.fillMaxSize().padding(16.dp)) {
        // descriptor + filter panel: show the spotlight pre-filter at work
        Text("Vision saw: ${descriptor.brand} ${descriptor.colour} ${descriptor.category}")
        Text(if (filterApplied != null) "Pre-filtered to: $filterApplied"
             else "Searched all categories (no category filter)")
        Spacer(Modifier.height(8.dp))
        Text("Confidence ≥ ${"%.2f".format(threshold)}")
        Slider(value = threshold, onValueChange = onThreshold, valueRange = 0f..1f)
        val shown = matches.filter { it.score >= threshold }
        if (shown.isEmpty()) {
            Text("No confident match — try a clearer photo or pick a category")
        } else {
            LazyColumn {
                items(shown, key = { it.id }) { m ->
                    ListItem(
                        headlineContent = { Text("${m.brand} ${m.name}") },
                        supportingContent = { Text("${m.category} · score ${"%.2f".format(m.score)}") },
                    )
                }
            }
        }
    }
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Android engineer (Kotlin, Jetpack Compose, CameraX) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture (CameraX) or gallery-pick a photo, upload it, and render ranked matches with a live threshold slider.
Requirements:
- Upload the image as multipart/form-data; show a loading state; keep the API base URL configurable.
- Render each match with brand/name, category, and score, keyed by its `id`; a Slider (0..1) filters the shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null) so the spotlight pre-filter is visible.
- Show a clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample product photos so the flow works without a real camera (emulator-friendly).
Tests / acceptance:
- ViewModel test: raising the slider above all scores yields the no-match state; lowering it reveals matches in best-first order.
Output: a unified diff plus the ViewModel state for image, matches, filterApplied, and threshold.

Capture a photo and show ranked matches (Flutter)

Flutter Intermediate

Build the same scan-to-results screen in Flutter: capture or pick an image, POST it to /recognize, and render ranked matches with a threshold slider.

New in this step

image_picker

The Flutter plugin for grabbing a photo from the camera or gallery; bundle sample assets too so it runs on any simulator.

MultipartRequest

The http package’s way to send the image bytes as multipart/form-data to /recognize — the client side of the upload.

ListView

Flutter’s scrolling list, used to render the ranked matches returned by the backend.

Slider

The threshold control; its value re-filters the shown matches by score instantly, so precision-vs-recall is visible.

Same flow, Dart

Use image_picker for camera/gallery capture and http (a MultipartRequest) to upload the image to /recognize. Hold the returned matches and the slider value in state; filter the shown list by score so the threshold slider re-filters instantly. Render the no-confident-match state when the filtered list is empty. Bundle a couple of sample asset images so the flow runs on any simulator.

Upload + threshold (Flutter)

Run these in your terminal / editor

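import 'dart:convert';
import 'dart:io';
import 'package:http/http.dart' as http;

// baseUrl and Match.fromJson are defined elsewhere in the app.
Future<List<Match>> recognize(File image, {String? category}) async {
  final req = http.MultipartRequest('POST', Uri.parse('$baseUrl/recognize'))
    ..files.add(await http.MultipartFile.fromPath('image', image.path));
  if (category != null) req.fields['category'] = category;
  final res = await http.Response.fromStream(await req.send());
  // A no-match is still a 200, so any other status is a real failure.
  if (res.statusCode != 200) {
    throw HttpException('recognize failed: ${res.statusCode}');
  }
  final body = jsonDecode(res.body) as Map<String, dynamic>;
  // Returns matches only; read descriptor and filterApplied from body for the panel.
  return (body['matches'] as List)
      .map((j) => Match.fromJson(j as Map<String, dynamic>))
      .toList();
}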

// in the widget:
//   Slider(value: threshold, min: 0, max: 1, onChanged: setThreshold)
//   final shown = matches.where((m) => m.score >= threshold).toList();

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Flutter engineer (Dart) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture/gallery-pick (image_picker), upload, and render ranked matches with a live threshold slider.
Requirements:
- Upload via http.MultipartRequest; loading state; configurable base URL.
- Each match shows brand/name, category, score and carries its `id`; a Slider (0..1) filters shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null).
- Clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample asset photos so it runs without a camera.
Tests / acceptance:
- A widget/unit test: raising the slider past all scores shows the no-match state; lowering reveals best-first matches.
Output: a unified diff plus the state model for matches, filterApplied, and threshold.

Capture a photo and show ranked matches (SwiftUI)

SwiftUI Intermediate

Build the same scan-to-results screen in SwiftUI: capture or pick an image, upload it to /recognize, and render ranked matches with a threshold slider.

New in this step

PhotosPicker

SwiftUI’s photo-selection control for grabbing the product image; bundle sample assets for camera-less simulator runs.

URLSession multipart

Building a multipart/form-data body and POSTing the image to /recognize with URLSession — the client side of the upload.

Codable

Swift’s JSON decoding protocol; you decode the { descriptor, filterApplied, matches, noMatch } response straight into typed structs.

Identifiable

The protocol that gives each Match a stable id so SwiftUI’s List can track rows — use the server’s id, not a fresh UUID.

Same flow, Swift

Use PhotosPicker (or the camera) to get an image, upload it with URLSession as multipart/form-data, and decode the { descriptor, filterApplied, matches, noMatch } response with Codable. Keep the descriptor, filterApplied, the matches, and the threshold in an @Observable model; a Slider filters the shown matches by score. Show the no-confident-match state when the filtered list is empty, and bundle a few sample images in the asset catalog for camera-less runs.

Threshold-filtered results (SwiftUI)

Run these in your terminal / editor

import SwiftUI

// Descriptor is the Codable Vision descriptor type from the recognise step.
struct Match: Codable, Identifiable {
    let id: String   // the server's string id (from _id) — used by /substitutes, stable across reloads
    let name: String
    let brand: String
    let category: String
    let score: Double
}

struct ResultsView: View {
    let descriptor: Descriptor
    let filterApplied: String?
    let matches: [Match]
    @State private var threshold = 0.75
    var body: some View {
        VStack(alignment: .leading) {
            // descriptor + filter panel: show the spotlight pre-filter at work
            Text("Vision saw: \(descriptor.brand) \(descriptor.colour) \(descriptor.category)")
            Text(filterApplied.map { "Pre-filtered to: \($0)" } ?? "Searched all categories (no category filter)")
                .foregroundStyle(.secondary)
            Text(String(format: "Confidence ≥ %.2f", threshold))
            Slider(value: $threshold, in: 0...1)
            let shown = matches.filter { $0.score >= threshold }
            if shown.isEmpty {
                Text("No confident match — try a clearer photo or pick a category")
            } else {
                List(shown) { m in
                    VStack(alignment: .leading) {
                        Text("\(m.brand) \(m.name)")
                        Text("\(m.category) · score \(String(format: "%.2f", m.score))").foregroundStyle(.secondary)
                    }
                }
            }
        }
    }
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: iOS engineer (Swift, SwiftUI, Swift Concurrency) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture or PhotosPicker an image, upload it, and render ranked matches with a live threshold slider.
Requirements:
- Upload via URLSession multipart/form-data; @MainActor state updates; configurable base URL.
- Decode Match with the server's string `id` (do NOT fabricate a local UUID — the id is needed by /substitutes and keeps List identity stable).
- Each match shows brand/name, category, score; a Slider (0...1) filters shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null).
- Clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample images in the asset catalog for camera-less runs.
Tests / acceptance:
- A unit test on the model: a threshold above all scores yields the no-match state; lowering reveals best-first matches.
Output: a unified diff plus the @Observable model definition.

Integration test: known photo matches, unknown returns no-match (Go)

Go Intermediate

Write a Go integration test against Atlas that proves the two outcomes: a known product photo matches above T, and an unknown photo returns no confident match.

New in this step

integration test

A test that exercises real components together (here the live Atlas index) rather than stand-ins — a mock can’t prove $vectorSearch, which exists only in Atlas.

t.Skip

Marks a test skipped at runtime; skip when MONGODB_URI/GEMINI_API_KEY are unset so CI without infra stays green instead of failing.

Test the real index, not a mock

$vectorSearch only exists in Atlas, so a mock can’t prove the match. Point the test at your M0 cluster (via MONGODB_URI): seed and embed a known product, post its bundled photo to /recognize, and assert it appears above T. Then post a photo of something not in the catalog and assert noMatch is true with an empty match list. Skip cleanly if MONGODB_URI or GEMINI_API_KEY is unset so the suite still runs without infra.
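
A hedged sketch of what such a test could look like in Go. Several names are assumptions for illustration only: the CATALENS_API_URL variable (the base URL of a running API), the seeded product id, and the two testdata paths do not come from the repo. It also assumes the seeding step has already run and that the server is configured with T = 0.75. In a real project this would live in a _test.go file.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
	"path/filepath"
	"testing"
)

// Hypothetical id of the product seeded and embedded before the test runs.
const seededProductID = "test-sneaker-001"

// recognizeResult mirrors only the envelope fields the test asserts on.
type recognizeResult struct {
	Matches []struct {
		ID    string  `json:"id"`
		Score float64 `json:"score"`
	} `json:"matches"`
	NoMatch bool `json:"noMatch"`
}

// missingEnv returns the names in keys whose value is unset or empty.
func missingEnv(getenv func(string) string, keys ...string) []string {
	var missing []string
	for _, k := range keys {
		if getenv(k) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

// postPhoto uploads the file at path to baseURL/recognize as the multipart
// field "image" and decodes the JSON envelope.
func postPhoto(baseURL, path string) (recognizeResult, error) {
	var out recognizeResult
	f, err := os.Open(path)
	if err != nil {
		return out, err
	}
	defer f.Close()
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	part, err := w.CreateFormFile("image", filepath.Base(path))
	if err == nil {
		_, err = io.Copy(part, f)
	}
	if err == nil {
		err = w.Close() // writes the closing multipart boundary
	}
	if err != nil {
		return out, err
	}
	resp, err := http.Post(baseURL+"/recognize", w.FormDataContentType(), &body)
	if err != nil {
		return out, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return out, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	err = json.NewDecoder(resp.Body).Decode(&out)
	return out, err
}

func TestRecognize(t *testing.T) {
	// Skip, rather than fail, when the infrastructure is not configured.
	if m := missingEnv(os.Getenv, "MONGODB_URI", "GEMINI_API_KEY", "CATALENS_API_URL"); len(m) > 0 {
		t.Skipf("skipping integration test, unset: %v", m)
	}
	const threshold = 0.75
	baseURL := os.Getenv("CATALENS_API_URL")

	known, err := postPhoto(baseURL, "testdata/known_sneaker.jpg")
	if err != nil {
		t.Fatalf("known photo: %v", err)
	}
	found := false
	for _, m := range known.Matches {
		found = found || (m.ID == seededProductID && m.Score >= threshold)
	}
	if known.NoMatch || !found {
		t.Errorf("known product not matched above T: %+v", known)
	}

	unknown, err := postPhoto(baseURL, "testdata/not_in_catalog.jpg")
	if err != nil {
		t.Fatalf("unknown photo: %v", err)
	}
	if !unknown.NoMatch || len(unknown.Matches) != 0 {
		t.Errorf("expected no confident match, got %+v", unknown)
	}
}
```

Passing os.Getenv into missingEnv keeps the skip decision testable on its own, without touching the real environment.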

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior Go engineer in this repo.
Context: /recognize, ingest, and the products_vec index exist; Atlas via MONGODB_URI; GEMINI_API_KEY server-side.
Task: Add integration tests for the recognise pipeline.
Requirements:
- Seed + embed a known product; post its bundled sample photo to /recognize; assert that product is in matches with score >= T and noMatch=false.
- Post a photo of an out-of-catalog item; assert matches is empty and noMatch=true.
- t.Skip when MONGODB_URI or GEMINI_API_KEY is unset; isolate test data (unique collection or cleanup between runs).
Tests / acceptance:
- `go test ./... -run TestRecognize` passes against Atlas.
Output: a unified diff plus how the test isolates its catalog data.

What success looks like

go test ./... -run TestRecognize reports ok (or SKIP when MONGODB_URI/GEMINI_API_KEY is unset, so CI without infra stays green). Against the real Atlas index both assertions hold: the known product’s photo is in matches with score >= T and noMatch=false; the out-of-catalog photo yields matches:[] and noMatch=true. A mock could never produce this — $vectorSearch exists only in Atlas, so the test proves the actual index, not a stub.

Integration test: same assertions in the TypeScript stack

TypeScript Intermediate

Mirror the same two assertions in the TypeScript stack so both backends are proven equivalent: a known photo matches above T, an unknown returns no confident match.

New in this step

integration test

A test that exercises real components together (the live Atlas index) rather than stand-ins — a mock can’t prove $vectorSearch, which exists only in Atlas.

node --test

Node’s built-in test runner (or vitest); run it against the same Atlas cluster as the Go suite so parity is asserted, not promised.

Parity is a test, not a promise

Run the same scenario with node --test (or vitest): seed and embed a known product, post its sample photo, assert it lands above T; then post an out-of-catalog photo and assert noMatch. Running both suites against the same Atlas cluster is what lets Catalens claim, honestly, that the document-plus-vector store — not the language — carries the match.

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior TypeScript engineer in this repo.
Context: /recognize, ingest, and the products_vec index exist; Atlas via MONGODB_URI; GEMINI_API_KEY server-side.
Task: Add integration tests matching the Go suite one-for-one.
Requirements:
- Seed + embed a known product; post its bundled sample photo; assert it is in matches with score >= T and noMatch=false.
- Post an out-of-catalog photo; assert matches empty and noMatch=true.
- Skip when MONGODB_URI or GEMINI_API_KEY is unset; isolate test data between runs.
Tests / acceptance:
- `npm test` passes against Atlas; assertions match the Go suite.
Output: a unified diff plus any flake mitigation (e.g. retry while the vector index warms up).

What success looks like

npm test passes the same two assertions against the same Atlas cluster: known photo → match with score >= T, noMatch=false; out-of-catalog photo → matches:[], noMatch=true — and it skips cleanly when the env vars are unset. Both suites green against one cluster is what lets Catalens claim, honestly, that the document-plus-vector store, not the language, carries the match.

Optional: deploy to Cloud Run (free) — never required to see it work

Intermediate

If you want a public URL, deploy the stateless API to Cloud Run pointing at your Atlas cluster — but everything above already runs free and locally, so this is optional.

New in this step

Cloud Run

A serverless host that runs your API container on demand; it fits a stateless /recognize endpoint and has a free monthly allotment.

scale to zero

With no traffic the service runs no instances (and costs nothing); it spins one up on the next request — why this stays free.

Secret Manager

A managed store for the Mongo URI and Gemini key, injected at runtime so secrets never live in the image or the repo.

Optional, and free if you do it

Nothing about seeing Catalens work needs the cloud: the API runs locally against Atlas M0, and the mobile app talks to it. If you want a public endpoint, the recognise API is a stateless container, so Cloud Run fits — it scales to zero and has a generous free monthly allotment. The data stays in Atlas (M0 is fine); only the connection string and the GEMINI_API_KEY move into the service config. Keep the key server-side as a managed secret (Secret Manager), never in the app.

Deploy (optional, free-tier eligible)

Run these in your terminal / editor

gcloud run deploy catalens-api \
  --source . \
  --set-env-vars MONGODB_URI="$MONGODB_URI",MONGODB_DB=catalens,GEMINI_API_KEY="$GEMINI_API_KEY" \
  --region us-central1 --allow-unauthenticated
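# For real deployments, store MONGODB_URI and GEMINI_API_KEY in Secret Manager and replace them in --set-env-vars with, for example:
#   --set-secrets MONGODB_URI=mongodb-uri:latest,GEMINI_API_KEY=gemini-api-key:latest
# (mongodb-uri and gemini-api-key are placeholder secret names; create your own first.)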

Hybrid match: blend vector similarity with text relevance

Optional add-on Advanced

Combine $vectorSearch with Atlas Search text and attribute relevance so an exact brand or visible-text signal sharpens the ranking, not just visual similarity.

New in this step

$search

Atlas Search’s full-text stage over a Lucene index; here it ranks by brand/name/visibleText to add a text signal alongside the vector one.

reciprocal rank fusion

A way to merge two ranked lists by summing 1 / (k + rank), so items ranked high by either search rise — no hand-tuned weights.

$rankFusion

A newer Atlas stage that does reciprocal rank fusion for you; prefer the current official hybrid-search guide over hand-rolling it.

Two signals are better than one

Pure vector similarity can rank a look-alike from the wrong brand above the right product. Hybrid search fixes that by blending two retrievers: $vectorSearch (visual/semantic similarity) and Atlas Search ($search, a Lucene full-text index) over brand, name, and visibleText. Run both, then fuse their rankings. The common technique is reciprocal rank fusion (RRF): each result’s combined score sums 1 / (k + rank) across the two lists, so items ranked high by either signal rise. Recent Atlas versions expose a $rankFusion stage that does RRF for you; follow the current official hybrid-search guide rather than hand-tuning weights, and keep a single confidence cut on the fused score. That cut is not the identification T: a fused RRF score is a sum of reciprocal ranks, not a cosine-derived value in [0, 1], so calibrate it separately against known matches and known strangers.
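
To make the fusion arithmetic concrete, here is a minimal hand-rolled RRF in Go. It is a sketch for understanding, not a replacement for $rankFusion. The names (Fused, rrf) are illustrative, ranks start at 1, and each input list is assumed to contain an id at most once. A k of 60 is a commonly used default.

```go
package main

import "sort"

// Fused is a product id with its combined reciprocal-rank score.
type Fused struct {
	ID    string
	Score float64
}

// rrf merges any number of ranked id lists (best first). Each list adds
// 1 / (k + rank) to an id's score, with rank starting at 1, so an id that
// ranks high in either list rises in the fused order.
func rrf(k float64, lists ...[]string) []Fused {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	out := make([]Fused, 0, len(scores))
	for id, s := range scores {
		out = append(out, Fused{ID: id, Score: s})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Score != out[b].Score {
			return out[a].Score > out[b].Score
		}
		return out[a].ID < out[b].ID // deterministic order for ties
	})
	return out
}
```

With k = 60 and two lists, the largest possible fused score is 2/61, roughly 0.033, which shows why a cosine-scale cutoff such as 0.75 cannot be reused on fused scores.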

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize returns vector matches; products have brand, name, visibleText; Atlas supports $search and vector search.
Task: Add a hybrid ranker that fuses $vectorSearch with an Atlas Search text query over brand/name/visibleText.
Requirements:
- Run both retrievers on the descriptor (vector from the embedding, text from brand/name/visibleText).
- Fuse with reciprocal rank fusion (or the Atlas $rankFusion stage if your version has it); follow the official hybrid-search guide.
- Keep a single confidence cut on the fused score; preserve the no-match path.
Tests / acceptance:
- A query where a same-looking wrong-brand item beats the right one under pure vectors ranks the right one first under hybrid.
Output: a unified diff plus the fusion formula and any weights, with the official doc link.

Out of stock → visually similar substitutes

Optional add-on Intermediate

When a matched product is unavailable, run a pure $vectorSearch over the catalog to offer the closest in-stock alternatives.

New in this step

query-by-example vector

Using a product’s own stored embedding as the queryVector (no photo, no Gemini call) to ask “what looks like this?” — same index, new question.

inStock filter

The filter: { inStock: true } pre-filter (using the index field declared earlier) so substitutes are limited to items a shopper can actually buy.

favour recall

Deliberately lowering T here so more close-enough options surface — the opposite of strict identification, where precision wins.

The same vector index, a different question

You already have the machinery; substitutes just ask it a new question. Take the matched (or out-of-stock) product’s own embedding as the query vector, run $vectorSearch excluding that product, and add an inStock: true pre-filter so only buyable items come back. The result is “things that look like this, that you can actually buy”, ranked by similarity. This is where you lower the threshold deliberately: for substitutes you want recall (show me close-enough options), the opposite of the strict precision you want for identification. Same index, same stage — a looser cutoff and an availability filter.
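
One way to express that query, sketched in Go as plain maps so it stays independent of any driver version. The index name products_vec and the embedding and inStock fields come from this guide. Treat the rest as assumptions: the function name, the projected fields, the numCandidates multiplier, and using a string _id. This sketch excludes the original product in a $match after the search and requests one extra result to make room for it.

```go
package main

// substitutesPipeline builds aggregation stages that find in-stock products
// similar to the one whose stored embedding is passed in. Convert the maps
// to your driver's document type before running the aggregation.
func substitutesPipeline(productID string, embedding []float64, limit int, minScore float64) []map[string]any {
	return []map[string]any{
		{"$vectorSearch": map[string]any{
			"index":       "products_vec",
			"path":        "embedding",
			"queryVector": embedding,
			// Oversampling factor is an assumption; tune it for your catalog.
			"numCandidates": (limit + 1) * 20,
			// One spare slot: an in-stock product is its own nearest neighbour.
			"limit": limit + 1,
			// Requires inStock to be declared as a filter field in the index.
			"filter": map[string]any{"inStock": true},
		}},
		{"$project": map[string]any{
			"name": 1, "brand": 1, "category": 1, "price": 1, "inStock": 1,
			"score": map[string]any{"$meta": "vectorSearchScore"},
		}},
		{"$match": map[string]any{
			"_id":   map[string]any{"$ne": productID}, // drop the original product
			"score": map[string]any{"$gte": minScore}, // looser than identification
		}},
		{"$limit": limit},
	}
}
```

Passing a minScore lower than the identification T is where the recall-over-precision choice shows up in code.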

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: products have `embedding` and `inStock`; the products_vec index exists (add inStock as a filter field).
Task: Add GET /products/{id}/substitutes returning visually similar in-stock alternatives.
Requirements:
- Use the product's own embedding as the queryVector; $vectorSearch with filter { inStock: true } and exclude the product itself.
- Use a lower threshold than identification (favour recall); return ranked alternatives with scores.
- Declare inStock as a filter field in the vector index so the pre-filter is valid.
Tests / acceptance:
- For an out-of-stock product, the endpoint returns only in-stock items, ranked by similarity, excluding the original.
Output: a unified diff plus the chosen substitute threshold and why it differs from identification.

Recognise multiple products in one shelf photo

Optional add-on Advanced

Detect several products in a single shelf photo and run the recognise pipeline for each, returning a match list per detected item.

New in this step

object detection

Finding where each product is in the photo (not just describing one); the shelf flow detects many items, then recognises each.

bounding box

The rectangle marking a detected item’s location, returned per item so the UI can label which match belongs to which product on the shelf.

array responseSchema

A responseSchema whose top level is an array, so one Vision call returns many item descriptors to fan out through the per-item pipeline.

Fan one photo into many recognitions

A shelf photo contains many products, so the single-product pipeline becomes a fan-out. Two honest approaches: (1) ask Gemini Vision to return an array of detected items — each with a short descriptor and an approximate bounding box — using a responseSchema whose top level is an array; or (2) detect regions first, then run the existing descriptor, embed, and $vectorSearch path per crop. Either way you reuse the same per-item pipeline, just N times, and return a list of { box, matches[] }. Keep each item’s threshold and no-match handling exactly as the single-product flow — a shelf with one unknown item should confidently match the rest and admit the one.
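
The fan-out itself is small. Below is a Go sketch with the per-item pipeline passed in as a function, so the example stays self-contained. All names here (Box, Detected, ItemResult, recognizeShelf) are illustrative assumptions, and matches are simplified to product ids.

```go
package main

import "sync"

// Box is an approximate bounding box for one detected item.
type Box struct{ X, Y, W, H float64 }

// Detected is one item found in the shelf photo.
type Detected struct {
	Box        Box
	Descriptor string
}

// ItemResult pairs a box with the confident matches for that item.
type ItemResult struct {
	Box     Box
	Matches []string // product ids, simplified for the sketch
	NoMatch bool
}

// recognizeShelf runs recognizeOne (the existing embed, search, and
// threshold path) once per detected item, concurrently. It processes at
// most maxItems items and returns results in the same order as the input.
func recognizeShelf(items []Detected, maxItems int, recognizeOne func(Detected) []string) []ItemResult {
	if maxItems < 0 {
		maxItems = 0
	}
	if len(items) > maxItems {
		items = items[:maxItems] // cap the fan-out to bound latency and cost
	}
	results := make([]ItemResult, len(items))
	var wg sync.WaitGroup
	for i, it := range items {
		wg.Add(1)
		go func(i int, it Detected) {
			defer wg.Done()
			m := recognizeOne(it)
			// Each goroutine writes only its own index, so no lock is needed.
			results[i] = ItemResult{Box: it.Box, Matches: m, NoMatch: len(m) == 0}
		}(i, it)
	}
	wg.Wait()
	return results
}
```

Because each item keeps its own NoMatch flag, one unknown product on the shelf yields one honest miss without affecting the other entries.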

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: the single-product recognise pipeline (descriptor -> embed -> $vectorSearch -> threshold) exists.
Task: Add POST /recognize/shelf that recognises multiple products in one image.
Requirements:
- Use Gemini Vision with an array responseSchema to detect items (descriptor + approximate bounding box each), OR crop-then-recognise per region.
- Run the existing embed + $vectorSearch + threshold path per detected item; return [{ box, matches:[{...product, score}], noMatch }].
- Reuse the per-item threshold and no-match logic unchanged; cap the number of items to keep latency sane.
Tests / acceptance:
- A photo with 3 seeded products returns 3 entries, each matching the right product above T; an extra unknown item yields a noMatch entry, not a wrong match.
Output: a unified diff plus the per-item fan-out strategy and the item cap.

Barcode / label fast-path before vector matching

Optional add-on Intermediate

When a photo shows a barcode or clear label text, resolve it to an exact catalog product first, and only fall back to vector matching when there is no exact hit.

New in this step

unique index

An index on sku/barcode that both enforces no duplicates and makes the exact lookup instant.

exact lookup

A normal indexed findOne by the code — deterministic, free of model cost, and unambiguous when a code is present.

fast-path

A cheap deterministic route tried first (the code lookup); only on a miss do you fall back to the costlier descriptor → embed → $vectorSearch path.

Exact beats approximate when you have it

Vector matching is for when you don’t have an identifier; when you do, use it. If the photo contains a barcode or legible label, read it — a barcode-scanning library on-device, or Gemini Vision’s visibleText from the descriptor you already extract — and look the product up by an exact field (sku or barcode) with a normal indexed query. An exact hit returns immediately with full confidence and skips the vector path entirely; only when there is no code, or no exact match, do you fall back to the descriptor, embed, and $vectorSearch path. This is a classic cheap-path/expensive-path design: the deterministic lookup is faster, free of model cost, and unambiguous, so it goes first.
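
The cheap-path/expensive-path order can be sketched as below. This is an illustration under assumptions, not the repo's handler: findByCode stands in for an indexed findOne on sku or barcode, vectorPath stands in for the existing descriptor, embed, and $vectorSearch pipeline (already thresholded), and the result shape is hypothetical.

```typescript
// Hypothetical shapes for illustration only.
type Product = { sku: string; name: string };
type Scored = { name: string; score: number };
type RecogniseResult =
  | { via: "exact"; product: Product; score: number }
  | { via: "vector"; matches: Scored[]; noMatch: boolean };

async function recogniseWithFastPath(
  code: string | undefined,
  findByCode: (code: string) => Promise<Product | null>, // stand-in for an indexed findOne
  vectorPath: () => Promise<Scored[]>,                    // stand-in for the existing pipeline
): Promise<RecogniseResult> {
  const trimmed = code ? code.trim() : "";
  if (trimmed !== "") {
    const product = await findByCode(trimmed);
    // Exact hit: return with full confidence and never touch the vector path.
    if (product) return { via: "exact", product, score: 1 };
  }
  // No code, or an unknown code: fall back to the costlier vector match.
  const matches = await vectorPath();
  return { via: "vector", matches, noMatch: matches.length === 0 };
}
```

Passing the two paths in as functions keeps the ordering testable without a database: a test can count calls to vectorPath and assert it stays at zero on an exact hit.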

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: products may carry a unique `sku`/`barcode`; the vector recognise pipeline exists; Vision returns visibleText.
Task: Add a barcode/label fast-path to /recognize.
Requirements:
- If the request carries a decoded barcode (from an on-device scanner) or Vision's visibleText yields a code, query products by exact sku/barcode (indexed) first.
- On an exact hit, return it immediately with full confidence and skip the vector pipeline; otherwise fall back to descriptor -> embed -> $vectorSearch -> threshold.
- Add a unique index on the code field (partial or sparse if some products have no code, so missing values do not collide); keep the existing no-match path for the fallback.
Tests / acceptance:
- A photo/label with a known code returns the exact product without calling the embedding/vector path.
- An item with no code or an unknown code falls back to vector matching and still thresholds correctly.
Output: a unified diff plus where the fast-path short-circuits the vector pipeline.

Keep the index live: watch the catalog with a change stream

Optional add-on Advanced

Decide up front that a product’s embedding is derived data that must follow its source: open a MongoDB change stream on products so the moment a document changes, a background worker can re-derive and rewrite its vector.

New in this step

derived data

A value computed from other fields (the embedding from a product’s text); it must be recomputed whenever its inputs change, or it goes stale.

change stream

MongoDB’s real-time feed of writes; collection.watch() yields one event per insert, update, replace, or delete so a worker can react the moment a product changes.

updateLookup

The fullDocument: "updateLookup" option, which attaches the current document to each update event — so the worker has the new text to embed without a second read.

resume token

A bookmark carried in each event; persist it and pass it as resumeAfter on restart so the worker continues exactly where it stopped.

Why the index drifts, and what a change stream fixes — and it costs nothing

The embedding you stored at ingest is a snapshot of a product’s description at that moment. Edit the brand, rename it, add an attribute, swap the photo — and the stored vector now describes a product that no longer exists, so $vectorSearch quietly matches against stale text. The fix is to treat the embedding as derived data that must be recomputed whenever its inputs change. A change stream is MongoDB’s real-time feed of writes: collection.watch() opens a cursor that yields one event per insert/update/delete, each carrying a resume token (in the event’s _id) so a restarted worker continues exactly where it left off. Ask for fullDocument: "updateLookup" and each update event also includes the current document, so the worker has the new text to embed without a second read. The whole loop runs on the data you already provisioned.

Costs nothing. Atlas M0 is a 3-node replica set, and change streams run on it — the only free-tier restriction is on database-namespace filters (you filter on fields and collections, which is allowed), so a worker watching the products collection works on the free tier. Docs: https://www.mongodb.com/docs/manual/changeStreams/ · M0 limits: https://www.mongodb.com/docs/atlas/reference/free-shared-limitations/

The two hard parts: no infinite loop, and idempotency

Writing the fresh vector back is itself a write to products, which your change stream will see, which would trigger another re-embed, forever. Break the loop by watching only the changes that matter: pass a pipeline to watch() that $matches content edits and ignores the embedding write. Two robust guards, used together: (1) store a content hash of the embedding text on the document and skip any event whose recomputed hash equals the stored one, so the worker’s own write-back (which changes embedding and embeddingHash but no content field) re-derives the same hash and is a no-op; (2) $match on the event so embedding-only updates never reach the worker (e.g. require a content field in updateDescription.updatedFields, or filter operationType). Make the worker idempotent and debounced: a burst of edits to one product should collapse into a single re-embed of its final state, keyed by _id, so rapid saves don’t fan out into redundant Gemini calls.
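
One way to get the "burst collapses into one re-embed" behaviour is a small pending map keyed by _id. The sketch below is only one possible shape: the class name ReembedQueue and the idea of draining it on a timer are assumptions, not something the tutorial prescribes.

```typescript
// Collapses a burst of change events into one pending job per product _id.
// Hypothetical helper: the worker would enqueue on each event and drain on a short timer.
class ReembedQueue<Doc> {
  private pending = new Map<string, Doc>();

  // A later event for the same _id overwrites the earlier one,
  // so only the final state of the product is kept.
  enqueue(id: string, doc: Doc): void {
    this.pending.set(id, doc);
  }

  // Hands back each pending product exactly once and empties the queue.
  drain(): Array<[string, Doc]> {
    const batch = Array.from(this.pending.entries());
    this.pending.clear();
    return batch;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

A worker built this way would call enqueue(String(doc._id), doc) inside the change-stream loop and run the hash check and embed call once per entry returned by drain(), so three quick saves of one product cost a single Gemini call.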

What changes, and what the worker does (language-agnostic)

Run these in your terminal / editor

Watch products with a pipeline that ignores embedding-only writes:
  pipeline = [{ $match: {
    operationType: { $in: ["insert", "update", "replace"] },
    # only react when a CONTENT field changed (not the embedding/hash we write back):
    $or: [
      { "updateDescription.updatedFields.name":       { $exists: true } },
      { "updateDescription.updatedFields.brand":      { $exists: true } },
      { "updateDescription.updatedFields.category":   { $exists: true } },
      { "updateDescription.updatedFields.attributes": { $exists: true } },
      { "updateDescription.updatedFields.imageRef":   { $exists: true } },
      { operationType: { $in: ["insert", "replace"] } },   # full-doc writes
    ],
  }}]
On each event (fullDocument present via updateLookup):
  1. if the product's IMAGE changed: re-run the Vision descriptor first, so it is folded into the text
  2. text = embeddingText(doc)                 # SAME builder as ingest
  3. hash = sha256(text)
  4. if doc.embeddingHash == hash: skip        # idempotent: nothing meaningful changed
  5. vector = gemini.embed(text, D)            # SAME model + D as ingest/query
  6. updateOne({_id}, {$set: {embedding: vector, embeddingHash: hash}})  # does NOT re-trigger
  7. persist the event's resume token so a restart continues from here (also after a skip)
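
For unit tests it can help to mirror the $match above as a pure predicate, so the "own write-back never re-triggers" rule is checked without a cluster. The sketch below is illustrative and makes one assumption beyond the pipeline as written: nested updates appear in updatedFields under dotted keys (for example "attributes.colour"), and the predicate treats those as content changes too. Whether the server-side $match catches such dotted keys is worth verifying against your own data.

```typescript
// Minimal subset of a change event; field names follow MongoDB's change-event format.
type ChangeEventLike = {
  operationType: string;
  updateDescription?: { updatedFields?: Record<string, unknown> };
};

// The content fields the embedding text is built from (same list as the pipeline above).
const CONTENT_FIELDS = ["name", "brand", "category", "attributes", "imageRef"];

function isContentChange(ev: ChangeEventLike): boolean {
  // Full-document writes always count.
  if (ev.operationType === "insert" || ev.operationType === "replace") return true;
  if (ev.operationType !== "update") return false;
  const updated = ev.updateDescription && ev.updateDescription.updatedFields
    ? ev.updateDescription.updatedFields
    : {};
  // Assumption: a dotted key such as "attributes.colour" is a nested content edit.
  return Object.keys(updated).some((key) =>
    CONTENT_FIELDS.some((field) => key === field || key.indexOf(field + ".") === 0),
  );
}
```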

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Database administrator.
Context: MongoDB Atlas M0 reachable via MONGODB_URI; database MONGODB_DB="catalens".
Task: Seed the change-stream worker's state document.
Requirements:
- Upsert one document into a new "_worker_state" collection: { _id: "embeddings-worker", resumeToken: null } so the change-stream feature has its state row from the start.
Tests / acceptance:
- The _worker_state collection has exactly one { _id: "embeddings-worker" } doc.
Output: a unified diff plus the seed command to run it.

Run the re-embed worker (Go change stream)

Optional add-on Advanced

Build a background worker in Go that tails the products change stream with the v2 driver, re-embeds a product when its content changes, and writes the vector back — guarded so its own write never re-triggers it.

New in this step

mongo.ChangeStream

What collection.Watch(...) returns; drive it with stream.Next(ctx) + stream.Decode(&event) to read one change at a time.

SetFullDocument(UpdateLookup)

The option that makes each update event carry the current fullDocument, so the worker has the new text to embed.

SetResumeAfter

Passes a saved resume token on startup so a restarted worker continues without missing an edit; an event replayed after a crash is a no-op thanks to the hash check.

ChangeStream with the v2 driver, updateLookup, and a resume token

collection.Watch(ctx, pipeline, opts) returns a *mongo.ChangeStream; drive it with stream.Next(ctx) + stream.Decode(&event). Set options.ChangeStream().SetFullDocument(options.UpdateLookup) so each update event carries the current fullDocument to embed. Read stream.ResumeToken() after each handled event and persist it (a tiny _worker_state doc); on startup pass it back via SetResumeAfter(token) so a crash resumes without missing or re-doing work. The $match pipeline (built with bson.D) keeps embedding-only writes out of the stream, and the content-hash check makes the handler idempotent — together they close the infinite-loop door. Keep the API key server-side and reuse the same embeddingText builder and dimensions D as ingest. Go driver change streams: https://www.mongodb.com/docs/drivers/go/current/monitoring-and-logging/change-streams/

Tail the stream, re-embed, write back (v2 driver)

Run these in your terminal / editor

// worker/embeddings.go (essentials) — go.mongodb.org/mongo-driver/v2
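// Only content edits pass: the worker's own embedding/embeddingHash write-back is filtered out.
changed := func(field string) bson.D {
	return bson.D{{Key: "updateDescription.updatedFields." + field, Value: bson.D{{Key: "$exists", Value: true}}}}
}
match := bson.D{{Key: "$match", Value: bson.D{
	{Key: "operationType", Value: bson.D{{Key: "$in", Value: bson.A{"insert", "update", "replace"}}}},
	{Key: "$or", Value: bson.A{
		changed("name"), changed("brand"), changed("category"), changed("attributes"), changed("imageRef"),
		bson.D{{Key: "operationType", Value: bson.D{{Key: "$in", Value: bson.A{"insert", "replace"}}}}}, // full-doc writes
	}},
}}}
opts := options.ChangeStream().SetFullDocument(options.UpdateLookup)
if tok := loadResumeToken(ctx, state); tok != nil {
	opts.SetResumeAfter(tok) // resume from the last handled event
}

stream, err := products.Watch(ctx, mongo.Pipeline{match}, opts)
if err != nil { log.Fatal(err) }
defer stream.Close(ctx)

for stream.Next(ctx) {
	var ev struct {
		FullDocument struct {
			ID            bson.ObjectID `bson:"_id"`
			EmbeddingHash string        `bson:"embeddingHash"`
			// ...plus the content fields embeddingText reads (name, brand, category, attributes, imageRef)
		} `bson:"fullDocument"`
	}
	if err := stream.Decode(&ev); err != nil {
		log.Printf("decode change event: %v", err) // do not swallow decode errors silently
		continue
	}
	doc := ev.FullDocument
	if doc.ID.IsZero() { continue }        // deleted between update & lookup: no fullDocument
	text := embeddingText(doc)            // SAME builder as ingest
	if h := sha256Hex(text); h != doc.EmbeddingHash {
		vec := embed(ctx, text)            // SAME model + dimensions D as ingest/query
		if _, err := products.UpdateByID(ctx, doc.ID, bson.D{{Key: "$set",
			Value: bson.D{{Key: "embedding", Value: vec}, {Key: "embeddingHash", Value: h}}}}); err != nil {
			log.Fatalf("write-back %v: %v", doc.ID, err) // stop before saving the token; a restart retries this event
		}
		// writing embedding/embeddingHash does NOT match a content field -> no re-trigger
	}
	saveResumeToken(ctx, state, stream.ResumeToken()) // persist for restart
}
if err := stream.Err(); err != nil { log.Fatal(err) } // Next returned false: surface the cause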

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: Senior Go + MongoDB engineer building a change-stream worker. The reader has no repo here — return complete code.
Context: Atlas M0 (a replica set; change streams supported); go.mongodb.org/mongo-driver/v2 (mongo, bson, options); products collection with embedding + embeddingHash fields; the SAME embeddingText builder, embedding model, and dimensions D as the ingest step; GEMINI_API_KEY server-side.
Task: Implement a standalone worker that keeps embeddings in sync via a products change stream.
Requirements:
- Open products.Watch with SetFullDocument(options.UpdateLookup); pass a $match pipeline so only content changes (name/brand/category/attributes/imageRef, or insert/replace) produce events — never embedding-only writes.
- For each event: rebuild the embedding text, compute a content hash, and SKIP if it equals the stored embeddingHash (idempotent). If the product's image reference changed, re-run the Gemini Vision descriptor first and fold it into the text.
- Re-embed with the SAME model + dimensions D as ingest; write {embedding, embeddingHash} back with UpdateByID — this write must NOT re-trigger the worker (guarded by the $match + hash).
- Debounce per _id so a burst of edits collapses into one re-embed of the final state.
- Persist stream.ResumeToken() after each handled event and SetResumeAfter it on startup; do NOT hard-code a model id — read it from config and link the official embeddings docs.
Tests / acceptance (describe):
- Updating a product's brand causes exactly one re-embed; the new embedding length is still D.
- The worker's own write-back produces no further re-embed (no infinite loop).
- Killing and restarting the worker resumes from the saved token without missing an edit.
Output: the complete worker, no commentary.

Run the re-embed worker (TypeScript change stream)

Optional add-on Advanced

Build the same worker in TypeScript with the Node driver: collection.watch() with fullDocument: "updateLookup", re-embed on content changes, write the vector back, and guard against re-triggering.

New in this step

async-iterable change stream

collection.watch(...) returns a stream you consume with for await (const change of stream), one event per write.

for await

The loop that pulls events from the async-iterable stream as they arrive, awaiting each one — the Node analogue of Go’s stream.Next.

resumeAfter / change._id

Each event’s _id is its resume token; persist it and pass it as the resumeAfter option on startup so a restart continues cleanly.

Same worker, node driver

The Node path mirrors the Go one. collection.watch(pipeline, { fullDocument: "updateLookup", resumeAfter }) returns an async-iterable change stream; consume it with for await (const change of stream). Each event’s resume token is change._id (also stream.resumeToken); persist it and pass it as resumeAfter on the next start. The same $match pipeline filters out embedding-only writes, and the same content-hash guard makes the handler idempotent — re-embed only when the rebuilt text’s hash differs from the stored embeddingHash. Reuse the ingest step’s embeddingText builder, model, and dimensions D so the worker and the query stay comparable. Keeping both workers behaviourally identical is the same parity lesson as the recognise pipeline. Node driver change streams: https://www.mongodb.com/docs/drivers/node/current/monitoring-and-logging/change-streams/

Tail the stream, re-embed, write back (mongodb driver)

Run these in your terminal / editor

// worker/embeddings.ts (essentials) — official mongodb driver
const pipeline = [{
  $match: {
    operationType: { $in: ["insert", "update", "replace"] },
    $or: [
      { "updateDescription.updatedFields.name": { $exists: true } },
      { "updateDescription.updatedFields.brand": { $exists: true } },
      { "updateDescription.updatedFields.category": { $exists: true } },
      { "updateDescription.updatedFields.attributes": { $exists: true } },
      { "updateDescription.updatedFields.imageRef": { $exists: true } },
      { operationType: { $in: ["insert", "replace"] } },
    ],
  },
}];
const resumeAfter = (await loadResumeToken(state)) ?? undefined; // seeded as null; normalise to undefined on first run
const stream = products.watch(pipeline, { fullDocument: "updateLookup", resumeAfter });

for await (const change of stream) {
  const doc = (change as any).fullDocument;
  if (!doc) continue;                                        // deleted between update & lookup
  const text = embeddingText(doc);                           // SAME builder as ingest
  const hash = sha256Hex(text);
  if (hash !== doc.embeddingHash) {
    const vector = await embed(text);                        // SAME model + dimensions D
    await products.updateOne({ _id: doc._id },
      { $set: { embedding: vector, embeddingHash: hash } }); // does NOT re-trigger (not a content field)
  }
  await saveResumeToken(state, change._id);                  // persist for restart
}

Chat prompt — paste into a chat to get the code

For a plain chat. It returns complete code; you paste it in yourself.

Role: Senior TypeScript + MongoDB engineer building a change-stream worker. The reader has no repo here — return complete code.
Context: Atlas M0 (a replica set; change streams supported); official mongodb driver; products collection with embedding + embeddingHash; the SAME embeddingText builder, model, and dimensions D as ingest; GEMINI_API_KEY server-side.
Task: Implement a standalone worker that keeps embeddings in sync via a products change stream, matching the Go worker one-for-one.
Requirements:
- collection.watch(pipeline, { fullDocument: "updateLookup", resumeAfter }); the $match pipeline lets only content changes (name/brand/category/attributes/imageRef, or insert/replace) through — never embedding-only writes.
- Consume with for-await; for each event rebuild the embedding text, compute a content hash, and SKIP if it equals the stored embeddingHash. If the image reference changed, re-run the Gemini Vision descriptor first.
- Re-embed with the SAME model + dimensions D as ingest; updateOne the {embedding, embeddingHash} back — this write must NOT re-trigger the worker.
- Debounce per _id; persist change._id (resume token) after each handled event and pass it as resumeAfter on startup.
- Do NOT hard-code a model id — read it from config and link the official embeddings docs.
Tests / acceptance (describe):
- Editing brand triggers exactly one re-embed; new embedding length is D; the write-back causes no further event.
- Restart resumes from the saved token; responses/behaviour match the Go worker.
Output: the complete worker, no commentary.

Tune numCandidates: trade latency against recall

Optional add-on Advanced

Atlas Vector Search is approximate (HNSW), so it doesn’t examine every vector. Learn the one knob that governs that approximation — numCandidates — and what raising or lowering it costs you.

New in this step

HNSW

The graph index (Hierarchical Navigable Small World) that Atlas walks for approximate search; it visits a bounded pool of vectors instead of all of them, which is why numCandidates matters.

ENN (exact:true)

Exact nearest-neighbour: setting exact: true scans every vector (perfect recall, slowest) — a ground-truth baseline to measure your ANN recall against on a small catalog.

What numCandidates actually controls

$vectorSearch runs approximate nearest-neighbour (ANN) search over an HNSW graph index: instead of comparing your query against every product vector, it walks the graph and considers a bounded pool of candidates, then returns the best limit of them. Any filter is a pre-filter applied during that walk, not a pass over the results afterwards. numCandidates is the size of that pool. Bigger pool → the search explores more of the graph → it’s more likely to find the true nearest neighbours (higher recall) but does more work (higher latency). Smaller pool → faster, but it may miss a real match that the graph walk never visited. numCandidates is required for ANN search and applies only to ANN: set exact: true and you get exhaustive ENN (perfect recall, no numCandidates, slowest), which is mainly useful as a ground-truth baseline to measure your ANN recall against on a small catalog. The official guidance: set numCandidates at least 20× your limit (e.g. limit: 5 → numCandidates: 100) as a starting point; it must be ≥ limit and at most 10,000. Docs: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/
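
The constraints above can be checked before the query is ever sent. The following is a minimal sketch, assuming a hypothetical resolveNumCandidates helper and a buildVectorSearchStage builder; only the stage field names (index, path, queryVector, limit, numCandidates, filter) and the products_vec index name come from the tutorial, and the 10,000 ceiling should be re-checked against the current Atlas docs.

```typescript
// Upper bound as documented for $vectorSearch at the time of writing; verify before relying on it.
const MAX_NUM_CANDIDATES = 10000;

// Returns the configured value, or 20x limit as the starting default; rejects invalid settings.
function resolveNumCandidates(limit: number, configured?: number): number {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const value = configured === undefined ? limit * 20 : configured;
  if (!Number.isInteger(value)) throw new RangeError("numCandidates must be an integer");
  if (value < limit) throw new RangeError("numCandidates must be >= limit");
  if (value > MAX_NUM_CANDIDATES) throw new RangeError("numCandidates exceeds the documented maximum");
  return value;
}

type VectorSearchStage = {
  $vectorSearch: {
    index: string;
    path: string;
    queryVector: number[];
    limit: number;
    numCandidates: number;
    filter?: { category: { $eq: string } };
  };
};

// Builds the ANN stage; the category pre-filter is added only when a hint is present.
function buildVectorSearchStage(
  queryVector: number[],
  limit: number,
  category?: string,
  numCandidates?: number,
): VectorSearchStage {
  const stage: VectorSearchStage = {
    $vectorSearch: {
      index: "products_vec",
      path: "embedding",
      queryVector,
      limit,
      numCandidates: resolveNumCandidates(limit, numCandidates),
    },
  };
  if (category) stage.$vectorSearch.filter = { category: { $eq: category } };
  return stage;
}
```

Validating at config-load time means a bad setting fails at startup rather than as a server error on the first customer photo.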

A note on the pre-filter

The category filter from the ★ recognise step interacts with this knob: Atlas applies the filter while traversing the graph, so an aggressive filter leaves few qualifying vectors for the walk to find, and recall can drop. That is when raising numCandidates helps, and for a tiny in-category set exact: true sidesteps the approximation entirely. The query call is otherwise unchanged from the recognise step, so this knob lives in the same $vectorSearch stage in both backends; only the integer differs (Go: numCandidates in the bson.D; TypeScript: numCandidates in the stage object).

The knob in the recognise pipeline (both backends)

Run these in your terminal / editor

# This is the SAME $vectorSearch stage from the ★ recognise step — only numCandidates is the variable.
$vectorSearch:
  index:         "products_vec"
  path:          "embedding"
  queryVector:   <query vector, length D>
  limit:         5
  numCandidates: 100          # start at ~20x limit; raise for recall, lower for latency
  filter:        { category: { $eq: hint } }   # applied during the graph walk

# Ground-truth baseline for measuring recall (small catalog only):
$vectorSearch:
  index: "products_vec"  path: "embedding"  queryVector: <...>  limit: 5
  exact: true            # exhaustive ENN — no numCandidates; use to score ANN recall against

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize runs a $vectorSearch over products_vec; numCandidates is currently a constant ~20x limit. Atlas Vector Search is ANN over HNSW; exact:true gives exhaustive ENN.
Task: Make numCandidates configurable and add a small recall-vs-latency harness so a value is chosen with evidence, not by guessing.
Requirements:
- Read numCandidates from config (default ~20x limit); validate it is >= limit; keep the existing filter + threshold behaviour.
- Add a benchmark over a fixed set of query photos that, for each numCandidates in a sweep (e.g. 25, 50, 100, 200, 400), records mean/p95 latency and recall@limit measured against an exact:true (ENN) baseline on the SAME queries.
- Print a small table (numCandidates, p95 latency ms, recall@limit) and recommend the smallest numCandidates whose recall is within a tolerance of the ENN baseline.
- Do not invent index fields; rely only on documented $vectorSearch fields (index, path, queryVector, numCandidates, limit, filter, exact).
Tests / acceptance:
- The harness runs against Atlas and emits the table; recall generally rises as numCandidates rises (small dips are possible because the search is approximate) and approaches 1.0 relative to the ENN baseline.
- numCandidates < limit is rejected by config validation.
Output: a unified diff plus the chosen numCandidates and the recall/latency point that justifies it.

Measure recall against an exact baseline and pick a value

Optional add-on Intermediate

Turn the tuning into a decision: compare your approximate results to an exact (ENN) ground truth on the same queries, then choose the smallest numCandidates that holds the recall you need.

New in this step

recall@limit

Per query, the fraction of the exact top-limit results your approximate search also returned; average it over several photos to get a number you can trust.

the knee

The point where raising numCandidates stops buying much recall but keeps adding latency — the smallest value within your tolerance, and where you stop.

How to measure, not guess

Recall here is concrete: for each test query, run the ANN search and an exact: true ENN search with the same limit, and recall@limit is the fraction of the ENN top-limit that your ANN result also returned. Average it over a handful of representative query photos. Sweep numCandidates (say 25 → 400) and watch two curves: recall rises and flattens toward 1.0, while p95 latency rises roughly with the candidate pool. The sweet spot is the knee — the smallest numCandidates whose recall is within your tolerance of the ENN baseline (for a precision-sensitive shopping match, aim high). Because ENN is O(n), use it only as a measurement tool on a small catalog, never as the production path. Re-run the sweep if you change the embedding model, dimensions, or the catalog’s size — the knee moves with all three.
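
The two calculations described above are small enough to sketch directly. The function and type names below (recallAtLimit, pickKnee, SweepPoint) are hypothetical, and the result IDs would come from your own ANN and exact: true runs; this only shows the arithmetic.

```typescript
// Fraction of the exact (ENN) top-limit results that the ANN run also returned.
function recallAtLimit(annIds: string[], ennIds: string[]): number {
  if (ennIds.length === 0) return 1; // nothing to find, so nothing was missed
  const found = new Set(annIds);
  const hits = ennIds.filter((id) => found.has(id)).length;
  return hits / ennIds.length;
}

// One row of the sweep: recall averaged over the query set, plus p95 latency.
type SweepPoint = { numCandidates: number; recall: number; p95Ms: number };

// The knee: the smallest numCandidates whose mean recall is within tolerance of the ENN baseline (1.0).
function pickKnee(sweep: SweepPoint[], tolerance: number): SweepPoint | undefined {
  const ascending = sweep.slice().sort((a, b) => a.numCandidates - b.numCandidates);
  for (const point of ascending) {
    if (point.recall >= 1 - tolerance) return point;
  }
  return undefined; // no value in the sweep met the tolerance; widen the sweep
}
```

With a tolerance of 0.05, a sweep whose recall reads 0.80 at 25, 0.96 at 50, and 0.99 at 100 would settle on 50: the extra candidates at 100 buy little recall for the added latency.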

Turn no-matches into demand data

Optional add-on Intermediate

Every “no confident match” is a customer asking for something you don’t stock. Decide to log those misses — the descriptor and the near-miss scores — to a separate analytics collection instead of throwing them away.

New in this step

separate collection

Writing misses to search_misses, not products, so analytics writes never touch the catalog the recognise path reads.

privacy-aware logging

Storing only the descriptor and scores (what was wanted), never the raw photo (who asked) — capture the demand signal without holding personal images.

The signal hiding in your failures — captured privately

The recognise pipeline already computes everything a demand signal needs: the Gemini Vision descriptor (what the customer photographed) and the scores of the nearest products (how close you came). When every score falls below the threshold T, that’s not just a UX dead-end — it’s a data point: someone wanted this, and you couldn’t match it. Write each miss to a dedicated search_misses collection: the descriptor fields (category, brand, colour, form, visibleText, attributes), the top near-miss {name, score} entries, and a timestamp. Over time that collection is a real-time ledger of unmet demand, far richer than a restock guess. Keep it privacy-aware: store the descriptor and scores, not the raw user image, by default — you’re logging what was wanted, not who asked. (If you ever need images for QA, make it opt-in and short-retention; the descriptor alone drives the analytics.) Use a separate collection so analytics writes never touch the catalog the recognise path reads.

The miss document (shape)

Run these in your terminal / editor

// search_misses collection — one doc per "no confident match"
{
  at:          ISODate("2026-06-20T10:21:00Z"),
  descriptor:  { category: "sneakers", brand: "Unknown", colour: "teal",
                 form: "high-top", visibleText: "", attributes: ["canvas", "striped"] },
  nearMisses:  [ { name: "Trailblazer Low", score: 0.62 },
                 { name: "Court Classic",   score: 0.58 } ],   // best below-T scores
  threshold:   0.75
  // NOTE: no raw image stored — descriptor + scores only (privacy-aware default)
}
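
A typed builder makes the "descriptor and scores only" rule hard to break by accident. The sketch below assumes the field names from the shape above; the type names and the buildMiss helper itself are hypothetical, and the default of three near-misses mirrors the later handler steps.

```typescript
// Field names follow the miss document shape above; type names are illustrative.
type Descriptor = {
  category: string; brand: string; colour: string;
  form: string; visibleText: string; attributes: string[];
};
type Scored = { name: string; score: number };
type SearchMiss = { at: Date; descriptor: Descriptor; nearMisses: Scored[]; threshold: number };

// Builds one search_misses document: best below-threshold scores first, nothing but name and score kept.
function buildMiss(
  descriptor: Descriptor,
  ranked: Scored[],
  threshold: number,
  at: Date = new Date(),
  keep: number = 3,
): SearchMiss {
  const nearMisses = ranked
    .filter((m) => m.score < threshold)             // near-misses only (filter copies, input untouched)
    .sort((a, b) => b.score - a.score)              // highest score first
    .slice(0, keep)
    .map((m) => ({ name: m.name, score: m.score })); // drop any other fields, e.g. image references
  return { at, descriptor, nearMisses, threshold };
}
```

Because SearchMiss has no image field, a handler that inserts only the builder's return value cannot store the raw photo without a visible type change.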

Write the miss in the recognise path (Go)

Optional add-on Intermediate

In the Go /recognize handler, when nothing clears T, insert a miss document into search_misses before returning the clean no-match — fire-and-forget so analytics never slow or break the response.

New in this step

fire-and-forget

Kicking off the analytics insert and returning the response immediately; the customer’s no-match is never delayed by, or dependent on, the write.

goroutine

A lightweight concurrent function (go func(){...}()); it runs the insert off the request path with its own background context.

InsertOne

The driver call that writes one miss document into the search_misses collection; log its error, never return it to the client.

Log the miss without coupling it to the response

You already detect the no-match case (the confident list is empty). At that point, build the miss document from the descriptor you computed and the below-T matches you ranked, and InsertOne it into a separate search_misses collection. Do it without blocking or failing the request: the customer still gets their honest “no confident match” even if the analytics write errors, so log-and-ignore the insert error (or hand it to a small buffered channel / goroutine). Reuse the existing Mongo client with a different collection handle, and never store the raw image. This is the only new code on the hot path; the ranking that consumes it is shared by both backends and comes next.

Insert on no-match (v2 driver)

Run these in your terminal / editor

// inside POST /recognize, after thresholding, when confident is empty:
if len(confident) == 0 {
	miss := bson.D{
		{Key: "at", Value: time.Now()},
		{Key: "descriptor", Value: descriptor}, // the typed Vision descriptor
		{Key: "nearMisses", Value: topNearMisses(matches, 3)}, // [{name, score}] below T
		{Key: "threshold", Value: threshold},
	}
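	go func() { // fire-and-forget: analytics must never break the response
		// own context with a timeout: the request context is cancelled once the handler returns
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		if _, err := misses.InsertOne(ctx, miss); err != nil {
			slog.Error("search_miss insert failed", "err", err)
		}
	}()
	// the no-match branch still carries descriptor + filterApplied (the same shape as a match response)
	// Matches is a non-nil empty slice so JSON encodes [] rather than null, matching the TypeScript handler;
	// productMatch is a placeholder for whatever element type recognizeResp.Matches already uses
	writeJSON(w, recognizeResp{Descriptor: descriptor, FilterApplied: filterApplied, Matches: []productMatch{}, NoMatch: true})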
	return
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior Go engineer in this repo.
Context: POST /recognize computes a Vision descriptor and ranked matches with scores, then applies threshold T; the no-match branch returns { descriptor, filterApplied, matches: [], noMatch: true }. Mongo client is available; add a "search_misses" collection handle.
Task: On the no-match branch, record a miss document to search_misses without affecting the response.
Requirements:
- Build the miss from the descriptor (category/brand/colour/form/visibleText/attributes), the top 3 below-T {name, score} near-misses, the threshold, and a timestamp.
- Insert into a SEPARATE "search_misses" collection; the recognise path's catalog read is untouched.
- The write is fire-and-forget: an insert error is logged, never returned to the client; the no-match JSON is identical to before.
- Privacy-aware: do NOT store the raw uploaded image — descriptor + scores only.
Tests / acceptance:
- An out-of-catalog photo yields noMatch:true AND one new search_misses doc with the descriptor and below-T scores; no raw image field is present.
- Simulating an insert failure still returns the normal noMatch response.
Output: a unified diff plus where the analytics write is decoupled from the response.

Write the miss in the recognise path (TypeScript)

Optional add-on Intermediate

Mirror the miss-logging in the TypeScript /recognize handler: on the no-match branch, insert the descriptor and near-miss scores into search_misses without awaiting it on the response path.

New in this step

fire-and-forget

Starting the insert without await, so the no-match JSON returns immediately, independent of the write’s outcome.

void with .catch

void promise.catch(log) deliberately ignores the result while still handling a rejection, so an analytics failure is logged, never thrown onto the response.

insertOne

The driver call that writes one miss document into search_misses; identical in shape to the Go handler’s write for parity.

Same write, node driver

The TypeScript path matches the Go one. In the no-match branch, assemble the same miss document — descriptor, top below-T {name, score} near-misses, threshold, timestamp — and insertOne it into a separate search_misses collection handle. Don’t await it on the response path (or wrap it so a rejection is caught and logged): the no-match JSON returns regardless. Reuse the existing MongoClient, and store no raw image. Keeping both writes identical means the demand ledger is the same whichever backend is deployed — the same parity lesson as the recognise pipeline.

Insert on no-match (mongodb driver)

Run these in your terminal / editor

// inside POST /recognize, after thresholding
// `misses` is the search_misses collection handle, created once from the shared MongoClient
if (confident.length === 0) {
  const miss = {
    at: new Date(),
    descriptor,                                  // the typed Vision descriptor
    nearMisses: matches.slice(0, 3).map(m => ({ name: m.name, score: m.score })),
    threshold,
  };
  // fire-and-forget: analytics must never break the response
  void misses.insertOne(miss).catch(err => console.error("search_miss insert failed", err));
  // the no-match branch still carries descriptor + filterApplied (the same shape as a match response)
  return c.json({ descriptor, filterApplied, matches: [], noMatch: true });
}

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior TypeScript engineer in this repo.
Context: POST /recognize computes a descriptor and ranked scored matches, applies threshold T, and returns { descriptor, filterApplied, matches: [], noMatch: true } on a miss. The mongodb client is available; add a "search_misses" collection handle.
Task: On the no-match branch, record a miss document to search_misses, matching the Go handler.
Requirements:
- Build the miss from the descriptor, the top 3 below-T {name, score} near-misses, the threshold, and a timestamp.
- insertOne into a SEPARATE "search_misses" collection; do not touch the catalog read.
- Do not await on the response path, and always catch+log the rejection (an unhandled rejection can terminate the Node process): the no-match JSON is returned regardless of the insert outcome.
- Privacy-aware: store no raw image — descriptor + scores only.
Tests / acceptance:
- An out-of-catalog photo returns noMatch:true and creates one search_misses doc with descriptor + below-T scores; no image field; behaviour matches the Go handler.
- An insert rejection does not change the response.
Output: a unified diff plus a note on keeping the Go and TS miss documents identical.
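The two acceptance checks in the prompt can be tried without a database by isolating the no-match branch behind a small interface. This is a minimal sketch under stated assumptions: `MissSink`, `NoMatchBody`, and `respondNoMatch` are hypothetical names, and `MissSink` models only the single `insertOne` call used here, not the driver's full collection type.

```typescript
// Structural stand-in for the collection handle (assumption: only insertOne is modelled).
interface MissSink {
  insertOne(doc: object): Promise<unknown>;
}

interface NoMatchBody {
  descriptor: object;
  filterApplied: object;
  matches: never[];
  noMatch: true;
}

// The no-match branch in isolation: start the write, attach a catch, return at once.
function respondNoMatch(
  misses: MissSink,
  miss: object,
  descriptor: object,
  filterApplied: object,
  log: (msg: string, err: unknown) => void = console.error,
): NoMatchBody {
  // Not awaited: the response does not depend on the write's outcome.
  // The .catch keeps a failed insert from becoming an unhandled rejection.
  void misses.insertOne(miss).catch((err) => log("search_miss insert failed", err));
  return { descriptor, filterApplied, matches: [], noMatch: true };
}
```

A test can pass a sink whose `insertOne` returns `Promise.reject(new Error("db down"))` and assert that the returned body is unchanged and that the log callback fires once on a later tick. If the sink could also throw synchronously, wrapping the call as `Promise.resolve().then(() => misses.insertOne(miss))` would route that failure into the same `.catch`.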

Rank the most-wanted unstocked items

Optional add-on Intermediate

Aggregate search_misses into a leaderboard: group the misses by what was wanted (category, brand) and rank by how often it’s been requested — a real-time demand report.

New in this step

aggregation pipeline

A sequence of stages MongoDB runs over a collection to reshape and summarise it — here it turns raw misses into a demand report without app code.

mongodb aggregation pipeline track ↗ docs ↗

$group

Buckets documents by a key (category + brand) and computes per-bucket values like a request count and an average near-miss score.

mongodb aggregation $group stage docs ↗

$sort

Orders the grouped results — here by request count descending, so the most-wanted unstocked items come first.

mongodb aggregation $sort stage docs ↗

$limit

Caps the output to a top-N (e.g. 20), so the leaderboard stays a short, actionable list.

mongodb aggregation $limit stage docs ↗

From a ledger of misses to a buying signal

A pile of miss documents only becomes useful when you summarise it. An aggregation pipeline does the whole job in the database: $group the misses by the descriptor fields that define “the same kind of thing” (descriptor.category plus descriptor.brand, optionally colour/form), count each group inside that $group with { $sum: 1 }, $sort descending by the count, and $limit to a top-N. Add the average of each miss’s best near-miss score per group and you also learn how close you came: a group with high demand and high near-miss scores is a strong “stock something almost like this” signal, while high demand with low scores points to something genuinely new. The ranking is identical for both backends (it’s one aggregation, no language-specific logic), and because it reads only search_misses it never burdens the recognise path. Surface it on an internal GET /analytics/top-misses endpoint or in a scheduled report.
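The "how close you came" reading can be made concrete with a small classifier over one leaderboard row. This is only a sketch: the `classifyDemand` name, the `DemandSignal` labels, and especially the `minRequests` and `closeScore` cutoffs are hypothetical placeholders that would need calibrating against real miss data.

```typescript
type DemandSignal = "stock-something-close" | "genuinely-new" | "low-demand";

interface MissGroup {
  requests: number;
  avgNearMiss: number | null; // null when the group has no near-miss scores at all
}

// Cutoffs are illustrative assumptions, not values from the tutorial.
function classifyDemand(
  group: MissGroup,
  minRequests = 5,
  closeScore = 0.6,
): DemandSignal {
  if (group.requests < minRequests) return "low-demand";
  const cameClose = group.avgNearMiss !== null && group.avgNearMiss >= closeScore;
  // High demand + high near-miss scores: the catalog almost has it.
  // High demand + low (or no) scores: nothing similar is stocked.
  return cameClose ? "stock-something-close" : "genuinely-new";
}
```

Keeping the cutoffs as parameters lets the same function be tuned alongside the match threshold rather than hard-coding a second set of magic numbers.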

Most-requested unstocked items (aggregation)

Run these in your terminal / editor

db.search_misses.aggregate([
  // optional: { $match: { at: { $gte: ISODate("2026-06-01") } } },  // a time window
  { $group: {
      _id: { category: "$descriptor.category", brand: "$descriptor.brand" },
      requests:       { $sum: 1 },
      avgNearMiss:    { $avg: { $max: "$nearMisses.score" } },  // how close we got
      lastRequested:  { $max: "$at" },
  }},
  { $sort: { requests: -1 } },
  { $limit: 20 },
  // flatten to the contract's shape: { category, brand, requests, avgNearMiss, lastRequested }
  { $project: {
      _id: 0,
      category: "$_id.category", brand: "$_id.brand",
      requests: 1, avgNearMiss: 1, lastRequested: 1,
  }},
])
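To sanity-check what the pipeline above should return, the same grouping can be written in plain TypeScript and run over a handful of seeded miss objects. This is an illustrative in-memory equivalent, not a replacement for the aggregation. The `Miss` and `TopMiss` types and the `topMisses` function are hypothetical, and a missing brand is represented as `null` here as a simplification.

```typescript
interface Miss {
  at: Date;
  descriptor: { category: string; brand?: string };
  nearMisses: { name: string; score: number }[];
  threshold: number;
}

interface TopMiss {
  category: string;
  brand: string | null;
  requests: number;
  avgNearMiss: number | null;
  lastRequested: Date;
}

function topMisses(docs: Miss[], limit = 20): TopMiss[] {
  const groups = new Map<string, { row: TopMiss; bestSum: number; bestCount: number }>();
  for (const d of docs) {
    const brand = d.descriptor.brand ?? null;
    const key = JSON.stringify([d.descriptor.category, brand]); // the group key
    let g = groups.get(key);
    if (!g) {
      g = {
        row: { category: d.descriptor.category, brand, requests: 0, avgNearMiss: null, lastRequested: d.at },
        bestSum: 0,
        bestCount: 0,
      };
      groups.set(key, g);
    }
    g.row.requests += 1; // mirrors { $sum: 1 }
    if (d.at.getTime() > g.row.lastRequested.getTime()) g.row.lastRequested = d.at; // mirrors $max of at
    if (d.nearMisses.length > 0) {
      // best near-miss of this document; documents without scores do not affect the average
      g.bestSum += Math.max(...d.nearMisses.map((n) => n.score));
      g.bestCount += 1;
    }
  }
  return Array.from(groups.values())
    .map((g) => ({ ...g.row, avgNearMiss: g.bestCount > 0 ? g.bestSum / g.bestCount : null }))
    .sort((a, b) => b.requests - a.requests) // mirrors $sort: { requests: -1 }
    .slice(0, limit); // mirrors $limit
}
```

Seeding the same (category, brand) pair several times and comparing this function's output with the endpoint's response is one way to check the counts and the flat shape.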

Agent prompt — paste into an agent with repo access

For Claude Code / Cursor / an agent that can read & edit this repo.

Role: Senior backend engineer in this repo (use the selected backend).
Context: a search_misses collection holds { at, descriptor, nearMisses:[{name,score}], threshold } documents.
Task: Add GET /analytics/top-misses returning the most-requested unstocked items.
Requirements:
- Aggregate search_misses: $group by descriptor.category + descriptor.brand; count requests; compute avg of each doc's best near-miss score; track lastRequested.
- $sort by requests desc, $limit to a top-N (default 20); accept an optional ?since date that adds a $match window.
- $project the grouped result to the FLAT contract shape { category, brand, requests, avgNearMiss, lastRequested } (lift category/brand out of the _id group key; drop _id) — do not return a nested _id object.
- Read ONLY search_misses (never the catalog); this ranking is identical across backends — keep it as one aggregation.
- Frame the output as demand signal: high requests + high avg near-miss = "stock something close"; high requests + low scores = "genuinely new".
Tests / acceptance:
- Seeding misses for the same (category, brand) several times ranks that pair at the top with the right count, each entry exposing flat category + brand fields (no nested _id).
- The ?since filter excludes older misses.
Output: a unified diff plus the aggregation and the endpoint shape.
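One way to satisfy the optional ?since requirement is to build the stage array in a pure function and prepend a $match only when a valid date is supplied. The sketch below assumes that approach. `parseSince` and `buildTopMissesPipeline` are hypothetical helper names, and silently ignoring an unparseable date is a choice made for brevity (an endpoint might prefer to reject it with a 400).

```typescript
type Stage = Record<string, unknown>;

// Parse the ?since query value; returns undefined when absent or not a valid date.
function parseSince(raw: string | undefined): Date | undefined {
  if (!raw) return undefined;
  const d = new Date(raw);
  return Number.isNaN(d.getTime()) ? undefined : d;
}

// Build the leaderboard pipeline; the $match window is added only when `since` is given.
function buildTopMissesPipeline(since?: Date, limit = 20): Stage[] {
  const stages: Stage[] = [];
  if (since) stages.push({ $match: { at: { $gte: since } } });
  stages.push(
    {
      $group: {
        _id: { category: "$descriptor.category", brand: "$descriptor.brand" },
        requests: { $sum: 1 },
        avgNearMiss: { $avg: { $max: "$nearMisses.score" } },
        lastRequested: { $max: "$at" },
      },
    },
    { $sort: { requests: -1 } },
    { $limit: Math.max(1, Math.floor(limit)) }, // $limit must be a positive integer
    {
      $project: {
        _id: 0,
        category: "$_id.category",
        brand: "$_id.brand",
        requests: 1,
        avgNearMiss: 1,
        lastRequested: 1,
      },
    },
  );
  return stages;
}
```

In the handler, the result would be passed to the search_misses collection's `aggregate(...)` and read with `toArray()`; the stage contents are the same as the mongosh pipeline shown earlier in this step.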

Where to take it next

Go deep on the store that carries this whole build: the MongoDB track — documents, aggregation, and Atlas Search / Vector Search.
Sharpen the backend you chose: Go or TypeScript.
Build the recognise UI on another platform: Jetpack Compose, Flutter, or SwiftUI.
See the database contrast on the Compare page: MongoDB scores 5 here and Postgres 2 — the exact inverse of Aurora Commerce, where a transactional checkout makes Postgres the 5. And unlike Helix Assistant’s text RAG, Catalens matches a photo to product records, not a prose answer.
