Pick your backend (Go or TypeScript) and frontend (Compose, Flutter, or SwiftUI) above — the steps
below adapt. Watch the spotlight: by the ★ recognise step, a photo becomes a query vector and a single
$vectorSearch aggregation finds every catalog product that clears a confidence threshold — vector
similarity and a category pre-filter in one query over documents that each carry their own attributes. The
backend language is a shell around that store; the match is MongoDB.
How this differs from Aurora Commerce. Same retail world, opposite database lesson. Aurora owns transactional truth: checkout as one ACID transaction where Postgres makes overselling impossible. Catalens owns the flexible catalog and the vector match: a read-mostly, schema-volatile collection where the win is similarity plus a document pre-filter in one aggregation. That is why Postgres scores 5 there and 2 here, and MongoDB the reverse.
And how it differs from Helix Assistant. Helix is text RAG: retrieve chunks over pgvector and ground a prose answer. Catalens is multimodal: read a photo into a structured descriptor and match it to product records over Atlas Vector Search. That is entity matching with scores and a threshold, not a chatbot.
Install the local toolchain you'll need
Beginner
Install the few tools every step assumes (your backend runtime, the Mongo shell, and a sample image) so the on-ramp never breaks midway. Three quick setup steps stand between you and the first by-hand $vectorSearch match; knock them out once.
New in this step
mongosh: The MongoDB Shell, a command-line client for running commands and queries against a cluster, used here for the connection smoke-test.
base64: A way to turn binary bytes (like an image) into plain text so they can ride inside a JSON request body to Gemini.
What this project needs on your machine — and it costs nothing
Nothing here is exotic, but two early steps assume tools that aren’t always present, so install them once now:
- Your backend runtime: the Go toolchain (1.22+, for the default path) or Node (20+, for the TypeScript path). Only the one you pick.
- mongosh, the MongoDB Shell, for the connection smoke-test. Install it from the official download, or skip the install and use the built-in shell in the Atlas UI (Cluster → “Connect” → “MongoDB Shell”).
- curl and base64 (both ship with macOS and Linux) for the Gemini smoke-test.
- A sample product JPEG you can match later: grab any product photo and save it as sample.jpg in your working directory. Pick products you can actually photograph (or screenshot), because the match compares a text descriptor of your photo against a text descriptor of the catalog (more on that at ingest).
Costs nothing. Every tool here is free and open-source; no account or card is needed for this step.
Confirm the tools respond
go version # default backend — OR: node --version (TypeScript backend)
mongosh --version # or use the Atlas UI's built-in shell instead
curl --version
base64 --help 2>&1 | head -1 # present on macOS and Linux
ls sample.jpg # a product photo you saved (any product you can also photograph)
Stand up a free MongoDB Atlas M0 cluster
Beginner
Create a free Atlas M0 cluster and copy its connection string. Atlas Vector Search runs on the free tier, and a local mongod cannot build the index you need.
New in this step
MongoDB Atlas: MongoDB’s fully-managed cloud database, the only place the Vector Search index this project needs can be built.
M0 free tier: Atlas’s no-card free cluster; small but a real cluster, and Vector Search runs on it, so the whole project costs nothing.
Atlas Vector Search: The Lucene-backed index that finds documents by vector similarity; it is the spotlight feature the entire build leans on.
replica set: A small group of synced database copies; Atlas runs one even on M0, which is what enables features like change streams later.
SRV connection string: The mongodb+srv://... address your code connects with; it carries the user, password, and cluster host in one line.
Why Atlas, not local mongod — and it costs nothing
The whole project hinges on Atlas Vector Search, a Lucene-backed vector index that is part of Atlas, not
the core database engine. A local mongod (or the Docker image) can store documents and even hold an
embedding array, but it cannot build the Atlas Vector Search index that $vectorSearch queries — so for
this build the database lives in Atlas from day one. The M0 free tier is a real replica set, and Atlas
Vector Search and Atlas Search both run on it, so the entire project is free to complete.
Costs nothing. M0 is free (no card), and the vector + text search indexes it supports are part of that tier — confirm the current limits on the Atlas docs.
Set your connection string
# From the Atlas UI: create a free M0 cluster, add a database user, allow your IP,
# then copy the SRV connection string into your environment:
export MONGODB_URI="mongodb+srv://USER:PASS@cluster0.xxxx.mongodb.net/?retryWrites=true&w=majority"
export MONGODB_DB="catalens"
# Ping it with mongosh (installed earlier) — or use the Atlas UI's built-in shell:
mongosh "$MONGODB_URI" --eval 'db.runCommand({ ping: 1 })' # -> { ok: 1 }
What success looks like
The ping command returns { ok: 1 }, so the SRV string and your IP allow-list reach the cluster:
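{ ok: 1 }
An error here tells you which part to fix: bad auth points to the database user or password, getaddrinfo ENOTFOUND points to a mistyped cluster host, and a connection timeout usually points to the IP allow-list. Fix it now, because every later step needs this connection.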
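One frequent cause of a bad auth error is a password containing characters such as @, : or /, which must be percent-encoded inside a connection string. The sketch below assembles the SRV string from its parts; buildSrvUri is a hypothetical helper name, and the host and query options simply mirror the placeholder export shown earlier.

```typescript
// Hypothetical helper (not part of the MongoDB driver): builds an SRV URI
// and percent-encodes the credentials so reserved characters survive.
function buildSrvUri(user: string, password: string, host: string): string {
  const encodedUser = encodeURIComponent(user);
  const encodedPassword = encodeURIComponent(password);
  // Query options copied from the placeholder connection string in this step.
  return `mongodb+srv://${encodedUser}:${encodedPassword}@${host}/?retryWrites=true&w=majority`;
}

// Example with a password that would break an unencoded URI.
const exampleUri = buildSrvUri("app", "p@ss:w/rd", "cluster0.xxxx.mongodb.net");
console.log(exampleUri);
```

If you copy the string from the Atlas UI and paste the password in by hand, applying the same encoding to the password alone has the same effect.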
Get a free Gemini key and confirm Vision + embeddings respond
Beginner
Get a free Google AI Studio key and make two quick calls: one that reads an image into JSON, and one that returns an embedding vector.
New in this step
Gemini Vision: Google’s multimodal model reading an image and answering about it; here, turning a product photo into a structured description.
embedding vector: A list of numbers a model produces from text so that similar meanings land near each other; this is what makes similarity search possible.
outputDimensionality: How many numbers the embedding has (e.g. 768); you must fix one value and use it everywhere so all vectors are comparable.
L2-normalize: Scaling a vector to length 1; required for valid cosine scores when the model does not already return unit-length vectors.
cosine similarity: A measure of how aligned two vectors are, ignoring their length; the similarity metric this catalog match uses.
Two Gemini capabilities, one key — and it costs nothing
Catalens uses Gemini for two different jobs, both on the free tier of a single Google AI Studio key: Vision (turn a photo into a structured descriptor) and embeddings (turn text into a vector). Confirm both respond before building on them. Keep the key server-side — the mobile app always calls your backend, never Gemini directly. Don’t hard-code a model id in your app: ids change, so read the current ones from the official model list and pin them in config.
Costs nothing. A Google AI Studio key has a free tier that covers Vision and embeddings for this project. Docs: image understanding — https://ai.google.dev/gemini-api/docs/image-understanding · embeddings — https://ai.google.dev/gemini-api/docs/embeddings · models — https://ai.google.dev/gemini-api/docs/models · Go SDK — https://pkg.go.dev/google.golang.org/genai
Smoke-test both capabilities (REST, portable across macOS + Linux)
export GEMINI_API_KEY="...your AI Studio key..." # server-side only
# B64 of your sample image, portably: `base64 -w0` is GNU-only and fails on macOS/BSD,
# so strip newlines instead — this works everywhere:
IMG_B64=$(base64 < sample.jpg | tr -d '\n')
# 1) Vision: read an image into JSON. MODEL = a current vision model (see the models docs).
curl -s "https://generativelanguage.googleapis.com/v1beta/models/MODEL:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" -H 'Content-Type: application/json' -d '{
"contents":[{"parts":[
{"inlineData":{"mimeType":"image/jpeg","data":"'"$IMG_B64"'"}},
{"text":"Describe this product as JSON: brand, category, colour."}
]}],
"generationConfig":{"responseMimeType":"application/json"}
}'
# 2) Embeddings: turn text into a vector. EMBED_MODEL = a current embedding model.
curl -s "https://generativelanguage.googleapis.com/v1beta/models/EMBED_MODEL:embedContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" -H 'Content-Type: application/json' -d '{
"content":{"parts":[{"text":"red leather low-top sneaker"}]},
"outputDimensionality":768
}'
What success looks like
Call 1 returns JSON in the response’s text part ({"brand":"…","category":"…","colour":"…"}), not prose; that is responseMimeType:"application/json" doing its job. Call 2 returns an embedding.values array of exactly 768 floats (the outputDimensionality you pinned). If your chosen model is gemini-embedding-001, its 768-dim output is unnormalized (only its full 3072 dims auto-normalize); other models may differ, so check the docs (the warning above covers this). A 400 or 403 that mentions the API key means a bad or unauthorised key; an empty candidates array on call 1 means the model returned no usable answer for the image. Both calls must be clean before you build on either capability.
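To make the normalization point concrete, here is a minimal sketch of an l2normalize helper plus a cosine check, assuming plain number arrays. The function names are illustrative, not part of any SDK. Strictly speaking, cosine similarity is unchanged by scaling a vector, so unit-length vectors mainly keep stored data consistent and make a plain dot product equal the cosine; the sketch follows the tutorial in normalizing anyway.

```typescript
// Euclidean length of a vector.
function magnitude(v: number[]): number {
  return Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
}

// Scale a vector to length 1; a zero vector has no direction, so reject it.
function l2normalize(v: number[]): number[] {
  const norm = magnitude(v);
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

// Cosine similarity: dot product divided by the product of the lengths.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  return dot / (magnitude(a) * magnitude(b));
}

const unit = l2normalize([3, 4]);
console.log(unit, magnitude(unit));
```

A quick sanity check on real output is to print magnitude(values) for one returned embedding: a value far from 1.0 tells you the model did not normalize at your chosen dimension.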
Model the flexible product catalog and seed it
Beginner
Design each product as a MongoDB document whose attributes depend on its category, then seed a small sample catalog to match against: the products every later step will try to recognise.
New in this step
document: MongoDB’s record, a JSON-like object that can hold nested fields, so one product can carry whatever its category needs.
heterogeneous documents: Documents in one collection with different shapes (a sneaker vs a tea); the flexibility that makes a relational schema fight back.
descriptor: A short structured summary of a product (brand, category, attributes) that both the catalog and the photo get reduced to before matching.
idempotent upsert: An insert-or-update keyed on (brand, name) so re-running the seed updates existing products instead of duplicating them.
Heterogeneous documents are the point
A real catalog isn’t uniform: a sneaker has a size run and a material, a tea has a flavour and a caffeine
level, a chair has dimensions and a finish. In Postgres that is a wide sparse table, an EAV side-table, or a
jsonb column you’ve stopped validating. In MongoDB each product is just a document with the fields its
category needs, plus a few common fields every match relies on — category, brand, name,
attributes, and (added next) an embedding vector. The shared fields are what you pre-filter on; the
per-category attributes are what make the catalog flexible. This heterogeneity is precisely why the document
store, not a relational schema, is load-bearing here.
Seed products you can actually photograph. The match (next steps) compares a text descriptor of your photo against a text descriptor of each catalog product — not pixels against pixels — so for the demo to return a real match, the seeded products must be things you can take (or screenshot/generate) a recognisable photo of. Pick a handful of clearly distinct, photographable items.
Why a text embedding of a descriptor matches a photo at all
A fair question: why not embed the image directly? Two honest roads to visual search exist — (1) a multimodal image embedding (compare photo-pixels to catalog-image-pixels), or (2) turn the photo into words with Vision, then compare those words to the catalog’s words. Catalens takes road (2) on purpose: it keeps the whole project on one free embedding model for catalog and query, gives you a human-readable descriptor you can debug, and makes every match explainable. The cost is the invariant this creates — you must build the same descriptor text and embed it the same way for both the catalog (ingest) and the photo (query), or the vectors aren’t comparable. That shared builder is the next step.
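As an illustration of that invariant, a shared builder might look like the sketch below. The field choice, ordering, and separator are assumptions made for the example, not the tutorial's actual embeddingText implementation; the point is only that the output is deterministic, so the same fields always yield the same string on both the ingest and the query side.

```typescript
// Hypothetical input shape: the common fields plus free-form attributes.
interface DescriptorInput {
  brand?: string;
  category: string;
  attributes?: Record<string, unknown>;
}

// Build one stable, lowercased string: attribute keys are sorted so that
// key order in the document never changes the text (or its embedding).
function embeddingText(d: DescriptorInput): string {
  const parts: string[] = [];
  if (d.brand) parts.push(`brand: ${d.brand}`);
  parts.push(`category: ${d.category}`);
  const attrs = d.attributes ?? {};
  for (const key of Object.keys(attrs).sort()) {
    const value = attrs[key];
    const text = Array.isArray(value) ? value.join(" ") : String(value);
    parts.push(`${key}: ${text}`);
  }
  return parts.join("; ").toLowerCase();
}

console.log(
  embeddingText({
    brand: "Northpeak",
    category: "sneakers",
    attributes: { material: "leather", colour: "red" },
  }),
);
```

Because the string is human-readable, a poor match can be debugged by printing the two texts side by side before looking at any vectors.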
Two products, two shapes (same collection)
// products collection — each doc carries only the attributes its category needs
{
name: "Trailblazer Low",
brand: "Northpeak",
category: "sneakers",
attributes: { colour: "red", material: "leather", style: "low-top", sizes: [7, 8, 9, 10] },
price: 8900, // cents
inStock: true
// embedding: [ ... ] added at ingest
}
{
name: "Sencha Green",
brand: "LeafAndCo",
category: "tea",
attributes: { flavour: "grassy", caffeine: "medium", grams: 100, origin: "Uji" },
price: 1200,
inStock: true
}
Agent prompt: paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: MongoDB Atlas M0 reachable via MONGODB_URI; database MONGODB_DB="catalens"; collection "products".
Task: Write a seed script that inserts ~15 sample products across at least 3 categories with per-category attributes.
Requirements:
- Every doc has common fields: name, brand, category, attributes (object), price (int cents), inStock (bool).
- Categories differ in their attributes (e.g. sneakers: colour/material/sizes; tea: flavour/caffeine/grams).
- Pick visually-distinct, photographable products (a learner must be able to take/screenshot a matching photo).
- Keep `category` values to a small known set (e.g. sneakers, tea, chair) — a later step constrains Vision to this exact set.
- Leave room for an `embedding` field added by the ingest step; do not invent it here.
- Idempotent: a re-run upserts products by (brand, name); it does not duplicate.
Tests / acceptance:
- After running, the products collection has ~15 docs spanning 3+ categories; querying by category returns the right shapes.
Output: a unified diff plus the seed command to run it.
What success looks like
db.products.countDocuments() returns ~15 and db.products.distinct("category") lists your ≥3 categories (e.g. [ "sneakers", "tea", "chair" ]). Each doc carries name, brand, category, attributes, price, inStock — but no embedding yet (ingest adds it). Re-running the seed leaves the count unchanged: the (brand, name) upsert is idempotent, not duplicating.
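The idempotent (brand, name) upsert can be sketched as a list of bulk operations. This is a minimal sketch, assuming the official Node driver's bulkWrite operation shape (updateOne with filter, update, and upsert); SeedProduct and toUpsertOps are illustrative names, and the agent you prompt may structure the seed differently.

```typescript
// Hypothetical seed shape: the common fields from this step, no embedding yet.
interface SeedProduct {
  name: string;
  brand: string;
  category: string;
  attributes: Record<string, unknown>;
  price: number; // integer cents
  inStock: boolean;
}

// One updateOne-with-upsert per product, keyed on (brand, name):
// a first run inserts, a re-run overwrites the same document.
function toUpsertOps(products: SeedProduct[]) {
  return products.map((p) => ({
    updateOne: {
      filter: { brand: p.brand, name: p.name },
      update: { $set: p },
      upsert: true,
    },
  }));
}

const ops = toUpsertOps([
  { name: "Sencha Green", brand: "LeafAndCo", category: "tea",
    attributes: { flavour: "grassy", caffeine: "medium" }, price: 1200, inStock: true },
]);
console.log(JSON.stringify(ops, null, 2));
```

In a real seed script the array would be handed to the products collection's bulkWrite; using $set rather than a full replacement leaves any embedding added later by ingest in place.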
Embed each product and store the vector on its document
Intermediate
For every product, build one short descriptive text, embed it with Gemini, and write the resulting vector back onto the document’s embedding field. This is what fills the searchable side of the match.
New in this step
ingest pass: The one-time (and re-runnable) job that embeds every product up front, so search compares against vectors that already exist.
embeddingHash: A fingerprint of the text a product was embedded from; if it is unchanged, the ingest can safely skip re-embedding that product.
sha256: A standard hash that turns any text into a fixed 64-character fingerprint, used here as the embeddingHash value.
Ingest builds the searchable side of the match
Vector search compares vectors, so each product needs one. Use the shared embeddingText(...) builder from
the previous step, send its output to the Gemini embedding model, then l2normalize the result and store
it on embedding. Three rules make or break the match later: (1) the embedding’s length must equal the
numDimensions you’ll set on the index, so choose an outputDimensionality D and keep it fixed for the
whole catalog; (2) you must embed the query the same way you embed the catalog — same model, same
dimensions, same normalization — or the vectors aren’t comparable; (3) if your chosen model does not auto-normalize at smaller dimensions, L2-normalize every vector (the helper you just wrote), or cosine scores come back distorted.
Also store an embeddingHash (sha256 of the embedding text) so the ingest and the later live worker can
skip a product whose descriptive text hasn’t changed. Keep the API key server-side.
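The skip rule is small enough to show on its own. The sketch below uses Node's built-in crypto module; embeddingHashOf and needsEmbedding are hypothetical names, and the only field read from the document is the embeddingHash this step defines.

```typescript
import { createHash } from "node:crypto";

// sha256 of the embedding text, as a 64-character lowercase hex string.
function embeddingHashOf(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// True when the product has never been embedded or its text has changed.
function needsEmbedding(doc: { embeddingHash?: string }, text: string): boolean {
  return doc.embeddingHash !== embeddingHashOf(text);
}

const text = "brand: northpeak; category: sneakers";
console.log(embeddingHashOf(text).length, needsEmbedding({}, text));
```

Hashing the text rather than the whole document means a price or stock change does not trigger a re-embed, since neither is part of what gets embedded.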
Iterate products → text → embed → normalize → store (language-agnostic)
for doc in db.products.find():                                 # iterate the real docs so _id is in scope
    text = embeddingText(doc)                                   # SAME shared builder as query + worker
    hash = sha256(text)
    if doc.embeddingHash == hash: continue                      # unchanged → skip (idempotent re-run)
    vector = gemini.embed(text, outputDimensionality = D)       # D fixed for the whole catalog
    if model_needs_norm: vector = l2normalize(vector)           # REQUIRED for valid cosine if not auto-normalized
    db.products.updateOne({ _id: doc._id }, { $set: { embedding: vector, embeddingHash: hash } })
# Use the SAME model + D + normalize step when you embed the query photo's descriptor later.
Chat prompt: paste into a chat to get the code
Role: Gemini + MongoDB ingest engineer. The reader has no repo here — return complete code.
Context: products collection in Atlas (MONGODB_URI, MONGODB_DB); GEMINI_API_KEY server-side; the selected backend (Go or TypeScript); the shared embeddingText(...) + l2normalize(...) helpers already exist.
Task: Implement an ingest pass that embeds every product and stores the vector + content hash on its document.
Requirements:
- Iterate db.products.find() so each doc's _id is in scope; build the embedding text with the shared embeddingText(doc).
- Call the current Gemini embedding model (read GEMINI_EMBED_MODEL from config) with a fixed outputDimensionality D.
- L2-normalize the returned vector if your chosen model does not auto-normalize at D, using the shared helper; store the float array on `embedding`.
- Compute hash = sha256(embeddingText); store it on `embeddingHash`; skip docs whose stored embeddingHash already equals the new hash (idempotent).
- D is a single config constant reused at query time; document that ingest and query MUST share model + D + normalization.
- Batch or rate-limit politely; keep the key server-side; link the official embeddings docs instead of hard-coding a model id.
Tests / acceptance (describe):
- After a run, every product has an `embedding` array of length D and a non-empty `embeddingHash`.
- Each stored embedding has L2 magnitude ~1.0 (normalized if the model required it).
- Re-running without data changes performs no re-embedding (hashes match).
Output: the complete ingest script, no commentary.
What success looks like
After one run, db.products.findOne() shows a new embedding array of length D (768) and a 64-char hex embeddingHash on every product; db.products.countDocuments({ embedding: { $exists: false } }) returns 0. Each vector’s magnitude is ~1.0 (you ran l2normalize), which is the precondition the cosine index assumes. Re-running prints “skipped” for every product (hashes match) and makes no Gemini calls — the embeddingHash guard is what makes ingest idempotent.
Create the Atlas Vector Search index
Intermediate
Define a vectorSearch index on the products collection: the embedding field as a vector, plus the metadata fields you’ll pre-filter on declared as filter. Without this index there is nothing for $vectorSearch to query.
New in this step
vector search index: A special Atlas index over your embedding field that makes nearest-neighbour lookups fast; $vectorSearch only works once it exists.
numDimensions: The vector length the index expects; it must equal the embedding length you ingested with, or the query fails. This is the top setup mistake.
filter field: A metadata field (here category/brand/inStock) declared in the index so it can be used to pre-filter inside the vector search.
$vectorSearch: The aggregation stage that runs the nearest-neighbour search; it carries the query vector, the candidate pool, the limit, and any filter.
numCandidates: How many vectors the approximate search examines before returning the best limit; bigger means more accurate but slower (~20× limit to start).
vectorSearchScore: The per-result similarity score, pulled into your output with $meta: "vectorSearchScore"; it is what you later threshold on.
The index is what makes vector search possible
$vectorSearch needs a special index, created in the Atlas UI or via the API on your collection. The
definition is a fields array: one entry of type: "vector" for the embedding path — with numDimensions
equal to your embedding length and similarity set to cosine (the usual choice for text/descriptor
embeddings; euclidean and dotProduct are the alternatives) — and one type: "filter" entry for each
metadata field you want to pre-filter on (category, brand, and inStock — the last one for the Visual
Substitutes feature later). Only fields declared as filter can be used in the stage’s filter. Name the
index products_vec — you reference that name in $vectorSearch. numDimensions must match the model output
you chose at ingest; a mismatch is the single most common setup error. Docs:
https://www.mongodb.com/docs/atlas/atlas-vector-search/
Vector Search index definition (JSON)
{
"fields": [
{ "type": "vector", "path": "embedding", "numDimensions": 768, "similarity": "cosine" },
{ "type": "filter", "path": "category" },
{ "type": "filter", "path": "brand" },
{ "type": "filter", "path": "inStock" }
]
}
See it by hand in mongosh (early payoff)
# Query the index directly to prove it works before building the API.
# (Wait a minute or two for the Atlas index build to finish first).
# Use a real vector array from your own ingest output.
mongosh "$MONGODB_URI" --eval '
db.getSiblingDB("catalens").products.aggregate([
{
$vectorSearch: {
index: "products_vec",
path: "embedding",
queryVector: [0.1, 0.2, -0.1], // STUB — replace with a real D-dim (768) vector from your ingest output; a length mismatch errors
numCandidates: 60,
limit: 3,
filter: { category: { $eq: "sneakers" } }
}
},
{ $project: { _id: 0, name: 1, brand: 1, score: { $meta: "vectorSearchScore" } } }
])
'
The spotlight, proven by hand before any backend exists
In the Atlas UI the products_vec index shows Status: Active (it takes a minute to build). With a real 768-float vector from your ingest output as queryVector, the aggregation returns up to 3 in-category docs, each with a score — best-first — straight from $meta:"vectorSearchScore":
[ { name: 'Trailblazer Low', brand: 'Northpeak', score: 0.91 },
{ name: 'Court Classic', brand: 'Northpeak', score: 0.79 } ]
A queryVector length ≠ 768 errors with Path 'embedding' needs ... dimensions; an empty result means the index is still building or the category filter excluded everything. This scored list, with no app yet, is the spotlight.
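The same aggregation can be assembled in backend code as plain objects. This is a sketch under stated assumptions: buildVectorSearchPipeline and its options type are hypothetical names, the 20× numCandidates ratio is the starting point suggested in the glossary above, and the dimension check simply fails early on the mismatch the index would otherwise reject.

```typescript
interface VectorSearchOptions {
  index: string;     // e.g. "products_vec"
  dims: number;      // must equal the index's numDimensions
  limit: number;
  category?: string; // optional pre-filter value
}

// Returns the two-stage pipeline used by hand above: $vectorSearch, then $project.
function buildVectorSearchPipeline(
  queryVector: number[],
  opts: VectorSearchOptions,
): Record<string, unknown>[] {
  if (queryVector.length !== opts.dims) {
    throw new Error(`queryVector has ${queryVector.length} dims, index expects ${opts.dims}`);
  }
  const stage: Record<string, unknown> = {
    index: opts.index,
    path: "embedding",
    queryVector,
    numCandidates: opts.limit * 20, // starting ratio; tune for your catalog
    limit: opts.limit,
  };
  // Only fields declared as "filter" in the index may appear here.
  if (opts.category) stage["filter"] = { category: { $eq: opts.category } };
  return [
    { $vectorSearch: stage },
    { $project: { _id: 0, name: 1, brand: 1, score: { $meta: "vectorSearchScore" } } },
  ];
}

const pipeline = buildVectorSearchPipeline([0.1, 0.2, -0.1], {
  index: "products_vec", dims: 3, limit: 3, category: "sneakers",
});
console.log(JSON.stringify(pipeline, null, 2));
```

The example uses a 3-dimensional vector only so it prints compactly; against the real index dims would be 768.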
Design the recognise pipeline and the Vision descriptor schema
Intermediate
Sketch the end-to-end flow and lock down the responseSchema that turns a photo into a typed product descriptor: the contract the rest of the pipeline depends on.
New in this step
structured output: Telling Gemini to answer in JSON (responseMimeType: "application/json") instead of prose, so the result is parseable, not a caption.
responseSchema: A schema you pass with the request that pins the exact fields and types Gemini must return; here the typed product descriptor.
enum field: A schema field restricted to a fixed list of allowed values; constraining category to your seeded categories keeps Vision from inventing synonyms.
category pre-filter: Narrowing the vector search to one category before comparing vectors, inside the same query; faster and more precise, but it must match the catalog’s exact category names.
A typed descriptor keeps the AI honest
The recognise flow is a short pipeline: photo → Gemini Vision (structured descriptor) → embedding →
$vectorSearch (with a pre-filter) → threshold → ranked matches or no-match. The fragile link is the first
hop — a free-text image caption is unusable downstream. So constrain Vision with
responseMimeType: "application/json" and a responseSchema: ask for brand, category, attributes,
visibleText, colour, and form as typed fields. A typed descriptor gives you two things at once: a clean
string to embed, and structured fields (category, brand) to drive the $vectorSearch pre-filter. Treat
the output as best guess, not identification — the model can be wrong, which is exactly why the threshold
and the no-confident-match path exist.
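Because the model's output is a best guess, it helps to validate it again on your side before using it. The sketch below parses the JSON text and keeps only well-typed fields; parseDescriptor, VisionDescriptor, and the CATEGORIES list are illustrative names, with the three category values taken from the examples in this tutorial.

```typescript
// Assumed catalog categories; keep this list identical to what you seeded.
const CATEGORIES = ["sneakers", "tea", "chair"] as const;
type Category = (typeof CATEGORIES)[number];

interface VisionDescriptor {
  category: Category;
  brand?: string;
  colour?: string;
  form?: string;
  visibleText?: string;
  attributes: string[];
}

function optionalString(v: unknown): string | undefined {
  return typeof v === "string" && v.length > 0 ? v : undefined;
}

// Returns a typed descriptor, or null when the text is not usable.
function parseDescriptor(jsonText: string): VisionDescriptor | null {
  let raw: unknown;
  try {
    raw = JSON.parse(jsonText);
  } catch {
    return null; // not JSON at all
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const obj = raw as Record<string, unknown>;
  const category = CATEGORIES.find((c) => c === obj["category"]);
  if (category === undefined) return null; // missing or outside the known set
  const attrs = obj["attributes"];
  return {
    category,
    brand: optionalString(obj["brand"]),
    colour: optionalString(obj["colour"]),
    form: optionalString(obj["form"]),
    visibleText: optionalString(obj["visibleText"]),
    attributes: Array.isArray(attrs) ? attrs.filter((a): a is string => typeof a === "string") : [],
  };
}

console.log(parseDescriptor('{"category":"sneakers","brand":"Northpeak","attributes":["low-top"]}'));
```

A null result can be routed to the same no-confident-match response as a below-threshold score, so the client handles one failure shape.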
Constrain category to your catalog’s known values. The pre-filter does an exact $eq on category, so
if Vision answers “shoe” or “footwear” while your catalog says “sneakers”, a correct pre-filter silently
excludes the right product — a false no-match that looks like “not in catalog”. Close that gap two ways, both
used: (1) make category an enum in the responseSchema listing exactly the categories you seeded, so
the model must pick one of them; and (2) at query time, fall back to an unfiltered search if the
category-filtered search returns nothing, so a mislabel degrades to “search everything” rather than a silent
miss. The pre-filter stays a win — it just can’t be allowed to be a silent loss.
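The fallback and the threshold combine into a few lines of control flow. This is a sketch, assuming the actual $vectorSearch call is injected as a function so the logic stays testable; matchWithFallback, SearchFn, and the example threshold are hypothetical and not taken from the tutorial's code.

```typescript
interface Match {
  name: string;
  score: number;
}

// Injected search: with a category it pre-filters, without one it searches everything.
type SearchFn = (category?: string) => Promise<Match[]>;

async function matchWithFallback(
  search: SearchFn,
  category: string | undefined,
  threshold: number,
): Promise<{ matches: Match[]; usedFallback: boolean }> {
  let results = await search(category);
  let usedFallback = false;
  // A filtered search that finds nothing degrades to an unfiltered one.
  if (category !== undefined && results.length === 0) {
    results = await search(undefined);
    usedFallback = true;
  }
  // Only results that clear the threshold count as confident matches.
  const matches = results.filter((m) => m.score >= threshold);
  return { matches, usedFallback };
}

// Example with a stubbed search: the "shoe" filter finds nothing, the fallback does.
const stubSearch: SearchFn = async (category) =>
  category === "shoe" ? [] : [{ name: "Trailblazer Low", score: 0.91 }, { name: "Court Classic", score: 0.62 }];

matchWithFallback(stubSearch, "shoe", 0.75).then((r) => console.log(r));
```

Returning usedFallback alongside the matches lets you log how often Vision's category disagreed with the catalog, which is a useful signal when tuning the enum.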
The Vision responseSchema (category constrained to the catalog enum)
{
"type": "object",
"properties": {
"brand": { "type": "string" },
"category": { "type": "string", "enum": ["sneakers", "tea", "chair"] },
"colour": { "type": "string" },
"form": { "type": "string" },
"visibleText": { "type": "string" },
"attributes": { "type": "array", "items": { "type": "string" } }
},
"required": ["category"]
}
Scaffold the Go API and connect to Atlas
Go Beginner
Create a Go module, open a MongoDB client with the official v2 driver, and expose a POST /recognize skeleton that accepts an uploaded image. This is the entrypoint every later Go step builds on.
New in this step
go mod init: Creates the module (the versioned root every package imports from); github.com/you/catalens becomes your import path.
mongo-driver v2: The official Go MongoDB driver; one go get of its mongo package pulls the whole module, but you import mongo, options, and bson by their full paths.
bson.ObjectID: The v2 type for a Mongo _id (renamed from v1’s primitive.ObjectID); you convert it to a string for the API response.
http.ServeMux: Go’s standard request router; POST /recognize patterns route by method and path with no third-party framework.
multipart/form-data: The HTTP body format for uploading a file (the photo) plus optional fields (a category hint) in one request.
Why the official v2 driver — and the three sub-packages you import
go.mongodb.org/mongo-driver/v2 is the current official driver. One go get of the module’s mongo
sub-package pulls the whole module, but the code uses three of its packages, so all three go in your
import block: .../v2/mongo (the client + mongo.Connect, mongo.Pipeline), .../v2/mongo/options
(options.Client().ApplyURI(...)), and .../v2/bson (bson.D, bson.ObjectID — note v2 renamed
primitive.ObjectID to bson.ObjectID). Open one mongo.Client at startup and reuse it; pass a context on
every call. This step is also the app entrypoint that the spec calls cmd/api/main.go: it loads config,
opens the client, registers routes, and shuts the client down. /recognize takes a multipart/form-data
image (plus an optional category hint) and, for now, just decodes it and returns 200 — you fill in the
pipeline next.
Set up the module (one go get pulls the whole module)
go mod init github.com/you/catalens
go get go.mongodb.org/mongo-driver/v2/mongo # brings mongo, mongo/options, and bson
go get google.golang.org/genai # Gemini Go SDK (Vision + embeddings), used next step
Client + /recognize skeleton (full import block)
// cmd/api/main.go (essentials): the app entrypoint that composes everything
package main

import (
	"context"
	"io"
	"log"
	"net/http"
	"os"

	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
	// bson is imported in the recognise step where the aggregation is built:
	// "go.mongodb.org/mongo-driver/v2/bson"
)

func main() {
	// Open one client at startup and reuse it for every request.
	client, err := mongo.Connect(options.Client().ApplyURI(os.Getenv("MONGODB_URI")))
	if err != nil {
		log.Fatal(err)
	}
	defer func() { _ = client.Disconnect(context.Background()) }()
	products := client.Database(os.Getenv("MONGODB_DB")).Collection("products")

	mux := http.NewServeMux()
	mux.HandleFunc("POST /recognize", func(w http.ResponseWriter, r *http.Request) {
		// The photo arrives as the "image" part of a multipart/form-data body.
		file, _, err := r.FormFile("image")
		if err != nil {
			http.Error(w, "image required", http.StatusBadRequest)
			return
		}
		defer file.Close()
		img, err := io.ReadAll(file)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		categoryHint := r.FormValue("category") // optional pre-filter hint
		_ = img
		_ = categoryHint
		_ = products
		// next step: Vision descriptor -> embed -> $vectorSearch -> threshold
		w.WriteHeader(http.StatusOK)
	})

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}
	log.Fatal(http.ListenAndServe(":"+port, mux))
}
Agent prompt: paste into an agent with repo access
Role: Senior Go engineer in this repo.
Context: Atlas via MONGODB_URI + MONGODB_DB; module go.mongodb.org/mongo-driver/v2. The three sub-packages used across the build are mongo, mongo/options, and bson (one `go get .../v2/mongo` pulls the whole module). Gemini SDK is google.golang.org/genai.
Task: Scaffold cmd/api/main.go as the app entrypoint: a Mongo client and a POST /recognize that accepts a multipart image and an optional category field.
Requirements:
- Import the sub-packages you actually use (mongo, mongo/options; bson lands in the recognise step) with their full paths so the file compiles as-is.
- Connect once at startup with options.Client().ApplyURI(MONGODB_URI); Disconnect on shutdown; every DB call takes r.Context().
- POST /recognize reads the "image" file part (reject with 400 if missing) and an optional "category" form value; returns 200 for now.
- Read the listen port from PORT (default 8080); compose routes in main; no business logic yet.
Tests / acceptance:
- `go build ./...` passes (no undefined options/bson symbols); posting a multipart image to /recognize returns 200; posting none returns 400.
Output: a unified diff plus a one-line note on client reuse.
What success looks like
go build ./... compiles with the imported sub-packages resolved (no undefined: options); bson joins the import block in the recognise step, where it is first used. With the server running, curl -F image=@sample.jpg localhost:8080/recognize returns 200 (empty body for now), and curl -X POST localhost:8080/recognize with no file returns 400 with image required. The skeleton accepts the multipart upload and rejects a missing part before any pipeline exists.
Scaffold the TypeScript API and connect to Atlas
TypeScript Beginner
Create a Node service with the official mongodb driver and a POST /recognize skeleton that accepts an uploaded image. This is the entrypoint every later TypeScript step builds on.
New in this step
mongodb driver: The official Node package for talking to MongoDB; you create one client at startup and reuse it everywhere.
MongoClient: The driver’s connection object; new MongoClient(uri) then await connect() opens the pooled connection once.
Hono: A small, typed web framework used here for routing and parsing the uploaded form.
parseBody: Hono’s helper that reads a multipart/form-data request into fields and files; it is how you pull the image part out.
multipart/form-data: The HTTP body format for uploading a file (the photo) plus optional fields (a category hint) in one request.
One typed language to the client
The mongodb package is the official Node driver; create one MongoClient at startup and reuse it. Use a
small web framework (Hono here) for routing and multipart parsing. /recognize accepts a
multipart/form-data image plus an optional category hint and returns 200 for now — the pipeline lands in
the next step. Keeping the backend in TypeScript means the same types describe the API the mobile client
calls.
Set up the project
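npm init -y
npm pkg set type=module scripts.start="tsx src/server.ts" # ES module for top-level await; tsx as the runner is an assumption, any TypeScript runner works
npm add mongodb hono @hono/node-server @google/genai
npm add -D tsx typescript @types/node
Client + /recognize skeleton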
// src/server.ts
import { Hono } from "hono";
import { serve } from "@hono/node-server";
import { MongoClient } from "mongodb";

// Open one client at startup and reuse it for every request.
const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect(); // top-level await: the package must be an ES module
const products = client.db(process.env.MONGODB_DB).collection("products");

const app = new Hono();
app.post("/recognize", async (c) => {
  // The photo arrives as the "image" part of a multipart/form-data body.
  const body = await c.req.parseBody();
  const image = body["image"];
  if (!(image instanceof File)) return c.json({ error: "image required" }, 400);
  const bytes = Buffer.from(await image.arrayBuffer());
  const categoryHint = typeof body["category"] === "string" ? body["category"] : undefined;
  void bytes; void categoryHint; void products;
  // next step: Vision descriptor -> embed -> $vectorSearch -> threshold
  return c.body(null, 200);
});

// Read the port from PORT (default 8080), matching the Go skeleton.
serve({ fetch: app.fetch, port: Number(process.env.PORT) || 8080 });
Agent prompt: paste into an agent with repo access
Role: Senior TypeScript engineer in this repo.
Context: Atlas via MONGODB_URI + MONGODB_DB; official mongodb driver; Hono for routing.
Task: Scaffold src/server.ts with a MongoClient and POST /recognize accepting a multipart image and optional category.
Requirements:
- Create and connect the MongoClient once at startup; reuse the products collection handle.
- POST /recognize parses the "image" file part (400 if missing) and an optional "category" field; returns 200 for now.
- Port from PORT (default 8080); no business logic yet.
Tests / acceptance:
- The server starts; posting a multipart image to /recognize returns 200, posting none returns 400.
Output: a unified diff plus the run command.
What success looks like
npm start boots and connects the MongoClient once at startup (no per-request reconnect). curl -F image=@sample.jpg localhost:8080/recognize returns 200 (empty body), and a POST with no file returns 400 { "error": "image required" } — byte-for-byte the same contract as the Go skeleton, which is the parity the two backends hold to.
Extract the Vision descriptor from a photo with Gemini
Intermediate
Call the Gemini Vision model from your backend, passing the photo and the responseSchema to get back the structured JSON descriptor.
New in this step
generateContent The Gemini call that sends content (your image + prompt) and returns the model’s answer — here the typed descriptor.
inline image data Sending the photo’s bytes inside the request (a Blob/inlineData part) instead of a URL, so no upload step is needed for a one-shot scan.
Constraining the model with responseSchema in code
You designed the JSON schema earlier; now you pass it to the SDK. By enforcing responseMimeType: "application/json" and translating the schema into the SDK’s type system, you guarantee Gemini returns a typed descriptor rather than prose. Don’t hard-code a model id — read the current Vision model from config (GEMINI_VISION_MODEL) and pick it from the official model list, since ids change.
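One way to follow the "read it from config" advice is to validate the environment once at startup. The sketch below is an assumption about how that could look, not this project's actual loader: `loadConfig`, `requireVar`, and the `AppConfig` shape are hypothetical names, while the variable names are the ones this walkthrough already uses. It takes the environment as a plain record so it can be exercised without a real process.

```typescript
// Hypothetical config loader: fail fast at startup instead of on the first scan.
interface AppConfig {
  mongodbUri: string;
  mongodbDb: string;
  geminiApiKey: string;
  visionModel: string; // a current vision model id, never hard-coded
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value, or throws if the variable is unset or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`missing required env var ${name}`);
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    mongodbUri: requireVar(env, "MONGODB_URI"),
    mongodbDb: requireVar(env, "MONGODB_DB"),
    geminiApiKey: requireVar(env, "GEMINI_API_KEY"),
    visionModel: requireVar(env, "GEMINI_VISION_MODEL"),
  };
}
```

In a Node backend this would typically be called once as `loadConfig(process.env)`, so a missing GEMINI_VISION_MODEL stops the server at boot rather than failing in the middle of a request.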
Vision call (Go)
// Using the google.golang.org/genai SDK
// import "google.golang.org/genai"
client, err := genai.NewClient(ctx, &genai.ClientConfig{
APIKey: os.Getenv("GEMINI_API_KEY"),
Backend: genai.BackendGeminiAPI,
})
if err != nil { /* handle */ }
model := os.Getenv("GEMINI_VISION_MODEL") // a current vision model id, from config
schema := &genai.Schema{
Type: genai.TypeObject,
Properties: map[string]*genai.Schema{
"brand": {Type: genai.TypeString},
"category": {Type: genai.TypeString, Enum: []string{"sneakers", "tea", "chair"}},
"colour": {Type: genai.TypeString},
"form": {Type: genai.TypeString},
"visibleText": {Type: genai.TypeString},
"attributes": {Type: genai.TypeArray, Items: &genai.Schema{Type: genai.TypeString}},
},
Required: []string{"category"},
}
// imgBytes is the byte slice of the uploaded image
contents := []*genai.Content{
genai.NewContentFromParts([]*genai.Part{
{InlineData: &genai.Blob{Data: imgBytes, MIMEType: "image/jpeg"}},
genai.NewPartFromText("Describe this product as JSON: brand, category, colour."),
}, genai.RoleUser),
}
resp, err := client.Models.GenerateContent(ctx, model, contents, &genai.GenerateContentConfig{
ResponseMIMEType: "application/json",
ResponseSchema: schema,
})
if err != nil { /* handle */ }
// resp.Candidates[0].Content.Parts[0].Text contains the JSON string
Vision call (TypeScript)
// Using the @google/genai SDK
// import { GoogleGenAI, Type } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// imgBuffer is the Buffer of the uploaded image
const response = await ai.models.generateContent({
model: process.env.GEMINI_VISION_MODEL!, // a current vision model id, from config
contents: [
{ inlineData: { mimeType: "image/jpeg", data: imgBuffer.toString("base64") } },
"Describe this product as JSON: brand, category, colour."
],
config: {
responseMimeType: "application/json",
responseSchema: {
type: Type.OBJECT,
properties: {
brand: { type: Type.STRING },
category: { type: Type.STRING, enum: ["sneakers", "tea", "chair"] },
colour: { type: Type.STRING },
form: { type: Type.STRING },
visibleText: { type: Type.STRING },
attributes: { type: Type.ARRAY, items: { type: Type.STRING } },
},
required: ["category"],
}
}
});
// response.text contains the JSON string (text is a getter, not a method)
What success looks like
The Vision call returns the typed descriptor as a JSON string (parse it into your struct), not a caption — responseSchema guarantees the shape and forces category to one of your enum values:
{ "brand": "Northpeak", "category": "sneakers", "colour": "red",
"form": "low-top", "visibleText": "", "attributes": ["leather"] }
Because responseSchema constrains category to your enum, the model picks one of sneakers/tea/chair rather than a synonym like "shoe" or "footwear", which is what lets the next step’s $eq pre-filter fire reliably. An unrecognisable photo still returns the same shape, just with vaguer values; treat it as a best guess, not an identification.
★ Recognise a photo end to end (Go)
Go Advanced
Fill in /recognize: send the image to Gemini Vision for a structured descriptor, embed the descriptor, then run one $vectorSearch aggregation — pre-filtered by category — that returns ranked matches with their scores.
New in this step
ANN search Approximate nearest-neighbour search: $vectorSearch returns the most likely nearest vectors quickly instead of scanning every one, a speed/accuracy trade-off that numCandidates tunes.
$project The aggregation stage that selects and reshapes output fields; here it drops _id, adds the string id, and surfaces the score.
$toString Converts the BSON _id to a plain string id in the response, so clients (and the substitutes feature) can reference a product stably.
(1+cosine)/2 score Atlas maps raw cosine [-1,1] into [0,1] as (1 + cosine) / 2, so an unrelated item floors near 0.5, not 0 — which is why a naive low threshold is meaningless.
This is the spotlight: one aggregation does the match
Everything converges here. Vision returns the typed descriptor; you build the same embedding text you used at
ingest and embed it (same model, same dimensions); that query vector goes straight into a $vectorSearch
stage. The stage carries five things: the index name, the path (embedding), the queryVector, a
numCandidates (set it roughly 20× your limit for accuracy), and a limit. The filter —
{ category: { $eq: descriptor.category } } — is a pre-filter: Atlas narrows to the right category
before the vector comparison, in the same pipeline, and it doesn’t distort the score. Source the category
from the Vision descriptor, not the optional category form value (which is normally omitted), so the
pre-filter actually fires on a real scan. A $project stage pulls each document plus its similarity via
{ $meta: "vectorSearchScore" } and renames _id to a string id (the substitutes feature looks products up
by it). Note what isn’t here: no join, no second query, no app-side re-ranking — the heterogeneous catalog and
the vector index do the work together. That co-location is why MongoDB scores 5 on this build.
The category gap: pre-filter, then fall back to unfiltered
The pre-filter is a win that must not become a silent loss. If Vision labels a photo "footwear" while your
catalog says "sneakers" — or labels a genuinely unstocked item with a real category that simply has no close
product — the category-filtered $vectorSearch can return zero documents, which looks identical to “not in
catalog”. So when the filtered search comes back empty, retry the same search with no filter before
declaring a no-match: a mislabel degrades to “search everything”, and only a truly out-of-catalog photo reaches
the threshold step with nothing. Report which path ran via filterApplied — the category string when the
filtered search produced the matches, or null when you fell back to unfiltered — so the client can show the
spotlight pre-filter at work (and show it stepping aside when it had to).
The $vectorSearch pipeline (v2 driver)
// recognize: descriptor -> queryVector already computed (same model+dims as ingest)
// search runs the aggregation once; filterCategory == "" means no pre-filter.
search := func(filterCategory string) ([]bson.M, error) {
vs := bson.D{
{Key: "index", Value: "products_vec"},
{Key: "path", Value: "embedding"},
{Key: "queryVector", Value: queryVector}, // []float32 length D
{Key: "numCandidates", Value: 100}, // ~20x limit
{Key: "limit", Value: 5},
}
if filterCategory != "" {
vs = append(vs, bson.E{Key: "filter", Value: bson.D{{Key: "category", Value: bson.D{{Key: "$eq", Value: filterCategory}}}}})
}
pipeline := mongo.Pipeline{
bson.D{{Key: "$vectorSearch", Value: vs}},
bson.D{{Key: "$project", Value: bson.D{
{Key: "_id", Value: 0}, // suppress ObjectId so only string id appears
{Key: "id", Value: bson.D{{Key: "$toString", Value: "$_id"}}}, // string id for /substitutes etc.
{Key: "name", Value: 1}, {Key: "brand", Value: 1}, {Key: "category", Value: 1},
{Key: "attributes", Value: 1}, {Key: "price", Value: 1}, {Key: "inStock", Value: 1},
{Key: "score", Value: bson.D{{Key: "$meta", Value: "vectorSearchScore"}}},
}}},
}
cur, err := products.Aggregate(r.Context(), pipeline)
if err != nil { return nil, err }
var out []bson.M
return out, cur.All(r.Context(), &out)
}
// Pre-filter on the DESCRIPTOR'S category (not the optional form value, which is usually empty).
filterApplied := descriptor.Category
matches, err := search(filterApplied)
if err != nil { http.Error(w, err.Error(), 500); return }
if len(matches) == 0 && filterApplied != "" { // category gap: a mislabel degrades to "search everything"
filterApplied = "" // record that we fell back to unfiltered
if matches, err = search(""); err != nil { http.Error(w, err.Error(), 500); return }
}
// matches are ranked best-first, each with a "score" in [0,1] and a string "id"; threshold them next.
// filterApplied is the category string when the pre-filter produced these, or "" (-> null in JSON) on fallback.
Agent prompt — paste into an agent with repo access
Before you run it: you photograph a seeded red sneaker. Which value drives the pre-filter — the descriptor's `category` or the (empty) `category` form field? And with Atlas's (1+cosine)/2 mapping, will the right product's `score` land nearer 0.5 or 0.9?
Role: Senior Go engineer integrating Gemini + Atlas Vector Search in this repo.
Context: net/http API; go.mongodb.org/mongo-driver/v2 (mongo, bson, options); products collection has an `embedding` field and a "products_vec" Vector Search index; GEMINI_API_KEY server-side.
Task: Implement POST /recognize: image -> Gemini Vision descriptor (responseSchema) -> embed (same model+dims as ingest) -> $vectorSearch (category pre-filter, with an unfiltered fallback) -> ranked matches with scores.
Requirements:
- Vision call uses inline image data + responseMimeType "application/json" + a responseSchema for {brand, category, colour, form, visibleText, attributes[]}.
- Embed the descriptor text with the SAME model and outputDimensionality used at ingest; do not hard-code a model id — read it from config and link the official docs.
- Build the pipeline: $vectorSearch { index, path:"embedding", queryVector, numCandidates ~20x limit, limit, filter category $eq <descriptor.category> } then $project the fields plus id:{$toString:"$_id"} and score:{$meta:"vectorSearchScore"}.
- Source the pre-filter category from the Vision DESCRIPTOR'S category, not the optional `category` form value (which is normally omitted).
- Category gap: if the category-filtered search returns ZERO docs, retry the SAME search WITHOUT the filter before declaring anything; set filterApplied to the category when the filtered search produced the matches, or null after a fallback.
- Use r.Context() throughout; return JSON { descriptor, filterApplied, matches:[{id, ...product, score}] } ordered best-first.
- Frame results as best-match-by-similarity, never identification.
Tests / acceptance:
- A photo of a seeded product returns that product among the top matches with a string `id` and a score in [0,1]; filterApplied echoes its category.
- A mislabelled or empty-filtered category retries unfiltered (filterApplied:null) and still returns matches, not an error — only the threshold step decides no-match.
Output: a unified diff plus a one-paragraph note on why the match needs no join or second query.
What success looks like
curl -F image=@testdata/sneaker.jpg localhost:8080/recognize | jq returns the full envelope: the Vision descriptor, filterApplied:"sneakers" (the descriptor’s category drove the pre-filter), and matches ordered best-first, each with a string id and a score:
{ "descriptor": { "category": "sneakers", "colour": "red", "...": "..." },
"filterApplied": "sneakers",
"matches": [ { "id": "6630…", "name": "Trailblazer Low", "score": 0.88 } ],
"noMatch": false }
The right product sits well above 0.5. One aggregation did the match — no join, no second query. If the category-filtered search returned nothing, filterApplied comes back null (the unfiltered fallback ran) but you still get ranked matches — only the threshold step decides no-match.
★ Recognise a photo end to end (TypeScript)
TypeScript Advanced
Implement the same /recognize pipeline with the Node driver: Vision descriptor → embed → one $vectorSearch aggregation, pre-filtered by category, returning ranked matches and scores.
New in this step
ANN search Approximate nearest-neighbour search: $vectorSearch returns the most likely nearest vectors quickly instead of scanning every one, a speed/accuracy trade-off that numCandidates tunes.
$project The aggregation stage that selects and reshapes output fields; here it drops _id, adds the string id, and surfaces the score.
$toString Converts the BSON _id to a plain string id in the response, so clients (and the substitutes feature) can reference a product stably.
(1+cosine)/2 score Atlas maps raw cosine [-1,1] into [0,1] as (1 + cosine) / 2, so an unrelated item floors near 0.5, not 0 — which is why a naive low threshold is meaningless.
Same pipeline, Node driver
The shape is identical to the Go path — only the syntax differs. Build the descriptor with a Vision call,
embed it with the same model and dimensions as ingest, and pass the query vector into a $vectorSearch stage
with index, path, queryVector, numCandidates (~20× limit), limit, and a filter pre-filter on the
descriptor’s category. A $project renames _id to a string id and adds
score: { $meta: "vectorSearchScore" }, and aggregate(...).toArray() returns the ranked matches. As in Go,
if the category-filtered search returns nothing, retry it unfiltered before the threshold step sees anything,
and report which path ran via filterApplied. Keeping the two backends behaviourally identical is the parity
lesson: swap the language, the document-plus-vector store still carries the match.
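The pipeline below reads `descriptor.category` and treats a missing value as "no pre-filter". A minimal sketch of how the Vision JSON string might be parsed so that this holds is shown here. It is an assumption, not the project's parser: `parseDescriptor`, `asCategory`, and `asString` are hypothetical helpers, and mapping an out-of-enum category to `null` (rather than rejecting it) is a design choice made so that a mislabel falls through to the unfiltered search instead of raising an error.

```typescript
// Hypothetical descriptor guard; the enum values mirror the responseSchema used for the Vision call.
const CATEGORIES = ["sneakers", "tea", "chair"] as const;
type Category = (typeof CATEGORIES)[number];

interface Descriptor {
  brand: string | undefined;
  category: Category | null; // null = missing or outside the enum, so no pre-filter
  colour: string | undefined;
  form: string | undefined;
  visibleText: string | undefined;
  attributes: string[];
}

// Returns the matching enum member, or null for anything else (including non-strings).
function asCategory(value: unknown): Category | null {
  const found = CATEGORIES.find((c) => c === value);
  return found ?? null;
}

function asString(value: unknown): string | undefined {
  return typeof value === "string" ? value : undefined;
}

function parseDescriptor(json: string): Descriptor {
  const raw: unknown = JSON.parse(json); // throws SyntaxError if the text is not JSON
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("descriptor must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const attrs = obj["attributes"];
  return {
    brand: asString(obj["brand"]),
    category: asCategory(obj["category"]),
    colour: asString(obj["colour"]),
    form: asString(obj["form"]),
    visibleText: asString(obj["visibleText"]),
    // Keep only string entries; anything else is dropped rather than trusted.
    attributes: Array.isArray(attrs)
      ? attrs.filter((a): a is string => typeof a === "string")
      : [],
  };
}
```

With this shape, `descriptor.category ?? null` yields the enum value for a clean scan and `null` for a mislabel, which is the same signal the fallback path already reports through filterApplied.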
The $vectorSearch pipeline (mongodb driver)
// recognize: queryVector already computed (same model+dims as ingest)
// search runs the aggregation once; pass "" to skip the pre-filter.
async function search(filterCategory: string) {
const pipeline: object[] = [
{
$vectorSearch: {
index: "products_vec",
path: "embedding",
queryVector, // number[] length D
numCandidates: 100, // ~20x limit
limit: 5,
...(filterCategory ? { filter: { category: { $eq: filterCategory } } } : {}),
},
},
{
$project: {
_id: 0, // suppress ObjectId so only string id appears
id: { $toString: "$_id" }, // string id for /substitutes etc.
name: 1, brand: 1, category: 1, attributes: 1, price: 1, inStock: 1,
score: { $meta: "vectorSearchScore" },
},
},
];
return products.aggregate(pipeline).toArray();
}
// Pre-filter on the DESCRIPTOR'S category (not the optional form value, which is usually empty).
let filterApplied: string | null = descriptor.category ?? null;
let matches = await search(filterApplied ?? "");
if (matches.length === 0 && filterApplied) { // category gap: degrade to "search everything"
filterApplied = null; // record that we fell back to unfiltered
matches = await search("");
}
// matches are ranked best-first, each with a string `id` and a numeric `score` in [0,1]; threshold them next.
// filterApplied is the category when the pre-filter produced these, or null after a fallback.
Agent prompt — paste into an agent with repo access
You post the SAME sneaker photo to the Go handler and this one. Which fields of the JSON envelope must be identical, and which one (`score`) may differ slightly and why?
Role: Senior TypeScript engineer integrating Gemini + Atlas Vector Search in this repo.
Context: Hono/Node API; official mongodb driver; products collection has `embedding` and a "products_vec" Vector Search index; GEMINI_API_KEY server-side.
Task: Implement POST /recognize matching the Go version one-for-one: image -> Vision descriptor (responseSchema) -> embed -> $vectorSearch (category pre-filter, with an unfiltered fallback) -> ranked matches with scores.
Requirements:
- Vision call uses inline image data + responseMimeType "application/json" + a responseSchema for {brand, category, colour, form, visibleText, attributes[]}.
- Embed with the SAME model + outputDimensionality as ingest; do not hard-code a model id — read it from config and link the official docs.
- Pipeline: $vectorSearch { index, path:"embedding", queryVector, numCandidates ~20x limit, limit, filter category $eq <descriptor.category> } then $project id:{$toString:"$_id"} + fields + score:{$meta:"vectorSearchScore"}; aggregate(...).toArray().
- Source the pre-filter category from the Vision DESCRIPTOR'S category, not the optional `category` form value (which is normally omitted).
- Category gap: if the category-filtered search returns ZERO docs, retry the SAME search WITHOUT the filter before declaring anything; set filterApplied to the category when the filtered search produced the matches, or null after a fallback.
- Return JSON { descriptor, filterApplied, matches:[{id, ...product, score}] } best-first; frame as similarity, not identification.
Tests / acceptance:
- A seeded product's photo appears among the top matches with a string `id` and a score in [0,1]; filterApplied echoes its category; responses match the Go handler.
- A mislabelled or empty-filtered category retries unfiltered (filterApplied:null) and still returns matches, not an error — only the threshold step decides no-match.
Output: a unified diff plus a note on keeping Go and TypeScript responses identical.
What success looks like
The same curl -F image=@testdata/sneaker.jpg localhost:8080/recognize returns the same envelope as the Go handler — same field names, same filterApplied:"sneakers", the same product first, noMatch:false. The score may differ by a hair (floating-point in the embed/normalize path), but every field name and the ordering match: swap the language, the document-plus-vector store still carries the match. That parity is exactly what the integration tests later assert.
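The parity claim above can be made concrete with a small comparison helper. This is a sketch under stated assumptions, not the integration tests themselves: `envelopeDiff` and the `Envelope` type are hypothetical names, and the score tolerance of 0.001 is an arbitrary illustrative value for "differs by a hair".

```typescript
// Hypothetical parity check between the Go and TypeScript /recognize responses.
interface Match {
  id: string;
  name: string;
  score: number;
}

interface Envelope {
  descriptor: Record<string, unknown>;
  filterApplied: string | null;
  matches: Match[];
  noMatch: boolean;
}

// Returns human-readable differences; an empty array means the envelopes agree.
// Scores are compared within a tolerance, everything else must match exactly.
function envelopeDiff(a: Envelope, b: Envelope, scoreTolerance = 0.001): string[] {
  const diffs: string[] = [];
  const keysA = Object.keys(a.descriptor).sort().join(",");
  const keysB = Object.keys(b.descriptor).sort().join(",");
  if (keysA !== keysB) diffs.push(`descriptor fields: ${keysA} vs ${keysB}`);
  if (a.filterApplied !== b.filterApplied) {
    diffs.push(`filterApplied: ${a.filterApplied} vs ${b.filterApplied}`);
  }
  if (a.noMatch !== b.noMatch) diffs.push(`noMatch: ${a.noMatch} vs ${b.noMatch}`);
  if (a.matches.length !== b.matches.length) {
    diffs.push(`matches length: ${a.matches.length} vs ${b.matches.length}`);
    return diffs; // positional comparison is meaningless when lengths differ
  }
  a.matches.forEach((m, i) => {
    const other = b.matches[i];
    if (other === undefined) return;
    if (m.id !== other.id) diffs.push(`matches[${i}].id: ${m.id} vs ${other.id}`);
    if (Math.abs(m.score - other.score) > scoreTolerance) {
      diffs.push(`matches[${i}].score: ${m.score} vs ${other.score}`);
    }
  });
  return diffs;
}
```

Comparing ids position by position checks ordering as well as membership, so a handler that returns the right products in a different rank order is reported as a difference.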
Apply the confidence threshold or return no match
Intermediate
Filter the ranked matches to those whose score clears a threshold T; if none do, return a clean “no confident match” instead of a wrong guess.
New in this step
confidence threshold T The score cutoff that separates a real match from the nearest stranger; results at or above T are shown, and anything below it is treated as a no-match.
no-match as 200 Returning an honest empty result with HTTP 200 (not a 404 or error), so the client can still show what Vision saw — “I looked and found nothing confident” is success, not failure.
The score is a confidence, so make a decision with it
$vectorSearch always returns up to limit nearest neighbours — even for a photo of something not in the
catalog, the “nearest” is just less near. The vectorSearchScore (0 to 1 for cosine, higher is closer) is
how you tell a real match from the nearest stranger. Apply a threshold T: keep matches with a score at or
above T, ranked; if the list is empty, return a structured noMatch so the UI shows “no confident match”
rather than a misleading product. Expose the raw score in the response so the client can show it and so you
can tune T. Carry the descriptor and the filterApplied value from the recognise step into both
branches of the response — the client shows what Vision saw and which pre-filter fired (or that it fell back)
whether or not anything cleared T. This single decision — return ranked confident matches, or admit you
don’t know — is what keeps the feature honest.
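The decision described above can be sketched as one pure function. This is an illustrative TypeScript version under stated assumptions: `applyThreshold`, `ScoredMatch`, and `RecognizeResponse` are hypothetical names, and the envelope fields are the ones this step describes.

```typescript
// Hypothetical threshold step: keep confident matches, or report an honest no-match.
interface ScoredMatch {
  id: string;
  score: number; // vectorSearchScore in [0,1], higher is closer
}

interface RecognizeResponse<D, M extends ScoredMatch> {
  descriptor: D;
  filterApplied: string | null;
  matches: M[];
  noMatch: boolean;
}

function applyThreshold<D, M extends ScoredMatch>(
  descriptor: D,
  filterApplied: string | null,
  ranked: M[], // already ordered best-first by the search
  t: number,
): RecognizeResponse<D, M> {
  // filter() preserves input order, so the survivors stay best-first.
  const confident = ranked.filter((m) => m.score >= t);
  // descriptor and filterApplied are carried into both branches.
  return { descriptor, filterApplied, matches: confident, noMatch: confident.length === 0 };
}
```

Because the function never throws on an empty result, the handler can serialise its return value with HTTP 200 in both cases and let `noMatch` carry the outcome.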
Threshold the ranked matches (language-agnostic)
results = vectorSearch(...) # ranked, each with score in [0,1]; filterApplied set by recognise
confident = [ m for m in results if m.score >= T ]
if confident is empty:
return { descriptor, filterApplied, matches: [], noMatch: true } # HTTP 200, an honest "I don't know"
else:
return { descriptor, filterApplied, matches: confident, noMatch: false }
# filterApplied is the category the pre-filter used, or null after an unfiltered fallback.
# expose each match's score so the client can display it and the slider can re-threshold.
Agent prompt — paste into an agent with repo access
You photograph an item with NO match in the catalog. With T=0.75 and Atlas's (1+cosine)/2 mapping, roughly what score do the nearest items get — and does the API return a 404, an error, or a 200? What is in the body?
Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize returns ranked matches each with a `score` in [0,1], plus the Vision `descriptor` and a `filterApplied` value (the pre-filter category, or null after an unfiltered fallback).
Task: Add a confidence threshold T and a no-match path.
Requirements:
- Keep only matches with score >= T (T from config, default e.g. 0.75); preserve best-first order.
- If none clear T, return { descriptor, filterApplied, matches: [], noMatch: true } (HTTP 200, not an error); otherwise { descriptor, filterApplied, matches, noMatch: false }.
- Carry descriptor AND filterApplied into BOTH branches so the client can render them whether or not anything matched.
- Always include each match's numeric score in the JSON so the client can display it and re-threshold locally.
Tests / acceptance:
- A clear in-catalog photo returns at least one match above T with noMatch:false and filterApplied echoing the category.
- A photo of something absent returns matches:[] and noMatch:true, still carrying descriptor + filterApplied (no exception, no wrong product).
Output: a unified diff plus where T is configured.
What success looks like
With T=0.75, the in-catalog sneaker photo returns noMatch:false and at least one match whose score ≥ 0.75. A photo of an out-of-catalog item returns an HTTP 200 (not an error) with matches:[] and noMatch:true, still carrying the descriptor and filterApplied — same envelope, empty matches:
{ "descriptor": { "category": "tea", "...": "..." },
"filterApplied": null, "matches": [], "noMatch": true }
The honest “I don’t know” is a 200 with the descriptor preserved, never a 404 or a low-confidence wrong product.
Calibrate the threshold: precision, recall, and the no-match UX
Intermediate
Pick T deliberately by looking at scores for known matches and known non-matches, and design what the learner sees when nothing clears it.
New in this step
precision Of the matches you show, the fraction that are actually right; a high T raises precision but can hide real matches.
recall Of the right matches that exist, the fraction you actually show; a low T raises recall but lets strangers slip in.
precision-recall trade-off Raising T trades recall for precision and vice-versa; you calibrate T to sit in the gap between your known-match and known-stranger score clusters.
Threshold tuning is the real skill here
A threshold trades two errors against each other. Set T too high and real matches fall below it — high
precision, low recall, lots of false “no match”. Set it too low and strangers sneak in — high recall, low
precision, confident-looking wrong answers. Calibrate empirically: score a handful of photos you know are in
the catalog and a handful you know aren’t, and put T in the gap between the two score clusters. Because
cosine scores aren’t absolute across embedding models, re-calibrate if you change the model or dimensions.
Then design the no-match experience as a first-class state, not an error: “No confident match — try a
clearer photo or pick a category.” A slider that lets the learner move T live (the next screens add one)
makes this trade-off something you can see, not just reason about.
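The calibration procedure above can be sketched in a few lines. This is an illustration under simplifying assumptions, not a prescribed method: each photo is reduced to its single top score, T is placed at the midpoint of the gap (any point in the gap would satisfy the text), and precision is reported as 1 when nothing is shown. `calibrateThreshold` and `precisionRecall` are hypothetical names.

```typescript
// Hypothetical calibration helpers working on top scores from labelled photos.

// Places T midway between the weakest known match and the strongest known stranger.
function calibrateThreshold(matchScores: number[], strangerScores: number[]): number {
  if (matchScores.length === 0 || strangerScores.length === 0) {
    throw new Error("need at least one score in each group");
  }
  const lowestMatch = Math.min(...matchScores);
  const highestStranger = Math.max(...strangerScores);
  if (highestStranger >= lowestMatch) {
    throw new Error("score clusters overlap: no clean gap to place T in");
  }
  return (lowestMatch + highestStranger) / 2;
}

// Precision and recall at a given T, treating in-catalog photos as positives.
function precisionRecall(
  matchScores: number[],
  strangerScores: number[],
  t: number,
): { precision: number; recall: number } {
  const truePositives = matchScores.filter((s) => s >= t).length;
  const falsePositives = strangerScores.filter((s) => s >= t).length;
  const shown = truePositives + falsePositives;
  return {
    precision: shown === 0 ? 1 : truePositives / shown, // convention: nothing shown, nothing wrong
    recall: matchScores.length === 0 ? 0 : truePositives / matchScores.length,
  };
}
```

Running `precisionRecall` at a few values of T on either side of the calibrated one shows the trade-off numerically: raising T can only lower recall, and lowering it can only lower precision. An overlap error from `calibrateThreshold` is itself a finding, since it means no single T separates the two groups for the current model and dimensions.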
The ~0.5 floor, seen in the scores themselves
Score your known and unknown photos and the two clusters separate visibly. In-catalog photos land well above the floor (often ≈0.8+) and cluster together, while a genuinely out-of-catalog photo’s nearest stranger lands around 0.50. The floor comes from the mapping: Atlas converts cosine to (1 + cosine) / 2, so an orthogonal (unrelated) vector scores (1 + 0) / 2 = 0.5 rather than 0, and only a vector pointing the opposite way would score lower. T=0.75 sits cleanly in the gap between the floor and the in-catalog band, which is what makes the threshold meaningful. If you swap the embedding model or D, the clusters shift, so re-score before trusting the old T.
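The (1 + cosine) / 2 mapping is easy to check by hand. The sketch below computes it locally for toy two-dimensional vectors; `cosine` and `atlasCosineScore` are hypothetical helper names, and the code reproduces only the arithmetic described here, not Atlas's internal implementation.

```typescript
// Hypothetical helpers that reproduce the cosine-to-score arithmetic locally.

// Plain cosine similarity in [-1, 1].
function cosine(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // lengths were checked above
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) throw new Error("zero vector has no direction");
  return dot / Math.sqrt(normA * normB);
}

// Maps cosine from [-1, 1] into [0, 1] as (1 + cosine) / 2.
function atlasCosineScore(a: number[], b: number[]): number {
  return (1 + cosine(a, b)) / 2;
}

const sameDirection = atlasCosineScore([1, 0], [1, 0]); // 1
const orthogonal = atlasCosineScore([1, 0], [0, 1]); // 0.5
const opposite = atlasCosineScore([1, 0], [-1, 0]); // 0
```

The orthogonal case is the one that matters for thresholding: a score of 0.5 means "unrelated", so a cutoff anywhere at or below that level admits nearly everything.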
Capture a photo and show ranked matches (Jetpack Compose)
Jetpack Compose Intermediate
Build the scan screen: capture with CameraX or pick from the gallery, POST the image to /recognize, and render ranked matches with their scores and a live threshold slider.
New in this step
CameraX Android’s camera library for capturing the product photo; pair it with gallery-pick so it also works on a camera-less emulator.
multipart upload Sending the image bytes as multipart/form-data to /recognize from the app — the client side of the upload the backend already accepts.
LazyColumn Compose’s scrolling list that renders only visible rows; used to show the ranked matches, keyed by each match’s id.
Slider The threshold control; dragging it re-filters the already-returned matches client-side, making precision-vs-recall visible.
The payoff screen — and why a slider
This is the “something to see”. Capture a photo (CameraX) or pick one from the gallery, upload it as
multipart/form-data to /recognize, and show the ranked results: each product with its details and a
match score. Add a threshold slider that filters the already-returned list client-side, so dragging it
shows precision versus recall move in real time — matches appear and disappear as the learner raises or lowers
the cutoff. Above the list, render a small descriptor + filter panel: what Vision saw and the
filterApplied value — the category the pre-filter used, or “searched all categories” when the backend fell
back to an unfiltered search — so the spotlight is visible, not hidden in the response JSON. When the confident
list is empty, show the no-confident-match state. Emulator cameras are limited, so ship a few bundled sample
product photos alongside gallery-pick so every scan is free and reproducible.
Post the image, hold the threshold in state (Compose)
data class Match(
val id: String, val name: String, val brand: String,
val category: String, val score: Double,
)
@Composable
fun ScanResults(
descriptor: Descriptor, filterApplied: String?,
matches: List<Match>, threshold: Float, onThreshold: (Float) -> Unit,
) {
Column(Modifier.fillMaxSize().padding(16.dp)) {
// descriptor + filter panel: show the spotlight pre-filter at work
Text("Vision saw: ${descriptor.brand} ${descriptor.colour} ${descriptor.category}")
Text(if (filterApplied != null) "Pre-filtered to: $filterApplied"
else "Searched all categories (no category filter)")
Spacer(Modifier.height(8.dp))
Text("Confidence ≥ ${"%.2f".format(threshold)}")
Slider(value = threshold, onValueChange = onThreshold, valueRange = 0f..1f)
val shown = matches.filter { it.score >= threshold }
if (shown.isEmpty()) {
Text("No confident match — try a clearer photo or pick a category")
} else {
LazyColumn {
items(shown, key = { it.id }) { m ->
ListItem(
headlineContent = { Text("${m.brand} ${m.name}") },
supportingContent = { Text("${m.category} · score ${"%.2f".format(m.score)}") },
)
}
}
}
}
}
Agent prompt — paste into an agent with repo access
Role: Android engineer (Kotlin, Jetpack Compose, CameraX) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture (CameraX) or gallery-pick a photo, upload it, and render ranked matches with a live threshold slider.
Requirements:
- Upload the image as multipart/form-data; show a loading state; keep the API base URL configurable.
- Render each match with brand/name, category, and score, keyed by its `id`; a Slider (0..1) filters the shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null) so the spotlight pre-filter is visible.
- Show a clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample product photos so the flow works without a real camera (emulator-friendly).
Tests / acceptance:
- ViewModel test: raising the slider above all scores yields the no-match state; lowering it reveals matches in best-first order.
Output: a unified diff plus the ViewModel state for image, matches, filterApplied, and threshold.
Capture a photo and show ranked matches (Flutter)
Flutter Intermediate
Build the same scan-to-results screen in Flutter: capture or pick an image, POST it to /recognize, and render ranked matches with a threshold slider.
New in this step
image_picker The Flutter plugin for grabbing a photo from the camera or gallery; bundle sample assets too so it runs on any simulator.
MultipartRequest The http package’s way to send the image bytes as multipart/form-data to /recognize — the client side of the upload.
ListView Flutter’s scrolling list, used to render the ranked matches returned by the backend.
Slider The threshold control; its value re-filters the shown matches by score instantly, so precision-vs-recall is visible.
Same flow, Dart
Use image_picker for camera/gallery capture and http (a MultipartRequest) to upload the image to
/recognize. Hold the returned matches and the slider value in state; filter the shown list by score so the
threshold slider re-filters instantly. Render the no-confident-match state when the filtered list is empty.
Bundle a couple of sample asset images so the flow runs on any simulator.
Upload + threshold (Flutter)
Future<List<Match>> recognize(File image, {String? category}) async {
final req = http.MultipartRequest('POST', Uri.parse('$baseUrl/recognize'))
..files.add(await http.MultipartFile.fromPath('image', image.path));
if (category != null) req.fields['category'] = category;
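final res = await http.Response.fromStream(await req.send());
// A non-200 body (e.g. the 400 { "error": ... }) has no "matches" list, so fail clearly here.
if (res.statusCode != 200) {
  throw Exception('recognize failed: HTTP ${res.statusCode}');
}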
final body = jsonDecode(res.body) as Map<String, dynamic>;
return (body['matches'] as List).map((j) => Match.fromJson(j)).toList();
}
// in the widget:
// Slider(value: threshold, min: 0, max: 1, onChanged: setThreshold)
// final shown = matches.where((m) => m.score >= threshold).toList();
Agent prompt — paste into an agent with repo access
Role: Flutter engineer (Dart) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture/gallery-pick (image_picker), upload, and render ranked matches with a live threshold slider.
Requirements:
- Upload via http.MultipartRequest; loading state; configurable base URL.
- Each match shows brand/name, category, score and carries its `id`; a Slider (0..1) filters shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null).
- Clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample asset photos so it runs without a camera.
Tests / acceptance:
- A widget/unit test: raising the slider past all scores shows the no-match state; lowering reveals best-first matches.
Output: a unified diff plus the state model for matches, filterApplied, and threshold.
Capture a photo and show ranked matches (SwiftUI)
SwiftUI · Intermediate
Build the same scan-to-results screen in SwiftUI: capture or pick an image, upload it to /recognize, and render ranked matches with a threshold slider.
New in this step
PhotosPicker SwiftUI’s photo-selection control for grabbing the product image; bundle sample assets for camera-less simulator runs.
URLSession multipart Building a multipart/form-data body and POSTing the image to /recognize with URLSession — the client side of the upload.
Codable Swift’s JSON decoding protocol; you decode the { descriptor, filterApplied, matches, noMatch } response straight into typed structs.
Identifiable The protocol that gives each Match a stable id so SwiftUI’s List can track rows — use the server’s id, not a fresh UUID.
Same flow, Swift
Use PhotosPicker (or the camera) to get an image, upload it with URLSession as multipart/form-data, and
decode the { descriptor, filterApplied, matches, noMatch } response with Codable. Keep the matches and the
threshold in an @Observable model; a Slider filters the shown matches by score. Show the no-confident-match
state when the filtered list is empty, and bundle a few sample images in the asset catalog for camera-less runs.
Threshold-filtered results (SwiftUI)
struct Match: Codable, Identifiable {
    let id: String // the server's string id (from _id) — used by /substitutes, stable across reloads
    let name: String; let brand: String; let category: String; let score: Double
}
struct ResultsView: View {
    let descriptor: Descriptor
    let filterApplied: String?
    let matches: [Match]
    @State private var threshold = 0.75
    var body: some View {
        VStack(alignment: .leading) {
            // descriptor + filter panel: show the spotlight pre-filter at work
            Text("Vision saw: \(descriptor.brand) \(descriptor.colour) \(descriptor.category)")
            Text(filterApplied.map { "Pre-filtered to: \($0)" } ?? "Searched all categories (no category filter)")
                .foregroundStyle(.secondary)
            Text(String(format: "Confidence ≥ %.2f", threshold))
            Slider(value: $threshold, in: 0...1)
            let shown = matches.filter { $0.score >= threshold }
            if shown.isEmpty {
                Text("No confident match — try a clearer photo or pick a category")
            } else {
                List(shown) { m in
                    VStack(alignment: .leading) {
                        Text("\(m.brand) \(m.name)")
                        Text("\(m.category) · score \(String(format: "%.2f", m.score))").foregroundStyle(.secondary)
                    }
                }
            }
        }
    }
}
Agent prompt — paste into an agent with repo access
Role: iOS engineer (Swift, SwiftUI, Swift Concurrency) in this repo.
Context: Backend POST /recognize accepts multipart { image, category? } and returns { descriptor, filterApplied, matches:[{id,name,brand,category,attributes,price,inStock,score}], noMatch }. filterApplied is the category the pre-filter used, or null when the backend fell back to an unfiltered search.
Task: Build a scan screen: capture or PhotosPicker an image, upload it, and render ranked matches with a live threshold slider.
Requirements:
- Upload via URLSession multipart/form-data; @MainActor state updates; configurable base URL.
- Decode Match with the server's string `id` (do NOT fabricate a local UUID — the id is needed by /substitutes and keeps List identity stable).
- Each match shows brand/name, category, score; a Slider (0...1) filters shown matches by score client-side.
- Render a descriptor + filter panel: what Vision saw and filterApplied (the pre-filter category, or "searched all categories" when null).
- Clear "no confident match" state when nothing clears the slider or the backend returns noMatch (still show the descriptor + filter panel).
- Bundle 2-3 sample images in the asset catalog for camera-less runs.
Tests / acceptance:
- A unit test on the model: a threshold above all scores yields the no-match state; lowering reveals best-first matches.
Output: a unified diff plus the @Observable model definition.
Integration test: known photo matches, unknown returns no-match (Go)
Go · Intermediate
Write a Go integration test against Atlas that proves the two outcomes: a known product photo matches above T, and an unknown photo returns no confident match.
New in this step
integration test A test that exercises real components together (here the live Atlas index) rather than stand-ins — a mock can’t prove $vectorSearch, which exists only in Atlas.
t.Skip Marks a test skipped at runtime; skip when MONGODB_URI/GEMINI_API_KEY are unset so CI without infra stays green instead of failing.
Test the real index, not a mock
$vectorSearch only exists in Atlas, so a mock can’t prove the match. Point the test at your M0 cluster (via
MONGODB_URI): seed and embed a known product, post its bundled photo to /recognize, and assert it appears
above T. Then post a photo of something not in the catalog and assert noMatch is true with an empty match
list. Skip cleanly if MONGODB_URI or GEMINI_API_KEY is unset so the suite still runs without infra.
Agent prompt — paste into an agent with repo access
Role: Senior Go engineer in this repo.
Context: /recognize, ingest, and the products_vec index exist; Atlas via MONGODB_URI; GEMINI_API_KEY server-side.
Task: Add integration tests for the recognise pipeline.
Requirements:
- Seed + embed a known product; post its bundled sample photo to /recognize; assert that product is in matches with score >= T and noMatch=false.
- Post a photo of an out-of-catalog item; assert matches is empty and noMatch=true.
- t.Skip when MONGODB_URI or GEMINI_API_KEY is unset; isolate test data (unique collection or cleanup between runs).
Tests / acceptance:
- `go test ./... -run TestRecognize` passes against Atlas.
Output: a unified diff plus how the test isolates its catalog data.
What success looks like
go test ./... -run TestRecognize reports ok (or SKIP when MONGODB_URI/GEMINI_API_KEY is unset, so CI without infra stays green). Against the real Atlas index both assertions hold: the known product’s photo is in matches with score >= T and noMatch=false; the out-of-catalog photo yields matches:[] and noMatch=true. A mock could never produce this — $vectorSearch exists only in Atlas, so the test proves the actual index, not a stub.
Integration test: same assertions in the TypeScript stack
TypeScript · Intermediate
Mirror the same two assertions in the TypeScript stack so both backends are proven equivalent: a known photo matches above T, an unknown returns no confident match.
New in this step
integration test A test that exercises real components together (the live Atlas index) rather than stand-ins — a mock can’t prove $vectorSearch, which exists only in Atlas.
node --test Node’s built-in test runner (or vitest); run it against the same Atlas cluster as the Go suite so parity is asserted, not promised.
Parity is a test, not a promise
Run the same scenario with node --test (or vitest): seed and embed a known product, post its sample photo,
assert it lands above T; then post an out-of-catalog photo and assert noMatch. Running both suites against
the same Atlas cluster is what lets Catalens claim, honestly, that the document-plus-vector store — not the
language — carries the match.
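The two assertions can be written once as small predicates and reused under either runner. This is a minimal sketch, not the project's actual suite: the response shape follows the /recognize contract described earlier, while missingEnv, isConfidentMatch, and isNoMatch are hypothetical helper names, and the environment is passed in as a plain map so the sketch stays runner-agnostic.

```typescript
// Hypothetical helpers for the parity suite; names are illustrative only.
type Match = { id: string; score: number };
type RecognizeResponse = { matches: Match[]; noMatch: boolean };

// Env vars the suite needs; a non-empty result means "skip, don't fail".
function missingEnv(env: Record<string, string | undefined>): string[] {
  return ["MONGODB_URI", "GEMINI_API_KEY"].filter((name) => !env[name]);
}

// Known photo: the seeded product must appear with score >= T and noMatch=false.
function isConfidentMatch(res: RecognizeResponse, productId: string, T: number): boolean {
  return !res.noMatch && res.matches.some((m) => m.id === productId && m.score >= T);
}

// Out-of-catalog photo: an empty match list and noMatch=true.
function isNoMatch(res: RecognizeResponse): boolean {
  return res.noMatch && res.matches.length === 0;
}
```

In a real test these would wrap the parsed body of a POST to /recognize; keeping them as pure functions lets the Go and TypeScript suites state the same condition in the same words.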
Agent prompt — paste into an agent with repo access
Role: Senior TypeScript engineer in this repo.
Context: /recognize, ingest, and the products_vec index exist; Atlas via MONGODB_URI; GEMINI_API_KEY server-side.
Task: Add integration tests matching the Go suite one-for-one.
Requirements:
- Seed + embed a known product; post its bundled sample photo; assert it is in matches with score >= T and noMatch=false.
- Post an out-of-catalog photo; assert matches empty and noMatch=true.
- Skip when MONGODB_URI or GEMINI_API_KEY is unset; isolate test data between runs.
Tests / acceptance:
- `npm test` passes against Atlas; assertions match the Go suite.
Output: a unified diff plus any flake mitigation (e.g. retry while the vector index warms up).
What success looks like
npm test passes the same two assertions against the same Atlas cluster: known photo → match with score >= T, noMatch=false; out-of-catalog photo → matches:[], noMatch=true — and it skips cleanly when the env vars are unset. Both suites green against one cluster is what lets Catalens claim, honestly, that the document-plus-vector store, not the language, carries the match.
Optional: deploy to Cloud Run (free) — never required to see it work
Intermediate
If you want a public URL, deploy the stateless API to Cloud Run pointing at your Atlas cluster — but everything above already runs free and locally, so this is optional.
New in this step
Cloud Run A serverless host that runs your API container on demand; it fits a stateless /recognize endpoint and has a free monthly allotment.
scale to zero With no traffic the service runs no instances (and costs nothing); it spins one up on the next request — why this stays free.
Secret Manager A managed store for the Mongo URI and Gemini key, injected at runtime so secrets never live in the image or the repo.
Optional, and free if you do it
Nothing about seeing Catalens work needs the cloud: the API runs locally against Atlas M0, and the mobile
app talks to it. If you want a public endpoint, the recognise API is a stateless container, so Cloud Run fits
— it scales to zero and has a generous free monthly allotment. The data stays in Atlas (M0 is fine); only the
connection string and the GEMINI_API_KEY move into the service config. Keep the key server-side as a managed
secret (Secret Manager), never in the app.
Deploy (optional, free-tier eligible)
gcloud run deploy catalens-api \
--source . \
--set-env-vars MONGODB_URI="$MONGODB_URI",MONGODB_DB=catalens,GEMINI_API_KEY="$GEMINI_API_KEY" \
--region us-central1 --allow-unauthenticated
# For real deployments, store MONGODB_URI and GEMINI_API_KEY in Secret Manager and use --set-secrets instead.
Hybrid match: blend vector similarity with text relevance
Optional add-on · Advanced
Combine $vectorSearch with Atlas Search text and attribute relevance so an exact brand or visible-text signal sharpens the ranking, not just visual similarity.
New in this step
$search Atlas Search’s full-text stage over a Lucene index; here it ranks by brand/name/visibleText to add a text signal alongside the vector one.
reciprocal rank fusion A way to merge two ranked lists by summing 1 / (k + rank), so items ranked high by either search rise — no hand-tuned weights.
$rankFusion A newer Atlas stage that does reciprocal rank fusion for you; prefer the current official hybrid-search guide over hand-rolling it.
Two signals are better than one
Pure vector similarity can rank a look-alike from the wrong brand above the right product. Hybrid search fixes
that by blending two retrievers: $vectorSearch (visual/semantic similarity) and Atlas Search ($search,
a Lucene full-text index) over brand, name, and visibleText. Run both, then fuse their rankings — the
common technique is reciprocal rank fusion (RRF): each result’s combined score sums 1 / (k + rank)
across the two lists, so items ranked high by either signal rise. Recent Atlas versions expose a
$rankFusion stage that does RRF for you; follow the current official hybrid-search guide rather than
hand-tuning weights, and keep a single confidence cut on the fused score.
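To make the fusion formula concrete, here is a minimal sketch of reciprocal rank fusion done by hand over two best-first lists of product ids. It only illustrates the arithmetic that a stage such as $rankFusion performs for you; the function name is hypothetical, and k = 60 is a commonly used default that is assumed here rather than taken from this project.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item.
// k = 60 is an assumed, commonly used default; tune it for your data.
type Fused = { id: string; score: number };

function reciprocalRankFusion(lists: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Best fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// The vector retriever prefers a look-alike; the text retriever never returns it.
const vectorRanked = ["lookalike", "right-product", "other"];
const textRanked = ["right-product", "other"];
const fused = reciprocalRankFusion([vectorRanked, textRanked]);
```

Because the look-alike appears in only one list, it collects a single 1 / (k + rank) term and falls behind items that both retrievers agree on, which is the behaviour the hybrid ranker is after.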
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize returns vector matches; products have brand, name, visibleText; Atlas supports $search and vector search.
Task: Add a hybrid ranker that fuses $vectorSearch with an Atlas Search text query over brand/name/visibleText.
Requirements:
- Run both retrievers on the descriptor (vector from the embedding, text from brand/name/visibleText).
- Fuse with reciprocal rank fusion (or the Atlas $rankFusion stage if your version has it); follow the official hybrid-search guide.
- Keep a single confidence cut on the fused score; preserve the no-match path.
Tests / acceptance:
- A query where a same-looking wrong-brand item beats the right one under pure vectors ranks the right one first under hybrid.
Output: a unified diff plus the fusion formula and any weights, with the official doc link.
Out of stock → visually similar substitutes
Optional add-on · Intermediate
When a matched product is unavailable, run a pure $vectorSearch over the catalog to offer the closest in-stock alternatives.
New in this step
query-by-example vector Using a product’s own stored embedding as the queryVector (no photo, no Gemini call) to ask “what looks like this?” — same index, new question.
inStock filter The filter: { inStock: true } pre-filter (using the index field declared earlier) so substitutes are limited to items a shopper can actually buy.
favour recall Deliberately lowering T here so more close-enough options surface — the opposite of strict identification, where precision wins.
The same vector index, a different question
You already have the machinery; substitutes just ask it a new question. Take the matched (or out-of-stock)
product’s own embedding as the query vector, run $vectorSearch excluding that product, and add an
inStock: true pre-filter so only buyable items come back. The result is “things that look like this, that
you can actually buy”, ranked by similarity. This is where you lower the threshold deliberately: for
substitutes you want recall (show me close-enough options), the opposite of the strict precision you want for
identification. Same index, same stage — a looser cutoff and an availability filter.
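The substitutes query can be sketched as a pipeline builder that returns plain stage objects. This is an illustration under stated assumptions, not the project's code: the function name, the projected field list, the oversampling factor for numCandidates, and the extra result slot are all choices made here; only the index name products_vec, the embedding path, and the inStock filter come from the text above.

```typescript
// Sketch: build the substitutes aggregation as plain stage objects.
type Stage = Record<string, unknown>;

function substitutesPipeline(
  productId: unknown,   // must be the same type the collection stores in _id (e.g. an ObjectId)
  embedding: number[],  // the product's own stored vector, used as the query
  threshold: number,    // assumed to be lower than the identification threshold
  limit = 10,
): Stage[] {
  return [
    {
      $vectorSearch: {
        index: "products_vec",
        path: "embedding",
        queryVector: embedding,
        numCandidates: Math.min(10000, (limit + 1) * 20), // assumed oversampling factor
        limit: limit + 1,          // spare slot: an in-stock product is its own nearest neighbour
        filter: { inStock: true }, // requires inStock declared as a filter field in the index
      },
    },
    {
      $project: {
        name: 1, brand: 1, category: 1, price: 1, inStock: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
    // Drop the original product and anything under the recall-friendly cutoff.
    { $match: { _id: { $ne: productId }, score: { $gte: threshold } } },
    { $limit: limit },
  ];
}
```

The result would be passed to the collection's aggregate call. Excluding the original in a later $match, rather than inside the vector filter, avoids having to index _id as a filter field.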
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: products have `embedding` and `inStock`; the products_vec index exists (add inStock as a filter field).
Task: Add GET /products/{id}/substitutes returning visually similar in-stock alternatives.
Requirements:
- Use the product's own embedding as the queryVector; $vectorSearch with filter { inStock: true } and exclude the product itself.
- Use a lower threshold than identification (favour recall); return ranked alternatives with scores.
- Declare inStock as a filter field in the vector index so the pre-filter is valid.
Tests / acceptance:
- For an out-of-stock product, the endpoint returns only in-stock items, ranked by similarity, excluding the original.
Output: a unified diff plus the chosen substitute threshold and why it differs from identification.
Recognise multiple products in one shelf photo
Optional add-on · Advanced
Detect several products in a single shelf photo and run the recognise pipeline for each, returning a match list per detected item.
New in this step
object detection Finding where each product is in the photo (not just describing one); the shelf flow detects many items, then recognises each.
bounding box The rectangle marking a detected item’s location, returned per item so the UI can label which match belongs to which product on the shelf.
array responseSchema A responseSchema whose top level is an array, so one Vision call returns many item descriptors to fan out through the per-item pipeline.
Fan one photo into many recognitions
A shelf photo contains many products, so the single-product pipeline becomes a fan-out. Two honest approaches:
(1) ask Gemini Vision to return an array of detected items — each with a short descriptor and an
approximate bounding box — using a responseSchema whose top level is an array; or (2) detect regions first,
then run the existing descriptor, embed, and $vectorSearch path per crop. Either way you reuse the same
per-item pipeline, just N times, and return a list of { box, matches[], noMatch }. Keep each item’s threshold and
no-match handling exactly as in the single-product flow: a shelf with one unknown item should confidently match
the rest and report no match for that one.
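The fan-out itself is small once the per-item pipeline is treated as a function. The sketch below assumes detection has already produced a list of items; recogniseOne stands in for the existing descriptor, embed, and $vectorSearch path, and every name here, along with the default cap of 8 items, is hypothetical.

```typescript
// Sketch of the per-item fan-out; recogniseOne is injected so the
// single-product pipeline is reused unchanged.
type Box = { x: number; y: number; width: number; height: number };
type Match = { id: string; score: number };
type Detected = { descriptor: string; box: Box };
type ShelfEntry = { box: Box; matches: Match[]; noMatch: boolean };

async function recogniseShelf(
  items: Detected[],
  recogniseOne: (descriptor: string) => Promise<Match[]>,
  threshold: number,
  maxItems = 8, // assumed cap to keep latency bounded
): Promise<ShelfEntry[]> {
  const capped = items.slice(0, maxItems);
  // Each item gets its own threshold cut and its own noMatch flag.
  return Promise.all(
    capped.map(async (item) => {
      const all = await recogniseOne(item.descriptor);
      const matches = all.filter((m) => m.score >= threshold);
      return { box: item.box, matches, noMatch: matches.length === 0 };
    }),
  );
}
```

Because the threshold is applied per entry, one unknown item produces a noMatch entry without affecting the matches returned for its neighbours.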
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: the single-product recognise pipeline (descriptor -> embed -> $vectorSearch -> threshold) exists.
Task: Add POST /recognize/shelf that recognises multiple products in one image.
Requirements:
- Use Gemini Vision with an array responseSchema to detect items (descriptor + approximate bounding box each), OR crop-then-recognise per region.
- Run the existing embed + $vectorSearch + threshold path per detected item; return [{ box, matches:[{...product, score}], noMatch }].
- Reuse the per-item threshold and no-match logic unchanged; cap the number of items to keep latency sane.
Tests / acceptance:
- A photo with 3 seeded products returns 3 entries, each matching the right product above T; an extra unknown item yields a noMatch entry, not a wrong match.
Output: a unified diff plus the per-item fan-out strategy and the item cap.
Barcode / label fast-path before vector matching
Optional add-on · Intermediate
When a photo shows a barcode or clear label text, resolve it to an exact catalog product first, and only fall back to vector matching when there is no exact hit.
New in this step
unique index An index on sku/barcode that both enforces no duplicates and makes the exact lookup instant.
exact lookup A normal indexed findOne by the code — deterministic, free of model cost, and unambiguous when a code is present.
fast-path A cheap deterministic route tried first (the code lookup); only on a miss do you fall back to the costlier descriptor → embed → $vectorSearch path.
Exact beats approximate when you have it
Vector matching is for when you don’t have an identifier; when you do, use it. If the photo contains a
barcode or legible label, read it — a barcode-scanning library on-device, or Gemini Vision’s visibleText
from the descriptor you already extract — and look the product up by an exact field (sku or barcode) with
a normal indexed query. An exact hit returns immediately with full confidence and skips the vector path
entirely; only when there is no code, or no exact match, do you fall back to the descriptor, embed, and
$vectorSearch path. This is a classic cheap-path/expensive-path design: the deterministic lookup is faster,
free of model cost, and unambiguous, so it goes first.
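The cheap-path/expensive-path ordering can be sketched with both lookups injected as functions. This is an illustrative outline under assumptions: findByCode stands for the indexed exact query, vectorMatch for the existing descriptor, embed, and $vectorSearch path, and all names and the result shape are hypothetical.

```typescript
// Sketch: try the exact code lookup first, fall back to vector matching on a miss.
type Product = { id: string; name: string };
type Scored = { product: Product; score: number };
type Recognition =
  | { via: "exact"; product: Product; score: 1 }
  | { via: "vector"; matches: Scored[]; noMatch: boolean };

async function recognise(
  code: string | null,                                  // decoded barcode or label code, if any
  findByCode: (code: string) => Promise<Product | null>, // indexed exact lookup
  vectorMatch: () => Promise<Scored[]>,                 // the costlier vector path
  threshold: number,
): Promise<Recognition> {
  if (code) {
    const exact = await findByCode(code);
    // An exact hit short-circuits: the vector path is never invoked.
    if (exact) return { via: "exact", product: exact, score: 1 };
  }
  const matches = (await vectorMatch()).filter((m) => m.score >= threshold);
  return { via: "vector", matches, noMatch: matches.length === 0 };
}
```

Passing vectorMatch as a thunk makes the short-circuit easy to verify in a test: on an exact hit the function is simply never called, so no embedding or model cost is incurred.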
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: products may carry a unique `sku`/`barcode`; the vector recognise pipeline exists; Vision returns visibleText.
Task: Add a barcode/label fast-path to /recognize.
Requirements:
- If the request carries a decoded barcode (from an on-device scanner) or Vision's visibleText yields a code, query products by exact sku/barcode (indexed) first.
- On an exact hit, return it immediately with full confidence and skip the vector pipeline; otherwise fall back to descriptor -> embed -> $vectorSearch -> threshold.
- Add a unique index on the code field; keep the existing no-match path for the fallback.
Tests / acceptance:
- A photo/label with a known code returns the exact product without calling the embedding/vector path.
- An item with no code or an unknown code falls back to vector matching and still thresholds correctly.
Output: a unified diff plus where the fast-path short-circuits the vector pipeline.
Keep the index live: watch the catalog with a change stream
Optional add-on · Advanced
Decide up front that a product’s embedding is derived data that must follow its source: open a MongoDB change stream on products so the moment a document changes, a background worker can re-derive and rewrite its vector.
New in this step
derived data A value computed from other fields (the embedding from a product’s text); it must be recomputed whenever its inputs change, or it goes stale.
change stream MongoDB’s real-time feed of writes; collection.watch() yields one event per insert/update so a worker can react the moment a product changes.
updateLookup The fullDocument: "updateLookup" option, which attaches the current document to each update event — so the worker has the new text to embed without a second read.
resume token A bookmark carried in each event; persist it and pass it as resumeAfter on restart so the worker continues exactly where it stopped.
Why the index drifts, and what a change stream fixes — and it costs nothing
The embedding you stored at ingest is a snapshot of a product’s description at that moment. Edit the
brand, rename it, add an attribute, swap the photo — and the stored vector now describes a product that no
longer exists, so $vectorSearch quietly matches against stale text. The fix is to treat the embedding as
derived data that must be recomputed whenever its inputs change. A change stream is MongoDB’s
real-time feed of writes: collection.watch() opens a cursor that yields one event per insert/update/delete,
each carrying a resume token (in the event’s _id) so a restarted worker continues exactly where it left
off. Ask for fullDocument: "updateLookup" and each update event also includes the current document, so the
worker has the new text to embed without a second read. The whole loop runs on the data you already
provisioned.
Costs nothing. Atlas M0 is a 3-node replica set, and change streams run on it — the only free-tier
restriction is on database-namespace filters (you filter on fields and collections, which is allowed), so a
worker watching the products collection works on the free tier. Docs:
https://www.mongodb.com/docs/manual/changeStreams/ · M0 limits:
https://www.mongodb.com/docs/atlas/reference/free-shared-limitations/
The two hard parts: no infinite loop, and idempotency
Writing the fresh vector back is itself a write to products, which your change stream will see and which would
trigger another re-embed, forever. Break the loop by watching only the changes that matter: pass a
pipeline to watch() that $matches content edits and ignores the embedding write. Two robust guards, used
together: (1) store a content hash of the embedding text on the document and skip any event whose rebuilt
text hashes to the stored value, so a write-back that slips through (it changes the vector, not the text)
is a no-op; (2) $match on the event so embedding-only updates never reach the worker (e.g. require a content
field in updateDescription.updatedFields, or filter operationType). Make the worker idempotent and debounced:
a burst of edits to one product should collapse into a single re-embed of its final state, keyed by _id, so
rapid saves don’t fan out into redundant Gemini calls.
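The two worker-side guards can be sketched as pure functions. This is a simplified illustration, not the worker itself: it coalesces a batch of already-collected events rather than running a timer-based debounce, the hash function is injected (a real worker would use SHA-256), and the type and function names are hypothetical.

```typescript
// Sketch: collapse a burst of edits per _id, then apply the content-hash guard.
type PendingEdit = { id: string; text: string; storedHash: string };

// Later events for the same id overwrite earlier ones, so only the final state survives.
function coalesce(burst: PendingEdit[]): PendingEdit[] {
  const latest = new Map<string, PendingEdit>();
  for (const edit of burst) latest.set(edit.id, edit);
  return Array.from(latest.values());
}

// Re-embed only when the rebuilt text no longer hashes to the stored value.
function needsReembed(edit: PendingEdit, hash: (text: string) => string): boolean {
  return hash(edit.text) !== edit.storedHash;
}

// Toy hash for illustration only; substitute a real digest in practice.
const toyHash = (text: string): string => "h:" + text;

const burst: PendingEdit[] = [
  { id: "a", text: "Acme mug v1", storedHash: "h:Acme mug" },
  { id: "a", text: "Acme mug v2", storedHash: "h:Acme mug" },
  { id: "b", text: "Plain tea", storedHash: "h:Plain tea" }, // embedding-only write-back
];
const toEmbed = coalesce(burst).filter((e) => needsReembed(e, toyHash));
```

Here product a is re-embedded once for its final text, and product b is skipped because its text still matches the stored hash, which is what keeps the worker's own write-back from looping.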
What changes, and what the worker does (language-agnostic)
Watch products with a pipeline that ignores embedding-only writes:
pipeline = [{ $match: {
  operationType: { $in: ["insert", "update", "replace"] },
  # only react when a CONTENT field changed (not the embedding/hash we write back):
  $or: [
    { "updateDescription.updatedFields.name": { $exists: true } },
    { "updateDescription.updatedFields.brand": { $exists: true } },
    { "updateDescription.updatedFields.category": { $exists: true } },
    { "updateDescription.updatedFields.attributes": { $exists: true } },
    { "updateDescription.updatedFields.imageRef": { $exists: true } },
    { operationType: { $in: ["insert", "replace"] } },  # full-doc writes
  ],
}}]
On each event (fullDocument present via updateLookup):
  1. if the product's IMAGE changed: re-run the Vision descriptor first, so it is folded into the text
  2. text = embeddingText(doc)            # SAME builder as ingest
  3. hash = sha256(text)                  # hash the exact text that will be embedded
  4. if doc.embeddingHash == hash: skip   # idempotent: nothing meaningful changed
  5. vector = gemini.embed(text, D)       # SAME model + D as ingest/query
  6. updateOne({_id}, {$set: {embedding: vector, embeddingHash: hash}})  # does NOT re-trigger
  7. persist the event's resume token so a restart continues from here
Agent prompt — paste into an agent with repo access
Role: Database administrator.
Context: MongoDB Atlas M0 reachable via MONGODB_URI; database MONGODB_DB="catalens".
Task: Seed the change-stream worker's state document.
Requirements:
- Upsert one document into a new "_worker_state" collection: { _id: "embeddings-worker", resumeToken: null } so the change-stream feature has its state row from the start.
Tests / acceptance:
- The _worker_state collection has exactly one { _id: "embeddings-worker" } doc.
Output: a unified diff plus the seed command to run it.
Run the re-embed worker (Go change stream)
Optional add-on · Advanced
Build a background worker in Go that tails the products change stream with the v2 driver, re-embeds a product when its content changes, and writes the vector back — guarded so its own write never re-triggers it.
New in this step
mongo.ChangeStream What collection.Watch(...) returns; drive it with stream.Next(ctx) + stream.Decode(&event) to read one change at a time.
SetFullDocument(UpdateLookup) The option that makes each update event carry the current fullDocument, so the worker has the new text to embed.
SetResumeAfter Passes a saved resume token on startup so a restarted worker continues without missing or re-doing an edit.
ChangeStream with the v2 driver, updateLookup, and a resume token
collection.Watch(ctx, pipeline, opts) returns a *mongo.ChangeStream; drive it with stream.Next(ctx) +
stream.Decode(&event). Set options.ChangeStream().SetFullDocument(options.UpdateLookup) so each update
event carries the current fullDocument to embed. Read stream.ResumeToken() after each handled event and
persist it (a tiny _worker_state doc); on startup pass it back via SetResumeAfter(token) so a crash
resumes without missing or re-doing work. The $match pipeline (built with bson.D) keeps embedding-only
writes out of the stream, and the content-hash check makes the handler idempotent — together they close the
infinite-loop door. Keep the API key server-side and reuse the same embeddingText builder and dimensions D
as ingest. Go driver change streams: https://www.mongodb.com/docs/drivers/go/current/monitoring-and-logging/change-streams/
Tail the stream, re-embed, write back (v2 driver)
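// worker/embeddings.go (essentials) — go.mongodb.org/mongo-driver/v2
exists := bson.D{{Key: "$exists", Value: true}}
match := bson.D{{Key: "$match", Value: bson.D{
	{Key: "operationType", Value: bson.D{{Key: "$in", Value: bson.A{"insert", "update", "replace"}}}},
	// only react when a CONTENT field changed, or on a full-document write
	{Key: "$or", Value: bson.A{
		bson.D{{Key: "updateDescription.updatedFields.name", Value: exists}},
		bson.D{{Key: "updateDescription.updatedFields.brand", Value: exists}},
		bson.D{{Key: "updateDescription.updatedFields.category", Value: exists}},
		bson.D{{Key: "updateDescription.updatedFields.attributes", Value: exists}},
		bson.D{{Key: "updateDescription.updatedFields.imageRef", Value: exists}},
		bson.D{{Key: "operationType", Value: bson.D{{Key: "$in", Value: bson.A{"insert", "replace"}}}}},
	}},
}}}
opts := options.ChangeStream().SetFullDocument(options.UpdateLookup)
if tok := loadResumeToken(ctx, state); tok != nil {
	opts.SetResumeAfter(tok) // resume exactly where we stopped
}
stream, err := products.Watch(ctx, mongo.Pipeline{match}, opts)
if err != nil {
	log.Fatal(err)
}
defer stream.Close(ctx)
for stream.Next(ctx) {
	var ev struct {
		FullDocument *struct {
			ID            bson.ObjectID `bson:"_id"`
			EmbeddingHash string        `bson:"embeddingHash"`
			// ...plus the content fields embeddingText reads
		} `bson:"fullDocument"`
	}
	if err := stream.Decode(&ev); err != nil || ev.FullDocument == nil {
		continue // undecodable, or deleted between update and lookup
	}
	doc := ev.FullDocument
	text := embeddingText(doc) // SAME builder as ingest
	if h := sha256Hex(text); h != doc.EmbeddingHash {
		vec := embed(ctx, text) // SAME model + dimensions D as ingest/query
		if _, err := products.UpdateByID(ctx, doc.ID, bson.D{{Key: "$set",
			Value: bson.D{{Key: "embedding", Value: vec}, {Key: "embeddingHash", Value: h}}}}); err != nil {
			log.Printf("re-embed %s: %v", doc.ID.Hex(), err)
		}
		// writing embedding/embeddingHash does NOT match a content field -> no re-trigger
	}
	saveResumeToken(ctx, state, stream.ResumeToken()) // persist for restart
}
if err := stream.Err(); err != nil {
	log.Print(err) // the loop ended on a stream error, not a clean shutdown
}
Chat prompt — paste into a chat to get the code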
Role: Senior Go + MongoDB engineer building a change-stream worker. The reader has no repo here — return complete code.
Context: Atlas M0 (a replica set; change streams supported); go.mongodb.org/mongo-driver/v2 (mongo, bson, options); products collection with embedding + embeddingHash fields; the SAME embeddingText builder, embedding model, and dimensions D as the ingest step; GEMINI_API_KEY server-side.
Task: Implement a standalone worker that keeps embeddings in sync via a products change stream.
Requirements:
- Open products.Watch with SetFullDocument(options.UpdateLookup); pass a $match pipeline so only content changes (name/brand/category/attributes/imageRef, or insert/replace) produce events — never embedding-only writes.
- For each event: rebuild the embedding text, compute a content hash, and SKIP if it equals the stored embeddingHash (idempotent). If the product's image reference changed, re-run the Gemini Vision descriptor first and fold it into the text.
- Re-embed with the SAME model + dimensions D as ingest; write {embedding, embeddingHash} back with UpdateByID — this write must NOT re-trigger the worker (guarded by the $match + hash).
- Debounce per _id so a burst of edits collapses into one re-embed of the final state.
- Persist stream.ResumeToken() after each handled event and SetResumeAfter it on startup; do NOT hard-code a model id — read it from config and link the official embeddings docs.
Tests / acceptance (describe):
- Updating a product's brand causes exactly one re-embed; the new embedding length is still D.
- The worker's own write-back produces no further re-embed (no infinite loop).
- Killing and restarting the worker resumes from the saved token without missing an edit.
Output: the complete worker, no commentary.
Run the re-embed worker (TypeScript change stream)
Optional add-on · Advanced
Build the same worker in TypeScript with the Node driver: collection.watch() with fullDocument: "updateLookup", re-embed on content changes, write the vector back, and guard against re-triggering.
New in this step
async-iterable change stream collection.watch(...) returns a stream you consume with for await (const change of stream), one event per write.
for await The loop that pulls events from the async-iterable stream as they arrive, awaiting each one — the Node analogue of Go’s stream.Next.
resumeAfter / change._id Each event’s _id is its resume token; persist it and pass it as the resumeAfter option on startup so a restart continues cleanly.
Same worker, node driver
The Node path mirrors the Go one. collection.watch(pipeline, { fullDocument: "updateLookup", resumeAfter })
returns an async-iterable change stream; consume it with for await (const change of stream). Each event’s
resume token is change._id (also stream.resumeToken); persist it and pass it as resumeAfter on the next
start. The same $match pipeline filters out embedding-only writes, and the same content-hash guard makes the
handler idempotent — re-embed only when the rebuilt text’s hash differs from the stored embeddingHash. Reuse
the ingest step’s embeddingText builder, model, and dimensions D so the worker and the query stay
comparable. Keeping both workers behaviourally identical is the same parity lesson as the recognise pipeline.
Node driver change streams: https://www.mongodb.com/docs/drivers/node/current/monitoring-and-logging/change-streams/
Tail the stream, re-embed, write back (mongodb driver)
// worker/embeddings.ts (essentials) — official mongodb driver
const pipeline = [{
  $match: {
    operationType: { $in: ["insert", "update", "replace"] },
    $or: [
      { "updateDescription.updatedFields.name": { $exists: true } },
      { "updateDescription.updatedFields.brand": { $exists: true } },
      { "updateDescription.updatedFields.category": { $exists: true } },
      { "updateDescription.updatedFields.attributes": { $exists: true } },
      { "updateDescription.updatedFields.imageRef": { $exists: true } },
      { operationType: { $in: ["insert", "replace"] } },
    ],
  },
}];
const saved = await loadResumeToken(state);           // null/undefined on first run
// Only pass resumeAfter when a token exists; the seeded state row holds null.
const stream = products.watch(pipeline, {
  fullDocument: "updateLookup",
  ...(saved ? { resumeAfter: saved } : {}),
});
for await (const change of stream) {
  const doc = (change as any).fullDocument;
  if (!doc) continue;                                 // deleted between update & lookup
  const text = embeddingText(doc);                    // SAME builder as ingest
  const hash = sha256Hex(text);
  if (hash !== doc.embeddingHash) {
    const vector = await embed(text);                 // SAME model + dimensions D
    await products.updateOne({ _id: doc._id },
      { $set: { embedding: vector, embeddingHash: hash } }); // does NOT re-trigger (not a content field)
  }
  await saveResumeToken(state, change._id);           // persist for restart
}
Chat prompt — paste into a chat to get the code
Role: Senior TypeScript + MongoDB engineer building a change-stream worker. The reader has no repo here — return complete code.
Context: Atlas M0 (a replica set; change streams supported); official mongodb driver; products collection with embedding + embeddingHash; the SAME embeddingText builder, model, and dimensions D as ingest; GEMINI_API_KEY server-side.
Task: Implement a standalone worker that keeps embeddings in sync via a products change stream, matching the Go worker one-for-one.
Requirements:
- collection.watch(pipeline, { fullDocument: "updateLookup", resumeAfter }); the $match pipeline lets only content changes (name/brand/category/attributes/imageRef, or insert/replace) through — never embedding-only writes.
- Consume with for-await; for each event rebuild the embedding text, compute a content hash, and SKIP if it equals the stored embeddingHash. If the image reference changed, re-run the Gemini Vision descriptor first.
- Re-embed with the SAME model + dimensions D as ingest; updateOne the {embedding, embeddingHash} back — this write must NOT re-trigger the worker.
- Debounce per _id; persist change._id (resume token) after each handled event and pass it as resumeAfter on startup.
- Do NOT hard-code a model id — read it from config and link the official embeddings docs.
Tests / acceptance (describe):
- Editing brand triggers exactly one re-embed; new embedding length is D; the write-back causes no further event.
- Restart resumes from the saved token; responses/behaviour match the Go worker.
Output: the complete worker, no commentary.
Tune numCandidates: trade latency against recall
Optional add-on Advanced
Atlas Vector Search is approximate (HNSW), so it doesn’t examine every vector. Learn the one knob that governs that approximation — numCandidates — and what raising or lowering it costs you.
New in this step
HNSW The graph index Atlas walks for approximate search; it visits a bounded pool of vectors instead of all of them, which is why numCandidates matters.
ENN (exact:true) Exact nearest-neighbour: setting exact: true scans every vector (perfect recall, slowest) — a ground-truth baseline to measure your ANN recall against on a small catalog.
What numCandidates actually controls
$vectorSearch runs approximate nearest-neighbour (ANN) search over an HNSW graph index: instead of
comparing your query against every product vector, it walks the graph and considers a bounded pool of
candidates, then returns the best limit of them after applying any filter. numCandidates is the size
of that pool. Bigger pool → the search explores more of the graph → it’s more likely to find the true nearest
neighbours (higher recall) but does more work (higher latency). Smaller pool → faster, but it may miss
a real match that the graph walk never visited. numCandidates is required for ANN search and applies only
to ANN — set exact: true and you get exhaustive ENN (perfect recall, no numCandidates, slowest),
which is mainly useful as a ground-truth baseline to measure your ANN recall against on a small catalog.
The official guidance: start with numCandidates at least 20× your limit (e.g. limit: 5 → numCandidates: 100); it can never be lower than limit. Docs:
https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/
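The rules above (default to 20× limit, never below limit, omit numCandidates entirely for exact: true) fit in one small stage builder. This is a sketch under assumptions: the function and option names are hypothetical, and the 10000 ceiling is my reading of the linked docs at the time of writing, so confirm it there before relying on it.

```typescript
interface VectorSearchOptions {
  index: string;
  path: string;
  queryVector: number[];
  limit: number;
  numCandidates?: number; // ANN only; defaults to 20x limit
  exact?: boolean; // true = exhaustive ENN
  filter?: Record<string, unknown>;
}

// Assumed documented ceiling for numCandidates; verify against the linked docs.
const MAX_NUM_CANDIDATES = 10000;

function vectorSearchStage(o: VectorSearchOptions): { $vectorSearch: Record<string, unknown> } {
  const stage: Record<string, unknown> = {
    index: o.index,
    path: o.path,
    queryVector: o.queryVector,
    limit: o.limit,
  };
  if (o.filter) stage.filter = o.filter;

  if (o.exact) {
    stage.exact = true; // ENN: numCandidates is left out on purpose
  } else {
    const n = o.numCandidates ?? Math.min(o.limit * 20, MAX_NUM_CANDIDATES);
    if (!Number.isInteger(n) || n < o.limit || n > MAX_NUM_CANDIDATES) {
      throw new RangeError(`numCandidates must be an integer in [${o.limit}, ${MAX_NUM_CANDIDATES}]`);
    }
    stage.numCandidates = n;
  }
  return { $vectorSearch: stage };
}

// A tiny placeholder vector; the real one has length D.
const queryVector = [0.1, 0.2, 0.3];

const annStage = vectorSearchStage({ index: "products_vec", path: "embedding", queryVector, limit: 5 });
console.log(annStage.$vectorSearch.numCandidates); // 100

const ennStage = vectorSearchStage({ index: "products_vec", path: "embedding", queryVector, limit: 5, exact: true });
console.log("numCandidates" in ennStage.$vectorSearch); // false
```

Validating at build time means a bad config value fails when the service starts or the stage is built, rather than surfacing as a server error on a customer's request.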
A note on the pre-filter
The category filter from the ★ recognise step interacts with this knob: Atlas applies the filter while
traversing the graph, so an aggressive filter over a small catalog can leave fewer than numCandidates
qualifying vectors — exactly when bumping numCandidates (or, for a tiny in-category set, exact: true)
keeps recall honest. The query call is otherwise unchanged from the recognise step, so this knob lives in the
same $vectorSearch stage in both backends — only the integer differs (Go: numCandidates in the
bson.D; TypeScript: numCandidates in the stage object).
The knob in the recognise pipeline (both backends)
# This is the SAME $vectorSearch stage from the ★ recognise step — only numCandidates is the variable.
$vectorSearch:
  index: "products_vec"
  path: "embedding"
  queryVector: <query vector, length D>
  limit: 5
  numCandidates: 100 # start at ~20x limit; raise for recall, lower for latency
  filter: { category: { $eq: hint } } # applied during the graph walk
# Ground-truth baseline for measuring recall (small catalog only):
$vectorSearch:
  index: "products_vec"
  path: "embedding"
  queryVector: <...>
  limit: 5
  exact: true # exhaustive ENN — no numCandidates; use to score ANN recall against
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: /recognize runs a $vectorSearch over products_vec; numCandidates is currently a constant ~20x limit. Atlas Vector Search is ANN over HNSW; exact:true gives exhaustive ENN.
Task: Make numCandidates configurable and add a small recall-vs-latency harness so a value is chosen with evidence, not by guessing.
Requirements:
- Read numCandidates from config (default ~20x limit); validate it is >= limit; keep the existing filter + threshold behaviour.
- Add a benchmark over a fixed set of query photos that, for each numCandidates in a sweep (e.g. 25, 50, 100, 200, 400), records mean/p95 latency and recall@limit measured against an exact:true (ENN) baseline on the SAME queries.
- Print a small table (numCandidates, p95 latency ms, recall@limit) and recommend the smallest numCandidates whose recall is within a tolerance of the ENN baseline.
- Do not invent index fields; rely only on documented $vectorSearch fields (index, path, queryVector, numCandidates, limit, filter, exact).
Tests / acceptance:
- The harness runs against Atlas and emits the table; recall trends upward as numCandidates rises (small dips are possible because the search is approximate) and approaches 1.0, the ENN baseline.
- numCandidates < limit is rejected by config validation.
Output: a unified diff plus the chosen numCandidates and the recall/latency point that justifies it.
Measure recall against an exact baseline and pick a value
Optional add-on Intermediate
Turn the tuning into a decision: compare your approximate results to an exact (ENN) ground truth on the same queries, then choose the smallest numCandidates that holds the recall you need.
New in this step
recall@limit Per query, the fraction of the exact top-limit results your approximate search also returned; average it over several photos to get a number you can trust.
the knee The point where raising numCandidates stops buying much recall but keeps adding latency — the smallest value within your tolerance, and where you stop.
How to measure, not guess
Recall here is concrete: for each test query, run the ANN search and an exact: true ENN search with the same
limit, and recall@limit is the fraction of the ENN top-limit that your ANN result also returned. Average
it over a handful of representative query photos. Sweep numCandidates (say 25 → 400) and watch two curves:
recall rises and flattens toward 1.0, while p95 latency rises roughly with the candidate pool. The sweet spot
is the knee — the smallest numCandidates whose recall is within your tolerance of the ENN baseline (for
a precision-sensitive shopping match, aim high). Because ENN is O(n), use it only as a measurement tool on
a small catalog, never as the production path. Re-run the sweep if you change the embedding model, dimensions,
or the catalog’s size — the knee moves with all three.
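The two calculations described above, recall@limit for one query and picking the knee from a sweep, are short enough to sketch directly. The function names and the SweepPoint shape are hypothetical, and the sweep numbers below are made up purely to exercise the code; they are not measurements.

```typescript
// Fraction of the exact (ENN) top-limit ids that the ANN result also returned.
function recallAtLimit(annIds: string[], ennIds: string[]): number {
  if (ennIds.length === 0) return 1; // nothing to find, so nothing was missed
  const ann = new Set(annIds);
  const hits = ennIds.filter((id) => ann.has(id)).length;
  return hits / ennIds.length;
}

interface SweepPoint {
  numCandidates: number;
  recall: number; // mean recall@limit over the test queries
  p95Ms: number; // p95 latency in milliseconds
}

// Smallest numCandidates whose recall is within `tolerance` of the ENN baseline (recall 1.0).
function pickKnee(points: SweepPoint[], tolerance: number): SweepPoint | undefined {
  return [...points]
    .sort((a, b) => a.numCandidates - b.numCandidates)
    .find((p) => p.recall >= 1 - tolerance);
}

// One query: ANN found three of the five exact neighbours.
const oneQueryRecall = recallAtLimit(["a", "b", "c", "x", "y"], ["a", "b", "c", "d", "e"]);
console.log(oneQueryRecall); // 0.6

// Illustrative sweep with made-up numbers.
const sweep: SweepPoint[] = [
  { numCandidates: 25, recall: 0.81, p95Ms: 12 },
  { numCandidates: 50, recall: 0.93, p95Ms: 15 },
  { numCandidates: 100, recall: 0.985, p95Ms: 21 },
  { numCandidates: 200, recall: 0.99, p95Ms: 34 },
];
const knee = pickKnee(sweep, 0.02);
console.log(knee?.numCandidates); // 100
```

pickKnee returns undefined when no point in the sweep reaches the tolerance, which is a signal to extend the sweep upward rather than to settle for the largest value tried.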
Turn no-matches into demand data
Optional add-on Intermediate
Every “no confident match” is a customer asking for something you don’t stock. Decide to log those misses — the descriptor and the near-miss scores — to a separate analytics collection instead of throwing them away.
New in this step
separate collection Writing misses to search_misses, not products, so analytics writes never touch the catalog the recognise path reads.
privacy-aware logging Storing only the descriptor and scores (what was wanted), never the raw photo (who asked) — capture the demand signal without holding personal images.
The signal hiding in your failures — captured privately
The recognise pipeline already computes everything a demand signal needs: the Gemini Vision descriptor
(what the customer photographed) and the scores of the nearest products (how close you came). When every
score falls below the threshold T, that’s not just a UX dead-end — it’s a data point: someone wanted this,
and you couldn’t match it. Write each miss to a dedicated search_misses collection: the descriptor fields
(category, brand, colour, form, visibleText, attributes), the top near-miss {name, score}
entries, and a timestamp. Over time that collection is a real-time ledger of unmet demand, far richer than a
restock guess. Keep it privacy-aware: store the descriptor and scores, not the raw user image, by default
— you’re logging what was wanted, not who asked. (If you ever need images for QA, make it opt-in and
short-retention; the descriptor alone drives the analytics.) Use a separate collection so analytics writes
never touch the catalog the recognise path reads.
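One way to make the privacy rule hard to break is to build the miss through a single function that copies the descriptor field by field, so an image payload cannot ride along by accident. This is a sketch with hypothetical type and function names; the last two sample products are made up to show the top-3 cut.

```typescript
interface Descriptor {
  category: string;
  brand: string;
  colour: string;
  form: string;
  visibleText: string;
  attributes: string[];
}

interface ScoredMatch {
  name: string;
  score: number;
}

interface SearchMiss {
  at: Date;
  descriptor: Descriptor;
  nearMisses: ScoredMatch[];
  threshold: number;
}

function buildMiss(descriptor: Descriptor, ranked: ScoredMatch[], threshold: number, at: Date = new Date()): SearchMiss {
  // Copy only the known descriptor fields; anything else on the input is dropped.
  const { category, brand, colour, form, visibleText, attributes } = descriptor;
  const nearMisses = ranked
    .filter((m) => m.score < threshold) // below-T only
    .sort((a, b) => b.score - a.score) // best near-miss first
    .slice(0, 3)
    .map(({ name, score }) => ({ name, score })); // keep name + score, nothing else
  return {
    at,
    descriptor: { category, brand, colour, form, visibleText, attributes: [...attributes] },
    nearMisses,
    threshold,
  };
}

// Input that carries an extra field the miss must not keep.
const rawDescriptor = {
  category: "sneakers", brand: "Unknown", colour: "teal",
  form: "high-top", visibleText: "", attributes: ["canvas", "striped"],
  imageBase64: "not-for-storage",
};
const rankedMatches: ScoredMatch[] = [
  { name: "Trailblazer Low", score: 0.62 },
  { name: "Court Classic", score: 0.58 },
  { name: "Ridge Runner", score: 0.41 },
  { name: "Metro Slip", score: 0.3 },
];

const sampleMiss = buildMiss(rawDescriptor, rankedMatches, 0.75);
console.log(sampleMiss.nearMisses.length); // 3
console.log("imageBase64" in sampleMiss.descriptor); // false
```

Both backends can build the document the same way, which keeps the search_misses shape identical whichever one is deployed.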
The miss document (shape)
// search_misses collection — one doc per "no confident match"
{
  at: ISODate("2026-06-20T10:21:00Z"),
  descriptor: { category: "sneakers", brand: "Unknown", colour: "teal",
                form: "high-top", visibleText: "", attributes: ["canvas", "striped"] },
  nearMisses: [ { name: "Trailblazer Low", score: 0.62 },
                { name: "Court Classic", score: 0.58 } ], // best below-T scores
  threshold: 0.75
  // NOTE: no raw image stored — descriptor + scores only (privacy-aware default)
}
Write the miss in the recognise path (Go)
Optional add-on Intermediate
In the Go /recognize handler, when nothing clears T, insert a miss document into search_misses before returning the clean no-match — fire-and-forget so analytics never slow or break the response.
New in this step
fire-and-forget Kicking off the analytics insert and returning the response immediately; the customer’s no-match is never delayed by, or dependent on, the write.
goroutine A lightweight concurrent function (go func(){...}()); it runs the insert off the request path with its own background context.
InsertOne The driver call that writes one miss document into the search_misses collection; log its error, never return it to the client.
Log the miss without coupling it to the response
You already detect the no-match case (the confident list is empty). At that point, build the miss document
from the descriptor you computed and the below-T matches you ranked, and InsertOne it into a separate
search_misses collection. Do it without blocking or failing the request: the customer still gets their
honest “no confident match” even if the analytics write errors, so log-and-ignore the insert error (or hand it
to a small buffered channel / goroutine). Reuse the existing Mongo client — just a different collection handle
— and never store the raw image. This is the only new code on the hot path; the ranking that consumes it is
the same for both backends and follows the handler steps.
Insert on no-match (v2 driver)
// inside POST /recognize, after thresholding, when confident is empty:
if len(confident) == 0 {
    miss := bson.D{
        {Key: "at", Value: time.Now()},
        {Key: "descriptor", Value: descriptor}, // the typed Vision descriptor
        {Key: "nearMisses", Value: topNearMisses(matches, 3)}, // [{name, score}] below T
        {Key: "threshold", Value: threshold},
    }
    go func() { // fire-and-forget: analytics must never break the response
        // Own context, not the request's (that one is cancelled on return); the 5s bound is illustrative.
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if _, err := misses.InsertOne(ctx, miss); err != nil {
            slog.Error("search_miss insert failed", "err", err)
        }
    }()
    // the no-match branch still carries descriptor + filterApplied (the same shape as a match response)
    // NOTE: a nil slice encodes as JSON null by default; pass an empty slice of the match type if clients expect [].
    writeJSON(w, recognizeResp{Descriptor: descriptor, FilterApplied: filterApplied, Matches: nil, NoMatch: true})
    return
}
Agent prompt — paste into an agent with repo access
Role: Senior Go engineer in this repo.
Context: POST /recognize computes a Vision descriptor and ranked matches with scores, then applies threshold T; the no-match branch returns { descriptor, filterApplied, matches: [], noMatch: true }. Mongo client is available; add a "search_misses" collection handle.
Task: On the no-match branch, record a miss document to search_misses without affecting the response.
Requirements:
- Build the miss from the descriptor (category/brand/colour/form/visibleText/attributes), the top 3 below-T {name, score} near-misses, the threshold, and a timestamp.
- Insert into a SEPARATE "search_misses" collection; the recognise path's catalog read is untouched.
- The write is fire-and-forget: an insert error is logged, never returned to the client; the no-match JSON is identical to before.
- Privacy-aware: do NOT store the raw uploaded image — descriptor + scores only.
Tests / acceptance:
- An out-of-catalog photo yields noMatch:true AND one new search_misses doc with the descriptor and below-T scores; no raw image field is present.
- Simulating an insert failure still returns the normal noMatch response.
Output: a unified diff plus where the analytics write is decoupled from the response.
Write the miss in the recognise path (TypeScript)
Optional add-on Intermediate
Mirror the miss-logging in the TypeScript /recognize handler: on the no-match branch, insert the descriptor and near-miss scores into search_misses without awaiting it on the response path.
New in this step
fire-and-forget Starting the insert without await, so the no-match JSON returns immediately, independent of the write’s outcome.
void with .catch void promise.catch(log) deliberately ignores the result while still handling a rejection, so an analytics failure is logged, never thrown onto the response.
insertOne The driver call that writes one miss document into search_misses; identical in shape to the Go handler’s write for parity.
Same write, Node driver
The TypeScript path matches the Go one. In the no-match branch, assemble the same miss document — descriptor,
top below-T {name, score} near-misses, threshold, timestamp — and insertOne it into a separate
search_misses collection handle. Don’t await it on the response path (or wrap it so a rejection is caught
and logged): the no-match JSON returns regardless. Reuse the existing MongoClient, and store no raw image.
Keeping both writes identical means the demand ledger is the same whichever backend is deployed — the same
parity lesson as the recognise pipeline.
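The decoupling can be checked without a database by swapping the insert for a function that fails. The sketch below uses hypothetical names (fireAndForget, respondNoMatch) and a simplified response shape; it also catches a synchronous throw, which a bare void promise.catch(...) would not cover if the call itself throws before returning a promise.

```typescript
type InsertFn = (doc: object) => Promise<unknown>;

// Start the task, never await it, and route both sync throws and rejections to onError.
function fireAndForget(task: () => Promise<unknown>, onError: (err: unknown) => void): void {
  try {
    void task().catch(onError);
  } catch (err) {
    onError(err); // the call threw before it produced a promise
  }
}

// Simplified no-match branch: the response never depends on the insert's outcome.
function respondNoMatch(insertMiss: InsertFn, miss: object, log: (err: unknown) => void) {
  fireAndForget(() => insertMiss(miss), log);
  return { matches: [] as unknown[], noMatch: true };
}

// Simulated failures standing in for a driver error.
const rejectingInsert: InsertFn = () => Promise.reject(new Error("simulated outage"));
const throwingInsert: InsertFn = () => {
  throw new Error("simulated sync failure");
};

const asyncErrors: unknown[] = [];
const syncErrors: unknown[] = [];
const resAfterReject = respondNoMatch(rejectingInsert, { at: new Date() }, (e) => asyncErrors.push(e));
const resAfterThrow = respondNoMatch(throwingInsert, { at: new Date() }, (e) => syncErrors.push(e));

console.log(resAfterReject.noMatch); // true
console.log(resAfterThrow.noMatch); // true
```

In the real handler the log callback would be the same console.error (or structured logger) call the listing below uses; the point of the sketch is only that neither failure mode reaches the response.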
Insert on no-match (mongodb driver)
// inside POST /recognize, after thresholding, when confident.length === 0:
if (confident.length === 0) {
  const miss = {
    at: new Date(),
    descriptor, // the typed Vision descriptor
    nearMisses: matches.slice(0, 3).map(m => ({ name: m.name, score: m.score })),
    threshold,
  };
  // fire-and-forget: analytics must never break the response
  void misses.insertOne(miss).catch(err => console.error("search_miss insert failed", err));
  // the no-match branch still carries descriptor + filterApplied (the same shape as a match response)
  return c.json({ descriptor, filterApplied, matches: [], noMatch: true });
}
Agent prompt — paste into an agent with repo access
Role: Senior TypeScript engineer in this repo.
Context: POST /recognize computes a descriptor and ranked scored matches, applies threshold T, and returns { descriptor, filterApplied, matches: [], noMatch: true } on a miss. The mongodb client is available; add a "search_misses" collection handle.
Task: On the no-match branch, record a miss document to search_misses, matching the Go handler.
Requirements:
- Build the miss from the descriptor, the top 3 below-T {name, score} near-misses, the threshold, and a timestamp.
- insertOne into a SEPARATE "search_misses" collection; do not touch the catalog read.
- Do not await on the response path (or catch+log the rejection): the no-match JSON is returned regardless of the insert outcome.
- Privacy-aware: store no raw image — descriptor + scores only.
Tests / acceptance:
- An out-of-catalog photo returns noMatch:true and creates one search_misses doc with descriptor + below-T scores; no image field; behaviour matches the Go handler.
- An insert rejection does not change the response.
Output: a unified diff plus a note on keeping the Go and TS miss documents identical.
Rank the most-wanted unstocked items
Optional add-on Intermediate
Aggregate search_misses into a leaderboard: group the misses by what was wanted (category, brand) and rank by how often it’s been requested — a real-time demand report.
New in this step
aggregation pipeline A sequence of stages MongoDB runs over a collection to reshape and summarise it — here it turns raw misses into a demand report without app code.
$group Buckets documents by a key (category + brand) and computes per-bucket values like a request count and an average near-miss score.
$sort Orders the grouped results — here by request count descending, so the most-wanted unstocked items come first.
$limit Caps the output to a top-N (e.g. 20), so the leaderboard stays a short, actionable list.
From a ledger of misses to a buying signal
A pile of miss documents only becomes useful when you summarise it. An aggregation pipeline does the whole
job in the database: $group the misses by the descriptor fields that define “the same kind of thing”
(descriptor.category plus descriptor.brand, optionally colour/form), count each group with $sum: 1, $sort
descending, and $limit to a top-N. Add the average best near-miss score per group and you also learn how
close you came — a group with high demand and high near-miss scores is a strong “stock something almost like
this” signal, while high demand with low scores is genuinely new. This ranking is common to both backends
(it’s one aggregation, no language-specific logic), and because it reads only search_misses it never burdens
the recognise path. Surface it on an internal GET /analytics/top-misses endpoint or a scheduled report.
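For the endpoint, the pipeline can be produced by a small builder so the optional time window and the top-N cap are validated in one place. This is a sketch: the function name and option names are hypothetical, and the stages simply mirror the group, sort, limit, and project steps described above.

```typescript
type Stage = Record<string, unknown>;

function topMissesPipeline(opts: { since?: Date; topN?: number } = {}): Stage[] {
  const topN = opts.topN ?? 20;
  if (!Number.isInteger(topN) || topN < 1) {
    throw new RangeError("topN must be a positive integer");
  }

  const pipeline: Stage[] = [];
  if (opts.since) {
    pipeline.push({ $match: { at: { $gte: opts.since } } }); // optional time window
  }
  pipeline.push(
    {
      $group: {
        _id: { category: "$descriptor.category", brand: "$descriptor.brand" },
        requests: { $sum: 1 },
        avgNearMiss: { $avg: { $max: "$nearMisses.score" } }, // mean of each doc's best near-miss
        lastRequested: { $max: "$at" },
      },
    },
    { $sort: { requests: -1 } },
    { $limit: topN },
    {
      // Flatten the group key so callers get { category, brand, ... } with no nested _id.
      $project: {
        _id: 0,
        category: "$_id.category",
        brand: "$_id.brand",
        requests: 1,
        avgNearMiss: 1,
        lastRequested: 1,
      },
    },
  );
  return pipeline;
}

const allTime = topMissesPipeline();
const windowed = topMissesPipeline({ since: new Date("2026-06-01T00:00:00Z"), topN: 10 });
console.log(allTime.length); // 4
console.log(Object.keys(windowed[0])[0]); // $match
```

With the official Node driver the result would be passed to something like db.collection("search_misses").aggregate(pipeline).toArray(); the Go backend can assemble the same stages as bson.D values so both return identical rows.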
Most-requested unstocked items (aggregation)
db.search_misses.aggregate([
  // optional: { $match: { at: { $gte: ISODate("2026-06-01") } } }, // a time window
  { $group: {
    _id: { category: "$descriptor.category", brand: "$descriptor.brand" },
    requests: { $sum: 1 },
    avgNearMiss: { $avg: { $max: "$nearMisses.score" } }, // how close we got
    lastRequested: { $max: "$at" },
  }},
  { $sort: { requests: -1 } },
  { $limit: 20 },
  // flatten to the contract's shape: { category, brand, requests, avgNearMiss, lastRequested }
  { $project: {
    _id: 0,
    category: "$_id.category", brand: "$_id.brand",
    requests: 1, avgNearMiss: 1, lastRequested: 1,
  }},
])
Agent prompt — paste into an agent with repo access
Role: Senior backend engineer in this repo (use the selected backend).
Context: a search_misses collection holds { at, descriptor, nearMisses:[{name,score}], threshold } documents.
Task: Add GET /analytics/top-misses returning the most-requested unstocked items.
Requirements:
- Aggregate search_misses: $group by descriptor.category + descriptor.brand; count requests; compute avg of each doc's best near-miss score; track lastRequested.
- $sort by requests desc, $limit to a top-N (default 20); accept an optional ?since date that adds a $match window.
- $project the grouped result to the FLAT contract shape { category, brand, requests, avgNearMiss, lastRequested } (lift category/brand out of the _id group key; drop _id) — do not return a nested _id object.
- Read ONLY search_misses (never the catalog); this ranking is identical across backends — keep it as one aggregation.
- Frame the output as demand signal: high requests + high avg near-miss = "stock something close"; high requests + low scores = "genuinely new".
Tests / acceptance:
- Seeding misses for the same (category, brand) several times ranks that pair at the top with the right count, each entry exposing flat category + brand fields (no nested _id).
- The ?since filter excludes older misses.
Output: a unified diff plus the aggregation and the endpoint shape.
Where to take it next
- Go deep on the store that carries this whole build: the MongoDB track — documents, aggregation, and Atlas Search / Vector Search.
- Sharpen the backend you chose: Go or TypeScript.
- Build the recognise UI on another platform: Jetpack Compose, Flutter, or SwiftUI.
- See the database contrast on the Compare page: MongoDB scores 5 here and Postgres 2 — the exact inverse of Aurora Commerce, where a transactional checkout makes Postgres the 5. And unlike Helix Assistant’s text RAG, Catalens matches a photo to product records, not a prose answer.