MongoDB stores data as BSON documents inside collections — close to the JSON your application already
speaks — and lets the shape evolve without a migration. This track moves from the first insertOne to
compound indexes, the aggregation pipeline, $lookup, replica sets, transactions (and when to design them
away), and Atlas Search — tagged by level so you can read only as deep as you need.
Run MongoDB and open the shell
Beginner: Start a local MongoDB instance with Docker, then connect to it with mongosh.
Why mongosh, and what a connection string is
mongosh is the official MongoDB Shell — a JavaScript REPL connected to a running server. You point it at
a connection string (mongodb://host:port), and from there you select a database with use, then act on
collections (the document equivalent of tables). Nothing is created until you write the first document;
databases and collections come into existence on first insert. A managed alternative is MongoDB Atlas,
where the same connection string is mongodb+srv://... and you skip running the server yourself.
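To make the anatomy of a connection string concrete, here is a minimal sketch that splits one into its parts with the URL class built into Node.js. The describeConnectionString helper and the Atlas hostname are hypothetical, and this is only an illustration: real drivers parse these strings themselves and accept forms, such as comma-separated host lists, that this sketch does not handle.

```javascript
// Illustrative only: MongoDB drivers do their own connection-string parsing.
// describeConnectionString is a hypothetical helper, not a driver API.
function describeConnectionString(uri) {
  const url = new URL(uri); // WHATWG URL, built into Node.js
  return {
    scheme: url.protocol.replace(/:$/, ""),            // "mongodb" or "mongodb+srv"
    host: url.hostname,
    port: url.port || null,                             // SRV-style URIs carry no port
    database: url.pathname.replace(/^\//, "") || null   // optional default database
  };
}

const local = describeConnectionString("mongodb://localhost:27017/tidal");
// The hostname below is made up for the example.
const atlas = describeConnectionString("mongodb+srv://cluster0.example.mongodb.net/tidal");
console.log(local, atlas);
```

The two schemes differ mainly in how the host is found: the plain form names a host and port directly, while the +srv form names a DNS record that the driver resolves to the cluster's members.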
Start a server and connect
# A throwaway local server on the default port 27017
docker run -d --name mongo -p 27017:27017 mongo:7
# Connect with the official shell
mongosh "mongodb://localhost:27017"
# inside mongosh:
# use tidal // switch to (and lazily create) the "tidal" database
# db.getName() // -> tidal
Insert and read your first documents
Beginner: Use insertOne / insertMany to write documents, then read them back with find.
Documents, _id, and the implicit schema
A document is a BSON object — JSON plus extra types like ObjectId, Date, and 64-bit integers. Every
document gets a unique _id; if you omit it, MongoDB generates an ObjectId (a 12-byte, time-ordered
identifier). Two documents in the same collection need not share the same fields — the schema is implicit
and per-document, which is exactly what makes early iteration fast. find() returns a cursor; pass a filter
object to narrow it, and a second object to project only the fields you want.
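The filter and projection semantics described above can be modelled in a few lines of plain JavaScript. This is a deliberately simplified sketch, not MongoDB's matching engine: the hypothetical matches and project helpers handle only top-level equality filters and inclusion projections, plus the one array rule that matters here (a scalar filter value matches an array field if any element equals it).

```javascript
// Simplified model of find(filter, projection); not MongoDB's implementation.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, wanted]) => {
    const actual = doc[field];
    // Querying an array field with a scalar matches any element.
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Inclusion projection: keep fields flagged 1; _id stays unless set to 0.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Sample documents with plain numbers standing in for ObjectId values.
const posts = [
  { _id: 1, author: "ada", text: "first post", tags: ["intro", "hello"], likes: 0 },
  { _id: 2, author: "ada", text: "second", tags: ["update"], likes: 3 },
  { _id: 3, author: "lin", text: "hi all", tags: ["intro"], likes: 1 }
];

const byAda = posts
  .filter((d) => matches(d, { author: "ada" }))
  .map((d) => project(d, { text: 1, likes: 1, _id: 0 }));
const introPosts = posts.filter((d) => matches(d, { tags: "intro" }));
console.log(byAda, introPosts.map((d) => d._id));
```

The documents in byAda carry no _id because the projection suppresses it, and the tag filter matches posts 1 and 3 even though their tags arrays differ.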
Write and read documents
// mongosh, database "tidal"
db.posts.insertOne({
  author: "ada",
  text: "first post",
  tags: ["intro", "hello"],
  likes: 0,
  createdAt: new Date()
})
db.posts.insertMany([
  { author: "ada", text: "second", tags: ["update"], likes: 3, createdAt: new Date() },
  { author: "lin", text: "hi all", tags: ["intro"], likes: 1, createdAt: new Date() }
])
// filter + projection: posts by ada, show only text and likes
db.posts.find({ author: "ada" }, { text: 1, likes: 1, _id: 0 })
Chat prompt: paste into a chat to get the code
Role: MongoDB teacher. The reader has no database in front of them, so return a complete, runnable mongosh script.
Task: Show a single mongosh script that seeds a "posts" collection and runs three reads.
Requirements:
- Insert at least four post documents with fields: author (string), text (string), tags (string array),
likes (int), createdAt (Date). Use insertMany.
- Read 1: all posts by one author, projecting only text and likes (suppress _id).
- Read 2: posts that contain a given tag (filter on the array field directly).
- Read 3: posts with likes greater than or equal to 2, sorted by likes descending.
Tests / acceptance (describe, since no live DB):
- Read 1 returns documents without an _id field.
- Read 2 matches because querying an array field matches any element.
- Read 3 is ordered highest-likes first.
Output: the complete mongosh script, no commentary.
Update and delete by filter
Beginner: Change documents with updateOne / updateMany and the update operators, and remove them with deleteOne.
Update operators and atomic counters
Updates use operators, not a replacement object: $set assigns fields, $inc adds to a number,
$push appends to an array, $addToSet appends only if the value is absent, and $pull removes matching elements. A single-document update is atomic, so $inc is a
safe, race-free counter without a transaction — two concurrent likes both land. updateOne touches the
first match; updateMany touches all. Add { upsert: true } to insert the document when no match exists.
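The lost-update race that $inc avoids can be shown without a database. The sketch below uses a plain object as a stand-in for one stored document and interleaves two clients by hand; the inc function is a hypothetical model of a server applying each increment as one indivisible step, not driver code.

```javascript
// In-memory stand-in for one stored document (not a real database).
const post = { likes: 0 };

// Read-modify-write: both clients read the same value, then both write back.
const seenByA = post.likes; // client A reads 0
const seenByB = post.likes; // client B reads 0 before A has written
post.likes = seenByA + 1;   // A writes 1
post.likes = seenByB + 1;   // B also writes 1, so A's like is lost
const afterReadModifyWrite = post.likes;

// Operator style: each client sends "add 1" and the store applies each
// increment as a single step, which is what $inc asks the server to do.
function inc(doc, field, by) {
  doc[field] += by;
}
post.likes = 0;
inc(post, "likes", 1); // client A
inc(post, "likes", 1); // client B
const afterInc = post.likes;

console.log({ afterReadModifyWrite, afterInc });
```

The first approach ends with one like recorded for two clicks; the second ends with two, because neither client ever writes a value computed from a stale read.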
Mutate documents
// someId stands for the _id of an existing post, for example:
const someId = db.posts.findOne({ author: "ada" })._id
// atomic like counter: no read-modify-write race
db.posts.updateOne({ _id: someId }, { $inc: { likes: 1 } })
// add a tag only if absent, and set an edited flag
db.posts.updateOne(
  { _id: someId },
  { $addToSet: { tags: "edited" }, $set: { editedAt: new Date() } }
)
// upsert: update if present, insert if not
db.profiles.updateOne(
  { author: "ada" },
  { $set: { displayName: "Ada L." }, $setOnInsert: { createdAt: new Date() } },
  { upsert: true }
)
db.posts.deleteOne({ _id: someId })
Agent prompt: paste into an agent with repo access
Role: Senior backend engineer working in a Node.js service that uses the official mongodb driver.
Context: A "posts" collection exists. The driver is `mongodb` (v6). A db handle is available as `db`.
Task: Implement and unit-test a likePost(postId) function and an editPost(postId, text) function.
Requirements:
- likePost uses updateOne with $inc on "likes" and $set on "updatedAt"; it must be a single atomic update
(no read-then-write).
- editPost uses $set on "text" and "updatedAt"; both return the modifiedCount.
- Use the BSON ObjectId type to coerce the string id; throw on an invalid id.
Tests / acceptance:
- Use mongodb-memory-server (in-memory MongoDB) so tests need no external DB.
- Test: two sequential likePost calls leave likes incremented by exactly 2.
- Test: editPost on a non-existent id returns modifiedCount 0.
- `npm test` passes.
Output: a unified diff plus a one-paragraph note on why $inc avoids a lost-update race.
Model a feed by embedding instead of joining
Intermediate: Store a post and its small, bounded sub-data (author summary, top tags) inside one document so a feed read is a single query.
Embed vs reference — the core modelling decision
The document model rewards embedding data you read together. A feed post that carries a denormalised
author summary ({ id, displayName, avatarUrl }) and its tags renders from one find — no join, no second
round trip. The trade-off is duplication: if a display name changes you update it in many places. The rule
of thumb is embed bounded, read-together data; reference (store an id) for unbounded or independently
mutated data — comments that can grow forever, or an author record edited on its own. This is the opposite
instinct from relational normalisation, and it is deliberate: you optimise for the read shape.
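The duplication cost mentioned above is easiest to see as the set of writes a rename would need. This is a sketch under assumptions: renameAuthorEverywhere is a hypothetical helper, and it assumes the author record lives in a profiles collection keyed by author while each post embeds a copy under author.displayName.

```javascript
// Hypothetical helper: builds the filter/update pairs for a fan-out rename.
function renameAuthorEverywhere(authorId, newDisplayName) {
  return {
    // the author record itself, kept once in its own collection (assumed: profiles)
    profile: {
      filter: { author: authorId },
      update: { $set: { displayName: newDisplayName } }
    },
    // every post that embeds a copy of the display name
    posts: {
      filter: { "author.id": authorId },
      update: { $set: { "author.displayName": newDisplayName } }
    }
  };
}

const ops = renameAuthorEverywhere("ada", "Ada Lovelace");
// In mongosh these would be passed along as, for example:
//   db.profiles.updateOne(ops.profile.filter, ops.profile.update)
//   db.posts.updateMany(ops.posts.filter, ops.posts.update)
console.log(JSON.stringify(ops, null, 2));
```

One write touches the source of truth and one updateMany touches every embedded copy. That second write is the price of the single-query feed read, which is why embedding suits fields that change rarely.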
An embedded feed document
db.posts.insertOne({
  text: "shipped the feed",
  author: { id: "ada", displayName: "Ada L.", avatarUrl: "/a/ada.png" }, // embedded summary
  tags: ["release", "feed"],
  likes: 0,
  commentCount: 0, // a counter, not the comments themselves
  createdAt: new Date()
})
// one query renders a feed card, no join needed
db.posts.find(
  { tags: "feed" },
  { text: 1, "author.displayName": 1, likes: 1, createdAt: 1 }
).sort({ createdAt: -1 }).limit(20)
Index the queries your feed actually runs
Intermediate: Create single-field and compound indexes that match your real filters and sort order, then confirm they’re used with explain.
Compound index order follows your query (ESR)
Without an index, a query scans every document. An index is an ordered B-tree MongoDB walks instead. For
compound indexes, field order matters: follow Equality, Sort, Range — put fields you match exactly
first, then the field you sort on, then range fields. A feed query that filters author.id and sorts by
createdAt wants { "author.id": 1, createdAt: -1 }. An index on an array field (like tags) is a
multikey index automatically — one entry per element. Verify with explain("executionStats"): you want
an IXSCAN stage, not a COLLSCAN.
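As a way to internalise the Equality, Sort, Range ordering, the sketch below builds an index key document from the three groups of fields. The esrIndex helper is hypothetical (it is not part of mongosh or any driver), and the likes range field in the second example is an assumed query shape used only for illustration.

```javascript
// Hypothetical helper that orders index keys Equality, Sort, Range.
// equality and range are field names; sort entries are [field, direction] pairs.
function esrIndex({ equality = [], sort = [], range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, direction] of sort) spec[field] = direction;
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1; // skip fields already placed
  }
  return spec; // JavaScript keeps string keys in insertion order
}

// Filter on author.id, sort by createdAt descending.
const feedIndex = esrIndex({ equality: ["author.id"], sort: [["createdAt", -1]] });
// Same query with an additional range condition on likes (assumed).
const popularIndex = esrIndex({
  equality: ["author.id"],
  sort: [["createdAt", -1]],
  range: ["likes"]
});
console.log(feedIndex, popularIndex);
// In mongosh: db.posts.createIndex(feedIndex)
```

Key order in the resulting object is the index's field order, so the helper relies on JavaScript preserving insertion order for string keys.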
Create indexes and check the plan
// compound index supporting filter-by-author + sort-by-time
db.posts.createIndex({ "author.id": 1, createdAt: -1 })
// multikey index on the tags array (created the same way)
db.posts.createIndex({ tags: 1 })
// did the planner use it? look for stage "IXSCAN", totalDocsExamined low
db.posts.find({ "author.id": "ada" })
  .sort({ createdAt: -1 })
  .explain("executionStats")
Agent prompt: paste into an agent with repo access
Role: Senior backend engineer optimising a MongoDB-backed feed service.
Context: A "posts" collection holds documents with fields author.id (string), tags (string array),
createdAt (Date), likes (int). The hot query filters by author.id and sorts by createdAt descending; a
second query filters by a single tag and sorts by createdAt descending.
Task: Add the minimal set of indexes to make both queries use an index scan, and add a script that proves it.
Requirements:
- Choose compound vs single-field per the Equality-Sort-Range rule; justify each in a code comment.
- Do not create redundant indexes (an index whose keys are a prefix of another index you create).
- Provide a seed script (>= 2000 posts) and an explain() check for each hot query.
Tests / acceptance:
- For each hot query, explain("executionStats") shows winningPlan stage IXSCAN (not COLLSCAN).
- totalKeysExamined is close to nReturned; totalDocsExamined is not a full-collection scan.
Output: a unified diff (index creation + seed + explain script) plus a short note on the index choices.
Aggregate with the pipeline: $match, $group, $project
Intermediate: Build an aggregation pipeline that filters, groups, and reshapes documents to compute summaries in the database.
The pipeline is your in-database analytics engine
The aggregation pipeline is an array of stages; each consumes the previous stage’s documents and emits
new ones. $match filters early (put it first so an index can help and less data flows downstream),
$group collapses documents by a key and accumulates ($sum, $avg, $max), and $project reshapes the
output — add, rename, or drop fields. This is how you compute “likes per author this week” or “top tags”
without pulling raw documents into the application and looping in code.
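For readers who think in array methods, the stages map roughly onto filter, reduce, and map. The sketch below computes likes per author over a small in-memory array to show what each stage contributes. It is the application-side loop the pipeline replaces, written out only to make the stage semantics visible; the sample data and the fixed cutoff date are made up.

```javascript
// Made-up sample data; the last post falls outside the time window.
const posts = [
  { author: { id: "ada" }, likes: 3, createdAt: new Date("2024-05-06") },
  { author: { id: "ada" }, likes: 2, createdAt: new Date("2024-05-07") },
  { author: { id: "lin" }, likes: 4, createdAt: new Date("2024-05-07") },
  { author: { id: "lin" }, likes: 9, createdAt: new Date("2024-01-01") }
];
const since = new Date("2024-05-01");

// $match: filter first so later steps see fewer documents.
const matched = posts.filter((p) => p.createdAt >= since);

// $group: one bucket per author.id, accumulating $sum-style totals.
const groups = new Map();
for (const p of matched) {
  const g = groups.get(p.author.id) ?? { totalLikes: 0, posts: 0 };
  g.totalLikes += p.likes;
  g.posts += 1;
  groups.set(p.author.id, g);
}

// $project + $sort: rename the group key to "author", order by totalLikes.
const summary = [...groups]
  .map(([author, g]) => ({ author, ...g }))
  .sort((a, b) => b.totalLikes - a.totalLikes);
console.log(summary);
```

Running the same logic as a pipeline keeps the raw documents on the server and sends back only the summary rows.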
Likes per author, last 7 days
db.posts.aggregate([
  // 1. filter early so the rest of the pipeline sees less data
  { $match: { createdAt: { $gte: new Date(Date.now() - 7 * 864e5) } } },
  // 2. collapse by author, accumulate totals
  { $group: {
      _id: "$author.id",
      totalLikes: { $sum: "$likes" },
      posts: { $sum: 1 }
  } },
  // 3. reshape: keep totalLikes and posts, rename _id to author
  { $project: { _id: 0, author: "$_id", totalLikes: 1, posts: 1 } },
  { $sort: { totalLikes: -1 } },
  { $limit: 10 }
])
Chat prompt: paste into a chat to get the code
Role: MongoDB teacher. The reader has no database — return a complete, runnable mongosh aggregation.
Task: Write an aggregation that produces the top 5 tags by usage across all posts in the "posts" collection.
Requirements:
- Posts have a "tags" string array. Unwind the array so each tag becomes its own document ($unwind).
- Group by tag, count occurrences with $sum: 1.
- Project to { tag, count }, sort by count descending, limit to 5.
Tests / acceptance (describe, since no live DB):
- $unwind produces one document per (post, tag) pair.
- A tag appearing in N posts ends with count N.
- Output documents have exactly the fields tag and count.
Output: the complete aggregate([...]) call, no commentary.
Join across collections with $lookup
Intermediate: When you must reference rather than embed, pull related documents in with a $lookup stage.
$lookup is a left outer join — use it deliberately
$lookup performs a left outer join between collections inside an aggregation: for each input document it
finds matching documents in another collection and attaches them as an array field. It is the escape hatch
for the data you chose to reference — fetching a post’s comments, or hydrating an author record kept in its
own collection. It works, and a $lookup on an indexed localField/foreignField is efficient, but if you
find yourself joining on every read, that is a signal your model may want more embedding. Joins are not
MongoDB’s strong suit the way they are PostgreSQL’s; reach for $lookup, don’t lean on it.
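The left-outer-join behaviour can be modelled over two in-memory arrays. The lookup function below is a simplified stand-in that supports only the localField / foreignField / as form with exact equality; it is not how the server executes the stage, and the sample data is made up.

```javascript
// Made-up sample data: post 2 has no comments.
const posts = [
  { _id: 1, text: "shipped the feed" },
  { _id: 2, text: "quiet post" }
];
const comments = [
  { postId: 1, text: "nice" },
  { postId: 1, text: "congrats" }
];

// Simplified stand-in for $lookup with localField / foreignField / as.
function lookup(input, from, localField, foreignField, as) {
  return input.map((doc) => ({
    ...doc,
    // every input document is kept; unmatched ones get an empty array
    [as]: from.filter((other) => other[foreignField] === doc[localField])
  }));
}

const hydrated = lookup(posts, comments, "_id", "postId", "comments");
console.log(hydrated.map((p) => [p._id, p.comments.length]));
```

Post 2 survives with an empty comments array, which is the "left outer" part. The model also scans all of comments once per post, which is the work an index on the foreign field saves the server.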
Hydrate posts with their comments
db.posts.aggregate([
  { $match: { "author.id": "ada" } },
  { $lookup: {
      from: "comments",        // the other collection
      localField: "_id",       // posts._id
      foreignField: "postId",  // comments.postId
      as: "comments"           // attached as an array
  } },
  { $project: {
      text: 1,
      commentCount: { $size: "$comments" } // count without returning bodies
  } }
])
Agent prompt: paste into an agent with repo access
Role: Senior backend engineer working in a MongoDB-backed service (mongodb driver v6, Node.js).
Context: Two collections — "posts" (with _id) and "comments" (each has postId referencing posts._id,
plus text and createdAt). comments.postId is indexed.
Task: Implement getPostWithComments(postId) returning the post plus its comments sorted oldest-first,
using a single aggregation with $lookup.
Requirements:
- Use the $lookup "pipeline" form (or a sub-$sort) so comments come back sorted by createdAt ascending.
- Return null when the post does not exist (empty aggregation result).
- Coerce the string id to ObjectId; throw on an invalid id.
Tests / acceptance:
- Use mongodb-memory-server so no external DB is required.
- Seed one post with three comments inserted out of order; assert the returned comments are time-ordered.
- Seed a post with zero comments; assert comments is an empty array, not undefined.
- `npm test` passes.
Output: a unified diff plus one sentence on when you'd embed comments instead of joining.
Run a replica set and read your own writes
Advanced: Run MongoDB as a replica set, and use read/write concerns to control durability and consistency.
Replica sets, elections, and concerns
Production MongoDB runs as a replica set: one primary takes writes, secondaries replicate the oplog, and
if the primary fails the members elect a new one — that is how you get high availability and automatic
failover. You tune the guarantees per operation. A write concern of { w: "majority" } waits until a
majority of members have the write, so it survives a failover; { w: 1 } returns after just the primary.
A read concern of "majority" returns only data acknowledged by a majority (no reading writes that
could be rolled back). Multi-document transactions (next step) require a replica set — another reason it
is the realistic baseline, not a single mongod.
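A quick piece of arithmetic shows what "majority" means for different set sizes. This sketch assumes every member is a voting, data-bearing member (no arbiters); the two helper names are hypothetical.

```javascript
// Assumes all members vote and hold data; arbiters change the picture.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// Members that can be lost while a majority is still reachable.
function failuresTolerated(members) {
  return members - majority(members);
}

const table = [1, 3, 4, 5].map((n) => ({
  members: n,
  majority: majority(n),
  failuresTolerated: failuresTolerated(n)
}));
console.log(table);
```

A four-member set needs three acknowledgements and tolerates one failure, the same as a three-member set, which is one reason odd member counts are the usual choice.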
Start a one-node replica set
# start mongod as a replica set member
docker run -d --name mongo-rs -p 27017:27017 mongo:7 \
  mongod --replSet rs0 --bind_ip_all
# initialise the set (one-time)
mongosh "mongodb://localhost:27017" --eval 'rs.initiate()'
Tune write and read concerns
// write that survives failover: wait for a majority to acknowledge
db.posts.insertOne(
  { text: "durable", createdAt: new Date() },
  { writeConcern: { w: "majority" } }
)
// read only majority-committed data
db.posts.find({ text: "durable" }).readConcern("majority")
Use a multi-document transaction, and design to avoid it
Advanced: When one logical change must touch several documents atomically, wrap it in a transaction, but first ask whether the model could make it a single-document update instead.
Transactions exist; the model shines when you don't need them
MongoDB has full multi-document ACID transactions (on a replica set): start a session, do several writes,
commitTransaction — all or nothing. They are real and correct. But they carry cost and contention, and
the document model is designed so you often don’t need them: keep the data that must change together in
one document, and a single-document update is already atomic. A like, a comment-count bump, a status flip —
all one updateOne. Reserve transactions for the genuine cross-document invariant (move an item between two
users’ collections). If you reach for transactions on every write, that is the signal a relational database
with row-level transactions — PostgreSQL — may fit the problem better. Honesty here saves you pain later.
A transaction across two collections
const session = db.getMongo().startSession()
session.startTransaction({ writeConcern: { w: "majority" } })
try {
  const posts = session.getDatabase("tidal").posts
  const audit = session.getDatabase("tidal").audit
  posts.updateOne({ _id: postId }, { $set: { archived: true } })
  audit.insertOne({ action: "archive", postId, at: new Date() })
  session.commitTransaction() // both, or neither
} catch (e) {
  session.abortTransaction()
  throw e
} finally {
  session.endSession()
}
Agent prompt: paste into an agent with repo access
Role: Senior backend engineer reviewing a MongoDB data model (mongodb driver v6, Node.js, replica set up).
Context: A teammate wrapped "increment a post's commentCount and insert the comment" in a multi-document
transaction across the posts and comments collections.
Task: Show BOTH a correct transactional implementation AND a transaction-free redesign, then recommend one.
Requirements:
- Transactional version: a session with startTransaction/commit/abort, writeConcern majority, proper
abort on error.
- Transaction-free version: insert the comment and $inc the post's commentCount; explain why a small
drift in commentCount is tolerable and how to reconcile it (periodic count or $lookup $size on read).
- State explicitly the condition under which the transaction IS warranted.
Tests / acceptance:
- Use mongodb-memory-server configured as a replica set so transactions are supported in tests.
- Test the transactional path rolls back the count when the comment insert is forced to fail.
- Test the transaction-free path leaves a recoverable, eventually-consistent count.
- `npm test` passes.
Output: a unified diff with both implementations plus a one-paragraph recommendation.
Validate the schema once the shape settles
Advanced: Attach a JSON Schema validator to a collection so MongoDB rejects malformed documents without giving up flexibility.
Flexible early, governed later
Implicit schema is a gift while you iterate, but once a collection’s shape stabilises you usually want
guardrails so a bad write can’t slip in. MongoDB supports JSON Schema validation per collection: declare
required fields and types with $jsonSchema, set validationLevel and validationAction ("error" to
reject, "warn" to log). You can tighten an existing collection with collMod. This is the discipline that
lets the flexible model scale — flexible where you’re still learning the domain, validated where you’ve
committed to it.
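One way to tighten an existing collection gradually is to log violations first and reject them later. The sketch below builds the collMod command document for both phases. buildValidatorCommand is a hypothetical helper, the schema is a cut-down example, and the two-phase rollout is a suggested practice rather than something MongoDB requires.

```javascript
// Hypothetical helper: builds a collMod command document for db.runCommand.
function buildValidatorCommand(collection, { enforce }) {
  return {
    collMod: collection, // the command name must be the first key
    validator: {
      $jsonSchema: {
        bsonType: "object",
        required: ["text", "author", "createdAt"],
        properties: {
          text: { bsonType: "string", maxLength: 5000 },
          createdAt: { bsonType: "date" }
        }
      }
    },
    // "moderate" does not check updates to existing documents that are already invalid
    validationLevel: "moderate",
    // "warn" logs violations; "error" rejects the write
    validationAction: enforce ? "error" : "warn"
  };
}

const rollout = buildValidatorCommand("posts", { enforce: false });
const enforced = buildValidatorCommand("posts", { enforce: true });
// In mongosh: db.runCommand(rollout), then later db.runCommand(enforced)
console.log(rollout.validationAction, enforced.validationAction);
```

Starting with "warn" surfaces documents that would fail in the server log without breaking writers, so the switch to "error" can wait until the log is quiet.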
A document validator
db.createCollection("posts", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["text", "author", "createdAt"],
      properties: {
        text: { bsonType: "string", maxLength: 5000 },
        likes: { bsonType: "int", minimum: 0 },
        createdAt: { bsonType: "date" },
        author: {
          bsonType: "object",
          required: ["id", "displayName"],
          properties: {
            id: { bsonType: "string" },
            displayName: { bsonType: "string" }
          }
        }
      }
    }
  },
  validationAction: "error"
})
// tighten an existing collection instead of recreating it
db.runCommand({ collMod: "posts", validator: { /* $jsonSchema ... */ } })
Add full-text and vector search with Atlas
Advanced: On MongoDB Atlas, define an Atlas Search index and query it by relevance with the $search aggregation stage.
When a text index isn't enough — Atlas Search
A plain MongoDB text index handles basic keyword search, but for real relevance ranking, typo tolerance,
and autocomplete you use Atlas Search, a Lucene-based full-text engine built into the managed Atlas
service. You define a search index on a collection, then query it with the $search aggregation stage.
Atlas also offers Vector Search — store embeddings on your documents and query by nearest-neighbour
($vectorSearch) for semantic search and retrieval-augmented generation, right next to your operational
data. These are Atlas features (not the open-source server), so this step assumes a cluster on Atlas.
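To show what the two features expect as input, the sketch below writes out an Atlas Search index definition with an explicit mapping on text, and builds a $vectorSearch stage. The index name postEmbedding, the embedding field, the tiny three-number query vector, and the vectorSearchStage helper are all assumptions for illustration; a real query vector has as many dimensions as the embedding model produces.

```javascript
// Atlas Search index definition: explicit mapping, only "text" is indexed.
const postTextIndex = {
  mappings: {
    dynamic: false,
    fields: { text: { type: "string" } }
  }
};

// Hypothetical helper: builds a $vectorSearch stage (must be first in a pipeline).
function vectorSearchStage(queryVector, limit = 10) {
  return {
    $vectorSearch: {
      index: "postEmbedding",    // assumed vector index name
      path: "embedding",         // assumed field holding the stored vector
      queryVector,
      numCandidates: limit * 10, // neighbours considered before the final cut; a tuning knob
      limit
    }
  };
}

const stage = vectorSearchStage([0.12, -0.4, 0.33], 5);
// In mongosh on Atlas, something like:
//   db.posts.aggregate([stage, { $project: { text: 1, score: { $meta: "vectorSearchScore" } } }])
console.log(JSON.stringify(postTextIndex), JSON.stringify(stage));
```

The index definition serves keyword relevance over text through $search, while the $vectorSearch stage compares a query embedding with stored embeddings, so the two answer different kinds of question over the same documents.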
Search the feed by relevance (Atlas)
// after creating an Atlas Search index named "postText" on the posts collection:
db.posts.aggregate([
  { $search: {
      index: "postText",
      text: { query: "feed release", path: "text" } // full-text, ranked
  } },
  { $project: { text: 1, score: { $meta: "searchScore" } } },
  { $limit: 10 }
])
Chat prompt: paste into a chat to get the code
Role: MongoDB Atlas specialist. The reader has an Atlas cluster but you cannot run it — return complete
artefacts they can paste.
Task: Provide (1) an Atlas Search index definition for a "posts" collection and (2) a $search aggregation
that ranks posts by relevance to a query string over the "text" field, returning the relevance score.
Requirements:
- The index definition is valid JSON for an Atlas Search index (dynamic or explicit mapping on "text").
- The aggregation uses the $search stage with a "text" operator, projects { text, score } where score is
{ $meta: "searchScore" }, and limits to 10.
- Add one sentence distinguishing Atlas Search ($search, full-text) from Atlas Vector Search
($vectorSearch, embedding nearest-neighbour) so the reader picks the right one.
Tests / acceptance (describe, since this needs a live Atlas cluster):
- Results are ordered by descending searchScore.
- A query term that appears in many posts does not stop the best lexical match from ranking first.
Output: the complete index JSON and the complete aggregation, no extra commentary.
Where to take it next
- See where MongoDB’s flexible documents and aggregation pipeline are the load-bearing choice — and where they yield to a relational store — on the Compare page.
- Need relational integrity and cross-row transactions instead — money, inventory, strict invariants? Compare against PostgreSQL, the relational default.
- Pair MongoDB with Redis for an in-memory cache or hot counters in front of your durable document store.