Database

MongoDB

Flexible BSON documents for denormalised, read-heavy data and fast-evolving schemas.

  • Denormalised, read-heavy models (feeds, catalogs)
  • Schemas that evolve without migrations
  • Rich aggregation pipeline analytics in the database
  • Documents with varied, nested, optional attributes

Use it when

Reach for MongoDB when your data is naturally document-shaped — a post with its embedded comments, a product with wildly varied attributes — and you read it far more than you cross-reference it. The flexible schema lets the model change as fast as the product does.

Reach for something else when

Skip MongoDB when correctness depends on multi-entity invariants enforced by the database — money, inventory, anything where a relational foreign key and an ACID transaction across rows are the point. There the document model fights you; use PostgreSQL.


MongoDB stores data as BSON documents inside collections — close to the JSON your application already speaks — and lets the shape evolve without a migration. This track moves from the first insertOne to compound indexes, the aggregation pipeline, $lookup, replica sets, transactions (and when to design them away), and Atlas Search — tagged by level so you can read only as deep as you need.

Run MongoDB and open the shell

Beginner

Start a local MongoDB instance with Docker, then connect to it with mongosh.

Why mongosh, and what a connection string is

mongosh is the official MongoDB Shell — a JavaScript REPL connected to a running server. You point it at a connection string (mongodb://host:port), and from there you select a database with use, then act on collections (the document equivalent of tables). Nothing is created until you write the first document; databases and collections come into existence on first insert. A managed alternative is MongoDB Atlas, where the same connection string is mongodb+srv://... and you skip running the server yourself.
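To make the parts of a connection string concrete, here is a small sketch in plain JavaScript (Node.js) that splits one with the standard URL class. The helper name describeConnectionString and the host cluster0.example.net are made up for illustration, and the approach only handles single-host strings; mongosh and the drivers use their own parser, which also accepts comma-separated host lists.

```javascript
// Illustrative helper (not part of mongosh or any driver): split a
// single-host MongoDB connection string into its parts.
function describeConnectionString(uri) {
  const url = new URL(uri);
  return {
    scheme: url.protocol.replace(/:$/, ""),            // "mongodb" or "mongodb+srv"
    host: url.hostname,
    port: url.port || null,                            // mongodb+srv strings carry no port
    database: url.pathname.replace(/^\//, "") || null, // optional default database
  };
}

console.log(describeConnectionString("mongodb://localhost:27017/tidal"));
console.log(describeConnectionString("mongodb+srv://cluster0.example.net"));
```

The trailing path segment is the default database; leaving it off, as the command in this step does, is fine because you pick a database with use once connected.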

Start a server and connect
Run these in your terminal / editor
# A throwaway local server on the default port 27017
docker run -d --name mongo -p 27017:27017 mongo:7

# Connect with the official shell
mongosh "mongodb://localhost:27017"

# inside mongosh:
#   use tidal        // switch to (and lazily create) the "tidal" database
#   db.getName()     // -> tidal

Insert and read your first documents

Beginner

Use insertOne / insertMany to write documents, then read them back with find.

Documents, _id, and the implicit schema

A document is a BSON object — JSON plus extra types like ObjectId, Date, and 64-bit integers. Every document gets a unique _id; if you omit it, MongoDB generates an ObjectId (a 12-byte, time-ordered identifier). Two documents in the same collection need not share the same fields — the schema is implicit and per-document, which is exactly what makes early iteration fast. find() returns a cursor; pass a filter object to narrow it, and a second object to project only the fields you want.
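The "time-ordered" part of an ObjectId can be seen without a server. The sketch below, in plain JavaScript, reads the creation time from the first 4 of the 12 bytes, which hold seconds since the Unix epoch. The helper name objectIdToDate and the sample id are assumptions chosen for illustration.

```javascript
// Illustrative helper: read the creation time embedded in an ObjectId.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Sample id value, for illustration only.
console.log(objectIdToDate("507f1f77bcf86cd799439011").toISOString());
// 2012-10-17T21:13:27.000Z
```

In mongosh you do not need a helper like this: calling getTimestamp() on an ObjectId value returns the same date.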

Write and read documents
Run these in your terminal / editor
// mongosh, database "tidal"
db.posts.insertOne({
  author: "ada",
  text: "first post",
  tags: ["intro", "hello"],
  likes: 0,
  createdAt: new Date()
})

db.posts.insertMany([
  { author: "ada", text: "second", tags: ["update"], likes: 3, createdAt: new Date() },
  { author: "lin", text: "hi all", tags: ["intro"], likes: 1, createdAt: new Date() }
])

// filter + projection: posts by ada, show only text and likes
db.posts.find({ author: "ada" }, { text: 1, likes: 1, _id: 0 })
Chat prompt — paste into a chat to get the code
For a plain chat. It returns complete code; you paste it in yourself.
Role: MongoDB teacher. The reader has no database in front of them, so return a complete, runnable mongosh script.
Task: Show a single mongosh script that seeds a "posts" collection and runs three reads.
Requirements:
- Insert at least four post documents with fields: author (string), text (string), tags (string array),
  likes (int), createdAt (Date). Use insertMany.
- Read 1: all posts by one author, projecting only text and likes (suppress _id).
- Read 2: posts that contain a given tag (filter on the array field directly).
- Read 3: posts with likes greater than or equal to 2, sorted by likes descending.
Tests / acceptance (describe, since no live DB):
- Read 1 returns documents without an _id field.
- Read 2 matches because querying an array field matches any element.
- Read 3 is ordered highest-likes first.
Output: the complete mongosh script, no commentary.

Update and delete by filter

Beginner

Change documents with updateOne / updateMany and the update operators, and remove them with deleteOne.

Update operators and atomic counters

Updates use operators, not a replacement object: $set assigns fields, $inc adds to a number, $push appends to an array, $addToSet appends only when the value is not already present, and $pull removes matching elements. A single-document update is atomic, so $inc is a safe, race-free counter without a transaction: two concurrent likes both land. updateOne changes the first matching document; updateMany changes every match. Add { upsert: true } to insert a document when nothing matches; $setOnInsert sets fields only on that insert.
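Why an atomic increment matters is easiest to see by replaying the steps by hand. The following is a plain-JavaScript simulation, not MongoDB code: an in-memory object stands in for the stored document, and the function names are invented for this sketch.

```javascript
// In-memory stand-in for one stored document. This is NOT the MongoDB driver;
// it only replays the order of steps that causes a lost update.
function readModifyWriteRace() {
  const doc = { likes: 0 };
  const seenByA = doc.likes; // client A reads 0
  const seenByB = doc.likes; // client B reads 0 before A writes
  doc.likes = seenByA + 1;   // A writes 1
  doc.likes = seenByB + 1;   // B also writes 1, overwriting A's like
  return doc.likes;
}

// What $inc does conceptually: each increment is applied to the current
// stored value in one step, so no client works from a stale read.
function incrementRace() {
  const doc = { likes: 0 };
  const inc = (amount) => { doc.likes += amount; };
  inc(1); // client A
  inc(1); // client B
  return doc.likes;
}

console.log(readModifyWriteRace(), incrementRace()); // 1 2
```

The first function loses a like because both clients compute from the same stale value; the second cannot, which is the guarantee a single updateOne with $inc gives you.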

Mutate documents
Run these in your terminal / editor
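// pick a document to work on (any existing _id will do)
const someId = db.posts.findOne({ author: "ada" })._id

// atomic like counter: no read-modify-write race
db.posts.updateOne({ _id: someId }, { $inc: { likes: 1 } })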

// add a tag only if absent, and record when the post was edited
db.posts.updateOne(
  { _id: someId },
  { $addToSet: { tags: "edited" }, $set: { editedAt: new Date() } }
)

// upsert: update if present, insert if not
db.profiles.updateOne(
  { author: "ada" },
  { $set: { displayName: "Ada L." }, $setOnInsert: { createdAt: new Date() } },
  { upsert: true }
)

db.posts.deleteOne({ _id: someId })
Agent prompt — paste into an agent with repo access
For Claude Code / Cursor / an agent that can read & edit this repo.
Role: Senior backend engineer working in a Node.js service that uses the official mongodb driver.
Context: A "posts" collection exists. The driver is `mongodb` (v6). A db handle is available as `db`.
Task: Implement and unit-test a likePost(postId) function and an editPost(postId, text) function.
Requirements:
- likePost uses updateOne with $inc on "likes" and $set on "updatedAt"; it must be a single atomic update
  (no read-then-write).
- editPost uses $set on "text" and "updatedAt"; both return the modifiedCount.
- Use the BSON ObjectId type to coerce the string id; throw on an invalid id.
Tests / acceptance:
- Use mongodb-memory-server (in-memory MongoDB) so tests need no external DB.
- Test: two sequential likePost calls leave likes incremented by exactly 2.
- Test: editPost on a non-existent id returns modifiedCount 0.
- `npm test` passes.
Output: a unified diff plus a one-paragraph note on why $inc avoids a lost-update race.

Model a feed by embedding instead of joining

Intermediate

Store a post and its small, bounded sub-data (author summary, top tags) inside one document so a feed read is a single query.

Embed vs reference — the core modelling decision

The document model rewards embedding data you read together. A feed post that carries a denormalised author summary ({ id, displayName, avatarUrl }) and its tags renders from one find: no join, no second round trip. The trade-off is duplication: if a display name changes, you must update it in every post that embeds it. As a rule of thumb, embed data that is bounded and read together; reference (store an id for) data that is unbounded or changes independently, such as comments that can grow forever (a single document is capped at 16 MB) or an author record edited on its own. This is the opposite instinct from relational normalisation, and it is deliberate: you optimise for the read shape.
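The duplication cost has a concrete shape: one fan-out update per renamed author. The sketch below only builds the filter and update documents as plain JavaScript objects; renameAuthorOps is a hypothetical helper name, and the new display name is sample data.

```javascript
// Hypothetical helper: build the arguments for the fan-out update that a
// display-name change requires when author summaries are embedded in posts.
function renameAuthorOps(authorId, newDisplayName) {
  return {
    filter: { "author.id": authorId }, // every post embedding this author
    update: { $set: { "author.displayName": newDisplayName } },
  };
}

const renameAda = renameAuthorOps("ada", "Ada Lovelace");
console.log(JSON.stringify(renameAda.filter), JSON.stringify(renameAda.update));
```

In mongosh you would pass the pair to db.posts.updateMany(renameAda.filter, renameAda.update). If that update would touch a large share of the collection on a regular basis, that is a hint the field belongs in a referenced document instead.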

An embedded feed document
Run these in your terminal / editor
db.posts.insertOne({
  text: "shipped the feed",
  author: { id: "ada", displayName: "Ada L.", avatarUrl: "/a/ada.png" }, // embedded summary
  tags: ["release", "feed"],
  likes: 0,
  commentCount: 0,        // a counter, not the comments themselves
  createdAt: new Date()
})

// one query renders a feed card — no join needed
db.posts.find(
  { tags: "feed" },
  { text: 1, "author.displayName": 1, likes: 1, createdAt: 1 }
).sort({ createdAt: -1 }).limit(20)

Index the queries your feed actually runs

Intermediate

Create single-field and compound indexes that match your real filters and sort order, then confirm they’re used with explain.

Compound index order follows your query (ESR)

Without an index, a query scans every document. An index is an ordered B-tree MongoDB walks instead. For compound indexes, field order matters: follow Equality, Sort, Range — put fields you match exactly first, then the field you sort on, then range fields. A feed query that filters author.id and sorts by createdAt wants { "author.id": 1, createdAt: -1 }. An index on an array field (like tags) is a multikey index automatically — one entry per element. Verify with explain("executionStats"): you want an IXSCAN stage, not a COLLSCAN.
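The Equality, Sort, Range ordering can be written down as a tiny function. This is a sketch in plain JavaScript with a made-up helper name (buildEsrIndex); it only assembles the key document and assumes you have already classified each field of your query yourself.

```javascript
// Hypothetical helper: order index keys by Equality, then Sort, then Range.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;   // E: exact matches first
  for (const [field, direction] of Object.entries(sort)) {
    spec[field] = direction;                       // S: sort keys keep their direction
  }
  for (const field of range) {
    if (!(field in spec)) spec[field] = 1;         // R: range filters last
  }
  return spec;
}

// Query shape: author.id equals X, likes >= N, sorted by createdAt descending.
const feedIndex = buildEsrIndex({
  equality: ["author.id"],
  sort: { createdAt: -1 },
  range: ["likes"],
});
console.log(JSON.stringify(feedIndex)); // {"author.id":1,"createdAt":-1,"likes":1}
```

The resulting object is what you would hand to db.posts.createIndex(...). Key order is the whole point here, and JavaScript preserves insertion order for these string keys.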

Create indexes and check the plan
Run these in your terminal / editor
// compound index supporting filter-by-author + sort-by-time
db.posts.createIndex({ "author.id": 1, createdAt: -1 })

// multikey index on the tags array (created the same way)
db.posts.createIndex({ tags: 1 })

// did the planner use it? look for an "IXSCAN" stage and totalDocsExamined close to nReturned
db.posts.find({ "author.id": "ada" })
  .sort({ createdAt: -1 })
  .explain("executionStats")
Agent prompt — paste into an agent with repo access
For Claude Code / Cursor / an agent that can read & edit this repo.
Role: Senior backend engineer optimising a MongoDB-backed feed service.
Context: A "posts" collection holds documents with fields author.id (string), tags (string array),
createdAt (Date), likes (int). The hot query filters by author.id and sorts by createdAt descending; a
second query filters by a single tag and sorts by createdAt descending.
Task: Add the minimal set of indexes to make both queries use an index scan, and add a script that proves it.
Requirements:
- Choose compound vs single-field per the Equality-Sort-Range rule; justify each in a code comment.
- Do not create redundant indexes (an index whose keys are a prefix of another index you create).
- Provide a seed script (>= 2000 posts) and an explain() check for each hot query.
Tests / acceptance:
- For each hot query, explain("executionStats") shows winningPlan stage IXSCAN (not COLLSCAN).
- totalKeysExamined is close to nReturned; totalDocsExamined is not a full-collection scan.
Output: a unified diff (index creation + seed + explain script) plus a short note on the index choices.

Aggregate with the pipeline: $match, $group, $project

Intermediate

Build an aggregation pipeline that filters, groups, and reshapes documents to compute summaries in the database.

The pipeline is your in-database analytics engine

The aggregation pipeline is an array of stages; each consumes the previous stage’s documents and emits new ones. $match filters early (put it first so an index can help and less data flows downstream), $group collapses documents by a key and accumulates ($sum, $avg, $max), and $project reshapes the output — add, rename, or drop fields. This is how you compute “likes per author this week” or “top tags” without pulling raw documents into the application and looping in code.
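If the stage model is new, it may help to see the same three steps over an ordinary array. This is a plain-JavaScript analogy, not how the server executes a pipeline; the sample data, the daysOld field, and the function name likesPerAuthor are invented for the sketch.

```javascript
// Plain-JavaScript rendering of $match -> $group -> $project/$sort over an array.
const samplePosts = [
  { author: { id: "ada" }, likes: 3, daysOld: 1 },
  { author: { id: "ada" }, likes: 2, daysOld: 3 },
  { author: { id: "lin" }, likes: 4, daysOld: 2 },
  { author: { id: "lin" }, likes: 9, daysOld: 30 }, // too old, filtered out
];

function likesPerAuthor(posts, maxDaysOld) {
  // $match: keep only recent posts
  const matched = posts.filter((p) => p.daysOld <= maxDaysOld);

  // $group: one accumulator per author id
  const groups = new Map();
  for (const p of matched) {
    const g = groups.get(p.author.id) ?? { totalLikes: 0, posts: 0 };
    g.totalLikes += p.likes; // like $sum: "$likes"
    g.posts += 1;            // like $sum: 1
    groups.set(p.author.id, g);
  }

  // $project + $sort: rename the group key to "author", order by totalLikes
  return [...groups]
    .map(([author, g]) => ({ author, ...g }))
    .sort((a, b) => b.totalLikes - a.totalLikes);
}

console.log(likesPerAuthor(samplePosts, 7));
```

The difference in practice is where the work happens: the pipeline runs next to the data, so only the small grouped result crosses the network.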

Likes per author, last 7 days
Run these in your terminal / editor
db.posts.aggregate([
  // 1. filter early so the rest of the pipeline sees less data
  { $match: { createdAt: { $gte: new Date(Date.now() - 7 * 864e5) } } },

  // 2. collapse by author, accumulate totals
  { $group: {
      _id: "$author.id",
      totalLikes: { $sum: "$likes" },
      posts: { $sum: 1 }
  } },

  // 3. reshape: keep totalLikes and posts, rename _id to author
  { $project: { _id: 0, author: "$_id", totalLikes: 1, posts: 1 } },

  { $sort: { totalLikes: -1 } },
  { $limit: 10 }
])
Chat prompt — paste into a chat to get the code
For a plain chat. It returns complete code; you paste it in yourself.
Role: MongoDB teacher. The reader has no database — return a complete, runnable mongosh aggregation.
Task: Write an aggregation that produces the top 5 tags by usage across all posts in the "posts" collection.
Requirements:
- Posts have a "tags" string array. Unwind the array so each tag becomes its own document ($unwind).
- Group by tag, count occurrences with $sum: 1.
- Project to { tag, count }, sort by count descending, limit to 5.
Tests / acceptance (describe, since no live DB):
- $unwind produces one document per (post, tag) pair.
- A tag appearing in N posts ends with count N.
- Output documents have exactly the fields tag and count.
Output: the complete aggregate([...]) call, no commentary.

Join across collections with $lookup

Intermediate

When you must reference rather than embed, pull related documents in with a $lookup stage.

$lookup is a left outer join — use it deliberately

$lookup performs a left outer join between collections inside an aggregation: for each input document it finds matching documents in another collection and attaches them as an array field. It is the escape hatch for the data you chose to reference — fetching a post’s comments, or hydrating an author record kept in its own collection. It works, and a $lookup on an indexed localField/foreignField is efficient, but if you find yourself joining on every read, that is a signal your model may want more embedding. Joins are not MongoDB’s strong suit the way they are PostgreSQL’s; reach for $lookup, don’t lean on it.
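"Left outer" means every input document survives the join, matched or not. The sketch below models that in plain JavaScript over two arrays; it is an analogy with invented sample data and a made-up function name (lookupJoin), and it ignores the finer points of real $lookup matching such as array-valued and missing fields.

```javascript
// Plain-JavaScript model of $lookup's left outer join.
function lookupJoin(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    // every input document is kept; unmatched ones get an empty array
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const demoPosts = [{ _id: 1, text: "shipped" }, { _id: 2, text: "quiet post" }];
const demoComments = [
  { postId: 1, text: "congrats" },
  { postId: 1, text: "nice" },
];

const hydrated = lookupJoin(demoPosts, demoComments, {
  localField: "_id",
  foreignField: "postId",
  as: "comments",
});
console.log(hydrated.map((p) => [p._id, p.comments.length])); // [ [ 1, 2 ], [ 2, 0 ] ]
```

Post 2 has no comments and still appears, with comments as an empty array rather than a missing field.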

Hydrate posts with their comments
Run these in your terminal / editor
db.posts.aggregate([
  { $match: { "author.id": "ada" } },
  { $lookup: {
      from: "comments",          // the other collection
      localField: "_id",         // posts._id
      foreignField: "postId",    // comments.postId
      as: "comments"             // attached as an array
  } },
  { $project: {
      text: 1,
      commentCount: { $size: "$comments" }   // count without returning bodies
  } }
])
Agent prompt — paste into an agent with repo access
For Claude Code / Cursor / an agent that can read & edit this repo.
Role: Senior backend engineer working in a MongoDB-backed service (mongodb driver v6, Node.js).
Context: Two collections — "posts" (with _id) and "comments" (each has postId referencing posts._id,
plus text and createdAt). comments.postId is indexed.
Task: Implement getPostWithComments(postId) returning the post plus its comments sorted oldest-first,
using a single aggregation with $lookup.
Requirements:
- Use the $lookup "pipeline" form (or a sub-$sort) so comments come back sorted by createdAt ascending.
- Return null when the post does not exist (empty aggregation result).
- Coerce the string id to ObjectId; throw on an invalid id.
Tests / acceptance:
- Use mongodb-memory-server so no external DB is required.
- Seed one post with three comments inserted out of order; assert the returned comments are time-ordered.
- Seed a post with zero comments; assert comments is an empty array, not undefined.
- `npm test` passes.
Output: a unified diff plus one sentence on when you'd embed comments instead of joining.

Run a replica set and read your own writes

Advanced

Run MongoDB as a replica set, and use read/write concerns to control durability and consistency.

Replica sets, elections, and concerns

Production MongoDB runs as a replica set: one primary takes writes, secondaries replicate its oplog, and if the primary fails the remaining members elect a new one. That is how you get high availability and automatic failover. You tune the guarantees per operation. A write concern of { w: "majority" } waits until a majority of members have the write, so it survives a failover; { w: 1 } returns once the primary alone has it. A read concern of "majority" returns only data acknowledged by a majority, so you never read a write that could later be rolled back. Multi-document transactions (next step) require a replica set or a sharded cluster, which is another reason a replica set, not a standalone mongod, is the realistic baseline.
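The arithmetic behind "majority" is worth seeing once. This plain-JavaScript sketch assumes the simplest case, where every member is a voting, data-bearing node (no arbiters); the function names are made up for illustration.

```javascript
// Illustrative arithmetic: how many acknowledgements w: "majority" waits for,
// assuming every member is a voting, data-bearing node.
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members can be down while a majority is still reachable.
function failuresTolerated(votingMembers) {
  return votingMembers - majorityOf(votingMembers);
}

for (const n of [1, 3, 5]) {
  console.log(n, majorityOf(n), failuresTolerated(n));
}
// 1 1 0
// 3 2 1
// 5 3 2
```

This is why the one-node set in this step is fine for learning but tolerates no failures, and why three members is the usual minimum for production.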

Start a one-node replica set
Run these in your terminal / editor
# free port 27017 if the container from the first step is still running
docker rm -f mongo

# start mongod as a replica set member
docker run -d --name mongo-rs -p 27017:27017 mongo:7 \
  mongod --replSet rs0 --bind_ip_all

# initialise the set (one-time)
mongosh "mongodb://localhost:27017" --eval 'rs.initiate()'

Tune write and read concerns
Run these in your terminal / editor
// write that survives failover: wait for a majority to acknowledge
db.posts.insertOne(
  { text: "durable", createdAt: new Date() },
  { writeConcern: { w: "majority" } }
)

// read only majority-committed data
db.posts.find({ text: "durable" }).readConcern("majority")

Use a multi-document transaction — and design to avoid it

Advanced

When one logical change must touch several documents atomically, wrap it in a transaction — but first ask whether the model could make it a single-document update instead.

Transactions exist; the model shines when you don't need them

MongoDB has full multi-document ACID transactions (on a replica set): start a session, do several writes, commitTransaction — all or nothing. They are real and correct. But they carry cost and contention, and the document model is designed so you often don’t need them: keep the data that must change together in one document, and a single-document update is already atomic. A like, a comment-count bump, a status flip — all one updateOne. Reserve transactions for the genuine cross-document invariant (move an item between two users’ collections). If you reach for transactions on every write, that is the signal a relational database with row-level transactions — PostgreSQL — may fit the problem better. Honesty here saves you pain later.

A transaction across two collections
Run these in your terminal / editor
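// postId: the _id of the post to archive (any existing post, picked here for the demo)
const postId = db.getSiblingDB("tidal").posts.findOne()._id

const session = db.getMongo().startSession()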
session.startTransaction({ writeConcern: { w: "majority" } })
try {
  const posts = session.getDatabase("tidal").posts
  const audit = session.getDatabase("tidal").audit

  posts.updateOne({ _id: postId }, { $set: { archived: true } })
  audit.insertOne({ action: "archive", postId, at: new Date() })

  session.commitTransaction()   // both, or neither
} catch (e) {
  session.abortTransaction()
  throw e
} finally {
  session.endSession()
}
Agent prompt — paste into an agent with repo access
For Claude Code / Cursor / an agent that can read & edit this repo.
Role: Senior backend engineer reviewing a MongoDB data model (mongodb driver v6, Node.js, replica set up).
Context: A teammate wrapped "increment a post's commentCount and insert the comment" in a multi-document
transaction across the posts and comments collections.
Task: Show BOTH a correct transactional implementation AND a transaction-free redesign, then recommend one.
Requirements:
- Transactional version: a session with startTransaction/commit/abort, writeConcern majority, proper
  abort on error.
- Transaction-free version: insert the comment and $inc the post's commentCount; explain why a small
  drift in commentCount is tolerable and how to reconcile it (periodic count or $lookup $size on read).
- State explicitly the condition under which the transaction IS warranted.
Tests / acceptance:
- Use mongodb-memory-server configured as a replica set so transactions are supported in tests.
- Test the transactional path rolls back the count when the comment insert is forced to fail.
- Test the transaction-free path leaves a recoverable, eventually-consistent count.
- `npm test` passes.
Output: a unified diff with both implementations plus a one-paragraph recommendation.

Validate the schema once the shape settles

Advanced

Attach a JSON Schema validator to a collection so MongoDB rejects malformed documents — without giving up flexibility.

Flexible early, governed later

Implicit schema is a gift while you iterate, but once a collection’s shape stabilises you usually want guardrails so a bad write can’t slip in. MongoDB supports JSON Schema validation per collection: declare required fields and types with $jsonSchema, set validationLevel and validationAction ("error" to reject, "warn" to log). You can tighten an existing collection with collMod. This is the discipline that lets the flexible model scale — flexible where you’re still learning the domain, validated where you’ve committed to it.

A document validator
Run these in your terminal / editor
db.createCollection("posts", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["text", "author", "createdAt"],
      properties: {
        text:      { bsonType: "string", maxLength: 5000 },
        likes:     { bsonType: "int", minimum: 0 },
        createdAt: { bsonType: "date" },
        author: {
          bsonType: "object",
          required: ["id", "displayName"],
          properties: {
            id:          { bsonType: "string" },
            displayName: { bsonType: "string" }
          }
        }
      }
    }
  },
  validationAction: "error"
})

// createCollection fails if "posts" already exists; tighten the existing collection instead
db.runCommand({ collMod: "posts", validator: { /* $jsonSchema ... */ } })
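One way to tighten a live collection gradually is to start in warn mode. The sketch below writes out a collMod command as a plain JavaScript object; the schema is a reduced version of the validator in this step, and the variable name tightenPosts is an assumption for illustration.

```javascript
// A sketch of a collMod command that introduces validation gradually.
const tightenPosts = {
  collMod: "posts",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["text", "createdAt"],
      properties: {
        text: { bsonType: "string" },
        createdAt: { bsonType: "date" },
      },
    },
  },
  validationLevel: "moderate", // validate inserts and updates to already-valid documents only
  validationAction: "warn",    // log violations instead of rejecting the write
};

console.log(Object.keys(tightenPosts).join(", "));
```

In mongosh you would run db.runCommand(tightenPosts). Once the log shows no more violations, switching validationAction to "error" (and validationLevel to "strict") turns the guardrail into a hard rule.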

Add full-text and vector search with Atlas

Advanced

On MongoDB Atlas, define an Atlas Search index and query by relevance with the $search aggregation stage.

When a text index isn't enough — Atlas Search

A plain MongoDB text index handles basic keyword search, but for real relevance ranking, typo tolerance, and autocomplete you use Atlas Search, a Lucene-based full-text engine built into the managed Atlas service. You define a search index on a collection, then query it with the $search aggregation stage. Atlas also offers Vector Search — store embeddings on your documents and query by nearest-neighbour ($vectorSearch) for semantic search and retrieval-augmented generation, right next to your operational data. These are Atlas features (not the open-source server), so this step assumes a cluster on Atlas.
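For orientation, a search index definition is a small JSON document. The sketch below writes one as a JavaScript object with an explicit mapping on the text field; treat the exact shape as an assumption to check against the Atlas documentation for your cluster version. The variable name postTextIndex is made up, and the index name "postText" is the one this step's query uses.

```javascript
// Assumed shape of an explicit-mapping Atlas Search index definition.
const postTextIndex = {
  name: "postText",
  definition: {
    mappings: {
      dynamic: false,                       // index only the fields listed below
      fields: { text: { type: "string" } }, // full-text index on posts.text
    },
  },
};

console.log(JSON.stringify(postTextIndex, null, 2));
```

You can paste the definition part into the Atlas UI's JSON editor, or, if your mongosh and cluster versions provide the createSearchIndex helper, pass it as db.posts.createSearchIndex("postText", postTextIndex.definition). Setting dynamic to true instead would index every field, at the cost of a larger index.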

Search the feed by relevance (Atlas)
Run these in your terminal / editor
// after creating an Atlas Search index named "postText" on the posts collection:
db.posts.aggregate([
  { $search: {
      index: "postText",
      text: { query: "feed release", path: "text" }   // full-text, ranked
  } },
  { $project: { text: 1, score: { $meta: "searchScore" } } },
  { $limit: 10 }
])
Chat prompt — paste into a chat to get the code
For a plain chat. It returns complete code; you paste it in yourself.
Role: MongoDB Atlas specialist. The reader has an Atlas cluster but you cannot run it — return complete
artefacts they can paste.
Task: Provide (1) an Atlas Search index definition for a "posts" collection and (2) a $search aggregation
that ranks posts by relevance to a query string over the "text" field, returning the relevance score.
Requirements:
- The index definition is valid JSON for an Atlas Search index (dynamic or explicit mapping on "text").
- The aggregation uses the $search stage with a "text" operator, projects { text, score } where score is
  { $meta: "searchScore" }, and limits to 10.
- Add one sentence distinguishing Atlas Search ($search, full-text) from Atlas Vector Search
  ($vectorSearch, embedding nearest-neighbour) so the reader picks the right one.
Tests / acceptance (describe, since this needs a live Atlas cluster):
- Results are ordered by descending searchScore.
- The best lexical match ranks first even when a query word also appears in many other posts.
Output: the complete index JSON and the complete aggregation, no extra commentary.

Where to take it next

  • See where MongoDB’s flexible documents and aggregation pipeline are the load-bearing choice — and where they yield to a relational store — on the Compare page.
  • Need relational integrity and cross-row transactions instead — money, inventory, strict invariants? Compare against PostgreSQL, the relational default.
  • Pair MongoDB with Redis for an in-memory cache or hot counters in front of your durable document store.