Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Indexing & Performance
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan — scanning every document in a collection to find matches. On a collection with 10 million documents, that means reading 10 million documents even if only 5 match your query. The library analogy: Imagine a library with 100,000 books but no catalog system. To find a book by a specific author, you would have to walk down every aisle and check every spine. That is a collection scan. An index is the card catalog — it tells MongoDB exactly where to look, turning a full-aisle walk into a quick lookup.How Indexes Work Under the Hood
MongoDB indexes use a B-tree data structure (specifically, a variant called B+ tree). The key properties:- Sorted order: Index entries are stored sorted, so range queries (
$gt,$lt) and sort operations are efficient. - Logarithmic lookup: Finding a value in a B-tree is O(log n). In a collection of 10 million documents, that is roughly 23 comparisons instead of 10 million.
- Pointer to document: Each index entry contains a pointer to the full document on disk, so MongoDB can jump directly to the data.
Creating an Index
UsecreateIndex() to create an index on a field.
Index Types
Single Field Indexes
The simplest index type — one field, one direction.Compound Indexes
An index on multiple fields. The order of fields matters — this is one of the most important and most misunderstood concepts in MongoDB indexing.{ status, createdAt } supports queries on status alone or status + createdAt, but NOT on createdAt alone.
This is called the prefix rule: a compound index supports queries on any left prefix of the indexed fields.
Multikey Indexes
When you index a field that holds an array, MongoDB creates a multikey index — one index entry per array element. This happens automatically.Unique Indexes
Ensures that the indexed fields do not store duplicate values.Text Indexes
For full-text search across string fields.TTL Indexes (Time-To-Live)
Automatically delete documents after a specified time period. Perfect for session data, logs, or temporary records.Partial Indexes
Index only documents that match a filter condition — saves disk space and write overhead.Viewing Indexes
Dropping Indexes
explain() — Understanding Query Execution
explain() is your most important performance debugging tool. It shows you exactly how MongoDB plans to execute a query and what it actually did.
Reading explain() Output
The key fields to examine in executionStats:
| Field | What It Tells You | What to Look For |
|---|---|---|
winningPlan.stage | How MongoDB accessed data | IXSCAN (good) vs COLLSCAN (bad) |
nReturned | Documents returned to client | Should be close to what you expect |
totalDocsExamined | Documents MongoDB had to read | Should be close to nReturned |
totalKeysExamined | Index entries scanned | Should be close to nReturned |
executionTimeMillis | Wall clock time | Lower is better |
nReturned / totalDocsExamined should be close to 1.0. If you return 10 documents but examined 100,000, your index is not selective enough or is missing entirely.
Common Indexing Pitfalls
1. Too many indexes. Every index slows down writes and consumes RAM. A collection with 20 indexes will have noticeably slower inserts. Aim for the minimum set of indexes that covers your query patterns. 2. Indexing low-cardinality fields alone. An index on a boolean field likeisActive (only two possible values) is rarely useful by itself — MongoDB still has to scan half the collection. Combine it with a more selective field in a compound index.
3. Forgetting sort order in compound indexes. If your query sorts by createdAt descending but your compound index has createdAt: 1 (ascending), MongoDB may need an in-memory sort. Match the sort direction in your index to your query’s sort.
4. Not monitoring index usage. Use $indexStats to find unused indexes:
Summary
- Indexes use B-tree structures to turn O(n) collection scans into O(log n) lookups.
- The trade-off: faster reads but slower writes and more RAM usage. Index deliberately, not liberally.
- Compound indexes support queries on any left prefix of their fields. Field order matters — follow the ESR rule (Equality, Sort, Range).
- Use
explain("executionStats")to verify that queries use indexes. Watch thenReturned / totalDocsExaminedratio. - Specialized indexes: unique (enforce constraints), TTL (auto-expire), text (full-text search), partial (conditional indexing), multikey (arrays).
- Monitor index usage with
$indexStatsand drop unused indexes to reduce write overhead. - Ensure your index working set fits in RAM — disk-paged indexes negate most performance benefits.
Interview Deep-Dive
Your MongoDB collection has 200 million documents and 12 indexes. Write performance has degraded by 40% over the last quarter. Walk me through your diagnosis and remediation plan.
Your MongoDB collection has 200 million documents and 12 indexes. Write performance has degraded by 40% over the last quarter. Walk me through your diagnosis and remediation plan.
- The first thing I would check is index utilization. Every index on a collection adds overhead to every write operation — the document must be written, and then each of the 12 index B-trees must be updated. More indexes means more random I/O on writes. Run
db.collection.aggregate([{ $indexStats: {} }])on each index to see how often each index is actually used for reads. - In my experience, collections that accumulate 12 indexes have “zombie indexes” — indexes created for features that were later removed, for one-off migration queries, or by developers who added indexes without checking if an existing index already covered the query. I have consistently found that 30-50% of indexes on mature collections are unused or redundant.
- Check for redundant indexes specifically. If you have both
{ userId: 1 }and{ userId: 1, createdAt: -1 }, the single-field index onuserIdis almost always redundant because the compound index can serve any query that the single-field index serves (the compound index is a superset). Drop the single-field index. - Next, check the working set size. Run
db.serverStatus().wiredTiger.cacheand look atbytes currently in the cacheversusmaximum bytes configured. If total index size exceeds the WiredTiger cache, index operations spill to disk, which destroys write performance. With 12 indexes on 200M documents, index data could easily be 50+ GB. If your WiredTiger cache is 10 GB, you are in trouble. - The remediation plan, in order: (1) Drop unused indexes identified by
$indexStats. (2) Merge redundant indexes — replace two overlapping indexes with one compound index that serves both. (3) Convert full indexes to partial indexes where possible. If an index exists for querying active orders and 70% of orders are completed, a partial index with{ partialFilterExpression: { status: "active" } }is 70% smaller. (4) If the indexes are all legitimately needed, the remaining option is to increase the WiredTiger cache (more RAM) or move to a sharded cluster to distribute the write load. - The 40% degradation over a quarter suggests either the index count grew (new features added indexes) or the data volume grew (more documents means larger indexes). Track index count and total index size as operational metrics with alerts.
- Dropping an index in MongoDB is fast (nearly instant) and does not block other operations — it is a metadata change, not a data operation. However, the risk is not the drop itself but the consequences: if a query was depending on that index and you missed it, that query suddenly falls back to a COLLSCAN, potentially causing timeouts and cascading failures.
- Safe procedure: first, confirm the index is truly unused by checking
$indexStatsover a full business cycle. An index with zero usage in the last 24 hours might be used by a weekly report that runs on Sundays. Check over at least 2 weeks, covering all scheduled jobs and reporting cycles. - Second, use the “hide index” feature (available in MongoDB 4.4+).
db.collection.hideIndex("index_name")makes the index invisible to the query planner without dropping it. Queries that depended on it will now use a different plan (or COLLSCAN). Monitor your slow query log and application metrics for 24-48 hours. If nothing degrades, the index was truly unused, and you can safely drop it withdropIndex. If something breaks,unhideIndexinstantly restores it without rebuilding. - Third, have a rollback plan. Before dropping, record the full index definition (fields, options, partial filter expression). If you need to recreate it, you have the exact specification. On a 200M-document collection, rebuilding an index takes minutes to hours and consumes I/O, so avoid needing to do this.
Explain the ESR rule for compound index design. Why does the order of fields in a compound index matter so much?
Explain the ESR rule for compound index design. Why does the order of fields in a compound index matter so much?
- ESR stands for Equality, Sort, Range — and it is the most important rule for designing compound indexes that serve real-world queries efficiently. The order of fields in a compound index determines how MongoDB traverses the B-tree, and getting it wrong can make the difference between scanning 100 documents and scanning 100,000.
- Equality fields first. Fields queried with exact match conditions (
status: "active",country: "US") go at the beginning of the index. Equality conditions provide the tightest B-tree traversal — MongoDB descends directly to the matching node. Multiple equality fields can be in any relative order since they each narrow to an exact point. - Sort fields second. After the equality fields, add the fields used in
sort(). If the index order matches the sort direction, MongoDB reads index entries in order and avoids an in-memory sort. Without this, MongoDB must collect all matching documents and sort them in memory (limited to 100 MB). - Range fields last. Fields queried with range conditions (
$gt,$lt,$gte,$lte,$in) go at the end. Range conditions scan a contiguous section of the B-tree. If you put a range field before the sort field, the sort within each range value is not maintained, and MongoDB falls back to an in-memory sort. - Concrete example: for the query
db.orders.find({ status: "shipped", amount: { $gt: 100 } }).sort({ createdAt: -1 }), the optimal index is{ status: 1, createdAt: -1, amount: 1 }. The equality match onstatusnarrows the index to just “shipped” entries. The sort oncreatedAtis provided by the index order. The range onamountis evaluated last. - If you instead created
{ status: 1, amount: 1, createdAt: -1 }(range before sort), MongoDB can use the equality onstatusand the range onamountefficiently, but the results within that range are not sorted bycreatedAt. It must collect all matching entries and sort them in memory. For large result sets, this hits the 100 MB sort limit and fails. - The way I remember it: “Equality drills to a point, Sort provides order, Range scans a segment. Point-Order-Segment needs to be in that sequence for maximum efficiency.”
- This is where ESR requires judgment, not blind application. When you have multiple range conditions, only the first range field in the index can use the B-tree efficiently. The second range field becomes a residual filter applied to each index entry, not an index-guided scan.
- The decision comes down to selectivity: which range condition eliminates more documents? If the date filter reduces the dataset from 10 million to 500,000 and the price filter reduces it from 10 million to 2 million, put
datefirst in the index because it is more selective. The price condition is evaluated as a residual filter on the 500,000 date-matching entries. - Use
explain()with both orderings and comparetotalKeysExaminedversusnReturned. The index that examines fewer keys relative to returned documents is the better choice. - Another option: if this is a high-frequency query, consider two separate indexes and let MongoDB’s query planner choose the better one, or use index intersection (though MongoDB’s index intersection is less reliable than you might expect — it often falls back to using a single index).
- In practice, this is one of those cases where benchmarking beats theory. Create both index orderings, run
explainwith your actual data distribution, and pick the winner. Data skew (most orders are recent, most prices cluster in a narrow range) changes the optimal choice in ways that cannot be predicted from theory alone.
What is a multikey index in MongoDB? How does it work internally, and what restrictions does it impose?
What is a multikey index in MongoDB? How does it work internally, and what restrictions does it impose?
- A multikey index is an index on a field that contains an array. When MongoDB encounters an array value during indexing, it creates a separate index entry for each element in the array. So if a document has
tags: ["mongodb", "database", "nosql"], the multikey index ontagscreates three index entries, each pointing to the same document. - Internally, this is transparent — you create a multikey index with the same
createIndexsyntax as a regular index. MongoDB detects that the field contains an array and automatically uses multikey indexing. Theexplain()output showsisMultiKey: truefor these indexes. - The power of multikey indexes is that they enable efficient queries on array contents.
db.articles.find({ tags: "mongodb" })uses the multikey index to find all articles with “mongodb” in their tags array in O(log n) time instead of scanning every document. - Restriction one: a compound index can have at most one multikey field. You cannot create an index on
{ tags: 1, categories: 1 }if bothtagsandcategoriesare arrays. This is because the number of index entries would be the Cartesian product — a document with 10 tags and 10 categories would generate 100 index entries, which explodes exponentially. MongoDB prevents this. - Restriction two: multikey indexes cannot cover queries. Even if your projection only includes fields in the index, MongoDB must fetch the full document to correctly reconstruct the array. The explain output will show
totalDocsExamined > 0even for queries that would be covered on a non-multikey index. - Restriction three: array size impacts write performance. If a document has an array with 1,000 elements and you insert or update it, the multikey index creates or updates 1,000 index entries. For documents with large, frequently updated arrays, this write amplification can become a significant bottleneck.
- Restriction four: shard keys cannot be multikey. If you try to shard on a field that contains arrays, MongoDB rejects it because the shard key for a document must be a single, deterministic value.
- The query
db.products.find({ tags: { $all: ["wireless", "bluetooth"] } })uses the multikey index, but its execution is nuanced. MongoDB uses the index to find all documents that contain the first tag (“wireless”), then applies the second tag (“bluetooth”) as a filter on those results. It is not a single index seek for the intersection. - The efficiency depends on the selectivity of the first tag. If “wireless” appears in 500,000 out of 1 million products, MongoDB fetches 500,000 index entries and then filters for “bluetooth.” If “bluetooth” appears in only 1,000 products, the final result is small, but the intermediate work was large. MongoDB does not necessarily pick the most selective tag first — it depends on the query planner’s estimate.
- For frequently used multi-tag searches, consider a different data model. The “Attribute Pattern” uses an array of key-value pairs:
attributes: [{ key: "tag", value: "wireless" }, { key: "tag", value: "bluetooth" }]with a compound index on{ "attributes.key": 1, "attributes.value": 1 }. This enables more efficient compound queries but at the cost of a more complex schema. - Another approach: if the tag combinations are predictable and limited, create a compound index on the most commonly co-queried tags as separate boolean fields:
{ isWireless: 1, isBluetooth: 1 }. This trades schema flexibility for query efficiency.
Explain the explain() output in MongoDB. What are the key fields you look at, and how do you interpret them to diagnose a slow query?
Explain the explain() output in MongoDB. What are the key fields you look at, and how do you interpret them to diagnose a slow query?
explain()is the most important diagnostic tool in MongoDB query optimization. I use the"executionStats"verbosity level because it provides actual execution metrics, not just the query plan.- The fields I check, in order of importance:
winningPlan.stage(orwinningPlan.inputStage.stagefor nested plans): This tells you the top-level execution strategy.COLLSCANmeans no index was used — bad for any collection over a few thousand documents.IXSCANmeans an index was used — good.FETCHafterIXSCANmeans the query used an index to find matching documents, then fetched the full documents for additional filtering or projection.SORTas a separate stage means an in-memory sort was required — the sort was not covered by the index.totalDocsExaminedvsnReturned: This is the efficiency ratio. If the query examined 1 million documents to return 50, you have a selectivity problem. Ideal ratio is close to 1:1. A ratio above 10:1 usually means the index is not selective enough or the wrong index is being used.totalKeysExaminedvsnReturned: Similar to above but for index entries. If you examined 500,000 keys to return 50 documents, the index is being scanned broadly (range query on a low-selectivity field). Close to 1:1 is ideal.executionTimeMillis: The total wall-clock time. Compare this across different query plans (using.hint()to force different indexes) to find the fastest approach.rejectedPlans: MongoDB’s query planner evaluates multiple candidate plans and picks the winner based on a trial period. The rejected plans show you what alternatives were considered and discarded. If a seemingly better plan was rejected, it might be because the planner’s statistics are outdated — runplanCacheClear()to reset plan selection.- Red flags in explain output: COLLSCAN on a large collection, a SORT stage (indicates in-memory sort),
totalDocsExaminedorders of magnitude larger thannReturned, orexecutionTimeMillisin the thousands for what should be a simple lookup.
- MongoDB’s query planner caches the winning plan for a given “query shape” (the structure of the query, ignoring specific values). The first time a query shape is seen, the planner runs a trial execution of multiple candidate plans and caches the winner. Subsequent queries with the same shape reuse the cached plan without re-evaluation.
- The cache is invalidated automatically when: the collection receives 1,000+ writes (threshold varies by version), an index is created or dropped, or
mongodrestarts. But between invalidations, a plan that was optimal for one data distribution might become suboptimal as data changes. For example, a plan that used index A was optimal when the collection had 1 million documents, but after growing to 10 million documents with different data distribution, index B would be better — but the cached plan still uses A. - To force reevaluation:
db.collection.getPlanCache().clear()clears all cached plans for the collection. For a specific query shape,db.collection.getPlanCache().clearPlansByQuery(queryShape)targets just that shape. After clearing, the next execution of that query will trigger a fresh plan evaluation. - In production, I have seen cached plan issues manifest as sudden performance degradation after data grows or distribution shifts. The fix is simple (clear the plan cache), but the diagnosis requires recognizing that the explain output shows a cached plan that does not match what the planner would choose fresh.
Compare single-field indexes, compound indexes, text indexes, and wildcard indexes. When would you use each, and what are the limitations of each type?
Compare single-field indexes, compound indexes, text indexes, and wildcard indexes. When would you use each, and what are the limitations of each type?
- Single-field indexes are the simplest: one field, one direction. Use them when queries filter or sort on a single field. Example:
{ email: 1 }forfind({ email: "alice@example.com" }). Limitation: useless for queries that filter on multiple fields — they can only help with one filter condition, and MongoDB generally uses only one index per query (index intersection exists but is unreliable). - Compound indexes include multiple fields in a defined order. They are the workhorse of MongoDB indexing. A compound index on
{ status: 1, createdAt: -1, amount: 1 }can serve queries that filter onstatusalone,status + createdAt, orstatus + createdAt + amount(the “leftmost prefix” rule). It cannot serve a query on justcreatedAtor justamountwithoutstatus. Design them using the ESR rule: Equality, Sort, Range. Limitation: more fields means a larger index and more write overhead. A 5-field compound index on a high-write collection can noticeably degrade insert performance. - Text indexes enable full-text search on string fields.
db.articles.createIndex({ title: "text", body: "text" })allows queries likedb.articles.find({ $text: { $search: "mongodb performance" } }). They support stemming, stop words, and text scoring. Limitation: only one text index per collection, no support for partial matching or fuzzy search (no typo tolerance), and performance degrades significantly on large collections compared to dedicated search engines. For anything beyond basic text search, use Elasticsearch or Atlas Search. - Wildcard indexes (
db.collection.createIndex({ "$**": 1 })) index all fields in a document, or all fields matching a pattern. They are designed for collections with unpredictable or highly variable schemas — like event logs where each event type has different fields. Limitation: wildcard indexes cannot support compound queries (they are effectively many single-field indexes), cannot support sort operations, and are not as efficient as purpose-built compound indexes. Use them as a safety net for ad-hoc queries on schema-variable collections, not as a replacement for thoughtful compound indexes on well-understood query patterns. - My rule of thumb: start with compound indexes designed for your known query patterns (the 80% case). Add text indexes only if you need basic search and cannot afford a dedicated search service. Use wildcard indexes only for collections with truly unpredictable schemas where you need to support exploratory queries. Single-field indexes are mostly useful as building blocks during development — in production, they are usually either upgraded to compound indexes or found to be redundant.
- MongoDB’s query planner evaluates all candidate indexes for a query and selects the most efficient one through a trial execution. Having both wildcard and compound indexes does not cause a conflict — the planner simply considers both as candidates.
- In virtually all cases, the compound index wins over the wildcard index for queries it was designed for. A compound index on
{ status: 1, createdAt: -1 }will be chosen over a wildcard index forfind({ status: "active" }).sort({ createdAt: -1 })because the compound index provides both the filter and the sort in a single scan. The wildcard index can only help with the filter and would require a separate in-memory sort. - The wildcard index earns its value for the “other” queries — the ones you did not anticipate with compound indexes. An ad-hoc query on a rarely-filtered field benefits from the wildcard index rather than falling back to a COLLSCAN.
- The overhead: the wildcard index is maintained on every write, indexing every field in every document. This can significantly increase write latency and storage. Monitor the index size and write performance after adding a wildcard index. If the collection has documents with 100+ fields, the wildcard index could be enormous.