Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

MongoDB Indexing

Indexing & Performance

Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan — scanning every document in a collection to find matches. On a collection with 10 million documents, that means reading 10 million documents even if only 5 match your query. The library analogy: Imagine a library with 100,000 books but no catalog system. To find a book by a specific author, you would have to walk down every aisle and check every spine. That is a collection scan. An index is the card catalog — it tells MongoDB exactly where to look, turning a full-aisle walk into a quick lookup.
MongoDB automatically creates one index: a unique index on the _id field. Every other index is your responsibility. If a query does not have a supporting index, it will do a collection scan — and on production datasets, that can take seconds or even minutes.

How Indexes Work Under the Hood

MongoDB indexes use a B-tree data structure (specifically, a variant called B+ tree). The key properties:
  • Sorted order: Index entries are stored sorted, so range queries ($gt, $lt) and sort operations are efficient.
  • Logarithmic lookup: Finding a value in a B-tree is O(log n). In a collection of 10 million documents, that is roughly 23 comparisons instead of 10 million.
  • Pointer to document: Each index entry contains a pointer to the full document on disk, so MongoDB can jump directly to the data.
The trade-off: every index must be updated on every write (insert, update, delete). More indexes mean slower writes but faster reads. This is the fundamental indexing trade-off you will manage throughout your career with MongoDB.

Creating an Index

Use createIndex() to create an index on a field.
// Create a single-field index on "name" (ascending order)
// Ascending (1) vs descending (-1) doesn't matter for single-field indexes
// but matters for compound indexes used in sorting
db.users.createIndex({ name: 1 })

// Name the index for easier management
db.users.createIndex({ name: 1 }, { name: "idx_users_name" })

Index Types

Single Field Indexes

The simplest index type — one field, one direction.
// Index on email -- useful if you frequently query by email
db.users.createIndex({ email: 1 })

Compound Indexes

An index on multiple fields. The order of fields matters — this is one of the most important and most misunderstood concepts in MongoDB indexing.
// Compound index: status first, then createdAt
db.orders.createIndex({ status: 1, createdAt: -1 })
The phone book analogy: A compound index is like a phone book sorted by last name, then first name. You can efficiently look up “all Smiths” (first field) or “Smith, John” (both fields). But you cannot efficiently look up “all Johns” (second field only) — you would have to scan every letter of the alphabet. The same principle applies: a compound index on { status, createdAt } supports queries on status alone or status + createdAt, but NOT on createdAt alone. This is called the prefix rule: a compound index supports queries on any left prefix of the indexed fields.
// This compound index:
db.orders.createIndex({ status: 1, createdAt: -1, amount: 1 })

// Supports these queries efficiently:
db.orders.find({ status: "active" })                               // uses prefix [status]
db.orders.find({ status: "active", createdAt: { $gte: someDate } }) // uses prefix [status, createdAt]
db.orders.find({ status: "active", createdAt: { $gte: someDate }, amount: { $gt: 100 } }) // uses all three

// Does NOT support these efficiently (no left prefix match):
db.orders.find({ createdAt: { $gte: someDate } })  // COLLSCAN - createdAt is not the first field
db.orders.find({ amount: { $gt: 100 } })           // COLLSCAN - amount is not the first field
The ESR Rule for compound index field order: Equality fields first, then Sort fields, then Range fields. For example, if your query is { status: "active", createdAt: { $gte: date } } sorted by createdAt, the optimal index is { status: 1, createdAt: 1 } — equality (status) first, then sort/range (createdAt).

Multikey Indexes

When you index a field that holds an array, MongoDB creates a multikey index — one index entry per array element. This happens automatically.
// If "tags" is an array like ["mongodb", "database", "nosql"]
db.posts.createIndex({ tags: 1 })

// Now this query uses the index efficiently:
db.posts.find({ tags: "mongodb" })
A compound index can include at most one array field. MongoDB cannot create a compound multikey index on two array fields simultaneously (e.g., { tags: 1, categories: 1 } where both are arrays). This is a hard limitation.

Unique Indexes

Ensures that the indexed fields do not store duplicate values.
// Enforce unique email addresses
db.users.createIndex({ email: 1 }, { unique: true })

// Compound unique index: the combination must be unique
db.inventory.createIndex({ warehouse: 1, sku: 1 }, { unique: true })

Text Indexes

For full-text search across string fields.
// Create a text index on title and body
db.articles.createIndex({ title: "text", body: "text" })

// Search for documents containing "mongodb" or "performance"
db.articles.find({ $text: { $search: "mongodb performance" } })

TTL Indexes (Time-To-Live)

Automatically delete documents after a specified time period. Perfect for session data, logs, or temporary records.
// Documents expire 3600 seconds (1 hour) after the createdAt timestamp
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

Partial Indexes

Index only documents that match a filter condition — saves disk space and write overhead.
// Only index active users -- no point indexing soft-deleted accounts
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: "active" } }
)

Viewing Indexes

// List all indexes on a collection
db.users.getIndexes()

// Check index sizes (useful for capacity planning)
db.users.stats().indexSizes

Dropping Indexes

// Drop a specific index by name
db.users.dropIndex("idx_users_name")

// Drop all indexes except _id (use with caution)
db.users.dropIndexes()
On large collections, createIndex() blocks all read and write operations by default. In production, always build indexes in the background by using a rolling index build strategy, or schedule index creation during low-traffic windows. On MongoDB 4.2+, index builds are optimized to not hold exclusive locks for the entire duration, but they still consume significant resources.

explain() — Understanding Query Execution

explain() is your most important performance debugging tool. It shows you exactly how MongoDB plans to execute a query and what it actually did.
// Three verbosity levels:
db.users.find({ name: "Alice" }).explain("queryPlanner")     // Shows the plan without executing
db.users.find({ name: "Alice" }).explain("executionStats")   // Executes and shows stats (most useful)
db.users.find({ name: "Alice" }).explain("allPlansExecution") // Shows all candidate plans

Reading explain() Output

The key fields to examine in executionStats:
FieldWhat It Tells YouWhat to Look For
winningPlan.stageHow MongoDB accessed dataIXSCAN (good) vs COLLSCAN (bad)
nReturnedDocuments returned to clientShould be close to what you expect
totalDocsExaminedDocuments MongoDB had to readShould be close to nReturned
totalKeysExaminedIndex entries scannedShould be close to nReturned
executionTimeMillisWall clock timeLower is better
The golden ratio: nReturned / totalDocsExamined should be close to 1.0. If you return 10 documents but examined 100,000, your index is not selective enough or is missing entirely.
// Example: diagnosing a slow query
const stats = db.orders.find({
  status: "active",
  createdAt: { $gte: new Date("2023-01-01") }
}).explain("executionStats")

// If you see:
//   winningPlan.stage: "COLLSCAN"    --> No index is being used, create one
//   totalDocsExamined: 1000000       --> Scanning too many documents
//   nReturned: 50                    --> Only 50 results -- huge waste

// Fix: create a compound index
db.orders.createIndex({ status: 1, createdAt: -1 })

Common Indexing Pitfalls

1. Too many indexes. Every index slows down writes and consumes RAM. A collection with 20 indexes will have noticeably slower inserts. Aim for the minimum set of indexes that covers your query patterns. 2. Indexing low-cardinality fields alone. An index on a boolean field like isActive (only two possible values) is rarely useful by itself — MongoDB still has to scan half the collection. Combine it with a more selective field in a compound index. 3. Forgetting sort order in compound indexes. If your query sorts by createdAt descending but your compound index has createdAt: 1 (ascending), MongoDB may need an in-memory sort. Match the sort direction in your index to your query’s sort. 4. Not monitoring index usage. Use $indexStats to find unused indexes:
// Find which indexes are actually being used
db.users.aggregate([{ $indexStats: {} }])
// Look for indexes with zero "ops" -- candidates for removal
5. Ignoring the working set. Indexes must fit in RAM to be effective. If your indexes exceed available memory, MongoDB will page them to disk, and index lookups become slow disk reads instead of fast memory reads.

Summary

  • Indexes use B-tree structures to turn O(n) collection scans into O(log n) lookups.
  • The trade-off: faster reads but slower writes and more RAM usage. Index deliberately, not liberally.
  • Compound indexes support queries on any left prefix of their fields. Field order matters — follow the ESR rule (Equality, Sort, Range).
  • Use explain("executionStats") to verify that queries use indexes. Watch the nReturned / totalDocsExamined ratio.
  • Specialized indexes: unique (enforce constraints), TTL (auto-expire), text (full-text search), partial (conditional indexing), multikey (arrays).
  • Monitor index usage with $indexStats and drop unused indexes to reduce write overhead.
  • Ensure your index working set fits in RAM — disk-paged indexes negate most performance benefits.

Interview Deep-Dive

Strong Answer:
  • The first thing I would check is index utilization. Every index on a collection adds overhead to every write operation — the document must be written, and then each of the 12 index B-trees must be updated. More indexes means more random I/O on writes. Run db.collection.aggregate([{ $indexStats: {} }]) on each index to see how often each index is actually used for reads.
  • In my experience, collections that accumulate 12 indexes have “zombie indexes” — indexes created for features that were later removed, for one-off migration queries, or by developers who added indexes without checking if an existing index already covered the query. I have consistently found that 30-50% of indexes on mature collections are unused or redundant.
  • Check for redundant indexes specifically. If you have both { userId: 1 } and { userId: 1, createdAt: -1 }, the single-field index on userId is almost always redundant because the compound index can serve any query that the single-field index serves (the compound index is a superset). Drop the single-field index.
  • Next, check the working set size. Run db.serverStatus().wiredTiger.cache and look at bytes currently in the cache versus maximum bytes configured. If total index size exceeds the WiredTiger cache, index operations spill to disk, which destroys write performance. With 12 indexes on 200M documents, index data could easily be 50+ GB. If your WiredTiger cache is 10 GB, you are in trouble.
  • The remediation plan, in order: (1) Drop unused indexes identified by $indexStats. (2) Merge redundant indexes — replace two overlapping indexes with one compound index that serves both. (3) Convert full indexes to partial indexes where possible. If an index exists for querying active orders and 70% of orders are completed, a partial index with { partialFilterExpression: { status: "active" } } is 70% smaller. (4) If the indexes are all legitimately needed, the remaining option is to increase the WiredTiger cache (more RAM) or move to a sharded cluster to distribute the write load.
  • The 40% degradation over a quarter suggests either the index count grew (new features added indexes) or the data volume grew (more documents means larger indexes). Track index count and total index size as operational metrics with alerts.
Follow-up: You identified 4 unused indexes to drop. What is the safest way to drop indexes in production without causing an outage?
  • Dropping an index in MongoDB is fast (nearly instant) and does not block other operations — it is a metadata change, not a data operation. However, the risk is not the drop itself but the consequences: if a query was depending on that index and you missed it, that query suddenly falls back to a COLLSCAN, potentially causing timeouts and cascading failures.
  • Safe procedure: first, confirm the index is truly unused by checking $indexStats over a full business cycle. An index with zero usage in the last 24 hours might be used by a weekly report that runs on Sundays. Check over at least 2 weeks, covering all scheduled jobs and reporting cycles.
  • Second, use the “hide index” feature (available in MongoDB 4.4+). db.collection.hideIndex("index_name") makes the index invisible to the query planner without dropping it. Queries that depended on it will now use a different plan (or COLLSCAN). Monitor your slow query log and application metrics for 24-48 hours. If nothing degrades, the index was truly unused, and you can safely drop it with dropIndex. If something breaks, unhideIndex instantly restores it without rebuilding.
  • Third, have a rollback plan. Before dropping, record the full index definition (fields, options, partial filter expression). If you need to recreate it, you have the exact specification. On a 200M-document collection, rebuilding an index takes minutes to hours and consumes I/O, so avoid needing to do this.
Strong Answer:
  • ESR stands for Equality, Sort, Range — and it is the most important rule for designing compound indexes that serve real-world queries efficiently. The order of fields in a compound index determines how MongoDB traverses the B-tree, and getting it wrong can make the difference between scanning 100 documents and scanning 100,000.
  • Equality fields first. Fields queried with exact match conditions (status: "active", country: "US") go at the beginning of the index. Equality conditions provide the tightest B-tree traversal — MongoDB descends directly to the matching node. Multiple equality fields can be in any relative order since they each narrow to an exact point.
  • Sort fields second. After the equality fields, add the fields used in sort(). If the index order matches the sort direction, MongoDB reads index entries in order and avoids an in-memory sort. Without this, MongoDB must collect all matching documents and sort them in memory (limited to 100 MB).
  • Range fields last. Fields queried with range conditions ($gt, $lt, $gte, $lte, $in) go at the end. Range conditions scan a contiguous section of the B-tree. If you put a range field before the sort field, the sort within each range value is not maintained, and MongoDB falls back to an in-memory sort.
  • Concrete example: for the query db.orders.find({ status: "shipped", amount: { $gt: 100 } }).sort({ createdAt: -1 }), the optimal index is { status: 1, createdAt: -1, amount: 1 }. The equality match on status narrows the index to just “shipped” entries. The sort on createdAt is provided by the index order. The range on amount is evaluated last.
  • If you instead created { status: 1, amount: 1, createdAt: -1 } (range before sort), MongoDB can use the equality on status and the range on amount efficiently, but the results within that range are not sorted by createdAt. It must collect all matching entries and sort them in memory. For large result sets, this hits the 100 MB sort limit and fails.
  • The way I remember it: “Equality drills to a point, Sort provides order, Range scans a segment. Point-Order-Segment needs to be in that sequence for maximum efficiency.”
Follow-up: A query has two range conditions: price between 10 and 100 AND date in the last 30 days. How do you design the index when ESR says range should come last, but you have two range fields?
  • This is where ESR requires judgment, not blind application. When you have multiple range conditions, only the first range field in the index can use the B-tree efficiently. The second range field becomes a residual filter applied to each index entry, not an index-guided scan.
  • The decision comes down to selectivity: which range condition eliminates more documents? If the date filter reduces the dataset from 10 million to 500,000 and the price filter reduces it from 10 million to 2 million, put date first in the index because it is more selective. The price condition is evaluated as a residual filter on the 500,000 date-matching entries.
  • Use explain() with both orderings and compare totalKeysExamined versus nReturned. The index that examines fewer keys relative to returned documents is the better choice.
  • Another option: if this is a high-frequency query, consider two separate indexes and let MongoDB’s query planner choose the better one, or use index intersection (though MongoDB’s index intersection is less reliable than you might expect — it often falls back to using a single index).
  • In practice, this is one of those cases where benchmarking beats theory. Create both index orderings, run explain with your actual data distribution, and pick the winner. Data skew (most orders are recent, most prices cluster in a narrow range) changes the optimal choice in ways that cannot be predicted from theory alone.
Strong Answer:
  • A multikey index is an index on a field that contains an array. When MongoDB encounters an array value during indexing, it creates a separate index entry for each element in the array. So if a document has tags: ["mongodb", "database", "nosql"], the multikey index on tags creates three index entries, each pointing to the same document.
  • Internally, this is transparent — you create a multikey index with the same createIndex syntax as a regular index. MongoDB detects that the field contains an array and automatically uses multikey indexing. The explain() output shows isMultiKey: true for these indexes.
  • The power of multikey indexes is that they enable efficient queries on array contents. db.articles.find({ tags: "mongodb" }) uses the multikey index to find all articles with “mongodb” in their tags array in O(log n) time instead of scanning every document.
  • Restriction one: a compound index can have at most one multikey field. You cannot create an index on { tags: 1, categories: 1 } if both tags and categories are arrays. This is because the number of index entries would be the Cartesian product — a document with 10 tags and 10 categories would generate 100 index entries, which explodes exponentially. MongoDB prevents this.
  • Restriction two: multikey indexes cannot cover queries. Even if your projection only includes fields in the index, MongoDB must fetch the full document to correctly reconstruct the array. The explain output will show totalDocsExamined > 0 even for queries that would be covered on a non-multikey index.
  • Restriction three: array size impacts write performance. If a document has an array with 1,000 elements and you insert or update it, the multikey index creates or updates 1,000 index entries. For documents with large, frequently updated arrays, this write amplification can become a significant bottleneck.
  • Restriction four: shard keys cannot be multikey. If you try to shard on a field that contains arrays, MongoDB rejects it because the shard key for a document must be a single, deterministic value.
Follow-up: You have a products collection where each product has a ‘tags’ array. Users search by multiple tags simultaneously. How does MongoDB evaluate this query against a multikey index, and is it efficient?
  • The query db.products.find({ tags: { $all: ["wireless", "bluetooth"] } }) uses the multikey index, but its execution is nuanced. MongoDB uses the index to find all documents that contain the first tag (“wireless”), then applies the second tag (“bluetooth”) as a filter on those results. It is not a single index seek for the intersection.
  • The efficiency depends on the selectivity of the first tag. If “wireless” appears in 500,000 out of 1 million products, MongoDB fetches 500,000 index entries and then filters for “bluetooth.” If “bluetooth” appears in only 1,000 products, the final result is small, but the intermediate work was large. MongoDB does not necessarily pick the most selective tag first — it depends on the query planner’s estimate.
  • For frequently used multi-tag searches, consider a different data model. The “Attribute Pattern” uses an array of key-value pairs: attributes: [{ key: "tag", value: "wireless" }, { key: "tag", value: "bluetooth" }] with a compound index on { "attributes.key": 1, "attributes.value": 1 }. This enables more efficient compound queries but at the cost of a more complex schema.
  • Another approach: if the tag combinations are predictable and limited, create a compound index on the most commonly co-queried tags as separate boolean fields: { isWireless: 1, isBluetooth: 1 }. This trades schema flexibility for query efficiency.
Strong Answer:
  • explain() is the most important diagnostic tool in MongoDB query optimization. I use the "executionStats" verbosity level because it provides actual execution metrics, not just the query plan.
  • The fields I check, in order of importance:
  • winningPlan.stage (or winningPlan.inputStage.stage for nested plans): This tells you the top-level execution strategy. COLLSCAN means no index was used — bad for any collection over a few thousand documents. IXSCAN means an index was used — good. FETCH after IXSCAN means the query used an index to find matching documents, then fetched the full documents for additional filtering or projection. SORT as a separate stage means an in-memory sort was required — the sort was not covered by the index.
  • totalDocsExamined vs nReturned: This is the efficiency ratio. If the query examined 1 million documents to return 50, you have a selectivity problem. Ideal ratio is close to 1:1. A ratio above 10:1 usually means the index is not selective enough or the wrong index is being used.
  • totalKeysExamined vs nReturned: Similar to above but for index entries. If you examined 500,000 keys to return 50 documents, the index is being scanned broadly (range query on a low-selectivity field). Close to 1:1 is ideal.
  • executionTimeMillis: The total wall-clock time. Compare this across different query plans (using .hint() to force different indexes) to find the fastest approach.
  • rejectedPlans: MongoDB’s query planner evaluates multiple candidate plans and picks the winner based on a trial period. The rejected plans show you what alternatives were considered and discarded. If a seemingly better plan was rejected, it might be because the planner’s statistics are outdated — run planCacheClear() to reset plan selection.
  • Red flags in explain output: COLLSCAN on a large collection, a SORT stage (indicates in-memory sort), totalDocsExamined orders of magnitude larger than nReturned, or executionTimeMillis in the thousands for what should be a simple lookup.
Follow-up: MongoDB’s query planner cached a suboptimal plan for a query. How does plan caching work, and how do you force it to reevaluate?
  • MongoDB’s query planner caches the winning plan for a given “query shape” (the structure of the query, ignoring specific values). The first time a query shape is seen, the planner runs a trial execution of multiple candidate plans and caches the winner. Subsequent queries with the same shape reuse the cached plan without re-evaluation.
  • The cache is invalidated automatically when: the collection receives 1,000+ writes (threshold varies by version), an index is created or dropped, or mongod restarts. But between invalidations, a plan that was optimal for one data distribution might become suboptimal as data changes. For example, a plan that used index A was optimal when the collection had 1 million documents, but after growing to 10 million documents with different data distribution, index B would be better — but the cached plan still uses A.
  • To force reevaluation: db.collection.getPlanCache().clear() clears all cached plans for the collection. For a specific query shape, db.collection.getPlanCache().clearPlansByQuery(queryShape) targets just that shape. After clearing, the next execution of that query will trigger a fresh plan evaluation.
  • In production, I have seen cached plan issues manifest as sudden performance degradation after data grows or distribution shifts. The fix is simple (clear the plan cache), but the diagnosis requires recognizing that the explain output shows a cached plan that does not match what the planner would choose fresh.
Strong Answer:
  • Single-field indexes are the simplest: one field, one direction. Use them when queries filter or sort on a single field. Example: { email: 1 } for find({ email: "alice@example.com" }). Limitation: useless for queries that filter on multiple fields — they can only help with one filter condition, and MongoDB generally uses only one index per query (index intersection exists but is unreliable).
  • Compound indexes include multiple fields in a defined order. They are the workhorse of MongoDB indexing. A compound index on { status: 1, createdAt: -1, amount: 1 } can serve queries that filter on status alone, status + createdAt, or status + createdAt + amount (the “leftmost prefix” rule). It cannot serve a query on just createdAt or just amount without status. Design them using the ESR rule: Equality, Sort, Range. Limitation: more fields means a larger index and more write overhead. A 5-field compound index on a high-write collection can noticeably degrade insert performance.
  • Text indexes enable full-text search on string fields. db.articles.createIndex({ title: "text", body: "text" }) allows queries like db.articles.find({ $text: { $search: "mongodb performance" } }). They support stemming, stop words, and text scoring. Limitation: only one text index per collection, no support for partial matching or fuzzy search (no typo tolerance), and performance degrades significantly on large collections compared to dedicated search engines. For anything beyond basic text search, use Elasticsearch or Atlas Search.
  • Wildcard indexes (db.collection.createIndex({ "$**": 1 })) index all fields in a document, or all fields matching a pattern. They are designed for collections with unpredictable or highly variable schemas — like event logs where each event type has different fields. Limitation: wildcard indexes cannot support compound queries (they are effectively many single-field indexes), cannot support sort operations, and are not as efficient as purpose-built compound indexes. Use them as a safety net for ad-hoc queries on schema-variable collections, not as a replacement for thoughtful compound indexes on well-understood query patterns.
  • My rule of thumb: start with compound indexes designed for your known query patterns (the 80% case). Add text indexes only if you need basic search and cannot afford a dedicated search service. Use wildcard indexes only for collections with truly unpredictable schemas where you need to support exploratory queries. Single-field indexes are mostly useful as building blocks during development — in production, they are usually either upgraded to compound indexes or found to be redundant.
Follow-up: You have a collection with a wildcard index and specific compound indexes on the same collection. How does MongoDB choose which index to use for a given query?
  • MongoDB’s query planner evaluates all candidate indexes for a query and selects the most efficient one through a trial execution. Having both wildcard and compound indexes does not cause a conflict — the planner simply considers both as candidates.
  • In virtually all cases, the compound index wins over the wildcard index for queries it was designed for. A compound index on { status: 1, createdAt: -1 } will be chosen over a wildcard index for find({ status: "active" }).sort({ createdAt: -1 }) because the compound index provides both the filter and the sort in a single scan. The wildcard index can only help with the filter and would require a separate in-memory sort.
  • The wildcard index earns its value for the “other” queries — the ones you did not anticipate with compound indexes. An ad-hoc query on a rarely-filtered field benefits from the wildcard index rather than falling back to a COLLSCAN.
  • The overhead: the wildcard index is maintained on every write, indexing every field in every document. This can significantly increase write latency and storage. Monitor the index size and write performance after adding a wildcard index. If the collection has documents with 100+ fields, the wildcard index could be enormous.