Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
CRUD: Read
Reading data is the most common database operation—and the one that can make or break your application’s performance. MongoDB provides a powerful query language that lets you find exactly the data you need, efficiently.Understanding MongoDB Queries
Unlike SQL’s string-based queries, MongoDB uses JSON-like query documents. This approach is:- Type-safe: No SQL injection vulnerabilities from string concatenation
- Composable: Build complex queries programmatically
- Intuitive: If you know JSON, you can write queries
Query Performance Basics
Every query should be intentional. Poorly written queries can:- Scan millions of documents instead of using indexes
- Consume excessive memory on large result sets
- Block other operations due to collection-level locks (in older versions)
Basic Query Methods
Read operations retrieve documents from a collection; i.e., query a collection for documents.find() and findOne()
find(filter): Returns a cursor to the matching documents.findOne(filter): Returns the first matching document.
Comparison Operators
$eq: Equal to$ne: Not equal to$gt: Greater than$gte: Greater than or equal to$lt: Less than$lte: Less than or equal to$in: In an array$nin: Not in an array
Logical Operators
$or: Joins query clauses with a logical OR$and: Joins query clauses with a logical AND$not: Inverts the effect of a query expression$nor: Joins query clauses with a logical NOR
Element Operators
$exists: Matches documents that have the specified field.$type: Selects documents if a field is of the specified type.
Projection
Projection determines which fields are returned in the matching documents.Summary
- Use
find()to query data. - Use comparison operators like
$gt,$lt,$infor filtering. - Use logical operators like
$or,$andfor complex logic. - Use Projection to limit the fields returned.
Interview Deep-Dive
Your MongoDB query on a 100-million-document collection takes 12 seconds. Walk me through how you would diagnose and fix the performance problem.
Your MongoDB query on a 100-million-document collection takes 12 seconds. Walk me through how you would diagnose and fix the performance problem.
Strong Answer:
- Step one: run the query with
.explain("executionStats")to get the execution plan. This tells you everything: which index was used (or if none was), how many documents were scanned, how many were returned, and the execution time breakdown. - The key fields I look at in the explain output are
winningPlan.stage(COLLSCAN is bad, IXSCAN is good),totalDocsExaminedversusnReturned(if you examined 10 million documents to return 50, your query is inefficient), andexecutionTimeMillisbroken down by stage. - If I see a COLLSCAN, the fix is straightforward: create an index that covers the query’s filter fields. For a query like
db.orders.find({ customerId: "abc", status: "shipped" }).sort({ createdAt: -1 }), the ideal compound index is{ customerId: 1, status: 1, createdAt: -1 }. The order matters — filter fields first (equality matches), then sort fields. - If an index exists but MongoDB is not using it, check for a few common culprits. First, the query planner might have chosen a different plan because it estimates a collection scan is cheaper (this happens with low-selectivity fields). Use
.hint()to force index usage and compare performance. Second, the query might use operators that cannot use indexes efficiently, like$ne,$nin, or$regexwithout an anchor (/^prefix/uses indexes,/substring/does not). - If the index is being used but the query is still slow, check whether the index fits in RAM. Run
db.orders.stats()and comparetotalIndexSizeto your available memory. If the index is 20 GB but your WiredTiger cache is 4 GB, every index lookup potentially hits disk. The fix is either more RAM, a more selective index (fewer fields, partial index), or rethinking the query. - Another common issue: returning too much data. If the query returns 100,000 documents with 50 fields each, the bottleneck might be network transfer and deserialization, not the database itself. Add a projection to return only the fields you need, and use
.limit()if you do not need all results.
- Every index adds overhead to write operations. When you insert or update a document, MongoDB must update every index that includes a modified field. A collection with 10 indexes means every insert triggers 10 index B-tree updates, each requiring disk I/O and potentially splitting B-tree pages.
- The 15% write throughput drop is the cost of your new index. The question is whether the read improvement justifies it. If this query runs 10,000 times per second and was previously causing cascading timeouts, the trade-off is clearly worth it. If it runs once a day in a reporting script, consider dropping the index and accepting the slow report.
- To mitigate write overhead from indexes: audit all indexes with
db.orders.aggregate([{ $indexStats: {} }]). This shows which indexes are actually being used and how often. I regularly find “zombie indexes” — indexes created months ago for a feature that was later removed. Every unused index is pure write overhead with zero benefit. Drop them. - Also consider partial indexes. If your query only needs shipped orders,
db.orders.createIndex({ customerId: 1 }, { partialFilterExpression: { status: "shipped" } })creates a smaller index that only includes documents matching the filter. Smaller index means less write overhead and better cache utilization.
Explain the difference between find() returning a cursor and findOne() returning a document. Why does this distinction matter for application performance?
Explain the difference between find() returning a cursor and findOne() returning a document. Why does this distinction matter for application performance?
Strong Answer:
find()returns a cursor object, not the actual documents. The cursor is a pointer to the result set that the server maintains. Documents are fetched in batches (default first batch is 101 documents or 16 MB, whichever is smaller; subsequent batches are 16 MB each) as you iterate over the cursor.findOne()is syntactic sugar forfind().limit(1)— it fetches a single document and closes the cursor immediately.- This distinction matters enormously for memory management. If your query matches 5 million documents and you call
find(), the server does not load all 5 million into memory at once. It streams them in batches via the cursor. Your application processes one batch, requests the next, processes it, and so on. Memory usage stays constant regardless of result set size. - If your application code accidentally forces all documents into memory (for example,
await collection.find({}).toArray()in Node.js), you lose the streaming benefit. For a 5-million-document result,.toArray()loads everything into a single JavaScript array, potentially consuming gigabytes of memory and causing your Node.js process to crash with an out-of-memory error. - Cursors also have a server-side timeout. By default, a cursor expires after 10 minutes of inactivity (the
noCursorTimeoutoption overrides this, but it is dangerous because orphaned cursors consume server resources). If your application takes too long between batch fetches — maybe it is doing heavy processing on each document — the cursor expires and the next fetch fails with a “cursor not found” error. - The practical advice: for small result sets (under a few thousand documents),
.toArray()is fine and more convenient. For large result sets, always iterate the cursor with afor await...ofloop or.forEach(), process documents in batches, and ensure your processing is fast enough to avoid cursor timeouts.
- This depends on the cursor’s snapshot behavior, and the answer is “it is complicated.” By default, MongoDB cursors do not guarantee a point-in-time snapshot. You might see the original version of some documents and the updated version of others, depending on whether the cursor has already fetched the batch containing those documents.
- It gets worse: if an update changes a document’s size and it moves to a different location on disk (this used to happen with MMAPv1 but is rare with WiredTiger), the cursor could potentially skip the document entirely or return it twice. WiredTiger largely mitigates this because documents have stable record IDs, but it is not guaranteed.
- If you need consistent point-in-time reads, use a snapshot read concern. With
readConcern: "snapshot"(available inside a transaction), the cursor returns data as of the transaction’s start time, regardless of concurrent modifications. Outside transactions, thereadConcern: "majority"provides a weaker but useful guarantee: you only see data that has been committed by a majority of replica set members, but it is not a frozen snapshot. - For the 10-million-document iteration use case, the pragmatic solution is often to not worry about perfect consistency. If you are generating a report or doing a data migration, small inconsistencies during the iteration are acceptable. If you need exact consistency, use a transaction with snapshot read concern — but be aware that long-running transactions (especially ones reading 10 million documents) hold resources and can time out.
Compare $in versus $or for querying multiple values on the same field. When would you use each, and how do they differ in execution?
Compare $in versus $or for querying multiple values on the same field. When would you use each, and how do they differ in execution?
Strong Answer:
- For querying multiple values on the same field,
$inand$orproduce the same logical result, but they differ in how MongoDB executes them.$inis always the better choice for matching multiple values on a single field. $inexecutes as a single index scan. When you query{ status: { $in: ["active", "pending", "review"] } }, MongoDB performs one IXSCAN operation that seeks to each value in the index B-tree. The query planner treats this as a single, efficient operation.$orwith the same logic —{ $or: [{ status: "active" }, { status: "pending" }, { status: "review" }] }— may execute as multiple index scans that are then merged. Each clause in the$oris evaluated independently, and the results are deduplicated. This has more overhead: multiple index lookups, a merge step, and deduplication.- Where
$oris necessary is when you are matching across different fields. For example, “find users where name is Alice OR age is 30” —{ $or: [{ name: "Alice" }, { age: 30 }] }. This cannot be expressed with$inbecause the conditions are on different fields. For$orto use indexes in this case, each clause must have its own index. If onlynameis indexed, theageclause triggers a COLLSCAN for that branch. - A subtle performance trap:
$orat the top level can use indexes, but$ornested inside other expressions might not. For example,{ active: true, $or: [{ name: "Alice" }, { age: 30 }] }can use an index onactiveplus indexes onnameandage. But complex nesting can cause the query planner to give up on index optimization and fall back to a COLLSCAN. - My rule of thumb: use
$inwhenever you are checking multiple values on the same field. Reserve$orfor conditions across different fields. And always verify withexplain()— assumptions about query planning are often wrong.
- MongoDB does not have a hard limit on the size of an
$inarray, but there are practical performance considerations. An$inwith 10,000 values is on the edge of reasonable. Each value requires a seek in the index B-tree, so 10,000 values means 10,000 seeks. If the index is fully in memory, each seek is sub-microsecond and the total is a few milliseconds. If the index is on disk, each seek could be a disk I/O, and 10,000 disk I/Os could take seconds. - The query document itself must fit in the 16 MB BSON limit. With 10,000 ObjectId values (12 bytes each plus BSON overhead), you are at roughly 200 KB — well within the limit. But with 10,000 long strings, you could approach the limit.
- At very high counts (100K+ values), consider restructuring the query. Instead of
{ userId: { $in: [huge list] } }, stage the list into a temporary collection and use$lookupin an aggregation pipeline to join against it. Or redesign the data model so the relationship is expressed differently — for example, tag users with a group ID and query{ groupId: "batch-42" }instead of listing every user ID. - I have seen production incidents where a buggy application dynamically built an
$inarray that grew unboundedly — eventually hitting millions of values and causing timeouts. Defensive coding: always validate and cap the size of arrays used in$inclauses.
What is a covered query in MongoDB, and why is it the holy grail of query optimization?
What is a covered query in MongoDB, and why is it the holy grail of query optimization?
Strong Answer:
- A covered query is one where all the fields in the query filter, projection, and sort are present in a single index. When a query is covered, MongoDB can satisfy it entirely from the index without ever loading the actual document from disk. The explain output shows
totalDocsExamined: 0— zero documents touched, only index entries. - Why this matters so much: indexes are typically much smaller than the documents they index. A compound index on
{ userId: 1, status: 1, createdAt: -1 }might be 2 GB for a collection where the full data is 100 GB. If that 2 GB index fits in the WiredTiger cache (RAM), every covered query is served entirely from memory with no disk I/O. This is the difference between a 1ms response and a 50ms response. - To create a covered query, your projection must exclude
_id(unless_idis part of the index) and include only fields that are in the index. For example, with the index{ userId: 1, email: 1 }, the querydb.users.find({ userId: 123 }, { email: 1, _id: 0 })is covered. Butdb.users.find({ userId: 123 }, { email: 1, name: 1, _id: 0 })is not covered becausenameis not in the index. - The
_idexclusion catches many developers. By default, MongoDB includes_idin query results. If_idis not part of your index and you do not explicitly exclude it with_id: 0, the query cannot be covered because MongoDB must fetch the document to get the_idvalue. - Covered queries are particularly powerful for count operations and existence checks.
db.orders.countDocuments({ status: "shipped" })with an index on{ status: 1 }is covered — it counts index entries without touching documents. - The limitation: multikey indexes (indexes on array fields) cannot cover queries. Even if all projected fields are in the index, if the index is multikey, MongoDB must fetch the document to correctly reconstruct the array. This is a documented limitation of the query engine.
- Yes, and it is one of the most impactful optimizations. If the sort field is part of the index and the index ordering matches the requested sort direction, MongoDB reads index entries in order without a separate sort step. This is called an “index-provided sort.” Combined with projection on only indexed fields, you get a covered query with sorted results and no in-memory sort.
- For example, with index
{ status: 1, createdAt: -1 }, the querydb.orders.find({ status: "shipped" }, { status: 1, createdAt: 1, _id: 0 }).sort({ createdAt: -1 })is fully covered with an index-provided sort. The explain output showsIXSCANwith noSORTstage. - The sort direction matters. If your index is
{ createdAt: 1 }(ascending) and your query sorts{ createdAt: -1 }(descending), MongoDB can still use the index by traversing it in reverse. But with compound indexes, the reversal must be consistent across all fields — you cannot reverse just one field in a compound sort. - The in-memory sort limit is another reason to aim for index-provided sorts. MongoDB refuses to sort more than 100 MB of data in memory (the
allowDiskUse: trueoption relaxes this for aggregation, but not forfind().sort()). If your query matches 500,000 documents and needs a sort that is not index-supported, it fails with a “sort exceeded memory limit” error.