Introduction to NoSQL & MongoDB

What is NoSQL?

NoSQL (“Not Only SQL”) databases store data in models other than the fixed tables of a relational database. They come in several types based on their data model; the main ones are document, key-value, wide-column, and graph.

Think of it this way: a relational (SQL) database is like a spreadsheet, where every row must have the same columns and the structure is decided up front. A NoSQL document database is more like a filing cabinet full of folders, where each folder (document) can contain different kinds of paperwork. Some folders might have a photo attached, others might have three pages of notes. The cabinet does not care: it stores whatever you put in the folder.

Types of NoSQL Databases

| Type | How It Stores Data | Real-World Analogy | Example DBs |
| --- | --- | --- | --- |
| Document | JSON/BSON documents | A filing cabinet where each folder has its own structure | MongoDB, CouchDB |
| Key-Value | Simple key-value pairs | A dictionary or hash map: look up a value by its key | Redis, DynamoDB |
| Wide-Column | Rows with dynamic columns | A spreadsheet where each row can have different columns | Cassandra, HBase |
| Graph | Nodes and edges (relationships) | A social network map: people connected by friendships | Neo4j, Amazon Neptune |

MongoDB is a document database. When people say “NoSQL” in a web development context, they usually mean document databases — but the term covers all four types above.

SQL vs NoSQL

| Feature | SQL (Relational) | NoSQL (Non-Relational) |
| --- | --- | --- |
| Structure | Tables with fixed rows and columns | Depends on the type: JSON documents (document), key-value pairs (key-value), etc. |
| Schema | Rigid | Flexible |
| Scalability | Vertical (scale up with a larger server) | Horizontal (scale out across commodity servers) |
| Joins | Complex joins supported | Often handled in application code; database-side support is limited or absent (MongoDB offers $lookup) |
| Transactions | ACID transactions are native and mature | ACID support varies; MongoDB added multi-document transactions in v4.0 |
| Best For | Complex relationships, strict consistency | Rapid iteration, flexible data, high write throughput |

“SQL vs NoSQL” is not a war with a winner. They solve different problems. Many production systems use both — for example, a SQL database for financial transactions and MongoDB for product catalogs or user activity logs. Choose based on your data access patterns, not hype.
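
To make the "joins handled in code" row concrete, here is a minimal sketch in plain JavaScript with no database involved. Two arrays stand in for the results of two separate queries, and the application stitches them together. The data, the field names (`userId`, `total`), and the function name `joinOrdersWithUsers` are invented for illustration.

```javascript
// Stand-ins for the results of two separate queries (hypothetical data).
const users = [
  { _id: 1, name: "Ada" },
  { _id: 2, name: "Linus" },
];
const orders = [
  { _id: 101, userId: 1, total: 40 },
  { _id: 102, userId: 2, total: 15 },
  { _id: 103, userId: 1, total: 25 },
];

// Application-side join: index one side by key, then look up per row.
function joinOrdersWithUsers(orders, users) {
  const usersById = new Map(users.map((u) => [u._id, u]));
  return orders.map((o) => ({
    orderId: o._id,
    total: o.total,
    // null when the referenced user is missing; nothing enforces the link.
    customer: usersById.get(o.userId)?.name ?? null,
  }));
}

const joined = joinOrdersWithUsers(orders, users);
console.log(joined);
```

In a relational database the same result is a single JOIN that the database plans and executes. Here the application pays for two round trips and owns the matching logic, including what to do about dangling references.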

What is MongoDB?

MongoDB is a source-available, cross-platform, document-oriented database. Classified as a NoSQL database, it uses JSON-like documents with optional schemas. Development began at 10gen (now MongoDB, Inc.) in 2007 as part of a planned platform-as-a-service product. When the team realized the database component was the most valuable piece, they released it as open source in 2009, and the rest is history.

MongoDB is the “M” in the popular MERN (MongoDB, Express, React, Node.js) and MEAN (MongoDB, Express, Angular, Node.js) stacks. Its document model maps naturally to how JavaScript objects work, which is why it became the go-to database for Node.js applications.

Key Features

  1. Document Model: Data is stored in documents (BSON format) grouped into collections. This maps naturally to objects in application code.
  2. Flexible Schema: Documents in the same collection do not need to have the same set of fields or structure.
  3. High Availability: Replica sets provide automatic failover and data redundancy.
  4. Horizontal Scalability: Sharding distributes data across a cluster of machines.
  5. Rich Queries: Supports ad-hoc queries, indexing, and real-time aggregation.
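
The document model and flexible schema described in the list above can be sketched with plain JavaScript objects. This is only an illustration: the "collection" is an ordinary array, and the product data and helper names (`fieldSets`, `withField`) are made up for the example.

```javascript
// A "collection" modeled as a plain array of documents (hypothetical data).
const products = [
  { name: "Laptop", price: 1200, specs: { ramGB: 16, screenInches: 14 } },
  { name: "T-shirt", price: 20, size: "M", color: "navy" },
  { name: "E-book", price: 9 }, // no extra attributes at all
];

// Collect the set of field names each document actually carries.
function fieldSets(docs) {
  return docs.map((doc) => Object.keys(doc).sort());
}

// A filter on a field that only some documents have simply skips the rest.
function withField(docs, field) {
  return docs.filter((doc) => field in doc);
}

console.log(fieldSets(products));
console.log(withField(products, "size").map((doc) => doc.name)); // [ 'T-shirt' ]
```

All three documents can sit in the same collection even though no two of them share the same set of fields.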

JSON vs BSON

MongoDB stores data in BSON (Binary JSON). BSON extends the JSON model to provide additional data types, ordered fields, and to be efficient for encoding and decoding within different languages.

| Aspect | JSON | BSON |
| --- | --- | --- |
| Format | Text-based, human-readable | Binary-encoded, machine-optimized |
| Data Types | String, Number, Boolean, Array, Object, null | All JSON types plus Date, ObjectId, Binary, Decimal128, and more |
| Speed | Slower to parse (text parsing) | Faster to traverse (length-prefixed, so the engine can skip fields) |
| Size | Usually smaller on the wire | Slightly larger due to metadata, but faster to decode |

You write JSON-like documents when interacting with MongoDB, and the driver or shell encodes them as BSON before they are sent and stored. When you read data back, the driver decodes BSON into your language's native objects. This is transparent: you rarely need to think about BSON directly, but understanding it explains why MongoDB supports data types (like Date and ObjectId) that plain JSON does not.
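
The missing Date type in plain JSON is easy to observe without a database. The short sketch below, which uses only built-in JavaScript, round-trips an object through JSON text the way a plain JSON API would. The object and its field names are illustrative.

```javascript
const original = { name: "Ada", createdAt: new Date(0) };

// Round trip through JSON text, as a plain JSON API would do.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);      // "string"
console.log(roundTripped.createdAt);             // "1970-01-01T00:00:00.000Z"
```

The date survives only as an ISO string, so anything that reads it back has to know to parse it. BSON carries a dedicated date type, which is why a Date can come back from MongoDB as a Date.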

When to Choose MongoDB

MongoDB excels in scenarios where:
  • Your schema evolves rapidly — startups iterating on product features, prototyping
  • You have hierarchical or nested data — product catalogs, content management, user profiles
  • You need horizontal scalability — high write throughput across distributed servers
  • Your data access pattern is document-centric — you usually fetch or update an entire “thing” (a user, an order) at once
MongoDB is a weaker fit when:
  • You need complex joins across many entities — a relational database handles this more naturally
  • Your data is highly relational — many-to-many relationships with strict integrity constraints
  • You need mature multi-statement transactions — MongoDB supports them, but they are more constrained than in PostgreSQL or MySQL

Summary

  • NoSQL databases offer flexibility and scalability compared to traditional SQL databases.
  • MongoDB is a document database that stores data in BSON format.
  • It is designed for modern application development with a flexible schema and powerful query capabilities.
  • Choose MongoDB when your data is document-shaped and your schema needs room to evolve; choose SQL when relationships and strict consistency are paramount.

Interview Deep-Dive

Strong Answer:
  • The first thing I would not do is pick a database based on ideology. I would start by mapping out the core data entities and their access patterns. E-commerce has two very different data profiles living side by side.
  • Product catalogs are a strong fit for MongoDB. Products have wildly different attributes — a laptop has RAM, screen size, and GPU specs, while a t-shirt has size, color, and fabric. In a relational database, you end up with either a sparse table with 200 nullable columns or an EAV (Entity-Attribute-Value) anti-pattern that destroys query performance. In MongoDB, each product document simply contains the fields it needs.
  • Orders and payments, on the other hand, are deeply relational and demand ACID transactions with strict consistency. An order references a customer, multiple products, a shipping address, and a payment record. If the payment succeeds but the order record fails to write, you have a financial discrepancy. PostgreSQL handles this natively with mature multi-statement transactions.
  • My recommendation would be a polyglot persistence approach: MongoDB for the product catalog, user sessions, and activity logs; PostgreSQL for orders, payments, and inventory management. This is not theoretical — companies like eBay and Walmart run hybrid stacks for exactly this reason.
  • The trade-off is operational complexity. Two databases means two backup strategies, two monitoring dashboards, and two sets of connection pool configurations. For a small team, I might start with PostgreSQL and JSONB columns for the product catalog, then migrate to MongoDB when the catalog complexity justifies it.
Follow-up: MongoDB added multi-document transactions in v4.0. Why not just use MongoDB for everything, including orders and payments?
  • MongoDB transactions work, but they come with meaningful constraints that matter at scale. Transactions in MongoDB have a default 60-second lifetime limit. They hold WiredTiger locks for the duration, which can create contention on high-throughput write paths. If you have a checkout flow that involves validating inventory, charging a payment gateway, creating an order, and updating stock — and the payment gateway takes 3 seconds to respond — you are holding locks the entire time.
  • In PostgreSQL, transaction isolation levels like SERIALIZABLE have decades of battle-tested optimization. The query planner understands transactions deeply. MongoDB’s transaction implementation is newer and lacks some of the fine-grained isolation controls (like advisory locks or SELECT FOR UPDATE with SKIP LOCKED for queue patterns).
  • The real-world signal: MongoDB’s own documentation cautions that distributed transactions cost more than single-document writes and should not be a replacement for effective schema design. If the majority of your operations need multi-document transactions, that is a strong hint the data may fit a relational database better. When the vendor itself tells you to design around a feature, I think that is worth listening to.
Strong Answer:
  • BSON is Binary JSON — a binary-encoded serialization of JSON-like documents. The reason MongoDB chose BSON over raw JSON comes down to three engineering trade-offs: traversal speed, type richness, and storage efficiency for queries.
  • Traversal speed is the big one. JSON is a text format, so to find the value of the 50th field in a document, you have to parse every character from the beginning: counting braces, handling escape sequences, the whole thing. BSON stores a byte length in front of every document, array, string, and binary value, and its other types have fixed sizes, so the engine can hop from field to field and skip the values it does not need without parsing their contents. For a document with 100 fields where you only need one, this is a large performance difference.
  • Type richness matters for a database. JSON has exactly six types: string, number, boolean, array, object, null. That is not enough. You cannot natively represent a date, a 128-bit decimal for financial calculations, binary data, or a regular expression. BSON adds Date, ObjectId, Decimal128, BinData, and about a dozen other types. Without BSON, every date would be a string, and every comparison would require parsing that string back into a date — killing index performance.
  • There is a size trade-off: BSON documents are often slightly larger than their JSON equivalents because of the type metadata and length prefixes. A simple document like {"a":1} is 7 bytes as compact JSON but 12 bytes in BSON when 1 is stored as a 32-bit integer, and 16 bytes when it is stored as a double or 64-bit integer. MongoDB accepts that overhead because traversal speed matters more to a database than storage density.
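
The size and traversal points above can be checked by hand. What follows is a hand-rolled sketch of the BSON layout for flat documents with only int32, double, and string values. It is not the official `bson` library, and the function names `encodeBson` and `readField` are invented for this example. It relies on Node's built-in `Buffer`.

```javascript
// Minimal BSON encoder for flat documents with int32, double, and string values only.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // field name as a C string
    if (Number.isInteger(value) && Math.abs(value) < 2 ** 31) {
      const v = Buffer.alloc(4);
      v.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, v); // 0x10 = int32
    } else if (typeof value === "number") {
      const v = Buffer.alloc(8);
      v.writeDoubleLE(value);
      parts.push(Buffer.from([0x01]), name, v); // 0x01 = double
    } else if (typeof value === "string") {
      const s = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(s.length); // length prefix includes the trailing NUL
      parts.push(Buffer.from([0x02]), name, len, s); // 0x02 = string
    } else {
      throw new TypeError(`unsupported value for field ${key}`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1); // total size: header + body + terminator
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

// Walk the encoded bytes and return one field, skipping the others by size.
function readField(buf, wanted) {
  let pos = 4; // skip the document length header
  while (buf[pos] !== 0x00) {
    const type = buf[pos];
    const nameEnd = buf.indexOf(0x00, pos + 1);
    const name = buf.toString("utf8", pos + 1, nameEnd);
    const valueStart = nameEnd + 1;
    let size;
    if (type === 0x10) size = 4;
    else if (type === 0x01) size = 8;
    else size = 4 + buf.readInt32LE(valueStart); // string: prefix + bytes
    if (name === wanted) {
      if (type === 0x10) return buf.readInt32LE(valueStart);
      if (type === 0x01) return buf.readDoubleLE(valueStart);
      return buf.toString("utf8", valueStart + 4, valueStart + size - 1);
    }
    pos = valueStart + size; // jump over the value without decoding it
  }
  return undefined;
}

console.log(JSON.stringify({ a: 1 }).length); // 7
console.log(encodeBson({ a: 1 }).length);     // 12 (int32)
console.log(encodeBson({ a: 1.5 }).length);   // 16 (double)
console.log(readField(encodeBson({ title: "hi", a: 1, b: 1.5 }), "b")); // 1.5
```

Note how `readField` never decodes the string value of `title` on its way to `b`: it reads the 4-byte length prefix and jumps. A JSON parser has no equivalent shortcut.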
Follow-up: How does the 16 MB BSON document size limit affect schema design decisions in practice?
  • The 16 MB limit is a deliberate design constraint, not a technical limitation. MongoDB could increase it, but they chose not to because it forces better schema design. In practice, 16 MB is enormous: it can hold roughly 250,000 short log entries of about 64 bytes each, and a 10,000-word article with all its metadata uses well under 1% of it.
  • Where it actually bites you is unbounded arrays. The classic mistake is embedding comments directly in a blog post document. A viral post gets 500,000 comments, and suddenly you hit the 16 MB wall. The fix is the “bucket pattern” — store comments in separate documents, maybe 100 per bucket, and reference them from the post. Alternatively, put comments in their own collection entirely.
  • I have seen teams hit this limit with audit logs embedded in user documents. Every action the user takes gets pushed into an array. After six months of heavy usage, the document is enormous and approaching the limit. The solution is to move the audit trail to a separate collection with a TTL index for automatic cleanup.
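
The bucket pattern mentioned above can be sketched as a pure function, independent of any driver. The bucket size of 100 comes from the text; the document shape (`postId`, `bucket`, `count`, `comments`) and the function name `bucketComments` are assumptions made for this illustration.

```javascript
// Split a flat list of comments into bucket documents of at most `size` each.
function bucketComments(postId, comments, size = 100) {
  const buckets = [];
  for (let i = 0; i < comments.length; i += size) {
    const slice = comments.slice(i, i + size);
    buckets.push({
      postId,                 // reference back to the post
      bucket: buckets.length, // 0-based bucket sequence number
      count: slice.length,
      comments: slice,
    });
  }
  return buckets;
}

// 250 hypothetical comments become buckets of 100, 100, and 50.
const comments = Array.from({ length: 250 }, (_, i) => ({
  author: `user${i}`,
  text: "Nice post",
}));
const buckets = bucketComments("post-1", comments);
console.log(buckets.map((b) => b.count)); // [ 100, 100, 50 ]
```

Each bucket is its own document, so no single document grows without bound, and a page of comments can be fetched by `postId` and `bucket` number.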
Strong Answer:
  • This is one of the most dangerous misconceptions in the NoSQL world, and I would correct it firmly but constructively. “Schema-flexible” does not mean “schema-free.” Every application has an implicit schema — the shape your code expects the data to be in. The question is whether that schema is enforced by the database or by your application code.
  • When you skip schema design in MongoDB, you end up with what I call “schema drift.” Six months in, the same collection has documents where the createdAt field is a Date in some records, a string in others, and missing entirely in a third group. Your application code becomes riddled with defensive checks: if (user.createdAt && typeof user.createdAt === 'string') { ... }. That is a schema — it is just a terrible, scattered, undocumented one.
  • The right approach is to design your schema intentionally, then use MongoDB’s built-in JSON Schema validation or Mongoose schemas to enforce it. You get flexibility where you want it (different product types can have different attribute sets) and consistency where you need it (every product must have a name, price, and category).
  • I would point the developer to MongoDB’s own documentation on data modeling patterns: the Extended Reference Pattern, the Bucket Pattern, the Outlier Pattern. These exist precisely because schema design in MongoDB is a first-class engineering concern, not something you skip.
  • The real-world cost of ignoring schema design: I have seen a startup spend three weeks migrating 40 million documents because early “just throw it in there” decisions meant their aggregation pipelines could not work reliably. The migration required downtime and a custom script to normalize every document. That is the tax you pay for treating “flexible” as “unplanned.”
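
Schema drift of the kind described above is straightforward to measure once the documents are in hand. The helper below is a minimal sketch in plain JavaScript; `fieldTypeReport` is a made-up name, and the sample documents are invented to mimic the createdAt example.

```javascript
// Count how a single field is represented across a set of documents.
function fieldTypeReport(docs, field) {
  const counts = {};
  for (const doc of docs) {
    const value = doc[field];
    let kind;
    if (value === undefined) kind = "missing";
    else if (value === null) kind = "null";
    else if (value instanceof Date) kind = "date";
    else kind = typeof value;
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical users written by three generations of application code.
const users = [
  { name: "a", createdAt: new Date(0) },
  { name: "b", createdAt: "2024-01-05" },
  { name: "c" },
  { name: "d", createdAt: "05/01/2024" },
];

console.log(fieldTypeReport(users, "createdAt")); // { date: 1, string: 2, missing: 1 }
```

A report with more than one key is the drift the text warns about: every reader of that field now has to handle each representation.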
Follow-up: When would you deliberately choose NOT to enforce a strict schema in MongoDB?
  • There are legitimate cases. Event sourcing is one — when you are capturing raw events from multiple sources with genuinely different shapes (clickstream events, purchase events, page view events), forcing them into a single rigid schema creates more problems than it solves. You validate at the application layer and use the $type operator or $jsonSchema with partial validation to enforce only the common fields (timestamp, event_type, user_id).
  • Another case is during early prototyping when you are genuinely unsure what the final data model looks like. But even then, I would set a hard deadline — “we run schema-less for two sprints to explore, then we lock down the schema based on what we learned.” Open-ended schema flexibility is a liability, not an asset.
  • Third-party integrations are another good example. If you are ingesting webhook payloads from Stripe, Twilio, and Sendgrid, their payload shapes differ and evolve independently. Enforcing a schema on raw webhook storage is a maintenance burden. Store the raw payload schema-less in one collection, then transform and validate before writing to your canonical collections.
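
Partial validation for the event case above might look like the following validator document. This is a sketch under assumptions: the collection name `events` and the choice of a string type for `user_id` are illustrative, while the three required fields come from the text. The object uses MongoDB's `$jsonSchema` keywords and would be passed to `db.createCollection` in mongosh, as shown in the trailing comment.

```javascript
// Validator for a hypothetical `events` collection: only the shared envelope is enforced.
const eventValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["timestamp", "event_type", "user_id"],
    properties: {
      timestamp: { bsonType: "date" },
      event_type: { bsonType: "string" },
      user_id: { bsonType: "string" }, // assumption: ids are stored as strings
    },
    // additionalProperties is not set, so it defaults to true:
    // each event type may carry whatever extra fields it needs.
  },
};

// In mongosh this object would be supplied at collection creation, for example:
//   db.createCollection("events", { validator: eventValidator });
console.log(JSON.stringify(eventValidator, null, 2));
```

Because nothing beyond the three common fields is constrained, clickstream, purchase, and page view events can share the collection while still being queryable by time, type, and user.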
Strong Answer:
  • Sharding is the nuclear option for scaling MongoDB, and it is hard to reverse in practice: before MongoDB 8.0 added an unshardCollection command, undoing it meant dumping and reimporting the data. Before agreeing, I would run through a diagnostic checklist.
  • First: are your queries actually using indexes? Run explain("executionStats") on the slow queries. If I see COLLSCAN (collection scan) instead of IXSCAN (index scan), the problem is not data volume — it is missing indexes. I have seen teams propose sharding when a single compound index would have dropped query time from 8 seconds to 2 milliseconds.
  • Second: what is the working set size versus available RAM? MongoDB’s WiredTiger storage engine keeps frequently accessed data and indexes in a cache (by default the larger of 50% of (RAM minus 1 GB) and 256 MB). If your indexes alone exceed available RAM, queries degrade sharply because every lookup hits disk. The fix might be more RAM or better-targeted indexes, not sharding.
  • Third: what is the actual bottleneck — reads or writes? If it is reads, read replicas (secondary reads) might be sufficient. If it is writes, sharding is more justified because MongoDB’s write path goes to a single primary node in a replica set.
  • Fourth: what would the shard key be? A bad shard key is worse than no sharding at all. If you pick a monotonically increasing field like _id (ObjectId) with ranged sharding, all new writes go to the chunk that holds the highest key range, and therefore to the same shard (the “hot shard” problem). You want a shard key with high cardinality, even distribution, and alignment with your query patterns.
  • Fifth: is the operational overhead justified? Sharding requires config servers, mongos routers, and at minimum two shard replica sets. That is a significant jump in infrastructure complexity, monitoring burden, and failure modes.
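
The first two checks in the list above can be partly automated. The sketch below walks an explain() result looking for a COLLSCAN stage and computes the default WiredTiger cache size for a given amount of RAM. It assumes the classic explain shape (`queryPlanner.winningPlan` with nested `inputStage` or `inputStages`); the sample plans are hand-written and trimmed, and the function names are invented for the example.

```javascript
// Recursively collect stage names from an explain() winning plan.
function collectStages(plan, found = []) {
  if (!plan) return found;
  found.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, found);
  for (const child of plan.inputStages ?? []) collectStages(child, found);
  return found;
}

function usesCollectionScan(explainOutput) {
  return collectStages(explainOutput.queryPlanner.winningPlan).includes("COLLSCAN");
}

// Default WiredTiger cache: the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultCacheGB(ramGB) {
  return Math.max(0.5 * (ramGB - 1), 0.25);
}

// Hand-written sample plans (hypothetical, trimmed to the fields used here).
const unindexed = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN", direction: "forward" } },
};
const indexed = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "email_1" } },
  },
};

console.log(usesCollectionScan(unindexed)); // true
console.log(usesCollectionScan(indexed));   // false
console.log(defaultCacheGB(16));            // 7.5
```

On a 16 GB host the default cache is 7.5 GB, so indexes that total more than that are already competing with the data for memory, long before sharding enters the conversation.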
Follow-up: What makes a good shard key versus a bad one? Give a concrete example.
  • A good shard key has three properties: high cardinality (many distinct values), even write distribution (no hot spots), and query isolation (most queries can target a single shard rather than scatter-gathering across all shards).
  • Bad shard key example: { country: 1 } for a global e-commerce platform. If 60% of your users are in the United States, 60% of your data lands on one shard. That is the hot shard problem — one server is overwhelmed while others sit idle.
  • Good shard key example: { userId: "hashed" } for a user activity log. Hashed sharding distributes writes evenly because the hash function randomizes the distribution. The downside is that range queries on userId become scatter-gather operations, but if your primary access pattern is “get all activity for user X” (equality match), hashed sharding works well.
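  • Even better: a compound shard key like { tenantId: 1, createdAt: 1 } for a multi-tenant SaaS application. Each tenant's data sits in a contiguous key range, so tenant-scoped queries can target one shard or a few (query isolation), and the createdAt component lets a large tenant's range be split into several chunks instead of growing into one unsplittable jumbo chunk. The same key also pairs well with zone sharding if you later need to pin particular tenants to particular shards.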
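
The difference between a skewed key and a hashed key shows up clearly in a small simulation. The sketch below is illustrative only: the 1,000 records are synthetic, the 60% United States share comes from the example above, and SHA-256 modulo the shard count is a stand-in for hashing, not MongoDB's actual hash function or chunk routing.

```javascript
const crypto = require("crypto");

const SHARDS = 4;
const others = ["DE", "IN", "BR", "JP"];

// 1,000 hypothetical activity records: 60% from "US", the rest spread over four countries.
const records = Array.from({ length: 1000 }, (_, i) => ({
  userId: `user-${i}`,
  country: i % 10 < 6 ? "US" : others[i % 4],
}));

// Ranged placement on a low-cardinality key: every document with the same key
// value lands in the same chunk, so each country maps to exactly one shard.
function shardByCountry(record) {
  const countries = ["US", ...others];
  return countries.indexOf(record.country) % SHARDS;
}

// Hashed placement: hash the key, then take it modulo the shard count.
function shardByHashedUserId(record) {
  const digest = crypto.createHash("sha256").update(record.userId).digest();
  return digest.readUInt32BE(0) % SHARDS;
}

// Fraction of all records that end up on the busiest shard.
function largestShare(records, place) {
  const counts = new Array(SHARDS).fill(0);
  for (const r of records) counts[place(r)] += 1;
  return Math.max(...counts) / records.length;
}

console.log("country key, busiest shard share:", largestShare(records, shardByCountry));
console.log("hashed userId, busiest shard share:", largestShare(records, shardByHashedUserId));
```

With the country key, the busiest shard holds at least the 60% of records that belong to the United States no matter how many shards are added. With the hashed userId, the busiest shard stays close to an even quarter.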
Strong Answer:
  • The SQL-scales-vertically versus NoSQL-scales-horizontally framing is an oversimplification that was more true in 2010 than it is today, but it captures a real architectural difference in how these systems were designed.
  • MongoDB was built from the ground up for sharding. The mongos router, config servers, and chunk migration are first-class components of the architecture. Adding a new shard to a MongoDB cluster is an operational procedure, not a research project. Data rebalancing happens automatically (though it consumes I/O, so you schedule it during low-traffic windows).
  • PostgreSQL was designed as a single-node system with stellar vertical scaling. On a modern machine with 128 cores and 1 TB of RAM, PostgreSQL can handle extraordinary workloads. Vertical scaling hits its ceiling at the largest available hardware, which at the time of writing meant roughly 24 TB of RAM and 448 vCPUs on AWS (u-24tb1.metal). Beyond that, you are stuck.
  • PostgreSQL does have horizontal scaling options now: Citus (distributed PostgreSQL), read replicas, and logical replication for multi-region reads, usually with PgBouncer in front for connection pooling. But these are bolt-on solutions with their own limitations. Citus requires you to pick a distribution column upfront and has restrictions on cross-shard joins. It works, but it is not as seamless as MongoDB’s native sharding.
  • The practical ceiling for MongoDB sharding is operational complexity. At 50+ shards, chunk migration becomes a significant background load, balancer rounds take longer, and debugging query performance requires understanding which shards are involved. I have seen MongoDB clusters at 100+ shards at companies like Expedia, but they have dedicated database teams managing them.
  • My rule of thumb: if your dataset fits on a single powerful server (under 2-4 TB) and your workload is read-heavy with complex queries, PostgreSQL will outperform. If you are dealing with 10+ TB of data, high write throughput, and your queries are mostly document-scoped, MongoDB sharding is the better path.
Follow-up: What about NewSQL databases like CockroachDB or TiDB? Do they make this whole debate obsolete?
  • NewSQL is a real contender that blurs the line. CockroachDB gives you SQL semantics with automatic horizontal scaling and strong consistency via Raft consensus. TiDB is MySQL-compatible with horizontal scaling. They are genuinely impressive engineering.
  • The trade-off is latency. Distributed consensus means every write requires coordination across nodes. In CockroachDB, a single-row write in a 3-node cluster has a floor latency of about 2x the network round-trip time between nodes. For a single-region deployment, this is often sub-millisecond and barely noticeable. For multi-region, you are looking at 50-200ms per write unless you use stale reads or locality-aware placement.
  • NewSQL also has a maturity gap. PostgreSQL has 30+ years of battle-testing, edge-case fixes, and ecosystem tools. MongoDB has 15+ years. CockroachDB has about 8. For a brand-new startup, I might choose CockroachDB. For a financial institution migrating off Oracle, I would still lean PostgreSQL. The risk calculus is different.