Module 2: Database Design & Modeling
Great database design is the foundation of every successful application. A well-designed schema makes queries fast, code simple, and scaling possible. A poorly designed one creates endless headaches.Estimated Time: 10-12 hours
Hands-On: Schema design exercises + E-commerce project
Key Skill: Translating business requirements into efficient schemas
Hands-On: Schema design exercises + E-commerce project
Key Skill: Translating business requirements into efficient schemas
Learning Objectives
By the end of this module, you will be able to:- Design ER models systematically: Apply the 4-step ER modeling process to translate business requirements into entities, attributes, and relationships.
- Apply normalization principles: Transform schemas to 1NF, 2NF, and 3NF to eliminate redundancy and maintain data integrity.
- Choose appropriate relationships: Implement one-to-one, one-to-many, and many-to-many relationships with proper foreign key constraints.
- Define and implement constraints: Use primary keys, foreign keys, unique constraints, and check constraints to enforce data integrity.
- Apply advanced schema patterns: Implement soft deletes, audit trails, polymorphic associations, and hierarchical data structures.
- Design flexible schemas: Use JSONB effectively for semi-structured data while maintaining performance.
- Manage schema evolution: Execute safe migrations that preserve data and maintain backward compatibility.
Module-Specific Learning Outcomes
Section 2.1 - Entity-Relationship (ER) Modeling
- Understand the purpose of ER modeling as a blueprint for database design
- Identify entities, attributes, and relationships from business requirements
- Apply the 4-step ER modeling process: Identify Entities → Define Attributes → Establish Relationships → Determine Cardinality
- Distinguish between one-to-one, one-to-many, and many-to-many relationships
- Represent ER models visually and translate them to SQL table structures
- Apply cardinality rules to determine proper foreign key placement
Section 2.2 - Normalization Principles
- Explain the problems caused by data redundancy (update anomalies, storage waste)
- Transform unnormalized data to First Normal Form (1NF) by eliminating repeating groups
- Apply Second Normal Form (2NF) by removing partial dependencies on composite keys
- Achieve Third Normal Form (3NF) by eliminating transitive dependencies
- Recognize when denormalization is appropriate for performance optimization
- Balance normalization benefits against query complexity considerations
Section 2.3 - Keys and Constraints
- Implement primary keys using appropriate data types (SERIAL, UUID, composite)
- Design foreign key relationships with proper referential actions (CASCADE, RESTRICT, SET NULL)
- Apply unique constraints at column and table levels
- Use check constraints to enforce business rules at the database level
- Implement not-null constraints and default values strategically
- Design composite keys for junction tables and special use cases
Section 2.4 - Schema Design Patterns
- Implement soft deletes using deletion timestamps instead of permanent removal
- Design audit trails to track changes to critical data over time
- Apply polymorphic associations when multiple entity types relate to a common entity
- Create self-referential relationships for hierarchical data (categories, org charts)
- Design tagging systems with many-to-many relationships and junction tables
- Structure data for flexible querying while maintaining referential integrity
Section 2.5 - JSONB for Flexible Data
- Determine appropriate use cases for JSONB versus traditional normalized columns
- Store and query nested JSON data using PostgreSQL’s JSONB operators
- Create indexes on JSONB data for efficient querying
- Implement hybrid schemas combining normalized columns with JSONB attributes
- Balance schema flexibility against query performance and data validation
Section 2.6 - Schema Migrations
- Plan backward-compatible schema changes for production environments
- Execute safe migration patterns for adding, modifying, and removing columns
- Create indexes concurrently to avoid table locking
- Implement multi-step migrations for breaking changes
- Use version control for tracking and managing schema evolution
Section 2.7 - Physical Design Considerations
- Understand the differences between row-oriented and column-oriented storage
- Apply partitioning strategies (horizontal and vertical) for large datasets
- Consider the impact of data types and NULL values on storage efficiency
- Design schemas with scalability and performance in mind
Hands-On Lab: Blog Schema Implementation
Goal: Apply ER modeling and normalization principles to implement a complete blog schema with proper constraints.
When you can design and implement a normalized schema with proper constraints that meets all requirements, you’ve mastered database design fundamentals.
- Design an ER model for a blog system with the following requirements:
- Users can create posts
- Users can comment on posts
- Posts can have multiple tags
- Posts can be categorized hierarchically (Technology > Programming > SQL)
- Users can have profiles with optional information
- Posts can have publication dates and status (draft, published, archived)
- Normalize your schema to 3NF, explaining how each normal form is satisfied
- Implement the schema with appropriate constraints:
- Primary keys with suitable data types
- Foreign keys with proper referential actions
- Unique constraints where appropriate
- Check constraints for data validation
- Indexes for common query patterns
- Add soft delete capability to the comments table
- Implement a tagging system using a junction table
- Add JSONB columns for flexible post metadata and user preferences
- Write sample queries that demonstrate the relationships between entities
2.1 Entity-Relationship (ER) Modeling: Blueprinting Your Data
What is ER Modeling? (The Big Picture)
Before you write a single line of SQL code or create any tables, you need to understand and plan your data. This is where Entity-Relationship (ER) Modeling comes in. Real-World Analogy: Architectural Blueprint Think of ER modeling as creating a blueprint for your database, just like an architect creates blueprints for a building. You wouldn’t start building a house by just laying bricks without a plan, right? The same goes for databases!- Architectural Blueprint: Shows rooms, connections, sizes, purpose
- ER Diagram: Shows entities (tables), their attributes (columns), and how they relate
- Visualize your data: See how everything fits together.
- Communicate ideas: Discuss the database structure with teammates, stakeholders, or users.
- Identify relationships: Understand how different pieces of information are connected.
- Prevent issues: Catch design flaws early, before writing code.
The Core Components of an ER Model
An ER model consists of three main components:- Entities: The “things” or objects you want to store data about (e.g., a user, a product, an order).
- Attributes: The properties or characteristics of an entity (e.g., a user has a name, email, and age).
- Relationships: How entities are connected to each other (e.g., a user places an order).
The ER Modeling Process
The ER Modeling Process: A Step-by-Step Guide
Building an ER model is a systematic process. Let’s walk through the steps with an example. Scenario: Designing a simple Blog Database We need to store information about blog posts, their authors, and comments.Step 1: Identify Entities (The Nouns)
What are entities? These are the main “things” or objects in your system that you want to store data about. Think of them as the nouns in your business requirements.How to find them? Look for key objects, concepts, or roles in your application.For our Blog:
- A User (author of posts, commenter)
- A Post (the blog article itself)
- A Comment (feedback on a post)
Step 2: Define Attributes (The Properties)
What are attributes? These are the properties, characteristics, or pieces of information that describe each entity. Think of them as the adjectives or details of your nouns.How to find them? For each entity, ask: “What information do I need to store about this?”For our Blog:
- User:
id,name,email,password_hash,created_at - Post:
id,title,content,published_date,user_id(who wrote it) - Comment:
id,text,created_at,user_id(who commented),post_id(which post it’s on)
user_id and post_id might seem like relationships, but for now, list them as attributes, and we’ll formalize the relationships in the next step.Step 3: Establish Relationships (How Entities Connect)
What are relationships? These describe how entities interact or are associated with each other. Think of them as the verbs that connect your nouns.How to find them? For any two entities, ask: “How do they relate?”For our Blog:
- A
UserwritesPosts - A
UsermakesComments - A
PostreceivesComments
Step 4: Determine Cardinality (How Many?)
What is cardinality? This specifies the number of instances of one entity that can be associated with the number of instances of another entity.How to think about it? For each relationship, ask two questions:
- “How many X can relate to one Y?”
- “How many Y can relate to one X?”
- One-to-One (1:1): Each instance of Entity A is associated with exactly one instance of Entity B, and vice-versa.
- Example:
Userhas oneUserProfile(and aUserProfilebelongs to oneUser)
- Example:
- One-to-Many (1:N): Each instance of Entity A is associated with one or more instances of Entity B, but each instance of Entity B is associated with only one instance of Entity A.
- Example:
Userwrites manyPosts(but aPostis written by only oneUser)
- Example:
- Many-to-Many (M:N): Each instance of Entity A can be associated with many instances of Entity B, and vice-versa.
- Example:
Postscan have manyTags, andTagscan be on manyPosts - Note: Many-to-many relationships often require an extra “junction” or “bridge” table in the database.
- Example:
UserwritesPosts: One-to-Many (1:N). One user can write many posts, but each post has only one author.UsermakesComments: One-to-Many (1:N). One user can make many comments, but each comment has only one author.PostreceivesComments: One-to-Many (1:N). One post can receive many comments, but each comment belongs to only one post.
Relationship Types: Detailed Examples
Let’s dive deeper into each cardinality type with more detailed explanations and SQL examples.One-to-One (1:1) Relationships
Concept: Each instance of Entity A is associated with exactly one instance of Entity B, and each instance of Entity B is associated with exactly one instance of Entity A. When to use: Often used to separate large tables into smaller, more manageable ones, or to store sensitive/optional information in a separate table. Real-World Analogy: A person has exactly one passport (and a passport belongs to exactly one person). Example: Users and User Profiles- A
Userhas basic login info (email, password). - A
UserProfilehas extended details (bio, avatar, preferences).
users.idis the primary key for theuserstable.user_profiles.user_idis both a primary key and a foreign key, ensuring a 1:1 relationship.ON DELETE CASCADE: If a user is deleted, their profile is automatically deleted too.
One-to-Many (1:N) Relationships
Concept: Each instance of Entity A can be associated with one or more instances of Entity B, but each instance of Entity B is associated with only one instance of Entity A. When to use: This is the most common type of relationship in relational databases. Real-World Analogy: A customer can place many orders, but each order is placed by only one customer. Example: Authors and Posts- An
Authorcan write manyPosts. - A
Postis written by only oneAuthor.
posts.author_idis a foreign key referencingauthors.id.NOT NULL: A post must have an author.ON DELETE RESTRICT: Prevents deleting an author if they still have posts (you’d have to reassign posts or delete them first).
Many-to-Many (M:N) Relationships
Concept: Each instance of Entity A can be associated with many instances of Entity B, and each instance of Entity B can be associated with many instances of Entity A. When to use: When two entities have a bidirectional “many” relationship. Real-World Analogy: Students can enroll in many courses, and courses can have many students. Example: Posts and Tags- A
Postcan have multipleTags(e.g., ‘SQL’, ‘PostgreSQL’, ‘Performance’). - A
Tagcan be applied to multiplePosts.
post_tagslinkspostsandtags.post_idandtag_idtogether form a composite primary key, meaning a specific post can only have a specific tag once.ON DELETE CASCADEfor both foreign keys ensures that if a post or tag is deleted, the corresponding entry inpost_tagsis also removed.
2.3 E-Commerce ER Diagram: From Concept to Reality
Now that you understand entities, attributes, relationships, and normalization, let’s apply these concepts to a real-world example: an E-Commerce Platform. The following diagram represents a well-designed schema for a typical e-commerce application. It includes users, products, orders, reviews, and more. This diagram is not just a picture; it’s a visual contract for how your data is structured.How to Read and Use This ER Diagram:
- Entities (Tables): Each box represents a table in your database (e.g.,
users,products,orders). The table name is at the top. - Attributes (Columns): Inside each box, you’ll find the columns for that table. Key columns are highlighted:
id (PK): Primary Key. Uniquely identifies each row in the table._id (FK): Foreign Key. Links to the primary key of another table, establishing a relationship.
- Relationships (Lines): The lines connecting tables show how they relate:
- Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
- A single line (
|) means “one”. - A crow’s foot (
<) means “many”. - So,
|---<means “one-to-many”.
- A single line (
- The line connecting
ordersandorder_itemsshows that one order can have many order items (1:N).
- Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
- Special Notations:
Self-reference: A foreign key that refers to the primary key within the same table (e.g.,categoriesfor hierarchical categories).JSONB: Indicates a column storing flexible JSON data, useful for varying attributes.
2.4 Keys and Constraints
Constraints enforce data integrity at the database level.Primary Keys
Foreign Keys
Unique Constraints
Check Constraints
Not Null Constraints
2.5 Schema Design Patterns
What is Normalization? (The “Why”)
Concept: Normalization is a process of organizing the columns and tables of a relational database to minimize data redundancy (duplicate data) and improve data integrity (accuracy and consistency). Real-World Analogy: Organizing a Closet Imagine your closet is a messy database:- Unnormalized closet: Shirts, pants, socks, and accessories are all mixed in one big pile. You have 5 identical blue shirts because you can’t find the one you own. (Lots of redundancy!)
- Normalized closet: You have drawers for socks, hangers for shirts, shelves for pants. Each item has its designated place, and you know exactly where everything is. You only own one blue shirt. (No redundancy, easy to find and manage!)
- Storage Waste: Storing the same information multiple times takes up unnecessary space.
- Update Anomalies: If you need to change redundant data, you must update it in multiple places. If you miss one, your data becomes inconsistent.
- Example: Changing a customer’s address in one place but not another.
- Deletion Anomalies: Deleting one piece of data might unintentionally delete other, unrelated data.
- Example: Deleting a product might also delete the only record of its supplier.
- Insertion Anomalies: You might not be able to add new data until other, unrelated data is available.
- Example: Can’t add a new supplier until they provide a product.
Normal Forms: Rules for a Clean Database
Normalization is guided by a series of “normal forms” (rules). We’ll focus on the first three, which are most commonly applied.First Normal Form (1NF): No Repeating Groups!
Rule: A table is in 1NF if it contains no repeating groups or multi-valued attributes. Each column must hold only one value, and each row must be unique. Real-World Analogy: Your shopping list should have one item per line, not “apples, bananas, oranges” on one line. Problem (Not in 1NF): Imagine anorders table where you list multiple items in one column:
products_orderedcolumn contains multiple values.- How would you easily search for all orders that contain a “Mouse”?
Second Normal Form (2NF): Full Dependency on Primary Key!
Rule: A table is in 2NF if it is in 1NF AND all non-key attributes are fully dependent on the primary key. This means no non-key attribute can depend on only part of a composite primary key. Real-World Analogy: If you have a two-part ID (like student_ID + course_ID), information about just the student shouldn’t be stored in that table, only information about that specific student in that specific course. Problem (Not in 2NF): Consider anenrollments table with a composite primary key (student_id, course_id):
- Primary Key:
(student_id, course_id) student_nameonly depends onstudent_id(part of the primary key), notcourse_id.course_nameonly depends oncourse_id(part of the primary key), notstudent_id.
Third Normal Form (3NF): No Transitive Dependencies!
Rule: A table is in 3NF if it is in 2NF AND all non-key attributes are non-transitively dependent on the primary key. This means there are no non-key attributes that depend on other non-key attributes. Real-World Analogy: If you have a list of employees, and each employee has a department name, you shouldn’t also store the department’s phone number in the employee list. The department phone number depends on the department name, not directly on the employee. Problem (Not in 3NF): Consider anemployees table:
- Primary Key:
employee_id department_managerdepends ondepartment_name.department_nameis a non-key attribute.- Therefore,
department_manageris transitively dependent onemployee_id(viadepartment_name).
department_manager is repeated.
Solution (3NF): Move the transitively dependent attributes to a new table.
Benefits of Normalization (Why all these rules?)
- Reduced Data Redundancy: Data is stored once, saving space.
- Improved Data Integrity: Changes only need to be made in one place, reducing errors.
- Easier Maintenance: Less data duplication makes updates and deletions safer.
- Better Querying: Although sometimes requiring more JOINs, a normalized schema is logically clearer and often more flexible for complex queries.
When to Consider Denormalization: Breaking the Rules for Performance
Concept: Denormalization is the process of intentionally adding redundant data to a database, often by combining tables or duplicating columns. Why break the rules? While normalization is generally good, sometimes it can lead to very complex queries with many JOINs, which can be slow for highly read-intensive applications (like reporting or analytics dashboards). Real-World Analogy: Having a small, quick-reference notebook with frequently used information copied from your perfectly organized main files. It’s redundant, but it’s faster for quick lookups. When to Denormalize (Use Cases):- Read Performance is Critical: When read queries (SELECTs) are much more frequent than write queries (INSERTs, UPDATEs, DELETEs), and normalization causes unacceptable slowdowns.
- Data Rarely Changes: If the redundant data changes infrequently, the risk of update anomalies is low.
- Complex JOINs Hurt Performance: If a query requires joining many tables to get frequently accessed data, denormalizing can flatten the data and reduce JOINs.
- Reporting and Analytics (OLAP Systems): Data warehouses often use denormalized schemas (star schemas, snowflake schemas) to optimize for analytical queries.
- You know all query patterns: If your application has very specific, predictable query patterns, you can optimize the schema for those exact queries.
- Duplicating Columns: Copying a column from one table into another related table.
- Example: Storing
customer_namedirectly in theorderstable to avoid joiningusersfor every order display.
- Example: Storing
- Pre-joining Tables: Storing the results of a join in a new, single table.
- Materialized Views: Storing pre-calculated aggregates or joined data that can be refreshed periodically.
- Increased Data Redundancy: More disk space used.
- Higher Risk of Data Inconsistency: If redundant data isn’t updated everywhere, you can have conflicting information.
- More Complex Write Operations: Updates now need to modify data in multiple places, potentially increasing application logic complexity.
- Less Flexible: Optimized for specific queries; new query patterns might not be efficient.
Practical Approach to Normalization
- Start with Normalization: Always begin by designing your database to at least 3NF. This is the safe and robust default.
- Measure Performance: If you encounter slow queries in production, use tools like
EXPLAIN ANALYZEto pinpoint bottlenecks. - Identify Bottlenecks: If slow performance is due to too many JOINs on frequently accessed data, then consider denormalization.
- Denormalize Strategically: Apply denormalization only where strictly necessary, with clear justification, and for specific, identified performance problems.
- Maintain Source of Truth: Even when denormalizing, always keep your fully normalized tables as the single source of truth to ensure data integrity. Denormalized tables can be generated views or copies.
2.3 E-Commerce ER Diagram: From Concept to Reality
Now that you understand entities, attributes, relationships, and normalization, let’s apply these concepts to a real-world example: an E-Commerce Platform. The following diagram represents a well-designed schema for a typical e-commerce application. It includes users, products, orders, reviews, and more. This diagram is not just a picture; it’s a visual contract for how your data is structured.How to Read and Use This ER Diagram:
- Entities (Tables): Each box represents a table in your database (e.g.,
users,products,orders). The table name is at the top. - Attributes (Columns): Inside each box, you’ll find the columns for that table. Key columns are highlighted:
id (PK): Primary Key. Uniquely identifies each row in the table._id (FK): Foreign Key. Links to the primary key of another table, establishing a relationship.
- Relationships (Lines): The lines connecting tables show how they relate:
- Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
- A single line (
|) means “one”. - A crow’s foot (
<) means “many”. - So,
|---<means “one-to-many”.
- A single line (
- The line connecting
ordersandorder_itemsshows that one order can have many order items (1:N).
- Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
- Special Notations:
Self-reference: A foreign key that refers to the primary key within the same table (e.g.,categoriesfor hierarchical categories).JSONB: Indicates a column storing flexible JSON data, useful for varying attributes.
2.4 Keys and Constraints
Constraints enforce data integrity at the database level.Primary Keys
Foreign Keys
Unique Constraints
Check Constraints
Not Null Constraints
2.4 Schema Design Patterns
Common patterns for real-world applications.Soft Deletes
Keep deleted records for auditing/recovery instead of hard deleting.Audit Trail / History Tables
Track all changes to important data.Polymorphic Associations
When multiple tables can be “commentable” or “taggable”.Self-Referential Relationships
For hierarchies like categories, org charts, or threaded comments.Common Design Anti-Patterns: What NOT to Do
Understanding anti-patterns is just as important as knowing best practices. These are common mistakes that lead to poor performance, maintenance nightmares, and data integrity issues.Anti-Pattern #1: Repeating Groups and Arrays in Columns
The Problem: Storing multiple values in a single column, either as delimited strings or arrays. Bad Example:- Impossible to enforce referential integrity
- Difficult to query efficiently
- Violates First Normal Form (1NF)
- Cannot create proper indexes on individual values
- Updating a single value requires manipulating the entire string/array
Anti-Pattern #2: Generic/Polymorphic Tables Gone Wrong
The Problem: Creating overly generic tables that try to store different entity types in a single table. Bad Example:- No data integrity constraints possible
- Impossible to create proper indexes
- Very difficult to maintain and query
- Violates the principle of single responsibility
- Poor performance for any non-trivial query
Anti-Pattern #3: The Swiss Army Knife Table
The Problem: Creating one massive table with dozens of columns, many of which are NULL for any given row. Bad Example:- Most columns are NULL for any given record
- Impossible to create meaningful constraints
- Terrible performance due to wide rows
- Difficult to maintain and understand
- Violates the principle of having focused, purposeful tables
Anti-Pattern #4: Ignoring Proper Keys and Constraints
The Problem: Creating tables without primary keys, foreign keys, or any data integrity constraints. Bad Example:- Data integrity violations are inevitable
- Impossible to maintain referential integrity
- Leads to garbage data and unpredictable behavior
- Makes debugging and maintenance extremely difficult
- Creates cascading problems throughout the application
Anti-Pattern #5: Over-normalization (Premature Optimization)
The Problem: Taking normalization too far, creating excessive complexity for marginal gains. Bad Example:- Excessive joins hurt performance
- Complex queries are harder to write and maintain
- Diminishing returns on normalization benefits
- May actually hurt data integrity in some cases
- Premature optimization at the expense of readability
Anti-Pattern #6: Improper Handling of Hierarchical Data
The Problem: Poor approaches to storing tree-like data structures. Bad Example:- Path-based: Difficult to maintain, prone to errors during renames
- Adjacency list: Can create infinite loops, no protection against cycles
- Both: Hard to query for ancestors/descendants efficiently
- Both: No enforcement of tree structure integrity
Key Takeaways for Avoiding Anti-Patterns
- Start with normalization (1NF, 2NF, 3NF) and only denormalize when there’s a clear performance need
- Always use constraints to enforce data integrity at the database level
- Design tables with single responsibilities - don’t try to store everything in one place
- Consider query patterns when designing your schema
- Avoid premature optimization - don’t over-normalize for theoretical benefits
- Plan for growth - design schemas that can accommodate future requirements
- Document your design decisions - explain why you chose certain approaches
2.5 JSONB for Flexible Data
When you need schema flexibility within a relational database.When to Use JSONB
Good Use Cases
- User preferences/settings
- Product attributes (vary by category)
- API response caching
- Metadata and tags
- Form submissions
Avoid JSONB For
- Core business data you query often
- Data with relationships
- Data you need to aggregate
- Anything that should be normalized
JSONB Operations
Hybrid Schema Pattern
Combine normalized columns with JSONB for flexibility.2.6 Schema Migrations
Evolve your schema safely over time.Migration Best Practices
Version Control Migrations
Version Control Migrations
Every schema change should be a versioned migration file.Each migration has UP (apply) and DOWN (rollback) scripts.
Backward Compatible Changes
Backward Compatible Changes
When deploying, old and new code may run simultaneously. Safe changes:✅ Safe (backward compatible):
- Add new table
- Add new nullable column
- Add new column with default
- Add new index
- Rename table/column (with alias/view)
- Drop table/column
- Rename without alias
- Change column type
- Add NOT NULL to existing column
Multi-Step Migrations
Multi-Step Migrations
For breaking changes, use multiple deployments:Renaming a column:
- Deploy: Add new column, backfill data
- Deploy: Update code to write both columns
- Deploy: Update code to read new column
- Deploy: Drop old column
- Deploy: Add column as nullable
- Backfill data (all rows)
- Deploy: Add NOT NULL constraint
Safe Migration Examples
2.7 Practice: E-Commerce Schema Design
Challenge
Design a complete database schema for an e-commerce platform with:- Users (customers and admins)
- Products with categories (hierarchical)
- Product variants (size, color)
- Shopping cart
- Orders with items
- Reviews and ratings
- Wishlist
- Coupons/discounts
Requirements
- Users can have multiple addresses
- Products belong to categories (nested up to 3 levels)
- Each product can have variants with different prices and stock
- Orders track status history
- Reviews are moderated (pending, approved, rejected)
- Coupons can be percentage or fixed amount
Solution Schema
Solution Schema
2.6 Physical Design and Storage Layout
Physical design bridges the gap between logical models (ERDs) and how bytes are actually laid out on disk. This significantly impacts performance, especially for high-throughput systems.Row vs. Columnar Storage
How a database stores data impacts its suitability for different workloads:- Row-Oriented (OLTP):
- Storage: Stores all columns of a single row together in a block.
- Optimized for: Finding specific records, inserts, and updates (e.g., PostgreSQL, MySQL).
- Use Case: E-commerce checkouts, user profiles.
- Column-Oriented (OLAP):
- Storage: Stores all values of a single column together in a block.
- Optimized for: Aggregating millions of rows on a few columns (e.g., BigQuery, Snowflake, ClickHouse).
- Use Case: Analytics, reporting, trend analysis.
The Problem of NULLs and Data Types
In physical storage, every byte counts. Choosing the wrong data type can lead to “bloat”:- Alignment Padding: Databases often align data to 8-byte boundaries. If you alternate
BIGINT(8 bytes) andBOOLEAN(1 byte), the DB might add 7 bytes of padding after every boolean. - Fixed vs. Variable Width:
CHAR(100)always takes 100 bytes, even if the string is 2 chars.VARCHAR(100)only takes the bytes needed + a small header.
CREATE TABLE from largest to smallest to minimize alignment padding.
Partitioning and Sharding
As data grows, a single table becomes too large to manage efficiently.- Vertical Partitioning: Splitting a table by columns. (e.g., Moving large
textblobs to a separatedetailstable). - Horizontal Partitioning (Sharding): Splitting a table by rows across multiple nodes or physical files.
- Range Partitioning: Based on a range (e.g.,
orders_2023,orders_2024). - List Partitioning: Based on discrete values (e.g.,
users_us,users_eu). - Hash Partitioning: Distributing data evenly using a hash of a key (e.g.,
hash(user_id) % 4).
- Range Partitioning: Based on a range (e.g.,
2.7 Advanced Normalization and Denormalization Strategies
In high-performance systems, the trade-off between 3NF and performance is a primary design decision.The True Cost of 3NF
While 3NF prevents anomalies, it forces Join Latency. In distributed databases, joins across nodes can be prohibitively expensive (the “Network Hop” problem).Scalable Denormalization Patterns
- Pre-aggregated Fields (Counter Pattern):
- Instead of
SELECT COUNT(*) FROM comments WHERE post_id = ?, store acomments_counton thepoststable. - Trade-off: You must use transactions or triggers to keep the count in sync.
- Instead of
- Read-Model Materialization:
- Keep your source of truth in 3NF.
- Use Change Data Capture (CDC) to stream updates into a denormalized “Read Model” (e.g., an Elasticsearch index or a flat Postgres table) optimized for specific UI screens.
- Attribute Value Bundling:
- If you always fetch a user’s
city,state, andcountrytogether, you might bundle them into alocationJSONB column or a flattened table, even if it technically violates the strict atomicity of 1NF.
- If you always fetch a user’s
Normalization vs. Performance Checklist
| Factor | Favor Normalization (3NF+) | Favor Denormalization |
|---|---|---|
| Workload | Write-heavy / Transactional | Read-heavy / Analytical |
| Consistency | Strict (ACID) | Eventual/Managed |
| Complexity | Low data volume | High volume / High concurrency |
| Joins | Fast / Single-node | Slow / Distributed |
Module Mastery Checklist
Complete this checklist to confirm you’ve mastered Database Design & Modeling:Entity-Relationship (ER) Modeling
- Can identify entities, attributes, and relationships from business requirements
- Apply the 4-step ER modeling process systematically
- Distinguish between one-to-one, one-to-many, and many-to-many relationships
- Determine proper cardinality for each relationship
- Translate ER models into SQL table structures
- Create visual ER diagrams for complex systems
Normalization Principles
- Transform unnormalized data to First Normal Form (1NF)
- Apply Second Normal Form (2NF) by eliminating partial dependencies
- Achieve Third Normal Form (3NF) by removing transitive dependencies
- Recognize when denormalization is appropriate for performance
- Balance normalization benefits against query complexity
- Identify and fix common normalization violations
Keys and Constraints
- Implement appropriate primary key strategies (SERIAL, UUID, composite)
- Design foreign key relationships with proper referential actions
- Apply unique constraints at column and table levels
- Use check constraints to enforce business rules
- Implement not-null constraints and default values strategically
- Create proper indexes for common query patterns
Schema Design Patterns
- Implement soft deletes using timestamp-based approaches
- Design audit trails to track data changes over time
- Apply polymorphic associations when appropriate
- Create self-referential relationships for hierarchical data
- Design efficient tagging systems with junction tables
- Structure data with referential integrity in mind
JSONB and Flexible Schemas
- Determine appropriate use cases for JSONB vs. normalized columns
- Query nested JSON data using PostgreSQL’s JSONB operators
- Create effective indexes on JSONB data
- Implement hybrid schemas combining normalized and flexible data
- Balance schema flexibility against performance considerations
Schema Migrations
- Plan backward-compatible schema changes for production
- Execute safe migration patterns for structural changes
- Create indexes without blocking table access
- Implement multi-step migrations for breaking changes
- Use version control for tracking schema evolution
Anti-Pattern Recognition
- Identify and avoid repeating groups in columns
- Prevent over-generic table designs
- Avoid the “Swiss Army Knife” table anti-pattern
- Implement proper constraints for data integrity
- Balance normalization with practical considerations
- Design hierarchical data structures properly
Practical Application
- Complete the hands-on blog schema implementation lab
- Design a normalized schema for a new business domain
- Identify and fix anti-patterns in existing schemas
- Apply appropriate constraints to maintain data quality
- Design schemas considering both current and future requirements
Next Module
Module 3: Transactions & ACID
Build reliable applications with proper transaction handling