Skip to main content
Database Design & Modeling - ER diagrams, normalization, and schema patterns

Module 2: Database Design & Modeling

Great database design is the foundation of every successful application. A well-designed schema makes queries fast, code simple, and scaling possible. A poorly designed one creates endless headaches.
Estimated Time: 10-12 hours
Hands-On: Schema design exercises + E-commerce project
Key Skill: Translating business requirements into efficient schemas

Learning Objectives

By the end of this module, you will be able to:
  • Design ER models systematically: Apply the 4-step ER modeling process to translate business requirements into entities, attributes, and relationships.
  • Apply normalization principles: Transform schemas to 1NF, 2NF, and 3NF to eliminate redundancy and maintain data integrity.
  • Choose appropriate relationships: Implement one-to-one, one-to-many, and many-to-many relationships with proper foreign key constraints.
  • Define and implement constraints: Use primary keys, foreign keys, unique constraints, and check constraints to enforce data integrity.
  • Apply advanced schema patterns: Implement soft deletes, audit trails, polymorphic associations, and hierarchical data structures.
  • Design flexible schemas: Use JSONB effectively for semi-structured data while maintaining performance.
  • Manage schema evolution: Execute safe migrations that preserve data and maintain backward compatibility.

Module-Specific Learning Outcomes

Section 2.1 - Entity-Relationship (ER) Modeling

  • Understand the purpose of ER modeling as a blueprint for database design
  • Identify entities, attributes, and relationships from business requirements
  • Apply the 4-step ER modeling process: Identify Entities → Define Attributes → Establish Relationships → Determine Cardinality
  • Distinguish between one-to-one, one-to-many, and many-to-many relationships
  • Represent ER models visually and translate them to SQL table structures
  • Apply cardinality rules to determine proper foreign key placement

Section 2.2 - Normalization Principles

  • Explain the problems caused by data redundancy (update anomalies, storage waste)
  • Transform unnormalized data to First Normal Form (1NF) by eliminating repeating groups
  • Apply Second Normal Form (2NF) by removing partial dependencies on composite keys
  • Achieve Third Normal Form (3NF) by eliminating transitive dependencies
  • Recognize when denormalization is appropriate for performance optimization
  • Balance normalization benefits against query complexity considerations

Section 2.3 - Keys and Constraints

  • Implement primary keys using appropriate data types (SERIAL, UUID, composite)
  • Design foreign key relationships with proper referential actions (CASCADE, RESTRICT, SET NULL)
  • Apply unique constraints at column and table levels
  • Use check constraints to enforce business rules at the database level
  • Implement not-null constraints and default values strategically
  • Design composite keys for junction tables and special use cases

Section 2.4 - Schema Design Patterns

  • Implement soft deletes using deletion timestamps instead of permanent removal
  • Design audit trails to track changes to critical data over time
  • Apply polymorphic associations when multiple entity types relate to a common entity
  • Create self-referential relationships for hierarchical data (categories, org charts)
  • Design tagging systems with many-to-many relationships and junction tables
  • Structure data for flexible querying while maintaining referential integrity

Section 2.5 - JSONB for Flexible Data

  • Determine appropriate use cases for JSONB versus traditional normalized columns
  • Store and query nested JSON data using PostgreSQL’s JSONB operators
  • Create indexes on JSONB data for efficient querying
  • Implement hybrid schemas combining normalized columns with JSONB attributes
  • Balance schema flexibility against query performance and data validation

Section 2.6 - Schema Migrations

  • Plan backward-compatible schema changes for production environments
  • Execute safe migration patterns for adding, modifying, and removing columns
  • Create indexes concurrently to avoid table locking
  • Implement multi-step migrations for breaking changes
  • Use version control for tracking and managing schema evolution

Section 2.7 - Physical Design Considerations

  • Understand the differences between row-oriented and column-oriented storage
  • Apply partitioning strategies (horizontal and vertical) for large datasets
  • Consider the impact of data types and NULL values on storage efficiency
  • Design schemas with scalability and performance in mind

Hands-On Lab: Blog Schema Implementation

Goal: Apply ER modeling and normalization principles to implement a complete blog schema with proper constraints.
  1. Design an ER model for a blog system with the following requirements:
    • Users can create posts
    • Users can comment on posts
    • Posts can have multiple tags
    • Posts can be categorized hierarchically (Technology > Programming > SQL)
    • Users can have profiles with optional information
    • Posts can have publication dates and status (draft, published, archived)
  2. Normalize your schema to 3NF, explaining how each normal form is satisfied
  3. Implement the schema with appropriate constraints:
    • Primary keys with suitable data types
    • Foreign keys with proper referential actions
    • Unique constraints where appropriate
    • Check constraints for data validation
    • Indexes for common query patterns
  4. Add soft delete capability to the comments table
  5. Implement a tagging system using a junction table
  6. Add JSONB columns for flexible post metadata and user preferences
  7. Write sample queries that demonstrate the relationships between entities
When you can design and implement a normalized schema with proper constraints that meets all requirements, you’ve mastered database design fundamentals.

2.1 Entity-Relationship (ER) Modeling: Blueprinting Your Data

What is ER Modeling? (The Big Picture)

Before you write a single line of SQL code or create any tables, you need to understand and plan your data. This is where Entity-Relationship (ER) Modeling comes in. Real-World Analogy: Architectural Blueprint Think of ER modeling as creating a blueprint for your database, just like an architect creates blueprints for a building. You wouldn’t start building a house by just laying bricks without a plan, right? The same goes for databases!
  • Architectural Blueprint: Shows rooms, connections, sizes, purpose
  • ER Diagram: Shows entities (tables), their attributes (columns), and how they relate
An ER model helps you:
  1. Visualize your data: See how everything fits together.
  2. Communicate ideas: Discuss the database structure with teammates, stakeholders, or users.
  3. Identify relationships: Understand how different pieces of information are connected.
  4. Prevent issues: Catch design flaws early, before writing code.

The Core Components of an ER Model

An ER model consists of three main components:
  1. Entities: The “things” or objects you want to store data about (e.g., a user, a product, an order).
  2. Attributes: The properties or characteristics of an entity (e.g., a user has a name, email, and age).
  3. Relationships: How entities are connected to each other (e.g., a user places an order).
Before writing any SQL, understanding your data through ER modeling is crucial.

The ER Modeling Process

The ER Modeling Process: A Step-by-Step Guide

Building an ER model is a systematic process. Let’s walk through the steps with an example. Scenario: Designing a simple Blog Database We need to store information about blog posts, their authors, and comments.
1

Step 1: Identify Entities (The Nouns)

What are entities? These are the main “things” or objects in your system that you want to store data about. Think of them as the nouns in your business requirements.How to find them? Look for key objects, concepts, or roles in your application.For our Blog:
  • A User (author of posts, commenter)
  • A Post (the blog article itself)
  • A Comment (feedback on a post)
Each entity will eventually become a table in your database.
2

Step 2: Define Attributes (The Properties)

What are attributes? These are the properties, characteristics, or pieces of information that describe each entity. Think of them as the adjectives or details of your nouns.How to find them? For each entity, ask: “What information do I need to store about this?”For our Blog:
  • User: id, name, email, password_hash, created_at
  • Post: id, title, content, published_date, user_id (who wrote it)
  • Comment: id, text, created_at, user_id (who commented), post_id (which post it’s on)
Each attribute will eventually become a column in your table.Self-Correction Tip: Attributes like user_id and post_id might seem like relationships, but for now, list them as attributes, and we’ll formalize the relationships in the next step.
3

Step 3: Establish Relationships (How Entities Connect)

What are relationships? These describe how entities interact or are associated with each other. Think of them as the verbs that connect your nouns.How to find them? For any two entities, ask: “How do they relate?”For our Blog:
  • A User writes Posts
  • A User makes Comments
  • A Post receives Comments
Relationships are crucial for joining data across tables.
4

Step 4: Determine Cardinality (How Many?)

What is cardinality? This specifies the number of instances of one entity that can be associated with the number of instances of another entity.How to think about it? For each relationship, ask two questions:
  1. “How many X can relate to one Y?”
  2. “How many Y can relate to one X?”
Common Cardinalities:
  • One-to-One (1:1): Each instance of Entity A is associated with exactly one instance of Entity B, and vice-versa.
    • Example: User has one UserProfile (and a UserProfile belongs to one User)
  • One-to-Many (1:N): Each instance of Entity A is associated with one or more instances of Entity B, but each instance of Entity B is associated with only one instance of Entity A.
    • Example: User writes many Posts (but a Post is written by only one User)
  • Many-to-Many (M:N): Each instance of Entity A can be associated with many instances of Entity B, and vice-versa.
    • Example: Posts can have many Tags, and Tags can be on many Posts
    • Note: Many-to-many relationships often require an extra “junction” or “bridge” table in the database.
For our Blog (revisiting relationships with cardinality):
  • User writes Posts: One-to-Many (1:N). One user can write many posts, but each post has only one author.
  • User makes Comments: One-to-Many (1:N). One user can make many comments, but each comment has only one author.
  • Post receives Comments: One-to-Many (1:N). One post can receive many comments, but each comment belongs to only one post.
Determining cardinality helps you translate your ER model into database tables and foreign keys.

Relationship Types: Detailed Examples

Let’s dive deeper into each cardinality type with more detailed explanations and SQL examples.

One-to-One (1:1) Relationships

Concept: Each instance of Entity A is associated with exactly one instance of Entity B, and each instance of Entity B is associated with exactly one instance of Entity A. When to use: Often used to separate large tables into smaller, more manageable ones, or to store sensitive/optional information in a separate table. Real-World Analogy: A person has exactly one passport (and a passport belongs to exactly one person). Example: Users and User Profiles
  • A User has basic login info (email, password).
  • A UserProfile has extended details (bio, avatar, preferences).
How it’s implemented in SQL: Both tables usually share the same primary key.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    bio TEXT,
    avatar_url VARCHAR(255),
    preferences JSONB
);
Explanation:
  • users.id is the primary key for the users table.
  • user_profiles.user_id is both a primary key and a foreign key, ensuring a 1:1 relationship.
  • ON DELETE CASCADE: If a user is deleted, their profile is automatically deleted too.

One-to-Many (1:N) Relationships

Concept: Each instance of Entity A can be associated with one or more instances of Entity B, but each instance of Entity B is associated with only one instance of Entity A. When to use: This is the most common type of relationship in relational databases. Real-World Analogy: A customer can place many orders, but each order is placed by only one customer. Example: Authors and Posts
  • An Author can write many Posts.
  • A Post is written by only one Author.
How it’s implemented in SQL: The “many” side table contains a foreign key that references the primary key of the “one” side table.
CREATE TABLE authors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id) ON DELETE RESTRICT,
    published_date TIMESTAMPTZ DEFAULT NOW()
);
Explanation:
  • posts.author_id is a foreign key referencing authors.id.
  • NOT NULL: A post must have an author.
  • ON DELETE RESTRICT: Prevents deleting an author if they still have posts (you’d have to reassign posts or delete them first).

Many-to-Many (M:N) Relationships

Concept: Each instance of Entity A can be associated with many instances of Entity B, and each instance of Entity B can be associated with many instances of Entity A. When to use: When two entities have a bidirectional “many” relationship. Real-World Analogy: Students can enroll in many courses, and courses can have many students. Example: Posts and Tags
  • A Post can have multiple Tags (e.g., ‘SQL’, ‘PostgreSQL’, ‘Performance’).
  • A Tag can be applied to multiple Posts.
How it’s implemented in SQL: Requires an intermediate table (often called a junction table or bridge table) with foreign keys to both original tables.
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    content TEXT NOT NULL
);

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

-- Junction table: post_tags
CREATE TABLE post_tags (
    post_id INTEGER NOT NULL REFERENCES posts(id) ON DELETE CASCADE,
    tag_id INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, tag_id)  -- Composite primary key ensures unique pair
);
Explanation:
  • post_tags links posts and tags.
  • post_id and tag_id together form a composite primary key, meaning a specific post can only have a specific tag once.
  • ON DELETE CASCADE for both foreign keys ensures that if a post or tag is deleted, the corresponding entry in post_tags is also removed.

2.3 E-Commerce ER Diagram: From Concept to Reality

Now that you understand entities, attributes, relationships, and normalization, let’s apply these concepts to a real-world example: an E-Commerce Platform. The following diagram represents a well-designed schema for a typical e-commerce application. It includes users, products, orders, reviews, and more. This diagram is not just a picture; it’s a visual contract for how your data is structured.
E-Commerce ER Diagram

How to Read and Use This ER Diagram:

  1. Entities (Tables): Each box represents a table in your database (e.g., users, products, orders). The table name is at the top.
  2. Attributes (Columns): Inside each box, you’ll find the columns for that table. Key columns are highlighted:
    • id (PK): Primary Key. Uniquely identifies each row in the table.
    • _id (FK): Foreign Key. Links to the primary key of another table, establishing a relationship.
  3. Relationships (Lines): The lines connecting tables show how they relate:
    • Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
      • A single line (|) means “one”.
      • A crow’s foot (<) means “many”.
      • So, |---< means “one-to-many”.
    • The line connecting orders and order_items shows that one order can have many order items (1:N).
  4. Special Notations:
    • Self-reference: A foreign key that refers to the primary key within the same table (e.g., categories for hierarchical categories).
    • JSONB: Indicates a column storing flexible JSON data, useful for varying attributes.
Think of this diagram as your database’s blueprint. Spend time understanding how data flows and connects between different parts of the system.

2.4 Keys and Constraints

Constraints enforce data integrity at the database level.

Primary Keys

-- Auto-incrementing integer (common)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- BigInt for large scale
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL
);

-- UUID for distributed systems
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
Integer vs UUID: Use integers for simplicity and performance. Use UUIDs when you need to generate IDs in distributed systems or expose IDs in URLs (harder to guess).

Foreign Keys

-- Basic foreign key
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- With explicit ON DELETE behavior
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE RESTRICT
);

-- ON DELETE options:
-- CASCADE    - Delete child rows when parent is deleted
-- RESTRICT   - Prevent deletion of parent if children exist
-- SET NULL   - Set foreign key to NULL when parent is deleted
-- SET DEFAULT- Set to default value when parent is deleted
-- NO ACTION  - Similar to RESTRICT (default)

Unique Constraints

-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
);

-- Multi-column unique (composite unique)
CREATE TABLE product_variants (
    id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    size VARCHAR(10),
    color VARCHAR(20),
    UNIQUE (product_id, size, color)  -- Each combination must be unique
);

-- Partial unique (unique with condition)
CREATE UNIQUE INDEX unique_active_email 
ON users(email) 
WHERE deleted_at IS NULL;  -- Only active users must have unique emails

Check Constraints

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
    status VARCHAR(20) CHECK (status IN ('draft', 'active', 'archived')),
    
    -- Multi-column check
    sale_price DECIMAL(10, 2),
    CONSTRAINT valid_sale_price CHECK (sale_price IS NULL OR sale_price < price)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total DECIMAL(12, 2) NOT NULL,
    shipped_at TIMESTAMPTZ,
    
    -- Logical constraint
    CONSTRAINT shipped_orders_have_date 
        CHECK (status != 'shipped' OR shipped_at IS NOT NULL)
);

Not Null Constraints

-- Explicit NOT NULL
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    name VARCHAR(100) NOT NULL,
    bio TEXT,  -- This can be NULL
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Default values reduce NULL needs
CREATE TABLE settings (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    theme VARCHAR(20) NOT NULL DEFAULT 'light',
    notifications_enabled BOOLEAN NOT NULL DEFAULT true,
    timezone VARCHAR(50) NOT NULL DEFAULT 'UTC'
);

2.5 Schema Design Patterns

What is Normalization? (The “Why”)

Concept: Normalization is a process of organizing the columns and tables of a relational database to minimize data redundancy (duplicate data) and improve data integrity (accuracy and consistency). Real-World Analogy: Organizing a Closet Imagine your closet is a messy database:
  • Unnormalized closet: Shirts, pants, socks, and accessories are all mixed in one big pile. You have 5 identical blue shirts because you can’t find the one you own. (Lots of redundancy!)
  • Normalized closet: You have drawers for socks, hangers for shirts, shelves for pants. Each item has its designated place, and you know exactly where everything is. You only own one blue shirt. (No redundancy, easy to find and manage!)
Why is Redundancy Bad?
  1. Storage Waste: Storing the same information multiple times takes up unnecessary space.
  2. Update Anomalies: If you need to change redundant data, you must update it in multiple places. If you miss one, your data becomes inconsistent.
    • Example: Changing a customer’s address in one place but not another.
  3. Deletion Anomalies: Deleting one piece of data might unintentionally delete other, unrelated data.
    • Example: Deleting a product might also delete the only record of its supplier.
  4. Insertion Anomalies: You might not be able to add new data until other, unrelated data is available.
    • Example: Can’t add a new supplier until they provide a product.
The Goal of Normalization: To eliminate these problems by structuring tables so that each piece of information is stored in only one place, and related data is linked via keys.

Normal Forms: Rules for a Clean Database

Normalization is guided by a series of “normal forms” (rules). We’ll focus on the first three, which are most commonly applied.

First Normal Form (1NF): No Repeating Groups!

Rule: A table is in 1NF if it contains no repeating groups or multi-valued attributes. Each column must hold only one value, and each row must be unique. Real-World Analogy: Your shopping list should have one item per line, not “apples, bananas, oranges” on one line. Problem (Not in 1NF): Imagine an orders table where you list multiple items in one column:
Orders (Bad - Not 1NF):
┌──────────┬────────────┬─────────────────────────┬───────────────┐
│ order_id │ customer   │      products_ordered     │ total_amount │
├──────────┼────────────┼─────────────────────────┼───────────────┤
│    1     │ Alice      │ "Laptop, Mouse, Keyboard" │   1200.00     │
│    2     │ Bob        │ "Monitor, Webcam"         │   300.00      │
└──────────┴────────────┴─────────────────────────┴───────────────┘
  • products_ordered column contains multiple values.
  • How would you easily search for all orders that contain a “Mouse”?
Solution (1NF): Split the repeating group into a new, related table.
-- Orders table (1NF):
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_name VARCHAR(100),
    total_amount DECIMAL(10, 2)
);

-- Order_Items table (1NF):
CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_name VARCHAR(100),
    quantity INTEGER
);

-- Now our data looks like this:
Orders:
┌──────────┬────────────┬───────────────┐
│ order_id │ customer   │ total_amount │
├──────────┼────────────┼───────────────┤
1     │ Alice      │   1200.00
2     │ Bob        │   300.00
└──────────┴────────────┴───────────────┘

Order_Items:
┌───────────────┬──────────┬────────────┬──────────┐
│ order_item_id │ order_id │ product_name │ quantity │
├───────────────┼──────────┼────────────┼──────────┤
11     │ Laptop       │    1
21     │ Mouse        │    1
31     │ Keyboard     │    1
42     │ Monitor      │    1
52     │ Webcam       │    1
└───────────────┴──────────┴────────────┴──────────┘
Key Takeaway for 1NF: Every column in every row should contain a single, atomic value. No lists, arrays, or repeating groups within a single cell.

Second Normal Form (2NF): Full Dependency on Primary Key!

Rule: A table is in 2NF if it is in 1NF AND all non-key attributes are fully dependent on the primary key. This means no non-key attribute can depend on only part of a composite primary key. Real-World Analogy: If you have a two-part ID (like student_ID + course_ID), information about just the student shouldn’t be stored in that table, only information about that specific student in that specific course. Problem (Not in 2NF): Consider an enrollments table with a composite primary key (student_id, course_id):
Enrollments (Bad - Not 2NF):
┌────────────┬───────────┬──────────────┬───────────────┐
│ student_id │ course_id │ student_name │  course_name  │
├────────────┼───────────┼──────────────┼───────────────┤
│    101     │   CS101   │ Alice        │ Intro to CS   │
│    101     │   MA101   │ Alice        │ Calc I        │
│    102     │   CS101   │ Bob          │ Intro to CS   │
└────────────┴───────────┴──────────────┴───────────────┘
  • Primary Key: (student_id, course_id)
  • student_name only depends on student_id (part of the primary key), not course_id.
  • course_name only depends on course_id (part of the primary key), not student_id.
Redundancy: Alice’s name is repeated for each course she enrolls in. Solution (2NF): Break the table into separate tables so that attributes only depend on the entire primary key.
-- Students table (2NF):
CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    student_name VARCHAR(100)
);

-- Courses table (2NF):
CREATE TABLE courses (
    course_id VARCHAR(10) PRIMARY KEY,
    course_name VARCHAR(100)
);

-- Enrollments table (2NF - junction table):
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id VARCHAR(10) REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id) -- Composite Primary Key
);

-- Now our data looks like this:
Students:
┌────────────┬──────────────┐
│ student_id │ student_name │
├────────────┼──────────────┤
101     │ Alice        │
102     │ Bob          │
└────────────┴──────────────┘

Courses:
┌───────────┬───────────────┐
│ course_id │  course_name  │
├───────────┼───────────────┤
│   CS101   │ Intro to CS   │
│   MA101   │ Calc I        │
└───────────┴───────────────┘

Enrollments:
┌────────────┬───────────┐
│ student_id │ course_id │
├────────────┼───────────┤
101     │   CS101   │
101     │   MA101   │
102     │   CS101   │
└────────────┴───────────┘
Key Takeaway for 2NF: If you have a composite primary key, make sure all other columns truly rely on all parts of that key. If a column only relies on one part, move it to a new table.

Third Normal Form (3NF): No Transitive Dependencies!

Rule: A table is in 3NF if it is in 2NF AND all non-key attributes are non-transitively dependent on the primary key. This means there are no non-key attributes that depend on other non-key attributes. Real-World Analogy: If you have a list of employees, and each employee has a department name, you shouldn’t also store the department’s phone number in the employee list. The department phone number depends on the department name, not directly on the employee. Problem (Not in 3NF): Consider an employees table:
Employees (Bad - Not 3NF):
┌────────────┬───────────────┬───────────────┬───────────────────┐
│ employee_id │ employee_name │ department_name │ department_manager │
├────────────┼───────────────┼───────────────┼───────────────────┤
│    101     │ Alice         │ Sales         │ Bob               │
│    102     │ Charlie       │ Sales         │ Bob               │
│    103     │ Diana         │ Marketing     │ Eve               │
└────────────┴───────────────┴───────────────┴───────────────────┘
  • Primary Key: employee_id
  • department_manager depends on department_name.
  • department_name is a non-key attribute.
  • Therefore, department_manager is transitively dependent on employee_id (via department_name).
Redundancy: If two employees are in the same department, the department_manager is repeated. Solution (3NF): Move the transitively dependent attributes to a new table.
-- Employees table (3NF):
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    department_id INTEGER REFERENCES departments(department_id) -- Link to new table
);

-- Departments table (3NF):
CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,
    department_name VARCHAR(100) UNIQUE NOT NULL,
    department_manager VARCHAR(100)
);

-- Now our data looks like this:
Employees:
┌────────────┬───────────────┬───────────────┐
│ employee_id │ employee_name │ department_id │
├────────────┼───────────────┼───────────────┤
101     │ Alice         │       1
102     │ Charlie       │       1
103     │ Diana         │       2
└────────────┴───────────────┴───────────────┘

Departments:
┌───────────────┬───────────────┬───────────────────┐
│ department_id │ department_name │ department_manager │
├───────────────┼───────────────┼───────────────────┤
1       │ Sales         │ Bob               │
2       │ Marketing     │ Eve               │
└───────────────┴───────────────┴───────────────────┘
Key Takeaway for 3NF: Make sure no non-key column depends on another non-key column. Every non-key column should depend only on the primary key, and nothing else.

Benefits of Normalization (Why all these rules?)

  • Reduced Data Redundancy: Data is stored once, saving space.
  • Improved Data Integrity: Changes only need to be made in one place, reducing errors.
  • Easier Maintenance: Less data duplication makes updates and deletions safer.
  • Better Querying: Although sometimes requiring more JOINs, a normalized schema is logically clearer and often more flexible for complex queries.

When to Consider Denormalization: Breaking the Rules for Performance

Concept: Denormalization is the process of intentionally adding redundant data to a database, often by combining tables or duplicating columns. Why break the rules? While normalization is generally good, sometimes it can lead to very complex queries with many JOINs, which can be slow for highly read-intensive applications (like reporting or analytics dashboards). Real-World Analogy: Having a small, quick-reference notebook with frequently used information copied from your perfectly organized main files. It’s redundant, but it’s faster for quick lookups. When to Denormalize (Use Cases):
  1. Read Performance is Critical: When read queries (SELECTs) are much more frequent than write queries (INSERTs, UPDATEs, DELETEs), and normalization causes unacceptable slowdowns.
  2. Data Rarely Changes: If the redundant data changes infrequently, the risk of update anomalies is low.
  3. Complex JOINs Hurt Performance: If a query requires joining many tables to get frequently accessed data, denormalizing can flatten the data and reduce JOINs.
  4. Reporting and Analytics (OLAP Systems): Data warehouses often use denormalized schemas (star schemas, snowflake schemas) to optimize for analytical queries.
  5. You know all query patterns: If your application has very specific, predictable query patterns, you can optimize the schema for those exact queries.
Common Denormalization Techniques:
  • Duplicating Columns: Copying a column from one table into another related table.
    • Example: Storing customer_name directly in the orders table to avoid joining users for every order display.
  • Pre-joining Tables: Storing the results of a join in a new, single table.
  • Materialized Views: Storing pre-calculated aggregates or joined data that can be refreshed periodically.
Trade-offs of Denormalization (The Downside):
  • Increased Data Redundancy: More disk space used.
  • Higher Risk of Data Inconsistency: If redundant data isn’t updated everywhere, you can have conflicting information.
  • More Complex Write Operations: Updates now need to modify data in multiple places, potentially increasing application logic complexity.
  • Less Flexible: Optimized for specific queries; new query patterns might not be efficient.

Practical Approach to Normalization

  1. Start with Normalization: Always begin by designing your database to at least 3NF. This is the safe and robust default.
  2. Measure Performance: If you encounter slow queries in production, use tools like EXPLAIN ANALYZE to pinpoint bottlenecks.
  3. Identify Bottlenecks: If slow performance is due to too many JOINs on frequently accessed data, then consider denormalization.
  4. Denormalize Strategically: Apply denormalization only where strictly necessary, with clear justification, and for specific, identified performance problems.
  5. Maintain Source of Truth: Even when denormalizing, always keep your fully normalized tables as the single source of truth to ensure data integrity. Denormalized tables can be generated views or copies.
Don’t denormalize prematurely! Optimize with indexes and query rewrites first. Denormalization is a last resort, but a powerful one when used wisely.

2.3 E-Commerce ER Diagram: From Concept to Reality

Now that you understand entities, attributes, relationships, and normalization, let’s apply these concepts to a real-world example: an E-Commerce Platform. The following diagram represents a well-designed schema for a typical e-commerce application. It includes users, products, orders, reviews, and more. This diagram is not just a picture; it’s a visual contract for how your data is structured.
E-Commerce ER Diagram

How to Read and Use This ER Diagram:

  1. Entities (Tables): Each box represents a table in your database (e.g., users, products, orders). The table name is at the top.
  2. Attributes (Columns): Inside each box, you’ll find the columns for that table. Key columns are highlighted:
    • id (PK): Primary Key. Uniquely identifies each row in the table.
    • _id (FK): Foreign Key. Links to the primary key of another table, establishing a relationship.
  3. Relationships (Lines): The lines connecting tables show how they relate:
    • Crow’s Foot Notation: The symbols at the ends of the lines indicate cardinality (1:1, 1:N, M:N).
      • A single line (|) means “one”.
      • A crow’s foot (<) means “many”.
      • So, |---< means “one-to-many”.
    • The line connecting orders and order_items shows that one order can have many order items (1:N).
  4. Special Notations:
    • Self-reference: A foreign key that refers to the primary key within the same table (e.g., categories for hierarchical categories).
    • JSONB: Indicates a column storing flexible JSON data, useful for varying attributes.
Think of this diagram as your database’s blueprint. Spend time understanding how data flows and connects between different parts of the system.

2.4 Keys and Constraints

Constraints enforce data integrity at the database level.

Primary Keys

-- Auto-incrementing integer (common)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- BigInt for large scale
CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL
);

-- UUID for distributed systems
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL
);

-- Composite primary key
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
Integer vs UUID: Use integers for simplicity and performance. Use UUIDs when you need to generate IDs in distributed systems or expose IDs in URLs (harder to guess).

Foreign Keys

-- Basic foreign key
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- With explicit ON DELETE behavior
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE RESTRICT
);

-- ON DELETE options:
-- CASCADE    - Delete child rows when parent is deleted
-- RESTRICT   - Prevent deletion of parent if children exist
-- SET NULL   - Set foreign key to NULL when parent is deleted
-- SET DEFAULT- Set to default value when parent is deleted
-- NO ACTION  - Similar to RESTRICT (default)

Unique Constraints

-- Single column unique
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
);

-- Multi-column unique (composite unique)
CREATE TABLE product_variants (
    id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(id),
    size VARCHAR(10),
    color VARCHAR(20),
    UNIQUE (product_id, size, color)  -- Each combination must be unique
);

-- Partial unique (unique with condition)
CREATE UNIQUE INDEX unique_active_email 
ON users(email) 
WHERE deleted_at IS NULL;  -- Only active users must have unique emails

Check Constraints

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
    status VARCHAR(20) CHECK (status IN ('draft', 'active', 'archived')),
    
    -- Multi-column check
    sale_price DECIMAL(10, 2),
    CONSTRAINT valid_sale_price CHECK (sale_price IS NULL OR sale_price < price)
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total DECIMAL(12, 2) NOT NULL,
    shipped_at TIMESTAMPTZ,
    
    -- Logical constraint
    CONSTRAINT shipped_orders_have_date 
        CHECK (status != 'shipped' OR shipped_at IS NOT NULL)
);

Not Null Constraints

-- Explicit NOT NULL
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    name VARCHAR(100) NOT NULL,
    bio TEXT,  -- This can be NULL
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Default values reduce NULL needs
CREATE TABLE settings (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    theme VARCHAR(20) NOT NULL DEFAULT 'light',
    notifications_enabled BOOLEAN NOT NULL DEFAULT true,
    timezone VARCHAR(50) NOT NULL DEFAULT 'UTC'
);

2.4 Schema Design Patterns

Common patterns for real-world applications.

Soft Deletes

Keep deleted records for auditing/recovery instead of hard deleting.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    name VARCHAR(100) NOT NULL,
    deleted_at TIMESTAMPTZ,  -- NULL means active
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Unique email only for non-deleted users
CREATE UNIQUE INDEX idx_users_email_active 
ON users(email) 
WHERE deleted_at IS NULL;

-- Soft delete
UPDATE users SET deleted_at = NOW() WHERE id = 123;

-- Query active users only (create a view for convenience)
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

Audit Trail / History Tables

Track all changes to important data.
-- Main table
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- History table
CREATE TABLE products_history (
    id SERIAL PRIMARY KEY,
    product_id INTEGER NOT NULL,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    changed_by INTEGER REFERENCES users(id),
    operation VARCHAR(10) NOT NULL  -- INSERT, UPDATE, DELETE
);

-- Trigger to auto-populate history
CREATE OR REPLACE FUNCTION log_product_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        INSERT INTO products_history (product_id, name, price, operation)
        VALUES (OLD.id, OLD.name, OLD.price, 'UPDATE');
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO products_history (product_id, name, price, operation)
        VALUES (OLD.id, OLD.name, OLD.price, 'DELETE');
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_audit_trigger
AFTER UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION log_product_changes();

Polymorphic Associations

When multiple tables can be “commentable” or “taggable”.
-- Approach 1: Separate foreign keys (recommended for smaller cases)
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    -- One of these will be set
    post_id INTEGER REFERENCES posts(id),
    product_id INTEGER REFERENCES products(id),
    article_id INTEGER REFERENCES articles(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Ensure exactly one is set
    CONSTRAINT one_parent CHECK (
        (post_id IS NOT NULL)::int + 
        (product_id IS NOT NULL)::int + 
        (article_id IS NOT NULL)::int = 1
    )
);

-- Approach 2: Generic reference (more flexible, less type-safe)
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    body TEXT NOT NULL,
    commentable_type VARCHAR(50) NOT NULL,  -- 'Post', 'Product', 'Article'
    commentable_id INTEGER NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_comments_polymorphic 
ON comments(commentable_type, commentable_id);

Self-Referential Relationships

For hierarchies like categories, org charts, or threaded comments.
-- Categories with unlimited nesting
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INTEGER REFERENCES categories(id),
    depth INTEGER NOT NULL DEFAULT 0,
    path TEXT  -- Materialized path for efficient queries
);

-- Create index for hierarchy queries
CREATE INDEX idx_categories_parent ON categories(parent_id);
CREATE INDEX idx_categories_path ON categories(path);

-- Example data
INSERT INTO categories (id, name, parent_id, path) VALUES
(1, 'Electronics', NULL, '1'),
(2, 'Computers', 1, '1.2'),
(3, 'Laptops', 2, '1.2.3'),
(4, 'Gaming Laptops', 3, '1.2.3.4');

-- Query: Get all descendants
SELECT * FROM categories 
WHERE path LIKE '1.2.%'  -- All under Computers
ORDER BY path;

-- Query: Get ancestors (breadcrumb)
SELECT * FROM categories
WHERE '1.2.3.4' LIKE path || '.%' OR id = 4
ORDER BY depth;
---

### Tagging System

Many-to-many with flexible metadata.

```sql
CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL UNIQUE,
    slug VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE post_tags (
    post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
    tag_id INTEGER REFERENCES tags(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (post_id, tag_id)
);

-- Indexes for common queries
CREATE INDEX idx_post_tags_tag ON post_tags(tag_id);

-- Find posts with specific tags
SELECT p.* FROM posts p
INNER JOIN post_tags pt ON p.id = pt.post_id
INNER JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'react');

-- Find posts with ALL specified tags
SELECT p.* FROM posts p
INNER JOIN post_tags pt ON p.id = pt.post_id
INNER JOIN tags t ON pt.tag_id = t.id
WHERE t.slug IN ('javascript', 'react')
GROUP BY p.id
HAVING COUNT(DISTINCT t.id) = 2;  -- Must have both tags

Common Design Anti-Patterns: What NOT to Do

Understanding anti-patterns is just as important as knowing best practices. These are common mistakes that lead to poor performance, maintenance nightmares, and data integrity issues.

Anti-Pattern #1: Repeating Groups and Arrays in Columns

The Problem: Storing multiple values in a single column, either as delimited strings or arrays. Bad Example:
-- AVOID: Storing multiple email addresses in one field
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email_addresses TEXT  -- Something like "primary@example.com,secondary@example.com"
);

-- AVOID: Using PostgreSQL arrays
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    product_ids INTEGER[]  -- Something like {1, 5, 12}
);
Issues:
  • Impossible to enforce referential integrity
  • Difficult to query efficiently
  • Violates First Normal Form (1NF)
  • Cannot create proper indexes on individual values
  • Updating a single value requires manipulating the entire string/array
Good Solution:
-- CORRECT: Separate table for multiple emails per user
CREATE TABLE user_emails (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    email VARCHAR(255) NOT NULL,
    is_primary BOOLEAN NOT NULL DEFAULT FALSE,
    UNIQUE (user_id, email),  -- Prevent duplicate emails per user
    UNIQUE (user_id, is_primary) WHERE is_primary = true  -- Only one primary per user
);

-- CORRECT: Junction table for order-items relationship
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

Anti-Pattern #2: Generic/Polymorphic Tables Gone Wrong

The Problem: Creating overly generic tables that try to store different entity types in a single table. Bad Example:
-- AVOID: Super-generic table
CREATE TABLE entities (
    id SERIAL PRIMARY KEY,
    entity_type VARCHAR(50) NOT NULL,  -- 'user', 'product', 'order'
    field1 TEXT,
    field2 TEXT,
    field3 TEXT,
    field4 TEXT,
    field5 TEXT
);

-- Even worse with JSON
CREATE TABLE generic_entities (
    id SERIAL PRIMARY KEY,
    entity_type VARCHAR(50) NOT NULL,
    data JSONB NOT NULL  -- Contains completely different structures
);
Issues:
  • No data integrity constraints possible
  • Impossible to create proper indexes
  • Very difficult to maintain and query
  • Violates the principle of single responsibility
  • Poor performance for any non-trivial query
Good Solution:
-- CORRECT: Properly normalized tables with specific purposes
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Use proper polymorphic associations only when needed
CREATE TABLE attachments (
    id SERIAL PRIMARY KEY,
    attachable_type VARCHAR(50) NOT NULL CHECK (attachable_type IN ('user', 'product', 'order')),
    attachable_id INTEGER NOT NULL,
    file_path VARCHAR(500) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Combined constraint to ensure referential integrity within the application
    UNIQUE (attachable_type, attachable_id, file_path)
);

Anti-Pattern #3: The Swiss Army Knife Table

The Problem: Creating one massive table with dozens of columns, many of which are NULL for any given row. Bad Example:
-- AVOID: One table to rule them all
CREATE TABLE master_records (
    id SERIAL PRIMARY KEY,
    record_type VARCHAR(20) NOT NULL,  -- 'customer', 'supplier', 'employee', 'product'
    
    -- Customer fields
    customer_name VARCHAR(100),
    customer_credit_limit DECIMAL(12, 2),
    
    -- Supplier fields
    supplier_company VARCHAR(100),
    supplier_terms VARCHAR(50),
    
    -- Employee fields
    employee_ssn VARCHAR(11),
    employee_department VARCHAR(50),
    
    -- Product fields
    product_sku VARCHAR(50),
    product_weight DECIMAL(8, 2),
    
    -- And 30 more fields for other record types...
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Issues:
  • Most columns are NULL for any given record
  • Impossible to create meaningful constraints
  • Terrible performance due to wide rows
  • Difficult to maintain and understand
  • Violates the principle of having focused, purposeful tables
Good Solution:
-- CORRECT: Separate focused tables
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    credit_limit DECIMAL(12, 2) CHECK (credit_limit >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE suppliers (
    id SERIAL PRIMARY KEY,
    company_name VARCHAR(100) NOT NULL,
    terms VARCHAR(50),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    ssn VARCHAR(11) NOT NULL UNIQUE,  -- Simplified for example
    department VARCHAR(50),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    sku VARCHAR(50) NOT NULL UNIQUE,
    weight DECIMAL(8, 2),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Anti-Pattern #4: Ignoring Proper Keys and Constraints

The Problem: Creating tables without primary keys, foreign keys, or any data integrity constraints. Bad Example:
-- AVOID: No constraints
CREATE TABLE orders (
    id INTEGER,  -- Not even a serial primary key!
    user_id INTEGER,  -- No foreign key constraint
    total_amount NUMERIC,  -- No NOT NULL or CHECK
    status VARCHAR(20),  -- No check constraint
    created_at TIMESTAMP  -- Nullable timestamp
);

-- Data can be anything!
INSERT INTO orders VALUES 
(NULL, -1, -999.99, 'invalid_status', NULL),
(1, 999999, NULL, 'active', 'not_a_date');
Issues:
  • Data integrity violations are inevitable
  • Impossible to maintain referential integrity
  • Leads to garbage data and unpredictable behavior
  • Makes debugging and maintenance extremely difficult
  • Creates cascading problems throughout the application
Good Solution:
-- CORRECT: Comprehensive constraints
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    total_amount DECIMAL(12, 2) NOT NULL CHECK (total_amount >= 0),
    status VARCHAR(20) NOT NULL DEFAULT 'pending' 
        CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Business rule: shipped orders must have a ship date
    CONSTRAINT shipped_orders_require_ship_date 
        CHECK (status != 'shipped' OR ship_date IS NOT NULL),
    ship_date TIMESTAMPTZ
);

Anti-Pattern #5: Over-normalization (Premature Optimization)

The Problem: Taking normalization too far, creating excessive complexity for marginal gains. Bad Example:
-- AVOID: Extreme normalization
CREATE TABLE first_names (id SERIAL PRIMARY KEY, name VARCHAR(50));
CREATE TABLE last_names (id SERIAL PRIMARY KEY, name VARCHAR(50));
CREATE TABLE email_domains (id SERIAL PRIMARY KEY, domain VARCHAR(100));

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    first_name_id INTEGER REFERENCES first_names(id),
    last_name_id INTEGER REFERENCES last_names(id),
    email_domain_id INTEGER REFERENCES email_domains(id),
    email_local_part VARCHAR(100)  -- Before @ symbol
);

-- Now every user query requires 4 joins!
SELECT 
    fn.name, ln.name, 
    CONCAT(el.email_local_part, '@', ed.domain) as email
FROM users u
JOIN first_names fn ON u.first_name_id = fn.id
JOIN last_names ln ON u.last_name_id = ln.id
JOIN email_domains ed ON u.email_domain_id = ed.id;
Issues:
  • Excessive joins hurt performance
  • Complex queries are harder to write and maintain
  • Diminishing returns on normalization benefits
  • May actually hurt data integrity in some cases
  • Premature optimization at the expense of readability
Good Solution:
-- CORRECT: Balance normalization with practicality
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- Use check constraints for basic validation
    CONSTRAINT valid_email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
);

-- Only normalize when there's a clear benefit:
-- 1. Significant data repetition
-- 2. Clear business requirement for the separate entity
-- 3. Performance improvement from the separation

Anti-Pattern #6: Improper Handling of Hierarchical Data

The Problem: Poor approaches to storing tree-like data structures. Bad Example:
-- AVOID: Storing full paths as strings (fragile)
CREATE TABLE categories_bad (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    path VARCHAR(500) NOT NULL  -- Like 'Electronics/Computers/Laptops/Gaming'
);

-- AVOID: Adjacency list without safeguards (prone to cycles)
CREATE TABLE categories_risky (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INTEGER REFERENCES categories_risky(id)  -- No cycle prevention!
);
Issues:
  • Path-based: Difficult to maintain, prone to errors during renames
  • Adjacency list: Can create infinite loops, no protection against cycles
  • Both: Hard to query for ancestors/descendants efficiently
  • Both: No enforcement of tree structure integrity
Good Solutions:
-- Option 1: Adjacency List + Materialized Path (recommended)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INTEGER REFERENCES categories(id),
    path TEXT NOT NULL,  -- Like '.1.5.12.' where 1=root, 5=child of 1, 12=child of 5
    depth INTEGER NOT NULL,
    
    -- Prevent self-references
    CONSTRAINT no_self_reference CHECK (id != parent_id),
    
    -- Maintain path consistency with triggers (application logic)
    -- Index for ancestor queries: WHERE path LIKE '%.5.%'
    -- Index for direct children: WHERE parent_id = ?
);

CREATE INDEX idx_categories_path ON categories(path);
CREATE INDEX idx_categories_parent ON categories(parent_id);

-- Option 2: Closure Table (for complex hierarchy queries)
CREATE TABLE category_closure (
    ancestor_id INTEGER NOT NULL REFERENCES categories(id),
    descendant_id INTEGER NOT NULL REFERENCES categories(id),
    depth INTEGER NOT NULL,  -- 0 for self-links, 1 for direct children, etc.
    PRIMARY KEY (ancestor_id, descendant_id)
);

-- Provides fast ancestor/descendant queries at cost of maintenance complexity

Key Takeaways for Avoiding Anti-Patterns

  1. Start with normalization (1NF, 2NF, 3NF) and only denormalize when there’s a clear performance need
  2. Always use constraints to enforce data integrity at the database level
  3. Design tables with single responsibilities - don’t try to store everything in one place
  4. Consider query patterns when designing your schema
  5. Avoid premature optimization - don’t over-normalize for theoretical benefits
  6. Plan for growth - design schemas that can accommodate future requirements
  7. Document your design decisions - explain why you chose certain approaches
Remember: Good database design is about balancing competing concerns - data integrity, performance, maintainability, and scalability. The best schema is one that serves your current needs while remaining adaptable for future requirements.

2.5 JSONB for Flexible Data

When you need schema flexibility within a relational database.

When to Use JSONB

Good Use Cases

  • User preferences/settings
  • Product attributes (vary by category)
  • API response caching
  • Metadata and tags
  • Form submissions

Avoid JSONB For

  • Core business data you query often
  • Data with relationships
  • Data you need to aggregate
  • Anything that should be normalized

JSONB Operations

-- Create table with JSONB
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    base_price DECIMAL(10, 2) NOT NULL,
    attributes JSONB NOT NULL DEFAULT '{}'
);

-- Insert with JSONB
INSERT INTO products (name, base_price, attributes) VALUES
('Gaming Laptop', 1500.00, '{
    "brand": "TechCorp",
    "screen_size": 15.6,
    "specs": {
        "ram": "32GB",
        "storage": "1TB SSD",
        "gpu": "RTX 4080"
    },
    "colors": ["black", "silver"],
    "gaming_features": true
}');

-- Query: Access nested values
SELECT 
    name,
    attributes->>'brand' AS brand,
    attributes->'specs'->>'ram' AS ram,
    attributes->'specs'->>'gpu' AS gpu
FROM products
WHERE attributes->>'brand' = 'TechCorp';

-- Query: Check if key exists
SELECT * FROM products 
WHERE attributes ? 'gaming_features';

-- Query: Contains value
SELECT * FROM products 
WHERE attributes @> '{"gaming_features": true}';

-- Query: Array contains
SELECT * FROM products 
WHERE attributes->'colors' ? 'silver';

-- Update: Set a nested value
UPDATE products 
SET attributes = jsonb_set(attributes, '{specs,ram}', '"64GB"')
WHERE id = 1;

-- Update: Add a new field
UPDATE products 
SET attributes = attributes || '{"warranty_years": 3}'
WHERE id = 1;

-- Index for containment queries (very fast)
CREATE INDEX idx_products_attrs ON products USING gin(attributes);

-- Index specific paths
CREATE INDEX idx_products_brand ON products ((attributes->>'brand'));

Hybrid Schema Pattern

Combine normalized columns with JSONB for flexibility.
CREATE TABLE products (
    -- Normalized: data you query/filter often
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    base_price DECIMAL(10, 2) NOT NULL,
    stock INTEGER NOT NULL DEFAULT 0,
    status VARCHAR(20) NOT NULL DEFAULT 'active',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    
    -- JSONB: flexible, category-specific attributes
    attributes JSONB NOT NULL DEFAULT '{}',
    
    -- Indexes
    CONSTRAINT valid_status CHECK (status IN ('draft', 'active', 'archived'))
);

-- Fast filtering on normalized columns
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_status ON products(status) WHERE status = 'active';

-- GIN index for JSONB queries
CREATE INDEX idx_products_attributes ON products USING gin(attributes);

2.6 Schema Migrations

Evolve your schema safely over time.

Migration Best Practices

Every schema change should be a versioned migration file.
migrations/
├── 001_create_users.sql
├── 002_create_products.sql
├── 003_add_user_phone.sql
├── 004_create_orders.sql
└── 005_add_order_status_index.sql
Each migration has UP (apply) and DOWN (rollback) scripts.
When deploying, old and new code may run simultaneously. Safe changes:Safe (backward compatible):
  • Add new table
  • Add new nullable column
  • Add new column with default
  • Add new index
  • Rename table/column (with alias/view)
Unsafe (requires coordination):
  • Drop table/column
  • Rename without alias
  • Change column type
  • Add NOT NULL to existing column
For breaking changes, use multiple deployments:Renaming a column:
  1. Deploy: Add new column, backfill data
  2. Deploy: Update code to write both columns
  3. Deploy: Update code to read new column
  4. Deploy: Drop old column
Adding NOT NULL:
  1. Deploy: Add column as nullable
  2. Backfill data (all rows)
  3. Deploy: Add NOT NULL constraint

Safe Migration Examples

-- Adding a column (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Adding with default (safe, but can be slow on large tables)
ALTER TABLE users ADD COLUMN verified BOOLEAN NOT NULL DEFAULT false;

-- PostgreSQL 11+ instant add with default
ALTER TABLE users ADD COLUMN tier VARCHAR(20) NOT NULL DEFAULT 'free';

-- Adding an index concurrently (doesn't lock table)
CREATE INDEX CONCURRENTLY idx_orders_user ON orders(user_id);

-- Renaming column safely
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(200);

-- Step 2: Backfill
UPDATE users SET full_name = name;

-- Step 3: (After code change) Drop old column
ALTER TABLE users DROP COLUMN name;

-- Changing column type (use intermediate column)
ALTER TABLE products ADD COLUMN new_price BIGINT;
UPDATE products SET new_price = (price * 100)::bigint;  -- Convert to cents
ALTER TABLE products DROP COLUMN price;
ALTER TABLE products RENAME COLUMN new_price TO price_cents;

2.7 Practice: E-Commerce Schema Design

Challenge

Design a complete database schema for an e-commerce platform with:
  1. Users (customers and admins)
  2. Products with categories (hierarchical)
  3. Product variants (size, color)
  4. Shopping cart
  5. Orders with items
  6. Reviews and ratings
  7. Wishlist
  8. Coupons/discounts

Requirements

  • Users can have multiple addresses
  • Products belong to categories (nested up to 3 levels)
  • Each product can have variants with different prices and stock
  • Orders track status history
  • Reviews are moderated (pending, approved, rejected)
  • Coupons can be percentage or fixed amount
-- Users and Authentication
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255) NOT NULL,
    name VARCHAR(100) NOT NULL,
    role VARCHAR(20) NOT NULL DEFAULT 'customer' CHECK (role IN ('customer', 'admin')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE addresses (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    label VARCHAR(50) NOT NULL,  -- 'Home', 'Work', etc.
    street VARCHAR(255) NOT NULL,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(100),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(100) NOT NULL,
    is_default BOOLEAN NOT NULL DEFAULT false,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Categories (hierarchical)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    slug VARCHAR(100) NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES categories(id),
    depth INTEGER NOT NULL DEFAULT 0 CHECK (depth <= 2),
    path TEXT NOT NULL,  -- Materialized path: '1.5.12'
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Products
CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sku VARCHAR(100) NOT NULL UNIQUE,
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(255) NOT NULL UNIQUE,
    description TEXT,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    base_price DECIMAL(10, 2) NOT NULL CHECK (base_price >= 0),
    attributes JSONB NOT NULL DEFAULT '{}',
    status VARCHAR(20) NOT NULL DEFAULT 'draft' 
        CHECK (status IN ('draft', 'active', 'archived')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE product_variants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    sku VARCHAR(100) NOT NULL UNIQUE,
    name VARCHAR(100) NOT NULL,  -- 'Large / Red'
    price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
    stock INTEGER NOT NULL DEFAULT 0 CHECK (stock >= 0),
    attributes JSONB NOT NULL DEFAULT '{}',  -- {"size": "L", "color": "red"}
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (product_id, name)
);

CREATE TABLE product_images (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    url VARCHAR(500) NOT NULL,
    alt_text VARCHAR(255),
    position INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Shopping Cart
CREATE TABLE carts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    session_id VARCHAR(255),  -- For anonymous carts
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT user_or_session CHECK (user_id IS NOT NULL OR session_id IS NOT NULL)
);

CREATE TABLE cart_items (
    cart_id UUID NOT NULL REFERENCES carts(id) ON DELETE CASCADE,
    variant_id UUID NOT NULL REFERENCES product_variants(id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    added_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (cart_id, variant_id)
);

-- Orders
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    order_number VARCHAR(20) NOT NULL UNIQUE,
    status VARCHAR(30) NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'confirmed', 'processing', 
                          'shipped', 'delivered', 'cancelled', 'refunded')),
    subtotal DECIMAL(12, 2) NOT NULL,
    discount_amount DECIMAL(12, 2) NOT NULL DEFAULT 0,
    shipping_amount DECIMAL(12, 2) NOT NULL DEFAULT 0,
    tax_amount DECIMAL(12, 2) NOT NULL DEFAULT 0,
    total DECIMAL(12, 2) NOT NULL,
    
    shipping_address JSONB NOT NULL,  -- Snapshot of address at order time
    billing_address JSONB,
    
    notes TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE order_items (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    variant_id UUID NOT NULL REFERENCES product_variants(id),
    product_name VARCHAR(255) NOT NULL,  -- Snapshot
    variant_name VARCHAR(100) NOT NULL,  -- Snapshot
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10, 2) NOT NULL,
    total_price DECIMAL(12, 2) NOT NULL
);

CREATE TABLE order_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    status VARCHAR(30) NOT NULL,
    notes TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by UUID REFERENCES users(id)
);

-- Reviews
CREATE TABLE reviews (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES users(id),
    order_id UUID REFERENCES orders(id),  -- Verified purchase
    rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    title VARCHAR(255),
    body TEXT,
    status VARCHAR(20) NOT NULL DEFAULT 'pending'
        CHECK (status IN ('pending', 'approved', 'rejected')),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (product_id, user_id)  -- One review per product per user
);

-- Wishlists
CREATE TABLE wishlists (
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, product_id)
);

-- Coupons
CREATE TABLE coupons (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    code VARCHAR(50) NOT NULL UNIQUE,
    description TEXT,
    discount_type VARCHAR(20) NOT NULL CHECK (discount_type IN ('percentage', 'fixed')),
    discount_value DECIMAL(10, 2) NOT NULL CHECK (discount_value > 0),
    minimum_order DECIMAL(10, 2),
    maximum_discount DECIMAL(10, 2),  -- Cap for percentage discounts
    usage_limit INTEGER,
    used_count INTEGER NOT NULL DEFAULT 0,
    valid_from TIMESTAMPTZ NOT NULL,
    valid_until TIMESTAMPTZ NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT true,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE order_coupons (
    order_id UUID NOT NULL REFERENCES orders(id),
    coupon_id UUID NOT NULL REFERENCES coupons(id),
    discount_amount DECIMAL(12, 2) NOT NULL,
    PRIMARY KEY (order_id, coupon_id)
);

-- Essential Indexes
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_status ON products(status) WHERE status = 'active';
CREATE INDEX idx_variants_product ON product_variants(product_id);
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_reviews_product ON reviews(product_id) WHERE status = 'approved';
CREATE INDEX idx_coupons_code ON coupons(code) WHERE is_active = true;

2.6 Physical Design and Storage Layout

Physical design bridges the gap between logical models (ERDs) and how bytes are actually laid out on disk. This significantly impacts performance, especially for high-throughput systems.

Row vs. Columnar Storage

How a database stores data impacts its suitability for different workloads:
  1. Row-Oriented (OLTP):
    • Storage: Stores all columns of a single row together in a block.
    • Optimized for: Finding specific records, inserts, and updates (e.g., PostgreSQL, MySQL).
    • Use Case: E-commerce checkouts, user profiles.
  2. Column-Oriented (OLAP):
    • Storage: Stores all values of a single column together in a block.
    • Optimized for: Aggregating millions of rows on a few columns (e.g., BigQuery, Snowflake, ClickHouse).
    • Use Case: Analytics, reporting, trend analysis.

The Problem of NULLs and Data Types

In physical storage, every byte counts. Choosing the wrong data type can lead to “bloat”:
  • Alignment Padding: Databases often align data to 8-byte boundaries. If you alternate BIGINT (8 bytes) and BOOLEAN (1 byte), the DB might add 7 bytes of padding after every boolean.
  • Fixed vs. Variable Width: CHAR(100) always takes 100 bytes, even if the string is 2 chars. VARCHAR(100) only takes the bytes needed + a small header.
Pro-tip: Reorder columns in your CREATE TABLE from largest to smallest to minimize alignment padding.

Partitioning and Sharding

As data grows, a single table becomes too large to manage efficiently.
  1. Vertical Partitioning: Splitting a table by columns. (e.g., Moving large text blobs to a separate details table).
  2. Horizontal Partitioning (Sharding): Splitting a table by rows across multiple nodes or physical files.
    • Range Partitioning: Based on a range (e.g., orders_2023, orders_2024).
    • List Partitioning: Based on discrete values (e.g., users_us, users_eu).
    • Hash Partitioning: Distributing data evenly using a hash of a key (e.g., hash(user_id) % 4).

2.7 Advanced Normalization and Denormalization Strategies

In high-performance systems, the trade-off between 3NF and performance is a primary design decision.

The True Cost of 3NF

While 3NF prevents anomalies, it forces Join Latency. In distributed databases, joins across nodes can be prohibitively expensive (the “Network Hop” problem).

Scalable Denormalization Patterns

  1. Pre-aggregated Fields (Counter Pattern):
    • Instead of SELECT COUNT(*) FROM comments WHERE post_id = ?, store a comments_count on the posts table.
    • Trade-off: You must use transactions or triggers to keep the count in sync.
  2. Read-Model Materialization:
    • Keep your source of truth in 3NF.
    • Use Change Data Capture (CDC) to stream updates into a denormalized “Read Model” (e.g., an Elasticsearch index or a flat Postgres table) optimized for specific UI screens.
  3. Attribute Value Bundling:
    • If you always fetch a user’s city, state, and country together, you might bundle them into a location JSONB column or a flattened table, even if it technically violates the strict atomicity of 1NF.

Normalization vs. Performance Checklist

FactorFavor Normalization (3NF+)Favor Denormalization
WorkloadWrite-heavy / TransactionalRead-heavy / Analytical
ConsistencyStrict (ACID)Eventual/Managed
ComplexityLow data volumeHigh volume / High concurrency
JoinsFast / Single-nodeSlow / Distributed

Module Mastery Checklist

Complete this checklist to confirm you’ve mastered Database Design & Modeling:

Entity-Relationship (ER) Modeling

  • Can identify entities, attributes, and relationships from business requirements
  • Apply the 4-step ER modeling process systematically
  • Distinguish between one-to-one, one-to-many, and many-to-many relationships
  • Determine proper cardinality for each relationship
  • Translate ER models into SQL table structures
  • Create visual ER diagrams for complex systems

Normalization Principles

  • Transform unnormalized data to First Normal Form (1NF)
  • Apply Second Normal Form (2NF) by eliminating partial dependencies
  • Achieve Third Normal Form (3NF) by removing transitive dependencies
  • Recognize when denormalization is appropriate for performance
  • Balance normalization benefits against query complexity
  • Identify and fix common normalization violations

Keys and Constraints

  • Implement appropriate primary key strategies (SERIAL, UUID, composite)
  • Design foreign key relationships with proper referential actions
  • Apply unique constraints at column and table levels
  • Use check constraints to enforce business rules
  • Implement not-null constraints and default values strategically
  • Create proper indexes for common query patterns

Schema Design Patterns

  • Implement soft deletes using timestamp-based approaches
  • Design audit trails to track data changes over time
  • Apply polymorphic associations when appropriate
  • Create self-referential relationships for hierarchical data
  • Design efficient tagging systems with junction tables
  • Structure data with referential integrity in mind

JSONB and Flexible Schemas

  • Determine appropriate use cases for JSONB vs. normalized columns
  • Query nested JSON data using PostgreSQL’s JSONB operators
  • Create effective indexes on JSONB data
  • Implement hybrid schemas combining normalized and flexible data
  • Balance schema flexibility against performance considerations

Schema Migrations

  • Plan backward-compatible schema changes for production
  • Execute safe migration patterns for structural changes
  • Create indexes without blocking table access
  • Implement multi-step migrations for breaking changes
  • Use version control for tracking schema evolution

Anti-Pattern Recognition

  • Identify and avoid repeating groups in columns
  • Prevent over-generic table designs
  • Avoid the “Swiss Army Knife” table anti-pattern
  • Implement proper constraints for data integrity
  • Balance normalization with practical considerations
  • Design hierarchical data structures properly

Practical Application

  • Complete the hands-on blog schema implementation lab
  • Design a normalized schema for a new business domain
  • Identify and fix anti-patterns in existing schemas
  • Apply appropriate constraints to maintain data quality
  • Design schemas considering both current and future requirements

Next Module

Module 3: Transactions & ACID

Build reliable applications with proper transaction handling