> ## Documentation Index
> Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
> Use this file to discover all available pages before exploring further.

# SQL Foundations

> Master SQL from basics to advanced queries — the language of data

<Frame>
  <img src="https://mintcdn.com/devweeekends/0kwJwOL2KCwg2YYu/images/courses/database-engineering/sql-foundations-concept.svg?fit=max&auto=format&n=0kwJwOL2KCwg2YYu&q=85&s=ffb6ece3c263b9d48281e1807b101401" alt="SQL Foundations - Relational model, query structure, and operations" width="1080" height="1080" data-path="images/courses/database-engineering/sql-foundations-concept.svg" />
</Frame>

# Module 1: SQL Foundations

SQL (Structured Query Language) is the universal language for interacting with relational databases. This module builds your foundation from the ground up, ensuring you can write any query with confidence.

<Info>
  **Estimated Time**: 8-10 hours\
  **Hands-On**: 30+ practice exercises\
  **Project**: Build queries for an e-commerce analytics dashboard
</Info>

## Learning Objectives

By the end of this module, you will be able to:

* **Explain core relational concepts**: tables, rows, columns, primary/foreign keys, and relationships.
* **Write idiomatic SELECT queries**: projection, filtering, sorting, pagination, and basic expressions.
* **Join multiple tables**: use INNER/LEFT joins correctly and reason about when each is appropriate.
* **Aggregate data**: use GROUP BY, HAVING, and common aggregate functions to answer business questions.
* **Use window functions safely**: apply ranking, running totals, and moving averages without breaking correctness.
* **Design and query simple schemas**: model small domains (e.g., blog, e-commerce) and query them confidently.

### Module-Specific Learning Outcomes

#### Section 1.1 - Relational Database Fundamentals

* Understand the relational model and how it differs from file-based data storage
* Identify and explain the purpose of primary keys, foreign keys, and their role in data integrity
* Recognize common data types in PostgreSQL and know when to use each
* Apply constraints (UNIQUE, NOT NULL, CHECK) to enforce data quality rules
* Explain referential integrity and how foreign key constraints prevent orphaned records

#### Section 1.2 - Query Construction and Filtering

* Construct basic SELECT queries with proper column selection and aliases
* Apply WHERE clauses with comparison operators (=, !=, \<, >, \<=, >=)
* Combine multiple conditions using AND, OR, and NOT operators
* Use special operators (IN, BETWEEN, LIKE, ILIKE) for pattern matching
* Implement result ordering (ORDER BY) and pagination (LIMIT, OFFSET)
* Apply performance best practices like avoiding SELECT \* in production code

#### Section 1.3 - Data Aggregation and Analysis

* Utilize aggregate functions (COUNT, SUM, AVG, MIN, MAX) appropriately
* Group data using GROUP BY with single and multiple columns
* Filter groups using HAVING clauses after aggregation
* Combine WHERE and HAVING clauses in complex analytical queries
* Interpret aggregated results in business context

#### Section 1.4 - Multi-Table Operations

* Differentiate between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN
* Apply appropriate JOIN types based on desired result set behavior
* Handle NULL values that result from unmatched JOINs
* Execute multiple JOINs across several tables in a single query
* Use self-JOINs for hierarchical data and comparative analysis
* Construct complex queries involving multiple table relationships

#### Section 1.5 - Advanced Query Techniques

* Create and utilize subqueries in WHERE, FROM, and SELECT clauses
* Implement correlated subqueries that reference outer query context
* Apply EXISTS and NOT EXISTS for existence checking
* Understand query execution order and performance implications
* Troubleshoot common query construction errors and misconceptions

***

## Hands-On Lab 1: Query Basics and Filtering

> **Goal**: Master basic SELECT queries, filtering, and result manipulation with realistic data.

> 1. Create a `lab_products` table:
>    ```sql theme={null}
>    CREATE TABLE lab_products (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL,
>      category TEXT NOT NULL,
>      price DECIMAL(10,2) NOT NULL,
>      stock_quantity INTEGER NOT NULL,
>      supplier TEXT NOT NULL,
>      created_at TIMESTAMP DEFAULT NOW()
>    );
>    ```
>
> 2. Insert at least 15 diverse products across 4-5 categories (e.g., Electronics, Books, Clothing, Home, Sports) with varying prices and stock levels:
>    * Electronics: Laptops, phones, tablets, headphones
>    * Books: Fiction, non-fiction, textbooks
>    * Clothing: shirts, pants, jackets
>    * Home: kitchen items, furniture, decor
>    * Sports: equipment, clothing, accessories
>
> 3. Write queries to:
>    * List all electronics priced under \$500, sorted by price (lowest first)
>    * Find all products with low stock (less than 10 units) in the 'Electronics' or 'Home' categories
>    * Show the 5 most expensive products, displaying only name and price
>    * Find all products with 'pro' or 'max' in their name (case insensitive)
>    * List products that cost between $25 and $200, ordered by category then price
>    * Show products that have no stock (stock\_quantity = 0)
>
> 4. Add a few products with special characters in names or unusual pricing, then run your queries again to ensure they handle edge cases properly.
>
> 5. Practice using LIMIT and OFFSET to implement pagination: write a query that shows products 6-10 when sorted alphabetically by name.
>
> When you can write these queries from memory without referencing examples, you've mastered basic query construction.

***

## Hands-On Lab 2: Joins & Aggregations

> **Goal**: Connect multiple tables with JOINs and analyze data using aggregate functions.

> 1. Create three related tables representing an e-commerce scenario:
>    ```sql theme={null}
>    CREATE TABLE lab_customers (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL,
>      email TEXT UNIQUE NOT NULL,
>      city TEXT NOT NULL,
>      registration_date DATE DEFAULT CURRENT_DATE
>    );
>
>    CREATE TABLE lab_products (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL,
>      category TEXT NOT NULL,
>      price DECIMAL(10,2) NOT NULL,
>      cost DECIMAL(10,2) NOT NULL  -- For profit calculations
>    );
>
>    CREATE TABLE lab_orders (
>      id SERIAL PRIMARY KEY,
>      customer_id INTEGER NOT NULL REFERENCES lab_customers(id),
>      product_id INTEGER NOT NULL REFERENCES lab_products(id),
>      quantity INTEGER NOT NULL,
>      order_date DATE DEFAULT CURRENT_DATE,
>      status TEXT DEFAULT 'completed'  -- completed, pending, cancelled
>    );
>    ```
>
> 2. Populate the tables with realistic data:
>    * 8-10 customers from different cities
>    * 12-15 products across various categories
>    * 25-30 orders linking customers to products (some customers should have multiple orders, some products should be ordered multiple times)
>    * Include at least 2 customers who have never placed an order
>    * Include at least 2 products that have never been ordered
>
> 3. Write queries to:
>    * List all orders with customer name and product name
>    * Show total quantity ordered and total revenue per customer
>    * Find the top 3 products by total revenue generated (price \* quantity)
>    * Calculate profit margins per product category (revenue - cost)
>    * Find customers who have never placed an order using LEFT JOIN
>    * Identify products that have never been ordered
>    * Show monthly order trends (orders per month)
>
> 4. Advanced challenges:
>    * Create a summary showing each customer's favorite product category (category they spent the most on)
>    * Find the customer who spent the most money in each city
>    * Calculate conversion rates (percentage of products that have been ordered at least once)
>
> 5. Experiment with changing INNER JOINs to LEFT JOINs and observe how the result set changes when there are unmatched rows.
>
> When you can confidently write JOINs and aggregations to answer complex business questions, you're ready for advanced SQL techniques.

## 1.1 Introduction to Relational Databases

### What is a Relational Database? (The Big Picture)

Think of a relational database like a **digital filing cabinet** that's incredibly organized and can find information instantly. Unlike a spreadsheet where data might be scattered, a database stores data in structured **tables** that can be connected to each other.

**Real-World Analogy:**

* A **table** is like a spreadsheet sheet (e.g., "Users", "Orders", "Products")
* A **row** is like one entry in that spreadsheet (e.g., one user's information)
* A **column** is like a field/category (e.g., "name", "email", "age")
* A **relationship** connects tables (e.g., "Orders belong to Users")

**Why Use a Database Instead of Files/Spreadsheets?**

1. **Data Integrity**: Prevents invalid data (e.g., can't create an order for a user that doesn't exist)
2. **Concurrent Access**: Multiple people can read/write simultaneously without conflicts
3. **Query Power**: Find complex patterns across millions of records in milliseconds
4. **Relationships**: Link related data automatically (orders to users to products)
5. **ACID Guarantees**: Transactions ensure data consistency even if something crashes

**A practical way to think about it:** A spreadsheet is like a whiteboard -- anyone can write anything anywhere, and you are trusting everyone to follow the rules. A database is like a bank vault with rules enforced by the vault itself -- you physically cannot withdraw money that does not exist, and every transaction is logged and reversible.

A relational database organizes data into **tables** (relations) with **rows** (records) and **columns** (attributes). Tables can be linked through **relationships** using keys.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                          RELATIONAL MODEL                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  USERS TABLE                          ORDERS TABLE                      │
│  ┌────┬───────────┬─────────────┐    ┌────┬─────────┬────────┬───────┐ │
│  │ id │   name    │    email    │    │ id │ user_id │ total  │ status│ │
│  ├────┼───────────┼─────────────┤    ├────┼─────────┼────────┼───────┤ │
│  │  1 │ Alice     │ a@email.com │◄───│  1 │    1    │ 150.00 │ done  │ │
│  │  2 │ Bob       │ b@email.com │◄───│  2 │    1    │  75.50 │ done  │ │
│  │  3 │ Charlie   │ c@email.com │◄───│  3 │    2    │ 200.00 │pending│ │
│  └────┴───────────┴─────────────┘    └────┴─────────┴────────┴───────┘ │
│         ▲                                      │                        │
│         │          PRIMARY KEY                 │                        │
│         └──────────────────────────────────────┘                        │
│                     FOREIGN KEY                                         │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
```

### Key Concepts

<AccordionGroup>
  <Accordion title="Primary Key" icon="key">
    **What is it?** A column (or combination of columns) that uniquely identifies each row in a table.

    **Why do we need it?** Think of it like a Social Security Number or student ID. Every person has a unique one, and you can use it to find exactly that person. In databases, the primary key lets you:

    * Uniquely identify any row (no duplicates allowed!)
    * Link tables together (foreign keys reference primary keys)
    * Make queries faster (databases automatically index primary keys)

    **Real-World Example:**

    * In a library: Each book has a unique ISBN (primary key)
    * In a school: Each student has a unique student ID
    * In an e-commerce site: Each user has a unique user ID

    ```sql theme={null}
    CREATE TABLE users (
        id SERIAL PRIMARY KEY,  -- Auto-incrementing unique identifier
        email VARCHAR(255) UNIQUE NOT NULL,
        name VARCHAR(100) NOT NULL
    );
    ```

    **Breaking down `SERIAL PRIMARY KEY`:**

    * `SERIAL`: PostgreSQL automatically generates the next number (1, 2, 3, 4...)
    * `PRIMARY KEY`: This column is the unique identifier
    * You don't need to provide the ID when inserting - it's automatic!

    **Rules** (Why these matter):

    * **Must be unique**: If two rows had the same ID, how would you tell them apart?
    * **Cannot be NULL**: Every row MUST have an ID (you can't have "unknown" as an identifier)
    * **Should rarely change**: If you change an ID, all foreign keys pointing to it break!
    * **Often auto-incrementing**: Makes it easy - just insert data, ID is assigned automatically

    **Common Mistake:**

    ```sql theme={null}
    -- ❌ BAD: Using email as primary key
    CREATE TABLE users (
        email VARCHAR(255) PRIMARY KEY,  -- What if user changes email?
        name VARCHAR(100)
    );

    -- ✅ GOOD: Separate ID as primary key
    CREATE TABLE users (
        id SERIAL PRIMARY KEY,  -- Never changes
        email VARCHAR(255) UNIQUE NOT NULL,  -- Can change, but stays unique
        name VARCHAR(100)
    );
    ```
  </Accordion>

  <Accordion title="Foreign Key" icon="link">
    **What is it?** A column that references the primary key of another table, creating a relationship.

    **Why do we need it?** Think of it like a pointer or reference. It says "this order belongs to that user" by storing the user's ID.

    **Real-World Analogy:**

    * In a library system: A "loan" record has a `book_id` (foreign key) pointing to the "books" table
    * In a school: A "grade" record has a `student_id` (foreign key) pointing to the "students" table
    * In e-commerce: An "order" has a `user_id` (foreign key) pointing to the "users" table

    ```sql theme={null}
    CREATE TABLE orders (
        id SERIAL PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),  -- This is the foreign key!
        total DECIMAL(10, 2) NOT NULL,
        created_at TIMESTAMP DEFAULT NOW()
    );
    ```

    **Breaking down `REFERENCES users(id)`:**

    * `user_id`: The column in the orders table that stores the user's ID
    * `REFERENCES users(id)`: This tells PostgreSQL "user\_id must exist in the users table's id column"
    * `NOT NULL`: Every order MUST belong to a user (can't be orphaned)

    **What happens with foreign keys?**

    1. **Prevents Invalid Data:**

    ```sql theme={null}
    -- ❌ This will FAIL:
    INSERT INTO orders (user_id, total) VALUES (99999, 100.00);
    -- Error: insert or update on table "orders" violates foreign key constraint
    -- Because user_id 99999 doesn't exist in users table!

    -- ✅ This works:
    INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com') RETURNING id;
    -- Returns: id = 1
    INSERT INTO orders (user_id, total) VALUES (1, 100.00);  -- Success!
    ```

    2. **Prevents Orphaned Records:**

    ```sql theme={null}
    -- If you try to delete a user who has orders:
    DELETE FROM users WHERE id = 1;
    -- Error: update or delete on table "users" violates foreign key constraint
    -- PostgreSQL protects you from creating "orphaned" orders with no user!

    -- You must delete orders first, or use CASCADE:
    CREATE TABLE orders (
        user_id INTEGER REFERENCES users(id) ON DELETE CASCADE
        -- Now deleting a user automatically deletes their orders
    );
    ```

    3. **Makes Queries Powerful:**

    ```sql theme={null}
    -- Join tables using the foreign key relationship:
    SELECT u.name, o.total, o.created_at
    FROM users u
    JOIN orders o ON u.id = o.user_id;  -- The foreign key makes this join possible!
    ```

    **Benefits** (Why this matters):

    * **Enforces referential integrity**: Can't create an order for a non-existent user
    * **Prevents orphaned records**: Can't delete a user if they have orders (unless you handle it)
    * **Documents relationships**: The schema itself shows how tables connect
    * **Enables JOINs**: Foreign keys make it easy to combine related data

    **Common Patterns:**

    ```sql theme={null}
    -- One-to-Many: One user has many orders
    users (1) ──< (many) orders

    -- Many-to-Many: Students enroll in courses
    students (many) ──< enrollments >── (many) courses
    -- Need a junction table with foreign keys to both!
    ```
  </Accordion>

  <Accordion title="Data Types" icon="shapes">
    Common PostgreSQL data types:

    | Type           | Description               | Example                |
    | -------------- | ------------------------- | ---------------------- |
    | `INTEGER`      | Whole numbers             | `42`, `-100`           |
    | `BIGINT`       | Large whole numbers       | User IDs at scale      |
    | `SERIAL`       | Auto-incrementing integer | Primary keys           |
    | `VARCHAR(n)`   | Variable-length string    | Names, emails          |
    | `TEXT`         | Unlimited length string   | Descriptions           |
    | `DECIMAL(p,s)` | Exact numeric             | Money: `DECIMAL(10,2)` |
    | `BOOLEAN`      | True/False                | `is_active`            |
    | `TIMESTAMP`    | Date and time             | `created_at`           |
    | `TIMESTAMPTZ`  | Timestamp with timezone   | Always use for times   |
    | `JSONB`        | Binary JSON               | Flexible data          |
    | `UUID`         | Universally unique ID     | Distributed systems    |
  </Accordion>
</AccordionGroup>

***

### 1.2 Basic SELECT Queries

> **Hands-On Lab 1 – Query the users table**\
> Goal: Get comfortable with `SELECT`, `WHERE`, `ORDER BY`, and `LIMIT`.
>
> 1. Create a `users` table:
>    ```sql theme={null}
>    CREATE TABLE lab_users (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL,
>      email TEXT NOT NULL UNIQUE,
>      age INT NOT NULL,
>      country TEXT NOT NULL
>    );
>    ```
> 2. Insert at least 10 realistic rows covering multiple ages and countries.
> 3. Write queries to:
>    * List all users ordered by newest `id` first.
>    * Find all users older than 25.
>    * Find all users from a specific country (e.g. 'US').
>    * Return only `name` and `email` for users younger than 21.
> 4. Add one or two edge cases (very old age, unusual country) and re-run your queries to make sure they still behave as expected.
>
> When you can write these queries *without* looking at the examples, you’re done with this lab.

**What is SELECT?** The `SELECT` statement is how you ask the database questions. It's like saying "Show me..." or "Find me...".

**The Basic Structure:**

```sql theme={null}
SELECT [what columns] FROM [which table] WHERE [what conditions];
```

**Think of it like a question:**

* "SELECT name, email" = "Show me the name and email"
* "FROM users" = "from the users table"
* "WHERE age > 18" = "where the age is greater than 18"

**Step-by-Step: Your First Query**

Let's say you have a users table:

```
users table:
┌────┬──────────┬─────────────────────┬─────┐
│ id │   name   │       email         │ age │
├────┼──────────┼─────────────────────┼─────┤
│  1 │ Alice    │ alice@example.com   │  25 │
│  2 │ Bob      │ bob@example.com    │  30 │
│  3 │ Charlie  │ charlie@example.com │  18 │
└────┴──────────┴─────────────────────┴─────┘
```

**Query 1: Get everything**

```sql theme={null}
SELECT * FROM users;
```

**What happens:**

1. Database looks at the `users` table
2. `*` means "all columns"
3. Returns all rows with all columns

**Result:**

```
┌────┬──────────┬─────────────────────┬─────┐
│ id │   name   │       email         │ age │
├────┼──────────┼─────────────────────┼─────┤
│  1 │ Alice    │ alice@example.com   │  25 │
│  2 │ Bob      │ bob@example.com    │  30 │
│  3 │ Charlie  │ charlie@example.com │  18 │
└────┴──────────┴─────────────────────┴─────┘
```

**Query 2: Get specific columns**

```sql theme={null}
SELECT name, email FROM users;
```

**What happens:**

1. Database looks at the `users` table
2. Only selects `name` and `email` columns
3. Returns all rows, but only those two columns

**Result:**

```
┌──────────┬─────────────────────┐
│   name   │       email         │
├──────────┼─────────────────────┤
│ Alice    │ alice@example.com   │
│ Bob      │ bob@example.com    │
│ Charlie  │ charlie@example.com │
└──────────┴─────────────────────┘
```

**Why specify columns instead of `*`?**

1. **Performance**: Fetching only what you need is faster
2. **Clarity**: Makes it obvious what data you're using
3. **Safety**: If someone adds a huge column later, `SELECT *` would fetch it unnecessarily
4. **Index optimization**: Some queries can use "index-only scans" if you only select indexed columns

The `SELECT` statement retrieves data from tables.

```sql theme={null}
-- Select all columns from users table
SELECT * FROM users;

-- Select specific columns
SELECT name, email FROM users;

-- Select with column alias
SELECT 
    name AS customer_name,
    email AS contact_email
FROM users;
```

<Warning>
  **Avoid `SELECT *` in production code!** It fetches unnecessary data, breaks when schema changes, and prevents index-only scans. Always specify the columns you need.
</Warning>

### Filtering with WHERE: Finding Specific Data

**What is WHERE?** The `WHERE` clause is like a filter. It says "only show me rows that match these conditions."

**Real-World Analogy:**

* Without WHERE: "Show me all books in the library" (thousands of books!)
* With WHERE: "Show me all books published after 2020" (much smaller, relevant list)

**How WHERE Works:**

1. Database looks at each row in the table
2. Checks if the row matches your WHERE condition
3. Only returns rows where the condition is TRUE
4. Skips rows where the condition is FALSE or NULL

**Step-by-Step Example:**

Starting data:

```
users table:
┌────┬──────────┬─────────────────────┬─────┐
│ id │   name   │       email         │ age │
├────┼──────────┼─────────────────────┼─────┤
│  1 │ Alice    │ alice@example.com   │  25 │
│  2 │ Bob      │ bob@example.com    │  30 │
│  3 │ Charlie  │ charlie@example.com │  18 │
│  4 │ Diana    │ diana@example.com  │  22 │
└────┴──────────┴─────────────────────┴─────┘
```

**Query: Find users older than 21**

```sql theme={null}
SELECT * FROM users WHERE age > 21;
```

**What the database does:**

1. Look at row 1: age = 25, is 25 > 21? YES → Include this row
2. Look at row 2: age = 30, is 30 > 21? YES → Include this row
3. Look at row 3: age = 18, is 18 > 21? NO → Skip this row
4. Look at row 4: age = 22, is 22 > 21? YES → Include this row

**Result:**

```
┌────┬──────────┬─────────────────────┬─────┐
│ id │   name   │       email         │ age │
├────┼──────────┼─────────────────────┼─────┤
│  1 │ Alice    │ alice@example.com   │  25 │
│  2 │ Bob      │ bob@example.com    │  30 │
│  4 │ Diana    │ diana@example.com  │  22 │
└────┴──────────┴─────────────────────┴─────┘
```

**Important: WHERE vs HAVING**

* **WHERE**: Filters rows BEFORE grouping/aggregation
* **HAVING**: Filters groups AFTER grouping/aggregation
* You'll learn HAVING later, but remember: WHERE comes first!

The `WHERE` clause filters rows based on conditions.

```sql theme={null}
-- Basic equality
SELECT * FROM users WHERE id = 1;

-- String matching
SELECT * FROM users WHERE email = 'alice@example.com';

-- Comparison operators
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE stock <= 10;

-- Not equal
SELECT * FROM users WHERE status != 'deleted';
SELECT * FROM users WHERE status <> 'deleted';  -- Same as !=
```

### Multiple Conditions

Combine conditions with `AND`, `OR`, and `NOT`.

```sql theme={null}
-- AND: Both conditions must be true
SELECT * FROM products 
WHERE category = 'electronics' 
  AND price < 500;

-- OR: At least one condition must be true
SELECT * FROM products 
WHERE category = 'electronics' 
   OR category = 'computers';

-- NOT: Negates a condition
SELECT * FROM users 
WHERE NOT status = 'banned';

-- Complex combinations (use parentheses!)
SELECT * FROM products 
WHERE (category = 'electronics' OR category = 'computers')
  AND price BETWEEN 100 AND 1000
  AND stock > 0;
```

### Special Operators

```sql theme={null}
-- IN: Match any value in a list
SELECT * FROM orders 
WHERE status IN ('pending', 'processing', 'shipped');

-- BETWEEN: Range inclusive
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';

-- LIKE: Pattern matching
SELECT * FROM users WHERE email LIKE '%@gmail.com';     -- Ends with
SELECT * FROM users WHERE name LIKE 'John%';            -- Starts with
SELECT * FROM users WHERE name LIKE '%son%';            -- Contains

-- ILIKE: Case-insensitive LIKE (PostgreSQL)
SELECT * FROM users WHERE name ILIKE '%john%';

-- IS NULL / IS NOT NULL
SELECT * FROM orders WHERE shipped_at IS NULL;          -- Not yet shipped
SELECT * FROM users WHERE phone IS NOT NULL;            -- Has phone number
```

### Sorting Results

```sql theme={null}
-- Sort ascending (default)
SELECT * FROM products ORDER BY price;
SELECT * FROM products ORDER BY price ASC;

-- Sort descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple sort columns
SELECT * FROM products 
ORDER BY category ASC, price DESC;

-- Sort with NULL handling
SELECT * FROM products 
ORDER BY discount NULLS LAST;  -- NULLs at the end
```

### Limiting Results

```sql theme={null}
-- Get first N rows
SELECT * FROM products ORDER BY created_at DESC LIMIT 10;

-- Pagination with OFFSET
SELECT * FROM products 
ORDER BY id 
LIMIT 20 OFFSET 40;  -- Page 3 (rows 41-60)

-- Better pagination with keyset (cursor-based)
SELECT * FROM products 
WHERE id > 1000      -- Last seen ID
ORDER BY id 
LIMIT 20;
```

<Tip>
  **Performance Tip**: `OFFSET` becomes slow for large values because the database must scan and discard all skipped rows. Use keyset pagination for better performance on large datasets.
</Tip>

***

## 1.3 Aggregations and Grouping

Aggregate functions compute a single result from multiple rows.

### Aggregate Functions

```sql theme={null}
-- COUNT: Number of rows
SELECT COUNT(*) FROM orders;                    -- All rows
SELECT COUNT(shipped_at) FROM orders;           -- Non-NULL values only
SELECT COUNT(DISTINCT user_id) FROM orders;     -- Unique users

-- SUM: Total of numeric values
SELECT SUM(total) FROM orders WHERE status = 'completed';

-- AVG: Average
SELECT AVG(price) FROM products WHERE category = 'electronics';

-- MIN / MAX
SELECT MIN(price), MAX(price) FROM products;

-- Combined
SELECT 
    COUNT(*) AS total_orders,
    SUM(total) AS revenue,
    AVG(total) AS avg_order_value,
    MIN(total) AS smallest_order,
    MAX(total) AS largest_order
FROM orders
WHERE status = 'completed';
```

### GROUP BY

Group rows by column values and apply aggregates to each group.

```sql theme={null}
-- Orders per user
SELECT 
    user_id,
    COUNT(*) AS order_count,
    SUM(total) AS total_spent
FROM orders
GROUP BY user_id;

-- Sales by category
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS avg_price,
    SUM(stock) AS total_inventory
FROM products
GROUP BY category;

-- Multiple grouping columns
SELECT 
    EXTRACT(YEAR FROM created_at) AS year,
    EXTRACT(MONTH FROM created_at) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
GROUP BY 
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
ORDER BY year, month;
```

### HAVING: Filter Groups

`WHERE` filters rows before grouping. `HAVING` filters groups after aggregation.

```sql theme={null}
-- Find users with more than 5 orders
SELECT 
    user_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;

-- Categories with average price over $100
SELECT 
    category,
    AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 100
ORDER BY avg_price DESC;

-- Combined WHERE and HAVING
SELECT 
    user_id,
    SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'           -- Filter rows first
GROUP BY user_id
HAVING SUM(total) > 1000             -- Then filter groups
ORDER BY total_spent DESC;
```

***

## 1.4 JOINs Mastery: Combining Data from Multiple Tables

### Why Do We Need JOINs?

**The Problem:** Data is split across multiple tables for organization, but you often need to see it together.

**Real-World Example:**

* You have a `users` table with names and emails
* You have an `orders` table with order totals and dates
* You want to see: "Show me all orders with the customer's name"

**Without JOINs:** You'd have to:

1. Query orders table → get user\_id and total
2. Query users table → get name for each user\_id
3. Manually match them in your application code (slow, error-prone!)

**With JOINs:** One query gets everything, matched automatically!

**The Concept:**
JOINs combine rows from multiple tables based on related columns (usually foreign keys). Think of it like a VLOOKUP in Excel, but much more powerful.

### Understanding JOIN Types: A Visual Guide

Before diving in, here's the key difference:

```
INNER JOIN:  "Show me only matches" (intersection)
LEFT JOIN:   "Show me all from left table, plus matches from right"
RIGHT JOIN:  "Show me all from right table, plus matches from left"
FULL JOIN:   "Show me everything from both tables"
```

### INNER JOIN: The Most Common Join

**What it does:** Returns only rows that have matches in BOTH tables.

**Real-World Analogy:** Like a Venn diagram intersection. Only show things that exist in both sets.

**Step-by-Step Example:**

**Starting Data:**

Users table:

```
┌────┬──────────┬─────────────────────┐
│ id │   name   │       email         │
├────┼──────────┼─────────────────────┤
│  1 │ Alice    │ alice@example.com   │
│  2 │ Bob      │ bob@example.com    │
│  3 │ Charlie  │ charlie@example.com│
└────┴──────────┴─────────────────────┘
```

Orders table:

```
┌────┬─────────┬────────┬──────────────┐
│ id │ user_id │ total  │   status    │
├────┼─────────┼────────┼──────────────┤
│  1 │    1    │ 150.00 │ completed   │
│  2 │    1    │  75.50 │ completed   │
│  3 │    2    │ 200.00 │ pending     │
└────┴─────────┴────────┴──────────────┘
```

**Query:**

```sql theme={null}
SELECT 
    o.id AS order_id,
    o.total,
    o.created_at,
    u.name AS customer_name,
    u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.id;
```

**What happens step-by-step:**

1. **Database starts with the `orders` table** (the "left" table in the JOIN)
2. **For each order row:**
   * Order 1: user\_id = 1 → Look for user with id = 1 → Found Alice! → Include this match
   * Order 2: user\_id = 1 → Look for user with id = 1 → Found Alice! → Include this match
   * Order 3: user\_id = 2 → Look for user with id = 2 → Found Bob! → Include this match
3. **Result:** Only orders that have matching users (all 3 orders in this case)

**Result:**

```
┌──────────┬────────┬──────────────┬──────────────┬─────────────────────┐
│ order_id │ total  │ created_at   │customer_name │       email         │
├──────────┼────────┼──────────────┼──────────────┼─────────────────────┤
│    1     │ 150.00 │ 2024-01-15   │ Alice        │ alice@example.com   │
│    2     │  75.50 │ 2024-01-16   │ Alice        │ alice@example.com   │
│    3     │ 200.00 │ 2024-01-17   │ Bob          │ bob@example.com    │
└──────────┴────────┴──────────────┴──────────────┴─────────────────────┘
```

**Notice:** Charlie doesn't appear because they have no orders. INNER JOIN only shows matches!

**When to use INNER JOIN:**

* When you only want rows that have matches in both tables
* Most common join type (90% of joins are INNER)
* Example: "Show me orders with customer info" (only orders that have customers)

Returns only rows that have matches in both tables.

```sql theme={null}
-- Get order details with user names
SELECT 
    o.id AS order_id,
    o.total,
    o.created_at,
    u.name AS customer_name,
    u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.id;
```

```
┌─────────────────────────────────────────────────────────────────────┐
│                           INNER JOIN                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   USERS                    ORDERS                 RESULT            │
│   ┌───┬───────┐           ┌───┬─────────┐       ┌───┬───────┬────┐ │
│   │ 1 │ Alice │ ─────────▶│ 1 │ user: 1 │──────▶│ 1 │ Alice │ O1 │ │
│   │ 2 │ Bob   │ ─────────▶│ 2 │ user: 1 │──────▶│ 2 │ Alice │ O2 │ │
│   │ 3 │ Carol │ ✗         │ 3 │ user: 2 │──────▶│ 3 │ Bob   │ O3 │ │
│   └───┴───────┘           └───┴─────────┘       └───┴───────┴────┘ │
│                                                                     │
│   Carol has no orders ──▶ Not in result                            │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### LEFT JOIN (LEFT OUTER JOIN): Keep Everything from the Left

**What it does:** Returns ALL rows from the left table, plus matching rows from the right table. If there's no match, the right table columns are NULL.

**Real-World Analogy:** Like a report card. You list ALL students (left table), and show their grades (right table) if they have any. If a student has no grades yet, you still list them but show "No grades" (NULL).

**Why "LEFT"?** The table mentioned first (before JOIN) is the "left" table. The table after JOIN is the "right" table.

**Step-by-Step Example:**

**Starting Data:**

Users table:

```
┌────┬──────────┬─────────────────────┐
│ id │   name   │       email         │
├────┼──────────┼─────────────────────┤
│  1 │ Alice    │ alice@example.com   │
│  2 │ Bob      │ bob@example.com    │
│  3 │ Charlie  │ charlie@example.com│  ← Has no orders!
└────┴──────────┴─────────────────────┘
```

Orders table:

```
┌────┬─────────┬────────┬──────────────┐
│ id │ user_id │ total  │   status    │
├────┼─────────┼────────┼──────────────┤
│  1 │    1    │ 150.00 │ completed   │
│  2 │    1    │  75.50 │ completed   │
│  3 │    2    │ 200.00 │ pending     │
└────┴─────────┴────────┴──────────────┘
```

**Query:**

```sql theme={null}
SELECT 
    u.id,
    u.name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```

**What happens step-by-step:**

1. **Database starts with ALL users** (the left table)
2. **For each user:**
   * Alice (id=1): Find orders with user\_id=1 → Found 2 orders → Include with order data
   * Bob (id=2): Find orders with user\_id=2 → Found 1 order → Include with order data
   * Charlie (id=3): Find orders with user\_id=3 → Found 0 orders → Still include, but order columns are NULL
3. **Result:** All users appear, even if they have no orders!

**Result:**

```
┌────┬──────────┬─────────────┬─────────────┐
│ id │   name   │ order_count │ total_spent │
├────┼──────────┼─────────────┼─────────────┤
│  1 │ Alice    │      2      │   225.50    │
│  2 │ Bob      │      1      │   200.00    │
│  3 │ Charlie  │      0      │     0.00    │  ← Still appears!
└────┴──────────┴─────────────┴─────────────┘
```

**Key Points:**

* `COALESCE(SUM(o.total), 0)`: Converts NULL to 0 (SUM returns NULL when there are no rows)
* `COUNT(o.id)`: Counts 0 for users with no orders (COUNT counts non-NULL values)
* Charlie appears even though they have no orders!

**When to use LEFT JOIN:**

* "Show me all users and their order counts" (including users with 0 orders)
* "List all products and their sales" (including products that never sold)
* "Find customers who haven't placed orders" (use LEFT JOIN + WHERE right\_table.id IS NULL)

**Common Pattern: Finding Missing Relationships**

```sql theme={null}
-- Find users who have never placed an order
SELECT u.name, u.email
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;  -- No matching order found!
```

Returns all rows from the left table, with matching rows from the right (or NULL if no match).

```sql theme={null}
-- All users, with their orders (if any)
SELECT 
    u.id,
    u.name,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```

```
┌─────────────────────────────────────────────────────────────────────┐
│                           LEFT JOIN                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   USERS                    ORDERS                 RESULT            │
│   ┌───┬───────┐           ┌───┬─────────┐       ┌───┬───────┬────┐ │
│   │ 1 │ Alice │ ─────────▶│ 1 │ user: 1 │──────▶│ 1 │ Alice │ O1 │ │
│   │ 2 │ Bob   │ ─────────▶│ 2 │ user: 1 │──────▶│ 2 │ Alice │ O2 │ │
│   │ 3 │ Carol │ ─────────▶│ 3 │ user: 2 │──────▶│ 3 │ Bob   │ O3 │ │
│   └───┴───────┘           └───┴─────────┘       │ 4 │ Carol │NULL│ │
│                                                  └───┴───────┴────┘ │
│                                                                     │
│   Carol included with NULL for order data                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### RIGHT JOIN and FULL OUTER JOIN

```sql theme={null}
-- RIGHT JOIN: All rows from right table
SELECT u.name, o.id AS order_id
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;

-- FULL OUTER JOIN: All rows from both tables
SELECT u.name, o.id AS order_id
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;
```

### Multiple JOINs

```sql theme={null}
-- Orders with user info and product details
SELECT 
    o.id AS order_id,
    u.name AS customer,
    p.name AS product,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price AS line_total
FROM orders o
INNER JOIN users u ON o.user_id = u.id
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.created_at > '2024-01-01'
ORDER BY o.id, p.name;
```

### Self JOIN

Join a table to itself, useful for hierarchical or comparative data.

```sql theme={null}
-- Employees with their managers
SELECT 
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

-- Find products with the same category and similar prices
SELECT 
    p1.name AS product1,
    p2.name AS product2,
    p1.category,
    p1.price AS price1,
    p2.price AS price2
FROM products p1
INNER JOIN products p2 
    ON p1.category = p2.category 
    AND p1.id < p2.id  -- Avoid duplicates
    AND ABS(p1.price - p2.price) < 50;
```

### CROSS JOIN

Cartesian product: every row from first table paired with every row from second.

```sql theme={null}
-- All possible size/color combinations
SELECT s.name AS size, c.name AS color
FROM sizes s
CROSS JOIN colors c;

-- Generate date series with categories
SELECT 
    d.date,
    c.name AS category
FROM generate_series('2024-01-01', '2024-12-31', '1 day') AS d(date)
CROSS JOIN categories c;
```

***

## 1.5 Subqueries

A subquery is a query nested inside another query. Think of it like asking a question where the answer depends on the answer to another question: "Show me all users who have spent more than average" requires first answering "What is the average spend?" -- that inner question is the subquery.

### Subqueries in WHERE

```sql theme={null}
-- Users who have placed orders
-- The inner query produces a list of user IDs; the outer query filters against it
SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

-- Products priced above average
-- The inner query runs ONCE and returns a single scalar value
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);

-- Users with more than 5 orders
-- The inner query uses GROUP BY + HAVING to filter groups before returning IDs
SELECT * FROM users
WHERE id IN (
    SELECT user_id 
    FROM orders 
    GROUP BY user_id 
    HAVING COUNT(*) > 5
);
```

### Correlated Subqueries

A correlated subquery references the outer query -- which means the inner query re-executes once per outer row. This is conceptually like a nested `for` loop: for each row in the outer table, run the inner query with that row's values.

**Performance consideration:** Because correlated subqueries execute once per outer row, they can be slow on large tables. A correlated subquery against 100,000 outer rows runs the inner query 100,000 times. PostgreSQL's optimizer can sometimes flatten these into joins, but not always. If performance is an issue, consider rewriting with a JOIN or a window function.

```sql theme={null}
-- Products with price above their category average
-- For each product, the inner query computes the average for THAT product's category
SELECT p1.name, p1.price, p1.category
FROM products p1
WHERE p1.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category = p1.category  -- References outer query (p1.category)
);

-- Users with their latest order date
-- The scalar subquery in SELECT runs once per user row
SELECT 
    u.name,
    (SELECT MAX(created_at) FROM orders o WHERE o.user_id = u.id) AS last_order
FROM users u;
```

### EXISTS and NOT EXISTS

```sql theme={null}
-- Users who have placed at least one order
SELECT u.name, u.email
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.user_id = u.id
);

-- Products that have never been ordered
SELECT p.name, p.price
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);
```

<Tip>
  **Performance Tip**: `EXISTS` often performs better than `IN` for large subqueries because it can stop as soon as it finds one match.
</Tip>

### Subqueries in FROM (Derived Tables)

```sql theme={null}
-- Monthly revenue with growth calculation
SELECT 
    current_month.month,
    current_month.revenue,
    prev_month.revenue AS prev_revenue,
    current_month.revenue - COALESCE(prev_month.revenue, 0) AS growth
FROM (
    SELECT 
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', created_at)
) current_month
LEFT JOIN (
    SELECT 
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', created_at)
) prev_month ON current_month.month = prev_month.month + INTERVAL '1 month'
ORDER BY current_month.month;
```

***

## 1.6 Common Table Expressions (CTEs)

CTEs make complex queries readable by breaking them into named, logical steps.

### Basic CTE Syntax

```sql theme={null}
WITH monthly_revenue AS (
    SELECT 
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', created_at)
)
SELECT 
    month,
    revenue,
    AVG(revenue) OVER () AS avg_monthly_revenue,
    revenue - AVG(revenue) OVER () AS diff_from_avg
FROM monthly_revenue
ORDER BY month;
```

### Multiple CTEs

```sql theme={null}
WITH 
-- Step 1: Calculate user order stats
user_stats AS (
    SELECT 
        user_id,
        COUNT(*) AS order_count,
        SUM(total) AS total_spent,
        AVG(total) AS avg_order
    FROM orders
    WHERE status = 'completed'
    GROUP BY user_id
),

-- Step 2: Classify users into tiers
user_tiers AS (
    SELECT 
        user_id,
        order_count,
        total_spent,
        CASE 
            WHEN total_spent >= 10000 THEN 'platinum'
            WHEN total_spent >= 5000 THEN 'gold'
            WHEN total_spent >= 1000 THEN 'silver'
            ELSE 'bronze'
        END AS tier
    FROM user_stats
),

-- Step 3: Count users per tier
tier_summary AS (
    SELECT 
        tier,
        COUNT(*) AS user_count,
        SUM(total_spent) AS tier_revenue
    FROM user_tiers
    GROUP BY tier
)

-- Final result
SELECT 
    tier,
    user_count,
    tier_revenue,
    ROUND(100.0 * tier_revenue / SUM(tier_revenue) OVER (), 2) AS revenue_pct
FROM tier_summary
ORDER BY tier_revenue DESC;
```

### Recursive CTEs

For hierarchical data like org charts, categories, or graphs.

```sql theme={null}
-- Employee hierarchy (org chart)
WITH RECURSIVE org_chart AS (
    -- Base case: top-level employees (no manager)
    SELECT id, name, manager_id, 1 AS level, name::TEXT AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees with managers
    SELECT e.id, e.name, e.manager_id, oc.level + 1, oc.path || ' > ' || e.name
    FROM employees e
    INNER JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY path;

-- Category tree with all descendants
WITH RECURSIVE category_tree AS (
    SELECT id, name, parent_id, 0 AS depth, ARRAY[id] AS path
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT c.id, c.name, c.parent_id, ct.depth + 1, ct.path || c.id
    FROM categories c
    INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT 
    REPEAT('  ', depth) || name AS category_tree,
    depth,
    path
FROM category_tree
ORDER BY path;
```

***

## 1.7 Window Functions

Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row like aggregates.

### ROW\_NUMBER, RANK, DENSE\_RANK

```sql theme={null}
-- Rank products by price within each category
SELECT 
    name,
    category,
    price,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank
FROM products;
```

```
┌──────────────────────────────────────────────────────────────────────┐
│           ROW_NUMBER vs RANK vs DENSE_RANK                           │
├──────────────────────────────────────────────────────────────────────┤
│ Product    │ Price │ ROW_NUMBER │ RANK │ DENSE_RANK                  │
├────────────┼───────┼────────────┼──────┼────────────                  │
│ Laptop     │ 1200  │     1      │  1   │     1                        │
│ Desktop    │ 1000  │     2      │  2   │     2                        │
│ Tablet     │ 1000  │     3      │  2   │     2      ◄── Same price    │
│ Phone      │  800  │     4      │  4   │     3      ◄── RANK skips 3  │
│ Earbuds    │  100  │     5      │  5   │     4                        │
└──────────────────────────────────────────────────────────────────────┘
```

### LAG and LEAD

Access previous or next row values.

```sql theme={null}
-- Compare each order to the previous order for the same user
SELECT 
    user_id,
    id AS order_id,
    total,
    created_at,
    LAG(total) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_order_total,
    total - LAG(total) OVER (PARTITION BY user_id ORDER BY created_at) AS change,
    LEAD(total) OVER (PARTITION BY user_id ORDER BY created_at) AS next_order_total
FROM orders;

-- Calculate month-over-month revenue growth
WITH monthly_revenue AS (
    SELECT 
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', created_at)
)
SELECT 
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month)) / 
          LAG(revenue) OVER (ORDER BY month), 2) AS growth_pct
FROM monthly_revenue;
```

### Running Totals and Moving Averages

```sql theme={null}
-- Running total of sales
SELECT 
    created_at::date AS date,
    total,
    SUM(total) OVER (ORDER BY created_at) AS running_total
FROM orders;

-- 7-day moving average
SELECT 
    date,
    daily_revenue,
    AVG(daily_revenue) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM daily_sales;

-- Running total within each month
SELECT 
    created_at::date AS date,
    total,
    SUM(total) OVER (
        PARTITION BY DATE_TRUNC('month', created_at) 
        ORDER BY created_at
    ) AS monthly_running_total
FROM orders;
```

### FIRST\_VALUE, LAST\_VALUE, NTH\_VALUE

```sql theme={null}
-- Compare each product to the most expensive in its category
SELECT 
    name,
    category,
    price,
    FIRST_VALUE(name) OVER (
        PARTITION BY category 
        ORDER BY price DESC
    ) AS top_product,
    FIRST_VALUE(price) OVER (
        PARTITION BY category 
        ORDER BY price DESC
    ) AS top_price,
    ROUND(100.0 * price / FIRST_VALUE(price) OVER (
        PARTITION BY category 
        ORDER BY price DESC
    ), 2) AS pct_of_top
FROM products;
```

### Percentiles and Distribution

```sql theme={null}
-- Percentile ranks
SELECT 
    name,
    salary,
    PERCENT_RANK() OVER (ORDER BY salary) AS percentile,
    NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;

-- Median salary per department
SELECT DISTINCT
    department,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department) AS median_salary
FROM employees;
```

***

## Advanced Query Semantics and Patterns

### How SQL Actually Runs (Logical Evaluation Order)

SQL is **not** evaluated in the same order you write it. Understanding the **logical evaluation order** explains many "weird" rules (like why you can't use column aliases in `WHERE`).

**Typical written order**:

```sql theme={null}
SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
WINDOW ...        -- e.g., window clause in some dialects
ORDER BY ...
LIMIT ... OFFSET ...
```

**Logical evaluation order** (simplified):

1. `FROM` + `JOIN` (build row set)
2. `WHERE` (filter rows)
3. `GROUP BY` (form groups)
4. `HAVING` (filter groups)
5. Window functions (`OVER (...)`) are computed
6. `SELECT` (compute final columns/aliases)
7. `DISTINCT`
8. `ORDER BY`
9. `LIMIT` / `OFFSET`

**Key consequences**:

* `WHERE` **cannot see** column aliases defined in `SELECT` (they don’t exist yet), but `ORDER BY` **can**.
* You **cannot** use aggregates in `WHERE` (aggregation happens after grouping), but you **can** use them in `HAVING`.
* Window functions are computed **after** grouping but **before** the final `SELECT` projection.

```sql theme={null}
-- ❌ This fails: alias not visible in WHERE
SELECT
  price * quantity AS line_total
FROM order_items
WHERE line_total > 100;  -- ERROR

-- ✅ Use the expression directly or a subquery/CTE
SELECT
  price * quantity AS line_total
FROM order_items
WHERE price * quantity > 100;

-- ✅ Or wrap and filter outside
SELECT *
FROM (
  SELECT price * quantity AS line_total
  FROM order_items
) t
WHERE line_total > 100;
```

```sql theme={null}
-- ❌ Aggregates not allowed in WHERE
SELECT user_id, SUM(total) AS total_spent
FROM orders
WHERE SUM(total) > 1000;  -- ERROR

-- ✅ Use HAVING instead
SELECT user_id, SUM(total) AS total_spent
FROM orders
GROUP BY user_id
HAVING SUM(total) > 1000;
```

### Choosing Between JOINs, Subqueries, and CTEs

For many problems you can write equivalent queries in different shapes. You should understand the **trade-offs**:

* **JOIN**: Best when you truly need columns from multiple tables and clear relationships.
* **Subquery in `WHERE` / `EXISTS`**: Great for **existence checks** and filtering sets.
* **Derived table / CTE**: Best for **multi-step logic** where readability matters.

```sql theme={null}
-- 1) Join style: users who have placed orders
SELECT DISTINCT u.id, u.name
FROM users u
JOIN orders o ON o.user_id = u.id;

-- 2) Subquery IN style
SELECT id, name
FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

-- 3) EXISTS style (often best for large data)
SELECT id, name
FROM users u
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.user_id = u.id
);
```

**Practical guidance**:

* Prefer `EXISTS` when you only care **if** related rows exist (not how many).
* Prefer CTEs when your query has **multiple reusable steps** (e.g., compute user stats, then bucket into tiers, then summarize). Treat a CTE like a named temp view.

### Advanced Filtering Patterns (Production Query Shapes)

#### 1. Anti-joins: "Who has NOT done X?"

Anti-joins show entities that **lack** related rows.

```sql theme={null}
-- Customers who have never placed an order
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;

-- Same idea with NOT EXISTS
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.id
);
```

#### 2. Conditional Aggregation instead of CASE explosions

Instead of multiple joins or subqueries, you can use **conditional aggregates** to pivot by condition.

```sql theme={null}
-- Per-customer order stats broken down by status
SELECT
  user_id,
  COUNT(*)                             AS orders_total,
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS orders_completed,
  SUM(CASE WHEN status = 'pending'   THEN 1 ELSE 0 END) AS orders_pending,
  SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS orders_cancelled
FROM orders
GROUP BY user_id;
```

This pattern is heavily used in dashboards and analytics queries.

#### 3. Top-N Per Group (without subqueries)

A very common requirement: "Top 3 products **per category**", "Top 5 students **per class**", etc.

```sql theme={null}
-- Top 3 most expensive products in each category
WITH ranked AS (
  SELECT
    id,
    name,
    category,
    price,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY price DESC
    ) AS rn
  FROM products
)
SELECT id, name, category, price
FROM ranked
WHERE rn <= 3
ORDER BY category, price DESC;
```

This uses the **window + filter** pattern: rank within group using a window function, then filter by the rank.

### Common Interview Traps and How to Avoid Them

* **Trap 1: Mixing WHERE and HAVING incorrectly**
  * Use `WHERE` for row-level filters (no aggregates).
  * Use `HAVING` for group-level filters (aggregates allowed).
* **Trap 2: Assuming `GROUP BY` output is ordered**
  * SQL does **not** guarantee any order unless you use `ORDER BY`.
  * Always add an explicit `ORDER BY` if order matters.
* **Trap 3: Relying on vendor-specific behavior**
  * Some engines allow selecting non-grouped columns without aggregates — this is non-standard and can produce surprising results.
  * Prefer the strict rule: every selected, non-aggregated column must appear in `GROUP BY`.

```sql theme={null}
-- Non-portable pattern (allowed by some engines, but ambiguous)
SELECT country, name
FROM users
GROUP BY country;  -- What name should be picked?

-- Portable, deterministic pattern
SELECT country, COUNT(*) AS user_count
FROM users
GROUP BY country
ORDER BY user_count DESC;
```

***

## Advanced Query Techniques and Optimization

### Complex Subquery Optimization

Understanding how the database optimizer handles subqueries is critical for performance. Here are key optimization patterns:

1. **Correlated vs Non-Correlated Subqueries**:
   * Non-correlated (evaluated once): `SELECT * FROM users WHERE department_id IN (SELECT id FROM departments WHERE budget > 100000)`
   * Correlated (evaluated per row): `SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000)`

2. **Subquery Flattening**: Modern optimizers convert certain subqueries into JOINs automatically:
   ```sql theme={null}
   -- This subquery...
   SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

   -- May be flattened to this JOIN internally:
   SELECT DISTINCT u.* FROM users u INNER JOIN orders o ON u.id = o.user_id WHERE o.total > 100;
   ```

3. **Semi-Joins and Anti-Joins**: `IN` and `EXISTS` often become semi-joins, while `NOT IN` and `NOT EXISTS` become anti-joins. `EXISTS` is often more efficient than `IN` for existence checks.

### LATERAL Joins

LATERAL joins allow you to reference columns from preceding tables in the FROM clause, enabling correlated subqueries in a JOIN-like syntax:

```sql theme={null}
-- Get top 3 orders for each user
SELECT u.name, orders_subq.order_id, orders_subq.total
FROM users u
CROSS JOIN LATERAL (
    SELECT id AS order_id, total
    FROM orders o
    WHERE o.user_id = u.id
    ORDER BY total DESC
    LIMIT 3
) orders_subq;

-- Calculate derived values that depend on other table's data
SELECT p.name, p.category, avg_by_category.avg_price
FROM products p
CROSS JOIN LATERAL (
    SELECT AVG(price) AS avg_price
    FROM products p2
    WHERE p2.category = p.category
) avg_by_category;
```

LATERAL is particularly powerful for:

* Correlated aggregations
* Top-N per group queries
* Complex calculations that reference multiple tables
* Calling table functions with parameters from other tables

### Advanced Window Frame Specifications

Window functions offer sophisticated frame controls beyond basic `PARTITION BY`:

```sql theme={null}
-- ROWS frame: physical row offsets
SELECT 
    date,
    sales,
    AVG(sales) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- 7-day moving average
    ) AS weekly_avg
FROM daily_sales;

-- RANGE frame: logical value-based window
SELECT 
    employee_id,
    salary,
    COUNT(*) OVER (
        ORDER BY salary 
        RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING  -- employees with similar salary
    ) AS peers_in_salary_range
FROM employees;

-- Groups frame: treating tied values as a single unit
SELECT 
    product_id,
    price,
    RANK() OVER (ORDER BY price) AS rank,
    COUNT(*) OVER (
        ORDER BY price 
        GROUPS BETWEEN CURRENT ROW AND 2 FOLLOWING  -- count products in current and next 2 price groups
    ) AS products_in_current_and_next_groups
FROM products;
```

Frame specifications allow for:

* Running calculations with specific lookback periods
* Comparisons within value ranges
* Handling ties consistently in analytical computations
* Efficient computation of rolling statistics

***

## 1.8 Practice Exercises

### Exercise 1: Basic Queries

Using a products table with columns: `id`, `name`, `category`, `price`, `stock`, `created_at`:

```sql theme={null}
-- 1. Find all products in the 'Electronics' category priced under $500
-- 2. Find products with low stock (less than 10) that need reordering
-- 3. List the 5 most recently added products
-- 4. Find products with names containing 'Pro' or 'Ultra'
```

<Accordion title="Solutions">
  ```sql theme={null}
  -- 1
  SELECT * FROM products 
  WHERE category = 'Electronics' AND price < 500;

  -- 2
  SELECT name, stock, category FROM products 
  WHERE stock < 10 
  ORDER BY stock ASC;

  -- 3
  SELECT * FROM products 
  ORDER BY created_at DESC 
  LIMIT 5;

  -- 4
  SELECT * FROM products 
  WHERE name ILIKE '%pro%' OR name ILIKE '%ultra%';
  ```
</Accordion>

### Exercise 2: Aggregations

```sql theme={null}
-- 1. Count products per category
-- 2. Find the average price per category, only for categories with avg > $200
-- 3. Find the total inventory value (price * stock) per category
-- 4. Find categories with more than 50 products
```

<Accordion title="Solutions">
  ```sql theme={null}
  -- 1
  SELECT category, COUNT(*) as product_count 
  FROM products 
  GROUP BY category 
  ORDER BY product_count DESC;

  -- 2
  SELECT category, AVG(price) as avg_price 
  FROM products 
  GROUP BY category 
  HAVING AVG(price) > 200;

  -- 3
  SELECT category, SUM(price * stock) as inventory_value 
  FROM products 
  GROUP BY category 
  ORDER BY inventory_value DESC;

  -- 4
  SELECT category, COUNT(*) as count 
  FROM products 
  GROUP BY category 
  HAVING COUNT(*) > 50;
  ```
</Accordion>

### Exercise 3: JOINs

> **Hands-On Lab 2 – Joins & aggregations on e-commerce data**\
> Goal: Become comfortable joining tables and aggregating results.
>
> 1. Create three tables:
>    ```sql theme={null}
>    CREATE TABLE lab_customers (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL
>    );
>
>    CREATE TABLE lab_products (
>      id SERIAL PRIMARY KEY,
>      name TEXT NOT NULL,
>      price NUMERIC(10,2) NOT NULL
>    );
>
>    CREATE TABLE lab_orders (
>      id SERIAL PRIMARY KEY,
>      customer_id INT NOT NULL REFERENCES lab_customers(id),
>      product_id INT NOT NULL REFERENCES lab_products(id),
>      quantity INT NOT NULL,
>      created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
>    );
>    ```
> 2. Insert at least 5 customers, 5 products, and 20 orders with a mix of quantities.
> 3. Write queries to:
>    * List each order with customer name and product name.
>    * Show total quantity and total revenue per customer.
>    * Find the top 3 products by total revenue.
>    * Find customers who have never placed an order (add at least one such customer).
> 4. Experiment with turning your queries into `LEFT JOIN`s and see how the result set changes when there are missing rows.
>
> When you can write these joins and aggregations from memory, you’re ready for more advanced SQL.

```sql theme={null}
-- 1. List all orders with customer name and email
-- 2. Find customers who have never placed an order
-- 3. Get the top 10 customers by total spending
-- 4. List products that have been ordered more than 100 times
```

<Accordion title="Solutions">
  ```sql theme={null}
  -- 1
  SELECT o.id, o.total, o.created_at, u.name, u.email
  FROM orders o
  INNER JOIN users u ON o.user_id = u.id
  ORDER BY o.created_at DESC;

  -- 2
  SELECT u.name, u.email
  FROM users u
  LEFT JOIN orders o ON u.id = o.user_id
  WHERE o.id IS NULL;

  -- 3
  SELECT u.name, u.email, SUM(o.total) as total_spent
  FROM users u
  INNER JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.name, u.email
  ORDER BY total_spent DESC
  LIMIT 10;

  -- 4
  SELECT p.name, COUNT(oi.id) as times_ordered
  FROM products p
  INNER JOIN order_items oi ON p.id = oi.product_id
  GROUP BY p.id, p.name
  HAVING COUNT(oi.id) > 100
  ORDER BY times_ordered DESC;
  ```
</Accordion>

### Exercise 4: Window Functions

```sql theme={null}
-- 1. Rank products by price within each category
-- 2. For each order, show the running total for that user
-- 3. Calculate the percentage each product contributes to its category's total stock
-- 4. Find the second-highest priced product in each category
```

<Accordion title="Solutions">
  ```sql theme={null}
  -- 1
  SELECT 
      name, category, price,
      RANK() OVER (PARTITION BY category ORDER BY price DESC) as price_rank
  FROM products;

  -- 2
  SELECT 
      user_id, id as order_id, total, created_at,
      SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) as running_total
  FROM orders;

  -- 3
  SELECT 
      name, category, stock,
      ROUND(100.0 * stock / SUM(stock) OVER (PARTITION BY category), 2) as stock_pct
  FROM products;

  -- 4
  WITH ranked AS (
      SELECT 
          name, category, price,
          ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rn
      FROM products
  )
  SELECT name, category, price
  FROM ranked
  WHERE rn = 2;
  ```
</Accordion>

***

## Module Mastery Checklist

Complete this checklist to confirm you've mastered SQL Foundations:

### Relational Database Fundamentals

* [ ] Can explain the difference between relational databases and file-based storage
* [ ] Understand the purpose of primary keys and can identify them in any schema
* [ ] Know how foreign keys create relationships and enforce referential integrity
* [ ] Can distinguish between different PostgreSQL data types and their appropriate use cases
* [ ] Apply constraints (UNIQUE, NOT NULL, CHECK) to maintain data quality

### Query Construction and Filtering

* [ ] Write SELECT queries with proper column selection and aliases
* [ ] Apply WHERE clauses with comparison operators (`=` `!=` `<` `>` `<=` `>=`)
* [ ] Combine multiple conditions using AND, OR, and NOT operators
* [ ] Use special operators (IN, BETWEEN, LIKE, ILIKE) for advanced filtering
* [ ] Implement result ordering (ORDER BY) and pagination (LIMIT, OFFSET)
* [ ] Avoid performance pitfalls like SELECT \* in production code

### Data Aggregation and Analysis

* [ ] Use aggregate functions (COUNT, SUM, AVG, MIN, MAX) appropriately
* [ ] Group data using GROUP BY with single and multiple columns
* [ ] Filter groups using HAVING clauses after aggregation
* [ ] Combine WHERE and HAVING clauses in complex analytical queries
* [ ] Interpret aggregated results in business context

### Multi-Table Operations

* [ ] Differentiate between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN
* [ ] Apply appropriate JOIN types based on desired result set behavior
* [ ] Handle NULL values that result from unmatched JOINs
* [ ] Execute multiple JOINs across several tables in a single query
* [ ] Use self-JOINs for hierarchical data and comparative analysis
* [ ] Troubleshoot complex multi-table queries

### Advanced Query Techniques

* [ ] Create and utilize subqueries in WHERE, FROM, and SELECT clauses
* [ ] Implement correlated subqueries that reference outer query context
* [ ] Apply EXISTS and NOT EXISTS for existence checking
* [ ] Understand query execution order and performance implications
* [ ] Write basic window functions (ROW\_NUMBER, RANK, LAG/LEAD)
* [ ] Solve common interview problems involving SQL patterns

### Practical Application

* [ ] Complete both hands-on labs (Query Basics and Joins & Aggregations)
* [ ] Design and query a simple schema for a small domain (e.g., blog, e-commerce)
* [ ] Optimize queries for performance using appropriate indexes and techniques
* [ ] Troubleshoot common SQL errors and misconceptions

***

## Next Module

<Card title="Module 2: Database Design & Modeling" icon="arrow-right" href="/courses/database-engineering/database-design">
  Learn to design schemas that scale and maintain data integrity
</Card>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="Explain the difference between a correlated subquery and a JOIN. When would you choose one over the other, and what are the performance implications?">
    **Strong Answer:**

    * A correlated subquery executes once per row of the outer query -- it references a column from the outer query in its WHERE clause. A JOIN combines two result sets based on a condition and the planner can choose the most efficient algorithm (nested loop, hash, merge) for the full set.
    * In practice, PostgreSQL's optimizer can often flatten a correlated subquery into a JOIN, so the performance difference may be zero. But when it cannot (complex correlated EXISTS, LATERAL joins), the correlated subquery becomes effectively O(n \* m) where n is outer rows and m is the cost of the inner query per row.
    * I prefer EXISTS for semi-join semantics ("does at least one matching row exist?") because the optimizer can short-circuit after the first match. For example: `SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)` is typically faster than `SELECT DISTINCT u.* FROM users u JOIN orders o ON u.id = o.user_id` because EXISTS stops scanning orders after finding one match, while the JOIN+DISTINCT approach may produce and then deduplicate millions of intermediate rows.
    * LATERAL joins are the modern replacement for correlated subqueries in PostgreSQL. They give the planner more optimization freedom while preserving the "per-row" semantics when you genuinely need them.

    **Follow-up: You mentioned the planner can "flatten" subqueries. How does that work, and when does it fail?**

    PostgreSQL's planner uses a technique called subquery pull-up where it converts `IN (SELECT ...)` or `EXISTS (SELECT ...)` into a semi-join or anti-join node in the plan tree. This allows the planner to consider all join strategies. It fails when the subquery contains LIMIT, OFFSET, volatile functions, or aggregates that prevent safe flattening. You can verify by comparing `EXPLAIN` output for both forms -- if the plans are identical, the planner successfully flattened it.
  </Accordion>

  <Accordion title="A dashboard query uses OFFSET 50000 LIMIT 20 and is slow. How do you fix pagination performance on large result sets?">
    **Strong Answer:**

    * OFFSET-based pagination is fundamentally broken at scale because the database must scan and discard the first N rows before returning the next page. OFFSET 50000 means PostgreSQL reads 50020 rows and throws away 50000 of them. At OFFSET 1000000, this becomes catastrophically slow.
    * The fix is keyset pagination (also called cursor-based pagination). Instead of "give me page 2501", you say "give me the next 20 rows after the last row I saw." For example: `SELECT * FROM orders WHERE (created_at, id) < ('2024-06-15 10:30:00', 54321) ORDER BY created_at DESC, id DESC LIMIT 20`. This uses an index scan and reads exactly 20 rows regardless of how deep you are in the result set.
    * The tradeoff is that keyset pagination does not support "jump to page 500" -- it only supports forward/backward navigation. For most API and UI patterns this is acceptable (infinite scroll, "Load more" buttons). For admin dashboards that need page numbers, a hybrid approach works: use keyset for the actual query but maintain a count estimate via `pg_class.reltuples` for page number display.
    * Implementation detail: the composite cursor `(created_at, id)` requires a composite index `CREATE INDEX idx_orders_created_id ON orders(created_at DESC, id DESC)` for the index scan to work efficiently.

    **Follow-up: What if the sort order is on a non-unique column like status?**

    You still need a tiebreaker column to make the cursor deterministic. The pattern is always `(sort_column, unique_tiebreaker)`. For `ORDER BY status, created_at DESC`, the cursor becomes `WHERE (status, created_at, id) > ($last_status, $last_created_at, $last_id) ORDER BY status, created_at DESC, id DESC`. Without the tiebreaker, you risk skipping or duplicating rows when multiple rows share the same sort key value.
  </Accordion>

  <Accordion title="When would you use a CTE versus a derived table versus a temporary table? Walk through the tradeoffs.">
    **Strong Answer:**

    * **CTEs (WITH clauses)**: In PostgreSQL 12+, non-recursive CTEs are inlined by default -- the planner can push predicates into them and optimize across the boundary. They are essentially a readability tool in modern PostgreSQL unless you mark them as MATERIALIZED. Use CTEs for readability when a subquery is referenced once. Use `WITH ... AS MATERIALIZED` when you want to force a single evaluation of an expensive subquery referenced multiple times.
    * **Derived tables (subqueries in FROM)**: Functionally equivalent to inlined CTEs but harder to read for complex multi-step queries. The planner treats them identically. I use derived tables for simple one-off subqueries and CTEs for anything referenced more than once or where the query logic has clear stages.
    * **Temporary tables**: These are physically materialized -- the query runs, results are written to a temp table, and subsequent queries read from it. Use them when: (a) the intermediate result is huge and referenced many times, (b) you need indexes on the intermediate result, or (c) you are building a multi-step ETL pipeline within a session. The overhead is the materialization cost (disk I/O for large results). Always create them with `ON COMMIT DROP` to avoid session leaks.
    * The gotcha with CTEs before PostgreSQL 12 is that they were always materialized (the "optimization fence"). Legacy code from PG 11 and earlier may intentionally use CTEs as an optimization barrier -- be careful when upgrading, as the inlining behavior can change plan shapes.

    **Follow-up: Can you give a concrete example where materializing a CTE is actually faster than inlining it?**

    Yes. If a CTE computes an expensive aggregation and is referenced in both a JOIN and a WHERE clause of the outer query, inlining would compute the aggregation twice. For example: `WITH expensive AS MATERIALIZED (SELECT user_id, SUM(total) as lifetime_value FROM orders GROUP BY user_id) SELECT * FROM expensive e JOIN users u ON e.user_id = u.id WHERE e.lifetime_value > 10000`. Without MATERIALIZED, the planner might compute the aggregation separately for the join and the filter. With MATERIALIZED, it runs once and both references read from the materialized result.
  </Accordion>

  <Accordion title="Explain how window functions work internally and why they can be both powerful and dangerous.">
    **Strong Answer:**

    * Window functions operate on a "window" of rows defined by PARTITION BY and ORDER BY without collapsing the result set like GROUP BY does. Internally, PostgreSQL sorts the data according to the window specification, then iterates through the sorted set maintaining a "frame" (the subset of rows visible to the window function at each position).
    * The power: you can compute running totals, rankings, moving averages, and inter-row comparisons in a single pass without self-joins. `ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)` gives you the rank within each department without a correlated subquery.
    * The danger is memory and sort cost. Every distinct OVER clause with a different PARTITION BY or ORDER BY requires a separate sort of the entire result set. A query with three window functions using three different orderings forces three full sorts. On a 10M-row result, that is 30M rows of sort operations. Check `EXPLAIN ANALYZE` for `Sort Method: external merge Disk` -- that means you have blown past work\_mem and are spilling to disk.
    * Optimization: combine window functions that share the same PARTITION BY and ORDER BY into a single OVER clause definition. Use `WINDOW w AS (PARTITION BY ... ORDER BY ...)` and reference it: `SUM(amount) OVER w, ROW_NUMBER() OVER w`. This allows PostgreSQL to use a single sort for both functions.

    **Follow-up: What happens if you filter on the result of a window function in the WHERE clause?**

    You cannot reference a window function directly in WHERE because window functions are evaluated after WHERE in the logical query execution order (FROM, WHERE, GROUP BY, HAVING, window functions, SELECT, ORDER BY, LIMIT). To filter on a window function result, you must wrap it in a subquery or CTE: `SELECT * FROM (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM table) t WHERE rn = 1`. This is the standard pattern for "get the first/latest row per group."
  </Accordion>
</AccordionGroup>
