SQL (Structured Query Language) is the universal language for interacting with relational databases. This module builds your foundation from the ground up, ensuring you can write any query with confidence.
Estimated Time: 8-10 hours Hands-On: 30+ practice exercises Project: Build queries for an e-commerce analytics dashboard
Goal: Master basic SELECT queries, filtering, and result manipulation with realistic data.
Create a lab_products table:
Copy
CREATE TABLE lab_products ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, category TEXT NOT NULL, price DECIMAL(10,2) NOT NULL, stock_quantity INTEGER NOT NULL, supplier TEXT NOT NULL, created_at TIMESTAMP DEFAULT NOW());
Insert at least 15 diverse products across 4-5 categories (e.g., Electronics, Books, Clothing, Home, Sports) with varying prices and stock levels:
Electronics: Laptops, phones, tablets, headphones
Books: Fiction, non-fiction, textbooks
Clothing: shirts, pants, jackets
Home: kitchen items, furniture, decor
Sports: equipment, clothing, accessories
Write queries to:
List all electronics priced under $500, sorted by price (lowest first)
Find all products with low stock (less than 10 units) in the ‘Electronics’ or ‘Home’ categories
Show the 5 most expensive products, displaying only name and price
Find all products with ‘pro’ or ‘max’ in their name (case insensitive)
List products that cost between 25and200, ordered by category then price
Show products that have no stock (stock_quantity = 0)
Add a few products with special characters in names or unusual pricing, then run your queries again to ensure they handle edge cases properly.
Practice using LIMIT and OFFSET to implement pagination: write a query that shows products 6-10 when sorted alphabetically by name.
When you can write these queries from memory without referencing examples, you’ve mastered basic query construction.
Goal: Connect multiple tables with JOINs and analyze data using aggregate functions.
Create three related tables representing an e-commerce scenario:
Copy
CREATE TABLE lab_customers ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE NOT NULL, city TEXT NOT NULL, registration_date DATE DEFAULT CURRENT_DATE);CREATE TABLE lab_products ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, category TEXT NOT NULL, price DECIMAL(10,2) NOT NULL, cost DECIMAL(10,2) NOT NULL -- For profit calculations);CREATE TABLE lab_orders ( id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL REFERENCES lab_customers(id), product_id INTEGER NOT NULL REFERENCES lab_products(id), quantity INTEGER NOT NULL, order_date DATE DEFAULT CURRENT_DATE, status TEXT DEFAULT 'completed' -- completed, pending, cancelled);
Populate the tables with realistic data:
8-10 customers from different cities
12-15 products across various categories
25-30 orders linking customers to products (some customers should have multiple orders, some products should be ordered multiple times)
Include at least 2 customers who have never placed an order
Include at least 2 products that have never been ordered
Write queries to:
List all orders with customer name and product name
Show total quantity ordered and total revenue per customer
Find the top 3 products by total revenue generated (price * quantity)
Calculate profit margins per product category (revenue - cost)
Find customers who have never placed an order using LEFT JOIN
Identify products that have never been ordered
Show monthly order trends (orders per month)
Advanced challenges:
Create a summary showing each customer’s favorite product category (category they spent the most on)
Find the customer who spent the most money in each city
Calculate conversion rates (percentage of products that have been ordered at least once)
Experiment with changing INNER JOINs to LEFT JOINs and observe how the result set changes when there are unmatched rows.
When you can confidently write JOINs and aggregations to answer complex business questions, you’re ready for advanced SQL techniques.
Think of a relational database like a digital filing cabinet that’s incredibly organized and can find information instantly. Unlike a spreadsheet where data might be scattered, a database stores data in structured tables that can be connected to each other.Real-World Analogy:
A table is like a spreadsheet sheet (e.g., “Users”, “Orders”, “Products”)
A row is like one entry in that spreadsheet (e.g., one user’s information)
A column is like a field/category (e.g., “name”, “email”, “age”)
A relationship connects tables (e.g., “Orders belong to Users”)
Why Use a Database Instead of Files/Spreadsheets?
Data Integrity: Prevents invalid data (e.g., can’t create an order for a user that doesn’t exist)
Concurrent Access: Multiple people can read/write simultaneously without conflicts
Query Power: Find complex patterns across millions of records in milliseconds
Relationships: Link related data automatically (orders → users → products)
ACID Guarantees: Transactions ensure data consistency even if something crashes
A relational database organizes data into tables (relations) with rows (records) and columns (attributes). Tables can be linked through relationships using keys.
What is it? A column (or combination of columns) that uniquely identifies each row in a table.Why do we need it? Think of it like a Social Security Number or student ID. Every person has a unique one, and you can use it to find exactly that person. In databases, the primary key lets you:
Uniquely identify any row (no duplicates allowed!)
Link tables together (foreign keys reference primary keys)
Make queries faster (databases automatically index primary keys)
Real-World Example:
In a library: Each book has a unique ISBN (primary key)
In a school: Each student has a unique student ID
In an e-commerce site: Each user has a unique user ID
Copy
CREATE TABLE users ( id SERIAL PRIMARY KEY, -- Auto-incrementing unique identifier email VARCHAR(255) UNIQUE NOT NULL, name VARCHAR(100) NOT NULL);
Breaking down SERIAL PRIMARY KEY:
SERIAL: PostgreSQL automatically generates the next number (1, 2, 3, 4…)
PRIMARY KEY: This column is the unique identifier
You don’t need to provide the ID when inserting - it’s automatic!
Rules (Why these matter):
Must be unique: If two rows had the same ID, how would you tell them apart?
Cannot be NULL: Every row MUST have an ID (you can’t have “unknown” as an identifier)
Should rarely change: If you change an ID, all foreign keys pointing to it break!
Often auto-incrementing: Makes it easy - just insert data, ID is assigned automatically
Common Mistake:
Copy
-- ❌ BAD: Using email as primary keyCREATE TABLE users ( email VARCHAR(255) PRIMARY KEY, -- What if user changes email? name VARCHAR(100));-- ✅ GOOD: Separate ID as primary keyCREATE TABLE users ( id SERIAL PRIMARY KEY, -- Never changes email VARCHAR(255) UNIQUE NOT NULL, -- Can change, but stays unique name VARCHAR(100));
Foreign Key
What is it? A column that references the primary key of another table, creating a relationship.Why do we need it? Think of it like a pointer or reference. It says “this order belongs to that user” by storing the user’s ID.Real-World Analogy:
In a library system: A “loan” record has a book_id (foreign key) pointing to the “books” table
In a school: A “grade” record has a student_id (foreign key) pointing to the “students” table
In e-commerce: An “order” has a user_id (foreign key) pointing to the “users” table
Copy
CREATE TABLE orders ( id SERIAL PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(id), -- This is the foreign key! total DECIMAL(10, 2) NOT NULL, created_at TIMESTAMP DEFAULT NOW());
Breaking down REFERENCES users(id):
user_id: The column in the orders table that stores the user’s ID
REFERENCES users(id): This tells PostgreSQL “user_id must exist in the users table’s id column”
NOT NULL: Every order MUST belong to a user (can’t be orphaned)
What happens with foreign keys?
Prevents Invalid Data:
Copy
-- ❌ This will FAIL:INSERT INTO orders (user_id, total) VALUES (99999, 100.00);-- Error: insert or update on table "orders" violates foreign key constraint-- Because user_id 99999 doesn't exist in users table!-- ✅ This works:INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com') RETURNING id;-- Returns: id = 1INSERT INTO orders (user_id, total) VALUES (1, 100.00); -- Success!
Prevents Orphaned Records:
Copy
-- If you try to delete a user who has orders:DELETE FROM users WHERE id = 1;-- Error: update or delete on table "users" violates foreign key constraint-- PostgreSQL protects you from creating "orphaned" orders with no user!-- You must delete orders first, or use CASCADE:CREATE TABLE orders ( user_id INTEGER REFERENCES users(id) ON DELETE CASCADE -- Now deleting a user automatically deletes their orders);
Makes Queries Powerful:
Copy
-- Join tables using the foreign key relationship:SELECT u.name, o.total, o.created_atFROM users uJOIN orders o ON u.id = o.user_id; -- The foreign key makes this join possible!
Benefits (Why this matters):
Enforces referential integrity: Can’t create an order for a non-existent user
Prevents orphaned records: Can’t delete a user if they have orders (unless you handle it)
Documents relationships: The schema itself shows how tables connect
Enables JOINs: Foreign keys make it easy to combine related data
Common Patterns:
Copy
-- One-to-Many: One user has many ordersusers (1) ──< (many) orders-- Many-to-Many: Students enroll in coursesstudents (many) ──< enrollments >── (many) courses-- Need a junction table with foreign keys to both!
Hands-On Lab 1 – Query the users table
Goal: Get comfortable with SELECT, WHERE, ORDER BY, and LIMIT.
Create a users table:
Copy
CREATE TABLE lab_users ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE, age INT NOT NULL, country TEXT NOT NULL);
Insert at least 10 realistic rows covering multiple ages and countries.
Write queries to:
List all users ordered by newest id first.
Find all users older than 25.
Find all users from a specific country (e.g. ‘US’).
Return only name and email for users younger than 21.
Add one or two edge cases (very old age, unusual country) and re-run your queries to make sure they still behave as expected.
When you can write these queries without looking at the examples, you’re done with this lab.
What is SELECT? The SELECT statement is how you ask the database questions. It’s like saying “Show me…” or “Find me…”.The Basic Structure:
Copy
SELECT [what columns] FROM [which table] WHERE [what conditions];
Think of it like a question:
“SELECT name, email” = “Show me the name and email”
“FROM users” = “from the users table”
“WHERE age > 18” = “where the age is greater than 18”
Step-by-Step: Your First QueryLet’s say you have a users table:
Copy
users table:┌────┬──────────┬─────────────────────┬─────┐│ id │ name │ email │ age │├────┼──────────┼─────────────────────┼─────┤│ 1 │ Alice │ alice@example.com │ 25 ││ 2 │ Bob │ bob@example.com │ 30 ││ 3 │ Charlie │ charlie@example.com │ 18 │└────┴──────────┴─────────────────────┴─────┘
Query 1: Get everything
Copy
SELECT * FROM users;
What happens:
Database looks at the users table
* means “all columns”
Returns all rows with all columns
Result:
Copy
┌────┬──────────┬─────────────────────┬─────┐│ id │ name │ email │ age │├────┼──────────┼─────────────────────┼─────┤│ 1 │ Alice │ alice@example.com │ 25 ││ 2 │ Bob │ bob@example.com │ 30 ││ 3 │ Charlie │ charlie@example.com │ 18 │└────┴──────────┴─────────────────────┴─────┘
Query 2: Get specific columns
Copy
SELECT name, email FROM users;
What happens:
Database looks at the users table
Only selects name and email columns
Returns all rows, but only those two columns
Result:
Copy
┌──────────┬─────────────────────┐│ name │ email │├──────────┼─────────────────────┤│ Alice │ alice@example.com ││ Bob │ bob@example.com ││ Charlie │ charlie@example.com │└──────────┴─────────────────────┘
Why specify columns instead of *?
Performance: Fetching only what you need is faster
Clarity: Makes it obvious what data you’re using
Safety: If someone adds a huge column later, SELECT * would fetch it unnecessarily
Index optimization: Some queries can use “index-only scans” if you only select indexed columns
The SELECT statement retrieves data from tables.
Copy
-- Select all columns from users tableSELECT * FROM users;-- Select specific columnsSELECT name, email FROM users;-- Select with column aliasSELECT name AS customer_name, email AS contact_emailFROM users;
Avoid SELECT * in production code! It fetches unnecessary data, breaks when schema changes, and prevents index-only scans. Always specify the columns you need.
What is WHERE? The WHERE clause is like a filter. It says “only show me rows that match these conditions.”Real-World Analogy:
Without WHERE: “Show me all books in the library” (thousands of books!)
With WHERE: “Show me all books published after 2020” (much smaller, relevant list)
How WHERE Works:
Database looks at each row in the table
Checks if the row matches your WHERE condition
Only returns rows where the condition is TRUE
Skips rows where the condition is FALSE or NULL
Step-by-Step Example:Starting data:
Copy
users table:┌────┬──────────┬─────────────────────┬─────┐│ id │ name │ email │ age │├────┼──────────┼─────────────────────┼─────┤│ 1 │ Alice │ alice@example.com │ 25 ││ 2 │ Bob │ bob@example.com │ 30 ││ 3 │ Charlie │ charlie@example.com │ 18 ││ 4 │ Diana │ diana@example.com │ 22 │└────┴──────────┴─────────────────────┴─────┘
Query: Find users older than 21
Copy
SELECT * FROM users WHERE age > 21;
What the database does:
Look at row 1: age = 25, is 25 > 21? YES → Include this row
Look at row 2: age = 30, is 30 > 21? YES → Include this row
Look at row 3: age = 18, is 18 > 21? NO → Skip this row
Look at row 4: age = 22, is 22 > 21? YES → Include this row
Result:
Copy
┌────┬──────────┬─────────────────────┬─────┐│ id │ name │ email │ age │├────┼──────────┼─────────────────────┼─────┤│ 1 │ Alice │ alice@example.com │ 25 ││ 2 │ Bob │ bob@example.com │ 30 ││ 4 │ Diana │ diana@example.com │ 22 │└────┴──────────┴─────────────────────┴─────┘
Important: WHERE vs HAVING
WHERE: Filters rows BEFORE grouping/aggregation
HAVING: Filters groups AFTER grouping/aggregation
You’ll learn HAVING later, but remember: WHERE comes first!
The WHERE clause filters rows based on conditions.
Copy
-- Basic equalitySELECT * FROM users WHERE id = 1;-- String matchingSELECT * FROM users WHERE email = 'alice@example.com';-- Comparison operatorsSELECT * FROM products WHERE price > 100;SELECT * FROM products WHERE stock <= 10;-- Not equalSELECT * FROM users WHERE status != 'deleted';SELECT * FROM users WHERE status <> 'deleted'; -- Same as !=
-- AND: Both conditions must be trueSELECT * FROM products WHERE category = 'electronics' AND price < 500;-- OR: At least one condition must be trueSELECT * FROM products WHERE category = 'electronics' OR category = 'computers';-- NOT: Negates a conditionSELECT * FROM users WHERE NOT status = 'banned';-- Complex combinations (use parentheses!)SELECT * FROM products WHERE (category = 'electronics' OR category = 'computers') AND price BETWEEN 100 AND 1000 AND stock > 0;
-- IN: Match any value in a listSELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');-- BETWEEN: Range inclusiveSELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';-- LIKE: Pattern matchingSELECT * FROM users WHERE email LIKE '%@gmail.com'; -- Ends withSELECT * FROM users WHERE name LIKE 'John%'; -- Starts withSELECT * FROM users WHERE name LIKE '%son%'; -- Contains-- ILIKE: Case-insensitive LIKE (PostgreSQL)SELECT * FROM users WHERE name ILIKE '%john%';-- IS NULL / IS NOT NULLSELECT * FROM orders WHERE shipped_at IS NULL; -- Not yet shippedSELECT * FROM users WHERE phone IS NOT NULL; -- Has phone number
-- Sort ascending (default)SELECT * FROM products ORDER BY price;SELECT * FROM products ORDER BY price ASC;-- Sort descendingSELECT * FROM products ORDER BY price DESC;-- Multiple sort columnsSELECT * FROM products ORDER BY category ASC, price DESC;-- Sort with NULL handlingSELECT * FROM products ORDER BY discount NULLS LAST; -- NULLs at the end
-- Get first N rowsSELECT * FROM products ORDER BY created_at DESC LIMIT 10;-- Pagination with OFFSETSELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40; -- Page 3 (rows 41-60)-- Better pagination with keyset (cursor-based)SELECT * FROM products WHERE id > 1000 -- Last seen IDORDER BY id LIMIT 20;
Performance Tip: OFFSET becomes slow for large values because the database must scan and discard all skipped rows. Use keyset pagination for better performance on large datasets.
-- COUNT: Number of rowsSELECT COUNT(*) FROM orders; -- All rowsSELECT COUNT(shipped_at) FROM orders; -- Non-NULL values onlySELECT COUNT(DISTINCT user_id) FROM orders; -- Unique users-- SUM: Total of numeric valuesSELECT SUM(total) FROM orders WHERE status = 'completed';-- AVG: AverageSELECT AVG(price) FROM products WHERE category = 'electronics';-- MIN / MAXSELECT MIN(price), MAX(price) FROM products;-- CombinedSELECT COUNT(*) AS total_orders, SUM(total) AS revenue, AVG(total) AS avg_order_value, MIN(total) AS smallest_order, MAX(total) AS largest_orderFROM ordersWHERE status = 'completed';
Group rows by column values and apply aggregates to each group.
Copy
-- Orders per userSELECT user_id, COUNT(*) AS order_count, SUM(total) AS total_spentFROM ordersGROUP BY user_id;-- Sales by categorySELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price, SUM(stock) AS total_inventoryFROM productsGROUP BY category;-- Multiple grouping columnsSELECT EXTRACT(YEAR FROM created_at) AS year, EXTRACT(MONTH FROM created_at) AS month, COUNT(*) AS order_count, SUM(total) AS revenueFROM ordersGROUP BY EXTRACT(YEAR FROM created_at), EXTRACT(MONTH FROM created_at)ORDER BY year, month;
WHERE filters rows before grouping. HAVING filters groups after aggregation.
Copy
-- Find users with more than 5 ordersSELECT user_id, COUNT(*) AS order_countFROM ordersGROUP BY user_idHAVING COUNT(*) > 5;-- Categories with average price over $100SELECT category, AVG(price) AS avg_priceFROM productsGROUP BY categoryHAVING AVG(price) > 100ORDER BY avg_price DESC;-- Combined WHERE and HAVINGSELECT user_id, SUM(total) AS total_spentFROM ordersWHERE status = 'completed' -- Filter rows firstGROUP BY user_idHAVING SUM(total) > 1000 -- Then filter groupsORDER BY total_spent DESC;
The Problem: Data is split across multiple tables for organization, but you often need to see it together.Real-World Example:
You have a users table with names and emails
You have an orders table with order totals and dates
You want to see: “Show me all orders with the customer’s name”
Without JOINs: You’d have to:
Query orders table → get user_id and total
Query users table → get name for each user_id
Manually match them in your application code (slow, error-prone!)
With JOINs: One query gets everything, matched automatically!The Concept:
JOINs combine rows from multiple tables based on related columns (usually foreign keys). Think of it like a VLOOKUP in Excel, but much more powerful.
INNER JOIN: "Show me only matches" (intersection)LEFT JOIN: "Show me all from left table, plus matches from right"RIGHT JOIN: "Show me all from right table, plus matches from left"FULL JOIN: "Show me everything from both tables"
What it does: Returns only rows that have matches in BOTH tables.Real-World Analogy: Like a Venn diagram intersection. Only show things that exist in both sets.Step-by-Step Example:Starting Data:Users table:
Copy
┌────┬──────────┬─────────────────────┐│ id │ name │ email │├────┼──────────┼─────────────────────┤│ 1 │ Alice │ alice@example.com ││ 2 │ Bob │ bob@example.com ││ 3 │ Charlie │ charlie@example.com│└────┴──────────┴─────────────────────┘
SELECT o.id AS order_id, o.total, o.created_at, u.name AS customer_name, u.emailFROM orders oINNER JOIN users u ON o.user_id = u.id;
What happens step-by-step:
Database starts with the orders table (the “left” table in the JOIN)
For each order row:
Order 1: user_id = 1 → Look for user with id = 1 → Found Alice! → Include this match
Order 2: user_id = 1 → Look for user with id = 1 → Found Alice! → Include this match
Order 3: user_id = 2 → Look for user with id = 2 → Found Bob! → Include this match
Result: Only orders that have matching users (all 3 orders in this case)
Result:
Copy
┌──────────┬────────┬──────────────┬──────────────┬─────────────────────┐│ order_id │ total │ created_at │customer_name │ email │├──────────┼────────┼──────────────┼──────────────┼─────────────────────┤│ 1 │ 150.00 │ 2024-01-15 │ Alice │ alice@example.com ││ 2 │ 75.50 │ 2024-01-16 │ Alice │ alice@example.com ││ 3 │ 200.00 │ 2024-01-17 │ Bob │ bob@example.com │└──────────┴────────┴──────────────┴──────────────┴─────────────────────┘
Notice: Charlie doesn’t appear because they have no orders. INNER JOIN only shows matches!When to use INNER JOIN:
When you only want rows that have matches in both tables
Most common join type (90% of joins are INNER)
Example: “Show me orders with customer info” (only orders that have customers)
Returns only rows that have matches in both tables.
Copy
-- Get order details with user namesSELECT o.id AS order_id, o.total, o.created_at, u.name AS customer_name, u.emailFROM orders oINNER JOIN users u ON o.user_id = u.id;
Copy
┌─────────────────────────────────────────────────────────────────────┐│ INNER JOIN │├─────────────────────────────────────────────────────────────────────┤│ ││ USERS ORDERS RESULT ││ ┌───┬───────┐ ┌───┬─────────┐ ┌───┬───────┬────┐ ││ │ 1 │ Alice │ ─────────▶│ 1 │ user: 1 │──────▶│ 1 │ Alice │ O1 │ ││ │ 2 │ Bob │ ─────────▶│ 2 │ user: 1 │──────▶│ 2 │ Alice │ O2 │ ││ │ 3 │ Carol │ ✗ │ 3 │ user: 2 │──────▶│ 3 │ Bob │ O3 │ ││ └───┴───────┘ └───┴─────────┘ └───┴───────┴────┘ ││ ││ Carol has no orders ──▶ Not in result ││ │└─────────────────────────────────────────────────────────────────────┘
LEFT JOIN (LEFT OUTER JOIN): Keep Everything from the Left
What it does: Returns ALL rows from the left table, plus matching rows from the right table. If there’s no match, the right table columns are NULL.Real-World Analogy: Like a report card. You list ALL students (left table), and show their grades (right table) if they have any. If a student has no grades yet, you still list them but show “No grades” (NULL).Why “LEFT”? The table mentioned first (before JOIN) is the “left” table. The table after JOIN is the “right” table.Step-by-Step Example:Starting Data:Users table:
Copy
┌────┬──────────┬─────────────────────┐│ id │ name │ email │├────┼──────────┼─────────────────────┤│ 1 │ Alice │ alice@example.com ││ 2 │ Bob │ bob@example.com ││ 3 │ Charlie │ charlie@example.com│ ← Has no orders!└────┴──────────┴─────────────────────┘
SELECT u.id, u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.total), 0) AS total_spentFROM users uLEFT JOIN orders o ON u.id = o.user_idGROUP BY u.id, u.name;
What happens step-by-step:
Database starts with ALL users (the left table)
For each user:
Alice (id=1): Find orders with user_id=1 → Found 2 orders → Include with order data
Bob (id=2): Find orders with user_id=2 → Found 1 order → Include with order data
Charlie (id=3): Find orders with user_id=3 → Found 0 orders → Still include, but order columns are NULL
Result: All users appear, even if they have no orders!
Result:
Copy
┌────┬──────────┬─────────────┬─────────────┐│ id │ name │ order_count │ total_spent │├────┼──────────┼─────────────┼─────────────┤│ 1 │ Alice │ 2 │ 225.50 ││ 2 │ Bob │ 1 │ 200.00 ││ 3 │ Charlie │ 0 │ 0.00 │ ← Still appears!└────┴──────────┴─────────────┴─────────────┘
Key Points:
COALESCE(SUM(o.total), 0): Converts NULL to 0 (SUM returns NULL when there are no rows)
COUNT(o.id): Counts 0 for users with no orders (COUNT counts non-NULL values)
Charlie appears even though they have no orders!
When to use LEFT JOIN:
“Show me all users and their order counts” (including users with 0 orders)
“List all products and their sales” (including products that never sold)
“Find customers who haven’t placed orders” (use LEFT JOIN + WHERE right_table.id IS NULL)
Common Pattern: Finding Missing Relationships
Copy
-- Find users who have never placed an orderSELECT u.name, u.emailFROM users uLEFT JOIN orders o ON u.id = o.user_idWHERE o.id IS NULL; -- No matching order found!
Returns all rows from the left table, with matching rows from the right (or NULL if no match).
Copy
-- All users, with their orders (if any)SELECT u.id, u.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.total), 0) AS total_spentFROM users uLEFT JOIN orders o ON u.id = o.user_idGROUP BY u.id, u.name;
Copy
┌─────────────────────────────────────────────────────────────────────┐│ LEFT JOIN │├─────────────────────────────────────────────────────────────────────┤│ ││ USERS ORDERS RESULT ││ ┌───┬───────┐ ┌───┬─────────┐ ┌───┬───────┬────┐ ││ │ 1 │ Alice │ ─────────▶│ 1 │ user: 1 │──────▶│ 1 │ Alice │ O1 │ ││ │ 2 │ Bob │ ─────────▶│ 2 │ user: 1 │──────▶│ 2 │ Alice │ O2 │ ││ │ 3 │ Carol │ ─────────▶│ 3 │ user: 2 │──────▶│ 3 │ Bob │ O3 │ ││ └───┴───────┘ └───┴─────────┘ │ 4 │ Carol │NULL│ ││ └───┴───────┴────┘ ││ ││ Carol included with NULL for order data ││ │└─────────────────────────────────────────────────────────────────────┘
-- RIGHT JOIN: All rows from right tableSELECT u.name, o.id AS order_idFROM users uRIGHT JOIN orders o ON u.id = o.user_id;-- FULL OUTER JOIN: All rows from both tablesSELECT u.name, o.id AS order_idFROM users uFULL OUTER JOIN orders o ON u.id = o.user_id;
-- Orders with user info and product detailsSELECT o.id AS order_id, u.name AS customer, p.name AS product, oi.quantity, oi.unit_price, oi.quantity * oi.unit_price AS line_totalFROM orders oINNER JOIN users u ON o.user_id = u.idINNER JOIN order_items oi ON o.id = oi.order_idINNER JOIN products p ON oi.product_id = p.idWHERE o.created_at > '2024-01-01'ORDER BY o.id, p.name;
Join a table to itself, useful for hierarchical or comparative data.
Copy
-- Employees with their managersSELECT e.name AS employee, m.name AS managerFROM employees eLEFT JOIN employees m ON e.manager_id = m.id;-- Find products with the same category and similar pricesSELECT p1.name AS product1, p2.name AS product2, p1.category, p1.price AS price1, p2.price AS price2FROM products p1INNER JOIN products p2 ON p1.category = p2.category AND p1.id < p2.id -- Avoid duplicates AND ABS(p1.price - p2.price) < 50;
Cartesian product: every row from first table paired with every row from second.
Copy
-- All possible size/color combinationsSELECT s.name AS size, c.name AS colorFROM sizes sCROSS JOIN colors c;-- Generate date series with categoriesSELECT d.date, c.name AS categoryFROM generate_series('2024-01-01', '2024-12-31', '1 day') AS d(date)CROSS JOIN categories c;
-- Users who have placed ordersSELECT * FROM usersWHERE id IN (SELECT DISTINCT user_id FROM orders);-- Products priced above averageSELECT * FROM productsWHERE price > (SELECT AVG(price) FROM products);-- Users with more than 5 ordersSELECT * FROM usersWHERE id IN ( SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5);
References the outer query — executes once per outer row.
Copy
-- Products with price above their category averageSELECT p1.name, p1.price, p1.categoryFROM products p1WHERE p1.price > ( SELECT AVG(p2.price) FROM products p2 WHERE p2.category = p1.category -- References outer query);-- Users with their latest orderSELECT u.name, (SELECT MAX(created_at) FROM orders o WHERE o.user_id = u.id) AS last_orderFROM users u;
-- Users who have placed at least one orderSELECT u.name, u.emailFROM users uWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);-- Products that have never been orderedSELECT p.name, p.priceFROM products pWHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.id);
Performance Tip: EXISTS often performs better than IN for large subqueries because it can stop as soon as it finds one match.
-- Monthly revenue with growth calculationSELECT current_month.month, current_month.revenue, prev_month.revenue AS prev_revenue, current_month.revenue - COALESCE(prev_month.revenue, 0) AS growthFROM ( SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue FROM orders GROUP BY DATE_TRUNC('month', created_at)) current_monthLEFT JOIN ( SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue FROM orders GROUP BY DATE_TRUNC('month', created_at)) prev_month ON current_month.month = prev_month.month + INTERVAL '1 month'ORDER BY current_month.month;
WITH monthly_revenue AS ( SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue FROM orders WHERE status = 'completed' GROUP BY DATE_TRUNC('month', created_at))SELECT month, revenue, AVG(revenue) OVER () AS avg_monthly_revenue, revenue - AVG(revenue) OVER () AS diff_from_avgFROM monthly_revenueORDER BY month;
WITH-- Step 1: Calculate user order statsuser_stats AS ( SELECT user_id, COUNT(*) AS order_count, SUM(total) AS total_spent, AVG(total) AS avg_order FROM orders WHERE status = 'completed' GROUP BY user_id),-- Step 2: Classify users into tiersuser_tiers AS ( SELECT user_id, order_count, total_spent, CASE WHEN total_spent >= 10000 THEN 'platinum' WHEN total_spent >= 5000 THEN 'gold' WHEN total_spent >= 1000 THEN 'silver' ELSE 'bronze' END AS tier FROM user_stats),-- Step 3: Count users per tiertier_summary AS ( SELECT tier, COUNT(*) AS user_count, SUM(total_spent) AS tier_revenue FROM user_tiers GROUP BY tier)-- Final resultSELECT tier, user_count, tier_revenue, ROUND(100.0 * tier_revenue / SUM(tier_revenue) OVER (), 2) AS revenue_pctFROM tier_summaryORDER BY tier_revenue DESC;
For hierarchical data like org charts, categories, or graphs.
Copy
-- Employee hierarchy (org chart)WITH RECURSIVE org_chart AS ( -- Base case: top-level employees (no manager) SELECT id, name, manager_id, 1 AS level, name::TEXT AS path FROM employees WHERE manager_id IS NULL UNION ALL -- Recursive case: employees with managers SELECT e.id, e.name, e.manager_id, oc.level + 1, oc.path || ' > ' || e.name FROM employees e INNER JOIN org_chart oc ON e.manager_id = oc.id)SELECT * FROM org_chart ORDER BY path;-- Category tree with all descendantsWITH RECURSIVE category_tree AS ( SELECT id, name, parent_id, 0 AS depth, ARRAY[id] AS path FROM categories WHERE parent_id IS NULL UNION ALL SELECT c.id, c.name, c.parent_id, ct.depth + 1, ct.path || c.id FROM categories c INNER JOIN category_tree ct ON c.parent_id = ct.id)SELECT REPEAT(' ', depth) || name AS category_tree, depth, pathFROM category_treeORDER BY path;
Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single output row like aggregates.
-- Rank products by price within each categorySELECT name, category, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS row_num, RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank, DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rankFROM products;
-- Compare each order to the previous order for the same userSELECT user_id, id AS order_id, total, created_at, LAG(total) OVER (PARTITION BY user_id ORDER BY created_at) AS prev_order_total, total - LAG(total) OVER (PARTITION BY user_id ORDER BY created_at) AS change, LEAD(total) OVER (PARTITION BY user_id ORDER BY created_at) AS next_order_totalFROM orders;-- Calculate month-over-month revenue growthWITH monthly_revenue AS ( SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue FROM orders GROUP BY DATE_TRUNC('month', created_at))SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_month, ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month)) / LAG(revenue) OVER (ORDER BY month), 2) AS growth_pctFROM monthly_revenue;
-- Running total of salesSELECT created_at::date AS date, total, SUM(total) OVER (ORDER BY created_at) AS running_totalFROM orders;-- 7-day moving averageSELECT date, daily_revenue, AVG(daily_revenue) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7dFROM daily_sales;-- Running total within each monthSELECT created_at::date AS date, total, SUM(total) OVER ( PARTITION BY DATE_TRUNC('month', created_at) ORDER BY created_at ) AS monthly_running_totalFROM orders;
-- Compare each product to the most expensive in its categorySELECT name, category, price, FIRST_VALUE(name) OVER ( PARTITION BY category ORDER BY price DESC ) AS top_product, FIRST_VALUE(price) OVER ( PARTITION BY category ORDER BY price DESC ) AS top_price, ROUND(100.0 * price / FIRST_VALUE(price) OVER ( PARTITION BY category ORDER BY price DESC ), 2) AS pct_of_topFROM products;
-- Percentile ranksSELECT name, salary, PERCENT_RANK() OVER (ORDER BY salary) AS percentile, NTILE(4) OVER (ORDER BY salary) AS quartileFROM employees;-- Median salary per departmentSELECT DISTINCT department, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department) AS median_salaryFROM employees;
SQL is not evaluated in the same order you write it. Understanding the logical evaluation order explains many “weird” rules (like why you can’t use column aliases in WHERE).Typical written order:
Copy
SELECT ...FROM ...WHERE ...GROUP BY ...HAVING ...WINDOW ... -- e.g., window clause in some dialectsORDER BY ...LIMIT ... OFFSET ...
Logical evaluation order (simplified):
FROM + JOIN (build row set)
WHERE (filter rows)
GROUP BY (form groups)
HAVING (filter groups)
Window functions (OVER (...)) are computed
SELECT (compute final columns/aliases)
DISTINCT
ORDER BY
LIMIT / OFFSET
Key consequences:
WHEREcannot see column aliases defined in SELECT (they don’t exist yet), but ORDER BYcan.
You cannot use aggregates in WHERE (aggregation happens after grouping), but you can use them in HAVING.
Window functions are computed after grouping but before the final SELECT projection.
Copy
-- ❌ This fails: alias not visible in WHERESELECT price * quantity AS line_totalFROM order_itemsWHERE line_total > 100; -- ERROR-- ✅ Use the expression directly or a subquery/CTESELECT price * quantity AS line_totalFROM order_itemsWHERE price * quantity > 100;-- ✅ Or wrap and filter outsideSELECT *FROM ( SELECT price * quantity AS line_total FROM order_items) tWHERE line_total > 100;
Copy
-- ❌ Aggregates not allowed in WHERESELECT user_id, SUM(total) AS total_spentFROM ordersWHERE SUM(total) > 1000; -- ERROR-- ✅ Use HAVING insteadSELECT user_id, SUM(total) AS total_spentFROM ordersGROUP BY user_idHAVING SUM(total) > 1000;
For many problems you can write equivalent queries in different shapes. You should understand the trade-offs:
JOIN: Best when you truly need columns from multiple tables and clear relationships.
Subquery in WHERE / EXISTS: Great for existence checks and filtering sets.
Derived table / CTE: Best for multi-step logic where readability matters.
Copy
-- 1) Join style: users who have placed ordersSELECT DISTINCT u.id, u.nameFROM users uJOIN orders o ON o.user_id = u.id;-- 2) Subquery IN styleSELECT id, nameFROM usersWHERE id IN (SELECT DISTINCT user_id FROM orders);-- 3) EXISTS style (often best for large data)SELECT id, nameFROM users uWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id);
Practical guidance:
Prefer EXISTS when you only care if related rows exist (not how many).
Prefer CTEs when your query has multiple reusable steps (e.g., compute user stats, then bucket into tiers, then summarize). Treat a CTE like a named temp view.
-- Customers who have never placed an orderSELECT c.id, c.nameFROM customers cLEFT JOIN orders o ON o.customer_id = c.idWHERE o.id IS NULL;-- Same idea with NOT EXISTSSELECT c.id, c.nameFROM customers cWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.id);
2. Conditional Aggregation instead of CASE explosions
Instead of multiple joins or subqueries, you can use conditional aggregates to pivot by condition.
Copy
-- Per-customer order stats broken down by statusSELECT user_id, COUNT(*) AS orders_total, SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS orders_completed, SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS orders_pending, SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS orders_cancelledFROM ordersGROUP BY user_id;
This pattern is heavily used in dashboards and analytics queries.
A very common requirement: “Top 3 products per category”, “Top 5 students per class”, etc.
Copy
-- Top 3 most expensive products in each categoryWITH ranked AS ( SELECT id, name, category, price, ROW_NUMBER() OVER ( PARTITION BY category ORDER BY price DESC ) AS rn FROM products)SELECT id, name, category, priceFROM rankedWHERE rn <= 3ORDER BY category, price DESC;
This uses the window + filter pattern: rank within group using a window function, then filter by the rank.
Use HAVING for group-level filters (aggregates allowed).
Trap 2: Assuming GROUP BY output is ordered
SQL does not guarantee any order unless you use ORDER BY.
Always add an explicit ORDER BY if order matters.
Trap 3: Relying on vendor-specific behavior
Some engines allow selecting non-grouped columns without aggregates — this is non-standard and can produce surprising results.
Prefer the strict rule: every selected, non-aggregated column must appear in GROUP BY.
Copy
-- Non-portable pattern (allowed by some engines, but ambiguous)SELECT country, nameFROM usersGROUP BY country; -- What name should be picked?-- Portable, deterministic patternSELECT country, COUNT(*) AS user_countFROM usersGROUP BY countryORDER BY user_count DESC;
Understanding how the database optimizer handles subqueries is critical for performance. Here are key optimization patterns:
Correlated vs Non-Correlated Subqueries:
Non-correlated (evaluated once): SELECT * FROM users WHERE department_id IN (SELECT id FROM departments WHERE budget > 100000)
Correlated (evaluated per row): SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000)
Subquery Flattening: Modern optimizers convert certain subqueries into JOINs automatically:
Copy
-- This subquery...SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 100);-- May be flattened to this JOIN internally:SELECT DISTINCT u.* FROM users u INNER JOIN orders o ON u.id = o.user_id WHERE o.total > 100;
Semi-Joins and Anti-Joins: IN and EXISTS often become semi-joins, while NOT IN and NOT EXISTS become anti-joins. EXISTS is often more efficient than IN for existence checks.
LATERAL joins allow you to reference columns from preceding tables in the FROM clause, enabling correlated subqueries in a JOIN-like syntax:
Copy
-- Get top 3 orders for each userSELECT u.name, orders_subq.order_id, orders_subq.totalFROM users uCROSS JOIN LATERAL ( SELECT id AS order_id, total FROM orders o WHERE o.user_id = u.id ORDER BY total DESC LIMIT 3) orders_subq;-- Calculate derived values that depend on other table's dataSELECT p.name, p.category, avg_by_category.avg_priceFROM products pCROSS JOIN LATERAL ( SELECT AVG(price) AS avg_price FROM products p2 WHERE p2.category = p.category) avg_by_category;
LATERAL is particularly powerful for:
Correlated aggregations
Top-N per group queries
Complex calculations that reference multiple tables
Calling table functions with parameters from other tables
-- ROWS frame: physical row offsetsSELECT date, sales, AVG(sales) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day moving average ) AS weekly_avgFROM daily_sales;-- RANGE frame: logical value-based windowSELECT employee_id, salary, COUNT(*) OVER ( ORDER BY salary RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING -- employees with similar salary ) AS peers_in_salary_rangeFROM employees;-- Groups frame: treating tied values as a single unitSELECT product_id, price, RANK() OVER (ORDER BY price) AS rank, COUNT(*) OVER ( ORDER BY price GROUPS BETWEEN CURRENT ROW AND 2 FOLLOWING -- count products in current and next 2 price groups ) AS products_in_current_and_next_groupsFROM products;
Frame specifications allow for:
Running calculations with specific lookback periods
Comparisons within value ranges
Handling ties consistently in analytical computations
Using a products table with columns: id, name, category, price, stock, created_at:
Copy
-- 1. Find all products in the 'Electronics' category priced under $500-- 2. Find products with low stock (less than 10) that need reordering-- 3. List the 5 most recently added products-- 4. Find products with names containing 'Pro' or 'Ultra'
Solutions
Copy
-- 1SELECT * FROM products WHERE category = 'Electronics' AND price < 500;-- 2SELECT name, stock, category FROM products WHERE stock < 10ORDER BY stock ASC;-- 3SELECT * FROM products ORDER BY created_at DESCLIMIT 5;-- 4SELECT * FROM products WHERE name ILIKE '%pro%' OR name ILIKE '%ultra%';
-- 1. Count products per category-- 2. Find the average price per category, only for categories with avg > $200-- 3. Find the total inventory value (price * stock) per category-- 4. Find categories with more than 50 products
Solutions
Copy
-- 1SELECT category, COUNT(*) as product_count FROM products GROUP BY category ORDER BY product_count DESC;-- 2SELECT category, AVG(price) as avg_price FROM products GROUP BY category HAVING AVG(price) > 200;-- 3SELECT category, SUM(price * stock) as inventory_value FROM products GROUP BY category ORDER BY inventory_value DESC;-- 4SELECT category, COUNT(*) as count FROM products GROUP BY category HAVING COUNT(*) > 50;
Hands-On Lab 2 – Joins & aggregations on e-commerce data
Goal: Become comfortable joining tables and aggregating results.
Create three tables:
Copy
CREATE TABLE lab_customers ( id SERIAL PRIMARY KEY, name TEXT NOT NULL);CREATE TABLE lab_products ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, price NUMERIC(10,2) NOT NULL);CREATE TABLE lab_orders ( id SERIAL PRIMARY KEY, customer_id INT NOT NULL REFERENCES lab_customers(id), product_id INT NOT NULL REFERENCES lab_products(id), quantity INT NOT NULL, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW());
Insert at least 5 customers, 5 products, and 20 orders with a mix of quantities.
Write queries to:
List each order with customer name and product name.
Show total quantity and total revenue per customer.
Find the top 3 products by total revenue.
Find customers who have never placed an order (add at least one such customer).
Experiment with turning your queries into LEFT JOINs and see how the result set changes when there are missing rows.
When you can write these joins and aggregations from memory, you’re ready for more advanced SQL.
Copy
-- 1. List all orders with customer name and email-- 2. Find customers who have never placed an order-- 3. Get the top 10 customers by total spending-- 4. List products that have been ordered more than 100 times
Solutions
Copy
-- 1SELECT o.id, o.total, o.created_at, u.name, u.emailFROM orders oINNER JOIN users u ON o.user_id = u.idORDER BY o.created_at DESC;-- 2SELECT u.name, u.emailFROM users uLEFT JOIN orders o ON u.id = o.user_idWHERE o.id IS NULL;-- 3SELECT u.name, u.email, SUM(o.total) as total_spentFROM users uINNER JOIN orders o ON u.id = o.user_idGROUP BY u.id, u.name, u.emailORDER BY total_spent DESCLIMIT 10;-- 4SELECT p.name, COUNT(oi.id) as times_orderedFROM products pINNER JOIN order_items oi ON p.id = oi.product_idGROUP BY p.id, p.nameHAVING COUNT(oi.id) > 100ORDER BY times_ordered DESC;
-- 1. Rank products by price within each category-- 2. For each order, show the running total for that user-- 3. Calculate the percentage each product contributes to its category's total stock-- 4. Find the second-highest priced product in each category
Solutions
Copy
-- 1SELECT name, category, price, RANK() OVER (PARTITION BY category ORDER BY price DESC) as price_rankFROM products;-- 2SELECT user_id, id as order_id, total, created_at, SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) as running_totalFROM orders;-- 3SELECT name, category, stock, ROUND(100.0 * stock / SUM(stock) OVER (PARTITION BY category), 2) as stock_pctFROM products;-- 4WITH ranked AS ( SELECT name, category, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) as rn FROM products)SELECT name, category, priceFROM rankedWHERE rn = 2;