Calculus for Machine Learning
The Question That Unlocks AI
You train a neural network. It starts completely random - worse than guessing. You feed it 10,000 images of cats and dogs. An hour later, it’s 95% accurate. What happened in that hour? The network adjusted millions of numbers (weights) until they were “right.” But how did it know which direction to adjust each number? How did it know how much? The answer is calculus. Specifically: derivatives tell the network “if I change this weight by a tiny amount, how much will my error change?” Then it adjusts every weight to reduce the error, step by step, millions of times.

Estimated Time: 14-18 hours
Difficulty: Beginner-friendly (we start from scratch)
Prerequisites: Basic Python, Linear Algebra course (or willingness to learn alongside)
What You’ll Build: A neural network that learns - from scratch, no libraries
📋 Prerequisite Self-Check
Before starting, make sure you can:

✅ Python Basics
- Work with NumPy arrays: np.array([1, 2, 3])
- Write functions with multiple parameters
- Create simple plots with matplotlib
- Understand list comprehensions

✅ Linear Algebra Basics
- Vectors: what they are and how to add them
- Dot product: np.dot(a, b) and what it means
- Basic matrix operations (helpful, but we’ll review)

✅ Math Comfort
- Comfortable with basic graphing (x-y plots)
- Understand slope of a line (rise/run)
- Know that functions take inputs and produce outputs

❌ NOT Required
- Previous calculus experience
- Remembering derivative rules from school
- A physics or engineering background
🧪 Quick Diagnostic: Are You Ready?
Try these checks to gauge your readiness:

Slope Check (can you answer this?): A line goes through points (1, 3) and (4, 9). What is its slope?

Vector Check (do you know this?): What does np.dot([1, 2, 3], [4, 5, 6]) return?

You can verify your answers with the snippet after the remediation table.

Remediation Paths:

| Gap Identified | Recommended Action |
|---|---|
| Slope concept unclear | Khan Academy “Slope of a line” - 20 min |
| Vector/dot product unfamiliar | Vectors Module - 3 hours |
| NumPy basics | Python Crash Course - NumPy section |
| Graphing concepts | YouTube “Reading function graphs” - 30 min |
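To check yourself, here is a quick verification snippet (ours, not part of the course materials):

```python
import numpy as np

# Slope check: slope = rise / run = (9 - 3) / (4 - 1)
print((9 - 3) / (4 - 1))               # 2.0

# Vector check: 1*4 + 2*5 + 3*6
print(np.dot([1, 2, 3], [4, 5, 6]))    # 32
```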
The Core Insight: Learning = Finding the Bottom of a Hill
Imagine you’re blindfolded, dropped somewhere on a hilly landscape. Your goal: find the lowest point (the valley). You can’t see anything - but you can feel the slope under your feet.

- If the ground slopes down to your left, step left
- If it slopes down forward, step forward
- Keep stepping downhill until the ground is flat

In machine learning, the same loop plays out:

- The “landscape” is your error function (how wrong your model is)
- The “position” is your current weights
- The “slope” (derivative) tells you which direction reduces error
- You keep stepping until error is minimized
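Here is that analogy as runnable code - a minimal sketch, assuming a one-dimensional “landscape” error(w) = (w - 3)² that we made up for illustration:

```python
def error(w):
    return (w - 3) ** 2     # the "landscape": the valley floor is at w = 3

def slope(w):
    return 2 * (w - 3)      # the derivative: which way is downhill?

w = 10.0                    # blindfolded, dropped at a random position
for _ in range(50):
    w -= 0.1 * slope(w)     # feel the slope, step the opposite (downhill) way
print(w)                    # ~3.0: we found the valley without ever "seeing" the hill
```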
🔗 ML Connection: This “hill descent” is literally how every major AI system learns:
Every module connects to these real systems!
| AI System | What It’s Optimizing | The “Slope” |
|---|---|---|
| ChatGPT | Predict next word probability | Cross-entropy gradient |
| DALL-E | Match image to text description | Diffusion loss gradient |
| AlphaFold | Protein structure accuracy | Distance & angle gradients |
| Tesla Autopilot | Object detection accuracy | Multi-task loss gradient |
| Spotify Recommendations | User engagement prediction | Ranking loss gradient |
Who Uses This (Companies & Roles)
OpenAI
GPT-3 was trained with gradient descent on 175 billion parameters. Understanding calculus = understanding how ChatGPT learns.
Tesla Autopilot
Self-driving AI optimizes millions of weights to detect pedestrians, lanes, and obstacles in real-time.
DeepMind AlphaFold
Solved 50-year protein folding problem using neural networks trained with the exact math you’ll learn here.
| Role | How They Use Calculus | Salary Impact |
|---|---|---|
| ML Engineer | Debug training, implement custom layers, optimize performance | +$30-50K over non-ML roles |
| Research Scientist | Develop new architectures, publish papers, prove convergence | +$50-80K, often PhD required |
| ML Ops Engineer | Optimize training pipelines, reduce compute costs | +$20-40K |
| Data Scientist | Understand why models work, explain to stakeholders | +$15-30K |
What You’ll Actually Learn
Module 1: Derivatives — “Which way is downhill?”
The Real Question: If I change this weight by 0.001, how much does my error change?

What You’ll Understand:
- Derivatives measure sensitivity (how much the output changes when the input changes)
- Finding the minimum means finding where the derivative is zero
- Every weight in a neural network has a derivative
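You can ask that question literally in code before you ever see a formula. A minimal sketch, using a made-up one-weight error function (our example, not the course’s):

```python
def loss(w):
    return w ** 2 - 4 * w + 7    # hypothetical error as a function of one weight

w, h = 1.0, 0.001
change = (loss(w + h) - loss(w)) / h   # "if I nudge w by 0.001, how does the error move?"
print(change)                          # ~ -2.0: increasing w DECREASES the error here
```

The exact derivative is 2w - 4 = -2 at w = 1; the nudge-and-measure estimate lands right next to it, and that sensitivity is the whole idea behind a derivative.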
Module 2: Gradients — “Which way is MOST downhill?”
The Real Question: I have 1,000 weights. Which combination of changes reduces error the fastest?

What You’ll Understand:
- A gradient is just a list of derivatives (one per weight)
- It points in the direction of steepest increase
- We go the OPPOSITE direction to decrease error
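The same nudge-and-measure idea, repeated once per weight, produces the gradient. A sketch with a toy two-weight loss (our choice for illustration):

```python
import numpy as np

def loss(w):
    return w[0] ** 2 + 3 * w[1] ** 2          # toy loss with two weights

def gradient(w, h=1e-5):
    g = np.zeros_like(w)
    for i in range(len(w)):                    # one derivative per weight
        w_nudged = w.copy()
        w_nudged[i] += h
        g[i] = (loss(w_nudged) - loss(w)) / h
    return g

w = np.array([1.0, 2.0])
print(gradient(w))        # ~[2., 12.]: the direction of steepest INCREASE
w -= 0.1 * gradient(w)    # so we step the OPPOSITE way to reduce the loss
```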
Module 3: Chain Rule — “How do changes propagate through layers?”
The Real Question: In a 50-layer neural network, how does changing a weight in layer 1 affect the final output?

What You’ll Understand:
- Nested functions: the output of layer 1 becomes the input to layer 2, and so on
- Chain rule: multiply the derivatives along the chain
- Backpropagation: computing all derivatives efficiently, from output back to input
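The multiplication is easiest to see with just two “layers.” A sketch with made-up functions g(x) = 3x and f(u) = u²:

```python
x = 2.0
u = 3 * x               # layer 1 output: 6.0
y = u ** 2              # layer 2 output: 36.0

dy_du = 2 * u           # layer 2's derivative with respect to its input: 12.0
du_dx = 3.0             # layer 1's derivative with respect to x
dy_dx = dy_du * du_dx   # chain rule: multiply along the chain -> 36.0

# Cross-check: y = (3x)**2 = 9x**2, so dy/dx = 18x = 36 at x = 2.
# Backpropagation does exactly this multiplication, layer by layer,
# from the output back to the input.
print(dy_dx)
```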
Module 4: Gradient Descent — “Taking steps downhill”
The Real Question: How big should each step be? When should we stop?

What You’ll Understand:
- Learning rate: a step too big overshoots; a step too small takes forever
- Convergence: knowing when you’ve reached the bottom
- Local minima: getting stuck in small valleys instead of the deepest one
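You can watch the learning-rate trade-off on a toy bowl-shaped loss, loss(w) = w² (our example):

```python
def grad(w):
    return 2 * w                    # derivative of loss(w) = w**2

for lr in (1.1, 0.01, 0.5):         # too big, too small, about right
    w = 5.0
    for _ in range(20):
        w -= lr * grad(w)           # one gradient descent step
    print(f"lr={lr}: w = {w:.4f}")

# lr=1.1  -> w blows up (each step overshoots the bottom and lands higher)
# lr=0.01 -> w crawls toward 0 (would take hundreds more steps)
# lr=0.5  -> w hits 0 immediately (the minimum of this bowl)
```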
Module 5: Optimization — “Getting there faster”
The Real Question: Gradient descent is slow. How do we speed it up?

What You’ll Understand:
- Momentum: build up speed when moving in a consistent direction
- Adam: adapt the step size for each weight individually
- Why Adam is the default choice for most deep learning
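Momentum is a two-line change to plain gradient descent. A minimal sketch on the same kind of toy loss (our example; Adam layers a per-weight adaptive step size on top of this idea):

```python
def grad(w):
    return 2 * w                          # derivative of the toy loss(w) = w**2

w, velocity = 5.0, 0.0
lr, beta = 0.1, 0.9                       # beta: how much of the past direction to keep
for _ in range(200):
    velocity = beta * velocity + grad(w)  # speed builds up while the direction stays consistent
    w -= lr * velocity                    # step using the accumulated velocity
print(w)                                  # ~0: reached the bottom
```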
Your Learning Journey
1. Week 1: Derivatives - Understand what derivatives really mean. Build a price optimizer.
2. Week 2: Gradients - Handle multiple variables at once. Optimize price AND marketing spend together.
3. Week 3: Chain Rule - Understand how changes propagate through layers. Implement backpropagation.
4. Week 4: Gradient Descent - Build a complete training loop. Watch your model learn.
5. Week 5: Final Project - Build a neural network from scratch using ONLY NumPy. No TensorFlow. No PyTorch.
Prerequisites
What You Need:
- Basic Python (variables, functions, loops)
- A linear algebra course (or take one alongside - they complement each other)
- Curiosity about how AI actually works

What You DON’T Need:
- Previous calculus knowledge (we start from zero)
- Memorized derivative formulas (we focus on understanding)
- Mathematical proofs (we focus on intuition and code)
Setup
🎮 Interactive Visualization Tools
Calculus comes alive when you can see it. Use these tools alongside the course:

3Blue1Brown: Essence of Calculus
Beautiful visualizations of derivatives, integrals, and why they matter. Watch the first 3 videos before Module 1.

Desmos Graphing Calculator
Plot functions, visualize derivatives as tangent lines, see how slope changes. Use throughout the course.

Gradient Descent Visualizer
Watch gradient descent optimize in real time on different loss surfaces. Perfect for Module 4.

TensorFlow Playground
See neural networks learn live. Adjust the architecture, watch the loss decrease. Great after Module 5.
🔗 When to Use These Tools:
- Module 1 (Derivatives): Desmos - plot f(x), add tangent lines, see slopes
- Module 2 (Gradients): 3D surface plots in our notebooks
- Module 3 (Chain Rule): Our interactive backprop visualizer
- Module 4 (Gradient Descent): Gradient Descent Visualizer website
- Module 5 (Final Project): TensorFlow Playground after you build your own!
🚀 Going Deeper: For Advanced Learners
Want more mathematical rigor? Each module includes an optional “Going Deeper” section:

| Module | Advanced Topic | Why It Matters |
|---|---|---|
| Derivatives | Limits, continuity, formal definition | Understand convergence proofs in ML papers |
| Gradients | Jacobian matrices, Hessians | Understand second-order optimization methods |
| Chain Rule | Computational graphs, automatic differentiation | How PyTorch/JAX actually work |
| Optimization | Convexity, convergence rates, saddle points | Why certain architectures train better |

These sections are OPTIONAL. You can build neural networks and understand gradient descent without them. They’re for learners who:

- Want to read ML research papers
- Are curious about optimization theory
- Plan to implement custom autograd systems

Recommended reading:

- Calculus Made Easy by Silvanus Thompson (classic, intuitive)
- Convex Optimization by Boyd & Vandenberghe (free online)
- Fast.ai’s “Practical Deep Learning” course (connects calculus to real training)
What You’ll Build
Price Optimizer
Given a profit function, automatically find the price that maximizes profit using derivatives.
Multi-Variable Optimizer
Optimize both price and ad spend simultaneously using gradients.
Backpropagation Engine
Implement the chain rule to compute gradients through multiple layers.
Neural Network (From Scratch)
Build a complete neural network that learns XOR - using only NumPy.
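To set expectations for the final project, here is a compressed sketch of a network that learns XOR with plain NumPy. The 4-unit hidden layer, sigmoid activations, and squared-error loss are our assumptions for illustration; the course’s actual architecture may differ:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR truth table

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))    # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))    # hidden -> output

for _ in range(10_000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the chain rule, from the output layer back to the input
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent step on every weight and bias
    W2 -= 0.5 * (h.T @ d_out)
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * (X.T @ d_h)
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(out.round(2))   # should end up close to [[0], [1], [1], [0]]
```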
Interview Preparation: What Companies Ask
FAANG-Level Questions
Google/Meta/Amazon commonly ask:
- “Explain how backpropagation works” (Chain Rule module)
- “Why might training get stuck? How do you fix it?” (Optimization module)
- “What happens if learning rate is too high/low?” (Gradient Descent module)
- “Derive the gradient for a simple loss function” (Derivatives module)
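For the last question, one loss an interviewer might hand you is squared error on a one-weight model, L(w) = (wx - y)² (our choice of example). The chain rule gives dL/dw = 2(wx - y) · x, which is easy to sanity-check numerically:

```python
x, y, w = 3.0, 7.0, 1.5

analytic = 2 * (w * x - y) * x                      # chain rule result

loss = lambda w: (w * x - y) ** 2
h = 1e-6
numerical = (loss(w + h) - loss(w - h)) / (2 * h)   # central-difference estimate

print(analytic, numerical)                          # both ~ -15.0
```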
Startup ML Engineer Questions
Fast-growing startups focus on:
- “Walk me through training a neural network from scratch”
- “How would you debug a model that’s not learning?”
- “Why do we use Adam over SGD?”
- “Explain vanishing/exploding gradients”
Research Scientist Questions
Research-focused roles ask:
- “Prove that gradient descent converges for convex functions”
- “What are second-order optimization methods?”
- “Explain the mathematical foundations of attention mechanisms”
- “Derive backprop for a custom activation function”
Why This Course Exists
Most calculus courses teach you to solve problems like: “Find the derivative of xⁿ.” And you learn: “Use the power rule: the derivative of xⁿ is n·xⁿ⁻¹.” But nobody tells you WHY. Why do neural networks need derivatives? How does PyTorch compute gradients automatically? Why does “learning rate = 0.01” work better than “learning rate = 1.0”? This course answers those questions. By the end, you won’t just know formulas - you’ll understand the engine that makes AI learn.

By The End of This Course
You will:
- Understand why every ML framework computes gradients
- Build a neural network that actually learns (from scratch!)
- Debug training problems, because you understand what’s happening
- Read ML papers and understand the math notation
- Choose the right optimizer for your problem
Let’s Begin
The next module starts with a simple question: “You own a business. What price should you charge to maximize profit?” The answer will teach you what derivatives really mean.

Next: Derivatives
Learn what derivatives actually measure and why neural networks need them