Derivatives & Rates of Change

Your Challenge: The Pricing Problem

You just launched your online store selling wireless headphones. Exciting! But now you face a critical decision: What price should you charge? You experiment with different prices over several weeks:
  • **Week 1 ($30/pair)**: 1,000 customers bought! But... your profit was only $10,000
    • “Great sales, but I’m barely making money after costs ($20/pair)”
  • **Week 2 ($100/pair)**: Only 200 customers bought. Profit: $16,000
    • “Better profit per sale, but I’m losing too many customers!”
  • **Week 3 ($50/pair)**: 800 customers. Profit: $24,000
    • “Getting better… but is this the best I can do?”
Your Question: “There must be a sweet spot - a price that maximizes my profit. But how do I find it without testing every single price?”

The Slow Way (What You’re Doing Now)

You could test 100 different prices, one per week. That would take 2 years and cost you thousands in lost revenue!

The Fast Way (What You’ll Learn)

There’s a better approach: derivatives. Instead of blindly testing prices, derivatives tell you:
  • At $30: “Increase price → profit will go UP”
  • At $75: “Perfect! Any change makes profit go DOWN”
  • At $100: “Decrease price → profit will go UP”
Result: You find the optimal price ($75) in minutes, not years. Your profit jumps to $30,250/month!

What You’ll Be Able To Do

By the end of this module, you’ll answer questions like:
  • Your Business: What price maximizes YOUR profit?
  • Your Learning: How many hours should YOU study for maximum score?
  • Your ML Models: How should YOU adjust weights to reduce errors?
  • Your Life: What’s YOUR optimal speed to minimize fuel consumption?
Your tool: Derivatives - the mathematical way to find optimal solutions.
Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Basic algebra
You’ll Build: Your own pricing optimizer, learning rate finder, and simple neural network

Your Problem: Finding the Pattern

Let’s model your business mathematically and visualize your pricing landscape. What the plot below shows:
  • The green curve is your profit at different prices
  • Red dots are the prices you tested
  • The gold star is the optimal price ($75)
  • Arrows show which direction the derivative tells you to move
import numpy as np
import matplotlib.pyplot as plt

def profit(price):
    """
    Your profit model:
    - At $30: 1000 customers
    - Lose 10 customers for every $1 price increase
    - Cost per headphone: $20
    """
    customers = 1300 - 10 * price
    profit_per_sale = price - 20  # price minus cost
    return customers * profit_per_sale

# Visualize your pricing landscape
prices = np.linspace(20, 130, 1000)
profits = [profit(p) for p in prices]

plt.figure(figsize=(12, 6))
plt.plot(prices, profits, linewidth=3, color='#10b981', label='Your Profit')

# Mark your experiments
plt.scatter([30, 50, 100], [profit(30), profit(50), profit(100)], 
           s=200, c='red', zorder=5, label='You tested these')

# Mark the optimal
plt.scatter([75], [profit(75)], s=300, c='gold', marker='*', 
           zorder=6, label='Optimal (you\'ll find this!)')

plt.xlabel('Price ($)', fontsize=14)
plt.ylabel('Your Monthly Profit ($)', fontsize=14)
plt.title('Your Pricing Landscape', fontsize=16, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

print("Your experiments:")
print(f"  $30: ${profit(30):,.0f} profit")
print(f"  $50: ${profit(50):,.0f} profit")  
print(f"  $100: ${profit(100):,.0f} profit")
print(f"\nOptimal price: $75 → ${profit(75):,.0f} profit ⭐")
Your Insight: “The graph shows a hill! I need to find the peak. But how?”

Enter: The Derivative (Your Solution)

What You Need to Know

At any price, you need to answer: “If I increase my price by $1, does my profit go up or down?” This is EXACTLY what a derivative tells you!
Derivative = Rate of Change:
def profit(price):
    customers = 1300 - 10 * price
    return (price - 20) * customers

# Your current price
your_price = 50

# "If I increase my price by $1, how much does my profit change?"
small_increase = 1
profit_now = profit(your_price)
profit_after = profit(your_price + small_increase)
change_in_profit = profit_after - profit_now

print(f"At your current price of ${your_price}:")
print(f"  Your profit now: ${profit_now:,.0f}")
print(f"  Your profit at ${your_price + small_increase}: ${profit_after:,.0f}")
print(f"  Change: ${change_in_profit:,.0f}")
print(f"  → Derivative ≈ {change_in_profit}")
print(f"     (your profit changes by ${change_in_profit} per $1 price increase)")

if change_in_profit > 0:
    print(f"\n  ✅ Your profit is INCREASING → You should raise your price!")
elif change_in_profit < 0:
    print(f"\n  ❌ Your profit is DECREASING → You should lower your price!")
else:
    print(f"\n  ⭐ Your profit is at MAXIMUM → You found the perfect price!")
Output:
At your current price of $50:
  Your profit now: $24,000
  Your profit at $51: $24,490
  Change: $490
  → Derivative ≈ 490
     (your profit changes by $490 per $1 price increase)

  ✅ Your profit is INCREASING → You should raise your price!
Your Reaction: “Wow! At $50, I should increase my price. Each dollar increase adds $490 to my profit!”

What Is a Derivative? (The Intuitive Explanation)

Everyday Analogy: Your Car’s Speedometer

Think about driving a car:
  • 🚗 Position = where you are (e.g., mile marker 50)
  • 📊 Speed = how fast your position is changing (e.g., 60 mph)
  • Acceleration = how fast your speed is changing (e.g., +5 mph/second)
The speedometer shows your derivative! It tells you: “Right now, at this exact moment, you’re going 60 mph.” Mathematically:
  • Position = $f(t)$ (a function of time)
  • Speed = $f'(t)$ (the derivative of position)
  • Acceleration = $f''(t)$ (the derivative of the derivative)

Mathematical Definition (Now It Makes Sense!)

Derivative = Rate of change
“If I increase x by a tiny amount, how much does f(x) change?”
Formula: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
In plain English:
  1. Move a tiny bit to the right (x → x+h)
  2. See how much f(x) changed
  3. Divide change in f by change in x
  4. Make h smaller and smaller (approaching zero)

Geometric View: The Tangent Line

Derivative as slope: the derivative at a point = the slope of the tangent line at that point. Why the tangent line?
  • Secant line: connects two points (average rate of change)
  • Tangent line: touches at ONE point (instantaneous rate of change)
  • As points get closer, secant → tangent

Computing a Derivative Numerically

Let’s compute the derivative of $f(x) = x^2$ at $x = 3$:
import numpy as np

def f(x):
    """Our function: f(x) = x²"""
    return x**2

# We want the derivative at x=3
x = 3

# Method 1: Numerical approximation
print("=== Numerical Approximation ===")
for h in [0.1, 0.01, 0.001, 0.0001]:
    # Compute slope of secant line
    df = f(x + h) - f(x)  # Change in f
    dx = h                 # Change in x
    derivative_approx = df / dx
    
    print(f"h = {h:7.4f} → f'(3) ≈ {derivative_approx:.6f}")

print("\n=== Exact Answer ===")
# For f(x) = x², the derivative is f'(x) = 2x
exact_derivative = 2 * x
print(f"f'(3) = 2×3 = {exact_derivative}")

print("\n=== Interpretation ===")
print(f"At x=3, if we increase x by 1, f(x) increases by approximately {exact_derivative}")
print(f"At x=3, the function is rising with a slope of {exact_derivative}")
Output:
=== Numerical Approximation ===
h =  0.1000 → f'(3) ≈ 6.100000
h =  0.0100 → f'(3) ≈ 6.010000
h =  0.0010 → f'(3) ≈ 6.001000
h =  0.0001 → f'(3) ≈ 6.000100

=== Exact Answer ===
f'(3) = 2×3 = 6

=== Interpretation ===
At x=3, if we increase x by 1, f(x) increases by approximately 6
At x=3, the function is rising with a slope of 6
Key Insights: ✅ As h gets smaller, our approximation gets better
✅ The derivative is the instantaneous rate of change
✅ At x=3, the function $x^2$ is rising steeply (slope = 6)
✅ This tells us: small changes in x cause BIG changes in f(x)

Why This Matters for Machine Learning

In ML, we have a loss function $L(w)$, where $w$ = model weights:
# Simplified "neural network": a single weight, predictions = weight * data
import numpy as np

data = np.array([1.0, 2.0, 3.0])         # toy inputs (made up for illustration)
true_values = np.array([2.0, 4.0, 6.0])  # toy targets (the ideal weight is 2.0)

def loss(weight):
    """How wrong our predictions are (mean squared error)"""
    predictions = weight * data
    errors = predictions - true_values
    return np.mean(errors**2)

def compute_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)"""
    return (f(x + h) - f(x - h)) / (2 * h)

# The derivative tells us:
# "If I increase this weight slightly, does loss go up or down?"
weight = 0.5
learning_rate = 0.1
dL_dw = compute_derivative(loss, weight)

if dL_dw > 0:
    # Loss increases when weight increases
    # → Decrease weight to reduce loss!
    weight = weight - learning_rate * dL_dw
else:
    # Loss decreases when weight increases
    # → Increase weight to reduce loss!
    weight = weight - learning_rate * dL_dw

print(f"dL/dw = {dL_dw:.4f} → updated weight = {weight:.4f}")
This is gradient descent - the algorithm that powers ALL of machine learning!

Example 1: Minimizing Business Costs

The Problem

You’re optimizing ad spending. Your cost function is: $C(x) = x^2 - 10x + 100$, where $x$ is ad spend in thousands of dollars. Goal: Find the spending level that minimizes cost.

Step 1: Understand the Function

import numpy as np
import matplotlib.pyplot as plt

def cost(x):
    return x**2 - 10*x + 100

# Visualize
x_values = np.linspace(0, 10, 100)
costs = [cost(x) for x in x_values]

plt.plot(x_values, costs)
plt.xlabel('Ad Spend ($1000s)')
plt.ylabel('Total Cost ($)')
plt.title('Cost vs. Ad Spend')
plt.grid(True)
plt.show()

Step 2: Compute the Derivative

Derivative of $C(x) = x^2 - 10x + 100$: $C'(x) = 2x - 10$
def cost_derivative(x):
    return 2*x - 10

# At x=3
x = 3
slope = cost_derivative(x)
print(f"At x={x}, slope = {slope}")  # -4

# Interpretation:
# Negative slope → cost is decreasing
# We should increase x!

Step 3: Find the Minimum

At the minimum, the derivative = 0 (flat tangent line):
$$C'(x) = 0 \\ 2x - 10 = 0 \\ x = 5$$
# Optimal ad spend
optimal_x = 5
min_cost = cost(optimal_x)

print(f"Optimal ad spend: ${optimal_x},000")
print(f"Minimum cost: ${min_cost}")

# Verify it's a minimum
print(f"Slope at x=4: {cost_derivative(4)}")  # -2 (decreasing)
print(f"Slope at x=5: {cost_derivative(5)}")  # 0 (flat)
print(f"Slope at x=6: {cost_derivative(6)}")  # 2 (increasing)
Key Insight:
  • Derivative < 0 → function decreasing → move right
  • Derivative = 0 → potential minimum/maximum
  • Derivative > 0 → function increasing → move left
Real Application: Google Ads uses derivatives to optimize bidding strategies for millions of advertisers!
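To see these three cases in action, here is a minimal sketch (the starting point and step size are arbitrary illustrative choices) that repeatedly steps against the slope until the cost flattens out:
def cost_derivative(x):
    return 2*x - 10

x = 2.0            # start from any guess
step_size = 0.1    # arbitrary small step
for _ in range(50):
    # negative slope → step right, positive slope → step left
    x = x - step_size * cost_derivative(x)

print(f"Settled near x = {x:.3f}")  # ≈ 5.000, the minimum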

Example 2: Optimizing Student Learning

The Problem

A student’s test score depends on study hours: $S(h) = -h^2 + 12h + 20$, where $h$ is hours studied per day. Question: How many hours should they study to maximize their score?

Understanding the Relationship

import numpy as np
import matplotlib.pyplot as plt

def score(hours):
    return -hours**2 + 12*hours + 20

# Visualize
hours = np.linspace(0, 15, 100)
scores = [score(h) for h in hours]

plt.plot(hours, scores)
plt.xlabel('Study Hours per Day')
plt.ylabel('Test Score')
plt.title('Study Hours vs. Test Score')
plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
plt.grid(True)
plt.show()
Observation: Too few hours → low score. Too many hours → burnout, score decreases!

Finding the Optimal Study Time

Derivative: $S'(h) = -2h + 12$
def score_derivative(h):
    return -2*h + 12

# Find where derivative = 0
# -2h + 12 = 0
# h = 6

optimal_hours = 6
max_score = score(optimal_hours)

print(f"Optimal study time: {optimal_hours} hours/day")
print(f"Maximum score: {max_score}")

# Check the derivative
print(f"\\nAt h=5: slope = {score_derivative(5)}")  # 2 (increasing)
print(f"At h=6: slope = {score_derivative(6)}")  # 0 (maximum!)
print(f"At h=7: slope = {score_derivative(7)}")  # -2 (decreasing)
Interpretation:
  • Before 6 hours: More study → higher score (positive derivative)
  • At 6 hours: Perfect balance (zero derivative)
  • After 6 hours: More study → lower score due to burnout (negative derivative)
Real Application: Khan Academy uses similar models to recommend optimal practice time for students!

Example 3: Tuning Recommendation Systems

The Problem

Netflix wants to tune a recommendation parameter $\alpha$ to minimize prediction error: $E(\alpha) = (\alpha - 0.8)^2 + 0.1$. Goal: Find the $\alpha$ that minimizes error.

Visualizing the Error

import numpy as np
import matplotlib.pyplot as plt

def error(alpha):
    return (alpha - 0.8)**2 + 0.1

# Visualize
alphas = np.linspace(0, 2, 100)
errors = [error(a) for a in alphas]

plt.plot(alphas, errors)
plt.xlabel('Parameter α')
plt.ylabel('Prediction Error')
plt.title('Recommendation Error vs. Parameter')
plt.grid(True)
plt.show()

Finding Optimal Parameter

Derivative: $E'(\alpha) = 2(\alpha - 0.8)$
import matplotlib.pyplot as plt

def error_derivative(alpha):
    return 2*(alpha - 0.8)

# Find minimum: E'(α) = 0
# 2(α - 0.8) = 0
# α = 0.8

optimal_alpha = 0.8
min_error = error(optimal_alpha)

print(f"Optimal α: {optimal_alpha}")
print(f"Minimum error: {min_error}")

# Gradient descent simulation
alpha = 0.2  # Start with bad guess
learning_rate = 0.1
history = [alpha]

for step in range(10):
    gradient = error_derivative(alpha)
    alpha = alpha - learning_rate * gradient
    history.append(alpha)
    print(f"Step {step+1}: α={alpha:.4f}, error={error(alpha):.4f}")

# Visualize convergence
plt.plot(history, marker='o')
plt.xlabel('Step')
plt.ylabel('α value')
plt.title('Gradient Descent Convergence')
plt.axhline(y=0.8, color='r', linestyle='--', label='Optimal')
plt.legend()
plt.grid(True)
plt.show()
Key Insight: This is exactly how machine learning works!
  1. Start with random parameters
  2. Compute derivative (gradient)
  3. Move in opposite direction of gradient
  4. Repeat until convergence
Real Application: Netflix uses gradient descent to tune thousands of parameters in their recommendation system!

Derivative Rules

Now that you understand WHY derivatives matter, here are the rules:

Power Rule

$$\frac{d}{dx}x^n = nx^{n-1}$$
# Examples
# d/dx (x²) = 2x
# d/dx (x³) = 3x²
# d/dx (x⁻¹) = -x⁻²

def power_rule_derivative(n):
    """Returns derivative function for x^n"""
    return lambda x: n * x**(n-1)

# Derivative of x²
f_prime = power_rule_derivative(2)
print(f"d/dx(x²) at x=3: {f_prime(3)}")  # 6

Complete Derivative Rules Reference

Here’s your cheat sheet. Bookmark this page!

Basic Rules

| Rule | Formula | Example |
|------|---------|---------|
| Constant | $\frac{d}{dx}(c) = 0$ | $\frac{d}{dx}(5) = 0$ |
| Power | $\frac{d}{dx}(x^n) = nx^{n-1}$ | $\frac{d}{dx}(x^4) = 4x^3$ |
| Constant Multiple | $\frac{d}{dx}(cf) = c\frac{df}{dx}$ | $\frac{d}{dx}(3x^2) = 6x$ |
| Sum | $\frac{d}{dx}(f+g) = \frac{df}{dx} + \frac{dg}{dx}$ | $\frac{d}{dx}(x^2 + x) = 2x + 1$ |
| Difference | $\frac{d}{dx}(f-g) = \frac{df}{dx} - \frac{dg}{dx}$ | $\frac{d}{dx}(x^3 - x) = 3x^2 - 1$ |

Product & Quotient Rules

| Rule | Formula | Memory Trick |
|------|---------|--------------|
| Product | $(fg)' = f'g + fg'$ | “First times derivative of second, plus second times derivative of first” |
| Quotient | $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ | “Low d-high minus high d-low, over low squared” |

Chain Rule

$$\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$$
Memory trick: “Derivative of outside times derivative of inside”

Common Functions

| Function | Derivative | ML Application |
|----------|------------|----------------|
| $e^x$ | $e^x$ | Softmax, exponential growth |
| $\ln(x)$ | $\frac{1}{x}$ | Log-likelihood, cross-entropy |
| $\sin(x)$ | $\cos(x)$ | Positional encodings |
| $\cos(x)$ | $-\sin(x)$ | Signal processing |
| $\frac{1}{1+e^{-x}}$ (sigmoid) | $\sigma(x)(1-\sigma(x))$ | Activation functions |
| $\max(0, x)$ (ReLU) | $\begin{cases}1 & x > 0\\0 & x \leq 0\end{cases}$ | Neural network activations |
| $\tanh(x)$ | $1 - \tanh^2(x)$ | Activation functions |
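You don’t have to take this table on faith. Here is a quick sketch (checking each entry at the arbitrary point x = 1) that compares the table’s derivative against a central-difference approximation:
import numpy as np

def num_deriv(f, x, h=1e-6):
    """Central-difference approximation of f'(x)"""
    return (f(x + h) - f(x - h)) / (2 * h)

sigmoid = lambda x: 1 / (1 + np.exp(-x))

# (name, function, analytical derivative from the table, evaluated at x = 1)
checks = [
    ("e^x",     np.exp,  np.exp(1.0)),
    ("ln(x)",   np.log,  1.0),
    ("sin(x)",  np.sin,  np.cos(1.0)),
    ("tanh(x)", np.tanh, 1 - np.tanh(1.0)**2),
    ("sigmoid", sigmoid, sigmoid(1.0) * (1 - sigmoid(1.0))),
]
for name, func, expected in checks:
    print(f"{name:8s}: numerical = {num_deriv(func, 1.0):.6f}, table = {expected:.6f}")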

Worked Examples: Applying the Rules

Example 1: Polynomial. $f(x) = 3x^4 - 2x^3 + 5x - 7$. Using the power and sum rules: $f'(x) = 3(4x^3) - 2(3x^2) + 5(1) - 0 = 12x^3 - 6x^2 + 5$
Example 2: Product Rule. $h(x) = x^2 \cdot e^x$. Let $f = x^2$ and $g = e^x$: $h'(x) = (2x)(e^x) + (x^2)(e^x) = e^x(2x + x^2) = e^x \cdot x(x + 2)$
Example 3: Quotient Rule. $q(x) = \frac{x^2}{x + 1}$. Let $f = x^2$ and $g = x + 1$: $q'(x) = \frac{(2x)(x+1) - (x^2)(1)}{(x+1)^2} = \frac{2x^2 + 2x - x^2}{(x+1)^2} = \frac{x^2 + 2x}{(x+1)^2}$
Example 4: Chain Rule. $y = (3x + 1)^5$. Let outer $f(u) = u^5$ and inner $g(x) = 3x + 1$: $y' = 5(3x + 1)^4 \cdot 3 = 15(3x + 1)^4$
import numpy as np

# Verify chain rule example numerically
def y(x):
    return (3*x + 1)**5

def y_prime(x):
    return 15 * (3*x + 1)**4

x = 2
h = 0.0001
numerical = (y(x + h) - y(x)) / h
analytical = y_prime(x)

print(f"Numerical:  {numerical:.2f}")
print(f"Analytical: {analytical}")
# Both should be ≈ 36015

ML-Specific Derivatives You’ll Use Often

Sigmoid Function: $\sigma(x) = \frac{1}{1 + e^{-x}}, \quad \sigma'(x) = \sigma(x)(1 - \sigma(x))$
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# At x=0, sigmoid = 0.5, derivative = 0.25 (maximum!)
print(f"σ(0) = {sigmoid(0)}")         # 0.5
print(f"σ'(0) = {sigmoid_derivative(0)}")  # 0.25
Mean Squared Error Loss: $L = \frac{1}{n}\sum(y_{pred} - y_{true})^2, \quad \frac{\partial L}{\partial y_{pred}} = \frac{2}{n}(y_{pred} - y_{true})$
Cross-Entropy Loss: $L = -\sum y_{true} \log(y_{pred}), \quad \frac{\partial L}{\partial y_{pred}} = -\frac{y_{true}}{y_{pred}}$
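As a sanity check, here is a small sketch (the prediction and target values are made up for illustration) that verifies the MSE gradient formula component by component:
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])   # made-up targets
y_pred = np.array([1.5, 1.8, 3.3])   # made-up predictions
n = len(y_true)

def mse(pred):
    return np.mean((pred - y_true)**2)

# Analytical gradient from the formula above
analytical = (2 / n) * (y_pred - y_true)

# Numerical gradient: perturb one prediction at a time
h = 1e-6
numerical = np.zeros(n)
for i in range(n):
    bumped = y_pred.copy()
    bumped[i] += h
    numerical[i] = (mse(bumped) - mse(y_pred)) / h

print("Analytical:", analytical)
print("Numerical: ", numerical.round(6))  # should match closely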

Constant Rule

$$\frac{d}{dx}c = 0$$
Why? Constants don’t change!

Sum Rule

$$\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$$
# Example: f(x) = x² + 3x + 5
# f'(x) = 2x + 3 + 0 = 2x + 3

def f(x):
    return x**2 + 3*x + 5

def f_derivative(x):
    return 2*x + 3

# Verify numerically
x = 4
h = 0.0001
numerical = (f(x+h) - f(x)) / h
analytical = f_derivative(x)

print(f"Numerical: {numerical:.4f}")
print(f"Analytical: {analytical}")

Product Rule

$$\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)$$
# Example: h(x) = x² · sin(x)
# h'(x) = 2x·sin(x) + x²·cos(x)

import numpy as np

def h(x):
    return x**2 * np.sin(x)

def h_derivative(x):
    return 2*x*np.sin(x) + x**2*np.cos(x)

x = 2
print(f"h'({x}) = {h_derivative(x):.4f}")

Chain Rule (Preview)

$$\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$$
We’ll cover this in depth in Module 3!
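As a quick preview, here is a sketch verifying the chain rule numerically on $y = \sin(x^2)$ (an example chosen just for illustration):
import numpy as np

# y = sin(x²) → outer: sin(u), inner: u = x²
# Chain rule: y' = cos(x²) · 2x
def y(x):
    return np.sin(x**2)

def y_derivative(x):
    return np.cos(x**2) * 2*x

x = 1.5
h = 1e-6
numerical = (y(x + h) - y(x - h)) / (2 * h)
print(f"Numerical:  {numerical:.6f}")
print(f"Analytical: {y_derivative(x):.6f}")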

Higher-Order Derivatives

Second Derivative

The derivative of the derivative! $f''(x) = \frac{d^2}{dx^2}f(x)$
Interpretation: How fast is the rate of change changing?
# Example: f(x) = x³
# f'(x) = 3x²
# f''(x) = 6x

def f(x):
    return x**3

def f_prime(x):
    return 3*x**2

def f_double_prime(x):
    return 6*x

x = 2
print(f"f({x}) = {f(x)}")
print(f"f'({x}) = {f_prime(x)}")  # Rate of change
print(f"f''({x}) = {f_double_prime(x)}")  # Acceleration
Physical Interpretation:
  • $f(x)$ = position
  • $f'(x)$ = velocity (rate of change of position)
  • $f''(x)$ = acceleration (rate of change of velocity)

Concavity

Second derivative tells you about curvature:
  • $f''(x) > 0$ → Concave up (smiling face) → Local minimum
  • $f''(x) < 0$ → Concave down (frowning face) → Local maximum
  • $f''(x) = 0$ → Inflection point
# Cost function: C(x) = x² - 10x + 100
# C'(x) = 2x - 10
# C''(x) = 2

# Since C''(x) = 2 > 0 everywhere, function is always concave up
# So x=5 (where C'(x)=0) is definitely a MINIMUM!

def cost(x):
    return x**2 - 10*x + 100

def cost_second_derivative(x):
    return 2

x = 5
print(f"At x={x}:")
print(f"Second derivative: {cost_second_derivative(x)}")
print(f"→ Concave up → This is a minimum!")

Numerical Derivatives

When you can’t compute derivatives analytically:

Forward Difference

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

Central Difference (More Accurate)

$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$$
def numerical_derivative(f, x, h=1e-5, method='central'):
    """Compute derivative numerically"""
    if method == 'forward':
        return (f(x + h) - f(x)) / h
    elif method == 'central':
        return (f(x + h) - f(x - h)) / (2 * h)
    else:
        raise ValueError("Method must be 'forward' or 'central'")

# Test on f(x) = x²
def f(x):
    return x**2

x = 3
exact = 2*x  # Analytical derivative

forward = numerical_derivative(f, x, method='forward')
central = numerical_derivative(f, x, method='central')

print(f"Exact: {exact}")
print(f"Forward difference: {forward:.6f}")
print(f"Central difference: {central:.6f}")
When to use:
  • Complex functions without closed-form derivatives
  • Debugging analytical derivatives
  • Quick prototyping

Warm-Up Exercise: Profit Maximization

# A company's profit function is:
# P(x) = -2x² + 40x - 100
# where x is production quantity in thousands

# TODO:
# 1. Find the derivative P'(x)
# 2. Find the production quantity that maximizes profit
# 3. What is the maximum profit?
# 4. Verify it's a maximum using the second derivative
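When you’ve tried it yourself, compare against this possible solution sketch:
# Possible solution (check your own work first!)
def P(x):
    return -2*x**2 + 40*x - 100

def P_prime(x):
    """1. P'(x) = -4x + 40"""
    return -4*x + 40

# 2. Set P'(x) = 0: -4x + 40 = 0 → x = 10 (thousand units)
optimal_x = 10

# 3. Maximum profit: P(10) = -200 + 400 - 100 = 100
max_profit = P(optimal_x)

# 4. Second derivative: P''(x) = -4 < 0 → concave down → a maximum
print(f"Optimal quantity: {optimal_x},000 units")
print(f"Maximum profit: {max_profit}")
print(f"P''(x) = -4 < 0, so x = {optimal_x} is indeed a maximum")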

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises connect derivatives to decisions you make every day - from pricing to fitness to driving.

Exercise 1: Uber Surge Pricing 🚕

Uber uses dynamic pricing. When demand is high, prices surge. Model this:
import numpy as np

# Revenue = Price × Rides
# As price increases, rides decrease
# rides(price) = 1000 - 5*price (linear demand)
# revenue(price) = price × (1000 - 5*price)

# Uber's costs are $2 per ride
# profit(price) = revenue - costs

# TODO:
# 1. Write the profit function
# 2. Find the derivative
# 3. Find the optimal surge multiplier
# 4. What happens to optimal price if demand doubles?
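Solution (one possible approach):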
import numpy as np

def rides(price):
    """Number of ride requests at given price"""
    return 1000 - 5 * price

def revenue(price):
    """Total revenue = price × quantity"""
    return price * rides(price)

def profit(price):
    """Profit = revenue - costs ($2 per ride)"""
    return revenue(price) - 2 * rides(price)
    # = price * (1000 - 5*price) - 2 * (1000 - 5*price)
    # = (price - 2) * (1000 - 5*price)
    # = -5*price² + 1010*price - 2000

def profit_derivative(price):
    """d(profit)/d(price) = -10*price + 1010"""
    return -10 * price + 1010

print("🚕 Uber Surge Pricing Optimization")
print("=" * 50)

# Find optimal price (set derivative = 0)
optimal_price = 1010 / 10  # = 101
print(f"\n📊 Normal Demand Scenario:")
print(f"   Optimal price: ${optimal_price:.2f}")
print(f"   Expected rides: {rides(optimal_price):.0f}")
print(f"   Maximum profit: ${profit(optimal_price):,.2f}")

# What if demand doubles?
# rides(price) = 2000 - 5*price
def profit_high_demand(price):
    rides_high = 2000 - 5 * price
    return (price - 2) * rides_high

def profit_derivative_high(price):
    return -10 * price + 2010

optimal_high = 2010 / 10  # = 201
print(f"\n📈 High Demand Scenario (2× demand):")
print(f"   Optimal price: ${optimal_high:.2f}")
print(f"   Profit increase: {profit_high_demand(optimal_high)/profit(optimal_price):.1f}x")

# Verify with numerical check
prices = np.linspace(50, 200, 100)
profits = [profit(p) for p in prices]
numerical_optimal = prices[np.argmax(profits)]
print(f"\n✅ Verification (numerical): ${numerical_optimal:.1f}")
Real-World Insight: This is exactly how Uber’s pricing algorithm works! They continuously estimate demand curves and adjust prices to maximize profit while balancing rider satisfaction.

Exercise 2: Optimal Study Time 📚

You’re studying for an exam. More study time = higher score, but with diminishing returns:
# Score model (realistic diminishing returns):
# score(hours) = 100 × (1 - e^(-0.3 × hours))
# 
# But studying has a cost: fatigue reduces retention
# effective_score(hours) = score(hours) - 2 × hours

# TODO:
# 1. Find the derivative of effective_score
# 2. Find optimal study hours
# 3. What's your expected score?
# 4. Plot the curve to visualize
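Solution (one possible approach):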
import numpy as np

def score(hours):
    """Base score: 100 × (1 - e^(-0.3h))"""
    return 100 * (1 - np.exp(-0.3 * hours))

def fatigue_cost(hours):
    """Fatigue penalty: 2 points per hour"""
    return 2 * hours

def effective_score(hours):
    """Net score after fatigue"""
    return score(hours) - fatigue_cost(hours)

def score_derivative(hours):
    """d(score)/dh = 100 × 0.3 × e^(-0.3h) = 30 × e^(-0.3h)"""
    return 30 * np.exp(-0.3 * hours)

def effective_derivative(hours):
    """d(effective_score)/dh = 30 × e^(-0.3h) - 2"""
    return score_derivative(hours) - 2

print("📚 Optimal Study Time Analysis")
print("=" * 50)

# Find optimal: 30 × e^(-0.3h) - 2 = 0
# e^(-0.3h) = 2/30 = 1/15
# -0.3h = ln(1/15)
# h = -ln(1/15) / 0.3

optimal_hours = -np.log(1/15) / 0.3
print(f"\n🎯 Optimal study time: {optimal_hours:.1f} hours")
print(f"   Base score: {score(optimal_hours):.1f}")
print(f"   Fatigue cost: -{fatigue_cost(optimal_hours):.1f}")
print(f"   Effective score: {effective_score(optimal_hours):.1f}")

# Compare with over-studying
over_study = 15
print(f"\n⚠️  Comparison: Studying {over_study} hours:")
print(f"   Base score: {score(over_study):.1f}")
print(f"   Fatigue cost: -{fatigue_cost(over_study):.1f}")
print(f"   Effective score: {effective_score(over_study):.1f}")
print(f"   You lost {effective_score(optimal_hours) - effective_score(over_study):.1f} points!")

# Diminishing returns table
print("\n📊 Diminishing Returns:")
print("   Hours | Score Gain | Marginal Gain")
print("   ------|------------|-------------")
for h in [0, 2, 4, 6, 8, 10]:
    gain = score(h)
    marginal = score_derivative(h) if h > 0 else 30
    print(f"   {h:5} | {gain:10.1f} | {marginal:13.2f} pts/hr")
Real-World Insight: This “diminishing returns + cost” model applies everywhere: exercise (muscle gains vs. injury risk), marketing (ad spend vs. saturation), even eating (enjoyment vs. fullness)!

Exercise 3: Fuel Efficiency Sweet Spot 🚗

Your car’s fuel consumption depends on speed:
# Fuel consumption (gallons/hour) = 0.001 × speed² + 2
# Distance traveled (miles/hour) = speed
# 
# Fuel efficiency = miles per gallon = distance / fuel
# efficiency(speed) = speed / (0.001 × speed² + 2)

# TODO:
# 1. Find the derivative of efficiency
# 2. Find the speed that maximizes MPG
# 3. What's the maximum MPG?
# 4. Compare efficiency at 55 mph vs 75 mph
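Solution (one possible approach):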
import numpy as np

def fuel_consumption(speed):
    """Gallons per hour at given speed"""
    return 0.001 * speed**2 + 2

def efficiency(speed):
    """Miles per gallon = speed / fuel_per_hour"""
    return speed / fuel_consumption(speed)

def efficiency_derivative(speed):
    """Using quotient rule: d/dx [f/g] = (f'g - fg') / g²"""
    # f = speed, f' = 1
    # g = 0.001*speed² + 2, g' = 0.002*speed
    f = speed
    g = 0.001 * speed**2 + 2
    f_prime = 1
    g_prime = 0.002 * speed
    
    return (f_prime * g - f * g_prime) / g**2

print("🚗 Fuel Efficiency Optimization")
print("=" * 50)

# Find optimal: set derivative = 0
# (1)(0.001*s² + 2) - (s)(0.002*s) = 0
# 0.001*s² + 2 - 0.002*s² = 0
# 2 - 0.001*s² = 0
# s² = 2000
# s = sqrt(2000) ≈ 44.7 mph

optimal_speed = np.sqrt(2000)
print(f"\n🎯 Optimal speed: {optimal_speed:.1f} mph")
print(f"   Maximum efficiency: {efficiency(optimal_speed):.1f} MPG")

# Compare different speeds
print("\n📊 Speed vs Efficiency:")
print("   Speed (mph) | MPG    | Fuel/100mi")
print("   ------------|--------|----------")
for speed in [35, 45, 55, 65, 75, 85]:
    mpg = efficiency(speed)
    fuel_per_100 = 100 / mpg
    marker = " ← optimal" if abs(speed - optimal_speed) < 5 else ""
    print(f"   {speed:11} | {mpg:6.1f} | {fuel_per_100:10.2f} gal{marker}")

# Cost analysis for a 300-mile trip
print("\n💰 Cost Analysis (300-mile trip, $3.50/gal):")
for speed in [45, 55, 75]:
    gallons = 300 / efficiency(speed)
    cost = gallons * 3.50
    time = 300 / speed
    print(f"   {speed} mph: ${cost:.2f} ({time:.1f} hours)")

# Trade-off
print("\n⚡ Time vs Money Trade-off:")
print("   Going 75 vs 55 mph saves 1.3 hours")
print(f"   But costs ${300/efficiency(75)*3.5 - 300/efficiency(55)*3.5:.2f} extra in fuel")
Real-World Insight: This is why highway speed limits and eco-driving recommendations hover around 55-65 mph. Car manufacturers optimize engines for this range. Tesla’s efficiency curves show the same pattern!

Exercise 4: Investment Growth Rate 💹

You’re analyzing compound growth with continuous compounding:
# Investment value: V(t) = P × e^(r×t)
# P = initial principal ($10,000)
# r = annual rate (5% = 0.05)
# t = years

# You want to know:
# 1. How fast is your money growing at year 10?
# 2. How long until your money doubles?
# 3. At what rate does money double in 10 years?
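Solution (one possible approach):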
import numpy as np

def value(t, P=10000, r=0.05):
    """Investment value at time t"""
    return P * np.exp(r * t)

def growth_rate(t, P=10000, r=0.05):
    """d(V)/dt = r × P × e^(r×t) = r × V(t)"""
    return r * value(t, P, r)

print("💹 Investment Growth Analysis")
print("=" * 50)

P = 10000  # Initial investment
r = 0.05  # 5% annual rate

# 1. Growth rate at year 10
t = 10
V_10 = value(t)
rate_10 = growth_rate(t)
print(f"\n📈 After {t} years:")
print(f"   Value: ${V_10:,.2f}")
print(f"   Growing at: ${rate_10:,.2f}/year")
print(f"   Daily growth: ${rate_10/365:,.2f}/day")

# 2. Time to double (doubling time)
# 2P = P × e^(r×t)
# 2 = e^(r×t)
# ln(2) = r×t
# t = ln(2) / r
doubling_time = np.log(2) / r
print(f"\n⏱️ Doubling time at {r*100}%: {doubling_time:.2f} years")
print(f"   (Rule of 72 estimate: {72/5:.1f} years)")

# 3. Rate needed to double in 10 years
# 2 = e^(r×10)
# ln(2) = 10r
# r = ln(2) / 10
target_years = 10
required_rate = np.log(2) / target_years
print(f"\n🎯 To double in {target_years} years:")
print(f"   Required rate: {required_rate*100:.2f}%")

# Comparison table
print("\n📊 Compound Growth Power:")
print("   Years |  5% Rate  |  7% Rate  | 10% Rate")
print("   ------|-----------|-----------|----------")
for years in [5, 10, 20, 30]:
    v5 = value(years, P, 0.05)
    v7 = value(years, P, 0.07)
    v10 = value(years, P, 0.10)
    print(f"   {years:5} | ${v5:9,.0f} | ${v7:9,.0f} | ${v10:9,.0f}")

# Instantaneous vs average growth
print("\n💡 Key Insight:")
print(f"   At year 10, growth rate = r × V(t) = {r} × ${V_10:,.2f}")
print(f"   The derivative tells us: 'Right now, money is growing")
print(f"   at ${rate_10:,.2f}/year' - not the average, but THIS MOMENT!")
Real-World Insight: This is the “magic” of compound interest that Einstein allegedly called the 8th wonder of the world. The derivative shows that growth rate is proportional to current value - the rich get richer mathematically!

Key Takeaways

  • Derivative = rate of change - how output changes with input
  • Geometric view - slope of the tangent line
  • Optimization - set derivative = 0 to find min/max
  • Second derivative - tells you if it’s a min or max
  • ML connection - gradient descent uses derivatives to learn

Common Pitfalls & How to Avoid Them

Mistakes that trip up beginners and even experienced practitioners:
Wrong thinking: “The derivative of $x^2$ at $x=3$ is $x^2 = 9$.”
Correct: The derivative of $x^2$ is $2x$. At $x=3$, the derivative is $2(3) = 6$.
The derivative tells you the slope, not the height!
# Wrong
def wrong_approach(x):
    return x**2  # This is f(x), not f'(x)!

# Correct
def derivative(x):
    return 2*x  # This is f'(x)

print(f"Value at x=3: {3**2}")      # 9
print(f"Derivative at x=3: {2*3}")  # 6 (the slope!)
Wrong: $\frac{d}{dx}(x^2 + 1)^3 = 3(x^2 + 1)^2$
Correct: $\frac{d}{dx}(x^2 + 1)^3 = 3(x^2 + 1)^2 \cdot 2x = 6x(x^2 + 1)^2$
Rule: When there’s a function inside another function, multiply by the derivative of the inner function!
Trap: Using extremely small $h$ values for numerical derivatives.
# Too small h causes numerical errors!
h = 1e-15
numerical_deriv = (f(x + h) - f(x)) / h  # Can give wrong answer!

# Safe range: h between 1e-5 and 1e-8
h = 1e-7
numerical_deriv = (f(x + h) - f(x - h)) / (2 * h)  # Central difference is better
Why? Computers have limited precision (~15-16 decimal digits). Subtracting nearly equal numbers loses precision.
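Here is a tiny demonstration (the exact digits vary by machine, since the effect comes from floating-point rounding):
def f(x):
    return x**2

x = 3.0
for h in [1e-5, 1e-8, 1e-12, 1e-15]:
    approx = (f(x + h) - f(x)) / h
    print(f"h = {h:.0e} → f'(3) ≈ {approx:.6f}  (exact: 6)")
# Near h = 1e-15, f(x+h) - f(x) has lost almost all significant digits,
# so the quotient drifts far from 6.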
Wrong thinking: “f’(x) = 0 means I found the minimum!”
Reality: f’(x) = 0 could be:
  • A minimum (f’’(x) > 0)
  • A maximum (f’’(x) < 0)
  • A saddle point (f’’(x) = 0)
Always check the second derivative or evaluate the function around that point!
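For instance, here is a quick sketch using $f(x) = x^3$ (a standard example) of a point where the derivative is zero but there is no minimum or maximum:
# f(x) = x³: f'(0) = 0, but x = 0 is an inflection point, not a min/max
def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

print(f"f'(0) = {f_prime(0)}")  # 0 → a critical point
print(f"f(-0.1) = {f(-0.1)}, f(0) = {f(0)}, f(0.1) = {f(0.1)}")
# The function keeps increasing through x = 0: no min, no max.
# f''(0) = 6·0 = 0, so the second derivative test is inconclusive here.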

Interview Questions You Should Be Able to Answer

These come up in ML Engineer and Data Scientist interviews at top companies:
| Question | Key Points to Cover |
|----------|---------------------|
| “What is a derivative?” | Rate of change, slope of tangent line, sensitivity of output to input |
| “Why do neural networks need derivatives?” | To know which direction to adjust weights to reduce error |
| “What’s the derivative of sigmoid?” | $\sigma(x)(1-\sigma(x))$, and explain why this matters (vanishing gradients) |
| “Why is ReLU popular?” | Derivative is 0 or 1: no vanishing gradient problem, fast to compute |
| “How would you find the minimum of a function?” | Set derivative to 0, check second derivative, or use gradient descent |
| “What’s the difference between analytical and numerical derivatives?” | Analytical is an exact formula, numerical is an approximation; both have trade-offs |

What’s Next?

You now understand derivatives for single-variable functions. But ML models have MANY variables (thousands or millions!). How do we handle that? Gradients - the multi-variable version of derivatives!

Next: Gradients & Multivariable Calculus

Learn how to optimize functions with many variables