Derivatives & Rates of Change

Your Challenge: The Pricing Problem

You just launched your online store selling wireless headphones. Exciting! But now you face a critical decision: What price should you charge? You experiment with different prices over several weeks:

**Week 1 ($30/pair)**: 1,000 customers bought! But... your profit was only $10,000
- “Great sales, but I’m barely making money after costs ($20/pair)”
**Week 2 ($100/pair)**: Only 200 customers bought. Profit: $16,000
- “Better profit per sale, but I’m losing too many customers!”
**Week 3 ($50/pair)**: 800 customers. Profit: $24,000
- “Getting better… but is this the best I can do?”

Your Question: “There must be a sweet spot - a price that maximizes my profit. But how do I find it without testing every single price?”

The Slow Way (What You’re Doing Now)

You could test 100 different prices, one per week. That would take nearly 2 years and cost you thousands in lost revenue!

The Fast Way (What You’ll Learn)

There’s a better approach: derivatives. Instead of blindly testing prices, derivatives tell you which way to move:

At $30: “Increase price → profit will go UP”
At $75: “Perfect! Any change makes profit go DOWN”
At $100: “Decrease price → profit will go UP”

Result: You find the optimal price ($75) in minutes, not years. Your profit jumps to $30,250/month!

What You’ll Be Able To Do

By the end of this module, you’ll answer questions like:
✅ Your Business: What price maximizes YOUR profit?
✅ Your Learning: How many hours should YOU study for maximum score?
✅ Your ML Models: How should YOU adjust weights to reduce errors?
✅ Your Life: What’s YOUR optimal speed to minimize fuel consumption?

Your tool: Derivatives - the mathematical way to find optimal solutions.

Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Basic algebra
You’ll Build: Your own pricing optimizer, learning rate finder, and simple neural network

Your Problem: Finding the Pattern

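Let’s model your business mathematically and visualize your pricing landscape. The model assumes linear demand fitted to Weeks 1 and 3 (1,000 customers at $30, 10 fewer for every extra $1). At $100 it predicts 300 customers rather than the 200 you observed in Week 2, so treat it as a simplification of your real data.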

What this shows:

The green curve is your profit at different prices
Red dots are the prices you tested
The gold star is the optimal price ($75)
Arrows show which direction the derivative tells you to move

import numpy as np
import matplotlib.pyplot as plt

def profit(price):
    """
    Your profit model:
    - At $30: 1000 customers
    - Lose 10 customers for every $1 price increase
    - Cost per headphone: $20
    """
    customers = 1300 - 10 * price
    profit_per_sale = price - 20  # price minus cost
    return customers * profit_per_sale

# Visualize your pricing landscape
prices = np.linspace(20, 130, 1000)
profits = [profit(p) for p in prices]

plt.figure(figsize=(12, 6))
plt.plot(prices, profits, linewidth=3, color='#10b981', label='Your Profit')

# Mark your experiments
plt.scatter([30, 50, 100], [profit(30), profit(50), profit(100)], 
           s=200, c='red', zorder=5, label='You tested these')

# Mark the optimal
plt.scatter([75], [profit(75)], s=300, c='gold', marker='*', 
           zorder=6, label='Optimal (you\'ll find this!)')

plt.xlabel('Price ($)', fontsize=14)
plt.ylabel('Your Monthly Profit ($)', fontsize=14)
plt.title('Your Pricing Landscape', fontsize=16, fontweight='bold')
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.show()

print("Your experiments:")
print(f"  $30: ${profit(30):,.0f} profit")
print(f"  $50: ${profit(50):,.0f} profit")  
print(f"  $100: ${profit(100):,.0f} profit")
print(f"\nOptimal price: $75 → ${profit(75):,.0f} profit ⭐")

Your Insight: “The graph shows a hill! I need to find the peak. But how?”

Enter: The Derivative (Your Solution)

What You Need to Know

At any price, you need to answer: “If I increase my price by $1, does my profit go up or down?” This is exactly what a derivative tells you.

Derivative = Rate of Change

def profit(price):
    customers = 1300 - 10 * price
    return (price - 20) * customers

# Your current price
your_price = 50

# "If I increase my price by $1, how much does my profit change?"
small_increase = 1
profit_now = profit(your_price)
profit_after = profit(your_price + small_increase)
change_in_profit = profit_after - profit_now

print(f"At your current price of ${your_price}:")
print(f"  Your profit now: ${profit_now:,.0f}")
print(f"  Your profit at ${your_price + small_increase}: ${profit_after:,.0f}")
print(f"  Change: ${change_in_profit:,.0f}")
print(f"  → Derivative ≈ {change_in_profit}")
print(f"     (your profit changes by ${change_in_profit} per $1 price increase)")

if change_in_profit > 0:
    print(f"\n  ✅ Your profit is INCREASING → You should raise your price!")
elif change_in_profit < 0:
    print(f"\n  ❌ Your profit is DECREASING → You should lower your price!")
else:
    print(f"\n  ⭐ Your profit is at MAXIMUM → You found the perfect price!")

Output:

At your current price of $50:
  Your profit now: $24,000
  Your profit at $51: $24,490
  Change: $490
  → Derivative ≈ 490
     (your profit changes by $490 per $1 price increase)

  ✅ Your profit is INCREASING → You should raise your price!

Your Reaction: “Wow! At $50, I should increase my price. Raising it by a dollar adds about $490 to my profit!”
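The $1 step above is a finite difference, so 490 is an estimate rather than the exact derivative. As a minimal sketch, assuming the same linear model, expanding profit = (price - 20)(1300 - 10·price) gives -10·price² + 1500·price - 26000, whose derivative is 1500 - 20·price. The helper name `profit_slope` is introduced here only for illustration.

```python
def profit(price):
    customers = 1300 - 10 * price
    return (price - 20) * customers

def profit_slope(price):
    """Exact derivative of the model: d/dp [-10p^2 + 1500p - 26000] = 1500 - 20p."""
    return 1500 - 20 * price

# Compare the exact slope with the $1-step estimate at several prices
for p in [30, 50, 75, 100]:
    one_dollar_change = profit(p + 1) - profit(p)
    print(f"${p}: exact slope = {profit_slope(p):+d}, $1-step estimate = {one_dollar_change:+d}")
```

The sign of the exact slope matches the advice from the start of the module: positive at $30 and $50, zero at $75, negative at $100. The $1-step estimate at $75 is slightly negative instead of zero, because a whole-dollar step already moves past the peak.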

What Is a Derivative? (The Intuitive Explanation)

Everyday Analogy: Your Car’s Speedometer

Think about driving a car:
🚗 Position = where you are (e.g., mile marker 50)
📊 Speed = how fast your position is changing (e.g., 60 mph)
⚡ Acceleration = how fast your speed is changing (e.g., +5 mph/second)

The speedometer shows the derivative of your position! It tells you: “Right now, at this exact moment, you’re going 60 mph.”

Mathematically:

Position = $f(t)$ (function of time)
Speed = $f'(t)$ (derivative of position)
Acceleration = $f''(t)$ (derivative of derivative)

Mathematical Definition (Now It Makes Sense!)

Derivative = Rate of change

“If I increase x by a tiny amount, how much does f(x) change?”

Formula:

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

In plain English:

Move a tiny bit to the right (x → x+h)
See how much f(x) changed
Divide change in f by change in x
Make h smaller and smaller (approaching zero)
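Applying these four steps by hand to $f(x) = x^2$ shows why the limit settles on a single value. This is a worked derivation using only algebra:

```latex
\begin{aligned}
\frac{f(x+h) - f(x)}{h}
  &= \frac{(x+h)^2 - x^2}{h} \\
  &= \frac{2xh + h^2}{h} \\
  &= 2x + h \;\longrightarrow\; 2x \quad \text{as } h \to 0
\end{aligned}
```

So $f'(x) = 2x$, and at $x = 3$ the slope is 6. The leftover $h$ term is exactly the error you make when you stop at a finite step instead of taking the limit.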

Geometric View: The Tangent Line

The derivative at a point = slope of the tangent line.

Why the tangent line?

Secant line: connects two points (average rate of change)
Tangent line: touches at ONE point (instantaneous rate of change)
As points get closer, secant → tangent

Computing a Derivative Numerically

Let’s compute the derivative of $f(x) = x^2$ at $x = 3$:

import numpy as np

def f(x):
    """Our function: f(x) = x²"""
    return x**2

# We want the derivative at x=3
x = 3

# Method 1: Numerical approximation
print("=== Numerical Approximation ===")
for h in [0.1, 0.01, 0.001, 0.0001]:
    # Compute slope of secant line
    df = f(x + h) - f(x)  # Change in f
    dx = h                 # Change in x
    derivative_approx = df / dx
    
    print(f"h = {h:7.4f} → f'(3) ≈ {derivative_approx:.6f}")

print("\n=== Exact Answer ===")
# For f(x) = x², the derivative is f'(x) = 2x
exact_derivative = 2 * x
print(f"f'(3) = 2×3 = {exact_derivative}")

print("\n=== Interpretation ===")
print(f"At x=3, a small increase h in x raises f(x) by approximately {exact_derivative}·h")
print(f"At x=3, the function is rising with a slope of {exact_derivative}")

Output:

=== Numerical Approximation ===
h =  0.1000 → f'(3) ≈ 6.100000
h =  0.0100 → f'(3) ≈ 6.010000
h =  0.0010 → f'(3) ≈ 6.001000
h =  0.0001 → f'(3) ≈ 6.000100

=== Exact Answer ===
f'(3) = 2×3 = 6

=== Interpretation ===
At x=3, a small increase h in x raises f(x) by approximately 6·h
At x=3, the function is rising with a slope of 6

Key Insights:
✅ As h gets smaller, our approximation gets better
✅ The derivative is the instantaneous rate of change
✅ At x=3, the function $x^2$ is rising steeply (slope = 6)
✅ This tells us: near x=3, a small change in x changes f(x) by about 6 times as much

Why This Matters for Machine Learning

In ML, we have a loss function $L(w)$, where $w$ = model weights:

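import numpy as np

# Toy data (illustrative): the true relationship is y = 2x
data = np.array([1.0, 2.0, 3.0, 4.0])
true_values = 2.0 * data

# Simplified neural network: a single weight
def loss(weight):
    """How wrong our predictions are (mean squared error)"""
    predictions = weight * data
    errors = predictions - true_values
    return np.mean(errors**2)

def compute_derivative(f, w, h=1e-5):
    """Central-difference estimate of f'(w)"""
    return (f(w + h) - f(w - h)) / (2 * h)

weight = 0.5          # initial guess
learning_rate = 0.01

# The derivative tells us:
# "If I increase this weight slightly, does loss go up or down?"
dL_dw = compute_derivative(loss, weight)

# One update rule covers both cases:
#   dL_dw > 0 → loss rises as weight rises → the step decreases weight
#   dL_dw < 0 → loss falls as weight rises → the step increases weight
weight = weight - learning_rate * dL_dw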

This is gradient descent - the algorithm used to train most modern machine learning models, including neural networks!
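To make the update rule concrete, here is a minimal sketch of the full loop on a one-weight model. The toy data (`data`, `true_values`), the starting weight, and the learning rate are illustrative assumptions, and the gradient is written out analytically for this particular mean-squared-error loss.

```python
import numpy as np

# Illustrative toy data: the true relationship is y = 2x
data = np.array([1.0, 2.0, 3.0, 4.0])
true_values = 2.0 * data

def loss(weight):
    """Mean squared error of the one-weight model y = weight * x."""
    return np.mean((weight * data - true_values) ** 2)

def loss_derivative(weight):
    """dL/dw = mean(2 * (w*x - y) * x), from the chain rule."""
    return np.mean(2 * (weight * data - true_values) * data)

weight = 0.0          # assumed starting guess
learning_rate = 0.05  # assumed step size

# Repeat the same update: step against the sign of the derivative
for step in range(20):
    weight -= learning_rate * loss_derivative(weight)

print(f"Learned weight: {weight:.4f}, loss: {loss(weight):.6f}")
```

With these settings the weight moves toward 2, the value that generated the data. Real models repeat the same step for every weight at once.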

Example 1: Minimizing Business Costs

The Problem

You’re optimizing ad spending. Your cost function is:

C(x) = x^2 - 10x + 100

where $x$ is ad spend in thousands of dollars.

Goal: Find the spending level that minimizes cost.

Step 1: Understand the Function

def cost(x):
    return x**2 - 10*x + 100

# Visualize
x_values = np.linspace(0, 10, 100)
costs = [cost(x) for x in x_values]

plt.plot(x_values, costs)
plt.xlabel('Ad Spend ($1000s)')
plt.ylabel('Total Cost ($)')
plt.title('Cost vs. Ad Spend')
plt.grid(True)
plt.show()

Step 2: Compute the Derivative

Derivative of $C(x) = x^2 - 10x + 100$ :

C'(x) = 2x - 10

def cost_derivative(x):
    return 2*x - 10

# At x=3
x = 3
slope = cost_derivative(x)
print(f"At x={x}, slope = {slope}")  # -4

# Interpretation:
# Negative slope → cost is decreasing
# We should increase x!

Step 3: Find the Minimum

At the minimum, the derivative = 0 (flat tangent line)

C'(x) = 0 \implies 2x - 10 = 0 \implies x = 5

# Optimal ad spend
optimal_x = 5
min_cost = cost(optimal_x)

print(f"Optimal ad spend: ${optimal_x},000")
print(f"Minimum cost: ${min_cost}")

# Verify it's a minimum
print(f"Slope at x=4: {cost_derivative(4)}")  # -2 (decreasing)
print(f"Slope at x=5: {cost_derivative(5)}")  # 0 (flat)
print(f"Slope at x=6: {cost_derivative(6)}")  # 2 (increasing)

Key Insight:

Derivative < 0 → function decreasing → move right to reduce cost
Derivative = 0 → potential minimum/maximum
Derivative > 0 → function increasing → move left to reduce cost

Real Application: Ad platforms such as Google Ads apply the same idea at scale, using derivative-based optimization to tune bidding strategies.

Example 2: Optimizing Student Learning

The Problem

A student’s test score depends on study hours:

S(h) = -h^2 + 12h + 20

where $h$ is hours studied per day.

Question: How many hours should they study to maximize their score?

Understanding the Relationship

def score(hours):
    return -hours**2 + 12*hours + 20

# Visualize
hours = np.linspace(0, 15, 100)
scores = [score(h) for h in hours]

plt.plot(hours, scores)
plt.xlabel('Study Hours per Day')
plt.ylabel('Test Score')
plt.title('Study Hours vs. Test Score')
plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
plt.grid(True)
plt.show()

Observation: Too few hours → low score. Too many hours → burnout, score decreases!

Finding the Optimal Study Time

Derivative:

S'(h) = -2h + 12

def score_derivative(h):
    return -2*h + 12

# Find where derivative = 0
# -2h + 12 = 0
# h = 6

optimal_hours = 6
max_score = score(optimal_hours)

print(f"Optimal study time: {optimal_hours} hours/day")
print(f"Maximum score: {max_score}")

# Check the derivative
print(f"\nAt h=5: slope = {score_derivative(5)}")  # 2 (increasing)
print(f"At h=6: slope = {score_derivative(6)}")  # 0 (maximum!)
print(f"At h=7: slope = {score_derivative(7)}")  # -2 (decreasing)

Interpretation:

Before 6 hours: More study → higher score (positive derivative)
At 6 hours: Perfect balance (zero derivative)
After 6 hours: More study → lower score due to burnout (negative derivative)

Real Application: Learning platforms such as Khan Academy can use similar models when recommending how much practice time students need.

Example 3: Tuning Recommendation Systems

The Problem

Netflix wants to tune a recommendation parameter $\alpha$ to minimize prediction error:

E(\alpha) = (\alpha - 0.8)^2 + 0.1

Goal: Find the $\alpha$ that minimizes error.

Visualizing the Error

def error(alpha):
    return (alpha - 0.8)**2 + 0.1

# Visualize
alphas = np.linspace(0, 2, 100)
errors = [error(a) for a in alphas]

plt.plot(alphas, errors)
plt.xlabel('Parameter α')
plt.ylabel('Prediction Error')
plt.title('Recommendation Error vs. Parameter')
plt.grid(True)
plt.show()

Finding Optimal Parameter

Derivative:

E'(\alpha) = 2(\alpha - 0.8)

def error_derivative(alpha):
    return 2*(alpha - 0.8)

# Find minimum: E'(α) = 0
# 2(α - 0.8) = 0
# α = 0.8

optimal_alpha = 0.8
min_error = error(optimal_alpha)

print(f"Optimal α: {optimal_alpha}")
print(f"Minimum error: {min_error}")

# Gradient descent simulation
alpha = 0.2  # Start with bad guess
learning_rate = 0.1
history = [alpha]

for step in range(10):
    gradient = error_derivative(alpha)
    alpha = alpha - learning_rate * gradient
    history.append(alpha)
    print(f"Step {step+1}: α={alpha:.4f}, error={error(alpha):.4f}")

# Visualize convergence
plt.plot(history, marker='o')
plt.xlabel('Step')
plt.ylabel('α value')
plt.title('Gradient Descent Convergence')
plt.axhline(y=0.8, color='r', linestyle='--', label='Optimal')
plt.legend()
plt.grid(True)
plt.show()

Key Insight: This is the core loop of gradient-based machine learning:

Start with random parameters
Compute derivative (gradient)
Move in opposite direction of gradient
Repeat until convergence
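How far you move on each step matters as much as the direction. As a hedged sketch using the same error function $E(\alpha) = (\alpha - 0.8)^2 + 0.1$, the helper `run_descent` (a name introduced here for illustration) runs the loop with several learning rates, chosen arbitrarily to show slow, fast, and divergent behavior.

```python
def error_derivative(alpha):
    return 2 * (alpha - 0.8)

def run_descent(learning_rate, start=0.2, steps=25):
    """Run gradient descent on E(alpha) and return the final alpha."""
    alpha = start
    for _ in range(steps):
        alpha -= learning_rate * error_derivative(alpha)
    return alpha

for lr in [0.01, 0.1, 0.5, 1.1]:
    final = run_descent(lr)
    print(f"learning_rate={lr:<4} → α after 25 steps = {final:.4f}")
```

For this quadratic, each step multiplies the distance to 0.8 by (1 - 2·learning_rate). The loop therefore converges only when the learning rate is between 0 and 1; with 1.1 every step overshoots and lands farther away than it started.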

Real Application: Recommendation systems like Netflix’s are trained with gradient descent, tuning far more than one parameter at a time.

Derivative Rules

Now that you understand WHY derivatives matter, here are the rules:

Power Rule

\frac{d}{dx}x^n = nx^{n-1}

# Examples
# d/dx (x²) = 2x
# d/dx (x³) = 3x²
# d/dx (x⁻¹) = -x⁻²

def power_rule_derivative(n):
    """Returns derivative function for x^n"""
    return lambda x: n * x**(n-1)

# Derivative of x²
f_prime = power_rule_derivative(2)
print(f"d/dx(x²) at x=3: {f_prime(3)}")  # 6

Complete Derivative Rules Reference

Here’s your cheat sheet. Bookmark this page!

Basic Rules

Rule	Formula	Example
Constant	$\frac{d}{dx}(c) = 0$	$\frac{d}{dx}(5) = 0$
Power	$\frac{d}{dx}(x^n) = nx^{n-1}$	$\frac{d}{dx}(x^4) = 4x^3$
Constant Multiple	$\frac{d}{dx}(cf) = c\frac{df}{dx}$	$\frac{d}{dx}(3x^2) = 6x$
Sum	$\frac{d}{dx}(f+g) = \frac{df}{dx} + \frac{dg}{dx}$	$\frac{d}{dx}(x^2 + x) = 2x + 1$
Difference	$\frac{d}{dx}(f-g) = \frac{df}{dx} - \frac{dg}{dx}$	$\frac{d}{dx}(x^3 - x) = 3x^2 - 1$

Product & Quotient Rules

Rule	Formula	Memory Trick
Product	$(fg)' = f'g + fg'$	“First times derivative of second, plus second times derivative of first”
Quotient	$\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$	“Low d-high minus high d-low, over low squared”

Chain Rule

\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)

Memory trick: “Derivative of outside times derivative of inside”

Common Functions

Function	Derivative	ML Application
$e^x$	$e^x$	Softmax, exponential growth
$\ln(x)$	$\frac{1}{x}$	Log-likelihood, cross-entropy
$\sin(x)$	$\cos(x)$	Positional encodings
$\cos(x)$	$-\sin(x)$	Signal processing
$\frac{1}{1+e^{-x}}$ (sigmoid)	$\sigma(x)(1-\sigma(x))$	Activation functions
$\max(0, x)$ (ReLU)	$\begin{cases}1 & x > 0\\0 & x \leq 0\end{cases}$	Neural network activations
$\tanh(x)$	$1 - \tanh^2(x)$	Activation functions
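A quick way to build trust in a table like this is to compare each entry with a numerical estimate. The sketch below does that with a central difference; the test point `x = 0.7` is an arbitrary choice (positive so that $\ln(x)$ is defined), and ReLU is left out because it is not differentiable at 0.

```python
import numpy as np

def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# (name, function, derivative claimed in the table)
table = [
    ("exp",     np.exp,  np.exp),
    ("ln",      np.log,  lambda x: 1 / x),
    ("sin",     np.sin,  np.cos),
    ("cos",     np.cos,  lambda x: -np.sin(x)),
    ("sigmoid", sigmoid, lambda x: sigmoid(x) * (1 - sigmoid(x))),
    ("tanh",    np.tanh, lambda x: 1 - np.tanh(x) ** 2),
]

x = 0.7  # arbitrary test point
for name, f, f_prime in table:
    numeric = central_difference(f, x)
    print(f"{name:8s} table: {f_prime(x):.6f}   numeric: {numeric:.6f}")
```

The two columns should agree to about six decimal places. A mismatch would point to a wrong formula or a typo in the implementation.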

Worked Examples: Applying the Rules

Example 1: Polynomial

f(x) = 3x^4 - 2x^3 + 5x - 7

Using power rule and sum rule:

f'(x) = 3(4x^3) - 2(3x^2) + 5(1) - 0 = 12x^3 - 6x^2 + 5

Example 2: Product Rule

h(x) = x^2 \cdot e^x

Let $f = x^2$ and $g = e^x$:

h'(x) = (2x)(e^x) + (x^2)(e^x) = e^x(2x + x^2) = e^x \cdot x(x + 2)

Example 3: Quotient Rule

q(x) = \frac{x^2}{x + 1}

Let $f = x^2$ and $g = x + 1$:

q'(x) = \frac{(2x)(x+1) - (x^2)(1)}{(x+1)^2} = \frac{2x^2 + 2x - x^2}{(x+1)^2} = \frac{x^2 + 2x}{(x+1)^2}
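The product and quotient results from Examples 2 and 3 can be checked the same way the chain rule is checked later in this section. This is a minimal sketch: the function names (`h_func`, `q_func`, and so on) and the test point are illustrative choices.

```python
import numpy as np

def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example 2: h(x) = x^2 * e^x, with h'(x) = e^x * x * (x + 2)
def h_func(x):
    return x**2 * np.exp(x)

def h_prime(x):
    return np.exp(x) * x * (x + 2)

# Example 3: q(x) = x^2 / (x + 1), with q'(x) = (x^2 + 2x) / (x + 1)^2
def q_func(x):
    return x**2 / (x + 1)

def q_prime(x):
    return (x**2 + 2 * x) / (x + 1) ** 2

x = 1.5  # arbitrary test point
print(f"Product:  analytical {h_prime(x):.6f}, numerical {central_difference(h_func, x):.6f}")
print(f"Quotient: analytical {q_prime(x):.6f}, numerical {central_difference(q_func, x):.6f}")
```

If the analytical and numerical values disagree beyond the last few digits, recheck which term received the derivative in each rule.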

Example 4: Chain Rule

y = (3x + 1)^5

Let the outer function be $f(u) = u^5$ and the inner function be $g(x) = 3x + 1$:

y' = 5(3x + 1)^4 \cdot 3 = 15(3x + 1)^4

import numpy as np

# Verify chain rule example numerically
def y(x):
    return (3*x + 1)**5

def y_prime(x):
    return 15 * (3*x + 1)**4

x = 2
h = 0.0001
numerical = (y(x + h) - y(x)) / h
analytical = y_prime(x)

print(f"Numerical:  {numerical:.2f}")
print(f"Analytical: {analytical}")
# Analytical is exactly 15 × 7⁴ = 36015; the forward-difference
# estimate lands close to it (roughly 36018 with h = 0.0001)

ML-Specific Derivatives You’ll Use Often

Sigmoid Function:

\sigma(x) = \frac{1}{1 + e^{-x}}, \quad \sigma'(x) = \sigma(x)(1 - \sigma(x))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# At x=0, sigmoid = 0.5, derivative = 0.25 (maximum!)
print(f"σ(0) = {sigmoid(0)}")         # 0.5
print(f"σ'(0) = {sigmoid_derivative(0)}")  # 0.25

Mean Squared Error Loss:

L = \frac{1}{n}\sum(y_{pred} - y_{true})^2, \quad \frac{\partial L}{\partial y_{pred}} = \frac{2}{n}(y_{pred} - y_{true})

Cross-Entropy Loss:

L = -\sum y_{true} \log(y_{pred}), \quad \frac{\partial L}{\partial y_{pred}} = -\frac{y_{true}}{y_{pred}}
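Both loss gradients can be checked against finite differences, which is a common way to debug hand-written gradients. The sketch below assumes made-up values for `y_true` and `y_pred` and treats each entry of `y_pred` as an independent variable (no softmax constraint); `numerical_grad` is a helper introduced here for illustration.

```python
import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_pred, y_true):
    return 2 * (y_pred - y_true) / len(y_pred)

def cross_entropy(y_pred, y_true):
    return -np.sum(y_true * np.log(y_pred))

def cross_entropy_grad(y_pred, y_true):
    return -y_true / y_pred

def numerical_grad(loss_fn, y_pred, y_true, h=1e-6):
    """Central difference, nudging one component of y_pred at a time."""
    grad = np.zeros_like(y_pred)
    for i in range(len(y_pred)):
        step = np.zeros_like(y_pred)
        step[i] = h
        grad[i] = (loss_fn(y_pred + step, y_true)
                   - loss_fn(y_pred - step, y_true)) / (2 * h)
    return grad

# Made-up example values
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.1])

print("MSE analytic: ", mse_grad(y_pred, y_true))
print("MSE numerical:", numerical_grad(mse, y_pred, y_true))
print("CE analytic:  ", cross_entropy_grad(y_pred, y_true))
print("CE numerical: ", numerical_grad(cross_entropy, y_pred, y_true))
```

The cross-entropy gradient is nonzero only where `y_true` is nonzero, which is why only the true class contributes when labels are one-hot.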

Constant Rule

\frac{d}{dx}c = 0

Why? Constants don’t change!

Sum Rule

\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)

# Example: f(x) = x² + 3x + 5
# f'(x) = 2x + 3 + 0 = 2x + 3

def f(x):
    return x**2 + 3*x + 5

def f_derivative(x):
    return 2*x + 3

# Verify numerically
x = 4
h = 0.0001
numerical = (f(x+h) - f(x)) / h
analytical = f_derivative(x)

print(f"Numerical: {numerical:.4f}")
print(f"Analytical: {analytical}")

Product Rule

\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)

# Example: h(x) = x² · sin(x)
# h'(x) = 2x·sin(x) + x²·cos(x)

import numpy as np

def h(x):
    return x**2 * np.sin(x)

def h_derivative(x):
    return 2*x*np.sin(x) + x**2*np.cos(x)

x = 2
print(f"h'({x}) = {h_derivative(x):.4f}")

Chain Rule (Preview)

\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)

We’ll cover this in depth in Module 3!

Higher-Order Derivatives

Second Derivative

The derivative of the derivative!

f''(x) = \frac{d^2}{dx^2}f(x)

Interpretation: How fast is the rate of change changing?

# Example: f(x) = x³
# f'(x) = 3x²
# f''(x) = 6x

def f(x):
    return x**3

def f_prime(x):
    return 3*x**2

def f_double_prime(x):
    return 6*x

x = 2
print(f"f({x}) = {f(x)}")
print(f"f'({x}) = {f_prime(x)}")  # Rate of change
print(f"f''({x}) = {f_double_prime(x)}")  # Acceleration

Physical Interpretation:

$f(x)$ = position
$f'(x)$ = velocity (rate of change of position)
$f''(x)$ = acceleration (rate of change of velocity)

Concavity

Second derivative tells you about curvature:

$f''(x) > 0$ → Concave up (smiling face) → a point where $f'(x) = 0$ is a local minimum
$f''(x) < 0$ → Concave down (frowning face) → a point where $f'(x) = 0$ is a local maximum
$f''(x) = 0$ → Possible inflection point (the test is inconclusive)
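When a second derivative is awkward to work out by hand, it can be estimated numerically. The sketch below uses the standard central second-difference formula $f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}$ on the cost and score functions from the earlier examples; the step size is an assumed default.

```python
def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def cost(x):
    return x**2 - 10 * x + 100            # concave up everywhere

def score(hours):
    return -hours**2 + 12 * hours + 20    # concave down everywhere

print(f"cost''(5)  ≈ {second_derivative(cost, 5):.3f}")
print(f"score''(6) ≈ {second_derivative(score, 6):.3f}")
```

A positive estimate at $x = 5$ supports the minimum found for the cost function, and a negative estimate at $h = 6$ supports the maximum found for the study score.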

# Cost function: C(x) = x² - 10x + 100
# C'(x) = 2x - 10
# C''(x) = 2

# Since C''(x) = 2 > 0 everywhere, function is always concave up
# So x=5 (where C'(x)=0) is definitely a MINIMUM!

def cost(x):
    return x**2 - 10*x + 100

def cost_second_derivative(x):
    return 2

x = 5
print(f"At x={x}:")
print(f"Second derivative: {cost_second_derivative(x)}")
print(f"→ Concave up → This is a minimum!")

Numerical Derivatives

When you can’t compute derivatives analytically:

Forward Difference

f'(x) \approx \frac{f(x+h) - f(x)}{h}

Central Difference (More Accurate)

f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}

def numerical_derivative(f, x, h=1e-5, method='central'):
    """Compute derivative numerically"""
    if method == 'forward':
        return (f(x + h) - f(x)) / h
    elif method == 'central':
        return (f(x + h) - f(x - h)) / (2 * h)
    else:
        raise ValueError("Method must be 'forward' or 'central'")

# Test on f(x) = x²
def f(x):
    return x**2

x = 3
exact = 2*x  # Analytical derivative

forward = numerical_derivative(f, x, method='forward')
central = numerical_derivative(f, x, method='central')

print(f"Exact: {exact}")
print(f"Forward difference: {forward:.6f}")
print(f"Central difference: {central:.6f}")

When to use:

Complex functions without closed-form derivatives
Debugging analytical derivatives
Quick prototyping

Practice Exercises

Warm-Up Exercise: Profit Maximization

# A company's profit function is:
# P(x) = -2x² + 40x - 100
# where x is production quantity in thousands

# TODO:
# 1. Find the derivative P'(x)
# 2. Find the production quantity that maximizes profit
# 3. What is the maximum profit?
# 4. Verify it's a maximum using the second derivative
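One possible solution sketch, in case you want to check your work. It applies the power rule by hand and uses plain Python to evaluate the result; the function names are illustrative.

```python
def P(x):
    return -2 * x**2 + 40 * x - 100

def P_prime(x):
    return -4 * x + 40      # power rule, term by term

def P_double_prime(x):
    return -4               # constant, negative everywhere

# P'(x) = 0  →  -4x + 40 = 0  →  x = 10
optimal_x = 40 / 4

print(f"Optimal production: {optimal_x:.0f} thousand units")
print(f"Maximum profit: {P(optimal_x):.0f}")
print(f"P''({optimal_x:.0f}) = {P_double_prime(optimal_x)} → concave down → maximum")
```

Because the second derivative is negative for every x, the single critical point is the global maximum of this profit function.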

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises connect derivatives to decisions you make every day - from pricing to fitness to driving.

Exercise 1: Uber Surge Pricing 🚕

Uber uses dynamic pricing. When demand is high, prices surge. Model this:

import numpy as np

# Revenue = Price × Rides
# As price increases, rides decrease
# rides(price) = 1000 - 5*price (linear demand)
# revenue(price) = price × (1000 - 5*price)

# Uber's costs are $2 per ride
# profit(price) = revenue - costs

# TODO:
# 1. Write the profit function
# 2. Find the derivative
# 3. Find the optimal surge multiplier
# 4. What happens to optimal price if demand doubles?

💡 Solution

import numpy as np

def rides(price):
    """Number of ride requests at given price"""
    return 1000 - 5 * price

def revenue(price):
    """Total revenue = price × quantity"""
    return price * rides(price)

def profit(price):
    """Profit = revenue - costs ($2 per ride)"""
    return revenue(price) - 2 * rides(price)
    # = price * (1000 - 5*price) - 2 * (1000 - 5*price)
    # = (price - 2) * (1000 - 5*price)
    # = -5*price² + 1010*price - 2000

def profit_derivative(price):
    """d(profit)/d(price) = -10*price + 1010"""
    return -10 * price + 1010

print("🚕 Uber Surge Pricing Optimization")
print("=" * 50)

# Find optimal price (set derivative = 0)
optimal_price = 1010 / 10  # = 101
print(f"\n📊 Normal Demand Scenario:")
print(f"   Optimal price: ${optimal_price:.2f}")
print(f"   Expected rides: {rides(optimal_price):.0f}")
print(f"   Maximum profit: ${profit(optimal_price):,.2f}")

# What if base demand doubles?
# rides(price) = 2000 - 5*price  (intercept doubles; price sensitivity unchanged)
def profit_high_demand(price):
    rides_high = 2000 - 5 * price
    return (price - 2) * rides_high

def profit_derivative_high(price):
    return -10 * price + 2010

optimal_high = 2010 / 10  # = 201
print(f"\n📈 High Demand Scenario (2× demand):")
print(f"   Optimal price: ${optimal_high:.2f}")
print(f"   Profit increase: {profit_high_demand(optimal_high)/profit(optimal_price):.1f}x")

# Verify with numerical check
prices = np.linspace(50, 200, 100)
profits = [profit(p) for p in prices]
numerical_optimal = prices[np.argmax(profits)]
print(f"\n✅ Verification (numerical): ${numerical_optimal:.1f}")

Real-World Insight: Dynamic pricing systems like Uber’s are built on the same idea, though the real algorithms are far more complex: they continuously estimate demand curves and adjust prices, balancing profit against rider satisfaction.

Exercise 2: Optimal Study Time 📚

You’re studying for an exam. More study time = higher score, but with diminishing returns:

# Score model (realistic diminishing returns):
# score(hours) = 100 × (1 - e^(-0.3 × hours))
# 
# But studying has a cost: fatigue reduces retention
# effective_score(hours) = score(hours) - 2 × hours

# TODO:
# 1. Find the derivative of effective_score
# 2. Find optimal study hours
# 3. What's your expected score?
# 4. Plot the curve to visualize

💡 Solution

import numpy as np

def score(hours):
    """Base score: 100 × (1 - e^(-0.3h))"""
    return 100 * (1 - np.exp(-0.3 * hours))

def fatigue_cost(hours):
    """Fatigue penalty: 2 points per hour"""
    return 2 * hours

def effective_score(hours):
    """Net score after fatigue"""
    return score(hours) - fatigue_cost(hours)

def score_derivative(hours):
    """d(score)/dh = 100 × 0.3 × e^(-0.3h) = 30 × e^(-0.3h)"""
    return 30 * np.exp(-0.3 * hours)

def effective_derivative(hours):
    """d(effective_score)/dh = 30 × e^(-0.3h) - 2"""
    return score_derivative(hours) - 2

print("📚 Optimal Study Time Analysis")
print("=" * 50)

# Find optimal: 30 × e^(-0.3h) - 2 = 0
# e^(-0.3h) = 2/30 = 1/15
# -0.3h = ln(1/15)
# h = -ln(1/15) / 0.3

optimal_hours = -np.log(1/15) / 0.3
print(f"\n🎯 Optimal study time: {optimal_hours:.1f} hours")
print(f"   Base score: {score(optimal_hours):.1f}")
print(f"   Fatigue cost: -{fatigue_cost(optimal_hours):.1f}")
print(f"   Effective score: {effective_score(optimal_hours):.1f}")

# Compare with over-studying
over_study = 15
print(f"\n⚠️  Comparison: Studying {over_study} hours:")
print(f"   Base score: {score(over_study):.1f}")
print(f"   Fatigue cost: -{fatigue_cost(over_study):.1f}")
print(f"   Effective score: {effective_score(over_study):.1f}")
print(f"   You lost {effective_score(optimal_hours) - effective_score(over_study):.1f} points!")

# Diminishing returns table
print("\n📊 Diminishing Returns:")
print("   Hours | Score Gain | Marginal Gain")
print("   ------|------------|-------------")
for h in [0, 2, 4, 6, 8, 10]:
    gain = score(h)
    marginal = score_derivative(h) if h > 0 else 30
    print(f"   {h:5} | {gain:10.1f} | {marginal:13.2f} pts/hr")

Real-World Insight: This “diminishing returns + cost” model applies everywhere: exercise (muscle gains vs. injury risk), marketing (ad spend vs. saturation), even eating (enjoyment vs. fullness)!

Exercise 3: Fuel Efficiency Sweet Spot 🚗

Your car’s fuel consumption depends on speed:

# Fuel consumption (gallons/hour) = 0.001 × speed² + 2
# Distance traveled (miles/hour) = speed
# 
# Fuel efficiency = miles per gallon = distance / fuel
# efficiency(speed) = speed / (0.001 × speed² + 2)

# TODO:
# 1. Find the derivative of efficiency
# 2. Find the speed that maximizes MPG
# 3. What's the maximum MPG?
# 4. Compare efficiency at 55 mph vs 75 mph

💡 Solution

import numpy as np

def fuel_consumption(speed):
    """Gallons per hour at given speed"""
    return 0.001 * speed**2 + 2

def efficiency(speed):
    """Miles per gallon = speed / fuel_per_hour"""
    return speed / fuel_consumption(speed)

def efficiency_derivative(speed):
    """Using quotient rule: d/dx [f/g] = (f'g - fg') / g²"""
    # f = speed, f' = 1
    # g = 0.001*speed² + 2, g' = 0.002*speed
    f = speed
    g = 0.001 * speed**2 + 2
    f_prime = 1
    g_prime = 0.002 * speed
    
    return (f_prime * g - f * g_prime) / g**2

print("🚗 Fuel Efficiency Optimization")
print("=" * 50)

# Find optimal: set derivative = 0
# (1)(0.001*s² + 2) - (s)(0.002*s) = 0
# 0.001*s² + 2 - 0.002*s² = 0
# 2 - 0.001*s² = 0
# s² = 2000
# s = sqrt(2000) ≈ 44.7 mph

optimal_speed = np.sqrt(2000)
print(f"\n🎯 Optimal speed: {optimal_speed:.1f} mph")
print(f"   Maximum efficiency: {efficiency(optimal_speed):.1f} MPG")

# Compare different speeds
print("\n📊 Speed vs Efficiency:")
print("   Speed (mph) | MPG    | Fuel/100mi")
print("   ------------|--------|----------")
for speed in [35, 45, 55, 65, 75, 85]:
    mpg = efficiency(speed)
    fuel_per_100 = 100 / mpg
    marker = " ← optimal" if abs(speed - optimal_speed) < 5 else ""
    print(f"   {speed:11} | {mpg:6.1f} | {fuel_per_100:10.2f} gal{marker}")

# Cost analysis for a 300-mile trip
print("\n💰 Cost Analysis (300-mile trip, $3.50/gal):")
for speed in [45, 55, 75]:
    gallons = 300 / efficiency(speed)
    cost = gallons * 3.50
    time = 300 / speed
    print(f"   {speed} mph: ${cost:.2f} ({time:.1f} hours)")

# Trade-off
print("\n⚡ Time vs Money Trade-off:")
print(f"   Going 75 vs 55 mph saves {300/55 - 300/75:.1f} hours")
print(f"   But costs ${300/efficiency(75)*3.5 - 300/efficiency(55)*3.5:.2f} extra in fuel")

Real-World Insight: Real cars show the same shape: fuel economy peaks at a moderate speed and falls off as speed rises, which is why eco-driving advice favors steady, moderate highway speeds. The 44.7 mph optimum here comes from this toy model’s coefficients, not from measured data.

Exercise 4: Investment Growth Rate 💹

You’re analyzing compound growth with continuous compounding:

# Investment value: V(t) = P × e^(r×t)
# P = initial principal ($10,000)
# r = annual rate (5% = 0.05)
# t = years

# You want to know:
# 1. How fast is your money growing at year 10?
# 2. How long until your money doubles?
# 3. At what rate does money double in 10 years?

💡 Solution

import numpy as np

def value(t, P=10000, r=0.05):
    """Investment value at time t"""
    return P * np.exp(r * t)

def growth_rate(t, P=10000, r=0.05):
    """d(V)/dt = r × P × e^(r×t) = r × V(t)"""
    return r * value(t, P, r)

print("💹 Investment Growth Analysis")
print("=" * 50)

P = 10000  # Initial investment
r = 0.05  # 5% annual rate

# 1. Growth rate at year 10
t = 10
V_10 = value(t)
rate_10 = growth_rate(t)
print(f"\n📈 After {t} years:")
print(f"   Value: ${V_10:,.2f}")
print(f"   Growing at: ${rate_10:,.2f}/year")
print(f"   Daily growth: ${rate_10/365:,.2f}/day")

# 2. Time to double (doubling time)
# 2P = P × e^(r×t)
# 2 = e^(r×t)
# ln(2) = r×t
# t = ln(2) / r
doubling_time = np.log(2) / r
print(f"\n⏱️ Doubling time at {r*100}%: {doubling_time:.2f} years")
print(f"   (Rule of 72 estimate: {72/5:.1f} years)")

# 3. Rate needed to double in 10 years
# 2 = e^(r×10)
# ln(2) = 10r
# r = ln(2) / 10
target_years = 10
required_rate = np.log(2) / target_years
print(f"\n🎯 To double in {target_years} years:")
print(f"   Required rate: {required_rate*100:.2f}%")

# Comparison table
print("\n📊 Compound Growth Power:")
print("   Years |  5% Rate  |  7% Rate  | 10% Rate")
print("   ------|-----------|-----------|----------")
for years in [5, 10, 20, 30]:
    v5 = value(years, P, 0.05)
    v7 = value(years, P, 0.07)
    v10 = value(years, P, 0.10)
    print(f"   {years:5} | ${v5:9,.0f} | ${v7:9,.0f} | ${v10:9,.0f}")

# Instantaneous vs average growth
print("\n💡 Key Insight:")
print(f"   At year 10, growth rate = r × V(t) = {r} × ${V_10:,.2f}")
print(f"   The derivative tells us: 'Right now, money is growing")
print(f"   at ${rate_10:,.2f}/year' - not the average, but THIS MOMENT!")

Real-World Insight: This is the “magic” of compound interest that Einstein allegedly called the 8th wonder of the world. The derivative shows that growth rate is proportional to current value - the rich get richer mathematically!

Key Takeaways

✅ Derivative = rate of change - How output changes with input
✅ Geometric view - Slope of tangent line
✅ Optimization - Set derivative = 0 to find min/max
✅ Second derivative - Tells you if it’s min or max
✅ ML connection - Gradient descent uses derivatives to learn

Common Pitfalls & How to Avoid Them

Mistakes that trip up beginners and even experienced practitioners:

❌ Confusing Derivative with Function Value

Wrong thinking: “The derivative of $x^2$ at $x=3$ is $x^2 = 9$.”

Correct: The derivative of $x^2$ is $2x$. At $x=3$, the derivative is $2(3) = 6$. The derivative tells you the slope, not the height!

# Wrong
def wrong_approach(x):
    return x**2  # This is f(x), not f'(x)!

# Correct
def derivative(x):
    return 2*x  # This is f'(x)

print(f"Value at x=3: {3**2}")      # 9
print(f"Derivative at x=3: {2*3}")  # 6 (the slope!)

❌ Forgetting the Chain Rule

Wrong:

\frac{d}{dx}(x^2 + 1)^3 = 3(x^2 + 1)^2

Correct:

\frac{d}{dx}(x^2 + 1)^3 = 3(x^2 + 1)^2 \cdot 2x = 6x(x^2 + 1)^2

Rule: When there’s a function inside another function, multiply by the derivative of the inner function!
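A numerical check catches this mistake quickly. The sketch below compares both formulas against a central-difference estimate at an arbitrary test point; the function names are illustrative.

```python
def f(x):
    return (x**2 + 1) ** 3

def wrong_derivative(x):
    return 3 * (x**2 + 1) ** 2          # forgot the inner derivative

def correct_derivative(x):
    return 6 * x * (x**2 + 1) ** 2      # includes the inner factor 2x

x = 2.0
h = 1e-6
numerical = (f(x + h) - f(x - h)) / (2 * h)

print(f"Numerical estimate: {numerical:.4f}")
print(f"Without chain rule: {wrong_derivative(x):.4f}")
print(f"With chain rule:    {correct_derivative(x):.4f}")
```

At $x = 2$ the version without the chain rule gives 75, while the correct formula gives 300, matching the numerical estimate.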

❌ Numerical Instability with Small h

Trap: Using extremely small $h$ values for numerical derivatives.

# Too small h causes numerical errors!
h = 1e-15
numerical_deriv = (f(x + h) - f(x)) / h  # Can give wrong answer!

# Safe range: h between 1e-5 and 1e-8
h = 1e-7
numerical_deriv = (f(x + h) - f(x - h)) / (2 * h)  # Central difference is better

Why? Computers have limited precision (~15-16 decimal digits). Subtracting nearly equal numbers loses precision.
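To see the effect directly, the sketch below sweeps $h$ across many orders of magnitude for $f(x) = x^2$ at $x = 3$ and records the error of each method. The list of exponents is an arbitrary choice, and the exact error values depend on your platform’s floating-point arithmetic.

```python
def f(x):
    return x**2

x = 3.0
exact = 2 * x   # f'(x) = 2x

errors = {}     # exponent -> (forward error, central error)
for exponent in [1, 3, 5, 7, 9, 11, 13, 15]:
    h = 10.0 ** -exponent
    forward = (f(x + h) - f(x)) / h
    central = (f(x + h) - f(x - h)) / (2 * h)
    errors[exponent] = (abs(forward - exact), abs(central - exact))

print("h      forward error   central error")
for exponent, (fwd_err, cen_err) in errors.items():
    print(f"1e-{exponent:02d}  {fwd_err:.2e}        {cen_err:.2e}")
```

The forward-difference error first shrinks as $h$ gets smaller, then grows again once round-off dominates. By $h = 10^{-15}$ the estimate is no longer close to 6.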

❌ Assuming Derivative Zero = Minimum

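Wrong thinking: “f'(x) = 0 means I found the minimum!”

Reality: f'(x) = 0 could be:

Minimum (f''(x) > 0)
Maximum (f''(x) < 0)
Inconclusive (f''(x) = 0): often a flat inflection point, as with x³ at x = 0, but possibly still a minimum or maximum, as with x⁴ at x = 0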

Always check the second derivative or evaluate the function around that point!
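The “evaluate the function around that point” option can be sketched in a few lines. The helper `classify_critical_point` is a name introduced here for illustration. It assumes you already know that $f'(x) = 0$ at the point you pass in, and it only inspects one small step to each side.

```python
def classify_critical_point(f, x, h=1e-3):
    """Classify a point where f'(x) = 0 by comparing f just left and right of it."""
    left, here, right = f(x - h), f(x), f(x + h)
    if left > here and right > here:
        return "minimum"
    if left < here and right < here:
        return "maximum"
    return "neither (flat inflection point)"

# All four functions have f'(0) = 0
examples = {
    "x^2":  lambda x: x**2,
    "-x^2": lambda x: -x**2,
    "x^3":  lambda x: x**3,
    "x^4":  lambda x: x**4,
}

for name, f in examples.items():
    print(f"{name:5s} at x=0 → {classify_critical_point(f, 0.0)}")
```

Note that $x^4$ is classified as a minimum even though its second derivative at 0 is zero, which is why a zero second derivative should be read as “inconclusive” rather than as a verdict.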

Interview Questions You Should Be Able to Answer

These come up in ML Engineer and Data Scientist interviews at top companies:

Question	Key Points to Cover
“What is a derivative?”	Rate of change, slope of tangent line, sensitivity of output to input
“Why do neural networks need derivatives?”	To know which direction to adjust weights to reduce error
“What’s the derivative of sigmoid?”	$\sigma(x)(1-\sigma(x))$, and explain why this matters: it never exceeds 0.25, which contributes to vanishing gradients
“Why is ReLU popular?”	Derivative is 0 or 1, so active units pass gradients through unscaled (this mitigates vanishing gradients), and it is fast to compute
“How would you find the minimum of a function?”	Set derivative to 0, check second derivative, or use gradient descent
“What’s the difference between analytical and numerical derivatives?”	Analytical is an exact formula; numerical is an approximation. Both have trade-offs
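For the sigmoid and ReLU questions, it helps to have the numbers in your head. The sketch below prints both derivatives at a few sample inputs; the function names and sample points are illustrative, and treating the ReLU derivative at exactly x = 0 as 0 is a common convention rather than a mathematical fact.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x)); peaks at 0.25 when x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_derivative(x):
    """1 for positive inputs, 0 otherwise (the value at x = 0 is a convention)."""
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_derivative(x):.6f}  relu'={relu_derivative(x):.1f}")
```

The sigmoid derivative shrinks toward zero for large positive or negative inputs, so multiplying many of them together across layers makes gradients vanish. The ReLU derivative stays at 1 for any positive input.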

What’s Next?

You now understand derivatives for single-variable functions. But ML models have MANY variables (thousands or millions!). How do we handle that? Gradients - the multi-variable version of derivatives!

Next: Gradients & Multivariable Calculus

Learn how to optimize functions with many variables
