Convex Optimization
Why Some Problems Are “Easy”
In gradient descent, we worry about getting stuck in local minima. But for some problems, every local minimum is the global minimum. These are convex optimization problems, and they’re the foundation of many ML algorithms.
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Gradient Descent, Optimization modules
What You’ll Learn: When optimization is easy, and how to leverage it
What Is Convexity?
Convex Sets
A set $S$ is convex if the line segment between any two points in $S$ lies entirely in $S$.
Convex Functions
A function $f$ is convex if the line segment between any two points on its graph lies on or above the graph. Mathematically, for all $x, y$ in the domain and all $\theta \in [0, 1]$:

$$f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 - \theta) f(y)$$
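If you want to build intuition for this definition, you can probe the inequality numerically. The sketch below (assuming NumPy; the helper name `looks_convex`, the sampling range, and the tolerance are arbitrary choices of this example) samples random point pairs and checks the inequality. It can only refute convexity, never prove it.

```python
import numpy as np

def looks_convex(f, n_trials=10_000, low=-10.0, high=10.0, seed=0):
    """Numerically probe f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    at random points. A violation refutes convexity; no violation is not a proof."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x, y = rng.uniform(low, high, size=2)
        theta = rng.uniform(0.0, 1.0)
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + 1e-9:          # small tolerance for floating-point noise
            return False              # found a violation -> not convex
    return True                       # no violation found

print(looks_convex(lambda x: x**2))       # True  (convex)
print(looks_convex(lambda x: np.sin(x)))  # False (non-convex)
```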
Testing for Convexity
Second Derivative Test
For twice-differentiable functions of one variable, check the sign of the second derivative:

| Function Type | Condition |
|---|---|
| Convex | $f''(x) \ge 0$ everywhere |
| Strictly Convex | $f''(x) > 0$ everywhere |
| Concave | $f''(x) \le 0$ everywhere |
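As a concrete illustration of the test, here is a small symbolic check using SymPy on the softplus function $\log(1 + e^x)$; the choice of function is just an example.

```python
import sympy as sp

x = sp.symbols('x', real=True)
softplus = sp.log(1 + sp.exp(x))                    # a smooth convex function
second_derivative = sp.simplify(sp.diff(softplus, x, 2))
print(second_derivative)
# Simplifies to exp(x)/(1 + exp(x))**2, i.e. sigmoid(x) * (1 - sigmoid(x)),
# which is positive for every x, so softplus passes the test (strictly convex).
```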
Hessian Test (Multivariable)
For functions of several variables, check whether the Hessian $\nabla^2 f(x)$ (the matrix of second partial derivatives) is positive semi-definite at every point, i.e., all of its eigenvalues are non-negative.
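In practice you can probe this numerically. The sketch below is a rough illustration assuming NumPy; the finite-difference helper and the example function are choices of this sketch, not a library API. It estimates the Hessian at a few random points and checks that its eigenvalues are non-negative.

```python
import numpy as np

def numerical_hessian(f, x, eps=1e-4):
    """Finite-difference estimate of the Hessian of f at point x (illustrative, not robust)."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = eps, eps
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i) - f(x + e_j) + f(x)) / eps**2
    return (H + H.T) / 2  # symmetrize to remove numerical asymmetry

def f(v):
    x, y = v
    return x**2 + y**2 + np.exp(x + y)   # convex: a sum of convex functions

# Check positive semi-definiteness at a few random points.
rng = np.random.default_rng(0)
for point in rng.uniform(-2, 2, size=(5, 2)):
    eigvals = np.linalg.eigvalsh(numerical_hessian(f, point))
    print(point, eigvals, bool(np.all(eigvals >= -1e-6)))
```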
Why Convexity Matters
Theorem: Every Local Minimum Is Global
If $f$ is convex, every local minimum of $f$ is also a global minimum. As a consequence, gradient descent (with a suitable step size) cannot get trapped in a suboptimal basin and converges to the global minimum.
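Here is a minimal sketch of what that guarantee looks like in practice, assuming a hand-picked learning rate: gradient descent on the convex function $f(w) = (w - 3)^2$ reaches the same minimizer $w^\ast = 3$ from wildly different starting points.

```python
def grad_descent(grad, w0, lr=0.1, steps=200):
    """Plain gradient descent; returns the final iterate."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# f(w) = (w - 3)^2 is convex; its gradient is 2 * (w - 3).
grad = lambda w: 2.0 * (w - 3.0)

for w0 in [-100.0, 0.0, 3.5, 250.0]:
    print(f"start {w0:>7.1f} -> converged to {grad_descent(grad, w0):.6f}")
```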
Common Convex Functions in ML
Loss Functions
| Loss | Formula | Convexity |
|---|---|---|
| MSE | $(y - \hat{y})^2$ | Convex ✓ |
| MAE | $\lvert y - \hat{y} \rvert$ | Convex ✓ |
| Cross-Entropy | $-[y \log \hat{y} + (1 - y)\log(1 - \hat{y})]$ | Convex ✓ |
| Hinge | $\max(0,\ 1 - y\hat{y})$ | Convex ✓ |
| Huber | Combination: quadratic near zero, linear in the tails | Convex ✓ |
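For concreteness, these losses can be written in a few lines of NumPy. The sketch below assumes labels in $\{0, 1\}$ for cross-entropy and $\{-1, +1\}$ for hinge; the function names and the Huber $\delta$ default are choices of this example.

```python
import numpy as np

def mse(y, y_hat):            # squared error
    return (y - y_hat) ** 2

def mae(y, y_hat):            # absolute error
    return np.abs(y - y_hat)

def cross_entropy(y, p):      # y in {0, 1}, p = predicted probability
    eps = 1e-12               # avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def hinge(y, score):          # y in {-1, +1}, score = raw model output
    return np.maximum(0.0, 1.0 - y * score)

def huber(y, y_hat, delta=1.0):   # quadratic near 0, linear in the tails
    r = np.abs(y - y_hat)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

print(mse(1.0, 0.2), mae(1.0, 0.2), cross_entropy(1, 0.8), hinge(1, 0.3), huber(1.0, 3.5))
```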
Regularization
| Regularizer | Formula | Convexity |
|---|---|---|
| L2 (Ridge) | $\lambda \lVert w \rVert_2^2$ | Strictly Convex ✓ |
| L1 (Lasso) | $\lambda \lVert w \rVert_1$ | Convex ✓ |
| Elastic Net | $\lambda_1 \lVert w \rVert_1 + \lambda_2 \lVert w \rVert_2^2$ | Convex ✓ |
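And a matching sketch of the penalty terms; the strength parameters `lam`, `lam1`, and `lam2` are illustrative defaults.

```python
import numpy as np

def l2_penalty(w, lam=0.1):
    return lam * np.sum(w ** 2)        # strictly convex in w

def l1_penalty(w, lam=0.1):
    return lam * np.sum(np.abs(w))     # convex, but not differentiable at 0

def elastic_net(w, lam1=0.1, lam2=0.1):
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)   # a sum of convex terms is convex

w = np.array([0.5, -1.0, 0.0, 2.0])
print(l2_penalty(w), l1_penalty(w), elastic_net(w))
```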
Linear Regression: A Convex Problem
Linear regression with MSE loss is convex:

$$L(w) = \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 = \frac{1}{n} \lVert y - Xw \rVert_2^2$$

This is a quadratic function in $w$, which is always convex! (Its Hessian is $\frac{2}{n} X^\top X$, which is positive semi-definite.)
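You can verify the claim numerically: for any design matrix $X$, the eigenvalues of $\frac{2}{n} X^\top X$ are non-negative. A quick sketch, assuming NumPy and randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 100, 5
X = rng.normal(size=(n, d))              # any design matrix works

hessian = (2.0 / n) * X.T @ X            # Hessian of the MSE loss in w
eigvals = np.linalg.eigvalsh(hessian)
print(eigvals)                           # all >= 0, so positive semi-definite
print(bool(np.all(eigvals >= -1e-12)))   # True
```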
Strong Convexity: Even Better Guarantees
A function $f$ is strongly convex with parameter $\mu > 0$ if, for all $x$ and $y$:

$$f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2} \lVert y - x \rVert^2$$

Why it matters: strong convexity guarantees a unique minimizer and faster convergence!

| Property | Convex | Strongly Convex |
|---|---|---|
| Unique minimum? | Maybe | Yes! |
| Convergence rate | $O(1/t)$ (sublinear) | $O(c^t)$ with $c < 1$ (exponential!) |
| Effect of L2 regularization | Makes the problem strongly convex | Increases $\mu$ |
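The last row is easy to see empirically. In the sketch below (plain Python; the step size, tolerance, and $\mu = 1$ are arbitrary choices), gradient descent on $f(x) = x^4$, which is convex but not strongly convex, needs far more steps to get close to $x^\ast = 0$ than the same descent on $x^4 + \mu x^2$, which is strongly convex.

```python
def gd_steps_to_tol(grad, x0=1.0, lr=0.1, tol=1e-3, max_steps=5_000_000):
    """Count gradient-descent steps until |x| < tol (the minimizer is x* = 0 for both objectives)."""
    x, steps = x0, 0
    while abs(x) >= tol and steps < max_steps:
        x -= lr * grad(x)
        steps += 1
    return steps

mu = 1.0
plain    = lambda x: 4 * x**3               # gradient of x^4 (convex only)
strongly = lambda x: 4 * x**3 + 2 * mu * x  # gradient of x^4 + mu*x^2 (strongly convex)

print("x^4          :", gd_steps_to_tol(plain), "steps")
print("x^4 + mu*x^2 :", gd_steps_to_tol(strongly), "steps")
```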
Practice Exercises
Exercise 1: Prove Convexity
Problem: Prove that log-loss (binary cross-entropy) is convex.
Hint: Compute the second derivative and show it’s non-negative.
Exercise 2: L1 vs L2
Problem: Why does L1 regularization produce sparse solutions (many zeros) while L2 doesn’t?
Hint: Think about the geometry of the constraint sets and where they intersect the loss contours.
Exercise 3: Making Problems Convex
Problem: The function $f(x)$ is non-convex. Add L2 regularization to make it convex. What’s the minimum regularization strength needed?
Hint: The regularized function is $g(x) = f(x) + \lambda x^2$. Find the smallest $\lambda$ such that $g''(x) \ge 0$ for all $x$.
Summary
| Concept | What It Means | Practical Impact |
|---|---|---|
| Convex Function | Bowl-shaped, no local minima | Gradient descent finds global optimum |
| Convex Set | “Solid” set without holes | Feasible region for optimization |
| Strongly Convex | “Very bowl-shaped” | Faster convergence |
| L2 Regularization | Makes function more convex | Improves optimization |
| Non-Convex | Multiple minima | No global optimum guarantee |
Key Takeaway: Convexity is your friend. When designing ML systems, prefer convex loss functions and regularization. When you must use non-convex models (neural networks), understand that you’re trading optimization guarantees for model flexibility. Good initialization, learning rate schedules, and regularization help navigate non-convex landscapes.