Linear Systems & Applications
The Problem Behind Every ML Model
Every machine learning model eventually boils down to solving a system of linear equations .
When you train a linear regression, you’re solving:
w₁ * x₁₁ + w₂ * x₁₂ + ... + wₙ * x₁ₙ = y₁
w₁ * x₂₁ + w₂ * x₂₂ + ... + wₙ * x₂ₙ = y₂
...
w₁ * xₘ₁ + w₂ * xₘ₂ + ... + wₙ * xₘₙ = yₘ
Finding the best weights w w w means solving this system!
Why This Matters : Understanding how to solve linear systems is crucial for:
Implementing ML algorithms from scratch
Debugging numerical issues in training
Optimizing performance of your models
Understanding when models will fail
Estimated Time : 3-4 hours
Difficulty : Intermediate
Prerequisites : Matrices module
What You’ll Build : Equation solver, network flow optimizer, least squares regression
The Classic Problem: Three Equations, Three Unknowns
A Real Scenario: Pricing Mystery
You’re a detective investigating suspicious pricing at three stores. Each store sells the same three products (apples, bananas, oranges) but only shows the total bill:
Store Apples Bought Bananas Bought Oranges Bought Total Bill Store A 2 3 1 $8.50 Store B 1 2 3 $9.00 Store C 3 1 2 $8.00
Question : What is the price of each fruit?
Setting Up the System
Let’s call:
a a a = price of an apple
b b b = price of a banana
o o o = price of an orange
We get three equations:
2 a + 3 b + 1 o = 8.50 1 a + 2 b + 3 o = 9.00 3 a + 1 b + 2 o = 8.00 \begin{align}
2a + 3b + 1o &= 8.50 \\
1a + 2b + 3o &= 9.00 \\
3a + 1b + 2o &= 8.00
\end{align} 2 a + 3 b + 1 o 1 a + 2 b + 3 o 3 a + 1 b + 2 o = 8.50 = 9.00 = 8.00
In matrix form: A x = b A\mathbf{x} = \mathbf{b} A x = b
[ 2 3 1 1 2 3 3 1 2 ] [ a b o ] = [ 8.50 9.00 8.00 ] \begin{bmatrix}2 & 3 & 1\\1 & 2 & 3\\3 & 1 & 2\end{bmatrix}
\begin{bmatrix}a\\b\\o\end{bmatrix} =
\begin{bmatrix}8.50\\9.00\\8.00\end{bmatrix} 2 1 3 3 2 1 1 3 2 a b o = 8.50 9.00 8.00
Method 1: Gaussian Elimination (The Classic)
This is the method you (probably) learned in high school, but let’s see it with fresh eyes.
The Idea: Simplify Step by Step
Transform the system into a simpler form where the solution becomes obvious.
import numpy as np
def gaussian_elimination ( A , b ):
"""
Solve Ax = b using Gaussian elimination with partial pivoting.
This is the 'textbook' method - great for understanding!
"""
n = len (b)
# Create augmented matrix [A | b]
Ab = np.column_stack([A.astype( float ), b.astype( float )])
# Forward elimination
for col in range (n):
# Find the row with largest absolute value in current column (pivoting)
max_row = col + np.argmax(np.abs(Ab[col:, col]))
# Swap rows if needed
Ab[[col, max_row]] = Ab[[max_row, col]]
# Check for zero pivot (singular matrix)
if np.abs(Ab[col, col]) < 1e-10 :
raise ValueError ( "Matrix is singular or nearly singular!" )
# Eliminate entries below the pivot
for row in range (col + 1 , n):
factor = Ab[row, col] / Ab[col, col]
Ab[row, col:] -= factor * Ab[col, col:]
print ( f " \n After eliminating column { col + 1 } :" )
print (Ab.round( 3 ))
# Back substitution
x = np.zeros(n)
for i in range (n - 1 , - 1 , - 1 ):
x[i] = (Ab[i, - 1 ] - np.dot(Ab[i, i + 1 :n], x[i + 1 :])) / Ab[i, i]
return x
# Our pricing mystery
A = np.array([
[ 2 , 3 , 1 ],
[ 1 , 2 , 3 ],
[ 3 , 1 , 2 ]
])
b = np.array([ 8.50 , 9.00 , 8.00 ])
print ( "Solving the pricing mystery..." )
print ( "Original system:" )
print ( f "2a + 3b + 1o = 8.50" )
print ( f "1a + 2b + 3o = 9.00" )
print ( f "3a + 1b + 2o = 8.00" )
prices = gaussian_elimination(A, b)
print ( f " \n Apple price: $ { prices[ 0 ] :.2f} " )
print ( f "Banana price: $ { prices[ 1 ] :.2f} " )
print ( f "Orange price: $ { prices[ 2 ] :.2f} " )
# Verify
print ( f " \n Verification:" )
for i, (coeffs, total) in enumerate ( zip (A, b)):
calculated = np.dot(coeffs, prices)
print ( f "Store { chr ( 65 + i) } : { calculated :.2f} == { total } (verified)" )
Output:
Apple price: $1.00
Banana price: $1.50
Orange price: $2.00
Method 2: LU Decomposition (The Efficient Way)
The Insight: Factor Once, Solve Many
What if you need to solve A x = b A\mathbf{x} = \mathbf{b} A x = b for many different b \mathbf{b} b values?
Gaussian elimination from scratch each time is wasteful. Instead, we factor A A A once:
A = L U A = LU A = LU
Where:
L L L = Lower triangular matrix (all zeros above diagonal)
U U U = Upper triangular matrix (all zeros below diagonal)
Then solving A x = b A\mathbf{x} = \mathbf{b} A x = b becomes:
Solve L y = b L\mathbf{y} = \mathbf{b} L y = b for y \mathbf{y} y (forward substitution - easy!)
Solve U x = y U\mathbf{x} = \mathbf{y} U x = y for x \mathbf{x} x (back substitution - easy!)
from scipy.linalg import lu, lu_factor, lu_solve
# Factor A once
A = np.array([
[ 2 , 3 , 1 ],
[ 1 , 2 , 3 ],
[ 3 , 1 , 2 ]
], dtype = float )
# Get L, U decomposition
P, L, U = lu(A)
print ( "Original matrix A:" )
print (A)
print ( " \n L (lower triangular):" )
print (L.round( 3 ))
print ( " \n U (upper triangular):" )
print (U.round( 3 ))
print ( " \n Verify: P @ L @ U = A" )
print ((P @ L @ U).round( 3 ))
# Now solve for multiple b vectors efficiently
lu_factored = lu_factor(A)
# Different scenarios (different total bills)
scenarios = [
[ 8.50 , 9.00 , 8.00 ], # Original
[ 10.00 , 11.00 , 9.50 ], # Higher prices?
[ 6.00 , 7.00 , 6.50 ], # Discount day
]
for i, b in enumerate (scenarios):
x = lu_solve(lu_factored, b)
print ( f " \n Scenario { i + 1 } : Bills = { b } " )
print ( f " Prices: Apple=$ { x[ 0 ] :.2f} , Banana=$ { x[ 1 ] :.2f} , Orange=$ { x[ 2 ] :.2f} " )
Performance : LU factorization is O ( n 3 ) O(n^3) O ( n 3 ) once, then each solve is only O ( n 2 ) O(n^2) O ( n 2 ) . For large systems solved repeatedly, this is a huge win!
The Least Squares Problem: When There’s No Exact Solution
Real-World Data is Messy
In ML, we rarely have an exact solution. We have:
More equations than unknowns (overdetermined system)
Noisy measurements
Contradictory data points
Example : Fitting a line through 100 points
import matplotlib.pyplot as plt
# Generate noisy data
np.random.seed( 42 )
n_points = 100
# True relationship: y = 2x + 1 + noise
x_data = np.linspace( 0 , 10 , n_points)
y_true = 2 * x_data + 1
y_noisy = y_true + np.random.normal( 0 , 2 , n_points)
# We want to find: y = w₀ + w₁*x
# This gives us 100 equations, 2 unknowns!
# Set up the system: X @ w = y
X = np.column_stack([np.ones(n_points), x_data]) # [1, x] for each point
print ( f "X shape: { X.shape } " ) # (100, 2)
print ( f "y shape: { y_noisy.shape } " ) # (100,)
print ( f " \n We have 100 equations but only 2 unknowns!" )
print ( "There's no exact solution - we need LEAST SQUARES" )
The Normal Equations
The least squares solution minimizes the squared error:
min w ∥ A w − b ∥ 2 \min_{\mathbf{w}} \|A\mathbf{w} - \mathbf{b}\|^2 w min ∥ A w − b ∥ 2
The solution is:
w = ( A T A ) − 1 A T b \mathbf{w} = (A^T A)^{-1} A^T \mathbf{b} w = ( A T A ) − 1 A T b
# Method 1: Normal equations (direct formula)
def least_squares_normal ( X , y ):
"""Solve using the normal equations."""
return np.linalg.inv(X.T @ X) @ X.T @ y
# Method 2: Using NumPy's lstsq (more numerically stable)
def least_squares_numpy ( X , y ):
"""Solve using NumPy's least squares."""
w, residuals, rank, s = np.linalg.lstsq(X, y, rcond = None )
return w
# Solve both ways
w_normal = least_squares_normal(X, y_noisy)
w_numpy = least_squares_numpy(X, y_noisy)
print ( f "Normal equations: w₀ = { w_normal[ 0 ] :.3f} , w₁ = { w_normal[ 1 ] :.3f} " )
print ( f "NumPy lstsq: w₀ = { w_numpy[ 0 ] :.3f} , w₁ = { w_numpy[ 1 ] :.3f} " )
print ( f "True values: w₀ = 1.000, w₁ = 2.000" )
# Visualize
plt.figure( figsize = ( 10 , 6 ))
plt.scatter(x_data, y_noisy, alpha = 0.5 , label = 'Noisy data' )
plt.plot(x_data, y_true, 'g--' , linewidth = 2 , label = 'True line (y = 2x + 1)' )
plt.plot(x_data, X @ w_numpy, 'r-' , linewidth = 2 ,
label = f 'Fitted line (y = { w_numpy[ 0 ] :.2f} + { w_numpy[ 1 ] :.2f} x)' )
plt.xlabel( 'x' )
plt.ylabel( 'y' )
plt.title( 'Least Squares: Finding the Best Fit Line' )
plt.legend()
plt.grid( True , alpha = 0.3 )
plt.show()
Application 1: Network Flow Analysis
The Problem
You’re managing a water distribution network. Water flows from sources through pipes to destinations. You need to find the flow in each pipe.
# Network structure:
#
# Source A ----[p1]----> Node 1 ----[p3]----> Destination X
# |
# Source B ----[p2]--------+-----[p4]----> Destination Y
#
# Conservation law: Flow in = Flow out at each node
# Node 1 receives from p1 and p2, sends to p3 and p4
# Source A supplies 10 units
# Source B supplies 15 units
# Destination X receives 12 units
# Destination Y receives 13 units
# Variables: flow in pipes p1, p2, p3, p4
# Equations (conservation at each node):
# At Node 1: p1 + p2 = p3 + p4 => p1 + p2 - p3 - p4 = 0
# At Source A: p1 = 10
# At Source B: p2 = 15
# At Dest X: p3 = 12
# At Dest Y: p4 = 13
A_flow = np.array([
[ 1 , 1 , - 1 , - 1 ], # Conservation at Node 1
[ 1 , 0 , 0 , 0 ], # Source A
[ 0 , 1 , 0 , 0 ], # Source B
[ 0 , 0 , 1 , 0 ], # Dest X
])
b_flow = np.array([ 0 , 10 , 15 , 12 ])
flows = np.linalg.solve(A_flow, b_flow)
print ( "Network Flow Solution:" )
print ( f " Pipe 1 (A → Node 1): { flows[ 0 ] :.1f} units" )
print ( f " Pipe 2 (B → Node 1): { flows[ 1 ] :.1f} units" )
print ( f " Pipe 3 (Node 1 → X): { flows[ 2 ] :.1f} units" )
print ( f " Pipe 4 (Node 1 → Y): { flows[ 3 ] :.1f} units" )
# Check conservation at Node 1
print ( f " \n Conservation check at Node 1:" )
print ( f " In: { flows[ 0 ] + flows[ 1 ] :.1f} " )
print ( f " Out: { flows[ 2 ] + flows[ 3 ] :.1f} " )
Application 2: Electrical Circuit Analysis
Kirchhoff’s Laws as Linear Systems
# Circuit with 3 resistors and 2 voltage sources
#
# +--[R1=2Ω]--+--[R2=3Ω]--+
# | | |
# [V1=5V] | [V2=3V]
# | | |
# +---[R3=4Ω]-+-----------+
#
# Using Kirchhoff's laws, we can set up a system for currents
# Let I1 = current through R1, I2 = through R2, I3 = through R3
# Voltage law (loop 1): V1 = R1*I1 + R3*I3 => 2*I1 + 4*I3 = 5
# Voltage law (loop 2): V2 = R2*I2 + R3*I3 => 3*I2 + 4*I3 = 3
# Current law (node): I1 = I2 + I3 => I1 - I2 - I3 = 0
R1, R2, R3 = 2 , 3 , 4
V1, V2 = 5 , 3
A_circuit = np.array([
[R1, 0 , R3], # Loop 1: V1 = R1*I1 + R3*I3
[ 0 , R2, R3], # Loop 2: V2 = R2*I2 + R3*I3
[ 1 , - 1 , - 1 ], # Node: I1 = I2 + I3
])
b_circuit = np.array([V1, V2, 0 ])
currents = np.linalg.solve(A_circuit, b_circuit)
print ( "Circuit Analysis:" )
print ( f " I₁ (through R1): { currents[ 0 ] :.3f} A" )
print ( f " I₂ (through R2): { currents[ 1 ] :.3f} A" )
print ( f " I₃ (through R3): { currents[ 2 ] :.3f} A" )
# Power dissipation
power = [R1 * currents[ 0 ] ** 2 , R2 * currents[ 1 ] ** 2 , R3 * currents[ 2 ] ** 2 ]
print ( f " \n Power dissipation:" )
print ( f " R1: { power[ 0 ] :.3f} W" )
print ( f " R2: { power[ 1 ] :.3f} W" )
print ( f " R3: { power[ 2 ] :.3f} W" )
print ( f " Total: { sum (power) :.3f} W" )
When Systems Fail: Singular Matrices
Not all systems have solutions. Understanding when and why is crucial.
Case 1: No Solution (Inconsistent System)
# Three parallel lines - no intersection!
A_nosol = np.array([
[ 1 , 1 ],
[ 1 , 1 ],
[ 1 , 1 ]
])
b_nosol = np.array([ 2 , 3 , 4 ])
print ( "Trying to solve an inconsistent system..." )
try :
x = np.linalg.lstsq(A_nosol, b_nosol, rcond = None )[ 0 ]
print ( f "Least squares 'solution': { x } " )
print ( "(This minimizes error but doesn't actually solve the system)" )
except :
print ( "No solution exists!" )
Case 2: Infinite Solutions (Underdetermined)
# One equation, two unknowns
A_inf = np.array([[ 1 , 2 ]])
b_inf = np.array([ 5 ])
print ( "Underdetermined system: x + 2y = 5" )
print ( "Infinitely many solutions: x = 5 - 2y for any y" )
# NumPy gives the minimum norm solution
x_min = np.linalg.lstsq(A_inf, b_inf, rcond = None )[ 0 ]
print ( f "Minimum norm solution: x = { x_min[ 0 ] :.2f} , y = { x_min[ 1 ] :.2f} " )
Case 3: Numerical Instability (Ill-Conditioned)
# The Hilbert matrix is notoriously ill-conditioned
def hilbert ( n ):
"""Generate n×n Hilbert matrix."""
return np.array([[ 1 / (i + j + 1 ) for j in range (n)] for i in range (n)])
for n in [ 5 , 10 , 15 ]:
H = hilbert(n)
cond = np.linalg.cond(H)
print ( f "Hilbert matrix { n } × { n } : condition number = { cond :.2e} " )
print ( " \n High condition number = numerically unstable!" )
print ( "Small errors in input cause huge errors in output." )
Condition Number : A measure of how sensitive the solution is to small changes in input.
Condition number ≈ 1: Well-conditioned, stable
Condition number > 10⁶: Ill-conditioned, results may be unreliable
Condition number = ∞: Singular, no unique solution
The Connection to ML
Linear Regression IS Solving a Linear System
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# Generate a regression problem
X, y = make_regression( n_samples = 100 , n_features = 5 , noise = 10 , random_state = 42 )
# Method 1: sklearn
model = LinearRegression()
model.fit(X, y)
# Method 2: Normal equations (what sklearn does internally)
X_aug = np.column_stack([np.ones( len (X)), X]) # Add bias column
w_direct = np.linalg.lstsq(X_aug, y, rcond = None )[ 0 ]
print ( "Coefficients comparison:" )
print ( f "sklearn: intercept = { model.intercept_ :.3f} " )
print ( f " weights = { model.coef_ } " )
print ( f "Direct: intercept = { w_direct[ 0 ] :.3f} " )
print ( f " weights = { w_direct[ 1 :] } " )
Practice Exercises
Problem : Three intersections have the following traffic counts (cars/hour):
Into intersection A: 100 from north, 80 from east
Out of A to B: x
Out of A to C: y
Into B from A: x, out to east: 90
Into C from A: y, out to south: 90
Find x and y (flows between intersections).
Exercise 2: Chemical Balancing
Problem : Balance this chemical equation by finding a, b, c, d:Set up as a linear system where atoms are conserved.
Exercise 3: Portfolio Optimization
Problem : You have $10,000 to invest in three stocks. Based on historical data:
Stock A: Expected return 8%, risk factor 2
Stock B: Expected return 12%, risk factor 5
Stock C: Expected return 6%, risk factor 1
Find the investment in each to achieve:
Total investment: $10,000
Expected return: 9%
Total risk factor: 2.5
Summary
Concept What It Is When to Use Gaussian Elimination Step-by-step row operations Understanding, small systems LU Decomposition Factor A = LU once Multiple solves with same A Least Squares Minimize ‖Ax - b‖² Overdetermined systems Condition Number Sensitivity measure Checking numerical stability
Next Steps : Now that you understand how to solve linear systems, you’re ready for the capstone project where you’ll build a complete recommendation system using SVD!