Linear Systems & Applications

The Problem Behind Every ML Model

Every machine learning model eventually boils down to solving a system of linear equations. When you train a linear regression, you’re solving:
w₁ * x₁₁ + w₂ * x₁₂ + ... + wₙ * x₁ₙ = y₁
w₁ * x₂₁ + w₂ * x₂₂ + ... + wₙ * x₂ₙ = y₂
...
w₁ * xₘ₁ + w₂ * xₘ₂ + ... + wₙ * xₘₙ = yₘ
Finding the best weights $\mathbf{w}$ means solving this system!
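To make that concrete, here's a minimal sketch in NumPy (shapes and data are made up for illustration):

import numpy as np

# The same system in matrix form: X @ w = y
m, n = 5, 3                # 5 equations (samples), 3 unknowns (weights)
X = np.random.rand(m, n)   # coefficient matrix of x_ij values
y = np.random.rand(m)      # right-hand side of target values

# With more equations than unknowns there is usually no exact solution,
# so we ask for the best-fit weights (more on least squares below)
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w.shape)             # (3,) - one weight per feature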
Why This Matters: Understanding how to solve linear systems is crucial for:
  • Implementing ML algorithms from scratch
  • Debugging numerical issues in training
  • Optimizing performance of your models
  • Understanding when models will fail
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Matrices module
What You’ll Build: Equation solver, network flow optimizer, least squares regression

The Classic Problem: Three Equations, Three Unknowns

A Real Scenario: Pricing Mystery

You’re a detective investigating suspicious pricing at three stores. Each store sells the same three products (apples, bananas, oranges) but only shows the total bill:
| Store   | Apples Bought | Bananas Bought | Oranges Bought | Total Bill |
|---------|---------------|----------------|----------------|------------|
| Store A | 2             | 3              | 1              | $8.50      |
| Store B | 1             | 2              | 3              | $9.00      |
| Store C | 3             | 1              | 2              | $8.00      |
Question: What is the price of each fruit?

Setting Up the System

Let’s call:
  • a = price of an apple
  • b = price of a banana
  • o = price of an orange
We get three equations:

$$
\begin{align}
2a + 3b + 1o &= 8.50 \\
1a + 2b + 3o &= 9.00 \\
3a + 1b + 2o &= 8.00
\end{align}
$$

In matrix form, $A\mathbf{x} = \mathbf{b}$:

$$
\begin{bmatrix} 2 & 3 & 1 \\ 1 & 2 & 3 \\ 3 & 1 & 2 \end{bmatrix}
\begin{bmatrix} a \\ b \\ o \end{bmatrix} =
\begin{bmatrix} 8.50 \\ 9.00 \\ 8.00 \end{bmatrix}
$$
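Before picking a solution method, it's worth a quick sanity check that a unique solution exists; a sketch using the determinant:

import numpy as np

# A nonzero determinant means A is invertible, so exactly one
# set of prices satisfies all three equations
A = np.array([[2, 3, 1],
              [1, 2, 3],
              [3, 1, 2]], dtype=float)
print(np.linalg.det(A))   # ≈ 18.0, nonzero, so a unique solution exists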

Method 1: Gaussian Elimination (The Classic)

This is the method you (probably) learned in high school, but let’s see it with fresh eyes.

The Idea: Simplify Step by Step

Transform the system into a simpler form where the solution becomes obvious.
import numpy as np

def gaussian_elimination(A, b):
    """
    Solve Ax = b using Gaussian elimination with partial pivoting.
    
    This is the 'textbook' method - great for understanding!
    """
    n = len(b)
    
    # Create augmented matrix [A | b]
    Ab = np.column_stack([np.asarray(A, dtype=float), np.asarray(b, dtype=float)])
    
    # Forward elimination
    for col in range(n):
        # Find the row with largest absolute value in current column (pivoting)
        max_row = col + np.argmax(np.abs(Ab[col:, col]))
        
        # Swap rows if needed
        Ab[[col, max_row]] = Ab[[max_row, col]]
        
        # Check for zero pivot (singular matrix)
        if np.abs(Ab[col, col]) < 1e-10:
            raise ValueError("Matrix is singular or nearly singular!")
        
        # Eliminate entries below the pivot
        for row in range(col + 1, n):
            factor = Ab[row, col] / Ab[col, col]
            Ab[row, col:] -= factor * Ab[col, col:]
        
        print(f"\nAfter eliminating column {col + 1}:")
        print(Ab.round(3))
    
    # Back substitution
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (Ab[i, -1] - np.dot(Ab[i, i+1:n], x[i+1:])) / Ab[i, i]
    
    return x

# Our pricing mystery
A = np.array([
    [2, 3, 1],
    [1, 2, 3],
    [3, 1, 2]
])
b = np.array([8.50, 9.00, 8.00])

print("Solving the pricing mystery...")
print("Original system:")
print(f"2a + 3b + 1o = 8.50")
print(f"1a + 2b + 3o = 9.00")
print(f"3a + 1b + 2o = 8.00")

prices = gaussian_elimination(A, b)

print(f"\nApple price: ${prices[0]:.2f}")
print(f"Banana price: ${prices[1]:.2f}")
print(f"Orange price: ${prices[2]:.2f}")

# Verify
print(f"\nVerification:")
for i, (coeffs, total) in enumerate(zip(A, b)):
    calculated = np.dot(coeffs, prices)
    print(f"Store {chr(65+i)}: {calculated:.2f} == {total} (verified)")
Output:
Apple price: $1.08
Banana price: $1.58
Orange price: $1.58
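As a cross-check, NumPy's built-in solver (which wraps LAPACK's LU-based routine) gives the same answer:

# Cross-check with NumPy's built-in solver
x = np.linalg.solve(A, b)
print(x)   # same prices as gaussian_elimination above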

Method 2: LU Decomposition (The Efficient Way)

The Insight: Factor Once, Solve Many

What if you need to solve $A\mathbf{x} = \mathbf{b}$ for many different $\mathbf{b}$ values? Running Gaussian elimination from scratch each time is wasteful. Instead, we factor $A$ once: $A = LU$, where:
  • $L$ = lower triangular matrix (all zeros above the diagonal)
  • $U$ = upper triangular matrix (all zeros below the diagonal)
Then solving $A\mathbf{x} = \mathbf{b}$ becomes two easy steps:
  1. Solve $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$ (forward substitution)
  2. Solve $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ (back substitution)
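To see why triangular solves are cheap, here's a minimal sketch of both substitution steps (assuming nonzero diagonal entries):

def forward_substitution(L, b):
    """Solve L @ y = b for lower-triangular L, starting from the top row."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        # everything to the left of the diagonal is already solved
        y[i] = (b[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U @ x = y for upper-triangular U, starting from the bottom row."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # everything to the right of the diagonal is already solved
        x[i] = (y[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x

Each substitution touches every entry of the triangle exactly once, so a solve costs $O(n^2)$. SciPy packages the factorization and both solves for us: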
from scipy.linalg import lu, lu_factor, lu_solve

# Factor A once
A = np.array([
    [2, 3, 1],
    [1, 2, 3],
    [3, 1, 2]
], dtype=float)

# Get L, U decomposition
P, L, U = lu(A)

print("Original matrix A:")
print(A)
print("\nL (lower triangular):")
print(L.round(3))
print("\nU (upper triangular):")
print(U.round(3))
print("\nVerify: P @ L @ U = A")
print((P @ L @ U).round(3))

# Now solve for multiple b vectors efficiently
lu_factored = lu_factor(A)

# Different scenarios (different total bills)
scenarios = [
    [8.50, 9.00, 8.00],   # Original
    [10.00, 11.00, 9.50], # Higher prices?
    [6.00, 7.00, 6.50],   # Discount day
]

for i, b in enumerate(scenarios):
    x = lu_solve(lu_factored, b)
    print(f"\nScenario {i+1}: Bills = {b}")
    print(f"  Prices: Apple=${x[0]:.2f}, Banana=${x[1]:.2f}, Orange=${x[2]:.2f}")
Performance: LU factorization costs $O(n^3)$ once; after that, each solve is only $O(n^2)$. For large systems solved repeatedly, this is a huge win!
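Here's a rough benchmark sketch of that claim, reusing lu_factor/lu_solve from above (the test matrix is made up, and exact timings depend on your machine and BLAS build):

import time

n, n_solves = 500, 200
rng = np.random.default_rng(0)
A_big = rng.random((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b_vecs = rng.random((n_solves, n))

start = time.perf_counter()
for b_vec in b_vecs:
    np.linalg.solve(A_big, b_vec)            # re-factors A on every call
t_solve = time.perf_counter() - start

start = time.perf_counter()
factored = lu_factor(A_big)                  # O(n³) factorization, done once
for b_vec in b_vecs:
    lu_solve(factored, b_vec)                # O(n²) per solve
t_lu = time.perf_counter() - start

print(f"np.linalg.solve x{n_solves}: {t_solve:.3f}s")
print(f"lu_factor once + lu_solve: {t_lu:.3f}s")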

The Least Squares Problem: When There’s No Exact Solution

Real-World Data is Messy

In ML, we rarely have an exact solution. We have:
  • More equations than unknowns (overdetermined system)
  • Noisy measurements
  • Contradictory data points
Example: Fitting a line through 100 points
import matplotlib.pyplot as plt

# Generate noisy data
np.random.seed(42)
n_points = 100

# True relationship: y = 2x + 1 + noise
x_data = np.linspace(0, 10, n_points)
y_true = 2 * x_data + 1
y_noisy = y_true + np.random.normal(0, 2, n_points)

# We want to find: y = w₀ + w₁*x
# This gives us 100 equations, 2 unknowns!

# Set up the system: X @ w = y
X = np.column_stack([np.ones(n_points), x_data])  # [1, x] for each point

print(f"X shape: {X.shape}")  # (100, 2)
print(f"y shape: {y_noisy.shape}")  # (100,)
print(f"\nWe have 100 equations but only 2 unknowns!")
print("There's no exact solution - we need LEAST SQUARES")

The Normal Equations

The least squares solution minimizes the squared error:

$$\min_{\mathbf{w}} \|A\mathbf{w} - \mathbf{b}\|^2$$

When $A^T A$ is invertible, the minimizer has a closed form (the normal equations):

$$\mathbf{w} = (A^T A)^{-1} A^T \mathbf{b}$$
# Method 1: Normal equations (direct formula)
def least_squares_normal(X, y):
    """Solve using the normal equations."""
    return np.linalg.inv(X.T @ X) @ X.T @ y

# Method 2: Using NumPy's lstsq (more numerically stable)
def least_squares_numpy(X, y):
    """Solve using NumPy's least squares."""
    w, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
    return w

# Solve both ways
w_normal = least_squares_normal(X, y_noisy)
w_numpy = least_squares_numpy(X, y_noisy)

print(f"Normal equations: w₀ = {w_normal[0]:.3f}, w₁ = {w_normal[1]:.3f}")
print(f"NumPy lstsq:      w₀ = {w_numpy[0]:.3f}, w₁ = {w_numpy[1]:.3f}")
print(f"True values:      w₀ = 1.000, w₁ = 2.000")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_noisy, alpha=0.5, label='Noisy data')
plt.plot(x_data, y_true, 'g--', linewidth=2, label='True line (y = 2x + 1)')
plt.plot(x_data, X @ w_numpy, 'r-', linewidth=2, 
         label=f'Fitted line (y = {w_numpy[0]:.2f} + {w_numpy[1]:.2f}x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Least Squares: Finding the Best Fit Line')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Application 1: Network Flow Analysis

The Problem

You’re managing a water distribution network. Water flows from sources through pipes to destinations. You need to find the flow in each pipe.
# Network structure:
#
#   Source A ----[p1]----> Node 1 ----[p3]----> Destination X
#                            |
#   Source B ----[p2]--------+-----[p4]----> Destination Y
#
# Conservation law: Flow in = Flow out at each node

# Node 1 receives from p1 and p2, sends to p3 and p4
# Source A supplies 10 units
# Source B supplies 15 units  
# Destination X receives 12 units
# Destination Y receives 13 units

# Variables: flow in pipes p1, p2, p3, p4

# Equations (conservation at each node):
# At Node 1:  p1 + p2 = p3 + p4  =>  p1 + p2 - p3 - p4 = 0
# At Source A: p1 = 10
# At Source B: p2 = 15
# At Dest X:   p3 = 12
# At Dest Y:   p4 = 13
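# Note: that's 5 equations for only 4 unknowns, but one is redundant:
# total supply (10 + 15) equals total demand (12 + 13), so the Dest Y
# equation (p4 = 13) follows from the other four. We keep 4 equations
# to get a square, uniquely solvable system.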

A_flow = np.array([
    [1, 1, -1, -1],  # Conservation at Node 1
    [1, 0, 0, 0],     # Source A
    [0, 1, 0, 0],     # Source B
    [0, 0, 1, 0],     # Dest X
])

b_flow = np.array([0, 10, 15, 12])

flows = np.linalg.solve(A_flow, b_flow)

print("Network Flow Solution:")
print(f"  Pipe 1 (A → Node 1): {flows[0]:.1f} units")
print(f"  Pipe 2 (B → Node 1): {flows[1]:.1f} units")
print(f"  Pipe 3 (Node 1 → X): {flows[2]:.1f} units")
print(f"  Pipe 4 (Node 1 → Y): {flows[3]:.1f} units")

# Check conservation at Node 1
print(f"\nConservation check at Node 1:")
print(f"  In:  {flows[0] + flows[1]:.1f}")
print(f"  Out: {flows[2] + flows[3]:.1f}")

Application 2: Electrical Circuit Analysis

Kirchhoff’s Laws as Linear Systems

# Circuit with 3 resistors and 2 voltage sources
#
#    +--[R1=2Ω]--+--[R2=3Ω]--+
#    |           |           |
#  [V1=5V]    [R3=4Ω]     [V2=3V]
#    |           |           |
#    +-----------+-----------+
#
# Using Kirchhoff's laws, we can set up a system for currents

# Let I1 = current through R1, I2 = through R2, I3 = through R3
# Voltage law (loop 1): V1 = R1*I1 + R3*I3  =>  2*I1 + 4*I3 = 5
# Voltage law (loop 2): V2 = R2*I2 + R3*I3  =>  3*I2 + 4*I3 = 3
# Current law (node):   I1 = I2 + I3         =>  I1 - I2 - I3 = 0

R1, R2, R3 = 2, 3, 4
V1, V2 = 5, 3

A_circuit = np.array([
    [R1, 0, R3],      # Loop 1: V1 = R1*I1 + R3*I3
    [0, R2, R3],      # Loop 2: V2 = R2*I2 + R3*I3
    [1, -1, -1],      # Node: I1 = I2 + I3
])

b_circuit = np.array([V1, V2, 0])

currents = np.linalg.solve(A_circuit, b_circuit)

print("Circuit Analysis:")
print(f"  I₁ (through R1): {currents[0]:.3f} A")
print(f"  I₂ (through R2): {currents[1]:.3f} A")
print(f"  I₃ (through R3): {currents[2]:.3f} A")

# Power dissipation
power = [R1 * currents[0]**2, R2 * currents[1]**2, R3 * currents[2]**2]
print(f"\nPower dissipation:")
print(f"  R1: {power[0]:.3f} W")
print(f"  R2: {power[1]:.3f} W")
print(f"  R3: {power[2]:.3f} W")
print(f"  Total: {sum(power):.3f} W")

When Systems Fail: Singular Matrices

Not all systems have solutions. Understanding when and why is crucial.

Case 1: No Solution (Inconsistent System)

# Three parallel lines - no intersection!
A_nosol = np.array([
    [1, 1],
    [1, 1],
    [1, 1]
])
b_nosol = np.array([2, 3, 4])

print("Trying to solve an inconsistent system...")
try:
    x = np.linalg.lstsq(A_nosol, b_nosol, rcond=None)[0]
    print(f"Least squares 'solution': {x}")
    print("(This minimizes error but doesn't actually solve the system)")
except:
    print("No solution exists!")

Case 2: Infinite Solutions (Underdetermined)

# One equation, two unknowns
A_inf = np.array([[1, 2]])
b_inf = np.array([5])

print("Underdetermined system: x + 2y = 5")
print("Infinitely many solutions: x = 5 - 2y for any y")

# NumPy gives the minimum norm solution
x_min = np.linalg.lstsq(A_inf, b_inf, rcond=None)[0]
print(f"Minimum norm solution: x = {x_min[0]:.2f}, y = {x_min[1]:.2f}")

Case 3: Numerical Instability (Ill-Conditioned)

# The Hilbert matrix is notoriously ill-conditioned
def hilbert(n):
    """Generate n×n Hilbert matrix."""
    return np.array([[1/(i+j+1) for j in range(n)] for i in range(n)])

for n in [5, 10, 15]:
    H = hilbert(n)
    cond = np.linalg.cond(H)
    print(f"Hilbert matrix {n}×{n}: condition number = {cond:.2e}")
    
print("\nHigh condition number = numerically unstable!")
print("Small errors in input cause huge errors in output.")
Condition Number: A measure of how sensitive the solution is to small changes in input.
  • Condition number ≈ 1: Well-conditioned, stable
  • Condition number > 10⁶: Ill-conditioned, results may be unreliable
  • Condition number = ∞: Singular, no unique solution
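To see ill-conditioning bite, here's a small sketch: perturb the right-hand side of a Hilbert system by a tiny amount and watch the solution move by many orders of magnitude more.

# Sensitivity demo: tiny change in b, large change in x
np.random.seed(0)
H = hilbert(10)                       # condition number ~1e13
x_true = np.ones(10)
b_exact = H @ x_true

b_perturbed = b_exact + 1e-10 * np.random.randn(10)   # tiny nudge

x1 = np.linalg.solve(H, b_exact)
x2 = np.linalg.solve(H, b_perturbed)

rel_b = np.linalg.norm(b_perturbed - b_exact) / np.linalg.norm(b_exact)
rel_x = np.linalg.norm(x2 - x1) / np.linalg.norm(x1)
print(f"Relative change in b: {rel_b:.1e}")
print(f"Relative change in x: {rel_x:.1e}  (amplified by the condition number)")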

The Connection to ML

Linear Regression IS Solving a Linear System

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression

# Generate a regression problem
X, y = make_regression(n_samples=100, n_features=5, noise=10, random_state=42)

# Method 1: sklearn
model = LinearRegression()
model.fit(X, y)

# Method 2: Direct least squares (sklearn uses a similar lstsq solve internally)
X_aug = np.column_stack([np.ones(len(X)), X])  # Add bias column
w_direct = np.linalg.lstsq(X_aug, y, rcond=None)[0]

print("Coefficients comparison:")
print(f"sklearn:   intercept = {model.intercept_:.3f}")
print(f"           weights = {model.coef_}")
print(f"Direct:    intercept = {w_direct[0]:.3f}")
print(f"           weights = {w_direct[1:]}")

Practice Exercises

Problem 1: Three intersections have the following traffic counts (cars/hour):
  • Into intersection A: 100 from the north, 80 from the east
  • Out of A: x cars to B, y cars to C
  • Into B: x cars from A; out of B: 90 cars to the east
  • Into C: y cars from A; out of C: 90 cars to the south
Find x and y (the flows between intersections) using conservation: cars in = cars out at each intersection.
Problem 2: Balance this chemical equation by finding a, b, c:
a·H₂ + b·O₂ → c·H₂O
Set up a linear system where the atoms of each element are conserved.
Problem 3: You have $10,000 to invest in three stocks. Based on historical data:
  • Stock A: Expected return 8%, risk factor 2
  • Stock B: Expected return 12%, risk factor 5
  • Stock C: Expected return 6%, risk factor 1
Find the investment in each to achieve:
  • Total investment: $10,000
  • Expected return: 9%
  • Weighted-average risk factor: 2.5

Summary

| Concept              | What It Is                  | When to Use                  |
|----------------------|-----------------------------|------------------------------|
| Gaussian Elimination | Step-by-step row operations | Understanding, small systems |
| LU Decomposition     | Factor A = LU once          | Multiple solves with same A  |
| Least Squares        | Minimize ‖Ax - b‖²          | Overdetermined systems       |
| Condition Number     | Sensitivity measure         | Checking numerical stability |
Next Steps: Now that you understand how to solve linear systems, you’re ready for the capstone project where you’ll build a complete recommendation system using SVD!