Orthogonality & Projections
Why Orthogonality Matters
The Intuition: Perpendicular = Independent
A Simple Example
The Mathematical Test
Orthonormal Bases: The Gold Standard
Why Orthonormal Is Amazing
Gram-Schmidt: Making Any Basis Orthonormal
Projections: The Heart of Least Squares
Projecting onto a Line
Projecting onto a Subspace (Least Squares!)
QR Decomposition: The Practical Tool
Why QR Is Better Than Normal Equations
Application: Signal Denoising with Orthogonal Bases
The Discrete Cosine Transform (DCT)
Application: Image Compression
Practice Exercises
Summary

$Orthogonality & Projections$

Orthogonality & Projections

Why Orthogonality Matters

In machine learning and data science, orthogonality is everywhere:

Application	How Orthogonality Is Used
PCA	Find orthogonal directions of maximum variance
Least Squares	Project data onto the column space of features
Signal Processing	Decompose signals into orthogonal frequency components
QR Decomposition	Numerically stable way to solve linear systems
Gram-Schmidt	Create orthonormal bases for any subspace

Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Vectors and Matrices modules
What You’ll Build: Image compression, signal denoising, and robust regression

The Intuition: Perpendicular = Independent

A Simple Example

Two vectors are orthogonal (perpendicular) if they point in completely independent directions.

import numpy as np
import matplotlib.pyplot as plt

# Two orthogonal vectors
v1 = np.array([3, 0])
v2 = np.array([0, 2])

# Visualize
plt.figure(figsize=(8, 8))
plt.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='blue', label='v1 = [3, 0]')
plt.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='red', label='v2 = [0, 2]')

# Show the right angle
theta = np.linspace(0, np.pi/2, 30)
r = 0.5
plt.plot(r * np.cos(theta), r * np.sin(theta), 'g-', linewidth=2)
plt.text(0.6, 0.3, '90°', fontsize=14, color='green')

plt.xlim(-1, 4)
plt.ylim(-1, 3)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linewidth=0.5)
plt.axvline(x=0, color='k', linewidth=0.5)
plt.legend()
plt.title('Orthogonal Vectors: Perpendicular, Independent')
plt.gca().set_aspect('equal')
plt.show()

# Mathematical test: dot product = 0
print(f"v1 · v2 = {np.dot(v1, v2)}")
print("Orthogonal!" if np.dot(v1, v2) == 0 else "Not orthogonal")

The Mathematical Test

Two vectors

\mathbf{u}

and

\mathbf{v}

are orthogonal if and only if:

\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = 0

Geometric Interpretation: When the dot product is zero, the vectors form a 90° angle. No component of one vector points in the direction of the other.

Orthonormal Bases: The Gold Standard

An orthonormal basis is a set of vectors that are:

Orthogonal: Every pair has dot product zero
Normalized: Each vector has length 1

Why Orthonormal Is Amazing

# Standard basis in 3D is orthonormal
e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
e3 = np.array([0, 0, 1])

# Any vector can be easily decomposed
v = np.array([3, -2, 5])

# Finding components is trivial - just dot products!
c1 = np.dot(v, e1)  # 3
c2 = np.dot(v, e2)  # -2
c3 = np.dot(v, e3)  # 5

print(f"v = {c1}·e1 + {c2}·e2 + {c3}·e3")
print(f"v = {c1*e1 + c2*e2 + c3*e3}")  # Reconstructed

Gram-Schmidt: Making Any Basis Orthonormal

Given any set of linearly independent vectors, we can create an orthonormal basis.

def gram_schmidt(vectors):
    """
    Convert a set of linearly independent vectors to an orthonormal basis.
    
    This is the foundation of QR decomposition!
    """
    n = len(vectors)
    orthonormal = []
    
    for i, v in enumerate(vectors):
        # Start with the original vector
        u = v.copy().astype(float)
        
        # Subtract projections onto all previous orthonormal vectors
        for q in orthonormal:
            u = u - np.dot(u, q) * q
        
        # Normalize
        norm = np.linalg.norm(u)
        if norm < 1e-10:
            raise ValueError("Vectors are linearly dependent!")
        
        orthonormal.append(u / norm)
    
    return np.array(orthonormal)

# Example: Convert arbitrary vectors to orthonormal
v1 = np.array([1, 1, 0])
v2 = np.array([1, 0, 1])
v3 = np.array([0, 1, 1])

Q = gram_schmidt([v1, v2, v3])

print("Original vectors:")
print(f"v1 = {v1}")
print(f"v2 = {v2}")
print(f"v3 = {v3}")

print("\nOrthonormal basis Q:")
for i, q in enumerate(Q):
    print(f"q{i+1} = {q.round(4)}")

# Verify orthonormality
print("\nVerification (Q @ Q.T should be identity):")
print((Q @ Q.T).round(4))

Projections: The Heart of Least Squares

Projecting onto a Line

The projection of vector

\mathbf{b}

onto vector

\mathbf{a}

gives the component of

\mathbf{b}

in the direction of

\mathbf{a}

\text{proj}_{\mathbf{a}} \mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{\mathbf{a} \cdot \mathbf{a}} \mathbf{a}

def project_onto_vector(b, a):
    """Project vector b onto vector a."""
    return (np.dot(a, b) / np.dot(a, a)) * a

# Example
a = np.array([2, 1])
b = np.array([1, 3])

proj = project_onto_vector(b, a)
error = b - proj  # The residual (perpendicular to a)

# Visualize
plt.figure(figsize=(10, 6))
plt.quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1, color='blue', label=f'a = {a}', width=0.015)
plt.quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1, color='red', label=f'b = {b}', width=0.015)
plt.quiver(0, 0, proj[0], proj[1], angles='xy', scale_units='xy', scale=1, color='green', label=f'proj = {proj.round(2)}', width=0.015)
plt.quiver(proj[0], proj[1], error[0], error[1], angles='xy', scale_units='xy', scale=1, color='purple', label='error (⊥ to a)', width=0.015)

# Show right angle
plt.plot([proj[0], proj[0] + 0.2*error[0]/np.linalg.norm(error)], 
         [proj[1], proj[1] + 0.2*error[1]/np.linalg.norm(error)], 'k-')

plt.xlim(-0.5, 3)
plt.ylim(-0.5, 3.5)
plt.grid(True, alpha=0.3)
plt.legend()
plt.title('Projection of b onto a')
plt.gca().set_aspect('equal')
plt.show()

# Verify error is orthogonal to a
print(f"Error · a = {np.dot(error, a):.6f} (should be ≈ 0)")

Projecting onto a Subspace (Least Squares!)

When we have a matrix

A

with columns spanning a subspace, the projection of

\mathbf{b}

onto this subspace is:

\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}

This is exactly the normal equations for least squares!

# Linear regression as projection
np.random.seed(42)

# Data: y ≈ 2x + 1 + noise
n = 50
x = np.linspace(0, 10, n)
y = 2 * x + 1 + np.random.randn(n) * 2

# Feature matrix (with bias column)
A = np.column_stack([np.ones(n), x])

# The least squares solution is the projection!
# We're projecting y onto the column space of A

# Method 1: Normal equations
x_hat = np.linalg.inv(A.T @ A) @ A.T @ y

# Method 2: QR decomposition (more numerically stable)
Q, R = np.linalg.qr(A)
x_hat_qr = np.linalg.solve(R, Q.T @ y)

print(f"Normal equations: intercept = {x_hat[0]:.3f}, slope = {x_hat[1]:.3f}")
print(f"QR decomposition: intercept = {x_hat_qr[0]:.3f}, slope = {x_hat_qr[1]:.3f}")
print(f"True values: intercept = 1.000, slope = 2.000")

# Visualize
y_pred = A @ x_hat
residuals = y - y_pred

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Fitted line
axes[0].scatter(x, y, alpha=0.6, label='Data')
axes[0].plot(x, y_pred, 'r-', linewidth=2, label=f'Fit: y = {x_hat[1]:.2f}x + {x_hat[0]:.2f}')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')
axes[0].set_title('Least Squares as Projection')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Residuals (should be orthogonal to columns of A)
axes[1].scatter(x, residuals, alpha=0.6)
axes[1].axhline(y=0, color='red', linestyle='--')
axes[1].set_xlabel('x')
axes[1].set_ylabel('Residual')
axes[1].set_title('Residuals (⊥ to feature space)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Verify residuals are orthogonal to column space
print(f"\nResiduals · ones = {np.dot(residuals, np.ones(n)):.6f} (should be ≈ 0)")
print(f"Residuals · x = {np.dot(residuals, x):.6f} (should be ≈ 0)")

QR Decomposition: The Practical Tool

QR decomposition factors a matrix as:

A = QR

Where:

$Q$ is orthogonal (columns are orthonormal)
$R$ is upper triangular

Why QR Is Better Than Normal Equations

# The normal equations (A^T A)^(-1) can be numerically unstable
# QR decomposition avoids computing A^T A

def least_squares_qr(A, b):
    """
    Solve least squares using QR decomposition.
    More stable than normal equations for ill-conditioned problems.
    """
    Q, R = np.linalg.qr(A)
    return np.linalg.solve(R, Q.T @ b)

# Compare on an ill-conditioned problem
n = 100
x = np.linspace(0, 1, n)

# Create increasingly ill-conditioned feature matrix (polynomial features)
degrees = [1, 5, 10, 15]

for deg in degrees:
    # Vandermonde matrix (polynomials)
    A = np.vander(x, deg + 1, increasing=True)
    y = np.sin(3 * x) + np.random.randn(n) * 0.1
    
    # Condition number
    cond = np.linalg.cond(A)
    
    # Try both methods
    try:
        x_normal = np.linalg.inv(A.T @ A) @ A.T @ y
        y_pred_normal = A @ x_normal
        rmse_normal = np.sqrt(np.mean((y - y_pred_normal)**2))
    except:
        rmse_normal = float('inf')
    
    x_qr = least_squares_qr(A, y)
    y_pred_qr = A @ x_qr
    rmse_qr = np.sqrt(np.mean((y - y_pred_qr)**2))
    
    print(f"Degree {deg:2d}: Condition = {cond:.2e}, Normal RMSE = {rmse_normal:.4f}, QR RMSE = {rmse_qr:.4f}")

Application: Signal Denoising with Orthogonal Bases

The Discrete Cosine Transform (DCT)

Signals can be decomposed into orthogonal frequency components.

from scipy.fft import dct, idct

# Create a noisy signal
t = np.linspace(0, 1, 256)
clean_signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)
noise = np.random.randn(len(t)) * 0.5
noisy_signal = clean_signal + noise

# DCT decomposes into orthogonal frequency components
coeffs = dct(noisy_signal, norm='ortho')

# Keep only the largest coefficients (denoising)
n_keep = 20
denoised_coeffs = coeffs.copy()
threshold_idx = np.argsort(np.abs(coeffs))[-n_keep:]
mask = np.zeros_like(coeffs, dtype=bool)
mask[threshold_idx] = True
denoised_coeffs[~mask] = 0

# Reconstruct
denoised_signal = idct(denoised_coeffs, norm='ortho')

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(t, clean_signal, 'g-', linewidth=2, label='Clean')
axes[0, 0].plot(t, noisy_signal, 'b-', alpha=0.5, label='Noisy')
axes[0, 0].set_title('Original Signals')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

axes[0, 1].stem(range(50), np.abs(coeffs[:50]), markerfmt='b.')
axes[0, 1].set_title('DCT Coefficients (first 50)')
axes[0, 1].set_xlabel('Frequency Index')
axes[0, 1].set_ylabel('|Coefficient|')
axes[0, 1].grid(True, alpha=0.3)

axes[1, 0].plot(t, noisy_signal, 'b-', alpha=0.5, label='Noisy')
axes[1, 0].plot(t, denoised_signal, 'r-', linewidth=2, label='Denoised')
axes[1, 0].set_title(f'Denoising (keeping {n_keep} coefficients)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].plot(t, clean_signal, 'g-', linewidth=2, label='Clean')
axes[1, 1].plot(t, denoised_signal, 'r--', linewidth=2, label='Denoised')
axes[1, 1].set_title('Comparison: Clean vs Denoised')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Quality metrics
mse_noisy = np.mean((noisy_signal - clean_signal)**2)
mse_denoised = np.mean((denoised_signal - clean_signal)**2)

print(f"MSE (noisy): {mse_noisy:.4f}")
print(f"MSE (denoised): {mse_denoised:.4f}")
print(f"Improvement: {(mse_noisy - mse_denoised) / mse_noisy * 100:.1f}%")

Application: Image Compression

Images can be compressed by keeping only the most important orthogonal components.

from scipy.fft import dctn, idctn

# Create a simple test image
n = 64
x, y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
image = np.exp(-(x**2 + y**2) / 0.3) * np.cos(4 * np.pi * x) * np.cos(4 * np.pi * y)
image = (image - image.min()) / (image.max() - image.min())

# 2D DCT
coeffs_2d = dctn(image, norm='ortho')

# Compress by keeping only top k% of coefficients
compression_ratios = [100, 50, 20, 10, 5]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

axes[0].imshow(image, cmap='viridis')
axes[0].set_title(f'Original (100% coefficients)')
axes[0].axis('off')

for i, ratio in enumerate(compression_ratios[1:], 1):
    n_keep = int(n * n * ratio / 100)
    
    # Keep largest coefficients
    flat_coeffs = coeffs_2d.flatten()
    threshold = np.sort(np.abs(flat_coeffs))[-n_keep]
    
    compressed_coeffs = np.where(np.abs(coeffs_2d) >= threshold, coeffs_2d, 0)
    reconstructed = idctn(compressed_coeffs, norm='ortho')
    
    mse = np.mean((image - reconstructed)**2)
    
    axes[i].imshow(reconstructed, cmap='viridis')
    axes[i].set_title(f'{ratio}% coeffs, MSE={mse:.4f}')
    axes[i].axis('off')

# Show coefficient distribution
flat_coeffs = np.abs(coeffs_2d.flatten())
axes[5].hist(flat_coeffs, bins=50, edgecolor='black', alpha=0.7)
axes[5].set_xlabel('|Coefficient|')
axes[5].set_ylabel('Count')
axes[5].set_title('DCT Coefficient Distribution')
axes[5].set_yscale('log')

plt.tight_layout()
plt.show()

Practice Exercises

Exercise 1: Orthogonal Projection

Problem: Given vectors

\mathbf{u}_1 = [1, 1, 0]

and

\mathbf{u}_2 = [1, -1, 0]

(which are orthogonal), find the projection of

\mathbf{b} = [3, 4, 2]

onto the plane spanned by

\mathbf{u}_1

and

\mathbf{u}_2

.Hint: For orthogonal bases, projections can be computed independently and summed.

Exercise 2: Gram-Schmidt by Hand

Problem: Apply Gram-Schmidt to vectors

\mathbf{v}_1 = [1, 1]

and

\mathbf{v}_2 = [1, 2]

.Steps:

$\mathbf{q}_1 = \mathbf{v}_1 / \|\mathbf{v}_1\|$
$\mathbf{u}_2 = \mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{q}_1)\mathbf{q}_1$
$\mathbf{q}_2 = \mathbf{u}_2 / \|\mathbf{u}_2\|$

Exercise 3: Orthogonal Regression

Problem: In standard linear regression, we minimize vertical errors. What if we minimize perpendicular (orthogonal) distances to the line? This is called orthogonal regression or total least squares.Implement orthogonal regression using PCA: the first principal component gives the best-fit line that minimizes orthogonal distances.

Summary

Concept	Definition	Use Case
Orthogonal	Dot product = 0	Independent directions
Orthonormal	Orthogonal + unit length	Convenient bases
Gram-Schmidt	Algorithm to orthonormalize	QR decomposition
Projection	Component in a direction	Least squares, dimensionality reduction
QR Decomposition	A = QR	Numerically stable solving

Key Takeaway: Orthogonality simplifies everything. Decomposing into orthogonal components makes calculations independent and numerically stable. This is why PCA (finding orthogonal directions of variance) is so powerful for dimensionality reduction.

Project 2 Introduction

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Orthogonality & Projections

​Why Orthogonality Matters

​The Intuition: Perpendicular = Independent

​A Simple Example

​The Mathematical Test

​Orthonormal Bases: The Gold Standard

​Why Orthonormal Is Amazing

​Gram-Schmidt: Making Any Basis Orthonormal

​Projections: The Heart of Least Squares

​Projecting onto a Line

​Projecting onto a Subspace (Least Squares!)

​QR Decomposition: The Practical Tool

​Why QR Is Better Than Normal Equations

​Application: Signal Denoising with Orthogonal Bases

​The Discrete Cosine Transform (DCT)

​Application: Image Compression

​Practice Exercises

​Summary

Orthogonality & Projections

Why Orthogonality Matters

The Intuition: Perpendicular = Independent

A Simple Example

The Mathematical Test

Orthonormal Bases: The Gold Standard

Why Orthonormal Is Amazing

Gram-Schmidt: Making Any Basis Orthonormal

Projections: The Heart of Least Squares

Projecting onto a Line

Projecting onto a Subspace (Least Squares!)

QR Decomposition: The Practical Tool

Why QR Is Better Than Normal Equations

Application: Signal Denoising with Orthogonal Bases

The Discrete Cosine Transform (DCT)

Application: Image Compression

Practice Exercises

Summary