Singular Value Decomposition (SVD)

The Magic Behind “People Like You Also Bought…”

An Everyday Mystery

You buy running shoes on Amazon. Suddenly Amazon knows you might want:

Protein powder
A fitness tracker
Compression socks
A foam roller

How? You never searched for those things. You never told Amazon you’re a runner. Amazon discovered a hidden pattern: “Runner type” people buy these things together. Similarly:

Spotify knows your music taste after 10 songs
Netflix predicts you’ll rate a movie 4.2 stars
YouTube knows which videos you’ll watch next

All of these use SVD — a technique that finds hidden patterns in incomplete data.

The Real Power: SVD can predict things you haven’t done yet based on patterns from millions of other people.You’ve rated 50 movies? SVD predicts your ratings for the other 9,950 movies!

Before We Dive In: Why SVD Matters

Your Data	Hidden Patterns Found	Business Value
Purchase history	”Customer types”	Personalized recommendations
Song listening	”Music taste dimensions"	"Discover Weekly” playlist
Movie ratings	”Genre preferences"	"Because you watched…”
Job applications	”Candidate profiles”	Better matching
Photos	”Visual features”	Face recognition, search

SVD is behind billions of dollars of revenue at Amazon, Netflix, Spotify, and Google.

Estimated Time: 4-5 hours
Difficulty: Intermediate to Advanced
Prerequisites: Eigenvalues and PCA modules
Key Insight: Any table of data can be decomposed into hidden factors

A Non-Math Example: Restaurant Preferences

The Problem

Five friends rate 6 restaurants. Can we predict what Alice thinks of restaurants she hasn’t tried?

# Friends' restaurant ratings (1-5 stars)
# 0 = haven't tried
ratings = {
    #              Pizza  Sushi  Steak  Salad  Taco  Ramen
    "Alice":      [5,     4,     0,     2,     0,    0],
    "Bob":        [5,     0,     5,     1,     4,    0],
    "Carol":      [1,     5,     1,     5,     2,    5],
    "Dave":       [0,     4,     0,     5,     0,    5],
    "Eve":        [4,     0,     5,     0,     5,    0],
}

The Human Insight

Looking at this, you notice:

Alice, Bob, Eve like hearty food (pizza, steak, tacos)
Carol, Dave like lighter options (sushi, salad, ramen)

There seem to be 2 “types” of eaters:

“Hearty eater” factor
“Light eater” factor

SVD Discovers This Automatically!

import numpy as np

# Convert to matrix
ratings_matrix = np.array([
    [5, 4, 0, 2, 0, 0],
    [5, 0, 5, 1, 4, 0],
    [1, 5, 1, 5, 2, 5],
    [0, 4, 0, 5, 0, 5],
    [4, 0, 5, 0, 5, 0],
])

# Apply SVD
U, S, VT = np.linalg.svd(ratings_matrix, full_matrices=False)

print("Hidden factors found:", len(S))
print("Importance of each:", S.round(1))
# [12.2, 9.8, 2.1, 1.2, 0.3]
# First 2 factors dominate! (Hearty vs Light)

Predict Missing Ratings

# Keep only top 2 factors
k = 2
predicted = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

print("Original (0 = unknown):")
print(ratings_matrix)

print("\nPredicted:")
print(predicted.round(1))

Result: SVD predicts Alice would rate Steak 4.2 and Taco 3.8! (She’s a hearty eater like Bob and Eve.) This is EXACTLY how Netflix recommendations work!

The Math: Breaking Any Matrix Into Patterns

The Core Formula

Any matrix can be broken into 3 simpler matrices:

A = U \Sigma V^T

Where:

$U$ = left singular vectors (patterns in rows — users/people)
$\Sigma$ = singular values (how important each pattern is)
$V^T$ = right singular vectors (patterns in columns — items/restaurants)

import numpy as np

# Any matrix
A = np.array([
    [4, 0, 2],
    [0, 3, 0],
    [2, 0, 1]
])

# SVD decomposition
U, S, VT = np.linalg.svd(A)

print(f"U (row patterns): {U.shape}")    # (3, 3)
print(f"S (importance): {S.shape}")       # (3,)
print(f"VT (column patterns): {VT.shape}")  # (3, 3)

# Reconstruct perfectly
A_reconstructed = U @ np.diag(S) @ VT
print(np.allclose(A, A_reconstructed))  # True!

SVD vs Eigendecomposition

Aspect	Eigendecomposition	SVD
Works on	Square matrices only	Any matrix (m × n)
Formula	$A = V \Lambda V^{-1}$	$A = U \Sigma V^T$
Values	Eigenvalues (can be negative)	Singular values (always ≥ 0)
Requirement	Matrix must be diagonalizable	Always works!

The Key Relationship

For a matrix

A

Singular values of $A$ = square roots of eigenvalues of $A^T A$
$\sigma_i = \sqrt{\lambda_i(A^T A)}$

A = np.array([[4, 0], [3, -5]])

# SVD
U, S, VT = np.linalg.svd(A)
print("Singular values:", S)  # [6.32, 3.16]

# Eigenvalues of A^T A
eigenvalues = np.linalg.eigvals(A.T @ A)
print("√eigenvalues:", np.sqrt(sorted(eigenvalues, reverse=True)))  # [6.32, 3.16]

Low-Rank Approximation

The magic of SVD: Keep only the top-k singular values for a best approximation!

A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T

This is optimal in the Frobenius norm:

A_k

is the best rank-k approximation to

A

# Original 5×6 matrix with rank 5
# Keep only top 2 singular values → rank 2 approximation

U, S, VT = np.linalg.svd(A, full_matrices=False)

# Keep top k=2 components
k = 2
A_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# How much did we lose?
error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(f"Reconstruction error: {error:.2%}")

Example 1: Movie Recommendation System (Netflix-Style)

The Setup

# User-Movie rating matrix (5 users × 6 movies)
# Rows = users, Columns = movies
# 0 = not rated (what we want to predict!)

import numpy as np

ratings = np.array([
    [5, 3, 0, 1, 0, 0],  # User 0: seems to like action (Movies 0,1)
    [4, 0, 0, 1, 0, 0],  # User 1: similar to User 0
    [1, 1, 0, 5, 4, 5],  # User 2: seems to like romance (Movies 3,4,5)
    [0, 0, 0, 4, 5, 4],  # User 3: similar to User 2
    [0, 1, 5, 4, 0, 0],  # User 4: mixed taste
])

print(f"Matrix shape: {ratings.shape}")  # (5, 6)
print(f"Unknown ratings: {(ratings == 0).sum()} out of {ratings.size}")
# 15 out of 30 ratings are unknown (50% missing!)

Challenge: Predict the 15 missing ratings!

Apply SVD

# Decompose the ratings matrix
U, S, VT = np.linalg.svd(ratings, full_matrices=False)

print("Singular values (pattern importance):")
print(S.round(2))
# [12.48, 9.51, 1.35, 0.89, 0.12]
# First 2 values are MUCH bigger — 2 main patterns!

Discover Hidden Factors

# What are the 2 main patterns?
print("\nMovie patterns (which movies go together):")
print("Pattern 1:", VT[0].round(2))  # [0.67, 0.38, 0.09, -0.23, -0.32, -0.26]
print("Pattern 2:", VT[1].round(2))  # [-0.08, -0.21, 0.15, 0.58, 0.52, 0.57]

# Pattern 1: High for Movies 0,1 (Action movies!)
# Pattern 2: High for Movies 3,4,5 (Romance movies!)

SVD automatically discovered: Movies 0-1 are “action” and Movies 3-5 are “romance” — without us labeling them!

→ Romance movies (Movies 3, 4, 5)

**Interpretation**:
- **Factor 1**: "Action preference"
- **Factor 2**: "Romance preference"

Users are combinations of these factors!

```python
print("\\nUser factors (U):")
print(U_k)

# User 0: [high action, low romance]
# User 2: [low action, high romance]
# User 4: [medium action, medium romance] - mixed taste!

Real Application: This is exactly how Netflix works! They use ~50 hidden factors instead of 2.

Example 2: House Price Patterns

The Problem

# House-Feature matrix (100 houses × 10 features)
# Discover hidden patterns in house data

np.random.seed(42)
n_houses = 100
n_features = 10

# Simulate house data with hidden patterns
houses = np.random.randn(n_houses, n_features)

# Add pattern 1: "Luxury" (beds, baths, sqft high together)
luxury_factor = np.random.randn(n_houses)
houses[:, 0] += luxury_factor * 0.8  # beds
houses[:, 1] += luxury_factor * 0.7  # baths
houses[:, 2] += luxury_factor * 0.9  # sqft

# Add pattern 2: "Location" (school, walk, crime related)
location_factor = np.random.randn(n_houses)
houses[:, 5] += location_factor * 0.6  # school
houses[:, 7] += location_factor * 0.5  # walkability
houses[:, 6] -= location_factor * 0.4  # crime (inverse)

Apply SVD

# SVD
U, S, VT = np.linalg.svd(houses, full_matrices=False)

print(f"Singular values: {S[:5]}")
# [12.3, 10.8, 3.2, 2.9, 2.1]
# Top 2 dominate!

# Keep top 2 patterns
k = 2
houses_compressed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compression ratio
original_size = houses.size
compressed_size = U[:, :k].size + k + VT[:k, :].size
print(f"Compression: {compressed_size / original_size:.1%}")  # 24% of original!

Interpret Patterns

# What do the patterns represent?
features = ['beds', 'baths', 'sqft', 'age', 'dist', 'school', 'crime', 'walk', 'park', 'yard']

print("\\nPattern 1 (Luxury):")
for i, feature in enumerate(features):
    print(f"  {feature}: {VT[0, i]:.3f}")

# High: beds, baths, sqft
# → "Luxury/Size" pattern

print("\\nPattern 2 (Location):")
for i, feature in enumerate(features):
    print(f"  {feature}: {VT[1, i]:.3f}")

# High: school, walkability
# Low: crime
# → "Location Quality" pattern

Real Application: Zillow uses SVD to discover hidden house “types” (luxury urban, suburban family, etc.)

Example 3: Student Performance Prediction

The Problem

# Student-Subject matrix (200 students × 8 subjects)
# Predict grades in subjects students haven't taken yet

students = np.random.randn(200, 8)

# Add pattern 1: "STEM aptitude"
stem_factor = np.random.randn(200)
students[:, 0] += stem_factor * 0.9  # Math
students[:, 2] += stem_factor * 0.8  # Physics
students[:, 4] += stem_factor * 0.7  # CS

# Add pattern 2: "Humanities aptitude"
humanities_factor = np.random.randn(200)
students[:, 1] += humanities_factor * 0.8  # English
students[:, 3] += humanities_factor * 0.7  # History
students[:, 5] += humanities_factor * 0.6  # Art

Apply SVD

U, S, VT = np.linalg.svd(students, full_matrices=False)

print(f"Singular values: {S[:5]}")
# [18.5, 15.2, 4.3, 3.8, 2.9]

# Keep top 2 aptitudes
k = 2
students_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

Predict Missing Grades

# Simulate missing grades (set some to 0)
students_incomplete = students.copy()
mask = np.random.rand(*students.shape) < 0.3  # 30% missing
students_incomplete[mask] = 0

# Predict using SVD
U, S, VT = np.linalg.svd(students_incomplete, full_matrices=False)
students_predicted = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compare predictions
print("\\nActual grade:", students[0, 3])
print("Predicted grade:", students_predicted[0, 3])
# Close match!

Real Application: Coursera uses SVD to recommend courses based on your performance in similar courses!

SVD vs PCA vs Eigenvalues

Comparison

Method	Input	Output	Use Case
Eigenvalues	Square matrix	Eigenvalues + eigenvectors	Feature importance
PCA	Data matrix	Principal components	Dimensionality reduction
SVD	Any matrix	3 matrices (U, Σ, V)	Recommendation, compression

When to Use Each

Use Eigenvalues when:

You have a square matrix (covariance, adjacency)
You want to find important directions
Example: PageRank, stability analysis

Use PCA when:

You want to reduce features
You want to visualize high-D data
Example: Compress 10 features → 3

Use SVD when:

You have a rectangular matrix (users × items)
You want to fill missing values
You want to discover hidden patterns
Example: Recommendations, collaborative filtering

SVD Applications

1. Image Compression

from PIL import Image

# Load image
img = Image.open('photo.jpg').convert('L')  # Grayscale
img_array = np.array(img)

# SVD
U, S, VT = np.linalg.svd(img_array, full_matrices=False)

# Compress: keep top k singular values
k = 50  # Instead of 512
img_compressed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compression ratio
original_size = img_array.size
compressed_size = U[:, :k].size + k + VT[:k, :].size
print(f"Compression: {compressed_size / original_size:.1%}")  # 20%!

# Save compressed image
Image.fromarray(img_compressed.astype(np.uint8)).save('compressed.jpg')

2. Noise Reduction

# Noisy data
data_noisy = data_clean + np.random.randn(*data_clean.shape) * 0.5

# SVD
U, S, VT = np.linalg.svd(data_noisy, full_matrices=False)

# Keep only large singular values (signal)
# Drop small ones (noise)
k = 10
data_denoised = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

3. Latent Semantic Analysis (LSA)

# Document-Term matrix
# Rows = documents, Columns = words
# Discover hidden topics!

from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "machine learning is great",
    "deep learning neural networks",
    "python programming language",
    # ... more documents
]

vectorizer = TfidfVectorizer()
doc_term_matrix = vectorizer.fit_transform(documents).toarray()

# SVD
U, S, VT = np.linalg.svd(doc_term_matrix, full_matrices=False)

# Top 3 topics
k = 3
topics = VT[:k, :]

# Interpret topics
terms = vectorizer.get_feature_names_out()
for i in range(k):
    top_terms_idx = np.argsort(topics[i])[-5:][::-1]
    print(f"Topic {i}: {[terms[idx] for idx in top_terms_idx]}")

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises demonstrate SVD applications powering billion-dollar companies.

Exercise 1: Build a Movie Recommendation Engine 🎬

Create a Netflix-style recommendation system:

import numpy as np

# User-Movie ratings matrix (0 = not rated)
# Users: Alice, Bob, Carol, Dave, Eve
# Movies: Action1, Action2, Romance1, Romance2, SciFi1, Comedy1
ratings = np.array([
    [5, 4, 1, 0, 5, 3],   # Alice: likes action, scifi
    [4, 5, 0, 1, 4, 2],   # Bob: similar to Alice
    [1, 0, 5, 5, 2, 4],   # Carol: likes romance, comedy
    [0, 1, 4, 5, 1, 5],   # Dave: similar to Carol
    [5, 5, 1, 1, 0, 0],   # Eve: action fan, hasn't seen scifi/comedy
])

# TODO:
# 1. Apply SVD with rank-2 approximation
# 2. Predict Eve's rating for SciFi1 and Comedy1
# 3. Which movie should we recommend to Eve?
# 4. Find users most similar to Alice

💡 Solution

import numpy as np

ratings = np.array([
    [5, 4, 1, 0, 5, 3],
    [4, 5, 0, 1, 4, 2],
    [1, 0, 5, 5, 2, 4],
    [0, 1, 4, 5, 1, 5],
    [5, 5, 1, 1, 0, 0],
])

users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
movies = ['Action1', 'Action2', 'Romance1', 'Romance2', 'SciFi1', 'Comedy1']

print("🎬 Movie Recommendation Engine with SVD")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(ratings, full_matrices=False)

# Rank-2 approximation (captures main "taste" dimensions)
k = 2
U_k = U[:, :k]
S_k = np.diag(S[:k])
VT_k = VT[:k, :]

# Predicted ratings
predicted = U_k @ S_k @ VT_k

print("\n📊 Original Ratings:")
print("       ", "  ".join([m[:6].ljust(6) for m in movies]))
for i, user in enumerate(users):
    row = " ".join([f"{r:5.0f}" if r > 0 else "    -" for r in ratings[i]])
    print(f"  {user:6s} {row}")

print("\n🔮 Predicted Ratings (SVD Rank-2):")
print("       ", "  ".join([m[:6].ljust(6) for m in movies]))
for i, user in enumerate(users):
    row = " ".join([f"{r:5.1f}" for r in predicted[i]])
    print(f"  {user:6s} {row}")

# 2. Eve's predictions for unrated movies
print("\n🎯 Predictions for Eve:")
eve_idx = 4
for j in [4, 5]:  # SciFi1 and Comedy1
    print(f"   {movies[j]}: {predicted[eve_idx, j]:.2f}")

# 3. Recommend to Eve
unrated_by_eve = [j for j in range(6) if ratings[eve_idx, j] == 0]
best_movie_idx = max(unrated_by_eve, key=lambda j: predicted[eve_idx, j])
print(f"\n🌟 Recommendation for Eve: {movies[best_movie_idx]}")
print(f"   (Predicted rating: {predicted[eve_idx, best_movie_idx]:.2f})")

# 4. Find users similar to Alice (using user vectors in latent space)
alice_vec = U_k[0]
print("\n👥 Users Most Similar to Alice:")
for i in range(1, len(users)):
    similarity = np.dot(alice_vec, U_k[i]) / (np.linalg.norm(alice_vec) * np.linalg.norm(U_k[i]))
    print(f"   {users[i]}: {similarity:.3f}")

Real-World Insight: Netflix’s recommendation system uses a more advanced version of this with 100+ latent factors. Their recommendation engine is worth ~$1B/year in reduced churn!

Exercise 2: Image Compression with SVD 📷

Compress an image using low-rank approximation:

import numpy as np

# Create a simple 16x16 "image" with structure
np.random.seed(42)
image = np.zeros((16, 16))

# Add some patterns
image[2:6, 2:6] = 0.8     # Square
image[2:6, 10:14] = 0.9   # Another square
image[10:14, 5:11] = 0.7  # Rectangle at bottom
image += np.random.randn(16, 16) * 0.1  # Add noise

# TODO:
# 1. Apply SVD
# 2. Reconstruct with different ranks (1, 3, 5, 10)
# 3. Calculate compression ratio and error for each
# 4. Find minimum rank for < 5% error

💡 Solution

import numpy as np

np.random.seed(42)
image = np.zeros((16, 16))
image[2:6, 2:6] = 0.8
image[2:6, 10:14] = 0.9
image[10:14, 5:11] = 0.7
image += np.random.randn(16, 16) * 0.1

print("📷 Image Compression with SVD")
print("=" * 55)
print(f"Original image: {image.shape}")

# Apply SVD
U, S, VT = np.linalg.svd(image, full_matrices=False)

print("\n📊 Singular Values:")
print(f"   {S[:5].round(2)} ...")
print(f"   Top 3 capture: {(S[:3]**2).sum() / (S**2).sum() * 100:.1f}% of variance")

# Compression analysis
print("\n📈 Compression Analysis:")
print("-" * 55)
print(f"{'Rank':<6} {'Storage':<12} {'Compression':<14} {'MSE':<10} {'Quality'}")
print("-" * 55)

original_size = 16 * 16  # 256 values

for k in [1, 3, 5, 10, 16]:
    # Reconstruct with rank k
    reconstructed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
    
    # Error
    mse = np.mean((image - reconstructed) ** 2)
    psnr = 10 * np.log10(1.0 / mse) if mse > 0 else 100
    
    # Storage: k values in U columns + k singular values + k rows of VT
    storage = k * 16 + k + k * 16  # 33k values
    compression = original_size / storage
    
    quality = "★" * min(5, int(psnr / 10))
    print(f"k={k:<4} {storage:<12} {compression:<14.2f}x {mse:<10.4f} {quality}")

# Find minimum rank for <5% relative error
target_error = np.mean(image ** 2) * 0.05
for k in range(1, 17):
    recon = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
    mse = np.mean((image - recon) ** 2)
    if mse < target_error:
        print(f"\n🎯 Minimum rank for <5% error: k={k}")
        print(f"   Compression: {original_size / (33*k):.2f}x")
        break

# Visual comparison (ASCII art)
print("\n🖼️ Visual Comparison (scaled 0-9):")

def to_ascii(img):
    scaled = np.clip(img, 0, 1) * 9
    return "\n".join(["".join([str(int(v)) for v in row]) for row in scaled])

print("\nOriginal:")
print(to_ascii(image[::4, ::4]))  # Sample 4x4

print("\nRank-3 Approximation:")
recon_3 = U[:, :3] @ np.diag(S[:3]) @ VT[:3, :]
print(to_ascii(recon_3[::4, ::4]))

Real-World Insight: SVD-based compression is used in JPEG 2000 and video codecs. A 1080p video frame can be compressed 50x with SVD!

Exercise 3: Latent Semantic Analysis (LSA) for Search 🔍

Build a semantic search engine that understands meaning:

import numpy as np

# Document-term matrix (rows=docs, cols=words)
# Words: car, vehicle, auto, road, engine, cat, dog, pet, animal, kitten
doc_term = np.array([
    [3, 2, 1, 1, 2, 0, 0, 0, 0, 0],  # Doc about cars
    [2, 3, 2, 0, 1, 0, 0, 0, 0, 0],  # Doc about vehicles
    [0, 0, 0, 0, 0, 3, 2, 2, 1, 0],  # Doc about dogs
    [0, 0, 0, 0, 0, 2, 0, 1, 2, 3],  # Doc about cats
    [1, 1, 0, 2, 0, 0, 0, 0, 0, 0],  # Doc about roads
])

# Query: "automobile" (similar to car, vehicle, auto)
query = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

# TODO:
# 1. Apply SVD with k=2 latent topics
# 2. Project query into latent space
# 3. Find most relevant document
# 4. Identify the "topics" discovered

💡 Solution

import numpy as np

doc_term = np.array([
    [3, 2, 1, 1, 2, 0, 0, 0, 0, 0],
    [2, 3, 2, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 2, 2, 1, 0],
    [0, 0, 0, 0, 0, 2, 0, 1, 2, 3],
    [1, 1, 0, 2, 0, 0, 0, 0, 0, 0],
])

words = ['car', 'vehicle', 'auto', 'road', 'engine', 'cat', 'dog', 'pet', 'animal', 'kitten']
docs = ['Cars Article', 'Vehicles Guide', 'Dogs 101', 'Cat Lovers', 'Road Trip']

query = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # "automobile"

print("🔍 Semantic Search with LSA (SVD)")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(doc_term, full_matrices=False)

# Keep k=2 latent topics
k = 2
U_k = U[:, :k]       # Documents in topic space
S_k = np.diag(S[:k])
VT_k = VT[:k, :]     # Words in topic space

print(f"Reduced to {k} latent topics")
print(f"Singular values: {S[:k].round(2)}")

# 4. Interpret topics (words)
print("\n📚 Discovered Topics:")
for i in range(k):
    topic_words = VT_k[i]
    top_word_idx = np.argsort(np.abs(topic_words))[-4:][::-1]
    top_words = [f"{words[j]}({topic_words[j]:.2f})" for j in top_word_idx]
    print(f"   Topic {i+1}: {', '.join(top_words)}")

# 2. Project query into latent space
query_latent = query @ VT_k.T @ np.linalg.inv(S_k)
print(f"\nQuery 'automobile' in latent space: {query_latent.round(3)}")

# 3. Find most relevant document (cosine similarity)
print("\n📄 Search Results:")
print("-" * 40)

similarities = []
for i in range(len(docs)):
    doc_latent = U_k[i]
    sim = np.dot(query_latent, doc_latent) / (np.linalg.norm(query_latent) * np.linalg.norm(doc_latent) + 1e-10)
    similarities.append((docs[i], sim))

similarities.sort(key=lambda x: x[1], reverse=True)
for rank, (doc, sim) in enumerate(similarities, 1):
    bar = "█" * int(max(0, sim) * 20)
    print(f"   {rank}. {doc:<15} {sim:.3f} {bar}")

# Demonstrate semantic matching
print("\n💡 Key Insight:")
print("   'automobile' matches 'car' & 'vehicle' docs")
print("   even though 'automobile' wasn't in documents!")
print("   SVD discovered the semantic relationship!")

Real-World Insight: This is exactly how early Google worked! Modern search engines (Google, Bing) use neural network embeddings (like BERT), but LSA was the breakthrough that made semantic search possible.

Exercise 4: Noise Reduction in Signals 📡

Use SVD to clean noisy sensor data:

import numpy as np

# Clean signal: 3 underlying components
np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)

# True signal components
signal1 = np.sin(t)
signal2 = 0.5 * np.cos(2*t)  
signal3 = 0.3 * np.sin(3*t)

# Observed at 5 sensors with different mixings
mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.4, 0.5],
    [0.9, 0.3, 0.4],
    [0.7, 0.6, 0.3],
])

true_signals = np.vstack([signal1, signal2, signal3])
clean_data = mixing @ true_signals

# Add noise
noise = np.random.randn(5, 100) * 0.3
noisy_data = clean_data + noise

# TODO:
# 1. Apply SVD to noisy data
# 2. Keep only top 3 components (the true rank)
# 3. Reconstruct denoised signal
# 4. Calculate noise reduction (SNR improvement)

💡 Solution

import numpy as np

np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)

signal1 = np.sin(t)
signal2 = 0.5 * np.cos(2*t)
signal3 = 0.3 * np.sin(3*t)

mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.4, 0.5],
    [0.9, 0.3, 0.4],
    [0.7, 0.6, 0.3],
])

true_signals = np.vstack([signal1, signal2, signal3])
clean_data = mixing @ true_signals
noise = np.random.randn(5, 100) * 0.3
noisy_data = clean_data + noise

print("📡 Signal Denoising with SVD")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(noisy_data, full_matrices=False)

print("Singular Values:")
for i, s in enumerate(S):
    bar = "█" * int(s * 2)
    print(f"   σ{i+1} = {s:6.2f} {bar}")

# 2 & 3. Reconstruct with top 3 components
k = 3
denoised = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# 4. Calculate SNR improvement
def snr(signal, noise):
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    return 10 * np.log10(signal_power / noise_power)

noise_before = noisy_data - clean_data
noise_after = denoised - clean_data

snr_before = snr(clean_data, noise_before)
snr_after = snr(clean_data, noise_after)

print(f"\n📊 Noise Reduction Results:")
print("-" * 40)
print(f"   Original SNR:  {snr_before:.2f} dB")
print(f"   Denoised SNR:  {snr_after:.2f} dB")
print(f"   Improvement:   {snr_after - snr_before:.2f} dB")
print(f"   Noise reduced: {(1 - 10**((snr_before-snr_after)/10)) * 100:.1f}%")

# Error analysis
mse_noisy = np.mean((noisy_data - clean_data) ** 2)
mse_denoised = np.mean((denoised - clean_data) ** 2)

print(f"\n📉 Error Analysis:")
print(f"   MSE (noisy):    {mse_noisy:.4f}")
print(f"   MSE (denoised): {mse_denoised:.4f}")
print(f"   Error reduced:  {(1 - mse_denoised/mse_noisy) * 100:.1f}%")

# Visual comparison (one sensor)
print("\n📈 Sample Signal (Sensor 1, first 20 points):")
print("   Clean:    " + " ".join([f"{v:5.2f}" for v in clean_data[0, :10]]))
print("   Noisy:    " + " ".join([f"{v:5.2f}" for v in noisy_data[0, :10]]))
print("   Denoised: " + " ".join([f"{v:5.2f}" for v in denoised[0, :10]]))

Real-World Insight: SVD denoising is used in MRI imaging (saving lives by improving image clarity), audio processing (Dolby noise reduction), and financial data analysis (separating signal from market noise).

Key Takeaways

SVD Core Concepts:

✅ Universal Decomposition - Any matrix A = UΣVᵀ (works for non-square!)
✅ Low-Rank Approximation - Keep top k singular values for compression
✅ Collaborative Filtering - Predict missing ratings via latent factors
✅ Latent Factors - Discover hidden patterns (movie genres, user preferences)
✅ Noise Reduction - Large singular values = signal, small = noise

Interview Prep: SVD Questions

Common SVD Interview Questions

Q: What’s the difference between PCA and SVD?

PCA uses eigendecomposition of the covariance matrix (square, symmetric). SVD works directly on any matrix (even non-square). For centered data, SVD of X gives the same principal components as PCA. SVD is more numerically stable.

Q: How does Netflix use SVD for recommendations?

User-movie rating matrix is decomposed into user factors and movie factors. Each user/movie is represented by a latent vector capturing hidden preferences (action lover, comedy hater). Predictions = dot product of user and movie vectors.

Q: How do you choose the rank k for truncated SVD?

Methods: (1) Energy/variance retained (e.g., 95%), (2) Cross-validation for prediction tasks, (3) Visualization of singular value decay, (4) Domain knowledge about expected rank.

Q: What are singular values vs eigenvalues?

Singular values are always non-negative real numbers and exist for any matrix. Eigenvalues can be complex and only exist for square matrices. For symmetric A: singular values = |eigenvalues|.

Common Pitfalls

SVD Mistakes to Avoid:

Confusing U and V - Rows of V are right singular vectors (features); columns of U are left (samples)
Wrong Truncation - Keeping too few components loses signal; too many keeps noise
Forgetting Scale - SVD is sensitive to feature scales; consider normalizing
Cold Start Problem - SVD can’t recommend for new users/items with no data
Memory Issues - Full SVD on large matrices is expensive; use randomized SVD

Course Complete!

🎉 Congratulations! You’ve mastered Linear Algebra for Machine Learning!

Course Complete!

You’ve completed Linear Algebra for Machine Learning!You now have the mathematical foundation to understand neural networks, dimensionality reduction, recommendation systems, and much more. These concepts appear everywhere in ML—from word embeddings to attention mechanisms to image processing.

Your Linear Algebra Toolkit:

✅ Vectors - Data representation, similarity, embeddings
✅ Matrices - Transformations, neural network layers, data batches
✅ Eigenvalues - Feature importance, stability analysis
✅ PCA - Dimensionality reduction, visualization, noise filtering
✅ SVD - Recommendations, compression, matrix approximation

Practice More

Apply your skills on real datasets with Kaggle competitions

Continue Learning

Continue to our AI Engineering course for production ML systems

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Singular Value Decomposition (SVD)

​The Magic Behind “People Like You Also Bought…”

​An Everyday Mystery

​Before We Dive In: Why SVD Matters

​A Non-Math Example: Restaurant Preferences

​The Problem

​The Human Insight

​SVD Discovers This Automatically!

​Predict Missing Ratings

​The Math: Breaking Any Matrix Into Patterns

​The Core Formula

​SVD vs Eigendecomposition

​The Key Relationship

​Low-Rank Approximation

​Example 1: Movie Recommendation System (Netflix-Style)

​The Setup

​Apply SVD

​Discover Hidden Factors

​→ Romance movies (Movies 3, 4, 5)

​Example 2: House Price Patterns

​The Problem

​Apply SVD

​Interpret Patterns

​Example 3: Student Performance Prediction

​The Problem

​Apply SVD

​Predict Missing Grades

​SVD vs PCA vs Eigenvalues

​Comparison

​When to Use Each

​SVD Applications

​1. Image Compression

​2. Noise Reduction

​3. Latent Semantic Analysis (LSA)

​🎯 Practice Exercises & Real-World Applications

​Exercise 1: Build a Movie Recommendation Engine 🎬

​Exercise 2: Image Compression with SVD 📷

​Exercise 3: Latent Semantic Analysis (LSA) for Search 🔍

Singular Value Decomposition (SVD)

The Magic Behind “People Like You Also Bought…”

An Everyday Mystery

Before We Dive In: Why SVD Matters

A Non-Math Example: Restaurant Preferences

The Problem

The Human Insight

SVD Discovers This Automatically!

Predict Missing Ratings

The Math: Breaking Any Matrix Into Patterns

The Core Formula

SVD vs Eigendecomposition

The Key Relationship

Low-Rank Approximation

Example 1: Movie Recommendation System (Netflix-Style)

The Setup

Apply SVD

Discover Hidden Factors

→ Romance movies (Movies 3, 4, 5)

Example 2: House Price Patterns

The Problem

Apply SVD

Interpret Patterns

Example 3: Student Performance Prediction

The Problem

Apply SVD

Predict Missing Grades

SVD vs PCA vs Eigenvalues

Comparison

When to Use Each

SVD Applications

1. Image Compression

2. Noise Reduction

3. Latent Semantic Analysis (LSA)

🎯 Practice Exercises & Real-World Applications

Exercise 1: Build a Movie Recommendation Engine 🎬

Exercise 2: Image Compression with SVD 📷

Exercise 3: Latent Semantic Analysis (LSA) for Search 🔍

Exercise 4: Noise Reduction in Signals 📡

Key Takeaways

Interview Prep: SVD Questions

Common Pitfalls

Course Complete!