
Singular Value Decomposition (SVD)

The Magic Behind “People Like You Also Bought…”

An Everyday Mystery

You buy running shoes on Amazon. Suddenly Amazon knows you might want:
  • Protein powder
  • A fitness tracker
  • Compression socks
  • A foam roller
How? You never searched for those things. You never told Amazon you’re a runner. Amazon discovered a hidden pattern: “Runner type” people buy these things together. Similarly:
  • Spotify knows your music taste after 10 songs
  • Netflix predicts you’ll rate a movie 4.2 stars
  • YouTube knows which videos you’ll watch next
All of these use SVD — a technique that finds hidden patterns in incomplete data.
The Real Power: SVD can predict things you haven’t done yet based on patterns from millions of other people. You’ve rated 50 movies? SVD predicts your ratings for the other 9,950 movies!

Before We Dive In: Why SVD Matters

| Your Data | Hidden Patterns Found | Business Value |
| --- | --- | --- |
| Purchase history | “Customer types” | Personalized recommendations |
| Song listening | “Music taste dimensions” | “Discover Weekly” playlist |
| Movie ratings | “Genre preferences” | “Because you watched…” |
| Job applications | “Candidate profiles” | Better matching |
| Photos | “Visual features” | Face recognition, search |
SVD is behind billions of dollars of revenue at Amazon, Netflix, Spotify, and Google.
Estimated Time: 4-5 hours
Difficulty: Intermediate to Advanced
Prerequisites: Eigenvalues and PCA modules
Key Insight: Any table of data can be decomposed into hidden factors

A Non-Math Example: Restaurant Preferences

The Problem

Five friends rate 6 restaurants. Can we predict what Alice thinks of restaurants she hasn’t tried?
# Friends' restaurant ratings (1-5 stars)
# 0 = haven't tried
ratings = {
    #              Pizza  Sushi  Steak  Salad  Taco  Ramen
    "Alice":      [5,     4,     0,     2,     0,    0],
    "Bob":        [5,     0,     5,     1,     4,    0],
    "Carol":      [1,     5,     1,     5,     2,    5],
    "Dave":       [0,     4,     0,     5,     0,    5],
    "Eve":        [4,     0,     5,     0,     5,    0],
}

The Human Insight

Looking at this, you notice:
  • Alice, Bob, Eve like hearty food (pizza, steak, tacos)
  • Carol, Dave like lighter options (sushi, salad, ramen)
There seem to be 2 “types” of eaters:
  1. “Hearty eater” factor
  2. “Light eater” factor

SVD Discovers This Automatically!

import numpy as np

# Convert to matrix
ratings_matrix = np.array([
    [5, 4, 0, 2, 0, 0],
    [5, 0, 5, 1, 4, 0],
    [1, 5, 1, 5, 2, 5],
    [0, 4, 0, 5, 0, 5],
    [4, 0, 5, 0, 5, 0],
])

# Apply SVD
U, S, VT = np.linalg.svd(ratings_matrix, full_matrices=False)

print("Hidden factors found:", len(S))
print("Importance of each:", S.round(1))
# [12.2, 9.8, 2.1, 1.2, 0.3]
# First 2 factors dominate! (Hearty vs Light)

Predict Missing Ratings

# Keep only top 2 factors
k = 2
predicted = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

print("Original (0 = unknown):")
print(ratings_matrix)

print("\nPredicted:")
print(predicted.round(1))
Result: SVD predicts Alice would rate Steak 4.2 and Taco 3.8 (she’s a hearty eater like Bob and Eve). This is exactly how Netflix recommendations work!
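Want to read those predictions off yourself? A short continuation of the snippet above (it assumes `ratings_matrix` and `predicted` from the previous blocks are still defined) prints only the restaurants Alice never rated:

restaurants = ["Pizza", "Sushi", "Steak", "Salad", "Taco", "Ramen"]
alice = 0  # Alice is row 0 of ratings_matrix

print("Alice's predicted ratings for restaurants she hasn't tried:")
for j, name in enumerate(restaurants):
    if ratings_matrix[alice, j] == 0:      # 0 = not yet rated
        print(f"  {name}: {predicted[alice, j]:.1f}")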

The Math: Breaking Any Matrix Into Patterns

The Core Formula

Any matrix can be broken into 3 simpler matrices:

$A = U \Sigma V^T$

Where:
  • $U$ = left singular vectors (patterns in rows — users/people)
  • $\Sigma$ = singular values (how important each pattern is)
  • $V^T$ = right singular vectors (patterns in columns — items/restaurants)
import numpy as np

# Any matrix
A = np.array([
    [4, 0, 2],
    [0, 3, 0],
    [2, 0, 1]
])

# SVD decomposition
U, S, VT = np.linalg.svd(A)

print(f"U (row patterns): {U.shape}")    # (3, 3)
print(f"S (importance): {S.shape}")       # (3,)
print(f"VT (column patterns): {VT.shape}")  # (3, 3)

# Reconstruct perfectly
A_reconstructed = U @ np.diag(S) @ VT
print(np.allclose(A, A_reconstructed))  # True!

SVD vs Eigendecomposition

| Aspect | Eigendecomposition | SVD |
| --- | --- | --- |
| Works on | Square matrices only | Any matrix (m × n) |
| Formula | $A = V \Lambda V^{-1}$ | $A = U \Sigma V^T$ |
| Values | Eigenvalues (can be negative) | Singular values (always ≥ 0) |
| Requirement | Matrix must be diagonalizable | Always works! |
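A quick sanity check of the first row of that table, as a minimal sketch: np.linalg.eig rejects a rectangular matrix, while np.linalg.svd handles it without complaint.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2 × 3, not square

try:
    np.linalg.eig(A)  # eigendecomposition needs a square matrix
except np.linalg.LinAlgError as err:
    print("eig failed:", err)

U, S, VT = np.linalg.svd(A)  # SVD works for any shape
print("SVD shapes:", U.shape, S.shape, VT.shape)  # (2, 2) (2,) (3, 3)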

The Key Relationship

For a matrix $A$:
  • Singular values of $A$ = square roots of the eigenvalues of $A^T A$
  • $\sigma_i = \sqrt{\lambda_i(A^T A)}$
A = np.array([[4, 0], [3, -5]])

# SVD
U, S, VT = np.linalg.svd(A)
print("Singular values:", S)  # [6.32, 3.16]

# Eigenvalues of A^T A
eigenvalues = np.linalg.eigvals(A.T @ A)
print("√eigenvalues:", np.sqrt(sorted(eigenvalues, reverse=True)))  # [6.32, 3.16]

Low-Rank Approximation

The magic of SVD: keep only the top-k singular values for the best approximation!

$A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T$

This is optimal in the Frobenius norm: $A_k$ is the best rank-k approximation to $A$.
# Original 5×6 matrix (the restaurant ratings from above) has rank 5
# Keep only top 2 singular values → rank 2 approximation
A = ratings_matrix  # reuse the 5×6 matrix from the restaurant example

U, S, VT = np.linalg.svd(A, full_matrices=False)

# Keep top k=2 components
k = 2
A_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# How much did we lose?
error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(f"Reconstruction error: {error:.2%}")

Example 1: Movie Recommendation System (Netflix-Style)

The Setup

# User-Movie rating matrix (5 users × 6 movies)
# Rows = users, Columns = movies
# 0 = not rated (what we want to predict!)

import numpy as np

ratings = np.array([
    [5, 3, 0, 1, 0, 0],  # User 0: seems to like action (Movies 0,1)
    [4, 0, 0, 1, 0, 0],  # User 1: similar to User 0
    [1, 1, 0, 5, 4, 5],  # User 2: seems to like romance (Movies 3,4,5)
    [0, 0, 0, 4, 5, 4],  # User 3: similar to User 2
    [0, 1, 5, 4, 0, 0],  # User 4: mixed taste
])

print(f"Matrix shape: {ratings.shape}")  # (5, 6)
print(f"Unknown ratings: {(ratings == 0).sum()} out of {ratings.size}")
# 14 out of 30 ratings are unknown (nearly half missing!)
Challenge: Predict the 14 missing ratings!

Apply SVD

# Decompose the ratings matrix
U, S, VT = np.linalg.svd(ratings, full_matrices=False)

print("Singular values (pattern importance):")
print(S.round(2))
# [12.48, 9.51, 1.35, 0.89, 0.12]
# First 2 values are MUCH bigger — 2 main patterns!

Discover Hidden Factors

# What are the 2 main patterns?
print("\nMovie patterns (which movies go together):")
print("Pattern 1:", VT[0].round(2))  # [0.67, 0.38, 0.09, -0.23, -0.32, -0.26]
print("Pattern 2:", VT[1].round(2))  # [-0.08, -0.21, 0.15, 0.58, 0.52, 0.57]

# Pattern 1: High for Movies 0,1 (Action movies!)
# Pattern 2: High for Movies 3,4,5 (Romance movies!)
SVD automatically discovered: Movies 0-1 are “action” and Movies 3-5 are “romance” — without us labeling them!

Pattern 1 → Action movies (Movies 0, 1)
Pattern 2 → Romance movies (Movies 3, 4, 5)

Interpretation:
  • Factor 1: “Action preference”
  • Factor 2: “Romance preference”
Users are combinations of these factors!

# Each user's position along the two hidden factors
k = 2
U_k = U[:, :k]

print("\nUser factors (U):")
print(U_k.round(2))

# User 0: [high action, low romance]
# User 2: [low action, high romance]
# User 4: [medium action, medium romance] - mixed taste!
Real Application: This is exactly how Netflix works! They use ~50 hidden factors instead of 2.

Example 2: House Price Patterns

The Problem

# House-Feature matrix (100 houses × 10 features)
# Discover hidden patterns in house data

np.random.seed(42)
n_houses = 100
n_features = 10

# Simulate house data with hidden patterns
houses = np.random.randn(n_houses, n_features)

# Add pattern 1: "Luxury" (beds, baths, sqft high together)
luxury_factor = np.random.randn(n_houses)
houses[:, 0] += luxury_factor * 0.8  # beds
houses[:, 1] += luxury_factor * 0.7  # baths
houses[:, 2] += luxury_factor * 0.9  # sqft

# Add pattern 2: "Location" (school, walk, crime related)
location_factor = np.random.randn(n_houses)
houses[:, 5] += location_factor * 0.6  # school
houses[:, 7] += location_factor * 0.5  # walkability
houses[:, 6] -= location_factor * 0.4  # crime (inverse)

Apply SVD

# SVD
U, S, VT = np.linalg.svd(houses, full_matrices=False)

print(f"Singular values: {S[:5]}")
# [12.3, 10.8, 3.2, 2.9, 2.1]
# Top 2 dominate!

# Keep top 2 patterns
k = 2
houses_compressed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compression ratio
original_size = houses.size
compressed_size = U[:, :k].size + k + VT[:k, :].size
print(f"Compression: {compressed_size / original_size:.1%}")  # 24% of original!

Interpret Patterns

# What do the patterns represent?
features = ['beds', 'baths', 'sqft', 'age', 'dist', 'school', 'crime', 'walk', 'park', 'yard']

print("\\nPattern 1 (Luxury):")
for i, feature in enumerate(features):
    print(f"  {feature}: {VT[0, i]:.3f}")

# High: beds, baths, sqft
# → "Luxury/Size" pattern

print("\\nPattern 2 (Location):")
for i, feature in enumerate(features):
    print(f"  {feature}: {VT[1, i]:.3f}")

# High: school, walkability
# Low: crime
# → "Location Quality" pattern
Real Application: Zillow uses SVD to discover hidden house “types” (luxury urban, suburban family, etc.)

Example 3: Student Performance Prediction

The Problem

# Student-Subject matrix (200 students × 8 subjects)
# Predict grades in subjects students haven't taken yet

students = np.random.randn(200, 8)

# Add pattern 1: "STEM aptitude"
stem_factor = np.random.randn(200)
students[:, 0] += stem_factor * 0.9  # Math
students[:, 2] += stem_factor * 0.8  # Physics
students[:, 4] += stem_factor * 0.7  # CS

# Add pattern 2: "Humanities aptitude"
humanities_factor = np.random.randn(200)
students[:, 1] += humanities_factor * 0.8  # English
students[:, 3] += humanities_factor * 0.7  # History
students[:, 5] += humanities_factor * 0.6  # Art

Apply SVD

U, S, VT = np.linalg.svd(students, full_matrices=False)

print(f"Singular values: {S[:5]}")
# [18.5, 15.2, 4.3, 3.8, 2.9]

# Keep top 2 aptitudes
k = 2
students_approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

Predict Missing Grades

# Simulate missing grades (set some to 0)
students_incomplete = students.copy()
mask = np.random.rand(*students.shape) < 0.3  # 30% missing
students_incomplete[mask] = 0

# Predict using SVD
U, S, VT = np.linalg.svd(students_incomplete, full_matrices=False)
students_predicted = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compare predictions
print("\\nActual grade:", students[0, 3])
print("Predicted grade:", students_predicted[0, 3])
# Close match!
Real Application: Coursera uses SVD to recommend courses based on your performance in similar courses!

SVD vs PCA vs Eigenvalues

Comparison

| Method | Input | Output | Use Case |
| --- | --- | --- | --- |
| Eigenvalues | Square matrix | Eigenvalues + eigenvectors | Feature importance |
| PCA | Data matrix | Principal components | Dimensionality reduction |
| SVD | Any matrix | 3 matrices (U, Σ, V) | Recommendation, compression |

When to Use Each

Use Eigenvalues when:
  • You have a square matrix (covariance, adjacency)
  • You want to find important directions
  • Example: PageRank, stability analysis
Use PCA when:
  • You want to reduce features
  • You want to visualize high-D data
  • Example: Compress 10 features → 3
Use SVD when:
  • You have a rectangular matrix (users × items)
  • You want to fill missing values (see the sketch after this list)
  • You want to discover hidden patterns
  • Example: Recommendations, collaborative filtering
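As promised in the list above, here is a minimal sketch of filling missing values by iterating truncated SVD. The fill_with_svd helper and its parameters are illustrative, not a library API:

import numpy as np

def fill_with_svd(R, k=2, n_iters=20):
    """Illustrative iterative imputation: zeros mark missing entries."""
    missing = (R == 0)
    filled = R.astype(float).copy()
    filled[missing] = R[~missing].mean()   # start every blank at the global mean
    for _ in range(n_iters):
        U, S, VT = np.linalg.svd(filled, full_matrices=False)
        approx = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
        filled[missing] = approx[missing]  # overwrite only the unknown entries
    return filled

ratings = np.array([
    [5, 3, 0, 1, 0, 0],
    [4, 0, 0, 1, 0, 0],
    [1, 1, 0, 5, 4, 5],
    [0, 0, 0, 4, 5, 4],
    [0, 1, 5, 4, 0, 0],
])
print(fill_with_svd(ratings, k=2).round(1))

Each pass replaces the blanks with the current low-rank reconstruction, while the known entries keep anchoring the estimate.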

SVD Applications

1. Image Compression

import numpy as np
from PIL import Image

# Load image
img = Image.open('photo.jpg').convert('L')  # Grayscale
img_array = np.array(img)

# SVD
U, S, VT = np.linalg.svd(img_array, full_matrices=False)

# Compress: keep top k singular values
k = 50  # Instead of 512
img_compressed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# Compression ratio
original_size = img_array.size
compressed_size = U[:, :k].size + k + VT[:k, :].size
print(f"Compression: {compressed_size / original_size:.1%}")  # 20%!

# Save compressed image (clip to the valid 0–255 pixel range before casting)
Image.fromarray(np.clip(img_compressed, 0, 255).astype(np.uint8)).save('compressed.jpg')

2. Noise Reduction

import numpy as np

# Example "clean" low-rank data: 100 samples × 20 features built from 10 hidden factors
np.random.seed(0)
data_clean = np.random.randn(100, 10) @ np.random.randn(10, 20)

# Noisy data
data_noisy = data_clean + np.random.randn(*data_clean.shape) * 0.5

# SVD
U, S, VT = np.linalg.svd(data_noisy, full_matrices=False)

# Keep only large singular values (signal)
# Drop small ones (noise)
k = 10
data_denoised = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

3. Latent Semantic Analysis (LSA)

# Document-Term matrix
# Rows = documents, Columns = words
# Discover hidden topics!

from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "machine learning is great",
    "deep learning neural networks",
    "python programming language",
    # ... more documents
]

vectorizer = TfidfVectorizer()
doc_term_matrix = vectorizer.fit_transform(documents).toarray()

# SVD
U, S, VT = np.linalg.svd(doc_term_matrix, full_matrices=False)

# Top 3 topics
k = 3
topics = VT[:k, :]

# Interpret topics
terms = vectorizer.get_feature_names_out()
for i in range(k):
    top_terms_idx = np.argsort(topics[i])[-5:][::-1]
    print(f"Topic {i}: {[terms[idx] for idx in top_terms_idx]}")

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises demonstrate SVD applications powering billion-dollar companies.

Exercise 1: Build a Movie Recommendation Engine 🎬

Create a Netflix-style recommendation system:
import numpy as np

# User-Movie ratings matrix (0 = not rated)
# Users: Alice, Bob, Carol, Dave, Eve
# Movies: Action1, Action2, Romance1, Romance2, SciFi1, Comedy1
ratings = np.array([
    [5, 4, 1, 0, 5, 3],   # Alice: likes action, scifi
    [4, 5, 0, 1, 4, 2],   # Bob: similar to Alice
    [1, 0, 5, 5, 2, 4],   # Carol: likes romance, comedy
    [0, 1, 4, 5, 1, 5],   # Dave: similar to Carol
    [5, 5, 1, 1, 0, 0],   # Eve: action fan, hasn't seen scifi/comedy
])

# TODO:
# 1. Apply SVD with rank-2 approximation
# 2. Predict Eve's rating for SciFi1 and Comedy1
# 3. Which movie should we recommend to Eve?
# 4. Find users most similar to Alice
import numpy as np

ratings = np.array([
    [5, 4, 1, 0, 5, 3],
    [4, 5, 0, 1, 4, 2],
    [1, 0, 5, 5, 2, 4],
    [0, 1, 4, 5, 1, 5],
    [5, 5, 1, 1, 0, 0],
])

users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
movies = ['Action1', 'Action2', 'Romance1', 'Romance2', 'SciFi1', 'Comedy1']

print("🎬 Movie Recommendation Engine with SVD")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(ratings, full_matrices=False)

# Rank-2 approximation (captures main "taste" dimensions)
k = 2
U_k = U[:, :k]
S_k = np.diag(S[:k])
VT_k = VT[:k, :]

# Predicted ratings
predicted = U_k @ S_k @ VT_k

print("\n📊 Original Ratings:")
print("       ", "  ".join([m[:6].ljust(6) for m in movies]))
for i, user in enumerate(users):
    row = " ".join([f"{r:5.0f}" if r > 0 else "    -" for r in ratings[i]])
    print(f"  {user:6s} {row}")

print("\n🔮 Predicted Ratings (SVD Rank-2):")
print("       ", "  ".join([m[:6].ljust(6) for m in movies]))
for i, user in enumerate(users):
    row = " ".join([f"{r:5.1f}" for r in predicted[i]])
    print(f"  {user:6s} {row}")

# 2. Eve's predictions for unrated movies
print("\n🎯 Predictions for Eve:")
eve_idx = 4
for j in [4, 5]:  # SciFi1 and Comedy1
    print(f"   {movies[j]}: {predicted[eve_idx, j]:.2f}")

# 3. Recommend to Eve
unrated_by_eve = [j for j in range(6) if ratings[eve_idx, j] == 0]
best_movie_idx = max(unrated_by_eve, key=lambda j: predicted[eve_idx, j])
print(f"\n🌟 Recommendation for Eve: {movies[best_movie_idx]}")
print(f"   (Predicted rating: {predicted[eve_idx, best_movie_idx]:.2f})")

# 4. Find users similar to Alice (using user vectors in latent space)
alice_vec = U_k[0]
print("\n👥 Users Most Similar to Alice:")
for i in range(1, len(users)):
    similarity = np.dot(alice_vec, U_k[i]) / (np.linalg.norm(alice_vec) * np.linalg.norm(U_k[i]))
    print(f"   {users[i]}: {similarity:.3f}")
Real-World Insight: Netflix’s recommendation system uses a more advanced version of this with 100+ latent factors. Their recommendation engine is worth ~$1B/year in reduced churn!

Exercise 2: Image Compression with SVD 📷

Compress an image using low-rank approximation:
import numpy as np

# Create a simple 16x16 "image" with structure
np.random.seed(42)
image = np.zeros((16, 16))

# Add some patterns
image[2:6, 2:6] = 0.8     # Square
image[2:6, 10:14] = 0.9   # Another square
image[10:14, 5:11] = 0.7  # Rectangle at bottom
image += np.random.randn(16, 16) * 0.1  # Add noise

# TODO:
# 1. Apply SVD
# 2. Reconstruct with different ranks (1, 3, 5, 10)
# 3. Calculate compression ratio and error for each
# 4. Find minimum rank for < 5% error
import numpy as np

np.random.seed(42)
image = np.zeros((16, 16))
image[2:6, 2:6] = 0.8
image[2:6, 10:14] = 0.9
image[10:14, 5:11] = 0.7
image += np.random.randn(16, 16) * 0.1

print("📷 Image Compression with SVD")
print("=" * 55)
print(f"Original image: {image.shape}")

# Apply SVD
U, S, VT = np.linalg.svd(image, full_matrices=False)

print("\n📊 Singular Values:")
print(f"   {S[:5].round(2)} ...")
print(f"   Top 3 capture: {(S[:3]**2).sum() / (S**2).sum() * 100:.1f}% of variance")

# Compression analysis
print("\n📈 Compression Analysis:")
print("-" * 55)
print(f"{'Rank':<6} {'Storage':<12} {'Compression':<14} {'MSE':<10} {'Quality'}")
print("-" * 55)

original_size = 16 * 16  # 256 values

for k in [1, 3, 5, 10, 16]:
    # Reconstruct with rank k
    reconstructed = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
    
    # Error
    mse = np.mean((image - reconstructed) ** 2)
    psnr = 10 * np.log10(1.0 / mse) if mse > 0 else 100
    
    # Storage: k values in U columns + k singular values + k rows of VT
    storage = k * 16 + k + k * 16  # 33k values
    compression = original_size / storage
    
    quality = "★" * min(5, int(psnr / 10))
    print(f"k={k:<4} {storage:<12} {compression:<14.2f}x {mse:<10.4f} {quality}")

# Find minimum rank for <5% relative error
target_error = np.mean(image ** 2) * 0.05
for k in range(1, 17):
    recon = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]
    mse = np.mean((image - recon) ** 2)
    if mse < target_error:
        print(f"\n🎯 Minimum rank for <5% error: k={k}")
        print(f"   Compression: {original_size / (33*k):.2f}x")
        break

# Visual comparison (ASCII art)
print("\n🖼️ Visual Comparison (scaled 0-9):")

def to_ascii(img):
    scaled = np.clip(img, 0, 1) * 9
    return "\n".join(["".join([str(int(v)) for v in row]) for row in scaled])

print("\nOriginal:")
print(to_ascii(image[::4, ::4]))  # Sample 4x4

print("\nRank-3 Approximation:")
recon_3 = U[:, :3] @ np.diag(S[:3]) @ VT[:3, :]
print(to_ascii(recon_3[::4, ::4]))
Real-World Insight: Low-rank approximation is the core idea behind many compression and denoising techniques. Production image and video codecs (JPEG, JPEG 2000, H.264) rely on related transform-coding methods such as the DCT and wavelets rather than per-image SVD, but the principle of keeping only the largest coefficients is the same.

Exercise 3: Latent Semantic Analysis (LSA) for Search 🔍

Build a semantic search engine that understands meaning:
import numpy as np

# Document-term matrix (rows=docs, cols=words)
# Words: car, vehicle, auto, road, engine, cat, dog, pet, animal, kitten
doc_term = np.array([
    [3, 2, 1, 1, 2, 0, 0, 0, 0, 0],  # Doc about cars
    [2, 3, 2, 0, 1, 0, 0, 0, 0, 0],  # Doc about vehicles
    [0, 0, 0, 0, 0, 3, 2, 2, 1, 0],  # Doc about dogs
    [0, 0, 0, 0, 0, 2, 0, 1, 2, 3],  # Doc about cats
    [1, 1, 0, 2, 0, 0, 0, 0, 0, 0],  # Doc about roads
])

# Query: "automobile" (similar to car, vehicle, auto)
query = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])

# TODO:
# 1. Apply SVD with k=2 latent topics
# 2. Project query into latent space
# 3. Find most relevant document
# 4. Identify the "topics" discovered
import numpy as np

doc_term = np.array([
    [3, 2, 1, 1, 2, 0, 0, 0, 0, 0],
    [2, 3, 2, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 2, 2, 1, 0],
    [0, 0, 0, 0, 0, 2, 0, 1, 2, 3],
    [1, 1, 0, 2, 0, 0, 0, 0, 0, 0],
])

words = ['car', 'vehicle', 'auto', 'road', 'engine', 'cat', 'dog', 'pet', 'animal', 'kitten']
docs = ['Cars Article', 'Vehicles Guide', 'Dogs 101', 'Cat Lovers', 'Road Trip']

query = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # "automobile"

print("🔍 Semantic Search with LSA (SVD)")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(doc_term, full_matrices=False)

# Keep k=2 latent topics
k = 2
U_k = U[:, :k]       # Documents in topic space
S_k = np.diag(S[:k])
VT_k = VT[:k, :]     # Words in topic space

print(f"Reduced to {k} latent topics")
print(f"Singular values: {S[:k].round(2)}")

# 4. Interpret topics (words)
print("\n📚 Discovered Topics:")
for i in range(k):
    topic_words = VT_k[i]
    top_word_idx = np.argsort(np.abs(topic_words))[-4:][::-1]
    top_words = [f"{words[j]}({topic_words[j]:.2f})" for j in top_word_idx]
    print(f"   Topic {i+1}: {', '.join(top_words)}")

# 2. Project query into latent space
query_latent = query @ VT_k.T @ np.linalg.inv(S_k)
print(f"\nQuery 'automobile' in latent space: {query_latent.round(3)}")

# 3. Find most relevant document (cosine similarity)
print("\n📄 Search Results:")
print("-" * 40)

similarities = []
for i in range(len(docs)):
    doc_latent = U_k[i]
    sim = np.dot(query_latent, doc_latent) / (np.linalg.norm(query_latent) * np.linalg.norm(doc_latent) + 1e-10)
    similarities.append((docs[i], sim))

similarities.sort(key=lambda x: x[1], reverse=True)
for rank, (doc, sim) in enumerate(similarities, 1):
    bar = "█" * int(max(0, sim) * 20)
    print(f"   {rank}. {doc:<15} {sim:.3f} {bar}")

# Demonstrate semantic matching
print("\n💡 Key Insight:")
print("   'automobile' matches 'car' & 'vehicle' docs")
print("   even though 'automobile' wasn't in documents!")
print("   SVD discovered the semantic relationship!")
Real-World Insight: LSA was a breakthrough in information retrieval and powered early semantic search and document-matching systems. Modern search engines (Google, Bing) use neural network embeddings (like BERT), but LSA showed that matching by meaning rather than exact keywords was possible.

Exercise 4: Noise Reduction in Signals 📡

Use SVD to clean noisy sensor data:
import numpy as np

# Clean signal: 3 underlying components
np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)

# True signal components
signal1 = np.sin(t)
signal2 = 0.5 * np.cos(2*t)  
signal3 = 0.3 * np.sin(3*t)

# Observed at 5 sensors with different mixings
mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.4, 0.5],
    [0.9, 0.3, 0.4],
    [0.7, 0.6, 0.3],
])

true_signals = np.vstack([signal1, signal2, signal3])
clean_data = mixing @ true_signals

# Add noise
noise = np.random.randn(5, 100) * 0.3
noisy_data = clean_data + noise

# TODO:
# 1. Apply SVD to noisy data
# 2. Keep only top 3 components (the true rank)
# 3. Reconstruct denoised signal
# 4. Calculate noise reduction (SNR improvement)
import numpy as np

np.random.seed(42)
t = np.linspace(0, 4*np.pi, 100)

signal1 = np.sin(t)
signal2 = 0.5 * np.cos(2*t)
signal3 = 0.3 * np.sin(3*t)

mixing = np.array([
    [1.0, 0.5, 0.3],
    [0.8, 0.7, 0.2],
    [0.6, 0.4, 0.5],
    [0.9, 0.3, 0.4],
    [0.7, 0.6, 0.3],
])

true_signals = np.vstack([signal1, signal2, signal3])
clean_data = mixing @ true_signals
noise = np.random.randn(5, 100) * 0.3
noisy_data = clean_data + noise

print("📡 Signal Denoising with SVD")
print("=" * 55)

# 1. Apply SVD
U, S, VT = np.linalg.svd(noisy_data, full_matrices=False)

print("Singular Values:")
for i, s in enumerate(S):
    bar = "█" * int(s * 2)
    print(f"   σ{i+1} = {s:6.2f} {bar}")

# 2 & 3. Reconstruct with top 3 components
k = 3
denoised = U[:, :k] @ np.diag(S[:k]) @ VT[:k, :]

# 4. Calculate SNR improvement
def snr(signal, noise):
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    return 10 * np.log10(signal_power / noise_power)

noise_before = noisy_data - clean_data
noise_after = denoised - clean_data

snr_before = snr(clean_data, noise_before)
snr_after = snr(clean_data, noise_after)

print(f"\n📊 Noise Reduction Results:")
print("-" * 40)
print(f"   Original SNR:  {snr_before:.2f} dB")
print(f"   Denoised SNR:  {snr_after:.2f} dB")
print(f"   Improvement:   {snr_after - snr_before:.2f} dB")
print(f"   Noise reduced: {(1 - 10**((snr_before-snr_after)/10)) * 100:.1f}%")

# Error analysis
mse_noisy = np.mean((noisy_data - clean_data) ** 2)
mse_denoised = np.mean((denoised - clean_data) ** 2)

print(f"\n📉 Error Analysis:")
print(f"   MSE (noisy):    {mse_noisy:.4f}")
print(f"   MSE (denoised): {mse_denoised:.4f}")
print(f"   Error reduced:  {(1 - mse_denoised/mse_noisy) * 100:.1f}%")

# Visual comparison (one sensor)
print("\n📈 Sample Signal (Sensor 1, first 20 points):")
print("   Clean:    " + " ".join([f"{v:5.2f}" for v in clean_data[0, :10]]))
print("   Noisy:    " + " ".join([f"{v:5.2f}" for v in noisy_data[0, :10]]))
print("   Denoised: " + " ".join([f"{v:5.2f}" for v in denoised[0, :10]]))
Real-World Insight: SVD-based (low-rank) denoising is used in MRI imaging to improve image clarity, in audio and speech noise-reduction pipelines, and in financial data analysis to separate signal from market noise.

Key Takeaways

SVD Core Concepts:
  • Universal Decomposition - Any matrix A = UΣVᵀ (works for non-square!)
  • Low-Rank Approximation - Keep top k singular values for compression
  • Collaborative Filtering - Predict missing ratings via latent factors
  • Latent Factors - Discover hidden patterns (movie genres, user preferences)
  • Noise Reduction - Large singular values = signal, small = noise

Interview Prep: SVD Questions

Q: What’s the difference between PCA and SVD?
PCA uses eigendecomposition of the covariance matrix (square, symmetric). SVD works directly on any matrix (even non-square). For centered data, SVD of X gives the same principal components as PCA. SVD is more numerically stable.
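A small check of that claim, assuming scikit-learn is available: on mean-centered data, the rows of Vᵀ from SVD match PCA's components up to sign flips.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
Xc = X - X.mean(axis=0)  # center the data first

U, S, VT = np.linalg.svd(Xc, full_matrices=False)
pca = PCA(n_components=4).fit(Xc)

# Same directions, possibly with opposite signs per component
print(np.allclose(np.abs(VT), np.abs(pca.components_)))  # True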
Q: How does Netflix use SVD for recommendations?
User-movie rating matrix is decomposed into user factors and movie factors. Each user/movie is represented by a latent vector capturing hidden preferences (action lover, comedy hater). Predictions = dot product of user and movie vectors.
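In code, that "dot product of user and movie vectors" step looks roughly like this sketch (the latent factor values below are made up for illustration):

import numpy as np

# Toy latent factors: 3 users × 2 factors, 4 movies × 2 factors
user_factors = np.array([[1.2, 0.1],   # action lover
                         [0.2, 1.1],   # romance lover
                         [0.7, 0.6]])  # mixed taste
movie_factors = np.array([[1.0, 0.1],  # action movie
                          [0.9, 0.2],
                          [0.1, 1.0],  # romance movie
                          [0.3, 0.8]])

# Predicted rating for user i, movie j = user_factors[i] · movie_factors[j]
predicted = user_factors @ movie_factors.T
print(predicted.round(2))  # rows = users, columns = movies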
Q: How do you choose the rank k for truncated SVD?
Methods: (1) Energy/variance retained (e.g., 95%), (2) Cross-validation for prediction tasks, (3) Visualization of singular value decay, (4) Domain knowledge about expected rank.
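Method (1) is the easiest to code: pick the smallest k whose squared singular values cover, say, 95% of the total. A minimal sketch on random data:

import numpy as np

np.random.seed(0)
X = np.random.randn(50, 20)
U, S, VT = np.linalg.svd(X, full_matrices=False)

energy = np.cumsum(S**2) / np.sum(S**2)     # cumulative fraction of "energy" per rank
k = int(np.searchsorted(energy, 0.95)) + 1  # smallest k reaching 95%
print(f"Keep k = {k} of {len(S)} components to retain 95% of the energy")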
Q: What are singular values vs eigenvalues?
Singular values are always non-negative real numbers and exist for any matrix. Eigenvalues can be complex and only exist for square matrices. For symmetric A: singular values = |eigenvalues|.
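A two-line check of that last fact for a small symmetric matrix:

import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 3.0]])  # symmetric
print(np.sort(np.linalg.svd(A, compute_uv=False)))  # singular values
print(np.sort(np.abs(np.linalg.eigvalsh(A))))       # |eigenvalues| - same numbers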

Common Pitfalls

SVD Mistakes to Avoid:
  1. Confusing U and V - Columns of U are the left singular vectors (row/sample patterns); the rows of Vᵀ returned by NumPy are the right singular vectors (column/feature patterns)
  2. Wrong Truncation - Keeping too few components loses signal; too many keeps noise
  3. Forgetting Scale - SVD is sensitive to feature scales; consider normalizing
  4. Cold Start Problem - SVD can’t recommend for new users/items with no data
  5. Memory Issues - Full SVD on large matrices is expensive; use randomized or truncated SVD (see the sketch below)
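For pitfall 5, scikit-learn's TruncatedSVD (backed by a randomized algorithm) computes only the top-k factors without forming the full decomposition. A minimal sketch, assuming scikit-learn and SciPy are installed:

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# A large, sparse user-item matrix (10,000 × 2,000, ~1% of entries filled)
X = sparse_random(10_000, 2_000, density=0.01, random_state=42)

svd = TruncatedSVD(n_components=20, random_state=42)  # only the top 20 factors
user_factors = svd.fit_transform(X)                   # shape (10000, 20)
item_factors = svd.components_                        # shape (20, 2000)

print(user_factors.shape, item_factors.shape)
print("Variance explained:", svd.explained_variance_ratio_.sum().round(3))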

Course Complete!

🎉 Congratulations! You’ve mastered Linear Algebra for Machine Learning!

You’ve completed Linear Algebra for Machine Learning! You now have the mathematical foundation to understand neural networks, dimensionality reduction, recommendation systems, and much more. These concepts appear everywhere in ML—from word embeddings to attention mechanisms to image processing.
Your Linear Algebra Toolkit:
  • Vectors - Data representation, similarity, embeddings
  • Matrices - Transformations, neural network layers, data batches
  • Eigenvalues - Feature importance, stability analysis
  • PCA - Dimensionality reduction, visualization, noise filtering
  • SVD - Recommendations, compression, matrix approximation