Eigenvalues & Eigenvectors

You Already Think in Eigenvalues (You Just Don’t Know It)

The Morning Routine That Predicts Your Whole Day

Ever noticed that some mornings just feel different? You wake up groggy, rush through breakfast, hit traffic, arrive late, meetings go poorly, you make mistakes, work late, sleep badly, and the cycle repeats. But what actually caused it all? Was it the 6 hours of sleep? The skipped breakfast? The traffic? Here’s the insight: Most of your “bad day” can be explained by just one or two root factors (like sleep quality), even though you experienced 10 different symptoms. That’s eigenvalues. Finding the few hidden factors that explain most of what you observe.
The Scary Name, Simple Idea: “Eigenvalue” sounds terrifying, but it just means “importance score.” When you say “location, location, location” matters most in real estate, you’re identifying the dominant eigenvalue!

Real-World Eigenvalue Thinking

Situation | Many Observable Things | Hidden Main Factor
Job Performance | 20 metrics (emails, meetings, code, bugs…) | Really comes down to: focus + communication
Health Checkup | 30 blood test values | Most explained by: diet + exercise + sleep
Student Grades | 8 courses, dozens of assignments | Mostly: study habits + class attendance
Stock Market | 5,000 stocks moving daily | 80% explained by: economy + interest rates + oil
Customer Behavior | Thousands of purchase records | 5-6 “customer types” explain most patterns
The Math Question: Can we automatically discover these hidden factors from data? Yes! That’s what eigenvalues and eigenvectors do.
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Vectors and Matrices modules
Pattern: Observable Data → Hidden Structure → Simplification
🔗 ML Connection: Eigenvalues power these real ML systems:
ML Application | How Eigenvalues Are Used
PCA | Principal components ARE eigenvectors
Spectral Clustering | Eigenvectors of the graph Laplacian
PageRank | Dominant eigenvector = page importance
Covariance Analysis | Eigenvalues = variance per direction
Neural Network Stability | Eigenvalues of weight matrices
Transformer Attention | Low-rank approximation via eigendecomposition
This module directly enables PCA, clustering, and understanding model behavior!

A Non-Math Example: What Makes a Good Coffee Shop?

Step 1: Collect Observations

You’re looking for a good coffee shop. You rate each one on 8 factors:
# Your coffee shop ratings (1-10)
coffee_shops = {
    "Starbucks":     [7, 6, 5, 9, 8, 7, 4, 6],  
    "Local Hipster": [9, 8, 4, 3, 6, 8, 9, 7],
    "Library Cafe":  [6, 7, 9, 6, 7, 5, 3, 8],
    # ...
}
# Factors: [coffee_quality, pastries, wifi, location, seating, 
#           ambiance, uniqueness, price_value]

Step 2: Notice the Patterns

After rating 20 shops, you notice:
  • When coffee_quality is high, uniqueness tends to be high too
  • When location is good, seating is usually crowded (lower score)
  • wifi and seating go together (work-friendly places)
There seem to be hidden patterns!
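You can sanity-check these hunches with a plain correlation matrix before any eigen-machinery. A minimal sketch, assuming a small stack of rating rows like the ones above (the last two rows are made up for illustration):

import numpy as np

# Rows = shops, columns = the 8 factors listed above
ratings = np.array([
    [7, 6, 5, 9, 8, 7, 4, 6],   # Starbucks
    [9, 8, 4, 3, 6, 8, 9, 7],   # Local Hipster
    [6, 7, 9, 6, 7, 5, 3, 8],   # Library Cafe
    [8, 7, 3, 4, 5, 9, 8, 6],   # (made-up shop)
    [5, 6, 8, 7, 8, 4, 2, 7],   # (made-up shop)
])
factors = ["coffee_quality", "pastries", "wifi", "location",
           "seating", "ambiance", "uniqueness", "price_value"]

# Pairwise correlations between factors; large positive or negative
# entries are exactly the "patterns you noticed"
corr = np.corrcoef(ratings.T)
i, j = factors.index("coffee_quality"), factors.index("uniqueness")
print(f"coffee_quality vs uniqueness: {corr[i, j]:+.2f}")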

Step 3: Eigenanalysis Reveals the Truth

import numpy as np

# Rate 50 coffee shops on 8 factors
ratings = np.random.randn(50, 8)  # (for demo purposes)

# Find hidden structure
cov = np.cov(ratings.T)
eigenvalues, eigenvectors = np.linalg.eig(cov)

# Sort by importance
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Hidden Factors (sorted by importance):")
print(eigenvalues.round(2))
# Illustrative output for real, correlated ratings:
# [2.8, 1.9, 1.2, 0.6, 0.4, 0.2, 0.1, 0.05]
# (the random placeholder data above would give values all near 1)
Interpretation:
  • Factor 1 (eigenvalue 2.8): Combines coffee_quality + pastries + uniqueness = the “Quality Factor”
  • Factor 2 (eigenvalue 1.9): Combines wifi + seating = the “Productivity Factor”
  • Factor 3 (eigenvalue 1.2): Combines location + price_value = the “Convenience Factor”
  • Factors 4-8: Barely matter (eigenvalues < 1)
Insight: Despite 8 separate ratings, coffee shops really differ on just 3 hidden factors!

What Exactly ARE Eigenvalues and Eigenvectors?

The Key Insight

When you apply a transformation (matrix) to data, most directions get twisted and stretched. But special directions only get stretched — they don’t rotate! These are eigenvectors. The amount they stretch by is the eigenvalue.
import numpy as np

# A transformation matrix
A = np.array([
    [3, 1],
    [0, 2]
])

# A special vector (eigenvector)
v = np.array([1, 0])

# Apply transformation
result = A @ v
print(result)  # [3, 0] = 3 * v

# The vector only got scaled by 3 (the eigenvalue)!
# Direction unchanged!
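For contrast, a direction that is not an eigenvector gets rotated as well as scaled, while the matrix’s other eigenvector is again only scaled:

import numpy as np

A = np.array([
    [3, 1],
    [0, 2]
])

# Not an eigenvector: the direction changes
u = np.array([0, 1])
print(A @ u)   # [1, 2] -- not a multiple of [0, 1]

# The other eigenvector of A is [1, -1], with eigenvalue 2
w = np.array([1, -1])
print(A @ w)   # [2, -2] = 2 * w -- direction preserved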
The Formula:

$$A\mathbf{v} = \lambda\mathbf{v}$$

Where:
  • $A$ = transformation matrix
  • $\mathbf{v}$ = eigenvector (the special direction)
  • $\lambda$ = eigenvalue (how much it stretches)
Large eigenvalue = this direction captures a lot of variation
Small eigenvalue = this direction barely matters

Example 1: House Features - What Really Matters?

The Classic Question

You have house data with many features. Which features explain most of the variation in prices?
import numpy as np

# House data (100 houses × 4 features)
# Features: [bedrooms, sqft, age, distance_to_city]
np.random.seed(42)

# Sqft dominates: its spread (std 400) dwarfs every other feature's
sqft = np.random.normal(2000, 400, 100)  # Mean 2000, spread 400
bedrooms = np.random.normal(3, 0.5, 100)  # Mean 3, spread 0.5
age = np.random.normal(20, 8, 100)         # Mean 20, spread 8  
distance = np.random.normal(10, 3, 100)    # Mean 10, spread 3

houses = np.column_stack([bedrooms, sqft, age, distance])
print(f"House data shape: {houses.shape}")  # (100, 4)

# Compute covariance matrix (how features vary together)
cov_matrix = np.cov(houses.T)
print("Covariance matrix shape:", cov_matrix.shape)  # (4, 4)

Finding What Matters Most

# Find eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by eigenvalue (largest first)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Eigenvalues (importance scores):")
print(eigenvalues.round(0))
# [160,000,  65,  10,  0.3]

print("\nFirst eigenvector (most important direction):")
print(eigenvectors[:, 0].round(2))
# [0.00, 1.00, -0.00, 0.00]
What This Tells Us:
Eigenvalue | What It Means | Which Features
160,000 (huge!) | Most variation | Sqft dominates (loading 1.00)
65 | Some variation | Age (moderate)
10 | Little variation | Distance
0.3 | Barely any | Bedrooms
Insight: Sqft explains nearly everything! Its eigenvalue is 2,000x larger than the others. This is why Zillow’s price estimate weighs square footage so heavily!
# How much does each eigenvalue explain?
variance_explained = eigenvalues / eigenvalues.sum()
print("Variance explained by each factor:")
for i, (val, pct) in enumerate(zip(eigenvalues, variance_explained)):
    print(f"  Factor {i+1}: {pct*100:.1f}%")
# Factor 1: 99.9%  ← Sqft
# Factor 2: 0.04%
# Factor 3: 0.01%
# Factor 4: 0.00%
Real-World Implication: If you’re building a house price predictor and you need to reduce features (for speed or simplicity), you can drop everything except sqft and still explain 99% of the variance!
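Here is what that reduction looks like, continuing with the houses, eigenvalues, and eigenvectors computed above (a sketch; exact numbers depend on the random seed): project each house onto the top eigenvector, reconstruct, and check how little is lost.

# Compress 4 features down to 1 using only the top eigenvector
mean = houses.mean(axis=0)
centered = houses - mean

top = eigenvectors[:, :1]                 # (4, 1) most important direction
scores = centered @ top                   # each house summarized by one number
reconstructed = scores @ top.T + mean     # back in the original 4-feature space

mse = np.mean((houses - reconstructed) ** 2)
print(f"Reconstruction MSE keeping 1 of 4 directions: {mse:.2f}")
print(f"Variance retained: {eigenvalues[0] / eigenvalues.sum():.1%}")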

Visualizing Principal Directions

import matplotlib.pyplot as plt

# Plot houses (using first 2 features for visualization)
plt.scatter(houses[:, 0], houses[:, 1], alpha=0.5)

# Plot the principal directions, scaled by sqrt(eigenvalue)/10; since sqft
# dominates, the top arrow points almost straight along the sqft axis
origin = np.mean(houses[:, :2], axis=0)
for i in range(2):
    direction = eigenvectors[:2, i] * np.sqrt(eigenvalues[i]) / 10
    plt.arrow(origin[0], origin[1], direction[0], direction[1],
              head_width=50, head_length=100, fc=f'C{i}', ec=f'C{i}')

plt.xlabel('Bedrooms')
plt.ylabel('Sqft')
plt.title('Principal Directions of House Data')
plt.show()
Real Application: Zillow uses this to determine which features to prioritize in their pricing model!

Example 2: Student Success - What Predicts Performance?

The Problem

You track 5 factors for students:
  • Study hours
  • Previous GPA
  • Attendance %
  • Sleep hours
  • Extracurriculars
Which factors actually predict final grades?
# Student data (200 students × 5 factors)
students = np.array([
    [12, 3.5, 95, 7, 2],  # Student 1
    [8, 3.0, 80, 6, 1],   # Student 2
    # ... 198 more students
])

# Covariance matrix
cov_matrix = np.cov(students.T)

# Eigenanalysis
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Eigenvalues:", eigenvalues)
# [45.2, 12.8, 5.3, 2.1, 0.8]

print("First eigenvector:", eigenvectors[:, 0])
# [0.35, 0.62, 0.48, 0.25, 0.15]
Interpretation:
  1. First principal component (eigenvalue = 45.2):
    • Previous GPA (0.62) + Attendance (0.48) + Study hours (0.35)
    • This is the “academic dedication” factor
    • Explains about 68% of the variance across the tracked factors
  2. Second component (eigenvalue = 12.8):
    • Sleep hours (high) + Extracurriculars (moderate)
    • This is the “work-life balance” factor
    • Explains about 19% of variance
  3. Remaining components: Less important (about 12% combined)
Key Insight: Focus interventions on the “academic dedication” factors (GPA, attendance, study hours) - they matter most!
Real Application: Educational platforms use this to identify at-risk students and recommend targeted interventions.
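The variance percentages quoted above come straight from the eigenvalues; a quick check, continuing with the (illustrative) values printed earlier:

variance_explained = eigenvalues / eigenvalues.sum()
print((variance_explained * 100).round(1))
# [68.3 19.3  8.   3.2  1.2]  -> PC1 ≈ 68%, PC2 ≈ 19%, the rest ≈ 12%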

Example 3: Movies - Hidden Genre Patterns

The Problem

Movies have explicit genres (action, romance, comedy, horror, sci-fi), but are there hidden patterns in how these combine?
# Movie data (500 movies × 5 genre scores)
# Each score 0-1 indicating genre strength
movies = np.array([
    [0.9, 0.1, 0.3, 0.0, 0.8],  # Action sci-fi
    [0.2, 0.9, 0.1, 0.0, 0.1],  # Romance
    [0.1, 0.1, 0.8, 0.0, 0.2],  # Comedy
    # ... 497 more movies
])

# Covariance matrix
cov_matrix = np.cov(movies.T)

# Eigenanalysis
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("Eigenvalues:", eigenvalues)
# [0.85, 0.42, 0.28, 0.15, 0.08]

print("First eigenvector:", eigenvectors[:, 0])
# [0.65, -0.15, 0.25, -0.10, 0.68]
Interpretation:
  1. First hidden pattern (eigenvalue = 0.85):
    • Action (0.65) + Sci-fi (0.68) - Romance (-0.15)
    • This is the “blockbuster” pattern
    • High-budget action sci-fi films
  2. Second pattern (eigenvalue = 0.42):
    • Comedy (high) + Romance (moderate)
    • This is the “rom-com” pattern
  3. Third pattern: horror-dominated films
Key Insight: Movies naturally cluster into these hidden patterns, not just explicit genres!
Real Application: Netflix uses eigenvectors to create “micro-genres” like “Cerebral Sci-Fi Dramas” or “Feel-Good Rom-Coms”!
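To use these patterns, project each movie onto the top eigenvectors; the coordinates are its scores on the hidden axes. A sketch continuing with movies and eigenvectors from above:

# Score every movie on the top two hidden patterns
centered = movies - movies.mean(axis=0)
pattern_scores = centered @ eigenvectors[:, :2]    # shape (n_movies, 2)

print("Movie 0 (action sci-fi):")
print(f"  'blockbuster' score: {pattern_scores[0, 0]:+.2f}")
print(f"  'rom-com' score:     {pattern_scores[0, 1]:+.2f}")

# Recommending by distance in this 2-D pattern space is much cheaper
# than comparing raw genre vectors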

Computing Eigenvalues & Eigenvectors

The Math

For a matrix $A$, find $\mathbf{v}$ and $\lambda$ such that:

$$A\mathbf{v} = \lambda\mathbf{v}$$

Rearrange:

$$(A - \lambda I)\mathbf{v} = 0$$

For non-trivial solutions:

$$\det(A - \lambda I) = 0$$

This is the characteristic equation.

Step-by-Step: Computing Eigenvalues by Hand

Let’s work through the math step by step. This is essential for understanding what’s really happening!

Example 1: 2×2 Matrix (Complete Solution)

Given matrix:

$$A = \begin{bmatrix}4 & 2\\1 & 3\end{bmatrix}$$

Step 1: Set up the characteristic equation

$$\det(A - \lambda I) = 0$$

$$\det\left(\begin{bmatrix}4 & 2\\1 & 3\end{bmatrix} - \lambda\begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}\right) = 0$$

$$\det\begin{bmatrix}4-\lambda & 2\\1 & 3-\lambda\end{bmatrix} = 0$$

Step 2: Compute the determinant

For a 2×2 matrix $\begin{bmatrix}a & b\\c & d\end{bmatrix}$, $\det = ad - bc$:

$$(4-\lambda)(3-\lambda) - (2)(1) = 0$$

$$12 - 4\lambda - 3\lambda + \lambda^2 - 2 = 0$$

$$\lambda^2 - 7\lambda + 10 = 0$$

Step 3: Solve the quadratic

Using the quadratic formula or factoring:

$$(\lambda - 5)(\lambda - 2) = 0$$

Eigenvalues: $\lambda_1 = 5$ and $\lambda_2 = 2$

Step 4: Find eigenvectors

For each eigenvalue, solve $(A - \lambda I)\mathbf{v} = 0$.

For $\lambda_1 = 5$:

$$\begin{bmatrix}4-5 & 2\\1 & 3-5\end{bmatrix}\begin{bmatrix}v_1\\v_2\end{bmatrix} = \begin{bmatrix}-1 & 2\\1 & -2\end{bmatrix}\begin{bmatrix}v_1\\v_2\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$

From row 1: $-v_1 + 2v_2 = 0 \Rightarrow v_1 = 2v_2$. Choose $v_2 = 1$: $\mathbf{v}_1 = \begin{bmatrix}2\\1\end{bmatrix}$

For $\lambda_2 = 2$:

$$\begin{bmatrix}2 & 2\\1 & 1\end{bmatrix}\begin{bmatrix}v_1\\v_2\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$

From row 1: $2v_1 + 2v_2 = 0 \Rightarrow v_1 = -v_2$. Choose $v_2 = 1$: $\mathbf{v}_2 = \begin{bmatrix}-1\\1\end{bmatrix}$

Verify with Python:
import numpy as np

A = np.array([
    [4, 2],
    [1, 3]
])

# Find eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)  # [5, 2]
print("Eigenvectors (as columns):")
print(eigenvectors)

# Verify Av = λv for first eigenvalue
v1 = eigenvectors[:, 0]
lambda1 = eigenvalues[0]

print(f"\nA @ v1 = {A @ v1}")           # [4.47, 2.24]
print(f"λ1 * v1 = {lambda1 * v1}")      # [4.47, 2.24] ✓

Example 2: 3×3 Matrix (The Process)

Given:

$$B = \begin{bmatrix}2 & 0 & 0\\0 & 3 & 4\\0 & 4 & 9\end{bmatrix}$$

Step 1: Characteristic equation

For a 3×3 matrix, this becomes a cubic polynomial:

$$\det\begin{bmatrix}2-\lambda & 0 & 0\\0 & 3-\lambda & 4\\0 & 4 & 9-\lambda\end{bmatrix} = 0$$

Since the first column only has one non-zero entry, we expand along it:

$$(2-\lambda) \cdot \det\begin{bmatrix}3-\lambda & 4\\4 & 9-\lambda\end{bmatrix} = 0$$

$$(2-\lambda)\left[(3-\lambda)(9-\lambda) - 16\right] = 0$$

$$(2-\lambda)\left[\lambda^2 - 12\lambda + 27 - 16\right] = 0$$

$$(2-\lambda)(\lambda^2 - 12\lambda + 11) = 0$$

$$(2-\lambda)(\lambda - 11)(\lambda - 1) = 0$$

Eigenvalues: $\lambda_1 = 11$, $\lambda_2 = 2$, $\lambda_3 = 1$
B = np.array([
    [2, 0, 0],
    [0, 3, 4],
    [0, 4, 9]
])

eigenvalues, eigenvectors = np.linalg.eig(B)
print("Eigenvalues:", sorted(eigenvalues, reverse=True))  # [11, 2, 1]

The Characteristic Polynomial

For any $n \times n$ matrix, the characteristic polynomial has degree $n$:

$$p(\lambda) = \det(A - \lambda I) = (-1)^n \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$$

Useful properties:
  • Sum of eigenvalues = trace of $A$ = $\sum_{i} a_{ii}$
  • Product of eigenvalues = $\det(A)$
A = np.array([[4, 2], [1, 3]])
eigenvalues = np.linalg.eigvals(A)

print(f"Sum of eigenvalues: {sum(eigenvalues)}")  # 7
print(f"Trace of A: {np.trace(A)}")               # 7 ✓

print(f"Product of eigenvalues: {np.prod(eigenvalues)}")  # 10
print(f"Determinant of A: {np.linalg.det(A)}")            # 10 ✓

Applications in Machine Learning

1. Principal Component Analysis (PCA)

Goal: Reduce dimensions while keeping most information
# House data: 10 features → 3 features
from sklearn.decomposition import PCA

# Original data (100 houses × 10 features)
X = np.random.randn(100, 10)

# PCA: keep top 3 eigenvectors
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(f"Original: {X.shape}")  # (100, 10)
print(f"Reduced: {X_reduced.shape}")  # (100, 3)
print(f"Variance explained: {pca.explained_variance_ratio_.sum():.2%}")  # 85%
Key Insight: Eigenvectors with largest eigenvalues capture most variance!
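You can confirm that claim by comparing sklearn’s components with a hand-rolled eigendecomposition of the covariance matrix; they agree up to sign, which is arbitrary. A sketch continuing with X and pca from above:

# PCA components are exactly the top eigenvectors of the covariance matrix
cov = np.cov(X.T)
eig_vals, eig_vecs = np.linalg.eig(cov)

order = eig_vals.argsort()[::-1]
top3 = eig_vecs[:, order[:3]]             # columns = top 3 eigenvectors

for i in range(3):
    agreement = abs(top3[:, i] @ pca.components_[i])
    print(f"PC{i+1} matches eigenvector {i+1}: {agreement:.4f}")   # ≈ 1.0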

2. PageRank (Google’s Algorithm)

Goal: Rank web pages by importance
# Web graph (pages link to each other)
# Eigenvector of transition matrix = page importance!

# Simplified example
links = np.array([
    [0, 1, 1, 0],  # Page 0 links to 1, 2
    [1, 0, 1, 0],  # Page 1 links to 0, 2
    [1, 1, 0, 1],  # Page 2 links to 0, 1, 3
    [0, 0, 1, 0]   # Page 3 links to 2
])

# Normalize (transition probabilities)
P = links / links.sum(axis=1, keepdims=True)

# Find the dominant eigenvector (eigenvalue 1 for a stochastic matrix)
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmax(eigenvalues.real)            # don't assume it comes first
pagerank = np.abs(eigenvectors[:, idx].real)
pagerank = pagerank / pagerank.sum()

print("PageRank scores:", pagerank.round(2))
# [0.25 0.25 0.38 0.12]
# Page 2 is most important!

3. Spectral Clustering

Goal: Find natural clusters in data
# Similarity matrix → Eigenvectors → Clusters
from sklearn.cluster import SpectralClustering

# House data
X = np.random.randn(100, 4)

# Spectral clustering (uses eigenvectors!)
clustering = SpectralClustering(n_clusters=3)
labels = clustering.fit_predict(X)

print("Cluster labels:", labels)

Practice Exercises

Warm-Up: House Feature Importance

# Given house data
houses = np.array([
    [3, 2000, 15, 5, 8],  # beds, sqft, age, dist, school
    [4, 2200, 8, 2, 9],
    [2, 1200, 25, 8, 6],
    # ... more houses
])

# TODO: 
# 1. Compute covariance matrix
# 2. Find eigenvalues and eigenvectors
# 3. Which feature is most important?
# 4. Can you drop any features?
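One way to fill in the TODOs (a sketch; with only the three example rows, the printed numbers are just illustrative):

# 1. Covariance of the 5 features
cov_matrix = np.cov(houses.T)

# 2. Eigenvalues / eigenvectors, sorted by importance
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)
order = eigenvalues.argsort()[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 3. Most important feature = largest loading in the top eigenvector
features = ["beds", "sqft", "age", "dist", "school"]
print("Dominant feature:", features[np.abs(eigenvectors[:, 0]).argmax()])  # expect sqft

# 4. Features whose directions carry ~0% of the variance are safe to drop
print("Variance explained:", (eigenvalues / eigenvalues.sum()).round(3))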

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises show how eigenvalues reveal hidden structure in real-world data.

Exercise 1: Stock Market Analysis 📈

The S&P 500 has 500 stocks, but most movement can be explained by a few factors. Analyze this simplified market data:
import numpy as np

# Daily returns for 5 tech stocks (20 days)
np.random.seed(42)
market_factor = np.random.randn(20) * 0.02  # Overall market movement

returns = np.array([
    market_factor + np.random.randn(20) * 0.01,  # AAPL
    market_factor + np.random.randn(20) * 0.01,  # GOOGL
    market_factor + np.random.randn(20) * 0.01,  # MSFT
    market_factor * 0.5 + np.random.randn(20) * 0.015,  # Less correlated
    np.random.randn(20) * 0.02,  # Uncorrelated stock
]).T

# TODO:
# 1. Compute the covariance matrix
# 2. Find eigenvalues and eigenvectors
# 3. How much variance is explained by the first eigenvalue?
# 4. What does this tell us about market dynamics?
Solution:

import numpy as np

np.random.seed(42)
market_factor = np.random.randn(20) * 0.02

returns = np.array([
    market_factor + np.random.randn(20) * 0.01,
    market_factor + np.random.randn(20) * 0.01,
    market_factor + np.random.randn(20) * 0.01,
    market_factor * 0.5 + np.random.randn(20) * 0.015,
    np.random.randn(20) * 0.02,
]).T

# 1. Covariance matrix
cov_matrix = np.cov(returns.T)
print("Covariance Matrix:")
print(np.round(cov_matrix * 10000, 2))  # Scale for readability

# 2. Eigenanalysis
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by importance
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("\n📊 Eigenvalue Analysis:")
print("-" * 40)
variance_explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(variance_explained)

for i, (ev, ve, cum) in enumerate(zip(eigenvalues, variance_explained, cumulative)):
    print(f"PC{i+1}: {ve*100:5.1f}% variance (cumulative: {cum*100:5.1f}%)")

# 3. First eigenvalue explanation
print(f"\n🎯 First eigenvalue explains {variance_explained[0]*100:.1f}% of variance!")

# 4. Interpret first eigenvector (market factor)
print("\n📈 First Eigenvector (Market Factor):")
stocks = ['AAPL', 'GOOGL', 'MSFT', 'Stock4', 'Stock5']
for stock, loading in zip(stocks, eigenvectors[:, 0]):
    print(f"  {stock}: {loading:.3f}")

# Output:
# PC1: 52.3% variance (cumulative: 52.3%)  ← "Market" factor
# PC2: 20.1% variance (cumulative: 72.4%)
# PC3: 14.2% variance (cumulative: 86.6%)
# ...

print("\n💡 Insight: First PC represents 'market movement'")
print("   AAPL, GOOGL, MSFT load heavily → move together")
print("   Stock5 loads weakly → independent of market")
Real-World Insight: This is exactly how hedge funds identify “factor exposures” and construct market-neutral portfolios. The first few eigenvalues typically explain 60-70% of market movement!

Exercise 2: Customer Segmentation 🛍️

An e-commerce site tracks customer behavior across 6 metrics. Find hidden customer segments:
import numpy as np

# Customer behavior data (standardized)
# [avg_order_value, frequency, recency, browse_time, cart_abandonment, reviews_given]
np.random.seed(123)

# Generate 3 hidden customer types
type1 = np.random.randn(50, 6) + np.array([2, 2, -1, 1, -1, 1])    # High-value loyal
type2 = np.random.randn(50, 6) + np.array([-1, -1, 2, 2, 1, -1])   # Browsers, not buyers
type3 = np.random.randn(50, 6) + np.array([0, 1, 0, 0, 0, 2])      # Reviewers

customers = np.vstack([type1, type2, type3])
np.random.shuffle(customers)

# TODO: Use eigenanalysis to discover these customer types
Solution:

import numpy as np

np.random.seed(123)

# Customer data with 3 hidden types
type1 = np.random.randn(50, 6) + np.array([2, 2, -1, 1, -1, 1])
type2 = np.random.randn(50, 6) + np.array([-1, -1, 2, 2, 1, -1])
type3 = np.random.randn(50, 6) + np.array([0, 1, 0, 0, 0, 2])

customers = np.vstack([type1, type2, type3])

# Standardize data
customers_std = (customers - customers.mean(axis=0)) / customers.std(axis=0)

# Eigenanalysis on correlation matrix
cov_matrix = np.cov(customers_std.T)
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Sort by importance
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print("🛍️ Customer Segmentation via Eigenanalysis")
print("=" * 50)

# Variance explained
variance_explained = eigenvalues / eigenvalues.sum()
print("\nVariance Explained by Each PC:")
for i, ve in enumerate(variance_explained[:4]):
    print(f"  PC{i+1}: {ve*100:.1f}%")

# Interpret top 3 eigenvectors
features = ['Order Value', 'Frequency', 'Recency', 'Browse Time', 
            'Cart Abandon', 'Reviews']

print("\n📊 Customer Segments (Eigenvector Loadings):")
print("-" * 50)

for pc in range(3):
    print(f"\n🎯 Segment {pc+1} (PC{pc+1}):")
    loadings = eigenvectors[:, pc]
    
    # Sort by absolute loading
    sorted_idx = np.argsort(np.abs(loadings))[::-1]
    for idx in sorted_idx[:3]:  # Top 3 features
        sign = "+" if loadings[idx] > 0 else "-"
        print(f"   {sign} {features[idx]}: {loadings[idx]:.3f}")

# Output interpretation:
# Segment 1: High Order Value, High Frequency, Low Recency → "VIP Customers"
# Segment 2: High Browse Time, High Cart Abandon → "Window Shoppers"
# Segment 3: High Reviews Given → "Brand Advocates"

print("\n💡 Business Actions:")
print("  • Segment 1: Reward with VIP perks")
print("  • Segment 2: Send abandoned cart emails")
print("  • Segment 3: Invite to referral program")
Real-World Insight: Amazon and Netflix use exactly this approach to segment millions of users into behavioral clusters for targeted marketing and recommendations.

Exercise 3: Image Feature Detection 🖼️

Eigenfaces: How facial recognition works! Use eigenvalues to find the most important “face features”:
import numpy as np

# Simplified: 8 face images, each 4x4 = 16 pixels (flattened)
np.random.seed(42)

# Create faces with some common features
base_face = np.array([1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1])

faces = np.array([
    base_face + np.random.randn(16) * 0.3,
    base_face + np.random.randn(16) * 0.3 + np.array([0.5]*8 + [0]*8),  # Brighter top
    base_face + np.random.randn(16) * 0.3,
    base_face * -1 + np.random.randn(16) * 0.3,  # Inverted
    base_face + np.random.randn(16) * 0.3,
    base_face + np.random.randn(16) * 0.3 + np.array([0]*8 + [0.5]*8),  # Brighter bottom
    base_face + np.random.randn(16) * 0.3,
    base_face * -1 + np.random.randn(16) * 0.3,  # Inverted
])

# TODO: Find the "eigenfaces" - principal components of face variation
Solution:

import numpy as np

np.random.seed(42)
base_face = np.array([1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1])

faces = np.array([
    base_face + np.random.randn(16) * 0.3,
    base_face + np.random.randn(16) * 0.3 + np.array([0.5]*8 + [0]*8),
    base_face + np.random.randn(16) * 0.3,
    base_face * -1 + np.random.randn(16) * 0.3,
    base_face + np.random.randn(16) * 0.3,
    base_face + np.random.randn(16) * 0.3 + np.array([0]*8 + [0.5]*8),
    base_face + np.random.randn(16) * 0.3,
    base_face * -1 + np.random.randn(16) * 0.3,
])

# Center the data (subtract mean face)
mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Compute covariance and eigenvalues
cov = np.cov(centered.T)
eigenvalues, eigenvectors = np.linalg.eig(cov)

# Sort by importance
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx].real
eigenvectors = eigenvectors[:, idx].real

print("🖼️ Eigenface Analysis")
print("=" * 50)

# Variance explained
variance_explained = eigenvalues / eigenvalues.sum()
print("\nVariance Explained:")
for i, ve in enumerate(variance_explained[:4]):
    print(f"  Eigenface {i+1}: {ve*100:.1f}%")

print("\n📊 Top Eigenfaces (reshaped to 4x4):")
for i in range(2):
    eigenface = eigenvectors[:, i].reshape(4, 4)
    print(f"\nEigenface {i+1}:")
    for row in eigenface:
        print("  " + " ".join([f"{v:5.2f}" for v in row]))

# Reconstruct a face using only top 2 eigenfaces
print("\n🔄 Face Reconstruction Test:")
test_face = faces[0]
test_centered = test_face - mean_face

# Project onto eigenfaces
coords = test_centered @ eigenvectors[:, :2]
reconstructed = mean_face + coords @ eigenvectors[:, :2].T

error = np.mean((test_face - reconstructed) ** 2)
print(f"  Using only 2 eigenfaces:")
print(f"  Reconstruction error: {error:.4f}")
print(f"  Original variance captured: {variance_explained[:2].sum()*100:.1f}%")
Real-World Insight: This is how early facial recognition systems worked! Modern systems use deep learning, but eigenfaces were the foundation. With 100 eigenfaces, you can reconstruct a recognizable approximation of any face from a database of thousands!

Exercise 4: Google’s PageRank Algorithm 🔍

PageRank uses eigenvectors to rank web pages! Implement a simplified version:
import numpy as np

# Web graph: 5 pages linking to each other
# links[i][j] = 1 if page i links to page j
links = np.array([
    [0, 1, 1, 0, 0],  # Page 0 links to 1, 2
    [1, 0, 0, 1, 0],  # Page 1 links to 0, 3
    [1, 1, 0, 0, 1],  # Page 2 links to 0, 1, 4
    [0, 0, 1, 0, 1],  # Page 3 links to 2, 4
    [1, 0, 0, 0, 0],  # Page 4 links to 0
])

# TODO:
# 1. Create the transition matrix (normalize columns)
# 2. Find the dominant eigenvector
# 3. This eigenvector IS the PageRank!
Solution:

import numpy as np

links = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
])

# 1. Create the transition matrix (column-stochastic)
# Column i describes where a surfer on page i goes next: 1/out_degree(i) per linked page
out_degree = links.sum(axis=1)               # out-degree of each page (links point FROM rows)
transition = links.T / out_degree            # each column now sums to 1

print("🔍 PageRank Analysis")
print("=" * 50)
print("\nTransition Matrix (probability of following each link):")
print(np.round(transition, 2))

# Add damping factor (like Google does)
damping = 0.85
n = len(links)
M = damping * transition + (1 - damping) / n

# 2. Find dominant eigenvector (eigenvalue = 1)
eigenvalues, eigenvectors = np.linalg.eig(M)

# Find eigenvector for eigenvalue closest to 1
idx = np.argmax(eigenvalues.real)
pagerank = eigenvectors[:, idx].real
pagerank = pagerank / pagerank.sum()  # Normalize to sum to 1

# 3. Display PageRank
print("\n🏆 PageRank Scores:")
print("-" * 30)
for i, rank in enumerate(pagerank):
    bar = "█" * int(rank * 50)
    print(f"Page {i}: {rank:.4f} {bar}")

# Find best page
best_page = np.argmax(pagerank)
print(f"\n🥇 Most Important Page: Page {best_page}")

# Verify with power iteration (how Google actually computes it)
print("\n📈 Verification via Power Iteration:")
v = np.ones(n) / n
for i in range(20):
    v = M @ v
    v = v / v.sum()
print("Power iteration result:", np.round(v, 4))
print("Eigenvector result:    ", np.round(pagerank, 4))
print("✓ They match!")
Real-World Insight: This is literally how Google started! The eigenvector of the web’s link structure determines page importance. The $100B insight: pages linked by important pages become important themselves.

🔬 Advanced Deep Dive (Optional)

Beyond K-Means: Spectral Clustering

Regular K-means finds spherical clusters. But what if your data has complex shapes? Spectral clustering uses eigenvalues of the graph Laplacian to find clusters:
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters=2, sigma=1.0):
    """
    Spectral clustering using eigenvalues of the graph Laplacian.
    """
    n = len(X)
    
    # 1. Build similarity graph (RBF kernel)
    distances = squareform(pdist(X))
    W = np.exp(-distances**2 / (2 * sigma**2))
    np.fill_diagonal(W, 0)  # No self-loops
    
    # 2. Compute graph Laplacian: L = D - W
    D = np.diag(W.sum(axis=1))
    L = D - W
    
    # 3. Normalized Laplacian: L_sym = D^(-1/2) L D^(-1/2)
    D_inv_sqrt = np.diag(1 / np.sqrt(W.sum(axis=1) + 1e-10))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt
    
    # 4. Find smallest k eigenvectors (excluding 0)
    eigenvalues, eigenvectors = np.linalg.eigh(L_sym)
    
    # Take the k smallest non-zero eigenvectors
    idx = np.argsort(eigenvalues)[1:n_clusters+1]  # Skip first (trivial)
    features = eigenvectors[:, idx]
    
    # 5. Normalize rows and cluster
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, random_state=42).fit_predict(features)
    
    return labels, eigenvalues

# Create two moons (K-means fails on this!)
np.random.seed(42)
theta = np.linspace(0, np.pi, 100)
moon1 = np.column_stack([np.cos(theta), np.sin(theta)]) + np.random.randn(100, 2) * 0.1
moon2 = np.column_stack([np.cos(theta) + 1, -np.sin(theta) + 0.5]) + np.random.randn(100, 2) * 0.1
X = np.vstack([moon1, moon2])

labels, eigenvalues = spectral_clustering(X, n_clusters=2, sigma=0.5)

print("Spectral Clustering Results:")
print(f"  Cluster 0: {(labels==0).sum()} points")
print(f"  Cluster 1: {(labels==1).sum()} points")
print(f"\nSmallest eigenvalues: {eigenvalues[:5].round(4)}")
print("  (Gap after 2nd eigenvalue suggests 2 natural clusters)")
Why This Works: The eigenvectors of the Laplacian reveal the graph’s connectivity structure. Points in the same cluster have similar eigenvector values!
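The cleanest illustration is a graph with two disconnected components: the Laplacian then has eigenvalue 0 with multiplicity two, and the corresponding eigenvectors are constant on each component. A tiny check:

import numpy as np

# Two disconnected triangles: nodes 0-2 and nodes 3-5
W = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    W[a, b] = W[b, a] = 1

L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)

print(vals.round(4))                      # two zeros -> two connected components
print(vecs[:, :2].round(2))               # each column is constant within a component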

Why Your Neural Network Explodes or Vanishes

The eigenvalues of weight matrices determine training stability:
import numpy as np

def analyze_network_stability(weight_matrices):
    """
    Analyze eigenvalue spectrum of a neural network's weights.
    """
    print("Neural Network Eigenvalue Analysis")
    print("=" * 50)
    
    for i, W in enumerate(weight_matrices):
        # Weight matrices are rectangular, so use singular values
        # (the rectangular analogue of eigenvalue magnitudes)
        singular_values = np.linalg.svd(W, compute_uv=False)
        
        max_sv = singular_values.max()
        min_sv = singular_values[singular_values > 1e-10].min()
        condition = max_sv / min_sv
        
        print(f"\nLayer {i+1} ({W.shape}):")
        print(f"  Max singular value: {max_sv:.4f}")
        print(f"  Min singular value: {min_sv:.4f}")
        print(f"  Condition number: {condition:.2f}")
        
        # Rough heuristic thresholds for this demo
        if max_sv > 3.0:
            print("  ⚠️  Risk of exploding gradients!")
        elif max_sv < 0.3:
            print("  ⚠️  Risk of vanishing gradients!")
        else:
            print("  ✅ Stable range")

# Example: Compare good vs bad initialization
np.random.seed(42)

# Xavier initialization (good)
xavier_weights = [
    np.random.randn(128, 64) * np.sqrt(2 / (128 + 64)),
    np.random.randn(64, 32) * np.sqrt(2 / (64 + 32)),
    np.random.randn(32, 10) * np.sqrt(2 / (32 + 10)),
]

# Naive initialization (bad)
naive_weights = [
    np.random.randn(128, 64) * 2.0,  # Too large!
    np.random.randn(64, 32) * 2.0,
    np.random.randn(32, 10) * 2.0,
]

print("=== Xavier Initialization ===")
analyze_network_stability(xavier_weights)

print("\n\n=== Naive Initialization ===")
analyze_network_stability(naive_weights)
Key Insight: Proper weight initialization (Xavier, He) keeps these singular values of order one, preventing exploding/vanishing gradients!
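A quick way to see why this matters: push a random input through both weight stacks (linear layers only, no activations) and watch the signal’s norm. A sketch continuing with xavier_weights and naive_weights from above:

def signal_norms(weight_matrices, x):
    """Norm of the signal after each (purely linear) layer."""
    norms = [np.linalg.norm(x)]
    for W in weight_matrices:
        x = x @ W
        norms.append(np.linalg.norm(x))
    return norms

x0 = np.random.randn(128)
print("Xavier:", np.round(signal_norms(xavier_weights, x0), 1))
print("Naive: ", np.round(signal_norms(naive_weights, x0), 1))
# Xavier keeps the scale in the same ballpark; the naive stack grows it
# by roughly an order of magnitude per layer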

Key Takeaways

Core Concepts:
  • Eigenvectors - Special directions that don’t rotate under transformation
  • Eigenvalues - How much eigenvectors get scaled (λ > 1 stretches, λ < 1 shrinks)
  • Large Eigenvalues - Important directions; capture most variance
  • Small Eigenvalues - Unimportant directions; safe to discard
  • Applications - PCA, PageRank, stability analysis, quantum mechanics
  • Spectral Methods - Clustering, graph analysis via eigendecomposition
  • Neural Networks - Eigenvalues determine training stability

Interview Prep: Eigenvalue Questions

Q: In simple terms, what are eigenvectors?
Eigenvectors are special directions where a matrix transformation only stretches/shrinks without rotating. The eigenvalue tells you how much stretching occurs in that direction.
Q: How are eigenvalues used in PCA?
We compute eigenvectors of the covariance matrix. Each eigenvector is a principal component, and its eigenvalue indicates how much variance that component explains. We keep the top-k eigenvectors (largest eigenvalues) for dimensionality reduction.
Q: What does a zero eigenvalue mean?
A zero eigenvalue means that direction is completely compressed—the matrix collapses some dimension. This indicates the matrix is singular (not invertible) and has dependent columns.
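A quick numerical check of that answer:

import numpy as np

M = np.array([[2, 4],
              [1, 2]])                 # second column = 2 × first column
print(np.linalg.eigvals(M))            # one eigenvalue is 0, the other is 4
print(np.linalg.det(M))                # ~0 -> singular, not invertible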
Q: How does Google PageRank use eigenvectors?
PageRank computes the principal eigenvector of the web’s link matrix. Each entry represents a page’s importance—pages linked by important pages become important themselves.

Common Pitfalls

Eigenvalue Mistakes to Avoid:
  1. Forgetting Normalization - Eigenvectors are only unique up to scaling; always normalize for consistency
  2. Wrong Order - Remember eigenvalues are often returned sorted; check documentation for ascending vs descending
  3. Complex Eigenvalues - Non-symmetric matrices can have complex eigenvalues; use symmetric matrices when possible
  4. Numerical Instability - Computing eigenvalues of ill-conditioned matrices can be unreliable

What’s Next?

You now understand which directions in your data matter most. But how do we actually use this for dimensionality reduction? That’s Principal Component Analysis (PCA) - the most important application of eigenvalues!

Next: Principal Component Analysis (PCA)

Learn to reduce 10 house features to 3 while keeping 95% of information