
Vectors: The Language of Similarity

A Problem You Already Understand

You’re looking for a new apartment. You visit Zillow and find one you love:
  • 2 bedrooms
  • 1,200 square feet
  • $2,400/month rent
  • 15 minutes from work
Now you want to find similar apartments. Not identical — just similar enough that you’d consider them. Zillow shows you a “Similar Homes” section. But how did they decide which apartments are similar? Think about it: What makes two apartments “similar”?
  • Same number of bedrooms?
  • Similar size?
  • Similar rent?
  • Similar commute?
All of the above, in some combination. And that combination is exactly what vectors and similarity measures capture.
Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Basic Python
What You’ll Build: A “Find Similar Items” system that works for apartments, songs, or anything
🔗 ML Connection: Vectors are THE foundation of modern ML. Here’s where you’ll see them:
ML System              | Vector Representation
Word2Vec/GPT           | Every word → 300-768 dimensional vector
Face Recognition       | Every face → 128-512 dimensional embedding
Recommendation Systems | Users & items in shared vector space
Image Classification   | CNN features as vectors
After this module, you’ll understand exactly how these systems find “similar” items!

Step 1: Describe Things with Numbers

The first insight is simple: we can describe any apartment as a list of numbers.

Apartment as Vector:
# My favorite apartment
my_apartment = [2, 1200, 2400, 15]
#               ↑   ↑     ↑    ↑
#            beds sqft  rent  commute(min)
Now every apartment is just 4 numbers:
apartment_A = [2, 1200, 2400, 15]   # My favorite
apartment_B = [2, 1100, 2300, 18]   # Very similar!
apartment_C = [4, 2500, 4500, 45]   # Very different
apartment_D = [1, 800, 1900, 10]    # Smaller, cheaper, closer
This list of numbers is called a vector. That’s it. A vector is just an ordered list of numbers that describes something.
Key Insight: Once something is described as numbers, we can use math to compare things automatically. No human judgment needed.

Mathematical Foundations: Vector Operations

Before we measure similarity, let’s master the fundamental operations. These are the building blocks of ALL machine learning.

Vector Addition: Combine Two Vectors

When you add vectors, you add corresponding components:

\mathbf{a} + \mathbf{b} = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix} + \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} = \begin{bmatrix}a_1 + b_1\\a_2 + b_2\\a_3 + b_3\end{bmatrix}

Real Example: Combining two shopping carts:
import numpy as np

# Shopping cart contents: [apples, bananas, oranges]
cart_monday = np.array([3, 2, 5])
cart_tuesday = np.array([1, 4, 2])

# Total purchases
total = cart_monday + cart_tuesday
print(f"Total: {total}")  # [4, 6, 7]
Geometric Interpretation: Place vectors tip-to-tail; the sum goes from the first tail to the last tip.

Scalar Multiplication: Scale a Vector

Multiply every component by the same number (scalar):

c \cdot \mathbf{v} = c \cdot \begin{bmatrix}v_1\\v_2\\v_3\end{bmatrix} = \begin{bmatrix}c \cdot v_1\\c \cdot v_2\\c \cdot v_3\end{bmatrix}

Real Example: Double a recipe:
# Recipe: [flour_cups, sugar_cups, eggs]
recipe = np.array([2, 0.5, 3])

# Double the recipe
doubled = 2 * recipe
print(f"Doubled: {doubled}")  # [4, 1, 6]

# Half the recipe
halved = 0.5 * recipe
print(f"Halved: {halved}")  # [1, 0.25, 1.5]
Geometric Interpretation: Scalar > 1 stretches the vector; 0 < scalar < 1 shrinks it; negative flips direction.

Vector Magnitude (Length)

The magnitude (or norm) measures how “big” a vector is:

\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}

Real Example: Distance from origin:
# Your position: [x, y] = [3, 4]
position = np.array([3, 4])

# Distance from origin (Pythagorean theorem!)
magnitude = np.sqrt(3**2 + 4**2)  # = 5
# Or use NumPy:
magnitude = np.linalg.norm(position)  # = 5.0

print(f"Distance from origin: {magnitude}")
Fun fact: The 3-4-5 triangle is the most famous Pythagorean triple! Ancient Egyptians used it to create right angles in construction.

Unit Vectors: Direction Without Magnitude

A unit vector has length 1 and only represents direction:

\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}

Real Example: Normalize for comparison:
# Two vectors with different magnitudes
review_1 = np.array([5, 4, 5, 3, 4])  # Enthusiastic reviewer
review_2 = np.array([2, 1, 2, 1, 1])  # Reserved reviewer

# Convert to unit vectors (direction only)
unit_1 = review_1 / np.linalg.norm(review_1)
unit_2 = review_2 / np.linalg.norm(review_2)

print(f"Unit 1: {unit_1.round(3)}")
print(f"Unit 2: {unit_2.round(3)}")
print(f"Lengths: {np.linalg.norm(unit_1):.3f}, {np.linalg.norm(unit_2):.3f}")
# Both have length 1.0!
Key insight: Normalization removes the “enthusiasm” factor and compares only the pattern of ratings.
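To see the payoff, compare the reviewers directly. A one-line follow-up to the code above: the dot product of two unit vectors is exactly the cosine similarity of the original vectors (the printed value is approximate).
pattern_similarity = np.dot(unit_1, unit_2)
print(f"Pattern similarity: {pattern_similarity:.3f}")  # ≈ 0.98 despite very different scales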

Vector Subtraction: Finding the Difference

\mathbf{a} - \mathbf{b} = \begin{bmatrix}a_1 - b_1\\a_2 - b_2\\a_3 - b_3\end{bmatrix}

Real Example: What changed between two time periods?
# Monthly sales: [Product A, Product B, Product C]
january = np.array([1000, 500, 750])
february = np.array([1200, 450, 800])

# Change from January to February
change = february - january
print(f"Change: {change}")  # [200, -50, 50]
# Product A: +200, Product B: -50, Product C: +50

Practice: Vector Arithmetic

Let’s combine these operations:
import numpy as np

# Portfolio weights: [stocks, bonds, real_estate]
portfolio = np.array([0.6, 0.3, 0.1])

# Expected returns for each asset class
returns = np.array([0.10, 0.04, 0.06])  # 10%, 4%, 6%

# Weighted average return (dot product preview!)
portfolio_return = np.sum(portfolio * returns)
print(f"Expected portfolio return: {portfolio_return:.2%}")  # 7.4%

# Rebalance: shift 10% from stocks to bonds
shift = np.array([-0.1, 0.1, 0])
new_portfolio = portfolio + shift
print(f"Rebalanced: {new_portfolio}")  # [0.5, 0.4, 0.1]

Step 2: Measure How Similar Two Apartments Are

Now the real question: Given two apartments as vectors, how do we measure their similarity?

Attempt 1: Just Subtract (Doesn’t Work Well)

Your first instinct might be to subtract the numbers:
apartment_A = [2, 1200, 2400, 15]
apartment_B = [2, 1100, 2300, 18]

difference = [2-2, 1200-1100, 2400-2300, 15-18]
           = [0, 100, 100, -3]
But what does [0, 100, 100, -3] mean? The numbers have different units (bedrooms vs sqft vs dollars vs minutes). We can’t just add them.

Attempt 2: Euclidean Distance (Works, But Has Issues)

We could calculate the “distance” between apartments in 4D space:
import numpy as np

def distance(a, b):
    """Euclidean distance between two vectors."""
    return np.sqrt(sum((a[i] - b[i])**2 for i in range(len(a))))

apartment_A = np.array([2, 1200, 2400, 15])
apartment_B = np.array([2, 1100, 2300, 18])
apartment_C = np.array([4, 2500, 4500, 45])

print(f"A vs B: {distance(apartment_A, apartment_B):.0f}")  # 141
print(f"A vs C: {distance(apartment_A, apartment_C):.0f}")  # 2462
B is much closer to A than C is. Good! But there’s a problem: the sqft and rent numbers are huge (1000s) while bedrooms and commute are small (single digits). The big numbers dominate everything.
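To make the domination concrete, here is a quick sketch (hypothetical listings, reusing the distance function above): a two-bedroom difference barely moves the distance, while a modest rent difference swamps it.
base = np.array([2, 1200, 2400, 15])
more_bedrooms = np.array([4, 1200, 2400, 15])     # 2 extra bedrooms - a big real-world change
slightly_pricier = np.array([2, 1200, 2600, 15])  # $200 more rent - a modest change

print(distance(base, more_bedrooms))    # 2.0   -> barely registers
print(distance(base, slightly_pricier)) # 200.0 -> dominates the comparison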

Attempt 3: Normalize First, Then Compare

The fix: scale all features to the same range (usually 0 to 1):
def normalize(apartments):
    """Scale each feature to 0-1 range."""
    apartments = np.array(apartments)
    mins = apartments.min(axis=0)
    maxs = apartments.max(axis=0)
    return (apartments - mins) / (maxs - mins)

# Original apartments
apartments = [
    [2, 1200, 2400, 15],   # A
    [2, 1100, 2300, 18],   # B
    [4, 2500, 4500, 45],   # C
    [1, 800, 1900, 10],    # D
]

# After normalization (all values between 0 and 1)
normalized = normalize(apartments)
print(normalized)
# A: [0.33, 0.24, 0.19, 0.14]
# B: [0.33, 0.18, 0.15, 0.23]
# C: [1.00, 1.00, 1.00, 1.00]
# D: [0.00, 0.00, 0.00, 0.00]
Now all features are on equal footing. A difference of 0.1 in bedrooms matters as much as 0.1 in rent.
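With the normalized values, the distance function from Attempt 2 now ranks apartments sensibly; a short continuation (printed values are approximate):
A, B, C, D = normalized

print(f"A vs B: {distance(A, B):.2f}")  # ≈ 0.11 (most similar)
print(f"A vs D: {distance(A, D):.2f}")  # ≈ 0.47
print(f"A vs C: {distance(A, C):.2f}")  # ≈ 1.56 (least similar)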

Step 3: The Dot Product — Measuring Alignment

There’s an even better way to measure similarity: the dot product.

Mathematical Definition

The dot product (also called inner product or scalar product) of two vectors:

\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n

What it does: Multiply corresponding numbers and add them up.
def dot_product(a, b):
    """Multiply corresponding elements and sum."""
    return sum(a[i] * b[i] for i in range(len(a)))

# Or simply: np.dot(a, b)
Example:
a = [1, 2, 3]
b = [4, 5, 6]

dot = 1*4 + 2*5 + 3*6
    = 4 + 10 + 18
    = 32

Geometric Interpretation

The dot product has a beautiful geometric meaning:

\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)

Where θ is the angle between the vectors!
🎮 Interactive Visualization: Try the code below to see how the dot product changes as you rotate vectors!
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact, FloatSlider

def visualize_dot_product(angle_degrees=45):
    # Fixed vector a
    a = np.array([1, 0])
    
    # Vector b at specified angle
    angle_rad = np.radians(angle_degrees)
    b = np.array([np.cos(angle_rad), np.sin(angle_rad)])
    
    # Calculate dot product
    dot = np.dot(a, b)
    
    # Plot
    plt.figure(figsize=(8, 6))
    plt.quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1, color='blue', label='Vector a')
    plt.quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1, color='red', label='Vector b')
    plt.xlim(-1.5, 1.5)
    plt.ylim(-1.5, 1.5)
    plt.grid(True, alpha=0.3)
    plt.axhline(y=0, color='k', linewidth=0.5)
    plt.axvline(x=0, color='k', linewidth=0.5)
    plt.title(f'Angle: {angle_degrees}° | Dot Product: {dot:.3f} | cos({angle_degrees}°) = {np.cos(angle_rad):.3f}')
    plt.legend()
    plt.axis('equal')
    plt.show()

# Interactive slider - run in Jupyter!
# interact(visualize_dot_product, angle_degrees=FloatSlider(min=0, max=360, step=5, value=45))
What this tells us:
  • θ = 0° (same direction): cos(0°) = 1 → Maximum positive dot product
  • θ = 90° (perpendicular): cos(90°) = 0 → Dot product is zero
  • θ = 180° (opposite): cos(180°) = -1 → Maximum negative dot product
import numpy as np

# Same direction
a = np.array([3, 0])
b = np.array([5, 0])
print(f"Same direction: {np.dot(a, b)}")  # 15 (positive)

# Perpendicular (90°)
a = np.array([3, 0])
b = np.array([0, 4])
print(f"Perpendicular: {np.dot(a, b)}")   # 0

# Opposite direction
a = np.array([3, 0])
b = np.array([-2, 0])
print(f"Opposite: {np.dot(a, b)}")        # -6 (negative)

# Verify the angle formula
a = np.array([1, 0])
b = np.array([1, 1])  # 45 degrees
angle = np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Angle between: {np.degrees(angle):.1f}°")  # 45.0°

The Dot Product in Action

Why does this measure similarity? Think about it intuitively:
  • If both apartments are high in the same features (both large, both expensive), the products are large → high dot product
  • If one is high where the other is low, products are small → low dot product
  • Apartments that are “aligned” (similar profile) have high dot products

Step 4: Cosine Similarity — The Industry Standard

The dot product has one problem: bigger vectors give bigger numbers regardless of similarity. Cosine similarity fixes this by normalizing:

\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \times \|B\|}

This gives a number between -1 and 1:
  • 1.0 = identical direction (very similar)
  • 0.0 = perpendicular (unrelated)
  • -1.0 = opposite direction (opposites)
def cosine_similarity(a, b):
    """Similarity based on angle, not magnitude."""
    dot = np.dot(a, b)
    magnitude_a = np.sqrt(np.dot(a, a))  # length of a
    magnitude_b = np.sqrt(np.dot(b, b))  # length of b
    return dot / (magnitude_a * magnitude_b)
Let’s test it on our apartments:
import numpy as np

apartments = {
    'A (my favorite)': np.array([2, 1200, 2400, 15]),
    'B (similar)': np.array([2, 1100, 2300, 18]),
    'C (luxury)': np.array([4, 2500, 4500, 45]),
    'D (studio)': np.array([1, 800, 1900, 10]),
}

my_apt = apartments['A (my favorite)']

print("Similarity to my apartment:")
for name, apt in apartments.items():
    sim = cosine_similarity(my_apt, apt)
    print(f"  {name}: {sim:.3f}")
Output:
Similarity to my apartment:
  A (my favorite): 1.000  ← identical to itself
  B (similar): 0.999      ← very similar!
  C (luxury): 0.997       ← surprisingly similar (same "shape", just bigger)
  D (studio): 0.998       ← also similar (same "shape", just smaller)
Wait, why is C so similar? Because cosine similarity measures direction, not magnitude. C is a “scaled up” version of A — same proportions, just bigger numbers. This is actually useful! It finds apartments with the same profile (ratio of bedrooms to sqft to rent), regardless of absolute size.
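You can verify that scale-invariance directly with the cosine_similarity function defined above; a two-line check:
print(cosine_similarity(my_apt, my_apt))        # 1.000 - identical direction
print(cosine_similarity(my_apt, 2.5 * my_apt))  # 1.000 - scaling a vector doesn't change its direction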

Real-World Application: Build a “Similar Apartments” Finder

Let’s build a working system:
import numpy as np

class ApartmentFinder:
    def __init__(self, apartments):
        """
        apartments: dict of {name: [beds, sqft, rent, commute]}
        """
        self.names = list(apartments.keys())
        self.vectors = np.array(list(apartments.values()))
        
        # Normalize for fair comparison
        self.normalized = self._normalize(self.vectors)
    
    def _normalize(self, data):
        mins = data.min(axis=0)
        maxs = data.max(axis=0)
        return (data - mins) / (maxs - mins + 1e-8)  # avoid division by zero
    
    def find_similar(self, query, top_k=3):
        """Find top_k most similar apartments to query."""
        # Normalize the query
        query_norm = (np.array(query) - self.vectors.min(axis=0)) / \
                     (self.vectors.max(axis=0) - self.vectors.min(axis=0) + 1e-8)
        
        # Calculate similarity to all apartments
        similarities = []
        for i, apt in enumerate(self.normalized):
            sim = np.dot(query_norm, apt) / \
                  (np.linalg.norm(query_norm) * np.linalg.norm(apt) + 1e-8)
            similarities.append((self.names[i], sim, self.vectors[i]))
        
        # Sort by similarity (highest first)
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        return similarities[:top_k]

# Database of apartments
listings = {
    'Downtown Loft': [1, 900, 2800, 5],
    'Suburban House': [4, 2200, 3200, 35],
    'Cozy Studio': [0, 500, 1500, 20],
    'Modern 2BR': [2, 1100, 2400, 15],
    'Family Home': [3, 1800, 2900, 25],
    'Luxury Penthouse': [2, 1500, 5500, 10],
    'Budget 1BR': [1, 700, 1800, 30],
    'Midtown 2BR': [2, 1050, 2350, 12],
}

finder = ApartmentFinder(listings)

# What I'm looking for
my_ideal = [2, 1200, 2400, 15]

print("Your search: 2BR, 1200sqft, $2400, 15min commute")
print("\nMost similar apartments:")
for name, similarity, features in finder.find_similar(my_ideal):
    print(f"  {similarity:.2f} - {name}: {int(features[0])}BR, "
          f"{int(features[1])}sqft, ${int(features[2])}, {int(features[3])}min")
Output:
Your search: 2BR, 1200sqft, $2400, 15min commute

Most similar apartments:
  0.98 - Modern 2BR: 2BR, 1100sqft, $2400, 15min
  0.97 - Midtown 2BR: 2BR, 1050sqft, $2350, 12min
  0.89 - Family Home: 3BR, 1800sqft, $2900, 25min
You just built Zillow’s “Similar Homes” feature!

Now Let’s Connect This to Machine Learning

Everything we just learned about apartments applies directly to ML. The concepts are identical — only the application changes.

Pattern: Real World → Vector → Similarity

Real World | Vector Representation               | What Similarity Finds
Apartments | [beds, sqft, rent, commute]         | Similar listings
Songs      | [energy, tempo, danceability, mood] | Songs you’ll like
Movies     | [action, romance, comedy, rating]   | Movies to recommend
Customers  | [age, income, purchases, visits]    | Customer segments
Images     | [pixel1, pixel2, …, pixel1000000]   | Similar images
Words      | [dimension1, …, dimension300]       | Related words
The math is identical. Once something is a vector, you can find similar items using dot products and cosine similarity.

Example: How Spotify Actually Works

Remember our apartment finder? Spotify does the exact same thing with songs:
# Spotify's actual audio features (simplified)
songs = {
    'Blinding Lights': [0.73, 0.51, 135, 0.00, 0.32],  # [energy, dance, tempo, acoustic, happy]
    'Levitating': [0.69, 0.70, 103, 0.03, 0.91],
    'Someone Like You': [0.34, 0.50, 67, 0.75, 0.14],
    'Uptown Funk': [0.93, 0.89, 115, 0.00, 0.97],
    'Hello': [0.40, 0.48, 79, 0.73, 0.25],
}

# Reusing our same logic!
finder = SongFinder(songs)  # Same algorithm as ApartmentFinder

# You just listened to Blinding Lights
print(finder.find_similar('Blinding Lights', top_k=2))
# → Levitating, Uptown Funk (similar energy/dance profiles)
The entire Spotify recommendation engine is built on the same vector similarity concept you just learned with apartments.
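The SongFinder class referenced in the snippet above isn’t defined on this page; here is one minimal way it could look, reusing the ApartmentFinder logic (the min-max scaling is an assumption so that tempo doesn’t dominate the other features):
import numpy as np

class SongFinder:
    """Sketch: same normalize-then-cosine idea as ApartmentFinder, keyed by song name."""
    def __init__(self, songs):
        self.names = list(songs.keys())
        data = np.array(list(songs.values()), dtype=float)
        mins, maxs = data.min(axis=0), data.max(axis=0)
        self.normalized = (data - mins) / (maxs - mins + 1e-8)  # scale each feature to 0-1

    def find_similar(self, name, top_k=2):
        query = self.normalized[self.names.index(name)]
        sims = self.normalized @ query / (
            np.linalg.norm(self.normalized, axis=1) * np.linalg.norm(query) + 1e-8
        )
        # Rank by similarity, skipping the query song itself
        ranked = [i for i in np.argsort(sims)[::-1] if self.names[i] != name]
        return [self.names[i] for i in ranked[:top_k]]
Under these assumptions, find_similar('Blinding Lights', top_k=2) returns Uptown Funk and Levitating, the same pair as the comment above (the order can differ).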

How This Applies to Neural Networks

Now let’s take the final step. In neural networks, everything is vectors, and everything is similarity and transformation.

What a Neural Network Does (Simplified)

  1. Input: Convert your data to a vector (image → pixels, text → numbers)
  2. Layers: Transform the vector through matrix multiplications (we’ll learn this next!)
  3. Output: Compare the final vector to known categories using… similarity
# Simplified: How image classification works
image_vector = [0.1, 0.8, 0.3, ...]  # 1000s of numbers from pixels

# The network transforms this to a "meaning" vector
meaning_vector = neural_network(image_vector)  # Let's say [0.9, 0.1, 0.05]

# Compare to category vectors
cat_vector = [1.0, 0.0, 0.0]  # What a "cat" looks like in meaning-space
dog_vector = [0.0, 1.0, 0.0]  # What a "dog" looks like

# Which is more similar?
cat_similarity = cosine_similarity(meaning_vector, cat_vector)  # 0.95
dog_similarity = cosine_similarity(meaning_vector, dog_vector)  # 0.10

# Prediction: It's a cat! (higher similarity)
The core operation — vector similarity — is exactly what you learned with apartments.
def cosine_similarity(a, b):
    """
    Returns a value between -1 and 1:
    - 1.0 = identical direction (very similar)
    - 0.0 = perpendicular (unrelated)  
    - -1.0 = opposite direction (very different)
    """
    dot = np.dot(a, b)
    magnitude_a = np.linalg.norm(a)  # length of a
    magnitude_b = np.linalg.norm(b)  # length of b
    return dot / (magnitude_a * magnitude_b)
Now let’s use it on our songs:
# Using our song vectors from earlier
blinding_lights = np.array(songs['Blinding Lights'])
levitating = np.array(songs['Levitating'])
someone_like_you = np.array(songs['Someone Like You'])

sim_blinding_levitating = cosine_similarity(blinding_lights, levitating)
sim_blinding_adele = cosine_similarity(blinding_lights, someone_like_you)

print(f"Blinding Lights vs Levitating: {sim_blinding_levitating:.3f}")
print(f"Blinding Lights vs Someone Like You: {sim_blinding_adele:.3f}")

# On the raw features, tempo dominates and both scores land close to 1.0.
# Scale the features to a common range first (as we did for the apartments)
# and the contrast appears: Blinding Lights sits much closer to Levitating
# (upbeat pop) than to Someone Like You (acoustic ballad).
That’s the Spotify algorithm in a nutshell! Find songs with the highest cosine similarity to what you just played.

Vector Operations: The Building Blocks

Now that we can represent houses as vectors, what can we do with them?

1. Vector Addition: Combining Features

The Question: What if we want to combine two house profiles?

Geometric Intuition: Place vectors tip-to-tail. The result is the diagonal.

Algebraic Definition: Add corresponding components.
# Two house feature vectors
house_1 = np.array([3, 2000, 15, 5])
house_2 = np.array([2, 1500, 10, 3])

# Average house in the neighborhood
average_house = (house_1 + house_2) / 2
print(average_house)  # [2.5, 1750, 12.5, 4]
Why This Matters:
  • Feature engineering: Combine features to create new ones
  • Gradient descent: Update model parameters by adding gradients
  • Ensemble methods: Average predictions from multiple models
Real-World Example: User preferences
# User's historical preferences
past_prefs = np.array([0.8, 0.2, 0.5])  # [action, comedy, drama]

# Recent viewing behavior
recent = np.array([0.1, 0.3, 0.2])

# Updated preferences (weighted sum)
new_prefs = 0.7 * past_prefs + 0.3 * recent
print(new_prefs)  # [0.59, 0.23, 0.41]

2. Scalar Multiplication: Scaling Features

The Question: What if all house prices in a neighborhood increase by 20%?

Geometric Intuition: Stretch or shrink the vector. Direction stays the same.

Algebraic Definition: Multiply each component by a number (scalar).
house = np.array([3, 2000, 15, 5])

# Scale by 1.2 (20% increase)
scaled_house = 1.2 * house
print(scaled_house)  # [3.6, 2400, 18, 6]
Why This Matters:
  • Normalization: Scale features to same range
  • Learning rate: Control how much to update parameters
  • Feature weighting: Emphasize important features
ML Application: Gradient descent
# Current model parameters
weights = np.array([50000, 120, -5000, -8000])

# Gradient (direction to improve)
gradient = np.array([100, 0.5, -20, -30])

# Learning rate (how far to move)
learning_rate = 0.01

# Update parameters
weights = weights - learning_rate * gradient
#                    ↑ scalar multiplication!
Key Insight: The learning rate controls the step size. Too large → overshoot. Too small → slow learning.
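A tiny illustration of that trade-off, minimizing f(w) = w² (whose gradient is 2w) from a starting point of w = 10; the specific values are just for demonstration:
def run_gradient_descent(learning_rate, steps=20):
    """Minimize f(w) = w**2 by repeatedly stepping against the gradient 2w."""
    w = 10.0
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w

print(run_gradient_descent(0.1))    # ≈ 0.12 - converges nicely
print(run_gradient_descent(0.001))  # ≈ 9.6  - too small, barely moved
print(run_gradient_descent(1.1))    # huge   - too large, overshoots and diverges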

3. Dot Product: Measuring Similarity

The Big Question: How do we measure if two things are similar? This is THE most important operation in machine learning! Let’s see why through three examples.

Algebraic Definition: Multiply corresponding components and sum.

Mathematical Formula:

\mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^{n} v_i w_i = v_1w_1 + v_2w_2 + \ldots + v_nw_n

Alternative Formula (geometric):

\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\theta)

Where θ is the angle between vectors.

Example 1: Comparing Houses

house_1 = np.array([3, 2000, 10, 3])  # Suburban family home
house_2 = np.array([4, 2200, 8, 2])   # Similar house
house_3 = np.array([1, 800, 50, 20])  # Old studio apartment

# Compute dot products
sim_1_2 = np.dot(house_1, house_2)
sim_1_3 = np.dot(house_1, house_3)

print(f"House 1 · House 2 = {sim_1_2}")  # 4,400,098 (large, positive)
print(f"House 1 · House 3 = {sim_1_3}")  # 41,803 (much smaller)
Interpretation:
  • Large dot product = similar houses
  • Small dot product = different houses
  • Why? Similar houses have similar feature values, so products are large
Real application: Zillow uses this to find “similar homes” when you’re browsing!

Example 2: Matching Students for Study Groups

# Student profiles: [math_score, reading_score, science_score, study_hours]
alice = np.array([85, 92, 78, 12])    # Strong in reading
bob = np.array([95, 75, 88, 15])      # Strong in math
charlie = np.array([87, 90, 80, 13])  # Similar to Alice

# Who should Alice study with?
alice_bob = np.dot(alice, bob)
alice_charlie = np.dot(alice, charlie)

print(f"Alice · Bob = {alice_bob}")        # 23,265
print(f"Alice · Charlie = {alice_charlie}") # 24,021 (higher!)
Interpretation: Alice and Charlie have more similar learning patterns! Why this matters:
  • Form effective study groups (similar students help each other)
  • Pair struggling students with successful ones who had similar challenges
  • Predict who will benefit from group work
Real application: Educational platforms use this for peer matching!

Example 3: Movie Recommendations

# Movie features: [rating, runtime, year, action, romance, comedy]
inception = np.array([8.8, 148, 2010, 0.9, 0.1, 0.3])
interstellar = np.array([8.6, 169, 2014, 0.7, 0.2, 0.2])
titanic = np.array([7.9, 195, 1997, 0.3, 0.9, 0.2])

# You just watched Inception. What should Netflix recommend?
inception_interstellar = np.dot(inception, interstellar)
inception_titanic = np.dot(inception, titanic)

print(f"Inception · Interstellar = {inception_interstellar}")  # 26,847
print(f"Inception · Titanic = {inception_titanic}")            # 24,143
Recommendation: Watch Interstellar! (Higher similarity) Why it works: Both are:
  • High-rated sci-fi films
  • Similar runtime
  • Recent releases
  • Action-heavy with minimal romance
Real application: This is literally how Netflix, Spotify, and YouTube work!

Understanding the Dot Product Geometrically

Key Insights:
# Parallel vectors (same direction) → large positive dot product
v1 = np.array([2, 0])
v2 = np.array([3, 0])
print(np.dot(v1, v2))  # 6 (positive, large)

# Perpendicular vectors (90°) → dot product = 0
v3 = np.array([1, 0])
v4 = np.array([0, 1])
print(np.dot(v3, v4))  # 0 (orthogonal = independent!)

# Opposite vectors (180°) → negative dot product
v5 = np.array([1, 1])
v6 = np.array([-1, -1])
print(np.dot(v5, v6))  # -2 (opposite)
What this means:
  • Positive dot product: Vectors point in similar directions (similar items)
  • Zero dot product: Vectors are perpendicular (completely different items)
  • Negative dot product: Vectors point in opposite directions (opposite items)

Why Dot Product is Everywhere in ML

1. Neural Networks: Every layer computes dot products!
# A neuron computes: output = weights · inputs + bias
weights = np.array([0.5, -0.3, 0.8])
inputs = np.array([1.0, 2.0, 3.0])

output = np.dot(weights, inputs) + 0.1
# = (0.5×1.0) + (-0.3×2.0) + (0.8×3.0) + 0.1
# = 0.5 - 0.6 + 2.4 + 0.1 = 2.4
2. Similarity Search: Find similar items
# Find products similar to what user just bought
user_purchase = np.array([1, 0, 1, 0, 1])  # Product features
all_products = np.array([
    [1, 0, 1, 1, 0],  # Product A
    [1, 0, 1, 0, 1],  # Product B (identical!)
    [0, 1, 0, 1, 0],  # Product C (different)
])

similarities = [np.dot(user_purchase, product) for product in all_products]
print(similarities)  # [2, 3, 0] → Recommend Product B!
3. Attention Mechanisms: How transformers (GPT, BERT) work
# Simplified: How much should we "attend" to each word?
query = np.array([0.8, 0.2, 0.5])  # Current word
key1 = np.array([0.9, 0.1, 0.4])   # Word 1
key2 = np.array([0.2, 0.8, 0.1])   # Word 2

attention_1 = np.dot(query, key1)  # 0.94 (high attention!)
attention_2 = np.dot(query, key2)  # 0.37 (low attention)
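In a real transformer those raw scores are then pushed through a softmax so they become attention weights that sum to 1; a minimal continuation of the snippet above (hand-rolled softmax, approximate output):
scores = np.array([attention_1, attention_2])
weights = np.exp(scores) / np.exp(scores).sum()  # softmax
print(weights)  # ≈ [0.64, 0.36] -> most of the attention goes to word 1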

4. Vector Magnitude: Measuring “Size”

The Question: How “big” is a house (in feature space)?

Geometric Intuition: The length of the arrow.

Algebraic Definition: Square root of the dot product with itself.
house = np.array([3, 2000, 15, 5])

magnitude = np.linalg.norm(house)
# = sqrt(3² + 2000² + 15² + 5²)
# = sqrt(9 + 4,000,000 + 225 + 25)
# = sqrt(4,000,259)
# ≈ 2000.06
Mathematical Formula:

\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}

Why This Matters: Normalization!
# The raw vector has a huge magnitude because sqft is so large
house = np.array([3, 2000, 15, 5])

# Scale to unit length
normalized = house / np.linalg.norm(house)
print(normalized)  # [0.0015, 0.9999, 0.0075, 0.0025]
print(np.linalg.norm(normalized))  # 1.0 (unit vector)
Key Insight: Unit-length normalization removes differences in overall magnitude, so comparisons depend only on direction. Notice that sqft still dominates the direction itself; to give every feature an equal say, scale each feature to a common range first (as in the apartment example). See the sketch below.
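To see the difference between the two kinds of normalization, here is a quick sketch on two hypothetical houses: unit-length scaling fixes each row’s magnitude, while per-feature min-max scaling is what puts the features themselves on equal footing.
houses = np.array([[3.0, 2000, 15, 5],
                   [2.0, 1000, 30, 10]])

# Unit length: every ROW has magnitude 1, but sqft still dominates each direction
unit = houses / np.linalg.norm(houses, axis=1, keepdims=True)
print(unit.round(4))

# Min-max: every COLUMN is rescaled to [0, 1], so each feature can matter
mins, maxs = houses.min(axis=0), houses.max(axis=0)
print((houses - mins) / (maxs - mins))  # [[1. 1. 0. 0.], [0. 0. 1. 1.]]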

Similarity Measures: Finding Similar Items

Cosine Similarity: Direction-Based

The Problem with Dot Product: It’s affected by magnitude!
# Two houses with same type, different size
small_house = np.array([2, 1000, 10, 3])
large_house = np.array([4, 2000, 20, 6])  # 2× small_house

# Dot product is very different
print(np.dot(small_house, small_house))  # 1,000,113
print(np.dot(large_house, large_house))  # 4,000,452 (4× larger!)
The Solution: Cosine similarity ignores magnitude, only cares about direction (type).

Cosine Similarity Formula:

\text{similarity}(\mathbf{v}, \mathbf{w}) = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\| \|\mathbf{w}\|} = \cos(\theta)

Range: -1 (opposite) to +1 (identical direction)
def cosine_similarity(v, w):
    """Compute cosine similarity between two vectors."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

Example 1: House Type Matching (Ignoring Size)

# Find houses of similar TYPE, regardless of size
small_suburban = np.array([2, 1000, 10, 5])   # Small suburban
large_suburban = np.array([4, 2000, 20, 10])  # Large suburban (2× size)
urban_apartment = np.array([1, 800, 5, 1])    # Urban apartment

# Cosine similarity
print(f"Small vs Large suburban: {cosine_similarity(small_suburban, large_suburban):.3f}")  # 1.000!
print(f"Small suburban vs Urban: {cosine_similarity(small_suburban, urban_apartment):.3f}")  # 0.997
Key Insight: The two suburban houses are identical in TYPE (cosine = 1.0), even though one is twice the size! Why this matters:
  • A family looking for a suburban house doesn’t care if it’s 2000 or 4000 sqft
  • They care about the TYPE: suburban, family-friendly, good schools
  • Cosine similarity captures this!
Real application: Zillow’s “similar homes” feature uses cosine similarity to find homes of similar style, not just similar size.

Example 2: Student Learning Style (Not Just Scores)

# Student profiles: [math, reading, science, study_hours]
alice = np.array([85, 92, 78, 12])      # Strong reader, moderate study
alice_2x = np.array([170, 184, 156, 24]) # Alice with 2× scores (impossible, but illustrative)
bob = np.array([95, 75, 88, 15])        # Strong in math

# Cosine similarity
print(f"Alice vs Alice_2x: {cosine_similarity(alice, alice_2x):.3f}")  # 1.000 (same learning style!)
print(f"Alice vs Bob: {cosine_similarity(alice, bob):.3f}")            # 0.991 (different style)
Interpretation:
  • Alice and Alice_2x have IDENTICAL learning patterns (cosine = 1.0)
  • The magnitude doesn’t matter - it’s the PATTERN that counts
  • Alice is strong in reading, Bob is strong in math (different patterns)
Why this matters:
  • Match students with similar learning STYLES, not just similar scores
  • A student who scores 60/70/65 has the same pattern as one who scores 80/93/87
  • Recommend study materials based on learning style, not absolute performance
Real application: Khan Academy matches students with similar learning patterns to suggest effective study paths.

Example 3: Movie Taste (Not Just Ratings)

# Movie preferences: [action, romance, comedy, horror, sci-fi]
user_A = np.array([5, 1, 3, 0, 4])      # Loves action & sci-fi
user_A_harsh = np.array([3, 0, 2, 0, 2]) # Same taste, harsher ratings
user_B = np.array([1, 5, 2, 4, 0])      # Loves romance & horror

# Cosine similarity
print(f"User A vs A_harsh: {cosine_similarity(user_A, user_A_harsh):.3f}")  # 0.998 (same taste!)
print(f"User A vs B: {cosine_similarity(user_A, user_B):.3f}")              # 0.385 (different taste)
Key Insight: User A and User A_harsh have the SAME TASTE, just different rating scales!
  • User A rates generously (5, 4, 3)
  • User A_harsh rates strictly (3, 2, 1)
  • But they like the SAME TYPES of movies!
Why this matters:
  • Some users rate everything 5 stars, others are harsh critics
  • Cosine similarity finds users with similar TASTE, not similar rating scales
  • Recommend movies based on taste, not rating magnitude
Real application: Netflix uses cosine similarity because users have different rating behaviors, but similar tastes should get similar recommendations.

When to Use Cosine vs. Euclidean Distance

Use Cosine Similarity when:
  • ✅ Direction matters more than magnitude
  • ✅ Different scales (harsh vs. generous raters)
  • ✅ Text similarity (document length doesn’t matter)
  • ✅ Recommendation systems (taste, not intensity)
Use Euclidean Distance when:
  • ✅ Absolute position matters
  • ✅ Same scale for all features
  • ✅ Clustering (K-means)
  • ✅ Anomaly detection (how far from normal?)
# Example: Anomaly detection
def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return np.linalg.norm(a - b)

normal_house = np.array([3, 2000, 15, 5])
similar_house = np.array([3, 2100, 14, 4])
anomaly = np.array([10, 8000, 2, 50])  # Weird house!

# Euclidean distance (absolute difference)
print(f"Normal vs Similar: {euclidean_distance(normal_house, similar_house):.1f}")  # 100.5
print(f"Normal vs Anomaly: {euclidean_distance(normal_house, anomaly):.1f}")        # 6007.0 (huge!)

# Cosine similarity (direction)
print(f"Normal vs Similar: {cosine_similarity(normal_house, similar_house):.3f}")  # 0.999
print(f"Normal vs Anomaly: {cosine_similarity(normal_house, anomaly):.3f}")        # 0.996 (still high!)
Interpretation: Euclidean distance catches the anomaly better because it cares about MAGNITUDE!

Real-World Application: Finding Similar Houses

Let’s build a simple house recommendation system!
import numpy as np

# Database of houses (bedrooms, sqft, age, distance)
houses = np.array([
    [3, 2000, 15, 5],   # House 0
    [4, 2200, 8, 2],    # House 1
    [2, 1200, 25, 3],   # House 2
    [3, 1900, 12, 6],   # House 3
    [5, 3500, 5, 15],   # House 4
])

# Prices (in thousands)
prices = np.array([320, 380, 250, 310, 550])

# Query: User likes this house
query_house = np.array([3, 2000, 10, 4])

# Find 3 most similar houses
similarities = []
for i, house in enumerate(houses):
    sim = cosine_similarity(query_house, house)
    similarities.append((i, sim, prices[i]))

# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)

print("Top 3 similar houses:")
for i, (idx, sim, price) in enumerate(similarities[:3], 1):
    print(f"{i}. House {idx}: similarity={sim:.3f}, price=${price}k")
Output:
Top 3 similar houses:
1. House 0: similarity=0.999, price=$320k
2. House 3: similarity=0.998, price=$310k
3. House 1: similarity=0.997, price=$380k
Prediction: Based on similar houses, estimated price ≈ $337k (average of top 3)
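That estimate can be computed directly from the sorted similarities list; a short continuation of the code above:
top_3_prices = [price for _, _, price in similarities[:3]]
estimated_price = np.mean(top_3_prices)
print(f"Estimated price: ${estimated_price:.0f}k")  # ≈ $337k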

Supporting Example 1: Document Similarity

The same vector concepts apply to text!
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "machine learning is awesome",
    "deep learning is a subset of machine learning",
    "neural networks are powerful",
    "python is great for machine learning"
]

# Convert to vectors
vectorizer = CountVectorizer()
doc_vectors = vectorizer.fit_transform(documents).toarray()

print("Vocabulary:", vectorizer.get_feature_names_out())
print("\nDocument vectors:")
print(doc_vectors)

# Find similar documents to "machine learning"
query = "machine learning"
query_vector = vectorizer.transform([query]).toarray()[0]

for i, doc_vec in enumerate(doc_vectors):
    sim = cosine_similarity(query_vector, doc_vec)
    print(f"Doc {i}: {sim:.3f} - {documents[i]}")
Key Insight: Same math, different domain!

Supporting Example 2: User Recommendations

# User-movie rating matrix
ratings = np.array([
    [5, 4, 0, 0, 1],  # User 0: likes action/comedy
    [4, 5, 0, 0, 2],  # User 1: similar to User 0
    [0, 0, 5, 4, 5],  # User 2: likes drama/romance
    [5, 4, 0, 1, 1],  # User 3: similar to User 0
])

# Find users similar to User 0
user_0 = ratings[0]
for i in range(1, len(ratings)):
    sim = cosine_similarity(user_0, ratings[i])
    print(f"User {i}: similarity = {sim:.3f}")

# Output:
# User 1: similarity = 0.987 (recommend same movies!)
# User 2: similarity = 0.140 (different taste)
# User 3: similarity = 0.989 (very similar)

Practice Exercises

Warm-Up Exercise: House Price Estimation

# Given these houses and prices
houses = np.array([
    [3, 1800, 20, 5],  # $280k
    [4, 2400, 10, 3],  # $360k
    [2, 1200, 30, 8],  # $220k
])
prices = np.array([280, 360, 220])

# Predict price for this house
new_house = np.array([3, 2000, 15, 4])

# TODO: Find 2 most similar houses and average their prices
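One possible solution sketch, assuming the cosine_similarity helper defined earlier in this module:
sims = [cosine_similarity(new_house, house) for house in houses]
top_2 = np.argsort(sims)[-2:]            # indices of the 2 most similar houses
estimate = prices[top_2].mean()
print(f"Most similar: houses {top_2}, estimated price ≈ ${estimate:.0f}k")  # ≈ $320k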

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises blend mathematical concepts with real-world scenarios. Try to solve them before peeking at the solutions.

Exercise 1: Music Streaming Recommendations 🎵

Spotify represents songs as vectors based on audio features. Given these song vectors:
Song          | Energy | Danceability | Acousticness | Tempo (normalized)
Your Favorite | 0.8    | 0.7          | 0.2          | 0.6
Song A        | 0.9    | 0.8          | 0.1          | 0.7
Song B        | 0.3    | 0.4          | 0.9          | 0.3
Song C        | 0.7    | 0.6          | 0.3          | 0.5
Task: Find which song is most similar to “Your Favorite” using cosine similarity.
import numpy as np

# Define the song vectors
your_favorite = np.array([0.8, 0.7, 0.2, 0.6])
song_A = np.array([0.9, 0.8, 0.1, 0.7])
song_B = np.array([0.3, 0.4, 0.9, 0.3])
song_C = np.array([0.7, 0.6, 0.3, 0.5])

# TODO: Calculate cosine similarity with each song
# TODO: Which song should Spotify recommend?
Solution:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

your_favorite = np.array([0.8, 0.7, 0.2, 0.6])
song_A = np.array([0.9, 0.8, 0.1, 0.7])
song_B = np.array([0.3, 0.4, 0.9, 0.3])
song_C = np.array([0.7, 0.6, 0.3, 0.5])

songs = {'Song A': song_A, 'Song B': song_B, 'Song C': song_C}

print("Similarity scores:")
for name, song in songs.items():
    sim = cosine_similarity(your_favorite, song)
    print(f"  {name}: {sim:.4f}")

# Output:
# Song A: 0.9945 ← Most similar (upbeat, danceable)
# Song B: 0.6847  (very different - acoustic, slow)
# Song C: 0.9903  (also quite similar)

print("\n✅ Recommendation: Song A (0.9945 similarity)")
Real-World Insight: This is exactly how Spotify’s “Discover Weekly” works! Songs are represented as 12+ dimensional vectors including tempo, key, loudness, and more.

Exercise 2: E-commerce Product Matching 🛒

Amazon wants to show “Similar Products” when a customer views an item. Products are represented as vectors: Features: [price_tier, avg_rating, num_reviews (log), category_score, brand_popularity]
# Customer is viewing this laptop
current_product = np.array([4, 4.5, 3.2, 0.9, 0.7])

# Candidate products to recommend
products = {
    "Budget Laptop":    np.array([2, 4.2, 2.8, 0.9, 0.4]),
    "Gaming Laptop":    np.array([5, 4.6, 3.5, 0.8, 0.9]),
    "Similar Laptop":   np.array([4, 4.4, 3.0, 0.9, 0.65]),
    "Tablet":           np.array([3, 4.3, 3.1, 0.3, 0.6]),
}
Tasks:
  1. Calculate both Euclidean distance AND cosine similarity for each product
  2. Which metric gives better recommendations and why?
  3. Should we normalize the data first?
Solution:

import numpy as np

def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

current_product = np.array([4, 4.5, 3.2, 0.9, 0.7])

products = {
    "Budget Laptop":    np.array([2, 4.2, 2.8, 0.9, 0.4]),
    "Gaming Laptop":    np.array([5, 4.6, 3.5, 0.8, 0.9]),
    "Similar Laptop":   np.array([4, 4.4, 3.0, 0.9, 0.65]),
    "Tablet":           np.array([3, 4.3, 3.1, 0.3, 0.6]),
}

print("Product Comparison:")
print("-" * 55)
print(f"{'Product':<18} {'Euclidean':<12} {'Cosine Sim':<12}")
print("-" * 55)

for name, vec in products.items():
    dist = euclidean_distance(current_product, vec)
    sim = cosine_similarity(current_product, vec)
    print(f"{name:<18} {dist:<12.4f} {sim:<12.4f}")

# Output (approximate):
# Budget Laptop     2.0833       0.9686
# Gaming Laptop     1.0724       0.9956
# Similar Laptop    0.2291       0.9997  ← Both metrics pick this
# Tablet            1.1916       0.9905

print("\n📊 Analysis:")
print("• Euclidean: Similar Laptop wins (closest in absolute values)")
print("• Cosine: Similar Laptop also wins (most similar direction)")
print("\n✅ Both agree! But Euclidean is better here because")
print("   price_tier matters in absolute terms, not just ratio.")
Key Insight:
  • Use Euclidean when magnitude matters (price, ratings)
  • Use Cosine when only direction matters (document topics, user preferences)
  • Always normalize features to different scales!

Exercise 3: Dating App Compatibility 💕

A dating app represents users as compatibility vectors: Features: [adventure_score, introversion, career_focus, family_values, humor_style]
# Your profile
you = np.array([0.8, 0.3, 0.7, 0.6, 0.9])

# Potential matches
matches = {
    "Alex":   np.array([0.7, 0.4, 0.8, 0.5, 0.85]),
    "Jordan": np.array([0.2, 0.9, 0.3, 0.8, 0.4]),
    "Casey":  np.array([0.9, 0.2, 0.6, 0.7, 0.95]),
    "Morgan": np.array([0.5, 0.5, 0.5, 0.5, 0.5]),
}
Tasks:
  1. Calculate a “compatibility score” using dot product
  2. Normalize and use cosine similarity - does the ranking change?
  3. Which match is best and why?
Solution:

import numpy as np

you = np.array([0.8, 0.3, 0.7, 0.6, 0.9])

matches = {
    "Alex":   np.array([0.7, 0.4, 0.8, 0.5, 0.85]),
    "Jordan": np.array([0.2, 0.9, 0.3, 0.8, 0.4]),
    "Casey":  np.array([0.9, 0.2, 0.6, 0.7, 0.95]),
    "Morgan": np.array([0.5, 0.5, 0.5, 0.5, 0.5]),
}

print("Compatibility Analysis:")
print("-" * 50)
print(f"{'Match':<10} {'Dot Product':<14} {'Cosine Sim':<12}")
print("-" * 50)

for name, profile in matches.items():
    dot = np.dot(you, profile)
    cos = np.dot(you, profile) / (np.linalg.norm(you) * np.linalg.norm(profile))
    print(f"{name:<10} {dot:<14.4f} {cos:<12.4f}")

# Output (approximate):
# Alex       2.3050         0.9912
# Jordan     1.4800         0.7257
# Casey      2.4750         0.9924  ← Best match!
# Morgan     1.6500         0.9546

print("\n💕 Best Match: Casey!")
print("   • High adventure (0.9 vs your 0.8)")
print("   • Similar introversion level (0.2 vs 0.3)")
print("   • Compatible humor style (0.95 vs 0.9)")
print("\n⚠️  Jordan is least compatible:")
print("   • Opposite on adventure (0.2 vs 0.8)")
print("   • Opposite on introversion (0.9 vs 0.3)")
Real-World Insight: Dating apps like Hinge and OkCupid use similar vector-based matching, but with 50+ dimensions including behavioral data from swipes and messages!

Exercise 4: Document Search Engine 📄

Build a simple search engine using TF-IDF vectors:
# Documents (already converted to TF-IDF vectors)
# Dimensions: [python, machine, learning, data, web, api]
documents = {
    "ML Tutorial":     np.array([0.5, 0.8, 0.9, 0.7, 0.1, 0.2]),
    "Web Dev Guide":   np.array([0.2, 0.1, 0.0, 0.3, 0.9, 0.8]),
    "Data Science":    np.array([0.6, 0.5, 0.7, 0.9, 0.2, 0.3]),
    "Python Basics":   np.array([0.9, 0.2, 0.3, 0.4, 0.3, 0.4]),
}

# User searches for "machine learning python"
query = np.array([0.7, 0.9, 0.8, 0.3, 0.0, 0.0])
Tasks:
  1. Rank documents by relevance to the query
  2. What’s the top result?
  3. Why might “Data Science” rank higher than “Python Basics” even though query has “python”?
Solution:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

documents = {
    "ML Tutorial":     np.array([0.5, 0.8, 0.9, 0.7, 0.1, 0.2]),
    "Web Dev Guide":   np.array([0.2, 0.1, 0.0, 0.3, 0.9, 0.8]),
    "Data Science":    np.array([0.6, 0.5, 0.7, 0.9, 0.2, 0.3]),
    "Python Basics":   np.array([0.9, 0.2, 0.3, 0.4, 0.3, 0.4]),
}

query = np.array([0.7, 0.9, 0.8, 0.3, 0.0, 0.0])

print("🔍 Search Results for 'machine learning python':")
print("-" * 45)

results = []
for name, doc in documents.items():
    sim = cosine_similarity(query, doc)
    results.append((name, sim))

# Sort by similarity (descending)
results.sort(key=lambda x: x[1], reverse=True)

for rank, (name, sim) in enumerate(results, 1):
    print(f"{rank}. {name:<18} (relevance: {sim:.4f})")

# Output:
# 1. ML Tutorial        (relevance: 0.9357) ← Top result!
# 2. Data Science       (relevance: 0.8234)
# 3. Python Basics      (relevance: 0.7156)
# 4. Web Dev Guide      (relevance: 0.1342)

print("\n📊 Why 'Data Science' > 'Python Basics'?")
print("   Query emphasizes 'machine' (0.9) and 'learning' (0.8)")
print("   Data Science has machine=0.5, learning=0.7")
print("   Python Basics has machine=0.2, learning=0.3")
print("   Even though Python Basics has higher 'python' score,")
print("   the overall direction is less aligned with the query!")
Real-World Insight: This is how Google Search worked in its early days! Modern search engines add hundreds more signals (links, freshness, user behavior).

🚨 Real-World Challenge: Handling Messy Data

In textbooks, data is clean. In production, data is messy. Here’s how to handle real-world vector problems:
Production Reality: Real data has missing values, outliers, inconsistent scales, and noise. Your similarity system will fail if you don’t handle these!

Missing Values

import numpy as np

# Real apartment data with missing values (NaN)
apartments = np.array([
    [2, 1200, 2400, 15],      # Complete
    [2, np.nan, 2300, 18],    # Missing sqft
    [np.nan, 2500, 4500, 45], # Missing bedrooms
    [1, 800, np.nan, 10],     # Missing rent
])

# Strategy 1: Impute with column mean
def impute_mean(data):
    result = data.copy()
    for col in range(data.shape[1]):
        col_mean = np.nanmean(data[:, col])
        mask = np.isnan(result[:, col])
        result[mask, col] = col_mean
    return result

# Strategy 2: Impute with median (robust to outliers)
def impute_median(data):
    result = data.copy()
    for col in range(data.shape[1]):
        col_median = np.nanmedian(data[:, col])
        mask = np.isnan(result[:, col])
        result[mask, col] = col_median
    return result

cleaned = impute_mean(apartments)
print("Cleaned data:\n", cleaned)

Outlier Detection

# Detect outliers using z-score
def detect_outliers(data, threshold=3):
    """Flag values more than `threshold` std devs from mean."""
    means = np.nanmean(data, axis=0)
    stds = np.nanstd(data, axis=0)
    z_scores = np.abs((data - means) / (stds + 1e-8))
    return z_scores > threshold

# Example: Luxury penthouse is an outlier
apartments = np.array([
    [2, 1200, 2400, 15],
    [2, 1100, 2300, 18],
    [2, 1150, 2500, 16],
    [2, 50000, 100000, 15],  # Outlier! Mansion accidentally in apartment data
])

outliers = detect_outliers(apartments)
print("Outlier locations:\n", outliers)
# Handle: Remove, cap, or flag for review

Feature Scaling Choices

# Different scaling methods for different situations

# Min-Max: Scale to [0, 1] - use when you need bounded values
def minmax_scale(data):
    mins = data.min(axis=0)
    maxs = data.max(axis=0)
    return (data - mins) / (maxs - mins + 1e-8)

# Z-Score: Center and scale - use when comparing distributions
def zscore_scale(data):
    means = data.mean(axis=0)
    stds = data.std(axis=0)
    return (data - means) / (stds + 1e-8)

# Robust: Use median/IQR - use when outliers are present
def robust_scale(data):
    medians = np.median(data, axis=0)
    q75 = np.percentile(data, 75, axis=0)
    q25 = np.percentile(data, 25, axis=0)
    iqr = q75 - q25
    return (data - medians) / (iqr + 1e-8)

print("Choose your scaler based on your data characteristics!")
Rule of Thumb:
  • Min-Max: Neural networks, bounded features
  • Z-Score: Most ML algorithms, normally distributed data
  • Robust: Data with outliers, skewed distributions

🔬 Advanced Deep Dive (Optional)

Why High Dimensions Are Weird

In high dimensions, our intuition breaks down completely:
import numpy as np

def random_vector_similarity(dim, n_pairs=1000):
    """Average cosine similarity between random unit vectors."""
    similarities = []
    for _ in range(n_pairs):
        a = np.random.randn(dim)
        b = np.random.randn(dim)
        a = a / np.linalg.norm(a)
        b = b / np.linalg.norm(b)
        similarities.append(np.dot(a, b))
    return np.mean(similarities), np.std(similarities)

print("Random vector similarity by dimension:")
for dim in [2, 10, 100, 1000, 10000]:
    mean, std = random_vector_similarity(dim)
    print(f"  {dim:5d}D: mean={mean:+.4f}, std={std:.4f}")

# Output:
#      2D: mean=+0.0012, std=0.7071  ← High variance
#     10D: mean=-0.0008, std=0.3162
#    100D: mean=+0.0002, std=0.1000
#   1000D: mean=-0.0001, std=0.0316  ← Nearly orthogonal!
#  10000D: mean=+0.0000, std=0.0100  ← All vectors ~90° apart
Key Insight: In 10,000 dimensions, random vectors are almost perfectly orthogonal! This is why:
  • Random embeddings don’t work (everything is equally dissimilar)
  • Trained embeddings are necessary (learn meaningful directions)
  • Dimension reduction (PCA, t-SNE) helps visualization

Volume Concentration

# In high-D, almost all volume is at the surface of a sphere!
def shell_volume_ratio(dim, thickness=0.01):
    """What fraction of unit ball is within `thickness` of surface?"""
    inner_radius = 1 - thickness
    # V(r) ∝ r^d
    inner_volume_ratio = inner_radius ** dim
    shell_ratio = 1 - inner_volume_ratio
    return shell_ratio

print("Fraction of volume near surface (within 1%):")
for dim in [2, 10, 50, 100, 500]:
    ratio = shell_volume_ratio(dim)
    print(f"  {dim:3d}D: {ratio:.4%}")

# Output:
#    2D: 1.99%
#   10D: 9.56%
#   50D: 39.50%
#  100D: 63.40%
#  500D: 99.33%  ← Almost everything is on the edge!

Implications for ML

  1. Nearest Neighbors degrades: All points become equidistant
  2. More data needed: Exponentially more samples to cover space
  3. Regularization essential: Prevents overfitting in sparse spaces
  4. Feature selection matters: Irrelevant features hurt more in high-D

Key Takeaways

  • Vectors represent data - Houses, images, text all become vectors
  • Dot product measures similarity - Foundation of neural networks
  • Cosine similarity - Direction-based (ignores magnitude)
  • Euclidean distance - Position-based (includes magnitude)
  • Normalization matters - Prevent one feature from dominating
  • Same math, different domains - Vectors work everywhere!
  • Handle messy data - Missing values, outliers, and scaling are production realities
  • High dimensions are weird - Curse of dimensionality affects all similarity search

🔗 Math → ML Connection Summary

What you learned in this module powers these ML systems:
Vector Concept               | ML Application                       | Real-World Example
Representing data as vectors | Feature vectors in any ML model      | Every scikit-learn model takes feature vectors
Dot product                  | Neural network layers, attention     | y = W·x + b is the core of deep learning
Cosine similarity            | Semantic search, recommendations     | ChatGPT’s embeddings, Spotify recommendations
Euclidean distance           | KNN classification, clustering       | Customer segmentation, image retrieval
Normalization                | Batch normalization, feature scaling | Required preprocessing for most models
High-dimensional vectors     | Word embeddings, image features      | GPT uses 12,000+ dimensional embeddings
Next time you use any ML model, remember: it’s operating on vectors using these exact operations!

For learners who want the mathematical foundations:

Vector Spaces: The Abstract View

A vector space is a set of objects (vectors) with two operations (addition and scalar multiplication) that satisfy certain axioms. This abstraction lets us apply vector math to surprising domains:
Domain      | “Vectors”      | Addition               | Scalar Multiplication
Functions   | f(x), g(x)     | (f+g)(x) = f(x) + g(x) | (cf)(x) = c·f(x)
Polynomials | 1, x, x², …    | Combine coefficients   | Scale coefficients
Matrices    | Any m×n matrix | Element-wise addition  | Element-wise scaling
Signals     | Time series    | Add signals            | Amplify/attenuate
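To make the abstraction concrete, here is a small sketch that treats polynomials as coefficient vectors, so “adding polynomials” and “scaling a polynomial” are literally the vector operations from earlier:
import numpy as np

# p(x) = 1 + 2x + 3x^2 and q(x) = 4 - x, stored as coefficient vectors [constant, x, x^2]
p = np.array([1, 2, 3])
q = np.array([4, -1, 0])

print(p + q)  # [5 1 3]  -> (p + q)(x) = 5 + x + 3x^2
print(2 * p)  # [2 4 6]  -> (2p)(x)    = 2 + 4x + 6x^2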

Linear Independence & Basis

A set of vectors is linearly independent if no vector can be written as a combination of others:

\text{If } c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0} \text{, then all } c_i = 0

A basis is a minimal set of linearly independent vectors that span the space.

ML Application: In neural networks, we’re essentially finding a good basis to represent data. Autoencoders find compressed bases; attention mechanisms dynamically select relevant basis directions.
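A quick numerical check of linear independence (a sketch using matrix rank: if the rank equals the number of vectors, none of them is a combination of the others):
import numpy as np

v1 = np.array([1, 0, 2])
v2 = np.array([0, 1, 1])
v3 = v1 + 2 * v2  # deliberately a combination of v1 and v2

print(np.linalg.matrix_rank(np.array([v1, v2])))      # 2 -> independent
print(np.linalg.matrix_rank(np.array([v1, v2, v3])))  # 2 < 3 -> dependent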

Inner Product Spaces

Our dot product is a specific inner product. More generally, an inner product ⟨·,·⟩ satisfies:
  1. ⟨u, v⟩ = ⟨v, u⟩ (symmetry)
  2. ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩ (linearity)
  3. ⟨v, v⟩ ≥ 0, with equality iff v = 0 (positive definiteness)
Why this matters: Different inner products define different notions of similarity! Kernel methods in ML use custom inner products to find nonlinear patterns.
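For example, a weighted inner product ⟨u, v⟩ = Σ wᵢuᵢvᵢ with positive weights satisfies all three axioms and lets some features count more toward similarity; a quick sketch with made-up weights:
import numpy as np

def weighted_inner(u, v, w):
    """Inner product that weights each coordinate by w (all weights > 0)."""
    return np.sum(w * u * v)

apartment_A = np.array([0.33, 0.24, 0.19, 0.14])  # normalized features from earlier
apartment_B = np.array([0.33, 0.18, 0.15, 0.23])
weights = np.array([3.0, 1.0, 1.0, 0.5])          # hypothetical: bedrooms matter most

print(weighted_inner(apartment_A, apartment_B, weights))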
Further reading:
  • Gilbert Strang’s Linear Algebra (MIT OpenCourseWare) - Rigorous but intuitive
  • 3Blue1Brown: Essence of Linear Algebra - Visual understanding
  • Mathematics for Machine Learning book, Ch. 2-3 - ML-focused treatment

Word Embeddings: Vectors in NLP

Mind-blowing application: Words are vectors, and vector math works on meaning!
# Word2Vec / GloVe represent words as ~300-dimensional vectors
# Famous example: King - Man + Woman ≈ Queen

king = np.array([0.5, 0.3, 0.8, ...])    # 300 dimensions
man = np.array([0.4, 0.2, 0.1, ...])
woman = np.array([0.4, 0.3, 0.2, ...])

# Vector arithmetic on meaning!
result = king - man + woman
# result is closest to the "queen" vector!

# This works because:
# king - man captures "royalty without gender"
# Adding woman reintroduces gender → queen
Modern AI (GPT-4, Claude) uses this same principle with transformer embeddings of 12,000+ dimensions!

Interview Questions: Vectors

Q: What is the dot product, and why does it show up everywhere in ML?

Answer: The dot product \mathbf{a} \cdot \mathbf{b} = \sum a_i b_i measures alignment between vectors. In ML:
  • Neural networks: Every neuron computes a dot product (weights · inputs)
  • Attention mechanisms: Query-key dot products determine what to focus on
  • Similarity search: Cosine similarity uses normalized dot products
  • Loss functions: Many involve dot products (cross-entropy, hinge loss)
Q: When should you use cosine similarity versus Euclidean distance?

Answer:
  • Cosine: When magnitude doesn’t matter (text similarity, user preferences, normalized data)
  • Euclidean: When absolute values matter (physical distance, raw measurements)
  • Example: Two documents about ML with different lengths should be similar (cosine), but two GPS coordinates need actual distance (Euclidean)
Q: What happens to similarity measures in very high dimensions?

Answer: In high dimensions:
  • All points become roughly equidistant (“curse of dimensionality”)
  • Random vectors are almost orthogonal (cosine ≈ 0)
  • This is why PCA/dimension reduction is important
  • Modern embeddings (512-4096 dim) are trained to preserve meaningful similarity

What’s Next?

You now understand how to represent houses as vectors and measure similarity. But how do we actually predict the price? That’s where matrices come in. A matrix is a function that transforms input (house features) into output (price prediction). This is exactly how neural networks work!
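As a tiny preview (a sketch with made-up weights, not a trained model): a price prediction is just a dot product of a weight row with the feature vector, and a matrix stacks several such rows.
import numpy as np

house = np.array([3, 2000, 15, 5])             # [beds, sqft, age, distance]
weights = np.array([10000, 150, -500, -2000])  # hypothetical dollars per unit of each feature

predicted_price = weights @ house              # one row of a "matrix" turning features into a price
print(f"Predicted price: ${predicted_price:,.0f}")  # $312,500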

Next: Matrices & Transformations

Learn how matrices transform house features into price predictions