# Vectors: The Language of Similarity

> From finding similar houses to understanding how recommendations work

<Frame>
  <img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/math-for-ml-linear-algebra/vectors-concept.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=921b5c2767377b4604312933696c4368" alt="Vectors - The Language of Similarity" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/vectors-concept.svg" />
</Frame>

## A Problem You Already Understand

You're looking for a new apartment. You visit Zillow and find one you love:

* **2 bedrooms**
* **1,200 square feet**
* **\$2,400/month rent**
* **15 minutes from work**

Now you want to find **similar apartments**. Not identical — just similar enough that you'd consider them.

Zillow shows you a "Similar Homes" section. But how did they decide which apartments are similar?

**Think about it**: What makes two apartments "similar"?

* Same number of bedrooms?
* Similar size?
* Similar rent?
* Similar commute?

**All of the above**, in some combination. And that combination is exactly what vectors and similarity measures capture.

<Info>
  **Estimated Time**: 3-4 hours\
  **Difficulty**: Beginner\
  **Prerequisites**: Basic Python\
  **What You'll Build**: A "Find Similar Items" system that works for apartments, songs, or anything
</Info>

<Note>
  **🔗 ML Connection**: Vectors are THE foundation of modern ML. Here's where you'll see them:

  | ML System                  | Vector Representation                      |
  | -------------------------- | ------------------------------------------ |
  | **Word2Vec/GPT**           | Every word → 300-768 dimensional vector    |
  | **Face Recognition**       | Every face → 128-512 dimensional embedding |
  | **Recommendation Systems** | Users & items in shared vector space       |
  | **Image Classification**   | CNN features as vectors                    |

  After this module, you'll understand exactly how these systems find "similar" items!
</Note>

***

## Step 1: Describe Things with Numbers

The first insight is simple: **we can describe any apartment as a list of numbers.**

<img src="https://mintcdn.com/devweeekends/1GcDwVN8SzYRbJg1/images/courses/math-for-ml-linear-algebra/apartment-as-vector.svg?fit=max&auto=format&n=1GcDwVN8SzYRbJg1&q=85&s=7bf784c6055f80dd6558921bec615568" alt="Apartment as Vector" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/apartment-as-vector.svg" />

```python theme={null}
# My favorite apartment
my_apartment = [2, 1200, 2400, 15]
#               ↑   ↑     ↑    ↑
#            beds sqft  rent  commute(min)
```

Now every apartment is just 4 numbers:

```python theme={null}
apartment_A = [2, 1200, 2400, 15]   # My favorite
apartment_B = [2, 1100, 2300, 18]   # Very similar!
apartment_C = [4, 2500, 4500, 45]   # Very different
apartment_D = [1, 800, 1900, 10]    # Smaller, cheaper, closer
```

**This list of numbers is called a vector.** That's it. A vector is just an ordered list of numbers that describes something.

<Note>
  **Key Insight**: Once something is described as numbers, we can use math to compare things automatically. No human judgment needed.
</Note>

<img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/math-for-ml-linear-algebra/vector-math-concept.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=c15e9344ecec6323afcb6221726b36d1" alt="Vector Math Concept" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/vector-math-concept.svg" />

***

## Mathematical Foundations: Vector Operations

Before we measure similarity, let's master the fundamental operations. These are the building blocks of ALL machine learning.

### Vector Addition: Combine Two Vectors

When you add vectors, you add corresponding components:

$$
\mathbf{a} + \mathbf{b} = \begin{bmatrix}a_1\\a_2\\a_3\end{bmatrix} + \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} = \begin{bmatrix}a_1 + b_1\\a_2 + b_2\\a_3 + b_3\end{bmatrix}
$$

**Real Example**: Combining two shopping carts:

```python theme={null}
import numpy as np

# Shopping cart contents: [apples, bananas, oranges]
cart_monday = np.array([3, 2, 5])
cart_tuesday = np.array([1, 4, 2])

# Total purchases
total = cart_monday + cart_tuesday
print(f"Total: {total}")  # [4, 6, 7]
```

**Geometric Interpretation**: Place vectors tip-to-tail; the sum goes from the first tail to the last tip.

### Scalar Multiplication: Scale a Vector

Multiply every component by the same number (scalar):

$$
c \cdot \mathbf{v} = c \cdot \begin{bmatrix}v_1\\v_2\\v_3\end{bmatrix} = \begin{bmatrix}c \cdot v_1\\c \cdot v_2\\c \cdot v_3\end{bmatrix}
$$

**Real Example**: Double a recipe:

```python theme={null}
# Recipe: [flour_cups, sugar_cups, eggs]
recipe = np.array([2, 0.5, 3])

# Double the recipe
doubled = 2 * recipe
print(f"Doubled: {doubled}")  # [4, 1, 6]

# Half the recipe
halved = 0.5 * recipe
print(f"Halved: {halved}")  # [1, 0.25, 1.5]
```

**Geometric Interpretation**: Scalar > 1 stretches the vector; 0 \< scalar \< 1 shrinks it; negative flips direction.
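A quick numerical check of the geometric picture, as a sketch with an arbitrary example vector `v` (not taken from the recipe above): scaling by `c` multiplies the length by `|c|`, and a negative `c` reverses the direction.

```python
import numpy as np

v = np.array([3.0, 4.0])  # example vector with length 5

for c in (2.0, 0.5, -1.5):
    scaled = c * v
    # How much longer or shorter the scaled vector is
    length_ratio = np.linalg.norm(scaled) / np.linalg.norm(v)
    # Cosine of the angle between v and c*v: +1 same direction, -1 flipped
    direction = np.dot(v, scaled) / (np.linalg.norm(v) * np.linalg.norm(scaled))
    print(f"c = {c:>4}: length x{length_ratio:.1f}, direction {direction:+.0f}")
```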

### Vector Magnitude (Length)

The magnitude (or norm) measures how "big" a vector is:

$$
\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}
$$

**Real Example**: Distance from origin:

```python theme={null}
# Your position: [x, y] = [3, 4]
position = np.array([3, 4])

# Distance from origin (Pythagorean theorem!)
magnitude = np.sqrt(3**2 + 4**2)  # = 5
# Or use NumPy:
magnitude = np.linalg.norm(position)  # = 5.0

print(f"Distance from origin: {magnitude}")
```

<Tip>
  **Fun fact**: The 3-4-5 triangle is the best-known Pythagorean triple. Ancient Egyptian builders are often said to have used it to lay out right angles in construction, although direct evidence for this is thin.
</Tip>

### Unit Vectors: Direction Without Magnitude

A **unit vector** has length 1 and only represents direction:

$$
\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}
$$

**Real Example**: Normalize for comparison:

```python theme={null}
# Two vectors with different magnitudes
review_1 = np.array([5, 4, 5, 3, 4])  # Enthusiastic reviewer
review_2 = np.array([2, 1, 2, 1, 1])  # Reserved reviewer

# Convert to unit vectors (direction only)
unit_1 = review_1 / np.linalg.norm(review_1)
unit_2 = review_2 / np.linalg.norm(review_2)

print(f"Unit 1: {unit_1.round(3)}")
print(f"Unit 2: {unit_2.round(3)}")
print(f"Lengths: {np.linalg.norm(unit_1):.3f}, {np.linalg.norm(unit_2):.3f}")
# Both have length 1.0!
```

**Key insight**: Normalization removes the "enthusiasm" factor and compares only the *pattern* of ratings.
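To make "compares only the pattern" concrete, here is a small sketch that reuses the two review vectors from above. Once both are unit vectors, their dot product measures how well the rating patterns line up, and rescaling either reviewer's scores does not change it. The helper name `to_unit` is just for illustration.

```python
import numpy as np

review_1 = np.array([5, 4, 5, 3, 4])  # Enthusiastic reviewer
review_2 = np.array([2, 1, 2, 1, 1])  # Reserved reviewer

def to_unit(v):
    """Return v scaled to length 1."""
    return v / np.linalg.norm(v)

# Dot product of unit vectors: 1.0 would mean identical rating patterns
pattern_match = np.dot(to_unit(review_1), to_unit(review_2))
print(f"Pattern match: {pattern_match:.3f}")  # 0.980

# Doubling every score of the reserved reviewer leaves the match unchanged
doubled_match = np.dot(to_unit(review_1), to_unit(2 * review_2))
print(np.isclose(pattern_match, doubled_match))  # True
```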

### Vector Subtraction: Finding the Difference

$$
\mathbf{a} - \mathbf{b} = \begin{bmatrix}a_1 - b_1\\a_2 - b_2\\a_3 - b_3\end{bmatrix}
$$

**Real Example**: What changed between two time periods?

```python theme={null}
# Monthly sales: [Product A, Product B, Product C]
january = np.array([1000, 500, 750])
february = np.array([1200, 450, 800])

# Change from January to February
change = february - january
print(f"Change: {change}")  # [200, -50, 50]
# Product A: +200, Product B: -50, Product C: +50
```

### Practice: Vector Arithmetic

Let's combine these operations:

```python theme={null}
import numpy as np

# Portfolio weights: [stocks, bonds, real_estate]
portfolio = np.array([0.6, 0.3, 0.1])

# Expected returns for each asset class
returns = np.array([0.10, 0.04, 0.06])  # 10%, 4%, 6%

# Weighted average return (dot product preview!)
portfolio_return = np.sum(portfolio * returns)
print(f"Expected portfolio return: {portfolio_return:.2%}")  # 7.80%

# Rebalance: shift 10% from stocks to bonds
shift = np.array([-0.1, 0.1, 0])
new_portfolio = portfolio + shift
print(f"Rebalanced: {new_portfolio}")  # [0.5, 0.4, 0.1]
```

***

## Step 2: Measure How Similar Two Apartments Are

Now the real question: **Given two apartments as vectors, how do we measure their similarity?**

<img src="https://mintcdn.com/devweeekends/1GcDwVN8SzYRbJg1/images/courses/math-for-ml-linear-algebra/apartment-similarity-space.svg?fit=max&auto=format&n=1GcDwVN8SzYRbJg1&q=85&s=608036582e9b92803ea2738ec4af3ce0" alt="Apartment Similarity Space" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/apartment-similarity-space.svg" />

### Attempt 1: Just Subtract (Doesn't Work Well)

Your first instinct might be to subtract the numbers:

```python theme={null}
apartment_A = [2, 1200, 2400, 15]
apartment_B = [2, 1100, 2300, 18]

# Subtract feature by feature
difference = [a - b for a, b in zip(apartment_A, apartment_B)]
print(difference)  # [0, 100, 100, -3]
```

But what does `[0, 100, 100, -3]` mean? The numbers have different units (bedrooms vs sqft vs dollars vs minutes). We can't just add them.

### Attempt 2: Euclidean Distance (Works, But Has Issues)

We could calculate the "distance" between apartments in 4D space:

```python theme={null}
import numpy as np

def distance(a, b):
    """Euclidean distance between two vectors."""
    return np.sqrt(sum((a[i] - b[i])**2 for i in range(len(a))))

apartment_A = np.array([2, 1200, 2400, 15])
apartment_B = np.array([2, 1100, 2300, 18])
apartment_C = np.array([4, 2500, 4500, 45])

print(f"A vs B: {distance(apartment_A, apartment_B):.0f}")  # 141
print(f"A vs C: {distance(apartment_A, apartment_C):.0f}")  # 2470
```

B is much closer to A than C is. Good!

**But there's a problem**: the sqft and rent numbers are huge (1000s) while bedrooms and commute are small (single digits). The big numbers dominate everything.
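To see how lopsided this is, the sketch below breaks the squared A-to-B distance into per-feature shares, reusing the apartment vectors from above. The names `squared_diffs` and `shares` are only for illustration.

```python
import numpy as np

apartment_A = np.array([2, 1200, 2400, 15])
apartment_B = np.array([2, 1100, 2300, 18])
features = ["beds", "sqft", "rent", "commute"]

# Each feature's contribution to the squared Euclidean distance
squared_diffs = (apartment_A - apartment_B) ** 2
shares = squared_diffs / squared_diffs.sum()

for name, share in zip(features, shares):
    print(f"{name:>8}: {share:.2%}")
```

Sqft and rent together account for more than 99.9% of the squared distance, so the 3-minute commute difference barely registers.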

### Attempt 3: Normalize First, Then Compare

The fix: **scale all features to the same range** (usually 0 to 1):

```python theme={null}
def normalize(apartments):
    """Scale each feature to 0-1 range."""
    apartments = np.array(apartments)
    mins = apartments.min(axis=0)
    maxs = apartments.max(axis=0)
    return (apartments - mins) / (maxs - mins)

# Original apartments
apartments = [
    [2, 1200, 2400, 15],   # A
    [2, 1100, 2300, 18],   # B
    [4, 2500, 4500, 45],   # C
    [1, 800, 1900, 10],    # D
]

# After normalization (all values between 0 and 1)
normalized = normalize(apartments)
print(normalized)
# A: [0.33, 0.24, 0.19, 0.14]
# B: [0.33, 0.18, 0.15, 0.23]
# C: [1.00, 1.00, 1.00, 1.00]
# D: [0.00, 0.00, 0.00, 0.00]
```

Now all features are on equal footing. A difference of 0.1 in bedrooms matters as much as 0.1 in rent.
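With the features scaled, the Euclidean distance from Attempt 2 can be recomputed. A minimal sketch, repeating the min-max scaling from above so that it runs on its own:

```python
import numpy as np

apartments = np.array([
    [2, 1200, 2400, 15],   # A
    [2, 1100, 2300, 18],   # B
    [4, 2500, 4500, 45],   # C
    [1, 800, 1900, 10],    # D
])

# Min-max scale each column to the 0-1 range
mins, maxs = apartments.min(axis=0), apartments.max(axis=0)
scaled = (apartments - mins) / (maxs - mins)

# Distance from A (row 0) to every apartment, in scaled feature space
distances = np.linalg.norm(scaled - scaled[0], axis=1)
for name, d in zip("ABCD", distances):
    print(f"A vs {name}: {d:.2f}")
```

B remains the closest match, followed by D, with C the farthest. The difference is that bedrooms and commute now count on the same footing as sqft and rent.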

***

## Step 3: The Dot Product — Measuring Alignment

There's an even better way to measure similarity: the **dot product**.

### Mathematical Definition

The dot product (also called inner product or scalar product) of two vectors:

$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n
$$

**What it does**: Multiply corresponding numbers and add them up.

```python theme={null}
def dot_product(a, b):
    """Multiply corresponding elements and sum."""
    return sum(a[i] * b[i] for i in range(len(a)))

# Or simply: np.dot(a, b)
```

**Example:**

```python theme={null}
a = [1, 2, 3]
b = [4, 5, 6]

dot = 1*4 + 2*5 + 3*6
#   = 4 + 10 + 18
#   = 32
print(dot)  # 32
```

### Geometric Interpretation

The dot product has a beautiful geometric meaning:

$$
\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)
$$

Where $\theta$ is the angle between the vectors!
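Where does this formula come from? One standard derivation, sketched here for reference, combines the law of cosines with the algebraic definition. Consider the triangle whose sides are $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{a} - \mathbf{b}$:

$$
\begin{aligned}
\|\mathbf{a} - \mathbf{b}\|^2 &= \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta) && \text{(law of cosines)} \\
\|\mathbf{a} - \mathbf{b}\|^2 &= (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a} \cdot \mathbf{b} && \text{(expand the dot product)}
\end{aligned}
$$

Setting the two right-hand sides equal and cancelling the shared terms gives $\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)$.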

<Frame>
  <img src="https://mintcdn.com/devweeekends/1GcDwVN8SzYRbJg1/images/courses/math-for-ml-linear-algebra/dot-product-intuition.svg?fit=max&auto=format&n=1GcDwVN8SzYRbJg1&q=85&s=9153b58ee095edb828bf4fd24ee9a463" alt="Dot Product and Cosine Similarity Geometric Intuition" width="850" height="450" data-path="images/courses/math-for-ml-linear-algebra/dot-product-intuition.svg" />
</Frame>

<Tip>
  **🎮 Interactive Visualization**: Try the code below to see how the dot product changes as you rotate vectors!

  ```python theme={null}
  import numpy as np
  import matplotlib.pyplot as plt
  from ipywidgets import interact, FloatSlider

  def visualize_dot_product(angle_degrees=45):
      # Fixed vector a
      a = np.array([1, 0])
      
      # Vector b at specified angle
      angle_rad = np.radians(angle_degrees)
      b = np.array([np.cos(angle_rad), np.sin(angle_rad)])
      
      # Calculate dot product
      dot = np.dot(a, b)
      
      # Plot
      plt.figure(figsize=(8, 6))
      plt.quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1, color='blue', label='Vector a')
      plt.quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1, color='red', label='Vector b')
      plt.xlim(-1.5, 1.5)
      plt.ylim(-1.5, 1.5)
      plt.grid(True, alpha=0.3)
      plt.axhline(y=0, color='k', linewidth=0.5)
      plt.axvline(x=0, color='k', linewidth=0.5)
      plt.title(f'Angle: {angle_degrees}° | Dot Product: {dot:.3f} | cos({angle_degrees}°) = {np.cos(angle_rad):.3f}')
      plt.legend()
      plt.axis('equal')
      plt.show()

  # Interactive slider - run in Jupyter!
  # interact(visualize_dot_product, angle_degrees=FloatSlider(min=0, max=360, step=5, value=45))
  ```
</Tip>

**What this tells us:**

* $\theta = 0°$ (same direction): $\cos(0°) = 1$ → Maximum positive dot product
* $\theta = 90°$ (perpendicular): $\cos(90°) = 0$ → Dot product is zero
* $\theta = 180°$ (opposite): $\cos(180°) = -1$ → Maximum negative dot product

```python theme={null}
import numpy as np

# Same direction
a = np.array([3, 0])
b = np.array([5, 0])
print(f"Same direction: {np.dot(a, b)}")  # 15 (positive)

# Perpendicular (90°)
a = np.array([3, 0])
b = np.array([0, 4])
print(f"Perpendicular: {np.dot(a, b)}")   # 0

# Opposite direction
a = np.array([3, 0])
b = np.array([-2, 0])
print(f"Opposite: {np.dot(a, b)}")        # -6 (negative)

# Verify the angle formula
a = np.array([1, 0])
b = np.array([1, 1])  # 45 degrees
angle = np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Angle between: {np.degrees(angle):.1f}°")  # 45.0°
```

### The Dot Product in Action

**Why does this measure similarity?**

Think about it intuitively:

* If both apartments are **high in the same features** (both large, both expensive), the products are large → high dot product
* If one is high where the other is low, products are small → low dot product
* Apartments that are "aligned" (similar profile) have high dot products

***

## Step 4: Cosine Similarity — The Industry Standard

The dot product has one problem: **bigger vectors give bigger numbers** regardless of similarity.

**Cosine similarity** fixes this by normalizing:

$$
\text{similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}
$$

This gives a number between -1 and 1:

* **1.0** = identical direction (very similar)
* **0.0** = perpendicular (unrelated)
* **-1.0** = opposite direction (opposites)

```python theme={null}
def cosine_similarity(a, b):
    """Similarity based on angle, not magnitude."""
    dot = np.dot(a, b)
    magnitude_a = np.sqrt(np.dot(a, a))  # length of a
    magnitude_b = np.sqrt(np.dot(b, b))  # length of b
    return dot / (magnitude_a * magnitude_b)
```

**Let's test it on our apartments:**

```python theme={null}
import numpy as np

apartments = {
    'A (my favorite)': np.array([2, 1200, 2400, 15]),
    'B (similar)': np.array([2, 1100, 2300, 18]),
    'C (luxury)': np.array([4, 2500, 4500, 45]),
    'D (studio)': np.array([1, 800, 1900, 10]),
}

my_apt = apartments['A (my favorite)']

print("Similarity to my apartment:")
for name, apt in apartments.items():
    sim = cosine_similarity(my_apt, apt)
    print(f"  {name}: {sim:.3f}")
```

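**Output:**

```
Similarity to my apartment:
  A (my favorite): 1.000  ← identical to itself
  B (similar): 1.000      ← very similar! (0.9998 before rounding)
  C (luxury): 0.999       ← surprisingly similar (same "shape", just bigger)
  D (studio): 0.998       ← also similar (same "shape", just smaller)
```

**Wait, why is C so similar?** Because cosine similarity measures **direction**, not **magnitude**. C is roughly a "scaled up" version of A: similar proportions, just bigger numbers.

This can be useful: it finds apartments with the **same profile** (ratio of bedrooms to sqft to rent), regardless of absolute size. The flip side is that on raw features every score here lands above 0.99, because sqft and rent are so much larger than the other two features that they set the direction almost on their own. That is why the finder in the next section scales the features first.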
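The "direction, not magnitude" property is easy to check directly. In this sketch, `bigger` is a made-up listing that is exactly 2.5 times apartment A in every feature:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

apartment_A = np.array([2, 1200, 2400, 15])
bigger = 2.5 * apartment_A  # hypothetical listing: same proportions, 2.5x the size

print(np.dot(apartment_A, apartment_A))  # 7200229
print(np.dot(apartment_A, bigger))       # 18000572.5 (dot product grows with size)
print(f"{cosine_similarity(apartment_A, bigger):.3f}")  # 1.000 (same direction)
```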

***

## Real-World Application: Build a "Similar Apartments" Finder

Let's build a working system:

```python theme={null}
import numpy as np

class ApartmentFinder:
    def __init__(self, apartments):
        """
        apartments: dict of {name: [beds, sqft, rent, commute]}
        """
        self.names = list(apartments.keys())
        self.vectors = np.array(list(apartments.values()))
        
        # Normalize for fair comparison
        self.normalized = self._normalize(self.vectors)
    
    def _normalize(self, data):
        mins = data.min(axis=0)
        maxs = data.max(axis=0)
        return (data - mins) / (maxs - mins + 1e-8)  # avoid division by zero
    
    def find_similar(self, query, top_k=3):
        """Find top_k most similar apartments to query."""
        # Normalize the query
        query_norm = (np.array(query) - self.vectors.min(axis=0)) / \
                     (self.vectors.max(axis=0) - self.vectors.min(axis=0) + 1e-8)
        
        # Calculate similarity to all apartments
        similarities = []
        for i, apt in enumerate(self.normalized):
            sim = np.dot(query_norm, apt) / \
                  (np.linalg.norm(query_norm) * np.linalg.norm(apt) + 1e-8)
            similarities.append((self.names[i], sim, self.vectors[i]))
        
        # Sort by similarity (highest first)
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        return similarities[:top_k]

# Database of apartments
listings = {
    'Downtown Loft': [1, 900, 2800, 5],
    'Suburban House': [4, 2200, 3200, 35],
    'Cozy Studio': [0, 500, 1500, 20],
    'Modern 2BR': [2, 1100, 2400, 15],
    'Family Home': [3, 1800, 2900, 25],
    'Luxury Penthouse': [2, 1500, 5500, 10],
    'Budget 1BR': [1, 700, 1800, 30],
    'Midtown 2BR': [2, 1050, 2350, 12],
}

finder = ApartmentFinder(listings)

# What I'm looking for
my_ideal = [2, 1200, 2400, 15]

print("Your search: 2BR, 1200sqft, $2400, 15min commute")
print("\nMost similar apartments:")
for name, similarity, features in finder.find_similar(my_ideal):
    print(f"  {similarity:.2f} - {name}: {int(features[0])}BR, "
          f"{int(features[1])}sqft, ${int(features[2])}, {int(features[3])}min")
```

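**Output:**

```
Your search: 2BR, 1200sqft, $2400, 15min commute

Most similar apartments:
  1.00 - Modern 2BR: 2BR, 1100sqft, $2400, 15min
  0.99 - Family Home: 3BR, 1800sqft, $2900, 25min
  0.99 - Midtown 2BR: 2BR, 1050sqft, $2350, 12min
```

Family Home edges out Midtown 2BR because cosine similarity compares the *proportions* of the scaled features, not how close the raw numbers are. Family Home is larger than the query in every scaled feature by a roughly similar factor, so it points in almost the same direction.

**You just built a simplified version of a "Similar Homes" feature!**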
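Cosine similarity ranks listings by the shape of their profile. If closeness in absolute terms matters more, one alternative is to rank by Euclidean distance on the same min-max scaled features. The sketch below is an assumption about how that could look, not part of `ApartmentFinder`. The function name `rank_by_distance` is hypothetical, and the listings are repeated so the block runs on its own.

```python
import numpy as np

listings = {
    'Downtown Loft': [1, 900, 2800, 5],
    'Suburban House': [4, 2200, 3200, 35],
    'Cozy Studio': [0, 500, 1500, 20],
    'Modern 2BR': [2, 1100, 2400, 15],
    'Family Home': [3, 1800, 2900, 25],
    'Luxury Penthouse': [2, 1500, 5500, 10],
    'Budget 1BR': [1, 700, 1800, 30],
    'Midtown 2BR': [2, 1050, 2350, 12],
}

def rank_by_distance(listings, query, top_k=3):
    """Return the top_k (name, distance) pairs, closest first."""
    names = list(listings)
    data = np.array(list(listings.values()), dtype=float)
    mins = data.min(axis=0)
    spans = data.max(axis=0) - mins
    # Same min-max scaling as the finder, applied to listings and query alike
    scaled = (data - mins) / (spans + 1e-8)
    query_scaled = (np.array(query, dtype=float) - mins) / (spans + 1e-8)
    distances = np.linalg.norm(scaled - query_scaled, axis=1)
    order = np.argsort(distances)[:top_k]
    return [(names[i], float(distances[i])) for i in order]

for name, dist in rank_by_distance(listings, [2, 1200, 2400, 15]):
    print(f"  {dist:.2f} - {name}")
```

Which metric fits better depends on whether "similar" should mean the same proportions or the same absolute numbers.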

***

## Now Let's Connect This to Machine Learning

Everything we just learned about apartments applies directly to ML. The concepts are identical — only the application changes.

### Pattern: Real World → Vector → Similarity

| Real World | Vector Representation                | What Similarity Finds |
| ---------- | ------------------------------------ | --------------------- |
| Apartments | \[beds, sqft, rent, commute]         | Similar listings      |
| Songs      | \[energy, tempo, danceability, mood] | Songs you'll like     |
| Movies     | \[action, romance, comedy, rating]   | Movies to recommend   |
| Customers  | \[age, income, purchases, visits]    | Customer segments     |
| Images     | \[pixel1, pixel2, ..., pixel1000000] | Similar images        |
| Words      | \[dimension1, ..., dimension300]     | Related words         |

**The math is identical.** Once something is a vector, you can find similar items using dot products and cosine similarity.

### Example: How Spotify Actually Works

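Remember our apartment finder? The same idea carries over to songs. Music services such as Spotify describe tracks with numeric audio features, and similarity between those vectors is one ingredient of their recommendations: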

```python theme={null}
# Spotify-style audio features (simplified)
songs = {
    'Blinding Lights': [0.73, 0.51, 135, 0.00, 0.32],  # [energy, dance, tempo, acoustic, happy]
    'Levitating': [0.69, 0.70, 103, 0.03, 0.91],
    'Someone Like You': [0.34, 0.50, 67, 0.75, 0.14],
    'Uptown Funk': [0.93, 0.89, 115, 0.00, 0.97],
    'Hello': [0.40, 0.48, 79, 0.73, 0.25],
}

# Reusing our same logic!
# SongFinder is a hypothetical class (not defined on this page): the same
# algorithm as ApartmentFinder, except the query is looked up by song name
finder = SongFinder(songs)

# You just listened to Blinding Lights
print(finder.find_similar('Blinding Lights', top_k=2))
# Expected neighbors: Levitating and Uptown Funk (similar energy/dance profiles)
```

**Vector similarity, the concept you just learned with apartments, is a core building block of music recommendation systems.**
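`SongFinder` is not defined in the snippet above. Here is a minimal sketch of what such a class could look like, assuming it min-max scales each feature (tempo is in BPM, the rest are 0-1) and looks up the query song by name. The class and its method signature are hypothetical and are not Spotify's implementation.

```python
import numpy as np

class SongFinder:
    """Hypothetical counterpart to ApartmentFinder, queried by song name."""

    def __init__(self, songs):
        self.names = list(songs)
        data = np.array(list(songs.values()), dtype=float)
        mins, maxs = data.min(axis=0), data.max(axis=0)
        # Scale every feature to 0-1 so tempo (BPM) does not dominate
        self.scaled = (data - mins) / (maxs - mins + 1e-8)

    def find_similar(self, name, top_k=2):
        """Return the top_k (song, cosine similarity) pairs, excluding the query."""
        query = self.scaled[self.names.index(name)]
        norms = np.linalg.norm(self.scaled, axis=1) * np.linalg.norm(query)
        sims = self.scaled @ query / (norms + 1e-8)
        ranked = sorted(zip(self.names, sims), key=lambda pair: pair[1], reverse=True)
        return [(n, float(s)) for n, s in ranked if n != name][:top_k]

songs = {
    'Blinding Lights': [0.73, 0.51, 135, 0.00, 0.32],  # [energy, dance, tempo, acoustic, happy]
    'Levitating': [0.69, 0.70, 103, 0.03, 0.91],
    'Someone Like You': [0.34, 0.50, 67, 0.75, 0.14],
    'Uptown Funk': [0.93, 0.89, 115, 0.00, 0.97],
    'Hello': [0.40, 0.48, 79, 0.73, 0.25],
}

finder = SongFinder(songs)
for song, sim in finder.find_similar('Blinding Lights', top_k=2):
    print(f"{song}: {sim:.2f}")
```

Without the scaling step, tempo would swamp the other features and every pair of songs would score close to 1.0.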

***

## How This Applies to Neural Networks

Now let's take the final step. In neural networks, **everything is vectors**, and **everything is similarity and transformation**.

### What a Neural Network Does (Simplified)

1. **Input**: Convert your data to a vector (image → pixels, text → numbers)
2. **Layers**: Transform the vector through matrix multiplications (we'll learn this next!)
3. **Output**: Compare the final vector to known categories using... similarity

```python theme={null}
# Simplified: How image classification works
image_vector = [0.1, 0.8, 0.3, ...]  # 1000s of numbers from pixels

# The network transforms this to a "meaning" vector
# (neural_network is a placeholder for a trained model, not a real function)
meaning_vector = neural_network(image_vector)  # Let's say [0.9, 0.1, 0.05]

# Compare to category vectors
cat_vector = [1.0, 0.0, 0.0]  # What a "cat" looks like in meaning-space
dog_vector = [0.0, 1.0, 0.0]  # What a "dog" looks like

# Which is more similar?
cat_similarity = cosine_similarity(meaning_vector, cat_vector)  # ≈ 0.99
dog_similarity = cosine_similarity(meaning_vector, dog_vector)  # ≈ 0.11

# Prediction: It's a cat! (higher similarity)
```

**The core operation — vector similarity — is exactly what you learned with apartments.**
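The classification snippet above leaves `neural_network` undefined. The comparison step on its own can be run as follows, taking the example output `[0.9, 0.1, 0.05]` as given. The `bird` category is an extra made-up class added for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Pretend this came out of a trained network
meaning_vector = np.array([0.9, 0.1, 0.05])

# One reference vector per category in "meaning-space"
categories = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
    "bird": np.array([0.0, 0.0, 1.0]),  # hypothetical third class
}

# Score every category, then pick the one with the highest similarity
scores = {name: cosine_similarity(meaning_vector, vec) for name, vec in categories.items()}
prediction = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"Prediction: {prediction}")  # cat
```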

```python theme={null}
def cosine_similarity(a, b):
    """
    Returns a value between -1 and 1:
    - 1.0 = identical direction (very similar)
    - 0.0 = perpendicular (unrelated)  
    - -1.0 = opposite direction (very different)
    """
    dot = np.dot(a, b)
    magnitude_a = np.linalg.norm(a)  # length of a
    magnitude_b = np.linalg.norm(b)  # length of b
    return dot / (magnitude_a * magnitude_b)
```

**Now let's use it on our songs:**

```python theme={null}
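# Using our song vectors from earlier. Tempo is in BPM while the other features
# are 0-1, so scale every feature to 0-1 first (as in Step 2); otherwise tempo
# dominates and every pair scores ≈ 1.0
names = list(songs)
X = np.array(list(songs.values()))
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled = dict(zip(names, X))

sim_blinding_levitating = cosine_similarity(scaled['Blinding Lights'], scaled['Levitating'])
sim_blinding_adele = cosine_similarity(scaled['Blinding Lights'], scaled['Someone Like You'])

print(f"Blinding Lights vs Levitating: {sim_blinding_levitating:.3f}")
print(f"Blinding Lights vs Someone Like You: {sim_blinding_adele:.3f}")

# Output:
# Blinding Lights vs Levitating: 0.713  (similar - both upbeat pop)
# Blinding Lights vs Someone Like You: 0.003  (very different vibes)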
```

**That's the core idea behind similarity-based recommendations:** find songs with the highest cosine similarity to what you just played.

***

## Vector Operations: The Building Blocks

Now that we can represent houses as vectors, what can we do with them?

### 1. Vector Addition: Combining Features

**The Question**: What if we want to combine two house profiles?

<img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/math-for-ml-linear-algebra/vector-addition.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=7a6cf64b3419789032f0a0f08b90dfb7" alt="Vector Addition" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/vector-addition.svg" />

**Geometric Intuition**: Place vectors tip-to-tail. The result is the diagonal.

**Algebraic Definition**: Add corresponding components.

```python theme={null}
# Two house feature vectors
house_1 = np.array([3, 2000, 15, 5])
house_2 = np.array([2, 1500, 10, 3])

# Average house in the neighborhood
average_house = (house_1 + house_2) / 2
print(average_house)  # [2.5, 1750, 12.5, 4]
```

**Why This Matters**:

* **Feature engineering**: Combine features to create new ones
* **Gradient descent**: Update model parameters by adding gradients
* **Ensemble methods**: Average predictions from multiple models

**Real-World Example**: User preferences

```python theme={null}
# User's historical preferences
past_prefs = np.array([0.8, 0.2, 0.5])  # [action, comedy, drama]

# Recent viewing behavior
recent = np.array([0.1, 0.3, 0.2])

# Updated preferences (weighted sum)
new_prefs = 0.7 * past_prefs + 0.3 * recent
print(new_prefs)  # [0.59, 0.23, 0.41]
```

***

### 2. Scalar Multiplication: Scaling Features

**The Question**: What if all house prices in a neighborhood increase by 20%?

<img src="https://mintcdn.com/devweeekends/1cs3K7TO-w20cKuc/images/courses/math-for-ml-linear-algebra/scalar-multiplication.svg?fit=max&auto=format&n=1cs3K7TO-w20cKuc&q=85&s=b95a94744c14a93233426f95dcfac482" alt="Scalar Multiplication" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/scalar-multiplication.svg" />

**Geometric Intuition**: Stretch or shrink the vector. Direction stays the same for a positive scalar and reverses for a negative one.

**Algebraic Definition**: Multiply each component by a number (scalar).

```python theme={null}
house = np.array([3, 2000, 15, 5])

# Scale by 1.2 (20% increase)
scaled_house = 1.2 * house
print(scaled_house)  # [3.6, 2400, 18, 6]
```

**Why This Matters**:

* **Normalization**: Scale features to same range
* **Learning rate**: Control how much to update parameters
* **Feature weighting**: Emphasize important features

**ML Application**: Gradient descent

```python theme={null}
# Current model parameters
weights = np.array([50000, 120, -5000, -8000])

# Gradient (direction to improve)
gradient = np.array([100, 0.5, -20, -30])

# Learning rate (how far to move)
learning_rate = 0.01

# Update parameters
weights = weights - learning_rate * gradient
#                    ↑ scalar multiplication!
```

**Key Insight**: The learning rate controls the step size. Too large → overshoot. Too small → slow learning.
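The effect of the step size can be seen on a toy problem. This sketch minimizes $f(\mathbf{w}) = \|\mathbf{w}\|^2$, whose gradient is $2\mathbf{w}$. The starting point, the three learning rates, and the helper name `run_gradient_descent` are arbitrary choices for illustration.

```python
import numpy as np

def run_gradient_descent(learning_rate, steps=20):
    """Minimize f(w) = ||w||^2 from a fixed start; return the final ||w||."""
    w = np.array([5.0, -3.0])
    for _ in range(steps):
        gradient = 2 * w                   # gradient of ||w||^2
        w = w - learning_rate * gradient   # scalar multiplication, then subtraction
    return np.linalg.norm(w)

for lr in (0.001, 0.1, 1.1):
    print(f"learning rate {lr}: distance from minimum = {run_gradient_descent(lr):.4f}")
```

With these settings the smallest rate barely moves from the start, the middle one gets close to the minimum at zero, and the largest one overshoots further on every step.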

***

### 3. Dot Product: Measuring Similarity

**The Big Question**: How do we measure if two things are similar?

This is THE most important operation in machine learning! Let's see why through three examples.

<img src="https://mintcdn.com/devweeekends/1GcDwVN8SzYRbJg1/images/courses/math-for-ml-linear-algebra/dot-product-houses.svg?fit=max&auto=format&n=1GcDwVN8SzYRbJg1&q=85&s=d5d988776eeaa74d17a77d689aa49120" alt="Dot Product with Houses" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/dot-product-houses.svg" />

**Algebraic Definition**: Multiply corresponding components and sum.

**Mathematical Formula**:

$$
\mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^{n} v_i w_i = v_1w_1 + v_2w_2 + \ldots + v_nw_n
$$

**Alternative Formula** (geometric):

$$
\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\| \|\mathbf{w}\| \cos(\theta)
$$

Where $\theta$ is the angle between vectors.

***

### Example 1: Comparing Houses

```python theme={null}
house_1 = np.array([3, 2000, 10, 3])  # Suburban family home
house_2 = np.array([4, 2200, 8, 2])   # Similar house
house_3 = np.array([1, 800, 50, 20])  # Old studio apartment

# Compute dot products
sim_1_2 = np.dot(house_1, house_2)
sim_1_3 = np.dot(house_1, house_3)

print(f"House 1 · House 2 = {sim_1_2}")  # 4,400,098 (large, positive)
print(f"House 1 · House 3 = {sim_1_3}")  # 1,600,563 (much smaller)
```

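**Interpretation**:

* Larger dot product = more similar houses (in this comparison)
* Smaller dot product = less similar houses
* **Why?** When both houses are high in the same features, the products are large. Note that the sqft term (2000 × 2200) supplies almost all of the total here, so in practice features are scaled to comparable ranges first (as in Step 2).

**Real application**: Listing sites can use this kind of comparison to power "similar homes" features.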

***

### Example 2: Matching Students for Study Groups

```python theme={null}
# Student profiles: [math_score, reading_score, science_score, study_hours]
alice = np.array([85, 92, 78, 12])    # Strong in reading
bob = np.array([95, 75, 88, 15])      # Strong in math
charlie = np.array([87, 90, 80, 13])  # Similar to Alice

# Who should Alice study with?
alice_bob = np.dot(alice, bob)
alice_charlie = np.dot(alice, charlie)

print(f"Alice · Bob = {alice_bob}")        # 22,019
print(f"Alice · Charlie = {alice_charlie}") # 22,071 (slightly higher)
```

**Interpretation**: Alice and Charlie have more similar learning patterns!

**Why this matters**:

* Form effective study groups (similar students help each other)
* Pair struggling students with successful ones who had similar challenges
* Predict who will benefit from group work

**Real application**: Educational platforms can use this kind of score for peer matching.

***

### Example 3: Movie Recommendations

```python theme={null}
# Movie features: [rating, runtime, year, action, romance, comedy]
inception = np.array([8.8, 148, 2010, 0.9, 0.1, 0.3])
interstellar = np.array([8.6, 169, 2014, 0.7, 0.2, 0.2])
titanic = np.array([7.9, 195, 1997, 0.3, 0.9, 0.2])

# You just watched Inception. What should Netflix recommend?
inception_interstellar = np.dot(inception, interstellar)
inception_titanic = np.dot(inception, titanic)

print(f"Inception · Interstellar = {inception_interstellar:,.2f}")  # 4,073,228.39
print(f"Inception · Titanic = {inception_titanic:,.2f}")            # 4,042,899.94
```

**Recommendation**: Watch Interstellar! (Higher similarity)

**Why it works**: Both are:

* High-rated sci-fi films
* Similar runtime
* Recent releases
* Action-heavy with minimal romance

**Real application**: Similarity scores like this are a building block of recommendation systems at services such as Netflix, Spotify, and YouTube.
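One caveat is worth checking numerically. In the raw movie vectors above, `year` is in the thousands while the genre scores are below 1, so a per-feature breakdown of the dot product shows which features actually drive the result. This is a sketch reusing the same numbers; the names `terms` and `genre` are only for illustration.

```python
import numpy as np

features = ["rating", "runtime", "year", "action", "romance", "comedy"]
inception = np.array([8.8, 148, 2010, 0.9, 0.1, 0.3])
interstellar = np.array([8.6, 169, 2014, 0.7, 0.2, 0.2])
titanic = np.array([7.9, 195, 1997, 0.3, 0.9, 0.2])

# Element-wise products are the individual terms of the dot product
terms = inception * interstellar
for name, share in zip(features, terms / terms.sum()):
    print(f"{name:>8}: {share:.4%}")

# Comparing only the genre scores (last three features) isolates taste
genre = slice(3, 6)
print(np.dot(inception[genre], interstellar[genre]))  # ≈ 0.71
print(np.dot(inception[genre], titanic[genre]))       # ≈ 0.42
```

The `year` term alone makes up more than 99% of the total, so features like these are usually scaled before comparing. On the genre scores alone, Interstellar still comes out ahead of Titanic.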

***

### Understanding the Dot Product Geometrically

**Key Insights**:

```python theme={null}
# Parallel vectors (same direction) → large positive dot product
v1 = np.array([2, 0])
v2 = np.array([3, 0])
print(np.dot(v1, v2))  # 6 (positive, large)

# Perpendicular vectors (90°) → dot product = 0
v3 = np.array([1, 0])
v4 = np.array([0, 1])
print(np.dot(v3, v4))  # 0 (orthogonal = independent!)

# Opposite vectors (180°) → negative dot product
v5 = np.array([1, 1])
v6 = np.array([-1, -1])
print(np.dot(v5, v6))  # -2 (opposite)
```

**What this means**:

* **Positive dot product**: Vectors point in similar directions (similar items)
* **Zero dot product**: Vectors are perpendicular (unrelated items)
* **Negative dot product**: Vectors point in opposite directions (opposite items)

***

### Why Dot Product is Everywhere in ML

**1. Neural Networks**: Every layer computes dot products!

```python theme={null}
# A neuron computes: output = weights · inputs + bias
weights = np.array([0.5, -0.3, 0.8])
inputs = np.array([1.0, 2.0, 3.0])

output = np.dot(weights, inputs) + 0.1
# = (0.5×1.0) + (-0.3×2.0) + (0.8×3.0) + 0.1
# = 0.5 - 0.6 + 2.4 + 0.1 = 2.4
```

**2. Similarity Search**: Find similar items

```python theme={null}
# Find products similar to what user just bought
user_purchase = np.array([1, 0, 1, 0, 1])  # Product features
all_products = np.array([
    [1, 0, 1, 1, 0],  # Product A
    [1, 0, 1, 0, 1],  # Product B (identical!)
    [0, 1, 0, 1, 0],  # Product C (different)
])

similarities = [np.dot(user_purchase, product) for product in all_products]
print(similarities)  # [2, 3, 0] → Recommend Product B!
```

**3. Attention Mechanisms**: How transformers (GPT, BERT) work

```python theme={null}
# Simplified: How much should we "attend" to each word?
query = np.array([0.8, 0.2, 0.5])  # Current word
key1 = np.array([0.9, 0.1, 0.4])   # Word 1
key2 = np.array([0.2, 0.8, 0.1])   # Word 2

attention_1 = np.dot(query, key1)  # 0.94 (high attention!)
attention_2 = np.dot(query, key2)  # 0.37 (low attention)
```
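In a transformer, raw scores like these are then turned into weights that sum to 1. A minimal sketch of that step, using the standard scaled dot-product recipe (divide by the square root of the key dimension, then apply softmax). The variable names are illustrative.

```python
import numpy as np

query = np.array([0.8, 0.2, 0.5])   # Current word
keys = np.array([
    [0.9, 0.1, 0.4],                # Word 1
    [0.2, 0.8, 0.1],                # Word 2
])

# Dot product of the query with every key, scaled by sqrt(dimension)
scores = keys @ query / np.sqrt(query.shape[0])

# Softmax: exponentiate and normalize so the weights sum to 1
exp_scores = np.exp(scores - scores.max())  # subtract the max for numerical stability
weights = exp_scores / exp_scores.sum()

print(weights.round(3))  # Word 1 gets the larger weight
```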

***

### 4. Vector Magnitude: Measuring "Size"

**The Question**: How "big" is a house (in feature space)?

**Geometric Intuition**: The length of the arrow.

**Algebraic Definition**: Square root of dot product with itself.

```python theme={null}
house = np.array([3, 2000, 15, 5])

magnitude = np.linalg.norm(house)
# = sqrt(3² + 2000² + 15² + 5²)
# = sqrt(9 + 4,000,000 + 225 + 25)
# = sqrt(4,000,259)
# ≈ 2000.06
```

**Mathematical Formula**:

$$
\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}
$$

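**Why This Matters**: Normalization!

```python theme={null}
# Problem: magnitude depends on scale (this vector has length ≈ 2000)
house = np.array([3, 2000, 15, 5])

# Solution: Normalize to unit length (keeps direction, drops scale)
normalized = house / np.linalg.norm(house)
print(normalized)  # ≈ [0.0015, 1.0000, 0.0075, 0.0025]
print(np.linalg.norm(normalized))  # 1.0 (unit vector)
```

**Key Insight**: After normalization, every vector has length 1, so vectors can be compared by direction alone. Unit-length normalization does not rebalance the features, though: sqft still makes up almost the entire vector. To stop sqft from dominating, scale each feature separately (for example to a 0-1 range, as in Step 2).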

***

## Similarity Measures: Finding Similar Items

### Cosine Similarity: Direction-Based

**The Problem with Dot Product**: It's affected by magnitude!

```python theme={null}
# Two houses with same type, different size
small_house = np.array([2, 1000, 10, 3])
large_house = np.array([4, 2000, 20, 6])  # 2× small_house

# Dot product is very different
print(np.dot(small_house, small_house))  # 1,001,113
print(np.dot(large_house, large_house))  # 4,000,452 (4× larger!)
```

**The Solution**: Cosine similarity ignores magnitude, only cares about direction (type).

<img src="https://mintcdn.com/devweeekends/1GcDwVN8SzYRbJg1/images/courses/math-for-ml-linear-algebra/cosine-similarity.svg?fit=max&auto=format&n=1GcDwVN8SzYRbJg1&q=85&s=5881c084adf0e02c63889eceb1d208dc" alt="Cosine Similarity" width="1080" height="1080" data-path="images/courses/math-for-ml-linear-algebra/cosine-similarity.svg" />

**Formula**:

$$
\text{similarity}(\mathbf{v}, \mathbf{w}) = \frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\| \|\mathbf{w}\|} = \cos(\theta)
$$

**Range**: -1 (opposite) to +1 (identical direction)

```python theme={null}
def cosine_similarity(v, w):
    """Compute cosine similarity between two vectors."""
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
```
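The formula divides by both norms, so it is undefined when either vector is all zeros. One way to guard against that is sketched below; `safe_cosine_similarity` and its choice to return 0.0 for zero vectors are conventions assumed for this example, not part of the original function.

```python
import numpy as np

def safe_cosine_similarity(v, w, eps=1e-12):
    """Cosine similarity that returns 0.0 when either vector is (near) zero.

    Returning 0.0 for zero vectors is a convention chosen for this sketch.
    """
    norm_product = np.linalg.norm(v) * np.linalg.norm(w)
    if norm_product < eps:
        return 0.0
    return float(np.dot(v, w) / norm_product)

a = np.array([1.0, 2.0])
same_direction = safe_cosine_similarity(a, 3 * a)                # ≈ +1
opposite = safe_cosine_similarity(a, -a)                         # ≈ -1
perpendicular = safe_cosine_similarity(a, np.array([-2.0, 1.0])) # 0
zero_case = safe_cosine_similarity(a, np.zeros(2))               # 0.0 by convention

print(same_direction, opposite, perpendicular, zero_case)
```

The first three calls also exercise the ends and the middle of the range stated above.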

***

### Example 1: House Type Matching (Ignoring Size)

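```python theme={null}
# Find houses of similar TYPE, regardless of size
small_suburban = np.array([2, 1000, 10, 5])   # Small suburban
large_suburban = np.array([4, 2000, 20, 10])  # Large suburban (2× size)
urban_apartment = np.array([1, 800, 5, 1])    # Urban apartment

# Cosine similarity
print(f"Small vs Large suburban: {cosine_similarity(small_suburban, large_suburban):.3f}")  # 1.000!
print(f"Small suburban vs Urban: {cosine_similarity(small_suburban, urban_apartment):.3f}")  # 1.000 as well (sqft dominates)
```

**Key Insight**: The two suburban houses are identical in TYPE (cosine = 1.0), even though one is twice the size!

**Caveat**: On these raw features the urban apartment also scores ≈ 1.000 (0.99999 before rounding), because sqft is so much larger than every other feature that it sets the direction of all three vectors. Scale each feature first if you want cosine similarity to separate house types.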

**Why this matters**:

* A family looking for a suburban house doesn't care if it's 2000 or 4000 sqft
* They care about the TYPE: suburban, family-friendly, good schools
* Cosine similarity captures this!

**Real application**: A "similar homes" feature like Zillow's could use cosine similarity on scaled features to find homes of similar style, not just similar size.

***

### Example 2: Student Learning Style (Not Just Scores)

```python theme={null}
# Student profiles: [math, reading, science, study_hours]
alice = np.array([85, 92, 78, 12])      # Strong reader, moderate study
alice_2x = np.array([170, 184, 156, 24]) # Alice with 2× scores (impossible, but illustrative)
bob = np.array([95, 75, 88, 15])        # Strong in math

# Cosine similarity
print(f"Alice vs Alice_2x: {cosine_similarity(alice, alice_2x):.3f}")  # 1.000 (same learning style!)
print(f"Alice vs Bob: {cosine_similarity(alice, bob):.3f}")            # 0.989 (different style)
```

**Interpretation**:

* Alice and Alice\_2x have IDENTICAL learning patterns (cosine = 1.0)
* The magnitude doesn't matter - it's the PATTERN that counts
* Alice is strong in reading, Bob is strong in math (different patterns)

**Why this matters**:

* Match students with similar learning STYLES, not just similar scores
* A student who scores 60/70/65 has nearly the same pattern as one who scores 80/93/87
* Recommend study materials based on learning style, not absolute performance

**Real application**: A learning platform such as Khan Academy could match students with similar learning patterns to suggest effective study paths.

***

### Example 3: Movie Taste (Not Just Ratings)

```python theme={null}
# Movie preferences: [action, romance, comedy, horror, sci-fi]
user_A = np.array([5, 1, 3, 0, 4])      # Loves action & sci-fi
user_A_harsh = np.array([3, 0, 2, 0, 2]) # Same taste, harsher ratings
user_B = np.array([1, 5, 2, 4, 0])      # Loves romance & horror

# Cosine similarity
print(f"User A vs A_harsh: {cosine_similarity(user_A, user_A_harsh):.3f}")  # 0.985 (same taste!)
print(f"User A vs B: {cosine_similarity(user_A, user_B):.3f}")              # 0.330 (different taste)
```

**Key Insight**: User A and User A\_harsh have the SAME TASTE, just different rating scales!

* User A rates generously (5, 4, 3)
* User A\_harsh rates strictly (3, 2, 2)
* But they like the SAME TYPES of movies!

**Why this matters**:

* Some users rate everything 5 stars, others are harsh critics
* Cosine similarity finds users with similar TASTE, not similar rating scales
* Recommend movies based on taste, not rating magnitude

**Real application**: Recommender systems such as Netflix's face exactly this problem: users have different rating behaviors, but similar tastes should get similar recommendations. Cosine-style similarity is a standard tool for it.
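One subtlety: cosine similarity is unchanged when a user multiplies all ratings by a constant, but not when a user shifts every rating down by a fixed amount. A common variation, centered cosine (equivalent to Pearson correlation), subtracts each user's mean rating first. The sketch below uses hypothetical ratings; `centered_cosine`, `generous`, and `harsh` are names introduced for this illustration.

```python
import numpy as np

def cosine_similarity(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

def centered_cosine(v, w):
    """Cosine similarity after subtracting each vector's own mean."""
    return cosine_similarity(v - v.mean(), w - w.mean())

# Hypothetical ratings: the harsh user gives every movie 2 stars fewer
generous = np.array([5.0, 4.0, 3.0, 5.0])
harsh = generous - 2.0  # [3, 2, 1, 3]

raw = cosine_similarity(generous, harsh)
centered = centered_cosine(generous, harsh)

print(f"Raw cosine:      {raw:.3f}")       # below 1: the offset changes the direction
print(f"Centered cosine: {centered:.3f}")  # 1.000: identical taste once the offset is removed
```

Note that a user who gives every movie the same rating becomes a zero vector after centering, so that case needs separate handling.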

***

### When to Use Cosine vs. Euclidean Distance

**Use Cosine Similarity when**:

* ✅ Direction matters more than magnitude
* ✅ Different scales (harsh vs. generous raters)
* ✅ Text similarity (document length doesn't matter)
* ✅ Recommendation systems (taste, not intensity)

**Use Euclidean Distance when**:

* ✅ Absolute position matters
* ✅ Same scale for all features
* ✅ Clustering (K-means)
* ✅ Anomaly detection (how far from normal?)

```python theme={null}
# Example: Anomaly detection
normal_house = np.array([3, 2000, 15, 5])
similar_house = np.array([3, 2100, 14, 4])
anomaly = np.array([10, 8000, 2, 50])  # Weird house!

# Euclidean distance (absolute difference)
print(f"Normal vs Similar: {euclidean_distance(normal_house, similar_house):.1f}")  # 100.0
print(f"Normal vs Anomaly: {euclidean_distance(normal_house, anomaly):.1f}")        # 6000.2 (huge!)

# Cosine similarity (direction)
print(f"Normal vs Similar: {cosine_similarity(normal_house, similar_house):.3f}")  # 1.000
print(f"Normal vs Anomaly: {cosine_similarity(normal_house, anomaly):.3f}")        # 1.000 (still high!)
```

**Interpretation**: Euclidean distance catches the anomaly better because it cares about MAGNITUDE!
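The two measures are closely linked once magnitude is removed. For unit vectors, expanding the squared distance gives:

$$
\|\mathbf{a} - \mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a} \cdot \mathbf{b} = 2 - 2\cos(\theta)
$$

So on normalized vectors, ranking by Euclidean distance and ranking by cosine similarity give the same order. A quick numerical check, using arbitrary random vectors (the seed and dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)
b = rng.normal(size=5)

# Normalize both vectors to unit length
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

cos = np.dot(a_unit, b_unit)
squared_dist = np.sum((a_unit - b_unit) ** 2)

# For unit vectors: ||a - b||^2 = 2 - 2*cos(theta)
print(np.isclose(squared_dist, 2 - 2 * cos))  # True
```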

***

## Real-World Application: Finding Similar Houses

Let's build a simple house recommendation system!

```python theme={null}
import numpy as np

# Database of houses (bedrooms, sqft, age, distance)
houses = np.array([
    [3, 2000, 15, 5],   # House 0
    [4, 2200, 8, 2],    # House 1
    [2, 1200, 25, 3],   # House 2
    [3, 1900, 12, 6],   # House 3
    [5, 3500, 5, 15],   # House 4
])

# Prices (in thousands)
prices = np.array([320, 380, 250, 310, 550])

# Query: User likes this house
query_house = np.array([3, 2000, 10, 4])

# Find 3 most similar houses
similarities = []
for i, house in enumerate(houses):
    sim = cosine_similarity(query_house, house)
    similarities.append((i, sim, prices[i]))

# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)

print("Top 3 similar houses:")
for i, (idx, sim, price) in enumerate(similarities[:3], 1):
    print(f"{i}. House {idx}: similarity={sim:.3f}, price=${price}k")
```

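**Output**:

```
Top 3 similar houses:
1. House 3: similarity=1.000, price=$310k
2. House 1: similarity=1.000, price=$380k
3. House 0: similarity=1.000, price=$320k
```

All three scores round to 1.000 because sqft dominates the raw vectors; the ranking comes from differences far below the third decimal place.

**Prediction**: Based on similar houses, estimated price ≈ \$337k (average of top 3)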
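Because sqft is numerically far larger than the other features, the raw cosine scores all cluster near 1. One possible refinement is to z-score each column before comparing, so that cosine similarity measures how each house deviates from the average house. This is a sketch under that assumption, not the only reasonable scaling choice; `houses_scaled` and `query_scaled` are names introduced here.

```python
import numpy as np

def cosine_similarity(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

houses = np.array([
    [3, 2000, 15, 5],
    [4, 2200, 8, 2],
    [2, 1200, 25, 3],
    [3, 1900, 12, 6],
    [5, 3500, 5, 15],
], dtype=float)
query_house = np.array([3, 2000, 10, 4], dtype=float)

# Z-score each column using the database statistics,
# then apply the SAME transform to the query
means = houses.mean(axis=0)
stds = houses.std(axis=0)
houses_scaled = (houses - means) / stds
query_scaled = (query_house - means) / stds

raw = [cosine_similarity(query_house, h) for h in houses]
scaled = [cosine_similarity(query_scaled, h) for h in houses_scaled]

for i, (r, s) in enumerate(zip(raw, scaled)):
    print(f"House {i}: raw={r:.3f}, scaled={s:.3f}")
```

The raw column barely moves from house to house, while the scaled column spreads out and can even turn negative for houses that differ from the query in the opposite way from average.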

***

## Supporting Example 1: Document Similarity

The same vector concepts apply to text!

```python theme={null}
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "machine learning is awesome",
    "deep learning is a subset of machine learning",
    "neural networks are powerful",
    "python is great for machine learning"
]

# Convert to vectors
vectorizer = CountVectorizer()
doc_vectors = vectorizer.fit_transform(documents).toarray()

print("Vocabulary:", vectorizer.get_feature_names_out())
print("\nDocument vectors:")
print(doc_vectors)

# Find similar documents to "machine learning"
query = "machine learning"
query_vector = vectorizer.transform([query]).toarray()[0]

for i, doc_vec in enumerate(doc_vectors):
    sim = cosine_similarity(query_vector, doc_vec)
    print(f"Doc {i}: {sim:.3f} - {documents[i]}")
```

**Key Insight**: Same math, different domain!

***

## Supporting Example 2: User Recommendations

```python theme={null}
# User-movie rating matrix
ratings = np.array([
    [5, 4, 0, 0, 1],  # User 0: likes action/comedy
    [4, 5, 0, 0, 2],  # User 1: similar to User 0
    [0, 0, 5, 4, 5],  # User 2: likes drama/romance
    [5, 4, 0, 1, 1],  # User 3: similar to User 0
])

# Find users similar to User 0
user_0 = ratings[0]
for i in range(1, len(ratings)):
    sim = cosine_similarity(user_0, ratings[i])
    print(f"User {i}: similarity = {sim:.3f}")

# Output:
# User 1: similarity = 0.966 (recommend same movies!)
# User 2: similarity = 0.095 (different taste)
# User 3: similarity = 0.988 (very similar)
```
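To compare every user with every other user at once, normalize each row to unit length and multiply the matrix by its own transpose; each entry of the result is then a cosine similarity. A minimal sketch on the same ratings (`similarity_matrix` and `most_similar` are names introduced here):

```python
import numpy as np

ratings = np.array([
    [5, 4, 0, 0, 1],
    [4, 5, 0, 0, 2],
    [0, 0, 5, 4, 5],
    [5, 4, 0, 1, 1],
], dtype=float)

# After row normalization, plain dot products ARE cosine similarities
unit_rows = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
similarity_matrix = unit_rows @ unit_rows.T   # shape (4, 4)

np.fill_diagonal(similarity_matrix, -np.inf)  # ignore each user's match with themselves
most_similar = similarity_matrix.argmax(axis=1)

print(most_similar)  # [3 0 1 0]: the closest other user for each user
```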

***

## Practice Exercises

### Exercise 1: House Price Estimation

```python theme={null}
# Given these houses and prices
houses = np.array([
    [3, 1800, 20, 5],  # $280k
    [4, 2400, 10, 3],  # $360k
    [2, 1200, 30, 8],  # $220k
])
prices = np.array([280, 360, 220])

# Predict price for this house
new_house = np.array([3, 2000, 15, 4])

# TODO: Find 2 most similar houses and average their prices
```

<details>
  <summary>Solution</summary>

  ```python theme={null}
  similarities = []
  for i, house in enumerate(houses):
      sim = cosine_similarity(new_house, house)
      similarities.append((i, sim, prices[i]))

  similarities.sort(key=lambda x: x[1], reverse=True)

  # Top 2 similar houses
  top_2_prices = [similarities[0][2], similarities[1][2]]
  predicted_price = np.mean(top_2_prices)

  print(f"Predicted price: ${predicted_price:.0f}k")
  # Output: Predicted price: $320k
  ```
</details>

***

## 🎯 Practice Exercises & Real-World Applications

<Note>
  **Challenge yourself!** These exercises blend mathematical concepts with real-world scenarios. Try to solve them before peeking at the solutions.
</Note>

### Exercise 1: Music Streaming Recommendations 🎵

Spotify represents songs as vectors based on audio features. Given these song vectors:

| Song          | Energy | Danceability | Acousticness | Tempo (normalized) |
| ------------- | ------ | ------------ | ------------ | ------------------ |
| Your Favorite | 0.8    | 0.7          | 0.2          | 0.6                |
| Song A        | 0.9    | 0.8          | 0.1          | 0.7                |
| Song B        | 0.3    | 0.4          | 0.9          | 0.3                |
| Song C        | 0.7    | 0.6          | 0.3          | 0.5                |

**Task**: Find which song is most similar to "Your Favorite" using cosine similarity.

```python theme={null}
import numpy as np

# Define the song vectors
your_favorite = np.array([0.8, 0.7, 0.2, 0.6])
song_A = np.array([0.9, 0.8, 0.1, 0.7])
song_B = np.array([0.3, 0.4, 0.9, 0.3])
song_C = np.array([0.7, 0.6, 0.3, 0.5])

# TODO: Calculate cosine similarity with each song
# TODO: Which song should Spotify recommend?
```

<Accordion title="💡 Solution">
  ```python theme={null}
  import numpy as np

  def cosine_similarity(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  your_favorite = np.array([0.8, 0.7, 0.2, 0.6])
  song_A = np.array([0.9, 0.8, 0.1, 0.7])
  song_B = np.array([0.3, 0.4, 0.9, 0.3])
  song_C = np.array([0.7, 0.6, 0.3, 0.5])

  songs = {'Song A': song_A, 'Song B': song_B, 'Song C': song_C}

  print("Similarity scores:")
  for name, song in songs.items():
      sim = cosine_similarity(your_favorite, song)
      print(f"  {name}: {sim:.4f}")

  # Output:
  # Song A: 0.9958 ← Most similar (upbeat, danceable)
  # Song B: 0.6634  (very different - acoustic, slow)
  # Song C: 0.9931  (also quite similar)

  print("\n✅ Recommendation: Song A (0.9958 similarity)")
  ```

  **Real-World Insight**: Music recommenders such as Spotify's "Discover Weekly" can build on this idea: songs are represented as 12+ dimensional vectors including tempo, key, loudness, and more, then compared by similarity.
</Accordion>

***

### Exercise 2: E-commerce Product Matching 🛒

Amazon wants to show "Similar Products" when a customer views an item. Products are represented as vectors:

**Features**: \[price\_tier, avg\_rating, num\_reviews (log), category\_score, brand\_popularity]

```python theme={null}
# Customer is viewing this laptop
current_product = np.array([4, 4.5, 3.2, 0.9, 0.7])

# Candidate products to recommend
products = {
    "Budget Laptop":    np.array([2, 4.2, 2.8, 0.9, 0.4]),
    "Gaming Laptop":    np.array([5, 4.6, 3.5, 0.8, 0.9]),
    "Similar Laptop":   np.array([4, 4.4, 3.0, 0.9, 0.65]),
    "Tablet":           np.array([3, 4.3, 3.1, 0.3, 0.6]),
}
```

**Tasks**:

1. Calculate both Euclidean distance AND cosine similarity for each product
2. Which metric gives better recommendations and why?
3. Should we normalize the data first?

<Accordion title="💡 Solution">
  ```python theme={null}
  import numpy as np

  def euclidean_distance(a, b):
      return np.sqrt(np.sum((a - b) ** 2))

  def cosine_similarity(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  current_product = np.array([4, 4.5, 3.2, 0.9, 0.7])

  products = {
      "Budget Laptop":    np.array([2, 4.2, 2.8, 0.9, 0.4]),
      "Gaming Laptop":    np.array([5, 4.6, 3.5, 0.8, 0.9]),
      "Similar Laptop":   np.array([4, 4.4, 3.0, 0.9, 0.65]),
      "Tablet":           np.array([3, 4.3, 3.1, 0.3, 0.6]),
  }

  print("Product Comparison:")
  print("-" * 55)
  print(f"{'Product':<18} {'Euclidean':<12} {'Cosine Sim':<12}")
  print("-" * 55)

  for name, vec in products.items():
      dist = euclidean_distance(current_product, vec)
      sim = cosine_similarity(current_product, vec)
      print(f"{name:<18} {dist:<12.4f} {sim:<12.4f}")

  # Output:
  # Budget Laptop      2.0833       0.9686
  # Gaming Laptop      1.0724       0.9956
  # Similar Laptop     0.2291       0.9997  ← Both metrics pick this
  # Tablet             1.1916       0.9905

  print("\n📊 Analysis:")
  print("• Euclidean: Similar Laptop wins (closest in absolute values)")
  print("• Cosine: Similar Laptop also wins (most similar direction)")
  print("\n✅ Both agree! But Euclidean is better here because")
  print("   price_tier matters in absolute terms, not just ratio.")
  ```

  **Key Insight**:

  * Use **Euclidean** when magnitude matters (price, ratings)
  * Use **Cosine** when only direction matters (document topics, user preferences)
  * **Always normalize** features that are on different scales!
</Accordion>

***

### Exercise 3: Dating App Compatibility 💕

A dating app represents users as compatibility vectors:

**Features**: \[adventure\_score, introversion, career\_focus, family\_values, humor\_style]

```python theme={null}
# Your profile
you = np.array([0.8, 0.3, 0.7, 0.6, 0.9])

# Potential matches
matches = {
    "Alex":   np.array([0.7, 0.4, 0.8, 0.5, 0.85]),
    "Jordan": np.array([0.2, 0.9, 0.3, 0.8, 0.4]),
    "Casey":  np.array([0.9, 0.2, 0.6, 0.7, 0.95]),
    "Morgan": np.array([0.5, 0.5, 0.5, 0.5, 0.5]),
}
```

**Tasks**:

1. Calculate a "compatibility score" using dot product
2. Normalize and use cosine similarity - does the ranking change?
3. Which match is best and why?

<Accordion title="💡 Solution">
  ```python theme={null}
  import numpy as np

  you = np.array([0.8, 0.3, 0.7, 0.6, 0.9])

  matches = {
      "Alex":   np.array([0.7, 0.4, 0.8, 0.5, 0.85]),
      "Jordan": np.array([0.2, 0.9, 0.3, 0.8, 0.4]),
      "Casey":  np.array([0.9, 0.2, 0.6, 0.7, 0.95]),
      "Morgan": np.array([0.5, 0.5, 0.5, 0.5, 0.5]),
  }

  print("Compatibility Analysis:")
  print("-" * 50)
  print(f"{'Match':<10} {'Dot Product':<14} {'Cosine Sim':<12}")
  print("-" * 50)

  for name, profile in matches.items():
      dot = np.dot(you, profile)
      cos = np.dot(you, profile) / (np.linalg.norm(you) * np.linalg.norm(profile))
      print(f"{name:<10} {dot:<14.4f} {cos:<12.4f}")

  # Output:
  # Alex       2.3050         0.9912
  # Jordan     1.4800         0.7258
  # Casey      2.4750         0.9924  ← Best match!
  # Morgan     1.6500         0.9546

  print("\n💕 Best Match: Casey!")
  print("   • High adventure (0.9 vs your 0.8)")
  print("   • Similar introversion level (0.2 vs 0.3)")
  print("   • Compatible humor style (0.95 vs 0.9)")
  print("\n⚠️  Jordan is least compatible:")
  print("   • Opposite on adventure (0.2 vs 0.8)")
  print("   • Opposite on introversion (0.9 vs 0.3)")
  ```

  **Real-World Insight**: Dating apps like Hinge and OkCupid can apply similar vector-based matching, typically with many more dimensions, including behavioral data from swipes and messages!
</Accordion>

***

### Exercise 4: Document Search Engine 📄

Build a simple search engine using TF-IDF vectors:

```python theme={null}
# Documents (already converted to TF-IDF vectors)
# Dimensions: [python, machine, learning, data, web, api]
documents = {
    "ML Tutorial":     np.array([0.5, 0.8, 0.9, 0.7, 0.1, 0.2]),
    "Web Dev Guide":   np.array([0.2, 0.1, 0.0, 0.3, 0.9, 0.8]),
    "Data Science":    np.array([0.6, 0.5, 0.7, 0.9, 0.2, 0.3]),
    "Python Basics":   np.array([0.9, 0.2, 0.3, 0.4, 0.3, 0.4]),
}

# User searches for "machine learning python"
query = np.array([0.7, 0.9, 0.8, 0.3, 0.0, 0.0])
```

**Tasks**:

1. Rank documents by relevance to the query
2. What's the top result?
3. Why might "Data Science" rank higher than "Python Basics" even though query has "python"?

<Accordion title="💡 Solution">
  ```python theme={null}
  import numpy as np

  def cosine_similarity(a, b):
      return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

  documents = {
      "ML Tutorial":     np.array([0.5, 0.8, 0.9, 0.7, 0.1, 0.2]),
      "Web Dev Guide":   np.array([0.2, 0.1, 0.0, 0.3, 0.9, 0.8]),
      "Data Science":    np.array([0.6, 0.5, 0.7, 0.9, 0.2, 0.3]),
      "Python Basics":   np.array([0.9, 0.2, 0.3, 0.4, 0.3, 0.4]),
  }

  query = np.array([0.7, 0.9, 0.8, 0.3, 0.0, 0.0])

  print("🔍 Search Results for 'machine learning python':")
  print("-" * 45)

  results = []
  for name, doc in documents.items():
      sim = cosine_similarity(query, doc)
      results.append((name, sim))

  # Sort by similarity (descending)
  results.sort(key=lambda x: x[1], reverse=True)

  for rank, (name, sim) in enumerate(results, 1):
      print(f"{rank}. {name:<18} (relevance: {sim:.4f})")

  # Output:
  # 1. ML Tutorial        (relevance: 0.9379) ← Top result!
  # 2. Data Science       (relevance: 0.8354)
  # 3. Python Basics      (relevance: 0.7068)
  # 4. Web Dev Guide      (relevance: 0.1781)

  print("\n📊 Why 'Data Science' > 'Python Basics'?")
  print("   Query emphasizes 'machine' (0.9) and 'learning' (0.8)")
  print("   Data Science has machine=0.5, learning=0.7")
  print("   Python Basics has machine=0.2, learning=0.3")
  print("   Even though Python Basics has higher 'python' score,")
  print("   the overall direction is less aligned with the query!")
  ```

  **Real-World Insight**: This vector space model is a classic information-retrieval technique. Modern search engines add hundreds more signals (links, freshness, user behavior).
</Accordion>
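The TF-IDF vectors in this exercise were given ready-made. As a rough sketch of where such numbers come from, the code below computes one simple TF-IDF variant by hand on a hypothetical, pre-tokenized mini-corpus. Libraries use slightly different formulas (smoothing, normalization), so treat the exact weighting here as an assumption.

```python
import numpy as np

# Hypothetical mini-corpus, already split into words
docs = [
    ["python", "machine", "learning"],
    ["python", "web", "api"],
    ["machine", "learning", "data"],
]
vocab = sorted({word for doc in docs for word in doc})

# Term frequency: how often each vocabulary word occurs, relative to document length
tf = np.array([[doc.count(word) / len(doc) for word in vocab] for doc in docs])

# Inverse document frequency: words found in fewer documents get a higher weight
doc_freq = np.count_nonzero(tf, axis=0)
idf = np.log(len(docs) / doc_freq) + 1.0  # the "+ 1" keeps very common words above zero

tfidf = tf * idf
print(vocab)
print(np.round(tfidf, 3))
```

In the second document, "web" ends up with a higher weight than "python" because "python" also appears in another document and is therefore less distinctive.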

***

## 🚨 Real-World Challenge: Handling Messy Data

In textbooks, data is clean. In production, data is messy. Here's how to handle real-world vector problems:

<Warning>
  **Production Reality**: Real data has missing values, outliers, inconsistent scales, and noise. Your similarity system will fail if you don't handle these!
</Warning>

### Missing Values

```python theme={null}
import numpy as np

# Real apartment data with missing values (NaN)
apartments = np.array([
    [2, 1200, 2400, 15],      # Complete
    [2, np.nan, 2300, 18],    # Missing sqft
    [np.nan, 2500, 4500, 45], # Missing bedrooms
    [1, 800, np.nan, 10],     # Missing rent
])

# Strategy 1: Impute with column mean
def impute_mean(data):
    result = data.copy()
    for col in range(data.shape[1]):
        col_mean = np.nanmean(data[:, col])
        mask = np.isnan(result[:, col])
        result[mask, col] = col_mean
    return result

# Strategy 2: Impute with median (robust to outliers)
def impute_median(data):
    result = data.copy()
    for col in range(data.shape[1]):
        col_median = np.nanmedian(data[:, col])
        mask = np.isnan(result[:, col])
        result[mask, col] = col_median
    return result

cleaned = impute_mean(apartments)
print("Cleaned data:\n", cleaned)
```

### Outlier Detection

```python theme={null}
# Detect outliers using z-score
def detect_outliers(data, threshold=3):
    """Flag values more than `threshold` std devs from mean."""
    means = np.nanmean(data, axis=0)
    stds = np.nanstd(data, axis=0)
    z_scores = np.abs((data - means) / (stds + 1e-8))
    return z_scores > threshold

# Example: Luxury penthouse is an outlier
apartments = np.array([
    [2, 1200, 2400, 15],
    [2, 1100, 2300, 18],
    [2, 1150, 2500, 16],
    [2, 50000, 100000, 15],  # Outlier! Mansion accidentally in apartment data
])

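# With only 4 rows, |z| can never exceed sqrt(n - 1) ≈ 1.73, so the default
# threshold of 3 would flag nothing. Lower it for this tiny example.
outliers = detect_outliers(apartments, threshold=1.7)
print("Outlier locations:\n", outliers)  # True only in the last row, for sqft and rent
# Handle: Remove, cap, or flag for review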
```
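The z-score relies on the mean and standard deviation, and a large outlier inflates both, which makes it harder to detect in small samples. A commonly used alternative is the modified z-score, which swaps in the median and the median absolute deviation (MAD) with the conventional constant 0.6745 and threshold 3.5. The function name `detect_outliers_robust` is introduced here for illustration.

```python
import numpy as np

def detect_outliers_robust(data, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds `threshold`."""
    medians = np.median(data, axis=0)
    abs_dev = np.abs(data - medians)
    mad = np.median(abs_dev, axis=0)               # median absolute deviation per column
    modified_z = 0.6745 * abs_dev / (mad + 1e-8)   # small epsilon guards against MAD = 0
    return modified_z > threshold

apartments = np.array([
    [2, 1200, 2400, 15],
    [2, 1100, 2300, 18],
    [2, 1150, 2500, 16],
    [2, 50000, 100000, 15],  # the outlier row
], dtype=float)

mask = detect_outliers_robust(apartments)
print(np.argwhere(mask))  # [[3 1], [3 2]]: sqft and rent of the last row
```

When more than half the values in a column are identical, the MAD is zero and any differing value is flagged, so columns like that deserve a manual look.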

### Feature Scaling Choices

```python theme={null}
# Different scaling methods for different situations

# Min-Max: Scale to [0, 1] - use when you need bounded values
def minmax_scale(data):
    mins = data.min(axis=0)
    maxs = data.max(axis=0)
    return (data - mins) / (maxs - mins + 1e-8)

# Z-Score: Center and scale - use when comparing distributions
def zscore_scale(data):
    means = data.mean(axis=0)
    stds = data.std(axis=0)
    return (data - means) / (stds + 1e-8)

# Robust: Use median/IQR - use when outliers are present
def robust_scale(data):
    medians = np.median(data, axis=0)
    q75 = np.percentile(data, 75, axis=0)
    q25 = np.percentile(data, 25, axis=0)
    iqr = q75 - q25
    return (data - medians) / (iqr + 1e-8)

print("Choose your scaler based on your data characteristics!")
```

<Tip>
  **Rule of Thumb**:

  * **Min-Max**: Neural networks, bounded features
  * **Z-Score**: Most ML algorithms, normally distributed data
  * **Robust**: Data with outliers, skewed distributions
</Tip>

***

## 🔬 Advanced Deep Dive (Optional)

<Accordion title="Advanced: High-Dimensional Geometry & the Curse of Dimensionality" icon="flask">
  ### Why High Dimensions Are Weird

  In high dimensions, our intuition breaks down completely:

  ```python theme={null}
  import numpy as np

  def random_vector_similarity(dim, n_pairs=1000):
      """Average cosine similarity between random unit vectors."""
      similarities = []
      for _ in range(n_pairs):
          a = np.random.randn(dim)
          b = np.random.randn(dim)
          a = a / np.linalg.norm(a)
          b = b / np.linalg.norm(b)
          similarities.append(np.dot(a, b))
      return np.mean(similarities), np.std(similarities)

  print("Random vector similarity by dimension:")
  for dim in [2, 10, 100, 1000, 10000]:
      mean, std = random_vector_similarity(dim)
      print(f"  {dim:5d}D: mean={mean:+.4f}, std={std:.4f}")

  # Typical output (exact values vary from run to run):
  #      2D: mean=+0.0012, std=0.7071  ← High variance
  #     10D: mean=-0.0008, std=0.3162
  #    100D: mean=+0.0002, std=0.1000
  #   1000D: mean=-0.0001, std=0.0316  ← Nearly orthogonal!
  #  10000D: mean=+0.0000, std=0.0100  ← All vectors ~90° apart
  ```

  **Key Insight**: In 10,000 dimensions, random vectors are almost perfectly orthogonal! This is why:

  * Random embeddings don't work (everything is equally dissimilar)
  * Trained embeddings are necessary (learn meaningful directions)
  * Dimension reduction (PCA, t-SNE) helps visualization

  ### Volume Concentration

  ```python theme={null}
  # In high-D, almost all volume is at the surface of a sphere!
  def shell_volume_ratio(dim, thickness=0.01):
      """What fraction of unit ball is within `thickness` of surface?"""
      inner_radius = 1 - thickness
      # V(r) ∝ r^d
      inner_volume_ratio = inner_radius ** dim
      shell_ratio = 1 - inner_volume_ratio
      return shell_ratio

  print("Fraction of volume near surface (within 1%):")
  for dim in [2, 10, 50, 100, 500]:
      ratio = shell_volume_ratio(dim)
      print(f"  {dim:3d}D: {ratio:.2%}")

  # Output:
  #    2D: 1.99%
  #   10D: 9.56%
  #   50D: 39.50%
  #  100D: 63.40%
  #  500D: 99.34%  ← Almost everything is on the edge!
  ```

  ### Implications for ML

  1. **Nearest Neighbors degrades**: All points become equidistant
  2. **More data needed**: Exponentially more samples to cover space
  3. **Regularization essential**: Prevents overfitting in sparse spaces
  4. **Feature selection matters**: Irrelevant features hurt more in high-D
</Accordion>
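The claim that nearest neighbors degrade in high dimensions can be checked directly. The sketch below measures, for random points in a unit cube, how much farther the farthest point is than the nearest one, relative to the nearest. The function name `distance_contrast`, the uniform data, and the sample size are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n_points=500):
    """Relative gap between the farthest and nearest neighbor of a random query."""
    points = rng.random((n_points, dim))   # uniform points in the unit cube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrasts = {dim: distance_contrast(dim) for dim in [2, 10, 100, 1000]}
for dim, contrast in contrasts.items():
    print(f"{dim:5d}D: contrast = {contrast:.2f}")
```

The contrast shrinks sharply as the dimension grows: in low dimensions the nearest neighbor is many times closer than the farthest point, while in 1000 dimensions all points sit at nearly the same distance.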

<Accordion title="Advanced: Locality-Sensitive Hashing (LSH) for Fast Similarity Search" icon="bolt">
  ### The Problem: Brute Force Doesn't Scale

  Finding similar vectors in a billion-vector database takes forever with brute force:

  ```python theme={null}
  # Brute force: O(n × d) per query
  def brute_force_search(query, database, k=10):
      """Find k nearest neighbors - SLOW for large databases!"""
      similarities = []
      for i, vec in enumerate(database):
          sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
          similarities.append((i, sim))
      return sorted(similarities, key=lambda x: x[1], reverse=True)[:k]

  # 1 billion vectors × 768 dimensions ≈ 768 billion multiply-adds per query, just for the dot products!
  ```

  ### LSH: Approximate but Fast

  Locality-Sensitive Hashing groups similar vectors into the same "bucket":

  ```python theme={null}
  class SimpleLSH:
      """Simplified LSH for cosine similarity."""
      
      def __init__(self, dim, n_hyperplanes=16):
          # Random hyperplanes divide space into 2^n regions
          self.hyperplanes = np.random.randn(n_hyperplanes, dim)
      
      def hash(self, vector):
          """Convert vector to binary hash."""
          # Which side of each hyperplane?
          projections = self.hyperplanes @ vector
          bits = (projections > 0).astype(int)
          return tuple(bits)
      
      def build_index(self, vectors):
          """Group vectors by hash."""
          self.buckets = {}
          for i, vec in enumerate(vectors):
              h = self.hash(vec)
              if h not in self.buckets:
                  self.buckets[h] = []
              self.buckets[h].append(i)
          return self
      
      def search(self, query, vectors, k=10):
          """Search only in same bucket - much faster!"""
          h = self.hash(query)
          candidates = self.buckets.get(h, [])
          
          # Compare only with candidates
          similarities = []
          for i in candidates:
              sim = np.dot(query, vectors[i]) / (np.linalg.norm(query) * np.linalg.norm(vectors[i]))
              similarities.append((i, sim))
          
          return sorted(similarities, key=lambda x: x[1], reverse=True)[:k]

  # Usage
  dim = 768
  n_vectors = 100000
  vectors = np.random.randn(n_vectors, dim)
  query = np.random.randn(dim)

  lsh = SimpleLSH(dim, n_hyperplanes=12)
  lsh.build_index(vectors)

  # Instead of comparing against all 100,000 vectors, we check one bucket (about 100,000 / 2^12 ≈ 24 vectors on average)!
  results = lsh.search(query, vectors, k=5)
  print(f"Found {len(results)} approximate neighbors")
  ```

  **Trade-off**: Speed vs accuracy. LSH might miss some true neighbors, but on large databases it can be orders of magnitude faster than brute force!

  **Production systems** (Pinecone, Milvus, Faiss) use sophisticated variants of LSH and graph-based methods.
</Accordion>
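Random-hyperplane hashing works for cosine similarity because the chance that two vectors land on the same side of a random hyperplane is 1 − θ/π, where θ is the angle between them. Vectors pointing in similar directions therefore share most of their hash bits. The sketch below checks this empirically; the dimension, noise level, and number of hyperplanes are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_hyperplanes = 64, 2000

a = rng.normal(size=dim)
b = a + 0.5 * rng.normal(size=dim)   # a noisy copy of a, so the angle is small

# One hash bit per hyperplane: which side does the vector fall on?
hyperplanes = rng.normal(size=(n_hyperplanes, dim))
bits_a = hyperplanes @ a > 0
bits_b = hyperplanes @ b > 0

cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(np.clip(cos, -1.0, 1.0))

observed = np.mean(bits_a == bits_b)   # fraction of matching hash bits
expected = 1 - theta / np.pi           # theoretical per-bit agreement

print(f"observed={observed:.3f}, expected={expected:.3f}")
```

This also explains the trade-off in the number of hyperplanes: more bits mean smaller buckets and faster lookups, but a higher chance that a true neighbor differs in at least one bit and is missed.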

***

## Key Takeaways

✅ **Vectors represent data** - Houses, images, text all become vectors\
✅ **Dot product measures similarity** - Foundation of neural networks\
✅ **Cosine similarity** - Direction-based (ignores magnitude)\
✅ **Euclidean distance** - Position-based (includes magnitude)\
✅ **Normalization matters** - Prevent one feature from dominating\
✅ **Same math, different domains** - Vectors work everywhere!\
✅ **Handle messy data** - Missing values, outliers, and scaling are production realities\
✅ **High dimensions are weird** - Curse of dimensionality affects all similarity search

***

## 🔗 Math → ML Connection Summary

<Note>
  **What you learned in this module powers these ML systems:**

  | Vector Concept                   | ML Application                       | Real-World Example                             |
  | -------------------------------- | ------------------------------------ | ---------------------------------------------- |
  | **Representing data as vectors** | Feature vectors in any ML model      | Every scikit-learn model takes feature vectors |
  | **Dot product**                  | Neural network layers, attention     | `y = W·x + b` is the core of deep learning     |
  | **Cosine similarity**            | Semantic search, recommendations     | ChatGPT's embeddings, Spotify recommendations  |
  | **Euclidean distance**           | KNN classification, clustering       | Customer segmentation, image retrieval         |
  | **Normalization**                | Batch normalization, feature scaling | Required preprocessing for most models         |
  | **High-dimensional vectors**     | Word embeddings, image features      | GPT-3 (175B) uses 12,288-dimensional vectors   |

  **Next time you use any ML model, remember: it's operating on vectors using these exact operations!**
</Note>

***

<Accordion title="🚀 Going Deeper: Vector Spaces (Optional Advanced Theory)" icon="graduation-cap">
  **For learners who want the mathematical foundations:**

  ### Vector Spaces: The Abstract View

  A **vector space** is a set of objects (vectors) with two operations (addition and scalar multiplication) that satisfy certain axioms. This abstraction lets us apply vector math to surprising domains:

  | Domain          | "Vectors"      | Addition               | Scalar Multiplication |
  | --------------- | -------------- | ---------------------- | --------------------- |
  | **Functions**   | f(x), g(x)     | (f+g)(x) = f(x) + g(x) | (cf)(x) = c·f(x)      |
  | **Polynomials** | 1, x, x², ...  | Combine coefficients   | Scale coefficients    |
  | **Matrices**    | Any m×n matrix | Element-wise addition  | Element-wise scaling  |
  | **Signals**     | Time series    | Add signals            | Amplify/attenuate     |

  ### Linear Independence & Basis

  A set of vectors is **linearly independent** if no vector can be written as a combination of others:

  $\text{If } c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n = \mathbf{0} \text{, then all } c_i = 0$

  A **basis** is a minimal set of linearly independent vectors that span the space.

  **ML Application**: In neural networks, we're essentially finding a good basis to represent data. Autoencoders find compressed bases; attention mechanisms dynamically select relevant basis directions.

  ### Inner Product Spaces

  Our dot product is a specific **inner product**. More generally, an inner product ⟨·,·⟩ satisfies:

  1. ⟨u, v⟩ = ⟨v, u⟩ (symmetry)
  2. ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩ (linearity)
  3. ⟨v, v⟩ ≥ 0, with equality iff v = 0 (positive definiteness)

  **Why this matters**: Different inner products define different notions of similarity! Kernel methods in ML use custom inner products to find nonlinear patterns.

  ### Recommended Deep-Dive Resources

  * **Gilbert Strang's Linear Algebra** (MIT OpenCourseWare) - Rigorous but intuitive
  * **3Blue1Brown: Essence of Linear Algebra** - Visual understanding
  * **Mathematics for Machine Learning** book, Ch. 2-3 - ML-focused treatment
</Accordion>
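Linear independence can be tested numerically: stack the vectors as rows of a matrix and compare its rank with the number of vectors. A minimal sketch using `np.linalg.matrix_rank` (the helper name `is_linearly_independent` and the example matrices are made up for illustration):

```python
import numpy as np

independent = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])
dependent = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],   # exactly 2 × the first row
    [0.0, 1.0, 0.0],
])

def is_linearly_independent(vectors):
    """Rows are independent when the matrix rank equals the number of rows."""
    return bool(np.linalg.matrix_rank(vectors) == len(vectors))

print(is_linearly_independent(independent))  # True
print(is_linearly_independent(dependent))    # False
```

Three independent vectors in 3-D space also form a basis, since they span the whole space.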

***

## Word Embeddings: Vectors in NLP

<Note>
  **Mind-blowing application**: Words are vectors, and vector math works on meaning!
</Note>

```python theme={null}
import numpy as np

# Word2Vec / GloVe represent words as ~300-dimensional vectors.
# Famous example: king - man + woman ≈ queen

# Toy values truncated to 3 dimensions so the code runs;
# real embeddings come from a trained model.
king = np.array([0.5, 0.3, 0.8])
man = np.array([0.4, 0.2, 0.1])
woman = np.array([0.4, 0.3, 0.2])

# Vector arithmetic on meaning!
result = king - man + woman
# In a trained embedding space, the word vector nearest to `result`
# (excluding the three input words) is often "queen".

# Intuition for why this works:
# king - man keeps "royalty" and removes "male"
# Adding woman reintroduces gender → queen
```
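
The "closest to" step can be sketched as a nearest-neighbor search by cosine similarity. The vocabulary and the two-dimensional vectors below are hand-made toy values (roughly "royalty" and "gender" axes), not output from a trained model, and `solve_analogy` is a hypothetical helper name.

```python
import numpy as np

# Hand-made toy embeddings: [royalty, gender] with male = +, female = -.
# Illustrative values only, not vectors from a trained model.
toy_embeddings = {
    "king":     np.array([0.9,  0.8]),
    "queen":    np.array([0.9, -0.8]),
    "man":      np.array([0.1,  0.8]),
    "woman":    np.array([0.1, -0.8]),
    "princess": np.array([0.7, -0.7]),
    "apple":    np.array([-0.5, 0.0]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def solve_analogy(a, b, c, embeddings):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = {word: vec for word, vec in embeddings.items()
                  if word not in (a, b, c)}
    return max(candidates,
               key=lambda word: cosine_similarity(target, candidates[word]))

print(solve_analogy("king", "man", "woman", toy_embeddings))  # queen
```

Excluding the input words matters in practice: with real embeddings, the nearest vector to `king - man + woman` is frequently `king` itself.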

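**Modern AI (GPT-4, Claude)** builds on the same principle with much larger, context-dependent transformer embeddings. The exact sizes for these models are unpublished, but the largest GPT-3 model used 12,288-dimensional hidden vectors!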

***

## Interview Questions: Vectors

<AccordionGroup>
  <Accordion title="What is the dot product, and why is it important in ML?" icon="question">
    **Answer**: The dot product $\mathbf{a} \cdot \mathbf{b} = \sum a_i b_i$ measures alignment between vectors. In ML:

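    * **Neural networks**: Every neuron computes a dot product (weights · inputs), then adds a bias and applies an activation
    * **Attention mechanisms**: Query-key dot products determine what to focus on
    * **Similarity search**: Cosine similarity uses normalized dot products
    * **Loss functions**: Many are built on dot products (cross-entropy is the negative dot product of the label vector with the log-probabilities; hinge loss uses the score $\mathbf{w} \cdot \mathbf{x}$)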
  </Accordion>

  <Accordion title="When would you use cosine similarity vs Euclidean distance?" icon="ruler-combined">
    **Answer**:

    * **Cosine**: When magnitude doesn't matter (text similarity, user preferences, normalized data)
    * **Euclidean**: When absolute values matter (physical distance, raw measurements)
    * **Example**: Two documents about ML with different lengths should be similar (cosine), but two GPS coordinates need actual distance (Euclidean)
  </Accordion>

  <Accordion title="How does dimensionality affect similarity?" icon="cube">
    **Answer**: In high dimensions:

    * Pairwise distances between random points concentrate, so nearest and farthest neighbors look almost equally far away ("curse of dimensionality")
    * Random vectors are almost orthogonal (cosine ≈ 0)
    * This is one reason dimensionality reduction (e.g. PCA) helps before distance-based methods
    * Modern embeddings (512-4096 dim) are trained to preserve meaningful similarity
  </Accordion>
</AccordionGroup>
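
The near-orthogonality claim in the dimensionality answer is easy to check empirically. The sketch below draws pairs of random Gaussian vectors and reports the average absolute cosine similarity. The function name `mean_abs_cosine`, the sample size, and the seed are arbitrary choices for illustration.

```python
import numpy as np

def mean_abs_cosine(dim, n_pairs=1000, seed=0):
    """Average |cosine similarity| between pairs of random Gaussian vectors."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_pairs, dim))
    b = rng.standard_normal((n_pairs, dim))
    # Row-wise cosine similarity between a[i] and b[i]
    cosines = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return float(np.mean(np.abs(cosines)))

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}: mean |cos| = {mean_abs_cosine(dim):.3f}")
```

The printed values should shrink toward zero as `dim` grows, roughly in proportion to 1/√dim. Trained embeddings are not random, which is why they can still carry meaningful similarity at these sizes.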

***

## What's Next?

You now understand how to represent houses as vectors and measure similarity. But how do we actually **predict** the price?

That's where **matrices** come in. A matrix represents a linear function that transforms input (house features) into output (price prediction). Neural networks are built from the same operation: each layer multiplies by a matrix and then applies a nonlinearity.

<Card title="Next: Matrices & Transformations" icon="arrow-right" href="/courses/math-for-ml-linear-algebra/03-matrices">
  Learn how matrices transform house features into price predictions
</Card>
