Matrices & Linear Transformations

A Problem You Already Understand: Calculating Final Grades

You’re a teacher with a spreadsheet of student data:
Student | Homework (40%) | Midterm (25%) | Final (35%)
--------|----------------|---------------|------------
Alice   | 92             | 88            | 85
Bob     | 78             | 82            | 90
Carol   | 95             | 90            | 92
Question: What’s each student’s final grade? You already know how to do this in Excel:
=0.40*B2 + 0.25*C2 + 0.35*D2
For Alice: 0.40×92 + 0.25×88 + 0.35×85 = 36.8 + 22 + 29.75 = 88.55

Congratulations — you just did matrix multiplication!
Estimated Time: 4-5 hours
Difficulty: Beginner to Intermediate
Prerequisites: Vectors module
What You’ll Build: Grade calculator, photo filter app, and a simple prediction model

What You Just Did (Mathematically)

Let’s write that Excel formula as math:
import numpy as np

# Student scores as a vector
alice = np.array([92, 88, 85])  # [homework, midterm, final]

# Weights as a vector  
weights = np.array([0.40, 0.25, 0.35])

# Final grade = weighted sum
final_grade = np.dot(weights, alice)
# = 0.40×92 + 0.25×88 + 0.35×85
# = 88.55

print(f"Alice's grade: {final_grade:.1f}")
The dot product IS the weighted sum. You’ve been doing “matrix math” in Excel for years!

Scaling Up: All Students at Once

What if you have 100 students? You don’t want to calculate one by one.
# All students in a matrix (each row = one student)
scores = np.array([
    [92, 88, 85],   # Alice
    [78, 82, 90],   # Bob  
    [95, 90, 92],   # Carol
])

# Weights
weights = np.array([0.40, 0.25, 0.35])

# Calculate ALL final grades at once!
final_grades = scores @ weights  # Matrix multiplication

print("Final grades:")
print(f"  Alice: {final_grades[0]:.1f}")
print(f"  Bob: {final_grades[1]:.1f}")
print(f"  Carol: {final_grades[2]:.1f}")
Output:
Final grades:
  Alice: 88.6
  Bob: 83.2
  Carol: 92.7
That’s matrix multiplication: applying the same weighted sum to every row, all at once.

What Is a Matrix?

Now let’s get formal.

A Matrix is a Table of Numbers

# A 3×3 matrix (3 rows, 3 columns)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(A.shape)  # (3, 3)
Mathematical notation: an $m \times n$ matrix has $m$ rows and $n$ columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

where $a_{ij}$ is the element at row $i$, column $j$.
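To connect the notation to code, here is a minimal sketch of how $a_{ij}$ maps to NumPy indexing (NumPy is 0-based, so $a_{12}$ lives at A[0, 1]):

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

m, n = A.shape     # m rows, n columns
print(m, n)        # 3 3
print(A[0, 1])     # a_12 = 2 (row 1, column 2 in math notation)
print(A[2, 0])     # a_31 = 7 (row 3, column 1)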

Matrix Operations: The Complete Toolkit

Matrix Addition

Add matrices of the same size element-by-element:

$$A + B = \begin{bmatrix}a_{11} & a_{12}\\a_{21} & a_{22}\end{bmatrix} + \begin{bmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{bmatrix} = \begin{bmatrix}a_{11}+b_{11} & a_{12}+b_{12}\\a_{21}+b_{21} & a_{22}+b_{22}\end{bmatrix}$$
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A + B
print(C)
# [[6, 8], [10, 12]]

Scalar Multiplication

Multiply every element by a number:

$$cA = c\begin{bmatrix}a_{11} & a_{12}\\a_{21} & a_{22}\end{bmatrix} = \begin{bmatrix}c \cdot a_{11} & c \cdot a_{12}\\c \cdot a_{21} & c \cdot a_{22}\end{bmatrix}$$
A = np.array([[1, 2], [3, 4]])
print(3 * A)
# [[3, 6], [9, 12]]

Matrix Transpose

Flip rows and columns (swap $a_{ij}$ with $a_{ji}$):

$$\begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix}^T = \begin{bmatrix}1 & 4\\2 & 5\\3 & 6\end{bmatrix}$$
A = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Original shape: {A.shape}")    # (2, 3)
print(f"Transposed shape: {A.T.shape}") # (3, 2)
print(A.T)
# [[1, 4], [2, 5], [3, 6]]
Key properties:
  • $(A^T)^T = A$ (transpose twice = original)
  • $(A + B)^T = A^T + B^T$
  • $(AB)^T = B^T A^T$ (note the reversed order! See the check below.)
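A quick numerical check of the reversed-order rule (a minimal sketch with random matrices):

import numpy as np

A = np.random.randn(2, 3)
B = np.random.randn(3, 4)

# (AB)^T should equal B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))  # True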

Matrix Multiplication

This is the most important operation! For $C = AB$:

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = (\text{row } i \text{ of } A) \cdot (\text{column } j \text{ of } B)$$

The rule: (row) × (column) = one number, repeated for every position.
A = np.array([[1, 2], [3, 4]])  # 2×2
B = np.array([[5, 6], [7, 8]])  # 2×2

C = A @ B  # or np.matmul(A, B) or np.dot(A, B)
print(C)
# [[19, 22], [43, 50]]

# Let's verify C[0,0]:
# Row 0 of A: [1, 2]
# Col 0 of B: [5, 7]
# Dot product: 1×5 + 2×7 = 5 + 14 = 19 ✓
Worked example — step by step:

$$\begin{bmatrix}1 & 2\\3 & 4\end{bmatrix} \times \begin{bmatrix}5 & 6\\7 & 8\end{bmatrix} = \begin{bmatrix}(1)(5)+(2)(7) & (1)(6)+(2)(8)\\(3)(5)+(4)(7) & (3)(6)+(4)(8)\end{bmatrix} = \begin{bmatrix}19 & 22\\43 & 50\end{bmatrix}$$
Matrix multiplication is NOT commutative! $AB \neq BA$ in general.
print(A @ B)  # [[19, 22], [43, 50]]
print(B @ A)  # [[23, 34], [31, 46]] - Different!
Dimension rule: for $AB$ to work, the columns of $A$ must equal the rows of $B$:
  • $(m \times n) \times (n \times p) = (m \times p)$
  • The inner dimensions must match!
A = np.array([[1, 2, 3], [4, 5, 6]])  # 2×3
B = np.array([[1], [2], [3]])          # 3×1

C = A @ B  # (2×3) × (3×1) = (2×1)
print(C.shape)  # (2, 1)

The Identity Matrix

The identity matrix $I$ is like “1” for matrices — multiplying by it changes nothing:

$$I = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}, \quad AI = IA = A$$
I = np.eye(3)  # 3×3 identity matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(np.allclose(A @ I, A))  # True
print(np.allclose(I @ A, A))  # True

Matrix Inverse

The inverse $A^{-1}$ “undoes” multiplication by $A$:

$$AA^{-1} = A^{-1}A = I$$

For a 2×2 matrix:

$$A = \begin{bmatrix}a & b\\c & d\end{bmatrix}, \quad A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d & -b\\-c & a\end{bmatrix}$$

The term $ad - bc$ is called the determinant. If it’s zero, no inverse exists!
A = np.array([[4, 7], [2, 6]])

# Determinant
det = 4*6 - 7*2  # = 10

# Inverse
A_inv = np.linalg.inv(A)
print(A_inv)
# [[ 0.6, -0.7], [-0.2,  0.4]]

# Verify: A @ A_inv = I
print(np.allclose(A @ A_inv, np.eye(2)))  # True

Determinant

The determinant measures how much a matrix “scales” area (or volume). For a 2×2 matrix:

$$\det\begin{bmatrix}a & b\\c & d\end{bmatrix} = ad - bc$$

Properties (checked numerically after the next example):
  • $\det(I) = 1$
  • $\det(AB) = \det(A) \cdot \det(B)$
  • If $\det(A) = 0$, the matrix is “singular” (no inverse)
  • $|\det(A)|$ = factor by which area is scaled
A = np.array([[2, 0], [0, 3]])
print(np.linalg.det(A))  # 6.0

# This matrix scales area by 6x (2× in x-direction, 3× in y-direction)
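And a minimal sketch verifying the first two properties listed above:

import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 1.0]])

print(np.linalg.det(np.eye(2)))   # 1.0
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True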

A Matrix is a Transformation Machine

Here’s the key insight: a matrix takes a vector in and spits out a different vector.
Input Vector  →  [Matrix]  →  Output Vector
   (3 numbers)               (could be different size!)
Example: Our grade calculation
[92, 88, 85]  →  [0.40, 0.25, 0.35]  →  88.55
   3 numbers         weights           1 number
The matrix (our weights) transformed 3 scores into 1 grade.
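To make “could be a different size” concrete, here is a minimal sketch where a 2×3 matrix turns 3 numbers into 2:

import numpy as np

T = np.array([
    [1, 0, 1],   # output 1 = input 1 + input 3
    [0, 2, 0],   # output 2 = 2 × input 2
])

v = np.array([10, 20, 30])   # 3 numbers in
print(T @ v)                 # [40 40] - 2 numbers out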

Real-World Example: Photo Filters

Ever wonder how Instagram filters work? They’re matrix operations!

Brightness: Multiply Every Pixel

from PIL import Image
import numpy as np

# Load an image (each pixel is [R, G, B] from 0-255)
image = np.array(Image.open('photo.jpg'))
print(image.shape)  # e.g., (1080, 1920, 3) - height × width × RGB

# Increase brightness by 20%
brighter = image * 1.2
brighter = np.clip(brighter, 0, 255)  # Keep values in valid range

# That's scalar multiplication - a type of matrix operation!

Color Transformation: Matrix Multiplication

What if you want to make a photo look “warmer” (more red/yellow) or “cooler” (more blue)?
# A color transformation matrix
# Each row says how to calculate new [R, G, B] from old [R, G, B]
warm_filter = np.array([
    [1.2, 0.1, 0.0],   # New R = 1.2×R + 0.1×G + 0×B (boost red)
    [0.0, 1.1, 0.0],   # New G = 0×R + 1.1×G + 0×B (slight boost)
    [0.0, 0.0, 0.9],   # New B = 0×R + 0×G + 0.9×B (reduce blue)
])

# Apply to one pixel
original_pixel = np.array([100, 150, 200])  # Some blue-ish color
new_pixel = warm_filter @ original_pixel
print(f"Original: {original_pixel}")
print(f"Warmer: {new_pixel.astype(int)}")

# Output:
# Original: [100 150 200]
# Warmer: [135 165 180]  ← More red, less blue!

Grayscale: Average the Colors

# Grayscale transformation
# Human eyes are more sensitive to green, so we weight it higher
grayscale_weights = np.array([0.299, 0.587, 0.114])

pixel = np.array([100, 150, 200])
gray_value = np.dot(grayscale_weights, pixel)
print(f"Gray value: {gray_value:.0f}")  # 143
Every Instagram filter is a matrix transformation on your pixels!

Real-World Example: Predicting House Prices

Now let’s apply this to prediction — the core of machine learning.

The Setup

You have data about houses:
House | Bedrooms | Sqft | Age (years) | Price
------|----------|------|-------------|------
A     | 3        | 2000 | 10          | $350k
B     | 4        | 2500 | 5           | $450k
C     | 2        | 1200 | 20          | $250k
Question: Given a NEW house with 3 bedrooms, 1800 sqft, 15 years old — what’s the predicted price?

The Approach: Weighted Sum of Features

Just like grades! Each feature contributes to the price:
# New house features
new_house = np.array([3, 1800, 15])

# Weights (how much each feature matters)
# These would be "learned" from data in real ML
weights = np.array([
    50000,   # Each bedroom adds $50k
    150,     # Each sqft adds $150
    -3000,   # Each year of age subtracts $3k
])

# Prediction
predicted_price = np.dot(weights, new_house)
# = 50000×3 + 150×1800 + (-3000)×15
# = 150000 + 270000 - 45000
# = $375,000

print(f"Predicted price: ${predicted_price:,}")

Predicting Many Houses at Once

# Multiple houses
houses = np.array([
    [3, 2000, 10],   # House 1
    [4, 2500, 5],    # House 2
    [2, 1200, 20],   # House 3
    [3, 1800, 15],   # House 4 (our query)
])

# Predict ALL prices at once
prices = houses @ weights

print("Predicted prices:")
for i, price in enumerate(prices, 1):
    print(f"  House {i}: ${price:,}")
Output:
Predicted prices:
  House 1: $420,000
  House 2: $560,000
  House 3: $220,000
  House 4: $375,000
This is linear regression — and it’s just matrix multiplication!

The Connection to Machine Learning

Now you understand the core operation of ML. Let’s make the connection explicit.

What a Neural Network Layer Does

Every layer in a neural network does exactly what we just did:
# Simplified neural network layer
def neural_network_layer(input_vector, weights, bias):
    """
    This is literally what PyTorch/TensorFlow do!
    """
    output = np.dot(weights, input_vector) + bias
    return output

# Example: 4 input features → 2 output neurons
input_features = np.array([3, 2000, 15, 5])  # House features

# Weight matrix: 2 outputs × 4 inputs
weights = np.array([
    [50000, 150, -3000, -5000],    # Weights for output 1
    [30000, 100, -2000, -3000],    # Weights for output 2
])

biases = np.array([10000, 5000])

# Forward pass
output = neural_network_layer(input_features, weights, biases)
print(f"Layer output: {output}")
# Two output values, computed by two different weighted sums

The Pattern

Application       | Input                      | Weights           | Output
------------------|----------------------------|-------------------|------------
Grade calculation | [homework, midterm, final] | [0.4, 0.25, 0.35] | final_grade
House pricing     | [beds, sqft, age]          | [50k, 150, -3k]   | price
Photo filter      | [R, G, B]                  | 3×3 color matrix  | [R’, G’, B’]
Neural network    | features                   | learned weights   | predictions
The math is identical. Only the application changes.

Example 2: Student Grade Prediction

The Model

# Student features: [study_hours, prev_score, assignments, attendance]
student = np.array([12, 85, 8, 90])

# Weights: how much each feature contributes to the predicted grade
weights = np.array([2.5, 0.3, 1.5, 0.2])

final_grade = np.dot(weights, student)
# = (2.5×12) + (0.3×85) + (1.5×8) + (0.2×90)
# = 30 + 25.5 + 12 + 18
# = 85.5

print(f"Predicted final grade: {final_grade:.1f}")

Batch Prediction for Entire Class

# All students in class
students = np.array([
    [12, 85, 8, 90],   # Student A
    [8, 70, 6, 75],    # Student B
    [15, 95, 10, 95],  # Student C
])

# Predict all grades at once
grades = students @ weights
print(grades)  # [85.5, 65.0, 100.0]
Real application: Learning management systems use this to predict which students need help!

Example 3: Movie Rating Prediction

The Model

# Movie features
movie = np.array([
    0.9,    # action score (0-1)
    0.1,    # romance score
    0.3,    # comedy score
    8.5,    # IMDb rating
    2020    # release year
])

# User preferences (weights)
user_weights = np.array([
    5,      # Loves action (+5 points)
    -2,     # Dislikes romance (-2 points)
    3,      # Likes comedy (+3 points)
    0.5,    # Considers IMDb rating
    -0.001  # Slight preference for newer movies
])

# Predict user's rating
predicted_rating = np.dot(user_weights, movie)
# = (5×0.9) + (-2×0.1) + (3×0.3) + (0.5×8.5) + (-0.001×2020)
# = 4.5 - 0.2 + 0.9 + 4.25 - 2.02
# = 7.43

print(f"Predicted rating: {predicted_rating:.1f}/10")

Recommend Movies to User

# Movie database
movies = np.array([
    [0.9, 0.1, 0.3, 8.5, 2020],  # Action movie
    [0.2, 0.9, 0.1, 7.5, 2019],  # Romance movie
    [0.7, 0.2, 0.8, 8.0, 2021],  # Action-comedy
])

# Predict ratings for all movies
ratings = movies @ user_weights
print(ratings)  # [7.43, 1.231, 7.479]

# Recommend highest-rated movie
best_movie = np.argmax(ratings)
print(f"Recommended: Movie {best_movie}")  # Movie 2 (action-comedy)
Real application: Netflix uses matrices to predict your ratings for millions of movies!

Matrix Operations

1. Matrix Addition

When to use: Combining multiple models, updating parameters
# Two prediction models
model_1 = np.array([[1, 2], [3, 4]])
model_2 = np.array([[5, 6], [7, 8]])

# Ensemble: average the models
ensemble = (model_1 + model_2) / 2
print(ensemble)
# [[3, 4],
#  [5, 6]]

2. Scalar Multiplication

When to use: Scaling predictions, learning rates
# Original weights
weights = np.array([[50000, 120, -5000, -8000]])

# Scale down for regularization
scaled_weights = 0.9 * weights

3. Matrix Multiplication

The Most Important Operation! Rule: (m×n) matrix × (n×p) matrix = (m×p) matrix. The inner dimensions must match!
# Example: Neural network layer
# Input: 3 features → Output: 2 neurons

# Weight matrix (2 neurons × 3 features)
W = np.array([
    [0.5, -0.3, 0.8],  # Weights for neuron 1
    [0.2, 0.6, -0.4]   # Weights for neuron 2
])

# Input vector (3 features)
x = np.array([1.0, 2.0, 3.0])

# Output (2 neurons)
y = W @ x
# Neuron 1: (0.5×1.0) + (-0.3×2.0) + (0.8×3.0) = 2.3
# Neuron 2: (0.2×1.0) + (0.6×2.0) + (-0.4×3.0) = 0.2

print(y)  # [2.3, 0.2]
Key Insight: Every neural network layer is just matrix multiplication!

Matrix Multiplication: Three Examples

Example 1: Multi-Output House Prediction

Predict multiple outputs from house features:
# House features
house = np.array([3, 2000, 15, 5])

# Weight matrix (3 outputs × 4 features)
W = np.array([
    [50000, 120, -5000, -8000],   # Price prediction
    [0.1, 0.0003, -0.01, -0.02],  # Desirability score (illustrative toy weights)
    [0.5, 0.001, 0.05, 0.1]       # Months on market (illustrative toy weights)
])

# Predict all outputs
outputs = W @ house
print(f"Price: ${outputs[0]:,.0f}")
print(f"Desirability: {outputs[1]:.2f}")
print(f"Time to sell: {outputs[2]:.1f} months")
Output:
Price: $275,000
Desirability: 0.65
Time to sell: 4.8 months

Example 2: Student Performance Across Subjects

Predict grades in multiple subjects:
# Student features
student = np.array([12, 85, 8, 90])  # [study_hours, prev_score, assignments, attendance]

# Weight matrix (3 subjects × 4 features)
W = np.array([
    [2.5, 0.3, 1.5, 0.2],   # Math
    [2.0, 0.4, 1.0, 0.3],   # English
    [3.0, 0.2, 2.0, 0.1]    # Science
])

# Predict all subject grades
grades = W @ student
print(f"Math: {grades[0]:.1f}")
print(f"English: {grades[1]:.1f}")
print(f"Science: {grades[2]:.1f}")

Example 3: Multi-User Movie Recommendations

Predict ratings for multiple users:
# Movie features
movie = np.array([0.9, 0.1, 0.3, 8.5, 2020])

# User preference matrix (3 users × 5 features)
U = np.array([
    [5, -2, 3, 0.5, -0.001],    # User 1: loves action
    [-1, 5, 2, 0.3, -0.002],    # User 2: loves romance
    [3, 1, 4, 0.4, 0]           # User 3: balanced
])

# Predict ratings for all users
ratings = U @ movie
print(f"User 1 rating: {ratings[0]:.1f}")
print(f"User 2 rating: {ratings[1]:.1f}")
print(f"User 3 rating: {ratings[2]:.1f}")

Matrix Transpose

Definition: Flip rows and columns
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

A_T = A.T
print(A_T)
# [[1, 4],
#  [2, 5],
#  [3, 6]]
Why it matters: Used in backpropagation!
# Forward pass
X = np.array([[1, 2, 3]])  # (1×3) input
W = np.array([[0.5], [0.3], [0.8]])  # (3×1) weights
y = X @ W  # (1×1) output

# Backward pass (gradient)
dy = np.array([[1.0]])  # Gradient from loss
dW = X.T @ dy  # (3×1) gradient for weights

Identity Matrix

Definition: The matrix that doesn’t change vectors: $I\mathbf{v} = \mathbf{v}$
I = np.eye(3)  # 3×3 identity
print(I)
# [[1, 0, 0],
#  [0, 1, 0],
#  [0, 0, 1]]

v = np.array([5, 10, 15])
result = I @ v
print(result)  # [5, 10, 15] (unchanged!)
Why it matters: Used in regularization, initialization, and matrix inversion.
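For example, ridge regression adds a scaled identity matrix so the matrix being inverted stays well-behaved (a minimal sketch with made-up data):

import numpy as np

# Ridge regression: w = (X^T X + lambda*I)^(-1) X^T y
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 5.0]])  # feature matrix
y = np.array([1.0, 2.0, 2.5])                       # targets
lam = 0.1                                           # regularization strength

w = np.linalg.inv(X.T @ X + lam * np.eye(2)) @ X.T @ y
print(w)  # learned weights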

Matrix Inverse

Definition: The matrix that “undoes” another matrix: $A A^{-1} = I$
A = np.array([
    [4, 7],
    [2, 6]
])

A_inv = np.linalg.inv(A)
print(A_inv)

# Verify
print(A @ A_inv)  # ≈ Identity matrix
Application: Solving linear equations
# Solve: Ax = b for x
A = np.array([[2, 1], [1, 3]])
b = np.array([5, 7])

# Solution: x = A^(-1) b
x = np.linalg.inv(A) @ b
print(x)  # [1.6, 1.8]

# Verify
print(A @ x)  # [5, 7] ✓
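In practice, np.linalg.solve is usually preferred over forming the inverse explicitly (it is faster and more numerically stable). A minimal sketch with the same system:

import numpy as np

A = np.array([[2, 1], [1, 3]])
b = np.array([5, 7])

x = np.linalg.solve(A, b)  # solves Ax = b without computing A^(-1)
print(x)                   # [1.6 1.8]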

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises connect matrix operations to real applications you use every day.

Exercise 1: Instagram Color Filters 📸

Create custom photo filters using matrix transformations:
import numpy as np

# A pixel is represented as [R, G, B] from 0-255
sample_pixel = np.array([180, 120, 90])  # A warm skin tone

# TODO: Create these filter matrices and apply them
# 1. Sepia filter (vintage brown look)
# 2. Negative filter (invert colors)
# 3. Custom "sunset" filter (boost reds/oranges)
Tasks:
  1. Design a 3×3 sepia transformation matrix
  2. Apply it to the sample pixel
  3. Create your own artistic filter
import numpy as np

sample_pixel = np.array([180, 120, 90])

# 1. Sepia Filter Matrix (classic vintage look)
sepia = np.array([
    [0.393, 0.769, 0.189],  # New R
    [0.349, 0.686, 0.168],  # New G
    [0.272, 0.534, 0.131],  # New B
])

# 2. Negative Filter (invert)
negative = np.array([
    [-1, 0, 0],
    [0, -1, 0],
    [0, 0, -1],
])

# 3. Sunset Filter (warm, golden hour look)
sunset = np.array([
    [1.2, 0.1, 0.0],   # Boost red
    [0.1, 1.0, 0.0],   # Slight red bleed to green
    [0.0, 0.0, 0.7],   # Reduce blue
])

print("Original pixel:", sample_pixel)
print("-" * 40)

# Apply Sepia
sepia_result = np.clip(sepia @ sample_pixel, 0, 255).astype(int)
print(f"Sepia filter:   {sepia_result}")

# Apply Negative (add 255 to invert)
negative_result = np.clip(negative @ sample_pixel + 255, 0, 255).astype(int)
print(f"Negative:       {negative_result}")

# Apply Sunset
sunset_result = np.clip(sunset @ sample_pixel, 0, 255).astype(int)
print(f"Sunset filter:  {sunset_result}")

# Output:
# Original pixel: [180 120  90]
# Sepia filter:   [180 160 124]  ← Warm vintage brown
# Negative:       [ 75 135 165]  ← Inverted colors
# Sunset filter:  [228 138  63]  ← Golden warm tones
Real-World Insight: Instagram’s filters are exactly this - matrix multiplications applied to every pixel! The “Clarendon” filter boosts contrast, “Gingham” adds vintage fade.

Exercise 2: Multi-Store Inventory Management 🏪

A retail company has 3 stores and 4 products. Calculate total revenue using matrix multiplication:
# Inventory: rows = stores, cols = products
inventory = np.array([
    [50, 30, 100, 25],   # Store A
    [80, 45, 60, 40],    # Store B
    [35, 60, 80, 55],    # Store C
])

# Prices per product
prices = np.array([29.99, 49.99, 9.99, 79.99])

# Units sold (percentage of inventory)
sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],   # Store A
    [0.7, 0.8, 0.7, 0.6],   # Store B
    [0.9, 0.5, 0.8, 0.7],   # Store C
])
Tasks:
  1. Calculate units sold at each store (element-wise multiply inventory × sales_rate)
  2. Calculate total revenue per store (matrix × price vector)
  3. Which store had the highest revenue?
import numpy as np

inventory = np.array([
    [50, 30, 100, 25],   # Store A
    [80, 45, 60, 40],    # Store B
    [35, 60, 80, 55],    # Store C
])

prices = np.array([29.99, 49.99, 9.99, 79.99])

sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],
    [0.7, 0.8, 0.7, 0.6],
    [0.9, 0.5, 0.8, 0.7],
])

# 1. Units sold at each store (element-wise)
units_sold = inventory * sales_rate
print("Units Sold per Store:")
print(units_sold)

# 2. Revenue per store (matrix-vector multiplication)
revenue_per_store = units_sold @ prices
print("\n💰 Revenue per Store:")
stores = ['Store A', 'Store B', 'Store C']
for store, rev in zip(stores, revenue_per_store):
    print(f"  {store}: ${rev:,.2f}")

# 3. Best performing store
best_store = stores[np.argmax(revenue_per_store)]
print(f"\n🏆 Highest Revenue: {best_store} (${max(revenue_per_store):,.2f})")

# Output (approximate):
# Revenue per Store:
#   Store A: $3,998.40
#   Store B: $5,818.42
#   Store C: $6,163.36  ← Winner!

# Bonus: Revenue breakdown by product
print("\n📊 Revenue by Product (all stores):")
product_revenue = units_sold.sum(axis=0) * prices
products = ['Product 1', 'Product 2', 'Product 3', 'Product 4']
for prod, rev in zip(products, product_revenue):
    print(f"  {prod}: ${rev:,.2f}")
Real-World Insight: This is how Walmart, Target, and Amazon calculate daily revenue across thousands of stores and millions of products - all matrix operations!

Exercise 3: Simple Neural Network Forward Pass 🧠

Implement a tiny neural network using only matrix operations:
import numpy as np

# Input: 4 features (e.g., house: beds, baths, sqft/1000, age/10)
X = np.array([[3, 2, 1.5, 1.5]])  # 1 house

# Layer 1: 4 inputs → 3 neurons
W1 = np.array([
    [0.5, 0.3, 0.8, -0.2],
    [0.2, 0.6, 0.4, -0.1],
    [0.7, 0.1, 0.5, -0.3],
])
b1 = np.array([0.1, 0.2, 0.1])

# Layer 2: 3 neurons → 1 output (price)
W2 = np.array([[0.8, 0.5, 0.6]])
b2 = np.array([0.1])
Tasks:
  1. Compute the output of Layer 1: h = X @ W1.T + b1
  2. Apply ReLU activation: h = max(0, h)
  3. Compute final output: y = h @ W2.T + b2
  4. What’s the predicted price?
import numpy as np

# Input
X = np.array([[3, 2, 1.5, 1.5]])

# Layer 1 weights and bias
W1 = np.array([
    [0.5, 0.3, 0.8, -0.2],
    [0.2, 0.6, 0.4, -0.1],
    [0.7, 0.1, 0.5, -0.3],
])
b1 = np.array([0.1, 0.2, 0.1])

# Layer 2 weights and bias
W2 = np.array([[0.8, 0.5, 0.6]])
b2 = np.array([0.1])

def relu(x):
    return np.maximum(0, x)

# Forward pass
print("🧠 Neural Network Forward Pass")
print("=" * 40)

# Layer 1
z1 = X @ W1.T + b1
print(f"Layer 1 pre-activation (z1): {z1}")

h1 = relu(z1)
print(f"Layer 1 post-ReLU (h1):      {h1}")

# Layer 2 (output)
z2 = h1 @ W2.T + b2
print(f"\nFinal output (z2): {z2}")

# Interpret as price (scale back)
predicted_price = z2[0, 0] * 100000  # Scale factor
print(f"\n🏠 Predicted House Price: ${predicted_price:,.0f}")

# Output (approximate):
# Layer 1 pre-activation: [[3.1  2.45 2.7 ]]
# Layer 1 post-ReLU:      [[3.1  2.45 2.7 ]]
# Final output: [[5.425]]
# Predicted House Price: $542,500
Real-World Insight: This is EXACTLY how PyTorch and TensorFlow work under the hood! Every deep learning model is just chains of matrix multiplications with non-linear activations.

Exercise 4: Cryptography - Hill Cipher 🔐

The Hill Cipher uses matrix multiplication to encrypt messages:
import numpy as np

# Encryption key (2x2 matrix)
key = np.array([
    [3, 3],
    [2, 5]
])

# Message: "HI" → Convert to numbers (A=0, B=1, ... Z=25)
# H=7, I=8
message = np.array([[7], [8]])

# Encrypt: ciphertext = key @ message (mod 26)
Tasks:
  1. Encrypt the message “HI”
  2. Find the decryption key (matrix inverse mod 26)
  3. Decrypt back to the original message
import numpy as np

def mod_inverse(a, m):
    """Find modular multiplicative inverse of a mod m"""
    for x in range(1, m):
        if (a * x) % m == 1:
            return x
    return None

# Encryption key
key = np.array([
    [3, 3],
    [2, 5]
])

# Message: "HI"
message = np.array([[7], [8]])  # H=7, I=8
print(f"Original message: HI → {message.flatten()}")

# 1. ENCRYPT
ciphertext = (key @ message) % 26
print(f"Encrypted: {ciphertext.flatten()}")

# Convert back to letters
cipher_letters = ''.join([chr(c + ord('A')) for c in ciphertext.flatten()])
print(f"Ciphertext letters: {cipher_letters}")

# 2. FIND DECRYPTION KEY
# Key inverse (mod 26) = (1/det) * adjugate (mod 26)
det = int(round(np.linalg.det(key))) % 26  # det = 3*5 - 3*2 = 9 (round guards against float error)
det_inv = mod_inverse(det, 26)  # 9^(-1) mod 26 = 3

# Adjugate matrix
adjugate = np.array([
    [5, -3],
    [-2, 3]
])

# Decryption key
key_inv = (det_inv * adjugate) % 26
print(f"\nDecryption key:\n{key_inv}")

# 3. DECRYPT
decrypted = (key_inv @ ciphertext) % 26
print(f"\nDecrypted: {decrypted.flatten()}")

# Convert back to letters
original = ''.join([chr(int(c) + ord('A')) for c in decrypted.flatten()])
print(f"Original message: {original}")

# Output:
# Original message: HI → [7 8]
# Encrypted: [19 2]
# Ciphertext letters: TC
# Decryption key: [[15 17] [20  9]]
# Decrypted: [[7] [8]]
# Original message: HI ✓
Real-World Insight: While Hill Cipher is breakable, modern encryption (RSA, AES) uses similar matrix operations in much larger spaces. Your HTTPS connection uses these principles!

Key Takeaways

  • Matrices are functions that transform vectors
  • Matrix multiplication = applying transformations
  • Neural networks = stacked matrix multiplications
  • Batch processing = process many inputs at once
  • Transpose = used in backpropagation
  • Inverse = solving equations, undoing transformations

GPU Computing: Why Matrices Matter for Speed

Why GPUs are 100x faster: GPUs have thousands of cores designed for parallel matrix operations.
import numpy as np
import time

# CPU: Sequential operations
n = 5000
A = np.random.randn(n, n)
B = np.random.randn(n, n)

start = time.time()
C = A @ B  # Matrix multiplication
cpu_time = time.time() - start

print(f"CPU: {cpu_time:.2f}s for {n}x{n} matrix multiply")
# CPU: ~2-5 seconds

# On GPU (with PyTorch/CUDA):
# import torch
# A_gpu = torch.randn(n, n, device='cuda')
# B_gpu = torch.randn(n, n, device='cuda')
# C_gpu = A_gpu @ B_gpu  # ~0.02 seconds!
Why this matters:
  • GPT-4 training: 25,000 GPUs × months
  • Each forward pass: trillions of matrix operations
  • Without GPU optimization: would take centuries!

Interview Questions: Matrices

Question: How does a neural network layer use matrix operations?

Answer: A neural network layer computes

$$\mathbf{h} = \sigma(W\mathbf{x} + \mathbf{b})$$

where:
  • $W$ is the weight matrix (learned parameters)
  • $\mathbf{x}$ is the input vector
  • $\mathbf{b}$ is the bias vector
  • $\sigma$ is the activation function (ReLU, sigmoid, etc.)
Matrix multiplication $W\mathbf{x}$ computes weighted sums of inputs. The bias shifts the result. The activation adds non-linearity, enabling the network to learn complex patterns.
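A minimal sketch of that formula in NumPy, with made-up sizes (3 inputs, 2 outputs) and ReLU as the activation:

import numpy as np

def relu(z):
    return np.maximum(0, z)

W = np.array([[0.5, -0.3, 0.8],
              [0.2, 0.6, -0.4]])  # weight matrix: 3 inputs → 2 outputs
x = np.array([1.0, 2.0, 3.0])     # input vector
b = np.array([0.1, -0.1])         # bias vector

h = relu(W @ x + b)               # h = σ(Wx + b)
print(h)                          # [2.4 0.1]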
Question: Why is batch processing important in deep learning?

Answer: Batch processing (processing multiple samples simultaneously) is crucial because:
  1. GPU efficiency: GPUs parallelize matrix operations; single samples waste capacity
  2. Gradient stability: Averaging gradients over a batch reduces noise
  3. Memory efficiency: One matrix multiply instead of N vector operations (see the sketch below)
  4. Modern batch sizes: 32-8192 depending on task and memory
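The core idea: stacking samples as rows turns N matrix-vector products into one matrix-matrix product (a minimal sketch with made-up shapes):

import numpy as np

W = np.random.randn(2, 4)       # layer weights: 4 features → 2 outputs
batch = np.random.randn(32, 4)  # 32 samples, 4 features each

# One sample at a time: 32 separate matrix-vector products
one_by_one = np.stack([W @ x for x in batch])

# Whole batch at once: a single matrix-matrix product
all_at_once = batch @ W.T       # shape (32, 2)

print(np.allclose(one_by_one, all_at_once))  # True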
Question: What is the computational complexity of matrix multiplication?

Answer: For two $n \times n$ matrices:
  • Naive algorithm: $O(n^3)$ - each of the $n^2$ outputs requires $n$ multiplications (sketched below)
  • Strassen’s algorithm: $O(n^{2.807})$ - faster but less numerically stable
  • Best known: $O(n^{2.373})$ - theoretical, not practical
In practice, optimized libraries (BLAS, cuBLAS) use cache-aware algorithms that approach theoretical limits.
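For intuition, here is the naive triple loop next to NumPy’s optimized routine (a minimal sketch; absolute timings depend on your machine):

import numpy as np

def naive_matmul(A, B):
    """Naive O(n^3) multiplication: C[i,j] = sum over k of A[i,k] * B[k,j]."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.random.randn(100, 100)
B = np.random.randn(100, 100)
print(np.allclose(naive_matmul(A, B), A @ B))  # True - same result, far slower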

What’s Next?

You now understand how matrices transform data. But which transformations are most important? Which directions in your data carry the most information? That’s where eigenvalues and eigenvectors come in - they reveal the “natural axes” of your data!

Next: Eigenvalues & Eigenvectors

Discover which house features matter most for price prediction