Matrices & Linear Transformations

A Problem You Already Understand: Calculating Final Grades

You’re a teacher with a spreadsheet of student data:

Student	Homework (40%)	Midterm (25%)	Final (35%)
Alice	92	88	85
Bob	78	82	90
Carol	95	90	92

Question: What’s each student’s final grade? You already know how to do this in Excel:

=0.40*B2 + 0.25*C2 + 0.35*D2

For Alice: 0.40×92 + 0.25×88 + 0.35×85 = 36.8 + 22 + 29.75 = 88.55 Congratulations — you just did matrix multiplication!

Estimated Time: 4-5 hours
Difficulty: Beginner to Intermediate
Prerequisites: Vectors module
What You’ll Build: Grade calculator, photo filter app, and a simple prediction model

What You Just Did (Mathematically)

Let’s write that Excel formula as math:

import numpy as np

# Student scores as a vector
alice = np.array([92, 88, 85])  # [homework, midterm, final]

# Weights as a vector  
weights = np.array([0.40, 0.25, 0.35])

# Final grade = weighted sum
final_grade = np.dot(weights, alice)
# = 0.40×92 + 0.25×88 + 0.35×85
# = 88.55

print(f"Alice's grade: {final_grade:.1f}")

The dot product IS the weighted sum. You’ve been doing “matrix math” in Excel for years!

Scaling Up: All Students at Once

What if you have 100 students? You don’t want to calculate one by one.

# All students in a matrix (each row = one student)
scores = np.array([
    [92, 88, 85],   # Alice
    [78, 82, 90],   # Bob  
    [95, 90, 92],   # Carol
])

# Weights
weights = np.array([0.40, 0.25, 0.35])

# Calculate ALL final grades at once!
final_grades = scores @ weights  # Matrix multiplication

print("Final grades:")
print(f"  Alice: {final_grades[0]:.1f}")
print(f"  Bob: {final_grades[1]:.1f}")
print(f"  Carol: {final_grades[2]:.1f}")

Output:

Final grades:
  Alice: 88.6
  Bob: 83.2
  Carol: 92.2

That’s matrix multiplication: applying the same weighted sum to every row, all at once.

What Is a Matrix?

Now let’s get formal.

A Matrix is a Table of Numbers

# A 3×3 matrix (3 rows, 3 columns)
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(A.shape)  # (3, 3)

Mathematical notation: An

m \times n

matrix has

m

rows and

n

columns:

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}

Where

a_{ij}

is the element at row

i

, column

j

Matrix Operations: The Complete Toolkit

Matrix Addition

Add matrices of the same size element-by-element:

A + B = \begin{bmatrix}a_{11} & a_{12}\\a_{21} & a_{22}\end{bmatrix} + \begin{bmatrix}b_{11} & b_{12}\\b_{21} & b_{22}\end{bmatrix} = \begin{bmatrix}a_{11}+b_{11} & a_{12}+b_{12}\\a_{21}+b_{21} & a_{22}+b_{22}\end{bmatrix}

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A + B
print(C)
# [[6, 8], [10, 12]]

Scalar Multiplication

Multiply every element by a number:

cA = c \begin{bmatrix}a_{11} & a_{12}\\a_{21} & a_{22}\end{bmatrix} = \begin{bmatrix}c \cdot a_{11} & c \cdot a_{12}\\c \cdot a_{21} & c \cdot a_{22}\end{bmatrix}

A = np.array([[1, 2], [3, 4]])
print(3 * A)
# [[3, 6], [9, 12]]

Matrix Transpose

Flip rows and columns (swap

a_{ij}

with

a_{ji}

A^T = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix}^T = \begin{bmatrix}1 & 4\\2 & 5\\3 & 6\end{bmatrix}

A = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Original shape: {A.shape}")    # (2, 3)
print(f"Transposed shape: {A.T.shape}") # (3, 2)
print(A.T)
# [[1, 4], [2, 5], [3, 6]]

Key properties:

$(A^T)^T = A$ (transpose twice = original)
$(A + B)^T = A^T + B^T$
$(AB)^T = B^T A^T$ (note the reversed order!)

Matrix Multiplication

This is the most important operation! For

C = AB

C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = \text{(row } i \text{ of } A) \cdot \text{(column } j \text{ of } B)

The rule: (row) × (column) = one number, repeated for every position.

A = np.array([[1, 2], [3, 4]])  # 2×2
B = np.array([[5, 6], [7, 8]])  # 2×2

C = A @ B  # or np.matmul(A, B) or np.dot(A, B)
print(C)
# [[19, 22], [43, 50]]

# Let's verify C[0,0]:
# Row 0 of A: [1, 2]
# Col 0 of B: [5, 7]
# Dot product: 1×5 + 2×7 = 5 + 14 = 19 ✓

Worked example — step by step:

\begin{bmatrix}1 & 2\\3 & 4\end{bmatrix} \times \begin{bmatrix}5 & 6\\7 & 8\end{bmatrix} = \begin{bmatrix}(1)(5)+(2)(7) & (1)(6)+(2)(8)\\(3)(5)+(4)(7) & (3)(6)+(4)(8)\end{bmatrix} = \begin{bmatrix}19 & 22\\43 & 50\end{bmatrix}

Matrix multiplication is NOT commutative!

AB \neq BA

in general.

print(A @ B)  # [[19, 22], [43, 50]]
print(B @ A)  # [[23, 34], [31, 46]] - Different!

Dimension rule: For

AB

to work, columns of

A

must equal rows of

B

$(m \times n) \times (n \times p) = (m \times p)$
The inner dimensions must match!

A = np.array([[1, 2, 3], [4, 5, 6]])  # 2×3
B = np.array([[1], [2], [3]])          # 3×1

C = A @ B  # (2×3) × (3×1) = (2×1)
print(C.shape)  # (2, 1)

The Identity Matrix

The identity matrix

I

is like “1” for matrices — multiplying by it changes nothing:

I = \begin{bmatrix}1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix}, \quad AI = IA = A

I = np.eye(3)  # 3×3 identity matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(np.allclose(A @ I, A))  # True
print(np.allclose(I @ A, A))  # True

Matrix Inverse

The inverse

A^{-1}

“undoes” multiplication by

A

AA^{-1} = A^{-1}A = I

For a 2×2 matrix:

A = \begin{bmatrix}a & b\\c & d\end{bmatrix}, \quad A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d & -b\\-c & a\end{bmatrix}

The term

ad - bc

is called the determinant. If it’s zero, no inverse exists!

A = np.array([[4, 7], [2, 6]])

# Determinant
det = 4*6 - 7*2  # = 10

# Inverse
A_inv = np.linalg.inv(A)
print(A_inv)
# [[ 0.6, -0.7], [-0.2,  0.4]]

# Verify: A @ A_inv = I
print(np.allclose(A @ A_inv, np.eye(2)))  # True

Determinant

The determinant measures how much a matrix “scales” area (or volume): 2×2 determinant:

\det\begin{bmatrix}a & b\\c & d\end{bmatrix} = ad - bc

Properties:

$\det(I) = 1$
$\det(AB) = \det(A) \cdot \det(B)$
If $\det(A) = 0$ , the matrix is “singular” (no inverse)
$|\det(A)|$ = factor by which area is scaled

A = np.array([[2, 0], [0, 3]])
print(np.linalg.det(A))  # 6.0

# This matrix scales area by 6x (2× in x-direction, 3× in y-direction)

A Matrix is a Transformation Machine

Here’s the key insight: A matrix takes a vector in, and spits a different vector out.

Input Vector  →  [Matrix]  →  Output Vector
   (3 numbers)               (could be different size!)

Example: Our grade calculation

[92, 88, 85]  →  [0.40, 0.25, 0.35]  →  88.55
   3 numbers         weights           1 number

The matrix (our weights) transformed 3 scores into 1 grade.

Real-World Example: Photo Filters

Ever wonder how Instagram filters work? They’re matrix operations!

Brightness: Multiply Every Pixel

from PIL import Image
import numpy as np

# Load an image (each pixel is [R, G, B] from 0-255)
image = np.array(Image.open('photo.jpg'))
print(image.shape)  # e.g., (1080, 1920, 3) - height × width × RGB

# Increase brightness by 20%
brighter = image * 1.2
brighter = np.clip(brighter, 0, 255)  # Keep values in valid range

# That's scalar multiplication - a type of matrix operation!

Color Transformation: Matrix Multiplication

What if you want to make a photo look “warmer” (more red/yellow) or “cooler” (more blue)?

# A color transformation matrix
# Each row says how to calculate new [R, G, B] from old [R, G, B]
warm_filter = np.array([
    [1.2, 0.1, 0.0],   # New R = 1.2×R + 0.1×G + 0×B (boost red)
    [0.0, 1.1, 0.0],   # New G = 0×R + 1.1×G + 0×B (slight boost)
    [0.0, 0.0, 0.9],   # New B = 0×R + 0×G + 0.9×B (reduce blue)
])

# Apply to one pixel
original_pixel = np.array([100, 150, 200])  # Some blue-ish color
new_pixel = warm_filter @ original_pixel
print(f"Original: {original_pixel}")
print(f"Warmer: {new_pixel.astype(int)}")

# Output:
# Original: [100 150 200]
# Warmer: [135 165 180]  ← More red, less blue!

Grayscale: Average the Colors

# Grayscale transformation
# Human eyes are more sensitive to green, so we weight it higher
grayscale_weights = np.array([0.299, 0.587, 0.114])

pixel = np.array([100, 150, 200])
gray_value = np.dot(grayscale_weights, pixel)
print(f"Gray value: {gray_value:.0f}")  # 143

Every Instagram filter is a matrix transformation on your pixels!

Real-World Example: Predicting House Prices

Now let’s apply this to prediction — the core of machine learning.

The Setup

You have data about houses:

House	Bedrooms	Sqft	Age (years)	Price
A	3	2000	10	$350k
B	4	2500	5	$450k
C	2	1200	20	$250k

Question: Given a NEW house with 3 bedrooms, 1800 sqft, 15 years old — what’s the predicted price?

The Approach: Weighted Sum of Features

Just like grades! Each feature contributes to the price:

# New house features
new_house = np.array([3, 1800, 15])

# Weights (how much each feature matters)
# These would be "learned" from data in real ML
weights = np.array([
    50000,   # Each bedroom adds $50k
    150,     # Each sqft adds $150
    -3000,   # Each year of age subtracts $3k
])

# Prediction
predicted_price = np.dot(weights, new_house)
# = 50000×3 + 150×1800 + (-3000)×15
# = 150000 + 270000 - 45000
# = $375,000

print(f"Predicted price: ${predicted_price:,}")

Predicting Many Houses at Once

# Multiple houses
houses = np.array([
    [3, 2000, 10],   # House 1
    [4, 2500, 5],    # House 2
    [2, 1200, 20],   # House 3
    [3, 1800, 15],   # House 4 (our query)
])

# Predict ALL prices at once
prices = houses @ weights

print("Predicted prices:")
for i, price in enumerate(prices, 1):
    print(f"  House {i}: ${price:,}")

Output:

Predicted prices:
  House 1: $420,000
  House 2: $560,000
  House 3: $280,000
  House 4: $375,000

This is linear regression — and it’s just matrix multiplication!

The Connection to Machine Learning

Now you understand the core operation of ML. Let’s make the connection explicit.

What a Neural Network Layer Does

Every layer in a neural network does exactly what we just did:

# Simplified neural network layer
def neural_network_layer(input_vector, weights, bias):
    """
    This is literally what PyTorch/TensorFlow do!
    """
    output = np.dot(weights, input_vector) + bias
    return output

# Example: 4 input features → 2 output neurons
input_features = np.array([3, 2000, 15, 5])  # House features

# Weight matrix: 2 outputs × 4 inputs
weights = np.array([
    [50000, 150, -3000, -5000],    # Weights for output 1
    [30000, 100, -2000, -3000],    # Weights for output 2
])

biases = np.array([10000, 5000])

# Forward pass
output = neural_network_layer(input_features, weights, biases)
print(f"Layer output: {output}")
# Two output values, computed by two different weighted sums

The Pattern

Application	Input	Weights	Output
Grade calculation	[homework, midterm, final]	[0.4, 0.25, 0.35]	final_grade
House pricing	[beds, sqft, age]	[50k, 150, -3k]	price
Photo filter	[R, G, B]	3×3 color matrix	[R’, G’, B’]
Neural network	features	learned weights	predictions

The math is identical. Only the application changes.

final_grade = np.dot(weights, student)
# = (2.5×12) + (0.3×85) + (1.5×8) + (0.2×90)
# = 30 + 25.5 + 12 + 18
# = 85.5

print(f"Predicted final grade: {final_grade:.1f}")

Batch Prediction for Entire Class

# All students in class
students = np.array([
    [12, 85, 8, 90],   # Student A
    [8, 70, 6, 75],    # Student B
    [15, 95, 10, 95],  # Student C
])

# Predict all grades at once
grades = students @ weights
print(grades)  # [85.5, 63.5, 102.5]

Real application: Learning management systems use this to predict which students need help!

Example 3: Movie Rating Prediction

The Model

# Movie features
movie = np.array([
    0.9,    # action score (0-1)
    0.1,    # romance score
    0.3,    # comedy score
    8.5,    # IMDb rating
    2020    # release year
])

# User preferences (weights)
user_weights = np.array([
    5,      # Loves action (+5 points)
    -2,     # Dislikes romance (-2 points)
    3,      # Likes comedy (+3 points)
    0.5,    # Considers IMDb rating
    -0.001  # Slight preference for newer movies
])

# Predict user's rating
predicted_rating = np.dot(user_weights, movie)
# = (5×0.9) + (-2×0.1) + (3×0.3) + (0.5×8.5) + (-0.001×2020)
# = 4.5 - 0.2 + 0.9 + 4.25 - 2.02
# = 7.43

print(f"Predicted rating: {predicted_rating:.1f}/10")

# Movie database
movies = np.array([
    [0.9, 0.1, 0.3, 8.5, 2020],  # Action movie
    [0.2, 0.9, 0.1, 7.5, 2019],  # Romance movie
    [0.7, 0.2, 0.8, 8.0, 2021],  # Action-comedy
])

# Predict ratings for all movies
ratings = movies @ user_weights
print(ratings)  # [7.43, 2.73, 8.88]

# Recommend highest-rated movie
best_movie = np.argmax(ratings)
print(f"Recommended: Movie {best_movie}")  # Movie 2 (action-comedy)

Real application: Netflix uses matrices to predict your ratings for millions of movies!

Matrix Operations

1. Matrix Addition

When to use: Combining multiple models, updating parameters

# Two prediction models
model_1 = np.array([[1, 2], [3, 4]])
model_2 = np.array([[5, 6], [7, 8]])

# Ensemble: average the models
ensemble = (model_1 + model_2) / 2
print(ensemble)
# [[3, 4],
#  [5, 6]]

2. Scalar Multiplication

When to use: Scaling predictions, learning rates

# Original weights
weights = np.array([[50000, 120, -5000, -8000]])

# Scale down for regularization
scaled_weights = 0.9 * weights

3. Matrix Multiplication

The Most Important Operation! Rule: (m×n) matrix × (n×p) matrix = (m×p) matrix The inner dimensions must match!

# Example: Neural network layer
# Input: 3 features → Output: 2 neurons

# Weight matrix (2 neurons × 3 features)
W = np.array([
    [0.5, -0.3, 0.8],  # Weights for neuron 1
    [0.2, 0.6, -0.4]   # Weights for neuron 2
])

# Input vector (3 features)
x = np.array([1.0, 2.0, 3.0])

# Output (2 neurons)
y = W @ x
# Neuron 1: (0.5×1.0) + (-0.3×2.0) + (0.8×3.0) = 2.3
# Neuron 2: (0.2×1.0) + (0.6×2.0) + (-0.4×3.0) = 0.2

print(y)  # [2.3, 0.2]

Key Insight: Every neural network layer is just matrix multiplication!

Matrix Multiplication: Three Examples

Example 1: Multi-Output House Prediction

Predict multiple outputs from house features:

# House features
house = np.array([3, 2000, 15, 5])

# Weight matrix (3 outputs × 4 features)
W = np.array([
    [50000, 120, -5000, -8000],   # Price prediction
    [0.8, 0.0002, -0.01, -0.02],  # Desirability score (0-1)
    [2, 0.001, -0.5, -0.3]        # Market time (months)
])

# Predict all outputs
outputs = W @ house
print(f"Price: ${outputs[0]:,.0f}")
print(f"Desirability: {outputs[1]:.2f}")
print(f"Time to sell: {outputs[2]:.1f} months")

Output:

Price: $275,000
Desirability: 0.73
Time to sell: 2.8 months

Example 2: Student Performance Across Subjects

Predict grades in multiple subjects:

# Student features
student = np.array([12, 85, 8, 90])  # [study_hours, prev_score, assignments, attendance]

# Weight matrix (3 subjects × 4 features)
W = np.array([
    [2.5, 0.3, 1.5, 0.2],   # Math
    [2.0, 0.4, 1.0, 0.3],   # English
    [3.0, 0.2, 2.0, 0.1]    # Science
])

# Predict all subject grades
grades = W @ student
print(f"Math: {grades[0]:.1f}")
print(f"English: {grades[1]:.1f}")
print(f"Science: {grades[2]:.1f}")

Example 3: Multi-User Movie Recommendations

Predict ratings for multiple users:

# Movie features
movie = np.array([0.9, 0.1, 0.3, 8.5, 2020])

# User preference matrix (3 users × 5 features)
U = np.array([
    [5, -2, 3, 0.5, -0.001],    # User 1: loves action
    [-1, 5, 2, 0.3, -0.002],    # User 2: loves romance
    [3, 1, 4, 0.4, 0]           # User 3: balanced
])

# Predict ratings for all users
ratings = U @ movie
print(f"User 1 rating: {ratings[0]:.1f}")
print(f"User 2 rating: {ratings[1]:.1f}")
print(f"User 3 rating: {ratings[2]:.1f}")

Matrix Transpose

Definition: Flip rows and columns

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

A_T = A.T
print(A_T)
# [[1, 4],
#  [2, 5],
#  [3, 6]]

Why it matters: Used in backpropagation!

# Forward pass
X = np.array([[1, 2, 3]])  # (1×3) input
W = np.array([[0.5], [0.3], [0.8]])  # (3×1) weights
y = X @ W  # (1×1) output

# Backward pass (gradient)
dy = np.array([[1.0]])  # Gradient from loss
dW = X.T @ dy  # (3×1) gradient for weights

Identity Matrix

Definition: Matrix that doesn’t change vectors

I\mathbf{v} = \mathbf{v}

I = np.eye(3)  # 3×3 identity
print(I)
# [[1, 0, 0],
#  [0, 1, 0],
#  [0, 0, 1]]

v = np.array([5, 10, 15])
result = I @ v
print(result)  # [5, 10, 15] (unchanged!)

Why it matters: Used in regularization, initialization, and matrix inversion.

Matrix Inverse

Definition: Matrix that “undoes” another matrix

A A^{-1} = I

A = np.array([
    [4, 7],
    [2, 6]
])

A_inv = np.linalg.inv(A)
print(A_inv)

# Verify
print(A @ A_inv)  # ≈ Identity matrix

Application: Solving linear equations

# Solve: Ax = b for x
A = np.array([[2, 1], [1, 3]])
b = np.array([5, 7])

# Solution: x = A^(-1) b
x = np.linalg.inv(A) @ b
print(x)  # [1, 3]

# Verify
print(A @ x)  # [5, 7] ✓

🎯 Practice Exercises & Real-World Applications

Challenge yourself! These exercises connect matrix operations to real applications you use every day.

Exercise 1: Instagram Color Filters 📸

Create custom photo filters using matrix transformations:

import numpy as np

# A pixel is represented as [R, G, B] from 0-255
sample_pixel = np.array([180, 120, 90])  # A warm skin tone

# TODO: Create these filter matrices and apply them
# 1. Sepia filter (vintage brown look)
# 2. Negative filter (invert colors)
# 3. Custom "sunset" filter (boost reds/oranges)

Tasks:

Design a 3×3 sepia transformation matrix
Apply it to the sample pixel
Create your own artistic filter

💡 Solution

import numpy as np

sample_pixel = np.array([180, 120, 90])

# 1. Sepia Filter Matrix (classic vintage look)
sepia = np.array([
    [0.393, 0.769, 0.189],  # New R
    [0.349, 0.686, 0.168],  # New G
    [0.272, 0.534, 0.131],  # New B
])

# 2. Negative Filter (invert)
negative = np.array([
    [-1, 0, 0],
    [0, -1, 0],
    [0, 0, -1],
])

# 3. Sunset Filter (warm, golden hour look)
sunset = np.array([
    [1.2, 0.1, 0.0],   # Boost red
    [0.1, 1.0, 0.0],   # Slight red bleed to green
    [0.0, 0.0, 0.7],   # Reduce blue
])

print("Original pixel:", sample_pixel)
print("-" * 40)

# Apply Sepia
sepia_result = np.clip(sepia @ sample_pixel, 0, 255).astype(int)
print(f"Sepia filter:   {sepia_result}")

# Apply Negative (add 255 to invert)
negative_result = np.clip(negative @ sample_pixel + 255, 0, 255).astype(int)
print(f"Negative:       {negative_result}")

# Apply Sunset
sunset_result = np.clip(sunset @ sample_pixel, 0, 255).astype(int)
print(f"Sunset filter:  {sunset_result}")

# Output:
# Original pixel: [180 120  90]
# Sepia filter:   [196 174 136]  ← Warm vintage brown
# Negative:       [ 75 135 165]  ← Inverted colors
# Sunset filter:  [228 138  63]  ← Golden warm tones

Real-World Insight: Instagram’s filters are exactly this - matrix multiplications applied to every pixel! The “Clarendon” filter boosts contrast, “Gingham” adds vintage fade.

Exercise 2: Multi-Store Inventory Management 🏪

A retail company has 3 stores and 4 products. Calculate total revenue using matrix multiplication:

# Inventory: rows = stores, cols = products
inventory = np.array([
    [50, 30, 100, 25],   # Store A
    [80, 45, 60, 40],    # Store B
    [35, 60, 80, 55],    # Store C
])

# Prices per product
prices = np.array([29.99, 49.99, 9.99, 79.99])

# Units sold (percentage of inventory)
sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],   # Store A
    [0.7, 0.8, 0.7, 0.6],   # Store B
    [0.9, 0.5, 0.8, 0.7],   # Store C
])

Tasks:

Calculate units sold at each store (element-wise multiply inventory × sales_rate)
Calculate total revenue per store (matrix × price vector)
Which store had the highest revenue?

💡 Solution

import numpy as np

inventory = np.array([
    [50, 30, 100, 25],   # Store A
    [80, 45, 60, 40],    # Store B
    [35, 60, 80, 55],    # Store C
])

prices = np.array([29.99, 49.99, 9.99, 79.99])

sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],
    [0.7, 0.8, 0.7, 0.6],
    [0.9, 0.5, 0.8, 0.7],
])

# 1. Units sold at each store (element-wise)
units_sold = inventory * sales_rate
print("Units Sold per Store:")
print(units_sold)

# 2. Revenue per store (matrix-vector multiplication)
revenue_per_store = units_sold @ prices
print("\n💰 Revenue per Store:")
stores = ['Store A', 'Store B', 'Store C']
for store, rev in zip(stores, revenue_per_store):
    print(f"  {store}: ${rev:,.2f}")

# 3. Best performing store
best_store = stores[np.argmax(revenue_per_store)]
print(f"\n🏆 Highest Revenue: {best_store} (${max(revenue_per_store):,.2f})")

# Output:
# Revenue per Store:
#   Store A: $3,288.35
#   Store B: $4,275.18  ← Winner!
#   Store C: $4,270.63

# Bonus: Revenue breakdown by product
print("\n📊 Revenue by Product (all stores):")
product_revenue = units_sold.sum(axis=0) * prices
products = ['Product 1', 'Product 2', 'Product 3', 'Product 4']
for prod, rev in zip(products, product_revenue):
    print(f"  {prod}: ${rev:,.2f}")

Real-World Insight: This is how Walmart, Target, and Amazon calculate daily revenue across thousands of stores and millions of products - all matrix operations!

Exercise 3: Simple Neural Network Forward Pass 🧠

Implement a tiny neural network using only matrix operations:

import numpy as np

# Input: 4 features (e.g., house: beds, baths, sqft/1000, age/10)
X = np.array([[3, 2, 1.5, 1.5]])  # 1 house

# Layer 1: 4 inputs → 3 neurons
W1 = np.array([
    [0.5, 0.3, 0.8, -0.2],
    [0.2, 0.6, 0.4, -0.1],
    [0.7, 0.1, 0.5, -0.3],
])
b1 = np.array([0.1, 0.2, 0.1])

# Layer 2: 3 neurons → 1 output (price)
W2 = np.array([[0.8, 0.5, 0.6]])
b2 = np.array([0.1])

Tasks:

Compute the output of Layer 1: h = X @ W1.T + b1
Apply ReLU activation: h = max(0, h)
Compute final output: y = h @ W2.T + b2
What’s the predicted price?

💡 Solution

import numpy as np

# Input
X = np.array([[3, 2, 1.5, 1.5]])

# Layer 1 weights and bias
W1 = np.array([
    [0.5, 0.3, 0.8, -0.2],
    [0.2, 0.6, 0.4, -0.1],
    [0.7, 0.1, 0.5, -0.3],
])
b1 = np.array([0.1, 0.2, 0.1])

# Layer 2 weights and bias
W2 = np.array([[0.8, 0.5, 0.6]])
b2 = np.array([0.1])

def relu(x):
    return np.maximum(0, x)

# Forward pass
print("🧠 Neural Network Forward Pass")
print("=" * 40)

# Layer 1
z1 = X @ W1.T + b1
print(f"Layer 1 pre-activation (z1): {z1}")

h1 = relu(z1)
print(f"Layer 1 post-ReLU (h1):      {h1}")

# Layer 2 (output)
z2 = h1 @ W2.T + b2
print(f"\nFinal output (z2): {z2}")

# Interpret as price (scale back)
predicted_price = z2[0, 0] * 100000  # Scale factor
print(f"\n🏠 Predicted House Price: ${predicted_price:,.0f}")

# Output:
# Layer 1 pre-activation: [[3.4  2.55 3.3 ]]
# Layer 1 post-ReLU:      [[3.4  2.55 3.3 ]]
# Final output: [[5.095]]
# Predicted House Price: $509,500

Real-World Insight: This is EXACTLY how PyTorch and TensorFlow work under the hood! Every deep learning model is just chains of matrix multiplications with non-linear activations.

Exercise 4: Cryptography - Hill Cipher 🔐

The Hill Cipher uses matrix multiplication to encrypt messages:

import numpy as np

# Encryption key (2x2 matrix)
key = np.array([
    [3, 3],
    [2, 5]
])

# Message: "HI" → Convert to numbers (A=0, B=1, ... Z=25)
# H=7, I=8
message = np.array([[7], [8]])

# Encrypt: ciphertext = key @ message (mod 26)

Tasks:

Encrypt the message “HI”
Find the decryption key (matrix inverse mod 26)
Decrypt back to the original message

💡 Solution

import numpy as np

def mod_inverse(a, m):
    """Find modular multiplicative inverse of a mod m"""
    for x in range(1, m):
        if (a * x) % m == 1:
            return x
    return None

# Encryption key
key = np.array([
    [3, 3],
    [2, 5]
])

# Message: "HI"
message = np.array([[7], [8]])  # H=7, I=8
print(f"Original message: HI → {message.flatten()}")

# 1. ENCRYPT
ciphertext = (key @ message) % 26
print(f"Encrypted: {ciphertext.flatten()}")

# Convert back to letters
cipher_letters = ''.join([chr(c + ord('A')) for c in ciphertext.flatten()])
print(f"Ciphertext letters: {cipher_letters}")

# 2. FIND DECRYPTION KEY
# Key inverse (mod 26) = (1/det) * adjugate (mod 26)
det = int(np.linalg.det(key)) % 26  # det = 3*5 - 3*2 = 9
det_inv = mod_inverse(det, 26)  # 9^(-1) mod 26 = 3

# Adjugate matrix
adjugate = np.array([
    [5, -3],
    [-2, 3]
])

# Decryption key
key_inv = (det_inv * adjugate) % 26
print(f"\nDecryption key:\n{key_inv}")

# 3. DECRYPT
decrypted = (key_inv @ ciphertext) % 26
print(f"\nDecrypted: {decrypted.flatten()}")

# Convert back to letters
original = ''.join([chr(int(c) + ord('A')) for c in decrypted.flatten()])
print(f"Original message: {original}")

# Output:
# Original message: HI → [7 8]
# Encrypted: [19 2]
# Ciphertext letters: TC
# Decryption key: [[15 17] [20  9]]
# Decrypted: [[7] [8]]
# Original message: HI ✓

Real-World Insight: While Hill Cipher is breakable, modern encryption (RSA, AES) uses similar matrix operations in much larger spaces. Your HTTPS connection uses these principles!

Key Takeaways

✅ Matrices are functions that transform vectors
✅ Matrix multiplication = applying transformations
✅ Neural networks = stacked matrix multiplications
✅ Batch processing = process many inputs at once
✅ Transpose = used in backpropagation
✅ Inverse = solving equations, undoing transformations

GPU Computing: Why Matrices Matter for Speed

Why GPUs are 100x faster: GPUs have thousands of cores designed for parallel matrix operations.

import numpy as np
import time

# CPU: Sequential operations
n = 5000
A = np.random.randn(n, n)
B = np.random.randn(n, n)

start = time.time()
C = A @ B  # Matrix multiplication
cpu_time = time.time() - start

print(f"CPU: {cpu_time:.2f}s for {n}x{n} matrix multiply")
# CPU: ~2-5 seconds

# On GPU (with PyTorch/CUDA):
# import torch
# A_gpu = torch.randn(n, n, device='cuda')
# B_gpu = torch.randn(n, n, device='cuda')
# C_gpu = A_gpu @ B_gpu  # ~0.02 seconds!

Why this matters:

GPT-4 training: 25,000 GPUs × months
Each forward pass: trillions of matrix operations
Without GPU optimization: would take centuries!

Interview Questions: Matrices

Explain how a neural network layer works mathematically

Answer: A neural network layer computes:

\mathbf{h} = \sigma(W\mathbf{x} + \mathbf{b})

Where:

$W$ is the weight matrix (learned parameters)
$\mathbf{x}$ is the input vector
$\mathbf{b}$ is the bias vector
$\sigma$ is the activation function (ReLU, sigmoid, etc.)

Matrix multiplication

W\mathbf{x}

computes weighted sums of inputs. The bias shifts the result. The activation adds non-linearity, enabling the network to learn complex patterns.

Why is batch processing important in deep learning?

Answer: Batch processing (processing multiple samples simultaneously) is crucial because:

GPU efficiency: GPUs parallelize matrix operations; single samples waste capacity
Gradient stability: Averaging gradients over a batch reduces noise
Memory efficiency: One matrix multiply instead of N vector operations
Modern batch sizes: 32-8192 depending on task and memory

What's the computational complexity of matrix multiplication?

Answer: For two

n \times n

matrices:

Naive algorithm: $O(n^3)$ - each of $n^2$ outputs requires $n$ multiplications
Strassen’s algorithm: $O(n^{2.807})$ - faster but less numerically stable
Best known: $O(n^{2.373})$ - theoretical, not practical

In practice, optimized libraries (BLAS, cuBLAS) use cache-aware algorithms that approach theoretical limits.

What’s Next?

You now understand how matrices transform data. But which transformations are most important? Which directions in your data carry the most information? That’s where eigenvalues and eigenvectors come in - they reveal the “natural axes” of your data!

Next: Eigenvalues & Eigenvectors

Discover which house features matter most for price prediction

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Matrices & Linear Transformations

​A Problem You Already Understand: Calculating Final Grades

​What You Just Did (Mathematically)

​Scaling Up: All Students at Once

​What Is a Matrix?

​A Matrix is a Table of Numbers

​Matrix Operations: The Complete Toolkit

​Matrix Addition

​Scalar Multiplication

​Matrix Transpose

​Matrix Multiplication

​The Identity Matrix

​Matrix Inverse

​Determinant

​A Matrix is a Transformation Machine

​Real-World Example: Photo Filters

​Brightness: Multiply Every Pixel

​Color Transformation: Matrix Multiplication

​Grayscale: Average the Colors

​Real-World Example: Predicting House Prices

​The Setup

​The Approach: Weighted Sum of Features

​Predicting Many Houses at Once

​The Connection to Machine Learning

​What a Neural Network Layer Does

​The Pattern

​Batch Prediction for Entire Class

​Example 3: Movie Rating Prediction

​The Model

​Recommend Movies to User

​Matrix Operations

​1. Matrix Addition

​2. Scalar Multiplication

​3. Matrix Multiplication

​Matrix Multiplication: Three Examples

​Example 1: Multi-Output House Prediction

​Example 2: Student Performance Across Subjects

​Example 3: Multi-User Movie Recommendations

Matrices & Linear Transformations

A Problem You Already Understand: Calculating Final Grades

What You Just Did (Mathematically)

Scaling Up: All Students at Once

What Is a Matrix?

A Matrix is a Table of Numbers

Matrix Operations: The Complete Toolkit

Matrix Addition

Scalar Multiplication

Matrix Transpose

Matrix Multiplication

The Identity Matrix

Matrix Inverse

Determinant

A Matrix is a Transformation Machine

Real-World Example: Photo Filters

Brightness: Multiply Every Pixel

Color Transformation: Matrix Multiplication

Grayscale: Average the Colors

Real-World Example: Predicting House Prices

The Setup

The Approach: Weighted Sum of Features

Predicting Many Houses at Once

The Connection to Machine Learning

What a Neural Network Layer Does

The Pattern

Batch Prediction for Entire Class

Example 3: Movie Rating Prediction

The Model

Recommend Movies to User

Matrix Operations

1. Matrix Addition

2. Scalar Multiplication

3. Matrix Multiplication

Matrix Multiplication: Three Examples

Example 1: Multi-Output House Prediction

Example 2: Student Performance Across Subjects

Example 3: Multi-User Movie Recommendations

Matrix Transpose

Identity Matrix

Matrix Inverse

🎯 Practice Exercises & Real-World Applications

Exercise 1: Instagram Color Filters 📸

Exercise 2: Multi-Store Inventory Management 🏪

Exercise 3: Simple Neural Network Forward Pass 🧠

Exercise 4: Cryptography - Hill Cipher 🔐

Key Takeaways

GPU Computing: Why Matrices Matter for Speed

Interview Questions: Matrices

What’s Next?