A Problem You Already Understand: Calculating Final Grades
You’re a teacher with a spreadsheet of student data:
| Student | Homework (40%) | Midterm (25%) | Final (35%) |
|---------|----------------|---------------|-------------|
| Alice   | 92             | 88            | 85          |
| Bob     | 78             | 82            | 90          |
| Carol   | 95             | 90            | 92          |
Question: What’s each student’s final grade?

You already know how to do this in Excel:
```
=0.40*B2 + 0.25*C2 + 0.35*D2
```
For Alice: 0.40×92 + 0.25×88 + 0.35×85 = 36.8 + 22 + 29.75 = 88.55

Congratulations: you just did matrix multiplication!
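The same calculation for all three students at once is a single matrix-vector product. A minimal NumPy sketch (the array values come from the table above):

```python
import numpy as np

# Rows: Alice, Bob, Carol; columns: homework, midterm, final
scores = np.array([
    [92, 88, 85],
    [78, 82, 90],
    [95, 90, 92],
])
weights = np.array([0.40, 0.25, 0.35])  # homework 40%, midterm 25%, final 35%

# One weighted sum per student, all in a single operation
final_grades = scores @ weights
print(final_grades)  # Alice ≈ 88.55, Bob ≈ 83.2, Carol ≈ 92.7
```

Every row of `scores` is dotted with `weights`, exactly the Excel formula repeated down the spreadsheet.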
Estimated Time: 4-5 hours
Difficulty: Beginner to Intermediate
Prerequisites: Vectors module
What You’ll Build: Grade calculator, photo filter app, and a simple prediction model
Mathematical notation: An m×n matrix has m rows and n columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

where $a_{ij}$ is the element at row $i$, column $j$.
This is the most important operation! For $C = AB$:

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = (\text{row } i \text{ of } A) \cdot (\text{column } j \text{ of } B)$$

The rule: (row) × (column) = one number, repeated for every position.
```python
A = np.array([[1, 2], [3, 4]])  # 2×2
B = np.array([[5, 6], [7, 8]])  # 2×2

C = A @ B  # or np.matmul(A, B) or np.dot(A, B)
print(C)
# [[19 22]
#  [43 50]]

# Let's verify C[0,0]:
# Row 0 of A: [1, 2]
# Col 0 of B: [5, 7]
# Dot product: 1×5 + 2×7 = 5 + 14 = 19 ✓
```
Worked example, step by step:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} (1)(5)+(2)(7) & (1)(6)+(2)(8) \\ (3)(5)+(4)(7) & (3)(6)+(4)(8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
Matrix multiplication is NOT commutative! $AB \neq BA$ in general.
```python
print(A @ B)
# [[19 22]
#  [43 50]]
print(B @ A)
# [[23 34]
#  [31 46]] - different!
```
Dimension rule: For AB to work, columns of A must equal rows of B:
(m×n)×(n×p)=(m×p)
The inner dimensions must match!
```python
A = np.array([[1, 2, 3], [4, 5, 6]])  # 2×3
B = np.array([[1], [2], [3]])         # 3×1
C = A @ B  # (2×3) × (3×1) = (2×1)
print(C.shape)  # (2, 1)
```
The inverse $A^{-1}$ “undoes” multiplication by $A$:

$$A A^{-1} = A^{-1} A = I$$

For a 2×2 matrix:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

The term $ad - bc$ is called the determinant. If it’s zero, no inverse exists!
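A quick numerical check of the 2×2 formula (the matrix values here are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

# Apply the 2×2 formula: swap a and d, negate b and c, divide by the determinant
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]  # ad - bc = -2
A_inv = (1 / det) * np.array([[A[1, 1], -A[0, 1]],
                              [-A[1, 0], A[0, 0]]])

# Matches NumPy's built-in inverse, and multiplying back gives the identity
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
print(np.allclose(A @ A_inv, np.eye(2)))     # True
```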
```python
# Grayscale transformation
# Human eyes are most sensitive to green, so it gets the largest weight
grayscale_weights = np.array([0.299, 0.587, 0.114])
pixel = np.array([100, 150, 200])  # R, G, B

gray_value = np.dot(grayscale_weights, pixel)
print(f"Gray value: {gray_value:.0f}")  # ≈ 141
```
Every Instagram filter is a matrix transformation on your pixels!
Just like grades! Each feature contributes to the price:
```python
# New house features: bedrooms, sqft, age
new_house = np.array([3, 1800, 15])

# Weights (how much each feature matters)
# These would be "learned" from data in real ML
weights = np.array([
    50000,  # each bedroom adds $50k
    150,    # each sqft adds $150
    -3000,  # each year of age subtracts $3k
])

# Prediction
predicted_price = np.dot(weights, new_house)
# = 50000×3 + 150×1800 + (-3000)×15
# = 150000 + 270000 - 45000
# = $375,000
print(f"Predicted price: ${predicted_price:,}")
```
Real-World Insight: Instagram’s filters are exactly this - matrix multiplications applied to every pixel! The “Clarendon” filter boosts contrast, “Gingham” adds vintage fade.
A retail company has 3 stores and 4 products. Calculate total revenue using matrix multiplication:
```python
# Inventory: rows = stores, cols = products
inventory = np.array([
    [50, 30, 100, 25],  # Store A
    [80, 45, 60, 40],   # Store B
    [35, 60, 80, 55],   # Store C
])

# Prices per product
prices = np.array([29.99, 49.99, 9.99, 79.99])

# Units sold (fraction of inventory)
sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],  # Store A
    [0.7, 0.8, 0.7, 0.6],  # Store B
    [0.9, 0.5, 0.8, 0.7],  # Store C
])
```
Tasks:
Calculate units sold at each store (element-wise multiply inventory × sales_rate)
Calculate total revenue per store (matrix × price vector)
Which store had the highest revenue?
💡 Solution
```python
import numpy as np

inventory = np.array([
    [50, 30, 100, 25],  # Store A
    [80, 45, 60, 40],   # Store B
    [35, 60, 80, 55],   # Store C
])
prices = np.array([29.99, 49.99, 9.99, 79.99])
sales_rate = np.array([
    [0.8, 0.6, 0.9, 0.5],
    [0.7, 0.8, 0.7, 0.6],
    [0.9, 0.5, 0.8, 0.7],
])

# 1. Units sold at each store (element-wise)
units_sold = inventory * sales_rate
print("Units Sold per Store:")
print(units_sold)

# 2. Revenue per store (matrix-vector multiplication)
revenue_per_store = units_sold @ prices
print("\n💰 Revenue per Store:")
stores = ['Store A', 'Store B', 'Store C']
for store, rev in zip(stores, revenue_per_store):
    print(f"  {store}: ${rev:,.2f}")

# 3. Best performing store
best_store = stores[np.argmax(revenue_per_store)]
print(f"\n🏆 Highest Revenue: {best_store} (${max(revenue_per_store):,.2f})")
# Revenue per store:
#   Store A: ≈ $3,998.40
#   Store B: ≈ $5,818.42
#   Store C: ≈ $6,163.36  ← Winner!

# Bonus: Revenue breakdown by product
print("\n📊 Revenue by Product (all stores):")
product_revenue = units_sold.sum(axis=0) * prices
products = ['Product 1', 'Product 2', 'Product 3', 'Product 4']
for prod, rev in zip(products, product_revenue):
    print(f"  {prod}: ${rev:,.2f}")
```
Real-World Insight: This is how Walmart, Target, and Amazon calculate daily revenue across thousands of stores and millions of products - all matrix operations!
Real-World Insight: This is EXACTLY how PyTorch and TensorFlow work under the hood! Every deep learning model is just chains of matrix multiplications with non-linear activations.
```python
import numpy as np

def mod_inverse(a, m):
    """Find modular multiplicative inverse of a mod m"""
    for x in range(1, m):
        if (a * x) % m == 1:
            return x
    return None

# Encryption key
key = np.array([
    [3, 3],
    [2, 5]
])

# Message: "HI"
message = np.array([[7], [8]])  # H=7, I=8
print(f"Original message: HI → {message.flatten()}")

# 1. ENCRYPT
ciphertext = (key @ message) % 26
print(f"Encrypted: {ciphertext.flatten()}")

# Convert back to letters
cipher_letters = ''.join([chr(int(c) + ord('A')) for c in ciphertext.flatten()])
print(f"Ciphertext letters: {cipher_letters}")

# 2. FIND DECRYPTION KEY
# Key inverse (mod 26) = (1/det) * adjugate (mod 26)
# round() guards against floating-point error in np.linalg.det
det = int(round(np.linalg.det(key))) % 26  # det = 3*5 - 3*2 = 9
det_inv = mod_inverse(det, 26)             # 9^(-1) mod 26 = 3

# Adjugate matrix
adjugate = np.array([
    [5, -3],
    [-2, 3]
])

# Decryption key
key_inv = (det_inv * adjugate) % 26
print(f"\nDecryption key:\n{key_inv}")

# 3. DECRYPT
decrypted = (key_inv @ ciphertext) % 26
print(f"\nDecrypted: {decrypted.flatten()}")

# Convert back to letters
original = ''.join([chr(int(c) + ord('A')) for c in decrypted.flatten()])
print(f"Original message: {original}")

# Output:
# Original message: HI → [7 8]
# Encrypted: [19 2]
# Ciphertext letters: TC
# Decryption key:
# [[15 17]
#  [20  9]]
# Decrypted: [7 8]
# Original message: HI ✓
```
Real-World Insight: While Hill Cipher is breakable, modern encryption (RSA, AES) uses similar matrix operations in much larger spaces. Your HTTPS connection uses these principles!
Explain how a neural network layer works mathematically
Answer: A neural network layer computes:
$$h = \sigma(Wx + b)$$

where:
W is the weight matrix (learned parameters)
x is the input vector
b is the bias vector
σ is the activation function (ReLU, sigmoid, etc.)
Matrix multiplication Wx computes weighted sums of inputs. The bias shifts the result. The activation adds non-linearity, enabling the network to learn complex patterns.
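A minimal sketch of a single layer (the shapes and random weights here are illustrative):

```python
import numpy as np

def relu(z):
    # The activation: zero out negatives, adding non-linearity
    return np.maximum(z, 0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # weight matrix: 3 inputs -> 4 outputs
b = np.zeros(4)                  # bias vector
x = np.array([1.0, -2.0, 0.5])   # input vector

h = relu(W @ x + b)              # the layer: h = sigma(Wx + b)
print(h.shape)  # (4,)
```

Stacking several such layers, each feeding its output `h` into the next as `x`, is the whole forward pass of a feed-forward network.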
Why is batch processing important in deep learning?
Answer: Batch processing (processing multiple samples simultaneously) is crucial because:
GPU efficiency: GPUs parallelize matrix operations; single samples waste capacity
Gradient stability: Averaging gradients over a batch reduces noise
Memory efficiency: One matrix multiply instead of N vector operations
Modern batch sizes: 32-8192 depending on task and memory
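The efficiency point in one sketch: a loop of matrix-vector products versus a single batched matrix-matrix product (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(10, 5))   # layer weights: 5 features -> 10 units
X = rng.normal(size=(32, 5))   # a batch of 32 samples, one per row

# One sample at a time: 32 separate matrix-vector products
one_by_one = np.stack([W @ x for x in X])

# The whole batch at once: a single matrix-matrix product
batched = X @ W.T

print(np.allclose(one_by_one, batched))  # True - same numbers, one op
```

On a GPU, the single large multiply keeps all cores busy, while the 32 small ones mostly leave them idle.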
What's the computational complexity of matrix multiplication?
Answer: For two n×n matrices:
Naive algorithm: $O(n^3)$ - each of $n^2$ outputs requires $n$ multiplications
Strassen’s algorithm: $O(n^{2.807})$ - faster but less numerically stable
Best known: $O(n^{2.373})$ - theoretical, not practical
In practice, optimized libraries (BLAS, cuBLAS) use cache-aware algorithms that approach theoretical limits.
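The naive algorithm written out directly shows where the $n^3$ comes from: three nested loops, with $n$ multiply-adds per output entry.

```python
import numpy as np

def naive_matmul(A, B):
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):           # m rows of the output
        for j in range(p):       # p columns of the output
            for k in range(n):   # n multiply-adds per entry -> O(m*n*p) total
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
print(np.allclose(naive_matmul(A, B), A @ B))  # True
```

Don’t use this in practice: `A @ B` dispatches to BLAS, which computes the same result orders of magnitude faster through blocking and vectorization.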
You now understand how matrices transform data. But which transformations are most important? Which directions in your data carry the most information?

That’s where eigenvalues and eigenvectors come in - they reveal the “natural axes” of your data!