
The Prediction Game

[Figure: ML Prediction Concept - Input to Output]

Starting With Something You Already Know

Forget Python. Forget libraries. Forget math notation. Let’s play a game.

Round 1: The House Price Game

You’re a real estate agent. A client asks: “How much is this house worth?” They give you some info:
Feature          Value
Bedrooms         3
Bathrooms        2
Square Feet      1,500
Age (years)      10
Has Pool         No
What’s your guess?

Your Brain’s Algorithm

Without realizing it, you do this:
  1. Think of similar houses you’ve seen
  2. Remember what they sold for
  3. Adjust based on differences
  4. Make a guess
That’s machine learning. You learned patterns from past data and applied them to new data.
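If you want to see those steps as code, here is a tiny sketch of the “look up the most similar house” idea. The past_sales list and the similarity rule are invented for illustration; the rest of this module takes a different, weight-based route.
# A toy sketch of "think of similar houses, remember what they sold for".
# The data and the similarity rule are made up for illustration.
past_sales = [
    {"bedrooms": 3, "sqft": 1450, "price": 320000},
    {"bedrooms": 3, "sqft": 1600, "price": 340000},
    {"bedrooms": 4, "sqft": 2000, "price": 420000},
]

def guess_price(bedrooms, sqft):
    # Smaller score = more similar; bedroom differences count for more than square feet
    def similarity(sale):
        return abs(sale["bedrooms"] - bedrooms) * 500 + abs(sale["sqft"] - sqft)
    closest = min(past_sales, key=similarity)
    return closest["price"]

print(guess_price(3, 1500))  # 320000 -- the price of the most similar past sale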
[Figure: Real World ML - Email Spam Filtering]

Round 2: Let’s Be More Systematic

What if I told you the average house in your area sells for:
  • Base price: $200,000
  • Each bedroom adds about $25,000
  • Each bathroom adds about $15,000
  • Each square foot adds about $150
Now you can compute:
Base:                           $200,000
+ 3 bedrooms × $25,000:         + $75,000
+ 2 bathrooms × $15,000:        + $30,000
+ 1,500 sq ft × $150:          + $225,000
                               ----------
Predicted price:                $530,000
You just built your first linear model!
The formula you just used:
price = base + (bedrooms × weight1) + (bathrooms × weight2) + (sqft × weight3)
Those “weights” ($25,000, $15,000, $150) are what machine learning learns automatically from data.

Let’s Code It (Still No Libraries!)

# Your first "model" - just a function!
def predict_house_price(bedrooms, bathrooms, sqft):
    base = 200000
    bedroom_value = 25000
    bathroom_value = 15000
    sqft_value = 150
    
    predicted = (
        base + 
        bedrooms * bedroom_value + 
        bathrooms * bathroom_value + 
        sqft * sqft_value
    )
    return predicted

# Test it
house1 = predict_house_price(3, 2, 1500)
print(f"House 1 predicted: ${house1:,}")  # $530,000

house2 = predict_house_price(4, 3, 2200)
print(f"House 2 predicted: ${house2:,}")  # $675,000

The Million Dollar Question

But wait… how did we know those weights?
  • Why $25,000 per bedroom and not $30,000?
  • Why $150 per sq ft and not $200?
We guessed. And our guesses might be wrong. Machine learning answers this: Given a bunch of houses with known prices, can we figure out the best weights automatically?

Real Data, Real Problem

Here’s actual data (simplified):
# Past house sales (our "training data")
houses = [
    # [bedrooms, bathrooms, sqft] -> actual_price
    {"features": [2, 1, 1000], "price": 250000},
    {"features": [3, 2, 1500], "price": 380000},
    {"features": [4, 2, 1800], "price": 450000},
    {"features": [3, 3, 2000], "price": 520000},
    {"features": [5, 4, 3000], "price": 750000},
]
Our goal: Find weights that make our predictions match these actual prices as closely as possible.

Step 1: How Wrong Are We?

If we use our guessed weights, let’s see how we do:
def predict_house_price(features):
    bedrooms, bathrooms, sqft = features
    base = 200000
    return base + bedrooms * 25000 + bathrooms * 15000 + sqft * 150

# Check each house
for house in houses:
    predicted = predict_house_price(house["features"])
    actual = house["price"]
    error = predicted - actual
    print(f"Predicted: ${predicted:,}, Actual: ${actual:,}, Error: ${error:,}")
Output:
Predicted: $415,000, Actual: $250,000, Error: $165,000  (too high!)
Predicted: $530,000, Actual: $380,000, Error: $150,000  (too high!)
Predicted: $600,000, Actual: $450,000, Error: $150,000  (too high!)
Predicted: $620,000, Actual: $520,000, Error: $100,000  (too high!)
Predicted: $835,000, Actual: $750,000, Error: $85,000   (too high!)
We’re consistently too high! Our weights are off.

Step 2: Measure Total “Wrongness”

We need a single number that tells us how wrong we are overall.
Simple approach: Sum of all errors
total_error = 0
for house in houses:
    predicted = predict_house_price(house["features"])
    actual = house["price"]
    error = predicted - actual
    total_error += error

print(f"Total error: ${total_error:,}")  # $650,000 too high overall
Problem: What if some errors are positive and some negative? They cancel out!
Better approach: Sum of squared errors
total_squared_error = 0
for house in houses:
    predicted = predict_house_price(house["features"])
    actual = house["price"]
    error = predicted - actual
    total_squared_error += error ** 2

print(f"Total squared error: {total_squared_error:,.0f}")
This is called the Loss Function or Cost Function. Lower is better!
Why squared?
  1. No negative numbers (errors can’t cancel out)
  2. Big errors get penalized more than small errors
  3. It has nice mathematical properties (smooth, differentiable)
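To make points 1 and 2 concrete, here is a quick check with made-up error values:
# Two errors that cancel when summed directly...
errors = [100_000, -100_000]
print(sum(errors))                  # 0 -- looks perfect, even though both predictions were way off
print(sum(e ** 2 for e in errors))  # 20,000,000,000 -- squared error exposes the problem

# ...and squaring punishes one big miss more than two small ones
print(100_000 ** 2)                 # 10,000,000,000
print(2 * 50_000 ** 2)              # 5,000,000,000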

Step 3: Try Different Weights

What if we try different values?
def calculate_total_error(base, bed_weight, bath_weight, sqft_weight):
    total_squared_error = 0
    for house in houses:
        bedrooms, bathrooms, sqft = house["features"]
        predicted = base + bedrooms * bed_weight + bathrooms * bath_weight + sqft * sqft_weight
        actual = house["price"]
        error = predicted - actual
        total_squared_error += error ** 2
    return total_squared_error

# Our original guess
error1 = calculate_total_error(200000, 25000, 15000, 150)
print(f"Original weights error: {error1:,.0f}")

# Try lowering everything
error2 = calculate_total_error(100000, 20000, 10000, 100)
print(f"Lower weights error: {error2:,.0f}")

# Try something else
error3 = calculate_total_error(50000, 15000, 25000, 175)
print(f"Alternative weights error: {error3:,.0f}")
The challenge: There are infinitely many combinations of weights. How do we find the best ones?
What if we:
  1. Start with random weights
  2. Check how wrong we are
  3. Slightly adjust weights
  4. If error goes down, keep the change
  5. Repeat until error stops improving
This is the core idea behind Gradient Descent - which we’ll explore in the next module! For now, let’s try the crudest possible approach: random guessing.
# A simple (but slow) approach: try lots of combinations
best_error = float('inf')
best_weights = None

import random

for _ in range(10000):  # Try 10,000 random combinations
    base = random.randint(0, 200000)
    bed = random.randint(5000, 50000)
    bath = random.randint(5000, 50000)
    sqft = random.randint(50, 300)
    
    error = calculate_total_error(base, bed, bath, sqft)
    
    if error < best_error:
        best_error = error
        best_weights = (base, bed, bath, sqft)

print(f"Best weights found: {best_weights}")
print(f"Best error: {best_error:,.0f}")

What You Just Learned

Let’s recap with proper ML terminology:
What You Did                        ML Term
Used past house sales               Training Data
Features like bedrooms, sqft        Input Features (X)
The actual price                    Target / Label (y)
The weights ($25k, $15k, etc.)      Model Parameters
The prediction formula              Model
How wrong our predictions were      Loss / Error
Sum of squared errors               Loss Function
Trying to minimize error            Training / Optimization

The Mathematical Connection

When you calculated:
price = base + (bedrooms × weight1) + (bathrooms × weight2) + (sqft × weight3)
In math notation, this is:
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3
Or in matrix form (from our Linear Algebra course):
\hat{y} = \mathbf{w} \cdot \mathbf{x}
This is a dot product - the same operation you do when calculating weighted grades!
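To see that dot product in code, here is a small sketch in plain Python (folding the base price in as w_0 paired with a constant feature x_0 = 1):
# The same prediction written as a dot product: w · x
w = [200000, 25000, 15000, 150]   # base, per-bedroom, per-bathroom, per-sqft
x = [1, 3, 2, 1500]               # constant 1, bedrooms, bathrooms, sqft

prediction = sum(w_i * x_i for w_i, x_i in zip(w, x))
print(prediction)  # 530000 -- same answer as predict_house_price(3, 2, 1500)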

🚀 Mini Projects

Project 1

Build a house price estimator from scratch

Project 2

Create a used car valuation tool

Project 3

Visualize prediction errors and find patterns

Key Takeaways

ML is Pattern Matching

Find patterns in past data, apply to new data

Weights Capture Knowledge

The learned weights encode what matters

Loss Measures Wrongness

Lower loss = better predictions

Training = Minimizing Loss

Find weights that make predictions best match reality

Practice Challenge

Try this on your own:
# New dataset: Car prices
cars = [
    # [age_years, mileage_k, horsepower] -> price
    {"features": [2, 15, 200], "price": 35000},
    {"features": [5, 50, 180], "price": 22000},
    {"features": [1, 8, 250], "price": 45000},
    {"features": [8, 100, 150], "price": 12000},
    {"features": [3, 30, 220], "price": 32000},
]

# Your task:
# 1. Create a predict_car_price function with guessed weights
# 2. Calculate total squared error
# 3. Try different weights and find better ones
# 4. What patterns do you notice? (hint: the weights for age and mileage should be negative!)
Key insight: Unlike houses where more is usually better, for cars:
  • Older cars are worth less (negative weight for age)
  • Higher mileage is worth less (negative weight for mileage)
  • More horsepower is worth more (positive weight)
Try something like:
price = 50000 - (age * 3000) - (mileage * 200) + (horsepower * 100)
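If you want a starting point for step 1, here is a minimal sketch of that guessed formula as a function. These weights are just the hint above, not the answer to the challenge:
def predict_car_price(age_years, mileage_k, horsepower):
    # Guessed weights -- almost certainly off, refine them in step 3!
    return 50000 - age_years * 3000 - mileage_k * 200 + horsepower * 100

print(predict_car_price(2, 15, 200))  # 61000 vs. an actual price of 35000 -- plenty of room to improve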

Next Up

In the next module, we’ll learn:
  • How to systematically find the best weights (not just random guessing)
  • The key insight of gradient descent - following the slope downhill
  • How this connects to calculus

Continue to Module 2: Learning From Mistakes

Discover gradient descent - the algorithm that powers all modern ML

🔗 Math → ML Connection

What you learned in this module connects to formal ML:
Concept in This Module                      Formal ML Term      Where It’s Used
Guessing weights                            Model parameters    Every ML model has parameters to learn
Formula: price = base + weight × feature    Linear model        Neural network layers, linear regression
Measuring “wrongness”                       Loss function       Training any model (MSE, cross-entropy, etc.)
Finding better weights                      Optimization        Gradient descent, Adam, SGD
Past data with answers                      Training data       Supervised learning
Next module: We’ll replace “random guessing” with a systematic approach called gradient descent - the same algorithm that trains ChatGPT!

🚀 Going Deeper (Optional)

For learners who want the formal treatment:

Matrix Formulation

What we wrote as:
price = base + w1×bedrooms + w2×bathrooms + w3×sqft
Can be written in matrix form as:
\hat{y} = X \mathbf{w}
Where:
  • X is the feature matrix (each row is a house, each column is a feature)
  • \mathbf{w} is the weight vector
  • \hat{y} is the prediction vector
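Here is a minimal sketch of that matrix form, assuming NumPy is available (the rest of this module avoids libraries, so treat this as a preview, using the same guessed weights as before):
import numpy as np

# Each row is a house: [1 (for the base price), bedrooms, bathrooms, sqft]
X = np.array([
    [1, 2, 1, 1000],
    [1, 3, 2, 1500],
    [1, 4, 2, 1800],
    [1, 3, 3, 2000],
    [1, 5, 4, 3000],
])
w = np.array([200000, 25000, 15000, 150])  # base, bedroom, bathroom, sqft weights

y_hat = X @ w  # one matrix-vector product predicts every house at once
print(y_hat)   # [415000 530000 600000 620000 835000]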

Why Squared Error?

We use squared error (not absolute error) because:
  1. It’s differentiable - we can compute gradients (needed for Module 2)
  2. It penalizes large errors more - a $100K error is worse than two $50K errors
  3. It leads to closed-form solutions in linear regression

Closed-Form Solution

For linear regression, there’s actually a formula that gives the optimal weights directly:
\mathbf{w}^* = (X^T X)^{-1} X^T y
We’ll derive this in the Linear Regression module.
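As a preview, here is a small sketch of that formula applied to the five houses from this module, assuming NumPy is available. It uses np.linalg.lstsq, which solves the same least-squares problem without forming the inverse explicitly:
import numpy as np

# Feature matrix with a leading 1 so w_0 plays the role of the base price
X = np.array([
    [1, 2, 1, 1000],
    [1, 3, 2, 1500],
    [1, 4, 2, 1800],
    [1, 3, 3, 2000],
    [1, 5, 4, 3000],
], dtype=float)
y = np.array([250000, 380000, 450000, 520000, 750000], dtype=float)

# Least-squares solution, equivalent to w* = (X^T X)^{-1} X^T y
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Optimal weights (base, bed, bath, sqft):", w_star)
print("Predictions with those weights:", X @ w_star)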