
Probability Foundations: How Likely Is This?

The Doctor’s Dilemma

Dr. Sarah runs a routine test on a patient. The test comes back positive for a rare disease that affects 1 in 1000 people. The test is 99% accurate:
  • If you HAVE the disease, it correctly says “positive” 99% of the time
  • If you DON’T have it, it correctly says “negative” 99% of the time
Question: What’s the probability this patient actually has the disease? Most people (and many doctors!) say “99%”. The real answer? About 9%. Surprised? This is why understanding probability properly can literally save lives - and it’s a foundation of much of machine learning.
Real Talk: Probability is one of the most misunderstood topics. Our intuition is terrible at it. By the end of this module, you’ll understand why our intuition fails and how to calculate correctly.
Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Basic Python, Module 1 (Describing Data)
What You’ll Build: A spam email detector using Bayes’ theorem

What Is Probability?

At its core, probability answers: “How likely is something to happen?”

The Coin Flip

import random

def flip_coin():
    return random.choice(['Heads', 'Tails'])

# Flip 1000 times
results = [flip_coin() for _ in range(1000)]
heads_count = results.count('Heads')

print(f"Heads: {heads_count}/1000 = {heads_count/1000:.2%}")
Output (varies):
Heads: 497/1000 = 49.70%
The probability of heads is 0.5 (or 50%). This means:
  • Over many flips, about half will be heads
  • Any single flip is unpredictable
  • Probability is about long-run frequency
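To see the long-run-frequency idea in action, here is a small sketch that tracks the running proportion of heads as the number of flips grows. The seed and the checkpoint sizes are arbitrary choices for illustration:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

heads = 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}

for n in range(1, 100_001):
    if random.random() < 0.5:  # treat values below 0.5 as "Heads"
        heads += 1
    if n in checkpoints:
        proportions[n] = heads / n
        print(f"After {n:>7,} flips: {heads / n:.4f} heads")
```

Early checkpoints can wander noticeably away from 0.5, while the proportion after 100,000 flips is typically within a fraction of a percentage point of it. No single flip becomes more predictable; only the long-run average settles down.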
[Figure: Probability formula visualization]
The Formula (when all outcomes are equally likely): $$P(A) = \frac{\text{Number of ways A can happen}}{\text{Total number of possible outcomes}}$$
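The counting formula only works when every outcome in the sample space is equally likely. A minimal sketch, assuming two fair dice: enumerate all ordered outcomes and count the ones where the dice sum to 7.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice,
# each assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
favorable = [o for o in outcomes if sum(o) == 7]

p_sum_7 = Fraction(len(favorable), len(outcomes))
print(f"{len(favorable)}/{len(outcomes)} = {p_sum_7} ≈ {float(p_sum_7):.2%}")
```

Note the design choice: we count ordered pairs such as (1, 6) and (6, 1) rather than the sums 2 through 12, because the sums themselves are not equally likely, so dividing by 11 possible sums would give the wrong answer.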

Basic Probability Rules

Rule 1: Probability is Between 0 and 1

$$0 \leq P(A) \leq 1$$
  • P(A) = 0: Impossible (rolling a 7 on a standard die)
  • P(A) = 1: Certain (rolling 1-6 on a standard die)
  • P(A) = 0.5: Equally likely to happen or not
# Die roll probabilities
die_outcomes = [1, 2, 3, 4, 5, 6]

p_roll_7 = 0 / 6  # Impossible
p_roll_any = 6 / 6  # Certain
p_roll_even = 3 / 6  # 2, 4, or 6

print(f"P(roll 7): {p_roll_7}")
print(f"P(roll 1-6): {p_roll_any}")
print(f"P(roll even): {p_roll_even}")

Rule 2: Complement Rule

The probability that something doesn’t happen is 1 minus the probability that it does: $$P(\text{not } A) = 1 - P(A)$$
[Figure: Complement rule, rain example]
# Weather example
p_rain = 0.30  # 30% chance of rain

p_no_rain = 1 - p_rain
print(f"P(no rain): {p_no_rain:.0%}")  # 70%

# Useful for "at least one" problems
p_single_die_not_six = 5/6
p_three_dice_at_least_one_six = 1 - p_single_die_not_six**3  # 1 - P(no six on any of the 3 dice)
print(f"P(at least one 6 in 3 rolls): {p_three_dice_at_least_one_six:.2%}")  # 42.13%
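To convince yourself the complement shortcut is right, you can brute-force it: list all 216 equally likely outcomes of three fair dice, count the ones containing a six, and compare with 1 - (5/6)³. Exact fractions avoid any rounding noise:

```python
from fractions import Fraction
from itertools import product

# All 6**3 = 216 equally likely outcomes of rolling three fair dice
rolls = list(product(range(1, 7), repeat=3))

# Direct count: outcomes containing at least one six
direct = Fraction(sum(1 for r in rolls if 6 in r), len(rolls))

# Complement rule: 1 - P(no six on any of the three dice)
via_complement = 1 - Fraction(5, 6) ** 3

print(direct, via_complement, f"{float(direct):.2%}")  # 91/216 91/216 42.13%
```

The direct count needs 216 checks here but grows as 6ⁿ with more dice, while the complement stays a one-liner; that is why "at least one" problems are usually solved through the complement.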

Rule 3: Addition Rule

For mutually exclusive events (events that can’t happen together): $$P(A \text{ or } B) = P(A) + P(B)$$
# Drawing a card
p_ace = 4/52
p_king = 4/52

# P(ace OR king) - can't be both!
p_ace_or_king = p_ace + p_king
print(f"P(ace or king): {p_ace_or_king:.2%}")  # 15.38%
For events that are not mutually exclusive (they can happen together), subtract the overlap so it isn’t counted twice: $$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
# P(ace OR spade)
p_ace = 4/52
p_spade = 13/52
p_ace_and_spade = 1/52  # Ace of spades!

p_ace_or_spade = p_ace + p_spade - p_ace_and_spade
print(f"P(ace or spade): {p_ace_or_spade:.2%}")  # 30.77%
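A quick way to see why the overlap must be subtracted is to build the deck and count directly. This sketch assumes a standard 52-card deck with every card equally likely; the helper names `prob`, `is_ace`, and `is_spade` are illustrative:

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

def prob(event):
    """Fraction of the deck for which event(card) is True (every card equally likely)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_ace(card):
    return card[0] == "A"

def is_spade(card):
    return card[1] == "spades"

p_ace = prob(is_ace)
p_spade = prob(is_spade)
p_ace_and_spade = prob(lambda c: is_ace(c) and is_spade(c))
p_ace_or_spade = prob(lambda c: is_ace(c) or is_spade(c))

print(p_ace + p_spade)                    # 17/52: the ace of spades counted twice
print(p_ace + p_spade - p_ace_and_spade)  # 4/13, i.e. 16/52
print(p_ace_or_spade)                     # 4/13, matching the direct count
```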

Rule 4: Multiplication Rule for Independent Events

Events are independent if the occurrence of one doesn’t change the probability of the other: $$P(A \text{ and } B) = P(A) \times P(B)$$
# Flipping a coin and rolling a die
p_heads = 1/2
p_six = 1/6

p_heads_and_six = p_heads * p_six
print(f"P(heads AND six): {p_heads_and_six:.2%}")  # 8.33%

# Password cracking: 4-digit PIN
digits = 10  # 0-9
combinations = digits ** 4  # 10,000
p_guess_correct = 1 / combinations
print(f"P(guess 4-digit PIN): {p_guess_correct:.4%}")  # 0.0100%
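On a small sample space of equally likely outcomes you can test independence exactly by checking whether P(A and B) equals P(A) × P(B). A minimal sketch; `prob` and `independent` are hypothetical helper names, not a library API:

```python
from fractions import Fraction
from itertools import product

def prob(space, event):
    """P(event) on a finite sample space of equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def independent(space, a, b):
    """True when P(A and B) == P(A) * P(B) exactly."""
    both = prob(space, lambda o: a(o) and b(o))
    return both == prob(space, a) * prob(space, b)

# A coin flip paired with a die roll: 12 equally likely outcomes
coin_die = list(product(["H", "T"], range(1, 7)))
coin_die_independent = independent(coin_die, lambda o: o[0] == "H", lambda o: o[1] == 6)

# A single die roll: "even" and "greater than 3" share the outcomes 4 and 6
die = list(range(1, 7))
die_independent = independent(die, lambda o: o % 2 == 0, lambda o: o > 3)

print(coin_die_independent)  # True: P(H and 6) = 1/12 = 1/2 * 1/6
print(die_independent)       # False: P(even and >3) = 1/3, but 1/2 * 1/2 = 1/4
```

The second case is a reminder that independence is a property you check, not assume: two events about the same die roll can easily be dependent.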

Conditional Probability: The Game Changer

Here’s where it gets interesting. Conditional probability answers: “What’s the probability of A, GIVEN that B already happened?” Notation: $P(A|B)$ reads as “probability of A given B”.

The Job Interview Example

You’re applying for jobs. Here’s some data:
# 100 applicants total
data = {
    'has_experience': {'hired': 30, 'not_hired': 20},  # 50 total
    'no_experience': {'hired': 10, 'not_hired': 40}    # 50 total
}

# Overall probability of getting hired
total_hired = 30 + 10
total_applicants = 100
p_hired = total_hired / total_applicants
print(f"P(hired): {p_hired:.0%}")  # 40%

# BUT... what if you have experience?
experienced_hired = 30
total_experienced = 50
p_hired_given_experience = experienced_hired / total_experienced
print(f"P(hired | experience): {p_hired_given_experience:.0%}")  # 60%

# What if you don't have experience?
no_exp_hired = 10
total_no_exp = 50
p_hired_given_no_experience = no_exp_hired / total_no_exp
print(f"P(hired | no experience): {p_hired_given_no_experience:.0%}")  # 20%
Key Insight: The “given” information changes everything! The Formula (defined when P(B) > 0): $$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$$
# Verify with formula
p_hired_and_experienced = 30 / 100  # 0.30
p_experienced = 50 / 100  # 0.50

p_hired_given_exp_formula = p_hired_and_experienced / p_experienced
print(f"P(hired | exp) via formula: {p_hired_given_exp_formula:.0%}")  # 60% ✓
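“Given B” literally means throwing away every case where B didn’t happen and measuring A inside what’s left. This sketch rebuilds the 100 applicants as individual records from the table’s counts (the record layout is our own choice) and does exactly that filtering:

```python
# Rebuild the 100 applicants as individual (experience, outcome) records
applicants = (
    [("experience", "hired")] * 30
    + [("experience", "not_hired")] * 20
    + [("no_experience", "hired")] * 10
    + [("no_experience", "not_hired")] * 40
)

# "Given experience": keep only the experienced applicants...
experienced = [a for a in applicants if a[0] == "experience"]

# ...then measure how often "hired" happens inside that smaller group
p_hired_given_exp = sum(1 for a in experienced if a[1] == "hired") / len(experienced)
print(f"{len(experienced)} experienced applicants, "
      f"P(hired | experience) = {p_hired_given_exp:.0%}")  # 50 ..., 60%
```

Dividing by `len(experienced)` instead of `len(applicants)` is the whole trick: the denominator shrinks from everyone to only the cases where B happened, which is what dividing by P(B) does in the formula.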

Bayes’ Theorem: The Most Important Formula

Now we can solve that medical test problem from the beginning. Bayes’ Theorem lets you flip conditional probabilities:
$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$
In words: the probability of A given B equals the probability of B given A, times the probability of A, divided by the probability of B.
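Where does the formula come from? A short derivation, using only the conditional probability formula from the previous section (assuming P(A) > 0 and P(B) > 0). Writing P(A and B) two ways and dividing by P(B) gives Bayes’ theorem; the denominator is then usually expanded with the law of total probability, which is the same expansion the medical-test code below uses to compute P(Positive):

$$P(A|B)\,P(B) = P(A \text{ and } B) = P(B|A)\,P(A) \quad\Longrightarrow\quad P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

$$P(B) = P(B|A)\,P(A) + P(B|\text{not } A)\,P(\text{not } A)$$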

Solving the Medical Test Problem

Let’s set up what we know:
  • P(Disease) = 0.001 (1 in 1000 people have it)
  • P(Positive | Disease) = 0.99 (test catches 99% of sick people)
  • P(Positive | No Disease) = 0.01 (1% false positive rate)
We want: P(Disease | Positive) - given a positive test, what’s the chance they’re actually sick?
# Known probabilities
p_disease = 0.001  # Prior: base rate of disease
p_positive_given_disease = 0.99  # Sensitivity
p_positive_given_no_disease = 0.01  # False positive rate

# Calculate P(Positive) using law of total probability
p_no_disease = 1 - p_disease
p_positive = (p_positive_given_disease * p_disease + 
              p_positive_given_no_disease * p_no_disease)

print(f"P(Positive test): {p_positive:.4f}")  # 0.011

# Apply Bayes' theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive

print(f"P(Disease | Positive): {p_disease_given_positive:.2%}")
Output:
P(Positive test): 0.0110
P(Disease | Positive): 9.02%
Only 9% chance of actually having the disease, despite a “99% accurate” test!
[Figure: Bayes’ theorem, medical test visualization]

Why Is This So Low?

Let’s think about 10,000 people:
population = 10000

# How many actually have disease?
# Use round(), not int(): int() truncates, so 9.9 would become 9
actually_sick = round(population * 0.001)  # 10 people
actually_healthy = population - actually_sick  # 9990 people

# Of the sick, how many test positive?
sick_positive = round(actually_sick * 0.99)  # 9.9, rounds to 10 true positives

# Of the healthy, how many test positive (false positives)?
healthy_positive = round(actually_healthy * 0.01)  # 99.9, rounds to 100 false positives!

# Total positives
total_positive = sick_positive + healthy_positive

print(f"Actually sick: {actually_sick}")
print(f"True positives: {sick_positive}")
print(f"False positives: {healthy_positive}")
print(f"Total positives: {total_positive}")
print(f"P(sick | positive): {sick_positive/total_positive:.1%}")
Output:
Actually sick: 10
True positives: 10
False positives: 100
Total positives: 110
P(sick | positive): 9.1%
The Key Insight: When the disease is rare, even a small false positive rate creates many false alarms that overwhelm the true cases!
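To see how strongly the answer depends on the base rate, here is a small sketch that reruns the same Bayes calculation for several prevalences while keeping the test fixed at 99% sensitivity and a 1% false positive rate. The helper name `posterior` is our own:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive) via Bayes' theorem and the law of total probability."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Same test (99% sensitivity, 1% false positive rate), different base rates
for prevalence in [0.0001, 0.001, 0.01, 0.1, 0.5]:
    print(f"prevalence {prevalence:>7.2%} -> P(disease | positive) = "
          f"{posterior(prevalence, 0.99, 0.01):.1%}")
```

With these numbers the posterior climbs from about 1% at a 1-in-10,000 base rate, through about 9% at 1 in 1000, to exactly 50% at 1 in 100: the test never changed, only the prior did.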

Bayes’ Theorem in Machine Learning

This isn’t just medical trivia. Bayes’ theorem is the foundation of:
  • Spam filters (Naive Bayes classifier)
  • Recommendation systems
  • Medical diagnosis AI
  • Text classification
  • Bayesian neural networks
Let’s build a spam detector!

🚀 Mini-Project: Spam Email Detector

The Problem

You receive an email with the word “FREE” in it. What’s the probability it’s spam?

The Data

# Training data: 1000 emails
total_emails = 1000
spam_emails = 400
ham_emails = 600  # "ham" = not spam

# Word frequency
emails_with_free = {
    'spam': 200,  # 200 spam emails contain "FREE"
    'ham': 30     # 30 legitimate emails contain "FREE"
}

emails_with_meeting = {
    'spam': 10,
    'ham': 250
}

emails_with_urgent = {
    'spam': 180,
    'ham': 40
}

Step 1: Calculate Prior Probabilities

# P(Spam) and P(Ham)
p_spam = spam_emails / total_emails  # 0.40
p_ham = ham_emails / total_emails    # 0.60

print(f"P(Spam): {p_spam:.0%}")
print(f"P(Ham): {p_ham:.0%}")

Step 2: Calculate Likelihoods

# P(word | Spam) - probability of word appearing in spam
p_free_given_spam = emails_with_free['spam'] / spam_emails
p_free_given_ham = emails_with_free['ham'] / ham_emails

print(f"P('FREE' | Spam): {p_free_given_spam:.0%}")  # 50%
print(f"P('FREE' | Ham): {p_free_given_ham:.0%}")    # 5%

p_meeting_given_spam = emails_with_meeting['spam'] / spam_emails
p_meeting_given_ham = emails_with_meeting['ham'] / ham_emails

print(f"P('meeting' | Spam): {p_meeting_given_spam:.1%}")  # 2.5%
print(f"P('meeting' | Ham): {p_meeting_given_ham:.1%}")    # 41.7%
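The same two divisions apply to every word in the table, including “URGENT”, whose likelihoods (0.45 and about 0.067) are used later in Step 4. This sketch repeats the counts so it runs on its own and also prints the likelihood ratio P(word|Spam) / P(word|Ham): above 1 the word points toward spam, below 1 toward ham.

```python
# Counts from the training data above (repeated so this block runs on its own)
spam_emails, ham_emails = 400, 600
word_counts = {
    "FREE": {"spam": 200, "ham": 30},
    "meeting": {"spam": 10, "ham": 250},
    "URGENT": {"spam": 180, "ham": 40},
}

likelihoods = {}
for word, counts in word_counts.items():
    p_word_spam = counts["spam"] / spam_emails  # P(word | Spam)
    p_word_ham = counts["ham"] / ham_emails     # P(word | Ham)
    likelihoods[word] = (p_word_spam, p_word_ham)
    print(f"{word:>8}: P(w|Spam) = {p_word_spam:.3f}, P(w|Ham) = {p_word_ham:.3f}, "
          f"ratio = {p_word_spam / p_word_ham:.2f}")
```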

Step 3: Apply Bayes’ Theorem

def naive_bayes_spam(word, p_word_spam, p_word_ham, p_spam=0.4, p_ham=0.6):
    """
    Calculate P(Spam | word) using Bayes' theorem.
    """
    # P(word) = P(word|Spam)*P(Spam) + P(word|Ham)*P(Ham)
    p_word = p_word_spam * p_spam + p_word_ham * p_ham
    
    # Bayes' theorem
    p_spam_given_word = (p_word_spam * p_spam) / p_word
    
    return p_spam_given_word

# Email contains "FREE"
p_spam_free = naive_bayes_spam("FREE", 0.50, 0.05)
print(f"P(Spam | 'FREE'): {p_spam_free:.0%}")  # 87%

# Email contains "meeting"
p_spam_meeting = naive_bayes_spam("meeting", 0.025, 0.417)
print(f"P(Spam | 'meeting'): {p_spam_meeting:.0%}")  # 4%
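An equivalent way to run the same update is the odds form of Bayes’ theorem: posterior odds = prior odds × likelihood ratio. It avoids computing P(word) explicitly. A sketch with exact fractions for the “FREE” example; `posterior_via_odds` is a hypothetical helper name:

```python
from fractions import Fraction

def posterior_via_odds(p_spam, p_word_spam, p_word_ham):
    """P(Spam | word) computed as prior odds x likelihood ratio."""
    prior_odds = p_spam / (1 - p_spam)            # P(Spam) / P(Ham)
    likelihood_ratio = p_word_spam / p_word_ham   # P(word|Spam) / P(word|Ham)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # convert odds back to a probability

p_free = posterior_via_odds(Fraction(2, 5), Fraction(1, 2), Fraction(1, 20))
print(p_free, f"{float(p_free):.1%}")  # 20/23 87.0%
```

Here the prior odds are 2:3, “FREE” is 10 times more common in spam than in ham, so the posterior odds are 20:3, which is the same 87% as `naive_bayes_spam`.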

Step 4: Multiple Words (Naive Assumption)

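The “naive” in Naive Bayes is the assumption that words are independent of each other once the class (spam or ham) is known, so each word’s likelihood can simply be multiplied in:
import numpy as np

def naive_bayes_multi_word(words_data, p_spam=0.4, p_ham=0.6):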
    """
    Calculate P(Spam | multiple words) assuming independence.
    
    words_data: list of (word, p_word_spam, p_word_ham)
    """
    # Start with prior
    log_spam = np.log(p_spam)
    log_ham = np.log(p_ham)
    
    # Multiply likelihoods (add in log space)
    for word, p_word_spam, p_word_ham in words_data:
        log_spam += np.log(p_word_spam + 1e-10)  # Add small value to avoid log(0)
        log_ham += np.log(p_word_ham + 1e-10)
    
    # Convert back and normalize
    spam_score = np.exp(log_spam)
    ham_score = np.exp(log_ham)
    
    p_spam_given_words = spam_score / (spam_score + ham_score)
    return p_spam_given_words

# Email: "FREE meeting URGENT"
words = [
    ("FREE", 0.50, 0.05),
    ("meeting", 0.025, 0.417),
    ("URGENT", 0.45, 0.067)
]

p_spam_multi = naive_bayes_multi_word(words)
print(f"P(Spam | 'FREE meeting URGENT'): {p_spam_multi:.0%}")
Output:
P(Spam | 'FREE meeting URGENT'): 73%
Even though “meeting” is a ham indicator, the strong spam signals from “FREE” and “URGENT” push the probability up!
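As a cross-check that avoids rounding the likelihoods to three decimals, this sketch multiplies the same priors and per-word likelihoods as exact fractions taken straight from the training counts:

```python
from fractions import Fraction
from math import prod

# Priors and per-word likelihoods from the training counts (FREE, meeting, URGENT)
p_spam, p_ham = Fraction(400, 1000), Fraction(600, 1000)
spam_likelihoods = [Fraction(200, 400), Fraction(10, 400), Fraction(180, 400)]
ham_likelihoods = [Fraction(30, 600), Fraction(250, 600), Fraction(40, 600)]

# Naive assumption: multiply the per-word likelihoods within each class
spam_score = prod(spam_likelihoods, start=p_spam)  # P(Spam) * product of P(word|Spam)
ham_score = prod(ham_likelihoods, start=p_ham)     # P(Ham) * product of P(word|Ham)

# Normalize so the two class probabilities sum to 1
p_spam_given_words = spam_score / (spam_score + ham_score)
print(p_spam_given_words, f"{float(p_spam_given_words):.1%}")  # 27/37 73.0%
```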

Complete Spam Classifier

import numpy as np
from collections import defaultdict

class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {'spam': defaultdict(int), 'ham': defaultdict(int)}
        self.class_counts = {'spam': 0, 'ham': 0}
        self.vocabulary = set()
    
    def train(self, emails, labels):
        """Train on labeled email data."""
        for email, label in zip(emails, labels):
            self.class_counts[label] += 1
            words = email.lower().split()
            for word in words:
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
    
    def predict_proba(self, email):
        """Calculate P(Spam | email)."""
        words = email.lower().split()
        
        # Prior probabilities
        total = sum(self.class_counts.values())
        p_spam = self.class_counts['spam'] / total
        p_ham = self.class_counts['ham'] / total
        
        # Calculate log likelihoods
        log_spam = np.log(p_spam)
        log_ham = np.log(p_ham)
        
        vocab_size = len(self.vocabulary)
        spam_total = sum(self.word_counts['spam'].values())
        ham_total = sum(self.word_counts['ham'].values())
        
        for word in words:
            # Laplace smoothing
            p_word_spam = (self.word_counts['spam'][word] + 1) / (spam_total + vocab_size)
            p_word_ham = (self.word_counts['ham'][word] + 1) / (ham_total + vocab_size)
            
            log_spam += np.log(p_word_spam)
            log_ham += np.log(p_word_ham)
        
        # Convert to probabilities
        spam_score = np.exp(log_spam)
        ham_score = np.exp(log_ham)
        
        return spam_score / (spam_score + ham_score)
    
    def predict(self, email, threshold=0.5):
        """Classify as spam or ham."""
        return 'spam' if self.predict_proba(email) > threshold else 'ham'

# Example usage
spam_filter = NaiveBayesSpamFilter()

# Training data
train_emails = [
    "FREE money click here now",
    "Congratulations you won FREE prize",
    "URGENT action required FREE offer",
    "Meeting tomorrow at 3pm",
    "Project deadline reminder",
    "Team lunch on Friday",
    "Your invoice attached",
    "FREE trial subscription ending",
]
train_labels = ['spam', 'spam', 'spam', 'ham', 'ham', 'ham', 'ham', 'spam']

spam_filter.train(train_emails, train_labels)

# Test
test_emails = [
    "FREE discount on all products",
    "Meeting rescheduled to Monday",
    "URGENT free money opportunity",
]

for email in test_emails:
    prob = spam_filter.predict_proba(email)
    pred = spam_filter.predict(email)
    print(f"'{email[:40]}...'")
    print(f"  P(Spam): {prob:.1%} → {pred.upper()}\n")
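One caveat about `predict_proba` (and `naive_bayes_multi_word`): they convert back with `np.exp` before normalizing. For a long email, both log scores can sit thousands below zero, `exp` returns 0.0 for both, and the division becomes 0/0. A common fix is to exponentiate only the difference of the two scores. A minimal sketch; `normalize_log_scores` is a hypothetical helper that could replace the last three lines of `predict_proba`:

```python
import math

def normalize_log_scores(log_spam, log_ham):
    """Return P(Spam | email) from two unnormalized log scores without underflow.

    e^a / (e^a + e^b) = 1 / (1 + e^(b - a)): a logistic function of d = a - b,
    so only the difference is exponentiated, in whichever direction is safe.
    """
    d = log_spam - log_ham
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0, so this cannot overflow
    return e / (1.0 + e)

# Scores of this size arise from long emails: ~1,000 words at about log(1e-3) each
log_spam, log_ham = -6907.0, -6908.0
print(math.exp(log_spam), math.exp(log_ham))             # 0.0 0.0 (both underflow)
print(f"{normalize_log_scores(log_spam, log_ham):.4f}")  # 0.7311
```

Only the gap between the scores matters: a one-unit lead in log space means odds of e:1, about 73% spam, no matter how tiny the raw probabilities are.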

🎯 Practice Exercises

Exercise 1: Card Probability

# Standard 52-card deck
# Calculate:
# 1. P(drawing a heart)
# 2. P(drawing a face card) - J, Q, K
# 3. P(drawing a red face card)
# 4. P(drawing a heart OR a face card)
Solution:
# 1. P(heart)
p_heart = 13/52  # 25%

# 2. P(face card) - 12 face cards (3 per suit × 4 suits)
p_face = 12/52  # 23.08%

# 3. P(red face card) - 6 cards (3 per red suit × 2 red suits)
p_red_face = 6/52  # 11.54%

# 4. P(heart OR face card)
# Hearts = 13, Face cards = 12, but 3 are both (J, Q, K of hearts)
p_heart_or_face = (13 + 12 - 3) / 52  # 22/52 = 42.31%

print(f"P(heart): {p_heart:.2%}")
print(f"P(face): {p_face:.2%}")
print(f"P(red face): {p_red_face:.2%}")
print(f"P(heart OR face): {p_heart_or_face:.2%}")

Exercise 2: Weather Prediction

# Historical data:
# - 30% of days are rainy
# - On rainy days, 80% are cloudy in the morning
# - On non-rainy days, 40% are cloudy in the morning

# This morning is cloudy. What's the probability of rain?
# Use Bayes' theorem
Solution:
p_rain = 0.30
p_no_rain = 0.70
p_cloudy_given_rain = 0.80
p_cloudy_given_no_rain = 0.40

# P(Cloudy)
p_cloudy = p_cloudy_given_rain * p_rain + p_cloudy_given_no_rain * p_no_rain
print(f"P(Cloudy): {p_cloudy:.0%}")  # 52%

# P(Rain | Cloudy)
p_rain_given_cloudy = (p_cloudy_given_rain * p_rain) / p_cloudy
print(f"P(Rain | Cloudy): {p_rain_given_cloudy:.1%}")  # 46.2%

# Even though it's cloudy, there's still less than 50% chance of rain
# because the base rate of rain (30%) is low

Exercise 3: Two-Test Diagnosis

# A disease affects 2% of the population
# Test A: 95% sensitivity, 90% specificity
# Test B: 90% sensitivity, 95% specificity

# A patient tests positive on BOTH tests.
# What's the probability they have the disease?

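# Hint: Apply Bayes' theorem twice (sequentially), assuming the two test results
# are independent of each other given the patient's true disease status
Solution:
# Initial state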
p_disease = 0.02

# Test A
sensitivity_a = 0.95  # P(+|disease)
specificity_a = 0.90  # P(-|no disease)
false_pos_a = 1 - specificity_a  # 0.10

# After Test A positive
p_pos_a = sensitivity_a * p_disease + false_pos_a * (1 - p_disease)
p_disease_after_a = (sensitivity_a * p_disease) / p_pos_a
print(f"After Test A positive: P(disease) = {p_disease_after_a:.1%}")  # 16.2%

# Test B (now using updated probability as prior)
sensitivity_b = 0.90
specificity_b = 0.95
false_pos_b = 1 - specificity_b  # 0.05

# After Test B positive (using p_disease_after_a as new prior)
p_pos_b_given_disease = sensitivity_b
p_pos_b_given_no_disease = false_pos_b

p_pos_b = (p_pos_b_given_disease * p_disease_after_a + 
           p_pos_b_given_no_disease * (1 - p_disease_after_a))

p_disease_after_both = (p_pos_b_given_disease * p_disease_after_a) / p_pos_b
print(f"After both tests positive: P(disease) = {p_disease_after_both:.1%}")  # 77.7%

# Two positive tests dramatically increase confidence: from a 2% prior to about 78%!
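The sequential calculation can also be done in odds form: each positive result multiplies the odds by its likelihood ratio, sensitivity / (1 - specificity). This sketch makes the same assumption as the exercise (the two tests are independent given disease status) and shows that the order of the tests doesn’t matter. `update_odds` is a hypothetical helper name:

```python
from fractions import Fraction

def update_odds(odds, sensitivity, specificity):
    """Multiply odds by the positive-result likelihood ratio, sensitivity / (1 - specificity)."""
    return odds * sensitivity / (1 - specificity)

prior = Fraction(2, 100)
odds = prior / (1 - prior)  # 1/49

test_a = (Fraction(95, 100), Fraction(90, 100))  # (sensitivity, specificity), ratio 9.5
test_b = (Fraction(90, 100), Fraction(95, 100))  # ratio 18

odds_ab = update_odds(update_odds(odds, *test_a), *test_b)  # Test A first
odds_ba = update_odds(update_odds(odds, *test_b), *test_a)  # Test B first

p_both = odds_ab / (1 + odds_ab)
print(p_both, f"{float(p_both):.1%}", odds_ab == odds_ba)  # 171/220 77.7% True
```

Because the updates are just multiplications, the two orders agree exactly; the result only depends on the prior and the product of the likelihood ratios (1/49 × 9.5 × 18 = 171/49).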

Key Takeaways

Basic Rules

  • Probability is between 0 and 1
  • P(not A) = 1 - P(A)
  • For mutually exclusive events: P(A or B) = P(A) + P(B)
  • For independent events: P(A and B) = P(A) × P(B)

Conditional Probability

  • P(A|B) = P(A and B) / P(B)
  • “Given B” means we’re only looking at cases where B happened
  • This changes everything!

Bayes' Theorem

  • Lets you flip P(A|B) to P(B|A)
  • Prior × Likelihood = Posterior (after normalizing)
  • Foundation of spam filters, medical diagnosis, ML

Key Insight

  • Base rates matter enormously
  • “99% accurate” doesn’t mean what you think
  • Always consider: what’s the prior probability?

Common Mistakes to Avoid

Mistake 1: Ignoring Base Rates (Base Rate Fallacy). A 99% accurate test for a rare disease (1 in 10,000) will mostly produce false positives: most people who test positive won’t have the disease. Always consider how common the thing you’re testing for actually is.
Mistake 2: Confusing P(A|B) with P(B|A). P(sick|positive test) is NOT the same as P(positive test|sick). This confusion has led to wrongful convictions and medical misdiagnoses. Bayes’ theorem is the bridge between them.
Mistake 3: Treating Dependent Events as Independent. Drawing cards without replacement means probabilities change: P(2nd card is ace | 1st was ace) = 3/51, not 4/52.
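You can check Mistake 3 by brute force: enumerate every ordered pair of two different cards (that is what drawing without replacement means) and count. Only rank matters here, so this sketch labels the four aces "A" and every other card "x":

```python
from fractions import Fraction
from itertools import permutations

# 52 cards: label the 4 aces "A" and the other 48 cards "x"
deck = ["A"] * 4 + ["x"] * 48

# Every ordered pair of two *different* positions = drawing without replacement
draws = list(permutations(range(52), 2))  # 52 * 51 = 2652 pairs

first_ace = [(i, j) for i, j in draws if deck[i] == "A"]
both_aces = [(i, j) for i, j in first_ace if deck[j] == "A"]

p_second_given_first = Fraction(len(both_aces), len(first_ace))
p_both = Fraction(len(both_aces), len(draws))
print(p_second_given_first, p_both)  # 1/17 (= 3/51) and 1/221 (= 12/2652)
```

Once one ace is gone, only 3 aces remain among 51 cards, so the second draw depends on the first; multiplying 4/52 by 4/52 would overstate the chance of two aces.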

Interview Questions

Question: A family has two children. Given that at least one is a boy, what’s the probability that both are boys?
Answer: 1/3, not 1/2! Assume each child is independently a boy or a girl with probability 1/2. The possible outcomes for two children are BB, BG, GB, GG. Given “at least one boy”, only BB, BG, GB remain (3 options), and only BB (1 option) has two boys, so P(both boys | at least one boy) = 1/3. This is counterintuitive because we’re not told which child is the boy, so both orders (BG and GB) are possible.
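Enumerating the four equally likely families confirms the 1/3, and also shows why the wording matters. Under the same assumption (each child independently B or G with probability 1/2), being told that a *specific* child, say the older one, is a boy gives 1/2 instead:

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is independently a boy (B) or girl (G) with probability 1/2
families = list(product("BG", repeat=2))  # (older, younger): BB, BG, GB, GG

# Condition on "at least one boy"
at_least_one_boy = [f for f in families if "B" in f]
p_both_given_one = Fraction(
    sum(1 for f in at_least_one_boy if f == ("B", "B")), len(at_least_one_boy))

# Contrast: condition on the older child (listed first) being a boy
older_is_boy = [f for f in families if f[0] == "B"]
p_both_given_older = Fraction(
    sum(1 for f in older_is_boy if f == ("B", "B")), len(older_is_boy))

print(p_both_given_one, p_both_given_older)  # 1/3 1/2
```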
Question: In a room of 23 people, what’s the probability that at least two share a birthday?
Answer: About 50%! It’s easier to calculate the complement: P(no shared birthdays) = (365/365) × (364/365) × (363/365) × … × (343/365)
import numpy as np
p_no_match = np.prod([(365-i)/365 for i in range(23)])
p_at_least_one_match = 1 - p_no_match
print(f"P(shared birthday): {p_at_least_one_match:.1%}")  # 50.7%
This is famously counterintuitive because we’re comparing ALL pairs, not just pairs involving you: 23 people form 23 × 22 / 2 = 253 pairs.
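Where does the number 23 come from? This sketch searches for the smallest group size whose shared-birthday probability reaches 50%, assuming 365 equally likely birthdays and ignoring leap years (real birthdays are not perfectly uniform). `p_shared_birthday` is our own helper name:

```python
from math import comb, prod

def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely birthdays."""
    return 1 - prod((days - i) / days for i in range(n))

n = 1
while p_shared_birthday(n) < 0.5:
    n += 1

print(n, comb(n, 2), f"{p_shared_birthday(n):.1%}")  # 23 253 50.7%
```

At 22 people the probability is still just under 50%; adding the 23rd person adds 22 new pairs at once, which is what tips it over.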
Question: You’re on a game show with 3 doors. Behind one is a car, behind the others are goats. You pick door 1. The host (who knows what’s behind each door) opens door 3, revealing a goat. Should you switch to door 2?
Answer: Yes! Switching gives you a 2/3 chance of winning.
  • Initial pick: 1/3 chance of being right
  • Switching: 2/3 chance of winning
The key insight: the host’s action gives you information. He always reveals a goat, so when you switch, you’re essentially betting that your initial choice was wrong, which it is 2/3 of the time.
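If the argument still feels slippery, a simulation settles it. This sketch assumes the standard rules: the host always opens a door that is neither your pick nor the car, choosing at random when two such doors exist. The seed and number of games are arbitrary:

```python
import random

def play(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the player's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither the original pick nor the opened one
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(7)  # seeded for reproducibility
games = 100_000
stay_rate = sum(play(False, rng) for _ in range(games)) / games
switch_rate = sum(play(True, rng) for _ in range(games)) / games
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

Staying wins only when the first pick was the car (about 1/3 of games); switching wins in every other game (about 2/3).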
Question: Your spam filter has 98% sensitivity and 95% specificity. If 5% of emails are spam, what fraction of emails flagged as spam are actually spam?
Answer: About 51%
p_spam = 0.05
sensitivity = 0.98  # P(flagged|spam)
specificity = 0.95  # P(not_flagged|not_spam)
false_positive_rate = 1 - specificity  # P(flagged|not_spam) = 0.05

# P(flagged)
p_flagged = sensitivity * p_spam + false_positive_rate * (1 - p_spam)
# = 0.98 × 0.05 + 0.05 × 0.95 = 0.0965

# P(spam|flagged)
p_spam_given_flagged = (sensitivity * p_spam) / p_flagged
# = 0.049 / 0.0965 = 0.508 or about 51%

print(f"Precision: {p_spam_given_flagged:.1%}")
Roughly half of flagged emails (about 49%) are false positives! This is why spam filters need extremely high specificity.
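To back up the claim about specificity, this sketch holds the base rate (5%) and sensitivity (98%) fixed and raises only the specificity. The function name `precision` is our own label for P(spam | flagged):

```python
def precision(prevalence, sensitivity, specificity):
    """P(spam | flagged) via Bayes' theorem."""
    false_positive_rate = 1 - specificity
    p_flagged = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
    return sensitivity * prevalence / p_flagged

for spec in [0.95, 0.99, 0.999]:
    print(f"specificity {spec:.1%}: precision {precision(0.05, 0.98, spec):.1%}")
```

Going from 95% to 99.9% specificity lifts precision from about 51% to about 98%: because ham vastly outnumbers spam, each point of false positive rate is multiplied by the large ham population.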

Practice Challenge

Build a simple sentiment classifier using Bayes’ theorem:
import numpy as np
from collections import defaultdict

# Training data
reviews = [
    ("great product love it", "positive"),
    ("terrible waste of money", "negative"),
    ("amazing quality highly recommend", "positive"),
    ("awful experience never again", "negative"),
    ("fantastic works perfectly", "positive"),
    ("horrible customer service", "negative"),
    ("excellent value great buy", "positive"),
    ("disappointing poor quality", "negative"),
]

# Your task: Implement a Naive Bayes classifier
# 1. Count word frequencies in positive vs negative reviews
# 2. Calculate P(word|positive) and P(word|negative) for each word
# 3. Use Bayes to classify new review: "great quality but poor service"

class NaiveBayesClassifier:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)
        self.vocab = set()
    
    def train(self, reviews):
        # Your implementation here
        pass
    
    def predict(self, text):
        # Your implementation here
        pass

# Test your classifier
classifier = NaiveBayesClassifier()
classifier.train(reviews)
print(classifier.predict("great quality but poor service"))
Solution:
class NaiveBayesClassifier:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)
        self.vocab = set()
    
    def train(self, reviews):
        for text, label in reviews:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
    
    def predict(self, text):
        words = text.lower().split()
        total_reviews = sum(self.class_counts.values())
        
        scores = {}
        for label in self.class_counts:
            # Start with prior P(class)
            log_prob = np.log(self.class_counts[label] / total_reviews)
            
            # Add log P(word|class) for each word
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Laplace smoothing
                count = self.word_counts[label].get(word, 0) + 1
                prob = count / (total_words + len(self.vocab))
                log_prob += np.log(prob)
            
            scores[label] = log_prob
        
        return max(scores, key=scores.get)

# Test
classifier = NaiveBayesClassifier()
classifier.train(reviews)
result = classifier.predict("great quality but poor service")
print(f"Prediction: {result}")  # negative: 'poor' and 'service' outweigh 'great' here

📝 Practice Exercises

Exercise 1

Calculate probabilities for card games

Exercise 2

Apply Bayes’ theorem to medical diagnosis

Exercise 3

Build a spam classifier using Naive Bayes

Exercise 4

Real-world: Customer conversion funnel analysis

Connection to Machine Learning

| Probability Concept | ML Application |
| --- | --- |
| Prior probability | Initial model beliefs |
| Likelihood | How well data fits model |
| Posterior | Updated beliefs after seeing data |
| Bayes’ theorem | Naive Bayes classifier, Bayesian models |
| Conditional probability | Classification, prediction |

Interview Prep: Common Questions

Q: Two cards are drawn from a deck without replacement. What’s P(both are aces)?
P(1st ace) = 4/52. P(2nd ace | 1st ace) = 3/51. Total = (4/52) × (3/51) = 12/2652 ≈ 0.45%
Q: A test is 95% accurate for a disease affecting 1% of population. If positive, what’s P(have disease)?
Use Bayes, reading “95% accurate” as 95% sensitivity and 95% specificity: P(D|+) = P(+|D)×P(D) / [P(+|D)×P(D) + P(+|¬D)×P(¬D)] = (0.95×0.01) / (0.95×0.01 + 0.05×0.99) ≈ 16%
Q: What’s the difference between independent and mutually exclusive events?
Independent: Occurrence of one doesn’t affect the probability of the other (e.g., two coin flips). Mutually exclusive: Both cannot occur simultaneously (e.g., getting heads AND tails on one flip). Note: mutually exclusive events with nonzero probabilities are NOT independent, because if one happens, the other’s probability drops to 0!
Q: You flip a fair coin 10 times and get 10 heads. What’s P(heads on flip 11)?
Still 50%! This is the gambler’s fallacy. Each flip is independent; past results don’t affect future outcomes.

Common Pitfalls

Probability Mistakes to Avoid:
  1. Base Rate Neglect - Forgetting prior probabilities when interpreting test results
  2. Confusing P(A|B) with P(B|A) - P(positive|disease) ≠ P(disease|positive)
  3. Gambler’s Fallacy - Believing past independent events affect future outcomes
  4. Assuming Independence - Many real-world events are correlated; check before multiplying probabilities
  5. Ignoring Complement - Sometimes P(at least one) = 1 - P(none) is easier

Key Takeaways

What You Learned:
  • Basic Probability - P(A) = favorable outcomes / total outcomes (when all outcomes are equally likely)
  • Addition Rule - P(A or B) = P(A) + P(B) - P(A and B)
  • Multiplication Rule - P(A and B) = P(A) × P(B|A)
  • Conditional Probability - P(A|B) = P(A and B) / P(B)
  • Bayes’ Theorem - Update beliefs with new evidence; essential for ML
  • Independence - Events where one doesn’t affect the other
Bayes’ Theorem Intuition: Think of it as “updating your beliefs.” You start with a prior (what you knew before), see evidence, and get a posterior (what you know after). This is exactly how many ML algorithms learn!
Coming up next: We’ll learn about probability distributions - the patterns that randomness follows. This is where we discover the famous “bell curve” and understand why it shows up everywhere!
