# Probability Foundations: How Likely Is This?

> Master probability through real examples - from medical tests to spam detection

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/probability-math.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=cedf1c8b14d77203baa08ac90a2ea5ff" alt="Probability Foundations" width="1080" height="1080" data-path="images/courses/statistics-for-ml/probability-math.svg" />
</Frame>

## The Doctor's Dilemma

Dr. Sarah runs a routine test on a patient. The test comes back **positive** for a rare disease that affects 1 in 1000 people.

The test is 99% accurate:

* If you HAVE the disease, it correctly says "positive" 99% of the time
* If you DON'T have it, it correctly says "negative" 99% of the time

**Question**: What's the probability this patient actually has the disease?

Most people (and many doctors!) say "99%".

The real answer? **About 9%.**

Surprised? This is why understanding probability properly can literally save lives - and it's the foundation of all machine learning.

<Warning>
  **Real Talk**: Probability is one of the most misunderstood topics. Our intuition is terrible at it. By the end of this module, you'll understand why our intuition fails and how to calculate correctly.
</Warning>

<Info>
  **Estimated Time**: 3-4 hours\
  **Difficulty**: Beginner\
  **Prerequisites**: Basic Python, Module 1 (Describing Data)\
  **What You'll Build**: A spam email detector using Bayes' theorem
</Info>

***

## What Is Probability?

At its core, probability answers: **"How likely is something to happen?"**

**Analogy**: Probability is the language your ML model speaks. When a spam classifier says "92% chance this is spam," it is giving you a probability. When a self-driving car decides to brake, it is reasoning about the probability of a pedestrian stepping into the road. Every prediction an ML model makes is, at its foundation, a probability statement.

### The Coin Flip

```python theme={null}
import random

def flip_coin():
    return random.choice(['Heads', 'Tails'])

# Flip 1000 times
results = [flip_coin() for _ in range(1000)]
heads_count = results.count('Heads')

print(f"Heads: {heads_count}/1000 = {heads_count/1000:.2%}")
```

**Output (varies):**

```
Heads: 497/1000 = 49.70%
```

The probability of heads is 0.5 (or 50%). This means:

* Over many flips, about half will be heads
* Any single flip is unpredictable
* Probability is about **long-run frequency**

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/probability-math.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=cedf1c8b14d77203baa08ac90a2ea5ff" alt="Probability Formula Visualization" width="1080" height="1080" data-path="images/courses/statistics-for-ml/probability-math.svg" />
</Frame>

**The Formula** (when every outcome is equally likely):

$$
P(A) = \frac{\text{Number of ways A can happen}}{\text{Total number of possible outcomes}}
$$
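
The counting formula only works when every outcome in the sample space is equally likely. As a minimal sketch under that assumption, the snippet below enumerates all results of rolling two fair dice and counts the favorable ones. The helper name `classical_probability` is illustrative, not part of any library.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, event):
    """Favorable outcomes / total outcomes, assuming all outcomes are equally likely."""
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

# Sample space: all 36 ordered results of rolling two fair dice
two_dice = list(product(range(1, 7), repeat=2))

p_sum_7 = classical_probability(two_dice, lambda roll: sum(roll) == 7)
p_sum_12 = classical_probability(two_dice, lambda roll: sum(roll) == 12)

print(f"P(sum is 7):  {p_sum_7} = {float(p_sum_7):.2%}")
print(f"P(sum is 12): {p_sum_12} = {float(p_sum_12):.2%}")
```

Using `Fraction` keeps the answers exact (1/6 and 1/36), which makes it easy to compare against a hand calculation.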

***

## Basic Probability Rules

### Rule 1: Probability is Between 0 and 1

$$
0 \leq P(A) \leq 1
$$

* **P(A) = 0**: Impossible (rolling a 7 on a standard die)
* **P(A) = 1**: Certain (rolling 1-6 on a standard die)
* **P(A) = 0.5**: Equally likely to happen or not

```python theme={null}
# Die roll probabilities
die_outcomes = [1, 2, 3, 4, 5, 6]

p_roll_7 = 0 / 6  # Impossible
p_roll_any = 6 / 6  # Certain
p_roll_even = 3 / 6  # 2, 4, or 6

print(f"P(roll 7): {p_roll_7}")
print(f"P(roll 1-6): {p_roll_any}")
print(f"P(roll even): {p_roll_even}")
```

### Rule 2: Complement Rule

The probability something **doesn't** happen is 1 minus the probability it does.

$$
P(\text{not } A) = 1 - P(A)
$$

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/complement-real-world.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=7743db99eaf54c9aa41d019b9a3a6aa5" alt="Complement Rule - Rain Example" width="1080" height="1080" data-path="images/courses/statistics-for-ml/complement-real-world.svg" />
</Frame>

```python theme={null}
# Weather example
p_rain = 0.30  # 30% chance of rain

p_no_rain = 1 - p_rain
print(f"P(no rain): {p_no_rain:.0%}")  # 70%

# Useful for "at least one" problems
p_single_die_not_six = 5/6
p_three_dice_at_least_one_six = 1 - (5/6)**3
print(f"P(at least one 6 in 3 rolls): {p_three_dice_at_least_one_six:.2%}")
```

### Rule 3: Addition Rule

For **mutually exclusive** events (can't happen together):

$$
P(A \text{ or } B) = P(A) + P(B)
$$

```python theme={null}
# Drawing a card
p_ace = 4/52
p_king = 4/52

# P(ace OR king) - can't be both!
p_ace_or_king = p_ace + p_king
print(f"P(ace or king): {p_ace_or_king:.2%}")  # 15.38%
```

For **non-mutually exclusive** events (can happen together):

$$
P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)
$$

```python theme={null}
# P(ace OR spade)
p_ace = 4/52
p_spade = 13/52
p_ace_and_spade = 1/52  # Ace of spades!

p_ace_or_spade = p_ace + p_spade - p_ace_and_spade
print(f"P(ace or spade): {p_ace_or_spade:.2%}")  # 30.77%
```
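
One way to convince yourself that subtracting the overlap is right is to build the deck and count. The sketch below is an illustrative check, not part of the course code: it compares a direct count of "ace or spade" cards with the addition-rule formula.

```python
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['spades', 'hearts', 'diamonds', 'clubs']
deck = list(product(ranks, suits))  # 52 (rank, suit) tuples

aces = {card for card in deck if card[0] == 'A'}
spades = {card for card in deck if card[1] == 'spades'}

# Direct count: the set union includes the ace of spades only once
p_direct = len(aces | spades) / len(deck)

# Addition rule: add both counts, then subtract the overlap
p_formula = (len(aces) + len(spades) - len(aces & spades)) / len(deck)

print(f"Direct count: {len(aces | spades)}/52 = {p_direct:.2%}")
print(f"Formula:      {p_formula:.2%}")
```

Without the subtraction, the ace of spades would be counted twice and the result would be 17/52 instead of 16/52.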

### Rule 4: Multiplication Rule for Independent Events

Events are **independent** if knowing that one occurred doesn't change the probability of the other.

$$
P(A \text{ and } B) = P(A) \times P(B)
$$

```python theme={null}
# Flipping a coin and rolling a die
p_heads = 1/2
p_six = 1/6

p_heads_and_six = p_heads * p_six
print(f"P(heads AND six): {p_heads_and_six:.2%}")  # 8.33%

# Password cracking: 4-digit PIN
digits = 10  # 0-9
combinations = digits ** 4  # 10,000
p_guess_correct = 1 / combinations
print(f"P(guess 4-digit PIN): {p_guess_correct:.4%}")  # 0.01%
```

***

## Conditional Probability: The Game Changer

Here's where it gets interesting. **Conditional probability** answers:

**"What's the probability of A, GIVEN that B already happened?"**

Notation: $P(A|B)$ reads as "probability of A given B"

### The Job Interview Example

You're applying for jobs. Here's some data:

```python theme={null}
# 100 applicants total
data = {
    'has_experience': {'hired': 30, 'not_hired': 20},  # 50 total
    'no_experience': {'hired': 10, 'not_hired': 40}    # 50 total
}

# Overall probability of getting hired
total_hired = 30 + 10
total_applicants = 100
p_hired = total_hired / total_applicants
print(f"P(hired): {p_hired:.0%}")  # 40%

# BUT... what if you have experience?
experienced_hired = 30
total_experienced = 50
p_hired_given_experience = experienced_hired / total_experienced
print(f"P(hired | experience): {p_hired_given_experience:.0%}")  # 60%

# What if you don't have experience?
no_exp_hired = 10
total_no_exp = 50
p_hired_given_no_experience = no_exp_hired / total_no_exp
print(f"P(hired | no experience): {p_hired_given_no_experience:.0%}")  # 20%
```

**Key Insight**: The "given" information changes everything!

**Analogy**: Conditional probability is like updating your GPS after a wrong turn. The probability of arriving on time given you are on the highway is very different from the probability given you are stuck in a detour. The "given" is the context that reshapes all your estimates.

<Tip>
  **ML Application -- Feature Importance**: Every feature in a classification model is essentially providing conditional information. When your model uses "number of support tickets" to predict churn, it is computing something like P(churn | high support tickets). Understanding conditional probability helps you reason about why certain features are predictive and others are not -- and helps you spot cases where your model is learning spurious correlations.
</Tip>

**The Formula:**

$$
P(A|B) = \frac{P(A \text{ and } B)}{P(B)}
$$

```python theme={null}
# Verify with formula
p_hired_and_experienced = 30 / 100  # 0.30
p_experienced = 50 / 100  # 0.50

p_hired_given_exp_formula = p_hired_and_experienced / p_experienced
print(f"P(hired | exp) via formula: {p_hired_given_exp_formula:.0%}")  # 60% ✓
```

***

## Bayes' Theorem: The Most Important Formula

Now we can solve that medical test problem from the beginning.

**Bayes' Theorem** lets you flip conditional probabilities:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

In words: *The probability of A given B equals the probability of B given A, times the probability of A, divided by the probability of B.*
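
Bayes' theorem follows from the conditional probability formula in two steps, assuming $P(A) > 0$ and $P(B) > 0$. First, write "A and B" both ways around:

$$
P(A \text{ and } B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)
$$

Then divide both sides by $P(B)$:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

When $P(B)$ is not given directly, it can be expanded with the law of total probability:

$$
P(B) = P(B|A) \cdot P(A) + P(B|\text{not } A) \cdot P(\text{not } A)
$$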

### Solving the Medical Test Problem

Let's set up what we know:

* **P(Disease)** = 0.001 (1 in 1000 people have it)
* **P(Positive | Disease)** = 0.99 (test catches 99% of sick people)
* **P(Positive | No Disease)** = 0.01 (1% false positive rate)

We want: **P(Disease | Positive)** - given a positive test, what's the chance they're actually sick?

```python theme={null}
# Known probabilities
p_disease = 0.001  # Prior: base rate of disease
p_positive_given_disease = 0.99  # Sensitivity
p_positive_given_no_disease = 0.01  # False positive rate

# Calculate P(Positive) using law of total probability
p_no_disease = 1 - p_disease
p_positive = (p_positive_given_disease * p_disease + 
              p_positive_given_no_disease * p_no_disease)

print(f"P(Positive test): {p_positive:.4f}")  # 0.011

# Apply Bayes' theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive

print(f"P(Disease | Positive): {p_disease_given_positive:.2%}")
```

**Output:**

```
P(Positive test): 0.0110
P(Disease | Positive): 9.02%
```

Only **9% chance** of actually having the disease, despite a "99% accurate" test!

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/bayes-real-world.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=74c343d0ad6ea71f968f5bf38a3f9eda" alt="Bayes Theorem - Medical Test Visualization" width="1080" height="1080" data-path="images/courses/statistics-for-ml/bayes-real-world.svg" />
</Frame>

### Why Is This So Low?

Let's think about 10,000 people:

```python theme={null}
population = 10000

# How many actually have disease?
actually_sick = round(population * 0.001)  # 10 people
actually_healthy = population - actually_sick  # 9990 people

# Of the sick, how many test positive?
# round() rather than int(): int(9.9) would truncate to 9
sick_positive = round(actually_sick * 0.99)  # ~10 true positives

# Of the healthy, how many test positive (false positives)?
healthy_positive = round(actually_healthy * 0.01)  # ~100 false positives!

# Total positives
total_positive = sick_positive + healthy_positive

print(f"Actually sick: {actually_sick}")
print(f"True positives: {sick_positive}")
print(f"False positives: {healthy_positive}")
print(f"Total positives: {total_positive}")
print(f"P(sick | positive): {sick_positive/total_positive:.1%}")
```

**Output:**

```
Actually sick: 10
True positives: 10
False positives: 100
Total positives: 110
P(sick | positive): 9.1%
```

**The Key Insight**: When the disease is rare, even a small false positive rate creates many false alarms that overwhelm the true cases!

**Step-by-step reasoning**: Here is how to think through Bayes problems without getting lost in formulas:

1. **Start with a concrete population** -- pick 10,000 people (or 100,000, whatever makes the math clean).
2. **Split by the base rate** -- how many actually have the condition? At 0.1%, that is 10 out of 10,000.
3. **Apply the test to both groups** -- of the 10 sick people, 99% test positive = \~10 true positives. Of the 9,990 healthy people, 1% test positive = \~100 false positives.
4. **Look at the positive pile** -- 10 true positives out of 110 total positives = \~9%.

This "frequency table" approach makes Bayes problems intuitive. No formula needed.
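
The four steps can be wrapped in a small helper. This is only a sketch: `frequency_table` and its parameter names are illustrative, and it keeps fractional expected counts instead of rounding to whole people, so the rare-disease case comes out at the exact 9.0% and not the rounded ~9.1%.

```python
def frequency_table(population, base_rate, sensitivity, false_positive_rate):
    """Count expected people in each group instead of multiplying probabilities."""
    sick = population * base_rate                      # step 2: split by base rate
    healthy = population - sick
    true_positives = sick * sensitivity                # step 3: test the sick group
    false_positives = healthy * false_positive_rate    # step 3: test the healthy group
    total_positives = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "p_sick_given_positive": true_positives / total_positives,  # step 4
    }

rare = frequency_table(10_000, base_rate=0.001, sensitivity=0.99, false_positive_rate=0.01)
common = frequency_table(10_000, base_rate=0.10, sensitivity=0.99, false_positive_rate=0.01)

print(f"Rare disease (0.1%):  P(sick | positive) = {rare['p_sick_given_positive']:.1%}")
print(f"Common disease (10%): P(sick | positive) = {common['p_sick_given_positive']:.1%}")
```

Running the same test against a 10% base rate shows how much the answer depends on the prior: the test is unchanged, but a positive result now means something very different.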

<Warning>
  **Statistical Mistake in ML -- Ignoring Class Imbalance**: This exact same base rate problem happens in ML classification. If only 1% of transactions are fraudulent, a model that predicts "not fraud" for everything gets 99% accuracy. The Bayesian insight applies: you must look at precision (of all things you flagged as fraud, how many actually were?) and recall (of all actual frauds, how many did you catch?). Accuracy alone is meaningless with imbalanced classes.
</Warning>
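
To make the warning concrete, here is a small sketch with made-up labels (1,000 transactions, 10 of them fraudulent) and a "model" that never flags anything. The data and variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a "model" that predicts "not fraud" for everything

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # frauds caught
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false alarms
fn = sum(t == 1 and p == 0 for t, p in pairs)  # frauds missed
tn = sum(t == 0 and p == 0 for t, p in pairs)  # legitimate, left alone

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision is undefined when nothing is flagged, so report None instead of a number
precision = tp / (tp + fp) if (tp + fp) else None

print(f"Accuracy:  {accuracy:.1%}")
print(f"Recall:    {recall:.1%}")
print(f"Precision: {precision}")
```

The accuracy looks excellent while recall is zero: the model catches no fraud at all.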

***

## Bayes' Theorem in Machine Learning

This isn't just medical trivia. Bayes' theorem is the foundation of:

* **Spam filters** (Naive Bayes classifier)
* **Recommendation systems**
* **Medical diagnosis AI**
* **Text classification**
* **Bayesian neural networks**

Let's build a spam detector!

***

## 🚀 Mini-Project: Spam Email Detector

### The Problem

You receive an email with the word "FREE" in it. What's the probability it's spam?

### The Data

```python theme={null}
# Training data: 1000 emails
total_emails = 1000
spam_emails = 400
ham_emails = 600  # "ham" = not spam

# Word frequency
emails_with_free = {
    'spam': 200,  # 200 spam emails contain "FREE"
    'ham': 30     # 30 legitimate emails contain "FREE"
}

emails_with_meeting = {
    'spam': 10,
    'ham': 250
}

emails_with_urgent = {
    'spam': 180,
    'ham': 40
}
```

### Step 1: Calculate Prior Probabilities

```python theme={null}
# P(Spam) and P(Ham)
p_spam = spam_emails / total_emails  # 0.40
p_ham = ham_emails / total_emails    # 0.60

print(f"P(Spam): {p_spam:.0%}")
print(f"P(Ham): {p_ham:.0%}")
```

### Step 2: Calculate Likelihoods

```python theme={null}
# P(word | Spam) - probability of word appearing in spam
p_free_given_spam = emails_with_free['spam'] / spam_emails
p_free_given_ham = emails_with_free['ham'] / ham_emails

print(f"P('FREE' | Spam): {p_free_given_spam:.0%}")  # 50%
print(f"P('FREE' | Ham): {p_free_given_ham:.0%}")    # 5%

p_meeting_given_spam = emails_with_meeting['spam'] / spam_emails
p_meeting_given_ham = emails_with_meeting['ham'] / ham_emails

print(f"P('meeting' | Spam): {p_meeting_given_spam:.1%}")  # 2.5%
print(f"P('meeting' | Ham): {p_meeting_given_ham:.1%}")    # 41.7%
```

### Step 3: Apply Bayes' Theorem

```python theme={null}
def naive_bayes_spam(word, p_word_spam, p_word_ham, p_spam=0.4, p_ham=0.6):
    """
    Calculate P(Spam | word) using Bayes' theorem.
    """
    # P(word) = P(word|Spam)*P(Spam) + P(word|Ham)*P(Ham)
    p_word = p_word_spam * p_spam + p_word_ham * p_ham
    
    # Bayes' theorem
    p_spam_given_word = (p_word_spam * p_spam) / p_word
    
    return p_spam_given_word

# Email contains "FREE"
p_spam_free = naive_bayes_spam("FREE", 0.50, 0.05)
print(f"P(Spam | 'FREE'): {p_spam_free:.0%}")  # 87%

# Email contains "meeting"
p_spam_meeting = naive_bayes_spam("meeting", 0.025, 0.417)
print(f"P(Spam | 'meeting'): {p_spam_meeting:.0%}")  # 4%
```

### Step 4: Multiple Words (Naive Assumption)

The "naive" in Naive Bayes assumes words are independent:

```python theme={null}
import numpy as np

def naive_bayes_multi_word(words_data, p_spam=0.4, p_ham=0.6):
    """
    Calculate P(Spam | multiple words) assuming independence.
    
    words_data: list of (word, p_word_spam, p_word_ham)
    """
    # Start with prior
    log_spam = np.log(p_spam)
    log_ham = np.log(p_ham)
    
    # Multiply likelihoods (add in log space)
    for word, p_word_spam, p_word_ham in words_data:
        log_spam += np.log(p_word_spam + 1e-10)  # Add small value to avoid log(0)
        log_ham += np.log(p_word_ham + 1e-10)
    
    # Convert back and normalize
    spam_score = np.exp(log_spam)
    ham_score = np.exp(log_ham)
    
    p_spam_given_words = spam_score / (spam_score + ham_score)
    return p_spam_given_words

# Email: "FREE meeting URGENT"
words = [
    ("FREE", 0.50, 0.05),
    ("meeting", 0.025, 0.417),
    ("URGENT", 0.45, 0.067)
]

p_spam_multi = naive_bayes_multi_word(words)
print(f"P(Spam | 'FREE meeting URGENT'): {p_spam_multi:.0%}")
```

**Output:**

```
P(Spam | 'FREE meeting URGENT'): 73%
```

Even though "meeting" is a ham indicator, the strong spam signals from "FREE" and "URGENT" push the probability up!
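
Why add logs instead of multiplying? With only three words it makes no difference, but multiplying hundreds of small likelihoods underflows to zero in floating point. The sketch below uses hypothetical likelihoods for a 200-word email to show the problem. It then normalizes the log scores by subtracting the larger one before exponentiating; this is one common stabilization (the log-sum-exp trick), not the method used in the function above.

```python
import math

# Hypothetical per-word likelihoods for a 200-word email
p_words_spam = [0.01] * 200
p_words_ham = [0.01] * 198 + [0.02] * 2  # two words are twice as likely in ham

# Direct multiplication: 0.01 ** 200 is far below the smallest positive float
direct_spam = 0.4 * math.prod(p_words_spam)
direct_ham = 0.6 * math.prod(p_words_ham)
print(f"Direct products: spam={direct_spam}, ham={direct_ham}")  # both underflow to 0.0

# Log space: sums of logs stay in a comfortable numeric range
log_spam = math.log(0.4) + sum(math.log(p) for p in p_words_spam)
log_ham = math.log(0.6) + sum(math.log(p) for p in p_words_ham)

def posterior_from_logs(log_a, log_b):
    """P(a) from two unnormalized log scores, shifted so the largest exponent is 0."""
    shift = max(log_a, log_b)
    a = math.exp(log_a - shift)
    b = math.exp(log_b - shift)
    return a / (a + b)

p_spam_given_email = posterior_from_logs(log_spam, log_ham)
print(f"P(Spam | email): {p_spam_given_email:.1%}")
```

Here the ham score is (0.6 / 0.4) × 4 = 6 times the spam score, so the posterior is 1/7, even though both raw products are too small to represent.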

***

## Complete Spam Classifier

```python theme={null}
import numpy as np
from collections import defaultdict

class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {'spam': defaultdict(int), 'ham': defaultdict(int)}
        self.class_counts = {'spam': 0, 'ham': 0}
        self.vocabulary = set()
    
    def train(self, emails, labels):
        """Train on labeled email data."""
        for email, label in zip(emails, labels):
            self.class_counts[label] += 1
            words = email.lower().split()
            for word in words:
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
    
    def predict_proba(self, email):
        """Calculate P(Spam | email)."""
        words = email.lower().split()
        
        # Prior probabilities
        total = sum(self.class_counts.values())
        p_spam = self.class_counts['spam'] / total
        p_ham = self.class_counts['ham'] / total
        
        # Calculate log likelihoods
        log_spam = np.log(p_spam)
        log_ham = np.log(p_ham)
        
        vocab_size = len(self.vocabulary)
        spam_total = sum(self.word_counts['spam'].values())
        ham_total = sum(self.word_counts['ham'].values())
        
        for word in words:
            # Laplace smoothing
            p_word_spam = (self.word_counts['spam'][word] + 1) / (spam_total + vocab_size)
            p_word_ham = (self.word_counts['ham'][word] + 1) / (ham_total + vocab_size)
            
            log_spam += np.log(p_word_spam)
            log_ham += np.log(p_word_ham)
        
        # Convert to probabilities
        spam_score = np.exp(log_spam)
        ham_score = np.exp(log_ham)
        
        return spam_score / (spam_score + ham_score)
    
    def predict(self, email, threshold=0.5):
        """Classify as spam or ham."""
        return 'spam' if self.predict_proba(email) > threshold else 'ham'

# Example usage
spam_filter = NaiveBayesSpamFilter()

# Training data
train_emails = [
    "FREE money click here now",
    "Congratulations you won FREE prize",
    "URGENT action required FREE offer",
    "Meeting tomorrow at 3pm",
    "Project deadline reminder",
    "Team lunch on Friday",
    "Your invoice attached",
    "FREE trial subscription ending",
]
train_labels = ['spam', 'spam', 'spam', 'ham', 'ham', 'ham', 'ham', 'spam']

spam_filter.train(train_emails, train_labels)

# Test
test_emails = [
    "FREE discount on all products",
    "Meeting rescheduled to Monday",
    "URGENT free money opportunity",
]

for email in test_emails:
    prob = spam_filter.predict_proba(email)
    pred = spam_filter.predict(email)
    print(f"'{email[:40]}...'")
    print(f"  P(Spam): {prob:.1%} → {pred.upper()}\n")
```

***

## 🎯 Practice Exercises

### Exercise 1: Card Probability

```python theme={null}
# Standard 52-card deck
# Calculate:
# 1. P(drawing a heart)
# 2. P(drawing a face card) - J, Q, K
# 3. P(drawing a red face card)
# 4. P(drawing a heart OR a face card)
```

<Accordion title="Solution">
  ```python theme={null}
  # 1. P(heart)
  p_heart = 13/52  # 25%

  # 2. P(face card) - 12 face cards (3 per suit × 4 suits)
  p_face = 12/52  # 23.08%

  # 3. P(red face card) - 6 cards (3 per red suit × 2 red suits)
  p_red_face = 6/52  # 11.54%

  # 4. P(heart OR face card)
  # Hearts = 13, Face cards = 12, but 3 are both (J, Q, K of hearts)
  p_heart_or_face = (13 + 12 - 3) / 52  # 22/52 = 42.31%

  print(f"P(heart): {p_heart:.2%}")
  print(f"P(face): {p_face:.2%}")
  print(f"P(red face): {p_red_face:.2%}")
  print(f"P(heart OR face): {p_heart_or_face:.2%}")
  ```
</Accordion>

### Exercise 2: Weather Prediction

```python theme={null}
# Historical data:
# - 30% of days are rainy
# - On rainy days, 80% are cloudy in the morning
# - On non-rainy days, 40% are cloudy in the morning

# This morning is cloudy. What's the probability of rain?
```

<Accordion title="Solution">
  ```python theme={null}
  # Use Bayes' theorem
  p_rain = 0.30
  p_no_rain = 0.70
  p_cloudy_given_rain = 0.80
  p_cloudy_given_no_rain = 0.40

  # P(Cloudy)
  p_cloudy = p_cloudy_given_rain * p_rain + p_cloudy_given_no_rain * p_no_rain
  print(f"P(Cloudy): {p_cloudy:.0%}")  # 52%

  # P(Rain | Cloudy)
  p_rain_given_cloudy = (p_cloudy_given_rain * p_rain) / p_cloudy
  print(f"P(Rain | Cloudy): {p_rain_given_cloudy:.1%}")  # 46.2%

  # Even though it's cloudy, there's still less than 50% chance of rain
  # because the base rate of rain (30%) is low
  ```
</Accordion>

### Exercise 3: Two-Test Diagnosis

```python theme={null}
# A disease affects 2% of the population
# Test A: 95% sensitivity, 90% specificity
# Test B: 90% sensitivity, 95% specificity

# A patient tests positive on BOTH tests.
# What's the probability they have the disease?

# Hint: Apply Bayes' theorem twice (sequentially), treating the two tests
# as independent given whether the patient has the disease
```

<Accordion title="Solution">
  ```python theme={null}
  # Initial state
  p_disease = 0.02

  # Test A
  sensitivity_a = 0.95  # P(+|disease)
  specificity_a = 0.90  # P(-|no disease)
  false_pos_a = 1 - specificity_a  # 0.10

  # After Test A positive
  p_pos_a = sensitivity_a * p_disease + false_pos_a * (1 - p_disease)
  p_disease_after_a = (sensitivity_a * p_disease) / p_pos_a
  print(f"After Test A positive: P(disease) = {p_disease_after_a:.1%}")  # 16.2%

  # Test B (now using updated probability as prior)
  sensitivity_b = 0.90
  specificity_b = 0.95
  false_pos_b = 0.05

  # After Test B positive (using p_disease_after_a as new prior)
  p_pos_b_given_disease = sensitivity_b
  p_pos_b_given_no_disease = false_pos_b

  p_pos_b = (p_pos_b_given_disease * p_disease_after_a + 
             p_pos_b_given_no_disease * (1 - p_disease_after_a))

  p_disease_after_both = (p_pos_b_given_disease * p_disease_after_a) / p_pos_b
  print(f"After both tests positive: P(disease) = {p_disease_after_both:.1%}")  # 77.7%

  # Two positive tests dramatically increases confidence!
  ```
</Accordion>
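
Updating one test at a time relies on an assumption the exercise leaves implicit: the two tests are conditionally independent given the patient's true status. Under that assumption, two sequential updates give the same answer as a single update with the combined likelihoods, as this sketch shows (`bayes_update` is an illustrative helper name).

```python
def bayes_update(prior, p_pos_given_disease, p_pos_given_healthy):
    """Posterior P(disease | positive) for one positive result."""
    numerator = p_pos_given_disease * prior
    evidence = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / evidence

prior = 0.02

# Sequential: the posterior after Test A becomes the prior for Test B
after_a = bayes_update(prior, 0.95, 0.10)
after_both_sequential = bayes_update(after_a, 0.90, 0.05)

# Single step: multiply the likelihoods (valid only under conditional independence)
after_both_joint = bayes_update(prior, 0.95 * 0.90, 0.10 * 0.05)

print(f"Sequential:  {after_both_sequential:.1%}")
print(f"Single step: {after_both_joint:.1%}")
```

If the tests tended to fail on the same patients (for example, both reacting to the same unrelated condition), the second positive would carry less information and the true posterior would be lower.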

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Basic Rules" icon="list-check">
    * Probability is between 0 and 1
    * P(not A) = 1 - P(A)
    * For exclusive events: P(A or B) = P(A) + P(B)
    * For independent events: P(A and B) = P(A) × P(B)
  </Card>

  <Card title="Conditional Probability" icon="code-branch">
    * P(A|B) = P(A and B) / P(B)
    * "Given B" means we're only looking at cases where B happened
    * This changes everything!
  </Card>

  <Card title="Bayes' Theorem" icon="star">
    * Lets you flip P(A|B) to P(B|A)
    * Prior × Likelihood = Posterior (after normalizing)
    * Foundation of spam filters, medical diagnosis, ML
  </Card>

  <Card title="Key Insight" icon="lightbulb">
    * Base rates matter enormously
    * "99% accurate" doesn't mean what you think
    * Always consider: what's the prior probability?
  </Card>
</CardGroup>

***

## Common Mistakes to Avoid

<Warning>
  **Mistake 1: Ignoring Base Rates (Base Rate Fallacy)**

  A 99% accurate test for a rare disease (1 in 10,000) will mostly produce false positives. Most people who test positive won't have the disease. Always consider how common the thing you're testing for actually is.
</Warning>

<Warning>
  **Mistake 2: Confusing P(A|B) with P(B|A)**

  P(sick|positive test) is NOT the same as P(positive test|sick). This confusion has led to wrongful convictions and medical misdiagnoses. Bayes' theorem is the bridge between them.
</Warning>

<Warning>
  **Mistake 3: Treating Dependent Events as Independent**

  Drawing cards without replacement means probabilities change. P(2nd card is ace | 1st was ace) = 3/51, not 4/52.
</Warning>
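
The 3/51 figure can be checked by brute force. This sketch enumerates every ordered pair of distinct cards from a 52-card deck (only "ace" versus "other" matters here) and compares the exact answer with the mistaken independent calculation.

```python
from fractions import Fraction
from itertools import permutations

deck = ['A'] * 4 + ['x'] * 48  # 4 aces, 48 other cards

# Every ordered way to draw two different cards: 52 * 51 = 2652 pairs
draws = list(permutations(range(52), 2))

first_is_ace = [(i, j) for i, j in draws if deck[i] == 'A']
both_aces = [(i, j) for i, j in first_is_ace if deck[j] == 'A']

p_second_ace_given_first = Fraction(len(both_aces), len(first_is_ace))
p_two_aces = Fraction(len(both_aces), len(draws))
p_if_independent = Fraction(4, 52) ** 2  # the mistaken calculation

print(f"P(2nd ace | 1st ace): {p_second_ace_given_first}")  # 1/17, i.e. 3/51
print(f"P(two aces):          {p_two_aces}")                # 1/221
print(f"Wrongly assuming independence: {p_if_independent}")  # 1/169
```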

***

## Interview Questions

<Accordion title="Question 1: The Two-Child Problem (Google)">
  **Question**: A family has two children. Given that at least one is a boy, what's the probability that both are boys?

  <Tip>
    **Answer**: 1/3, not 1/2!

    Possible outcomes for two children: BB, BG, GB, GG
    Given "at least one boy": BB, BG, GB (3 options)
    Both boys: BB (1 option)

    P(both boys | at least one boy) = 1/3

    This is counterintuitive because we're not told which child is the boy, so both orders (BG, GB) are possible.
  </Tip>
</Accordion>
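
The two-child answer can be verified by listing the families. The sketch assumes boys and girls are equally likely and that the two births are independent. It also shows how a slightly different condition ("the older child is a boy") changes the answer to 1/2.

```python
from fractions import Fraction
from itertools import product

# (older, younger): BB, BG, GB, GG, all equally likely under the assumptions above
families = list(product('BG', repeat=2))

at_least_one_boy = [f for f in families if 'B' in f]
p_both_given_at_least_one = Fraction(
    sum(f == ('B', 'B') for f in at_least_one_boy), len(at_least_one_boy)
)

older_is_boy = [f for f in families if f[0] == 'B']
p_both_given_older_boy = Fraction(
    sum(f == ('B', 'B') for f in older_is_boy), len(older_is_boy)
)

print(f"P(both boys | at least one boy): {p_both_given_at_least_one}")  # 1/3
print(f"P(both boys | older is a boy):   {p_both_given_older_boy}")     # 1/2
```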

<Accordion title="Question 2: Birthday Problem (Amazon)">
  **Question**: In a room of 23 people, what's the probability that at least two share a birthday?

  <Tip>
    **Answer**: About 50%!

    It's easier to calculate the complement:
    P(no shared birthdays) = (365/365) × (364/365) × (363/365) × ... × (343/365)

    ```python theme={null}
    import numpy as np
    p_no_match = np.prod([(365-i)/365 for i in range(23)])
    p_at_least_one_match = 1 - p_no_match
    print(f"P(shared birthday): {p_at_least_one_match:.1%}")  # 50.7%
    ```

    This is famously counterintuitive because we're comparing ALL pairs, not just pairs involving you.
  </Tip>
</Accordion>

<Accordion title="Question 3: Monty Hall Problem (Classic)">
  **Question**: You're on a game show with 3 doors. Behind one is a car, behind the others are goats. You pick door 1. The host (who knows what's behind each door) opens door 3, revealing a goat. Should you switch to door 2?

  <Tip>
    **Answer**: Yes! Switching gives you 2/3 chance of winning.

    Initial pick: 1/3 chance of being right
    Switching: 2/3 chance of winning

    The key insight: The host's action gives you information. He always reveals a goat, so when you switch, you're essentially betting that your initial choice was wrong (which it probably was, 2/3 of the time).
  </Tip>
</Accordion>
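
If the Monty Hall argument still feels wrong, a simulation is a good sanity check. This is a sketch under the standard rules: the host always opens a goat door that you did not pick, choosing at random when two such doors are available. The function name and the fixed seed are illustrative choices.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the player's pick nor the car
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(42)  # fixed seed so runs are repeatable
trials = 100_000
switch_wins = sum(play_monty_hall(True, rng) for _ in range(trials)) / trials
stay_wins = sum(play_monty_hall(False, rng) for _ in range(trials)) / trials

print(f"Win rate when switching: {switch_wins:.1%}")
print(f"Win rate when staying:   {stay_wins:.1%}")
```

The two rates should land close to 2/3 and 1/3 respectively.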

<Accordion title="Question 4: Spam Classifier (Tech Companies)">
  **Question**: Your spam filter has 98% sensitivity and 95% specificity. If 5% of emails are spam, what fraction of emails flagged as spam are actually spam?

  <Tip>
    **Answer**: About 51%

    ```python theme={null}
    p_spam = 0.05
    sensitivity = 0.98  # P(flagged|spam)
    specificity = 0.95  # P(not_flagged|not_spam)
    false_positive_rate = 1 - specificity  # 0.05

    # P(flagged)
    p_flagged = sensitivity * p_spam + false_positive_rate * (1 - p_spam)
    # = 0.98 × 0.05 + 0.05 × 0.95 = 0.0965

    # P(spam|flagged)
    p_spam_given_flagged = (sensitivity * p_spam) / p_flagged
    # = 0.049 / 0.0965 = 0.508 or about 51%

    print(f"Precision: {p_spam_given_flagged:.1%}")
    ```

    Half of flagged emails are false positives! This is why spam filters need extremely high specificity.
  </Tip>
</Accordion>

***

## Practice Challenge

<Accordion title="Challenge: Build a Complete Bayesian Classifier">
  Build a simple sentiment classifier using Bayes' theorem:

  ```python theme={null}
  import numpy as np
  from collections import defaultdict

  # Training data
  reviews = [
      ("great product love it", "positive"),
      ("terrible waste of money", "negative"),
      ("amazing quality highly recommend", "positive"),
      ("awful experience never again", "negative"),
      ("fantastic works perfectly", "positive"),
      ("horrible customer service", "negative"),
      ("excellent value great buy", "positive"),
      ("disappointing poor quality", "negative"),
  ]

  # Your task: Implement a Naive Bayes classifier
  # 1. Count word frequencies in positive vs negative reviews
  # 2. Calculate P(word|positive) and P(word|negative) for each word
  # 3. Use Bayes to classify new review: "great quality but poor service"

  class NaiveBayesClassifier:
      def __init__(self):
          self.word_counts = defaultdict(lambda: defaultdict(int))
          self.class_counts = defaultdict(int)
          self.vocab = set()
      
      def train(self, reviews):
          # Your implementation here
          pass
      
      def predict(self, text):
          # Your implementation here
          pass

  # Test your classifier
  classifier = NaiveBayesClassifier()
  classifier.train(reviews)
  print(classifier.predict("great quality but poor service"))
  ```

  **Solution**:

  ```python theme={null}
  class NaiveBayesClassifier:
      def __init__(self):
          self.word_counts = defaultdict(lambda: defaultdict(int))
          self.class_counts = defaultdict(int)
          self.vocab = set()
      
      def train(self, reviews):
          for text, label in reviews:
              self.class_counts[label] += 1
              for word in text.lower().split():
                  self.word_counts[label][word] += 1
                  self.vocab.add(word)
      
      def predict(self, text):
          words = text.lower().split()
          total_reviews = sum(self.class_counts.values())
          
          scores = {}
          for label in self.class_counts:
              # Start with prior P(class)
              log_prob = np.log(self.class_counts[label] / total_reviews)
              
              # Add log P(word|class) for each word
              total_words = sum(self.word_counts[label].values())
              for word in words:
                  # Laplace smoothing
                  count = self.word_counts[label].get(word, 0) + 1
                  prob = count / (total_words + len(self.vocab))
                  log_prob += np.log(prob)
              
              scores[label] = log_prob
          
          return max(scores, key=scores.get)

  # Test
  classifier = NaiveBayesClassifier()
  classifier.train(reviews)
  result = classifier.predict("great quality but poor service")
  print(f"Prediction: {result}")  # negative here; the mixed wording makes it a fairly close call
  ```
</Accordion>

***

## 📝 Practice Exercises

<CardGroup cols={2}>
  <Card title="Exercise 1" icon="dice" color="#3B82F6">
    Calculate probabilities for card games
  </Card>

  <Card title="Exercise 2" icon="virus" color="#10B981">
    Apply Bayes' theorem to medical diagnosis
  </Card>

  <Card title="Exercise 3" icon="envelope" color="#8B5CF6">
    Build a spam classifier using Naive Bayes
  </Card>

  <Card title="Exercise 4" icon="cart-shopping" color="#F59E0B">
    Real-world: Customer conversion funnel analysis
  </Card>
</CardGroup>

<details>
  <summary>**Exercise 1: Card Game Probabilities** - Apply probability rules</summary>

  **Problem**: From a standard 52-card deck:

  1. What's the probability of drawing a King OR a Heart?
  2. What's the probability of drawing 2 Aces in a row (without replacement)?
  3. If you know the card is red, what's P(it's a Diamond)?

  **Solution**:

  ```python theme={null}
  # 1. P(King OR Heart) - Use addition rule for non-mutually exclusive events
  p_king = 4/52
  p_heart = 13/52
  p_king_and_heart = 1/52  # King of Hearts

  p_king_or_heart = p_king + p_heart - p_king_and_heart
  print(f"P(King or Heart): {p_king_or_heart:.4f} = 16/52 = 4/13")
  # Answer: 16/52 = 4/13 ≈ 0.3077

  # 2. P(2 Aces in a row without replacement)
  p_first_ace = 4/52
  p_second_ace_given_first = 3/51  # One less ace, one less card

  p_two_aces = p_first_ace * p_second_ace_given_first
  print(f"P(2 Aces in a row): {p_two_aces:.4f} = {4*3}/{52*51}")
  # Answer: 12/2652 = 1/221 ≈ 0.0045

  # 3. P(Diamond | Red) - Conditional probability
  # Red cards: 26 (13 Hearts + 13 Diamonds)
  # Diamonds: 13
  p_diamond_given_red = 13/26
  print(f"P(Diamond | Red): {p_diamond_given_red:.4f} = 1/2")
  # Answer: 0.5 - Half of red cards are diamonds
  ```
</details>

<details>
  <summary>**Exercise 2: Medical Diagnosis with Bayes** - Apply Bayes' theorem</summary>

  **Problem**: A disease affects 0.1% of the population. A test for the disease has:

  * Sensitivity (true positive rate): 99%
  * Specificity (true negative rate): 95%

  If someone tests positive, what's the probability they actually have the disease?

  **Solution**:

  ```python theme={null}
  # Given probabilities
  p_disease = 0.001          # Prior: 0.1% have disease
  p_no_disease = 0.999       # 99.9% don't have disease

  p_positive_given_disease = 0.99    # Sensitivity
  p_positive_given_no_disease = 0.05  # False positive rate (1 - specificity)

  # Apply Bayes' theorem: P(Disease|Positive) = P(Positive|Disease) * P(Disease) / P(Positive)

  # First calculate P(Positive) using law of total probability
  p_positive = (p_positive_given_disease * p_disease) + \
               (p_positive_given_no_disease * p_no_disease)

  print(f"P(Positive): {p_positive:.4f}")  # ~5.09%

  # Now apply Bayes
  p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive

  print(f"P(Disease | Positive): {p_disease_given_positive:.4f}")  # ~1.94%
  print(f"As percentage: {p_disease_given_positive*100:.2f}%")

  # Surprising result! Even with a positive test, only ~2% chance of disease!
  # This is because the disease is rare (low prior), and the false positives
  # from the 99.9% healthy population overwhelm the true positives.

  # Verification table for 100,000 people:
  print("\n--- Verification for 100,000 people ---")
  population = 100000
  have_disease = round(population * p_disease)  # 100
  no_disease = population - have_disease        # 99,900

  # round() rather than int(): int() truncates, so a product that lands
  # just below a whole number in floating point would be off by one
  true_positives = round(have_disease * p_positive_given_disease)    # 99
  false_positives = round(no_disease * p_positive_given_no_disease)  # 4,995
  total_positives = true_positives + false_positives                 # 5,094

  print(f"True positives: {true_positives}")
  print(f"False positives: {false_positives}")
  print(f"P(Disease|Positive) = {true_positives}/{total_positives} = {true_positives/total_positives:.4f}")
  ```
</details>

<details>
  <summary>**Exercise 3: Email Spam Classification** - Naive Bayes in action</summary>

  **Problem**: You're building a spam filter. Based on training data:

  * P(Spam) = 0.3, P(Not Spam) = 0.7
  * Word frequencies:
    * P("free" | Spam) = 0.8, P("free" | Not Spam) = 0.1
    * P("meeting" | Spam) = 0.1, P("meeting" | Not Spam) = 0.6

  Classify the email: "free meeting invitation"
  (Assume independence and only consider "free" and "meeting")

  **Solution**:

  ```python theme={null}
  # Prior probabilities
  p_spam = 0.3
  p_not_spam = 0.7

  # Likelihoods
  p_free_given_spam = 0.8
  p_free_given_not_spam = 0.1
  p_meeting_given_spam = 0.1
  p_meeting_given_not_spam = 0.6

  # Naive Bayes: Calculate P(class) * P(word1|class) * P(word2|class) for each class

  # For Spam:
  score_spam = p_spam * p_free_given_spam * p_meeting_given_spam
  print(f"Score(Spam): {p_spam} × {p_free_given_spam} × {p_meeting_given_spam} = {score_spam:.4f}")

  # For Not Spam:
  score_not_spam = p_not_spam * p_free_given_not_spam * p_meeting_given_not_spam
  print(f"Score(Not Spam): {p_not_spam} × {p_free_given_not_spam} × {p_meeting_given_not_spam} = {score_not_spam:.4f}")

  # Normalize to get probabilities
  total = score_spam + score_not_spam
  p_spam_given_email = score_spam / total
  p_not_spam_given_email = score_not_spam / total

  print(f"\nP(Spam | email): {p_spam_given_email:.2%}")      # ~36%
  print(f"P(Not Spam | email): {p_not_spam_given_email:.2%}") # ~64%

  # Classification
  prediction = "Spam" if p_spam_given_email > p_not_spam_given_email else "Not Spam"
  print(f"\nClassification: {prediction}")  # Not Spam

  # "free" is 8x more likely in spam and "meeting" is 6x more likely in
  # legitimate email, so the two words nearly cancel out. The prior (70% of
  # email is not spam) is what tips the result toward Not Spam.
  ```
</details>

<details>
  <summary>**Exercise 4: E-commerce Conversion Funnel** - Real-world conditional probability</summary>

  **Problem**: An e-commerce site tracks user behavior:

  * 10,000 visitors land on the homepage
  * 40% view a product page
  * Of those who view products, 25% add to cart
  * Of those who add to cart, 60% complete purchase

  Calculate:

  1. P(Purchase | Visited site)
  2. If you see someone at checkout, what's P(they complete purchase)?
  3. How many purchases do you expect from 10,000 visitors?

  **Solution**:

  ```python theme={null}
  # Define funnel stages
  visitors = 10000

  # Stage probabilities
  p_view_product = 0.40
  p_add_cart_given_view = 0.25
  p_purchase_given_cart = 0.60

  # 1. P(Purchase | Visited) = P(View) × P(Cart|View) × P(Purchase|Cart)
  p_purchase = p_view_product * p_add_cart_given_view * p_purchase_given_cart
  print(f"P(Purchase | Visited): {p_purchase:.2%}")  # 6%

  # 2. P(Complete | At Checkout)
  # "At checkout" means they've added to cart
  p_complete_given_checkout = p_purchase_given_cart
  print(f"P(Complete | At Checkout): {p_complete_given_checkout:.2%}")  # 60%

  # 3. Expected purchases
  expected_purchases = visitors * p_purchase
  print(f"Expected purchases from {visitors:,} visitors: {expected_purchases:.0f}")  # 600

  # Bonus: Funnel breakdown
  viewed_products = visitors * p_view_product  # 4,000
  added_to_cart = viewed_products * p_add_cart_given_view  # 1,000
  purchased = added_to_cart * p_purchase_given_cart  # 600

  print("\n--- Funnel Breakdown ---")
  print(f"Visitors:        {visitors:,}")
  print(f"Viewed Products: {viewed_products:,.0f} ({viewed_products/visitors:.0%})")
  print(f"Added to Cart:   {added_to_cart:,.0f} ({added_to_cart/visitors:.0%})")
  print(f"Purchased:       {purchased:,.0f} ({purchased/visitors:.0%})")

  # Business insight: the highest drop-off rate is View → Cart (75% abandon),
  # so focus optimization efforts there!
  ```
</details>

***

## Connection to Machine Learning

| Probability Concept     | ML Application                          |
| ----------------------- | --------------------------------------- |
| Prior probability       | Initial model beliefs                   |
| Likelihood              | How well data fits model                |
| Posterior               | Updated beliefs after seeing data       |
| Bayes' theorem          | Naive Bayes classifier, Bayesian models |
| Conditional probability | Classification, prediction              |

***

## Interview Prep: Common Questions

<Accordion title="Probability Interview Questions">
  **Q: Two cards are drawn from a deck without replacement. What's P(both are aces)?**

  > P(1st ace) = 4/52. P(2nd ace | 1st ace) = 3/51. Total = (4/52) × (3/51) = 12/2652 ≈ 0.45%

  **Q: A test is 95% accurate for a disease affecting 1% of population. If positive, what's P(have disease)?**

  > Use Bayes: P(D|+) = P(+|D)×P(D) / \[P(+|D)×P(D) + P(+|¬D)×P(¬D)] = (0.95×0.01) / (0.95×0.01 + 0.05×0.99) ≈ 16%

  **Q: What's the difference between independent and mutually exclusive events?**

  > Independent: Occurrence of one doesn't change the probability of the other (e.g., two coin flips). Mutually exclusive: Both cannot occur simultaneously (e.g., getting heads AND tails on one flip). Note: Mutually exclusive events with nonzero probabilities are NOT independent, because knowing that one occurred tells you the other did not!

  **Q: You flip a fair coin 10 times and get 10 heads. What's P(heads on flip 11)?**

  > Still 50%! Each flip is independent; past results don't affect future outcomes. Believing that tails is now "due" is the gambler's fallacy.
</Accordion>
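
The counting answers above can be sanity-checked by brute-force enumeration. The sketch below is illustrative only: the deck encoding and variable names are assumptions made for this example, not part of the course material.

```python
from fractions import Fraction
from itertools import permutations, product

# Hypothetical deck encoding: (rank, suit) pairs, 52 cards in total.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every ordered pair of distinct cards is equally likely (52 * 51 = 2652).
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == "A" and d[1][0] == "A"]
p_two_aces = Fraction(len(both_aces), len(draws))
print(f"P(both aces) = {p_two_aces} ≈ {float(p_two_aces):.4f}")

# Gambler's fallacy: among all 11-flip sequences that start with 10 heads,
# count how often flip 11 is also heads.
sequences = [s for s in product("HT", repeat=11) if s[:10] == tuple("H" * 10)]
p_heads_next = Fraction(sum(s[10] == "H" for s in sequences), len(sequences))
print(f"P(heads on flip 11 | 10 heads so far) = {p_heads_next}")
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand-computed 12/2652 and 1/2.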

***

## Common Pitfalls

<Warning>
  **Probability Mistakes to Avoid**:

  1. **Base Rate Neglect** - Forgetting prior probabilities when interpreting test results
  2. **Confusing P(A|B) with P(B|A)** - P(positive|disease) ≠ P(disease|positive)
  3. **Gambler's Fallacy** - Believing past independent events affect future outcomes
  4. **Assuming Independence** - Many real-world events are correlated; check before multiplying probabilities
  5. **Ignoring Complement** - Sometimes P(at least one) = 1 - P(none) is easier
</Warning>
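
To make the complement shortcut from pitfall 5 concrete, here is a minimal sketch. The scenario (at least one six in four rolls of a fair die) is an assumption picked for illustration. It compares a direct count over every outcome with the one-line complement calculation.

```python
from fractions import Fraction
from itertools import product

# All 6**4 = 1296 equally likely outcomes of rolling a fair die four times.
outcomes = list(product(range(1, 7), repeat=4))

# Direct count: outcomes containing at least one six.
p_direct = Fraction(sum(6 in rolls for rolls in outcomes), len(outcomes))

# Complement rule: 1 - P(no six on any of the four rolls).
p_complement = 1 - Fraction(5, 6) ** 4

print(f"Direct count:    {p_direct} ≈ {float(p_direct):.4f}")
print(f"Complement rule: {p_complement} ≈ {float(p_complement):.4f}")
```

Both routes give 671/1296, but the complement version needs no enumeration at all.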

***

## Key Takeaways

<Note>
  **What You Learned**:

  * ✅ **Basic Probability** - P(A) = favorable outcomes / total outcomes (when all outcomes are equally likely)
  * ✅ **Addition Rule** - P(A or B) = P(A) + P(B) - P(A and B)
  * ✅ **Multiplication Rule** - P(A and B) = P(A) × P(B|A)
  * ✅ **Conditional Probability** - P(A|B) = P(A and B) / P(B)
  * ✅ **Bayes' Theorem** - Update beliefs with new evidence; essential for ML
  * ✅ **Independence** - Events where one doesn't affect the other
</Note>

<Tip>
  **Bayes' Theorem Intuition**: Think of it as "updating your beliefs." You start with a prior (what you knew before), see evidence, and get a posterior (what you know after). This is exactly how many ML algorithms learn!
</Tip>
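
The "updating your beliefs" idea can be sketched as a function applied repeatedly, with each posterior becoming the next prior. The numbers come from the Doctor's Dilemma at the start of this module (1 in 1,000 prevalence, a 99% accurate test). The function name `update` and the second test are assumptions added for illustration.

```python
from fractions import Fraction

def update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """One application of Bayes' theorem: return P(H | evidence)."""
    numerator = p_evidence_given_h * prior
    return numerator / (numerator + p_evidence_given_not_h * (1 - prior))

prior = Fraction(1, 1000)          # disease prevalence
sensitivity = Fraction(99, 100)    # P(positive | disease)
false_pos_rate = Fraction(1, 100)  # P(positive | no disease)

after_one = update(prior, sensitivity, false_pos_rate)
# Assumption: the second test's errors are independent of the first test's.
after_two = update(after_one, sensitivity, false_pos_rate)

print(f"Prior:                 {float(prior):.2%}")
print(f"After one positive:    {float(after_one):.2%}")
print(f"After two positives:   {float(after_two):.2%}")
```

Under the independence assumption, one positive result moves the belief from 0.1% to about 9%, and a second positive moves it to about 91%.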

**Coming up next**: We'll learn about **probability distributions** - the patterns that randomness follows. This is where we discover the famous "bell curve" and understand why it shows up everywhere!

<Card title="Next: Probability Distributions" icon="arrow-right" href="/courses/statistics-for-ml/04-distributions">
  Discover the patterns hidden in randomness
</Card>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="A disease affects 1 in 10,000 people. A test is 99% sensitive and 99% specific. A patient tests positive. Walk me through the actual probability they have the disease.">
    **Strong Answer:**

    * This is a Bayes' theorem problem, and the answer shocks most people. Let me work through it with a natural frequency approach. Imagine 1,000,000 people. Of those, 100 actually have the disease (1 in 10,000). Of those 100, the test correctly identifies 99 (99% sensitivity). Of the 999,900 healthy people, the test incorrectly flags 1% as positive, giving 9,999 false positives.
    * Total positive tests: 99 true positives plus 9,999 false positives equals 10,098. So the probability of actually having the disease given a positive test is 99 / 10,098, which is about 0.98% -- less than 1%.
    * The intuition: when the disease is very rare, even a highly accurate test produces far more false positives than true positives because the healthy population is so much larger. This is the base rate fallacy in action.
    * This has direct real-world consequences. During COVID, rapid antigen tests had different positive predictive values depending on community prevalence. In a low-prevalence area, a positive test was much less reliable than in a high-prevalence area -- same test, same accuracy, but radically different interpretation because of the base rate.

    **Follow-up: How does this base rate problem apply to fraud detection in machine learning?**

    It is the exact same mathematics. If fraud occurs in 0.1% of transactions, even a model with 99% recall (sensitivity) and 99% specificity will generate far more false alarms than true catches in absolute terms. For every 1 million transactions, you would have roughly 990 true fraud catches but also about 9,990 false alarms, so precision is only about 9%. Your fraud investigators get swamped with false positives. The practical solution is to either increase specificity dramatically (accepting some missed fraud), use a two-stage system where a high-recall model generates candidates and a high-precision model filters them, or rank alerts by probability score and only investigate the top-N. The base rate is the single most important number in any classification system: it determines how much precision a given recall and specificity can actually deliver.
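
    The disease and fraud figures above come from the same calculation. The sketch below treats both 99% figures as sensitivity and specificity, which matches the arithmetic in the text. The function name `positive_predictive_value` and the list of prevalences are assumptions chosen for illustration.

    ```python
    def positive_predictive_value(prevalence, sensitivity, specificity):
        """P(condition | positive result) via Bayes' theorem."""
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    # Same test quality, different base rates.
    for prevalence in (0.0001, 0.001, 0.01, 0.1):
        ppv = positive_predictive_value(prevalence, sensitivity=0.99, specificity=0.99)
        print(f"prevalence {prevalence:>7.2%} -> PPV {ppv:.1%}")
    ```

    The 0.01% row corresponds to the disease example (99 / 10,098) and the 0.1% row to the fraud example (990 / 10,980).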
  </Accordion>

  <Accordion title="What is the difference between independent events and mutually exclusive events? Most candidates confuse these.">
    **Strong Answer:**

    * Independent events are those where the occurrence of one does not change the probability of the other. Formally, P(A and B) = P(A) x P(B). Example: flipping a coin and rolling a die. The coin result has no effect on the die.
    * Mutually exclusive events are those that cannot occur simultaneously. P(A and B) = 0. Example: rolling a 3 and rolling a 5 on the same die throw.
    * Here is the critical point that trips people up: mutually exclusive events are actually dependent, not independent. If I tell you that event A happened (you rolled a 3), then you know with certainty that event B did not happen (you did not roll a 5). That means P(B given A) = 0, which is different from P(B) = 1/6. Since P(B given A) does not equal P(B), the events are dependent.
    * The only exception is when one of the events has probability zero. In every other case, mutually exclusive implies dependent. This is a common interview gotcha question precisely because the two concepts sound like they should be related but they are actually in tension.

    **Follow-up: Give me a real-world example where confusing these two concepts would lead to a wrong answer in a data science context.**

    Consider customer segmentation. Suppose "clicked ad" and "made purchase" are not mutually exclusive (a customer can do both). If you incorrectly treat them as mutually exclusive and compute P(click or purchase) = P(click) + P(purchase), you will overcount customers who did both and overestimate your reach. On the other hand, if you incorrectly assume click and purchase are independent and compute P(click and purchase) = P(click) x P(purchase), you might underestimate the joint probability because people who click are much more likely to purchase (they are positively dependent). The correct approach uses P(A or B) = P(A) + P(B) - P(A and B), and the correct joint probability requires knowing P(purchase given click), not just multiplying the marginals.
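
    The two definitions can be checked mechanically on a single die roll. This is a minimal sketch: the helper names and the second pair of events (`even`, `at_most_2`) are assumptions added for illustration.

    ```python
    from fractions import Fraction

    sample_space = set(range(1, 7))  # one roll of a fair six-sided die

    def prob(event):
        """Probability of an event (a set of faces) under equally likely outcomes."""
        return Fraction(len(event & sample_space), len(sample_space))

    def is_independent(a, b):
        return prob(a & b) == prob(a) * prob(b)

    def is_mutually_exclusive(a, b):
        return prob(a & b) == 0

    rolled_3, rolled_5 = {3}, {5}        # the example from the answer above
    even, at_most_2 = {2, 4, 6}, {1, 2}  # hypothetical pair chosen for illustration

    print(is_mutually_exclusive(rolled_3, rolled_5), is_independent(rolled_3, rolled_5))
    print(is_mutually_exclusive(even, at_most_2), is_independent(even, at_most_2))
    ```

    The first pair is mutually exclusive but not independent. The second pair overlaps on the face 2 and is independent, because P(even and at most 2) = 1/6 = 1/2 x 1/3.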
  </Accordion>

  <Accordion title="You flip a fair coin 20 times and get 15 heads. A colleague says the coin must be biased. How do you evaluate this claim statistically?">
    **Strong Answer:**

    * The way to evaluate this is to ask: "If the coin truly were fair, how likely is it to see a result this extreme or more extreme?" That is the definition of a p-value.
    * For a fair coin, the number of heads in 20 flips follows a Binomial(n=20, p=0.5) distribution. P(X >= 15) is the sum of P(X=15) + P(X=16) + ... + P(X=20). Computing this gives approximately 2.1%. Since we should also consider the other tail (5 or fewer heads would be equally surprising), the two-tailed p-value is about 4.1%.
    * At a standard alpha of 0.05, this is borderline significant. We would reject the null hypothesis that the coin is fair -- but barely. At alpha = 0.01, we would not reject. This tells us that 15 out of 20 is unusual but not overwhelmingly so for a fair coin.
    * The key insight: a single run of 20 flips is not very powerful for detecting bias. If the true probability of heads were 0.6, we would need roughly 200 flips to detect that with 80% power. Twenty flips can only reliably detect large biases (like p = 0.8). This is why sample size planning matters before making claims.
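
    The tail probabilities quoted above can be reproduced exactly with the standard library. This is a sketch of the calculation, with variable names chosen for illustration.

    ```python
    from math import comb

    n_flips, n_heads = 20, 15

    def binom_pmf(k, n, p=0.5):
        """P(X = k) for X ~ Binomial(n, p)."""
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    # One tail: 15 or more heads. Two tails: also count 5 or fewer heads.
    p_upper = sum(binom_pmf(k, n_flips) for k in range(n_heads, n_flips + 1))
    p_lower = sum(binom_pmf(k, n_flips) for k in range(0, n_flips - n_heads + 1))
    p_two_sided = p_upper + p_lower

    print(f"P(X >= 15)        = {p_upper:.4f}")
    print(f"two-sided p-value = {p_two_sided:.4f}")
    ```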

    **Follow-up: If you wanted to detect a coin that lands heads 60% of the time, testing at the 5% significance level with 80% power, how many flips would you need?**

    This is a power analysis for a one-proportion z-test. The null is p = 0.5, the alternative is p = 0.6, and we want 80% power at a two-sided alpha = 0.05 (z\_alpha ≈ 1.96, z\_beta ≈ 0.84). Using the formula n = (z\_alpha x sqrt(p\_null x (1 - p\_null)) + z\_beta x sqrt(p\_alt x (1 - p\_alt)))^2 / (p\_alt - p\_null)^2, we get approximately 194 flips. The intuition is that a 10 percentage point shift on a binary outcome requires roughly 200 observations to detect reliably. The required sample size scales with the inverse square of the effect size: detecting a 5 percentage point shift (p = 0.55) would require roughly 780 flips, about four times as many. Small effect sizes demand large samples, one of the most important lessons in all of statistics.
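
    The sample sizes can be reproduced with a short calculation. The sketch assumes a two-sided one-proportion z-test and one common normal-approximation formula: the null variance sets the critical value and the alternative variance sets the power. The function name `flips_needed` is hypothetical.

    ```python
    from math import ceil, sqrt
    from statistics import NormalDist

    def flips_needed(p_null, p_alt, alpha=0.05, power=0.80):
        """Approximate sample size for a two-sided one-proportion z-test."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96
        z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
        numerator = (z_alpha * sqrt(p_null * (1 - p_null))
                     + z_beta * sqrt(p_alt * (1 - p_alt)))
        return ceil((numerator / (p_alt - p_null)) ** 2)

    n_60 = flips_needed(0.5, 0.6)
    n_55 = flips_needed(0.5, 0.55)
    print(f"Flips to detect p = 0.60: {n_60}")
    print(f"Flips to detect p = 0.55: {n_55}")
    ```

    Halving the effect size from 10 to 5 percentage points roughly quadruples the required flips. Exact binomial methods can give slightly different numbers than this approximation.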
  </Accordion>

  <Accordion title="Explain Naive Bayes classification. Why is it called 'naive,' and when does it work well despite that assumption?">
    **Strong Answer:**

    * Naive Bayes uses Bayes' theorem to compute P(class given features), which is proportional to P(features given class) x P(class); the denominator P(features) is the same for every class, so it can be ignored when comparing classes. The "naive" part is the assumption that all features are conditionally independent given the class. That means P(word1 and word2 given spam) = P(word1 given spam) x P(word2 given spam). In reality, words are correlated ("Nigerian" and "prince" tend to appear together in spam), so the assumption is violated.
    * Despite this, Naive Bayes works surprisingly well in practice for several reasons. First, classification only requires getting the rank ordering of class probabilities correct, not the actual probability values. Even with wrong probability estimates, the argmax (most likely class) is often correct. Second, the independence assumption causes errors that tend to cancel out across many features. Third, with limited training data, Naive Bayes has far fewer parameters to estimate than a model that captures all pairwise dependencies, so it suffers less from overfitting.
    * It excels in text classification (spam filtering, sentiment analysis) where you have high-dimensional sparse features and moderate amounts of labeled data. It is fast to train (single pass through data), fast to predict, and handles missing features naturally.
    * Where it fails: when feature dependencies actually matter for classification. For example, XOR-type problems where the class depends on the interaction of two features (neither feature alone is predictive). In those cases, you need a model that captures interactions, like logistic regression with interaction terms, or a tree-based model.
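
    The point that only the ranking of class scores matters can be sketched in a few lines. The priors and word likelihoods are taken from Exercise 3 in this module. Working in log space and skipping unknown words are common practical choices assumed here, not something the text prescribes.

    ```python
    import math

    # Priors and word likelihoods taken from Exercise 3 in this module.
    priors = {"spam": 0.3, "not_spam": 0.7}
    likelihoods = {
        "spam": {"free": 0.8, "meeting": 0.1},
        "not_spam": {"free": 0.1, "meeting": 0.6},
    }

    def log_scores(words):
        """log P(class) + sum of log P(word | class), under the naive assumption."""
        return {
            label: math.log(priors[label])
            + sum(math.log(likelihoods[label][w]) for w in words if w in likelihoods[label])
            for label in priors
        }

    scores = log_scores(["free", "meeting", "invitation"])  # unknown words are skipped
    prediction = max(scores, key=scores.get)  # argmax needs no normalization

    # Normalizing is optional for classification, but recovers the posterior.
    total = sum(math.exp(s) for s in scores.values())
    p_spam = math.exp(scores["spam"]) / total
    print(f"Prediction: {prediction}, P(spam | email) = {p_spam:.2%}")
    ```

    Summing logs instead of multiplying probabilities avoids numerical underflow when an email contains many words, and the predicted class is the same either way.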

    **Follow-up: In a production spam filter, would you use Naive Bayes or a deep learning model, and what factors drive that decision?**

    It depends on the constraints. For an email provider processing billions of messages, Naive Bayes is still a sensible first-stage filter because it is incredibly fast (microseconds per email), uses minimal memory, and can be updated incrementally as new spam patterns emerge. Its simplicity means the system is interpretable and debuggable: you can see exactly which words drove the classification. A deep learning model typically gives better accuracy (maybe 99.5% versus 98.5%) but usually requires GPU inference, has higher latency, and is harder to debug when it misclassifies. In practice, a common production pattern is a cascaded approach: a fast Naive Bayes or logistic regression model as the first filter, then a more expensive neural model for borderline cases. You optimize the expensive model for precision in the ambiguous middle zone where the simple model is uncertain.
  </Accordion>
</AccordionGroup>