# Probability Distributions: Patterns in Randomness

> Discover the bell curve and other patterns that govern everything from test scores to stock prices

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/distribution-real-world.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=60c13da05e01417c3dab7054c09184fa" alt="Probability Distributions" width="1080" height="1080" data-path="images/courses/statistics-for-ml/distribution-real-world.svg" />
</Frame>

## The Factory Quality Problem

You run a factory that produces ball bearings. Each bearing should be exactly 10mm in diameter. But manufacturing isn't perfect - there's always some variation.

You measure 1000 bearings and get:

```python theme={null}
import numpy as np
import matplotlib.pyplot as plt

# Simulated bearing diameters (mm)
np.random.seed(42)
bearings = np.random.normal(loc=10.0, scale=0.05, size=1000)

print(f"Mean diameter: {np.mean(bearings):.4f} mm")
print(f"Std deviation: {np.std(bearings):.4f} mm")
print(f"Min: {np.min(bearings):.4f} mm")
print(f"Max: {np.max(bearings):.4f} mm")
```

**Output:**

```
Mean diameter: 10.0024 mm
Std deviation: 0.0498 mm
Min: 9.8521 mm
Max: 10.1534 mm
```

If you plot these measurements, something magical appears:

```python theme={null}
plt.figure(figsize=(10, 5))
plt.hist(bearings, bins=50, density=True, alpha=0.7, edgecolor='black')
plt.xlabel('Diameter (mm)')
plt.ylabel('Probability Density')  # density=True plots density, not raw counts
plt.title('Distribution of Ball Bearing Diameters')
plt.axvline(10.0, color='red', linestyle='--', label='Target: 10mm')
plt.legend()
plt.show()
```

A **bell curve** emerges. This isn't coincidence - it's one of the most profound patterns in nature.

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/key-distributions-ml.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=69120d57b39391354dbf2b5cc1a0e5b1" alt="Key Probability Distributions for ML" width="900" height="500" data-path="images/courses/statistics-for-ml/key-distributions-ml.svg" />
</Frame>

<Info>
  **Estimated Time**: 3-4 hours\
  **Difficulty**: Beginner\
  **Prerequisites**: Modules 1-2 (Describing Data, Probability)\
  **What You'll Build**: Quality control system, prediction intervals
</Info>

***

## What Is a Probability Distribution?

A **probability distribution** describes all possible values a random variable can take and how likely each value is.

Think of it as a complete map of possibilities.

**Analogy**: A probability distribution is like a city's terrain map. The peaks show where values cluster (common outcomes), and the valleys show where values are rare. Just as different cities have different landscapes -- some flat (uniform), some with a single mountain (normal), some with a long tail running to the east (exponential) -- different types of data have different distributional shapes. Learning to "read the terrain" of your data is one of the most valuable skills in ML.

### Discrete vs Continuous

| Type           | Description              | Examples                                    |
| -------------- | ------------------------ | ------------------------------------------- |
| **Discrete**   | Countable outcomes       | Coin flips, dice rolls, number of customers |
| **Continuous** | Any value within a range | Height, weight, temperature, time           |

```python theme={null}
# Discrete: Number of heads in 10 coin flips
# Can only be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10

# Continuous: Height of a person
# Can be 170.0 cm, 170.1 cm, 170.01 cm, 170.001 cm...
```
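
The same contrast can be sketched in code. A discrete distribution assigns probability to each outcome through a probability mass function (PMF), while a continuous one uses a probability density function (PDF) whose area gives probability. The height model `Normal(170, 10)` below is an assumption chosen only for illustration.

```python
import numpy as np
from scipy import stats, integrate

# Discrete: PMF of the number of heads in 10 fair coin flips
k = np.arange(0, 11)
pmf = stats.binom.pmf(k, 10, 0.5)
pmf_total = pmf.sum()  # probabilities of all outcomes add up to 1

# Continuous: heights modeled as Normal(170 cm, 10 cm) (illustrative assumption)
height = stats.norm(loc=170, scale=10)
area, _ = integrate.quad(height.pdf, 100, 240)  # area under the density curve

# The probability of a range comes from the CDF, not from a single point
p_range = height.cdf(180) - height.cdf(170)

print(f"Sum of PMF values: {pmf_total:.4f}")
print(f"Area under PDF:    {area:.4f}")
print(f"P(170 <= height <= 180): {p_range:.4f}")
```

Both totals come out at 1: every outcome is accounted for, whether by summing or by integrating.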

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/distribution-types-math.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=b3716c42f62c2f3cd773033096c94b1e" alt="Discrete vs Continuous Distributions" width="1080" height="1080" data-path="images/courses/statistics-for-ml/distribution-types-math.svg" />
</Frame>

***

## The Uniform Distribution: Equal Chances

The simplest distribution - every outcome is equally likely.

### Discrete Uniform: The Fair Die

```python theme={null}
import numpy as np
from collections import Counter

# Roll a fair die 10000 times
rolls = np.random.randint(1, 7, size=10000)

counts = Counter(rolls)
for face in sorted(counts.keys()):
    pct = counts[face] / 10000 * 100
    print(f"Face {face}: {counts[face]:4d} ({pct:.1f}%)")
```

**Output:**

```
Face 1: 1652 (16.5%)
Face 2: 1689 (16.9%)
Face 3: 1634 (16.3%)
Face 4: 1701 (17.0%)
Face 5: 1658 (16.6%)
Face 6: 1666 (16.7%)
```

Each face appears roughly 16.67% (1/6) of the time.

<Frame>
  <img src="https://mintcdn.com/devweeekends/CHfRzoAmD5TGW2ch/images/courses/statistics-for-ml/uniform-real-world.svg?fit=max&auto=format&n=CHfRzoAmD5TGW2ch&q=85&s=377d756b2482887dcc30c6aad29aa5b3" alt="Uniform Distribution - Dice and Lottery" width="1080" height="1080" data-path="images/courses/statistics-for-ml/uniform-real-world.svg" />
</Frame>

### Continuous Uniform: Random Numbers

```python theme={null}
# Random time a customer arrives between 9:00 and 10:00 AM
arrival_minutes = np.random.uniform(0, 60, size=1000)

print(f"Mean arrival: {np.mean(arrival_minutes):.1f} minutes after 9:00")
print(f"P(arrive in first 15 min): {np.mean(arrival_minutes < 15):.1%}")
```
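
For a continuous uniform distribution, the probability of an interval is its length divided by the total width, so the simulated estimate can be checked against an exact value. A minimal sketch using `scipy.stats.uniform`, where `loc` is the start of the interval and `scale` is its width:

```python
from scipy import stats

# Arrival time uniform over 60 minutes: loc is the start, scale is the width
arrival = stats.uniform(loc=0, scale=60)

p_first_15 = arrival.cdf(15)   # 15 / 60
mean_arrival = arrival.mean()  # midpoint of the interval

print(f"P(arrive in first 15 min): {p_first_15:.1%}")
print(f"Mean arrival: {mean_arrival:.1f} minutes after 9:00")
```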

**ML Applications:**

* Random weight initialization
* Data augmentation (random crops, rotations)
* Monte Carlo simulations

***

## The Binomial Distribution: Success/Failure Experiments

The binomial distribution counts the successes when you repeat an independent experiment with two outcomes (success/failure) a fixed number of times.

**Parameters:**

* n = number of trials
* p = probability of success on each trial

**Question:** If you flip a coin 10 times, what's the probability of getting exactly 7 heads?

### Mathematical Formula

$$
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
$$

Where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is "n choose k"

```python theme={null}
from scipy import stats
import math

def binomial_probability(n, k, p):
    """Calculate P(X = k) for binomial distribution."""
    # n choose k (math.comb returns an exact integer, Python 3.8+)
    combinations = math.comb(n, k)
    # Probability
    return combinations * (p ** k) * ((1 - p) ** (n - k))

# P(exactly 7 heads in 10 flips)
p_7_heads = binomial_probability(n=10, k=7, p=0.5)
print(f"P(7 heads in 10 flips): {p_7_heads:.4f}")  # 0.1172

# Using scipy
p_7_scipy = stats.binom.pmf(k=7, n=10, p=0.5)
print(f"P(7 heads) via scipy: {p_7_scipy:.4f}")  # 0.1172
```

### Visualizing the Binomial Distribution

```python theme={null}
n = 10
p = 0.5

k_values = range(0, n + 1)
probabilities = [stats.binom.pmf(k, n, p) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probabilities, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Heads')
plt.ylabel('Probability')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xticks(k_values)
plt.show()
```

### Real-World Example: Website Conversion

```python theme={null}
# Your website has a 3% conversion rate
# 100 people visit today
# What's the probability of 5 or more conversions?

n = 100
p = 0.03

# P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 = 1 - stats.binom.cdf(4, n, p)
print(f"P(5+ conversions): {p_at_least_5:.1%}")  # 18.2%

# Expected conversions
expected = n * p
print(f"Expected conversions: {expected}")  # 3.0
```
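
The expected value `n * p` says nothing about how much the count varies from day to day. As a rough sketch, the binomial standard deviation `sqrt(n * p * (1 - p))` gives that spread for the same conversion example, and `stats.binom.stats` reports the same moments:

```python
import math
from scipy import stats

n, p = 100, 0.03

mean = n * p                      # expected conversions
std = math.sqrt(n * p * (1 - p))  # typical day-to-day spread

# scipy reports the same mean and variance
scipy_mean, scipy_var = stats.binom.stats(n, p, moments="mv")

print(f"Mean: {mean:.2f}, Std: {std:.2f}")
print(f"scipy mean: {float(scipy_mean):.2f}, scipy std: {math.sqrt(float(scipy_var)):.2f}")
```

With a mean of 3 and a spread of about 1.7, a day with 5 conversions is unusual but not alarming.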

<Accordion title="Practice: Quality Control">
  A factory produces items with a 2% defect rate. In a batch of 50 items:

  1. What's the probability of exactly 0 defects?
  2. What's the probability of 3 or more defects?
  3. How many defects do you expect?

  ```python theme={null}
  n, p = 50, 0.02

  # 1. P(X = 0)
  p_zero = stats.binom.pmf(0, n, p)
  print(f"P(0 defects): {p_zero:.1%}")  # 36.4%

  # 2. P(X >= 3)
  p_three_plus = 1 - stats.binom.cdf(2, n, p)
  print(f"P(3+ defects): {p_three_plus:.1%}")  # 7.8%

  # 3. Expected defects
  expected = n * p
  print(f"Expected defects: {expected}")  # 1.0
  ```
</Accordion>

***

## The Normal Distribution: The Bell Curve

This is the most important distribution in statistics. It appears, at least approximately, in many places:

* Human heights and weights
* Test scores
* Measurement errors
* Stock price changes (roughly; real returns have heavier tails)
* IQ scores

**Parameters:**

* $\mu$ (mu) = mean (center of the bell)
* $\sigma$ (sigma) = standard deviation (width of the bell)

### Mathematical Formula

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
$$

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/normal-math.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=4fe9e10b2bf9ea68bc697fc16280bb40" alt="Normal Distribution Formula and Shape" width="1080" height="1080" data-path="images/courses/statistics-for-ml/normal-math.svg" />
</Frame>

```python theme={null}
# Generate normally distributed data
mu = 100  # mean
sigma = 15  # standard deviation

# IQ scores follow this distribution
iq_scores = np.random.normal(mu, sigma, 10000)

plt.figure(figsize=(10, 5))
plt.hist(iq_scores, bins=50, density=True, alpha=0.7, edgecolor='black')

# Overlay theoretical curve
x = np.linspace(50, 150, 1000)
y = stats.norm.pdf(x, mu, sigma)
plt.plot(x, y, 'r-', linewidth=2, label='Theoretical')

plt.xlabel('IQ Score')
plt.ylabel('Probability Density')
plt.title('Normal Distribution of IQ Scores (μ=100, σ=15)')
plt.legend()
plt.show()
```

### The 68-95-99.7 Rule (Empirical Rule)

One of the most useful facts in statistics:

| Range  | Percentage of Data |
| ------ | ------------------ |
| μ ± 1σ | 68%                |
| μ ± 2σ | 95%                |
| μ ± 3σ | 99.7%              |

```python theme={null}
# Verify with IQ scores
within_1_std = np.mean(np.abs(iq_scores - mu) <= sigma)
within_2_std = np.mean(np.abs(iq_scores - mu) <= 2 * sigma)
within_3_std = np.mean(np.abs(iq_scores - mu) <= 3 * sigma)

print(f"Within 1 std (85-115): {within_1_std:.1%}")   # ~68%
print(f"Within 2 std (70-130): {within_2_std:.1%}")   # ~95%
print(f"Within 3 std (55-145): {within_3_std:.1%}")   # ~99.7%
```
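
The 68-95-99.7 figures are rounded. As a quick check, the exact coverage for any normal distribution can be computed from the standard normal CDF:

```python
from scipy import stats

# Probability that a normal value lies within k standard deviations of the mean
coverages = [stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)]

for k, coverage in zip((1, 2, 3), coverages):
    print(f"Within {k} sigma: {coverage:.2%}")
```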

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/statistics-for-ml/normal-real-world.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=a26f31eddfbcdcf27dcfde70f0892d0b" alt="68-95-99.7 Rule Applied to Heights" width="1080" height="1080" data-path="images/courses/statistics-for-ml/normal-real-world.svg" />
</Frame>

### Z-Scores: Standardizing Any Normal Distribution

A **z-score** tells you how many standard deviations a value is from the mean.

$$
z = \frac{x - \mu}{\sigma}
$$

```python theme={null}
def z_score(x, mu, sigma):
    """Convert value to z-score."""
    return (x - mu) / sigma

# How exceptional is an IQ of 130?
iq = 130
z = z_score(iq, mu=100, sigma=15)
print(f"IQ of 130 has z-score: {z:.2f}")  # 2.0

# This means 130 is 2 standard deviations above average
# Only about 2.3% of people score higher
percentile = stats.norm.cdf(z) * 100
print(f"Percentile: {percentile:.1f}%")  # 97.7%
```

### Calculating Probabilities

```python theme={null}
# Normal distribution with μ=100, σ=15

# P(IQ > 130)
p_above_130 = 1 - stats.norm.cdf(130, loc=100, scale=15)
print(f"P(IQ > 130): {p_above_130:.2%}")  # 2.28%

# P(IQ between 85 and 115)
p_middle = stats.norm.cdf(115, 100, 15) - stats.norm.cdf(85, 100, 15)
print(f"P(85 < IQ < 115): {p_middle:.2%}")  # 68.27%

# What IQ score is at the 99th percentile?
iq_99 = stats.norm.ppf(0.99, loc=100, scale=15)
print(f"99th percentile IQ: {iq_99:.1f}")  # 134.9
```

***

## Why Is the Normal Distribution Everywhere?

The **Central Limit Theorem** (CLT) explains this magic:

<Note>
  **Central Limit Theorem**: When you add up many independent random variables, each with finite variance and none dominating the total, their sum tends toward a normal distribution - regardless of the original distributions.
</Note>

### Demonstration

```python theme={null}
# Roll a single die - definitely NOT normal
single_die = np.random.randint(1, 7, 10000)

# Sum of 2 dice - starting to look different
sum_2_dice = np.array([np.random.randint(1, 7, 2).sum() for _ in range(10000)])

# Sum of 10 dice - getting bell-shaped
sum_10_dice = np.array([np.random.randint(1, 7, 10).sum() for _ in range(10000)])

# Sum of 30 dice - nearly perfect normal!
sum_30_dice = np.array([np.random.randint(1, 7, 30).sum() for _ in range(10000)])

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].hist(single_die, bins=6, edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Single Die (Uniform)')

axes[0, 1].hist(sum_2_dice, bins=11, edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Sum of 2 Dice')

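# One bin per integer sum avoids uneven bars caused by bins that straddle integers
axes[1, 0].hist(sum_10_dice, bins=np.arange(sum_10_dice.min(), sum_10_dice.max() + 2) - 0.5, edgecolor='black', alpha=0.7)
axes[1, 0].set_title('Sum of 10 Dice')

axes[1, 1].hist(sum_30_dice, bins=np.arange(sum_30_dice.min(), sum_30_dice.max() + 2) - 0.5, edgecolor='black', alpha=0.7)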
axes[1, 1].set_title('Sum of 30 Dice (Nearly Normal!)')

plt.tight_layout()
plt.show()
```

**This is why heights are normally distributed**: Height is determined by thousands of genes, each adding a small random effect. Sum of many small random things = normal distribution.
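
The dice demonstration can also be checked numerically. A single fair die has mean 3.5 and variance 35/12, so the sum of 30 independent dice should have mean 105 and variance 87.5. The sketch below uses a seeded generator (the seed is arbitrary) and compares the simulation with those values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n_dice = 30
# 10,000 experiments, each summing 30 fair dice
sums = rng.integers(1, 7, size=(10_000, n_dice)).sum(axis=1)

theory_mean = n_dice * 3.5              # 105.0
theory_std = np.sqrt(n_dice * 35 / 12)  # about 9.35

print(f"Simulated mean: {sums.mean():.2f} (theory {theory_mean:.2f})")
print(f"Simulated std:  {sums.std():.2f} (theory {theory_std:.2f})")
print(f"Skewness:       {stats.skew(sums):.3f} (0 for a normal curve)")

# Share of sums within one theoretical std of the mean, to compare with the 68% rule
within_1 = np.mean(np.abs(sums - theory_mean) <= theory_std)
print(f"Within 1 std:   {within_1:.1%}")
```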

<Tip>
  **ML Application -- Why Batch Normalization Works**: The Central Limit Theorem offers one intuition for batch normalization in deep learning. Each unit in a neural network sums many weighted inputs, so by a CLT-style argument those sums are often roughly bell-shaped. Batch normalization re-centers and re-scales activations to zero mean and unit variance within each mini-batch (it standardizes them; it does not force them to be normal), which tends to stabilize training and allow higher learning rates. When someone asks "why does batch norm help?", the CLT is a useful part of the statistical intuition, though not the whole explanation.
</Tip>

<Warning>
  **Statistical Mistake in ML -- Assuming Normality of Features**: Many ML practitioners apply z-score standardization and assume their features are normally distributed. Standardization only shifts and rescales a feature; it does not change the shape of its distribution. Real-world features like income, click counts, and session durations are often heavily skewed. Before standardizing, plot your distributions. For right-skewed data, a log transform before standardization often improves model performance -- especially for linear models and neural networks, which tend to behave better with roughly symmetric inputs.
</Warning>
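
As an illustration of the log-transform advice, the sketch below builds a hypothetical right-skewed feature (the "income" data is simulated, not real) and compares skewness before and after the transform. `np.log1p` is used here only because it also handles zeros safely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical right-skewed feature, e.g. incomes (log-normal by construction)
income = rng.lognormal(mean=10.5, sigma=0.8, size=5_000)

# Log transform first, then standardize to zero mean and unit variance
log_income = np.log1p(income)
standardized = (log_income - log_income.mean()) / log_income.std()

skew_raw = stats.skew(income)
skew_log = stats.skew(standardized)
print(f"Skewness before log transform: {skew_raw:.2f}")
print(f"Skewness after log transform:  {skew_log:.2f}")
```

Standardizing `income` directly would leave its skewness unchanged, since shifting and rescaling do not alter shape.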

***

## Other Important Distributions

### Poisson Distribution: Rare Events Over Time

How many customers arrive per hour? How many defects per batch? How many emails per day?

**Parameter:** λ (lambda) = average rate of events

$$
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
$$

```python theme={null}
# Average 5 customers per hour
lambda_rate = 5

# Probability of exactly 3 customers in an hour
p_3 = stats.poisson.pmf(3, lambda_rate)
print(f"P(3 customers): {p_3:.2%}")  # 14.04%

# Probability of 10 or more
p_10_plus = 1 - stats.poisson.cdf(9, lambda_rate)
print(f"P(10+ customers): {p_10_plus:.2%}")  # 3.18%

# Visualize
k_values = range(0, 15)
probs = [stats.poisson.pmf(k, lambda_rate) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probs, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Customers')
plt.ylabel('Probability')
plt.title(f'Poisson Distribution (λ={lambda_rate})')
plt.show()
```

### Exponential Distribution: Time Between Events

If events occur at rate λ, how long until the next one?
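
For reference, assuming events arrive independently at a constant average rate λ, the waiting time $T$ until the next event has the following density and tail probability:

$$
f(t) = \lambda e^{-\lambda t}, \qquad P(T > t) = e^{-\lambda t}, \qquad t \ge 0
$$

The mean waiting time is $1/\lambda$, which is the quantity SciPy's `scale` argument expects.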

```python theme={null}
# Average 5 customers per hour = 1 customer per 12 minutes average
lambda_rate = 5  # per hour
avg_wait = 60 / lambda_rate  # 12 minutes

# Probability of waiting more than 20 minutes
p_wait_20 = 1 - stats.expon.cdf(20, scale=avg_wait)
print(f"P(wait > 20 min): {p_wait_20:.2%}")  # 18.89%

# Waiting time that 90% of gaps between customers fall under
time_90 = stats.expon.ppf(0.90, scale=avg_wait)
print(f"90% of waits are shorter than: {time_90:.1f} minutes")  # 27.6 min
```
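
The Poisson and exponential distributions describe the same process from two angles: counts per interval versus gaps between events. A small simulation sketch (arbitrary seed, hypothetical 2,000-hour window) draws exponential gaps with a 12-minute mean and counts how many arrivals land in each hour:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

avg_wait = 12.0  # minutes between customers (5 per hour)
n_hours = 2_000

# Draw more gaps than needed (about 10,000 arrivals expected), then build arrival times
gaps = rng.exponential(scale=avg_wait, size=20_000)
arrival_times = np.cumsum(gaps)

# Count the arrivals that fall in each one-hour window
hour_index = (arrival_times // 60).astype(int)
hour_index = hour_index[hour_index < n_hours]
counts = np.bincount(hour_index, minlength=n_hours)

print(f"Mean customers per hour: {counts.mean():.2f}")    # close to 5
print(f"Variance of hourly counts: {counts.var():.2f}")   # also close to 5 for a Poisson
```

A mean and variance that roughly match is the signature of Poisson counts.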

***

## Mini-Project: Quality Control System

Build a complete quality control system for the ball bearing factory.

```python theme={null}
import numpy as np
from scipy import stats

class QualityControlSystem:
    """
    Quality control system using normal distribution.
    """
    
    def __init__(self, target, std_dev, tolerance):
        """
        Initialize QC system.
        
        target: desired measurement (e.g., 10mm)
        std_dev: expected standard deviation in production
        tolerance: acceptable deviation from target (e.g., ±0.1mm)
        """
        self.target = target
        self.std_dev = std_dev
        self.tolerance = tolerance
        self.lower_limit = target - tolerance
        self.upper_limit = target + tolerance
        
    def expected_defect_rate(self):
        """Calculate expected percentage of out-of-spec products."""
        # P(X < lower) + P(X > upper)
        p_below = stats.norm.cdf(self.lower_limit, self.target, self.std_dev)
        p_above = 1 - stats.norm.cdf(self.upper_limit, self.target, self.std_dev)
        return p_below + p_above
    
    def analyze_batch(self, measurements):
        """Analyze a batch of measurements."""
        n = len(measurements)
        mean = np.mean(measurements)
        std = np.std(measurements)
        
        # Count out-of-spec
        defects = np.sum((measurements < self.lower_limit) | 
                         (measurements > self.upper_limit))
        defect_rate = defects / n
        
        # Check if process is in control
        # Mean should be within 2 standard errors of target
        std_error = std / np.sqrt(n)
        z_score = (mean - self.target) / std_error
        
        results = {
            'batch_size': n,
            'mean': mean,
            'std_dev': std,
            'defects': defects,
            'defect_rate': defect_rate,
            'z_score': z_score,
            'process_in_control': abs(z_score) < 2
        }
        
        return results
    
    def print_report(self, results):
        """Print a formatted QC report."""
        print("\n" + "=" * 50)
        print("QUALITY CONTROL REPORT")
        print("=" * 50)
        print(f"Batch Size: {results['batch_size']}")
        print(f"Target: {self.target:.4f} ± {self.tolerance:.4f}")
        print(f"Specification Limits: [{self.lower_limit:.4f}, {self.upper_limit:.4f}]")
        print("-" * 50)
        print(f"Batch Mean: {results['mean']:.4f}")
        print(f"Batch Std Dev: {results['std_dev']:.4f}")
        print(f"Defects: {results['defects']} ({results['defect_rate']:.2%})")
        print(f"Expected Defect Rate: {self.expected_defect_rate():.2%}")
        print("-" * 50)
        print(f"Z-Score: {results['z_score']:.2f}")
        status = "IN CONTROL" if results['process_in_control'] else "OUT OF CONTROL"
        print(f"Process Status: {status}")
        print("=" * 50)


# Create QC system
qc = QualityControlSystem(
    target=10.0,      # 10mm target diameter
    std_dev=0.05,     # 0.05mm expected variation
    tolerance=0.1     # ±0.1mm acceptable
)

# Expected defect rate
print(f"Expected defect rate: {qc.expected_defect_rate():.2%}")

# Simulate a good batch
np.random.seed(42)
good_batch = np.random.normal(10.0, 0.05, 100)
results = qc.analyze_batch(good_batch)
qc.print_report(results)

# Simulate a problematic batch (shifted mean)
bad_batch = np.random.normal(10.08, 0.05, 100)  # Mean shifted by 0.08mm
results_bad = qc.analyze_batch(bad_batch)
qc.print_report(results_bad)
```

**Output:**

```
Expected defect rate: 4.55%

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0024
Batch Std Dev: 0.0496
Defects: 4 (4.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 0.49
Process Status: IN CONTROL
==================================================

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0822
Batch Std Dev: 0.0518
Defects: 33 (33.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 15.88
Process Status: OUT OF CONTROL
==================================================
```

***

## Practice Exercises

### Exercise 1: Height Analysis

```python theme={null}
# Adult male heights in the US follow N(69.1, 2.9) inches
# (mean 69.1 inches, std dev 2.9 inches)

# Calculate:
# 1. What percentage of men are over 6 feet (72 inches)?
# 2. What percentage are between 5'6" (66 in) and 6'0" (72 in)?
# 3. How tall do you need to be to be in the top 5%?
# 4. What is the z-score for someone 6'4" (76 inches)?
```

<Accordion title="Solution">
  ```python theme={null}
  mu = 69.1
  sigma = 2.9

  # 1. P(height > 72)
  p_over_6ft = 1 - stats.norm.cdf(72, mu, sigma)
  print(f"Over 6 feet: {p_over_6ft:.1%}")  # 15.9%

  # 2. P(66 < height < 72)
  p_between = stats.norm.cdf(72, mu, sigma) - stats.norm.cdf(66, mu, sigma)
  print(f"Between 5'6\" and 6'0\": {p_between:.1%}")  # 69.9%

  # 3. Top 5% height
  top_5_height = stats.norm.ppf(0.95, mu, sigma)
  print(f"Top 5% starts at: {top_5_height:.1f} inches")  # 73.9 inches (6'2")

  # 4. Z-score for 6'4"
  z_76 = (76 - mu) / sigma
  print(f"Z-score for 6'4\": {z_76:.2f}")  # 2.38
  print(f"Percentile: {stats.norm.cdf(z_76) * 100:.1f}%")  # 99.1%
  ```
</Accordion>

### Exercise 2: Server Requests

```python theme={null}
# A web server receives an average of 100 requests per minute.
# Requests follow a Poisson distribution.

# Calculate:
# 1. P(exactly 100 requests in a minute)
# 2. P(more than 120 requests in a minute)
# 3. For capacity planning, what number of requests per minute
#    will only be exceeded 1% of the time?
```

<Accordion title="Solution">
  ```python theme={null}
  lambda_rate = 100

  # 1. P(X = 100)
  p_exactly_100 = stats.poisson.pmf(100, lambda_rate)
  print(f"P(exactly 100): {p_exactly_100:.2%}")  # 3.99%

  # 2. P(X > 120)
  p_over_120 = 1 - stats.poisson.cdf(120, lambda_rate)
  print(f"P(over 120): {p_over_120:.2%}")  # about 2.3%

  # 3. 99th percentile
  capacity_99 = stats.poisson.ppf(0.99, lambda_rate)
  print(f"99% of minutes have at most {capacity_99:.0f} requests")  # 124
  ```
</Accordion>

***

## Common Mistakes to Avoid

<Warning>
  **Mistake 1: Assuming Everything is Normal**

  Not all data follows a normal distribution. Income data is heavily right-skewed. Time-to-event data often follows exponential distributions. Always visualize your data before assuming normality.
</Warning>

<Warning>
  **Mistake 2: Misusing the 68-95-99.7 Rule**

  This rule ONLY applies to normal distributions. Applying it to skewed data will give wrong answers. For non-normal data, use Chebyshev's inequality: at least 75% of data is within 2 std devs, regardless of distribution shape.
</Warning>
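
To make Mistake 2 concrete, here is a sketch on simulated right-skewed data (exponential waiting times, arbitrary seed). For an exponential distribution roughly 86% of values fall within one standard deviation of the mean, not 68%, while Chebyshev's 75% lower bound at two standard deviations still holds:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Right-skewed data: exponential waiting times with a 12-minute mean
waits = rng.exponential(scale=12, size=100_000)
mu, sigma = waits.mean(), waits.std()

within_1 = np.mean(np.abs(waits - mu) <= sigma)
within_2 = np.mean(np.abs(waits - mu) <= 2 * sigma)

print(f"Within 1 std: {within_1:.1%} (normal rule would say 68%)")
print(f"Within 2 std: {within_2:.1%} (Chebyshev guarantees at least 75%)")
```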

<Warning>
  **Mistake 3: Confusing PDF and CDF**

  The PDF gives the relative likelihood at a point (technically, density). The CDF gives the probability of being less than or equal to a value. P(X = exact value) is always 0 for continuous distributions.
</Warning>
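
A short sketch of Mistake 3, reusing the bearing parameters from the factory example (mean 10.0 mm, standard deviation 0.05 mm). The PDF value at the target is well above 1, which is fine for a density but impossible for a probability; probabilities come from differences of the CDF:

```python
from scipy import stats

# Bearing diameters: Normal(10.0 mm, 0.05 mm), as in the factory example
bearing = stats.norm(loc=10.0, scale=0.05)

density_at_target = bearing.pdf(10.0)             # a density, not a probability
p_in_spec = bearing.cdf(10.1) - bearing.cdf(9.9)  # probability of a range

print(f"PDF at 10.0 mm: {density_at_target:.2f}")  # larger than 1
print(f"P(9.9 <= X <= 10.1): {p_in_spec:.2%}")
```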

***

## Interview Questions

<Accordion title="Question 1: Normal Distribution Application (Google)">
  **Question**: Website response times follow a normal distribution with mean 200ms and std dev 50ms. What percentage of requests take more than 300ms?

  <Tip>
    **Answer**: About 2.3%

    ```python theme={null}
    from scipy import stats
    p_slow = 1 - stats.norm.cdf(300, loc=200, scale=50)
    # Or using z-score: z = (300-200)/50 = 2
    # P(Z > 2) ≈ 0.0228
    print(f"{p_slow:.2%}")  # 2.28%
    ```

    The 68-95-99.7 rule gives us a quick check: 300ms is 2 standard deviations above the mean, so roughly 2.5% should be above that.
  </Tip>
</Accordion>

<Accordion title="Question 2: Choosing the Right Distribution (Amazon)">
  **Question**: You're modeling these scenarios. Which distribution would you use for each?

  1. Number of customers arriving per hour
  2. Whether a user clicks an ad (yes/no)
  3. Time until a server fails
  4. Heights of basketball players

  <Tip>
    **Answer**:

    1. **Poisson** - Counts of events in fixed intervals
    2. **Bernoulli** (single trial) or **Binomial** (many users) - Binary outcomes
    3. **Exponential** - Time until an event (memoryless process)
    4. **Normal** - Continuous measurements of natural phenomena

    For height, you might also consider that basketball players are selected to be tall, so it could be a truncated normal!
  </Tip>
</Accordion>

<Accordion title="Question 3: Central Limit Theorem (Facebook/Meta)">
  **Question**: User session times are heavily right-skewed (not normal). You calculate the average session time each day for 30 days. What distribution does the sample mean follow?

  <Tip>
    **Answer**: Approximately normal!

    Thanks to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal regardless of the underlying distribution shape, as long as:

    * The number of sessions averaged into each daily mean is sufficiently large (n ≥ 30 is a common rule of thumb; heavily skewed data may need more)
    * The original distribution has finite variance

    This is why we can use confidence intervals and hypothesis tests based on the normal distribution even when the underlying data isn't normal.
  </Tip>
</Accordion>

<Accordion title="Question 4: Percentiles in Practice (Netflix)">
  **Question**: Video start times follow a log-normal distribution (right-skewed). The P50 is 1.2 seconds and P95 is 4.8 seconds. What does this tell you about user experience?

  <Tip>
    **Answer**:

    * Half of users experience start times of 1.2s or less (good!)
    * 5% of users wait more than 4.8 seconds (potentially frustrating)
    * The ratio P95/P50 = 4 indicates significant variability

    For right-skewed metrics like latency, the P95 or P99 is often more important than the mean because it captures the experience of the "unlucky" users. A 4x difference between median and P95 suggests there are edge cases worth investigating (slow CDNs, distant users, etc.).
  </Tip>
</Accordion>
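
The percentile question above gives only P50 and P95, but under the stated log-normal assumption those two numbers pin down the whole distribution. The sketch below recovers the parameters and shows how far the implied mean sits above the median; the variable names are illustrative, and SciPy's `lognorm` takes `s` (the sigma of the log) and `scale` (the exponential of the log-mean).

```python
import numpy as np
from scipy import stats

p50, p95 = 1.2, 4.8  # seconds, from the question

# For a log-normal, log(X) is normal: median = exp(mu), P95 = exp(mu + z95 * sigma)
z95 = stats.norm.ppf(0.95)
mu = np.log(p50)
sigma = np.log(p95 / p50) / z95

start_time = stats.lognorm(s=sigma, scale=np.exp(mu))

mean_time = start_time.mean()
p99 = start_time.ppf(0.99)
print(f"Implied mean: {mean_time:.2f}s (median is {p50}s)")
print(f"Implied P99:  {p99:.2f}s")
```

The mean lands well above the median, which is why averages alone can hide what slow-start users experience.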

***

## Practice Challenge

<Accordion title="Challenge: Distribution Fitting">
  You have real website session data. Determine which distribution best fits it:

  ```python theme={null}
  import numpy as np
  from scipy import stats
  import matplotlib.pyplot as plt

  # Simulated session durations (in seconds)
  np.random.seed(42)
  sessions = np.random.exponential(scale=120, size=1000)  # Unknown to you!

  # Your task:
  # 1. Visualize the data with a histogram
  # 2. Calculate summary statistics
  # 3. Fit different distributions and compare
  # 4. Determine which distribution fits best

  # Hint: Try normal, exponential, and log-normal

  # Starter code:
  plt.figure(figsize=(12, 4))

  # Histogram
  plt.subplot(1, 3, 1)
  plt.hist(sessions, bins=50, density=True, alpha=0.7)
  plt.title('Data Distribution')
  plt.xlabel('Session Duration (s)')

  # Q-Q plot for normal
  plt.subplot(1, 3, 2)
  stats.probplot(sessions, dist="norm", plot=plt)
  plt.title('Normal Q-Q Plot')

  # Q-Q plot for exponential
  plt.subplot(1, 3, 3)
  stats.probplot(sessions, dist="expon", plot=plt)
  plt.title('Exponential Q-Q Plot')

  plt.tight_layout()
  plt.show()

  # Fit distributions and compare
  ```

  **Solution**:

  ```python theme={null}
  # 1. Visual inspection shows right-skewed data
  # 2. Summary stats
  print(f"Mean: {np.mean(sessions):.1f}s")
  print(f"Median: {np.median(sessions):.1f}s")
  print(f"Std: {np.std(sessions):.1f}s")
  print(f"Skewness: {stats.skew(sessions):.2f}")  # Positive = right-skewed

  # 3. Fit distributions
  # Normal
  norm_params = stats.norm.fit(sessions)
  # Exponential
  exp_params = stats.expon.fit(sessions)
  # Log-normal
  lognorm_params = stats.lognorm.fit(sessions)

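  # 4. Compare using Kolmogorov-Smirnov test
  # Null hypothesis: data follows the distribution
  # Lower p-value = worse fit. Because the parameters were fitted to the same
  # data, these p-values are optimistic; use them to rank fits, not as exact tests.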

  ks_norm = stats.kstest(sessions, 'norm', args=norm_params)
  ks_exp = stats.kstest(sessions, 'expon', args=exp_params)
  ks_lognorm = stats.kstest(sessions, 'lognorm', args=lognorm_params)

  print(f"\nKS Test p-values:")
  print(f"Normal: {ks_norm.pvalue:.4f}")      # Low - bad fit
  print(f"Exponential: {ks_exp.pvalue:.4f}")  # High - good fit!
  print(f"Log-normal: {ks_lognorm.pvalue:.4f}")

  # Expect the exponential to fit best: a high KS p-value, plus mean ≈ std dev,
  # which is a known property of the exponential distribution
  ```
</Accordion>

***

## 📝 Practice Exercises

<CardGroup cols={2}>
  <Card title="Exercise 1" icon="bell" color="#3B82F6">
    Work with normal distribution and z-scores
  </Card>

  <Card title="Exercise 2" icon="chart-simple" color="#10B981">
    Apply binomial distribution to A/B testing
  </Card>

  <Card title="Exercise 3" icon="clock" color="#8B5CF6">
    Model customer arrivals with Poisson distribution
  </Card>

  <Card title="Exercise 4" icon="industry" color="#F59E0B">
    Real-world: Quality control with distributions
  </Card>
</CardGroup>

<details>
  <summary>**Exercise 1: SAT Scores Analysis** - Normal distribution and z-scores</summary>

  **Problem**: SAT scores are normally distributed with mean μ = 1050 and standard deviation σ = 200.

  1. What percentage of students score above 1250?
  2. What score puts a student in the top 10%?
  3. Between what scores do the middle 68% of students fall?
  4. If a student scores 1400, how many standard deviations above average are they?

  **Solution**:

  ```python theme={null}
  from scipy import stats
  import numpy as np

  mu = 1050  # mean
  sigma = 200  # standard deviation

  # 1. P(X > 1250)
  z = (1250 - mu) / sigma
  p_above_1250 = 1 - stats.norm.cdf(z)
  print(f"Z-score for 1250: {z}")  # 1.0
  print(f"P(Score > 1250): {p_above_1250:.2%}")  # 15.87%

  # Using scipy directly
  p_above_1250_direct = 1 - stats.norm.cdf(1250, loc=mu, scale=sigma)
  print(f"Direct calculation: {p_above_1250_direct:.2%}")

  # 2. Score for top 10% (90th percentile)
  top_10_score = stats.norm.ppf(0.90, loc=mu, scale=sigma)
  print(f"\nTop 10% threshold: {top_10_score:.0f}")  # 1306

  # 3. Middle 68% (empirical rule: mean ± 1 std dev)
  lower_68 = mu - sigma  # 850
  upper_68 = mu + sigma  # 1250
  print(f"\nMiddle 68%: {lower_68} to {upper_68}")

  # Verify with scipy
  lower_16 = stats.norm.ppf(0.16, loc=mu, scale=sigma)
  upper_84 = stats.norm.ppf(0.84, loc=mu, scale=sigma)
  print(f"Verified: {lower_16:.0f} to {upper_84:.0f}")

  # 4. Z-score for 1400
  z_1400 = (1400 - mu) / sigma
  print(f"\nZ-score for 1400: {z_1400:.2f}")  # 1.75
  print(f"This is {z_1400:.2f} standard deviations above the mean")
  print(f"Percentile: {stats.norm.cdf(z_1400)*100:.1f}%")  # 96th percentile
  ```
</details>

<details>
  <summary>**Exercise 2: A/B Test Analysis** - Binomial distribution</summary>

  **Problem**: You're running an A/B test for a website button color:

  * Control (Blue): 500 visitors, 45 conversions
  * Treatment (Green): 500 visitors, 58 conversions

  1. What's the conversion rate for each group?
  2. Using binomial distribution, what's P(≥58 conversions) if true rate is 9%?
  3. Is the difference likely due to chance?

  **Solution**:

  ```python theme={null}
  from scipy import stats
  import numpy as np

  # Data
  control_visitors, control_conversions = 500, 45
  treatment_visitors, treatment_conversions = 500, 58

  # 1. Conversion rates
  control_rate = control_conversions / control_visitors
  treatment_rate = treatment_conversions / treatment_visitors

  print(f"Control rate: {control_rate:.1%}")    # 9.0%
  print(f"Treatment rate: {treatment_rate:.1%}") # 11.6%
  print(f"Lift: {(treatment_rate/control_rate - 1)*100:.1f}%")  # 28.9% lift

  # 2. P(X >= 58) if true rate is 9% (null hypothesis)
  p_null = 0.09  # Assume true rate is same as control
  n = 500

  # P(X >= 58) = 1 - P(X <= 57)
  p_at_least_58 = 1 - stats.binom.cdf(57, n, p_null)
  print(f"\nP(≥58 conversions | rate=9%): {p_at_least_58:.4f}")  # roughly 3%

  # 3. Statistical significance (simplified, one-sided)
  # Caveat: this treats the control's 9% as the known true rate and ignores
  # sampling noise in the control group, so it overstates the evidence.
  if p_at_least_58 < 0.05:
      print("\nSimplified check: below 0.05, but confirm with the two-sample test.")
  else:
      print("\nSimplified check: not significant. Could be chance variation.")

  # More rigorous: two-proportion z-test (accounts for noise in both groups)
  # Pooled proportion
  pooled_p = (control_conversions + treatment_conversions) / (control_visitors + treatment_visitors)
  se = np.sqrt(pooled_p * (1-pooled_p) * (1/control_visitors + 1/treatment_visitors))
  z_stat = (treatment_rate - control_rate) / se
  p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

  print(f"\nZ-statistic: {z_stat:.3f}")          # ~1.35
  print(f"P-value (two-tailed): {p_value:.4f}")  # ~0.18, not significant at 0.05
  ```
</details>

<details>
  <summary>**Exercise 3: Customer Arrivals** - Poisson distribution</summary>

  **Problem**: A coffee shop sees an average of 4 customers per minute during rush hour.

  1. What's P(exactly 6 customers) in a given minute?
  2. What's P(0 or 1 customers) in a given minute?
  3. What's P(more than 10 customers) in a 2-minute window?
  4. How many baristas are needed if each can serve 5 customers per minute?

  **Solution**:

  ```python theme={null}
  from scipy import stats
  import numpy as np

  lambda_per_min = 4  # Average customers per minute

  # 1. P(X = 6)
  p_exactly_6 = stats.poisson.pmf(6, lambda_per_min)
  print(f"P(exactly 6 customers): {p_exactly_6:.4f}")  # ~10.4%

  # 2. P(X = 0 or X = 1) = P(X = 0) + P(X = 1)
  p_0_or_1 = stats.poisson.cdf(1, lambda_per_min)  # CDF gives P(X <= 1)
  print(f"P(0 or 1 customers): {p_0_or_1:.4f}")  # ~9.2%

  # 3. In 2-minute window, lambda = 4 * 2 = 8
  lambda_2min = 8
  p_more_than_10 = 1 - stats.poisson.cdf(10, lambda_2min)
  print(f"P(more than 10 in 2 min): {p_more_than_10:.4f}")  # ~18.4%

  # 4. Staffing analysis
  # Each barista serves 5 customers/minute
  # If we want to handle 95th percentile of demand...
  customers_95th = stats.poisson.ppf(0.95, lambda_per_min)
  print(f"\n95th percentile demand: {customers_95th:.0f} customers/min")

  baristas_needed = np.ceil(customers_95th / 5)
  print(f"Baristas needed (95% coverage): {baristas_needed:.0f}")

  # Probability of being overwhelmed with 1 vs 2 baristas
  p_overwhelm_1 = 1 - stats.poisson.cdf(5, lambda_per_min)  # >5 customers
  p_overwhelm_2 = 1 - stats.poisson.cdf(10, lambda_per_min)  # >10 customers

  print(f"\nP(overwhelmed with 1 barista): {p_overwhelm_1:.1%}")
  print(f"P(overwhelmed with 2 baristas): {p_overwhelm_2:.1%}")
  ```
</details>

<details>
  <summary>**Exercise 4: Manufacturing Quality Control** - Real-world application</summary>

  **Problem**: A factory produces bolts with target diameter 10mm and acceptable tolerance ±0.2mm. The machine produces bolts with μ = 10.02mm and σ = 0.08mm.

  1. What percentage of bolts are within specification (9.8mm to 10.2mm)?
  2. If you produce 10,000 bolts, how many are rejected?
  3. To reduce rejects to under 1%, what standard deviation is needed?
  4. Should you adjust the mean or reduce variability?

  **Solution**:

  ```python theme={null}
  from scipy import stats
  import numpy as np

  # Current machine parameters
  mu = 10.02  # Slightly off-center
  sigma = 0.08

  # Specifications
  lower_spec = 9.8
  upper_spec = 10.2

  # 1. Percentage within spec
  p_below_upper = stats.norm.cdf(upper_spec, loc=mu, scale=sigma)
  p_below_lower = stats.norm.cdf(lower_spec, loc=mu, scale=sigma)
  p_in_spec = p_below_upper - p_below_lower

  print(f"Current performance:")
  print(f"  Mean: {mu}mm (target: 10mm, off by {mu-10:.2f}mm)")
  print(f"  Std Dev: {sigma}mm")
  print(f"  % in spec: {p_in_spec:.2%}")  # ~98.5%

  # 2. Rejects from 10,000 bolts
  n_bolts = 10000
  n_rejects = n_bolts * (1 - p_in_spec)
  print(f"\nFrom {n_bolts:,} bolts: {n_rejects:.0f} rejected")

  # 3. What sigma needed for <1% rejects?
  target_reject_rate = 0.01
  target_in_spec = 1 - target_reject_rate

  # For centered distribution, need P(|X - μ| < 0.2) > 0.99
  # This means 0.5% in each tail
  # Z for 99.5th percentile ≈ 2.576
  z_required = stats.norm.ppf(1 - target_reject_rate/2)
  sigma_required = 0.2 / z_required  # If centered

  print(f"\nTo achieve <1% rejects (if centered):")
  print(f"  Z required: {z_required:.3f}")
  print(f"  Sigma required: {sigma_required:.4f}mm")

  # 4. Compare: Fix mean vs reduce variability
  # Option A: Center the mean (mu = 10.00)
  mu_centered = 10.0
  p_in_spec_centered = (stats.norm.cdf(upper_spec, loc=mu_centered, scale=sigma) -
                         stats.norm.cdf(lower_spec, loc=mu_centered, scale=sigma))

  # Option B: Keep mean, reduce sigma to 0.06
  sigma_reduced = 0.06
  p_in_spec_reduced_sigma = (stats.norm.cdf(upper_spec, loc=mu, scale=sigma_reduced) -
                              stats.norm.cdf(lower_spec, loc=mu, scale=sigma_reduced))

  print(f"\nImprovement options:")
  print(f"  Current: {p_in_spec:.2%} in spec")
  print(f"  Center mean (μ=10.00): {p_in_spec_centered:.2%} in spec")
  print(f"  Reduce σ to 0.06: {p_in_spec_reduced_sigma:.2%} in spec")
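  # Centering is usually the cheaper fix, but here it only moves ~98.5% to ~98.8%;
  # reducing σ to 0.06 reaches ~99.85%, so variability is the bigger lever.
  print(f"\nRecommendation: Center the mean first (cheap), then reduce σ to get rejects under 1%.")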
  ```
</details>

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Distribution Types" icon="chart-simple">
    * **Discrete**: Countable outcomes (die rolls, counts)
    * **Continuous**: Any value in a range (measurements)
    * Each distribution has parameters that define its shape
  </Card>

  <Card title="The Normal Distribution" icon="bell">
    * Defined by mean (μ) and standard deviation (σ)
    * 68-95-99.7 rule for quick calculations
    * Appears everywhere due to Central Limit Theorem
  </Card>

  <Card title="Key Distributions" icon="list">
    * **Uniform**: Equal probability (dice, random selection)
    * **Binomial**: Success/failure experiments (conversions, defects)
    * **Normal**: Continuous measurements (heights, errors)
    * **Poisson**: Count of rare events (arrivals, defects)
  </Card>

  <Card title="Z-Scores" icon="ruler">
    * Standardize any normal distribution
    * z = (x - μ) / σ
    * Allows comparison across different scales
    * Standard normal has μ=0, σ=1
  </Card>
</CardGroup>
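
As a quick check of the 68-95-99.7 rule and the z-score formula from the cards above, here is a minimal sketch using `scipy.stats.norm`. The two test scores and their means and standard deviations are hypothetical values chosen only for illustration.

```python
from scipy import stats

# Probability mass within k standard deviations of the mean: P(|Z| < k)
within = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"Within {k} sigma: {prob:.4f}")


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


# Hypothetical scores on two tests with very different scales
z_a = z_score(1300, mu=1050, sigma=200)  # test A
z_b = z_score(28, mu=21, sigma=5)        # test B
print(f"z_a = {z_a:.2f}, z_b = {z_b:.2f}")
```

Because both scores are expressed in standard deviations, they can be compared directly even though the raw scales differ.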

***

## Interview Prep: Common Questions

<Accordion title="Distribution Interview Questions">
  **Q: When would you use Poisson vs Binomial distribution?**

  > Poisson: Counting events in continuous time/space where events are rare (website visits, defects). Binomial: Fixed number of trials with binary outcomes (10 coin flips, 100 users converting).

  **Q: How do you check if data is normally distributed?**

  > Visual: histogram, Q-Q plot. Statistical: Shapiro-Wilk test, Anderson-Darling test. Rule of thumb: absolute skewness \< 2 and kurtosis \< 7 suggest no severe departure from normality.

  **Q: What is the Central Limit Theorem and why does it matter?**

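  > CLT states that the distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the population distribution, provided the observations are independent and the population has finite variance. n ≥ 30 is a common rule of thumb, but heavily skewed data needs more. It's why we can use normal-based methods even when data isn't normally distributed.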

  **Q: A process has 2% defect rate. What distribution models the number of defects in a batch of 50?**

  > Binomial with n=50, p=0.02. Expected defects = np = 1. Could approximate with Poisson(λ=1) since n is large and p is small.
</Accordion>
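
The Poisson approximation mentioned in the last answer can be checked numerically. The sketch below compares the Binomial(n=50, p=0.02) and Poisson(λ=1) probability mass functions with `scipy.stats`; the variable names are illustrative.

```python
from scipy import stats

n, p = 50, 0.02   # batch size and defect rate from the question
lam = n * p       # Poisson rate matching the binomial mean

print(" k  Binomial  Poisson")
for k in range(5):
    b = stats.binom.pmf(k, n, p)
    q = stats.poisson.pmf(k, lam)
    print(f"{k:2d}  {b:.4f}    {q:.4f}")

# Largest gap between the two PMFs over all possible defect counts
max_gap = max(
    abs(stats.binom.pmf(k, n, p) - stats.poisson.pmf(k, lam))
    for k in range(n + 1)
)
print(f"Largest PMF difference: {max_gap:.4f}")
```

For these parameters the two PMFs agree to within about 0.005 at every count, which is why the approximation is reasonable here.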

***

## Common Pitfalls

<Warning>
  **Distribution Mistakes to Avoid**:

  1. **Assuming Normality** - Always check; many real-world distributions are skewed or heavy-tailed
  2. **Confusing Parameters** - Variance (σ²) vs Standard Deviation (σ); Population vs Sample
  3. **Ignoring Distribution Shape** - Mean/std alone don't fully describe a distribution; visualize first
  4. **Wrong Distribution Choice** - Using normal for bounded data, using binomial for continuous outcomes
  5. **CLT Misapplication** - CLT applies to sample means, not individual observations
</Warning>
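
One way to act on the first pitfall is to check normality before relying on it. This is a minimal sketch, assuming two synthetic samples (one normal, one exponential) generated only for illustration; it reports skewness, excess kurtosis, and the Shapiro-Wilk p-value for each.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(loc=10.0, scale=0.05, size=500),
    "exponential": rng.exponential(scale=1.0, size=500),  # right-skewed
}

for name, x in samples.items():
    skew = stats.skew(x)
    excess_kurt = stats.kurtosis(x)     # Fisher definition: 0 for a normal
    w_stat, p_value = stats.shapiro(x)  # H0: the data come from a normal
    print(f"{name:12s} skew={skew:+.2f}  excess kurtosis={excess_kurt:+.2f}  "
          f"Shapiro-Wilk p={p_value:.4f}")
```

A small Shapiro-Wilk p-value is evidence against normality. With large samples the test can also flag deviations too small to matter, so it is best read alongside a histogram or Q-Q plot.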

***

## Connection to Machine Learning

| Distribution Concept  | ML Application                                       |
| --------------------- | ---------------------------------------------------- |
| Normal distribution   | Gaussian noise, regularization, Gaussian Naive Bayes |
| Central Limit Theorem | Why batch statistics work, confidence in predictions |
| Z-scores              | Feature standardization, batch normalization         |
| Binomial              | Classification evaluation, confidence intervals      |
| Poisson               | Count prediction, event modeling                     |

<Tip>
  **ML Connection**: When you see "Gaussian" in ML papers, it means "normal distribution." Gaussian processes, Gaussian mixture models, and Gaussian noise all rely on properties of the normal distribution you just learned!
</Tip>
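
The "Z-scores" row in the table above is the easiest one to see in code. The sketch below standardizes a small, hypothetical feature matrix with NumPy; the column meanings (age, income) and values are made up for illustration.

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features
# on very different scales (e.g. age in years, income in dollars).
X = np.array([
    [25, 40_000],
    [32, 52_000],
    [47, 95_000],
    [51, 61_000],
], dtype=float)

mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation (population form)
X_std = (X - mu) / sigma  # z = (x - mu) / sigma, applied column-wise

print(X_std.round(2))
print("means:", X_std.mean(axis=0).round(6))
print("stds: ", X_std.std(axis=0).round(6))
```

After standardization each column has mean 0 and standard deviation 1. In a real pipeline, `mu` and `sigma` would normally be computed on the training data only and reused for validation and test data.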

**Coming up next**: We'll learn about **Statistical Inference** - how to draw conclusions about entire populations from just samples. This is how polls predict elections and A/B tests drive decisions.

<Card title="Next: Statistical Inference" icon="arrow-right" href="/courses/statistics-for-ml/05-inference">
  Learn to draw conclusions from limited data
</Card>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="You are modeling customer support ticket arrivals. How do you decide between Poisson, Binomial, and Normal distributions?">
    **Strong Answer:**

    * The choice depends on the nature of the data-generating process, not on what the histogram looks like. Poisson is the right choice when you are counting events in a continuous interval (tickets per hour) where events are independent and occur at a roughly constant rate. It has one parameter (lambda, the average rate) and its variance equals its mean.
    * Binomial is correct when you have a fixed number of discrete trials each with a binary outcome -- for example, "out of 500 customers who contacted us, how many submitted a ticket?" It requires knowing the number of trials and the success probability.
    * Normal might be appropriate if you are looking at the average number of tickets per day over many days. By the Central Limit Theorem, the daily averages will be approximately normal even if individual arrivals follow a Poisson process. But you would not use normal for the raw counts because counts cannot be negative, and the normal distribution extends to negative infinity.
    * In practice, I would start by checking whether the mean and variance of the ticket counts are roughly equal. If they are, Poisson is a good fit. If the variance is much larger than the mean (overdispersion), I would consider a Negative Binomial distribution instead, which adds a dispersion parameter. Overdispersion is extremely common in real ticket data because arrival rates are not actually constant -- they vary by time of day, day of week, and whether there was a product incident.

    **Follow-up: Your ticket data shows variance that is 4x the mean. What does this tell you and how do you handle it?**

    Variance much larger than the mean is overdispersion, and it means the Poisson assumption is violated. This typically happens because the arrival rate itself is not constant -- it varies over time or across customer segments. Using Poisson in this situation would underestimate the probability of extreme counts (many tickets or zero tickets) and give overly narrow prediction intervals. The fix is to use a Negative Binomial distribution, which explicitly models this extra variation. Alternatively, you can build a hierarchical model: the arrival rate lambda follows a Gamma distribution across time periods, and conditional on lambda, counts follow a Poisson. This is actually mathematically equivalent to the Negative Binomial and gives you a richer understanding of what is driving the overdispersion.
  </Accordion>

  <Accordion title="Explain the Central Limit Theorem to a non-technical product manager, and then explain why it matters for A/B testing.">
    **Strong Answer:**

    * For the product manager: "Imagine you survey 100 random customers and compute the average satisfaction score. If you repeated that survey many times, each time with a different random 100 customers, those averages would form a bell curve -- even if individual satisfaction scores are not bell-shaped at all. The Central Limit Theorem says that averages of random samples become predictable and bell-shaped as long as your sample is large enough. That is why we can compute a margin of error on any survey or test result."
    * For the technical layer: the CLT states that the sampling distribution of the sample mean converges to a normal distribution as sample size increases, regardless of the population distribution, provided the population has finite variance. The rate of convergence depends on how "non-normal" the underlying distribution is -- highly skewed distributions need larger n.
    * For A/B testing specifically, the CLT is the entire foundation. When you compare conversion rates between two groups, each conversion rate is a sample mean (of a Bernoulli variable). The CLT guarantees that the difference between these means is approximately normally distributed, which is why you can use a z-test to compute a p-value. Without the CLT, you would need to know the exact distribution of your metric to do any hypothesis testing.
    * The practical caveat: the CLT needs "large enough" samples, and "large enough" depends on the distribution. For proportions near 0.5, n=30 is usually fine. For proportions near 0.01 (like conversion rates), you might need n=500 or more before the normal approximation is accurate. This is why very low conversion rate tests need more traffic.

    **Follow-up: When does the CLT fail or give misleading results, even with a large sample?**

    The CLT fails when the underlying distribution does not have a finite variance. The canonical example is a Cauchy distribution (heavy-tailed), where the sample mean does not converge to anything normal no matter how many samples you take. In practice, this matters for financial data -- stock returns have heavier tails than normal, and models that assume normal distributions (like VaR) systematically underestimate tail risk. Underestimating tail risk in this way is often cited as one contributing factor in the 2008 financial crisis. Another practical failure mode is when your data has structural dependencies that violate the "independent and identically distributed" assumption -- like time-series data with autocorrelation or clustered data where observations within a cluster are correlated. In those cases, the effective sample size is much smaller than the nominal sample size, and the CLT-based confidence intervals are too narrow.
  </Accordion>

  <Accordion title="A manufacturing line has a 2% defect rate. You test 500 items and find 18 defects (3.6%). Should you stop the line?">
    **Strong Answer:**

    * Before stopping the line (which is expensive), I need to determine if 18 defects in 500 is statistically inconsistent with the expected 2% rate. Under the null hypothesis of 2%, the expected number of defects is 10, and the standard deviation is sqrt(500 x 0.02 x 0.98) = approximately 3.13.
    * The z-score for 18 defects is (18 - 10) / 3.13 = 2.56, giving a one-tailed p-value of about 0.005. This is well below the typical 0.05 threshold. So statistically, yes, 18 defects is very unlikely if the true rate is still 2%.
    * However, the statistical answer is only half the decision. I would also consider: Is this a sudden spike or a gradual trend? (Check a control chart for the last several batches.) What is the cost of stopping the line versus the cost of shipping defective products? Is there a known assignable cause (like a new material batch or a maintenance event)?
    * In a Six Sigma framework, this would trigger an investigation but not necessarily an immediate line stop. I would pull the last 5 batches of data and look at a Shewhart control chart. If the process mean has shifted (as opposed to one unlucky batch), that warrants corrective action. If this is a single batch anomaly, the response might be different -- inspect remaining inventory from this batch rather than shutting everything down.

    **Follow-up: What is the difference between using a binomial exact test versus a normal approximation here, and when does it matter?**

    For n=500 and p=0.02, the usual rule of thumb for the normal approximation is met because np=10 and n(1-p)=490 are both greater than 5. The binomial exact test computes P(X >= 18 given n=500, p=0.02) directly, with no approximation. Both approaches lead to the same decision here, but the p-values are not identical: with p this small the binomial is right-skewed, so the exact upper-tail probability (on the order of 0.01) is noticeably larger than the 0.005 from the normal approximation. The gap grows when either np or n(1-p) is small, which happens with very rare events (like a 0.01% defect rate tested on 100 items). In those cases, the normal approximation can be meaningfully wrong, and you should use the exact binomial or a Poisson approximation instead. In modern practice, there is little reason not to use the exact test since computational cost is negligible, but understanding when the approximation breaks helps you catch errors in older tools that default to normal.
  </Accordion>
</AccordionGroup>
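
The defect-rate question above can be worked through in a few lines. This sketch computes the normal-approximation z-test used in the answer (without a continuity correction) and the exact binomial tail with `scipy.stats`; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

n, p0, observed = 500, 0.02, 18  # values from the question above

# Normal approximation (no continuity correction), as in the answer
mean = n * p0
sd = np.sqrt(n * p0 * (1 - p0))
z = (observed - mean) / sd
p_normal = stats.norm.sf(z)  # one-tailed P(Z >= z)

# Exact binomial tail: P(X >= 18) = P(X > 17)
p_exact = stats.binom.sf(observed - 1, n, p0)

print(f"z = {z:.2f}")
print(f"Normal approximation p-value: {p_normal:.4f}")
print(f"Exact binomial p-value:       {p_exact:.4f}")
```

Both tail probabilities fall below 0.05, so the statistical conclusion is the same either way. The exact value is the safer one to report when np is only around 10.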
