
Probability Distributions: Patterns in Randomness

The Factory Quality Problem

You run a factory that produces ball bearings. Each bearing should be exactly 10mm in diameter. But manufacturing isn’t perfect - there’s always some variation. You measure 1000 bearings and get:
import numpy as np
import matplotlib.pyplot as plt

# Simulated bearing diameters (mm)
np.random.seed(42)
bearings = np.random.normal(loc=10.0, scale=0.05, size=1000)

print(f"Mean diameter: {np.mean(bearings):.4f} mm")
print(f"Std deviation: {np.std(bearings):.4f} mm")
print(f"Min: {np.min(bearings):.4f} mm")
print(f"Max: {np.max(bearings):.4f} mm")
Output:
Mean diameter: 10.0024 mm
Std deviation: 0.0498 mm
Min: 9.8521 mm
Max: 10.1534 mm
If you plot these measurements, something magical appears:
plt.figure(figsize=(10, 5))
plt.hist(bearings, bins=50, density=True, alpha=0.7, edgecolor='black')
plt.xlabel('Diameter (mm)')
plt.ylabel('Density')
plt.title('Distribution of Ball Bearing Diameters')
plt.axvline(10.0, color='red', linestyle='--', label='Target: 10mm')
plt.legend()
plt.show()
A bell curve emerges. This isn’t coincidence - it’s one of the most profound patterns in nature.
Key Probability Distributions for ML
Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Modules 1-2 (Describing Data, Probability)
What You’ll Build: Quality control system, prediction intervals

What Is a Probability Distribution?

A probability distribution describes all possible values a random variable can take and how likely each value is. Think of it as a complete map of possibilities.

Discrete vs Continuous

| Type | Description | Examples |
|------|-------------|----------|
| Discrete | Countable outcomes | Coin flips, dice rolls, number of customers |
| Continuous | Infinite possible values | Height, weight, temperature, time |
# Discrete: Number of heads in 10 coin flips
# Can only be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10
heads = np.random.binomial(n=10, p=0.5)

# Continuous: Height of a person
# Can be 170.0 cm, 170.1 cm, 170.01 cm, 170.001 cm...
height = np.random.normal(loc=170.0, scale=10.0)

print(f"Heads: {heads} (an integer), Height: {height:.5f} cm (any real value)")
[Figure: Discrete vs Continuous Distributions]

The Uniform Distribution: Equal Chances

The simplest distribution - every outcome is equally likely.

Discrete Uniform: The Fair Die

import numpy as np
from collections import Counter

# Roll a fair die 10000 times
rolls = np.random.randint(1, 7, size=10000)

counts = Counter(rolls)
for face in sorted(counts.keys()):
    pct = counts[face] / 10000 * 100
    print(f"Face {face}: {counts[face]:4d} ({pct:.1f}%)")
Output:
Face 1: 1652 (16.5%)
Face 2: 1689 (16.9%)
Face 3: 1634 (16.3%)
Face 4: 1701 (17.0%)
Face 5: 1658 (16.6%)
Face 6: 1666 (16.7%)
Each face appears roughly 16.67% (1/6) of the time.
[Figure: Uniform Distribution - Dice and Lottery]

Continuous Uniform: Random Numbers

# Random time a customer arrives between 9:00 and 10:00 AM
arrival_minutes = np.random.uniform(0, 60, size=1000)

print(f"Mean arrival: {np.mean(arrival_minutes):.1f} minutes after 9:00")
print(f"P(arrive in first 15 min): {np.mean(arrival_minutes < 15):.1%}")
ML Applications:
  • Random weight initialization
  • Data augmentation (random crops, rotations)
  • Monte Carlo simulations
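Here is a minimal sketch of the first and third applications (the layer sizes and the Glorot/Xavier-style init range are illustrative assumptions, not a specific framework's API):
import numpy as np

# Uniform weight initialization (Glorot/Xavier-style range, for illustration)
n_in, n_out = 256, 128
limit = np.sqrt(6 / (n_in + n_out))
weights = np.random.uniform(-limit, limit, size=(n_in, n_out))
print(f"Weights drawn uniformly from [{-limit:.3f}, {limit:.3f}]")

# Monte Carlo: estimate pi from uniform points in the unit square
points = np.random.uniform(0, 1, size=(100_000, 2))
inside = (points ** 2).sum(axis=1) < 1  # landed inside the quarter circle
print(f"Pi estimate: {4 * inside.mean():.4f}")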

The Binomial Distribution: Success/Failure Experiments

The binomial distribution models the number of successes when you repeat an experiment with two possible outcomes (success/failure) a fixed number of times. Parameters:
  • n = number of trials
  • p = probability of success on each trial
Question: If you flip a coin 10 times, what’s the probability of getting exactly 7 heads?

Mathematical Formula

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

where $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is "n choose k".
from scipy import stats
import math

def binomial_probability(n, k, p):
    """Calculate P(X = k) for binomial distribution."""
    # n choose k
    combinations = math.factorial(n) / (math.factorial(k) * math.factorial(n - k))
    # Probability
    return combinations * (p ** k) * ((1 - p) ** (n - k))

# P(exactly 7 heads in 10 flips)
p_7_heads = binomial_probability(n=10, k=7, p=0.5)
print(f"P(7 heads in 10 flips): {p_7_heads:.4f}")  # 0.1172

# Using scipy
p_7_scipy = stats.binom.pmf(k=7, n=10, p=0.5)
print(f"P(7 heads) via scipy: {p_7_scipy:.4f}")  # 0.1172

Visualizing the Binomial Distribution

n = 10
p = 0.5

k_values = range(0, n + 1)
probabilities = [stats.binom.pmf(k, n, p) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probabilities, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Heads')
plt.ylabel('Probability')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xticks(k_values)
plt.show()

Real-World Example: Website Conversion

# Your website has a 3% conversion rate
# 100 people visit today
# What's the probability of 5 or more conversions?

n = 100
p = 0.03

# P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 = 1 - stats.binom.cdf(4, n, p)
print(f"P(5+ conversions): {p_at_least_5:.1%}")  # 18.2%

# Expected conversions
expected = n * p
print(f"Expected conversions: {expected}")  # 3.0
Quick exercise: A factory produces items with a 2% defect rate. In a batch of 50 items:
  1. What’s the probability of exactly 0 defects?
  2. What’s the probability of 3 or more defects?
  3. How many defects do you expect?
n, p = 50, 0.02

# 1. P(X = 0)
p_zero = stats.binom.pmf(0, n, p)
print(f"P(0 defects): {p_zero:.1%}")  # 36.4%

# 2. P(X >= 3)
p_three_plus = 1 - stats.binom.cdf(2, n, p)
print(f"P(3+ defects): {p_three_plus:.1%}")  # 7.8%

# 3. Expected defects
expected = n * p
print(f"Expected defects: {expected}")  # 1.0

The Normal Distribution: The Bell Curve

This is the most important distribution in statistics. It appears everywhere:
  • Human heights and weights
  • Test scores
  • Measurement errors
  • Stock price changes
  • IQ scores
Parameters:
  • μ (mu) = mean (center of the bell)
  • σ (sigma) = standard deviation (width of the bell)

Mathematical Formula

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

[Figure: Normal Distribution Formula and Shape]
# Generate normally distributed data
mu = 100  # mean
sigma = 15  # standard deviation

# IQ scores follow this distribution
iq_scores = np.random.normal(mu, sigma, 10000)

plt.figure(figsize=(10, 5))
plt.hist(iq_scores, bins=50, density=True, alpha=0.7, edgecolor='black')

# Overlay theoretical curve
x = np.linspace(50, 150, 1000)
y = stats.norm.pdf(x, mu, sigma)
plt.plot(x, y, 'r-', linewidth=2, label='Theoretical')

plt.xlabel('IQ Score')
plt.ylabel('Probability Density')
plt.title('Normal Distribution of IQ Scores (μ=100, σ=15)')
plt.legend()
plt.show()

The 68-95-99.7 Rule (Empirical Rule)

One of the most useful facts in statistics:
| Range | Percentage of Data |
|-------|--------------------|
| μ ± 1σ | 68% |
| μ ± 2σ | 95% |
| μ ± 3σ | 99.7% |
# Verify with IQ scores
within_1_std = np.mean(np.abs(iq_scores - mu) <= sigma)
within_2_std = np.mean(np.abs(iq_scores - mu) <= 2 * sigma)
within_3_std = np.mean(np.abs(iq_scores - mu) <= 3 * sigma)

print(f"Within 1 std (85-115): {within_1_std:.1%}")   # ~68%
print(f"Within 2 std (70-130): {within_2_std:.1%}")   # ~95%
print(f"Within 3 std (55-145): {within_3_std:.1%}")   # ~99.7%
[Figure: 68-95-99.7 Rule Applied to Heights]

Z-Scores: Standardizing Any Normal Distribution

A z-score tells you how many standard deviations a value is from the mean:

$$z = \frac{x - \mu}{\sigma}$$
def z_score(x, mu, sigma):
    """Convert value to z-score."""
    return (x - mu) / sigma

# How exceptional is an IQ of 130?
iq = 130
z = z_score(iq, mu=100, sigma=15)
print(f"IQ of 130 has z-score: {z:.2f}")  # 2.0

# This means 130 is 2 standard deviations above average
# Only about 2.3% of people score higher
percentile = stats.norm.cdf(z) * 100
print(f"Percentile: {percentile:.1f}%")  # 97.7%

Calculating Probabilities

# Normal distribution with μ=100, σ=15

# P(IQ > 130)
p_above_130 = 1 - stats.norm.cdf(130, loc=100, scale=15)
print(f"P(IQ > 130): {p_above_130:.2%}")  # 2.28%

# P(IQ between 85 and 115)
p_middle = stats.norm.cdf(115, 100, 15) - stats.norm.cdf(85, 100, 15)
print(f"P(85 < IQ < 115): {p_middle:.2%}")  # 68.27%

# What IQ score is at the 99th percentile?
iq_99 = stats.norm.ppf(0.99, loc=100, scale=15)
print(f"99th percentile IQ: {iq_99:.1f}")  # 134.9

Why Is the Normal Distribution Everywhere?

The Central Limit Theorem (CLT) explains this magic:
Central Limit Theorem: When you add up many independent random variables, their sum tends toward a normal distribution - regardless of the original distributions.

Demonstration

# Roll a single die - definitely NOT normal
single_die = np.random.randint(1, 7, 10000)

# Sum of 2 dice - starting to look different
sum_2_dice = np.array([np.random.randint(1, 7, 2).sum() for _ in range(10000)])

# Sum of 10 dice - getting bell-shaped
sum_10_dice = np.array([np.random.randint(1, 7, 10).sum() for _ in range(10000)])

# Sum of 30 dice - nearly perfect normal!
sum_30_dice = np.array([np.random.randint(1, 7, 30).sum() for _ in range(10000)])

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].hist(single_die, bins=6, edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Single Die (Uniform)')

axes[0, 1].hist(sum_2_dice, bins=11, edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Sum of 2 Dice')

axes[1, 0].hist(sum_10_dice, bins=30, edgecolor='black', alpha=0.7)
axes[1, 0].set_title('Sum of 10 Dice')

axes[1, 1].hist(sum_30_dice, bins=40, edgecolor='black', alpha=0.7)
axes[1, 1].set_title('Sum of 30 Dice (Nearly Normal!)')

plt.tight_layout()
plt.show()
This is why heights are normally distributed: Height is determined by thousands of genes, each adding a small random effect. Sum of many small random things = normal distribution.
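You can mimic this with a toy model. Assume (purely for illustration, this is not real genetics) 200 independent "genes" that each nudge height up or down by half a centimeter:
import numpy as np

# Toy model of height: 200 independent genes, each adding +0.5 or -0.5 cm
np.random.seed(0)
n_people, n_genes = 10_000, 200
effects = np.random.choice([-0.5, 0.5], size=(n_people, n_genes))
heights = 170 + effects.sum(axis=1)

# Many small independent effects sum to a bell curve (the CLT in action)
print(f"Mean: {heights.mean():.1f} cm, Std: {heights.std():.1f} cm")  # ~170, ~7
# Plot heights with plt.hist to watch the bell shape emerge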

Other Important Distributions

Poisson Distribution: Rare Events Over Time

How many customers arrive per hour? How many defects per batch? How many emails per day? The Poisson distribution models counts of events like these in a fixed interval.

Parameter: λ (lambda) = average rate of events

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
# Average 5 customers per hour
lambda_rate = 5

# Probability of exactly 3 customers in an hour
p_3 = stats.poisson.pmf(3, lambda_rate)
print(f"P(3 customers): {p_3:.2%}")  # 14.04%

# Probability of 10 or more
p_10_plus = 1 - stats.poisson.cdf(9, lambda_rate)
print(f"P(10+ customers): {p_10_plus:.2%}")  # 3.18%

# Visualize
k_values = range(0, 15)
probs = [stats.poisson.pmf(k, lambda_rate) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probs, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Customers')
plt.ylabel('Probability')
plt.title(f'Poisson Distribution (λ={lambda_rate})')
plt.show()

Exponential Distribution: Time Between Events

If events occur at rate λ, how long until the next one?
# Average 5 customers per hour = 1 customer per 12 minutes average
lambda_rate = 5  # per hour
avg_wait = 60 / lambda_rate  # 12 minutes

# Probability of waiting more than 20 minutes
p_wait_20 = 1 - stats.expon.cdf(20, scale=avg_wait)
print(f"P(wait > 20 min): {p_wait_20:.2%}")  # 18.9%

# Time by which 90% of customers will have arrived
time_90 = stats.expon.ppf(0.90, scale=avg_wait)
print(f"90% arrive within: {time_90:.1f} minutes")  # 27.6 min

Mini-Project: Quality Control System

Build a complete quality control system for the ball bearing factory.
import numpy as np
from scipy import stats

class QualityControlSystem:
    """
    Quality control system using normal distribution.
    """
    
    def __init__(self, target, std_dev, tolerance):
        """
        Initialize QC system.
        
        target: desired measurement (e.g., 10mm)
        std_dev: expected standard deviation in production
        tolerance: acceptable deviation from target (e.g., ±0.1mm)
        """
        self.target = target
        self.std_dev = std_dev
        self.tolerance = tolerance
        self.lower_limit = target - tolerance
        self.upper_limit = target + tolerance
        
    def expected_defect_rate(self):
        """Calculate expected percentage of out-of-spec products."""
        # P(X < lower) + P(X > upper)
        p_below = stats.norm.cdf(self.lower_limit, self.target, self.std_dev)
        p_above = 1 - stats.norm.cdf(self.upper_limit, self.target, self.std_dev)
        return p_below + p_above
    
    def analyze_batch(self, measurements):
        """Analyze a batch of measurements."""
        n = len(measurements)
        mean = np.mean(measurements)
        std = np.std(measurements)
        
        # Count out-of-spec
        defects = np.sum((measurements < self.lower_limit) | 
                         (measurements > self.upper_limit))
        defect_rate = defects / n
        
        # Check if process is in control
        # Mean should be within 2 standard errors of target
        std_error = std / np.sqrt(n)
        z_score = (mean - self.target) / std_error
        
        results = {
            'batch_size': n,
            'mean': mean,
            'std_dev': std,
            'defects': defects,
            'defect_rate': defect_rate,
            'z_score': z_score,
            'process_in_control': abs(z_score) < 2
        }
        
        return results
    
    def print_report(self, results):
        """Print a formatted QC report."""
        print("\n" + "=" * 50)
        print("QUALITY CONTROL REPORT")
        print("=" * 50)
        print(f"Batch Size: {results['batch_size']}")
        print(f"Target: {self.target:.4f} ± {self.tolerance:.4f}")
        print(f"Specification Limits: [{self.lower_limit:.4f}, {self.upper_limit:.4f}]")
        print("-" * 50)
        print(f"Batch Mean: {results['mean']:.4f}")
        print(f"Batch Std Dev: {results['std_dev']:.4f}")
        print(f"Defects: {results['defects']} ({results['defect_rate']:.2%})")
        print(f"Expected Defect Rate: {self.expected_defect_rate():.2%}")
        print("-" * 50)
        print(f"Z-Score: {results['z_score']:.2f}")
        status = "IN CONTROL" if results['process_in_control'] else "OUT OF CONTROL"
        print(f"Process Status: {status}")
        print("=" * 50)


# Create QC system
qc = QualityControlSystem(
    target=10.0,      # 10mm target diameter
    std_dev=0.05,     # 0.05mm expected variation
    tolerance=0.1     # ±0.1mm acceptable
)

# Expected defect rate
print(f"Expected defect rate: {qc.expected_defect_rate():.2%}")

# Simulate a good batch
np.random.seed(42)
good_batch = np.random.normal(10.0, 0.05, 100)
results = qc.analyze_batch(good_batch)
qc.print_report(results)

# Simulate a problematic batch (shifted mean)
bad_batch = np.random.normal(10.08, 0.05, 100)  # Mean shifted by 0.08mm
results_bad = qc.analyze_batch(bad_batch)
qc.print_report(results_bad)
Output:
Expected defect rate: 4.55%

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0024
Batch Std Dev: 0.0496
Defects: 4 (4.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 0.49
Process Status: IN CONTROL
==================================================

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0822
Batch Std Dev: 0.0518
Defects: 33 (33.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 15.88
Process Status: OUT OF CONTROL
==================================================

Practice Exercises

Exercise 1: Height Analysis

# Adult male heights in the US follow N(69.1, 2.9) inches
# (mean 69.1 inches, std dev 2.9 inches)

# Calculate:
# 1. What percentage of men are over 6 feet (72 inches)?
# 2. What percentage are between 5'6" (66 in) and 6'0" (72 in)?
# 3. How tall do you need to be to be in the top 5%?
# 4. What is the z-score for someone 6'4" (76 inches)?
mu = 69.1
sigma = 2.9

# 1. P(height > 72)
p_over_6ft = 1 - stats.norm.cdf(72, mu, sigma)
print(f"Over 6 feet: {p_over_6ft:.1%}")  # 15.9%

# 2. P(66 < height < 72)
p_between = stats.norm.cdf(72, mu, sigma) - stats.norm.cdf(66, mu, sigma)
print(f"Between 5'6\" and 6'0\": {p_between:.1%}")  # 71.0%

# 3. Top 5% height
top_5_height = stats.norm.ppf(0.95, mu, sigma)
print(f"Top 5% starts at: {top_5_height:.1f} inches")  # 73.9 inches (6'2")

# 4. Z-score for 6'4"
z_76 = (76 - mu) / sigma
print(f"Z-score for 6'4\": {z_76:.2f}")  # 2.38
print(f"Percentile: {stats.norm.cdf(z_76) * 100:.1f}%")  # 99.1%

Exercise 2: Server Requests

# A web server receives an average of 100 requests per minute.
# Requests follow a Poisson distribution.

# Calculate:
# 1. P(exactly 100 requests in a minute)
# 2. P(more than 120 requests in a minute)
# 3. For capacity planning, what number of requests per minute
#    will only be exceeded 1% of the time?
lambda_rate = 100

# 1. P(X = 100)
p_exactly_100 = stats.poisson.pmf(100, lambda_rate)
print(f"P(exactly 100): {p_exactly_100:.2%}")  # 3.99%

# 2. P(X > 120)
p_over_120 = 1 - stats.poisson.cdf(120, lambda_rate)
print(f"P(over 120): {p_over_120:.2%}")  # 1.79%

# 3. 99th percentile
capacity_99 = stats.poisson.ppf(0.99, lambda_rate)
print(f"99% of minutes have fewer than {capacity_99:.0f} requests")  # 124

Common Mistakes to Avoid

Mistake 1: Assuming Everything Is Normal
Not all data follows a normal distribution. Income data is heavily right-skewed. Time-to-event data often follows exponential distributions. Always visualize your data before assuming normality.

Mistake 2: Misusing the 68-95-99.7 Rule
This rule ONLY applies to normal distributions. Applying it to skewed data will give wrong answers. For non-normal data, use Chebyshev's inequality: at least 75% of data is within 2 std devs, regardless of distribution shape (checked empirically in the sketch below).

Mistake 3: Confusing PDF and CDF
The PDF gives the relative likelihood at a point (technically, density). The CDF gives the probability of being less than or equal to a value. P(X = exact value) is always 0 for continuous distributions.
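Here is that empirical check of Mistake 2, using exponential data as a stand-in for any skewed distribution:
import numpy as np

np.random.seed(0)
skewed = np.random.exponential(scale=1.0, size=100_000)  # right-skewed data

mu, sigma = skewed.mean(), skewed.std()
for k in (1, 2, 3):
    frac = np.mean(np.abs(skewed - mu) <= k * sigma)
    print(f"Within {k} std dev(s): {frac:.1%}")

# Prints roughly 86%, 95%, 98% - not the normal's 68/95/99.7 -
# but never below Chebyshev's 1 - 1/k^2 floor (0%, 75%, 88.9%)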

Interview Questions

Question: Website response times follow a normal distribution with mean 200ms and std dev 50ms. What percentage of requests take more than 300ms?
Answer: About 2.3%
from scipy import stats
p_slow = 1 - stats.norm.cdf(300, loc=200, scale=50)
# Or using z-score: z = (300-200)/50 = 2
# P(Z > 2) ≈ 0.0228
print(f"{p_slow:.2%}")  # 2.28%
The 68-95-99.7 rule gives us a quick check: 300ms is 2 standard deviations above mean, so roughly 2.5% should be above that.
Question: You’re modeling these scenarios. Which distribution would you use for each?
  1. Number of customers arriving per hour
  2. Whether a user clicks an ad (yes/no)
  3. Time until a server fails
  4. Heights of basketball players
Answer:
  1. Poisson - Counts of events in fixed intervals
  2. Bernoulli (single trial) or Binomial (many users) - Binary outcomes
  3. Exponential - Time until an event (memoryless process)
  4. Normal - Continuous measurements of natural phenomena
For height, you might also consider that basketball players are selected to be tall, so it could be a truncated normal!
Question: User session times are heavily right-skewed (not normal). You calculate the average session time each day for 30 days. What distribution does the sample mean follow?
Answer: Approximately normal! Thanks to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal regardless of the underlying distribution shape, as long as:
  • Sample size is sufficiently large (n ≥ 30 is a common rule of thumb)
  • The original distribution has finite variance
This is why we can use confidence intervals and hypothesis tests based on the normal distribution even when the underlying data isn’t normal.
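A quick simulation of exactly this scenario, drawing session times from an exponential distribution as a stand-in for the skewed data (the 500 sessions per day is an assumed figure):
import numpy as np
from scipy import stats

np.random.seed(1)
# 30 days, each with 500 right-skewed session times (seconds)
daily_sessions = np.random.exponential(scale=120, size=(30, 500))
daily_means = daily_sessions.mean(axis=1)

print(f"Skewness of raw sessions: {stats.skew(daily_sessions.ravel()):.2f}")  # ~2 (skewed)
print(f"Skewness of daily means:  {stats.skew(daily_means):.2f}")             # ~0 (normal-ish)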
Question: Video start times follow a log-normal distribution (right-skewed). The P50 is 1.2 seconds and P95 is 4.8 seconds. What does this tell you about user experience?
Answer:
  • Half of users experience start times of 1.2s or less (good!)
  • 5% of users wait more than 4.8 seconds (potentially frustrating)
  • The ratio P95/P50 = 4 indicates significant variability
For right-skewed metrics like latency, the P95 or P99 is often more important than the mean because it captures the experience of the “unlucky” users. A 4x difference between median and P95 suggests there are edge cases worth investigating (slow CDNs, distant users, etc.).
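For a log-normal, the P50 and P95 together pin down the whole distribution, since ln(X) is normal. A sketch of recovering the parameters and projecting other percentiles from the two numbers in the question:
import numpy as np
from scipy import stats

p50, p95 = 1.2, 4.8  # seconds, from the question

# If X is log-normal, ln(X) ~ Normal(mu, sigma)
mu = np.log(p50)                                    # the median of X is e^mu
sigma = (np.log(p95) - mu) / stats.norm.ppf(0.95)   # P95 of ln(X) is mu + 1.645*sigma

p99 = np.exp(mu + stats.norm.ppf(0.99) * sigma)
mean = np.exp(mu + sigma ** 2 / 2)
print(f"sigma = {sigma:.2f}")        # ~0.84
print(f"Implied P99:  {p99:.1f}s")   # ~8.5s - the tail keeps stretching
print(f"Implied mean: {mean:.1f}s")  # ~1.7s - mean > median for right-skewed data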

Practice Challenge

You have real website session data. Determine which distribution best fits it:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated session durations (in seconds)
np.random.seed(42)
sessions = np.random.exponential(scale=120, size=1000)  # Unknown to you!

# Your task:
# 1. Visualize the data with a histogram
# 2. Calculate summary statistics
# 3. Fit different distributions and compare
# 4. Determine which distribution fits best

# Hint: Try normal, exponential, and log-normal

# Starter code:
plt.figure(figsize=(12, 4))

# Histogram
plt.subplot(1, 3, 1)
plt.hist(sessions, bins=50, density=True, alpha=0.7)
plt.title('Data Distribution')
plt.xlabel('Session Duration (s)')

# Q-Q plot for normal
plt.subplot(1, 3, 2)
stats.probplot(sessions, dist="norm", plot=plt)
plt.title('Normal Q-Q Plot')

# Q-Q plot for exponential
plt.subplot(1, 3, 3)
stats.probplot(sessions, dist="expon", plot=plt)
plt.title('Exponential Q-Q Plot')

plt.tight_layout()
plt.show()

# Fit distributions and compare
Solution:
# 1. Visual inspection shows right-skewed data
# 2. Summary stats
print(f"Mean: {np.mean(sessions):.1f}s")
print(f"Median: {np.median(sessions):.1f}s")
print(f"Std: {np.std(sessions):.1f}s")
print(f"Skewness: {stats.skew(sessions):.2f}")  # Positive = right-skewed

# 3. Fit distributions
# Normal
norm_params = stats.norm.fit(sessions)
# Exponential
exp_params = stats.expon.fit(sessions)
# Log-normal
lognorm_params = stats.lognorm.fit(sessions)

# 4. Compare using Kolmogorov-Smirnov test
# Null hypothesis: data follows the distribution
# Lower p-value = worse fit

ks_norm = stats.kstest(sessions, 'norm', args=norm_params)
ks_exp = stats.kstest(sessions, 'expon', args=exp_params)
ks_lognorm = stats.kstest(sessions, 'lognorm', args=lognorm_params)

print(f"\nKS Test p-values:")
print(f"Normal: {ks_norm.pvalue:.4f}")      # Low - bad fit
print(f"Exponential: {ks_exp.pvalue:.4f}")  # High - good fit!
print(f"Log-normal: {ks_lognorm.pvalue:.4f}")

# Exponential wins because mean ≈ std dev (property of exponential)

📝 Practice Exercises

  • Exercise 1: Work with normal distribution and z-scores
  • Exercise 2: Apply binomial distribution to A/B testing
  • Exercise 3: Model customer arrivals with Poisson distribution
  • Exercise 4: Real-world: quality control with distributions

Key Takeaways

Distribution Types

  • Discrete: Countable outcomes (die rolls, counts)
  • Continuous: Any value in a range (measurements)
  • Each distribution has parameters that define its shape

The Normal Distribution

  • Defined by mean (μ) and standard deviation (σ)
  • 68-95-99.7 rule for quick calculations
  • Appears everywhere due to Central Limit Theorem

Key Distributions

  • Uniform: Equal probability (dice, random selection)
  • Binomial: Success/failure experiments (conversions, defects)
  • Normal: Continuous measurements (heights, errors)
  • Poisson: Count of rare events (arrivals, defects)

Z-Scores

  • Standardize any normal distribution
  • z = (x - μ) / σ
  • Allows comparison across different scales
  • Standard normal has μ=0, σ=1

Interview Prep: Common Questions

Q: When would you use Poisson vs Binomial distribution?
Poisson: Counting events in continuous time/space where events are rare (website visits, defects). Binomial: Fixed number of trials with binary outcomes (10 coin flips, 100 users converting).
Q: How do you check if data is normally distributed?
Visual: histogram, Q-Q plot. Statistical: Shapiro-Wilk test, Anderson-Darling test. Rule of thumb: check |skewness| < 2 and kurtosis < 7.
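A minimal sketch of those checks (Shapiro-Wilk becomes oversensitive for very large samples, so it's common to test a subsample):
import numpy as np
from scipy import stats

np.random.seed(0)
data = np.random.normal(100, 15, size=5000)  # stand-in for your feature

# Statistical test: null hypothesis = the data is normal
stat, p_value = stats.shapiro(data[:500])  # test a subsample
print(f"Shapiro-Wilk p-value: {p_value:.3f}")  # large p: no evidence against normality

# Rule-of-thumb checks (both near 0 for normal data)
print(f"Skewness: {stats.skew(data):.2f}")
print(f"Excess kurtosis: {stats.kurtosis(data):.2f}")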
Q: What is the Central Limit Theorem and why does it matter?
CLT states that sample means approach a normal distribution regardless of population distribution, given large enough samples (n ≥ 30). It’s why we can use normal-based methods even when data isn’t normally distributed.
Q: A process has 2% defect rate. What distribution models the number of defects in a batch of 50?
Binomial with n=50, p=0.02. Expected defects = np = 1. Could approximate with Poisson(λ=1) since n is large and p is small.
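That approximation is easy to sanity-check by comparing the two PMFs directly:
from scipy import stats

n, p = 50, 0.02
lam = n * p  # 1.0

# With n large and p small, Binomial(n, p) ≈ Poisson(np)
for k in range(4):
    b = stats.binom.pmf(k, n, p)
    pois = stats.poisson.pmf(k, lam)
    print(f"P(X={k}): binomial {b:.4f} vs Poisson {pois:.4f}")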

Common Pitfalls

Distribution Mistakes to Avoid:
  1. Assuming Normality - Always check; many real-world distributions are skewed or heavy-tailed
  2. Confusing Parameters - Variance (σ²) vs Standard Deviation (σ); Population vs Sample
  3. Ignoring Distribution Shape - Mean/std alone don’t fully describe a distribution; visualize first
  4. Wrong Distribution Choice - Using normal for bounded data, using binomial for continuous outcomes
  5. CLT Misapplication - CLT applies to sample means, not individual observations

Connection to Machine Learning

| Distribution Concept | ML Application |
|----------------------|----------------|
| Normal distribution | Gaussian noise, regularization, Gaussian Naive Bayes |
| Central Limit Theorem | Why batch statistics work, confidence in predictions |
| Z-scores | Feature standardization, batch normalization |
| Binomial | Classification evaluation, confidence intervals |
| Poisson | Count prediction, event modeling |
ML Connection: When you see “Gaussian” in ML papers, it means “normal distribution.” Gaussian processes, Gaussian mixture models, and Gaussian noise all rely on properties of the normal distribution you just learned!
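As a small taste of two rows of that table, here is z-score feature standardization plus Gaussian noise augmentation (the array shapes and noise level are illustrative assumptions):
import numpy as np

np.random.seed(0)
X = np.random.exponential(scale=3.0, size=(1000, 5))  # raw features on arbitrary scales

# Z-score standardization: every column ends up with mean 0, std 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(f"Column means: {X_std.mean(axis=0).round(2)}")
print(f"Column stds:  {X_std.std(axis=0).round(2)}")

# Gaussian noise augmentation: jitter each feature with small N(0, 0.1) noise
X_aug = X_std + np.random.normal(0, 0.1, size=X_std.shape)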
Coming up next: We’ll learn about Statistical Inference - how to draw conclusions about entire populations from just samples. This is how polls predict elections and A/B tests drive decisions.
