Probability Distributions: Patterns in Randomness

The Factory Quality Problem

You run a factory that produces ball bearings. Each bearing should be exactly 10mm in diameter. But manufacturing isn’t perfect - there’s always some variation. You measure 1000 bearings and get:

import numpy as np
import matplotlib.pyplot as plt

# Simulated bearing diameters (mm)
np.random.seed(42)
bearings = np.random.normal(loc=10.0, scale=0.05, size=1000)

print(f"Mean diameter: {np.mean(bearings):.4f} mm")
print(f"Std deviation: {np.std(bearings):.4f} mm")
print(f"Min: {np.min(bearings):.4f} mm")
print(f"Max: {np.max(bearings):.4f} mm")

Output (illustrative; your exact values may differ):

Mean diameter: 10.0024 mm
Std deviation: 0.0498 mm
Min: 9.8521 mm
Max: 10.1534 mm

If you plot these measurements, something magical appears:

plt.figure(figsize=(10, 5))
plt.hist(bearings, bins=50, density=True, alpha=0.7, edgecolor='black')
plt.xlabel('Diameter (mm)')
plt.ylabel('Probability Density')  # density=True normalizes the histogram to unit area
plt.title('Distribution of Ball Bearing Diameters')
plt.axvline(10.0, color='red', linestyle='--', label='Target: 10mm')
plt.legend()
plt.show()

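A bell curve emerges. In this simulation that is by construction (the data was drawn from a normal distribution), but real measurements very often show the same shape. This isn’t coincidence - it’s one of the most profound patterns in nature.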

Estimated Time: 3-4 hours
Difficulty: Beginner
Prerequisites: Modules 1-2 (Describing Data, Probability)
What You’ll Build: Quality control system, prediction intervals

What Is a Probability Distribution?

A probability distribution describes all possible values a random variable can take and how likely each value is. Think of it as a complete map of possibilities.

Discrete vs Continuous

Type	Description	Examples
Discrete	Countable outcomes	Coin flips, dice rolls, number of customers
Continuous	Any value within a range (uncountably many)	Height, weight, temperature, time

# Discrete: Number of heads in 10 coin flips
# Can only be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10

# Continuous: Height of a person
# Can be 170.0 cm, 170.1 cm, 170.01 cm, 170.001 cm...
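
To make the distinction concrete, here is a minimal sketch using `scipy.stats`. The height parameters (mean 170 cm, standard deviation 10 cm) are made-up values for illustration only. A discrete variable assigns real probability to a single value, while a continuous variable only assigns probability to intervals.

```python
from scipy import stats

# Discrete: number of heads in 10 fair coin flips.
# A single outcome has a genuine probability (the PMF).
p_five_heads = stats.binom.pmf(5, n=10, p=0.5)
print(f"P(exactly 5 heads): {p_five_heads:.4f}")

# Continuous: height in cm, modeled as Normal(170, 10) (assumed values).
# Probability only makes sense over an interval, via the CDF.
height = stats.norm(loc=170, scale=10)
p_interval = height.cdf(171) - height.cdf(170)
print(f"P(170 <= height <= 171): {p_interval:.4f}")

# Shrinking the interval drives the probability toward zero.
p_tiny = height.cdf(170.001) - height.cdf(170)
print(f"P(170 <= height <= 170.001): {p_tiny:.6f}")
```

The narrower the interval, the smaller the probability, which is why a continuous variable has probability zero of hitting any single exact value.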

The Uniform Distribution: Equal Chances

The simplest distribution - every outcome is equally likely.

Discrete Uniform: The Fair Die

import numpy as np
from collections import Counter

# Roll a fair die 10000 times
rolls = np.random.randint(1, 7, size=10000)

counts = Counter(rolls)
for face in sorted(counts.keys()):
    pct = counts[face] / 10000 * 100
    print(f"Face {face}: {counts[face]:4d} ({pct:.1f}%)")

Output:

Face 1: 1652 (16.5%)
Face 2: 1689 (16.9%)
Face 3: 1634 (16.3%)
Face 4: 1701 (17.0%)
Face 5: 1658 (16.6%)
Face 6: 1666 (16.7%)

Each face appears roughly 16.67% (1/6) of the time.

Continuous Uniform: Random Numbers

# Random time a customer arrives between 9:00 and 10:00 AM
arrival_minutes = np.random.uniform(0, 60, size=1000)

print(f"Mean arrival: {np.mean(arrival_minutes):.1f} minutes after 9:00")
print(f"P(arrive in first 15 min): {np.mean(arrival_minutes < 15):.1%}")
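
The simulated fraction can be checked against the exact answer. For a continuous uniform on [a, b], P(X < x) = (x - a) / (b - a), so the chance of arriving in the first 15 minutes is 15/60 = 25%. The sketch below uses `scipy.stats.uniform`, which is parameterized by `loc` (the start) and `scale` (the width), not by start and end.

```python
from scipy import stats

# Uniform arrival time on [0, 60] minutes: loc is the start, scale is the width.
arrival = stats.uniform(loc=0, scale=60)

p_first_15 = arrival.cdf(15)                         # (15 - 0) / 60
p_between_20_30 = arrival.cdf(30) - arrival.cdf(20)  # 10 / 60
mean_arrival = arrival.mean()                        # midpoint of the interval

print(f"Exact P(arrive in first 15 min): {p_first_15:.1%}")
print(f"Exact P(arrive between 20 and 30 min): {p_between_20_30:.1%}")
print(f"Exact mean arrival: {mean_arrival:.1f} minutes after 9:00")
```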

ML Applications:

Random weight initialization
Data augmentation (random crops, rotations)
Monte Carlo simulations

The Binomial Distribution: Success/Failure Experiments

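The binomial distribution models the number of successes when you repeat an independent experiment with two outcomes (success/failure) a fixed number of times, with the same success probability each time.

Parameters: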

n = number of trials
p = probability of success on each trial

Question: If you flip a coin 10 times, what’s the probability of getting exactly 7 heads?

Mathematical Formula

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

Where

\binom{n}{k} = \frac{n!}{k!(n-k)!}

is “n choose k”

from scipy import stats
import math

def binomial_probability(n, k, p):
    """Calculate P(X = k) for binomial distribution."""
    # n choose k (math.comb returns an exact integer; requires Python 3.8+)
    combinations = math.comb(n, k)
    # Probability of k successes and n - k failures, in any order
    return combinations * (p ** k) * ((1 - p) ** (n - k))

# P(exactly 7 heads in 10 flips)
p_7_heads = binomial_probability(n=10, k=7, p=0.5)
print(f"P(7 heads in 10 flips): {p_7_heads:.4f}")  # 0.1172

# Using scipy
p_7_scipy = stats.binom.pmf(k=7, n=10, p=0.5)
print(f"P(7 heads) via scipy: {p_7_scipy:.4f}")  # 0.1172
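
As a sanity check on the formula, the same probability can be estimated by simulation. This is only a sketch: the seed and the number of simulated experiments are arbitrary choices, and it uses NumPy's `default_rng` generator rather than `np.random.seed`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Simulate 200,000 experiments of 10 fair coin flips each;
# each entry is the number of heads in one experiment.
heads = rng.binomial(n=10, p=0.5, size=200_000)

simulated = np.mean(heads == 7)
exact = stats.binom.pmf(7, n=10, p=0.5)

print(f"Simulated P(7 heads): {simulated:.4f}")
print(f"Exact P(7 heads):     {exact:.4f}")
```

The two numbers should agree to about two or three decimal places; the remaining gap is sampling noise, which shrinks as the number of simulated experiments grows.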

Visualizing the Binomial Distribution

n = 10
p = 0.5

k_values = range(0, n + 1)
probabilities = [stats.binom.pmf(k, n, p) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probabilities, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Heads')
plt.ylabel('Probability')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xticks(k_values)
plt.show()

Real-World Example: Website Conversion

# Your website has a 3% conversion rate
# 100 people visit today
# What's the probability of 5 or more conversions?

n = 100
p = 0.03

# P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 = 1 - stats.binom.cdf(4, n, p)
print(f"P(5+ conversions): {p_at_least_5:.1%}")  # 18.2%

# Expected conversions
expected = n * p
print(f"Expected conversions: {expected}")  # 3.0
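
The expected value alone hides how much the daily count varies. For a binomial, the standard deviation is sqrt(n·p·(1-p)). A short sketch with the same n and p (the range "1 to 5 conversions" is just an illustrative choice):

```python
import math
from scipy import stats

n, p = 100, 0.03

mean = n * p                       # expected conversions
std = math.sqrt(n * p * (1 - p))   # binomial standard deviation
scipy_std = stats.binom.std(n, p)  # same value, computed by scipy

# Probability of seeing between 1 and 5 conversions (inclusive):
# P(X <= 5) - P(X <= 0)
p_1_to_5 = stats.binom.cdf(5, n, p) - stats.binom.cdf(0, n, p)

print(f"Mean: {mean:.1f}, std dev: {std:.2f}")
print(f"P(1 to 5 conversions): {p_1_to_5:.1%}")
```

With a standard deviation of roughly 1.7 around a mean of 3, days with 1 or 5 conversions are entirely ordinary.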

Practice: Quality Control

A factory produces items with a 2% defect rate. In a batch of 50 items:

What’s the probability of exactly 0 defects?
What’s the probability of 3 or more defects?
How many defects do you expect?

n, p = 50, 0.02

# 1. P(X = 0)
p_zero = stats.binom.pmf(0, n, p)
print(f"P(0 defects): {p_zero:.1%}")  # 36.4%

# 2. P(X >= 3)
p_three_plus = 1 - stats.binom.cdf(2, n, p)
print(f"P(3+ defects): {p_three_plus:.1%}")  # 7.8%

# 3. Expected defects
expected = n * p
print(f"Expected defects: {expected}")  # 1.0

The Normal Distribution: The Bell Curve

This is the most important distribution in statistics. It appears, at least approximately, in many places:

Human heights and weights
Test scores
Measurement errors
Stock price changes (roughly; real returns have heavier tails)
IQ scores

Parameters:

μ (mu) = mean (center of the bell)
σ (sigma) = standard deviation (width of the bell)

Mathematical Formula

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}

# Generate normally distributed data
mu = 100  # mean
sigma = 15  # standard deviation

# IQ scores follow this distribution
iq_scores = np.random.normal(mu, sigma, 10000)

plt.figure(figsize=(10, 5))
plt.hist(iq_scores, bins=50, density=True, alpha=0.7, edgecolor='black')

# Overlay theoretical curve
x = np.linspace(50, 150, 1000)
y = stats.norm.pdf(x, mu, sigma)
plt.plot(x, y, 'r-', linewidth=2, label='Theoretical')

plt.xlabel('IQ Score')
plt.ylabel('Probability Density')
plt.title('Normal Distribution of IQ Scores (μ=100, σ=15)')
plt.legend()
plt.show()

The 68-95-99.7 Rule (Empirical Rule)

One of the most useful facts in statistics:

Range	Percentage of Data
μ ± 1σ	68%
μ ± 2σ	95%
μ ± 3σ	99.7%

# Verify with IQ scores
within_1_std = np.mean(np.abs(iq_scores - mu) <= sigma)
within_2_std = np.mean(np.abs(iq_scores - mu) <= 2 * sigma)
within_3_std = np.mean(np.abs(iq_scores - mu) <= 3 * sigma)

print(f"Within 1 std (85-115): {within_1_std:.1%}")   # ~68%
print(f"Within 2 std (70-130): {within_2_std:.1%}")   # ~95%
print(f"Within 3 std (55-145): {within_3_std:.1%}")   # ~99.7%
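
The percentages in the rule are rounded. The exact values come straight from the normal CDF, as this small sketch shows:

```python
from scipy import stats

# P(|Z| <= k) for a standard normal is cdf(k) - cdf(-k).
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"Within {k} std dev: {prob:.2%}")
# Within 1 std dev: 68.27%
# Within 2 std dev: 95.45%
# Within 3 std dev: 99.73%
```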

Z-Scores: Standardizing Any Normal Distribution

A z-score tells you how many standard deviations a value is from the mean.

z = \frac{x - \mu}{\sigma}

def z_score(x, mu, sigma):
    """Convert value to z-score."""
    return (x - mu) / sigma

# How exceptional is an IQ of 130?
iq = 130
z = z_score(iq, mu=100, sigma=15)
print(f"IQ of 130 has z-score: {z:.2f}")  # 2.00

# This means 130 is 2 standard deviations above average
# Only about 2.3% of people score higher
percentile = stats.norm.cdf(z) * 100
print(f"Percentile: {percentile:.1f}%")  # 97.7%
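
Z-scores also let you compare values measured on completely different scales. The sketch below compares an IQ of 125 with a height of 74 inches, using the IQ parameters above and the adult male height parameters from Exercise 1 of this module (mean 69.1 in, std dev 2.9 in). The two example values are arbitrary.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

z_iq = z_score(125, mu=100, sigma=15)        # measured in IQ points
z_height = z_score(74, mu=69.1, sigma=2.9)   # measured in inches

print(f"IQ 125 z-score:       {z_iq:.2f}")      # 1.67
print(f"Height 74 in z-score: {z_height:.2f}")  # 1.69
```

Although the raw numbers are not comparable, the z-scores are: both values sit about 1.7 standard deviations above their respective means, with the height being very slightly more unusual.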

Calculating Probabilities

# Normal distribution with μ=100, σ=15

# P(IQ > 130)
p_above_130 = 1 - stats.norm.cdf(130, loc=100, scale=15)
print(f"P(IQ > 130): {p_above_130:.2%}")  # 2.28%

# P(IQ between 85 and 115)
p_middle = stats.norm.cdf(115, 100, 15) - stats.norm.cdf(85, 100, 15)
print(f"P(85 < IQ < 115): {p_middle:.2%}")  # 68.27%

# What IQ score is at the 99th percentile?
iq_99 = stats.norm.ppf(0.99, loc=100, scale=15)
print(f"99th percentile IQ: {iq_99:.1f}")  # 134.9

Why Is the Normal Distribution Everywhere?

The Central Limit Theorem (CLT) explains this magic:

Central Limit Theorem: When you add up many independent random variables with finite variance, their sum (once centered and scaled) tends toward a normal distribution - largely regardless of the original distributions.

Demonstration

# Roll a single die - definitely NOT normal
single_die = np.random.randint(1, 7, 10000)

# Sum of 2 dice - starting to look different
sum_2_dice = np.random.randint(1, 7, size=(10000, 2)).sum(axis=1)

# Sum of 10 dice - getting bell-shaped
sum_10_dice = np.random.randint(1, 7, size=(10000, 10)).sum(axis=1)

# Sum of 30 dice - nearly perfect normal!
sum_30_dice = np.random.randint(1, 7, size=(10000, 30)).sum(axis=1)

def integer_bins(values):
    """One bin per integer outcome, so no bin straddles two values."""
    return np.arange(values.min() - 0.5, values.max() + 1.5)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].hist(single_die, bins=integer_bins(single_die), edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Single Die (Uniform)')

axes[0, 1].hist(sum_2_dice, bins=integer_bins(sum_2_dice), edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Sum of 2 Dice')

axes[1, 0].hist(sum_10_dice, bins=integer_bins(sum_10_dice), edgecolor='black', alpha=0.7)
axes[1, 0].set_title('Sum of 10 Dice')

axes[1, 1].hist(sum_30_dice, bins=integer_bins(sum_30_dice), edgecolor='black', alpha=0.7)
axes[1, 1].set_title('Sum of 30 Dice (Nearly Normal!)')

plt.tight_layout()
plt.show()

This is why heights are approximately normally distributed: adult height is influenced by many genes and environmental factors, each adding a small, roughly independent effect. Sum of many small random things = normal distribution.
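
The CLT also says which normal distribution to expect. One fair die has mean 3.5 and variance 35/12, and for independent dice both add up, so the sum of 30 dice should be close to a normal with mean 105 and standard deviation sqrt(30 · 35/12) ≈ 9.35. The following sketch checks this; the seed, the sample count, and the threshold of 120 are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed

n_dice = 30
# One die: mean 3.5, variance 35/12. Sums of independent dice add both.
mu = 3.5 * n_dice
sigma = np.sqrt(n_dice * 35 / 12)

# 100,000 simulated sums of 30 dice (upper bound 7 is exclusive).
sums = rng.integers(1, 7, size=(100_000, n_dice)).sum(axis=1)

print(f"Simulated mean/std: {sums.mean():.2f} / {sums.std():.2f}")
print(f"Theory mean/std:    {mu:.2f} / {sigma:.2f}")

# Compare a tail probability: P(sum >= 120).
# Using 119.5 is a continuity correction for approximating integer sums.
p_sim = np.mean(sums >= 120)
p_normal = 1 - stats.norm.cdf(119.5, loc=mu, scale=sigma)
print(f"P(sum >= 120): simulated {p_sim:.3f}, normal approx {p_normal:.3f}")
```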

Other Important Distributions

Poisson Distribution: Rare Events Over Time

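How many customers arrive per hour? How many defects per batch? How many emails per day? The Poisson distribution models such counts when events occur independently at a constant average rate.

Parameter: λ (lambda) = average rate of events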

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}

# Average 5 customers per hour
lambda_rate = 5

# Probability of exactly 3 customers in an hour
p_3 = stats.poisson.pmf(3, lambda_rate)
print(f"P(3 customers): {p_3:.2%}")  # 14.04%

# Probability of 10 or more
p_10_plus = 1 - stats.poisson.cdf(9, lambda_rate)
print(f"P(10+ customers): {p_10_plus:.2%}")  # 3.18%

# Visualize
k_values = range(0, 15)
probs = [stats.poisson.pmf(k, lambda_rate) for k in k_values]

plt.figure(figsize=(10, 5))
plt.bar(k_values, probs, edgecolor='black', alpha=0.7)
plt.xlabel('Number of Customers')
plt.ylabel('Probability')
plt.title(f'Poisson Distribution (λ={lambda_rate})')
plt.show()

Exponential Distribution: Time Between Events

If events occur at rate λ, how long until the next one?

# Average 5 customers per hour = 1 customer per 12 minutes average
lambda_rate = 5  # per hour
avg_wait = 60 / lambda_rate  # 12 minutes

# Probability of waiting more than 20 minutes
p_wait_20 = 1 - stats.expon.cdf(20, scale=avg_wait)
print(f"P(wait > 20 min): {p_wait_20:.2%}")  # 18.89%

# Wait time that 90% of the gaps between customers fall under
time_90 = stats.expon.ppf(0.90, scale=avg_wait)
print(f"90% of waits are shorter than: {time_90:.1f} minutes")  # 27.6 min
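
The Poisson and exponential distributions describe the same process from two angles: if the gaps between arrivals are exponential with mean 12 minutes, the number of arrivals per hour is Poisson with λ = 5. The simulation below is a sketch of that link, with an arbitrary seed and sample size.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

lambda_rate = 5                 # customers per hour
avg_wait = 60 / lambda_rate     # minutes between customers

# Draw exponential gaps and accumulate them into arrival times (minutes).
gaps = rng.exponential(scale=avg_wait, size=200_000)
arrival_times = np.cumsum(gaps)

# Count arrivals in each complete 60-minute window.
n_hours = int(arrival_times[-1] // 60)
hour_index = (arrival_times // 60).astype(int)
counts = np.bincount(hour_index, minlength=n_hours)[:n_hours]

print(f"Mean arrivals per hour: {counts.mean():.2f}")  # should be close to 5
print(f"Variance of arrivals:   {counts.var():.2f}")   # Poisson: variance equals mean
print(f"P(wait > 20 min):       {np.mean(gaps > 20):.2%}")
```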

Mini-Project: Quality Control System

Build a complete quality control system for the ball bearing factory.

import numpy as np
from scipy import stats

class QualityControlSystem:
    """
    Quality control system using normal distribution.
    """
    
    def __init__(self, target, std_dev, tolerance):
        """
        Initialize QC system.
        
        target: desired measurement (e.g., 10mm)
        std_dev: expected standard deviation in production
        tolerance: acceptable deviation from target (e.g., ±0.1mm)
        """
        self.target = target
        self.std_dev = std_dev
        self.tolerance = tolerance
        self.lower_limit = target - tolerance
        self.upper_limit = target + tolerance
        
    def expected_defect_rate(self):
        """Calculate expected percentage of out-of-spec products."""
        # P(X < lower) + P(X > upper)
        p_below = stats.norm.cdf(self.lower_limit, self.target, self.std_dev)
        p_above = 1 - stats.norm.cdf(self.upper_limit, self.target, self.std_dev)
        return p_below + p_above
    
    def analyze_batch(self, measurements):
        """Analyze a batch of measurements."""
        n = len(measurements)
        mean = np.mean(measurements)
        std = np.std(measurements)
        
        # Count out-of-spec
        defects = np.sum((measurements < self.lower_limit) | 
                         (measurements > self.upper_limit))
        defect_rate = defects / n
        
        # Check if process is in control
        # Mean should be within 2 standard errors of target
        std_error = std / np.sqrt(n)
        z_score = (mean - self.target) / std_error
        
        results = {
            'batch_size': n,
            'mean': mean,
            'std_dev': std,
            'defects': defects,
            'defect_rate': defect_rate,
            'z_score': z_score,
            'process_in_control': abs(z_score) < 2
        }
        
        return results
    
    def print_report(self, results):
        """Print a formatted QC report."""
        print("\n" + "=" * 50)
        print("QUALITY CONTROL REPORT")
        print("=" * 50)
        print(f"Batch Size: {results['batch_size']}")
        print(f"Target: {self.target:.4f} ± {self.tolerance:.4f}")
        print(f"Specification Limits: [{self.lower_limit:.4f}, {self.upper_limit:.4f}]")
        print("-" * 50)
        print(f"Batch Mean: {results['mean']:.4f}")
        print(f"Batch Std Dev: {results['std_dev']:.4f}")
        print(f"Defects: {results['defects']} ({results['defect_rate']:.2%})")
        print(f"Expected Defect Rate: {self.expected_defect_rate():.2%}")
        print("-" * 50)
        print(f"Z-Score: {results['z_score']:.2f}")
        status = "IN CONTROL" if results['process_in_control'] else "OUT OF CONTROL"
        print(f"Process Status: {status}")
        print("=" * 50)


# Create QC system
qc = QualityControlSystem(
    target=10.0,      # 10mm target diameter
    std_dev=0.05,     # 0.05mm expected variation
    tolerance=0.1     # ±0.1mm acceptable
)

# Expected defect rate
print(f"Expected defect rate: {qc.expected_defect_rate():.2%}")

# Simulate a good batch
np.random.seed(42)
good_batch = np.random.normal(10.0, 0.05, 100)
results = qc.analyze_batch(good_batch)
qc.print_report(results)

# Simulate a problematic batch (shifted mean)
bad_batch = np.random.normal(10.08, 0.05, 100)  # Mean shifted by 0.08mm
results_bad = qc.analyze_batch(bad_batch)
qc.print_report(results_bad)

Output (illustrative; your exact values may differ):

Expected defect rate: 4.55%

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0024
Batch Std Dev: 0.0496
Defects: 4 (4.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 0.49
Process Status: IN CONTROL
==================================================

==================================================
QUALITY CONTROL REPORT
==================================================
Batch Size: 100
Target: 10.0000 ± 0.1000
Specification Limits: [9.9000, 10.1000]
--------------------------------------------------
Batch Mean: 10.0822
Batch Std Dev: 0.0518
Defects: 33 (33.00%)
Expected Defect Rate: 4.55%
--------------------------------------------------
Z-Score: 15.88
Process Status: OUT OF CONTROL
==================================================
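
The roughly one-third defect rate in the shifted batch is what the normal model predicts. With the mean at 10.08 mm, the upper limit of 10.1 mm is only 0.4 standard deviations away. The sketch below recomputes the expected defect rate for a few hypothetical mean shifts; `defect_rate` is a helper introduced here for illustration, not a method of the class above.

```python
from scipy import stats

target, std_dev, tolerance = 10.0, 0.05, 0.1
lower, upper = target - tolerance, target + tolerance

def defect_rate(mean, std=std_dev):
    """P(measurement falls outside [lower, upper]) for a Normal(mean, std) process."""
    below = stats.norm.cdf(lower, loc=mean, scale=std)
    above = stats.norm.sf(upper, loc=mean, scale=std)  # sf is 1 - cdf
    return below + above

for shifted_mean in (10.00, 10.02, 10.05, 10.08):
    rate = defect_rate(shifted_mean)
    print(f"Mean {shifted_mean:.2f} mm -> expected defect rate {rate:.1%}")
```

Even a shift that looks small relative to the tolerance quickly pushes a large share of the output past one of the specification limits.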

Practice Exercises

Exercise 1: Height Analysis

# Adult male heights in the US follow N(69.1, 2.9) inches
# (mean 69.1 inches, std dev 2.9 inches)

# Calculate:
# 1. What percentage of men are over 6 feet (72 inches)?
# 2. What percentage are between 5'6" (66 in) and 6'0" (72 in)?
# 3. How tall do you need to be to be in the top 5%?
# 4. What is the z-score for someone 6'4" (76 inches)?

Solution

mu = 69.1
sigma = 2.9

# 1. P(height > 72)
p_over_6ft = 1 - stats.norm.cdf(72, mu, sigma)
print(f"Over 6 feet: {p_over_6ft:.1%}")  # 15.9%

# 2. P(66 < height < 72)
p_between = stats.norm.cdf(72, mu, sigma) - stats.norm.cdf(66, mu, sigma)
print(f"Between 5'6\" and 6'0\": {p_between:.1%}")  # 69.9%

# 3. Top 5% height
top_5_height = stats.norm.ppf(0.95, mu, sigma)
print(f"Top 5% starts at: {top_5_height:.1f} inches")  # 73.9 inches (6'2")

# 4. Z-score for 6'4"
z_76 = (76 - mu) / sigma
print(f"Z-score for 6'4\": {z_76:.2f}")  # 2.38
print(f"Percentile: {stats.norm.cdf(z_76) * 100:.1f}%")  # 99.1%

Exercise 2: Server Requests

# A web server receives an average of 100 requests per minute.
# Requests follow a Poisson distribution.

# Calculate:
# 1. P(exactly 100 requests in a minute)
# 2. P(more than 120 requests in a minute)
# 3. For capacity planning, what number of requests per minute
#    will only be exceeded 1% of the time?

Solution

lambda_rate = 100

# 1. P(X = 100)
p_exactly_100 = stats.poisson.pmf(100, lambda_rate)
print(f"P(exactly 100): {p_exactly_100:.2%}")  # 3.99%

# 2. P(X > 120)
p_over_120 = 1 - stats.poisson.cdf(120, lambda_rate)
print(f"P(over 120): {p_over_120:.2%}")  # 2.27%

# 3. 99th percentile
capacity_99 = stats.poisson.ppf(0.99, lambda_rate)
print(f"99% of minutes have at most {capacity_99:.0f} requests")  # 124

Common Mistakes to Avoid

Mistake 1: Assuming Everything is Normal

Not all data follows a normal distribution. Income data is heavily right-skewed. Time-to-event data often follows exponential distributions. Always visualize your data before assuming normality.

Mistake 2: Misusing the 68-95-99.7 Rule

This rule ONLY applies to normal distributions. Applying it to skewed data will give wrong answers. For non-normal data, use Chebyshev’s inequality: at least 75% of data is within 2 std devs, regardless of distribution shape (as long as the variance is finite).
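
A quick way to see the difference is to measure coverage on skewed data. The sketch below uses simulated exponential data (an illustrative choice, with an arbitrary seed) and compares the observed share within k standard deviations to the Chebyshev lower bound 1 - 1/k².

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Right-skewed data: exponential with mean 1 (illustrative choice).
data = rng.exponential(scale=1.0, size=100_000)
mu, sigma = data.mean(), data.std()

observed = {}
for k in (1, 2, 3):
    observed[k] = np.mean(np.abs(data - mu) <= k * sigma)
    chebyshev_bound = 1 - 1 / k**2
    print(f"k={k}: observed {observed[k]:.1%}, Chebyshev guarantees >= {chebyshev_bound:.1%}")
```

For this data the share within one standard deviation is well above the 68% the normal rule would suggest, while the Chebyshev bound holds at every k. The bound is loose, but it never fails.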

Mistake 3: Confusing PDF and CDF

The PDF gives the relative likelihood at a point (technically, density). The CDF gives the probability of being less than or equal to a value. P(X = exact value) is always 0 for continuous distributions.
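
One telltale sign that a PDF value is not a probability is that it can exceed 1. A small sketch using the bearing model from the opening example, Normal(10.0 mm, 0.05 mm):

```python
from scipy import stats

# Bearing diameters from the opening example: Normal(10.0 mm, 0.05 mm)
bearing = stats.norm(loc=10.0, scale=0.05)

density_at_target = bearing.pdf(10.0)   # a density (per mm), not a probability
p_below_target = bearing.cdf(10.0)      # P(X <= 10.0)
p_near_target = bearing.cdf(10.01) - bearing.cdf(9.99)  # P(9.99 <= X <= 10.01)

print(f"PDF at 10.0 mm: {density_at_target:.2f}")
print(f"P(X <= 10.0): {p_below_target:.2f}")
print(f"P(9.99 <= X <= 10.01): {p_near_target:.2%}")
```

The density of about 8 per mm is perfectly valid: probabilities come from the area under the curve over an interval, which is what the difference of two CDF values computes.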

Interview Questions

Question 1: Normal Distribution Application (Google)

Question: Website response times follow a normal distribution with mean 200ms and std dev 50ms. What percentage of requests take more than 300ms?

Answer: About 2.3%

from scipy import stats
p_slow = 1 - stats.norm.cdf(300, loc=200, scale=50)
# Or using z-score: z = (300-200)/50 = 2
# P(Z > 2) ≈ 0.0228
print(f"{p_slow:.2%}")  # 2.28%

The 68-95-99.7 rule gives us a quick check: 300ms is 2 standard deviations above mean, so roughly 2.5% should be above that.

Question 2: Choosing the Right Distribution (Amazon)

Question: You’re modeling these scenarios. Which distribution would you use for each?

Number of customers arriving per hour
Whether a user clicks an ad (yes/no)
Time until a server fails
Heights of basketball players

Answer:

Poisson - Counts of events in fixed intervals
Bernoulli (single trial) or Binomial (many users) - Binary outcomes
Exponential - Time until an event (memoryless process)
Normal - Continuous measurements of natural phenomena

For height, you might also consider that basketball players are selected to be tall, so it could be a truncated normal!

Question 3: Central Limit Theorem (Facebook/Meta)

Question: User session times are heavily right-skewed (not normal). You calculate the average session time each day for 30 days. What distribution does the sample mean follow?

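Answer: Approximately normal!

Thanks to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal regardless of the underlying distribution shape, as long as:

Each daily average is based on a sufficiently large number of independent sessions (n ≥ 30 is a common rule of thumb; heavily skewed data may need more). Note that n here is the number of sessions per day, not the 30 days.
The original distribution has finite variance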

This is why we can use confidence intervals and hypothesis tests based on the normal distribution even when the underlying data isn’t normal.
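
A simulation makes the effect visible. The sketch below assumes, purely for illustration, that session times are exponential with a mean of 120 seconds, and tracks how the skewness of the daily mean shrinks as more sessions are averaged per day.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # arbitrary seed

# Hypothetical right-skewed session times: exponential, mean 120 s.
skews = {}
for n in (1, 5, 30, 200):
    # 20,000 simulated "days", each averaging n sessions.
    daily_means = rng.exponential(scale=120, size=(20_000, n)).mean(axis=1)
    skews[n] = stats.skew(daily_means)
    print(f"n={n:3d}: skewness of the daily mean = {skews[n]:.2f}")
```

A skewness near 0 indicates a symmetric, bell-like shape. For exponential data the skewness of the mean falls as 2/sqrt(n), so it is still noticeable at n = 30 and small by n = 200.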

Question 4: Percentiles in Practice (Netflix)

Question: Video start times follow a log-normal distribution (right-skewed). The P50 is 1.2 seconds and P95 is 4.8 seconds. What does this tell you about user experience?

Answer:

Half of users experience start times of 1.2s or less (good!)
5% of users wait more than 4.8 seconds (potentially frustrating)
The ratio P95/P50 = 4 indicates significant variability

For right-skewed metrics like latency, the P95 or P99 is often more important than the mean because it captures the experience of the “unlucky” users. A 4x difference between median and P95 suggests there are edge cases worth investigating (slow CDNs, distant users, etc.).
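
If the log-normal assumption holds exactly, the two percentiles pin down the whole distribution. Since ln(X) is normal with parameters μ and σ, P50 = exp(μ) and P95 = exp(μ + z₀.₉₅·σ). The sketch below solves for both and derives the mean and P99; these derived numbers are only as good as the log-normal assumption.

```python
import math
from scipy import stats

p50, p95 = 1.2, 4.8  # seconds, from the question

# For a log-normal, ln(X) ~ Normal(mu, sigma):
#   P50 = exp(mu)  and  P95 = exp(mu + z95 * sigma)
z95 = stats.norm.ppf(0.95)
mu = math.log(p50)
sigma = math.log(p95 / p50) / z95

mean_start = math.exp(mu + sigma**2 / 2)           # mean of a log-normal
p99 = math.exp(mu + stats.norm.ppf(0.99) * sigma)  # 99th percentile

# Cross-check with scipy's parameterization: s = sigma, scale = exp(mu).
p95_check = stats.lognorm(s=sigma, scale=math.exp(mu)).ppf(0.95)

print(f"sigma of log start time: {sigma:.3f}")
print(f"Mean start time: {mean_start:.2f} s (median is {p50} s)")
print(f"P99 start time:  {p99:.2f} s")
```

The mean lands well above the median, which is exactly why the mean alone is a misleading summary for skewed latency data.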

Practice Challenge

Challenge: Distribution Fitting

You have website session data (simulated below). Determine which distribution best fits it:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated session durations (in seconds)
np.random.seed(42)
sessions = np.random.exponential(scale=120, size=1000)  # Unknown to you!

# Your task:
# 1. Visualize the data with a histogram
# 2. Calculate summary statistics
# 3. Fit different distributions and compare
# 4. Determine which distribution fits best

# Hint: Try normal, exponential, and log-normal

# Starter code:
plt.figure(figsize=(12, 4))

# Histogram
plt.subplot(1, 3, 1)
plt.hist(sessions, bins=50, density=True, alpha=0.7)
plt.title('Data Distribution')
plt.xlabel('Session Duration (s)')

# Q-Q plot for normal
plt.subplot(1, 3, 2)
stats.probplot(sessions, dist="norm", plot=plt)
plt.title('Normal Q-Q Plot')

# Q-Q plot for exponential
plt.subplot(1, 3, 3)
stats.probplot(sessions, dist="expon", plot=plt)
plt.title('Exponential Q-Q Plot')

plt.tight_layout()
plt.show()

# Fit distributions and compare

Solution:

# 1. Visual inspection shows right-skewed data
# 2. Summary stats
print(f"Mean: {np.mean(sessions):.1f}s")
print(f"Median: {np.median(sessions):.1f}s")
print(f"Std: {np.std(sessions):.1f}s")
print(f"Skewness: {stats.skew(sessions):.2f}")  # Positive = right-skewed

# 3. Fit distributions
# Normal
norm_params = stats.norm.fit(sessions)
# Exponential
exp_params = stats.expon.fit(sessions)
# Log-normal
lognorm_params = stats.lognorm.fit(sessions)

# 4. Compare using Kolmogorov-Smirnov test
# Null hypothesis: data follows the distribution
# Lower p-value = worse fit

ks_norm = stats.kstest(sessions, 'norm', args=norm_params)
ks_exp = stats.kstest(sessions, 'expon', args=exp_params)
ks_lognorm = stats.kstest(sessions, 'lognorm', args=lognorm_params)

print(f"\nKS Test p-values:")
print(f"Normal: {ks_norm.pvalue:.4f}")      # Low - bad fit
print(f"Exponential: {ks_exp.pvalue:.4f}")  # High - good fit!
print(f"Log-normal: {ks_lognorm.pvalue:.4f}")

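# Exponential fits best: its Q-Q plot is the straightest and its KS p-value the highest.
# Consistency check: mean ≈ std dev, which is a property of the exponential.
# Caveat: the parameters were fitted on the same data, so these p-values are
# optimistic; use them to compare candidates, not as formal tests.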

📝 Practice Exercises

Exercise 1

Work with normal distribution and z-scores

Exercise 2

Apply binomial distribution to A/B testing

Exercise 3

Model customer arrivals with Poisson distribution

Exercise 4

Real-world: Quality control with distributions

Key Takeaways

Distribution Types

Discrete: Countable outcomes (die rolls, counts)
Continuous: Any value in a range (measurements)
Each distribution has parameters that define its shape

The Normal Distribution

Defined by mean (μ) and standard deviation (σ)
68-95-99.7 rule for quick calculations
Appears everywhere due to Central Limit Theorem

Key Distributions

Uniform: Equal probability (dice, random selection)
Binomial: Success/failure experiments (conversions, defects)
Normal: Continuous measurements (heights, errors)
Poisson: Count of rare events (arrivals, defects)

Z-Scores

Standardize any normal distribution
z = (x - μ) / σ
Allows comparison across different scales
Standard normal has μ=0, σ=1

Interview Prep: Common Questions

Distribution Interview Questions

Q: When would you use Poisson vs Binomial distribution?

Poisson: Counting events in continuous time/space where events are rare (website visits, defects). Binomial: Fixed number of trials with binary outcomes (10 coin flips, 100 users converting).

Q: How do you check if data is normally distributed?

Visual: histogram, Q-Q plot. Statistical: Shapiro-Wilk test, Anderson-Darling test. Rule of thumb: check that |skewness| < 2 and kurtosis < 7.
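
A minimal sketch of the numeric checks, run on two simulated samples (the distributions, sizes, and seed are illustrative assumptions). It uses `scipy.stats.shapiro`, `scipy.stats.skew`, and `scipy.stats.kurtosis`, which reports excess kurtosis (0 for a normal) by default.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)  # arbitrary seed

samples = {
    "normal": rng.normal(loc=100, scale=15, size=500),
    "exponential": rng.exponential(scale=120, size=500),
}

results = {}
for name, data in samples.items():
    shapiro_stat, shapiro_p = stats.shapiro(data)
    results[name] = {
        "skew": stats.skew(data),
        "excess_kurtosis": stats.kurtosis(data),
        "shapiro_p": shapiro_p,
    }
    print(f"{name:12s} skew={results[name]['skew']:5.2f}  "
          f"excess kurtosis={results[name]['excess_kurtosis']:5.2f}  "
          f"Shapiro-Wilk p={shapiro_p:.4f}")
```

A very small Shapiro-Wilk p-value is evidence against normality. With large samples the test flags even tiny, harmless deviations, so it is best read alongside a histogram or Q-Q plot.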

Q: What is the Central Limit Theorem and why does it matter?

CLT states that the distribution of sample means approaches a normal distribution regardless of the population distribution (as long as it has finite variance), given large enough samples (n ≥ 30 is a common rule of thumb). It’s why we can use normal-based methods even when data isn’t normally distributed.

Q: A process has 2% defect rate. What distribution models the number of defects in a batch of 50?

Binomial with n=50, p=0.02. Expected defects = np = 1. Could approximate with Poisson(λ=1) since n is large and p is small.
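
The quality of the Poisson approximation is easy to check directly. This sketch compares the two PMFs for the first few defect counts:

```python
from scipy import stats

n, p = 50, 0.02
lam = n * p  # Poisson rate matching the binomial mean

max_gap = 0.0
for k in range(4):
    exact = stats.binom.pmf(k, n, p)
    approx = stats.poisson.pmf(k, lam)
    max_gap = max(max_gap, abs(exact - approx))
    print(f"P(X={k}): binomial {exact:.4f}, Poisson {approx:.4f}")

print(f"Largest difference: {max_gap:.4f}")
```

The probabilities differ by less than half a percentage point, and the approximation improves as n grows and p shrinks with n·p held fixed.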

Common Pitfalls

Distribution Mistakes to Avoid:

Assuming Normality - Always check; many real-world distributions are skewed or heavy-tailed
Confusing Parameters - Variance (σ²) vs Standard Deviation (σ); Population vs Sample
Ignoring Distribution Shape - Mean/std alone don’t fully describe a distribution; visualize first
Wrong Distribution Choice - Using normal for bounded data, using binomial for continuous outcomes
CLT Misapplication - CLT applies to sample means, not individual observations

Connection to Machine Learning

Distribution Concept	ML Application
Normal distribution	Gaussian noise, regularization, Gaussian Naive Bayes
Central Limit Theorem	Why batch statistics work, confidence in predictions
Z-scores	Feature standardization, batch normalization
Binomial	Classification evaluation, confidence intervals
Poisson	Count prediction, event modeling

ML Connection: When you see “Gaussian” in ML papers, it means “normal distribution.” Gaussian processes, Gaussian mixture models, and Gaussian noise all rely on properties of the normal distribution you just learned!
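
As an example of the z-score row in the table, here is a minimal sketch of feature standardization with NumPy. The two features (height in inches and response time in milliseconds) and their parameters are hypothetical, chosen only to put columns on very different scales.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed

# Two hypothetical features on very different scales:
# height in inches and response time in milliseconds.
X = np.column_stack([
    rng.normal(69.1, 2.9, size=1000),
    rng.normal(200, 50, size=1000),
])

# Z-score each column: subtract its mean, divide by its std dev.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("Original means:", X.mean(axis=0).round(1))
print("Standardized means:", X_std.mean(axis=0).round(6))
print("Standardized std devs:", X_std.std(axis=0).round(6))
```

After standardization every column has mean 0 and standard deviation 1, so no feature dominates a distance or gradient computation just because of its units. In a real pipeline the mean and standard deviation would be computed on the training set only and reused for new data.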

Coming up next: We’ll learn about Statistical Inference - how to draw conclusions about entire populations from just samples. This is how polls predict elections and A/B tests drive decisions.

