Diffusion Models
The Core Idea
Diffusion models work in two stages:
- Forward process: Gradually add noise to data until it becomes pure noise
- Reverse process: Learn to denoise step by step, recovering the original data
A helpful analogy is ink diffusing in water:
- Forward: Dropping ink into water (the ink diffuses until the water is uniformly colored)
- Reverse: Learning to “un-diffuse” the ink back to its original drop
Mathematical Foundation
Forward Diffusion (Adding Noise)
At each step $t$, we add Gaussian noise:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$

where $\beta_t$ is the noise schedule. We can jump directly to any step $t$:

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right)$$

where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
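The closed-form jump to step $t$ is what makes training cheap: we never have to simulate the chain one step at a time. Here is a minimal PyTorch sketch of the forward process (the schedule values and the name `q_sample` are illustrative, not from a specific library):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule beta_t
alphas = 1.0 - betas                        # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)   # alpha-bar_t = product of alpha_s up to t

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) in one shot via the closed form."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
```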
Reverse Process (Learning to Denoise)
We train a neural network $\epsilon_\theta(x_t, t)$ to predict the noise $\epsilon$ added at step $t$:

$$\mathcal{L} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\left\| \epsilon - \epsilon_\theta(x_t, t) \right\|^2\right]$$

Training Loop
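A minimal training step, reusing the schedule and `q_sample` helper from the sketch above; `model` is assumed to be a network (typically a U-Net) that takes `(x_t, t)`:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x0):
    """One DDPM training step on a batch of clean images x0."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)  # random timesteps
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)                 # noise the batch to step t
    loss = F.mse_loss(model(x_t, t), noise)      # simple DDPM objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```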
Sampling (Generation)
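Generation runs the learned reverse chain from pure noise. A sketch of standard DDPM ancestral sampling, reusing the schedule tensors above with the common choice $\sigma_t^2 = \beta_t$:

```python
import torch

@torch.no_grad()
def sample(model, shape, device="cpu"):
    """DDPM ancestral sampling: start from pure noise, denoise step by step."""
    x = torch.randn(shape, device=device)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)                            # predicted noise at step t
        a, ab, b = alphas[t], alpha_bars[t], betas[t]
        x = (x - b / (1.0 - ab).sqrt() * eps) / a.sqrt()   # posterior mean
        if t > 0:
            x = x + b.sqrt() * torch.randn_like(x)         # add sigma_t * z, sigma_t^2 = beta_t
    return x
```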
Classifier-Free Guidance
Classifier-free guidance enables controlling generation with text or class labels:

$$\tilde{\epsilon} = \epsilon_\theta(x_t, \varnothing) + w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)$$

where $w$ is the guidance scale (typically 7.5 for Stable Diffusion), $c$ is the conditioning input, and $\varnothing$ is the unconditional (null) input.
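In code, guidance is just a weighted blend of two forward passes. A sketch assuming a conditional model that takes an extra conditioning argument (the signature is illustrative):

```python
import torch

@torch.no_grad()
def guided_eps(model, x_t, t, cond, null_cond, w=7.5):
    """Blend unconditional and conditional noise predictions (CFG)."""
    eps_uncond = model(x_t, t, null_cond)   # epsilon_theta(x_t, null)
    eps_cond = model(x_t, t, cond)          # epsilon_theta(x_t, c)
    return eps_uncond + w * (eps_cond - eps_uncond)
```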
Connection to Stable Diffusion
Stable Diffusion operates in latent space for efficiency (see the sketch after this list):
- VAE Encoder: Compress a 512×512 image to a 64×64 latent
- U-Net: Denoise in latent space (much cheaper)
- VAE Decoder: Expand latent back to image
- CLIP Text Encoder: Condition on text prompts
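To see all four components working together, here is a minimal generation sketch using the Hugging Face diffusers library; the model ID and settings are illustrative, assuming the weights are available on the Hub or locally:

```python
import torch
from diffusers import StableDiffusionPipeline

# The pipeline bundles the VAE, U-Net, scheduler, and CLIP text encoder.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

# guidance_scale is the classifier-free guidance weight w from above.
image = pipe("a watercolor fox in a forest", guidance_scale=7.5).images[0]
image.save("fox.png")
```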
Exercises
Exercise 1: MNIST Diffusion
Train a diffusion model on MNIST. Generate digit samples and visualize the denoising process.
Exercise 2: Noise Schedules
Implement and compare linear, cosine, and quadratic noise schedules.
Exercise 3: Conditional Diffusion
Add class conditioning to generate specific digits.