Residual & Skip Connections
The Depth Problem
In 2015, researchers tried training networks with 100+ layers, expecting that deeper would mean better. What actually happened: the deeper networks performed worse than the shallower ones. This was not overfitting; even the training error was higher. The deep networks simply could not learn.

The Residual Insight
The solution is beautifully simple: skip connections. Instead of learning the desired mapping $H(x)$ directly, the block learns the residual $F(x) = H(x) - x$, and the output is

$$y = F(x) + x$$

Why This Works
- Identity is easy: if the optimal mapping is the identity, the weights of $F$ just need to go to zero.
- Gradient highway: gradients flow straight back through the additive skip, so early layers still receive a useful learning signal.
- Ensemble effect: each block can refine its input or be effectively skipped, so the network behaves like an ensemble of many shallower paths.
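The section names no framework, so here is a minimal residual-block sketch in PyTorch; the class name `BasicBlock` and the fixed channel count are illustrative, not part of the original text.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: output = F(x) + x, where F(x) is two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                  # the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                          # add the residual to the input
        return self.relu(out)

x = torch.randn(2, 16, 32, 32)
print(BasicBlock(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```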
ResNet Architecture
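The original text gives no architecture details here, so the following is an illustrative small stack rather than a canonical ResNet: a convolutional stem, three stages of residual blocks (with a 1x1 projection on the skip whenever the spatial size or channel count changes), global average pooling, and a linear classifier. The names and sizes (`ResidualBlock`, `TinyResNet`, the stage widths) are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip; a 1x1 conv projects the skip when shapes change."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

class TinyResNet(nn.Module):
    """Stem -> three residual stages -> global average pool -> linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1, bias=False),
            nn.BatchNorm2d(16), nn.ReLU(inplace=True),
        )
        self.stages = nn.Sequential(
            ResidualBlock(16, 16), ResidualBlock(16, 16),
            ResidualBlock(16, 32, stride=2), ResidualBlock(32, 32),
            ResidualBlock(32, 64, stride=2), ResidualBlock(64, 64),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.stages(self.stem(x))
        x = x.mean(dim=(2, 3))                        # global average pooling
        return self.head(x)

print(TinyResNet()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```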
DenseNet: Dense Connections
Instead of adding the skip to the output, DenseNet concatenates the feature maps of all previous layers, so every layer receives everything computed before it:
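A minimal sketch of a dense block under the standard growth-rate formulation (every layer emits `growth_rate` new channels and reads the concatenation of everything before it); the layer counts and channel sizes here are illustrative.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all earlier feature maps."""
    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, 3, padding=1, bias=False),
            ))
            ch += growth_rate              # the next layer's input grows by the growth rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate, not add
            features.append(out)
        return torch.cat(features, dim=1)

print(DenseBlock(16)(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 64, 8, 8])
```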
U-Net: Skip Connections for Segmentation

U-Net combines an encoder-decoder architecture with skip connections that carry high-resolution encoder features across to the decoder, giving pixel-level predictions:
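A one-level U-Net sketch, reduced to a single downsampling step so the skip concatenation is easy to see; real U-Nets stack four or five such levels. The names `TinyUNet` and `double_conv` and all sizes are assumptions.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: encoder features are concatenated into the decoder."""
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc = double_conv(in_ch, 32)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = double_conv(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = double_conv(64, 32)           # 64 = 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        e = self.enc(x)                          # high-resolution encoder features
        b = self.bottleneck(self.down(e))
        u = self.up(b)
        d = self.dec(torch.cat([u, e], dim=1))   # skip connection by concatenation
        return self.head(d)                      # per-pixel class logits

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```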
Comparison

| Architecture | Connection Type | Best For |
|---|---|---|
| ResNet | Add | Image classification |
| DenseNet | Concatenate | Feature reuse, fewer params |
| U-Net | Skip + Concat | Segmentation |
| Highway | Gated add | Sequence modeling |
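The "Gated add" row refers to Highway networks, where a learned gate interpolates between the transformed features and the untouched input, roughly y = T(x) * H(x) + (1 - T(x)) * x. A minimal fully connected sketch (the layer width is illustrative):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """Gated skip: y = T(x) * H(x) + (1 - T(x)) * x."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))    # learned gate in (0, 1)
        return t * h + (1 - t) * x

print(HighwayLayer(8)(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
```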
Exercises
Exercise 1: Gradient Analysis
Compare gradient magnitudes at early layers for a 50-layer network with and without skip connections.
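One way to set this up, sketched below: build a deep stack of linear layers with a flag that toggles the skip connection, backpropagate a dummy loss, and compare the gradient norm at the first layer. The 50-layer depth comes from the exercise; the tanh nonlinearity, width, and loss are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

def make_net(depth, dim=64, residual=True):
    layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
    def forward(x):
        for layer in layers:
            h = torch.tanh(layer(x))
            x = x + h if residual else h   # skip connection toggled here
        return x
    return layers, forward

def first_layer_grad_norm(residual, depth=50, dim=64):
    torch.manual_seed(0)
    layers, forward = make_net(depth, dim, residual)
    x = torch.randn(32, dim)
    loss = forward(x).pow(2).mean()        # dummy loss, just to get gradients
    loss.backward()
    return layers[0].weight.grad.norm().item()

print("with skips:   ", first_layer_grad_norm(residual=True))
print("without skips:", first_layer_grad_norm(residual=False))
```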
Exercise 2: ResNet Variants
Implement ResNet-50 and ResNet-101 using bottleneck blocks.
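A sketch of the bottleneck block such a ResNet is built from: a 1x1 convolution reduces the channel count, a 3x3 convolution processes it, and a second 1x1 convolution expands by 4x before the skip is added. Stacking these blocks in stages of [3, 4, 6, 3] gives ResNet-50 and [3, 4, 23, 3] gives ResNet-101; the code below shows only the block, not the full networks.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """ResNet bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand (x4), plus the skip."""
    expansion = 4

    def __init__(self, in_ch, mid_ch, stride=1):
        super().__init__()
        out_ch = mid_ch * self.expansion
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Sequential(
                         nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                         nn.BatchNorm2d(out_ch)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

print(Bottleneck(64, 64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])
```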
Exercise 3: Segmentation
Train U-Net on a simple segmentation task (e.g., cell segmentation).
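A minimal training-loop skeleton to start from, using dummy tensors in place of a real cell-segmentation dataset and a tiny placeholder model (swap in the U-Net sketched earlier). Per-pixel cross-entropy on integer masks is one reasonable loss choice, not the only one.

```python
import torch
import torch.nn as nn

# Placeholder segmentation model; replace with the TinyUNet sketched earlier.
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()          # per-pixel cross-entropy

# Dummy batch standing in for real cell images and integer masks.
images = torch.randn(8, 1, 64, 64)
masks = torch.randint(0, 2, (8, 64, 64))

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(images)                 # (N, classes, H, W)
    loss = criterion(logits, masks)        # masks: (N, H, W) with class indices
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```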