Adversarial Robustness

Adversarial Machine Learning

The Vulnerability of Neural Networks

Neural networks are surprisingly vulnerable to adversarial examples: inputs crafted to cause misclassification while appearing normal to humans. In the well-known example from Goodfellow et al., an image that a model classifies as a panda with 57.7% confidence is classified as a gibbon with 99.3% confidence after adding a perturbation too small for a human to see. The changes are on the order of a single 8-bit intensity level per pixel: invisible to the naked eye, devastating to the model.
Why does this matter in production? Adversarial attacks aren’t just an academic curiosity. Self-driving cars need to correctly classify stop signs even when someone sticks a small adversarial patch on them. Medical imaging systems must resist subtle pixel perturbations that could flip a diagnosis between “benign” and “malignant.” Content moderation systems must resist adversarial bypasses. Any model deployed in a safety-critical or adversarial environment needs to be tested against these attacks.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
from typing import Tuple, Optional, Callable

torch.manual_seed(42)
The adversary’s fundamental goal is to find the smallest perturbation that causes a misclassification:
$$\text{Adversary's goal: } \min_{\delta} \|\delta\| \text{ s.t. } f(x + \delta) \neq f(x)$$
Intuition behind the math: Neural network decision boundaries are high-dimensional surfaces. In high dimensions, most data points are close to the boundary (this is counterintuitive but follows from the geometry of high-dimensional spaces). Even a tiny step in the right direction can cross the boundary. The “right direction” is the gradient of the loss with respect to the input: the same gradient we use for training, but applied to the input image instead of the weights.
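To make the geometric intuition concrete, here is a minimal sketch for a linear classifier, where the smallest L-infinity perturbation has a closed form. The function name `min_linf_perturbation` and the random weights are illustrative assumptions, not part of the original material. Under the weight scaling used here, the per-coordinate budget needed to reach the boundary typically shrinks as the dimension grows.

```python
import torch

def min_linf_perturbation(w: torch.Tensor, b: float, x: torch.Tensor):
    """Smallest L-infinity step that moves x onto the boundary w.x + b = 0."""
    margin = torch.dot(w, x) + b
    # Moving every coordinate by eps along -sign(margin) * sign(w) changes the
    # score by eps * ||w||_1, so the required budget is |margin| / ||w||_1.
    eps = margin.abs() / w.abs().sum()
    delta = -torch.sign(margin) * eps * torch.sign(w)
    return delta, eps.item()

torch.manual_seed(0)
for d in (10, 1_000, 100_000):
    # Hypothetical random classifier; weights scaled so the score does not grow with d
    w = torch.randn(d, dtype=torch.float64) / d ** 0.5
    x = torch.rand(d, dtype=torch.float64)   # "pixels" in [0, 1]
    delta, eps = min_linf_perturbation(w, 0.0, x)
    print(f"d={d:>7}  per-coordinate budget needed: {eps:.5f}")
```

For a deep network there is no closed form, which is why the attacks in this document follow the input gradient instead.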

Adversarial Attacks

Fast Gradient Sign Method (FGSM)

The foundational one-step attack by Goodfellow et al. (2014). FGSM is to adversarial ML what “Hello World” is to programming: the simplest possible attack, yet surprisingly effective:
$$x_{adv} = x + \epsilon \cdot \text{sign}(\nabla_x L(f(x), y))$$
Intuition: Compute the gradient of the loss with respect to each input pixel; this tells you which direction to nudge each pixel to maximally increase the classification error. Then take just the sign of each gradient (+1 or -1) and scale by $\epsilon$. The sign operation means every pixel changes by exactly $\pm\epsilon$, which maximizes the L-infinity perturbation within the budget. It’s a single backward pass, as cheap as one training step.
class FGSM:
    """
    Fast Gradient Sign Method attack.
    
    Simple and fast (one backward pass), but not the strongest attack.
    Think of FGSM as taking one big step in the adversarial direction.
    PGD (below) takes many small steps and is generally stronger.
    """
    
    def __init__(self, model: nn.Module, epsilon: float = 0.03):
        """
        Args:
            model: Target model to attack
            epsilon: Perturbation budget (L-infinity)
        """
        self.model = model
        self.epsilon = epsilon
    
    def attack(
        self,
        images: torch.Tensor,
        labels: torch.Tensor
    ) -> torch.Tensor:
        """
        Generate adversarial examples.
        
        Args:
            images: [N, C, H, W] clean images
            labels: [N] true labels
        
        Returns:
            adversarial: [N, C, H, W] adversarial images
        """
        images = images.clone().detach().requires_grad_(True)
        
        # Forward pass -- compute the loss we want to MAXIMIZE
        outputs = self.model(images)
        loss = F.cross_entropy(outputs, labels)
        
        # Backward pass -- compute gradient of loss w.r.t. input pixels, not weights
        self.model.zero_grad()
        loss.backward()
        
        # Create perturbation: sign() gives +/-1 per pixel, scaled by epsilon.
        # This is the L-infinity optimal perturbation -- every pixel moves by
        # exactly epsilon in the direction that increases the loss the most.
        grad_sign = images.grad.sign()
        perturbation = self.epsilon * grad_sign
        
        # Apply perturbation and clamp to valid pixel range [0, 1]
        adversarial = images + perturbation
        adversarial = torch.clamp(adversarial, 0, 1)
        
        return adversarial.detach()
    
    def targeted_attack(
        self,
        images: torch.Tensor,
        target_labels: torch.Tensor
    ) -> torch.Tensor:
        """Generate targeted adversarial examples."""
        
        images = images.clone().detach().requires_grad_(True)
        
        # Forward pass
        outputs = self.model(images)
        
        # Minimize loss for target class (gradient descent)
        loss = F.cross_entropy(outputs, target_labels)
        
        self.model.zero_grad()
        loss.backward()
        
        # Subtract gradient (move toward target)
        grad_sign = images.grad.sign()
        perturbation = -self.epsilon * grad_sign  # Negative!
        
        adversarial = images + perturbation
        adversarial = torch.clamp(adversarial, 0, 1)
        
        return adversarial.detach()


# Example usage
def fgsm_example():
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Linear(256, 10)
    )
    
    attack = FGSM(model, epsilon=0.3)
    
    # Generate adversarial examples
    images = torch.rand(10, 1, 28, 28)
    labels = torch.randint(0, 10, (10,))
    
    adv_images = attack.attack(images, labels)
    
    # Measure perturbation
    perturbation = (adv_images - images).abs().max()
    print(f"Max perturbation: {perturbation:.4f}")
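An attack is usually judged by how much it lowers accuracy, not by the perturbation size alone. The following is a minimal, self-contained sketch of that comparison; the helper names `fgsm`, `accuracy`, and `clean_vs_adversarial` and the untrained toy model are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, images, labels, epsilon):
    """One-step L-infinity attack: x + epsilon * sign(grad_x loss)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad, = torch.autograd.grad(loss, images)   # gradient w.r.t. pixels only
    return torch.clamp(images + epsilon * grad.sign(), 0, 1).detach()

@torch.no_grad()
def accuracy(model, images, labels):
    """Fraction of examples classified as `labels`."""
    return (model(images).argmax(dim=1) == labels).float().mean().item()

def clean_vs_adversarial(model, images, labels, epsilon):
    """Return (clean accuracy, accuracy on FGSM examples)."""
    model.eval()
    adv = fgsm(model, images, labels, epsilon)
    return accuracy(model, images, labels), accuracy(model, adv, labels)

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))   # untrained stand-in
x = torch.rand(32, 1, 28, 28)
y = toy_model(x).argmax(dim=1)   # use the model's own predictions as labels
clean_acc, adv_acc = clean_vs_adversarial(toy_model, x, y, epsilon=0.1)
print(f"clean: {clean_acc:.2f}  adversarial: {adv_acc:.2f}")
```

Because the labels are the model's own predictions, clean accuracy is 1.0 by construction; the adversarial number shows how many of those predictions a single signed-gradient step can flip.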

Projected Gradient Descent (PGD)

A strong first-order attack: iterative FGSM with projection. If FGSM takes one big step, PGD takes many small steps, recalculating the gradient at each point. This is much more effective because the loss landscape around a data point is highly non-linear, so the gradient at the clean input is a poor guide to the worst-case perturbation a full epsilon away.
Why PGD is the standard for robustness evaluation: Madry et al. (2018) gave empirical evidence that PGD with random restarts consistently reaches loss values close to the worst case within the L-infinity ball, and argued that it behaves like a “universal” first-order adversary. This is evidence, not a proof: a model that survives PGD with enough iterations and random restarts is likely, but not guaranteed, to resist other first-order attacks. This is why “PGD-robust accuracy” has become a standard benchmark metric, often reported alongside stronger attack ensembles such as AutoAttack.
class PGD:
    """
    Projected Gradient Descent attack.
    
    Strong iterative attack -- the standard for evaluating robustness.
    Think of it as "gradient ascent on the loss, projected back into
    the allowed perturbation ball at each step."
    """
    
    def __init__(
        self,
        model: nn.Module,
        epsilon: float = 0.03,
        alpha: float = 0.01,
        num_iter: int = 40,
        random_start: bool = True
    ):
        """
        Args:
            epsilon: Total perturbation budget (L-infinity). Standard values:
                     MNIST: 0.3, CIFAR-10: 8/255=0.031, ImageNet: 4/255=0.016
            alpha: Step size per iteration. Rule of thumb: alpha = 2.5*epsilon/num_iter
            num_iter: Number of attack iterations. 20-40 is typical; more is stronger
            random_start: Start from random point in epsilon ball (avoids local optima)
        """
        self.model = model
        self.epsilon = epsilon
        self.alpha = alpha
        self.num_iter = num_iter
        self.random_start = random_start
    
    def attack(
        self,
        images: torch.Tensor,
        labels: torch.Tensor
    ) -> torch.Tensor:
        """Generate PGD adversarial examples."""
        
        original = images.clone()
        
        if self.random_start:
            # Random start within the epsilon ball helps escape local optima.
            # Multiple random restarts further increase attack strength.
            images = images + torch.empty_like(images).uniform_(
                -self.epsilon, self.epsilon
            )
            images = torch.clamp(images, 0, 1)
        
        for _ in range(self.num_iter):
            images = images.clone().detach().requires_grad_(True)
            
            # Forward pass -- compute loss to maximize
            outputs = self.model(images)
            loss = F.cross_entropy(outputs, labels)
            
            # Backward pass -- gradient w.r.t. input pixels
            self.model.zero_grad()
            loss.backward()
            
            # Take a small step in the gradient sign direction
            grad_sign = images.grad.sign()
            images = images + self.alpha * grad_sign
            
            # PROJECT: clamp perturbation back into the L-inf epsilon ball.
            # This is the "projection" in Projected Gradient Descent -- without it,
            # the adversarial image would drift arbitrarily far from the original.
            perturbation = images - original
            perturbation = torch.clamp(perturbation, -self.epsilon, self.epsilon)
            images = original + perturbation
            
            # Also clamp to valid pixel range [0, 1]
            images = torch.clamp(images, 0, 1)
        
        return images.detach()
    
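    def attack_with_restarts(
        self,
        images: torch.Tensor,
        labels: torch.Tensor,
        num_restarts: int = 10
    ) -> torch.Tensor:
        """PGD with multiple random restarts, keeping the worst case per example."""
        
        best_adv = images.clone().detach()
        best_loss = torch.full(
            (images.shape[0],), float('-inf'), device=images.device
        )
        
        for _ in range(num_restarts):
            adv = self.attack(images, labels)
            
            with torch.no_grad():
                outputs = self.model(adv)
                # Per-example loss, so each image keeps its own strongest restart
                # (a batch-mean comparison would pick one restart for everyone)
                loss = F.cross_entropy(outputs, labels, reduction='none')
            
            improved = loss > best_loss
            best_loss = torch.where(improved, loss, best_loss)
            best_adv[improved] = adv[improved]
        
        return best_adv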


class AutoPGD:
    """
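    Simplified Auto-PGD-style attack.
    
    The full Auto-PGD (part of AutoAttack, a reliable attack suite for robustness
    evaluation) adapts its step size from checkpoint conditions and uses momentum.
    This sketch keeps only a fixed step-size decay schedule.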
    """
    
    def __init__(
        self,
        model: nn.Module,
        epsilon: float = 0.03,
        num_iter: int = 100,
        loss_type: str = 'ce'  # 'ce' or 'dlr'
    ):
        self.model = model
        self.epsilon = epsilon
        self.num_iter = num_iter
        self.loss_type = loss_type
    
    def _dlr_loss(self, outputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Difference of Logits Ratio loss."""
        # Sort outputs
        sorted_outputs, _ = outputs.sort(dim=1, descending=True)
        
        # y = correct class logit
        # y' = highest incorrect logit
        # The denominator uses the highest and third-highest logits overall,
        # so this loss needs at least 3 classes
        y = outputs.gather(1, labels.unsqueeze(1)).squeeze(1)
        
        # Mask correct class
        mask = torch.ones_like(outputs, dtype=torch.bool)
        mask.scatter_(1, labels.unsqueeze(1), False)
        y_prime = outputs[mask].view(outputs.shape[0], -1).max(dim=1)[0]
        
        # DLR loss
        loss = -(y - y_prime) / (sorted_outputs[:, 0] - sorted_outputs[:, 2] + 1e-8)
        
        return loss.mean()
    
    def attack(
        self,
        images: torch.Tensor,
        labels: torch.Tensor
    ) -> torch.Tensor:
        """Auto-PGD attack with step size adaptation."""
        
        original = images.clone()
        
        # Initialize with random start
        images = images + torch.empty_like(images).uniform_(
            -self.epsilon, self.epsilon
        )
        images = torch.clamp(images, 0, 1)
        
        # Adaptive step size
        step_size = 2 * self.epsilon
        
        best_adv = images.clone()
        best_loss = float('-inf')
        
        for i in range(self.num_iter):
            images = images.clone().detach().requires_grad_(True)
            
            outputs = self.model(images)
            
            if self.loss_type == 'dlr':
                loss = self._dlr_loss(outputs, labels)
            else:
                loss = F.cross_entropy(outputs, labels)
            
            self.model.zero_grad()
            loss.backward()
            
            # Track the best iterate. `loss` was computed on the current
            # (pre-step) images, so store those, not the updated ones.
            current_loss = loss.item()
            if current_loss > best_loss:
                best_loss = current_loss
                best_adv = images.detach().clone()
            
            # Gradient ascent step on the sign of the input gradient
            grad = images.grad
            images = images.detach() + step_size * grad.sign()
            
            # Project back into the epsilon ball and the valid pixel range
            perturbation = images - original
            perturbation = torch.clamp(perturbation, -self.epsilon, self.epsilon)
            images = original + perturbation
            images = torch.clamp(images, 0, 1)
            
            # Adapt step size
            if i % 10 == 0 and i > 0:
                step_size *= 0.75
        
        return best_adv
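The step-size rule of thumb and the projection step can be looked at in isolation. This is a small sketch under the assumptions stated in the PGD docstring (L-infinity budget, pixels in [0, 1]); `pgd_step_size` and `project_linf` are hypothetical helper names introduced here.

```python
import torch

def pgd_step_size(epsilon: float, num_iter: int, factor: float = 2.5) -> float:
    """Rule-of-thumb step size: total travel over all steps is factor * epsilon."""
    return factor * epsilon / num_iter

def project_linf(adv: torch.Tensor, original: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Project adv onto the L-inf ball around original, then onto [0, 1]."""
    delta = torch.clamp(adv - original, -epsilon, epsilon)
    return torch.clamp(original + delta, 0, 1)

epsilon, num_iter = 8 / 255, 10
alpha = pgd_step_size(epsilon, num_iter)
# Two points of the ball differ by at most 2 * epsilon per pixel, so a total
# travel of 2.5 * epsilon is enough to cross the ball from any random start.
print(f"alpha = {alpha:.5f}")

torch.manual_seed(0)
x = torch.rand(4, 3, 8, 8)
wandering = x + 0.2 * torch.randn_like(x)   # deliberately far outside the ball
projected = project_linf(wandering, x, epsilon)
print(f"max deviation after projection: {(projected - x).abs().max().item():.5f}")
```

The factor of 2.5 is the convention quoted in the docstring above; smaller factors can leave parts of the ball unreachable within the iteration budget.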

C&W Attack

Carlini and Wagner (2017) — a fundamentally different approach. Instead of constraining the perturbation and maximizing the loss (like PGD), C&W jointly optimizes for minimal perturbation AND misclassification using a Lagrangian formulation. This finds smaller perturbations than PGD, though at much higher computational cost (1000+ optimization steps vs 20-40 for PGD). C&W also uses the tanh-space reparameterization to handle the [0,1] box constraint elegantly:
class CWAttack:
    """
    Carlini & Wagner L2 attack.
    
    Powerful optimization-based attack that finds minimal perturbations.
    Unlike FGSM/PGD which maximize loss within a fixed budget,
    C&W minimizes the perturbation needed to achieve misclassification.
    This makes it a common choice for estimating how small a successful
    perturbation can be.
    """
    
    def __init__(
        self,
        model: nn.Module,
        c: float = 1.0,
        kappa: float = 0,
        num_iter: int = 1000,
        lr: float = 0.01
    ):
        """
        Args:
            c: Weight for classification loss
            kappa: Confidence margin
            num_iter: Optimization steps
            lr: Learning rate
        """
        self.model = model
        self.c = c
        self.kappa = kappa
        self.num_iter = num_iter
        self.lr = lr
    
    def attack(
        self,
        images: torch.Tensor,
        labels: torch.Tensor,
        targeted: bool = False,
        target_labels: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        """Generate C&W adversarial examples."""
        
        batch_size = images.shape[0]
        
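        # Use tanh space for box constraints
        # x = 0.5 * (tanh(w) + 1)
        # Clamp first: arctanh(+/-1) is infinite for pixels that are exactly 0 or 1
        scaled = (2 * images - 1).clamp(-1 + 1e-6, 1 - 1e-6)
        w = torch.arctanh(scaled).clone().detach().requires_grad_(True)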
        
        optimizer = torch.optim.Adam([w], lr=self.lr)
        
        for _ in range(self.num_iter):
            optimizer.zero_grad()
            
            # Convert back to image space
            adv_images = 0.5 * (torch.tanh(w) + 1)
            
            # Forward pass
            outputs = self.model(adv_images)
            
            # L2 distance loss
            l2_loss = ((adv_images - images) ** 2).sum(dim=(1, 2, 3)).mean()
            
            # Classification loss
            if targeted:
                # Minimize f(x_adv) for target class
                target_logits = outputs.gather(1, target_labels.unsqueeze(1)).squeeze()
                other_logits = outputs.clone()
                other_logits.scatter_(1, target_labels.unsqueeze(1), float('-inf'))
                max_other = other_logits.max(dim=1)[0]
                
                f_loss = F.relu(max_other - target_logits + self.kappa).mean()
            else:
                # Maximize loss for true class
                true_logits = outputs.gather(1, labels.unsqueeze(1)).squeeze()
                other_logits = outputs.clone()
                other_logits.scatter_(1, labels.unsqueeze(1), float('-inf'))
                max_other = other_logits.max(dim=1)[0]
                
                f_loss = F.relu(true_logits - max_other + self.kappa).mean()
            
            # Total loss
            loss = l2_loss + self.c * f_loss
            
            loss.backward()
            optimizer.step()
        
        # Final adversarial images
        adv_images = 0.5 * (torch.tanh(w) + 1)
        
        return adv_images.detach()
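The tanh-space reparameterization is easy to check on its own. The sketch below assumes pixels in [0, 1]; `to_tanh_space` and `from_tanh_space` are illustrative helper names, and the small `eps` margin is an assumption added to keep the inverse finite at exactly 0 and 1.

```python
import torch

def to_tanh_space(images: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Map pixels in [0, 1] to unconstrained w with images = 0.5 * (tanh(w) + 1)."""
    # atanh(+/-1) is infinite, so pull exact 0s and 1s slightly inside (-1, 1)
    scaled = (2 * images - 1).clamp(-1 + eps, 1 - eps)
    return torch.atanh(scaled)

def from_tanh_space(w: torch.Tensor) -> torch.Tensor:
    """Inverse map: any real-valued w lands inside [0, 1]."""
    return 0.5 * (torch.tanh(w) + 1)

images = torch.tensor([0.0, 0.25, 0.5, 1.0])
w = to_tanh_space(images)
roundtrip = from_tanh_space(w)

# Whatever values an unconstrained optimizer produces, the image stays valid
wild = from_tanh_space(torch.tensor([-50.0, 0.0, 50.0]))
print(w, roundtrip, wild)
```

Because the optimizer works on `w`, no clamping or projection step is needed inside the optimization loop; the box constraint holds by construction.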

Adversarial Defenses

Adversarial Training

The most effective empirical defense — and conceptually the simplest: train on adversarial examples. At each training step, generate adversarial perturbations of the current batch using PGD, then update the model weights to correctly classify those adversarial examples. The model learns to be robust by constantly facing worst-case inputs. The cost: adversarial training is 5-10x slower than standard training because each training step requires running PGD (multiple forward+backward passes for attack generation) before the actual weight update. For CIFAR-10, this means training takes 1-2 days on a single GPU instead of a few hours.
The accuracy-robustness trade-off is real and, so far, has proven hard to avoid. Adversarially trained models typically achieve 5-15% lower clean accuracy than their standard counterparts. On CIFAR-10, standard training achieves approximately 95% clean accuracy; PGD adversarial training achieves approximately 85% clean accuracy and approximately 50% robust accuracy (against PGD-20 with epsilon=8/255). This is not a bug: it appears to be a fundamental property of the problem, and theoretical analyses of simplified settings (for example Tsipras et al., 2019) show that robust and standard accuracy can genuinely conflict.
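Written out, the training procedure described above corresponds to the min-max (saddle-point) objective. The notation is an assumption made here for illustration: $\theta$ are the model weights, $\mathcal{D}$ the data distribution, $L$ the classification loss, and $\epsilon$ the L-infinity budget.

```latex
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
\left[ \max_{\|\delta\|_\infty \le \epsilon} L\big(f_\theta(x + \delta),\, y\big) \right]
```

The inner maximization is approximated with a few PGD steps, and the outer minimization is the usual SGD or Adam update, which is where the extra forward and backward passes per training step come from.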
class AdversarialTrainer:
    """
    Adversarial training framework.
    
    Key insight: Train on worst-case perturbations.
    This implements the min-max formulation:
        min_theta max_delta L(f_theta(x + delta), y)
    The inner maximization (PGD) finds the worst-case input;
    the outer minimization (SGD/Adam) updates weights to handle it.
    """
    
    def __init__(
        self,
        model: nn.Module,
        optimizer: torch.optim.Optimizer,
        epsilon: float = 0.03,
        attack_steps: int = 10,
        attack_lr: float = 0.01
    ):
        self.model = model
        self.optimizer = optimizer
        self.epsilon = epsilon
        self.attack_steps = attack_steps
        self.attack_lr = attack_lr
        
        self.pgd = PGD(
            model,
            epsilon=epsilon,
            alpha=attack_lr,
            num_iter=attack_steps
        )
    
    def train_step(
        self,
        images: torch.Tensor,
        labels: torch.Tensor
    ) -> Tuple[float, float]:
        """
        Single adversarial training step.
        
        Returns:
            clean_loss: Loss on clean examples
            adv_loss: Loss on adversarial examples
        """
        # Generate adversarial examples in eval mode so BatchNorm and Dropout
        # use their inference behavior during the attack, then switch back
        self.model.eval()
        adv_images = self.pgd.attack(images, labels)
        self.model.train()
        
        # Train on adversarial examples
        self.optimizer.zero_grad()
        
        adv_outputs = self.model(adv_images)
        adv_loss = F.cross_entropy(adv_outputs, labels)
        
        adv_loss.backward()
        self.optimizer.step()
        
        # Compute clean loss for monitoring
        with torch.no_grad():
            clean_outputs = self.model(images)
            clean_loss = F.cross_entropy(clean_outputs, labels)
        
        return clean_loss.item(), adv_loss.item()
    
    def train_epoch(self, dataloader):
        """Train for one epoch."""
        
        total_clean_loss = 0
        total_adv_loss = 0
        n_batches = 0
        
        for images, labels in dataloader:
            clean_loss, adv_loss = self.train_step(images, labels)
            total_clean_loss += clean_loss
            total_adv_loss += adv_loss
            n_batches += 1
        
        return total_clean_loss / n_batches, total_adv_loss / n_batches


class TRADESTrainer:
    """
    TRADES: TRadeoff-inspired Adversarial Defense via Surrogate-loss minimization.
    
    Key insight: standard adversarial training lumps clean accuracy and robustness
    into one loss. TRADES separates them, giving you a knob (beta) to control
    the trade-off: loss = CE(f(x), y) + beta * KL(f(x) || f(x_adv))
    The first term pushes for clean accuracy; the second pushes for
    consistent predictions between clean and adversarial inputs.
    """
    
    def __init__(
        self,
        model: nn.Module,
        optimizer: torch.optim.Optimizer,
        epsilon: float = 0.03,
        beta: float = 6.0,  # Robustness weight
        attack_steps: int = 10
    ):
        self.model = model
        self.optimizer = optimizer
        self.epsilon = epsilon
        self.beta = beta
        self.attack_steps = attack_steps
    
    def train_step(
        self,
        images: torch.Tensor,
        labels: torch.Tensor
    ) -> float:
        """TRADES training step."""
        
        self.model.eval()
        
        # Generate adversarial examples (maximize KL divergence)
        adv_images = images.clone().detach()
        adv_images += torch.empty_like(adv_images).uniform_(-self.epsilon, self.epsilon)
        adv_images = torch.clamp(adv_images, 0, 1)
        
        with torch.no_grad():
            natural_outputs = self.model(images)
        
        for _ in range(self.attack_steps):
            adv_images = adv_images.clone().detach().requires_grad_(True)
            
            adv_outputs = self.model(adv_images)
            
            # KL divergence from natural outputs
            loss = F.kl_div(
                F.log_softmax(adv_outputs, dim=1),
                F.softmax(natural_outputs, dim=1),
                reduction='batchmean'
            )
            
            self.model.zero_grad()
            loss.backward()
            
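            # Signed-gradient ascent step. detach() keeps the attack's autograd
            # graph out of the training step that follows the loop.
            adv_images = adv_images.detach() + (self.epsilon / self.attack_steps) * adv_images.grad.sign()
            adv_images = torch.clamp(
                adv_images,
                images - self.epsilon,
                images + self.epsilon
            )
            adv_images = torch.clamp(adv_images, 0, 1)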
        
        # Training step
        self.model.train()
        self.optimizer.zero_grad()
        
        # Natural loss
        natural_outputs = self.model(images)
        natural_loss = F.cross_entropy(natural_outputs, labels)
        
        # Robustness loss (KL divergence)
        adv_outputs = self.model(adv_images)
        robust_loss = F.kl_div(
            F.log_softmax(adv_outputs, dim=1),
            F.softmax(natural_outputs.detach(), dim=1),
            reduction='batchmean'
        )
        
        # Combined loss
        loss = natural_loss + self.beta * robust_loss
        
        loss.backward()
        self.optimizer.step()
        
        return loss.item()
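The argument order of `F.kl_div` is easy to get backwards, so it is worth checking which divergence the robustness term computes. The sketch below uses random logits as stand-ins for clean and adversarial model outputs (an assumption for illustration) and compares the library call with the formula written out by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
clean_logits = torch.randn(4, 10)   # stand-in for model(images)
adv_logits = torch.randn(4, 10)     # stand-in for model(adv_images)

p = F.softmax(clean_logits, dim=1)           # clean distribution
log_p = F.log_softmax(clean_logits, dim=1)
log_q = F.log_softmax(adv_logits, dim=1)     # adversarial distribution (log)

# F.kl_div(input, target) expects `input` as log-probabilities and `target`
# as probabilities, and computes KL(target || input).
library_value = F.kl_div(log_q, p, reduction='batchmean')

# Same quantity written out: batch mean of sum_k p_k * (log p_k - log q_k)
manual_value = (p * (log_p - log_q)).sum(dim=1).mean()

print(library_value.item(), manual_value.item())
```

With the clean probabilities passed as the target, the term is KL(clean || adversarial): the clean prediction acts as the reference that the adversarial prediction is pulled toward.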

Input Preprocessing Defenses

A hard-earned lesson: Most preprocessing defenses (JPEG compression, bit-depth reduction, spatial smoothing) appear effective when tested against standard PGD, but are completely broken by adaptive attacks that incorporate the preprocessing into the attack loop. If the attacker knows about your defense (which you must assume — Kerckhoffs’ principle), they can differentiate through it or use expectation-over-transformation (EoT) to bypass it. Always evaluate defenses against adaptive attacks, not just off-the-shelf ones.
class InputPreprocessing:
    """Preprocessing-based defenses (generally broken by adaptive attacks)."""
    
    @staticmethod
    def jpeg_compression(images: torch.Tensor, quality: int = 50) -> torch.Tensor:
        """Apply JPEG compression as defense."""
        # Note: This is easily broken by adaptive attacks
        import io
        from PIL import Image
        import torchvision.transforms as T
        
        compressed = []
        for img in images:
            # Convert to PIL
            pil_img = T.ToPILImage()(img)
            
            # Compress
            buffer = io.BytesIO()
            pil_img.save(buffer, format='JPEG', quality=quality)
            buffer.seek(0)
            
            # Reload
            compressed_img = Image.open(buffer)
            compressed.append(T.ToTensor()(compressed_img))
        
        return torch.stack(compressed)
    
    @staticmethod
    def spatial_smoothing(images: torch.Tensor, kernel_size: int = 3) -> torch.Tensor:
        """Apply spatial smoothing."""
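        # Build the averaging kernel on the same device/dtype as the input
        kernel = torch.ones(
            1, 1, kernel_size, kernel_size,
            device=images.device, dtype=images.dtype
        ) / (kernel_size ** 2)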
        
        smoothed = []
        for c in range(images.shape[1]):
            channel = images[:, c:c+1]
            smoothed_channel = F.conv2d(channel, kernel, padding=kernel_size//2)
            smoothed.append(smoothed_channel)
        
        return torch.cat(smoothed, dim=1)
    
    @staticmethod
    def bit_depth_reduction(images: torch.Tensor, bits: int = 4) -> torch.Tensor:
        """Reduce bit depth of images."""
        # Quantize to 2**bits evenly spaced levels; the output stays within [0, 1]
        levels = 2 ** bits - 1
        return torch.round(images * levels) / levels


class RandomizedDefense:
    """
    Randomized defenses add stochasticity to break gradient-based attacks.
    """
    
    @staticmethod
    def random_resize_padding(
        images: torch.Tensor,
        min_size: int = 200,
        max_size: int = 224
    ) -> torch.Tensor:
        """Random resizing and padding."""
        
        batch_size = images.shape[0]
        
        # Random new size
        new_size = torch.randint(min_size, max_size + 1, (1,)).item()
        
        # Resize
        resized = F.interpolate(
            images, size=new_size, mode='bilinear', align_corners=False
        )
        
        # Random padding to max_size
        pad_total = max_size - new_size
        pad_left = torch.randint(0, pad_total + 1, (1,)).item()
        pad_top = torch.randint(0, pad_total + 1, (1,)).item()
        
        padded = F.pad(
            resized,
            (pad_left, pad_total - pad_left, pad_top, pad_total - pad_top)
        )
        
        return padded
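The two adaptive-attack ideas mentioned above can be sketched in a few lines. This is a minimal illustration, not a full attack: `quantize_straight_through` approximates the gradient of a non-differentiable defense with the identity, and `eot_gradient` averages gradients over draws of a randomized defense. All function names, the toy model, and the noise-based stand-in defense are assumptions introduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Non-differentiable defense: round pixels to 2**bits levels."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def quantize_straight_through(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Forward pass: quantized pixels. Backward pass: identity gradient."""
    return x + (quantize(x, bits) - x).detach()

def eot_gradient(model, random_defense, x, labels, n_samples: int = 8):
    """Average the input gradient over several draws of a randomized defense."""
    x = x.clone().detach().requires_grad_(True)
    loss = sum(
        F.cross_entropy(model(random_defense(x)), labels) for _ in range(n_samples)
    ) / n_samples
    grad, = torch.autograd.grad(loss, x)
    return grad

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(48, 5))   # stand-in classifier
add_noise = lambda t: t + 0.05 * torch.randn_like(t)        # stand-in random defense
x = torch.rand(2, 3, 4, 4)
y = torch.tensor([1, 3])

# The attacker's gradient flows through the quantizer as if it were the identity
x_req = x.clone().requires_grad_(True)
F.cross_entropy(toy_model(quantize_straight_through(x_req)), y).backward()

# The attacker's gradient for a randomized defense is an average over its randomness
grad_eot = eot_gradient(toy_model, add_noise, x, y)
```

Either gradient can then be plugged into an FGSM or PGD update, which is why a defense has to be evaluated with the defense inside the attack loop.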

Certified Defenses

Randomized Smoothing

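Randomized smoothing is the main approach to certified (provable) robustness that has scaled to ImageNet-sized models. The core idea: if a classifier gives the same prediction under most random Gaussian perturbations of the input, then the smoothed classifier’s prediction provably cannot change under any adversarial perturbation within a certifiable L2 radius. Note that the certificate guarantees the prediction is stable, not that it is correct. Unlike adversarial training (which is empirical, with no guarantees), randomized smoothing gives a mathematical certificate: “no adversarial perturbation within radius $r$ can change this prediction.” In practice the certificate holds with high probability rather than with certainty, because the vote is estimated by sampling.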
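To get a feel for the size of these certificates, the sketch below evaluates the commonly used radius formula from Cohen et al. (2019), $r = \sigma \, \Phi^{-1}(p)$, where $p$ is a lower confidence bound on the probability that the base classifier returns the top class under noise. The function name `certified_radius` and the sample values of `p` are assumptions for illustration; only the Python standard library is used.

```python
from statistics import NormalDist

def certified_radius(p_lower: float, sigma: float) -> float:
    """L2 radius sigma * Phi^{-1}(p_lower); zero when there is no clear majority."""
    if p_lower <= 0.5:
        return 0.0   # cannot certify: the smoothed classifier should abstain
    # inv_cdf requires 0 < p < 1; a sampled lower bound is always below 1
    return sigma * NormalDist().inv_cdf(p_lower)

sigma = 0.25
for p in (0.6, 0.9, 0.99, 0.999):
    print(f"p_lower={p:<6} certified L2 radius = {certified_radius(p, sigma):.3f}")
```

For sigma = 0.25 and a lower bound of 0.9 the radius is about 0.32. The radius grows linearly with sigma but only slowly with the vote margin, which is why large certified radii require both heavy noise and a base classifier that stays accurate under that noise.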
class RandomizedSmoothing:
    """
    Randomized Smoothing: Certifiably robust classifier.
    
    Key idea: Average predictions over many Gaussian-perturbed copies of the input.
    If class A wins the majority vote by a large margin, we can certify that no
    L2 perturbation within a computed radius can change the prediction.
    The certified radius is: r = sigma * Phi^{-1}(p_A), where p_A is the
    probability that the base classifier returns class A under Gaussian noise.
    """
    
    def __init__(
        self,
        base_classifier: nn.Module,
        sigma: float = 0.25,
        n_samples: int = 100
    ):
        self.base_classifier = base_classifier
        self.sigma = sigma
        self.n_samples = n_samples
    
    def predict(self, x: torch.Tensor) -> torch.Tensor:
        """Smoothed prediction (majority vote)."""
        
        counts = torch.zeros(x.shape[0], 10)  # Assuming 10 classes
        
        with torch.no_grad():
            for _ in range(self.n_samples):
                # Add Gaussian noise
                noise = torch.randn_like(x) * self.sigma
                noisy_x = x + noise
                
                # Get prediction
                outputs = self.base_classifier(noisy_x)
                preds = outputs.argmax(dim=1)
                
                # Count
                for i, pred in enumerate(preds):
                    counts[i, pred] += 1
        
        return counts.argmax(dim=1)
    
    def certify(
        self,
        x: torch.Tensor,
        n_samples: int = 10000,
        alpha: float = 0.001
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Certify robustness radius.
        
        Returns:
            predictions: Certified predictions
            radii: Certified L2 radii
        """
        from scipy.stats import norm
        
        # Count predictions under Gaussian noise
        counts = torch.zeros(x.shape[0], 10)  # Assuming 10 classes
        
        with torch.no_grad():
            for _ in range(n_samples):
                noise = torch.randn_like(x) * self.sigma
                outputs = self.base_classifier(x + noise)
                preds = outputs.argmax(dim=1)
                
                for i, pred in enumerate(preds):
                    counts[i, pred] += 1
        
        predictions = []
        radii = []
        
        for i in range(x.shape[0]):
            # Top class and count
            top_class = counts[i].argmax().item()
            top_count = int(counts[i, top_class].item())
            
            # One-sided Clopper-Pearson lower bound on p_A at confidence 1 - alpha.
            # Note: Cohen et al. (2019) select the top class on a separate, smaller
            # sample; reusing one sample, as here, is a simplification.
            p_lower = self._lower_confidence_bound(top_count, n_samples, alpha)
            
            if p_lower > 0.5:
                # Class A holds a provable majority: radius = sigma * Phi^{-1}(p_lower)
                radius = float(self.sigma * norm.ppf(p_lower))
                predictions.append(top_class)
                radii.append(radius)
            else:
                predictions.append(-1)  # Abstain
                radii.append(0.0)
        
        return torch.tensor(predictions), torch.tensor(radii)
    
    def _lower_confidence_bound(
        self,
        successes: int,
        trials: int,
        alpha: float
    ) -> float:
        """Compute lower confidence bound using Clopper-Pearson."""
        from scipy.stats import beta
        return beta.ppf(alpha, successes, trials - successes + 1)


class IBPCertifiedDefense:
    """
    Interval Bound Propagation for certified defense.
    
    Propagates bounds through the network to certify robustness.
    """
    
    def __init__(self, model: nn.Module, epsilon: float = 0.03):
        self.model = model
        self.epsilon = epsilon
    
    def compute_bounds(
        self,
        x: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Compute output bounds for epsilon-ball around x.
        
        Returns:
            lower_bounds: Lower bound on each output
            upper_bounds: Upper bound on each output
        """
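        # Initial bounds: the L-inf ball around x, intersected with the
        # valid pixel range [0, 1]
        lower = torch.clamp(x - self.epsilon, 0, 1)
        upper = torch.clamp(x + self.epsilon, 0, 1)
        
        # Assumes an nn.Sequential-style model built from the layers handled below
        for layer in self.model:
            if isinstance(layer, nn.Linear):
                lower, upper = self._linear_bounds(layer, lower, upper)
            elif isinstance(layer, nn.ReLU):
                lower, upper = self._relu_bounds(lower, upper)
            elif isinstance(layer, nn.Flatten):
                lower = lower.flatten(start_dim=1)
                upper = upper.flatten(start_dim=1)
            else:
                # Silently skipping a layer would make the bounds unsound
                raise NotImplementedError(
                    f"No bound rule for {type(layer).__name__}"
                )
        
        return lower, upper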
    
    def _linear_bounds(
        self,
        layer: nn.Linear,
        lower: torch.Tensor,
        upper: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Propagate bounds through linear layer."""
        
        weight = layer.weight
        bias = layer.bias if layer.bias is not None else 0
        
        # Positive and negative weights
        pos_weight = F.relu(weight)
        neg_weight = -F.relu(-weight)
        
        # New bounds
        new_lower = (
            F.linear(lower, pos_weight) +
            F.linear(upper, neg_weight) +
            bias
        )
        new_upper = (
            F.linear(upper, pos_weight) +
            F.linear(lower, neg_weight) +
            bias
        )
        
        return new_lower, new_upper
    
    def _relu_bounds(
        self,
        lower: torch.Tensor,
        upper: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Propagate bounds through ReLU."""
        return F.relu(lower), F.relu(upper)
    
    def certified_accuracy(
        self,
        x: torch.Tensor,
        labels: torch.Tensor
    ) -> float:
        """Compute certified accuracy."""
        
        lower, upper = self.compute_bounds(x)
        
        # Check if true class lower bound > all other upper bounds
        certified = 0
        
        for i in range(x.shape[0]):
            true_class = labels[i].item()
            true_lower = lower[i, true_class]
            
            # Mask true class
            other_upper = upper[i].clone()
            other_upper[true_class] = float('-inf')
            max_other = other_upper.max()
            
            if true_lower > max_other:
                certified += 1
        
        return certified / x.shape[0]
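
A quick way to build trust in interval bounds is to sample points inside the epsilon-box and confirm that every output lands between the computed lower and upper bounds. The sketch below re-implements the Linear/ReLU propagation rules in a standalone function so it runs on its own; `interval_bounds`, `toy_model`, and the sizes used are illustrative assumptions, not part of the class above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def interval_bounds(model: nn.Sequential, x: torch.Tensor, eps: float):
    """Propagate the box [x - eps, x + eps] through Linear/ReLU layers."""
    lower, upper = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            w_pos = layer.weight.clamp(min=0)
            w_neg = layer.weight.clamp(max=0)
            # Positive weights keep the ends in place, negative weights swap them
            new_lower = F.linear(lower, w_pos) + F.linear(upper, w_neg, layer.bias)
            new_upper = F.linear(upper, w_pos) + F.linear(lower, w_neg, layer.bias)
            lower, upper = new_lower, new_upper
        elif isinstance(layer, nn.ReLU):
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
        else:
            raise NotImplementedError(type(layer).__name__)
    return lower, upper


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))  # hypothetical model
x = torch.rand(5, 4)
eps = 0.05

with torch.no_grad():
    lower, upper = interval_bounds(toy_model, x, eps)
    # Empirical soundness check: random points in the box must stay within the bounds
    noise = (torch.rand(200, *x.shape) * 2 - 1) * eps
    outputs = toy_model(x.unsqueeze(0) + noise)
    sound = bool(((outputs >= lower - 1e-5) & (outputs <= upper + 1e-5)).all())
```

Passing this check does not prove the bounds are tight. Interval bounds are usually loose, which is why models are typically trained with the bounds in the loss before certification becomes useful.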

Robust Architecture Design

class RobustArchitectureDesign:
    """
    Architectural choices that improve robustness.
    """
    
    @staticmethod
    def create_robust_cnn():
        """CNN with robustness-enhancing features."""
        
        return nn.Sequential(
            # Larger kernel sizes (more robust to small perturbations)
            nn.Conv2d(3, 64, 5, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            
            # Smooth activation functions
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.SiLU(),  # Smoother than ReLU
            nn.MaxPool2d(2),
            
            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
            
            nn.Flatten(),
            nn.Linear(256, 10)
        )
    
    @staticmethod
    def lipschitz_constrained_layer(
        in_features: int,
        out_features: int,
        lipschitz_bound: float = 1.0
    ) -> nn.Module:
        """Linear layer with Lipschitz constraint."""
        
        class LipschitzLinear(nn.Module):
            def __init__(self):
                super().__init__()
                self.weight = nn.Parameter(
                    torch.randn(out_features, in_features) * 0.01
                )
                self.bias = nn.Parameter(torch.zeros(out_features))
                self.bound = lipschitz_bound
            
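            def forward(self, x):
                # Spectral normalization: estimate the largest singular value
                # with a few power-iteration steps. The estimate is approximate,
                # so the Lipschitz bound holds only approximately.
                with torch.no_grad():
                    u = torch.randn(self.weight.shape[1], 1, device=x.device)
                    
                    for _ in range(3):  # Power iteration
                        v = self.weight @ u
                        v = v / v.norm()
                        u = self.weight.T @ v
                        u = u / u.norm()
                
                # Keep sigma in the autograd graph so the rescaling is differentiable
                sigma = (v.T @ self.weight @ u).squeeze()
                
                # Scale weight if needed
                weight = self.weight
                if sigma > self.bound:
                    weight = weight * self.bound / sigma
                
                return F.linear(x, weight, self.bias)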
        
        return LipschitzLinear()


class WideResNetRobust(nn.Module):
    """
    Wide ResNet architecture commonly used for adversarial training.
    Wider networks tend to be more robust.
    """
    
    def __init__(self, depth: int = 28, widen_factor: int = 10, num_classes: int = 10):
        super().__init__()
        
        nChannels = [16, 16 * widen_factor, 32 * widen_factor, 64 * widen_factor]
        
        self.conv1 = nn.Conv2d(3, nChannels[0], 3, padding=1)
        
        self.block1 = self._make_block(nChannels[0], nChannels[1], depth // 6)
        self.block2 = self._make_block(nChannels[1], nChannels[2], depth // 6, stride=2)
        self.block3 = self._make_block(nChannels[2], nChannels[3], depth // 6, stride=2)
        
        self.bn = nn.BatchNorm2d(nChannels[3])
        self.relu = nn.ReLU()
        self.fc = nn.Linear(nChannels[3], num_classes)
    
    def _make_block(self, in_c, out_c, num_blocks, stride=1):
        layers = [self._residual_block(in_c, out_c, stride)]
        for _ in range(1, num_blocks):
            layers.append(self._residual_block(out_c, out_c))
        return nn.Sequential(*layers)
    
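    def _residual_block(self, in_c, out_c, stride=1):
        """Pre-activation residual block: body(x) + shortcut(x)."""
        
        class PreActBlock(nn.Module):
            def __init__(self):
                super().__init__()
                self.body = nn.Sequential(
                    nn.BatchNorm2d(in_c),
                    nn.ReLU(),
                    nn.Conv2d(in_c, out_c, 3, stride=stride, padding=1),
                    nn.BatchNorm2d(out_c),
                    nn.ReLU(),
                    nn.Conv2d(out_c, out_c, 3, padding=1)
                )
                # 1x1 projection when the shape changes, identity otherwise
                if in_c == out_c and stride == 1:
                    self.shortcut = nn.Identity()
                else:
                    self.shortcut = nn.Conv2d(in_c, out_c, 1, stride=stride)
            
            def forward(self, x):
                return self.body(x) + self.shortcut(x)
        
        return PreActBlock()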
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.relu(self.bn(x))
        x = F.adaptive_avg_pool2d(x, 1)
        x = x.view(x.size(0), -1)
        return self.fc(x)
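
The Lipschitz-constrained layer above relies on power iteration to estimate the largest singular value of the weight matrix. As a minimal sketch (the helper name `power_iteration_sigma` and the matrix size are assumptions made for illustration), the estimate can be compared against the exact spectral norm from `torch.linalg.matrix_norm`. Power iteration never overshoots the true value, so a small number of steps can leave the layer slightly above its intended bound.

```python
import torch


def power_iteration_sigma(weight: torch.Tensor, num_iters: int = 50) -> torch.Tensor:
    """Estimate the largest singular value of a 2-D weight matrix."""
    u = torch.randn(weight.shape[1])
    u = u / u.norm()
    for _ in range(num_iters):
        v = weight @ u
        v = v / v.norm()
        u = weight.T @ v
        u = u / u.norm()
    # Rayleigh-quotient style estimate with unit vectors u and v
    return v @ weight @ u


torch.manual_seed(0)
W = torch.randn(16, 32)
sigma_est = power_iteration_sigma(W)
sigma_exact = torch.linalg.matrix_norm(W, ord=2)

# Rescale so the layer's L2 Lipschitz constant is at most `bound`
bound = 1.0
W_scaled = W * (bound / torch.clamp(sigma_exact, min=bound))
scaled_norm = torch.linalg.matrix_norm(W_scaled, ord=2)
```

When a strict bound matters, using the exact norm (or many more iterations with a persistent `u`) is the safer choice; the three-step estimate is a speed trade-off.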

Robustness Evaluation

from typing import Dict  # used in the return annotations below


class RobustnessEvaluator:
    """Comprehensive robustness evaluation."""
    
    def __init__(self, model: nn.Module):
        self.model = model
        self.model.eval()
    
    def evaluate(
        self,
        test_loader,
        epsilon: float = 0.03
    ) -> Dict[str, float]:
        """Full robustness evaluation."""
        
        results = {
            'clean_accuracy': 0,
            'fgsm_accuracy': 0,
            'pgd_accuracy': 0,
            'pgd_20_accuracy': 0,
            'autopgd_accuracy': 0
        }
        
        fgsm = FGSM(self.model, epsilon)
        pgd_10 = PGD(self.model, epsilon, num_iter=10)
        pgd_20 = PGD(self.model, epsilon, num_iter=20)
        autopgd = AutoPGD(self.model, epsilon)
        
        n_correct = {k: 0 for k in results}
        n_total = 0
        
        for images, labels in test_loader:
            n_total += len(labels)
            
            with torch.no_grad():
                # Clean accuracy
                clean_pred = self.model(images).argmax(dim=1)
                n_correct['clean_accuracy'] += (clean_pred == labels).sum().item()
            
            # FGSM
            fgsm_images = fgsm.attack(images, labels)
            with torch.no_grad():
                fgsm_pred = self.model(fgsm_images).argmax(dim=1)
                n_correct['fgsm_accuracy'] += (fgsm_pred == labels).sum().item()
            
            # PGD-10
            pgd_images = pgd_10.attack(images, labels)
            with torch.no_grad():
                pgd_pred = self.model(pgd_images).argmax(dim=1)
                n_correct['pgd_accuracy'] += (pgd_pred == labels).sum().item()
            
            # PGD-20
            pgd20_images = pgd_20.attack(images, labels)
            with torch.no_grad():
                pgd20_pred = self.model(pgd20_images).argmax(dim=1)
                n_correct['pgd_20_accuracy'] += (pgd20_pred == labels).sum().item()
            
            # AutoPGD
            autopgd_images = autopgd.attack(images, labels)
            with torch.no_grad():
                auto_pred = self.model(autopgd_images).argmax(dim=1)
                n_correct['autopgd_accuracy'] += (auto_pred == labels).sum().item()
        
        for k in results:
            results[k] = n_correct[k] / n_total
        
        return results
    
    def robustness_curve(
        self,
        images: torch.Tensor,
        labels: torch.Tensor,
        epsilons: Tuple[float, ...] = (0.01, 0.02, 0.03, 0.05, 0.1, 0.2)
    ) -> Dict[float, float]:
        """Accuracy vs epsilon curve."""
        
        results = {}
        
        for eps in epsilons:
            pgd = PGD(self.model, eps, num_iter=20)
            adv_images = pgd.attack(images, labels)
            
            with torch.no_grad():
                preds = self.model(adv_images).argmax(dim=1)
                accuracy = (preds == labels).float().mean().item()
            
            results[eps] = accuracy
        
        return results


def evaluate_robustness():
    """Evaluation guidelines."""
    
    guidelines = """
    ╔════════════════════════════════════════════════════════════════╗
    ║               ROBUSTNESS EVALUATION GUIDELINES                 ║
    ╠════════════════════════════════════════════════════════════════╣
    ║                                                                ║
    ║  1. STANDARD EVALUATIONS                                       ║
    ║     • Clean accuracy (baseline)                                ║
    ║     • FGSM accuracy (weak attack)                              ║
    ║     • PGD-20 with restarts (strong attack)                     ║
    ║     • AutoAttack (state-of-the-art)                            ║
    ║                                                                ║
    ║  2. EPSILON RANGES (L∞ normalized to [0,1])                    ║
    ║     • MNIST: ε = 0.3                                           ║
    ║     • CIFAR-10: ε = 8/255 ≈ 0.031                              ║
    ║     • ImageNet: ε = 4/255 ≈ 0.016                              ║
    ║                                                                ║
    ║  3. AVOID COMMON PITFALLS                                      ║
    ║     • Don't rely on weak attacks                               ║
    ║     • Use adaptive attacks for defense evaluation              ║
    ║     • Report worst-case across multiple attacks                ║
    ║     • Include certified accuracy if applicable                 ║
    ║                                                                ║
    ║  4. BENCHMARKS                                                 ║
    ║     • RobustBench: robustbench.github.io                       ║
    ║     • AutoAttack: standardized evaluation                      ║
    ║                                                                ║
    ╚════════════════════════════════════════════════════════════════╝
    """
    print(guidelines)

evaluate_robustness()
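
The guideline "report worst-case across multiple attacks" is stricter than taking the minimum of the per-attack accuracies: an example counts as robust only if it survives every attack. A minimal sketch, assuming per-example correctness masks have already been collected for each attack (the masks below are made-up illustration data, and `worst_case_accuracy` is a hypothetical helper):

```python
import torch


def worst_case_accuracy(correct_by_attack: dict) -> float:
    """An example is robust only if it is classified correctly under every attack."""
    stacked = torch.stack(list(correct_by_attack.values()))  # (num_attacks, N)
    survives_all = stacked.all(dim=0)
    return survives_all.float().mean().item()


# Hypothetical per-example correctness masks for 6 test points
correct_by_attack = {
    "fgsm":   torch.tensor([True, True, True, False, True, True]),
    "pgd_20": torch.tensor([True, False, True, False, True, True]),
    "apgd":   torch.tensor([True, True, False, False, True, True]),
}
per_attack = {k: v.float().mean().item() for k, v in correct_by_attack.items()}
worst_case = worst_case_accuracy(correct_by_attack)
```

Here each attack alone leaves at least four of six examples correct, but only three examples survive all of them, so the worst-case figure is lower than any single-attack number.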

Exercises

Implement the query-efficient Square Attack:
class SquareAttack:
    # Black-box attack using only model outputs
    # No gradients needed!
    pass
Compare TRADES and standard PGD adversarial training:
  • Train models with both methods
  • Compare clean vs robust accuracy tradeoff
  • Evaluate with AutoAttack
Implement and evaluate randomized smoothing:
  • Train a smoothed classifier
  • Compute certified radii
  • Plot certified accuracy vs radius

Training Tips

Practical adversarial robustness checklist a senior ML engineer would follow:
  • Never evaluate robustness with only FGSM. FGSM is a weak attack. Models that appear robust against FGSM often crumble against PGD-20 with random restarts. Use AutoAttack as the minimum standard for robustness claims.
  • Always use the correct epsilon for your domain. Standard benchmarks: MNIST epsilon=0.3 (L-inf), CIFAR-10 epsilon=8/255, ImageNet epsilon=4/255. Using the wrong epsilon makes results incomparable to the literature.
  • Budget for the accuracy-robustness trade-off. Adversarial training typically costs 5-15 percentage points of clean accuracy. Communicate this trade-off to stakeholders before starting.
  • Use a wider model. Robust models need more capacity than standard models. WideResNet-28-10 is the standard backbone for adversarial training on CIFAR-10; don’t expect a ResNet-18 to achieve competitive robust accuracy.
  • Early stopping on robust accuracy, not clean accuracy. Overfitting to adversarial training data manifests as decreasing robust test accuracy while clean test accuracy stays flat or increases. Monitor both throughout training.
  • For certified robustness, train with Gaussian noise augmentation. The base classifier in randomized smoothing performs best when trained on Gaussian-corrupted inputs matching the smoothing sigma.
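
The last tip can be sketched as a single training step on Gaussian-corrupted inputs. This is a minimal illustration rather than a full training recipe: `noisy_training_step`, `toy_model`, the image size, and `sigma=0.25` are all assumptions chosen for the example, and the noisy inputs are left unclamped on the assumption that the smoothed classifier will see unclamped noise at inference time as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def noisy_training_step(model, optimizer, images, labels, sigma: float) -> float:
    """One optimizer step on inputs corrupted with N(0, sigma^2) noise."""
    model.train()
    noisy = images + sigma * torch.randn_like(images)  # fresh noise every step
    loss = F.cross_entropy(model(noisy), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))  # hypothetical model
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
images = torch.rand(16, 3, 8, 8)
labels = torch.randint(0, 10, (16,))

losses = [
    noisy_training_step(toy_model, optimizer, images, labels, sigma=0.25)
    for _ in range(20)
]
```

The `sigma` used here should match the one used later for smoothing; training at one noise level and certifying at another usually hurts the base classifier's accuracy under noise.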

Interview Deep-Dive

Strong Answer:
  • FGSM (Fast Gradient Sign Method): a single-step attack. Compute the gradient of the loss with respect to the input, take its sign, and scale by epsilon. The adversarial image is $x_{adv} = x + \epsilon \cdot \text{sign}(\nabla_x L)$. It’s fast (one forward + one backward pass) but weak because the loss landscape is highly non-linear and a single gradient step often doesn’t find the worst-case perturbation.
  • PGD (Projected Gradient Descent): an iterative version of FGSM. Take many small FGSM steps (step size alpha, typically alpha = 2.5 * epsilon / num_iter), and after each step project the perturbation back into the L-infinity epsilon ball. Start from a random point within the ball, and repeat the attack from several random starts (random restarts) to avoid poor local optima. With enough iterations (20-40) and random restarts (10+), PGD is empirically close to the strongest known first-order attack, although this is not a formal guarantee.
  • Why PGD is the gold standard: Madry et al. (2018) framed adversarial robustness as a min-max problem and observed empirically that projected gradient ascent from many random starts converges to adversarial losses of very similar value. On that basis they argued that PGD is effectively the strongest first-order adversary, and that training against it yields robustness to first-order attacks in general. This is an empirical observation about the loss landscape, not a proof: the landscape is non-convex, but its local maxima tend to be of similar quality in practice.
  • Limitations of PGD: (1) it maximizes loss inside a fixed budget rather than searching for the smallest perturbation, so minimum-norm optimization attacks such as C&W can sometimes find smaller perturbations; (2) PGD is slow for large epsilon budgets or high-resolution images; (3) PGD evaluates L-infinity robustness by default, but real-world attacks may use other threat models (L2, spatial transformations, color shifts). For this reason, AutoAttack (a standardized ensemble of four diverse attacks) has become the recommended evaluation protocol.
  • A senior engineer would note: the number of PGD restarts matters enormously. Evaluating with PGD-20 (20 steps, no restarts) can overestimate robustness by 5-10% compared to PGD-20 with 10 random restarts. Always report the attack configuration precisely.
Follow-up: When would you use C&W attack over PGD?
C&W is an optimization-based attack that minimizes perturbation size rather than maximizing loss within a fixed budget. Use C&W when you need to demonstrate that very small perturbations suffice for misclassification (e.g., to argue that a model is fundamentally vulnerable). C&W is 50-100x slower than PGD but finds perturbations that are often 2-3x smaller in L2 norm. In practice, PGD is used for training and routine evaluation; C&W is used for thorough vulnerability assessment and when writing papers.
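
The PGD recipe described in this answer (signed gradient steps, projection onto the epsilon ball, random starts, multiple restarts) can be sketched as follows. This is a minimal illustration under stated assumptions: inputs live in [0, 1], the step size uses the 2.5 * epsilon / num_iter heuristic quoted above, and `pgd_linf` and `toy_model` are hypothetical names rather than the classes used elsewhere in this page.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_linf(model, x, y, eps, num_iter=20, restarts=5):
    """L-inf PGD with random starts. For each example, keeps the first
    perturbation (across restarts) that changes the prediction."""
    alpha = 2.5 * eps / num_iter
    best_adv = x.clone()
    fooled = torch.zeros(len(x), dtype=torch.bool)
    for _ in range(restarts):
        delta = (torch.rand_like(x) * 2 - 1) * eps  # random start inside the ball
        for _ in range(num_iter):
            delta.requires_grad_(True)
            loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
            grad, = torch.autograd.grad(loss, delta)
            # Signed ascent step, then project back onto the eps-ball
            delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
        x_adv = (x + delta).clamp(0, 1)
        with torch.no_grad():
            wrong = model(x_adv).argmax(dim=1) != y
        newly = wrong & ~fooled
        best_adv[newly] = x_adv[newly]
        fooled |= wrong
    return best_adv, fooled


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(32, 3, 8, 8)
with torch.no_grad():
    y = toy_model(x).argmax(dim=1)  # use the model's own predictions as labels
x_adv, fooled = pgd_linf(toy_model, x, y, eps=8 / 255)
robust_accuracy = 1.0 - fooled.float().mean().item()
```

Because `fooled` accumulates across restarts, the reported robust accuracy can only go down as restarts are added, which is the effect the answer above warns about.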
Strong Answer:
  • The empirical observation: adversarially trained models consistently achieve lower clean accuracy than standard models. On CIFAR-10: standard training gets approximately 95% clean accuracy, while PGD adversarial training (epsilon=8/255) gets approximately 85% clean accuracy and approximately 50-55% PGD-robust accuracy. On ImageNet: standard models hit approximately 80% top-1; adversarially robust models hit approximately 65% clean and approximately 35% robust.
  • Is it fundamental? There is growing theoretical and empirical evidence that the trade-off is inherent to the problem, not just a limitation of current algorithms. Tsipras et al. (2019) showed that in certain data distributions, robust classifiers must use fundamentally different features than accurate classifiers — robust features are more semantically meaningful but less predictive. Zhang et al. (TRADES, 2019) proved a decomposition: robustness error is bounded by the sum of clean error and a boundary complexity term, suggesting you can’t minimize both simultaneously.
  • The concrete reason: standard models exploit “non-robust features” — statistical patterns in the data that are predictive of the class label but are fragile under small perturbations. These features actually contain real signal (not just noise), which is why standard models that use them achieve higher accuracy. Adversarial training forces the model to ignore these features and rely only on “robust features” (patterns that survive perturbation), which are fewer and less discriminative.
  • TRADES addresses the trade-off explicitly: its loss function $L = \mathrm{CE}(f(x), y) + \beta \cdot \mathrm{KL}(f(x) \,\|\, f(x_{adv}))$ lets you tune beta to control the clean-robust balance. Higher beta means more robustness at the cost of clean accuracy. The optimal beta depends on the deployment context: a self-driving car system should favor robustness; a photo tagging system might favor clean accuracy.
  • A senior engineer would add: in production, the trade-off means you need separate models for adversarial and non-adversarial settings, or an ensemble that routes inputs based on threat detection. Don’t deploy a single adversarially trained model for all use cases — you’re paying the clean accuracy cost even when there’s no adversary.
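
To make the role of beta in the TRADES objective concrete, here is a minimal sketch of the loss for a given pair of clean and adversarial batches. It assumes the KL term is taken from the clean prediction to the adversarial one, and that `x_adv` has already been produced by some inner attack; `trades_loss`, `toy_model`, the stand-in perturbation, and the default `beta=6.0` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def trades_loss(model, x, x_adv, y, beta: float = 6.0):
    """Clean cross-entropy plus beta * KL(clean prediction || adversarial prediction)."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    clean_loss = F.cross_entropy(logits_clean, y)
    # F.kl_div(input=log q, target=p) computes KL(p || q)
    robust_loss = F.kl_div(
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_clean, dim=1),
        reduction="batchmean",
    )
    return clean_loss + beta * robust_loss, clean_loss, robust_loss


torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(16, 3, 8, 8)
y = torch.randint(0, 10, (16,))
# Stand-in for an attack output: a fixed-size signed perturbation
x_adv = (x + 0.03 * torch.randn_like(x).sign()).clamp(0, 1)

total_low, clean_term, kl_term = trades_loss(toy_model, x, x_adv, y, beta=1.0)
total_high, _, _ = trades_loss(toy_model, x, x_adv, y, beta=6.0)
```

The clean term is identical in both calls; only the weight on the KL term changes, so raising beta shifts the gradient signal toward keeping predictions stable under perturbation.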
Strong Answer:
  • Adversarial training (empirical): train on PGD-generated adversarial examples. Achieves the best empirical robustness — approximately 60% robust accuracy on CIFAR-10 at epsilon=8/255 for state-of-the-art models (WideResNet-70-16 with extra data). No formal guarantees: a sufficiently clever attacker might find perturbations that break the model. Training cost is 5-10x standard training.
  • Randomized smoothing (certified): wrap any base classifier with Gaussian noise averaging. Provides a formal certificate: “for this specific input, no L2 perturbation within radius $r$ can change the prediction.” Certified accuracy on CIFAR-10 at L2 epsilon=0.5: approximately 60%. The downside: inference requires 100-10,000 forward passes per input (one per noise sample), making it 100-10,000x slower. The certificates are also per-input: some inputs get large radii, others get small radii or abstain.
  • When to use adversarial training: when you need low-latency inference and are defending against known threat models (e.g., L-infinity perturbations). The lack of formal guarantees is acceptable when the attack surface is well-characterized and you evaluate with AutoAttack. Best for: image classification, content moderation, any system where you can tolerate empirical robustness.
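  • When to use randomized smoothing: when you need a formal guarantee that a specific prediction cannot be changed by any perturbation within the certified radius, regardless of what the attacker does. Note that the certificate covers stability rather than correctness, and it holds with high probability (at the confidence level of the Monte Carlo estimate) rather than with certainty. Such a guarantee can still be meaningful for contracts and audits: you can certify that “this medical image classification is provably stable within this perturbation radius.” Best for: safety-critical systems (medical imaging, autonomous vehicles), regulatory compliance, or as a certification layer on top of adversarial training.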
  • Hybrid approach: adversarially train the base classifier, then wrap it with randomized smoothing. This gives you the best of both worlds: strong empirical robustness from adversarial training, plus formal certificates from smoothing. The certified radii are larger than smoothing alone because the base classifier is already robust to moderate perturbations.
Follow-up: What about the computational cost of randomized smoothing in production?
The 100-10,000 forward passes per input can be batched, so on a GPU the actual latency is 10-100x (not 10,000x). For batch prediction (not real-time), this is often acceptable. For real-time inference, you can reduce the number of samples (at the cost of smaller certified radii or higher abstention rate). In practice, many production systems use a two-stage approach: fast standard prediction for most inputs, with randomized smoothing triggered only for high-stakes decisions or when the model’s confidence is low.
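
The link between sample count, abstention, and certified radius can be made concrete with a standard-library sketch. It assumes the usual certificate of the form radius = sigma * Phi^{-1}(p_lower), where p_lower is a one-sided Clopper-Pearson lower bound on the top-class probability. The function names, the bisection approach (used here instead of `scipy.stats.beta.ppf` to avoid a dependency), and the counts are illustrative assumptions.

```python
import math
from statistics import NormalDist


def binom_tail_ge(k: int, n: int, p: float) -> float:
    """P[Binomial(n, p) >= k], with terms computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_term)
    return total


def clopper_pearson_lower(successes: int, trials: int, alpha: float) -> float:
    """One-sided lower bound: the p at which seeing >= successes has probability alpha."""
    if successes == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the tail probability is increasing in p
        mid = (lo + hi) / 2
        if binom_tail_ge(successes, trials, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


def certified_radius(successes: int, trials: int, sigma: float, alpha: float = 0.001) -> float:
    """L2 radius sigma * Phi^{-1}(p_lower); 0.0 means the classifier abstains."""
    p_lower = clopper_pearson_lower(successes, trials, alpha)
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


# Hypothetical vote counts out of 1,000 noisy forward passes
radius_confident = certified_radius(successes=990, trials=1000, sigma=0.25)
radius_uncertain = certified_radius(successes=520, trials=1000, sigma=0.25)
```

With a near-unanimous vote the input receives a positive radius, while a 52% majority cannot be separated from a coin flip at this confidence level, so the sketch abstains. Fewer samples widen the confidence interval and push more inputs into the second case.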
Strong Answer:
  • The appeal: JPEG compression removes high-frequency components from images, and adversarial perturbations often contain high-frequency patterns. In initial testing against standard FGSM or PGD, JPEG preprocessing may appear to reduce attack success rate by 20-40%. It’s also trivial to implement — literally one line of code.
  • Why it fails: this defense has been thoroughly broken by adaptive attacks. The key insight: if the attacker knows JPEG compression is being applied (which we must assume per Kerckhoffs’ principle), they can incorporate the JPEG operation into their attack loop. JPEG’s quantization step is not differentiable, but it can be replaced with a differentiable proxy on the backward pass, so PGD through the JPEG layer finds adversarial examples that survive compression. Dziugaite et al. studied JPEG compression as a defense, and the later analysis by Athalye et al. (2018, “Obfuscated Gradients”) showed that input-transformation defenses of this kind lose nearly all of their effectiveness against adaptive attacks.
  • The general principle (“obfuscated gradients”): JPEG compression is one example of a broader failure mode in which a defense appears to work because it breaks the gradient computation that attacks rely on, without actually making the model robust. Three types: (1) shattered gradients (non-differentiable operations like JPEG), (2) stochastic gradients (random transformations), (3) vanishing/exploding gradients (very deep preprocessing). All three have been systematically broken using techniques like Backward Pass Differentiable Approximation (BPDA), Expectation over Transformation (EOT), and reparameterization of the defended computation.
  • What to recommend instead: adversarial training is the only preprocessing-free defense with sustained empirical success. If the colleague wants a lightweight defense, suggest certified defenses (randomized smoothing) or at minimum, ensemble adversarial training. But always evaluate against adaptive attacks before claiming robustness.
  • A senior engineer would add: the history of adversarial defenses is littered with papers that claimed robustness, got accepted to top venues, and were broken within months by adaptive attacks. The lesson: never evaluate a defense only against standard (non-adaptive) attacks. Always assume the attacker knows your defense and has white-box access to the model. If your defense only works against oblivious attackers, it’s not a defense — it’s security through obscurity.
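
The BPDA idea mentioned in this answer fits in a few lines: run the non-differentiable preprocessing on the forward pass, but treat it as the identity on the backward pass. The sketch below uses a coarse rounding step as a stand-in for JPEG (an assumption made to keep the example self-contained; real JPEG involves a DCT and quantization tables), and `QuantizeBPDA` is a hypothetical name.

```python
import torch


class QuantizeBPDA(torch.autograd.Function):
    """Forward: non-differentiable 8-level quantization (a stand-in for JPEG).
    Backward: pass the gradient through as if the op were the identity."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x * 7) / 7

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


torch.manual_seed(0)
x = torch.rand(4, 3, requires_grad=True)
w = torch.randn(3)

# Naive differentiation through round(): the gradient is zero everywhere
naive_loss = ((torch.round(x * 7) / 7) @ w).sum()
naive_grad, = torch.autograd.grad(naive_loss, x)

# BPDA: identical forward values, but a usable gradient for the attacker
bpda_loss = (QuantizeBPDA.apply(x) @ w).sum()
bpda_grad, = torch.autograd.grad(bpda_loss, x)
```

The zero gradient in the naive case is what makes a standard PGD attack stall and the defense look effective. Plugging the BPDA version into the attack loop restores the gradient signal while the forward pass still applies the real preprocessing.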
Strong Answer:
  • Step 1: Establish clean accuracy baseline. Evaluate on the standard test set without any attack. This is your upper bound. Record top-1 and top-5 accuracy plus per-class accuracy (robustness often varies significantly across classes).
  • Step 2: FGSM evaluation (weak attack, fast sanity check). Run FGSM at the standard epsilon for your domain (8/255 for CIFAR-10-like, 4/255 for ImageNet-like). If the model is not robust to FGSM, there’s no point running stronger attacks — go directly to adversarial training. FGSM takes seconds to run on the full test set.
  • Step 3: PGD-20 with 5 random restarts (strong first-order attack). This is the standard benchmark attack. Report accuracy at the standard epsilon. Expected robust accuracy for a well-trained adversarially robust model: 50-60% on CIFAR-10, 30-40% on ImageNet. This step takes 10-30 minutes on a GPU.
  • Step 4: AutoAttack (standardized evaluation). Run the full AutoAttack suite: APGD-CE, APGD-DLR, FAB attack, and Square Attack (black-box). AutoAttack is the community standard for robustness claims — results are comparable across papers. This takes 1-4 hours on a GPU.
  • Step 5: Robustness curve. Plot accuracy vs epsilon for epsilon in [0, 0.01, 0.02, …, 0.1]. This shows the full picture: at what perturbation level does the model break? The curve should degrade gracefully, not cliff-dive at a specific epsilon.
  • Thresholds (CIFAR-10, epsilon=8/255): AutoAttack robust accuracy above 50% is competitive, above 55% is strong, above 60% is state-of-the-art (as of 2025, per RobustBench leaderboard). Clean accuracy should be above 80% (below 80% suggests the robustness came at too high a clean accuracy cost, or training went wrong).
  • Step 6: Per-class robustness analysis. Some classes are inherently harder to defend (e.g., “cat” vs “dog” are more confusable than “airplane” vs “frog”). Report per-class robust accuracy and flag classes where robustness drops below a minimum threshold.
  • Production integration: run this pipeline as a CI/CD step on every model checkpoint. Track robustness metrics over time on a dashboard alongside clean accuracy. Set alerts for robustness regression (e.g., AutoAttack accuracy drops more than 2% between model versions).
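
The regression alert in the last step can be expressed as a small gate that a CI job runs after each evaluation. This is a sketch under assumptions: the metric names, the 2-point drop limit, and the 80% clean-accuracy floor come from the steps above, while `check_robustness_regression` and the report dictionaries are hypothetical.

```python
def check_robustness_regression(previous: dict, current: dict,
                                max_drop: float = 0.02,
                                min_clean: float = 0.80) -> list:
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    for metric, old_value in previous.items():
        new_value = current.get(metric)
        if new_value is None:
            failures.append(f"{metric}: missing from current report")
        elif old_value - new_value > max_drop:
            failures.append(
                f"{metric}: dropped {old_value - new_value:.3f} (limit {max_drop:.3f})"
            )
    if current.get("clean_accuracy", 0.0) < min_clean:
        failures.append(f"clean_accuracy below {min_clean:.2f}")
    return failures


# Hypothetical reports from two consecutive model versions
previous = {"clean_accuracy": 0.86, "autoattack_accuracy": 0.53}
current = {"clean_accuracy": 0.87, "autoattack_accuracy": 0.50}
failures = check_robustness_regression(previous, current)
```

In this example clean accuracy improved while AutoAttack accuracy fell by three points, so the gate fails on the robustness metric alone. A CI wrapper would exit non-zero whenever `failures` is non-empty.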

What’s Next?

Efficient Architectures

MobileNet, ShuffleNet, efficiency techniques

Knowledge Distillation

Transfer knowledge between models