Kubernetes Workloads

Learn how to manage applications using Kubernetes controllers. Think of controllers as autopilots for your applications: you declare the desired state (three replicas, this container image, this much memory), and the controller continuously works to make reality match your declaration. If a pod crashes, the controller replaces it. If a node goes down, the controller reschedules pods elsewhere. You stop managing individual processes and start managing intent.

1. Deployments

The standard way to run stateless applications (web servers, APIs, microservices). A Deployment manages ReplicaSets, which in turn manage Pods. You almost never create ReplicaSets or Pods directly — Deployments handle the orchestration for you.
  • ReplicaSet: Ensures the specified number of identical pod copies are always running.
  • Rolling Updates: Replaces pods incrementally, so your application stays available during deploys.
  • Rollbacks: Kubernetes keeps previous ReplicaSets around (scaled to zero, up to revisionHistoryLimit, which defaults to 10), so you can revert quickly if a deploy goes wrong.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx                # Labels are how Kubernetes identifies and groups resources
spec:
  replicas: 3                 # Run 3 identical pods -- provides HA and load distribution
  selector:
    matchLabels:
      app: nginx              # Must match template labels -- this is how the Deployment finds its pods
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # At most 1 pod can be down during an update (keeps 2/3 serving traffic)
      maxSurge: 1             # Allow 1 extra pod during update (temporarily 4 pods, then back to 3)
  template:                   # Pod template -- every pod created by this Deployment uses this spec
    metadata:
      labels:
        app: nginx            # Must match selector above
    spec:
      containers:
      - name: nginx
        image: nginx:1.21     # Always pin to a specific version, never use :latest in production
        ports:
        - containerPort: 80
        livenessProbe:        # The kubelet restarts the container if this check keeps failing
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15  # Give the container time to start before checking

Managing Deployments

# Apply the deployment manifest (creates or updates the resource)
kubectl apply -f deployment.yaml

# Scale manually -- useful for quick capacity changes, but prefer HPA for automation
kubectl scale deployment nginx-deployment --replicas=5

# Update the container image -- this triggers a rolling update automatically
# Kubernetes creates a new ReplicaSet and gradually shifts pods from old to new
kubectl set image deployment/nginx-deployment nginx=nginx:1.22

# Watch the rollout progress in real time (blocks until complete or failed)
kubectl rollout status deployment/nginx-deployment

# View rollout history -- each entry is a ReplicaSet revision you can roll back to
# Tip: set the kubernetes.io/change-cause annotation to describe each revision (the --record flag is deprecated)
kubectl rollout history deployment/nginx-deployment

# Roll back to the previous version (Kubernetes scales the old ReplicaSet back up)
# This is fast because the old pod template is still stored in that ReplicaSet: nothing is rebuilt,
# but new pods are created from it through the normal rolling update process
kubectl rollout undo deployment/nginx-deployment

# Roll back to a specific revision number (from the history output)
kubectl rollout undo deployment/nginx-deployment --to-revision=2
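
Rollout history is easier to read when each revision carries a description. One way to record it is the kubernetes.io/change-cause annotation on the Deployment, whose value appears in the CHANGE-CAUSE column of kubectl rollout history. A minimal sketch; the annotation text is an illustrative assumption:

```yaml
# Fragment of deployment.yaml -- only the fields relevant to rollout history are shown
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    # Free-form text (assumed wording); update it with each change you apply
    kubernetes.io/change-cause: "Upgrade nginx from 1.21 to 1.22"
```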

2. StatefulSets

Used for stateful applications (databases, Kafka, Zookeeper, Redis clusters) where each instance has its own identity and data. The analogy: a Deployment is like a row of identical vending machines (any one can serve any request), while a StatefulSet is like numbered lockers (each one stores specific content, and the number matters).
  • Stable Network ID: Pods get predictable names like mysql-0, mysql-1 (not random hashes). Each pod gets a DNS entry like mysql-0.mysql.default.svc.cluster.local.
  • Stable Storage: Each pod gets its own PersistentVolumeClaim, and that claim is preserved even if the pod is deleted and recreated. The data follows the identity.
  • Ordered Deployment: Pods start in order (0, then 1, then 2). Pod 1 does not start until pod 0 is running and ready. This is critical for databases with primary/replica topology.
  • Ordered Termination: Pods are deleted in reverse order (2, then 1, then 0). Replicas shut down before the primary.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"        # Required: a Headless Service that provides DNS entries for each pod
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password"   # In production, use a Secret instead of a plain-text password
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql  # MySQL data directory -- each pod gets its own volume here
  volumeClaimTemplates:       # Each pod gets a unique PVC created from this template
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]  # Mountable read-write by a single node at a time (standard for databases)
      resources:
        requests:
          storage: 10Gi       # Size per pod -- mysql-0 gets 10Gi, mysql-1 gets 10Gi, etc.
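
The serviceName field in the manifest above refers to a headless Service that must be created separately; the StatefulSet does not create it. A minimal sketch, assuming MySQL's default port 3306 and the app: mysql labels used above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql              # Must match spec.serviceName in the StatefulSet
spec:
  clusterIP: None          # "None" makes the Service headless: DNS returns pod IPs directly
  selector:
    app: mysql             # Matches the pod labels in the StatefulSet template
  ports:
  - name: mysql
    port: 3306             # Assumption: MySQL's default port
```

With this Service in place, each pod becomes resolvable under a name such as mysql-0.mysql.default.svc.cluster.local.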
When to use StatefulSet vs Deployment: If your application stores data locally and each instance needs its own persistent volume, use a StatefulSet. If your application is stateless (stores data in an external database, object storage, or cache), use a Deployment. When in doubt, ask: “Can I destroy any instance and have a new one take its place without data loss?” If yes, Deployment. If no, StatefulSet.

3. DaemonSets

Ensures that all (or some) nodes run a copy of a Pod. Think of a DaemonSet as placing an agent on every machine in your fleet — when a new node joins the cluster, the DaemonSet automatically schedules a pod on it. When a node is removed, the pod is garbage collected.
  • Use cases: Log collectors (Fluentd, Filebeat), monitoring agents (Prometheus Node Exporter, Datadog agent), network plugins (CNI), storage drivers.
  • Key difference from Deployments: A Deployment runs N replicas spread across nodes. A DaemonSet runs exactly one pod per node (or per qualifying node).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      name: fluentd
  template:
    metadata:
      labels:
        name: fluentd
    spec:
      tolerations:              # Lets this DaemonSet also run on control plane nodes
      - key: node-role.kubernetes.io/control-plane
        operator: Exists        # Match the taint regardless of its value
        effect: NoSchedule      # Without this toleration, DaemonSet pods skip the control plane nodes
      containers:
      - name: fluentd
        image: fluentd:v1
        resources:
          limits:
            memory: 200Mi       # Always set resource limits on DaemonSet pods --
          requests:             # they run on EVERY node, so overuse multiplies across the cluster
            cpu: 100m
            memory: 100Mi
        volumeMounts:           # Log collectors typically need access to host log directories
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:               # Mount the node's /var/log into the pod
          path: /var/log

4. Jobs & CronJobs

Job

Runs a pod until it completes successfully (exit code 0), then stops. Unlike Deployments which run forever, Jobs are for finite work. The analogy: a Deployment is a permanent employee, while a Job is a contractor hired for a specific task.
  • Use cases: Database migrations, batch data processing, report generation, one-off admin tasks.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:v2
        command: ["python", "manage.py", "migrate"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:           # Pull credentials from a Kubernetes Secret, not hardcoded
              name: db-credentials
              key: url
      restartPolicy: Never          # Never = do not restart the container on failure (create new pod instead)
                                    # OnFailure = restart the container in the same pod
  backoffLimit: 4                   # Retry up to 4 times on failure before marking the Job as failed
  activeDeadlineSeconds: 300        # Kill the Job if it runs longer than 5 minutes (prevents hung migrations)

CronJob

Creates Jobs on a schedule, like Linux cron but managed by Kubernetes. Useful for periodic tasks that should run in your cluster environment.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"            # Run at 2:00 AM every day (standard cron syntax)
                                    # Interpreted in the kube-controller-manager's time zone unless spec.timeZone is set
  concurrencyPolicy: Forbid         # Do not start a new Job if the previous one is still running
                                    # Options: Allow (default), Forbid, Replace
  successfulJobsHistoryLimit: 3     # Keep the last 3 successful Job records (for debugging)
  failedJobsHistoryLimit: 5         # Keep the last 5 failed Job records
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
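            image: backup-tool:latest   # Placeholder image; pin a specific tag in production
            # Assumes DATABASE_URL is set in env and a volume is mounted at /backups (both omitted here)
            command: ["/bin/sh", "-c", "pg_dump $DATABASE_URL | gzip > /backups/$(date +%Y%m%d).sql.gz"]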
          restartPolicy: OnFailure  # Retry the container within the same pod on failure
CronJob gotcha: By default, concurrencyPolicy is Allow, meaning a new Job starts even if the previous one is still running. For backups or migrations, always set Forbid to prevent overlapping runs that could corrupt data or double-process records.

Key Takeaways

| Workload | Use Case | Example |
|---|---|---|
| Deployment | Stateless apps | Web servers, APIs |
| StatefulSet | Stateful apps | Databases, Kafka |
| DaemonSet | Node agents | Logging, Monitoring |
| Job | One-off tasks | DB Migration |
| CronJob | Scheduled tasks | Backups |

5. Horizontal Pod Autoscaler (HPA)

Automatically scales the number of pods based on observed CPU/memory utilization or custom metrics. Instead of guessing how many replicas you need, HPA watches actual usage and adjusts. Think of it like a restaurant that opens more registers when the line gets long, and closes them when traffic dies down.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment        # The Deployment this HPA controls
  minReplicas: 2                # Never go below 2 pods (maintains availability even at idle)
  maxReplicas: 10               # Cap at 10 pods (cost protection -- prevents runaway scaling)
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Target: average CPU at 70% of the pods' CPU requests
  - type: Resource              # With multiple metrics, HPA computes a replica count for each and uses the highest
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Target: average memory at 80% of the pods' memory requests
  behavior:                     # Fine-tune scaling speed (available in autoscaling/v2)
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down (prevents flapping)
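
The replica count HPA aims for follows the formula given in the Kubernetes HPA documentation. A worked example with assumed numbers (4 current pods averaging 90% CPU utilization against the 70% target in the manifest above):

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 4 \times \frac{90}{70} \right\rceil
  = \lceil 5.14 \rceil
  = 6
\]
```

The result is then clamped to the range between minReplicas and maxReplicas, so in this example the HPA would scale from 4 to 6 pods.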

HPA vs VPA

| Feature | HPA | VPA |
|---|---|---|
| What it scales | Number of pods | Resources per pod |
| Best for | Stateless apps | Stateful apps, right-sizing |
| Can run together? | Yes, but don’t use the same metric | Use different metrics |
Interview Tip: HPA requires resource requests to be set. Without requests, HPA cannot calculate utilization percentage.
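
Because utilization is measured against requests, the containers in the target Deployment need them set. A minimal fragment, with the container name, image, and values as illustrative assumptions:

```yaml
# Fragment of the pod template in api-deployment (the HPA's scaleTargetRef)
spec:
  containers:
  - name: api                 # Hypothetical container name
    image: example/api:1.0    # Hypothetical image
    resources:
      requests:
        cpu: 250m             # A 70% utilization target means 175m average CPU per pod
        memory: 256Mi         # An 80% utilization target means about 205Mi average memory per pod
```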

6. Pod Disruption Budgets (PDB)

Ensures a minimum number of pods remain available during voluntary disruptions (node drains, cluster upgrades, autoscaler scale-down). Without a PDB, draining nodes can evict all your pods at nearly the same time, causing downtime. A PDB is your safety net: it tells Kubernetes “you can evict pods from this node, but you must keep at least N running at all times.”
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # At least 2 pods must remain running during disruptions
  # OR use the alternative approach:
  # maxUnavailable: 1    # At most 1 pod can be down at any time
  # Choose one style: minAvailable is easier to reason about for small replica counts,
  # maxUnavailable scales better (e.g., "1 pod down" works whether you have 3 or 30 replicas)
  selector:
    matchLabels:
      app: api
How it works in practice: When you run kubectl drain node-3 to take a node offline for maintenance, Kubernetes checks PDBs before evicting each pod. If evicting the next pod would violate a PDB (taking the available count below minAvailable), the drain command waits and retries until the eviction is allowed. Cluster Autoscaler scale-down respects PDBs as well. Spot instance preemptions honor them only if a node termination handler drains the node through the Eviction API, and only within the cloud provider's notice window.

7. Deployment Strategies Deep Dive

Rolling Update (Default)

Updates pods incrementally, ensuring availability.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%   # Max pods that can be unavailable (default 25%, rounded down)
    maxSurge: 25%         # Max extra pods during update (default 25%, rounded up)

Recreate

Kills all existing pods before creating new ones. This causes downtime, but guarantees that only one version is running at a time.
  • Use Case: When you cannot run two versions simultaneously — for example, a database migration that changes the schema in a way incompatible with the old code, or an application that holds an exclusive lock on a resource.
strategy:
  type: Recreate
  # No additional configuration needed -- Kubernetes simply scales old pods to 0, then creates new ones

Blue-Green Deployment (Manual with Services)

Run two identical environments. Switch traffic instantly by updating the Service selector. The advantage over rolling updates: if the new version has problems, you switch back in seconds because the old pods are still running.
# Blue Deployment (current version, currently receiving traffic)
metadata:
  name: app-blue
spec:
  template:
    metadata:
      labels:
        version: blue    # Pod label: a Service selects pods, not Deployments

# Green Deployment (new version, deployed but NOT receiving traffic yet)
metadata:
  name: app-green
spec:
  template:
    metadata:
      labels:
        version: green

# Service selector determines which pods get traffic
# To cut over: change 'blue' to 'green' and apply
# To roll back: change 'green' back to 'blue'
spec:
  selector:
    version: blue  # Switch to 'green' when you have validated the new version

Canary Deployment

Route a small percentage of traffic to the new version to validate it with real users before a full rollout. If metrics look good, increase the percentage. If something breaks, only a small fraction of users are affected.
  • Requires an Ingress controller (like Nginx Ingress with canary annotations) or a Service Mesh (Istio, Linkerd) for fine-grained traffic splitting.
  • Without these tools, you can approximate canary deploys by running a small Deployment alongside the main one, both behind the same Service.
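
As a sketch of the Ingress-based option, ingress-nginx marks a second Ingress as a canary through annotations and sends it a share of requests. The host, Service name, and 10% weight below are assumptions for illustration, and a primary Ingress for the same host and path, pointing at the stable Service, is assumed to exist already.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"         # Mark this Ingress as the canary
    nginx.ingress.kubernetes.io/canary-weight: "10"    # Send roughly 10% of requests here
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com            # Hypothetical host; must match the primary Ingress
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary         # Hypothetical Service in front of the new version's pods
            port:
              number: 80
```

Raising the weight in steps and reapplying shifts more traffic to the new version; deleting the canary Ingress sends everything back to the stable one.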

8. Pod Affinity & Anti-Affinity

Control where pods are scheduled based on node labels and the presence of other pods. Affinity is about attraction (“schedule me near X”), anti-affinity is about repulsion (“schedule me away from X”). Together, they give you fine-grained control over pod placement for performance, availability, and compliance requirements.

Node Affinity

Schedule pods on nodes with specific labels. Use this when your workload has hardware requirements (SSD storage, GPU, specific CPU architecture) or geographic constraints (must run in a specific availability zone).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:  # Hard rule: MUST match, or pod stays pending
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd                    # Only schedule on nodes labeled disktype=ssd
    # preferredDuringSchedulingIgnoredDuringExecution is the soft version:
    # "Try to place me on SSD nodes, but if none are available, regular nodes are fine"

Pod Anti-Affinity

Spread pods across nodes or zones to avoid single points of failure. This is one of the most important patterns for production availability: without it, the scheduler might place all your replicas on the same node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:  # Hard rule: never co-locate with matching pods
    - labelSelector:
        matchLabels:
          app: redis                  # Do not place this pod on a node that already runs a redis pod
      topologyKey: "kubernetes.io/hostname"  # "hostname" means each node is a separate domain
      # Use "topology.kubernetes.io/zone" to spread across availability zones instead
Production best practice: For services with 3+ replicas, prefer preferredDuringSchedulingIgnoredDuringExecution (soft constraint) over required (hard constraint). A hard constraint with topologyKey: zone in a 2-zone cluster means the third replica cannot be scheduled at all — it stays in Pending forever. A soft constraint lets the scheduler place it on a less-ideal node rather than leaving it unscheduled. Use topologyKey: "topology.kubernetes.io/zone" to spread pods across availability zones for high availability.

9. Taints & Tolerations

Taints are applied to nodes to repel pods — like a “Do Not Enter” sign. Tolerations are applied to pods to let them ignore specific taints — like having a badge that lets you through a restricted door. This pair gives you control over which workloads run on which nodes. Real-world examples: Taint GPU nodes so only ML workloads get scheduled there. Taint high-memory nodes for databases only. Taint spot/preemptible nodes so only fault-tolerant batch jobs land on them.
# Taint a node -- no pods will be scheduled here unless they tolerate this taint
kubectl taint nodes node1 gpu=true:NoSchedule

# Remove a taint (note the trailing minus sign)
kubectl taint nodes node1 gpu=true:NoSchedule-

# Pod with toleration -- this pod CAN be scheduled on the tainted node
# (a toleration permits scheduling there but does not force it)
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
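| Effect | Behavior |
|---|---|
| NoSchedule | New pods that do not tolerate the taint won’t be scheduled |
| PreferNoSchedule | Soft version: the scheduler avoids the node if possible |
| NoExecute | Evicts running pods that do not tolerate the taint and blocks new scheduling |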
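
A toleration only allows a pod onto a tainted node; it does not steer the pod there. To dedicate the GPU node from the example above to ML workloads, a pod typically pairs the toleration with a node selector. A sketch, assuming the node also carries a hypothetical accelerator=gpu label and using a placeholder image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer                  # Hypothetical pod name
spec:
  nodeSelector:
    accelerator: gpu             # Assumed node label; attracts the pod to GPU nodes
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"         # Lets the pod past the gpu=true:NoSchedule taint
  containers:
  - name: trainer
    image: example/trainer:1.0   # Placeholder image
```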

Interview Questions & Answers

| Aspect | Deployment | StatefulSet |
|---|---|---|
| Pod Names | Random (app-abc123) | Ordered (app-0, app-1) |
| Storage | Shared or ephemeral | Dedicated PVC per pod |
| Scaling Order | Parallel | Sequential (0→1→2) |
| Use Case | Stateless apps | Databases, Kafka |

ReplicaSet vs Deployment:
  • ReplicaSet ensures a specified number of pod replicas are running
  • Deployment manages ReplicaSets and provides:
    • Rolling updates
    • Rollback capability
    • Update history
  • You almost never create ReplicaSets directly; use Deployments instead.

Steps in a rolling update:
  1. A new ReplicaSet is created with the updated pod template
  2. New pods are created in the new ReplicaSet
  3. Old pods are terminated from the old ReplicaSet (respecting maxUnavailable)
  4. Old ReplicaSet is kept for rollback (scaled to 0)

To spread replicas across zones, use Pod Anti-Affinity with a topologyKey:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: "topology.kubernetes.io/zone"
Or use Pod Topology Spread Constraints for more control.
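
As a sketch of the topology spread alternative, the fragment below asks the scheduler to keep the per-zone pod counts within 1 of each other. The app: my-app label is carried over from the anti-affinity example, and whenUnsatisfiable: ScheduleAnyway is chosen here to keep it a soft constraint.

```yaml
# Fragment of a pod template spec
topologySpreadConstraints:
- maxSkew: 1                                 # Max allowed difference in matching pod count between zones
  topologyKey: "topology.kubernetes.io/zone"
  whenUnsatisfiable: ScheduleAnyway          # Soft rule; DoNotSchedule would make it a hard rule
  labelSelector:
    matchLabels:
      app: my-app
```
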
Init containers run before app containers and are used for:
  • Wait for dependencies (database, service)
  • Clone git repos
  • Run database migrations
  • Generate config files
They run sequentially and must complete successfully before app containers start.
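
A minimal sketch of the wait-for-dependency case: the init container loops until a Service name resolves in DNS, and only then does the app container start. The Service name db, the images, and the container names are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init              # Hypothetical pod name
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.28
    # Retry every 2 seconds until the (assumed) Service "db" resolves in cluster DNS
    command: ["sh", "-c", "until nslookup db; do echo waiting for db; sleep 2; done"]
  containers:
  - name: app
    image: example/app:1.0         # Placeholder image
```
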

To debug a pod that keeps crashing:
  1. Check logs: kubectl logs <pod> --previous
  2. Describe pod: kubectl describe pod <pod> (check Events)
  3. Check resources: Is the container OOMKilled?
  4. Exec into container: kubectl exec -it <pod> -- sh (if it starts briefly)
  5. Override command: Create a debug pod with command: ["/bin/sleep", "infinity"]

Common Pitfalls

  1. Missing Resource Requests and Limits: Without requests, the scheduler does not know how much capacity a pod needs, leading to overcommitted nodes. Without limits, a single runaway pod can consume all node memory and get OOM-killed, or starve other pods of CPU. Always set both.
  2. No Readiness Probes: Without a readiness probe, Kubernetes routes traffic to a pod the moment the container starts, even if the application has not finished initializing. This causes 502/503 errors during deploys and restarts. The readiness probe tells Kubernetes “this pod is ready to receive traffic.”
  3. Single Replica Deployments in Production: One replica means any pod restart (deploy, node failure, OOM kill) causes downtime. Always run at least 2-3 replicas for any user-facing service. The marginal cost of a second pod is tiny compared to the cost of an outage.
  4. Not Using PDBs: During cluster upgrades or node maintenance, Kubernetes drains nodes by evicting pods. Without a PDB, it can evict all your replicas at once. A PDB with maxUnavailable: 1 ensures rolling eviction instead of simultaneous termination.
  5. Forgetting Pod Anti-Affinity: Without anti-affinity rules, the scheduler may place all 3 replicas of your Deployment on the same node. If that node fails, all replicas go down together, defeating the purpose of running multiple replicas. Always spread production workloads across nodes (or better, across availability zones).
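
For the readiness probe pitfall, a probe can sit next to the liveness probe in the nginx Deployment from section 1. The path and timings below are illustrative assumptions; a real application would usually expose a dedicated health endpoint.

```yaml
# Fragment of a container spec
containers:
- name: nginx
  image: nginx:1.21
  readinessProbe:             # Pod is removed from Service endpoints while this check fails
    httpGet:
      path: /                 # Assumption: the root path returns 200 once nginx is serving
      port: 80
    initialDelaySeconds: 5    # Wait before the first check
    periodSeconds: 10         # Check every 10 seconds
    failureThreshold: 3       # Mark unready after 3 consecutive failures
```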

Interview Deep-Dive

Strong Answer:
  • When redis-1 restarts with an empty PVC, it starts as a fresh instance. If the Redis cluster uses replication, the empty replica triggers a full resync from the primary (BGSAVE, transfer RDB file, then stream ongoing writes).
  • If redis-1 was the primary before failure, the cluster should have promoted a replica already (via Sentinel or Redis Cluster mode). The empty redis-1 rejoins as a replica.
  • The PVC is still bound to a PV whose backing volume is gone. I need to delete the PVC, let the StatefulSet’s volumeClaimTemplate create a new one, and delete the pod to force it to pick up the new PVC.
  • Preventive measure: schedule VolumeSnapshots so recovery means restoring from a recent snapshot rather than a full resync from the primary.
Follow-up: Why does a StatefulSet enforce ordered startup, and when would you disable it? Ordered startup matters for primary/replica databases. Pod 0 (typically the primary) must be ready before replicas start, because replicas connect to the primary for replication. Starting simultaneously could cause split-brain. For peer-to-peer workloads (Cassandra, Elasticsearch), ordered startup wastes time. Set podManagementPolicy: Parallel to start all pods simultaneously.
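
The podManagementPolicy setting is a single field on the StatefulSet spec. A fragment with hypothetical names (selector and pod template omitted):

```yaml
# Fragment of a StatefulSet for a peer-to-peer workload
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra                  # Hypothetical name
spec:
  serviceName: "cassandra"         # Hypothetical headless Service
  podManagementPolicy: Parallel    # Default is OrderedReady; Parallel launches and terminates pods all at once
  replicas: 3
```
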
Strong Answer:
  • Rolling Update: Replaces pods incrementally. Old and new versions coexist. Risk: unpredictable fraction of traffic hits the new version before you can detect problems.
  • Blue-Green: Two full deployments. Traffic switches all at once via Service selector change. Instant rollback, but all-or-nothing exposure.
  • Canary: Route a small percentage (say 5%) to the new version. Monitor error rates, latency, business metrics. Gradually increase if healthy.
  • For $10M/day payment processing, I would choose canary. A rolling update exposes an uncontrolled fraction of traffic. Blue-green gives instant switch but no gradual validation. Canary with Istio lets me send 1% of traffic to the new version, watch payment success rates for 15 minutes, then ramp up. The blast radius is 1% instead of 100%.
Follow-up: How do you implement canary deployments without a service mesh? Run two Deployments behind the same Service. If the main has 9 replicas and canary has 1, roughly 10% of traffic goes to the canary. For finer control, nginx-ingress supports canary annotations that split traffic at the Ingress level regardless of pod count.
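
The replica-ratio approach can be sketched as one Service plus two Deployments that share a label. All names, the track label, and the images are hypothetical; the 9:1 split comes from the replica counts alone.

```yaml
# Service: selects on the shared label only, so it load-balances across both Deployments
apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  selector:
    app: payments
  ports:
  - port: 80
---
# Stable Deployment: 9 of the 10 pods behind the Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: payments
      track: stable
  template:
    metadata:
      labels:
        app: payments
        track: stable
    spec:
      containers:
      - name: payments
        image: example/payments:1.0   # Placeholder: current version
---
# Canary Deployment: 1 of the 10 pods, so roughly 10% of requests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payments
      track: canary
  template:
    metadata:
      labels:
        app: payments
        track: canary
    spec:
      containers:
      - name: payments
        image: example/payments:1.1   # Placeholder: new version
```

The extra track label keeps each Deployment's selector distinct, so neither one adopts the other's pods.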
Strong Answer:
  • Check recent job runs: kubectl get jobs --sort-by=.status.startTime. For failed jobs, check pod logs and kubectl describe job for events.
  • If the cleanup query runs against a growing dataset, it might exceed activeDeadlineSeconds on data-heavy nights.
  • Check concurrencyPolicy. If set to Allow (default), a new job starts even if the previous one is running. Two cleanup jobs could deadlock on database locks. Set concurrencyPolicy: Forbid.
  • Check resource limits. If the job loads data into memory, it might get OOMKilled on heavy nights.
  • Add logging with timestamps and row counts, and increase activeDeadlineSeconds with margin.
Follow-up: What is the difference between backoffLimit and activeDeadlineSeconds in a Job? backoffLimit controls retry count after pod failure (exit code non-zero, OOMKilled). Each retry uses exponential backoff. activeDeadlineSeconds is a wall-clock timeout for the entire job: Kubernetes kills all pods if exceeded. backoffLimit handles crashes (retry the work), activeDeadlineSeconds handles hangs (kill and give up). You typically want both set.
