Kubernetes Workloads
Learn how to manage applications using Kubernetes controllers. Think of controllers as autopilots for your applications: you declare the desired state (three replicas, this container image, this much memory), and the controller continuously works to make reality match your declaration. If a pod crashes, the controller replaces it. If a node goes down, the controller reschedules pods elsewhere. You stop managing individual processes and start managing intent.
1. Deployments
The standard way to run stateless applications (web servers, APIs, microservices). A Deployment manages ReplicaSets, which in turn manage Pods. You almost never create ReplicaSets or Pods directly; Deployments handle the orchestration for you.
- ReplicaSet: Ensures the specified number of identical pod copies are always running.
- Rolling Updates: Replaces pods incrementally, so your application stays available during deploys.
- Rollbacks: Kubernetes keeps previous ReplicaSets around (scaled to zero), so you can revert instantly if a deploy goes wrong.
Managing Deployments
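A minimal sketch of a Deployment manifest, assuming a hypothetical app named `web`. The name, labels, image tag, and resource requests are illustrative assumptions, not values from this guide.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 3                   # desired number of identical pods
  selector:
    matchLabels:
      app: web                  # must match the pod template labels below
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27     # illustrative image and tag
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```

Assuming the manifest is saved as `deployment.yaml`, the usual management commands are `kubectl apply -f deployment.yaml`, `kubectl rollout status deployment/web`, `kubectl rollout history deployment/web`, `kubectl rollout undo deployment/web`, and `kubectl scale deployment/web --replicas=5`.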
2. StatefulSets
Used for stateful applications (databases, Kafka, Zookeeper, Redis clusters) where each instance has its own identity and data. The analogy: a Deployment is like a row of identical vending machines (any one can serve any request), while a StatefulSet is like numbered lockers (each one stores specific content, and the number matters).
- Stable Network ID: Pods get predictable names like `mysql-0`, `mysql-1` (not random hashes). Each pod gets a DNS entry like `mysql-0.mysql.default.svc.cluster.local`.
- Stable Storage: Each pod gets its own PersistentVolumeClaim, and that claim is preserved even if the pod is deleted and recreated. The data follows the identity.
- Ordered Deployment: Pods start in order (0, then 1, then 2). Pod 1 does not start until pod 0 is running and ready. This is critical for databases with primary/replica topology.
- Ordered Termination: Pods are deleted in reverse order (2, then 1, then 0). Replicas shut down before the primary.
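The properties above can be sketched as a headless Service plus a StatefulSet. This is a hedged example: the `mysql-secret` Secret, the image tag, and the storage size are assumptions, and the manifest only starts three independent MySQL instances (replication setup is not shown).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None               # headless: each pod gets its own DNS record
  selector:
    app: mysql
  ports:
    - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql            # references the headless Service above
  replicas: 3                   # creates mysql-0, mysql-1, mysql-2 in order
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.4      # illustrative tag
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret    # hypothetical Secret, created separately
                  key: root-password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:         # one PVC per pod: data-mysql-0, data-mysql-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi       # assumes a default StorageClass exists
```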
3. DaemonSets
Ensures that all (or some) nodes run a copy of a Pod. Think of a DaemonSet as placing an agent on every machine in your fleet: when a new node joins the cluster, the DaemonSet automatically schedules a pod on it. When a node is removed, the pod is garbage collected.
- Use cases: Log collectors (Fluentd, Filebeat), monitoring agents (Prometheus Node Exporter, Datadog agent), network plugins (CNI), storage drivers.
- Key difference from Deployments: A Deployment runs N replicas spread across nodes. A DaemonSet runs exactly one pod per node (or per qualifying node).
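As an illustration of the monitoring-agent use case, here is a minimal DaemonSet sketch for Prometheus Node Exporter. The image tag and the blanket toleration are assumptions; a real deployment usually also mounts host paths and sets resource limits.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
        - operator: Exists      # tolerate every NoSchedule taint so the agent
          effect: NoSchedule    # also runs on tainted nodes
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.8.2   # illustrative tag
          ports:
            - containerPort: 9100            # node exporter's default port
```

Note that there is no `replicas` field: the number of pods follows the number of matching nodes.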
4. Jobs & CronJobs
Job
Runs a pod until it completes successfully (exit code 0), then stops. Unlike Deployments which run forever, Jobs are for finite work. The analogy: a Deployment is a permanent employee, while a Job is a contractor hired for a specific task.
- Use cases: Database migrations, batch data processing, report generation, one-off admin tasks.
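A minimal Job sketch for the database-migration use case. The image and command are hypothetical placeholders, and the limits are illustrative values.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                    # hypothetical name
spec:
  backoffLimit: 3                     # retry a failed pod up to 3 times
  activeDeadlineSeconds: 600          # give up if the whole Job exceeds 10 minutes
  ttlSecondsAfterFinished: 3600       # clean up the finished Job after an hour
  template:
    spec:
      restartPolicy: Never            # Jobs require Never or OnFailure
      containers:
        - name: migrate
          image: registry.example.com/app-migrations:1.0.0   # hypothetical image
          command: ["./migrate", "up"]                       # hypothetical command
```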
CronJob
Creates Jobs on a schedule, like Linux cron but managed by Kubernetes. Useful for periodic tasks that should run in your cluster environment.
Key Takeaways
| Workload | Use Case | Example |
|---|---|---|
| Deployment | Stateless apps | Web servers, APIs |
| StatefulSet | Stateful apps | Databases, Kafka |
| DaemonSet | Node agents | Logging, Monitoring |
| Job | One-off tasks | DB Migration |
| CronJob | Scheduled tasks | Backups |
5. Horizontal Pod Autoscaler (HPA)
Automatically scales the number of pods based on observed CPU/memory utilization or custom metrics. Instead of guessing how many replicas you need, HPA watches actual usage and adjusts. Think of it like a restaurant that opens more registers when the line gets long, and closes them when traffic dies down.
HPA vs VPA
| Feature | HPA | VPA |
|---|---|---|
| What it scales | Number of pods | Resources per pod |
| Best for | Stateless apps | Stateful apps, right-sizing |
| Can run together? | Yes, but don’t use same metric | Use different metrics |
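For the HPA column, a minimal sketch targeting CPU utilization. It assumes a Deployment named `web` (hypothetical), a metrics source such as metrics-server in the cluster, and CPU requests set on the target containers, since utilization is computed relative to requests.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                   # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU exceeds 70% of requests
```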
6. Pod Disruption Budgets (PDB)
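A PodDisruptionBudget is a small standalone object that selects pods by label. A minimal sketch, assuming a hypothetical set of pods labeled `app: web`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2               # evictions are refused if they would leave fewer than 2 pods
  selector:
    matchLabels:
      app: web                  # hypothetical label; must match the protected pods
```

`maxUnavailable` can be used instead of `minAvailable`; a single PDB sets one or the other, not both.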
A PodDisruptionBudget ensures that a minimum number of pods remain available during voluntary disruptions (node drains, cluster upgrades, and spot instance reclamation when a termination handler drains the node). Without a PDB, a node drain could evict all your pods simultaneously, causing downtime. A PDB is your safety net: it tells Kubernetes “you can evict pods from this node, but you must keep at least N running at all times.”
7. Deployment Strategies Deep Dive
Rolling Update (Default)
Updates pods incrementally, ensuring availability.
Recreate
Kills all existing pods before creating new ones. This causes downtime, but guarantees that only one version is running at a time.
- Use Case: When you cannot run two versions simultaneously, for example a database migration that changes the schema in a way incompatible with the old code, or an application that holds an exclusive lock on a resource.
Blue-Green Deployment (Manual with Services)
Run two identical environments. Switch traffic instantly by updating the Service selector. The advantage over rolling updates: if the new version has problems, you switch back in seconds because the old pods are still running.
Canary Deployment
Route a small percentage of traffic to the new version to validate it with real users before a full rollout. If metrics look good, increase the percentage. If something breaks, only a small fraction of users are affected.
- Requires an Ingress controller (like Nginx Ingress with canary annotations) or a Service Mesh (Istio, Linkerd) for fine-grained traffic splitting.
- Without these tools, you can approximate canary deploys by running a small Deployment alongside the main one, both behind the same Service.
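A sketch of the Nginx Ingress approach, assuming the ingress-nginx controller is installed and a primary Ingress already exists for the same host and path pointing at the stable Service. The host and Service names are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"         # mark this Ingress as a canary
    nginx.ingress.kubernetes.io/canary-weight: "10"    # send roughly 10% of requests here
spec:
  ingressClassName: nginx
  rules:
    - host: app.example.com            # hypothetical host, same as the primary Ingress
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-canary       # hypothetical Service in front of the new version
                port:
                  number: 80
```

Raising the weight step by step shifts more traffic to the new version; deleting the canary Ingress sends everything back to the stable Service.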
8. Pod Affinity & Anti-Affinity
Control where pods are scheduled based on node labels and the presence of other pods. Affinity is about attraction (“schedule me near X”), anti-affinity is about repulsion (“schedule me away from X”). Together, they give you fine-grained control over pod placement for performance, availability, and compliance requirements.
Node Affinity
Schedule pods on nodes with specific labels. Use this when your workload has hardware requirements (SSD storage, GPU, specific CPU architecture) or geographic constraints (must run in a specific availability zone).
Pod Anti-Affinity
Spread pods across nodes or zones to avoid single points of failure. This is one of the most important patterns for production availability: without it, the scheduler might place all your replicas on the same node.
9. Taints & Tolerations
Taints are applied to nodes to repel pods, like a “Do Not Enter” sign. Tolerations are applied to pods to let them ignore specific taints, like having a badge that lets you through a restricted door. This pair gives you control over which workloads run on which nodes. Real-world examples: Taint GPU nodes so only ML workloads get scheduled there. Taint high-memory nodes for databases only. Taint spot/preemptible nodes so only fault-tolerant batch jobs land on them.
| Effect | Behavior |
|---|---|
| NoSchedule | New pods won’t be scheduled |
| PreferNoSchedule | Soft version - avoid if possible |
| NoExecute | Evict existing pods + no new scheduling |
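As a sketch of the GPU example, assume a node has been tainted with `kubectl taint nodes <node> gpu=true:NoSchedule` and also carries a `gpu=true` label. The key, value, label, and image here are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  tolerations:
    - key: gpu                  # matches the hypothetical taint gpu=true:NoSchedule
      operator: Equal
      value: "true"
      effect: NoSchedule
  nodeSelector:
    gpu: "true"                 # hypothetical node label; steers the pod to GPU nodes
  containers:
    - name: trainer
      image: registry.example.com/ml/trainer:1.0.0   # hypothetical image
```

A toleration only permits scheduling onto the tainted node; it does not attract the pod there. That is why the sketch pairs it with a `nodeSelector` (node affinity would work as well).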
Interview Questions & Answers
What is the difference between a Deployment and a StatefulSet?
| Aspect | Deployment | StatefulSet |
|---|---|---|
| Pod Names | Random (app-abc123) | Ordered (app-0, app-1) |
| Storage | Shared or ephemeral | Dedicated PVC per pod |
| Scaling Order | Parallel | Sequential (0→1→2) |
| Use Case | Stateless apps | Databases, Kafka |
How does a ReplicaSet differ from a Deployment?
- ReplicaSet ensures a specified number of pod replicas are running
- Deployment manages ReplicaSets and provides:
  - Rolling updates
  - Rollback capability
  - Update history
- You almost never create ReplicaSets directly; use Deployments instead.
What happens when you update a Deployment?
- A new ReplicaSet is created with the updated pod template
- New pods are created in the new ReplicaSet
- Old pods are terminated from the old ReplicaSet (respecting `maxUnavailable`)
- Old ReplicaSet is kept for rollback (scaled to 0)
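The pace of the steps above is controlled by the Deployment's update strategy. A hedged sketch with illustrative values (when unset, `maxSurge` and `maxUnavailable` both default to 25%):

```yaml
# Fragment of a Deployment manifest; only the update-related fields are shown.
spec:
  replicas: 4
  revisionHistoryLimit: 10      # how many old ReplicaSets are kept for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1               # at most 1 extra pod above the desired count
      maxUnavailable: 0         # never drop below the desired count during the update
```

With these values the rollout runs at most five pods at once and removes an old pod only after a new one is ready.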
How would you ensure pods are spread across availability zones?
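Use pod anti-affinity or topology spread constraints with a zone-level `topologyKey`, so that each availability zone counts as a separate placement domain.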
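One way to spread replicas across zones is pod anti-affinity keyed on the well-known node label `topology.kubernetes.io/zone`. This is a sketch only: the app name and image are hypothetical, and it assumes the nodes carry that zone label, as they normally do on managed cloud clusters.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" is a soft rule: the scheduler tries to avoid zones that
          # already run an app=web pod, but still schedules if it cannot.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web
                topologyKey: topology.kubernetes.io/zone
      containers:
        - name: web
          image: nginx:1.27     # illustrative image
```

Using `requiredDuringSchedulingIgnoredDuringExecution` instead makes the rule hard, at the cost of pods staying Pending when there are more replicas than zones.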
What is the purpose of init containers?
- Wait for dependencies (database, service)
- Clone git repos
- Run database migrations
- Generate config files
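Init containers run to completion, one after another, before the app containers start. A minimal sketch of the "wait for dependencies" case, assuming a hypothetical Service named `db` in the `default` namespace and a hypothetical app image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Loop until the db Service resolves in cluster DNS, then exit 0.
      command:
        - sh
        - -c
        - until nslookup db.default.svc.cluster.local; do echo "waiting for db"; sleep 2; done
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # hypothetical image
```

This only checks that the Service name resolves, not that the database accepts connections; a stricter check would probe the port with a client tool available in the init image.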
How do you debug a CrashLoopBackOff?
- Check logs: `kubectl logs <pod> --previous`
- Describe pod: `kubectl describe pod <pod>` (check Events)
- Check resources: Is the container OOMKilled?
- Exec into container: `kubectl exec -it <pod> -- sh` (if it starts briefly)
- Override command: Create a debug pod with `command: ["/bin/sleep", "infinity"]`
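The last step can be sketched as a standalone debug pod that reuses the crashing container's image but replaces its entrypoint. The pod name and image are hypothetical, and the sketch assumes the image ships a `sleep` that accepts `infinity`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-debug
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # hypothetical; use the crashing pod's image
      command: ["/bin/sleep", "infinity"]     # keep the container alive instead of running the app
```

Once it is running, `kubectl exec -it app-debug -- sh` gives a shell in which the original start command can be run by hand to see why it fails.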
Common Pitfalls
Interview Deep-Dive
You are running a StatefulSet for a 3-node Redis cluster. Node 2 loses its disk. The pod restarts but redis-1 comes up with an empty data directory. What happens and how do you recover?
- When `redis-1` restarts with an empty PVC, it starts as a fresh instance. If the Redis cluster uses replication, the empty replica triggers a full resync from the primary (BGSAVE, transfer RDB file, then stream ongoing writes).
- If `redis-1` was the primary before failure, the cluster should have promoted a replica already (via Sentinel or Redis Cluster mode). The empty `redis-1` rejoins as a replica.
- The PVC is still bound to a PV whose backing volume is gone. I need to delete the PVC, let the StatefulSet’s volumeClaimTemplate create a new one, and delete the pod to force it to pick up the new PVC.
- Preventive measure: schedule VolumeSnapshots so recovery means restoring from a recent snapshot rather than a full resync from the primary.
If strict ordering is not required, set `podManagementPolicy: Parallel` to start all pods simultaneously.
Compare Rolling Update, Blue-Green, and Canary deployments. For a payment processing service handling $10M/day, which would you choose and why?
- Rolling Update: Replaces pods incrementally. Old and new versions coexist. Risk: unpredictable fraction of traffic hits the new version before you can detect problems.
- Blue-Green: Two full deployments. Traffic switches all at once via Service selector change. Instant rollback, but all-or-nothing exposure.
- Canary: Route a small percentage (say 5%) to the new version. Monitor error rates, latency, business metrics. Gradually increase if healthy.
- For $10M/day payment processing, I would choose canary. A rolling update exposes an uncontrolled fraction of traffic. Blue-green gives instant switch but no gradual validation. Canary with Istio lets me send 1% of traffic to the new version, watch payment success rates for 15 minutes, then ramp up. The blast radius is 1% instead of 100%.
A CronJob that runs a nightly database cleanup started failing intermittently. Some nights it completes, others it times out. How do you troubleshoot this?
- Check recent job runs: `kubectl get jobs --sort-by=.status.startTime`. For failed jobs, check pod logs and `kubectl describe job` for events.
- If the cleanup query runs against a growing dataset, it might exceed `activeDeadlineSeconds` on data-heavy nights.
- Check `concurrencyPolicy`. If set to `Allow` (default), a new job starts even if the previous one is running. Two cleanup jobs could deadlock on database locks. Set `concurrencyPolicy: Forbid`.
- Check resource limits. If the job loads data into memory, it might get OOMKilled on heavy nights.
- Add logging with timestamps and row counts, and increase `activeDeadlineSeconds` with margin.
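The settings discussed above can be combined in one CronJob. This is a sketch under assumptions: the schedule, image, deadline, and memory values are illustrative and would need tuning against the real cleanup workload.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-cleanup                    # hypothetical name
spec:
  schedule: "0 2 * * *"               # every night at 02:00
  concurrencyPolicy: Forbid           # skip a run if the previous one is still active
  startingDeadlineSeconds: 300        # count a run as missed if it cannot start within 5 minutes
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5           # keep failed Jobs around for inspection
  jobTemplate:
    spec:
      backoffLimit: 2                 # retries after pod failure
      activeDeadlineSeconds: 3600     # wall-clock limit for the whole Job, with margin
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: registry.example.com/db-cleanup:1.0.0   # hypothetical image
              resources:
                requests:
                  memory: 256Mi
                limits:
                  memory: 512Mi       # exceeding this gets the container OOMKilled
```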
What is the difference between `backoffLimit` and `activeDeadlineSeconds` in a Job?
`backoffLimit` controls the retry count after pod failure (non-zero exit code, OOMKilled). Each retry uses exponential backoff. `activeDeadlineSeconds` is a wall-clock timeout for the entire job: Kubernetes kills all pods if it is exceeded. `backoffLimit` handles crashes (retry the work), `activeDeadlineSeconds` handles hangs (kill and give up). You typically want both set.