# Kubernetes Fundamentals

> Core Kubernetes concepts and architecture

Master the core concepts of Kubernetes (K8s) container orchestration and understand its architecture.

***

## What is Kubernetes?

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery.

* Manages container lifecycle, scheduling, and health
* Automatically scales apps up or down based on demand
* Restarts failed containers, and replaces or kills containers that don't respond
* Distributes network traffic to maintain stability

***

## Kubernetes Architecture

A Kubernetes cluster consists of a **Control Plane** and a set of **Worker Nodes**.

```mermaid theme={null}
graph TD
    subgraph Control Plane
        API[API Server]
        Etcd[(etcd)]
        Sched[Scheduler]
        CM[Controller Manager]
    end
    subgraph Worker Node 1
        Kubelet1[Kubelet]
        Proxy1[Kube-Proxy]
        Pod1[Pod]
        Pod2[Pod]
    end
    subgraph Worker Node 2
        Kubelet2[Kubelet]
        Proxy2[Kube-Proxy]
        Pod3[Pod]
    end
    API --- Etcd
    API --- Sched
    API --- CM
    API --- Kubelet1
    API --- Kubelet2
```

### Control Plane Components

The "brain" of the cluster. Think of it like the management team of a large warehouse: the API Server is the front desk that takes all orders, etcd is the filing cabinet that stores the master records, the Scheduler is the floor manager who assigns work to available workers, and the Controller Manager is the quality inspector who constantly checks that reality matches the plan.

* **API Server**: The frontend for the K8s control plane. Exposes the Kubernetes API. Every `kubectl` command, every controller, every other component talks through the API Server -- it is the single point of entry.
* **etcd**: Consistent and highly available key-value store for all cluster data. If etcd is lost and unrecoverable, your entire cluster state is gone. Back it up.
* **Scheduler**: Watches for newly created Pods with no assigned node, and selects a node for them to run on based on resource availability, constraints, and affinity rules.
* **Controller Manager**: Runs controller processes (e.g., Node Controller, Job Controller). Each controller is an infinite reconciliation loop that watches for drift between desired state and actual state and corrects it.

### Node Components

Run on every node, maintaining running pods and providing the Kubernetes runtime environment.

* **Kubelet**: An agent that runs on each node. It ensures that the containers described in each Pod's spec are running and healthy.
* **Kube-Proxy**: Maintains network rules on nodes. Allows network communication to your Pods.
* **Container Runtime**: The software that is responsible for running containers (e.g., containerd, CRI-O). Docker Engine can still be used, but since Kubernetes 1.24 it requires the `cri-dockerd` adapter.

***

## Core Objects

### 1. Pods

The smallest deployable unit in Kubernetes. A Pod is not a container -- it is a wrapper around one or more containers that share a network and storage context. Think of a Pod as a shared apartment: each container (roommate) has its own room (filesystem), but they share the kitchen (network namespace) and living room (volumes).

* Represents a single instance of a running process.
* Can contain one or more containers (usually one, but sidecar patterns are common).
* Containers in a Pod share:
  * **Network**: Same IP address and port space (can talk via `localhost`).
  * **Storage**: Shared volumes.

### 2. Namespaces

Virtual clusters backed by the same physical cluster.

* Used to divide cluster resources between multiple users/teams.
* Examples: `default`, `kube-system`, `dev`, `prod`.

***

## kubectl Basics

`kubectl` is the command-line tool for communicating with the Kubernetes API server.
### Cluster Info & Navigation

```bash theme={null}
# Check cluster status
kubectl cluster-info

# List all nodes
kubectl get nodes

# List all namespaces
kubectl get namespaces

# Set default namespace context
kubectl config set-context --current --namespace=dev
```

### Viewing Resources

```bash theme={null}
# List pods in current namespace
kubectl get pods

# List pods with more details (IP, Node)
kubectl get pods -o wide

# List pods in all namespaces
kubectl get pods -A

# Describe a specific pod (Crucial for debugging!)
kubectl describe pod my-pod

# View pod logs
kubectl logs my-pod
kubectl logs my-pod -c my-container  # If multi-container
kubectl logs -f my-pod               # Follow logs
```

### Interacting with Pods

```bash theme={null}
# Execute command inside a container
kubectl exec -it my-pod -- /bin/bash
kubectl exec -it my-pod -- /bin/sh  # If bash isn't available

# Port forward (Access pod from localhost)
kubectl port-forward my-pod 8080:80
```

***

## Creating Your First Pod

Kubernetes objects are typically defined in YAML files.

### Imperative (CLI)

Quick for testing, but not recommended for production.

```bash theme={null}
kubectl run nginx --image=nginx:latest --restart=Never
```

### Declarative (YAML)

The "Infrastructure as Code" way.

```yaml theme={null}
# nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: web
    env: dev
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
```

**Apply the configuration:**

```bash theme={null}
# Create/Update resource
kubectl apply -f nginx-pod.yaml

# Verify
kubectl get pods -l app=web

# Delete
kubectl delete -f nginx-pod.yaml
```

***

## Pod Lifecycle

1. **Pending**: Pod accepted by the cluster, but one or more containers are not yet running. This covers time spent waiting to be scheduled and pulling images.
2. **Running**: Pod bound to a node, all containers created, at least one running (or starting or restarting).
3. **Succeeded**: All containers terminated successfully (exit code 0) and will not be restarted.
4. **Failed**: All containers terminated, at least one with failure (non-zero exit code or killed by the system).
5. **Unknown**: State cannot be obtained, typically because the node hosting the Pod cannot be reached.

***

## Resource Management

Every container should have **resource requests and limits** defined.

### Requests vs Limits

This is one of the most important concepts in Kubernetes resource management. Requests are your reservation -- "I need at least this much." Limits are your ceiling -- "I can never use more than this." The analogy: a request is like reserving a table at a restaurant (guaranteed capacity), and a limit is the maximum tab you can run up.

| Concept     | Description                  | Interview Insight                                                                                                          |
| ----------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **Request** | Minimum resources guaranteed | Used by Scheduler to place pods. If a node does not have enough unrequested capacity, the pod will not be scheduled there. |
| **Limit**   | Maximum resources allowed    | Enforced at runtime. Memory limit exceeded = OOMKilled. CPU limit exceeded = throttled (not killed).                       |

```yaml theme={null}
resources:
  requests:
    memory: "64Mi"   # Scheduler reserves 64Mi on the node for this container
    cpu: "250m"      # 0.25 CPU cores (250 millicores). 1000m = 1 full core.
  limits:
    memory: "128Mi"  # Container is OOMKilled if it exceeds 128Mi
    cpu: "500m"      # Container is throttled (not killed) above 0.5 cores
```

**Common Interview Question**: What happens when a container exceeds its memory limit?

* The container is **OOMKilled** (Out of Memory Killed) by the kernel.
* If CPU limit is exceeded, the container is **throttled**, not killed.
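The memory-limit behavior described above can be reproduced with a small experiment. The manifest below is a minimal sketch rather than part of the original material: the Pod name `oom-demo` is hypothetical, and it assumes the third-party `polinux/stress` image (which ships the `stress` tool) can be pulled in your cluster. The container is told to allocate roughly 250M while its limit is 100Mi, so the kernel should kill it shortly after it starts.

```yaml
# oom-demo.yaml (hypothetical name; illustration only)
apiVersion: v1
kind: Pod
metadata:
  name: oom-demo
spec:
  restartPolicy: Never     # keep the terminated state visible instead of restarting
  containers:
  - name: stress
    image: polinux/stress  # assumption: third-party image that provides `stress`
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]  # allocate ~250M and hold it
    resources:
      requests:
        memory: "50Mi"
      limits:
        memory: "100Mi"    # well below what the process tries to allocate
```

After `kubectl apply -f oom-demo.yaml`, `kubectl describe pod oom-demo` would be expected to show the container terminated with reason `OOMKilled` and exit code 137. Swapping the memory limit for a low CPU limit should instead produce a slow but running container, which is the throttling case.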
### Quality of Service (QoS) Classes

Kubernetes assigns QoS classes based on resource settings:

| QoS Class      | Condition                                                                                   | Eviction Priority   |
| -------------- | ------------------------------------------------------------------------------------------- | ------------------- |
| **Guaranteed** | Every container sets requests = limits (both CPU and memory)                                | Last to be evicted  |
| **Burstable**  | At least one request or limit is set, but the Guaranteed criteria are not met               | Middle priority     |
| **BestEffort** | No requests or limits on any container                                                      | First to be evicted |

***

## Health Probes (Critical for Interviews!)

Probes allow Kubernetes to know when to restart or route traffic to a container. Without probes, Kubernetes only knows if the main process exited -- it has no way to detect a deadlocked application or a service that is running but unable to handle requests.

### Liveness Probe

"Is the container alive?" - If it fails, the container is **restarted**. Use this to detect deadlocks, infinite loops, or corrupted state where the process is running but not functional.

```yaml theme={null}
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
```

### Readiness Probe

"Is the container ready to receive traffic?" - If it fails, the Pod is **removed from Service endpoints** (no traffic routed to it), but the container is NOT restarted. Use this for temporary conditions like warming caches or waiting for a downstream dependency.

```yaml theme={null}
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
```

### Startup Probe

"Has the application started?" - For slow-starting apps (JVM warmup, large ML model loading, database migrations). Disables liveness/readiness probes until it succeeds, preventing premature restarts during boot.

```yaml theme={null}
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
```

**Interview Tip**: Always explain the difference between liveness and readiness probes. Liveness restarts containers; Readiness controls traffic routing.
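In practice the three probes usually sit together on one container. The manifest below is a minimal sketch of that combination, reusing the paths and port from the fragments above; the Pod name `probes-demo` and the image `registry.example.com/my-app:1.0` are hypothetical placeholders for an application that serves `/healthz` and `/ready` on port 8080.

```yaml
# probes-demo.yaml (hypothetical name; illustration only)
apiVersion: v1
kind: Pod
metadata:
  name: probes-demo
spec:
  containers:
  - name: app
    image: registry.example.com/my-app:1.0  # hypothetical image
    ports:
    - containerPort: 8080
    startupProbe:            # liveness and readiness wait until this succeeds
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30   # 30 attempts x 10s = up to 300s to finish starting
      periodSeconds: 10
    livenessProbe:           # repeated failure => kubelet restarts the container
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:          # failure => Pod removed from Service endpoints, no restart
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

Because the startup probe already covers the boot window, this sketch leaves `initialDelaySeconds` off the liveness and readiness probes; whether that suits a given application is a judgment call.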
***

## etcd Deep Dive

**etcd** is the "source of truth" for Kubernetes. Understanding it is crucial for interviews.

### Key Facts

* **Distributed key-value store** using Raft consensus
* Stores all cluster state: Pods, Services, Secrets, ConfigMaps
* **Strongly consistent** - reads return the most recent write
* Typically runs as a **3 or 5 node cluster** (odd numbers for quorum)

### Common Interview Questions

**What happens if etcd goes down?**

* Existing workloads **continue running** (kubelet manages local pods)
* **No new operations** possible (no scheduling, no changes through the API)
* Cluster state is effectively **frozen** until etcd recovers

**How do you back up etcd?**

```bash theme={null}
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```

**What is etcd quorum?**

Quorum = floor(n/2) + 1. For 3 nodes, quorum is 2, so the cluster tolerates 1 failure; for 5 nodes, quorum is 3, so it tolerates 2. If you lose quorum, etcd rejects writes until quorum is restored.

***

## RBAC Basics

Role-Based Access Control (RBAC) regulates access to Kubernetes resources.
### Key Components

| Resource               | Scope        | Description                                                              |
| ---------------------- | ------------ | ------------------------------------------------------------------------ |
| **Role**               | Namespace    | Defines permissions within a namespace                                   |
| **ClusterRole**        | Cluster-wide | Defines permissions across all namespaces and for cluster-scoped objects |
| **RoleBinding**        | Namespace    | Binds Role to users/groups/service accounts                              |
| **ClusterRoleBinding** | Cluster-wide | Binds ClusterRole cluster-wide                                           |

```yaml theme={null}
# Role: Can read pods in "dev" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
# RoleBinding: Bind to user "jane"
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

***

## Interview Questions & Answers

**What is the difference between a Pod and a container?**

A **Container** is an isolated process (or small group of processes) started from an image, with its own filesystem. A **Pod** is a Kubernetes abstraction that can contain one or more containers that share:

* Network namespace (same IP, communicate via localhost)
* Storage volumes
* Lifecycle (created and destroyed together)

**How does the Scheduler place a Pod?**

1. Watches for unscheduled Pods (via API Server)
2. **Filtering**: Eliminates nodes that don't meet requirements (resources, taints, nodeSelector)
3. **Scoring**: Ranks remaining nodes based on priorities (least utilized, affinity rules)
4. **Binding**: Assigns Pod to the highest-scoring node

**What is the difference between `kubectl create` and `kubectl apply`?**

* **create**: Creates a resource.
Fails if it already exists.
* **apply**: Creates the resource or updates it to match the manifest. Idempotent: re-running it with an unchanged manifest changes nothing. Recommended for GitOps workflows.

**What happens when a node fails?**

1. **Node Controller** marks the node as `NotReady` once heartbeats have been missing for the grace period (about 40s by default, set with `--node-monitor-grace-period`)
2. After roughly 5 minutes by default, pods on the node are evicted (historically `pod-eviction-timeout`; in current releases, the default 300-second tolerations for the `not-ready` and `unreachable` taints)
3. **Deployment/ReplicaSet** controllers create replacement pods on healthy nodes

**What is a sidecar container?**

A container that runs alongside the main application container in the same Pod to provide supporting functionality:

* **Logging**: Collects and ships logs (e.g., Fluentd sidecar)
* **Service Mesh**: Handles networking (e.g., Envoy proxy in Istio)
* **Security**: Handles TLS termination or secrets injection

***

## Key Takeaways

* **Control Plane** manages the cluster; **Nodes** run the applications.
* **Pods** are the atomic unit of scheduling.
* Use **Declarative (YAML)** configuration for reproducibility.
* Always define **resource requests and limits**.
* Implement **liveness and readiness probes** for production workloads.
* `kubectl describe` and `kubectl logs` are your best friends for debugging.

***

## Interview Deep-Dive

**Walk through what happens when you run `kubectl run nginx --image=nginx`.**

**Strong Answer:**

* kubectl serializes the command into an API request (a Pod object in JSON) and sends it to the API server over HTTPS.
* The API server authenticates the request (client certificate or bearer token), authorizes it (RBAC check -- does this user have permission to create pods in this namespace?), and runs it through admission controllers (mutating webhooks might inject a sidecar, validating webhooks might enforce a label requirement).
* The validated Pod object is persisted to etcd. At this point, the Pod exists in the cluster state but has no node assigned -- its `nodeName` field is empty.
* The scheduler is watching the API server (which serves watches on the state stored in etcd) for pods with no node assignment.
It picks up this pod, runs its filtering phase (which nodes have enough CPU and memory, match nodeSelector, tolerate taints), then its scoring phase (which of the remaining nodes is the best fit), and binds the pod by sending the selected node name to the API server, which records it in the pod's `spec.nodeName` in etcd.
* The kubelet on the selected node is also watching for pods assigned to its node. It sees the new pod, pulls the nginx image through the container runtime (containerd via CRI), creates the pause container to hold network namespaces, then starts the nginx container inside that pod sandbox.
* The kubelet reports the pod's status back to the API server, which updates etcd. Now `kubectl get pods` shows the pod as Running.

**Follow-up: If the scheduler cannot find any node for the pod, what happens? How would you debug it?**

The pod stays in Pending state indefinitely. Running `kubectl describe pod <pod-name>` shows FailedScheduling events with a message like "0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint that the pod didn't tolerate." The fix depends on the root cause: if it is resources, you either scale down other workloads, add nodes, or reduce the pod's resource requests. If it is taints, you either add a toleration to the pod or remove the taint from the node. A related trap: when a ResourceQuota rejects pods that lack explicit resource requests, the pods are never created at all, so there is nothing Pending to describe; the error shows up as a `FailedCreate` event on the ReplicaSet instead. I have seen teams burn hours on that.

**A pod keeps getting OOMKilled. How do you debug it?**

**Strong Answer:**

* First, I would check `kubectl describe pod <pod-name>` to confirm the exit code is 137 (128 + 9, i.e. SIGKILL) with reason `OOMKilled`, and see which container is being killed. The Last State section shows the exact reason.
* Next, I would look at actual memory usage over time, not just the limit. If the cluster runs Prometheus, I would query `container_memory_working_set_bytes` for the container over the past hour to see whether usage is steadily climbing (memory leak) or spiking under load (insufficient allocation).
* If it is a gradual climb, the application has a memory leak.
Increasing the limit only delays the inevitable. The fix is in the application code -- profiling with language-specific tools (Java heap dumps, Go pprof, Python tracemalloc).
* If usage is spiking under load, the limit is genuinely too low. I would check whether the requests and limits are set correctly. A common mistake is setting requests equal to limits (Guaranteed QoS) at a value that is too low. The pod gets exactly what it asks for and no more, so any burst kills it.
* I would also check if the pod is running a JVM-based application, because the JVM has its own memory management. If `-XX:MaxRAMPercentage` (or `-Xmx`) is not set sensibly, heap plus non-heap memory (metaspace, thread stacks, direct buffers) can exceed the container limit, and the kernel kills the container before the JVM even knows it is out of memory.

**Follow-up: What is the difference between the memory metric in `kubectl top pod` and what the OOM killer actually uses?**

`kubectl top pod` reports the working set: total cgroup memory usage minus inactive file-backed page cache, which approximates the memory the kernel cannot easily reclaim. The cgroup limit, however, is enforced against total usage, which includes the page cache. The kernel reclaims cache before it invokes the OOM killer, but pages it cannot free quickly (active or dirty cache) still count, so an application that reads or writes large files can be pushed toward the limit even though its own heap is small. This is why `container_memory_working_set_bytes` (from cAdvisor/Prometheus) is a better metric than raw RSS for understanding OOM risk.

**Explain the difference between liveness, readiness, and startup probes.**

**Strong Answer:**

* Liveness probe answers "is this container alive?" If it fails, kubelet kills and restarts the container. Use it when your application can get into a deadlocked or hung state that only a restart can fix.
* Readiness probe answers "is this container ready to receive traffic?" If it fails, the pod is removed from Service endpoints but NOT restarted. Use it when the application needs time to warm up (load a cache, connect to a database) or when it is temporarily overloaded.
* Startup probe answers "has this application finished starting?"
It runs before liveness and readiness probes. While it is running, those other probes are disabled. Use it for slow-starting applications (large Java apps, ML model loading) to avoid liveness probes killing the container before it finishes booting.
* A real scenario: a team configured a liveness probe with `initialDelaySeconds: 5` on a Java application that took 90 seconds to start. During every deployment, the liveness probe failed because the app was not ready yet, so kubelet kept killing and restarting the container, which restarted the 90-second boot, which failed the liveness check again. The pod went into CrashLoopBackOff and never became healthy. The fix was adding a startup probe with a high `failureThreshold` (30 retries at 10-second intervals = 5-minute window to start), which disabled the liveness probe during boot.

**Follow-up: Should a readiness probe check downstream dependencies like the database, or only the application itself?**

This is a trade-off with strong opinions on both sides. If the readiness probe checks the database and the database goes down, ALL pods become unready simultaneously, which means the Service has zero endpoints and returns 503 to every request. That is often worse than serving degraded responses. My preference is to have the readiness probe check only the application's own health, and handle downstream failures with circuit breakers and graceful degradation in the application code. The exception is during startup -- checking that the database connection is established before accepting traffic is reasonable.

***

Next: [Kubernetes Workloads →](/courses/devops-tools/kubernetes-workloads)