Real Interview Questions

This module contains actual interview questions from infrastructure, observability, and platform engineering roles at top companies. Each question includes detailed solutions and the key insights interviewers are looking for.

What to Expect: These questions require deep understanding, not memorization
Interview Format: Usually 45-60 minute deep dives into 2-3 topics
Time to Prepare: 10-12 hours to work through all scenarios

Company-Specific Patterns

Different companies emphasize different areas:

Company	Focus Areas	Style
Datadog	eBPF, tracing, metrics collection	Hands-on implementation
Grafana Labs	Observability stack, performance	System design + coding
Cloudflare	Network stack, performance	Deep Linux networking
Chronosphere	Time series, observability	Architecture + internals
Meta Infra	Large scale systems	Design + debugging
Netflix	Performance, containers	Deep dives + scenarios

Observability Company Questions

Question 1: Implement a Syscall Counter (Datadog-style)

Context: You’re asked to implement a tool that counts syscalls by process in production without significant overhead.

The Question: “Design and implement a production-safe syscall counter. It should show the top processes by syscall count in real-time. Discuss the trade-offs of different approaches.”

Discussion Points

Interviewer is looking for:

Knowledge of different approaches (strace, perf, eBPF)
Understanding of overhead implications
Production safety considerations
Sampling vs complete counting trade-offs

Key trade-offs to discuss:

strace: Per-syscall ptrace, very high overhead (~100x slowdown)
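perf: perf trace buffers tracepoint events, so overhead is far lower than strace, but events can be dropped under load; sampling with perf record misses syscalls by design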
eBPF tracepoints: Low overhead (~1-5%), production-safe
eBPF kprobes: Slightly higher overhead, more flexible

Solution Approach

Best approach: eBPF tracepoint

// syscall_counter.bpf.c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);  // PID
    __type(value, u64);  // count
} syscall_count SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);  // PID
    __type(value, char[16]);  // comm
} pid_comm SEC(".maps");

SEC("tracepoint/raw_syscalls/sys_enter")
int tracepoint__raw_syscalls__sys_enter(struct trace_event_raw_sys_enter *ctx)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 *count = bpf_map_lookup_elem(&syscall_count, &pid);
    
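    if (count) {
        // Atomic add: several CPUs can update the same PID concurrently
        __sync_fetch_and_add(count, 1);
    } else {
        u64 initial = 1;
        // BPF_NOEXIST: do not overwrite a count another CPU inserted first
        bpf_map_update_elem(&syscall_count, &pid, &initial, BPF_NOEXIST);
        
        // Store comm for this PID
        char comm[16];
        bpf_get_current_comm(&comm, sizeof(comm));
        bpf_map_update_elem(&pid_comm, &pid, &comm, BPF_ANY);
    }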
    
    return 0;
}

char LICENSE[] SEC("license") = "GPL";

User-space component:

Periodically read map, sort by count
Display top N processes
Handle PID reuse (track start time)
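
The user-space steps above can be sketched roughly as follows. This is a minimal Python illustration, assuming the map contents have already been read into plain dictionaries (how the map is read, and how the process start time is obtained, are left out). The `(pid, start_time)` key, the function names, and the sample data are assumptions for the example, not part of the BPF program above.

```python
from typing import Dict, List, Tuple

# Key is (pid, process start time) so that a reused PID counts as a new process.
ProcKey = Tuple[int, int]


def interval_delta(prev: Dict[ProcKey, int], curr: Dict[ProcKey, int]) -> Dict[ProcKey, int]:
    """Syscalls made since the previous snapshot, per process."""
    return {key: total - prev.get(key, 0) for key, total in curr.items()}


def top_n(delta: Dict[ProcKey, int], comms: Dict[int, str], n: int = 10) -> List[Tuple[int, str, int]]:
    """Return (pid, comm, count) for the n busiest processes, highest first."""
    ranked = sorted(delta.items(), key=lambda item: item[1], reverse=True)[:n]
    return [(pid, comms.get(pid, "?"), count) for (pid, _start), count in ranked]


# Hypothetical snapshots taken one interval apart.
prev = {(100, 5): 40, (200, 7): 10}
curr = {(100, 5): 90, (200, 99): 3, (300, 8): 75}  # PID 200 was reused by a new process
comms = {100: "nginx", 200: "sh", 300: "postgres"}

for pid, comm, count in top_n(interval_delta(prev, curr), comms, n=2):
    print(f"{pid:>6} {comm:<16} {count}")
```

Keying on the start time means a recycled PID starts from zero instead of inheriting the old process's count.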

Production Considerations

What makes this production-safe:

Bounded map size: Won’t consume unlimited memory
No locks in hot path: Per-CPU increments would be ideal
Graceful degradation: If map full, just skip new PIDs
Low overhead: Tracepoint, not ptrace

Improvements for production:

// Use a per-CPU hash map for counting (no locks or atomics; user space sums the per-CPU values)
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 10240);
    __type(key, u32);
    __type(value, u64);
} syscall_count SEC(".maps");

Overhead estimation:

~50-100ns per syscall
On busy system (100K syscalls/sec): ~0.5-1% of one CPU core
Acceptable for production monitoring
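
The estimate is simple arithmetic: events per second times cost per event. A small sketch, using the per-syscall cost range and syscall rate quoted above (the function name is made up for the example):

```python
def tracing_cpu_fraction(syscalls_per_sec: float, cost_ns: float) -> float:
    """Fraction of one CPU core spent inside the tracing hook."""
    return syscalls_per_sec * cost_ns / 1e9


for cost_ns in (50, 100):
    fraction = tracing_cpu_fraction(100_000, cost_ns)
    print(f"{cost_ns} ns per syscall -> {fraction:.1%} of one core")
```

The same function shows how quickly the budget changes at higher rates, for example at 1M syscalls/sec.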

Question 2: Debug High Latency in Production

Context: A service is experiencing intermittent latency spikes. You need to identify the cause without restarting the service.

The Question: “A production service has p99 latency spikes from 10ms to 500ms every few minutes. How would you debug this? Walk me through your approach.”

Investigation Framework

Systematic approach:

Characterize the problem:
- When do spikes occur? (Time correlation)
- Which requests are affected? (Endpoint, payload)
- Duration of spikes? (Seconds, minutes)
Gather baseline metrics:
- CPU utilization (is there contention?)
- Memory usage (swapping? GC?)
- Disk I/O (latency, throughput)
- Network (retransmits, latency)
Narrow down the layer:
- Application code?
- Runtime (GC pauses)?
- Kernel (scheduling, I/O)?
- Hardware (disk, network)?
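
The first step, working out when the spikes occur, can be made concrete by computing p99 per time window. A minimal sketch, assuming `(unix_time, latency_ms)` pairs can be exported from access logs or metrics; the function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def p99_by_window(samples: Iterable[Tuple[float, float]], window_s: int = 60) -> Dict[int, float]:
    """Group (unix_time, latency_ms) samples into windows; return p99 keyed by window start."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[int(ts // window_s) * window_s].append(latency_ms)
    return {start: percentile(vals, 99) for start, vals in sorted(buckets.items())}


# Hypothetical data: one quiet minute, then a minute containing a single slow request.
samples = [(t, 10.0) for t in range(0, 119)] + [(119, 500.0)]
for start, p99 in p99_by_window(samples).items():
    print(f"window starting at {start}s: p99 = {p99} ms")
```

Windows whose p99 jumps can then be lined up against GC logs, cron schedules, or I/O metrics for the same timestamps.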

Tools and Commands

Quick triage:

# Check for scheduling issues (record system-wide first, then summarize per task)
sudo perf sched record -- sleep 5
sudo perf sched latency -s max

# Check for off-CPU time
sudo offcputime-bpfcc -p <pid> 5

# Check for I/O latency
sudo biolatency-bpfcc 5

# Check for memory pressure
cat /proc/<pid>/status | grep -E "VmRSS|VmSwap"
sar -B 1 5  # Page faults

# Check for lock contention
sudo perf lock record -p <pid> -- sleep 5
sudo perf lock report

Deep analysis with bpftrace:

# Trace slow syscalls (the target PID is passed as positional parameter $1)
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter /pid == $1/ {
    @start[tid] = nsecs;
}
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
    $lat = (nsecs - @start[tid]) / 1000000;
    if ($lat > 10) {
        printf("%s syscall %d took %d ms\n", comm, args->id, $lat);
        @slow[args->id] = count();
    }
    delete(@start[tid]);
}' <pid>

Check for GC pauses (if JVM/Go):

# Java
jstat -gc <pid> 1000

# Go - print one line per GC cycle (GODEBUG is read at startup, so use a restart window or a canary instance)
GODEBUG=gctrace=1 ./myapp

Common Causes and Solutions

Likely causes of intermittent spikes:

Garbage Collection:
- Symptom: Regular, predictable spikes
- Detection: GC logs show long pauses
- Fix: Tune GC, reduce allocation rate
Disk I/O:
- Symptom: Correlates with writes/fsync
- Detection: biolatency shows spikes
- Fix: Async I/O, better storage
Memory Pressure:
- Symptom: During memory spikes
- Detection: sar -B shows page faults
- Fix: Increase memory, reduce footprint
CPU Throttling (containers):
- Symptom: Regular, consistent spikes
- Detection: nr_throttled and throttled_usec rising in the cgroup's cpu.stat (cat /sys/fs/cgroup/cpu.stat inside the container)
- Fix: Increase CPU limits
Network Issues:
- Symptom: Affects network calls
- Detection: tcpretrans, ss -ti
- Fix: Check network path, timeouts
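
For the CPU throttling case, the detection step can be sketched as below: sample `cpu.stat` twice and look at how many scheduler periods were throttled in between. This is a rough illustration, assuming cgroup v2 field names (`nr_periods`, `nr_throttled`); the sample values are invented.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Share of enforcement periods between two samples in which the cgroup was throttled."""
    periods = after["nr_periods"] - before["nr_periods"]
    throttled = after["nr_throttled"] - before["nr_throttled"]
    return throttled / periods if periods > 0 else 0.0


# Illustrative values, not taken from a real system.
before = parse_cpu_stat("usage_usec 5000000\nnr_periods 1000\nnr_throttled 20\nthrottled_usec 150000\n")
after = parse_cpu_stat("usage_usec 5900000\nnr_periods 1100\nnr_throttled 45\nthrottled_usec 900000\n")
print(f"throttled in {throttled_fraction(before, after):.0%} of periods")
```

A non-trivial fraction that lines up with the latency spikes points at the CPU limit rather than at the application.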

Question 3: Container Memory Behavior

The Question: “Explain what happens when a container hits its memory limit. What are the different behaviors, and how would you debug an OOM-killed container?”

Memory Limit Behavior

When container approaches memory limit:

Usage < memory.high
└── Normal operation

Usage > memory.high
└── Throttling begins
└── Reclaim pressure increases
└── Application may slow down

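Usage reaches memory.max
└── Kernel attempts reclaim within the cgroup
└── If reclaim cannot free enough, the cgroup OOM killer is invoked
└── A process in the cgroup is killed (all of them if memory.oom.group is set)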

Cgroup v2 memory controls:

memory.current: Current usage
memory.max: Hard limit (OOM if exceeded)
memory.high: Soft limit (throttling)
memory.low: Best-effort protection
memory.min: Hard protection
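
How the controls relate to the behavior above can be sketched as a small classifier. This is an illustration only, assuming the values have been read from `memory.current`, `memory.high`, and `memory.max` (limit files hold either a byte count or the literal `max`); the function names and state labels are made up for the example.

```python
from typing import Optional

MIB = 1024 * 1024


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when the file contains 'max' (no limit)."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def memory_state(current: int, high: Optional[int], maximum: Optional[int]) -> str:
    """Classify usage relative to the soft (memory.high) and hard (memory.max) limits."""
    if maximum is not None and current >= maximum:
        return "at_max"       # reclaim, then OOM kill if reclaim fails
    if high is not None and current > high:
        return "above_high"   # throttling and reclaim pressure
    return "normal"


high, maximum = parse_limit("419430400\n"), parse_limit("536870912\n")  # 400 MiB, 512 MiB
for current in (100 * MIB, 450 * MIB, 512 * MIB):
    print(current // MIB, "MiB ->", memory_state(current, high, maximum))
```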

Debugging OOM Kills

Immediate diagnostics:

# Check kernel logs
dmesg | grep -i "oom\|killed"

# Check container events
docker events --filter 'event=oom'

# Get OOM details from journald
journalctl -k | grep -A 10 "invoked oom-killer"

Understanding OOM output (abridged, illustrative values):

Memory cgroup out of memory: Killed process 12345 (myapp)
total-vm:2097152kB, anon-rss:1048576kB, file-rss:0kB, shmem-rss:0kB

Key indicators:
- anon-rss: Heap, stack, anonymous mmap
- file-rss: Mapped files (can be evicted)
- shmem-rss: Shared memory

Memory profiling:

# Track memory usage over time
docker stats <container>

# Get detailed memory breakdown
cat /sys/fs/cgroup/<path>/memory.stat

# Profile with BPF
sudo memleak-bpfcc -p <pid>

Prevention Strategies

Proper memory sizing:

Profile application under realistic load
Account for peak usage, not average
Include headroom for GC, file cache

Kubernetes recommendations:

resources:
  requests:
    memory: "256Mi"  # For scheduling
  limits:
    memory: "512Mi"  # Hard limit (2x requests typical)

Application-level protections:

Set JVM heap < container limit (-Xmx)
Use memory-aware allocators
Implement backpressure mechanisms

Infrastructure Company Questions

Question 4: Network Stack Performance (Cloudflare-style)

The Question: “Explain the journey of a packet from the NIC to the application. Where are the performance bottlenecks, and how would you optimize for high packet rates?”

Packet Journey

┌─────────────────────────────────────────────────────────────────────┐
│                     PACKET RECEIVE PATH                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. NIC receives packet                                              │
│     └─→ DMA to ring buffer in memory                                │
│     └─→ Raise interrupt                                             │
│                                                                      │
│  2. Interrupt handler (hardirq)                                      │
│     └─→ Acknowledge interrupt                                       │
│     └─→ Schedule NAPI poll (softirq)                                │
│     └─→ Disable further interrupts for this queue                  │
│                                                                      │
│  3. NAPI poll (softirq context)                                     │
│     └─→ Poll ring buffer for packets                                │
│     └─→ XDP (if attached, native mode) - earliest hook              │
│     └─→ Allocate sk_buff structures                                 │
│     └─→ Process up to budget packets                                │
│     └─→ Re-enable interrupts if done                                │
│                                                                      │
│  4. Network stack processing                                         │
│     └─→ tc ingress                                                   │
│     └─→ netfilter/iptables                                          │
│     └─→ IP routing                                                   │
│     └─→ TCP/UDP processing                                          │
│                                                                      │
│  5. Socket layer                                                     │
│     └─→ Socket buffer (sk_buff queue)                               │
│     └─→ Wake up waiting application                                 │
│                                                                      │
│  6. Application                                                      │
│     └─→ read()/recv() copies to user space                         │
│     └─→ Process data                                                │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Performance Bottlenecks

Common bottlenecks:

Interrupt overhead:
- Each interrupt: ~1-2μs
- At 1M pps with one interrupt per packet: 1-2 s of CPU per second, i.e. one to two full cores spent only on interrupts
- Solution: NAPI, interrupt coalescing
Memory allocation:
- sk_buff allocation per packet
- Solution: Page pools, recycling
Lock contention:
- Socket lock for each packet
- Solution: SO_REUSEPORT, RSS
Cache misses:
- Packet data not in cache
- Solution: Busy polling, NUMA awareness
Context switches:
- Waking application per packet
- Solution: Batching, busy polling
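
The interrupt figure, and why coalescing helps, follows from a short calculation. A sketch using the per-interrupt cost quoted above; the batch size of 64 packets per interrupt is an assumed example value and the function name is made up.

```python
def irq_cpu_cores(packets_per_sec: float, irq_cost_us: float, packets_per_irq: int = 1) -> float:
    """CPU cores consumed by interrupt handling alone."""
    interrupts_per_sec = packets_per_sec / packets_per_irq
    return interrupts_per_sec * irq_cost_us / 1e6


for batch in (1, 64):
    cores = irq_cpu_cores(1_000_000, irq_cost_us=2, packets_per_irq=batch)
    print(f"{batch:>2} packet(s) per interrupt -> {cores:.3f} cores")
```

Batching divides the interrupt rate by the batch size, which is the effect NAPI polling and interrupt coalescing aim for.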

Optimization Techniques

Hardware level:

# Enable RSS (Receive Side Scaling)
ethtool -L eth0 combined 8

# Configure interrupt coalescing
ethtool -C eth0 rx-usecs 50 rx-frames 64

# Pin interrupts to CPUs
echo 1 > /proc/irq/XX/smp_affinity

Kernel level:

# Increase socket buffer sizes
sysctl -w net.core.rmem_max=26214400

# Enable busy polling
sysctl -w net.core.busy_poll=50
sysctl -w net.core.busy_read=50

# Tune NAPI: defer the GRO flush (value is in nanoseconds; 20000 = 20 μs)
echo 20000 > /sys/class/net/eth0/gro_flush_timeout

Application level:

// Use SO_REUSEPORT for multi-thread scaling
int opt = 1;
setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));

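// Use recvmmsg for batch receiving (each msgs[i].msg_hdr needs its own iovec and buffer)
struct mmsghdr msgs[BATCH_SIZE];
memset(msgs, 0, sizeof(msgs));
for (int i = 0; i < BATCH_SIZE; i++) {
    msgs[i].msg_hdr.msg_iov = &iov[i];   // iov[i] points at a preallocated packet buffer
    msgs[i].msg_hdr.msg_iovlen = 1;
}
int ret = recvmmsg(fd, msgs, BATCH_SIZE, 0, NULL);  // number of messages received, or -1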

Ultimate performance: XDP

Process packets before sk_buff allocation
10M+ pps on single core
Used by Cloudflare, Facebook

Question 5: CPU Isolation for Low Latency

The Question: “We need sub-millisecond latency for a trading system. How would you configure Linux to minimize jitter?”

Sources of Jitter

Kernel sources:

Timer interrupts (every 1-4ms)
RCU callbacks
Kernel threads (kworker, ksoftirqd)
System call overhead

Hardware sources:

SMI (System Management Interrupt)
Cache pollution
NUMA remote access
Power management (C-states)

Isolation Configuration

Boot parameters:

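# /etc/default/grub (the value must be a single line; comments cannot go inside the quotes)
#   isolcpus=2,3,4,5        Exclude from general scheduler load balancing
#   nohz_full=2,3,4,5       Stop the timer tick while a single task is running
#   rcu_nocbs=2,3,4,5       Offload RCU callbacks
#   irqaffinity=0,1         Keep IRQs off isolated CPUs
#   intel_pstate=disable    Disable the intel_pstate frequency driver
#   processor.max_cstate=0  Limit C-states
#   idle=poll               Poll instead of sleep
#   nosoftlockup            Don't check for lockups
#   nmi_watchdog=0          Disable NMI watchdog
#   audit=0                 Disable audit
#   skew_tick=1             Offset per-CPU timer ticks so they do not fire together
GRUB_CMDLINE_LINUX="isolcpus=2,3,4,5 nohz_full=2,3,4,5 rcu_nocbs=2,3,4,5 irqaffinity=0,1 intel_pstate=disable processor.max_cstate=0 idle=poll nosoftlockup nmi_watchdog=0 audit=0 skew_tick=1"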

CPU affinity:

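# Move existing processes (all threads) to housekeeping CPUs 0,1 first,
# so the loop does not re-pin the trading app; per-CPU kernel threads cannot be moved
for pid in $(ps -eo pid --no-headers); do
    taskset -a -p 0x3 "$pid" >/dev/null 2>&1  # CPUs 0,1
done

# Then pin critical threads to isolated CPUs
# (isolcpus disables load balancing there, so pin each thread to its own CPU)
taskset -c 2,3,4,5 ./trading_app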

IRQ affinity:

# Move all IRQs to housekeeping CPUs
for irq in /proc/irq/*/smp_affinity; do
    echo 3 > $irq 2>/dev/null  # CPUs 0,1
done
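
The hex values used above (`0x3` for taskset, `3` for `smp_affinity`) are CPU bitmasks: bit N set means CPU N is allowed. A small sketch for converting in both directions; the function names are made up, and the comma-separated 32-bit grouping that `smp_affinity` uses on machines with more than 32 CPUs is not handled.

```python
from typing import Iterable, List


def cpus_to_mask(cpus: Iterable[int]) -> str:
    """Hex affinity mask with one bit set per listed CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(mask_hex: str) -> List[int]:
    """List the CPU numbers whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


print("housekeeping CPUs 0,1 ->", cpus_to_mask([0, 1]))
print("isolated CPUs 2-5     ->", cpus_to_mask([2, 3, 4, 5]))
print("mask 3c               ->", mask_to_cpus("3c"))
```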

Verification and Testing

Verify isolation:

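# Check no IRQs on isolated CPUs (column 1 is the IRQ, columns 4-7 are CPUs 2-5)
awk '{print $1, $4, $5, $6, $7}' /proc/interrupts

# Check no kernel threads on isolated CPUs (per-CPU threads such as ksoftirqd/N always remain)
ps -eo pid,psr,comm | grep -E "^\s*[0-9]+\s+[2-5]\s"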

# Check timer behavior
perf stat -C 2,3,4,5 -e irq_vectors:local_timer_entry sleep 10

Measure latency:

# Use cyclictest
cyclictest -m -p 99 -i 100 -h 1000 -D 1m -a 2 -t 1

# Interpret results:
# Min: 1 μs    (good)
# Avg: 2 μs    (good)
# Max: 50 μs   (acceptable for many use cases)
# Max: 500 μs  (investigate!)

System Design with Kernel Awareness

Question 6: Design a Container Metrics Collector

The Question: “Design a system to collect CPU, memory, and I/O metrics from 10,000 containers on each host with minimal overhead.”

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                    METRICS COLLECTION ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Method 1: Poll cgroup files                                        │
│   ─────────────────────────                                         │
│   For each container:                                                │
│     - Read /sys/fs/cgroup/<id>/cpu.stat                            │
│     - Read /sys/fs/cgroup/<id>/memory.current                      │
│     - Read /sys/fs/cgroup/<id>/io.stat                             │
│                                                                      │
│   Pros: Simple, no kernel changes                                   │
│   Cons: File system overhead, 30K reads/second (3 files each)       │
│                                                                      │
│   ─────────────────────────────────────────────────────────────────  │
│                                                                      │
│   Method 2: eBPF-based collection                                    │
│   ─────────────────────────────                                     │
│   - Hook scheduler for CPU accounting                               │
│   - Hook memory allocator for memory tracking                       │
│   - Hook block layer for I/O                                        │
│   - Aggregate per-cgroup in BPF maps                                │
│                                                                      │
│   Pros: Lower overhead, real-time data                              │
│   Cons: Complex, needs kernel support                               │
│                                                                      │
│   ─────────────────────────────────────────────────────────────────  │
│                                                                      │
│   Recommended: Hybrid approach                                       │
│   - Use cgroup files for infrequent metrics (memory limits)         │
│   - Use eBPF for high-frequency metrics (CPU, I/O)                  │
│   - Batch reads, use inotify for changes                            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Optimized Implementation

Batch cgroup file reading:

// Open file descriptors once, reuse
struct container_fds {
    int cpu_stat_fd;
    int memory_current_fd;
    int io_stat_fd;
};

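// metrics[] and parse_cpu_stat() are assumed to be defined elsewhere
void collect_metrics(struct container_fds *fds, int count) {
    char buf[4096];
    for (int i = 0; i < count; i++) {
        // pread at offset 0 re-reads the file without an lseek
        ssize_t n = pread(fds[i].cpu_stat_fd, buf, sizeof(buf) - 1, 0);
        if (n > 0) {
            buf[n] = '\0';  // file contents are not NUL-terminated
            parse_cpu_stat(buf, &metrics[i].cpu);
        }

        n = pread(fds[i].memory_current_fd, buf, sizeof(buf) - 1, 0);
        if (n > 0) {
            buf[n] = '\0';
            metrics[i].memory = strtoull(buf, NULL, 10);
        }
    }
}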

eBPF for CPU tracking:

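// Track CPU time per cgroup
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, u64);  // cgroup id
    __type(value, u64);  // total CPU time ns
} cgroup_cpu SEC(".maps");

// Time at which each PID was last switched in
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, u32);  // PID
    __type(value, u64);  // switch-in time ns
} last_switch SEC(".maps");

SEC("tracepoint/sched/sched_switch")
int trace_switch(struct trace_event_raw_sched_switch *ctx) {
    // At sched_switch the current task is still the one being switched out
    u64 cgroup_id = bpf_get_current_cgroup_id();
    u64 now = bpf_ktime_get_ns();
    // Copy the keys to the stack: map helpers cannot take pointers into ctx
    u32 prev_pid = ctx->prev_pid, next_pid = ctx->next_pid;
    
    // Record time for previous task's cgroup
    u64 *last_time = bpf_map_lookup_elem(&last_switch, &prev_pid);
    if (last_time) {
        u64 delta = now - *last_time;
        u64 *total = bpf_map_lookup_elem(&cgroup_cpu, &cgroup_id);
        if (total)
            __sync_fetch_and_add(total, delta);
        else  // first sample for this cgroup: create the entry
            bpf_map_update_elem(&cgroup_cpu, &cgroup_id, &delta, BPF_NOEXIST);
    }
    
    // Record switch time for next task
    bpf_map_update_elem(&last_switch, &next_pid, &now, BPF_ANY);
    return 0;
}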

Overhead Analysis

Polling approach (10K containers, 1-second interval):

30K file reads per second (3 files x 10K containers)
~100μs per read = 3 seconds of CPU per second (three full cores)
Too much overhead!

Optimized polling:

Keep FDs open: eliminate open/close
Batch reads with io_uring
Stagger collection across time
Result: ~100ms of CPU per second (~10% of one core)

eBPF approach:

Constant overhead regardless of container count
~1-2% CPU for tracing hooks
Scales to any number of containers

Hybrid approach:

eBPF for high-frequency (CPU, I/O): ~1% overhead
Polling for low-frequency (limits, configs): ~0.1% overhead
Total: ~1.1% CPU overhead for 10K containers
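
The polling numbers above can be reproduced with a short calculation. A sketch using the container count, files per container, and per-read cost from the analysis; the function name is made up for the example.

```python
def polling_cpu_per_second(containers: int, files_per_container: int,
                           cost_per_read_us: float, interval_s: float = 1.0) -> float:
    """CPU-seconds spent per wall-clock second reading cgroup files."""
    reads_per_second = containers * files_per_container / interval_s
    return reads_per_second * cost_per_read_us / 1e6


naive = polling_cpu_per_second(10_000, 3, cost_per_read_us=100)
print(f"naive polling: {naive:.1f} CPU-seconds per second")

# Per-read cost that a 100 ms/s budget allows at the same read rate
budget_us = 0.1 * 1e6 / (10_000 * 3)
print(f"optimized target: {budget_us:.1f} us per read")
```

Put differently, the optimized figure only holds if each read costs a few microseconds, which is why keeping file descriptors open and batching matter; lengthening the interval lowers the cost proportionally.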

Debugging Scenarios

Scenario 1: Container Not Starting

Situation: A container fails to start with “permission denied” but works as root.

Debugging Steps

# Check seccomp profile
docker inspect <container> | jq '.[0].HostConfig.SecurityOpt'

# Check AppArmor/SELinux
docker inspect <container> | jq '.[0].AppArmorProfile'
getenforce  # SELinux status

# Check capabilities
docker inspect <container> | jq '.[0].HostConfig.CapAdd'
docker inspect <container> | jq '.[0].HostConfig.CapDrop'

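# Trace syscall failures. The docker CLI is only a client, so tracing `docker run`
# misses the container's own syscalls; follow containerd's children, then start the container
sudo strace -f -p "$(pidof containerd)" 2>&1 | grep -E "EPERM|EACCES"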

# Check audit logs
ausearch -m avc -ts recent

Common Causes

Seccomp blocking syscall:
- Solution: Add syscall to profile or use --security-opt seccomp=unconfined
Missing capability:
- Solution: Add only the capability that is missing (e.g. --cap-add=NET_ADMIN); --cap-add=SYS_ADMIN is a broad last resort
SELinux/AppArmor denial:
- Solution: Check audit logs, update policy
User namespace UID mapping:
- Solution: Check /etc/subuid, /etc/subgid
Read-only filesystem:
- Solution: Keep --read-only and add tmpfs mounts or volumes for the paths the application writes to

Scenario 2: High Memory Usage Mystery

Situation: Container shows 2GB used, but application reports only 500MB heap.

Memory Accounting Deep Dive

# Get detailed memory breakdown
cat /sys/fs/cgroup/<path>/memory.stat

# Key fields:
# anon - Anonymous memory (heap, stack)
# file - Page cache (file-backed)
# kernel - Kernel memory charged to cgroup
# slab - Slab allocations

# Check for file cache
# This is often the "missing" memory!
grep -E "^(anon|file|kernel)" /sys/fs/cgroup/<path>/memory.stat

# Inside container:
cat /proc/meminfo | grep -E "Cached|Buffers|Slab"

# Check for memory mapped files
cat /proc/<pid>/maps | grep -v "00000000 00:00" | wc -l

Common causes of discrepancy:

Page cache: Files read by application cached in memory
- Shows in cgroup, not in application heap
- Will be reclaimed under pressure
Memory-mapped files: Libraries, data files
- mmap’d but not all pages resident
Slab memory: Kernel allocations for this cgroup
- Network buffers, file system metadata
Shared memory: Multiple processes sharing
- Charged once but used by many
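
The discrepancy can be broken down from `memory.stat` directly. A rough sketch, assuming the cgroup v2 `anon`, `file`, and `kernel` fields and a heap size reported by the application; the function names and all numbers are invented to mirror the 2GB vs 500MB situation.

```python
from typing import Dict

MIB = 1024 * 1024


def parse_memory_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.stat file (values in bytes)."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def breakdown_mib(stat: Dict[str, int], heap_bytes: int) -> Dict[str, int]:
    """Split cgroup memory into the application's heap and everything it does not report."""
    return {
        "app heap": heap_bytes // MIB,
        "other anon": (stat["anon"] - heap_bytes) // MIB,  # stacks, allocator overhead, off-heap buffers
        "page cache": stat["file"] // MIB,
        "kernel": stat.get("kernel", 0) // MIB,
    }


# Illustrative values only.
sample = f"anon {700 * MIB}\nfile {1200 * MIB}\nkernel {148 * MIB}\n"
for name, mib in breakdown_mib(parse_memory_stat(sample), heap_bytes=500 * MIB).items():
    print(f"{name:<11} {mib:>5} MiB")
```

In this made-up case most of the "missing" memory is reclaimable page cache, so the container is not necessarily close to an OOM kill.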

Quick Reference: Commands for Interviews

# Process investigation
ps aux --forest                    # Process tree
cat /proc/<pid>/status             # Detailed status
cat /proc/<pid>/maps               # Memory mappings
ls -la /proc/<pid>/fd/             # Open files
cat /proc/<pid>/stack              # Kernel stack

# Memory investigation  
free -h                            # Memory overview
cat /proc/meminfo                  # Detailed memory stats
vmstat 1 5                         # Virtual memory stats
slabtop                            # Slab allocations
cat /proc/buddyinfo                # Fragmentation

# CPU investigation
top -H                             # Per-thread CPU
mpstat -P ALL 1                    # Per-CPU stats
perf top                           # Live profiling
pidstat 1                          # Per-process stats

# Disk I/O
iostat -xz 1                       # Disk stats
iotop                              # Per-process I/O
cat /proc/<pid>/io                 # Process I/O

# Network
ss -tlnp                           # Listening sockets
ss -anp                            # All connections
cat /proc/net/dev                  # Interface stats
nstat                              # Network statistics

# Container/cgroup
cat /sys/fs/cgroup/<path>/memory.current
cat /sys/fs/cgroup/<path>/cpu.stat
cat /sys/fs/cgroup/<path>/io.stat

# Tracing
strace -p <pid>                    # Syscall tracing
ltrace -p <pid>                    # Library tracing
perf trace -p <pid>                # Fast syscall tracing
bpftrace -e '...'                  # eBPF tracing

Key Interview Tips

Think Out Loud

Explain your reasoning as you work through problems. Interviewers want to see your thought process.

Start Simple

Begin with the simplest approach, then discuss trade-offs and optimizations.

Know the Stack

Be ready to go from application to syscall to kernel to hardware.

Practice Debugging

Work through real debugging scenarios. This experience shows in interviews.

Next: Hands-on Projects →
