CPU Scheduling
The CPU scheduler decides which process/thread runs when, for how long, and on which CPU. Understanding scheduling is essential for performance optimization and is a frequent interview topic for systems roles.
Interview Frequency: High
Key Topics: Scheduling algorithms, CFS, real-time, multi-core
Time to Master: 8-10 hours
Scheduling Goals
Different workloads have different priorities:
| Metric | Definition | Important For |
|---|---|---|
| CPU Utilization | % time CPU is busy | Maximizing throughput |
| Throughput | Jobs completed per time | Batch systems |
| Turnaround Time | Submit → Complete | Batch jobs |
| Response Time | Submit → First response | Interactive systems |
| Waiting Time | Time spent in ready queue | Fairness |
| Fairness | Equal CPU distribution | Multi-tenant |
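As a concrete illustration, here is a minimal C sketch that computes average waiting and turnaround time for three hypothetical jobs under FCFS (the arrival and burst times are made up):

```c
/* Minimal FCFS illustration of the metrics above (hypothetical jobs). */
#include <stdio.h>

typedef struct { int arrival, burst; } Job;

int main(void) {
    Job jobs[] = { {0, 24}, {1, 3}, {2, 3} };   /* one long job, two short ones */
    int n = 3, clock = 0;
    double total_turnaround = 0, total_waiting = 0;

    for (int i = 0; i < n; i++) {
        if (clock < jobs[i].arrival) clock = jobs[i].arrival;
        int waiting = clock - jobs[i].arrival;        /* time in ready queue */
        clock += jobs[i].burst;                        /* run to completion  */
        int turnaround = clock - jobs[i].arrival;      /* submit -> complete */
        total_waiting += waiting;
        total_turnaround += turnaround;
    }
    printf("avg waiting = %.1f, avg turnaround = %.1f\n",
           total_waiting / n, total_turnaround / n);
    return 0;
}
```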
When Does Scheduling Happen?
Scheduling decisions occur at these points:
- A running process blocks (I/O, lock, sleep)
- A running process is interrupted (timer expiry, higher-priority wakeup)
- A blocked process becomes ready (I/O completion)
- A process terminates
Non-Preemptive vs Preemptive
| Type | When Context Switch Occurs | Examples |
|---|---|---|
| Non-preemptive | Only on block/terminate | Old batch systems, Windows 3.1 |
| Preemptive | Also on timer, higher priority arrival | All modern OS |
Classic Scheduling Algorithms
First-Come, First-Served (FCFS)
The simplest algorithm: run processes in arrival order.
Shortest Job First (SJF)
Run the process with the shortest burst time first.
Priority Scheduling
Each process has a priority; the highest priority runs first.
Round Robin (RR)
Each process gets a fixed time quantum, then goes to the back of the queue.
| Quantum Size | Effect |
|---|---|
| Too small | Too much context switch overhead |
| Too large | Approaches FCFS behavior |
| Rule of thumb | 80% of bursts should complete in one quantum |
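A minimal round-robin sketch, using the same hypothetical burst times as above and an assumed quantum of 4, shows how short jobs finish without waiting behind the long one:

```c
/* Sketch: round-robin over remaining burst times (hypothetical jobs). */
#include <stdio.h>

int main(void) {
    int remaining[] = { 24, 3, 3 };      /* remaining burst per job */
    int n = 3, quantum = 4, clock = 0, done = 0;

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            clock += slice;              /* job i runs one quantum (or less) */
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                done++;
                printf("job %d finished at t=%d\n", i, clock);
            }
        }
    }
    return 0;
}
```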
Multi-Level Feedback Queue (MLFQ)
The algorithm combines multiple approaches (a small bookkeeping sketch follows this list):
- Interactive processes stay responsive
- CPU-bound processes get bulk CPU time
- Adapts to process behavior automatically
- No need to know burst time in advance
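A rough C sketch of the bookkeeping (level count and quanta are hypothetical; this is not any particular OS's implementation):

```c
/* Sketch of MLFQ bookkeeping (hypothetical structure, not a full simulator). */
#define LEVELS 3

typedef struct {
    int level;        /* 0 = highest priority, LEVELS-1 = lowest */
    int used_ms;      /* CPU used at the current level           */
} Task;

static const int quantum_ms[LEVELS] = { 10, 40, 160 };  /* grows per level */

/* Called after a task runs: demote CPU hogs, keep interactive tasks high. */
void after_run(Task *t, int ran_ms, int blocked_for_io) {
    t->used_ms += ran_ms;
    if (!blocked_for_io && t->used_ms >= quantum_ms[t->level]
        && t->level < LEVELS - 1) {
        t->level++;           /* used its whole quantum: demote */
        t->used_ms = 0;
    }
    /* A periodic boost (not shown) would reset all tasks to level 0. */
}
```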
Linux Completely Fair Scheduler (CFS)
The default Linux scheduler since kernel 2.6.23.
Core Concept: Virtual Runtime
Each task accumulates virtual runtime (vruntime): its actual CPU time scaled by its weight. CFS always runs the task with the smallest vruntime, so every task converges toward its fair share of the CPU.
Red-Black Tree for Task Selection
Runnable tasks sit in a red-black tree keyed by vruntime; the leftmost node has the smallest vruntime and is picked next. Insertion is O(log n) and the leftmost node is cached, as sketched below.
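A simplified C sketch of the idea, using an array instead of a real red-black tree and the standard nice-0 weight of 1024:

```c
/* Sketch of CFS-style accounting (array instead of a real red-black tree). */
#include <stdio.h>

#define NICE_0_WEIGHT 1024

typedef struct {
    const char *name;
    unsigned long weight;     /* derived from the nice value            */
    unsigned long vruntime;   /* weighted CPU time consumed (ns)        */
} Task;

/* Charge actual runtime, scaled inversely by weight: heavy tasks age slowly. */
void account(Task *t, unsigned long delta_ns) {
    t->vruntime += delta_ns * NICE_0_WEIGHT / t->weight;
}

/* CFS picks the task with the smallest vruntime (leftmost node of the tree). */
Task *pick_next(Task *tasks, int n) {
    Task *best = &tasks[0];
    for (int i = 1; i < n; i++)
        if (tasks[i].vruntime < best->vruntime) best = &tasks[i];
    return best;
}

int main(void) {
    Task tasks[] = { {"editor", 1024, 0}, {"compile", 1024, 0} };
    account(&tasks[1], 6000000);                       /* compile ran 6 ms */
    printf("next: %s\n", pick_next(tasks, 2)->name);   /* -> editor        */
    return 0;
}
```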
CFS Parameters
| Parameter | Default | Effect |
|---|---|---|
| sched_latency_ns | 6ms | Target scheduling period |
| sched_min_granularity_ns | 0.75ms | Minimum time slice |
| sched_wakeup_granularity_ns | 1ms | Minimum vruntime advantage to preempt |
Nice Values and Weights
Nice values range from −20 (highest weight) to +19 (lowest). Each nice step changes a task's weight by roughly 1.25×, which works out to about a 10% CPU difference per step; nice 0 corresponds to a weight of 1024.
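A rough calculation assuming the ~1.25×-per-step rule (the kernel actually uses a precomputed weight table, but the ratios are close):

```c
/* Sketch: relative weight per nice value, assuming ~1.25x per step. */
#include <math.h>
#include <stdio.h>

int main(void) {
    /* weight(nice) ~= 1024 / 1.25^nice; nice 0 -> 1024 */
    for (int nice = -5; nice <= 5; nice += 5) {
        double weight = 1024.0 / pow(1.25, nice);
        printf("nice %+d -> weight ~%.0f\n", nice, weight);
    }
    /* Two tasks at nice 0 and nice 5 share the CPU roughly 1024 : 336, ~3 : 1. */
    return 0;
}
```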
Real-Time Scheduling
For tasks with hard deadlines:
SCHED_FIFO and SCHED_RR
SCHED_FIFO runs the highest-priority runnable task until it blocks or yields; SCHED_RR adds a time slice so equal-priority tasks round-robin. Both use fixed priorities (1–99) and preempt any normal CFS task.
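For example, a process can request SCHED_FIFO with the standard sched_setscheduler() call (requires root or CAP_SYS_NICE; the priority value here is arbitrary):

```c
/* Making the current process real-time with SCHED_FIFO. */
#include <sched.h>
#include <stdio.h>

int main(void) {
    struct sched_param sp = { .sched_priority = 50 };   /* 1 (low) .. 99 (high) */

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {  /* 0 = calling process */
        perror("sched_setscheduler");
        return 1;
    }
    printf("now SCHED_FIFO at priority %d\n", sp.sched_priority);
    /* A SCHED_FIFO task runs until it blocks, yields, or a higher-priority
       task becomes runnable; SCHED_RR adds a time slice among equals. */
    return 0;
}
```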
Real-Time Scheduling Algorithms
- Rate Monotonic: static priority based on period; shorter period = higher priority
- Earliest Deadline First (EDF): dynamic priority; the task with the nearest deadline runs first, and a task set is schedulable whenever total utilization U ≤ 1
Rate Monotonic schedulability test: U = Σ(Ci/Ti) ≤ n(2^(1/n) − 1)
For 3 tasks: U ≤ 0.78; as the number of tasks grows, the bound approaches ln(2) ≈ 0.69.
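A quick check of the bound for a hypothetical three-task set (the C and T values are illustrative):

```c
/* Checking the Rate Monotonic bound for a hypothetical task set. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double C[] = { 1.0, 2.0, 5.0 };    /* execution time per period (ms) */
    double T[] = { 10.0, 20.0, 50.0 }; /* period (ms)                    */
    int n = 3;

    double U = 0;
    for (int i = 0; i < n; i++) U += C[i] / T[i];

    double bound = n * (pow(2.0, 1.0 / n) - 1.0);   /* n(2^(1/n) - 1) */
    printf("U = %.3f, RM bound = %.3f -> %s\n", U, bound,
           U <= bound ? "schedulable" : "not guaranteed by the RM test");
    return 0;
}
```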
Multi-Core Scheduling
With multiple CPUs, the scheduler must decide not only when a task runs but also where.
Load Balancing
The kernel periodically migrates tasks from busy run queues to idle ones, trading balance against the cost of losing cache warmth.
CPU Affinity
A task can be pinned to a subset of CPUs so its cache-hot data stays local (see the sketch below).
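For example, pinning the current process to CPU 0 with the Linux-specific sched_setaffinity() call:

```c
/* Pinning the current process to CPU 0 (Linux-specific API). */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                        /* allow only CPU 0 */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0; cache-hot data stays local\n");
    return 0;
}
```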
NUMA Considerations
On NUMA systems, memory on the local node is faster to access, so the scheduler prefers to keep a task on (or near) the node holding its memory.
Scheduling Classes in Linux
The kernel consults scheduling classes in strict order: stop, deadline (SCHED_DEADLINE), real-time (SCHED_FIFO, SCHED_RR), fair (SCHED_NORMAL, SCHED_BATCH), then idle (SCHED_IDLE). Any runnable task in a higher class runs before tasks in lower classes.
Interview Deep Dive Questions
Q1: Why doesn't Linux use simple priority scheduling?
Answer:
Problems with static priority:
- Starvation: Low-priority processes may never run
- Priority Inversion: Low-priority process holds lock needed by high-priority
- No adaptation: Doesn't respond to changing process behavior
How CFS addresses this:
- Virtual runtime ensures all processes eventually run
- Proportional sharing based on nice values, not absolute priority
- Self-adjusting: Interactive processes naturally maintain low vruntime
Where priorities still exist:
- Real-time scheduling (SCHED_FIFO, SCHED_RR)
- Nice values influence CFS weight, not absolute priority
Q2: Explain the difference between turnaround time and response time
Answer:
- Response Time: Time from submission to first output/response
- Turnaround Time: Time from submission to completion
Which to optimize:
- Interactive: Minimize response time (run briefly, often)
- Batch: Minimize turnaround (run to completion, minimize overhead)
Q3: How does CFS handle a process that sleeps frequently?
Answer:
Scenario: An interactive process (e.g., a text editor) sleeps waiting for input.
CFS behavior:
- While sleeping, vruntime doesn’t increase
- Other processes’ vruntimes increase
- When process wakes, it has relatively low vruntime
- Low vruntime means it’s near left of RB-tree
- Gets scheduled quickly → good response time
CFS also limits how far behind a waking task can fall: its vruntime is placed near the current minimum. This prevents:
- A process that slept for an hour waking with vruntime = 0
- Such a process monopolizing the CPU to "catch up"
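A sketch of that wake-up placement idea (simplified; not the kernel's actual logic, and the size of the head start is an assumption):

```c
/* Sketch: a waking task's vruntime is pulled up to near the current minimum,
   so it is favored briefly but cannot "bank" unlimited sleep time. */
unsigned long place_on_wakeup(unsigned long task_vruntime,
                              unsigned long min_vruntime,
                              unsigned long head_start_ns) {
    unsigned long floor = min_vruntime > head_start_ns
                        ? min_vruntime - head_start_ns : 0;
    return task_vruntime > floor ? task_vruntime : floor;
}
```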
Q4: Design a scheduler for a database server
Answer:
Requirements analysis:
- Mix of short queries and long batch jobs
- Predictable latency for user-facing queries
- Efficient use of CPU for analytics
Multi-queue with priorities:
- Queue 1: Interactive queries (< 10ms expected)
- Queue 2: Short transactions (< 100ms)
- Queue 3: Long-running analytics
Scheduling policy:
- Preemptive within queues
- Higher queues preempt lower
- Time-based demotion (like MLFQ)
Adaptive classification:
- Start every query in the top queue and demote it when it exceeds that queue's time budget (classification by observed behavior, as in MLFQ)
Resource limits:
- Queue 3 gets max 40% CPU (prevents starvation)
- Aging promotes long-waiting queries
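A structural sketch of that design (types, budgets, and thresholds are all hypothetical):

```c
/* Sketch of the multi-queue design above (hypothetical values). */
enum queue { INTERACTIVE = 0, SHORT_TXN = 1, ANALYTICS = 2, NUM_QUEUES = 3 };

typedef struct {
    int id;
    enum queue q;
    long cpu_ms_used;     /* CPU consumed so far            */
    long wait_ms;         /* time spent waiting (for aging) */
} Query;

static const long budget_ms[NUM_QUEUES] = { 10, 100, -1 };  /* -1 = unlimited */

/* Demote a query that blew its queue's budget; promote one that waited long. */
void reclassify(Query *qr) {
    if (qr->q < ANALYTICS && budget_ms[qr->q] > 0
        && qr->cpu_ms_used > budget_ms[qr->q])
        qr->q++;                                 /* time-based demotion */
    if (qr->q > INTERACTIVE && qr->wait_ms > 5000)
        qr->q--;                                 /* aging promotion     */
}
```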
Q5: What is priority inversion and how is it solved?
Answer:
Scenario: A low-priority process L holds lock X. A high-priority process H blocks waiting for X. A medium-priority process M preempts L, so H is indirectly stuck behind M.
Solutions:
- Priority Inheritance:
- L inherits H’s priority while holding X
- L can preempt M, finish quickly, release X
- H then runs
- Priority Ceiling:
- Each lock has a ceiling (highest priority that may acquire it)
- Process acquiring lock gets ceiling priority
- Bounds blocking to at most one critical section and prevents deadlock
- Random boosting (Windows):
- Randomly boost low-priority threads
- Probabilistically solves inversion
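Priority inheritance is available directly on POSIX mutexes; a minimal example (error handling trimmed, compile with -lpthread):

```c
/* Enabling priority inheritance on a pthread mutex. */
#include <pthread.h>
#include <stdio.h>

int main(void) {
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;

    pthread_mutexattr_init(&attr);
    /* A low-priority holder temporarily inherits the priority of the
       highest-priority thread blocked on this mutex. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&lock, &attr);

    pthread_mutex_lock(&lock);
    printf("holder now inherits blocked waiters' priority\n");
    pthread_mutex_unlock(&lock);

    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}
```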
Practice Exercises
1. Simulate MLFQ
Implement a multi-level feedback queue scheduler simulation. Track queue demotions and response times.
2. Measure Scheduling Latency
Write a program that measures scheduling latency by yielding and timing wake-up.
3. CPU Pinning Experiment
Compare performance of a cache-sensitive workload with and without CPU affinity.
4. Priority Inversion Demo
Create a program that demonstrates priority inversion with pthreads and measure the effect.
Key Takeaways
CFS is Default
Linux uses virtual runtime for fair CPU sharing; instead of fixed-priority run queues, tasks are kept in a red-black tree ordered by vruntime.
MLFQ for Balance
Combines priority, round-robin, and adaptive demotion. Used by many OS.
Real-Time is Different
FIFO/RR with fixed priorities for hard deadlines. Rate Monotonic or EDF.
Multi-Core Adds Complexity
Load balancing, cache affinity, NUMA awareness all matter for performance.
Next: Virtual Memory →