YARN Resource Management
Module Duration: 3-4 hours
Focus: Architecture, Schedulers, Configuration
Prerequisites: MapReduce understanding from Module 3
Introduction to YARN
YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2.0 to solve fundamental limitations of Hadoop 1.x:
Hadoop 1.x Problems :
Fixed map/reduce slots → Resource underutilization
JobTracker overload → Scalability limit at ~4000 nodes
Only MapReduce → Can’t run other frameworks
YARN Solution : Separate resource management from application logic
YARN Architecture
Component Overview
┌──────────────────────────────────────────────────────────────┐
│                         YARN CLUSTER                         │
│                                                              │
│    ┌────────────────────────────────────────────────────┐    │
│    │                ResourceManager (RM)                │    │
│    │  • Scheduler (allocates resources)                 │    │
│    │  • ApplicationsManager (accepts jobs)              │    │
│    │  • Resource tracker (monitors NMs)                 │    │
│    └─────────────────────────┬──────────────────────────┘    │
│                              │                               │
│        ┌──────────────┬──────┴───────┬──────────────┐        │
│        ▼              ▼              ▼              ▼        │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │NodeManager│  │NodeManager│  │NodeManager│  │NodeManager│  │
│  │           │  │           │  │           │  │           │  │
│  │ Container │  │ Container │  │ Container │  │ Container │  │
│  │ (AM App A)│  │ (AM App B)│  │ (AM App C)│  │  (Task)   │  │
│  │           │  │           │  │           │  │           │  │
│  │ Container │  │ Container │  │ Container │  │ Container │  │
│  │  (Task)   │  │  (Task)   │  │  (Task)   │  │  (Task)   │  │
│  └───────────┘  └───────────┘  └───────────┘  └───────────┘  │
└──────────────────────────────────────────────────────────────┘
ResourceManager (RM)
The master daemon that manages:
Scheduler : Allocates resources to applications
Purely scheduling (no monitoring/fault tolerance)
Pluggable: FIFO, Capacity, Fair schedulers
ApplicationsManager :
Accepts job submissions
Negotiates first container for ApplicationMaster
Restarts ApplicationMaster on failure
NodeManager (NM)
Per-machine agent that:
Monitors resource usage (CPU, memory, disk, network)
Reports to ResourceManager via heartbeats
Launches and monitors containers
Cleans up container processes
ApplicationMaster (AM)
Per-application process that:
Negotiates resources from ResourceManager
Works with NodeManager(s) to execute tasks
Monitors task progress
Handles task failures
Key Insight: Each application runs its own AM, so framework-specific logic stays isolated from the ResourceManager.
Container
The unit of resource allocation:
Container = CPU cores + Memory + (optionally) GPU/disk
Example: <2 cores, 4GB RAM>
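In code, that specification is just a Resource record attached to a container request. The minimal sketch below (the ContainerRequestExample class name is illustrative) shows how an ApplicationMaster would express a <2 cores, 4GB> ask through the AMRMClient API:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class ContainerRequestExample {
    public static void main(String[] args) {
        // A container specification: 4 GB of memory and 2 virtual cores
        Resource capability = Resource.newInstance(4096, 2);

        // Priority used by the scheduler to order outstanding requests
        Priority priority = Priority.newInstance(1);

        // An AM hands this to its AMRMClient to ask the RM for a matching
        // container (null node/rack lists = no locality constraint)
        AMRMClient.ContainerRequest request =
            new AMRMClient.ContainerRequest(capability, null, null, priority);

        System.out.println("Requesting container: " + request.getCapability());
    }
}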
YARN Application Lifecycle
Submitting a MapReduce Job
Step 1: Client → ResourceManager
"Submit job: wordcount.jar"
Step 2: RM → Client
"Application ID: app_001"
Step 3: Client uploads:
• Job JAR to HDFS
• Job configuration
• Input splits
Step 4: Client → RM
"Start application app_001"
Step 5: RM Scheduler
Chooses NodeManager for ApplicationMaster
Allocates container for AM
Step 6: RM → NodeManager_1
"Launch container for AM of app_001"
Step 7: NodeManager_1
Launches ApplicationMaster (MRAppMaster for MapReduce)
Step 8: ApplicationMaster → RM
"Register application app_001"
"Request containers for map/reduce tasks"
Step 9: RM Scheduler
Allocates containers based on resource availability
Step 10: RM → ApplicationMaster
"Allocated containers on: [NM_2, NM_3, NM_4...]"
Step 11: ApplicationMaster → NodeManagers
"Launch map/reduce tasks in containers"
Step 12: NodeManagers
Launch tasks, report progress to ApplicationMaster
Step 13: ApplicationMaster
Monitors tasks, handles failures
Requests more containers if needed
Step 14: ApplicationMaster → RM
"Application complete, unregister"
Step 15: RM
Frees all allocated containers
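From the client's side, the same lifecycle can be observed by polling the ResourceManager for the application's state (ACCEPTED → RUNNING → FINISHED) until it terminates. The sketch below is illustrative (the AppStateWatcher class name and the 5-second polling interval are arbitrary), and it assumes a Hadoop release where ApplicationId.fromString is available:

import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AppStateWatcher {
    public static void main(String[] args) throws Exception {
        // Application ID printed by the client at submission time,
        // e.g. application_1234567890_0001
        ApplicationId appId = ApplicationId.fromString(args[0]);

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        EnumSet<YarnApplicationState> terminal = EnumSet.of(
            YarnApplicationState.FINISHED,
            YarnApplicationState.FAILED,
            YarnApplicationState.KILLED);

        // Poll the RM until the application reaches a terminal state
        while (true) {
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            YarnApplicationState state = report.getYarnApplicationState();
            System.out.println(state + " - progress " + (report.getProgress() * 100) + "%");
            if (terminal.contains(state)) {
                break;
            }
            Thread.sleep(5000);
        }

        yarnClient.stop();
    }
}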
Fault Tolerance
ApplicationMaster Failure :
1. ResourceManager detects AM heartbeat timeout
2. RM kills all containers for that application
3. RM restarts AM on different NodeManager
4. New AM recovers job state from HDFS (if available)
5. Re-runs failed/incomplete tasks
Task Failure :
1. ApplicationMaster detects task failure
2. AM requests new container from RM
3. Retries task (up to configured limit)
4. If the task fails 4 times (the default attempt limit) → Job fails
NodeManager Failure :
1. ResourceManager detects NM heartbeat timeout
2. RM marks all containers on NM as failed
3. ApplicationMasters re-request containers
4. Tasks rescheduled on healthy nodes
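The retry limits above are configurable. As a rough sketch (the property names are standard Hadoop 2.x settings; the class and job names are illustrative), a MapReduce client can tune them per job:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetrySettingsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Number of times YARN will relaunch this job's ApplicationMaster;
        // the cluster-wide ceiling is yarn.resourcemanager.am.max-attempts
        conf.setInt("mapreduce.am.max-attempts", 2);

        // Per-task attempt limits; the default of 4 is the
        // "task fails 4 times -> job fails" rule described above
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        Job job = Job.getInstance(conf, "wordcount-with-retries");
        // ... mapper, reducer, input and output paths as in Module 3 ...
    }
}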
YARN Schedulers
1. FIFO Scheduler
Simplest scheduler: First-In-First-Out queue.
Queue: [Job A] [Job B] [Job C]
Resources:
████████████████████ (Job A gets all resources)
When Job A completes:
████████████████████ (Job B gets all resources)
Use Case : Single-user clusters, simple testing
Limitations : No resource sharing, large jobs block small ones
2. Capacity Scheduler
Multiple queues with guaranteed capacity.
Configuration (capacity-scheduler.xml):
<configuration>

  <!-- Define queues -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>production,development,research</value>
  </property>

  <!-- Production queue: 60% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.production.capacity</name>
    <value>60</value>
  </property>

  <!-- Development queue: 25% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.development.capacity</name>
    <value>25</value>
  </property>

  <!-- Research queue: 15% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.research.capacity</name>
    <value>15</value>
  </property>

  <!-- Maximum capacity (can use idle resources) -->
  <property>
    <name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
    <value>80</value>
  </property>

  <!-- User limits -->
  <property>
    <name>yarn.scheduler.capacity.root.production.user-limit-factor</name>
    <value>2</value>
  </property>

</configuration>
Behavior :
Total Cluster: 100GB RAM, 50 cores
Guaranteed:
production: 60GB, 30 cores (guaranteed)
development: 25GB, 12 cores (guaranteed)
research: 15GB, 8 cores (guaranteed)
If development is idle:
production can use up to 80% (elastic capacity)
Use Case : Multi-tenant clusters with SLAs
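To direct a job at one of these queues, the client sets mapreduce.job.queuename. A minimal sketch (the job name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSubmission {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route the job to the "production" queue defined in
        // capacity-scheduler.xml; jobs with no queue set go to "default"
        conf.set("mapreduce.job.queuename", "production");

        Job job = Job.getInstance(conf, "nightly-aggregation");
        // ... mapper, reducer, input and output paths as usual ...
    }
}

The same property can be passed on the command line as -Dmapreduce.job.queuename=production, provided the driver parses generic options (e.g. via ToolRunner).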
3. Fair Scheduler
Dynamically balances resources across applications.
Configuration (fair-scheduler.xml):
<allocations>

  <!-- Queue (formerly "pool") for production jobs -->
  <queue name="production">
    <minResources>10000 mb,10 vcores</minResources>
    <maxResources>90000 mb,100 vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <weight>3.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>

  <!-- Queue for ad-hoc queries -->
  <queue name="adhoc">
    <minResources>5000 mb,5 vcores</minResources>
    <maxResources>30000 mb,30 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>1.0</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>

  <!-- User limits -->
  <user name="analyst1">
    <maxRunningApps>5</maxRunningApps>
  </user>

  <!-- Default preemption timeouts (seconds) -->
  <defaultMinSharePreemptionTimeout>300</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>

</allocations>
Fair Sharing Example :
Scenario: 2 users, 100GB total cluster memory
User A submits 1 job:
Job A gets: 100GB (all available)
User B submits 1 job:
Fair scheduler rebalances:
Job A: 50GB
Job B: 50GB
User A submits 2nd job (all three jobs sharing one fair queue):
Fair scheduler rebalances:
Job A1: ~33GB
Job A2: ~33GB
Job B: ~34GB
(If each user instead maps to their own queue, sharing happens between queues first: Job B keeps 50GB while A's two jobs split the remaining 50GB.)
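When queues carry different weights (as in the fair-scheduler.xml above, where production has weight 3.0 and adhoc 1.0), steady-state shares are proportional to the weights rather than strictly equal. The sketch below illustrates the arithmetic only; it ignores minResources/maxResources caps and assumes both queues have enough demand to use their full share:

public class WeightedFairShare {
    // Instantaneous fair share is proportional to queue weight
    static long shareMb(double weight, double totalWeight, long clusterMb) {
        return Math.round(clusterMb * (weight / totalWeight));
    }

    public static void main(String[] args) {
        long clusterMb = 100_000;    // 100GB cluster, as in the example above
        double production = 3.0;     // weights from fair-scheduler.xml above
        double adhoc = 1.0;
        double total = production + adhoc;

        System.out.println("production: " + shareMb(production, total, clusterMb) + " MB");
        System.out.println("adhoc:      " + shareMb(adhoc, total, clusterMb) + " MB");
        // Prints 75000 MB and 25000 MB: a 3:1 split while both queues have demand
    }
}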
Use Case : Shared clusters for interactive analytics (Hive, Spark)
Configuring YARN
yarn-site.xml
<configuration>

  <!-- ResourceManager address -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>

  <!-- Scheduler class -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

  <!-- NodeManager resources -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>Physical RAM available to YARN (8GB)</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
    <description>CPU cores available to YARN</description>
  </property>

  <!-- Container limits -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum container memory (1GB)</description>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
    <description>Maximum container memory (8GB)</description>
  </property>

  <!-- ApplicationMaster resources -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
    <description>Memory for MR ApplicationMaster</description>
  </property>

  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <description>Keep logs for 7 days</description>
  </property>

</configuration>
MapReduce Resource Configuration
mapred-site.xml :
<configuration>

  <!-- Map task memory -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>

  <!-- Reduce task memory -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>

  <!-- JVM heap for map tasks -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
    <description>80% of map container memory</description>
  </property>

  <!-- JVM heap for reduce tasks -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
    <description>80% of reduce container memory</description>
  </property>

</configuration>
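These cluster defaults can be overridden per job. The sketch below (illustrative class and job names) bumps the map and reduce container sizes and keeps the JVM heap at roughly 80% of each container, mirroring the ratio used above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PerJobMemory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        int mapMb = 3072;
        int reduceMb = 6144;

        // Override the cluster defaults for a memory-hungry job
        conf.setInt("mapreduce.map.memory.mb", mapMb);
        conf.setInt("mapreduce.reduce.memory.mb", reduceMb);

        // Keep the JVM heap at ~80% of the container so off-heap usage
        // does not push the process over the container limit
        conf.set("mapreduce.map.java.opts", "-Xmx" + (mapMb * 8 / 10) + "m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx" + (reduceMb * 8 / 10) + "m");

        Job job = Job.getInstance(conf, "large-join");
        // ... mapper, reducer, input and output paths ...
    }
}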
Monitoring YARN
ResourceManager Web UI
Access: http://resourcemanager:8088
Key Metrics :
Cluster metrics: Total memory, cores, available resources
Running applications
Queue utilization
NodeManager status
Command-Line Tools
# List running applications
yarn application -list
# Get application details
yarn application -status application_1234567890_0001
# Kill application
yarn application -kill application_1234567890_0001
# View logs
yarn logs -applicationId application_1234567890_0001
# NodeManager status
yarn node -list -all
# Queue information
yarn queue -status production
Programmatic Monitoring
import java.util.List;

import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YARNMonitor {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // List all applications
        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println("Application ID: " + app.getApplicationId());
            System.out.println("Name: " + app.getName());
            System.out.println("User: " + app.getUser());
            System.out.println("Queue: " + app.getQueue());
            System.out.println("State: " + app.getYarnApplicationState());
            System.out.println("Progress: " + (app.getProgress() * 100) + "%");
            System.out.println("Tracking URL: " + app.getTrackingUrl());
            System.out.println("---");
        }

        // Get cluster metrics
        YarnClusterMetrics metrics = yarnClient.getYarnClusterMetrics();
        System.out.println("\nCluster Metrics:");
        System.out.println("NodeManagers: " + metrics.getNumNodeManagers());

        yarnClient.stop();
    }
}
Advanced: Writing a Custom YARN Application
Basic skeleton for a custom YARN application:
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class CustomYARNApp {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Create application
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        ApplicationId appId = appContext.getApplicationId();

        // Set application name and queue
        appContext.setApplicationName("CustomApp");
        appContext.setQueue("default");

        // Configure ApplicationMaster container
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);

        // Set AM command
        List<String> commands = new ArrayList<>();
        commands.add("$JAVA_HOME/bin/java"
            + " -Xmx256M"
            + " com.example.ApplicationMaster"
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr");
        amContainer.setCommands(commands);

        // Set AM resources
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(512);
        capability.setVirtualCores(1);
        appContext.setResource(capability);

        // Set AM container spec
        appContext.setAMContainerSpec(amContainer);

        // Submit application
        yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);

        yarnClient.stop();
    }
}
Best Practices
Resource Sizing
Calculate container size:
Container memory = task heap + overhead (off-heap buffers, JVM metadata)
Example:
Task needs a 2GB (2048MB) heap
Container size = 2048MB + 384MB overhead = 2432MB
Round up to 2560MB (2.5GB)
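As a rough sketch of that calculation (the 512MB increment and helper name are assumptions; the actual rounding granularity depends on the scheduler and on yarn.scheduler.minimum-allocation-mb):

public class ContainerSizing {
    // Container memory = task heap + overhead, rounded up to the
    // scheduler's allocation increment (a hypothetical helper)
    static int containerMb(int heapMb, int overheadMb, int incrementMb) {
        int needed = heapMb + overheadMb;
        return ((needed + incrementMb - 1) / incrementMb) * incrementMb;
    }

    public static void main(String[] args) {
        // 2048MB heap + 384MB overhead = 2432MB, rounded up to 2560MB
        // when allocations are made in 512MB steps
        System.out.println(containerMb(2048, 384, 512) + " MB");
    }
}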
Queue Configuration
Single-tenant cluster:
→ FIFO scheduler (simplest)
Multi-tenant with SLAs:
→ Capacity scheduler
→ Guarantee minimum resources per team
Research/development cluster:
→ Fair scheduler
→ Dynamic resource sharing
Monitoring and Alerts
Set up alerts for:
High queue utilization (>80%)
Long-running jobs (potential issues)
Failed applications
NodeManager failures
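One way to feed such alerts is to poll the ResourceManager programmatically. A minimal sketch for the queue-utilization check (the 80% threshold mirrors the guideline above; the exact capacity semantics reported depend on the scheduler in use):

import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class QueueUtilizationCheck {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // getCurrentCapacity() reports how much of the queue's configured
        // capacity is currently in use (1.0 = fully used, >1.0 = borrowing)
        QueueInfo queue = yarnClient.getQueueInfo("production");
        float utilization = queue.getCurrentCapacity();

        if (utilization > 0.8f) {
            System.out.println("ALERT: production queue at "
                    + (utilization * 100) + "% of its configured capacity");
        }

        yarnClient.stop();
    }
}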
Interview Focus
Key Questions :
“How does YARN improve upon Hadoop 1.x?”
Separates resource management from processing
Supports non-MapReduce applications
Better scalability (10,000+ nodes)
“Explain the role of ApplicationMaster”
Per-application coordinator
Negotiates resources from RM
Monitors tasks, handles failures
“Fair vs Capacity scheduler?”
Fair: Equal share, preemption, better for interactive
Capacity: Guaranteed capacity, hierarchical queues, better for SLAs
What’s Next?
Module 5: Hadoop Ecosystem & Integration Explore Hive, Pig, HBase, and other ecosystem tools