
YARN Resource Management

Module Duration: 3-4 hours
Focus: Architecture, Schedulers, Configuration
Prerequisites: MapReduce understanding from Module 3

Introduction to YARN

YARN (Yet Another Resource Negotiator) was introduced in Hadoop 2.0 to solve fundamental limitations of Hadoop 1.x.

Hadoop 1.x Problems:
  • Fixed map/reduce slots → Resource underutilization
  • JobTracker overload → Scalability limit at ~4000 nodes
  • Only MapReduce → Can’t run other frameworks
YARN Solution: Separate resource management from application logic

YARN Architecture

Component Overview

┌────────────────────────────────────────────────────────────┐
│                        YARN CLUSTER                        │
│                                                            │
│  ┌────────────────────────────────────────────────┐        │
│  │              ResourceManager (RM)              │        │
│  │  • Scheduler (allocates resources)             │        │
│  │  • ApplicationsManager (accepts jobs)          │        │
│  │  • Resource tracker (monitors NMs)             │        │
│  └───────────────────┬────────────────────────────┘        │
│                      │                                     │
│        ┌─────────────┼─────────────┬─────────────┐         │
│        │             │             │             │         │
│        ▼             ▼             ▼             ▼         │
│  ┌────────────┐┌────────────┐┌────────────┐┌────────────┐  │
│  │NodeManager ││NodeManager ││NodeManager ││NodeManager │  │
│  │┌──────────┐││┌──────────┐││┌──────────┐││┌──────────┐│  │
│  ││Container ││││Container ││││Container ││││Container ││  │
│  ││(AM App A)││││(AM App B)││││(AM App C)││││  (Task)  ││  │
│  │└──────────┘││└──────────┘││└──────────┘││└──────────┘│  │
│  │┌──────────┐││┌──────────┐││┌──────────┐││┌──────────┐│  │
│  ││Container ││││Container ││││Container ││││Container ││  │
│  ││  (Task)  ││││  (Task)  ││││  (Task)  ││││  (Task)  ││  │
│  │└──────────┘││└──────────┘││└──────────┘││└──────────┘│  │
│  └────────────┘└────────────┘└────────────┘└────────────┘  │
└────────────────────────────────────────────────────────────┘

ResourceManager (RM)

The master daemon that manages:
  1. Scheduler: Allocates resources to applications
    • Purely scheduling (no monitoring/fault tolerance)
    • Pluggable: FIFO, Capacity, Fair schedulers
  2. ApplicationsManager:
    • Accepts job submissions
    • Negotiates first container for ApplicationMaster
    • Restarts ApplicationMaster on failure

NodeManager (NM)

Per-machine agent that:
  • Monitors resource usage (CPU, memory, disk, network)
  • Reports to ResourceManager via heartbeats
  • Launches and monitors containers
  • Cleans up container processes
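
The same usage and health data that NodeManagers report via heartbeats can be read back through the YarnClient API. A minimal sketch (class name and output format are illustrative):
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeStatusCheck {

    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Only nodes the ResourceManager currently considers RUNNING
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                + "  capacity=" + node.getCapability()
                + "  used=" + node.getUsed()
                + "  containers=" + node.getNumContainers());
        }
        yarnClient.stop();
    }
}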

ApplicationMaster (AM)

Per-application process that:
  • Negotiates resources from ResourceManager
  • Works with NodeManager(s) to execute tasks
  • Monitors task progress
  • Handles task failures
Key Insight: Each application has its own AM → Framework-specific logic isolated

Container

The unit of resource allocation:
Container = CPU cores + Memory + (optionally) GPU/disk
Example: <2 cores, 4GB RAM>
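
An ApplicationMaster requests containers by describing the resources it needs. A minimal sketch of requesting the example container above (<2 cores, 4GB RAM>) with the AMRMClient API; it assumes it runs inside an already-launched AM, and the host/port/priority values are placeholders:
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {

    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this AM with the ResourceManager (host/port/URL omitted)
        rmClient.registerApplicationMaster("", 0, "");

        // The container spec from the example: 4GB memory, 2 virtual cores
        Resource capability = Resource.newInstance(4096, 2);
        Priority priority = Priority.newInstance(0);

        // No node/rack preference: the scheduler picks a NodeManager
        rmClient.addContainerRequest(
            new ContainerRequest(capability, null, null, priority));

        // ... allocate() heartbeat loop and container launch would follow ...
    }
}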

YARN Application Lifecycle

Submitting a MapReduce Job

Step 1: Client → ResourceManager
   "Submit job: wordcount.jar"

Step 2: RM → Client
   "Application ID: app_001"

Step 3: Client uploads:
   • Job JAR to HDFS
   • Job configuration
   • Input splits

Step 4: Client → RM
   "Start application app_001"

Step 5: RM Scheduler
   Chooses NodeManager for ApplicationMaster
   Allocates container for AM

Step 6: RM → NodeManager_1
   "Launch container for AM of app_001"

Step 7: NodeManager_1
   Launches ApplicationMaster (MRAppMaster for MapReduce)

Step 8: ApplicationMaster → RM
   "Register application app_001"
   "Request containers for map/reduce tasks"

Step 9: RM Scheduler
   Allocates containers based on resource availability

Step 10: RM → ApplicationMaster
   "Allocated containers on: [NM_2, NM_3, NM_4...]"

Step 11: ApplicationMaster → NodeManagers
   "Launch map/reduce tasks in containers"

Step 12: NodeManagers
   Launch tasks, report progress to ApplicationMaster

Step 13: ApplicationMaster
   Monitors tasks, handles failures
   Requests more containers if needed

Step 14: ApplicationMaster → RM
   "Application complete, unregister"

Step 15: RM
   Frees all allocated containers
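
From the client's perspective, the lifecycle above reduces to submit, then poll the ResourceManager until the application reaches a terminal state. A minimal polling sketch (the application ID is a placeholder; ApplicationId.fromString is available in recent Hadoop releases):
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WaitForApplication {

    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // ID of a previously submitted application (placeholder value)
        ApplicationId appId =
            ApplicationId.fromString("application_1234567890_0001");

        // Steps 8-14 from the client's point of view: the RM tracks the AM's
        // registration, progress, and final unregistration
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        YarnApplicationState state = report.getYarnApplicationState();
        while (state != YarnApplicationState.FINISHED
                && state != YarnApplicationState.KILLED
                && state != YarnApplicationState.FAILED) {
            Thread.sleep(5000);
            report = yarnClient.getApplicationReport(appId);
            state = report.getYarnApplicationState();
        }
        System.out.println("Final state: " + state
            + ", status: " + report.getFinalApplicationStatus());
        yarnClient.stop();
    }
}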

Fault Tolerance

ApplicationMaster Failure:
1. ResourceManager detects AM heartbeat timeout
2. RM kills all containers for that application
3. RM restarts AM on different NodeManager
4. New AM recovers job state from HDFS (if available)
5. Re-runs failed/incomplete tasks
Task Failure:
1. ApplicationMaster detects task failure
2. AM requests new container from RM
3. Retries the task in a new container (up to the configured limit)
4. If the task fails 4 times (the default limit) → Job fails
NodeManager Failure:
1. ResourceManager detects NM heartbeat timeout
2. RM marks all containers on NM as failed
3. ApplicationMasters re-request containers
4. Tasks rescheduled on healthy nodes
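
The retry limits mentioned above are ordinary configuration properties. A minimal sketch of setting them per job (the values shown are the usual defaults; job setup details are omitted):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryLimits {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Per-task retry limits enforced by the MR ApplicationMaster
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        // How often this job's AM may be restarted (capped cluster-wide
        // by yarn.resourcemanager.am.max-attempts in yarn-site.xml)
        conf.setInt("mapreduce.am.max-attempts", 2);

        Job job = Job.getInstance(conf, "retry-limits-demo");
        // ... mapper/reducer/input/output setup as in Module 3 ...
    }
}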

YARN Schedulers

1. FIFO Scheduler

Simplest scheduler: First-In-First-Out queue.
Queue: [Job A] [Job B] [Job C]

Resources:
████████████████████ (Job A gets all resources)

When Job A completes:
████████████████████ (Job B gets all resources)
Use Case: Single-user clusters, simple testing
Limitations: No resource sharing; large jobs block small ones

2. Capacity Scheduler

Multiple queues with guaranteed capacity. Configuration (capacity-scheduler.xml):
<configuration>
  <!-- Define queues -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>production,development,research</value>
  </property>

  <!-- Production queue: 60% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.production.capacity</name>
    <value>60</value>
  </property>

  <!-- Development queue: 25% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.development.capacity</name>
    <value>25</value>
  </property>

  <!-- Research queue: 15% capacity -->
  <property>
    <name>yarn.scheduler.capacity.root.research.capacity</name>
    <value>15</value>
  </property>

  <!-- Maximum capacity (can use idle resources) -->
  <property>
    <name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
    <value>80</value>
  </property>

  <!-- User limits -->
  <property>
    <name>yarn.scheduler.capacity.root.production.user-limit-factor</name>
    <value>2</value>
  </property>
</configuration>
Behavior:
Total Cluster: 100GB RAM, 50 cores

Guaranteed:
  production:  60GB, 30 cores (guaranteed)
  development: 25GB, 12 cores (guaranteed)
  research:    15GB,  8 cores (guaranteed)

If development is idle:
  production can use up to 80% (elastic capacity)
Use Case: Multi-tenant clusters with SLAs
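
Applications choose a queue by name at submission time; for MapReduce this is the mapreduce.job.queuename property (also settable on the command line with -Dmapreduce.job.queuename=production). A minimal sketch targeting the production queue defined above:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitToQueue {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Route this job to the guaranteed-capacity production queue
        conf.set("mapreduce.job.queuename", "production");

        Job job = Job.getInstance(conf, "production-etl");
        // ... mapper/reducer/input/output setup as in Module 3 ...
        // job.waitForCompletion(true);
    }
}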

3. Fair Scheduler

Dynamically balances resources across applications. Configuration (fair-scheduler.xml):
<allocations>
  <!-- Queue for production jobs -->
  <queue name="production">
    <minResources>10000 mb, 10 vcores</minResources>
    <maxResources>90000 mb, 100 vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <weight>3.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>

  <!-- Queue for ad-hoc queries -->
  <queue name="adhoc">
    <minResources>5000 mb, 5 vcores</minResources>
    <maxResources>30000 mb, 30 vcores</maxResources>
    <maxRunningApps>10</maxRunningApps>
    <weight>1.0</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>

  <!-- User limits -->
  <user name="analyst1">
    <maxRunningApps>5</maxRunningApps>
  </user>

  <!-- Cluster-wide preemption timeouts (seconds) -->
  <defaultMinSharePreemptionTimeout>300</defaultMinSharePreemptionTimeout>
  <defaultFairSharePreemptionTimeout>600</defaultFairSharePreemptionTimeout>
</allocations>
Fair Sharing Example:
Scenario: 2 users, 100GB total cluster memory

User A submits 1 job:
  Job A gets: 100GB (all available)

User B submits 1 job:
  Fair scheduler rebalances:
    Job A: 50GB
    Job B: 50GB

User A submits 2nd job:
  Fair scheduler rebalances:
    Job A1: 33GB
    Job A2: 33GB
    Job B:  34GB
Use Case: Shared clusters for interactive analytics (Hive, Spark)
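
The split is driven by queue weights: each queue's share of contested resources is its weight divided by the sum of weights. A small arithmetic sketch (plain illustration, not Fair Scheduler code) using the weights from the configuration above:
public class WeightedShare {

    // share_i = total * weight_i / sum(weights)
    static double share(double clusterTotal, double weight, double weightSum) {
        return clusterTotal * weight / weightSum;
    }

    public static void main(String[] args) {
        double totalMemGb = 100.0;
        double wProduction = 3.0;   // weight of the production queue
        double wAdhoc = 1.0;        // weight of the adhoc queue
        double sum = wProduction + wAdhoc;

        // production: 75 GB, adhoc: 25 GB when both queues are busy
        System.out.printf("production: %.0f GB%n", share(totalMemGb, wProduction, sum));
        System.out.printf("adhoc:      %.0f GB%n", share(totalMemGb, wAdhoc, sum));
    }
}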

Configuring YARN

yarn-site.xml

<configuration>
  <!-- ResourceManager address -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>

  <!-- Scheduler class -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>

  <!-- NodeManager resources -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
    <description>Physical RAM available to YARN (8GB)</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
    <description>CPU cores available to YARN</description>
  </property>

  <!-- Container limits -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum container memory (1GB)</description>
  </property>

  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
    <description>Maximum container memory (8GB)</description>
  </property>

  <!-- ApplicationMaster resources -->
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1536</value>
    <description>Memory for MR ApplicationMaster</description>
  </property>

  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    <description>Keep logs for 7 days</description>
  </property>
</configuration>

MapReduce Resource Configuration

mapred-site.xml:
<configuration>
  <!-- Map task memory -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>

  <!-- Reduce task memory -->
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>
  </property>

  <!-- JVM heap for map tasks -->
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1638m</value>
    <description>80% of map container memory</description>
  </property>

  <!-- JVM heap for reduce tasks -->
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx3276m</value>
    <description>80% of reduce container memory</description>
  </property>
</configuration>
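
The heap values above follow a simple rule of thumb: give the JVM roughly 80% of the container and leave the rest for JVM and native overhead. A small sketch of that calculation (the helper is hypothetical):
public class HeapSizing {

    // Heap = containerMemory * fraction (the remainder covers JVM/native overhead)
    static String xmxFor(int containerMb, double heapFraction) {
        return "-Xmx" + (int) (containerMb * heapFraction) + "m";
    }

    public static void main(String[] args) {
        System.out.println("map:    " + xmxFor(2048, 0.8));  // -Xmx1638m
        System.out.println("reduce: " + xmxFor(4096, 0.8));  // -Xmx3276m
    }
}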

Monitoring YARN

ResourceManager Web UI

Access: http://resourcemanager:8088

Key Metrics:
  • Cluster metrics: Total memory, cores, available resources
  • Running applications
  • Queue utilization
  • NodeManager status
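
The same cluster metrics are exposed as JSON through the ResourceManager REST API at /ws/v1/cluster/metrics. A minimal sketch (the hostname is a placeholder):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterMetricsRest {

    public static void main(String[] args) throws Exception {
        // Same port as the web UI; "resourcemanager" is a placeholder hostname
        URL url = new URL("http://resourcemanager:8088/ws/v1/cluster/metrics");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Raw JSON: totalMB, allocatedMB, availableMB, appsRunning, ...
                System.out.println(line);
            }
        }
    }
}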

Command-Line Tools

# List running applications
yarn application -list

# Get application details
yarn application -status application_1234567890_0001

# Kill application
yarn application -kill application_1234567890_0001

# View logs
yarn logs -applicationId application_1234567890_0001

# NodeManager status
yarn node -list -all

# Queue information
yarn queue -status production

Programmatic Monitoring

import java.util.List;

import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YARNMonitor {

    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // List all applications
        List<ApplicationReport> apps =
            yarnClient.getApplications();

        for (ApplicationReport app : apps) {
            System.out.println("Application ID: " + app.getApplicationId());
            System.out.println("Name: " + app.getName());
            System.out.println("User: " + app.getUser());
            System.out.println("Queue: " + app.getQueue());
            System.out.println("State: " + app.getYarnApplicationState());
            System.out.println("Progress: " +
                (app.getProgress() * 100) + "%");
            System.out.println("Tracking URL: " + app.getTrackingUrl());
            System.out.println("---");
        }

        // Get cluster metrics
        YarnClusterMetrics metrics = yarnClient.getYarnClusterMetrics();
        System.out.println("\nCluster Metrics:");
        System.out.println("NodeManagers: " +
            metrics.getNumNodeManagers());

        yarnClient.stop();
    }
}

Advanced: Writing a Custom YARN Application

Basic skeleton for a custom YARN application:
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class CustomYARNApp {

    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Create application
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext =
            app.getApplicationSubmissionContext();
        ApplicationId appId = appContext.getApplicationId();

        // Set application name and queue
        appContext.setApplicationName("CustomApp");
        appContext.setQueue("default");

        // Configure ApplicationMaster container
        ContainerLaunchContext amContainer =
            Records.newRecord(ContainerLaunchContext.class);

        // Set AM command
        List<String> commands = new ArrayList<>();
        commands.add("$JAVA_HOME/bin/java" +
            " -Xmx256M" +
            " com.example.ApplicationMaster" +
            " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
                "/stdout" +
            " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
                "/stderr");

        amContainer.setCommands(commands);

        // Set AM resources
        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(512);
        capability.setVirtualCores(1);
        appContext.setResource(capability);

        // Set AM container spec
        appContext.setAMContainerSpec(amContainer);

        // Submit application
        yarnClient.submitApplication(appContext);

        System.out.println("Submitted application " + appId);

        yarnClient.stop();
    }
}

Best Practices

Resource Sizing

Calculate container size:
Container Memory = Task Memory + Overhead

Example:
  Task needs a 2GB (2048MB) heap
  Container size = 2048MB + 384MB (overhead) = 2432MB
  Round up to a clean value: 2560MB (2.5GB)
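
A small sketch of that calculation; the 512MB rounding increment here is a chosen convention, and keep in mind the scheduler also rounds requests up to its own configured allocation increment (e.g. yarn.scheduler.minimum-allocation-mb):
public class ContainerSizing {

    // Round (heap + overhead) up to the next multiple of the chosen increment
    static int containerMb(int taskHeapMb, int overheadMb, int incrementMb) {
        int needed = taskHeapMb + overheadMb;
        return ((needed + incrementMb - 1) / incrementMb) * incrementMb;
    }

    public static void main(String[] args) {
        // 2048MB heap + 384MB overhead = 2432MB -> rounded up to 2560MB
        System.out.println(containerMb(2048, 384, 512) + " MB");
    }
}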

Queue Configuration

Single-tenant cluster:
  → FIFO scheduler (simplest)

Multi-tenant with SLAs:
  → Capacity scheduler
  → Guarantee minimum resources per team

Research/development cluster:
  → Fair scheduler
  → Dynamic resource sharing

Monitoring and Alerts

Set up alerts for:
  • High queue utilization (>80%)
  • Long-running jobs (potential issues)
  • Failed applications
  • NodeManager failures
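
For the queue-utilization alert, per-queue usage can be read through YarnClient.getQueueInfo(). A minimal sketch (the exact meaning of currentCapacity varies slightly by scheduler; the 80% threshold mirrors the guideline above):
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class QueueUtilizationAlert {

    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        QueueInfo queue = yarnClient.getQueueInfo("production");

        // Roughly: how full the queue is relative to its configured capacity
        float used = queue.getCurrentCapacity();
        if (used > 0.8f) {
            System.out.println("ALERT: production queue at "
                + (used * 100) + "% of configured capacity");
        }
        yarnClient.stop();
    }
}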

Interview Focus

Key Questions:
  1. “How does YARN improve upon Hadoop 1.x?”
    • Separates resource management from processing
    • Supports non-MapReduce applications
    • Better scalability (10,000+ nodes)
  2. “Explain the role of ApplicationMaster”
    • Per-application coordinator
    • Negotiates resources from RM
    • Monitors tasks, handles failures
  3. “Fair vs Capacity scheduler?”
    • Fair: Equal share, preemption, better for interactive
    • Capacity: Guaranteed capacity, hierarchical queues, better for SLAs

What’s Next?

Module 5: Hadoop Ecosystem & Integration

Explore Hive, Pig, HBase, and other ecosystem tools