Performance optimization in Hadoop is both an art and a science. A well-tuned Hadoop cluster can process data orders of magnitude faster than a poorly configured one. This chapter provides comprehensive strategies for maximizing Hadoop performance across all components.
Chapter Goals:
Master configuration tuning for HDFS, MapReduce, and YARN
Understand JVM optimization and garbage collection tuning
Learn compression strategies and their performance impact
Impact: 1 million small files consume 2000x more memory than one large file of the same total size. At scale (billions of files), this causes the NameNode to run out of RAM, regardless of physical disk capacity.
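For a rough sense of scale, assume the commonly cited figure of about 150 bytes of NameNode heap per file, directory, or block object. One million 100 KB files cost roughly 1,000,000 files × 2 objects × 150 bytes ≈ 300 MB of heap, while the same ~100 GB stored as a single file in 128 MB blocks needs only about 800 block objects, on the order of a hundred kilobytes of heap.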
<!-- hdfs-site.xml -->

<!-- Block Size (default: 128MB) -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>  <!-- 128MB for general workloads -->
  <description>
    Larger blocks (256MB+) for sequential analytics
    Default (128MB) for mixed workloads
    Smaller blocks (64MB) for many small jobs
  </description>
</property>

<!-- Replication Factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>
    3 for production (balance reliability/space)
    2 for temporary data
    1 for easily reproducible data
  </description>
</property>

<!-- DataNode Handler Threads -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>  <!-- Increase for high concurrency -->
  <description>
    Number of threads to handle RPC requests
    Increase for high concurrent job count
    Formula: max(10, log2(cluster_size) * 20)
  </description>
</property>

<!-- NameNode Handler Threads -->
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>  <!-- Critical for metadata operations -->
  <description>
    More handlers = better metadata operation concurrency
    Formula: 20 * ln(cluster_size)
  </description>
</property>
<!-- Performance I/O Settings -->

<!-- Read Buffer Size -->
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>  <!-- 128KB -->
  <description>
    Buffer size for read operations
    64KB for low memory, 128KB standard, 256KB for high throughput
  </description>
</property>

<!-- Write Packet Size -->
<property>
  <name>dfs.client-write-packet-size</name>
  <value>524288</value>  <!-- 512KB -->
  <description>
    Larger packets reduce overhead but increase memory
  </description>
</property>

<!-- DataNode Transfer Threads -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
  <description>
    Maximum threads for data transfer
    Increase for high concurrent reads/writes
  </description>
</property>

<!-- Balancer Bandwidth -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <value>10485760</value>  <!-- 10 MB/s -->
  <description>
    Bandwidth for rebalancing operations
    Higher values = faster rebalancing but more impact
  </description>
</property>
<!-- Advanced Performance Settings -->

<!-- Short-Circuit Reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
  <description>
    Bypass the DataNode protocol for local reads
    Requires Unix domain sockets
  </description>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>

<!-- Caching -->
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>4294967296</value>  <!-- 4GB -->
  <description>
    Memory for centralized cache management
  </description>
</property>

<!-- Erasure Coding (Hadoop 3.x) -->
<property>
  <name>dfs.namenode.ec.policies.enabled</name>
  <value>RS-6-3-1024k</value>
  <description>
    Reed-Solomon 6+3: 50% storage overhead vs 200% for replication
    Use for cold data
  </description>
</property>
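In Hadoop 3.x, erasure coding policies are typically enabled and attached to directories with the hdfs ec command line tool. A minimal sketch, where the /data/cold path is just an example:

# Enable the Reed-Solomon 6+3 policy (Hadoop 3.x)
hdfs ec -enablePolicy -policy RS-6-3-1024k

# Apply it to a directory holding cold data; new files written there use EC
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k

# Verify the effective policy
hdfs ec -getPolicy -path /data/cold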
G1GC (Default in Java 9+)
Best For: General purpose, predictable pause times
# G1 Garbage Collector Configuration

# Basic G1GC
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200                # Target max pause time
-XX:ParallelGCThreads=20                # Parallel GC threads
-XX:ConcGCThreads=5                     # Concurrent marking threads
-XX:InitiatingHeapOccupancyPercent=45   # When to start marking

# G1 Region Sizing
-XX:G1HeapRegionSize=32m                # Region size (1-32MB)

# Detailed Logging
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintAdaptiveSizePolicy
-Xloggc:/var/log/hadoop/gc.log
Tuning Tips:
Increase ParallelGCThreads for more cores
Lower InitiatingHeapOccupancyPercent if seeing long pauses
Increase MaxGCPauseMillis if throughput is priority
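These flags only take effect when they reach the daemon's JVM, typically via the *_OPTS variables in hadoop-env.sh. A minimal sketch for the NameNode; the heap size here is illustrative, not a recommendation:

# hadoop-env.sh (Hadoop 3.x; Hadoop 2.x uses HADOOP_NAMENODE_OPTS instead)
export HDFS_NAMENODE_OPTS="-Xms32g -Xmx32g \
  -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -Xloggc:/var/log/hadoop/namenode-gc.log \
  ${HDFS_NAMENODE_OPTS}"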
CMS (Older, Still Used)
Best For: Low latency requirements, older Hadoop versions
# Concurrent Mark Sweep Configuration
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC                        # Parallel young generation
-XX:CMSInitiatingOccupancyFraction=70   # When to start CMS
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled           # Parallel remark phase
-XX:+CMSScavengeBeforeRemark            # Clean young gen before remark
-XX:ParallelGCThreads=20
-XX:ConcGCThreads=5

# Young Generation Sizing
-XX:NewRatio=2                          # Old:Young = 2:1
-XX:SurvivorRatio=8                     # Eden:Survivor = 8:1
Issues:
Fragmentation over time
Occasional full GC pauses
Deprecated since Java 9 and removed in Java 14
Parallel GC (Throughput)
Best For: Batch processing, throughput over latency
# Parallel Throughput Collector
-XX:+UseParallelGC
-XX:ParallelGCThreads=20
-XX:+UseAdaptiveSizePolicy   # Auto-tune generation sizes
-XX:GCTimeRatio=99           # Target at most 1% of time in GC
-XX:MaxGCPauseMillis=100

# Generation Sizing
-XX:NewRatio=2
-XX:SurvivorRatio=8
Characteristics:
High throughput
Longer pause times acceptable
Good for map/reduce tasks
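For the map and reduce task JVMs themselves, the collector is set through the task Java options in mapred-site.xml. A sketch, with heap sizes chosen purely for illustration:

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m -XX:+UseParallelGC -XX:ParallelGCThreads=4</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6553m -XX:+UseParallelGC -XX:ParallelGCThreads=4</value>
</property>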
ZGC (Hadoop 3.x, Java 11+)
Best For: Very large heaps (100GB+), ultra-low latency
# Z Garbage Collector (Experimental)
-XX:+UnlockExperimentalVMOptions   # Required for ZGC before JDK 15
-XX:+UseZGC
-XX:ZCollectionInterval=60         # Force a GC cycle every 60s
-XX:ZAllocationSpikeTolerance=2
-Xlog:gc*:file=/var/log/hadoop/gc.log
Compression Codec Comparison

Codec     Speed     Ratio    Splittable   Use Case
None      Fastest   1.0x     Yes          Testing only; not for production
LZ4       Fastest   ~2.0x    No*          Real-time, low-latency processing
Snappy    Fast      ~2.0x    No*          MapReduce intermediate output, hot data
Gzip      Medium    ~2.5x    No*          Cold storage, archival
Bzip2     Slow      ~3.0x    Yes          Large archival data, rarely processed
Zstd      Fast      ~2.7x    No*          Modern alternative (Hadoop 3.2+)

* Splittable when used with block compression inside container formats
  such as SequenceFile, Avro, Parquet, and ORC.
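To put the table into practice, intermediate (map) output and final job output are compressed independently. A sketch of typical cluster-wide defaults in mapred-site.xml, assuming the Snappy and Gzip codecs are installed on the cluster:

<!-- mapred-site.xml -->
<!-- Compress intermediate map output with Snappy -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<!-- Compress final job output with Gzip for archival -->
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>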
4. Increase Shuffle Buffer and optimize merge settings.
5. Check for Data Skew: Monitor reducer input distribution and implement a custom partitioner if needed.
Question 2: Explain compression codec trade-offs in Hadoop. When would you choose Snappy vs Gzip vs Bzip2?
Answer:
Snappy:
Speed: Very fast (400+ MB/s)
Ratio: Moderate (2:1)
Use Cases: Map output (ALWAYS), hot data, real-time processing
Gzip:
Speed: Medium (100 MB/s)
Ratio: Good (2.5-3:1)
Use Cases: Final output for archival, cold storage
Bzip2:
Speed: Slow (20 MB/s)
Ratio: Best (3-4:1)
Splittable: YES (natively)
Use Cases: Large files needing parallelism, maximum compression
Decision Matrix:
Frequent processing: Snappy/LZ4
Archival: Gzip or Bzip2
Map output: ALWAYS Snappy
Network transfer: Gzip
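These choices can also be made per job rather than cluster-wide. A sketch using generic -D options; the jar, class, and paths are placeholders, and this assumes the job driver uses ToolRunner so the options are parsed:

hadoop jar my-analytics.jar com.example.AggregateJob \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
  /input /output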
Question 3: How do you calculate optimal container size for a YARN cluster?
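A sketch of the standard sizing arithmetic, using purely hypothetical hardware (128 GB RAM and 32 cores per worker node) and the common rule of thumb of JVM heap at roughly 80% of container memory:

<!-- yarn-site.xml: hypothetical 128 GB / 32-core worker nodes -->
<!-- Reserve ~26 GB of the 128 GB for the OS, DataNode, and NodeManager:
     131072 MB - 26624 MB = 104448 MB available to containers -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>104448</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>28</value>
</property>
<!-- Container granularity: the minimum allocation is the rounding unit -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>104448</value>
</property>

<!-- mapred-site.xml: task containers, with heap ~80% of container memory -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3276m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx6553m</value>
</property>

Under these assumptions, each node fits roughly 104448 / 4096 ≈ 25 map-sized containers, bounded also by the 28 configured vcores.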
With performance tuning mastered, you’re ready for the final chapter on production deployment and real-world best practices.

Chapter 8 Preview: Production deployment strategies, cluster management, monitoring, security, and real-world use cases.