HDFS Architecture & Internals
Module Duration : 4-5 hours
Hands-on Labs : 6 practical exercises
Prerequisites : Understanding of GFS concepts from Module 1
Introduction
HDFS (Hadoop Distributed File System) is the storage foundation of the Hadoop ecosystem. Inspired by Google’s GFS, HDFS provides:
Fault-tolerant storage across commodity hardware
High throughput for large datasets
Scalability to petabytes and beyond
Data locality for efficient processing
In this module, we’ll go beyond theory to understand implementation details and write real code.
HDFS Architecture Overview
The Core Components
┌──────────────────────────────────────────────────────────────┐
│ HDFS CLUSTER │
│ │
│ ┌────────────────────────────────────────┐ │
│ │ NameNode (Master) │ │
│ │ │ │
│ │ • Metadata storage (in-memory) │ │
│ │ • Namespace operations │ │
│ │ • Block mapping │ │
│ │ • Replication policy enforcement │ │
│ │ • Heartbeat & Block reports │ │
│ └────────────┬───────────────────────────┘ │
│ │ │
│ │ Heartbeat (3 sec) + Block Reports │
│ │ │
│ ┌────────────┼────────────┬───────────────┬──────────┐ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌────────┐│
│ │DataNode 1││DataNode 2││DataNode 3││DataNode 4││DataNode││
│ │ ││ ││ ││ ││ N ││
│ │ Stores: ││ Stores: ││ Stores: ││ Stores: ││ ││
│ │ Block 1 ││ Block 1 ││ Block 2 ││ Block 2 ││ ... ││
│ │ Block 3 ││ Block 2 ││ Block 3 ││ Block 4 ││ ││
│ │ Block 5 ││ Block 4 ││ Block 5 ││ Block 6 ││ ││
│ └──────────┘└──────────┘└──────────┘└──────────┘└────────┘│
│ │
│ ┌─────────────────────────────────┐ │
│ │ Secondary NameNode │ │
│ │ (Checkpoint creation) │ │
│ └─────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
▲ ▲
│ │
│ │
┌────┴────┐ ┌────┴────┐
│ Client │ │ Client │
│ (Read/ │ │ (Read/ │
│ Write) │ │ Write) │
└─────────┘ └─────────┘
NameNode: The Brain
The NameNode is the single master that manages:
File System Namespace :
Directory tree
File metadata (permissions, timestamps)
Mapping: filename → block IDs
Block Management :
Block locations (which DataNodes store each block)
Replication factor enforcement
Block creation, deletion, and re-replication
Heartbeat Management :
Monitors DataNode health (3-second heartbeats)
Detects failures and initiates recovery
Critical Design Choice : All metadata stored in RAM
Pros : Fast metadata operations (no disk I/O)
Cons : Limits namespace size (rule of thumb: roughly 1GB of NameNode heap per million files/blocks)
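To make that rule of thumb concrete, here is a rough back-of-the-envelope sketch (not HDFS code; it assumes the commonly cited figure of about 150 bytes of heap per file, directory, or block object, which varies across Hadoop versions):

public class NameNodeHeapEstimate {
    // Rough rule of thumb, not an exact figure: each file, directory,
    // and block object costs on the order of 150 bytes of NameNode heap.
    private static final long BYTES_PER_OBJECT = 150;

    public static long estimateHeapBytes(long files, long directories, long blocks) {
        return (files + directories + blocks) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // Example: 10 million files, 1 million directories, ~12 million blocks
        long heap = estimateHeapBytes(10_000_000L, 1_000_000L, 12_000_000L);
        System.out.printf("Estimated NameNode heap for metadata: ~%.1f GB%n",
                heap / (1024.0 * 1024 * 1024));
    }
}

For these example counts the estimate lands around 3 GB for metadata alone, which is why production NameNodes are usually given much larger heaps than that.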
DataNodes: The Workers
DataNodes:
Store actual data blocks on local disk
Send heartbeats to NameNode every 3 seconds
Report block lists periodically (block reports)
Serve read/write requests from clients
Execute replication commands from NameNode
Secondary NameNode: The Misconception
Common Misconception : Secondary NameNode is NOT a backup or standby!
Actual Role : Performs periodic checkpointing
Merges edit logs into FSImage
Reduces NameNode restart time
Does NOT take over if NameNode fails (use HA NameNode for that)
Block Storage Deep Dive
What is a Block?
HDFS splits files into fixed-size blocks:
Default size : 128MB (configurable: 64MB, 256MB, etc.)
Why so large? To minimize metadata overhead and optimize for sequential reads
Example :
File: access.log (350MB)
Split into blocks:
┌─────────────┬─────────────┬─────────────┐
│ Block 1 │ Block 2 │ Block 3 │
│ (128MB) │ (128MB) │ (94MB) │
│ ID: 1001 │ ID: 1002 │ ID: 1003 │
└─────────────┴─────────────┴─────────────┘
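The split itself is just arithmetic on offsets. A small illustrative sketch (not HDFS source code) that reproduces the example above:

public class BlockSplitExample {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // 128MB default block size
        long fileSize = 350L * 1024 * 1024;    // the 350MB access.log from the example

        long numBlocks = (fileSize + blockSize - 1) / blockSize;   // ceiling division -> 3
        for (long i = 0; i < numBlocks; i++) {
            long offset = i * blockSize;
            long length = Math.min(blockSize, fileSize - offset);  // last block is smaller
            System.out.printf("Block %d: offset=%d bytes, length=%dMB%n",
                    i + 1, offset, length / (1024 * 1024));
        }
    }
}

Running it prints two full 128MB blocks and a final 94MB block, matching the diagram.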
Replication Strategy
Each block is replicated (default: 3 copies) for fault tolerance.
Replica Placement Algorithm (Rack-Aware):
Block X needs 3 replicas:
Replica 1: Local node (where client writes from)
OR random node (if client outside cluster)
Replica 2: Different node on SAME rack as Replica 1
(same rack-switch, low latency)
Replica 3: Different node on DIFFERENT rack
(survives rack failure)
┌──────────────────────────────────────────────┐
│ Network │
└────────┬─────────────────────────┬───────────┘
│ │
┌────▼─────┐ ┌────▼─────┐
│ Rack 1 │ │ Rack 2 │
│ Switch │ │ Switch │
└────┬─────┘ └────┬─────┘
│ │
┌────┼─────┬─────┐ ┌────┴─────┐
│ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼
DN1 DN2 DN3 DN4 DN5 DN6
↑ ↑ ↑
│ │ │
Replica Replica Replica
1 2 3
Why This Strategy? :
Fault tolerance : Survives node and rack failures
Write performance : 2 replicas on same rack (fast)
Read optimization : Multiple replicas for load balancing
Network efficiency : 1/3 of data crosses racks (not 3/3)
Block Placement Code Example
Here’s how HDFS decides block placement (simplified from actual code):
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the default placement policy; clusterMap,
// chooseRemoteRack, and chooseRandom are NameNode internals not shown here.
public class BlockPlacementPolicyDefault {

    /**
     * Choose target DataNodes for block replicas.
     * @param numOfReplicas number of replicas to create
     * @param writer        client writing the block (for locality)
     * @param excludedNodes nodes to exclude
     * @param blocksize     size of the block
     * @return array of chosen DataNodes
     */
    public DatanodeDescriptor[] chooseTarget(
            int numOfReplicas,
            Node writer,
            List<DatanodeDescriptor> excludedNodes,
            long blocksize) {

        if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
            return new DatanodeDescriptor[0];
        }

        List<DatanodeDescriptor> results = new ArrayList<>();

        // First replica: local to the writer if possible
        DatanodeDescriptor firstNode = chooseFirstReplica(writer);
        results.add(firstNode);
        excludedNodes.add(firstNode);
        if (numOfReplicas == 1) {
            return results.toArray(new DatanodeDescriptor[1]);
        }

        // Second replica: different node on the same rack as the first
        DatanodeDescriptor secondNode = chooseLocalRack(
                firstNode, excludedNodes, blocksize);
        results.add(secondNode);
        excludedNodes.add(secondNode);
        if (numOfReplicas == 2) {
            return results.toArray(new DatanodeDescriptor[2]);
        }

        // Third replica: a different rack
        DatanodeDescriptor thirdNode = chooseRemoteRack(
                1, results, excludedNodes, blocksize);
        results.add(thirdNode);
        excludedNodes.add(thirdNode);

        // Additional replicas: distribute randomly
        for (int i = 3; i < numOfReplicas; i++) {
            DatanodeDescriptor nextNode = chooseRandom(excludedNodes, blocksize);
            results.add(nextNode);
            excludedNodes.add(nextNode);
        }
        return results.toArray(new DatanodeDescriptor[numOfReplicas]);
    }

    private DatanodeDescriptor chooseFirstReplica(Node writer) {
        // If the writer is itself a DataNode, use it
        if (writer instanceof DatanodeDescriptor) {
            return (DatanodeDescriptor) writer;
        }
        // Otherwise, choose a random node
        return chooseRandom(new ArrayList<>(), 0);
    }

    private DatanodeDescriptor chooseLocalRack(
            DatanodeDescriptor node,
            List<DatanodeDescriptor> excluded,
            long blocksize) {
        String rackName = node.getNetworkLocation();
        List<DatanodeDescriptor> sameRackNodes =
                clusterMap.getDatanodesInRack(rackName);
        // Remove excluded nodes
        sameRackNodes.removeAll(excluded);
        // Choose the node with the most available space
        return chooseBestNode(sameRackNodes, blocksize);
    }

    private DatanodeDescriptor chooseBestNode(
            List<DatanodeDescriptor> nodes,
            long blocksize) {
        DatanodeDescriptor best = null;
        long maxSpace = 0;
        for (DatanodeDescriptor node : nodes) {
            long available = node.getRemaining();
            if (available > maxSpace && available >= blocksize) {
                maxSpace = available;
                best = node;
            }
        }
        return best;
    }
}
HDFS Read Operation
Let’s trace a complete read operation with detailed code.
The Read Flow
Step 1: Client → NameNode
"Where are the blocks for /data/logs.txt?"
Step 2: NameNode → Client
Block 1: [DN1, DN2, DN5]
Block 2: [DN2, DN3, DN6]
Block 3: [DN3, DN4, DN1]
(sorted by proximity to client)
Step 3: Client → DN1 (closest DataNode)
"Send me Block 1"
Step 4: DN1 → Client
[Streaming Block 1 data...]
Step 5: Client → DN2 (closest for Block 2)
"Send me Block 2"
... and so on
Java Client Code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.IOUtils;

public class HDFSReader {
    public static void main(String[] args) throws Exception {
        // Configuration object loads hdfs-site.xml, core-site.xml
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        // Get FileSystem instance (connects to NameNode)
        FileSystem fs = FileSystem.get(conf);

        // Path to read
        Path filePath = new Path("/user/hadoop/data/access.log");

        // Open input stream
        // Internally: contacts NameNode for block locations
        FSDataInputStream in = fs.open(filePath);
        try {
            // Read data
            // Internally: connects to DataNodes sequentially
            IOUtils.copyBytes(in, System.out, 4096, false);

            // Seek to a specific position (if needed)
            in.seek(1024000);   // jump to byte offset 1024000
            byte[] buffer = new byte[1024];
            in.read(buffer);
        } finally {
            IOUtils.closeStream(in);
        }
        fs.close();
    }
}
Here’s what happens inside fs.open():
// Simplified sketch of the client-side input stream; helpers such as
// getBlockAt, getBlockReader, clientNode, and prefetchSize are omitted for clarity.
public class DFSInputStream extends FSInputStream {
    private DFSClient dfsClient;
    private String src;                    // file path
    private LocatedBlocks locatedBlocks;   // block locations from the NameNode
    private long fileLength;
    private long pos = 0;                  // current position in the file

    public DFSInputStream(DFSClient dfsClient, String src) {
        this.dfsClient = dfsClient;
        this.src = src;
        // Contact the NameNode to get block locations
        openInfo();
    }

    private void openInfo() {
        // RPC call to the NameNode; prefetchSize bounds how many block
        // locations are fetched up front (the file length is not known yet)
        LocatedBlocks newInfo =
                dfsClient.namenode.getBlockLocations(src, 0, prefetchSize);
        this.locatedBlocks = newInfo;
        this.fileLength = newInfo.getFileLength();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // Find which block contains the current position
        LocatedBlock targetBlock = getBlockAt(pos);

        // Choose the best DataNode (closest, most available)
        DatanodeInfo chosenNode = getBestNode(targetBlock);

        // Connect to the DataNode and read
        BlockReader reader = getBlockReader(chosenNode, targetBlock);
        int bytesRead = reader.read(buf, off, len);
        pos += bytesRead;
        return bytesRead;
    }

    private DatanodeInfo getBestNode(LocatedBlock block) {
        DatanodeInfo[] nodes = block.getLocations();
        // Sort by network distance (local node preferred)
        Arrays.sort(nodes, new Comparator<DatanodeInfo>() {
            public int compare(DatanodeInfo a, DatanodeInfo b) {
                int distA = networkDistance(clientNode, a);
                int distB = networkDistance(clientNode, b);
                return distA - distB;
            }
        });
        return nodes[0];   // closest node
    }

    private int networkDistance(Node a, Node b) {
        // Same node: distance 0
        // Same rack: distance 2
        // Different rack: distance 4
        String pathA = a.getNetworkLocation();
        String pathB = b.getNetworkLocation();
        if (a.equals(b)) return 0;
        if (pathA.equals(pathB)) return 2;
        return 4;
    }
}
Key Optimizations :
Data locality : Reads from closest replica
Streaming : Reads one block at a time (doesn’t load entire file)
Retry logic : Switches to another replica on failure
Checksum verification : Validates data integrity
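Sequential streaming is the common case, but FSDataInputStream also supports positioned reads, which only contact the block(s) covering the requested byte range. A minimal sketch using the public client API (the file path and offset are placeholders, assuming the 350MB access.log from earlier):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSPositionedRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path filePath = new Path("/user/hadoop/data/access.log");   // placeholder path

        try (FSDataInputStream in = fs.open(filePath)) {
            // Positioned read: fetch 4KB starting at byte 300,000,000.
            // Only the block containing that offset (Block 3 in the earlier
            // example) is read, and the stream's own position is unchanged.
            byte[] buffer = new byte[4096];
            in.readFully(300_000_000L, buffer);
            System.out.println("Read " + buffer.length + " bytes at offset 300,000,000");
        } finally {
            fs.close();
        }
    }
}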
HDFS Write Operation
Writing is more complex because of the replication pipeline.
The Write Flow
Step 1: Client → NameNode
"I want to create /data/new.log"
Step 2: NameNode checks permissions, quota
Creates file entry in namespace
Returns: "Proceed"
Step 3: Client buffers first 64KB of data
Step 4: Client → NameNode
"Allocate a block for /data/new.log"
Step 5: NameNode → Client
"Block ID 1234, write to [DN1, DN2, DN5]"
Step 6: Client → DN1
"Store block 1234, forward to DN2"
[Streams data to DN1]
Step 7: DN1 → DN2
"Store block 1234, forward to DN5"
[Streams data to DN2]
Step 8: DN2 → DN5
"Store block 1234"
[Streams data to DN5]
Step 9: DN5 → DN2 → DN1 → Client
"ACK: Block written successfully"
Repeat Steps 4-9 for each block
Write Pipeline Architecture
Client
  │
  │ (1) Write packet
  ▼
┌────────────┐  (2) Forward packet   ┌────────────┐  (3) Forward packet   ┌────────────┐
│ DataNode 1 │──────────────────────►│ DataNode 2 │──────────────────────►│ DataNode 3 │
│            │◄──────────────────────│            │◄──────────────────────│            │
└────────────┘        (5) ACK        └────────────┘        (4) ACK        └────────────┘
  │
  │ (6) ACK to client (pipeline complete)
  ▼
Client
Java Write Code
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataOutputStream;

public class HDFSWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path filePath = new Path("/user/hadoop/output/results.txt");

        // Create file (overwrites if it exists)
        FSDataOutputStream out = fs.create(filePath);
        // Or append to an existing file:
        // FSDataOutputStream out = fs.append(filePath);

        try {
            String data = "Processing complete\n";
            out.write(data.getBytes("UTF-8"));

            // Flush to ensure data is written
            out.flush();

            // Optional: force sync to DataNodes
            out.hsync();   // guarantees data is visible to other readers
        } finally {
            out.close();   // also flushes and syncs
        }
        fs.close();
    }
}
Advanced: Custom Replication Factor
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataOutputStream;

public class CustomReplication {

    public static void writeWithReplication(
            String hdfsPath,
            byte[] data,
            short replicationFactor) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(hdfsPath);

        // Specify the replication factor at creation time
        FSDataOutputStream out = fs.create(
                path,
                true,               // overwrite
                4096,               // buffer size
                replicationFactor,  // custom replication
                134217728           // block size (128MB)
        );
        out.write(data);
        out.close();
        fs.close();
    }

    public static void changeReplication(
            String hdfsPath,
            short newReplication) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(hdfsPath);

        // Change the replication factor of an existing file
        boolean success = fs.setReplication(path, newReplication);
        if (success) {
            System.out.println("Replication changed to " + newReplication);
        }
        fs.close();
    }
}
Fault Tolerance Mechanisms
DataNode Failure
Detection :
NameNode expects heartbeat every 3 seconds
If no heartbeat for about 10 minutes (10.5 minutes with default settings):
→ Mark DataNode as dead
→ Identify under-replicated blocks
→ Schedule re-replication
Recovery Process :
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Simplified sketch of the NameNode's re-replication logic; helpers such as
// countReplicas, addToReplicationQueue, getDataNodesForBlock,
// chooseTargetDataNode, and removeDataNode are internals omitted here.
public class NameNodeReplication {

    // Blocks waiting to be re-replicated (the real NameNode keeps
    // prioritized under-replication queues)
    private Queue<Block> replicationQueue = new LinkedList<>();

    private void handleDeadDataNode(DatanodeDescriptor deadNode) {
        // Get all blocks stored on the dead node
        List<Block> affectedBlocks = deadNode.getBlockList();

        for (Block block : affectedBlocks) {
            // Check the current replication level
            int currentReplicas = countReplicas(block);
            int targetReplicas = block.getReplication();

            if (currentReplicas < targetReplicas) {
                // Under-replicated! Schedule re-replication
                addToReplicationQueue(block,
                        targetReplicas - currentReplicas);
            }
        }
        // Remove the dead node from the cluster
        removeDataNode(deadNode);
    }

    private void processReplicationQueue() {
        while (!replicationQueue.isEmpty()) {
            Block block = replicationQueue.poll();

            // Find nodes that hold existing replicas
            List<DatanodeDescriptor> sources =
                    getDataNodesForBlock(block);

            // Choose a target for the new replica
            DatanodeDescriptor target =
                    chooseTargetDataNode(sources);

            // Command a source node to replicate the block to the target
            sources.get(0).addBlockToReplicate(block, target);
        }
    }
}
NameNode Failure (High Availability)
Problem : NameNode is single point of failure
Solution : HA NameNode with ZooKeeper
┌─────────────────────────────────────────────┐
│ ZooKeeper Cluster │
│ (Coordination & Leader Election) │
└──────────────┬──────────────────────────────┘
│
┌──────┴──────┐
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Active │ │ Standby │
│ NameNode │ │ NameNode │
│ │ │ │
│ (Serves │ │ (Tails edit │
│ requests) │ │ log, ready │
│ │ │ to takeover)│
└──────┬───────┘ └──────┬───────┘
│ │
│ │
▼ ▼
┌──────────────────────────────┐
│ Shared Edit Log │
│ (on JournalNodes or NFS) │
└──────────────────────────────┘
Automatic Failover :
Active NameNode crashes
ZooKeeper detects failure
Standby NameNode promoted to Active
Clients automatically redirect to new Active
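From a client's perspective, failover is transparent once it addresses the logical nameservice rather than a single NameNode host. A sketch of the client-side settings involved (the nameservice name mycluster, the NameNode IDs, and the hostnames are placeholders; in practice these properties live in hdfs-site.xml rather than being set in code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HAClientExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Logical nameservice instead of a single NameNode host:port
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:9000");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:9000");

        // Proxy provider that retries the other NameNode when the active one fails
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected via nameservice, home dir: " + fs.getHomeDirectory());
        fs.close();
    }
}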
HDFS Configuration
Essential Configuration Files
core-site.xml (Cluster-wide settings):
<configuration>
  <!-- NameNode URI -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>

  <!-- Temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml (HDFS-specific):
<configuration>
  <!-- Replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <!-- Block size (128MB) -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>

  <!-- NameNode metadata directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/hadoop/namenode</value>
  </property>

  <!-- DataNode data directories -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///disk1/hadoop,file:///disk2/hadoop</value>
  </property>

  <!-- Permissions (disable for testing only!) -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
  </property>
</configuration>
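To confirm what a client actually picked up from these files, you can query the effective defaults through the FileSystem API. A minimal sketch (the path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowEffectiveDefaults {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // loads core-site.xml and hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/user/hadoop");          // placeholder path
        System.out.println("fs.defaultFS        = " + conf.get("fs.defaultFS"));
        System.out.println("Default block size  = " + fs.getDefaultBlockSize(p));
        System.out.println("Default replication = " + fs.getDefaultReplication(p));

        fs.close();
    }
}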
Hands-on Labs
Lab 1: HDFS CLI Operations
# Create directory
hdfs dfs -mkdir -p /user/hadoop/input
# Upload file
hdfs dfs -put localfile.txt /user/hadoop/input/
# List files with block info
hdfs fsck /user/hadoop/input/localfile.txt -files -blocks -locations
# Check file stats
hdfs dfs -stat "%b %r %n" /user/hadoop/input/localfile.txt
# Output: file size in bytes, replication factor, filename
# Change replication
hdfs dfs -setrep -w 5 /user/hadoop/input/localfile.txt
# View file content
hdfs dfs -cat /user/hadoop/input/localfile.txt
# Get file to local
hdfs dfs -get /user/hadoop/input/localfile.txt ./downloaded.txt
# Check cluster health
hdfs dfsadmin -report
# Balance cluster
hdfs balancer -threshold 10
Lab 2: Programmatic File Operations
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class HDFSOperations {
    private FileSystem fs;

    public HDFSOperations() throws Exception {
        Configuration conf = new Configuration();
        this.fs = FileSystem.get(conf);
    }

    // Create a directory
    public void createDirectory(String path) throws Exception {
        Path dirPath = new Path(path);
        fs.mkdirs(dirPath);
        System.out.println("Created: " + path);
    }

    // List directory contents
    public void listDirectory(String path) throws Exception {
        Path dirPath = new Path(path);
        FileStatus[] statuses = fs.listStatus(dirPath);
        for (FileStatus status : statuses) {
            System.out.printf("%s %10d %s%n",
                    status.isDirectory() ? "DIR " : "FILE",
                    status.getLen(),
                    status.getPath().getName()
            );
        }
    }

    // Get block locations for a file
    public void showBlockLocations(String path) throws Exception {
        Path filePath = new Path(path);
        FileStatus status = fs.getFileStatus(filePath);
        BlockLocation[] blocks = fs.getFileBlockLocations(
                status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            BlockLocation block = blocks[i];
            System.out.println("Block " + i + ":");
            System.out.println("  Hosts: " + String.join(", ", block.getHosts()));
            System.out.println("  Offset: " + block.getOffset());
            System.out.println("  Length: " + block.getLength());
        }
    }

    // Delete a file or directory
    public void delete(String path, boolean recursive) throws Exception {
        Path targetPath = new Path(path);
        boolean success = fs.delete(targetPath, recursive);
        System.out.println("Deleted: " + success);
    }

    // Copy a file within HDFS
    public void copyFile(String src, String dst) throws Exception {
        Path srcPath = new Path(src);
        Path dstPath = new Path(dst);
        FileUtil.copy(fs, srcPath, fs, dstPath, false, fs.getConf());
    }

    public void close() throws Exception {
        fs.close();
    }
}
Lab 3: Monitoring HDFS Health
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class HDFSMonitor {
    public static void clusterHealth() throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(conf);

        // Get cluster-wide statistics
        FsStatus status = dfs.getStatus();
        long capacity = status.getCapacity();
        long used = status.getUsed();
        long remaining = status.getRemaining();

        System.out.println("HDFS Cluster Status:");
        System.out.println("  Capacity:  " + formatSize(capacity));
        System.out.println("  Used:      " + formatSize(used) +
                " (" + (used * 100 / capacity) + "%)");
        System.out.println("  Remaining: " + formatSize(remaining));

        // Get per-DataNode information
        DatanodeInfo[] datanodes = dfs.getDataNodeStats();
        System.out.println("\nDataNodes: " + datanodes.length);
        for (DatanodeInfo node : datanodes) {
            System.out.println("  " + node.getHostName());
            System.out.println("    Capacity:  " + formatSize(node.getCapacity()));
            System.out.println("    Used:      " + formatSize(node.getDfsUsed()));
            System.out.println("    Remaining: " + formatSize(node.getRemaining()));
            System.out.println("    State:     " + node.getAdminState());
        }
        dfs.close();
    }

    private static String formatSize(long bytes) {
        long tb = 1024L * 1024 * 1024 * 1024;
        long gb = 1024L * 1024 * 1024;
        long mb = 1024L * 1024;
        if (bytes >= tb) return (bytes / tb) + " TB";
        if (bytes >= gb) return (bytes / gb) + " GB";
        if (bytes >= mb) return (bytes / mb) + " MB";
        return bytes + " B";
    }
}
Optimizing Block Size
Rule of Thumb : Block size should minimize metadata while avoiding too many small files
Scenario: Processing 1TB of log files
Option 1: 64MB blocks
→ 16,384 blocks
→ More metadata
→ More map tasks
Option 2: 128MB blocks (default)
→ 8,192 blocks
→ Balanced
Option 3: 256MB blocks
→ 4,096 blocks
→ Less metadata
→ Fewer map tasks (less parallelism)
Choice: 128MB for balanced parallelism
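The block counts above follow from a ceiling division, and with splittable input MapReduce launches roughly one map task per block by default. A quick sketch of the arithmetic (illustrative only):

public class BlockSizeTradeoff {
    public static void main(String[] args) {
        long dataSize = 1024L * 1024 * 1024 * 1024;   // 1TB of log files
        long[] blockSizesMB = {64, 128, 256};

        for (long mb : blockSizesMB) {
            long blockSize = mb * 1024 * 1024;
            long blocks = (dataSize + blockSize - 1) / blockSize;   // ceiling division
            System.out.printf("%3dMB blocks -> %,6d blocks (~%,d default map tasks)%n",
                    mb, blocks, blocks);
        }
    }
}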
Short-Circuit Reads
When client and DataNode are on same machine, skip network:
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>

<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
Performance Impact : Local reads are typically 30-50% faster
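The same two properties can also be supplied from client code, which can be convenient in tests (a sketch only; in a real deployment they belong in hdfs-site.xml on both the client and DataNode sides, and short-circuit reads also require the native libhadoop library and a DataNode-exposed domain socket):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShortCircuitClient {
    public static FileSystem openWithShortCircuit() throws Exception {
        Configuration conf = new Configuration();
        // Read local replicas through a UNIX domain socket instead of the DataNode's TCP port
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
        return FileSystem.get(conf);
    }
}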
Common Issues & Troubleshooting
Issue: NameNode Out of Memory
Symptoms : NameNode crashes, OutOfMemoryError in the logs
Cause : Too many files/blocks for the allocated heap
Solutions :
Increase NameNode heap: -Xmx16g → -Xmx32g
Enable HDFS Federation (multiple NameNodes)
Merge small files into larger ones
Clean up old/unused data
Prevention : Monitor namespace size, set quotas
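Quotas can be set from the shell (hdfs dfsadmin -setQuota / -setSpaceQuota) or programmatically. A hedged sketch using the DistributedFileSystem API (the directory and the limits are example values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class NamespaceQuota {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

        // Example: cap /user/analytics at 1 million names (files + directories)
        // and 10TB of raw storage (replicas count against the space quota)
        Path dir = new Path("/user/analytics");
        long nameQuota = 1_000_000L;
        long spaceQuota = 10L * 1024 * 1024 * 1024 * 1024;
        dfs.setQuota(dir, nameQuota, spaceQuota);

        dfs.close();
    }
}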
Issue: Under-Replicated Blocks
Symptoms : hdfs fsck shows under-replicated blocks
Causes :
DataNode failures
Network issues
Disk space exhaustion
Solutions :
# Check under-replicated blocks
hdfs fsck / | grep "Under-replicated"
# Identify problematic DataNodes
hdfs dfsadmin -report
# Dump NameNode metadata (including blocks pending replication) for inspection
hdfs dfsadmin -metasave /tmp/meta.txt
# If urgent, increase replication temporarily
hdfs dfs -setrep -R 4 /critical/data
Issue: DataNode Not Starting
Check logs : /var/log/hadoop/hadoop-hdfs-datanode-*.log
Common causes :
Incompatible cluster ID :
Error: Incompatible clusterIDs
Solution: Delete DataNode data, rejoin cluster
Port already in use :
Error: Address already in use: 50010
Solution: Kill process or change port in hdfs-site.xml
Permission issues :
Error: Permission denied on /var/hadoop/datanode
Solution: chown -R hdfs:hadoop /var/hadoop/datanode
Interview Focus
Key Concepts to Master :
Explain NameNode vs DataNode responsibilities
Describe block replication algorithm and rack awareness
Walk through read/write data flows
Discuss Single NameNode limitations and HA solutions
Compare HDFS to other storage (S3, traditional file systems)
Sample Questions :
“Why can’t HDFS handle lots of small files efficiently?”
Answer: Each file/block = metadata entry in NameNode RAM
1 million files of 1KB (about 1GB of data) consume 1 million block entries, while the same 1GB stored as 8 files of 128MB consumes only 8
“How does HDFS ensure data locality for MapReduce?”
Answer: JobTracker queries NameNode for block locations, schedules map tasks on DataNodes storing blocks
“What happens if a client crashes during a write?”
Answer: File remains in incomplete state, lease eventually expires, NameNode closes file
What’s Next?
You now understand the HDFS storage layer. Next, learn how to process this data with MapReduce!
Module 3: MapReduce Programming Model
Master distributed data processing with MapReduce patterns and hands-on coding.