# HDFS Architecture & Internals

> Master the Hadoop Distributed File System - architecture, replication, fault tolerance, and hands-on implementation

**Module Duration**: 4-5 hours

**Hands-on Labs**: 6 practical exercises

**Prerequisites**: Understanding of GFS concepts from Module 1

## Introduction

HDFS (Hadoop Distributed File System) is the storage foundation of the Hadoop ecosystem. Inspired by Google's GFS, HDFS provides:

* **Fault-tolerant storage** across commodity hardware
* **High throughput** for large datasets
* **Scalability** to petabytes and beyond
* **Data locality** for efficient processing

In this module, we'll go beyond theory to understand implementation details and write real code.

***

## HDFS Architecture Overview

### The Core Components

```
┌──────────────────────────────────────────────────────────────┐
│                         HDFS CLUSTER                         │
│                                                              │
│        ┌────────────────────────────────────────┐            │
│        │           NameNode (Master)            │            │
│        │                                        │            │
│        │  • Metadata storage (in-memory)        │            │
│        │  • Namespace operations                │            │
│        │  • Block mapping                       │            │
│        │  • Replication policy enforcement      │            │
│        │  • Heartbeat & Block reports           │            │
│        └────────────┬───────────────────────────┘            │
│                     │                                        │
│                     │ Heartbeat (3 sec) + Block Reports      │
│                     │                                        │
│         ┌───────────┼───────────┬───────────┬──────────┐     │
│         │           │           │           │          │     │
│         ▼           ▼           ▼           ▼          ▼     │
│    ┌──────────┐┌──────────┐┌──────────┐┌──────────┐┌────────┐│
│    │DataNode 1││DataNode 2││DataNode 3││DataNode 4││DataNode││
│    │          ││          ││          ││          ││   N    ││
│    │ Stores:  ││ Stores:  ││ Stores:  ││ Stores:  ││        ││
│    │ Block 1  ││ Block 1  ││ Block 2  ││ Block 2  ││  ...   ││
│    │ Block 3  ││ Block 2  ││ Block 3  ││ Block 4  ││        ││
│    │ Block 5  ││ Block 4  ││ Block 5  ││ Block 6  ││        ││
│    └──────────┘└──────────┘└──────────┘└──────────┘└────────┘│
│                                                              │
│             ┌─────────────────────────────────┐              │
│             │       Secondary NameNode        │              │
│             │      (Checkpoint creation)      │              │
│             └─────────────────────────────────┘              │
└──────────────────────────────────────────────────────────────┘
                 ▲                           ▲
                 │                           │
            ┌────┴────┐                 ┌────┴────┐
            │ Client  │                 │ Client  │
            │ (Read/  │                 │ (Read/  │
            │ Write)  │                 │ Write)  │
            └─────────┘                 └─────────┘
```

### NameNode: The Brain

The NameNode is the single master that manages:

1. **File System Namespace**:
   * Directory tree
   * File metadata (permissions, timestamps)
   * Mapping: filename → block IDs

2. **Block Management**:
   * Block locations (which DataNodes store each block)
   * Replication factor enforcement
   * Block creation, deletion, and re-replication

3. **Heartbeat Management**:
   * Monitors DataNode health (3-second heartbeats)
   * Detects failures and initiates recovery

**Critical Design Choice**: All metadata is stored in RAM

* *Pros*: Fast metadata operations (no disk I/O on the lookup path)
* *Cons*: Limits namespace size (rule of thumb: 1GB RAM ≈ 1 million blocks)

### DataNodes: The Workers

DataNodes:

* Store actual data blocks on local disk
* Send heartbeats to the NameNode every 3 seconds
* Report block lists periodically (block reports)
* Serve read/write requests from clients
* Execute replication commands from the NameNode

### Secondary NameNode: The Misconception

**Common Misconception**: The Secondary NameNode is NOT a backup or standby!

**Actual Role**: Performs periodic checkpointing

* Merges edit logs into the FSImage
* Reduces NameNode restart time
* Does NOT take over if the NameNode fails (use HA NameNode for that)

***

## Block Storage Deep Dive

### What is a Block?

HDFS splits files into fixed-size blocks:

* **Default size**: 128MB (configurable: 64MB, 256MB, etc.)
* **Why so large?**: Minimize metadata overhead, optimize sequential reads

**Example**:

```
File: access.log (350MB)

Split into blocks:
┌─────────────┬─────────────┬─────────────┐
│   Block 1   │   Block 2   │   Block 3   │
│   (128MB)   │   (128MB)   │   (94MB)    │
│  ID: 1001   │  ID: 1002   │  ID: 1003   │
└─────────────┴─────────────┴─────────────┘
```

### Replication Strategy

Each block is replicated (default: 3 copies) for fault tolerance.

**Replica Placement Algorithm** (Rack-Aware):

```
Block X needs 3 replicas:

Replica 1: Local node (where client writes from)
           OR random node (if client outside cluster)
Replica 2: Node on a DIFFERENT rack from Replica 1
           (survives rack failure)
Replica 3: Different node on the SAME rack as Replica 2
           (same rack-switch, low latency)

┌──────────────────────────────────────────────┐
│                   Network                    │
└────────┬─────────────────────────┬───────────┘
         │                         │
    ┌────▼─────┐              ┌────▼─────┐
    │  Rack 1  │              │  Rack 2  │
    │  Switch  │              │  Switch  │
    └────┬─────┘              └────┬─────┘
         │                         │
    ┌────┼─────┬─────┐        ┌────┴─────┐
    │    │     │     │        │          │
    ▼    ▼     ▼     ▼        ▼          ▼
   DN1  DN2   DN3   DN4      DN5        DN6
    ↑                         ↑          ↑
    │                         │          │
 Replica                   Replica    Replica
    1                         2          3
```

**Why This Strategy?**:

* **Fault tolerance**: Survives node and rack failures, because replicas always span two racks
* **Write performance**: Two of the three replicas share a rack, so the write pipeline crosses the inter-rack link only once
* **Read optimization**: Multiple replicas for load balancing
* **Network efficiency**: One inter-rack transfer per block, instead of two if every replica sat on its own rack

### Block Placement Code Example

Here's how HDFS decides block placement (simplified from actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration: clusterMap, chooseRemoteRack, and chooseRandom
// stand in for NameNode internals that are not shown here.
public class BlockPlacementPolicyDefault {

    /**
     * Choose target DataNodes for block replicas
     * @param numOfReplicas Number of replicas to create
     * @param writer Client writing the block (for locality)
     * @param excludedNodes Nodes to exclude
     * @param blocksize Size of the block
     * @return Array of chosen DataNodes
     */
    public DatanodeDescriptor[] chooseTarget(
            int numOfReplicas,
            Node writer,
            List<DatanodeDescriptor> excludedNodes,
            long blocksize) {

        if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
            return new DatanodeDescriptor[0];
        }

        List<DatanodeDescriptor> results = new ArrayList<>();

        // First replica: local to writer if possible
        DatanodeDescriptor firstNode = chooseFirstReplica(writer);
        results.add(firstNode);
        excludedNodes.add(firstNode);

        if (numOfReplicas == 1) {
            return results.toArray(new DatanodeDescriptor[1]);
        }

        // Second replica: a node on a different rack from the first
        DatanodeDescriptor secondNode = chooseRemoteRack(
            1, results, excludedNodes, blocksize);
        results.add(secondNode);
        excludedNodes.add(secondNode);

        if (numOfReplicas == 2) {
            return results.toArray(new DatanodeDescriptor[2]);
        }

        // Third replica: different node, same rack as the second
        DatanodeDescriptor thirdNode = chooseLocalRack(
            secondNode, excludedNodes, blocksize);
        results.add(thirdNode);
        excludedNodes.add(thirdNode); // keep later picks from reusing it

        // Additional replicas: distribute randomly
        for (int i = 3; i < numOfReplicas; i++) {
            DatanodeDescriptor nextNode = chooseRandom(
                excludedNodes, blocksize);
            results.add(nextNode);
            excludedNodes.add(nextNode);
        }

        return results.toArray(new DatanodeDescriptor[numOfReplicas]);
    }

    private DatanodeDescriptor chooseFirstReplica(Node writer) {
        // If writer is a DataNode, use it
        if (writer instanceof DatanodeDescriptor) {
            return (DatanodeDescriptor) writer;
        }
        // Otherwise, choose random node
        return chooseRandom(new ArrayList<>(), 0);
    }

    private DatanodeDescriptor chooseLocalRack(
            DatanodeDescriptor node,
            List<DatanodeDescriptor> excluded,
            long blocksize) {

        String rackName = node.getNetworkLocation();
        // Copy the list so the cluster map itself is not modified
        List<DatanodeDescriptor> sameRackNodes =
            new ArrayList<>(clusterMap.getDatanodesInRack(rackName));

        // Remove excluded nodes
        sameRackNodes.removeAll(excluded);

        // Choose node with most available space
        return chooseBestNode(sameRackNodes, blocksize);
    }

    // Returns null when no candidate has room for the block
    private DatanodeDescriptor chooseBestNode(
            List<DatanodeDescriptor> nodes, long blocksize) {

        DatanodeDescriptor best = null;
        long maxSpace = 0;

        for (DatanodeDescriptor node : nodes) {
            long available = node.getRemaining();
            if (available > maxSpace && available >= blocksize) {
                maxSpace = available;
                best = node;
            }
        }

        return best;
    }
}
```

***

## HDFS Read Operation

Let's trace a complete read operation with detailed code.
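Every read starts by mapping a byte offset to the block that holds it. The sketch below works through that arithmetic for the 350MB `access.log` example with the 128MB default block size. `BlockMath` and its methods are hypothetical names chosen for illustration, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not a Hadoop class. */
class BlockMath {
    static final long MB = 1024L * 1024;

    /** Number of blocks needed for a file (ceiling division). */
    static long blockCount(long fileLength, long blockSize) {
        return (fileLength + blockSize - 1) / blockSize;
    }

    /** Size of each block; only the last one may be smaller than blockSize. */
    static List<Long> blockSizes(long fileLength, long blockSize) {
        List<Long> sizes = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            sizes.add(Math.min(blockSize, fileLength - start));
        }
        return sizes;
    }

    /** Zero-based index of the block that contains the given byte offset. */
    static long blockIndexFor(long offset, long blockSize) {
        return offset / blockSize;
    }

    public static void main(String[] args) {
        long fileLength = 350 * MB;
        long blockSize = 128 * MB;

        System.out.println("Blocks: " + blockCount(fileLength, blockSize));
        for (long size : blockSizes(fileLength, blockSize)) {
            System.out.println("  " + (size / MB) + "MB");
        }
        System.out.println("Offset 1024000 falls in block index "
            + blockIndexFor(1_024_000L, blockSize));
    }
}
```

Running `main` lists three blocks of 128MB, 128MB, and 94MB, and reports that byte offset 1024000 falls in the first block (index 0). This is the same lookup a client performs before it decides which DataNode to contact.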
### The Read Flow

```
Step 1: Client → NameNode
        "Where are the blocks for /data/logs.txt?"

Step 2: NameNode → Client
        Block 1: [DN1, DN2, DN5]
        Block 2: [DN2, DN3, DN6]
        Block 3: [DN3, DN4, DN1]
        (sorted by proximity to client)

Step 3: Client → DN1 (closest DataNode)
        "Send me Block 1"

Step 4: DN1 → Client
        [Streaming Block 1 data...]

Step 5: Client → DN2 (closest for Block 2)
        "Send me Block 2"

... and so on
```

### Java Client Code

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.IOUtils;

public class HDFSReader {

    public static void main(String[] args) throws Exception {
        // Configuration object loads hdfs-site.xml, core-site.xml
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        // Get FileSystem instance (connects to NameNode)
        FileSystem fs = FileSystem.get(conf);

        // Path to read
        Path filePath = new Path("/user/hadoop/data/access.log");

        // Open input stream
        // Internally: Contacts NameNode for block locations
        FSDataInputStream in = fs.open(filePath);

        try {
            // Read data
            // Internally: Connects to DataNodes sequentially
            IOUtils.copyBytes(in, System.out, 4096, false);

            // Seek to specific position (if needed)
            // Note: seeking past the end of the file throws an exception
            in.seek(1024000); // Jump to byte offset 1024000
            byte[] buffer = new byte[1024];
            in.read(buffer);

        } finally {
            IOUtils.closeStream(in);
        }

        fs.close();
    }
}
```

### Behind the Scenes: DFSInputStream

Here's a simplified view of what happens inside `fs.open()`:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;

// Simplified illustration: getBlockAt, getBlockReader, and the clientNode
// field stand in for client internals that are not shown here.
public class DFSInputStream extends FSInputStream {

    private DFSClient dfsClient;
    private String src;                  // File path
    private LocatedBlocks locatedBlocks; // Block locations from NN
    private long fileLength;
    private long pos = 0;                // Current position in file

    public DFSInputStream(DFSClient dfsClient, String src) {
        this.dfsClient = dfsClient;
        this.src = src;

        // Contact NameNode to get block locations
        openInfo();
    }

    private void openInfo() {
        // RPC call to NameNode. fileLength is not known yet, so ask for
        // the whole range (the real client requests a configurable
        // prefetch window instead).
        LocatedBlocks newInfo =
            dfsClient.namenode.getBlockLocations(src, 0, Long.MAX_VALUE);

        this.locatedBlocks = newInfo;
        this.fileLength = newInfo.getFileLength();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (pos >= fileLength) {
            return -1; // End of file
        }

        // Find which block contains current position
        LocatedBlock targetBlock = getBlockAt(pos);

        // Choose best DataNode (closest to the client)
        DatanodeInfo chosenNode = getBestNode(targetBlock);

        // Connect to DataNode and read
        BlockReader reader = getBlockReader(chosenNode, targetBlock);
        int bytesRead = reader.read(buf, off, len);

        if (bytesRead > 0) {
            pos += bytesRead;
        }
        return bytesRead;
    }

    private DatanodeInfo getBestNode(LocatedBlock block) {
        DatanodeInfo[] nodes = block.getLocations();

        // Sort by network distance (local node preferred)
        Arrays.sort(nodes, new Comparator<DatanodeInfo>() {
            public int compare(DatanodeInfo a, DatanodeInfo b) {
                int distA = networkDistance(clientNode, a);
                int distB = networkDistance(clientNode, b);
                return Integer.compare(distA, distB);
            }
        });

        return nodes[0]; // Closest node
    }

    private int networkDistance(Node a, Node b) {
        // Same node: distance 0
        // Same rack: distance 2
        // Different rack: distance 4
        String pathA = a.getNetworkLocation();
        String pathB = b.getNetworkLocation();

        if (a.equals(b)) return 0;
        if (pathA.equals(pathB)) return 2;
        return 4;
    }
}
```

**Key Optimizations**:

* **Data locality**: Reads from the closest replica
* **Streaming**: Reads one block at a time (doesn't load the entire file)
* **Retry logic**: Switches to another replica on failure
* **Checksum verification**: Validates data integrity

***

## HDFS Write Operation

Writing is more complex than reading because every block passes through a replication pipeline.
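As a rough mental model of the pipeline, the toy simulation below pushes one packet through a chain of node names and records acknowledgements on the way back. Everything in it (`PipelineSketch`, `sendPacket`, the `store@`/`ack@` log entries) is hypothetical. Real HDFS streams many packets asynchronously over the network, so treat this as a sequential sketch of the ordering only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical in-memory model of a replication pipeline; not Hadoop code. */
class PipelineSketch {

    /**
     * Stores the packet on the node at the given index, forwards it to the
     * next node if there is one, then records this node's ACK. ACKs therefore
     * appear in reverse pipeline order.
     */
    static void sendPacket(List<String> pipeline, int index, List<String> log) {
        String node = pipeline.get(index);
        log.add("store@" + node);
        if (index + 1 < pipeline.size()) {
            sendPacket(pipeline, index + 1, log);
        }
        log.add("ack@" + node);
    }

    /** Returns the ordered event log for writing one packet. */
    static List<String> write(List<String> pipeline) {
        List<String> log = new ArrayList<>();
        if (!pipeline.isEmpty()) {
            sendPacket(pipeline, 0, log);
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints: [store@DN1, store@DN2, store@DN5, ack@DN5, ack@DN2, ack@DN1]
        System.out.println(write(Arrays.asList("DN1", "DN2", "DN5")));
    }
}
```

The point of the model is the ordering: data flows forward from the first DataNode to the last, and the client only sees success after the acknowledgement has travelled back through every node.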
### The Write Flow

```
Step 1: Client → NameNode
        "I want to create /data/new.log"

Step 2: NameNode checks permissions, quota
        Creates file entry in namespace
        Returns: "Proceed"

Step 3: Client buffers first 64KB of data

Step 4: Client → NameNode
        "Allocate a block for /data/new.log"

Step 5: NameNode → Client
        "Block ID 1234, write to [DN1, DN2, DN5]"

Step 6: Client → DN1
        "Store block 1234, forward to DN2"
        [Streams data to DN1]

Step 7: DN1 → DN2
        "Store block 1234, forward to DN5"
        [Streams data to DN2]

Step 8: DN2 → DN5
        "Store block 1234"
        [Streams data to DN5]

Step 9: DN5 → DN2 → DN1 → Client
        "ACK: Block written successfully"

Repeat Steps 4-9 for each block
```

### Write Pipeline Architecture

```
Client
  │  ▲
  │  │   (1) Write packet ↓   (6) ACK ↑
  ▼  │
┌────────────┐
│  DataNode  │───(2) Forward packet────┐
│     1      │                         │
│            │◄──(5) ACK──────────┐    │
└────────────┘                    │    │
                                  │    ▼
                            ┌────────────┐
                            │  DataNode  │──(3) Forward──┐
                            │     2      │               │
                            │            │◄──(4) ACK──┐  │
                            └────────────┘            │  │
                                                      │  ▼
                                                ┌────────────┐
                                                │  DataNode  │
                                                │     3      │
                                                │            │
                                                └────────────┘

Pipeline complete when the ACK from DataNode 1 reaches the client
```

### Java Write Code

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataOutputStream;

public class HDFSWriter {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path filePath = new Path("/user/hadoop/output/results.txt");

        // Create file (overwrites if exists)
        FSDataOutputStream out = fs.create(filePath);

        // Or append to existing file
        // FSDataOutputStream out = fs.append(filePath);

        try {
            String data = "Processing complete\n";
            out.write(data.getBytes(StandardCharsets.UTF_8));

            // Flush the client-side buffer
            out.flush();

            // Optional: force sync to DataNodes.
            // hflush() makes data visible to new readers;
            // hsync() additionally asks DataNodes to persist it to disk.
            out.hsync();

        } finally {
            out.close(); // Flushes remaining data and completes the file
        }

        fs.close();
    }
}
```

### Advanced: Custom Replication Factor

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataOutputStream;

public class CustomReplication {

    public static void writeWithReplication(
            String hdfsPath,
            byte[] data,
            short replicationFactor) throws Exception {

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(hdfsPath);

        // Specify replication factor at creation
        FSDataOutputStream out = fs.create(
            path,
            true,               // overwrite
            4096,               // buffer size
            replicationFactor,  // custom replication
            134217728           // block size (128MB)
        );

        out.write(data);
        out.close();
        fs.close();
    }

    public static void changeReplication(
            String hdfsPath,
            short newReplication) throws Exception {

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(hdfsPath);

        // Change replication factor of existing file
        boolean success = fs.setReplication(path, newReplication);

        if (success) {
            System.out.println("Replication changed to " + newReplication);
        }

        fs.close();
    }
}
```

***

## Fault Tolerance Mechanisms

### DataNode Failure

**Detection**:

```
NameNode expects heartbeat every 3 seconds

If no heartbeat for about 10 minutes
(default: 2 × 5-minute recheck interval + 10 × 3-second heartbeat = 10.5 minutes):
  → Mark DataNode as dead
  → Identify under-replicated blocks
  → Schedule re-replication
```

**Recovery Process**:

```java
import java.util.List;

// Simplified illustration of the NameNode's re-replication logic;
// helper methods and the replicationQueue field are not shown.
public class NameNodeReplication {

    private void handleDeadDataNode(DatanodeDescriptor deadNode) {
        // Get all blocks stored on dead node
        List<Block> affectedBlocks = deadNode.getBlockList();

        for (Block block : affectedBlocks) {
            // Check current replication level
            int currentReplicas = countReplicas(block);
            int targetReplicas = block.getReplication();

            if (currentReplicas < targetReplicas) {
                // Under-replicated! Schedule re-replication
                addToReplicationQueue(block,
                    targetReplicas - currentReplicas);
            }
        }

        // Remove dead node from cluster
        removeDataNode(deadNode);
    }

    private void processReplicationQueue() {
        while (!replicationQueue.isEmpty()) {
            Block block = replicationQueue.poll();

            // Find nodes with existing replicas
            List<DatanodeDescriptor> sources = getDataNodesForBlock(block);
            if (sources.isEmpty()) {
                continue; // No live replica left to copy from
            }

            // Choose target for new replica
            DatanodeDescriptor target = chooseTargetDataNode(sources);

            // Command source to replicate to target
            sources.get(0).addBlockToReplicate(block, target);
        }
    }
}
```

### NameNode Failure (High Availability)

**Problem**: The NameNode is a single point of failure

**Solution**: HA NameNode with ZooKeeper

```
┌─────────────────────────────────────────────┐
│              ZooKeeper Cluster              │
│      (Coordination & Leader Election)       │
└──────────────┬──────────────────────────────┘
               │
       ┌───────┴────────┐
       │                │
       ▼                ▼
┌──────────────┐ ┌──────────────┐
│    Active    │ │   Standby    │
│   NameNode   │ │   NameNode   │
│              │ │              │
│   (Serves    │ │ (Tails edit  │
│   requests)  │ │  log, ready  │
│              │ │ to takeover) │
└──────┬───────┘ └──────┬───────┘
       │                │
       │                │
       ▼                ▼
┌──────────────────────────────┐
│       Shared Edit Log        │
│   (on JournalNodes or NFS)   │
└──────────────────────────────┘
```

**Automatic Failover**:

1. Active NameNode crashes
2. ZooKeeper detects failure
3. Standby NameNode promoted to Active
4. Clients automatically redirect to new Active

***

## HDFS Configuration

### Essential Configuration Files

**core-site.xml** (Cluster-wide settings):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>
```

**hdfs-site.xml** (HDFS-specific):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///disk1/hadoop,file:///disk2/hadoop</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>true</value>
  </property>
</configuration>
```

***

## Hands-on Labs

### Lab 1: HDFS CLI Operations

```bash
# Create directory
hdfs dfs -mkdir -p /user/hadoop/input

# Upload file
hdfs dfs -put localfile.txt /user/hadoop/input/

# List files with block info
hdfs fsck /user/hadoop/input/localfile.txt -files -blocks -locations

# Check file stats (%o = block size, %r = replication, %n = name)
hdfs dfs -stat "%o %r %n" /user/hadoop/input/localfile.txt
# Output: blocksize replication filename

# Change replication
hdfs dfs -setrep -w 5 /user/hadoop/input/localfile.txt

# View file content
hdfs dfs -cat /user/hadoop/input/localfile.txt

# Get file to local
hdfs dfs -get /user/hadoop/input/localfile.txt ./downloaded.txt

# Check cluster health
hdfs dfsadmin -report

# Balance cluster
hdfs balancer -threshold 10
```

### Lab 2: Programmatic File Operations

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

public class HDFSOperations {

    private FileSystem fs;

    public HDFSOperations() throws Exception {
        Configuration conf = new Configuration();
        this.fs = FileSystem.get(conf);
    }

    // Create directory
    public void createDirectory(String path) throws Exception {
        Path dirPath = new Path(path);
        fs.mkdirs(dirPath);
        System.out.println("Created: " + path);
    }

    // List directory contents
    public void listDirectory(String path) throws Exception {
        Path dirPath = new Path(path);
        FileStatus[] statuses = fs.listStatus(dirPath);

        for (FileStatus status : statuses) {
            System.out.printf("%s %10d %s%n",
                status.isDirectory() ? "DIR " : "FILE",
                status.getLen(),
                status.getPath().getName()
            );
        }
    }

    // Get block locations for a file
    public void showBlockLocations(String path) throws Exception {
        Path filePath = new Path(path);
        FileStatus status = fs.getFileStatus(filePath);

        BlockLocation[] blocks = fs.getFileBlockLocations(
            status, 0, status.getLen());

        for (int i = 0; i < blocks.length; i++) {
            BlockLocation block = blocks[i];
            System.out.println("Block " + i + ":");
            System.out.println("  Hosts: " +
                String.join(", ", block.getHosts()));
            System.out.println("  Offset: " + block.getOffset());
            System.out.println("  Length: " + block.getLength());
        }
    }

    // Delete file/directory
    public void delete(String path, boolean recursive) throws Exception {
        Path targetPath = new Path(path);
        boolean success = fs.delete(targetPath, recursive);
        System.out.println("Deleted: " + success);
    }

    // Copy file within HDFS
    public void copyFile(String src, String dst) throws Exception {
        Path srcPath = new Path(src);
        Path dstPath = new Path(dst);

        FileUtil.copy(fs, srcPath, fs, dstPath, false, fs.getConf());
    }

    public void close() throws Exception {
        fs.close();
    }
}
```

### Lab 3: Monitoring HDFS Health

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class HDFSMonitor {

    public static void clusterHealth() throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(conf);

        // Get cluster statistics
        FsStatus status = dfs.getStatus();
        long capacity = status.getCapacity();
        long used = status.getUsed();
        long remaining = status.getRemaining();

        System.out.println("HDFS Cluster Status:");
        System.out.println("  Capacity: " + formatSize(capacity));
        System.out.println("  Used: " + formatSize(used) +
            " (" + (used * 100 / capacity) + "%)");
        System.out.println("  Remaining: " + formatSize(remaining));

        // Get DataNode information
        DatanodeInfo[] datanodes = dfs.getDataNodeStats();
        System.out.println("\nDataNodes: " + datanodes.length);

        for (DatanodeInfo node : datanodes) {
            System.out.println("  " + node.getHostName());
            System.out.println("    Capacity: " +
                formatSize(node.getCapacity()));
            System.out.println("    Used: " +
                formatSize(node.getDfsUsed()));
            System.out.println("    Remaining: " +
                formatSize(node.getRemaining()));
            System.out.println("    State: " + node.getAdminState());
        }

        dfs.close();
    }

    private static String formatSize(long bytes) {
        long tb = 1024L * 1024 * 1024 * 1024;
        long gb = 1024L * 1024 * 1024;
        long mb = 1024L * 1024;

        if (bytes >= tb) return (bytes / tb) + " TB";
        if (bytes >= gb) return (bytes / gb) + " GB";
        if (bytes >= mb) return (bytes / mb) + " MB";
        return bytes + " B";
    }
}
```

***

## Performance Tuning

### Optimizing Block Size

**Rule of Thumb**: Choose a block size large enough to keep NameNode metadata small, yet small enough to leave enough blocks for parallel map tasks

```
Scenario: Processing 1TB of log files

Option 1: 64MB blocks
  → 16,384 blocks
  → More metadata
  → More map tasks

Option 2: 128MB blocks (default)
  → 8,192 blocks
  → Balanced

Option 3: 256MB blocks
  → 4,096 blocks
  → Less metadata
  → Fewer map tasks (less parallelism)

Choice: 128MB for balanced parallelism
```

### Short-Circuit Reads

When the client and DataNode are on the same machine, the client can read the block file directly and skip the network path:

```xml
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```

**Performance Impact**: 30-50% faster local reads

***

## Common Issues & Troubleshooting

### NameNode Out of Memory

**Symptoms**: NameNode crashes, `OutOfMemoryError` in logs

**Cause**: Too many files/blocks for the allocated heap

**Solutions**:

1. Increase NameNode heap: `-Xmx16g` → `-Xmx32g`
2. Enable HDFS Federation (multiple NameNodes)
3. Merge small files into larger ones
4. Clean up old/unused data

**Prevention**: Monitor namespace size, set quotas

### Under-Replicated Blocks

**Symptoms**: `hdfs fsck` shows under-replicated blocks

**Causes**:

* DataNode failures
* Network issues
* Disk space exhaustion

**Solutions**:

```bash
# Check under-replicated blocks
hdfs fsck / | grep "Under-replicated"

# Identify problematic DataNodes
hdfs dfsadmin -report

# Dump NameNode state (including blocks waiting for replication) to a
# file in the NameNode log directory; this inspects, it does not trigger
hdfs dfsadmin -metasave meta.txt

# If urgent, increase replication temporarily
hdfs dfs -setrep -R 4 /critical/data
```

### DataNode Fails to Start

**Check logs**: `/var/log/hadoop/hadoop-hdfs-datanode-*.log`

**Common causes**:

1. **Incompatible cluster ID**:

   ```
   Error: Incompatible clusterIDs
   Solution: Delete DataNode data, rejoin cluster
   ```

2. **Port already in use**:

   ```
   Error: Address already in use: 50010
   Solution: Kill process or change port in hdfs-site.xml
   ```

3. **Permission issues**:

   ```
   Error: Permission denied on /var/hadoop/datanode
   Solution: chown -R hdfs:hadoop /var/hadoop/datanode
   ```

***

## Interview Focus

**Key Concepts to Master**:

* Explain NameNode vs DataNode responsibilities
* Describe the block replication algorithm and rack awareness
* Walk through read/write data flows
* Discuss single-NameNode limitations and HA solutions
* Compare HDFS to other storage (S3, traditional file systems)

**Sample Questions**:

1. **"Why can't HDFS handle lots of small files efficiently?"**
   * Answer: Each file and each block is a metadata entry in NameNode RAM
   * 1 million 1KB files = 1 million blocks vs 8 files of 128MB = 8 blocks, for roughly the same 1GB of data

2. **"How does HDFS ensure data locality for MapReduce?"**
   * Answer: Block locations are fetched from the NameNode when input splits are computed, and the scheduler (the JobTracker in MRv1, YARN in MRv2) tries to run each map task on a DataNode that stores the block

3. **"What happens if a client crashes during a write?"**
   * Answer: The file remains in an incomplete state, the client's lease eventually expires, and the NameNode recovers the lease and closes the file

***

## What's Next?

You now understand the HDFS storage layer. Next, learn how to process this data with MapReduce!
Master distributed data processing with MapReduce patterns and hands-on coding