Production Deployment & Operations
Module Duration: 3-4 hours
Focus: Real-world deployment, security, monitoring, troubleshooting
Outcome: Production-ready operational knowledge
Cluster Planning
Hardware Selection
NameNode:
CPU: 16-24 cores (high clock speed)
RAM: 128-256GB (1GB per million blocks)
Disk: RAID-10 SSDs for metadata
Network: 10GbE bonded
Example:
100M files × 2 blocks per file (avg) = 200M blocks
RAM needed: 200GB
Choose: 256GB server
DataNode:
CPU: 8-16 cores
RAM: 64-128GB
Disk: 12-24 × 4-8TB SATA (JBOD, not RAID)
Network: 10GbE
Calculation:
24 disks × 6TB = 144TB raw per node
With replication factor 3: 48TB usable per node
Edge Node (client gateway):
CPU: 8 cores
RAM: 32-64GB
Disk: 500GB SSD
Network: 10GbE
Cluster Sizing
Target capacity: 1PB usable
Calculation:
1PB usable × 3 (replication) = 3PB raw
1PB usable / 48TB usable per node ≈ 21 DataNodes
Add ~10% overhead and round up → 24 DataNodes
Total cluster:
1 Active NameNode
1 Standby NameNode
24 DataNodes
3 ZooKeeper nodes
2-3 Edge nodes
= 31-32 nodes
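The same sizing arithmetic as a quick one-liner, so it can be rerun with other disk layouts or capacity targets (the values match the example above):
# usable target (TB) x replication / raw TB per node, plus ~10% headroom
awk 'BEGIN { target=1000; repl=3; raw=144; n=target*repl/raw; printf "%.1f DataNodes, about %.0f with 10%% headroom\n", n, n*1.1 }'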
Network Topology
┌──────────────────────────────────────────────────┐
│ Core Switch (40GbE) │
└────────┬─────────────────────┬───────────────────┘
│ │
┌────▼────┐ ┌─────▼────┐
│ Rack 1 │ │ Rack 2 │
│ Switch │ │ Switch │
│ (10GbE) │ │ (10GbE) │
└────┬────┘ └─────┬────┘
│ │
┌────┼──────┬───────┐ │
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
NN1 NN2 DN1-15 DN16-30 ...
Rack Awareness Configuration:
#!/bin/bash
# /etc/hadoop/topology.sh
# Map a DataNode IP address to its rack location
case $1 in
  10.1.1.*) echo "/dc1/rack1" ;;
  10.1.2.*) echo "/dc1/rack2" ;;
  10.1.3.*) echo "/dc1/rack3" ;;
  *)        echo "/default-rack" ;;
esac
<!-- core-site.xml -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>
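The script must be executable by the user running the NameNode, and the mapping only takes effect after a NameNode restart. It is easy to verify by hand and then from HDFS itself:
# Test the script directly with a sample DataNode IP
chmod +x /etc/hadoop/topology.sh
/etc/hadoop/topology.sh 10.1.1.23    # expected output: /dc1/rack1
# After restarting the NameNode, confirm nodes land in the right racks
hdfs dfsadmin -printTopology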
Security
Kerberos Authentication
1. Install Kerberos:
# On KDC server
yum install krb5-server krb5-libs krb5-workstation
# Edit /etc/krb5.conf
[libdefaults]
default_realm = HADOOP.EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
[realms]
HADOOP.EXAMPLE.COM = {
kdc = kdc.example.com
admin_server = kdc.example.com
}
# Initialize database
kdb5_util create -s
# Start KDC
systemctl start krb5kdc
systemctl start kadmin
2. Create Principals:
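A minimal sketch of creating the service principals and keytabs with kadmin.local, assuming a single NameNode host (namenode1.example.com is a placeholder) and the keytab paths used in the configuration below; the DataNode principal is repeated per host:
# Create per-host service principals (hostnames are examples)
kadmin.local -q "addprinc -randkey nn/namenode1.example.com@HADOOP.EXAMPLE.COM"
kadmin.local -q "addprinc -randkey HTTP/namenode1.example.com@HADOOP.EXAMPLE.COM"
kadmin.local -q "addprinc -randkey dn/datanode1.example.com@HADOOP.EXAMPLE.COM"
# Export keytabs and copy each to its host with restrictive permissions
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/namenode1.example.com HTTP/namenode1.example.com"
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/datanode1.example.com"
chmod 400 /etc/security/keytabs/*.keytab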
3. Configure Hadoop:
<!-- core-site.xml -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@HADOOP.EXAMPLE.COM</value>
</property>
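Once the daemons are restarted with these settings, it is worth confirming that Kerberos is actually enforced. A quick sanity check from an edge node, assuming a test user principal (alice) exists:
# Without a ticket, access should fail with an authentication error
kdestroy
hdfs dfs -ls /
# With a valid ticket, the same command should succeed
kinit alice@HADOOP.EXAMPLE.COM
hdfs dfs -ls /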
Encryption
Data at Rest (HDFS Transparent Encryption):
# Create encryption zone
hadoop key create mykey
hdfs crypto -createZone -keyName mykey -path /encrypted
# Files in /encrypted automatically encrypted
hdfs dfs -put sensitive.txt /encrypted/
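Note that hadoop key create needs a key provider, typically Hadoop KMS configured via hadoop.security.key.provider.path in core-site.xml. A quick check that the key and zone are in place, assuming KMS is already running:
# List keys known to the configured provider and the encryption zones that use them
hadoop key list
hdfs crypto -listZones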
Data in Transit:
<!-- hdfs-site.xml -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>3des</value>
</property>
Authorization (Apache Ranger)
Ranger Policy Example:
┌─────────────────────────────────────────┐
│ Policy: Production Data Access │
├─────────────────────────────────────────┤
│ Resource: │
│ Path: /prod/* │
│ │
│ Permissions: │
│ Group: data-engineers │
│ - Read, Write, Execute │
│ │
│ Group: analysts │
│ - Read only │
│ │
│ User: admin │
│ - All permissions │
└─────────────────────────────────────────┘
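Ranger policies are created in the Ranger Admin UI and enforced by the HDFS plugin. For clusters without Ranger, roughly the same access matrix can be approximated with native HDFS ACLs; a sketch using the groups from the policy above (group names are examples, and ACLs must be enabled via dfs.namenode.acls.enabled):
# Full access for data engineers, read-only for analysts, under /prod
hdfs dfs -setfacl -R -m group:data-engineers:rwx /prod
hdfs dfs -setfacl -R -m group:analysts:r-x /prod
# Verify the resulting ACLs
hdfs dfs -getfacl /prod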
High Availability (HA)
NameNode HA with QJM
Architecture:
┌──────────┐         ┌──────────┐
│  Active  │◄───────►│ Standby  │
│ NameNode │         │ NameNode │
└────┬─────┘         └────┬─────┘
     │                    │
     │     Edit logs      │
     ▼                    ▼
┌────────────────────────────────┐
│       JournalNode Cluster      │
│     (Quorum: 3 or 5 nodes)     │
└────────────────────────────────┘
                ▲
                │
          ┌─────┴─────┐
          │ ZooKeeper │
          │ (Failover │
          │   Coord)  │
          └───────────┘
Configuration:
<!-- hdfs-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2:8020</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- Fencing must be configured even with QJM; shell(/bin/true) is the minimal no-op choice -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

<!-- core-site.xml (ZooKeeper hosts below are examples) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
Initialize and Start:
# Format ZooKeeper for failover
hdfs zkfc -formatZK
# Start JournalNodes
hadoop-daemon.sh start journalnode
# Format NameNode 1
hdfs namenode -format
# Start NameNode 1
hadoop-daemon.sh start namenode
# Bootstrap standby NameNode 2
hdfs namenode -bootstrapStandby
# Start NameNode 2
hadoop-daemon.sh start namenode
# Start a ZKFailoverController on each NameNode host
hadoop-daemon.sh start zkfc
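With both NameNodes and their ZKFCs running, confirm the HA state before putting the cluster into service; a quick check using the nn1/nn2 service IDs from the configuration above:
# One NameNode should report active, the other standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Optionally exercise a controlled failover to prove end-to-end failover works
hdfs haadmin -failover nn1 nn2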
Monitoring
Key Metrics to Monitor
NameNode Metrics:
- Heap memory usage (alert at 80%)
- GC time (should be < 1% of total time)
- RPC queue time
- Number of under-replicated blocks
- Number of missing blocks
- Number of dead DataNodes
DataNode Metrics:
- Disk usage (alert at 85%)
- Disk I/O wait time
- Network throughput
- Number of failed volumes
- Heartbeat latency to NameNode
YARN Metrics:
- Cluster memory utilization
- Cluster CPU utilization
- Queue capacity usage
- Number of pending applications
- Application success/failure rate
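Most of these metrics can be spot-checked without extra tooling, since every daemon exposes them over JMX and the ResourceManager has a REST API. For example (hostnames are placeholders; ports are the Hadoop 3 defaults):
# FSNamesystem metrics: under-replicated/missing blocks, live and dead DataNodes
curl -s 'http://namenode1:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'
# Cluster-wide memory, vcores, and application counts from the ResourceManager
curl -s 'http://resourcemanager:8088/ws/v1/cluster/metrics'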
Prometheus + Grafana Setup
1. Export Hadoop Metrics:
# hadoop-metrics2.properties
*.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
*.sink.prometheus.port=9095
namenode.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
namenode.sink.prometheus.port=9096
datanode.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
datanode.sink.prometheus.port=9097
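A bundled Prometheus sink is not present in every Hadoop release, so treat the sink class above as distribution-specific. On Hadoop 3.3+ an alternative is the built-in Prometheus endpoint (hadoop.prometheus.endpoint.enabled=true in core-site.xml), which serves metrics in Prometheus text format from each daemon's web UI:
# With hadoop.prometheus.endpoint.enabled=true (Hadoop 3.3+), daemons expose /prom
curl -s http://namenode1:9870/prom | head
curl -s http://datanode1:9864/prom | head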
2. Prometheus Configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'hadoop-namenode'
    static_configs:
      - targets: ['namenode:9096']
  - job_name: 'hadoop-datanodes'
    static_configs:
      - targets:
          - 'datanode1:9097'
          - 'datanode2:9097'
          - 'datanode3:9097'
  - job_name: 'yarn-resourcemanager'
    static_configs:
      - targets: ['resourcemanager:9098']
3. Grafana Dashboard:
{
  "title": "HDFS Cluster Health",
  "panels": [
    {
      "title": "NameNode Heap Usage",
      "targets": [
        { "expr": "jvm_memory_used_bytes{job='hadoop-namenode'}" }
      ]
    },
    {
      "title": "Under-Replicated Blocks",
      "targets": [
        { "expr": "hadoop_namenode_under_replicated_blocks" }
      ]
    }
  ]
}
Alerting Rules
# prometheus-alerts.yml
groups:
  - name: hadoop_alerts
    rules:
      - alert: NameNodeHeapHigh
        expr: jvm_memory_used_bytes{job="hadoop-namenode"} / jvm_memory_max_bytes > 0.8
        for: 5m
        annotations:
          summary: "NameNode heap usage high"
      - alert: UnderReplicatedBlocks
        expr: hadoop_namenode_under_replicated_blocks > 1000
        for: 10m
        annotations:
          summary: "High number of under-replicated blocks"
      - alert: DataNodeDown
        expr: up{job="hadoop-datanodes"} == 0
        for: 2m
        annotations:
          summary: "DataNode {{ $labels.instance }} is down"
Backup and Disaster Recovery
NameNode Metadata Backup
# Automatic metadata checkpointing is handled by the Secondary NameNode
# (or by the Standby NameNode in an HA deployment)

# Manual backup: force a checkpoint, then copy the metadata off-cluster
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
scp -r /var/hadoop/namenode/current backup-server:/backups/nn-$(date +%Y%m%d)/

# Restore: copy the backup back onto the NameNode, then import the checkpoint
# (-importCheckpoint loads the image from the directory set as dfs.namenode.checkpoint.dir)
scp -r backup-server:/backups/nn-20240115/ /var/hadoop/namenode/current
hdfs namenode -importCheckpoint
Data Replication Across Clusters
DistCp (Distributed Copy):
# Copy data to disaster recovery cluster
hadoop distcp \
-update \
-delete \
-p \
hdfs://prod-cluster/data \
hdfs://dr-cluster/data
# Incremental copy (only changed files)
hadoop distcp \
-update \
-skipcrccheck \
hdfs://prod-cluster/data \
hdfs://dr-cluster/data
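If both clusters support HDFS snapshots, incremental copies can also be driven by snapshot diffs instead of modification times, which scales better for large directory trees. A sketch, assuming snapshots s1 (already copied) and s2 (current) exist on the source path and the target has not changed since s1:
# One-time setup on the source directory, then one snapshot per backup run
hdfs dfsadmin -allowSnapshot /data
hdfs dfs -createSnapshot /data s2
# Copy only the changes between the two snapshots (requires -update)
hadoop distcp -update -diff s1 s2 hdfs://prod-cluster/data hdfs://dr-cluster/data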
Automated Daily Backup:
#!/bin/bash
# backup-hdfs.sh
DATE=$(date +%Y%m%d)
LOG=/var/log/hadoop-backup-${DATE}.log

hadoop distcp \
  -update \
  -delete \
  -log /tmp/distcp-${DATE} \
  hdfs://prod/data \
  hdfs://backup/data/${DATE} 2>&1 | tee "$LOG"

# Check distcp's exit status rather than tee's
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
  echo "Backup completed successfully"
else
  echo "Backup failed" | mail -s "HDFS Backup Failed" [email protected]
fi
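The script can then be scheduled from an edge node, for example with cron (the install path and schedule below are illustrative):
# Crontab entry: run the backup every night at 01:30
30 1 * * * /opt/hadoop-scripts/backup-hdfs.sh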
Troubleshooting Common Issues
Issue 1: Slow MapReduce Jobs
Diagnosis:
# Check job counters
hadoop job -counter job_123456 \
org.apache.hadoop.mapreduce.TaskCounter \
SPILLED_RECORDS
# High spills → Increase sort buffer
# Long shuffle time → Data skew or network issues
Solutions:
<!-- mapred-site.xml -->
<!-- Increase sort buffer -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
</property>
<!-- Increase reduce shuffle parallelism -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>20</value>
</property>
Issue 2: NameNode Out of Memory
Diagnosis:
# Check heap usage
jmap -heap <namenode-pid>
# Count files and blocks (fsck prints the totals in its summary)
hdfs fsck / | grep -iE 'total (files|blocks)'
Solutions:
Increase heap (set in hadoop-env.sh):
export HADOOP_NAMENODE_OPTS="-Xmx32g"
Enable HDFS Federation (multiple NameNodes)
Consolidate small files, e.g. pack cold directories into a Hadoop Archive:
hadoop archive -archiveName old-data.har -p /path /archive
Issue 3: DataNode Disk Failure
Detection:
2024-01-15 10:30:15 ERROR datanode.DataNode:
Failed to read block blk_123456 from disk /mnt/disk3:
Input/output error
Recovery:
# Remove the failed disk from dfs.datanode.data.dir in hdfs-site.xml
# (or raise dfs.datanode.failed.volumes.tolerated so the DataNode keeps running without it)
# Restart the DataNode so it comes back up without the bad volume
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode
# The NameNode re-replicates the lost blocks from the remaining replicas
# Monitor: hdfs fsck / -blocks
Capacity Planning
Growth Projection
# Calculate growth trajectory
current_size_tb = 500        # TB
monthly_growth_rate = 0.15   # 15% per month

def project_capacity(months):
    size = current_size_tb
    for _ in range(months):
        size *= (1 + monthly_growth_rate)
    return size

# Projection for 12 months
print(f"12 months: {project_capacity(12):.2f} TB")
# Output: 12 months: 2675.13 TB
# When to add nodes?
# Current capacity: 24 nodes × 48TB = 1,152 TB usable
# At 15% monthly growth from 500 TB, that capacity is reached in roughly 6 months,
# so plan hardware additions at least a quarter in advance
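The same assumptions can be turned into a rough "months until full" estimate; a throwaway one-liner (the numbers mirror the example above and should be adjusted for your own cluster):
# Compound 15% monthly growth from 500 TB until the ~1,152 TB usable capacity is exceeded
awk 'BEGIN { size=500; cap=1152; m=0; while (size < cap) { size*=1.15; m++ } printf "capacity reached in %d months (projected %.0f TB)\n", m, size }'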
Cost Optimization
Data Lifecycle Management:
# Move old data to erasure coding (saves space)
hdfs ec -setPolicy -path /archive -policy RS-6-3-1024k
# Erasure coding: 6 data + 3 parity = 1.5x overhead
# vs. Replication: 3x overhead
# Savings: 50% storage for archive data
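Erasure coding policies must be enabled before use, and they only apply to data written after the policy is set, so existing replicated files need to be re-copied into the zone. A quick way to verify what is enabled and applied:
# Show available policies, enable RS-6-3-1024k if needed, and check the target path
hdfs ec -listPolicies
hdfs ec -enablePolicy -policy RS-6-3-1024k
hdfs ec -getPolicy -path /archive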
Upgrading Hadoop
Rolling Upgrade Process
# 1. Prepare for upgrade (Hadoop 3.0+)
hdfs dfsadmin -rollingUpgrade prepare
# 2. Upgrade DataNodes one by one
ssh datanode1
hadoop-daemon.sh stop datanode
# Install new version
hadoop-daemon.sh start datanode
# Repeat for each DataNode with delay
# Wait for re-replication before next node
# 3. Upgrade NameNode standby
hadoop-daemon.sh stop namenode
# Install new version
hadoop-daemon.sh start namenode
# 4. Fail over to upgraded NameNode
hdfs haadmin -failover nn1 nn2
# 5. Upgrade former active NameNode
# (Now standby)
# 6. Finalize upgrade
hdfs dfsadmin -rollingUpgrade finalize
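Between steps it helps to confirm progress before moving on; two checks that are useful throughout the rollout:
# Show the current rolling upgrade status (start time and finalize state)
hdfs dfsadmin -rollingUpgrade query
# After each DataNode restart, confirm it re-registered with the NameNode
hdfs dfsadmin -report | grep -i datanodes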
Best Practices Checklist
Security
- Kerberos authentication enabled for all daemons and clients
- Encryption configured at rest (encryption zones) and in transit
- Authorization policies defined (Ranger or HDFS ACLs) and reviewed regularly
Availability
- NameNode HA with QJM and automatic failover configured and tested
- Rack awareness script deployed and verified
- NameNode metadata backups and DistCp replication to a DR cluster scheduled
Operations
- Metrics exported to Prometheus with Grafana dashboards and alerting rules
- Capacity projections reviewed regularly; erasure coding applied to archive data
- Rolling upgrade procedure documented and rehearsed
What’s Next?
Capstone Project: Building a Complete Data Pipeline. Apply everything you’ve learned in a comprehensive, production-ready project.