
Production Deployment & Operations

Module Duration: 3-4 hours
Focus: Real-world deployment, security, monitoring, troubleshooting
Outcome: Production-ready operational knowledge

Cluster Planning

Hardware Selection

NameNode:
CPU: 16-24 cores (high clock speed)
RAM: 128-256GB (1GB per million blocks)
Disk: RAID-10 SSDs for metadata
Network: 10GbE bonded

Example:
  100M files, avg 2 blocks = 200M blocks
  RAM needed: 200GB
  Choose: 256GB server
DataNode:
CPU: 8-16 cores
RAM: 64-128GB
Disk: 12-24 × 4-8TB SATA (JBOD, not RAID)
Network: 10GbE

Calculation:
  24 disks × 6TB = 144TB raw per node
  With replication factor 3: 48TB usable per node
Edge Node (client gateway):
CPU: 8 cores
RAM: 32-64GB
Disk: 500GB SSD
Network: 10GbE

Cluster Sizing

Target capacity: 1PB usable

Calculation:
  1PB usable × 3 (replication) = 3PB raw
  3PB raw / 144TB raw per node ≈ 21 DataNodes
  (equivalently: 1PB usable / 48TB usable per node)
  Add ~10% overhead → 23 DataNodes

Total cluster:
  1 Active NameNode
  1 Standby NameNode
  23 DataNodes
  3 ZooKeeper nodes
  2-3 Edge nodes
  = 30-31 nodes
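The same arithmetic as a quick shell sketch (the disk count, disk size, and capacity target are the assumptions used above; adjust them for your own hardware):

# Back-of-the-envelope DataNode count (integer ceiling division)
TARGET_USABLE_TB=1000   # 1 PB usable
REPLICATION=3
RAW_TB_PER_NODE=144     # 24 disks x 6 TB
RAW_TB_NEEDED=$(( TARGET_USABLE_TB * REPLICATION ))
NODES=$(( (RAW_TB_NEEDED + RAW_TB_PER_NODE - 1) / RAW_TB_PER_NODE ))
echo "DataNodes before overhead: $NODES"              # 21
echo "With ~10% headroom: $(( NODES * 110 / 100 ))"   # 23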

Network Topology

┌──────────────────────────────────────────────────┐
│              Core Switch (40GbE)                 │
└────────┬─────────────────────┬───────────────────┘
         │                     │
    ┌────▼────┐          ┌─────▼────┐
    │  Rack 1 │          │  Rack 2  │
    │ Switch  │          │  Switch  │
    │ (10GbE) │          │  (10GbE) │
    └────┬────┘          └─────┬────┘
         │                     │
    ┌────┼──────┬───────┐     │
    │    │      │       │     │
    ▼    ▼      ▼       ▼     ▼
   NN1  NN2   DN1-15   DN16-30 ...
Rack Awareness Configuration:
#!/bin/bash
# /etc/hadoop/topology.sh (must be executable: chmod +x)
# Map an IP address to its rack location

case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    10.1.3.*) echo "/dc1/rack3" ;;
    *) echo "/default-rack" ;;
esac
<!-- core-site.xml -->
<property>
    <name>net.topology.script.file.name</name>
    <value>/etc/hadoop/topology.sh</value>
</property>
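After distributing the script and restarting the NameNode, confirm DataNodes were mapped to the expected racks:

# Print the rack assignment of every registered DataNode
hdfs dfsadmin -printTopology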

Security

Kerberos Authentication

1. Install Kerberos:
# On KDC server
yum install krb5-server krb5-libs krb5-workstation

# Edit /etc/krb5.conf
[libdefaults]
    default_realm = HADOOP.EXAMPLE.COM
    dns_lookup_realm = false
    dns_lookup_kdc = false

[realms]
    HADOOP.EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

# Initialize database
kdb5_util create -s

# Start KDC
systemctl start krb5kdc
systemctl start kadmin
2. Create Principals:
# Add Hadoop service principals
kadmin.local

addprinc -randkey nn/namenode1.example.com@HADOOP.EXAMPLE.COM
addprinc -randkey dn/datanode1.example.com@HADOOP.EXAMPLE.COM
addprinc -randkey hdfs/admin@HADOOP.EXAMPLE.COM
# (repeat the dn principal for every DataNode host)

# Export to keytabs
xst -k /etc/security/keytabs/nn.service.keytab \
    nn/namenode1.example.com@HADOOP.EXAMPLE.COM

xst -k /etc/security/keytabs/dn.service.keytab \
    dn/datanode1.example.com@HADOOP.EXAMPLE.COM
3. Configure Hadoop:
<!-- core-site.xml -->
<property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
</property>

<property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
</property>

<!-- hdfs-site.xml -->
<property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
</property>

<property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/security/keytabs/nn.service.keytab</value>
</property>

<property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>nn/_HOST@HADOOP.EXAMPLE.COM</value>
</property>
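A quick end-to-end check (the principal and keytab names follow the examples above):

# Obtain a ticket from the NameNode keytab, then run an authenticated command
kinit -kt /etc/security/keytabs/nn.service.keytab nn/namenode1.example.com@HADOOP.EXAMPLE.COM
klist            # confirm the ticket was granted
hdfs dfs -ls /   # fails with a GSS/Kerberos error if authentication is broken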

Encryption

Data at Rest (HDFS Transparent Encryption):
# Create an encryption key (requires the Hadoop KMS to be configured)
hadoop key create mykey

# The zone must be an existing, empty directory
hdfs dfs -mkdir /encrypted
hdfs crypto -createZone -keyName mykey -path /encrypted

# Files written to /encrypted are encrypted automatically
hdfs dfs -put sensitive.txt /encrypted/
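To verify the zone and its key:

# List encryption zones and the key backing each one
hdfs crypto -listZones

# Confirm the key exists in the KMS
hadoop key list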
Data in Transit:
<!-- hdfs-site.xml -->
<property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
</property>

<property>
    <name>dfs.encrypt.data.transfer.algorithm</name>
    <value>3des</value>
</property>

Authorization (Apache Ranger)

Ranger Policy Example:
┌─────────────────────────────────────────┐
│ Policy: Production Data Access          │
├─────────────────────────────────────────┤
│ Resource:                               │
│   Path: /prod/*                         │
│                                         │
│ Permissions:                            │
│   Group: data-engineers                 │
│     - Read, Write, Execute              │
│                                         │
│   Group: analysts                       │
│     - Read only                         │
│                                         │
│   User: admin                           │
│     - All permissions                   │
└─────────────────────────────────────────┘

High Availability (HA)

NameNode HA with QJM

Architecture:
┌──────────┐         ┌──────────┐
│ Active   │◄───────►│ Standby  │
│ NameNode │         │ NameNode │
└────┬─────┘         └────┬─────┘
     │                    │
     │  Edit logs         │
     ▼                    ▼
┌────────────────────────────────┐
│    JournalNode Cluster         │
│    (Quorum: 3 or 5 nodes)      │
└────────────────────────────────┘


    ┌───────────┐
    │ ZooKeeper │
    │ (Failover │
    │  Coord.)  │
    └───────────┘
Configuration:
<!-- hdfs-site.xml -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>

<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1:8020</value>
</property>

<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2:8020</value>
</property>

<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>

<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
Initialize and Start:
# Format ZooKeeper for automatic failover (ZooKeeper must already be running)
hdfs zkfc -formatZK

# Start JournalNodes (on each JournalNode host)
hadoop-daemon.sh start journalnode

# Format NameNode 1 (on namenode1 only)
hdfs namenode -format

# Start NameNode 1
hadoop-daemon.sh start namenode

# Bootstrap the standby (on namenode2)
hdfs namenode -bootstrapStandby

# Start NameNode 2
hadoop-daemon.sh start namenode

# Start a ZKFailoverController on each NameNode host
hadoop-daemon.sh start zkfc
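With both ZKFCs running, confirm one NameNode is active and one is standby (nn1/nn2 are the IDs from dfs.ha.namenodes.mycluster above):

# Check the role of each NameNode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Exercise a controlled failover to prove the setup end to end
hdfs haadmin -failover nn1 nn2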

Monitoring

Key Metrics to Monitor

NameNode Metrics:
- Heap memory usage (alert at 80%)
- GC time (should be < 1% of total time)
- RPC queue time
- Number of under-replicated blocks
- Number of missing blocks
- Number of dead DataNodes
DataNode Metrics:
- Disk usage (alert at 85%)
- Disk I/O wait time
- Network throughput
- Number of failed volumes
- Heartbeat latency to NameNode
YARN Metrics:
- Cluster memory utilization
- Cluster CPU utilization
- Queue capacity usage
- Number of pending applications
- Application success/failure rate
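Most of these values can be pulled straight from the daemons' built-in JMX and REST endpoints, even before a monitoring stack exists. A quick check, assuming the default Hadoop 3 web ports (9870 for the NameNode, 8088 for the ResourceManager):

# NameNode FSNamesystem metrics: block totals, under-replicated/missing blocks, capacity
curl -s 'http://namenode:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'

# YARN cluster metrics: memory/vcores used, apps pending and running
curl -s 'http://resourcemanager:8088/ws/v1/cluster/metrics'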

Prometheus + Grafana Setup

1. Export Hadoop Metrics:
# hadoop-metrics2.properties
*.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
*.sink.prometheus.port=9095

namenode.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
namenode.sink.prometheus.port=9096

datanode.sink.prometheus.class=org.apache.hadoop.metrics2.sink.PrometheusSink
datanode.sink.prometheus.port=9097
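Note that a Prometheus sink class is not bundled with every Hadoop distribution. A widely used alternative is attaching the Prometheus JMX exporter as a Java agent; a sketch for the NameNode (the jar path, port, and exporter config file are placeholders to adapt):

# hadoop-env.sh: expose NameNode metrics for Prometheus on port 9096
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -javaagent:/opt/jmx_prometheus_javaagent.jar=9096:/etc/hadoop/namenode-jmx.yml"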
2. Prometheus Configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'hadoop-namenode'
    static_configs:
      - targets: ['namenode:9096']

  - job_name: 'hadoop-datanodes'
    static_configs:
      - targets:
        - 'datanode1:9097'
        - 'datanode2:9097'
        - 'datanode3:9097'

  - job_name: 'yarn-resourcemanager'
    static_configs:
      - targets: ['resourcemanager:9098']
3. Grafana Dashboard:
{
  "title": "HDFS Cluster Health",
  "panels": [
    {
      "title": "NameNode Heap Usage",
      "targets": [
        {
          "expr": "jvm_memory_used_bytes{job='hadoop-namenode'}"
        }
      ]
    },
    {
      "title": "Under-Replicated Blocks",
      "targets": [
        {
          "expr": "hadoop_namenode_under_replicated_blocks"
        }
      ]
    }
  ]
}

Alerting Rules

# prometheus-alerts.yml
groups:
  - name: hadoop_alerts
    rules:
      - alert: NameNodeHeapHigh
        expr: jvm_memory_used_bytes{job="hadoop-namenode"} / jvm_memory_max_bytes{job="hadoop-namenode"} > 0.8
        for: 5m
        annotations:
          summary: "NameNode heap usage high"

      - alert: UnderReplicatedBlocks
        expr: hadoop_namenode_under_replicated_blocks > 1000
        for: 10m
        annotations:
          summary: "High number of under-replicated blocks"

      - alert: DataNodeDown
        expr: up{job="hadoop-datanodes"} == 0
        for: 2m
        annotations:
          summary: "DataNode {{ $labels.instance }} is down"

Backup and Disaster Recovery

Metadata Backup

# Automatic metadata checkpointing (handled by Secondary NN or Standby NN in HA)

# Manual backup (saveNamespace requires safe mode)
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
scp -r /var/hadoop/namenode/current backup-server:/backups/nn-$(date +%Y%m%d)/

# Restore: stop the NameNode, copy the backup back into dfs.namenode.name.dir,
# then start it again (it replays the restored fsimage and edits)
hadoop-daemon.sh stop namenode
scp -r backup-server:/backups/nn-20240115/ /var/hadoop/namenode/current
hadoop-daemon.sh start namenode
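A lighter-weight alternative is to download the latest checkpoint over HTTP instead of copying the NameNode's directories:

# Fetch the most recent fsimage from the NameNode into a local backup directory
mkdir -p /backups/nn-$(date +%Y%m%d)
hdfs dfsadmin -fetchImage /backups/nn-$(date +%Y%m%d)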

Data Replication Across Clusters

DistCp (Distributed Copy):
# Copy data to disaster recovery cluster
hadoop distcp \
  -update \
  -delete \
  -p \
  hdfs://prod-cluster/data \
  hdfs://dr-cluster/data

# Incremental copy (only changed files)
hadoop distcp \
  -update \
  -skipcrccheck \
  hdfs://prod-cluster/data \
  hdfs://dr-cluster/data
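If /data is snapshottable, DistCp can copy only the delta between two snapshots instead of rescanning the whole tree. A sketch (snapshot names s1/s2 are placeholders; the target must also hold snapshot s1 and be unchanged since it was taken):

# Enable snapshots and capture before/after states on the source
hdfs dfsadmin -allowSnapshot /data
hdfs dfs -createSnapshot /data s1
# ... writes happen ...
hdfs dfs -createSnapshot /data s2

# Copy only the changes between s1 and s2
hadoop distcp -update -diff s1 s2 \
  hdfs://prod-cluster/data \
  hdfs://dr-cluster/data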
Automated Daily Backup:
#!/bin/bash
# backup-hdfs.sh

DATE=$(date +%Y%m%d)
LOG=/var/log/hadoop-backup-${DATE}.log

hadoop distcp \
  -update \
  -delete \
  -log /tmp/distcp-${DATE} \
  hdfs://prod/data \
  hdfs://backup/data/${DATE} 2>&1 | tee "$LOG"

# Check distcp's exit code, not tee's
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
    echo "Backup completed successfully"
else
    echo "Backup failed" | mail -s "HDFS Backup Failed" ops@example.com
fi

Troubleshooting Common Issues

Issue 1: Slow Job Performance

Diagnosis:
# Check job counters
mapred job -counter job_123456 \
  org.apache.hadoop.mapreduce.TaskCounter \
  SPILLED_RECORDS

# High spills → Increase sort buffer
# Long shuffle time → Data skew or network issues
Solutions:
<!-- Increase sort buffer -->
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
</property>

<!-- Increase reduce shuffle parallelism -->
<property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>20</value>
</property>

Issue 2: NameNode Out of Memory

Diagnosis:
# Check heap usage
jmap -heap <NameNode-PID>

# Count files and blocks (totals are printed in the fsck summary)
hdfs fsck / | grep -E "Total (files|blocks)"
Solutions:
  1. Increase heap:
    export HADOOP_NAMENODE_OPTS="-Xmx32g"

  2. Enable HDFS Federation (multiple NameNodes)
  3. Clean up or consolidate small files (merge them into larger files or HAR archives).
     List files under 1 MB for review (size is column 5 of ls -R output):
    hdfs dfs -ls -R /path | awk 'NF >= 8 && $1 !~ /^d/ && $5 < 1048576 {print $8}'

Issue 3: DataNode Disk Failure

Detection:
2024-01-15 10:30:15 ERROR datanode.DataNode:
  Failed to read block blk_123456 from disk /mnt/disk3:
  Input/output error
Recovery:
# Remove the failed disk from dfs.datanode.data.dir in hdfs-site.xml
# (or let dfs.datanode.failed.volumes.tolerated keep the DataNode running)

# Restart the DataNode so it skips the bad disk
hadoop-daemon.sh stop datanode
hadoop-daemon.sh start datanode

# The NameNode re-replicates the lost blocks from the remaining replicas
# Monitor: hdfs fsck / -blocks
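Re-replicating a multi-terabyte disk takes time; a simple way to watch it complete:

# Watch the under-replicated block count drain back to zero
watch -n 60 'hdfs dfsadmin -report | grep -i "under replicated"'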

Capacity Planning

Growth Projection

# Calculate growth trajectory
current_size_tb = 500  # TB
monthly_growth_rate = 0.15  # 15% per month

def project_capacity(months):
    size = current_size_tb
    for m in range(months):
        size *= (1 + monthly_growth_rate)
    return size

# Projection for 12 months
print(f"12 months: {project_capacity(12):.2f} TB")
# Output: 12 months: 2675.13 TB

# When to add nodes?
# Current capacity: 23 nodes × 48TB ≈ 1,100 TB usable
# At 15% monthly growth, 500 TB passes 1,100 TB in roughly 6 months,
# so review capacity quarterly and order hardware ahead of demand
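To feed the projection with measured numbers rather than estimates, pull current usage from the cluster itself:

# Cluster-wide size, used, and available space
hdfs dfs -df -h /

# Summary plus per-DataNode capacity detail
hdfs dfsadmin -report | grep -E "Configured Capacity|DFS Used|DFS Remaining"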

Cost Optimization

Data Lifecycle Management:
# Move old data to erasure coding (saves space)
# RS-6-3-1024k is the default built-in policy; enabling it again is harmless
hdfs ec -enablePolicy -policy RS-6-3-1024k
hdfs ec -setPolicy -path /archive -policy RS-6-3-1024k

# The policy applies to newly written files; existing files must be
# rewritten (e.g., copied) to be converted

# Erasure coding: 6 data + 3 parity = 1.5x overhead
# vs. Replication: 3x overhead
# Savings: 50% storage for archive data
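To confirm which policies are enabled and what a directory actually uses:

# List built-in policies and whether they are enabled
hdfs ec -listPolicies

# Show the effective policy on the archive directory
hdfs ec -getPolicy -path /archive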

Upgrading Hadoop

Rolling Upgrade Process

# 1. Prepare the rolling upgrade (requires NameNode HA for zero downtime)
hdfs dfsadmin -rollingUpgrade prepare

# Poll until the rollback image is ready
hdfs dfsadmin -rollingUpgrade query

# 2. Upgrade the standby NameNode first
ssh namenode2
hadoop-daemon.sh stop namenode
# Install new version
hadoop-daemon.sh start namenode

# 3. Fail over so the upgraded NameNode becomes active
hdfs haadmin -failover nn1 nn2

# 4. Upgrade the former active NameNode (now standby) the same way

# 5. Upgrade DataNodes one by one (or in small batches)
hdfs dfsadmin -shutdownDatanode datanode1:9867 upgrade   # 9867 = DataNode IPC port in Hadoop 3
hdfs dfsadmin -getDatanodeInfo datanode1:9867            # repeat until it no longer responds
ssh datanode1
# Install new version
hadoop-daemon.sh start datanode
# Repeat for each remaining DataNode

# 6. Finalize once the cluster is verified healthy
hdfs dfsadmin -rollingUpgrade finalize

Best Practices Checklist

Security

  • Enable Kerberos authentication
  • Use encryption for sensitive data
  • Implement role-based access control (Ranger)
  • Regular security audits
  • Rotate credentials and keytabs

Availability

  • Deploy NameNode HA
  • Use ZooKeeper for coordination
  • Regular metadata backups
  • DR cluster for critical data
  • Test failover procedures

Performance

  • Monitor and tune GC settings
  • Optimize block size and replication factor
  • Use compression appropriately
  • Balance cluster regularly
  • Decommission slow/faulty nodes

Operations

  • Centralized logging (ELK stack)
  • Automated alerting (Prometheus/Grafana)
  • Capacity planning and forecasting
  • Regular cluster health checks
  • Documentation and runbooks

What’s Next?

Capstone Project: Building a Complete Data Pipeline

Apply everything you’ve learned in a comprehensive, production-ready project