
Compute Services

This chapter will teach you everything about running applications in Azure, starting from absolute basics. We’ll explain what compute actually means, why different options exist, and how to choose and use them confidently.

What You’ll Learn

By the end of this chapter, you’ll understand:
  • What “compute” means in cloud computing (explained from scratch)
  • Why you need compute resources and what they do
  • The differences between VMs, App Service, Functions, and Containers
  • How to create and configure each compute type step-by-step
  • When to use each option (with real-world decision criteria)
  • How to optimize cost and performance
  • Best practices for production deployments

What is “Compute”? (Start Here if You’re Completely New)

Let’s start with the absolute basics. What does “compute” even mean?

The Simple Explanation

Compute = The ability to run your code. That’s it. When you write a program (a website, an app, a script), it needs to run SOMEWHERE. That “somewhere” needs:
  • CPU (Central Processing Unit): The brain that executes your code
  • Memory (RAM): Temporary storage while your code runs
  • Storage (Disk): Where your code and data are stored permanently
Compute is just a fancy tech word for “a computer that runs your code.”

Real-World Analogy

Think about baking a cake:
  • Your recipe = Your code (the instructions)
  • The kitchen = Compute resources
    • Oven = CPU (does the work)
    • Counter space = RAM (workspace while cooking)
    • Pantry = Storage (ingredients and supplies)
Without a kitchen, your recipe is useless. Without compute, your code can’t run.

Where Does Your Code Run?

Option 1: Your Laptop (Local)
Pros:
✅ Free (you already own it)
✅ Full control
✅ Easy to test

Cons:
❌ Only you can access it
❌ Goes offline when you close laptop
❌ Limited by your laptop's power
❌ If laptop dies, app goes down
Option 2: Your Company’s Server (On-Premises)
Pros:
✅ Your company controls it
✅ Can handle more traffic than a laptop
✅ Stays online 24/7 (if configured properly)

Cons:
❌ Expensive ($10,000+ upfront)
❌ Takes weeks to set up
❌ You manage everything (patches, hardware, backups)
❌ Can't easily scale (bought 1 server, stuck with 1 server)
Option 3: Azure Cloud (What We’re Learning)
Pros:
✅ No upfront cost (pay as you go)
✅ Deploy in minutes
✅ Scales automatically (1 server → 100 servers → 1 server)
✅ Microsoft manages hardware
✅ Available worldwide

Cons:
❌ Monthly cost (but often cheaper than on-premises)
❌ Need to learn Azure (that's why you're here!)

Why Azure Has Multiple “Compute” Services

Why not just one type of compute? Because different apps have different needs.
Analogy: Transportation
You wouldn’t use the same vehicle for every trip:
  • Going to grocery store → Walk or bike
  • Commute to work → Car or bus
  • Moving furniture → Truck
  • International trip → Airplane
Similarly, different apps need different compute:
  • Simple website → App Service (like a car: easy, sufficient for most)
  • Custom legacy app → Virtual Machine (like a truck: heavy-duty, more control)
  • Process one task → Azure Function (like an Uber: pay per ride)
  • Complex microservices → Kubernetes (like a fleet of vehicles: orchestrated system)

What is “Compute”?

In simple terms, compute is the processing power that runs your applications: the CPU, memory, and resources that execute your code. Think of it as the “brain” of your application that processes requests, runs algorithms, and serves data to users.
Azure offers multiple compute options, each designed for different scenarios. Choosing the right one impacts cost, performance, and operational overhead. This chapter will teach you what each service is, why you’d choose it, and how to use it effectively, from complete beginner to production-ready deployments.
[Figure: Azure Compute Spectrum]

Understanding the Compute-to-Application Relationship

Let’s make this concrete with a real example.
Example: Building a Blog Website
Your Blog Application Needs:
1. Web Server
   - Receives HTTP requests (user visits http://yourblog.com)
   - Sends back HTML pages
   - Compute needed: CPU to process requests, RAM to hold data

2. Database
   - Stores your blog posts, comments, users
   - Compute needed: CPU to query data, RAM to cache queries

3. File Storage
   - Stores images, videos you upload
   - Compute needed: Minimal (just storage, not much processing)

Where does this run?
- Without Azure: You set up a server in your closet
- With Azure: You rent compute resources in Microsoft's datacenter
The Flow:
User visits yourblog.com
    ↓
Request goes to Azure datacenter
    ↓
Azure Compute (your rented "computer") receives request
    ↓
Your application code runs on that compute
    ↓
Code fetches blog post from database
    ↓
Code generates HTML page
    ↓
Compute sends HTML back to user
    ↓
User sees your blog

All of this happens on Azure Compute in milliseconds.

Breaking Down “Compute Resources”

When you rent compute in Azure, you’re actually renting these components:
1. CPU (vCPU - Virtual CPU)
  • What it is: Processing power. Measured in “cores.”
  • What it does: Executes your code, one instruction at a time
  • Analogy: Workers in a kitchen
    • 1 vCPU = 1 worker (handles 1 task at a time)
    • 4 vCPUs = 4 workers (handles 4 tasks simultaneously)
  • Example: Blog with 10 visitors → 1 vCPU sufficient
  • Example: Blog with 10,000 visitors → 8 vCPUs needed
2. Memory (RAM)
  • What it is: Temporary storage while your app runs
  • What it does: Holds data currently being processed
  • Analogy: Counter space in a kitchen
    • More RAM = More space to work with multiple things at once
    • Less RAM = Must finish one task before starting another
  • Example: Blog loads 10 posts → Needs 100 MB RAM
  • Example: Blog loads 1000 posts → Needs 2 GB RAM
3. Storage (Disk)
  • What it is: Permanent storage for your code and data
  • What it does: Stores files even when compute is turned off
  • Analogy: Pantry or closet (long-term storage)
  • Example: Blog application code → 500 MB
  • Example: 1,000 blog posts with images → 10 GB
4. Network
  • What it is: Bandwidth for sending/receiving data
  • What it does: Transfers data between user and your app
  • Analogy: Internet connection speed
  • Example: Small blog → 1 Mbps sufficient
  • Example: Video streaming site → 1 Gbps+ needed

Understanding the Compute Spectrum

Before diving into specific services, let’s understand the fundamental question: What do you need to run?

The Evolution of Compute Needs

Traditional On-Premises (Before Cloud):
You buy physical servers:
- Pay upfront ($10,000+)
- Takes weeks to arrive
- Fixed capacity (can't scale)
- You manage everything (OS, patches, hardware)
- If server breaks, you're down until it's fixed
Cloud Computing (Azure):
You rent virtual servers:
- Pay per hour/second (no upfront cost)
- Deploy in minutes
- Scale up/down instantly
- Microsoft manages hardware
- Automatic redundancy (if one fails, another takes over)

Why Multiple Compute Options?

Different applications have different needs:
| Application Type | Needs | Azure Service |
| --- | --- | --- |
| Simple website | Just run code, don’t care about OS | App Service |
| Legacy application | Needs specific OS version, custom software | Virtual Machines |
| Microservices | Need to orchestrate many containers | Azure Kubernetes Service (AKS) |
| Event-driven | Run code only when triggered (e.g., file upload) | Azure Functions |
| Quick task | Run a container once, no orchestration | Container Instances |
Real-World Analogy:
  • Virtual Machine = Renting an entire apartment (full control, more responsibility)
  • App Service = Renting a furnished room (less control, less responsibility, easier)
  • Azure Functions = Using a hotel room for one night (pay only when you use it)
[!TIP] Jargon Alert: Compute “Compute” is just a fancy word for “processing power” or “the ability to run code.” When someone says “compute resources,” they mean CPU, memory, and the servers that run your applications. Don’t let the word intimidate you—it’s just tech jargon for “the stuff that runs your code.”
[!WARNING] Gotcha: Choosing the Wrong Compute Service Many beginners choose VMs because they’re familiar, but VMs are often overkill. If you’re building a simple web app, use App Service. You’ll save time, money, and headaches. Only use VMs if you truly need full OS control.
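The mapping from application needs to services can be sketched as a small decision function. This is only an illustration of the table and the warning above: the function name, its boolean parameters, and the order of the checks are assumptions made for the example, not an official Azure selection algorithm.

```python
def pick_compute_service(needs_os_control: bool, event_driven: bool,
                         uses_containers: bool, many_services: bool) -> str:
    """Map coarse requirements to one of the services named in this chapter."""
    if needs_os_control:
        # Specific OS version or custom software: only a VM gives full control.
        return "Virtual Machines"
    if event_driven:
        # Code that runs only when triggered (for example on file upload).
        return "Azure Functions"
    if uses_containers:
        # Many containers need orchestration; a single container does not.
        return "Azure Kubernetes Service (AKS)" if many_services else "Container Instances"
    # Default for a simple web app: just run code, no OS management.
    return "App Service"


print(pick_compute_service(False, False, False, False))  # App Service
print(pick_compute_service(True, False, False, False))   # Virtual Machines
```

Checking for OS control first mirrors the advice above: reach for a VM only when that requirement is real, and fall through to App Service otherwise.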

Key Concepts You Must Understand

1. IaaS vs PaaS vs Serverless

These terms define how much Microsoft manages for you:
IaaS (Infrastructure as a Service)
You manage: OS, runtime, applications, data
Microsoft manages: Hardware, networking, datacenter
Example: Virtual Machines
Analogy: You rent a plot of land. You build the house, install plumbing, electricity, everything. The landlord just provides the land.
When to use:
  • Need specific OS version (Windows Server 2012 R2)
  • Legacy applications that can’t run on PaaS
  • Full control over the environment
  • Compliance requirements (need to manage security patches yourself)
Trade-off: More control = More responsibility (you patch OS, manage security, handle failures)

2. Stateless vs Stateful Applications

Stateless: Application doesn’t store session data on the server. Each request is independent. Example: A REST API that processes requests. If the server restarts, no data is lost because state is stored in a database.
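# Stateless API (Flask). `db` stands in for your database client.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/users/<user_id>')
def get_user(user_id):
    # No session data stored on server
    # Fetches from database each time
    return jsonify(db.get_user(user_id))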
Stateful: Application stores session data in memory. If server restarts, session is lost. Example: A game server that keeps player positions in memory.
# Stateful game server
player_positions = {}  # Stored in memory

def update_position(player_id, x, y):
    player_positions[player_id] = (x, y)  # Lost if server restarts
Why This Matters:
  • Stateless apps can scale horizontally easily (add more servers, load balance)
  • Stateful apps need sticky sessions or external state storage (Redis, database)
Best Practice: Make applications stateless. Store state in databases, Redis, or Cosmos DB.
[!TIP] Jargon Alert: Stateless vs Stateful Stateless: Like ordering at a fast-food restaurant. Each order is independent—the cashier doesn’t remember your last order. The app doesn’t store session data on the server. Stateful: Like a sit-down restaurant where the waiter remembers your preferences. The app stores session data in memory. If the server restarts, that memory is lost.
[!WARNING] Gotcha: Stateful Applications Don’t Scale If your app stores user sessions in memory, you can’t easily add more servers. User A’s session is on Server 1, but the load balancer might send them to Server 2, which doesn’t have their session. Always use external storage (database, Redis) for state.
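The load balancer problem described above can be reproduced in a few lines. The following is a minimal simulation, not production code: `Server`, `simulate`, and the plain dictionary standing in for Redis or a database are all hypothetical names chosen for the example.

```python
import itertools


class Server:
    """A web server that keeps sessions either locally or in a shared store."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # Stateful mode: a private dict that only this server can see.
        # Stateless mode: a store shared by all servers (stand-in for Redis).
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, user):
        self.sessions[user] = {"logged_in": True}

    def is_logged_in(self, user):
        return user in self.sessions


def simulate(servers):
    """Round-robin load balancing: login hits one server, the next request hits another."""
    load_balancer = itertools.cycle(servers)
    next(load_balancer).login("alice")
    return next(load_balancer).is_logged_in("alice")


# Sessions in server memory: the second server has never seen alice.
stateful_result = simulate([Server("s1"), Server("s2")])

# Sessions in an external store: any server can answer.
shared = {}
stateless_result = simulate([Server("s1", shared), Server("s2", shared)])

print(stateful_result, stateless_result)  # False True
```

With the shared store, adding a third or tenth server changes nothing for the user, which is what makes horizontal scaling straightforward.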

3. Horizontal vs Vertical Scaling

Vertical Scaling (Scale Up):
  • Make the server bigger (more CPU, more RAM)
  • Example: Upgrade from 4 vCPU to 8 vCPU
  • Limitation: Can only scale to maximum VM size
  • Cost: More expensive per unit
Horizontal Scaling (Scale Out):
  • Add more servers (2 servers → 4 servers → 10 servers)
  • Example: Add more VM instances to handle traffic
  • Advantage: Can scale to hundreds/thousands of servers
  • Cost: More cost-effective at scale
Real-World Example:
Scenario: Your website gets 10x more traffic

Vertical Scaling:
- Upgrade VM from 4 vCPU to 16 vCPU
- Cost: $200/month → $800/month
- Limitation: Can't go beyond 16 vCPU

Horizontal Scaling:
- Add 3 more VMs (4 total, each with 4 vCPU)
- Cost: $200/month → $800/month (same cost, but 4x redundancy)
- Advantage: If one VM fails, 3 others still serve traffic
Best Practice: Design for horizontal scaling from day one. Use load balancers, stateless applications, and autoscaling.
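The redundancy argument in the example above comes down to simple arithmetic. Here is a small sketch using the scenario's numbers (one 16 vCPU VM versus four 4 vCPU VMs); the function name is an assumption made for illustration.

```python
def surviving_capacity(vm_count: int, vcpus_per_vm: int, failed_vms: int = 1) -> float:
    """Fraction of total vCPUs still serving traffic after `failed_vms` VMs fail."""
    total = vm_count * vcpus_per_vm
    remaining = max(vm_count - failed_vms, 0) * vcpus_per_vm
    return remaining / total


vertical = surviving_capacity(vm_count=1, vcpus_per_vm=16)   # one big VM
horizontal = surviving_capacity(vm_count=4, vcpus_per_vm=4)  # four small VMs

print(vertical, horizontal)  # 0.0 0.75
```

Both layouts provide 16 vCPUs, but a single failure takes the scaled-up deployment fully offline while the scaled-out one keeps 75% of its capacity.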

1. Compute Decision Tree

[!WARNING] Gotcha: Spot VM Eviction Spot VMs offer huge discounts (up to 90%), but Azure can take them back with only a 30-second warning. Never use them for production databases or critical APIs—only for stateless batch jobs that can fail and restart.
[!TIP] Jargon Alert: SLA (Service Level Agreement) Microsoft’s financial guarantee of uptime (e.g., 99.9%). If they miss it, you get a bill credit. Note: Single VMs often have a lower SLA than multiple VMs deployed in an Availability Set or Zone.

2. Virtual Machines Deep Dive

What is a Virtual Machine?
A Virtual Machine (VM) is a software-based computer that runs on physical hardware. Think of it like this: Azure has massive physical servers in datacenters. They use virtualization technology to split one physical server into multiple “virtual” servers. Each VM gets its own CPU, RAM, and storage, isolated from other VMs.
Why Use VMs?
  1. Full Control: You have root/admin access. Install any software, configure anything.
  2. Legacy Applications: Run old applications that require specific OS versions or configurations.
  3. Custom Requirements: Need specific drivers, software, or configurations that PaaS doesn’t support.
  4. Compliance: Some regulations require you to manage the OS yourself.

Under the Hood: How Azure Compute Works

To a Principal Engineer, a “VM” isn’t just a virtual computer; it’s a slice of a massive, distributed system. Here is what’s happening behind the scenes.

1. The Fabric Controller (The Brain)

Azure doesn’t have humans plugging in servers when you click “Create”. It uses the Fabric Controller.
  • It maintains a map of every physical server in the datacenter.
  • It tracks CPU/RAM utilization and hardware health.
  • When you request a VM, it finds a physical host with enough “white space” (available resources) and sends a command to create your VM.
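To make the placement idea tangible, here is a toy first-fit allocator. It is a sketch under stated assumptions: the real Fabric Controller is far more sophisticated and its algorithm is not described here, and `Host`, `place_vm`, and the host names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_gb: int
    healthy: bool = True


def place_vm(hosts, vcpus: int, ram_gb: int):
    """First-fit placement: reserve resources on the first healthy host with room."""
    for host in hosts:
        if host.healthy and host.free_vcpus >= vcpus and host.free_ram_gb >= ram_gb:
            host.free_vcpus -= vcpus
            host.free_ram_gb -= ram_gb
            return host.name
    return None  # no host has enough "white space"


hosts = [
    Host("host-a", free_vcpus=1, free_ram_gb=8),                  # too few vCPUs
    Host("host-b", free_vcpus=8, free_ram_gb=32, healthy=False),  # failed hardware
    Host("host-c", free_vcpus=16, free_ram_gb=64),
]

print(place_vm(hosts, vcpus=2, ram_gb=4))  # host-c
```

The same loop, run again after a host is marked unhealthy, is the essence of moving a VM to a new host when its original one fails.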

2. The Hypervisor (The Gatekeeper)

Every physical host runs a custom version of Hyper-V.
  • It provides strict isolation between VMs. VM-A cannot see VM-B’s memory, even though they sit on the same physical chip.
  • It manages the vCPU scheduling. If you have a 2-vCPU VM, the hypervisor ensures you get your fair share of time on the physical CPU cores.

3. Service Healing (Self-Correcting Infrastructure)

What happens if the physical server hosting your VM catches fire?
  1. The Fabric Controller detects the heartbeats are missing.
  2. It immediately marks that host as “failed”.
  3. It finds a new, healthy physical host in the same cluster.
  4. It “respawns” your VM on the new host and re-attaches your Managed Disks.
  5. Your VM reboots automatically. This is why “Managed Disks” are critical—they live on the storage network, not the physical host, so they can be moved instantly.
[!IMPORTANT] Pro Insight: Availability Sets vs. Zones
  • Availability Sets ensure your VMs are on different Racks (Power/Network) in the same building.
  • Availability Zones ensure your VMs are in different Buildings (miles apart). Always use Availability Zones for production to survive a complete datacenter power failure.

When NOT to Use VMs:
  • Simple web applications (use App Service instead)
  • You just want to deploy code quickly (use PaaS)
  • You don’t want to manage OS patches (use PaaS)
[!WARNING] Gotcha: VM Management Overhead VMs require ongoing maintenance: OS patches, security updates, monitoring, backups. If you’re not prepared to manage this, use PaaS (App Service, Azure Functions). Many teams choose VMs thinking they’ll have “more control,” but end up spending 50% of their time on maintenance instead of building features.
[!TIP] Jargon Alert: Virtual Machine (VM) A VM is a software-based computer running on physical hardware. Think of it like this: Azure has massive physical servers. They use virtualization technology to split one physical server into multiple “virtual” servers. Each VM thinks it’s a real computer with its own CPU, RAM, and storage.

Understanding VM Components

Before choosing a VM size, you need to understand what you’re buying:
What it is: Processing power. More vCPUs = can handle more concurrent operations.
Real-World Analogy: Like having more workers. 1 worker can handle 1 task at a time. 4 workers can handle 4 tasks simultaneously.
How to Choose:
  • 1-2 vCPU: Small websites, dev/test environments
  • 4-8 vCPU: Medium web applications, small databases
  • 16+ vCPU: Large databases, high-traffic applications, data processing
Common Mistake: Over-provisioning. If your app uses 20% CPU, you don’t need 16 vCPUs. Start small, monitor, then scale up.
[!WARNING] Gotcha: Over-Provisioning Costs Money Many beginners choose the biggest VM “to be safe.” A 16 vCPU VM costs $800/month. If you only use 20% CPU, you're wasting $640/month. Start with 2-4 vCPUs, monitor for a week, then scale up if needed. Azure makes it easy to resize VMs.
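The waste figure in the warning above is the monthly price multiplied by the idle share of the CPU. A minimal sketch, assuming (as the warning does) that cost scales with provisioned vCPUs; the function name is hypothetical.

```python
def wasted_spend(monthly_cost: float, avg_cpu_utilization: float) -> float:
    """Monthly spend attributable to idle CPU.

    avg_cpu_utilization is a fraction between 0 and 1 (0.20 means 20%).
    """
    if not 0.0 <= avg_cpu_utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return monthly_cost * (1.0 - avg_cpu_utilization)


print(round(wasted_spend(800, 0.20)))  # 640
```

Feeding in a week of real monitoring data instead of a guess gives a quick estimate of how much a smaller size could save.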

VM Size Families

B, D, DC, DS series
Balanced CPU:Memory ratio (1:4)
Use cases:
- Web servers
- Small to medium databases
- Development/test environments
- Low to medium traffic apps

Examples:
- Standard_B2s: 2 vCPU, 4 GB RAM (Burstable)
- Standard_D4s_v5: 4 vCPU, 16 GB RAM
B-series (Burstable):
  • Accumulate CPU credits when idle
  • Burst to 100% when needed
  • Cost-effective for variable workloads
  • Perfect for dev/test
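The credit mechanism can be pictured as a bucket that fills while the VM idles and drains while it bursts. The simulation below is a toy model only: the 20% baseline, the credit cap, and the per-step accounting are assumptions chosen for readability, not the actual B-series credit formulas.

```python
def simulate_credits(cpu_usage, baseline=0.20, max_credits=100.0, start=0.0):
    """Track a credit balance per time step for a burstable VM (toy model).

    cpu_usage: requested CPU fraction (0..1) for each step.
    Returns (balances, delivered), where delivered is the CPU actually granted.
    """
    balance, balances, delivered = start, [], []
    for want in cpu_usage:
        if want <= baseline:
            # Below baseline: bank the unused share, up to the cap.
            balance = min(max_credits, balance + (baseline - want))
            got = want
        else:
            # Above baseline: burst only as far as banked credits allow.
            spend = min(want - baseline, balance)
            balance -= spend
            got = baseline + spend
        balances.append(round(balance, 2))
        delivered.append(round(got, 2))
    return balances, delivered


# Three idle steps, then two steps asking for 100% CPU.
balances, delivered = simulate_credits([0.0, 0.0, 0.0, 1.0, 1.0])
print(balances)   # [0.2, 0.4, 0.6, 0.0, 0.0]
print(delivered)  # [0.0, 0.0, 0.0, 0.8, 0.2]
```

In this run the VM bursts to 80% once, exhausts its credits, and then falls back to the baseline, which is why burstable sizes suit workloads that are idle most of the time.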

Step-by-Step: Creating Your First VM

Let’s create a VM from scratch, explaining every step and why we’re doing it:

Prerequisites

Before creating a VM, you need:
  1. Azure Account: Sign up at portal.azure.com (free tier works)
  2. Resource Group: A container for your resources (like a folder)
  3. Virtual Network: A network for your VM to connect to (like a LAN)

Step 1: Create Resource Group

What is a Resource Group? Think of it as a folder that contains related resources. All resources in a group can be managed together (delete the group = delete all resources).
# Create resource group
az group create \
  --name rg-learn-vm \
  --location eastus

# What this does:
# --name: Name of the resource group (must be unique in your subscription)
# --location: Azure region where resources will be created
#   - eastus = East US (Virginia) - good for US East Coast
#   - westeurope = West Europe (Netherlands) - good for Europe
#   - southeastasia = Southeast Asia (Singapore) - good for Asia
Why eastus? It’s typically among the lower-priced regions and offers most Azure services. For production, choose the region closest to your users.

Step 2: Create Virtual Network

What is a Virtual Network (VNet)? Think of it as your private network in Azure. VMs in the same VNet can communicate with each other privately (like computers on the same WiFi network).
# Create virtual network
az network vnet create \
  --resource-group rg-learn-vm \
  --name vnet-learn \
  --address-prefix 10.0.0.0/16 \
  --subnet-name default \
  --subnet-prefix 10.0.1.0/24

# What this does:
# --address-prefix 10.0.0.0/16: 
#   - Defines the network range (10.0.0.0 to 10.0.255.255)
#   - /16 means first 16 bits are network, last 16 bits are hosts
#   - Can have up to 65,536 IP addresses (2^16)
# --subnet-prefix 10.0.1.0/24:
#   - Subnet is a smaller network within the VNet
#   - /24 means first 24 bits are network, last 8 bits are hosts
#   - Contains 256 IP addresses (2^8); Azure reserves 5 per subnet, leaving 251 usable
#   - VMs will get IPs like 10.0.1.4, 10.0.1.5, etc.
Why 10.0.0.0/16? This is a private IP range (RFC 1918). It won’t conflict with public internet IPs. Common choices:
  • 10.0.0.0/16 (10.0.0.0 - 10.0.255.255) - 65,536 IPs
  • 172.16.0.0/12 (172.16.0.0 - 172.31.255.255) - 1 million IPs
  • 192.168.0.0/16 (192.168.0.0 - 192.168.255.255) - 65,536 IPs
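The address counts above can be checked with Python's standard `ipaddress` module. This sketch uses the VNet and subnet prefixes from the commands in this step; the constant for Azure's five reserved addresses per subnet (network address, default gateway, two for Azure DNS, broadcast) is the only value not taken from the prefixes themselves.

```python
import ipaddress

# Azure reserves the first four addresses and the last one in every subnet.
AZURE_RESERVED_PER_SUBNET = 5

vnet = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")

print(vnet.num_addresses)                                # 65536
print(subnet.num_addresses)                              # 256
print(subnet.num_addresses - AZURE_RESERVED_PER_SUBNET)  # 251
print(subnet.subnet_of(vnet))                            # True
print(vnet.is_private)                                   # True

# The first address a VM can receive is .4, after the four reserved ones.
first_usable = subnet.network_address + 4
print(first_usable)                                      # 10.0.1.4
```

The last line explains why the first VM in the subnet receives 10.0.1.4 rather than 10.0.1.1.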

Step 3: Create Network Security Group (NSG)

What is an NSG? A firewall that controls traffic to/from your VM. By default, an NSG blocks inbound traffic from the internet (traffic from inside the VNet and from Azure load balancers is allowed). You need to explicitly allow ports (like port 22 for SSH, port 3389 for RDP).
# Create NSG
az network nsg create \
  --resource-group rg-learn-vm \
  --name nsg-learn

# Allow SSH (port 22) from anywhere
# WARNING: In production, restrict to your IP only!
az network nsg rule create \
  --resource-group rg-learn-vm \
  --nsg-name nsg-learn \
  --name AllowSSH \
  --priority 1000 \
  --protocol Tcp \
  --direction Inbound \
  --source-address-prefixes '*' \
  --source-port-ranges '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges 22 \
  --access Allow

# What this does:
# --priority 1000: Lower number = higher priority (evaluated first)
# --protocol Tcp: Allow TCP protocol (SSH uses TCP)
# --direction Inbound: Rule applies to incoming traffic
# --source-address-prefixes '*': Allow from any IP (NOT secure for production!)
# --destination-port-ranges 22: Allow traffic to port 22 (SSH)
# --access Allow: Allow this traffic (vs Deny)
Security Best Practice: Instead of '*', use your IP:
--source-address-prefixes 'YOUR_IP_ADDRESS/32'
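How priorities decide which rule wins can be sketched in a few lines. This is a simplified model, not the NSG engine: it matches only on an exact source address and a single port, and the `Rule` class, `evaluate` function, and example addresses (from the documentation-reserved ranges) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int  # lower number is evaluated first
    port: int
    source: str    # "*" matches any source address
    access: str    # "Allow" or "Deny"


def evaluate(rules, source_ip: str, port: int) -> str:
    """Return the decision of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port == port and rule.source in ("*", source_ip):
            return rule.access
    # Stand-in for the default rule that denies other inbound traffic.
    return "Deny"


rules = [
    Rule("DenySSH", 2000, 22, "*", "Deny"),
    Rule("AllowSSH", 1000, 22, "203.0.113.7", "Allow"),
]

print(evaluate(rules, "203.0.113.7", 22))   # Allow
print(evaluate(rules, "198.51.100.9", 22))  # Deny
print(evaluate(rules, "203.0.113.7", 80))   # Deny
```

Because evaluation stops at the first match, the narrow allow rule at priority 1000 wins over the broad deny at 2000 for the trusted address only.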

Step 4: Create Public IP Address

What is a Public IP? An IP address accessible from the internet. Without this, you can’t connect to your VM from outside Azure.
# Create public IP
az network public-ip create \
  --resource-group rg-learn-vm \
  --name pip-learn-vm \
  --allocation-method Static \
  --sku Standard

# What this does:
# --allocation-method Static: IP address doesn't change (vs Dynamic)
# --sku Standard: Standard SKU (required for newer VMs)
Static vs Dynamic:
  • Static: IP address never changes (good for DNS records, firewall rules)
  • Dynamic: IP address can change when VM is stopped/started (less reliable for DNS records and firewall rules). Standard SKU public IPs, as created above, support only Static allocation.

Step 5: Create Network Interface (NIC)

What is a NIC? The network card that connects your VM to the network. It connects the VM to the VNet, NSG, and Public IP.
# Create NIC
az network nic create \
  --resource-group rg-learn-vm \
  --name nic-learn-vm \
  --vnet-name vnet-learn \
  --subnet default \
  --network-security-group nsg-learn \
  --public-ip-address pip-learn-vm

# What this does:
# --vnet-name: Connect to the VNet we created
# --subnet: Connect to the subnet (default)
# --network-security-group: Attach the NSG (firewall rules)
# --public-ip-address: Attach the public IP (for internet access)

Step 6: Create the Virtual Machine

Now we create the actual VM:
# Create VM
az vm create \
  --resource-group rg-learn-vm \
  --name vm-learn \
  --location eastus \
  --nics nic-learn-vm \
  --image Ubuntu2204 \
  --size Standard_B2s \
  --admin-username azureuser \
  --generate-ssh-keys \
  --authentication-type ssh

# What this does:
# --name: Name of the VM (must be unique in resource group)
# --nics: Attach the network interface we created
# --image Ubuntu2204: Use Ubuntu 22.04 LTS
#   (the older UbuntuLTS alias is deprecated in recent Azure CLI releases)
#   Alternatives:
#     - Win2019Datacenter (Windows Server 2019)
#     - Red Hat Enterprise Linux and other distributions:
#       list the current aliases with `az vm image list --output table`
# --size Standard_B2s: VM size (2 vCPU, 4 GB RAM, Burstable)
# --admin-username: Username for SSH login
# --generate-ssh-keys: Automatically generate SSH key pair
#   - Creates ~/.ssh/id_rsa (private key) and ~/.ssh/id_rsa.pub (public key)
#   - Public key is added to VM for passwordless login
# --authentication-type ssh: Use SSH keys (more secure than passwords)
What happens during VM creation?
  1. Azure allocates hardware in a datacenter
  2. Creates the VM with specified CPU/RAM
  3. Attaches the OS disk (contains Ubuntu)
  4. Connects to the network (via NIC)
  5. Boots the VM
  6. Installs your SSH public key
  7. VM is ready in 2-5 minutes

Step 7: Connect to Your VM

# Get the public IP address
az vm show \
  --resource-group rg-learn-vm \
  --name vm-learn \
  --show-details \
  --query publicIps \
  --output tsv

# Connect via SSH (replace with your IP)
ssh azureuser@<PUBLIC_IP>

# Example (203.0.113.10 is a placeholder address, not a real VM):
# ssh azureuser@203.0.113.10
First-time connection: You’ll see a message asking to verify the host. Type yes and press Enter.

Step 8: Verify VM is Working

Once connected, run these commands to verify everything works:
# Check OS version
cat /etc/os-release

# Check CPU and memory
free -h
nproc

# Check disk space
df -h

# Check network
ip addr show
ping -c 3 8.8.8.8  # Test internet connectivity

Step 9: Install Software (Example: Nginx Web Server)

# Update package list
sudo apt update

# Install Nginx
sudo apt install -y nginx

# Start Nginx
sudo systemctl start nginx

# Enable Nginx to start on boot
sudo systemctl enable nginx

# Check status
sudo systemctl status nginx
Open port 80 in NSG (to access web server):
az network nsg rule create \
  --resource-group rg-learn-vm \
  --nsg-name nsg-learn \
  --name AllowHTTP \
  --priority 1001 \
  --protocol Tcp \
  --direction Inbound \
  --source-address-prefixes '*' \
  --destination-port-ranges 80 \
  --access Allow
Now visit http://<PUBLIC_IP> in your browser. You should see the Nginx welcome page!

Step 10: Clean Up (Important!)

Always delete resources when done to avoid charges:
# Delete the entire resource group (deletes all resources)
az group delete \
  --name rg-learn-vm \
  --yes \
  --no-wait

# What this does:
# --yes: Don't ask for confirmation
# --no-wait: Don't wait for deletion to complete (runs in background)
Cost: A Standard_B2s VM costs ~$30/month if left running. Always delete when not in use!
[!WARNING] Gotcha: VM Costs Add Up Quickly A single VM might cost $30/month, but if you forget to delete 10 VMs, that's $300/month wasted. Always set up cost alerts and tag resources with “Owner” so you know who to contact. Use Azure Cost Management to find and delete unused resources.
[!TIP] Jargon Alert: Deallocate vs Stop Stop (in OS): Shuts down the operating system, but Azure still reserves the hardware. You’re still charged for compute! Deallocate: Releases the hardware back to Azure. No compute charges, only storage charges. Always deallocate VMs when not in use.
[!INFO] Aside: Azure Automation for Cost Savings Use Azure Automation to automatically stop (deallocate) dev/test VMs at 6 PM and start them at 8 AM. Running 10 hours a day instead of 24 saves about 58% on compute costs, and about 70% if the VMs also stay off on weekends.
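The saving from a start/stop schedule is the share of the week the VM spends deallocated. A short sketch of that arithmetic for the 8 AM to 6 PM schedule mentioned above; the function name is hypothetical, and storage charges (which continue while deallocated) are left out.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def compute_savings(hours_on_per_day: float, days_on_per_week: int) -> float:
    """Fraction of compute cost saved compared with running 24/7."""
    running_hours = hours_on_per_day * days_on_per_week
    return 1.0 - running_hours / HOURS_PER_WEEK


print(f"{compute_savings(10, 7):.0%}")  # 58%  (nights off, every day)
print(f"{compute_savings(10, 5):.0%}")  # 70%  (nights and weekends off)
```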

Understanding VM Creation Options

When creating a VM, you make several important decisions:
What it is: The OS and software pre-installed on the VM.
Types:
  • Marketplace Images: Pre-configured OS (Ubuntu, Windows Server, RHEL)
  • Custom Images: Your own OS image (for consistent deployments)
  • Shared Image Gallery: Images shared across subscriptions
Common Choices:
Ubuntu2204: Ubuntu 22.04 LTS (most popular for Linux; replaces the deprecated UbuntuLTS alias)
Win2019Datacenter: Windows Server 2019
RHEL: Red Hat Enterprise Linux (enterprise)
CentOS: Community version of RHEL
How to Choose:
  • Linux: Cheaper (no Windows license), better for web servers, APIs
  • Windows: Required for .NET Framework apps, Windows-specific software

VM Pricing Models

Pay-as-you-go

No commitment, highest cost
  • Billed per second
  • Deallocate VM = stop compute charges (shutting down only inside the OS still bills)
  • Storage still charged
Use for: Short-term, unpredictable workloads

Reserved Instances

1 or 3-year commitment
  • 30-50% discount (1-year)
  • 50-70% discount (3-year)
  • Can exchange for different size
Use for: Stable, long-running workloads

Spot VMs

Up to 90% discount
  • Can be evicted anytime
  • 30-second warning
  • No SLA
Use for: Batch jobs, testing, fault-tolerant apps

Azure Hybrid Benefit

Use existing Windows licenses
  • Up to 40% discount
  • Requires Software Assurance
  • Windows Server + SQL Server
Use for: Migrations from on-premises

Managed Disks

| Disk Type | IOPS | Throughput | Use Case |
| --- | --- | --- | --- |
| Standard HDD | 500 | 60 MB/s | Backup, non-critical |
| Standard SSD | 500-6,000 | 60-750 MB/s | Web servers, dev/test |
| Premium SSD | 120-20,000 | 25-900 MB/s | Production databases |
| Ultra Disk | Up to 160,000 | Up to 4,000 MB/s | SAP HANA, top-tier SQL |
Premium SSD Sizes:
P4:  32 GB,   120 IOPS,  25 MB/s
P10: 128 GB,  500 IOPS,  100 MB/s
P30: 1 TB,    5,000 IOPS, 200 MB/s
P80: 32 TB,   20,000 IOPS, 900 MB/s
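Choosing a disk size means finding the smallest tier that meets capacity, IOPS, and throughput at the same time. The sketch below uses only the four tiers listed above (the full Premium SSD lineup has more sizes in between), and the function name is an assumption for the example.

```python
# (name, size in GB, IOPS, throughput in MB/s), copied from the list above.
PREMIUM_TIERS = [
    ("P4", 32, 120, 25),
    ("P10", 128, 500, 100),
    ("P30", 1024, 5000, 200),
    ("P80", 32768, 20000, 900),
]


def smallest_tier(size_gb: int, iops: int, throughput_mbps: int):
    """Return the first (smallest) tier meeting every requirement, or None."""
    for name, tier_size, tier_iops, tier_throughput in PREMIUM_TIERS:
        if tier_size >= size_gb and tier_iops >= iops and tier_throughput >= throughput_mbps:
            return name
    return None


print(smallest_tier(100, 400, 50))    # P10
print(smallest_tier(200, 3000, 150))  # P30
print(smallest_tier(50, 30000, 100))  # None
```

The last call returns `None` because 30,000 IOPS exceeds every Premium SSD tier listed, which is the kind of workload the table above assigns to Ultra Disk.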

VM High Availability

Option 1: Availability Sets

Protect against planned maintenance and hardware failures
Fault Domains: 2-3 (different racks)
Update Domains: Up to 20 (staggered updates)

SLA: 99.95% (2+ VMs in availability set)
Use when: Regional deployment, no zone support

Option 2: Availability Zones

Protect against datacenter failures
Deploy VMs across 3 zones:
- Zone 1: VM 1, 4, 7
- Zone 2: VM 2, 5, 8
- Zone 3: VM 3, 6, 9

SLA: 99.99% (2+ VMs across zones)
Use when: Maximum availability, region supports zones

Option 3: VM Scale Sets

Autoscaling group of identical VMs
Features:
- Autoscale (CPU, memory, schedule)
- Load balancer integration
- Rolling upgrades
- Instance protection

SLA: 99.95% (availability set) or 99.99% (zones)
Use when: Scalable, stateless applications
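SLA percentages are easier to compare when converted into allowed downtime. A minimal sketch of that conversion, assuming a 30-day month; the function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Maximum downtime in minutes that still meets the SLA over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95, 99.99):
    print(sla, round(allowed_downtime_minutes(sla), 1))
```

Over a 30-day month this works out to roughly 43.2 minutes at 99.9%, 21.6 minutes at 99.95%, and 4.3 minutes at 99.99%, so moving from an availability set to zones cuts the permitted downtime by a factor of five.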

3. VM Scale Sets

VM Scale Sets (VMSS) automatically scale identical VMs based on demand.

VMSS Architecture

Create VM Scale Set

# Create VMSS with autoscaling
az vmss create \
  --name vmss-web \
  --resource-group rg-prod \
  --image Ubuntu2204 \
  --vm-sku Standard_D2s_v3 \
  --instance-count 2 \
  --zones 1 2 3 \
  --vnet-name vnet-prod \
  --subnet snet-web \
  --lb lb-web \
  --backend-pool-name pool-web \
  --admin-username azureuser \
  --generate-ssh-keys

# Configure autoscale
az monitor autoscale create \
  --resource-group rg-prod \
  --resource vmss-web \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-web \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Scale out rule (CPU > 75%)
az monitor autoscale rule create \
  --resource-group rg-prod \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU > 75 avg 5m" \
  --scale out 1

# Scale in rule (CPU < 25%)
az monitor autoscale rule create \
  --resource-group rg-prod \
  --autoscale-name autoscale-web \
  --condition "Percentage CPU < 25 avg 5m" \
  --scale in 1
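The effect of the two rules and the min/max limits can be traced with a small simulation. This is a sketch of the decision logic only, using the thresholds and limits from the commands above; it ignores cooldown periods and the 5-minute averaging window, and the function name is an assumption.

```python
def next_instance_count(current: int, avg_cpu: float,
                        min_count: int = 2, max_count: int = 10) -> int:
    """Apply the scale-out (>75%) and scale-in (<25%) rules, then clamp to the limits."""
    if avg_cpu > 75:
        current += 1
    elif avg_cpu < 25:
        current -= 1
    return max(min_count, min(max_count, current))


count = 2
history = []
for cpu in [80, 90, 85, 50, 20, 10, 5]:
    count = next_instance_count(count, cpu)
    history.append(count)

print(history)  # [3, 4, 5, 5, 4, 3, 2]
```

The gap between 25% and 75% acts as a dead band: at 50% CPU nothing changes, which keeps the scale set from flapping between sizes.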

VMSS Rolling Upgrades

# Update VMSS image
az vmss update \
  --name vmss-web \
  --resource-group rg-prod \
  --set virtualMachineProfile.storageProfile.imageReference.version=latest

# Perform rolling upgrade
az vmss rolling-upgrade start \
  --name vmss-web \
  --resource-group rg-prod

# Monitor upgrade
az vmss rolling-upgrade get-latest \
  --name vmss-web \
  --resource-group rg-prod
Upgrade Policy:
  • Manual: You control when to upgrade
  • Rolling: Upgrade in batches (recommended)
  • Automatic: Upgrade immediately (risky)
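[!WARNING] Gotcha: VMSS Rolling Upgrades Can Cause Downtime If you don’t configure health probes correctly, a rolling upgrade might take healthy instances out of service before upgraded ones are ready. For VMSS, configure a load balancer health probe or the Application Health extension, and limit the batch size in the rolling upgrade policy (for example maxBatchInstancePercent and maxUnhealthyInstancePercent) so that too many instances are never down at once. The AKS equivalent is setting minAvailable in a PodDisruptionBudget.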
[!TIP] Jargon Alert: VM Scale Set (VMSS) A VM Scale Set is a group of identical VMs that automatically scale based on demand. Think of it like a restaurant: when it’s busy (high CPU), you hire more waiters (add VMs). When it’s slow (low CPU), you send waiters home (remove VMs). All waiters are identical (same VM image), so they can handle any table (request).

4. Azure App Service

What is App Service?
App Service is Azure’s Platform-as-a-Service (PaaS) offering for hosting web applications. Think of it as a “managed web server” where you just deploy your code, and Microsoft handles everything else: OS updates, scaling, load balancing, SSL certificates, and more.
Why Use App Service Instead of VMs?
| Aspect | App Service | Virtual Machines |
| --- | --- | --- |
| Setup Time | 5 minutes | 30+ minutes |
| OS Management | Microsoft handles | You manage |
| Scaling | Automatic (1-30 instances) | Manual or complex setup |
| SSL Certificates | Free (managed) | You install and renew |
| Deployment | Git push, ZIP, Docker | SSH, RDP, manual |
| Cost | $0-700/month | $30-2000+/month |
| Control | Limited (can’t install custom software) | Full control |
Real-World Analogy:
  • VM: Like renting an empty apartment. You furnish it, maintain it, fix everything yourself.
  • App Service: Like staying in a hotel. Everything is provided, you just check in and use it.
When to Use App Service:
✅ Modern web applications (Node.js, Python, .NET, PHP, Java)
✅ REST APIs
✅ Mobile app backends
✅ You want to focus on code, not infrastructure
✅ Need automatic scaling
✅ Want zero-downtime deployments

When NOT to Use App Service:
❌ Need to install custom software on the OS
❌ Need specific OS version (Windows Server 2012 R2)
❌ Legacy applications that require full VM control
❌ Need to run background services (use VMs or Container Instances)

Understanding App Service Architecture

Before diving in, let’s understand how App Service works:
Your Code (GitHub, Local)
    ↓
App Service (Azure)
    ├── Web Server (IIS for Windows, Nginx for Linux)
    ├── Runtime (Node.js, Python, .NET, etc.)
    ├── Auto-scaling (adds/removes instances)
    ├── Load Balancer (distributes traffic)
    └── SSL Termination (handles HTTPS)
    ↓
Users (Internet)

The Pro’s View: What’s inside an App Service?

When you scale an App Service to “3 instances”, what actually happens?
  1. The Front End (Load Balancer): This is a shared layer provided by Microsoft. It receives all traffic to *.azurewebsites.net. It terminates SSL and routes the request to your specific worker.
  2. The Worker (The Compute): This is your instance. This is where your code runs. If you have “3 instances”, you have 3 separate worker VMs (though you don’t manage them).
  3. The File Server (Shared Storage): This is the most important “secret”. Your code and files don’t live on the worker’s local disk; they live on a Managed Remote File Share.
    • When your code writes a file to the app’s content directory (for example under /home on Linux), it’s actually being written over the network to this share.
    • All 3 instances see the exact same files. This is why you don’t have to sync files between instances!
[!WARNING] Performance Gotcha: The File System is a Network Because the file system is remote, reading/writing thousands of small files (like a massive node_modules folder or a Local SQLite DB) can be slow. Solution: Use WEBSITE_RUN_FROM_PACKAGE=1. This mounts your entire app as a read-only ZIP file, which is cached locally on the worker for blazing-fast startups and file access.
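One practical consequence of running from a package is that the app folder becomes read-only, so code that writes scratch files next to itself will fail. The sketch below shows one way to pick a writable location instead. The helper names `isRunFromPackage` and `scratchDir` and the `my-web-app` subfolder are made up for illustration, and the sketch assumes the OS temp directory is local to the worker, which is typically the case.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: true when the app is configured to run from a package.
// WEBSITE_RUN_FROM_PACKAGE is "1" for an uploaded ZIP, or a URL for a remote ZIP.
function isRunFromPackage(env = process.env) {
  const value = env.WEBSITE_RUN_FROM_PACKAGE;
  if (typeof value !== 'string') return false;
  return value === '1' || /^https?:\/\//i.test(value);
}

// Hypothetical helper: choose a directory that is safe for temporary writes.
// With run-from-package the app directory is read-only, so fall back to the OS temp dir.
function scratchDir(env = process.env, appDir = process.cwd()) {
  return isRunFromPackage(env)
    ? path.join(os.tmpdir(), 'my-web-app')
    : path.join(appDir, 'tmp');
}
```

Files written to the temp directory are not shared between instances and do not survive a restart, so treat them as a cache only.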

Key Concepts:
  1. App Service Plan: The “hosting environment” that defines:
    • How much CPU/RAM you get
    • How many apps can run on it
    • What features are available (slots, VNet, etc.)
    • The cost
  2. Web App: Your actual application running on the plan. You can have multiple web apps on one plan (to save money).
  3. Deployment Slot: A separate instance of your app for testing. You can swap slots for zero-downtime deployments.

Step-by-Step: Creating Your First Web App

Let’s create a complete web application from scratch:

Step 1: Create App Service Plan

What is an App Service Plan? Think of it as the “hosting package” that defines the resources and features available.
# Create App Service Plan (Free tier for learning)
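az appservice plan create \
  --name plan-learn \
  --resource-group rg-learn-app \
  --sku FREE \
  --is-linux \
  --location eastus

# What this does:
# --name: Name of the plan (must be unique within the resource group)
# --is-linux: Creates a Linux plan, which the "NODE|18-lts" runtime in Step 2 expects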
# --sku FREE: Pricing tier (FREE, B1, S1, P1V2, etc.)
#   - FREE: $0/month, 1 GB RAM, 60 minutes/day compute
#   - B1: $55/month, 1.75 GB RAM, always on
#   - S1: $100/month, 1.75 GB RAM, autoscaling, slots
#   - P1V2: $400/month, 3.5 GB RAM, better performance
Understanding SKUs:
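| SKU | Price | RAM | Always On | Slots | Autoscale | Use Case |
| --- | --- | --- | --- | --- | --- | --- |
| FREE | $0 | 1 GB | ❌ | ❌ | ❌ | Learning only |
| B1 | $55 | 1.75 GB | ✅ | ❌ | ❌ | Dev/test |
| S1 | $100 | 1.75 GB | ✅ | ✅ (5) | ✅ (10) | Production |
| P1V2 | $400 | 3.5 GB | ✅ | ✅ (20) | ✅ (30) | High traffic |
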
Why create plan separately? You can host multiple web apps on one plan (saves money). Each app shares the plan’s resources.
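To make the SKU choice concrete, here is a small sketch that picks the cheapest tier meeting a set of requirements. The figures are copied from the SKU list above and are this chapter’s approximate prices, not live Azure pricing. The `SKUS` array and the `pickSku` function are hypothetical names used only for this example.

```javascript
// Figures taken from this chapter's SKU list (approximate, for illustration only).
// The array is ordered from cheapest to most expensive.
const SKUS = [
  { name: 'FREE', pricePerMonth: 0,   ramGb: 1,    alwaysOn: false, slots: 0,  autoscaleMax: 0 },
  { name: 'B1',   pricePerMonth: 55,  ramGb: 1.75, alwaysOn: true,  slots: 0,  autoscaleMax: 0 },
  { name: 'S1',   pricePerMonth: 100, ramGb: 1.75, alwaysOn: true,  slots: 5,  autoscaleMax: 10 },
  { name: 'P1V2', pricePerMonth: 400, ramGb: 3.5,  alwaysOn: true,  slots: 20, autoscaleMax: 30 },
];

// Hypothetical helper: return the cheapest SKU that satisfies the requirements,
// or null if nothing in the list fits.
function pickSku({ alwaysOn = false, slots = 0, autoscaleInstances = 0 } = {}) {
  const match = SKUS.find((sku) =>
    (!alwaysOn || sku.alwaysOn) &&
    sku.slots >= slots &&
    sku.autoscaleMax >= autoscaleInstances
  );
  return match || null;
}
```

For example, `pickSku({ slots: 1 })` returns the S1 entry, because S1 is the cheapest tier in the list that offers deployment slots.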

Step 2: Create Web App

# Create Web App
az webapp create \
  --name mywebapp-learn-$(date +%s) \
  --resource-group rg-learn-app \
  --plan plan-learn \
  --runtime "NODE|18-lts"

# What this does:
# --name: Name of web app (must be globally unique, like a domain)
#   - Format: <name>.azurewebsites.net
#   - Example: mywebapp-learn-1234567890.azurewebsites.net
# --runtime: Programming language and version
#   Options:
#     - "NODE|18-lts" (Node.js 18 LTS)
#     - "PYTHON|3.11" (Python 3.11)
#     - "DOTNETCORE|7.0" (.NET 7)
#     - "PHP|8.2" (PHP 8.2)
#     - "JAVA|17" (Java 17)
Why the unique name? The web app name becomes part of the URL (mywebapp-learn-1234567890.azurewebsites.net). It must be globally unique across all Azure customers.
[!WARNING] Gotcha: App Service Name Cannot Be Changed Once you create an App Service, the name is permanent. You can’t rename it. If you need a different name, you must create a new app and migrate. Choose your name carefully!
[!TIP] Jargon Alert: App Service Plan An App Service Plan is like a “hosting package” that defines:
  • How much CPU/RAM you get
  • How many apps can run on it (you can host multiple apps on one plan)
  • What features are available (slots, VNet, autoscaling)
  • The cost
Think of it like a gym membership: the plan determines what equipment (features) you can use.

Step 3: Create a Simple Application

Let’s create a simple Node.js application:
# Create project directory
mkdir my-web-app
cd my-web-app

# Initialize Node.js project
npm init -y

# Install Express (web framework)
npm install express

# Create app.js
cat > app.js << 'EOF'
const express = require('express');
const app = express();
const port = process.env.PORT || 3000;

app.get('/', (req, res) => {
  res.send(`
    <h1>Hello from Azure App Service!</h1>
    <p>This is my first web app on Azure.</p>
    <p>Node.js version: ${process.version}</p>
    <p>Environment: ${process.env.WEBSITE_SITE_NAME || 'local'}</p>
  `);
});

app.get('/api/health', (req, res) => {
  res.json({ status: 'healthy', timestamp: new Date().toISOString() });
});

app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});
EOF

# Overwrite package.json so it includes a start script (this replaces the file npm generated)
cat > package.json << 'EOF'
{
  "name": "my-web-app",
  "version": "1.0.0",
  "main": "app.js",
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "express": "^4.18.0"
  }
}
EOF
What this code does:
  • Creates a simple Express.js web server
  • Responds to GET requests at / (homepage)
  • Has a health check endpoint at /api/health
  • Uses process.env.PORT (Azure sets this automatically)
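A small caveat on `process.env.PORT`: the value is always a string, and on some hosting setups (for example Windows workers behind iisnode) it can be a named pipe path rather than a number. The sketch below handles both cases. `resolveListenTarget` is a hypothetical helper name, and 3000 is the same local fallback used in app.js.

```javascript
// Hypothetical helper: decide what to pass to app.listen().
// - unset or empty PORT  -> local fallback port
// - all-digit PORT       -> numeric port
// - anything else        -> returned unchanged (for example a named pipe path)
function resolveListenTarget(env = process.env, fallback = 3000) {
  const raw = env.PORT;
  if (raw === undefined || raw === '') return fallback;
  return /^\d+$/.test(raw) ? Number(raw) : raw;
}
```

In app.js this would replace the `port` constant: `const port = resolveListenTarget();`.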

Step 4: Deploy to App Service

Option A: Deploy from Local ZIP
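# Create ZIP file (node_modules is excluded, so App Service must install dependencies)
zip -r app.zip . -x "*.git*" "node_modules/*"

# Tell App Service to run npm install during ZIP deployment
az webapp config appsettings set \
  --resource-group rg-learn-app \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --settings SCM_DO_BUILD_DURING_DEPLOYMENT=true

# Deploy to App Service
az webapp deployment source config-zip \
  --resource-group rg-learn-app \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --src app.zip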
Option B: Deploy from GitHub (Recommended)
# First, push your code to GitHub
git init
git add .
git commit -m "Initial commit"
git remote add origin https://github.com/YOUR_USERNAME/my-web-app.git
git push -u origin main

# Configure App Service to deploy from GitHub
az webapp deployment source config \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app \
  --repo-url https://github.com/YOUR_USERNAME/my-web-app.git \
  --branch main \
  --manual-integration
What happens during deployment:
  1. Azure downloads your code from GitHub
  2. Runs npm install (installs dependencies)
  3. Looks for scripts.start in package.json
  4. Runs npm start (starts your app)
  5. Your app is live at https://mywebapp-learn-<NUMBER>.azurewebsites.net
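Because the deployment relies on `npm install` and `npm start`, a quick local check of package.json can catch the most common failures before you deploy. This is a minimal sketch and not part of any Azure tooling. `findDeployProblems` is a hypothetical helper name.

```javascript
// Hypothetical pre-deploy check: returns a list of human-readable problems.
// An empty list means the two things the deployment steps rely on are present.
function findDeployProblems(pkg) {
  if (!pkg || typeof pkg !== 'object') {
    return ['package.json is missing or is not an object'];
  }
  const problems = [];
  const start = pkg.scripts && pkg.scripts.start;
  if (typeof start !== 'string' || start.trim() === '') {
    problems.push('scripts.start is missing, so "npm start" has nothing to run');
  }
  if (!pkg.dependencies || Object.keys(pkg.dependencies).length === 0) {
    problems.push('no dependencies are listed, so "npm install" installs nothing');
  }
  return problems;
}
```

You could run it locally with `findDeployProblems(require('./package.json'))` and refuse to deploy if the returned list is not empty.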

Step 5: Access Your Web App

# Get the URL
az webapp show \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app \
  --query defaultHostName \
  --output tsv

# Visit in browser:
# https://mywebapp-learn-<NUMBER>.azurewebsites.net
What you’ll see:
  • Homepage: “Hello from Azure App Service!”
  • Health check: https://<URL>/api/health returns JSON
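If you would rather check the deployment from a script than from a browser, the sketch below calls the /api/health endpoint defined in app.js. It assumes Node.js 18 or later, where `fetch` is available globally. `healthUrl` and `isHealthy` are hypothetical helper names, and the `fetchImpl` parameter exists only so the function can be exercised without a network.

```javascript
// Build the health-check URL for a given App Service host name.
function healthUrl(hostName) {
  return `https://${hostName}/api/health`;
}

// Hypothetical smoke test: resolves to true only when the endpoint answers
// with a 2xx status and a JSON body of the form { status: 'healthy', ... }.
async function isHealthy(hostName, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(healthUrl(hostName));
  if (!response.ok) return false;
  const body = await response.json();
  return body.status === 'healthy';
}
```

Usage: `isHealthy('mywebapp-learn-<NUMBER>.azurewebsites.net').then(console.log)`, using the host name returned by the `az webapp show` command above.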

Step 6: View Logs

Real-time logs (see what your app is doing):
az webapp log tail \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app
Download logs:
az webapp log download \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app \
  --log-file app-logs.zip

Step 7: Configure Environment Variables

What are environment variables? Configuration values that change between environments (dev, staging, production).
# Set environment variable
az webapp config appsettings set \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app \
  --settings \
    DATABASE_URL="postgresql://user:pass@host:5432/db" \
    API_KEY="secret-key-123" \
    NODE_ENV="production"

# View environment variables
az webapp config appsettings list \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app
In your code, access them:
const dbUrl = process.env.DATABASE_URL;
const apiKey = process.env.API_KEY;
Best Practice: Never commit secrets to Git. Use environment variables or Azure Key Vault.
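A missing app setting usually shows up as a confusing runtime error much later. One option is to validate settings once at startup and fail fast. The sketch below assumes the three settings from the example above. `loadConfig` is a hypothetical helper name, and treating NODE_ENV as optional is a choice made for this example.

```javascript
// Hypothetical startup check: throws if a required setting is missing,
// otherwise returns a plain config object.
function loadConfig(env = process.env) {
  const required = ['DATABASE_URL', 'API_KEY'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required app settings: ${missing.join(', ')}`);
  }
  return {
    databaseUrl: env.DATABASE_URL,
    apiKey: env.API_KEY,
    nodeEnv: env.NODE_ENV || 'development', // optional, defaults for local runs
  };
}
```

Calling `loadConfig()` at the top of app.js makes a misconfigured deployment fail immediately, with the missing setting names visible in the log stream.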

Step 8: Enable Continuous Deployment

What is Continuous Deployment? Automatically deploy new code when you push to GitHub.
# Enable continuous deployment
# (omit --manual-integration so pushes trigger a sync; this needs a GitHub access token)
az webapp deployment source config \
  --name mywebapp-learn-<YOUR_NUMBER> \
  --resource-group rg-learn-app \
  --repo-url https://github.com/YOUR_USERNAME/my-web-app.git \
  --branch main \
  --git-token <YOUR_GITHUB_TOKEN>
How it works:
  1. You push code to GitHub
  2. App Service detects the push
  3. Automatically downloads and deploys new code
  4. Your app updates in 1-2 minutes
Workflow:
# Make a change
echo "console.log('New version!');" >> app.js

# Commit and push
git add app.js
git commit -m "Add logging"
git push

# App Service automatically deploys (check logs to see it)

Understanding App Service Features

Deployment Slots
What it is: Separate instances of your app for testing before going live.
How it works:
  1. Deploy new version to “staging” slot
  2. Test it thoroughly
  3. Swap staging ↔ production (instant, zero downtime)
  4. If issues, swap back (instant rollback)
Example:
# Create staging slot
az webapp deployment slot create \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging

# Deploy to staging
az webapp deployment source config-zip \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging \
  --src app-v2.zip

# Test staging URL: mywebapp-staging.azurewebsites.net

# Swap to production (zero downtime)
az webapp deployment slot swap \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging \
  --target-slot production
Benefits:
  • Test in production-like environment
  • Zero downtime deployments
  • Instant rollback if issues
  • Warm up app before swap (no cold start)
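A useful mental model: a swap exchanges which version each slot serves, which is why rollback is just a second swap. The sketch below is a toy model of that idea, not how App Service implements it. `swapSlots` is a hypothetical function name and the version labels are placeholders.

```javascript
// Toy model: each slot name maps to the app version it currently serves.
// Swapping returns a new object with the two versions exchanged.
function swapSlots(slots) {
  return { production: slots.staging, staging: slots.production };
}

const before = { production: 'v1', staging: 'v2' };
const afterSwap = swapSlots(before);        // production now serves v2
const afterRollback = swapSlots(afterSwap); // swapping again restores v1
```

Because the old version is still sitting in the staging slot after the first swap, nothing needs to be redeployed to roll back.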
[!INFO] Aside: App Service Free Tier Limitations The FREE tier is great for learning, but has serious limitations:
  • Apps “sleep” after 20 minutes of inactivity (takes 30+ seconds to wake up)
  • No custom domains
  • No SSL certificates
  • No deployment slots
  • No autoscaling
For production, use at least the Basic tier ($55/month). The FREE tier is only for learning/testing.
[!TIP] Jargon Alert: Deployment Slot A deployment slot is a separate instance of your app. Think of it like having two identical apartments—you can test new furniture (code) in one apartment before moving it to your main apartment. Slots enable zero-downtime deployments: deploy to staging slot, test it, then swap it with production instantly.

App Service Plans

| Tier | Price | Features | Use Case |
| --- | --- | --- | --- |
| Free | $0 | 1 GB RAM, 60 min/day | Learning |
| Shared | $10/month | 1 GB RAM, custom domain | Hobby projects |
| Basic | $55/month | 1.75 GB RAM, SSD | Dev/test |
| Standard | $100/month | Autoscale, slots, VNet | Production |
| Premium | $400/month | More scale, better perf | High-traffic |
| Isolated | $700/month | Dedicated VNet (ASE) | Enterprise |

Deployment Slots

Deployment Slots enable zero-downtime deployments.
# Create deployment slot
az webapp deployment slot create \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging

# Deploy to staging
az webapp deployment source config \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging \
  --repo-url https://github.com/user/repo \
  --branch main

# Swap slots (staging → production)
az webapp deployment slot swap \
  --name mywebapp \
  --resource-group rg-prod \
  --slot staging \
  --target-slot production

App Service Best Practices

Use deployment slots: Deploy to staging, test, then swap to production. Swap back for instant rollback if issues appear.
Enable Always On: Prevents the app from unloading after idle time. Critical for production.
az webapp config set \
  --name mywebapp \
  --resource-group rg-prod \
  --always-on true
Use VNet integration: Connect to private resources (databases, storage) without public endpoints.
az webapp vnet-integration add \
  --name mywebapp \
  --resource-group rg-prod \
  --vnet vnet-prod \
  --subnet snet-app
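Configure autoscaling: Scale based on CPU, memory, or custom metrics. Autoscale settings attach to the App Service Plan, not to the web app.
az monitor autoscale create \
  --resource-group rg-prod \
  --resource <your-plan-name> \
  --resource-type Microsoft.Web/serverfarms \
  --name autoscale-app \
  --min-count 2 \
  --max-count 10 \
  --count 2

# Add a scale-out rule (the setting above has no rules until you add one)
az monitor autoscale rule create \
  --resource-group rg-prod \
  --autoscale-name autoscale-app \
  --condition "CpuPercentage > 70 avg 5m" \
  --scale out 1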
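To see what an autoscale setting does with the min and max counts, here is a simplified model of a single scaling decision. It is a sketch of the general idea, not Azure’s actual algorithm. `desiredInstanceCount` is a hypothetical name, the 70% and 30% CPU thresholds are assumptions, and the 2 and 10 limits mirror the command above.

```javascript
// Simplified model of one autoscale evaluation:
// add an instance when CPU is high, remove one when CPU is low,
// and always stay within the configured min/max bounds.
function desiredInstanceCount(current, cpuPercent, options = {}) {
  const { min = 2, max = 10, scaleOutAbove = 70, scaleInBelow = 30 } = options;
  let next = current;
  if (cpuPercent > scaleOutAbove) {
    next = current + 1;
  } else if (cpuPercent < scaleInBelow) {
    next = current - 1;
  }
  return Math.min(max, Math.max(min, next));
}
```

Note that the bounds win over the rules: at 10 instances and 95% CPU the result stays at 10, and at 2 instances and 10% CPU it stays at 2.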
Use managed identity: No secrets in code. Authenticate to Azure services automatically.
# Enable managed identity
az webapp identity assign \
  --name mywebapp \
  --resource-group rg-prod

# Grant access to Key Vault
az keyvault set-policy \
  --name myvault \
  --object-id <identity-id> \
  --secret-permissions get list

5. Azure Container Instances (ACI)

ACI runs containers without managing VMs or orchestrators.

When to Use ACI

✅ Use ACI For

  • Quick container execution
  • CI/CD build agents
  • Batch jobs
  • Event-driven tasks
  • Dev/test environments

❌ Don't Use ACI For

  • Multi-container orchestration
  • Service discovery
  • Load balancing
  • Health checks

For these scenarios, use AKS instead.

Deploy Container

# Deploy single container
az container create \
  --name aci-demo \
  --resource-group rg-demo \
  --image mcr.microsoft.com/azuredocs/aci-helloworld \
  --cpu 1 \
  --memory 1 \
  --ip-address Public \
  --dns-name-label aci-demo-unique \
  --ports 80

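# Deploy a single container with an environment variable
# (true multi-container groups, such as sidecars, need a YAML file or ARM template via --file)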
az container create \
  --resource-group rg-demo \
  --name multi-container \
  --image nginx \
  --cpu 1 \
  --memory 1 \
  --ports 80 \
  --environment-variables LOG_LEVEL=debug

# Get logs
az container logs \
  --name aci-demo \
  --resource-group rg-demo

# Execute command in container
az container exec \
  --name aci-demo \
  --resource-group rg-demo \
  --exec-command "/bin/sh"

6. Interview Questions

Beginner

App Service vs. Virtual Machines:
App Service (PaaS):
  • Less management (Microsoft handles OS, patching)
  • Built-in autoscaling, deployment slots
  • Faster time-to-market
  • Cost-effective for web apps
Virtual Machines (IaaS):
  • Full control over OS and software
  • Custom configurations
  • Legacy applications
  • Specific compliance requirements
Decision: Use App Service unless you need full OS control.
Availability Sets vs. Availability Zones:
Availability Sets:
  • Protect against hardware failures within a datacenter
  • Fault domains (different racks) + Update domains (staggered updates)
  • SLA: 99.95%
Availability Zones:
  • Protect against entire datacenter failures
  • Physically separate datacenters (separate power, cooling, network)
  • SLA: 99.99%
Best Practice: Use availability zones for production workloads.
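The gap between 99.95% and 99.99% looks small until you convert it to minutes. The sketch below does that arithmetic. `allowedDowntimeMinutes` is a hypothetical helper name, and the 30-day month is a simplifying assumption.

```javascript
// Convert an SLA percentage into the downtime it permits over a period.
// Example: 30 days = 43,200 minutes, so a 99.95% SLA allows about 21.6 minutes.
function allowedDowntimeMinutes(slaPercent, days = 30) {
  const totalMinutes = days * 24 * 60;
  return totalMinutes * (1 - slaPercent / 100);
}

const setsDowntime = allowedDowntimeMinutes(99.95);  // about 21.6 minutes per 30 days
const zonesDowntime = allowedDowntimeMinutes(99.99); // about 4.32 minutes per 30 days
```

In other words, the zone SLA permits roughly one fifth of the downtime that the availability set SLA does.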

Intermediate

Designing a scalable web application:
Architecture:

Frontend:
- Azure Front Door (global load balancing, WAF)
- App Service (autoscale 2-20 instances)
- Deployment slots (blue-green deployments)

Backend:
- VMSS or AKS (for microservices)
- Autoscaling based on CPU/memory
- Load balancer (internal)

Data:
- Azure SQL (zone-redundant)
- Redis Cache (session management)
- Blob Storage (static assets)

Monitoring:
- Application Insights (APM)
- Log Analytics (centralized logs)
- Autoscale based on custom metrics

CI/CD:
- GitHub Actions or Azure DevOps
- Deploy to staging slot → test → swap

Cost Optimization:
- Use B-series VMs for dev/test
- Reserved Instances for production
- Autoscale to match demand
Reducing VM costs:
Strategies:
1. Right-size VMs:
   - Monitor CPU/memory usage
   - Downsize underutilized VMs
   - Use Azure Advisor recommendations

2. Use Reserved Instances:
   - 1-year: 30-50% savings
   - 3-year: 50-70% savings
   - For stable, long-running workloads

3. Spot VMs:
   - Up to 90% discount
   - For fault-tolerant workloads (batch, testing)

4. Stop VMs when not in use:
   - Dev/test: Stop nights and weekends
   - Use Azure Automation for scheduling

5. Use B-series (Burstable):
   - For variable workloads
   - Accumulate credits when idle

6. Azure Hybrid Benefit:
   - Use existing Windows licenses
   - Up to 40% savings

7. Delete unused resources:
   - Unattached disks
   - Old snapshots
   - Orphaned NICs and public IPs

8. Use autoscaling:
   - Scale down during low traffic
   - Scale up during high traffic
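As a rough worked example of strategy 4, the sketch below estimates how much compute time you avoid by running a dev/test VM only during working hours. `scheduledStopSavings` is a hypothetical helper name, and the 12 hours by 5 days schedule is an assumed example. The saving applies to compute charges only and only when the VM is deallocated, since disks are still billed while it is stopped.

```javascript
const HOURS_PER_WEEK = 24 * 7; // 168

// Fraction of weekly compute hours avoided by running only on a schedule.
function scheduledStopSavings(hoursPerDay, daysPerWeek) {
  const runningHours = hoursPerDay * daysPerWeek;
  return 1 - runningHours / HOURS_PER_WEEK;
}

// 12 hours a day, 5 days a week = 60 of 168 hours, so about 64% of compute hours avoided.
const devTestSavings = scheduledStopSavings(12, 5);
```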

Advanced

Blue-Green Deployment with App Service:

1. Setup:
   Production slot (blue): Currently serving traffic
   Staging slot (green): New version

2. Deploy to Green:
   az webapp deployment source config \
     --name mywebapp \
     --resource-group rg-prod \
     --slot staging \
     --repo-url https://github.com/user/repo \
     --branch release/v2.0

3. Test Green:
   - Access staging URL: mywebapp-staging.azurewebsites.net
   - Run smoke tests, integration tests
   - Verify database migrations

4. Warm Up Green:
   az webapp deployment slot swap \
     --name mywebapp \
     --resource-group rg-prod \
     --slot staging \
     --target-slot production \
     --action preview

   # App Service warms up staging before swap

5. Swap (Zero Downtime):
   az webapp deployment slot swap \
     --name mywebapp \
     --resource-group rg-prod \
     --slot staging \
     --target-slot production

   # Traffic instantly switches to green
   # No connection drops

6. Rollback (if needed):
   az webapp deployment slot swap \
     --name mywebapp \
     --resource-group rg-prod \
     --slot staging \
     --target-slot production

   # Instant rollback (just swap again)

Benefits:
✅ Zero downtime
✅ Instant rollback
✅ Test in production-like environment
✅ No infrastructure changes
SQL Server on Azure VM Optimization:

1. Choose Right VM Size:
   - Memory-optimized: E-series (8:1 memory:CPU)
   - Example: Standard_E16s_v5 (16 vCPU, 128 GB RAM)

2. Storage Configuration:
   - OS Disk: Premium SSD P30 (ReadWrite cache)
   - Data Files: Premium SSD P40+ (ReadOnly cache)
   - Log Files: Premium SSD P30 (None cache)
   - TempDB: Local NVMe SSD

3. Disk Striping:
   # Windows Storage Spaces (RAID 0)
   - Stripe 4x P30 disks → 20,000 IOPS
   - Better than 1x P80 (same IOPS, more expensive)

4. SQL Server Configuration:
   - Max Server Memory: 80% of VM RAM
   - TempDB on local SSD (D: drive)
   - Multiple data files (8 files for TempDB)
   - Instant File Initialization: Enabled

5. Network Optimization:
   - Enable Accelerated Networking
   - Private Endpoint for Azure SQL connectivity
   - No public IPs

6. Backup Strategy:
   - Azure Backup (application-consistent)
   - Backup to Blob Storage (cool tier)
   - Retention: 7 days (daily), 4 weeks (weekly)

7. Monitoring:
   - Azure Monitor for VMs
   - SQL Insights (database metrics)
   - Alert on CPU > 80%, Memory > 85%

Result:
- 20,000+ IOPS
- <1ms latency (local SSD for TempDB)
- 99.99% availability SLA (two or more VMs across availability zones)
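Two of the numbers above are simple arithmetic, sketched here so you can plug in your own VM size. The helper names `stripedIops` and `maxServerMemoryMb` are hypothetical. The 5,000 IOPS per P30 disk is the figure implied by the “4x P30 → 20,000 IOPS” line, and the 80% rule comes from the configuration list. The VM size itself may impose a lower IOPS cap, which this sketch ignores.

```javascript
// Striping (RAID 0) adds up the IOPS of the member disks.
function stripedIops(diskCount, iopsPerDisk) {
  return diskCount * iopsPerDisk;
}

// SQL Server's "max server memory" is set in MB; this applies the 80% guideline.
function maxServerMemoryMb(vmRamGb, fraction = 0.8) {
  return Math.floor(vmRamGb * 1024 * fraction);
}

const dataPoolIops = stripedIops(4, 5000); // 4 x P30 = 20,000 IOPS
const sqlMemoryMb = maxServerMemoryMb(128); // Standard_E16s_v5 with 128 GB RAM
```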

Troubleshooting: When Compute Fails

Production environments aren’t perfect. Here is how to debug the two most common compute services.

1. Virtual Machine: “VM Not Responding”

If you can’t SSH/RDP into your VM, follow this triage:
  • Resource Health: Check “Resource Health” in the portal. If it says “Platform Initiated”, Microsoft is currently moving your VM due to hardware failure. Wait 5 minutes.
  • Serial Console: Use the Serial Console tool. This gives you a direct command-line view of the VM’s boot process, even if the network is down.
  • Boot Diagnostics: Check the screenshot in “Boot Diagnostics”. See an “Update” screen or a “Blue Screen of Death” (BSOD)?
  • Redeploy: As a last resort, click Redeploy. This forces the Fabric Controller to move your VM to a completely different physical host.

2. App Service: “503 Service Unavailable”

If your website is down:
  • Diagnose and Solve Problems: Use this built-in tool in the App Service portal. It’s surprisingly good at detecting things like “High Memory Usage” or “IP Restrictions”.
  • Log Stream: Check the Live Log Stream. Are you seeing “Out of Memory” (OOM) errors?
  • Kudu Console: Go to https://<appname>.scm.azurewebsites.net. This is the “Kudu” management site. You can browse files, check processes, and run commands directly on the worker.
  • Restart (Advanced): Don’t just restart the App. Restart the App Service Plan. This recycles all workers and can clear “Zombie Processes” that a simple app restart misses.

7. Key Takeaways

Choose the Right Compute

VMs for control, App Service for simplicity, AKS for microservices, Functions for events.

Use Availability Zones

Deploy across zones for 99.99% SLA. Critical for production.

Autoscaling is Essential

Scale based on demand. Save money during low traffic, handle spikes automatically.

Managed Identities

No secrets in code. Every compute service supports managed identity.

Cost Optimization

Right-size, use reserved instances, stop when not needed, leverage spot VMs.

Deployment Slots

Zero-downtime deployments with instant rollback. Use for all production apps.

Next Steps

Continue to Chapter 5

Master Azure Storage: Blob, Files, Disks, and data management strategies