Theoretical Foundation: Deploy Virtual Machines to Availability Zones and Availability Sets
1. Initial Intuition
Imagine you're responsible for maintaining three ATMs in a city. If you place all three in the same bank branch, any problem at that branch (power outage, flooding, maintenance) makes all three unavailable. The obvious solution is to distribute the ATMs across different locations: if one location fails, the other two continue working.
In Azure, the same principle applies. When you have multiple VMs serving the same application, you need to ensure they don't all end up on the same physical hardware or in the same datacenter. If they do, a single failure can bring everything down simultaneously.
Availability Sets and Availability Zones are Azure's two mechanisms to intelligently distribute VMs, ensuring that physical failures or planned maintenance don't bring down your entire application at once.
The essential difference between the two lies in the scale of failure they protect against:
- Availability Sets protect against hardware failures within a single datacenter
- Availability Zones protect against failures of entire datacenters within a region
2. Context
The problem these mechanisms solve
Important: neither mechanism protects against application-level failures or operating system errors. Both protect specifically against physical infrastructure failures and planned platform maintenance.
SLA (Service Level Agreement) and its dependency
A single-instance VM in Azure, without a zone or availability set, has a 99.9% SLA, provided all of its OS and data disks are Premium SSD or Ultra Disk. With the right mechanisms, the SLA improves:
| Configuration | SLA |
|---|---|
| Single VM without AZ or AS | 99.9% |
| 2+ VMs in Availability Set | 99.95% |
| 2+ VMs in different Availability Zones | 99.99% |
The difference between 99.95% and 99.99% may seem small, but it is the difference between up to about 4.4 hours of downtime per year and only about 53 minutes per year.
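The downtime figures follow directly from the SLA percentages. The sketch below is only illustrative arithmetic: the function name `max_downtime_hours` is hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def max_downtime_hours(sla_percent: float) -> float:
    """Return the maximum yearly downtime, in hours, that an SLA still allows."""
    return (1 - sla_percent / 100) * HOURS_PER_YEAR


# SLA values taken from the table above.
for label, sla in [
    ("Single VM", 99.9),
    ("Availability Set", 99.95),
    ("Availability Zones", 99.99),
]:
    hours = max_downtime_hours(sla)
    print(f"{label}: {hours:.2f} h/year ({hours * 60:.0f} min)")
```

The same function can be reused for any other SLA figure, for example when comparing composite architectures.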
3. Building the Concepts
3.1 Availability Sets: protection within the datacenter
An Availability Set is a logical grouping of VMs that instructs Azure to distribute these VMs across separate physical hardware within a single datacenter, using two concepts: Fault Domains and Update Domains.
Fault Domains (FDs)
A Fault Domain is a group of hardware that shares a common power source and network switch. It's essentially a physical rack.
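For example, with four VMs spread over three Fault Domains, where VM1 and VM4 share Rack A (FD0), a power loss on Rack A affects only VM1 and VM4. VM2 and VM3 continue operating. An Availability Set can have at most 3 Fault Domains, and some regions allow only 2.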
Update Domains (UDs)
An Update Domain represents a group of VMs that can be restarted simultaneously during a planned host update (hypervisor maintenance, platform updates).
Azure updates one Update Domain at a time and gives each rebooted domain 30 minutes to recover before starting the next. With 5 VMs distributed across 5 Update Domains, no more than 1 VM restarts at a time during maintenance. The maximum number of Update Domains is 20.
How FDs and UDs interact
Azure automatically distributes VMs across FDs and UDs when adding VMs to the Availability Set. You don't choose which FD or UD each VM goes to.
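To build intuition for how the two domain types combine, the sketch below assigns VMs to FDs and UDs in round-robin order. Round-robin is an assumption made for illustration only; Azure decides the real placement, and `place_vms` is a hypothetical helper, not an Azure API.

```python
from collections import Counter


def place_vms(vm_count: int, fault_domains: int = 3, update_domains: int = 5):
    """Assign each VM index to an (FD, UD) pair in round-robin order.

    Illustrative assumption only: the real placement is chosen by Azure.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


placement = place_vms(5)
for vm, (fd, ud) in enumerate(placement, start=1):
    print(f"VM{vm}: FD{fd}, UD{ud}")

# Worst case: how many VMs are lost if one rack fails or one UD reboots.
worst_fd = max(Counter(fd for fd, _ in placement).values())
worst_ud = max(Counter(ud for _, ud in placement).values())
print(f"Largest FD holds {worst_fd} VMs; largest UD holds {worst_ud} VM(s)")
```

With 5 VMs, 3 FDs, and 5 UDs, a rack failure takes out at most 2 VMs in this model, while a maintenance step restarts only 1.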
3.2 Availability Zones: protection between datacenters
An Availability Zone is a physically separate location within an Azure region, made up of one or more datacenters. Each zone has its own power, cooling, and network, and the zones are connected to each other by low-latency fiber.
If the Zone 1 datacenter has a total failure (building power outage, for example), VM-Web-2 and VM-Web-3 continue operating in Zones 2 and 3.
Regions with Availability Zones support: Not all Azure regions have Availability Zones. Larger regions such as East US, West Europe, Southeast Asia, and Brazil South support them. Check before architecting.
3.3 Flexible Orchestration and Virtual Machine Scale Sets
In addition to traditional Availability Sets, Azure offers Virtual Machine Scale Sets (VMSS) with two orchestration modes:
Uniform Orchestration: all VMs are identical, created from the same template. Focus on automatic scaling of identical instances.
Flexible Orchestration: allows heterogeneous VMs and works as the modern functional equivalent of an Availability Set, with zone support.
For AZ-104, the focus is on Availability Sets and Availability Zones with individual VMs.
4. Structural View
Structural comparison: Availability Set vs. Availability Zone
Reference architecture: 3-tier application with high availability
5. Practical Operation
Availability Set lifecycle
Non-obvious behaviors
A VM cannot be added to an Availability Set after creation. The Availability Set is defined when creating the VM. If you want to put an existing VM in an Availability Set, you need to delete and recreate the VM. Disks can be preserved.
Availability Set and Availability Zone are mutually exclusive. It's not possible to place a VM simultaneously in an Availability Set and in a specific Availability Zone. They are alternative mechanisms. To use Availability Zones, you specify the zone when creating the VM, without an Availability Set.
VMs in different Availability Zones see a small added network latency, typically under about 2 ms round-trip. Zones are different datacenters connected by fiber optic. The latency between zones in the same region is low enough for most applications, but extremely latency-sensitive applications (high-frequency trading) should take it into account.
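Managed disks in Availability Zones are zone-scoped. A VM in Zone 1 can only use locally redundant (LRS) managed disks that are also in Zone 1; zone-redundant (ZRS) disks are the exception because they are replicated across zones. When creating a VM in a specific zone, its disks are automatically created in the same zone.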
Availability Set does not guarantee distribution across zones. An Availability Set protects against hardware failures within a datacenter. If the entire datacenter fails, all VMs in the Availability Set are affected. For datacenter-level protection, use Availability Zones.
Planned maintenance is announced in advance. Azure sends notifications before planned maintenance that will affect VMs. With Update Domains, maintenance is staggered, but every UD still goes through maintenance in sequence.
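Two of the behaviors above (Availability Set and zone being mutually exclusive, and zonal disks having to match the VM's zone) are easy to check before deployment. The following is a minimal sketch of such a pre-flight check; `VmPlan` and `validate` are hypothetical names, and the model covers only zonal (LRS) disks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VmPlan:
    """Hypothetical description of a VM before it is deployed."""
    name: str
    availability_set: Optional[str] = None
    zone: Optional[str] = None
    disk_zones: List[Optional[str]] = field(default_factory=list)


def validate(plan: VmPlan) -> List[str]:
    """Return rule violations; an empty list means the plan is consistent."""
    problems = []
    if plan.availability_set and plan.zone:
        problems.append("availability set and zone are mutually exclusive")
    for disk_zone in plan.disk_zones:
        if disk_zone != plan.zone:
            problems.append(
                f"disk in zone {disk_zone} cannot attach to VM in zone {plan.zone}"
            )
    return problems


print(validate(VmPlan("vm-web-z1", zone="1", disk_zones=["1"])))
print(validate(VmPlan("vm-bad", availability_set="as-web-tier", zone="1")))
print(validate(VmPlan("vm-web-z2", zone="2", disk_zones=["1"])))
```

A check like this could run in a pipeline before any template is submitted, catching the conflict earlier than the Azure deployment error would.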
6. Implementation Methods
Azure Portal
To create Availability Set:
- Portal > Availability sets > + Create
- Define name, region, subscription, RG
- Configure: Fault domains (1-3) and Update domains (1-20)
- Select: Use managed disks (Yes - Aligned recommended)
- Create
To create VM in Availability Set:
- Portal > Virtual machines > + Create
- Availability tab > Availability set
- Select existing Availability Set
- Complete creation normally
To create VM in Availability Zone:
- Portal > Virtual machines > + Create
- Availability tab > Availability zone
- Select Zone 1, Zone 2 or Zone 3
- Complete creation normally
Azure CLI
# Create Availability Set with explicit fault and update domain counts
# (the maximum fault domain count is 2 or 3 depending on the region)
az vm availability-set create \
--resource-group "rg-producao" \
--name "as-web-tier" \
--location "brazilsouth" \
--platform-fault-domain-count 3 \
--platform-update-domain-count 5
# View Availability Set details
az vm availability-set show \
--resource-group "rg-producao" \
--name "as-web-tier" \
--output json
# Create VM within an Availability Set
az vm create \
--resource-group "rg-producao" \
--name "vm-web-01" \
--availability-set "as-web-tier" \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub
# Create second VM in the same Availability Set
az vm create \
--resource-group "rg-producao" \
--name "vm-web-02" \
--availability-set "as-web-tier" \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub
# Create third VM in the same Availability Set
az vm create \
--resource-group "rg-producao" \
--name "vm-web-03" \
--availability-set "as-web-tier" \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub
# Check which FD and UD a VM was assigned to (reported in the instance view)
az vm get-instance-view \
--resource-group "rg-producao" \
--name "vm-web-01" \
--query "{Name: name, FD: instanceView.platformFaultDomain, UD: instanceView.platformUpdateDomain}" \
--output json
# Create VM in specific Availability Zone
az vm create \
--resource-group "rg-producao" \
--name "vm-web-z1" \
--zone 1 \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub \
--location "brazilsouth"
# Create VM in Zone 2
az vm create \
--resource-group "rg-producao" \
--name "vm-web-z2" \
--zone 2 \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub \
--location "brazilsouth"
# Create VM in Zone 3
az vm create \
--resource-group "rg-producao" \
--name "vm-web-z3" \
--zone 3 \
--image "Ubuntu2204" \
--size "Standard_D2s_v5" \
--admin-username "azureadmin" \
--ssh-key-values ~/.ssh/id_rsa.pub \
--location "brazilsouth"
# List VMs with their zones
az vm list \
--resource-group "rg-producao" \
--query "[].{Name: name, Zone: zones[0], Size: hardwareProfile.vmSize}" \
--output table
# Check which VM sizes are offered in Availability Zones in a given region
az vm list-skus \
--location "brazilsouth" \
--resource-type virtualMachines \
--zone \
--output table
# Check regions that support zones (including metadata)
az provider show \
--namespace Microsoft.Compute \
--query "resourceTypes[?resourceType=='virtualMachines'].zoneMappings[].location" \
--output table
Azure PowerShell
# Create Availability Set
New-AzAvailabilitySet `
-ResourceGroupName "rg-producao" `
-Name "as-web-tier" `
-Location "brazilsouth" `
-PlatformFaultDomainCount 3 `
-PlatformUpdateDomainCount 5 `
-Sku "Aligned" # Aligned = Managed Disks
# Create VM in Availability Set
$avSet = Get-AzAvailabilitySet -ResourceGroupName "rg-producao" -Name "as-web-tier"
$cred = Get-Credential -Message "Admin credentials"
$vmConfig = New-AzVMConfig -VMName "vm-web-01" -VMSize "Standard_D2s_v5" `
-AvailabilitySetId $avSet.Id
$vmConfig = Set-AzVMOperatingSystem -VM $vmConfig -Linux -ComputerName "vm-web-01" -Credential $cred
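# Ubuntu 22.04 LTS, the same release as the "Ubuntu2204" alias used in the CLI examples
$vmConfig = Set-AzVMSourceImage -VM $vmConfig -PublisherName "Canonical" -Offer "0001-com-ubuntu-server-jammy" -Skus "22_04-lts-gen2" -Version "latest"
# $nicId must hold the resource ID of an existing network interface
$vmConfig = Add-AzVMNetworkInterface -VM $vmConfig -Id $nicId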
New-AzVM -ResourceGroupName "rg-producao" -Location "brazilsouth" -VM $vmConfig
# Create VM in Availability Zone
$vmConfig = New-AzVMConfig -VMName "vm-web-z1" -VMSize "Standard_D2s_v5" -Zone "1"
# ... complete configuration
New-AzVM -ResourceGroupName "rg-producao" -Location "brazilsouth" -VM $vmConfig -Zone "1"
# View VM distribution in FDs and UDs
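# FD and UD are part of the instance view, so each VM is queried with -Status
Get-AzVM -ResourceGroupName "rg-producao" | ForEach-Object {
$status = Get-AzVM -ResourceGroupName $_.ResourceGroupName -Name $_.Name -Status
[PSCustomObject]@{
Name = $_.Name
Zone = $_.Zones | Select-Object -First 1
FD = $status.PlatformFaultDomain
UD = $status.PlatformUpdateDomain
}
} | Format-Table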
Bicep
// Availability Set
resource availabilitySet 'Microsoft.Compute/availabilitySets@2023-03-01' = {
name: 'as-web-tier'
location: 'brazilsouth'
sku: {
name: 'Aligned' // Aligned = Managed Disks
}
properties: {
platformFaultDomainCount: 3
platformUpdateDomainCount: 5
}
}
// VM in Availability Set
resource vmInAvSet 'Microsoft.Compute/virtualMachines@2023-03-01' = {
name: 'vm-web-01'
location: 'brazilsouth'
properties: {
availabilitySet: {
id: availabilitySet.id
}
hardwareProfile: {
vmSize: 'Standard_D2s_v5'
}
// ... rest of properties
}
}
// VM in Availability Zone
resource vmInZone 'Microsoft.Compute/virtualMachines@2023-03-01' = {
name: 'vm-web-z1'
location: 'brazilsouth'
zones: ['1'] // Zone 1
properties: {
hardwareProfile: {
vmSize: 'Standard_D2s_v5'
}
// ... rest of properties
}
}
// Disk in Availability Zone (must be same zone as VM)
resource diskInZone 'Microsoft.Compute/disks@2022-07-02' = {
name: 'vm-web-z1-datadisk'
location: 'brazilsouth'
zones: ['1'] // Same zone as VM
sku: {
name: 'Premium_LRS'
}
properties: {
creationData: {
createOption: 'Empty'
}
diskSizeGB: 128
}
}
7. Control and Security
Azure Policy to enforce high availability
# Audit VMs without Availability Zone or Availability Set
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| where isnull(zones) and isnull(properties.availabilitySet)
| project name, resourceGroup, location, subscriptionId
| order by location"
# Policy that audits VMs without HA mechanism
# Create custom policy that checks if VM has zone or AS
az policy definition create \
--name "audit-vm-ha-config" \
--display-name "VMs must have Availability Zone or Availability Set" \
--rules '{
"if": {
"allOf": [
{"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
{"field": "zones", "exists": "false"},
{"field": "Microsoft.Compute/virtualMachines/availabilitySet.id", "exists": "false"}
]
},
"then": {"effect": "Audit"}
}' \
--mode "All"
Network considerations for high availability
Load Balancers and Application Gateways also need to be configured for high availability:
- Standard Load Balancer is compatible with Availability Zones and can be configured as zone-redundant
- Basic Load Balancer does not support Availability Zones (use Standard)
- For VMs spread across Availability Zones, give the Load Balancer a zone-redundant frontend so that it keeps serving traffic if a single zone fails
8. Decision Making
Availability Set vs. Availability Zone
| Situation | Choice | Reason |
|---|---|---|
| Region without AZ support (e.g., some smaller regions) | Availability Set | Only available option |
| Maximum SLA of 99.99% required | Availability Zone | Only way to achieve this SLA |
| Critical application in region with AZ support | Availability Zone | Protection against complete datacenter failure |
| Cost is a constraint and the region has AZ support | Availability Zone | Same cost as single VM, better SLA than AS |
| Already have VMs in AS and moving is costly | Keep AS | Migration to AZ requires recreating VMs |
| Database with Always On Availability Group | VMs in different zones | Replicas in separate zones survive a datacenter failure |
| Stateless application with many instances | Availability Zone | Distributes VMs across 3 separate datacenters |
Number of instances and FDs/UDs
| Number of VMs | Recommended FDs | Recommended UDs | Reason |
|---|---|---|---|
| 2 VMs | 2 | 2 | Ensures the 2 are in different FDs |
| 3 VMs | 3 | 3 | One per FD, one per UD |
| 5 VMs | 3 | 5 | Optimal distribution for staged maintenance |
| 10 VMs | 3 | 10 | 3 FDs maximum, 10 UDs for granular staged maintenance |
| 20+ VMs | 3 | 20 | Azure maximums |
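The table follows a simple pattern: one domain per VM until the platform maximum is reached. A minimal sketch of that rule, with `recommended_domains` as a hypothetical helper and the maximums taken from this document:

```python
from typing import Tuple

MAX_FAULT_DOMAINS = 3    # documented maximum; some regions allow only 2
MAX_UPDATE_DOMAINS = 20  # documented maximum


def recommended_domains(vm_count: int) -> Tuple[int, int]:
    """Return (fault_domains, update_domains) for a given number of VMs."""
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    return (
        min(vm_count, MAX_FAULT_DOMAINS),
        min(vm_count, MAX_UPDATE_DOMAINS),
    )


for n in (2, 3, 5, 10, 25):
    fds, uds = recommended_domains(n)
    print(f"{n} VMs -> {fds} FDs, {uds} UDs")
```

In a region that supports only 2 Fault Domains, the first constant would need to be lowered accordingly.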
9. Best Practices
Prefer Availability Zones over Availability Sets for new architectures. Availability Zones offer superior protection (datacenter level vs. rack level) at the same cost. Availability Sets exist primarily for compatibility with regions without Zone support and for legacy workloads.
Distribute VMs from each tier across all available zones. For an application with 3 web server instances, place one in each zone (Z1, Z2, Z3). Don't place 2 in Z1 and 1 in Z2, because if Z1 fails you lose 2/3 of capacity.
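The effect of an unbalanced layout can be quantified by checking how much capacity survives the worst single-zone failure. The sketch below assumes identical VMs that each carry an equal share of the load; both function names are hypothetical.

```python
from typing import List


def surviving_capacity(vm_zones: List[str], failed_zone: str) -> float:
    """Fraction of VMs still running after one zone fails."""
    survivors = [zone for zone in vm_zones if zone != failed_zone]
    return len(survivors) / len(vm_zones)


def worst_case_capacity(vm_zones: List[str]) -> float:
    """Smallest surviving fraction over every possible single-zone failure."""
    return min(surviving_capacity(vm_zones, zone) for zone in set(vm_zones))


balanced = ["1", "2", "3"]    # one VM per zone
unbalanced = ["1", "1", "2"]  # two VMs in Z1, one in Z2

print(f"Balanced worst case:   {worst_case_capacity(balanced):.0%}")
print(f"Unbalanced worst case: {worst_case_capacity(unbalanced):.0%}")
```

The balanced layout keeps two thirds of capacity in the worst case, while the unbalanced one can drop to one third.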
Use Standard Load Balancer, not Basic, for architectures with zones. The Standard LB can be configured as zone-redundant, ensuring load balancing continues even if a zone fails. The Basic LB doesn't support Availability Zones.
Configure Azure Monitor health alerts for VMs in HA. Even with multiple instances, monitor the health of each one. A failed instance can go unnoticed if others are absorbing the load, creating an "available but vulnerable" scenario.
Use the Aligned SKU for Availability Sets with Managed Disks. The Aligned SKU ensures that a VM's managed disks are in the same Fault Domain as the VM. With the Classic SKU (legacy), disks and VMs can be in different FDs, compromising failure protection.
Plan maintenance based on Update Domains. During scheduled maintenance windows, know how many UDs you have and remember that Azure gives each UD 30 minutes to recover before starting the next. An AS with 5 UDs and 5 VMs therefore spends at least ~2 hours in recovery waits alone (4 intervals of 30 minutes), plus the time needed to update each UD.
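The maintenance estimate can be reproduced with a one-line calculation. This is a lower bound under the assumption that the 30-minute recovery wait applies between consecutive UDs and that the update itself takes no time; `minimum_maintenance_minutes` is a hypothetical helper.

```python
def minimum_maintenance_minutes(update_domains: int, recovery_minutes: int = 30) -> int:
    """Lower bound on a maintenance run: one recovery wait between consecutive UDs."""
    return max(update_domains - 1, 0) * recovery_minutes


for uds in (1, 5, 20):
    minutes = minimum_maintenance_minutes(uds)
    print(f"{uds} UDs -> at least {minutes} min ({minutes / 60:.1f} h)")
```

More Update Domains mean fewer VMs restart at once, at the price of a longer overall maintenance run.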
10. Common Errors
| Error | Why it happens | How to avoid |
|---|---|---|
| VM added to AS after creation | Not knowing AS is immutable post-creation | Plan and create with AS from the start |
| VM in AS and Zone at the same time | Mutually exclusive | Choose one or the other at creation |
| All 3 VMs in the same FD due to lack of AS | VMs created individually without AS or Zone | Always use AS or Zone in production |
| AS with Classic SKU and Managed Disks | Incompatible configuration | Use Aligned SKU for AS with Managed Disks |
| Using Basic LB with VMs in Availability Zones | Basic LB doesn't support AZ | Always use Standard LB |
| One zone with 2 VMs and another with 1 | Unbalanced distribution | Distribute equally: 1 per zone |
| Not checking AZ support in region before architecting | Assuming all regions have AZ | Check support before defining architecture |
| Disk in Zone 1, VM in Zone 2 | Attempted cross-zone usage | Disks must be in the same zone as the VM |
The most critical error
Not using any HA mechanism on production VMs, relying only on the 99.9% SLA of a single VM. Mathematically, this means up to about 8.76 hours of downtime per year. For an application with 5 VMs without any distribution mechanism, if they're all in the same rack and the rack fails, the application goes completely down. The cost of adding Availability Zones is zero: VMs in Availability Zones cost the same as VMs without zones.
11. Operation and Maintenance
Check VM distribution in FDs and UDs
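# View distribution of all VMs in an Availability Set
# (FD and UD are reported in the instance view, not in the plain VM model)
for vm in $(az vm list --resource-group "rg-producao" \
--query "[?availabilitySet].name" --output tsv); do
az vm get-instance-view \
--resource-group "rg-producao" \
--name "$vm" \
--query "{Name: name, FD: instanceView.platformFaultDomain, UD: instanceView.platformUpdateDomain, AS: availabilitySet.id}" \
--output json
done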
# Via Resource Graph: all VMs with zone and AS
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| project
name,
resourceGroup,
location,
zone=tostring(zones[0]),
availabilitySet=tostring(properties.availabilitySet.id),
faultDomain=tostring(properties.platformFaultDomain),
updateDomain=tostring(properties.platformUpdateDomain)
| order by location, zone"
# Count VMs per zone in a region (to check distribution)
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| where location == 'brazilsouth'
| summarize count() by zone=tostring(zones[0])
| order by zone"
Monitor instance health
# View status of each VM in an Availability Set
for vm in vm-web-01 vm-web-02 vm-web-03; do
echo "=== $vm ==="
az vm get-instance-view \
--resource-group "rg-producao" \
--name "$vm" \
--query "instanceView.statuses[].{Code: code, DisplayStatus: displayStatus}" \
--output table
done
# Configure alert for when a VM becomes unavailable
az monitor activity-log alert create \
--name "alerta-vm-indisponivel" \
--resource-group "rg-monitoramento" \
--condition \
category=ResourceHealth \
--scope "/subscriptions/<sub-id>/resourceGroups/rg-producao"
Important limits
| Resource | Limit |
|---|---|
| Fault Domains per Availability Set | 3 (maximum) |
| Update Domains per Availability Set | 20 (maximum) |
| VMs per Availability Set | 200 |
| Availability Zones per region | 3 (minimum in every zone-enabled region) |
| VMs per Availability Zone | Limited by vCPU quota |
12. Integration and Automation
Deploy VMs distributed across zones via Terraform
variable "vm_zones" {
default = ["1", "2", "3"]
}
resource "azurerm_linux_virtual_machine" "web" {
count = 3
name = "vm-web-z${var.vm_zones[count.index]}"
resource_group_name = azurerm_resource_group.prod.name
location = azurerm_resource_group.prod.location
size = "Standard_D2s_v5"
zone = var.vm_zones[count.index] # Z1, Z2, Z3
# ... rest of configurations
tags = {
Zone = "zone-${var.vm_zones[count.index]}"
}
}
# Zone-redundant Load Balancer
resource "azurerm_lb" "main" {
name = "lb-web"
resource_group_name = azurerm_resource_group.prod.name
location = azurerm_resource_group.prod.location
sku = "Standard" # Standard = AZ support
frontend_ip_configuration {
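name = "frontend"
# For a public frontend, zone redundancy is set on the Standard public IP
# (its own zones = ["1", "2", "3"]), not on the frontend configuration.
public_ip_address_id = azurerm_public_ip.lb.id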
}
}
Azure Policy to ensure zone distribution
# Custom Policy: Production VMs must be in Availability Zones
az policy definition create \
--name "require-vm-availability-zone" \
--display-name "Production VMs must use Availability Zones" \
--rules '{
"if": {
"allOf": [
{"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
{"field": "tags.Environment", "equals": "Production"},
{"field": "zones", "exists": "false"}
]
},
"then": {"effect": "Deny"}
}' \
--mode "All"
# Assign policy to production RG
az policy assignment create \
--name "enforce-az-producao" \
--policy "require-vm-availability-zone" \
--scope "/subscriptions/<sub-id>/resourceGroups/rg-producao"
13. Final Summary
Essential points:
- Availability Set distributes VMs across Fault Domains (physical racks) and Update Domains (maintenance groups) within a single datacenter. Protects against hardware failures and planned maintenance.
- Availability Zone distributes VMs across physically separate datacenters within a region. Protects against complete datacenter failures.
- Maximum of 3 Fault Domains and 20 Update Domains per Availability Set
- Azure automatically distributes VMs between FDs and UDs when adding to AS; you don't control which FD each VM goes to
- A VM cannot be added to an AS after its creation; AS membership is fixed at deployment
- AS and Availability Zone are mutually exclusive: choose one or the other at VM creation
- Managed disks in AZ must be in the same zone as the VM
Critical differences:
- Fault Domain vs. Update Domain: FD protects against physical failures (rack/power); UD controls planned maintenance (sequencing of reboots)
- Availability Set vs. Availability Zone: AS protects within datacenter (rack); AZ protects between datacenters (building)
- SLA: Single VM = 99.9%; 2+ VMs in AS = 99.95%; 2+ VMs in AZ = 99.99%
- SKU Aligned vs. Classic for AS: Aligned is necessary for Managed Disks; Classic is legacy
What needs to be remembered for AZ-104:
- To create VM in AS: --availability-set <name> in CLI
- To create VM in Zone: --zone <1|2|3> in CLI
- Default number of FDs in portal: 2 (configurable up to 3)
- Default number of UDs in portal: 5 (configurable up to 20)
- Basic Load Balancer doesn't support Availability Zones; use Standard LB
- Availability Zones aren't available in all regions; check before architecting
- VMs in AS receive 99.95% SLA; VMs in different AZ receive 99.99%
- The combined SLA formula for N independent VMs:
1 - (1 - individual_SLA)^N
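The formula can be evaluated directly. The sketch below treats it as a theoretical estimate that assumes fully independent failures, which is not the same as the contractual SLA Azure publishes; `combined_availability` is a hypothetical helper name.

```python
def combined_availability(individual: float, n: int) -> float:
    """Probability that at least one of n independent instances is up."""
    return 1 - (1 - individual) ** n


for n in (1, 2, 3):
    print(f"{n} VM(s) at 99.9% each -> {combined_availability(0.999, n):.7%}")
```

Real failures are rarely fully independent (shared rack, shared datacenter, shared deployment), which is why the distribution mechanisms in this document matter more than the raw instance count.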