Theoretical Foundation: Configure Azure Site Recovery for Azure Resources

1. Initial Intuition

In previous topics, you learned to protect data with Azure Backup: creating vaults, defining policies, and performing backups and restores. Backup solves the problem of data loss (someone deleted a file, a disk became corrupted, a database was accidentally altered).

Azure Site Recovery (ASR) solves a different and more serious problem: what if the entire Azure region becomes unavailable? An earthquake, a catastrophic datacenter failure, a prolonged power outage. In this scenario, having backups in the same region doesn't help, as the vault would also be inaccessible.

The analogy: Azure Backup is like a safe inside your office where you keep copies of documents. Azure Site Recovery is like having a complete and operational office in another city, ready to function immediately if your main headquarters is destroyed. It's not just about data; it's about the entire running infrastructure.

ASR continuously replicates your VMs to a secondary region. If the primary region fails, you execute a failover and your VMs come up in the secondary region within minutes. When the primary region recovers, you execute a failback to return to the original state.

2. Context

ASR is the BCDR (Business Continuity and Disaster Recovery) component of Azure. While Backup focuses on RPO and data retention, ASR focuses on:

RTO (Recovery Time Objective): how long it takes for infrastructure to be operational again after a disaster
RPO (Recovery Point Objective): how much data loss is acceptable, expressed as a span of time (in ASR, RPO for Azure VMs is approximately 60 seconds)

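ASR lives in a Recovery Services vault (not in a Backup vault). This is important: Azure Backup and ASR use the same type of vault, but they are separate functionalities within it. For a given VM they end up in different vaults, because the vault used for VM backup sits in the VM's own region while the vault used for ASR sits in the target region (see section 8).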

ASR for Azure resources (VM to VM, region to region) is different from ASR for on-premises resources. The focus of this topic is exclusively Azure VM to Azure VM (Azure-to-Azure replication), which is the scenario required in AZ-104.

3. Building the Concepts

3.1 Fundamental Terminology

Before proceeding, you need to master the specific ASR terms.

Source Region: where your VMs are running normally. Example: Brazil South.

Target Region: where VMs will be replicated and where failover will occur. Example: East US 2.

Replication: continuous process of copying disk changes from the source VM to the target region. It's incremental and happens in the background, without interrupting the VM.

Cache Storage Account: storage account automatically created in the source region. Disk changes are first sent to this cache before being transferred to the target region. Acts as a buffer to ensure no changes are lost.

Recovery Point: point in time captured during replication. There are two types:

Crash-consistent: captured automatically every 5 minutes. Equivalent to the VM state as if it had been abruptly shut down. Adequate for most workloads.
App-consistent: captured with configurable frequency (default: every 4 hours). Uses VSS on Windows VMs (Linux VMs rely on custom pre- and post-scripts) to ensure application consistency (databases, services). Safer for transactional workloads.
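
To get a feel for what these frequencies mean in practice, the arithmetic below counts how many recovery points of each type would fit in a retention window. It is a minimal sketch: the function name `points_in_window` is illustrative, and the inputs (5-minute crash-consistent interval, 4-hour app-consistent default, 24-hour retention) are the figures stated in this topic. The service may thin out older points, so treat the result as an upper bound rather than what the portal will list.

```shell
#!/usr/bin/env bash
# points_in_window <retention_hours> <interval_minutes>
# Prints how many recovery points fit in the retention window (integer division).
points_in_window() {
  retention_hours=$1
  interval_minutes=$2
  echo $(( retention_hours * 60 / interval_minutes ))
}

# 24-hour retention, crash-consistent every 5 minutes,
# app-consistent every 4 hours (240 minutes).
echo "crash-consistent points: $(points_in_window 24 5)"
echo "app-consistent points:   $(points_in_window 24 240)"
```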

Failover: process of activating replicated VMs in the target region. Can be:

Test Failover: creates VMs in the target region in an isolated network, without affecting replication or production. For DR testing.
Planned Failover: controlled failover, with no data loss. Used for planned region maintenance.
Unplanned Failover (Failover): triggered when the primary region fails. May have data loss equivalent to RPO.

Failback: process of returning operations to the source region after a failover, when the primary region recovers.

Reprotect: after a failover, VMs are running in the target region. To enable failback, you must "reprotect" the VMs, reversing the replication direction (target becomes temporary source).

3.2 Recovery Plans

A Recovery Plan is an orchestrated sequence of failover for multiple VMs. Instead of executing failover individually on each VM, you create a plan that:

Defines the startup order of VMs (databases before application servers)
Groups VMs that should start simultaneously
Includes manual actions (e.g., notify team) or automated scripts between steps
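
The ordering idea behind a Recovery Plan can be modeled in a few lines of shell. This is only a sketch of the semantics (lower groups first, VMs in the same group together); the VM names and the `print_boot_order` helper are hypothetical, and a real plan is defined inside ASR rather than in a script like this.

```shell
#!/usr/bin/env bash
# Hypothetical plan: "<group number> <vm name>" per line. Lower groups start first;
# VMs that share a group number start together.
PLAN="2 vm-app-01
1 vm-sql-01
2 vm-app-02
3 vm-web-01"

# print_boot_order <plan>
# Prints one line per group, in ascending group order, listing the VMs in that group.
print_boot_order() {
  printf '%s\n' "$1" | sort -n -k1,1 | awk '
    NR == 1 || $1 != current {
      if (NR > 1) printf "\n"
      printf "group %s:", $1
      current = $1
    }
    { printf " %s", $2 }
    END { if (NR > 0) printf "\n" }'
}

print_boot_order "$PLAN"
```

Here the database VM lands in group 1, so it would come up before the application servers in group 2.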

3.3 Resources automatically created in target region

When you enable replication for a VM, ASR automatically creates (or allows you to configure) the following resources in the target region:

Resource	Default Behavior	Configurable
Resource Group	Creates new with "-asr" suffix	Yes
Virtual Network	Creates new mapped from source	Yes (Network Mapping)
Subnet	Replicates subnet structure	Yes
Storage Account (cache)	Creates in source for cache	Partially
Managed Disks	Creates replicated disks in target	Yes (disk type)
VM (replica)	Created only at failover	Yes (size, configurations)
Availability Set / Zones	Configures in target	Yes

3.4 Network Mapping

Network Mapping is the configuration that defines how virtual networks from the source region map to networks in the target region. This ensures that after failover, VMs in the target region connect to the correct networks.

Without Network Mapping, failover VMs are connected to a generic default network. With Network Mapping, you ensure correct connectivity with other resources, VPNs, and ExpressRoutes in the target region.
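
Conceptually, Network Mapping is a lookup table from source VNets to target VNets. The sketch below models that table to show how an unmapped network could be spotted before a failover; all VNet names and the `map_network` function are hypothetical and do not query Azure.

```shell
#!/usr/bin/env bash
# map_network <source_vnet>
# Prints the target VNet for a source VNet, or fails when no mapping exists.
map_network() {
  case "$1" in
    vnet-prod-brazilsouth) echo "vnet-prod-eastus2" ;;
    vnet-mgmt-brazilsouth) echo "vnet-mgmt-eastus2" ;;
    *) echo "UNMAPPED"; return 1 ;;
  esac
}

# Review every source VNet that hosts replicated VMs.
for vnet in vnet-prod-brazilsouth vnet-dev-brazilsouth; do
  if target=$(map_network "$vnet"); then
    echo "$vnet -> $target"
  else
    echo "$vnet has no mapping; review before failover"
  fi
done
```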

4. Structural View

5. Practical Operation

Complete ASR lifecycle

Detailed steps to enable replication

1. Prerequisite: create the Recovery Services Vault in the target region

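This is a critical and often confused point: the vault for ASR must be in the TARGET region, not the source. The logic is that if the source region fails, a vault in the source would also be unavailable. The vault in the target region remains accessible to orchestrate failover. Strictly speaking, Azure only rejects a vault located in the source region; placing it in the target region is the usual and recommended choice.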

# The vault MUST be in the target region
az recovery-services vault create \
  --resource-group rg-asr-eastus2 \
  --name rsv-asr-eastus2 \
  --location eastus2
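
A small guard like the following could catch a misplaced vault before replication is enabled. It is a sketch: `check_vault_region` is a hypothetical helper that only compares two region names, and the commented `az recovery-services vault show` call illustrates one way the vault location might be read with a logged-in Azure CLI.

```shell
#!/usr/bin/env bash
# check_vault_region <vault_location> <source_location>
# Succeeds when the vault is NOT in the source region.
check_vault_region() {
  if [ "$1" = "$2" ]; then
    echo "vault is in the source region ($2): it would be unreachable in a regional outage"
    return 1
  fi
  echo "ok: vault in $1, source VMs in $2"
}

# With the Azure CLI available, the vault location could be read like this:
#   vault_location=$(az recovery-services vault show \
#     --resource-group rg-asr-eastus2 --name rsv-asr-eastus2 \
#     --query location --output tsv)
check_vault_region "eastus2" "brazilsouth"
```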

2. Enable replication for a VM

In the portal:

Access the Recovery Services Vault (in the target region)
Click "Site Recovery" > "Enable replication"
Configure:
- Source: Azure, source region, Resource Group and VM
- Target: region, Resource Group, VNet, subnet, disk type
- Replication settings: replication policy
Confirm and wait for initial synchronization

Initial synchronization can take hours depending on disk size. During this period, status is "Enabling replication" and then "Synchronizing".
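
A rough estimate of the initial synchronization time follows from disk size divided by sustained throughput. The sketch below does that arithmetic; the 50 MiB/s figure is a purely hypothetical throughput, not a documented ASR rate, and `estimate_sync_minutes` is an illustrative name.

```shell
#!/usr/bin/env bash
# estimate_sync_minutes <used_disk_gib> <throughput_mib_per_s>
# Prints the estimated transfer time in whole minutes (integer division).
estimate_sync_minutes() {
  echo $(( $1 * 1024 / $2 / 60 ))
}

# Example: 1 TiB of disk data at an assumed 50 MiB/s.
minutes=$(estimate_sync_minutes 1024 50)
echo "estimated initial sync: ${minutes} minutes (about $(( minutes / 60 )) hours)"
```

Measured throughput depends on disk churn, VM size, and network conditions, so the real duration should be taken from the replication job itself.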

3. Verify replication health

After initial synchronization, status changes to Protected. Monitor:

RPO: time since last recovery point. Should be close to 0-60 seconds
Replication health: Critical, Warning, or Healthy
Last recovery point: the most recent available recovery point

Test Failover: how and when to execute

Test Failover creates VMs in the target region in an isolated network specified by you, without interrupting replication and without affecting production. It's the most important DR operation to validate that ASR is configured correctly.

Important Test Failover behaviors:

VMs created in test failover are not the production replica; they are temporary VMs created specifically for testing
Replication continues normally during test failover
You must clean up the test failover after validation, which removes test VMs and frees temporary resources
If you don't clean up, test VMs remain consuming cost

Failover: sequence of events

Commit is a critical step: after validating that VMs in the target region are working, you confirm the failover with Commit. This ends the possibility of returning to the previous recovery point and prepares the environment for the Reprotect/Failback process.
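
The order of operations described in this topic (failover, then Commit, then Reprotect, then failover back) can be written down as a tiny state machine. This is a sketch for a DR runbook checklist; the state and action names are illustrative labels, not values returned by ASR.

```shell
#!/usr/bin/env bash
# next_state <current_state> <action>
# Prints the resulting state, or fails if the action is not allowed in that state.
next_state() {
  case "$1:$2" in
    protected:failover)   echo "failed_over" ;;
    failed_over:commit)   echo "committed" ;;
    committed:reprotect)  echo "reprotected" ;;
    reprotected:failover) echo "failed_back" ;;
    *) echo "action '$2' not allowed in state '$1'" >&2; return 1 ;;
  esac
}

# Walk through the full cycle in the expected order.
state="protected"
for action in failover commit reprotect failover; do
  state=$(next_state "$state" "$action") || break
  echo "$action -> $state"
done
```

Asking for `reprotect` while still in `failed_over` fails, which mirrors the rule that Commit comes first.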

6. Implementation Methods

6.1 Azure Portal

When to use: initial setup, and failover/failback operations during an emergency, when the operator needs to already be familiar with the portal flow.

Enabling replication in the portal:

Access the vault in the target region
In Site Recovery, click "Enable replication"
Fill Source: "Azure", source region, Resource Group, VM
Fill Target: target region, Resource Group, VNet, subnet, storage
Configure Replication Policy
Review and enable

Limitation: not scalable for many VMs; use PowerShell or CLI to enable batch replication.

6.2 Azure PowerShell

When to use: enable replication on multiple VMs, automation, infrastructure pipelines.

# Set vault context (target region)
$vault = Get-AzRecoveryServicesVault `
  -ResourceGroupName "rg-asr-eastus2" `
  -Name "rsv-asr-eastus2"

Set-AzRecoveryServicesAsrVaultContext -Vault $vault

# Get source region fabric (created automatically the first time replication is enabled for the region)
# Names below assume the portal defaults: "asr-a2a-default-<region>" for a fabric and
# "asr-a2a-default-<region>-container" for its protection container
$primaryFabric = Get-AzRecoveryServicesAsrFabric `
  -Name "asr-a2a-default-brazilsouth"

# Get source Protection Container
$primaryContainer = Get-AzRecoveryServicesAsrProtectionContainer `
  -Fabric $primaryFabric

# Get replication policy (only needed if the container mapping still has to be created)
$replicationPolicy = Get-AzRecoveryServicesAsrPolicy `
  -Name "24-hour-retention-policy"

# Get target fabric and container (only needed if the container mapping still has to be created)
$recoveryFabric = Get-AzRecoveryServicesAsrFabric `
  -Name "asr-a2a-default-eastus2"
$recoveryContainer = Get-AzRecoveryServicesAsrProtectionContainer `
  -Fabric $recoveryFabric

# Get the existing mapping that links the source container to the target container and policy
$containerMapping = Get-AzRecoveryServicesAsrProtectionContainerMapping `
  -ProtectionContainer $primaryContainer `
  -Name "mapping-brazilsouth-to-eastus2"

# Get VM to replicate
$vm = Get-AzVM `
  -ResourceGroupName "rg-app-prod" `
  -Name "vm-producao-01"

# Configure disk details for replication
$diskConfig = New-AzRecoveryServicesAsrAzureToAzureDiskReplicationConfig `
  -ManagedDisk `
  -LogStorageAccountId "/subscriptions/.../storageAccounts/cacheaccount" `
  -DiskId $vm.StorageProfile.OsDisk.ManagedDisk.Id `
  -RecoveryResourceGroupId "/subscriptions/.../resourceGroups/rg-asr-eastus2" `
  -RecoveryReplicaDiskAccountType "Premium_LRS" `
  -RecoveryTargetDiskAccountType "Premium_LRS"

# Enable replication (-RecoveryAzureNetworkId selects the target VNet)
New-AzRecoveryServicesAsrReplicationProtectedItem `
  -AzureToAzure `
  -AzureVmId $vm.Id `
  -Name "replication-vm-producao-01" `
  -ProtectionContainerMapping $containerMapping `
  -AzureToAzureDiskReplicationConfiguration $diskConfig `
  -RecoveryResourceGroupId "/subscriptions/.../resourceGroups/rg-asr-eastus2" `
  -RecoveryAzureNetworkId "/subscriptions/.../virtualNetworks/vnet-destino"

6.3 Azure CLI

When to use: automation scripts, simple pipelines, status checks.

# Requires the site-recovery CLI extension: az extension add --name site-recovery
# Fabric and container names assume the portal defaults for Brazil South

# List replicated items
az site-recovery protected-item list \
  --resource-group rg-asr-eastus2 \
  --vault-name rsv-asr-eastus2 \
  --fabric-name asr-a2a-default-brazilsouth \
  --protection-container asr-a2a-default-brazilsouth-container \
  --output table

# Check replication health of an item
az site-recovery protected-item show \
  --resource-group rg-asr-eastus2 \
  --vault-name rsv-asr-eastus2 \
  --fabric-name asr-a2a-default-brazilsouth \
  --protection-container asr-a2a-default-brazilsouth-container \
  --name replication-vm-producao-01

ASR CLI is less complete than PowerShell for complex operations like enabling replication with detailed disk configurations. For failover and monitoring operations, CLI is sufficient.

6.4 ARM Template / Terraform

When to use: IaC for complete DR infrastructure, environments where all configuration needs to be versioned.

ASR configuration via ARM/Terraform is complex because it involves multiple interdependent resources (fabrics, containers, mappings, protected items). For AZ-104, knowledge of portal and PowerShell is sufficient. In real production environments, Terraform's azurerm provider covers ASR with resources such as azurerm_site_recovery_replication_policy and azurerm_site_recovery_replicated_vm.

7. Control and Security

RBAC for ASR

Role	Capabilities
Site Recovery Contributor	Manage ASR completely, except creating vaults
Site Recovery Operator	Execute failover and failback; cannot modify configurations
Site Recovery Reader	Read-only access to replication status
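
Assigning one of these roles from the command line could look like the sketch below. It prints the command by default instead of running it; the assignee and the scope are placeholders, and the `run` and `grant_operator` helpers are illustrative. The printed line is meant for review: shell quoting (for example around the role name) is not preserved in the output.

```shell
#!/usr/bin/env bash
# Prints the command by default; set DRY_RUN=0 to execute it with a logged-in Azure CLI.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# grant_operator <assignee> <vault_resource_id>
# Lets the assignee run failover and failback without changing ASR configuration.
grant_operator() {
  run az role assignment create \
    --assignee "$1" \
    --role "Site Recovery Operator" \
    --scope "$2"
}

# Placeholder values: replace with a real user, group, or service principal and the vault ID.
grant_operator "dr-oncall@example.com" "<vault-resource-id>"
```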

Network considerations for replication

ASR needs outbound connectivity from the source VM to ASR and Azure Storage endpoints. In environments with restrictive Network Security Groups (NSGs) or User Defined Routes (UDRs) forcing traffic through firewall, you need to:

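Allow outbound HTTPS traffic to the Service Tags AzureSiteRecovery and Storage in NSGs (Microsoft's guidance also lists AzureActiveDirectory and EventHub, used for authentication and monitoring)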
Or configure Private Endpoints for the vault, eliminating public internet traffic
Verify that proxies or firewalls are not blocking necessary endpoints
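
The NSG option could be scripted roughly as follows. This is a sketch under assumptions: the NSG name, rule names, and priorities are hypothetical, the helper prints the `az network nsg rule create` commands by default rather than running them, and the tag list combines the two tags named above with AzureActiveDirectory and EventHub, which Microsoft's guidance for Azure-to-Azure replication also lists.

```shell
#!/usr/bin/env bash
# Prints each command by default; set DRY_RUN=0 to execute with a logged-in Azure CLI.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# allow_asr_outbound <resource_group> <nsg_name>
# Creates one outbound HTTPS allow rule per service tag, with increasing priority.
allow_asr_outbound() {
  priority=200
  for tag in AzureSiteRecovery Storage AzureActiveDirectory EventHub; do
    run az network nsg rule create \
      --resource-group "$1" \
      --nsg-name "$2" \
      --name "Allow-$tag-Outbound" \
      --priority "$priority" \
      --direction Outbound \
      --access Allow \
      --protocol Tcp \
      --source-address-prefixes VirtualNetwork \
      --destination-address-prefixes "$tag" \
      --destination-port-ranges 443
    priority=$(( priority + 10 ))
  done
}

# "nsg-app-prod" is a hypothetical NSG attached to the source VM's subnet.
allow_asr_outbound "rg-app-prod" "nsg-app-prod"
```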

Replication Policy: security settings

The Replication Policy defines:

Recovery Point Retention: how long recovery points are kept (default: 24 hours, maximum: 15 days)
App-consistent snapshot frequency: frequency of application-consistent snapshots (default: 4 hours)
Multi-VM consistency: allows VMs in a group to be failed over together with the same recovery point (disabled by default; enabling causes slight performance impact)

8. Decision Making

ASR vs Backup: when to use each

Situation	Solution	Reason
Accidentally deleted file	Azure Backup	ASR doesn't protect individual data
Database corrupted by wrong query	Azure Backup	Need to restore to previous point
Entire primary region unavailable	Azure Site Recovery	Backup in same region would also be unavailable
RTO < 30 minutes for critical VM	Azure Site Recovery	Backup has RTO of hours; ASR has RTO of minutes
RPO < 1 hour	Azure Site Recovery	Daily backup has 24h RPO; ASR has ~60s RPO
Planned regional maintenance	Azure Site Recovery	Planned Failover with no data loss
7-year retention compliance	Azure Backup	ASR doesn't maintain history; only current state

Attention: ASR and Backup are not mutually exclusive. For critical workloads, use both: ASR to ensure operational continuity in regional disasters and Backup for protection against data loss or corruption.
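
The decision table above can be condensed into a small helper. It is a heuristic sketch, not an official sizing rule: `suggest_protection` is a hypothetical function, and its thresholds (RPO under one hour, retention beyond 15 days) are taken directly from the table and the retention limit stated in this topic.

```shell
#!/usr/bin/env bash
# suggest_protection <max_rpo_minutes> <retention_days> <regional_dr: yes|no>
suggest_protection() {
  use_asr=no
  use_backup=no
  # A sub-hour RPO or the need to survive a regional outage points to ASR.
  if [ "$1" -lt 60 ] || [ "$3" = "yes" ]; then use_asr=yes; fi
  # ASR keeps recovery points for at most 15 days; longer retention needs Backup.
  if [ "$2" -gt 15 ]; then use_backup=yes; fi
  case "$use_asr:$use_backup" in
    yes:yes) echo "Azure Site Recovery + Azure Backup" ;;
    yes:no)  echo "Azure Site Recovery" ;;
    *)       echo "Azure Backup" ;;
  esac
}

suggest_protection 1 7 yes       # critical VM, short retention
suggest_protection 1440 2555 no  # daily RPO, 7-year retention
suggest_protection 1 2555 yes    # both requirements at once
```

For critical workloads the combined answer is the expected one, in line with the note above.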

Where to create the vault for ASR

Scenario	Vault location	Reason
ASR for Azure VMs (A2A)	DESTINATION region	Vault accessible even if source fails
Azure VM Backup	Same region as VM	Vault close to protected data

9. Best Practices

Vault in destination region, always: the ASR vault must be in the destination region. This is the most important and most misunderstood rule of ASR.

Execute Test Failover regularly: at least once per quarter for each critical item. An untested failover is a failover of unknown reliability. Document the actual RTO measured in tests.

Separate critical VMs in Recovery Plans: don't execute failover manually VM by VM in a crisis situation. Recovery Plans ensure boot order and reduce human errors under pressure.

Configure Network Mapping: without this, failover VMs may not connect to the correct networks, requiring manual reconfiguration during a disaster, increasing RTO.

Monitor RPO continuously: an RPO that gradually grows indicates replication problems. Configure alerts when RPO exceeds 60 minutes, for example.

Consider Multi-VM Consistency only when necessary: enabling Multi-VM Consistency adds replication overhead. Use only for VMs that genuinely need consistency between them (e.g., database cluster).

Properly size the cache storage account: the cache storage account needs sufficient capacity to absorb write spikes from the source VM. Use Standard_LRS as cache account type.

10. Common Errors

Error: creating the vault in the source region
Why it happens: the operator creates the vault where their VMs are, by analogy with Azure Backup.
How to avoid: memorize the rule: ASR vault in the DESTINATION region. Always.

Error: never executing Test Failover
Why it happens: Test Failover seems optional and creates extra work (cleanup after test).
How to avoid: include Test Failover in the maintenance calendar. Without testing, you have no guarantee that failover will work when you need it most.

Error: forgetting to Commit after successful failover
Why it happens: the operator does failover, validates VMs, and forgets to commit.
How to avoid: include Commit as a mandatory step in the DR runbook. Without Commit, the failover isn't finalized and Reprotect can't be started.

Error: trying to use ASR for long-term backup
Why it happens: confusing disaster recovery with data backup.
How to avoid: remember that ASR only maintains 15 days of recovery points. For long retention, use Azure Backup. Use both together for complete protection.

Error: not configuring Network Mapping
Why it happens: it's an additional configuration that seems optional in the enablement flow.
How to avoid: configure Network Mapping immediately after creating the vault. Without it, failover may create VMs without adequate network connectivity.

Error: enabling Multi-VM Consistency for all VMs by default
Why it happens: the operator thinks more consistency is always better.
How to avoid: Multi-VM Consistency should only be enabled for VMs in the same cluster or that share real-time data dependency. For independent VMs, the overhead isn't justified.

11. Operation and Maintenance

Daily monitoring

In the portal, access the vault and go to Site Recovery. Check:

Replicated Items: status of each replicated VM (Healthy, Warning, Critical)
Recovery Plans: DR plan integrity
Jobs: failures in the last 24h
RPO: time since last recovery point for each critical VM

Replication health states

State	Meaning	Required action
Healthy	Replication working; RPO within expected	None
Warning	Minor issue; slightly elevated RPO or isolated event	Investigate, but not critical
Critical	Replication interrupted or very high RPO	Immediate action required
Synchronizing	Initial sync or re-sync in progress	Wait for completion
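
For scripted monitoring, an observed RPO could be bucketed into these states with a helper like the one below. The default thresholds (15 and 30 minutes) are assumptions chosen for illustration, not values confirmed by this topic, and `classify_rpo` is a hypothetical name; the health shown in the portal remains the authoritative signal.

```shell
#!/usr/bin/env bash
# classify_rpo <rpo_seconds> [warning_threshold_s] [critical_threshold_s]
# Prints Healthy, Warning, or Critical for the given RPO.
classify_rpo() {
  rpo=$1
  warn=${2:-900}    # assumed default: 15 minutes
  crit=${3:-1800}   # assumed default: 30 minutes
  if [ "$rpo" -ge "$crit" ]; then
    echo "Critical"
  elif [ "$rpo" -ge "$warn" ]; then
    echo "Warning"
  else
    echo "Healthy"
  fi
}

classify_rpo 45     # typical RPO for a healthy Azure VM
classify_rpo 1200   # replication is lagging
classify_rpo 7200   # replication is probably interrupted
```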

Important ASR limits for Azure VMs

Limit	Value
Protected VMs per vault	5000
Disks per replicated VM	Up to 100 disks
Maximum replicated disk size	32 TB
Recovery point retention	Up to 15 days
Minimum guaranteed RPO	~60 seconds
Maximum replication throughput per VM	No fixed limit; limited by network and disk

12. Integration and Automation

Integration with Azure Automation for automated DR

The most advanced pattern is to integrate Recovery Plans with Azure Automation Runbooks, which run as scripted steps between the groups of a plan, to create a fully automated failover process.


Integration with Azure Traffic Manager / Front Door

After a failover, VMs are in the destination region, but DNS and load balancing may still be pointing to the source region. Integrate ASR with:

Azure Traffic Manager: configure with priority or automatic failover to redirect traffic to the destination region after a failover
Azure Front Door: offers automatic global failover based on health probes
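
A priority-based Traffic Manager setup could be sketched as follows. The helper prints the `az network traffic-manager` commands by default instead of running them; the profile name, DNS label, endpoint names, and the two target resource IDs are placeholders, and the HTTPS probe on port 443 and path "/" is an assumption about the workload.

```shell
#!/usr/bin/env bash
# Prints each command by default; set DRY_RUN=0 to execute with a logged-in Azure CLI.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# setup_tm_failover <resource_group> <profile> <dns_label> <primary_id> <secondary_id>
# Priority routing sends traffic to priority 1 while it is healthy, then to priority 2.
setup_tm_failover() {
  run az network traffic-manager profile create \
    --resource-group "$1" --name "$2" \
    --routing-method Priority \
    --unique-dns-name "$3" \
    --protocol HTTPS --port 443 --path "/"
  run az network traffic-manager endpoint create \
    --resource-group "$1" --profile-name "$2" \
    --name primary-source-region --type azureEndpoints \
    --target-resource-id "$4" --priority 1
  run az network traffic-manager endpoint create \
    --resource-group "$1" --profile-name "$2" \
    --name secondary-target-region --type azureEndpoints \
    --target-resource-id "$5" --priority 2
}

setup_tm_failover "rg-asr-eastus2" "tm-app-prod" "app-prod-example" \
  "<primary-public-ip-id>" "<secondary-public-ip-id>"
```

The secondary endpoint only becomes reachable once the failed-over VMs expose the resource it points to, so the profile is usually validated as part of a Test Failover.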

Executing failover via PowerShell in Recovery Plan

# Get the Recovery Plan
# (uses the vault context set earlier with Set-AzRecoveryServicesAsrVaultContext)
$rp = Get-AzRecoveryServicesAsrRecoveryPlan `
  -Name "rp-producao-completo"

# Execute Test Failover
$job = Start-AzRecoveryServicesAsrTestFailoverJob `
  -RecoveryPlan $rp `
  -Direction PrimaryToRecovery `
  -AzureVMNetworkId "/subscriptions/.../virtualNetworks/vnet-teste-isolado"

# Wait for completion by polling the job state
while ($job.State -eq "InProgress" -or $job.State -eq "NotStarted") {
  Start-Sleep -Seconds 30
  $job = Get-AzRecoveryServicesAsrJob -Job $job
}

# Clean up Test Failover
Start-AzRecoveryServicesAsrTestFailoverCleanupJob `
  -RecoveryPlan $rp `
  -Comment "Quarterly DR test completed successfully"

13. Final Summary

What it is: Azure Site Recovery is Azure's disaster recovery service that continuously replicates VMs from a source region to a destination region, enabling failover in minutes with RPO of approximately 60 seconds.

Essential points:

The ASR vault must be in the destination region, not the source
The RPO of ASR for Azure VMs is approximately 60 seconds (crash-consistent every 5 min, app-consistent configurable)
There are three types of failover: Test (non-destructive, isolated network), Planned (no data loss) and Unplanned (emergency)
After failover, it's mandatory to Commit before starting Reprotect for failback
Recovery Plans orchestrate multi-VM failover with boot order and automated actions
ASR and Azure Backup are complementary, not alternatives; use both for critical workloads

Critical differences:

Aspect	Azure Backup	Azure Site Recovery
Objective	Data protection	Operational continuity
RPO	Hours to days	~60 seconds
RTO	Hours	Minutes
Retention	Up to 99 years	Up to 15 days
Vault location	Same region as VM	DESTINATION region
Test capability	Restore in sandbox	Test Failover (no impact)
Protection scope	Files, VMs, databases	Entire VMs

What needs to be remembered for AZ-104:

ASR vault always in the destination region
Test Failover doesn't affect production and doesn't interrupt replication
Commit is mandatory after failover to enable Reprotect
Recovery Plans define boot order and group VMs
Multi-VM Consistency adds overhead; use only when necessary
Network Mapping ensures correct VM connectivity after failover
ASR doesn't replace Backup: use both together for complete protection
