Theoretical Foundation: Configure Azure Site Recovery for Azure Resources


1. Initial Intuition​

In previous topics, you learned to protect data with Azure Backup: creating vaults, defining policies, and performing backups and restores. Backup solves the problem of data loss (someone deleted a file, a disk became corrupted, a database was accidentally altered).

Azure Site Recovery (ASR) solves a different and more serious problem: what if the entire Azure region becomes unavailable? An earthquake, a catastrophic datacenter failure, a prolonged power outage. In this scenario, having backups in the same region doesn't help, as the vault would also be inaccessible.

The analogy: Azure Backup is like a safe inside your office where you keep copies of documents. Azure Site Recovery is like having a complete and operational office in another city, ready to function immediately if your main headquarters is destroyed. It's not just about data; it's about entire running infrastructure.

ASR continuously replicates your VMs to a secondary region. If the primary region fails, you execute a failover and your VMs come up in the secondary region within minutes. When the primary region recovers, you execute a failback to return to the original state.


2. Context​

ASR is the BCDR (Business Continuity and Disaster Recovery) component of Azure. While Backup focuses on RPO and data retention, ASR focuses on:

  • RTO (Recovery Time Objective): how long for infrastructure to be operational after a disaster
  • RPO (Recovery Point Objective): how much data loss is acceptable (in ASR, RPO for Azure VMs is approximately 60 seconds)
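
To make the two objectives concrete, here is a minimal Python sketch that computes the data-loss window (achieved RPO) and the downtime (achieved RTO) for a hypothetical incident. The timestamps and function names are illustrative assumptions, not values or APIs provided by ASR.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, failure_time: datetime) -> timedelta:
    """Achieved RPO: writes made after the last recovery point are lost."""
    return failure_time - last_recovery_point


def downtime(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Achieved RTO: how long the workload was unavailable."""
    return service_restored - failure_time


# Hypothetical incident timeline
failure = datetime(2024, 5, 1, 10, 0, 0)
last_point = datetime(2024, 5, 1, 9, 59, 10)
restored = datetime(2024, 5, 1, 10, 12, 0)

print(data_loss_window(last_point, failure))  # 0:00:50
print(downtime(failure, restored))            # 0:12:00
```

In this example the workload loses 50 seconds of writes and is down for 12 minutes; both numbers are what you compare against the RPO and RTO targets agreed with the business.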

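ASR lives in the Recovery Services vault (not in the Backup vault resource type). This is important: Azure Backup for VMs and ASR use the same vault type, but they are separate functionalities within it. In practice they usually end up in separate vault instances, because a vault used for VM backup must be in the same region as the VM, while the vault used for ASR must be outside the source region.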

ASR for Azure resources (VM to VM, region to region) is different from ASR for on-premises resources. The focus of this topic is exclusively Azure VM to Azure VM (Azure-to-Azure replication), which is the scenario required in AZ-104.


3. Building the Concepts​

3.1 Fundamental Terminology​

Before proceeding, you need to master the specific ASR terms.

Source Region: where your VMs are running normally. Example: Brazil South.

Target Region: where VMs will be replicated and where failover will occur. Example: East US 2.

Replication: the continuous process of copying disk changes from the source VM to the target region. It is incremental and happens in the background without impacting the VM.

Cache Storage Account: storage account automatically created in the source region. Disk changes are first sent to this cache before being transferred to the target region. Acts as a buffer to ensure no changes are lost.

Recovery Point: point in time captured during replication. There are two types:

  • Crash-consistent: captured automatically every 5 minutes. Equivalent to the VM state as if it had been abruptly shut down. Adequate for most workloads.
  • App-consistent: captured with configurable frequency (default: every 4 hours). On Windows it uses VSS to ensure application consistency (databases, services); on Linux it relies on custom pre/post scripts. Safer for transactional workloads.
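
The choice between the two recovery point types at failover time can be sketched as follows. This is a simplified Python model, assuming a hypothetical `RecoveryPoint` record; it does not call ASR, it only illustrates the trade-off between the most recent point and the most recent application-consistent point.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RecoveryPoint:
    taken_at: datetime
    app_consistent: bool


def latest_point(points: Iterable[RecoveryPoint],
                 require_app_consistent: bool = False) -> Optional[RecoveryPoint]:
    """Return the newest point, optionally restricted to app-consistent ones."""
    candidates = [p for p in points if p.app_consistent or not require_app_consistent]
    return max(candidates, key=lambda p: p.taken_at, default=None)


# Hypothetical points: one app-consistent snapshot, two later crash-consistent ones
points = [
    RecoveryPoint(datetime(2024, 5, 1, 8, 0), app_consistent=True),
    RecoveryPoint(datetime(2024, 5, 1, 9, 50), app_consistent=False),
    RecoveryPoint(datetime(2024, 5, 1, 9, 55), app_consistent=False),
]

print(latest_point(points).taken_at)                               # 2024-05-01 09:55:00
print(latest_point(points, require_app_consistent=True).taken_at)  # 2024-05-01 08:00:00
```

The newest crash-consistent point loses the least data, while the app-consistent point is older but guarantees that the application wrote its state cleanly.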

Failover: process of activating replicated VMs in the target region. Can be:

  • Test Failover: creates VMs in the target region in an isolated network, without affecting replication or production. For DR testing.
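  • Planned Failover: controlled failover, with no data loss. Used for planned region maintenance. In the Azure-to-Azure scenario this is achieved by running Failover with the option to shut down the source VM first, so the last changes are replicated before the target VM starts.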
  • Unplanned Failover (Failover): triggered when the primary region fails. May have data loss equivalent to RPO.

Failback: process of returning operations to the source region after a failover, when the primary region recovers.

Reprotect: after a failover, VMs are running in the target region. To enable failback, you must "reprotect" the VMs, reversing the replication direction (target becomes temporary source).


3.2 Recovery Plans​

A Recovery Plan is an orchestrated sequence of failover for multiple VMs. Instead of executing failover individually on each VM, you create a plan that:

  • Defines the startup order of VMs (databases before application servers)
  • Groups VMs that should start simultaneously
  • Includes manual actions (e.g., notify team) or automated scripts between steps
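
The grouping and ordering idea behind a Recovery Plan can be sketched in a few lines of Python. The group numbers and VM names below are hypothetical, and the function is only a model of the ordering, not an ASR API.

```python
from typing import Dict, Iterator, List, Tuple


def failover_order(groups: Dict[int, List[str]]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (group number, VM names) in ascending group order.

    VMs in the same group start together; a group starts only
    after the previous group has finished.
    """
    for number in sorted(groups):
        yield number, sorted(groups[number])


# Hypothetical plan: database first, then application servers, then web tier
plan = {
    2: ["vm-app-01", "vm-app-02"],
    1: ["vm-sql-01"],
    3: ["vm-web-01"],
}

for number, vms in failover_order(plan):
    print(f"Group {number}: start {', '.join(vms)}")
```

Manual actions and scripts would slot in between the groups, for example a script after group 1 that checks the database is accepting connections before group 2 starts.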

3.3 Resources automatically created in target region​

When you enable replication for a VM, ASR automatically creates (or allows you to configure) the following resources in the target region:

| Resource | Default behavior | Configurable |
| --- | --- | --- |
| Resource Group | Creates a new one with the "-asr" suffix | Yes |
| Virtual Network | Creates a new one mapped from the source | Yes (Network Mapping) |
| Subnet | Replicates the subnet structure | Yes |
| Storage Account (cache) | Created in the source region for caching | Partially |
| Managed Disks | Creates replica disks in the target | Yes (disk type) |
| VM (replica) | Created only at failover | Yes (size, configuration) |
| Availability Set / Zones | Configured in the target | Yes |

3.4 Network Mapping​

Network Mapping is the configuration that defines how virtual networks from the source region map to networks in the target region. This ensures that after failover, VMs in the target region connect to the correct networks.

Without Network Mapping, failover VMs are connected to a generic default network. With Network Mapping, you ensure correct connectivity with other resources, VPNs, and ExpressRoutes in the target region.
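
Conceptually, Network Mapping is a lookup table from source networks to target networks. The Python sketch below models that lookup; the VNet names and the fallback network name are hypothetical, and the flag in the result is an illustrative way to detect VMs that would land on an unmapped network.

```python
from typing import Dict, Tuple

# Hypothetical name for the generic network used when no mapping exists
DEFAULT_TARGET_NETWORK = "vnet-default-eastus2"


def resolve_target_network(source_vnet: str, mapping: Dict[str, str]) -> Tuple[str, bool]:
    """Return (target VNet, was_mapped) for a VM's source VNet."""
    if source_vnet in mapping:
        return mapping[source_vnet], True
    return DEFAULT_TARGET_NETWORK, False


# Hypothetical mapping: Brazil South networks to their East US 2 counterparts
network_mapping = {
    "vnet-prod-brazilsouth": "vnet-prod-eastus2",
    "vnet-dmz-brazilsouth": "vnet-dmz-eastus2",
}

print(resolve_target_network("vnet-prod-brazilsouth", network_mapping))
print(resolve_target_network("vnet-lab-brazilsouth", network_mapping))
```

A pre-failover check can iterate over all protected VMs and report every one whose second value is `False`, since those are the VMs that would need manual network reconfiguration during a disaster.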


4. Structural View​


5. Practical Operation​

Complete ASR lifecycle​


Detailed steps to enable replication​

1. Prerequisite: create the Recovery Services Vault in the target region

This is a critical and often confused point: the vault for ASR must not be in the source region. Azure accepts any other region, and the target region is the usual choice and the one used throughout this topic. The logic is that if the source region fails, a vault in the source would also be unavailable, while a vault in the target region remains accessible to orchestrate failover.

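# The vault must NOT be in the source region; here it goes in the target region
# Recovery Services vaults are created through the "az backup vault" command group
az backup vault create \
--resource-group rg-asr-eastus2 \
--name rsv-asr-eastus2 \
--location eastus2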

2. Enable replication for a VM

In the portal:

  1. Access the Recovery Services Vault (in the target region)
  2. Click "Site Recovery" > "Enable replication"
  3. Configure:
    • Source: Azure, source region, Resource Group and VM
    • Target: region, Resource Group, VNet, subnet, disk type
    • Replication settings: replication policy
  4. Confirm and wait for initial synchronization

Initial synchronization can take hours depending on disk size. During this period, status is "Enabling replication" and then "Synchronizing".

3. Verify replication health

After initial synchronization, status changes to Protected. Monitor:

  • RPO: time elapsed since the last recovery point. In a healthy replication it should stay at around 60 seconds or less
  • Replication health: Critical, Warning, or Healthy
  • Last recovery point: the most recent available recovery point

Test Failover: how and when to execute​

Test Failover creates VMs in the target region in an isolated network specified by you, without interrupting replication and without affecting production. It's the most important DR operation to validate that ASR is configured correctly.

Important Test Failover behaviors:

  • VMs created in test failover are not the production replica; they are temporary VMs created specifically for testing
  • Replication continues normally during test failover
  • You must clean up the test failover after validation, which removes test VMs and frees temporary resources
  • If you don't clean up, test VMs remain consuming cost

Failover: sequence of events​


Commit is a critical step: after validating that VMs in the target region are working, you confirm the failover with Commit. This ends the possibility of returning to the previous recovery point and prepares the environment for the Reprotect/Failback process.
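
The ordering constraint described here (failover, then commit, then reprotect, then failback) can be sketched as a small state machine. The state and operation names are illustrative assumptions and the model is deliberately simplified; it is not how ASR represents state internally.

```python
# Allowed transitions: (current state, operation) -> next state
TRANSITIONS = {
    ("protected", "failover"): "failed_over",
    ("failed_over", "commit"): "committed",
    ("committed", "reprotect"): "reprotected",
    ("reprotected", "failback"): "failed_back",
}


def next_state(state: str, operation: str) -> str:
    """Return the state after an operation, or raise if the order is wrong."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(f"'{operation}' is not allowed in state '{state}'") from None


state = "protected"
for operation in ["failover", "commit", "reprotect", "failback"]:
    state = next_state(state, operation)
    print(f"{operation} -> {state}")

# Skipping Commit is rejected:
# next_state("failed_over", "reprotect") raises ValueError
```

Encoding the sequence this way in a DR runbook makes the "forgot to Commit" mistake impossible to overlook, because the reprotect step fails loudly instead of silently.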


6. Implementation Methods​

6.1 Azure Portal​

When to use: initial setup, and failover/failback operations during an emergency, when familiarity with the portal is essential.

Enabling replication in the portal:

  1. Access the vault in the target region
  2. In Site Recovery, click "Enable replication"
  3. Fill Source: "Azure", source region, Resource Group, VM
  4. Fill Target: target region, Resource Group, VNet, subnet, storage
  5. Configure Replication Policy
  6. Review and enable

Limitation: not scalable for many VMs; use PowerShell or CLI to enable batch replication.


6.2 Azure PowerShell​

When to use: enable replication on multiple VMs, automation, infrastructure pipelines.

# Set vault context (target region)
$vault = Get-AzRecoveryServicesVault `
-ResourceGroupName "rg-asr-eastus2" `
-Name "rsv-asr-eastus2"

Set-AzRecoveryServicesAsrVaultContext -Vault $vault

# Get source region fabric (the portal creates it on first enablement;
# otherwise create it with New-AzRecoveryServicesAsrFabric)
$primaryFabric = Get-AzRecoveryServicesAsrFabric `
-Name "asr-a2a-default-brazilsouth"

# Get source Protection Container
$primaryContainer = Get-AzRecoveryServicesAsrProtectionContainer `
-Fabric $primaryFabric `
-Name "asr-a2a-default-brazilsouth-container"

# Get replication policy
$replicationPolicy = Get-AzRecoveryServicesAsrPolicy `
-Name "24-hour-retention-policy"

# Get target fabric and container
$recoveryFabric = Get-AzRecoveryServicesAsrFabric `
-Name "asr-a2a-default-eastus2"
$recoveryContainer = Get-AzRecoveryServicesAsrProtectionContainer `
-Fabric $recoveryFabric `
-Name "asr-a2a-default-eastus2-container"

# Get the existing mapping that links the source container to the
# target container and the replication policy
$containerMapping = Get-AzRecoveryServicesAsrProtectionContainerMapping `
-ProtectionContainer $primaryContainer `
-Name "mapping-brazilsouth-to-eastus2"

# Get VM to replicate
$vm = Get-AzVM `
-ResourceGroupName "rg-app-prod" `
-Name "vm-producao-01"

# Configure disk details for replication
$diskConfig = New-AzRecoveryServicesAsrAzureToAzureDiskReplicationConfig `
-ManagedDisk `
-LogStorageAccountId "/subscriptions/.../storageAccounts/cacheaccount" `
-DiskId $vm.StorageProfile.OsDisk.ManagedDisk.Id `
-RecoveryResourceGroupId "/subscriptions/.../resourceGroups/rg-asr-eastus2" `
-RecoveryReplicaDiskAccountType "Premium_LRS" `
-RecoveryTargetDiskAccountType "Premium_LRS"

# Enable replication
New-AzRecoveryServicesAsrReplicationProtectedItem `
-AzureToAzure `
-AzureVmId $vm.Id `
-Name "replication-vm-producao-01" `
-ProtectionContainerMapping $containerMapping `
-AzureToAzureDiskReplicationConfiguration $diskConfig `
-RecoveryResourceGroupId "/subscriptions/.../resourceGroups/rg-asr-eastus2" `
-RecoveryVirtualNetworkId "/subscriptions/.../virtualNetworks/vnet-destino"

6.3 Azure CLI​

When to use: automation scripts, simple pipelines, status checks.

# Requires the CLI extension: az extension add --name site-recovery

# List replicated items
az site-recovery protected-item list \
--resource-group rg-asr-eastus2 \
--vault-name rsv-asr-eastus2 \
--fabric-name asr-a2a-default-brazilsouth \
--protection-container asr-a2a-default-brazilsouth-container \
--output table

# Check replication health of an item
az site-recovery protected-item show \
--resource-group rg-asr-eastus2 \
--vault-name rsv-asr-eastus2 \
--fabric-name asr-a2a-default-brazilsouth \
--protection-container asr-a2a-default-brazilsouth-container \
--name replication-vm-producao-01

The ASR CLI extension is less complete than PowerShell for complex operations such as enabling replication with detailed disk configurations. For failover and monitoring operations, the CLI is sufficient.


6.4 ARM Template / Terraform​

When to use: IaC for complete DR infrastructure, environments where all configuration needs to be versioned.

ASR configuration via ARM/Terraform is complex because it involves multiple interdependent resources (fabrics, containers, mappings, protected items). For AZ-104, knowledge of the portal and PowerShell is sufficient. In real production environments, the Terraform azurerm provider offers ASR resources such as azurerm_site_recovery_replication_policy and azurerm_site_recovery_replicated_vm.


7. Control and Security​

RBAC for ASR​

| Role | Capabilities |
| --- | --- |
| Site Recovery Contributor | Manage ASR completely, except creating vaults |
| Site Recovery Operator | Execute failover and failback; cannot modify configurations |
| Site Recovery Reader | Read-only access to replication status |

Network considerations for replication​

ASR needs outbound connectivity from the source VM to the ASR and Azure Storage endpoints. In environments with restrictive Network Security Groups (NSGs) or User Defined Routes (UDRs) that force traffic through a firewall, you need to:

  1. Allow outbound traffic to the service tags AzureSiteRecovery, Storage, AzureActiveDirectory and EventHub in the NSGs
  2. Or configure Private Endpoints for the vault, eliminating public internet traffic
  3. Verify that proxies or firewalls are not blocking the necessary endpoints

Replication Policy: security settings​

The Replication Policy defines:

  • Recovery Point Retention: how long recovery points are kept (default: 24 hours, maximum: 15 days)
  • App-consistent snapshot frequency: frequency of application-consistent snapshots (default: 4 hours)
  • Multi-VM consistency: allows VMs in a group to be failed over together with the same recovery point (disabled by default; enabling causes slight performance impact)
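
The policy settings and limits listed above can be captured in a small validation sketch. The class and field names are hypothetical; the defaults and the 15-day ceiling are the values stated in this section, and the check is a local sanity test, not a call to Azure.

```python
from dataclasses import dataclass
from typing import List

MAX_RETENTION_HOURS = 15 * 24  # 15 days, the maximum stated above


@dataclass
class ReplicationPolicy:
    retention_hours: int = 24             # default recovery point retention
    app_consistent_every_hours: int = 4   # default app-consistent frequency
    multi_vm_consistency: bool = False    # disabled by default

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty means valid."""
        found = []
        if not 0 < self.retention_hours <= MAX_RETENTION_HOURS:
            found.append("retention must be between 1 hour and 15 days")
        if self.app_consistent_every_hours < 0:
            found.append("snapshot frequency cannot be negative")
        return found


print(ReplicationPolicy().problems())                         # []
print(ReplicationPolicy(retention_hours=16 * 24).problems())  # one problem reported
```

Running a check like this before submitting a policy through PowerShell or IaC catches out-of-range values early, instead of at deployment time.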

8. Decision Making​

ASR vs Backup: when to use each​

| Situation | Solution | Reason |
| --- | --- | --- |
| Accidentally deleted file | Azure Backup | ASR replicates whole VMs and offers no file-level restore |
| Database corrupted by wrong query | Azure Backup | Need to restore to a previous point |
| Entire primary region unavailable | Azure Site Recovery | Backup in the same region would also be unavailable |
| RTO < 30 minutes for critical VM | Azure Site Recovery | Backup has an RTO of hours; ASR has an RTO of minutes |
| RPO < 1 hour | Azure Site Recovery | Daily backup has a 24h RPO; ASR has a ~60s RPO |
| Planned regional maintenance | Azure Site Recovery | Planned Failover with no data loss |
| 7-year retention compliance | Azure Backup | ASR keeps recovery points for at most 15 days |

Attention: ASR and Backup are not mutually exclusive. For critical workloads, use both: ASR to ensure operational continuity in regional disasters and Backup for protection against data loss or corruption.
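
The decision table can be condensed into a small helper. This is a sketch under the thresholds given in the table (RTO under 30 minutes, RPO under 1 hour, retention beyond 15 days); the function and parameter names are hypothetical.

```python
from typing import Optional, Set


def recommend(*, regional_dr: bool = False,
              rto_minutes: Optional[int] = None,
              rpo_minutes: Optional[int] = None,
              retention_days: int = 0,
              item_level_restore: bool = False) -> Set[str]:
    """Return the set of services that the requirements call for."""
    services = set()
    needs_fast_recovery = (
        (rto_minutes is not None and rto_minutes < 30)
        or (rpo_minutes is not None and rpo_minutes < 60)
    )
    if regional_dr or needs_fast_recovery:
        services.add("Azure Site Recovery")
    # ASR keeps at most 15 days of recovery points and has no file-level restore
    if item_level_restore or retention_days > 15:
        services.add("Azure Backup")
    return services


print(recommend(item_level_restore=True))                        # {'Azure Backup'}
print(sorted(recommend(rto_minutes=15, retention_days=7 * 365)))
# ['Azure Backup', 'Azure Site Recovery']
```

The second call shows the typical critical workload: a tight RTO plus long retention requires both services together.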

Where to create the vault for ASR​

| Scenario | Vault location | Reason |
| --- | --- | --- |
| ASR for Azure VMs (A2A) | DESTINATION region | Vault accessible even if source fails |
| Azure VM Backup | Same region as VM | Vault close to protected data |

9. Best Practices​

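Vault in destination region, always: the ASR vault belongs in the destination region. Azure itself only requires that it not be in the source region, but the destination is the natural choice. This is the most important and most misunderstood rule of ASR.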

Execute Test Failover regularly: at least once per quarter for each critical item. An untested failover is a failover of unknown reliability. Document the actual RTO measured in tests.

Separate critical VMs in Recovery Plans: don't execute failover manually VM by VM in a crisis situation. Recovery Plans ensure boot order and reduce human errors under pressure.

Configure Network Mapping: without this, failover VMs may not connect to the correct networks, requiring manual reconfiguration during a disaster, increasing RTO.

Monitor RPO continuously: an RPO that gradually grows indicates replication problems. Configure alerts when RPO exceeds 60 minutes, for example.
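
A minimal sketch of such a check, assuming RPO samples in seconds collected by some monitoring job (the collection itself is out of scope here). The function name, the three-sample trend rule, and the default threshold of 3600 seconds (the 60 minutes mentioned above) are illustrative choices.

```python
from typing import List, Sequence


def rpo_alerts(samples_seconds: Sequence[float], threshold_seconds: float = 3600) -> List[str]:
    """Return alert labels for a series of RPO samples, oldest first."""
    alerts = []
    # Alert 1: the most recent RPO is above the configured threshold
    if samples_seconds and samples_seconds[-1] > threshold_seconds:
        alerts.append("threshold")
    # Alert 2: RPO has grown on every consecutive sample (needs at least 3 samples)
    pairs = list(zip(samples_seconds, samples_seconds[1:]))
    if len(samples_seconds) >= 3 and all(later > earlier for earlier, later in pairs):
        alerts.append("growing")
    return alerts


print(rpo_alerts([40, 55, 45]))     # []
print(rpo_alerts([60, 300, 900]))   # ['growing']
print(rpo_alerts([60, 300, 4000]))  # ['threshold', 'growing']
```

The "growing" alert fires before the threshold is reached, which gives time to investigate a degrading replication while it is still within tolerance.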

Consider Multi-VM Consistency only when necessary: enabling Multi-VM Consistency adds replication overhead. Use only for VMs that genuinely need consistency between them (e.g., database cluster).

Properly size the cache storage account: the cache storage account needs sufficient capacity to absorb write spikes from the source VM. Use Standard_LRS as cache account type.


10. Common Errors​

Error: creating the vault in the source region
Why it happens: the operator creates the vault where their VMs are, by analogy with Azure Backup.
How to avoid: memorize the rule: ASR vault in the DESTINATION region. Always.

Error: never executing Test Failover
Why it happens: Test Failover seems optional and creates extra work (cleanup after test).
How to avoid: include Test Failover in the maintenance calendar. Without testing, you have no guarantee that failover will work when you need it most.

Error: forgetting to Commit after successful failover
Why it happens: the operator does failover, validates VMs, and forgets to commit.
How to avoid: include Commit as a mandatory step in the DR runbook. Without Commit, the failover isn't finalized and Reprotect can't be started.

Error: trying to use ASR for long-term backup
Why it happens: confusing disaster recovery with data backup.
How to avoid: remember that ASR only maintains 15 days of recovery points. For long retention, use Azure Backup. Use both together for complete protection.

Error: not configuring Network Mapping
Why it happens: it's an additional configuration that seems optional in the enablement flow.
How to avoid: configure Network Mapping immediately after creating the vault. Without it, failover may create VMs without adequate network connectivity.

Error: enabling Multi-VM Consistency for all VMs by default
Why it happens: the operator thinks more consistency is always better.
How to avoid: Multi-VM Consistency should only be enabled for VMs in the same cluster or that share real-time data dependency. For independent VMs, the overhead isn't justified.


11. Operation and Maintenance​

Daily monitoring​

In the portal, access the vault and go to Site Recovery. Check:

  • Replicated Items: status of each replicated VM (Healthy, Warning, Critical)
  • Recovery Plans: DR plan integrity
  • Jobs: failures in the last 24h
  • RPO: time since last recovery point for each critical VM

Replication health states​

| State | Meaning | Required action |
| --- | --- | --- |
| Healthy | Replication working; RPO within expected range | None |
| Warning | Minor issue; slightly elevated RPO or isolated event | Investigate, but not critical |
| Critical | Replication interrupted or very high RPO | Immediate action required |
| Synchronizing | Initial sync or re-sync in progress | Wait for completion |
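
For the daily check, the health states can drive a simple triage order. The Python sketch below sorts replicated items so that the most urgent come first; the VM names and the numeric severity ranking are assumptions made for illustration.

```python
from typing import Dict, List

# Lower number = more urgent (ranking is an illustrative assumption)
SEVERITY = {"Critical": 0, "Warning": 1, "Synchronizing": 2, "Healthy": 3}


def triage(items: Dict[str, str]) -> List[str]:
    """Return the names of non-healthy items, most urgent first."""
    needing_attention = [name for name, state in items.items() if state != "Healthy"]
    return sorted(needing_attention, key=lambda name: (SEVERITY[items[name]], name))


# Hypothetical snapshot of replicated items and their health
replicated_items = {
    "vm-web-01": "Healthy",
    "vm-sql-01": "Critical",
    "vm-app-02": "Synchronizing",
    "vm-app-01": "Warning",
}

print(triage(replicated_items))  # ['vm-sql-01', 'vm-app-01', 'vm-app-02']
```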

Important ASR limits for Azure VMs​

| Limit | Value |
| --- | --- |
| Protected VMs per vault | 5000 |
| Disks per replicated VM | Up to 100 disks |
| Maximum replicated disk size | 32 TB |
| Recovery point retention | Up to 15 days |
| Typical RPO (not a guaranteed value) | ~60 seconds |
| Maximum replication throughput per VM | No fixed limit; limited by network and disk |

12. Integration and Automation​

Integration with Azure Automation for automated DR​

The most advanced pattern is to integrate Recovery Plans with Azure Automation Runbooks to create a fully automated failover process, in which runbooks run as automated actions between the steps of the plan.


Integration with Azure Traffic Manager / Front Door​

After a failover, VMs are in the destination region, but DNS and load balancing may still be pointing to the source region. Integrate ASR with:

  • Azure Traffic Manager: configure priority routing with endpoint monitoring, so that traffic is redirected to the destination region after a failover
  • Azure Front Door: offers automatic global failover based on health probes
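
The priority-based redirection idea can be sketched as follows. This is a conceptual Python model, not the Traffic Manager API: the endpoint names are hypothetical and the `healthy` flag stands in for the result of a health probe.

```python
from typing import List, NamedTuple, Optional


class Endpoint(NamedTuple):
    name: str
    priority: int   # lower number = preferred
    healthy: bool   # stand-in for a health probe result


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[str]:
    """Return the healthy endpoint with the lowest priority number, if any."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority).name


before = [Endpoint("app-brazilsouth", 1, True), Endpoint("app-eastus2", 2, True)]
after = [Endpoint("app-brazilsouth", 1, False), Endpoint("app-eastus2", 2, True)]

print(pick_endpoint(before))  # app-brazilsouth
print(pick_endpoint(after))   # app-eastus2
```

Once the source region's endpoint stops answering probes, traffic moves to the destination region without a manual DNS change, which keeps that step out of the RTO.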

Executing failover via PowerShell in Recovery Plan​

# Get the Recovery Plan (uses the vault context set earlier with
# Set-AzRecoveryServicesAsrVaultContext)
$rp = Get-AzRecoveryServicesAsrRecoveryPlan `
-Name "rp-producao-completo"

# Execute Test Failover into an isolated network
$job = Start-AzRecoveryServicesAsrTestFailoverJob `
-RecoveryPlan $rp `
-Direction PrimaryToRecovery `
-AzureVMNetworkId "/subscriptions/.../virtualNetworks/vnet-teste-isolado"

# Wait for completion by polling the job state
while ($job.State -in @("NotStarted", "InProgress")) {
    Start-Sleep -Seconds 30
    $job = Get-AzRecoveryServicesAsrJob -Job $job
}

# After validating the test VMs, clean up the Test Failover
Start-AzRecoveryServicesAsrTestFailoverCleanupJob `
-RecoveryPlan $rp `
-Comment "Quarterly DR test completed successfully"

13. Final Summary​

What it is: Azure Site Recovery is Azure's disaster recovery service that continuously replicates VMs from a source region to a destination region, enabling failover in minutes with RPO of approximately 60 seconds.

Essential points:

  • The ASR vault must be in the destination region, not the source
  • The RPO of ASR for Azure VMs is approximately 60 seconds (crash-consistent every 5 min, app-consistent configurable)
  • There are three types of failover: Test (non-destructive, isolated network), Planned (no data loss) and Unplanned (emergency)
  • After failover, it's mandatory to Commit before starting Reprotect for failback
  • Recovery Plans orchestrate multi-VM failover with boot order and automated actions
  • ASR and Azure Backup are complementary, not alternatives; use both for critical workloads

Critical differences:

| Aspect | Azure Backup | Azure Site Recovery |
| --- | --- | --- |
| Objective | Data protection | Operational continuity |
| RPO | Hours to days | ~60 seconds |
| RTO | Hours | Minutes |
| Retention | Up to 99 years | Up to 15 days |
| Vault location | Same region as VM | DESTINATION region |
| Test capability | Restore in sandbox | Test Failover (no impact) |
| Protection scope | Files, VMs, databases | Entire VMs |

What needs to be remembered for AZ-104:

  • ASR vault always in the destination region
  • Test Failover doesn't affect production and doesn't interrupt replication
  • Commit is mandatory after failover to enable Reprotect
  • Recovery Plans define boot order and group VMs
  • Multi-VM Consistency adds overhead; use only when necessary
  • Network Mapping ensures correct VM connectivity after failover
  • ASR doesn't replace Backup: use both together for complete protection