
Theoretical Foundation: Perform Backup and Restore Operations by Using Azure Backup


1. Initial Intuition

In the previous topics, you created the vault and configured the policies. Now it's time to put everything into operation: execute backups and restore data when needed.

Think of the complete cycle like this: the vault is the safe, the policy is the storage contract, and the backup and restore operations are the acts of depositing and withdrawing. Knowing how to create the safe and sign the contract isn't enough; you need to know how to deposit at the right time, how to withdraw in an emergency, and what withdrawal options exist.

In practice, Azure Backup offers two operational paths:

  • Scheduled backup: executed automatically according to the configured policy. You don't need to do anything after configuring the protected item.
  • On-demand backup: executed manually outside of the schedule. Used before critical changes, for testing, or to meet specific requirements.

Restoration, in turn, isn't simply "undoing everything." There are multiple restore modalities, each suited to a specific scenario. Choosing the wrong one can cost extra time or overwrite data that is still good.


2. Context

Backup and restore operations are at the heart of the data protection journey. All the effort to create vaults and policies exists so that when something goes wrong, you can restore with confidence.


The context for AZ-104 is clear: you need to know not only that Azure Backup exists, but how to enable protection on specific items, trigger manual backups, choose the correct type of restore, and monitor backup and restore jobs.


3. Concept Construction

3.1 Enabling Protection on an Item

Before any backup happens, you need to associate the resource with a vault and a policy. This process is called Enable Backup or Configure Backup.

For Azure VMs, the process involves:

  1. Select the vault (which must be in the same region as the VM)
  2. Choose the applicable backup policy
  3. Confirm the association

From this moment, the first backup is scheduled automatically. The first backup is always a full backup; subsequent backups are incremental and transfer only the data that changed since the previous one.

An important behavior: after enabling protection on a VM, Azure installs or uses the VMSnapshot (Windows) or VMSnapshotLinux (Linux) extension on the VM. This extension coordinates consistent snapshot capture with the operating system. If the extension cannot be installed, backup fails.
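One way to confirm that the snapshot extension is present after protection is enabled is to list the VM's extensions. The following is a minimal sketch, assuming the Azure CLI is installed and signed in; the function name `snapshot_extension_state` is hypothetical and the resource names are placeholders.

```shell
# Print the name and provisioning state of the backup snapshot extension.
# The filter matches both VMSnapshot (Windows) and VMSnapshotLinux (Linux).
# Usage: snapshot_extension_state <vm-resource-group> <vm-name>
snapshot_extension_state() {
  az vm extension list \
    --resource-group "$1" \
    --vm-name "$2" \
    --query "[?contains(name, 'VMSnapshot')].{name:name, state:provisioningState}" \
    --output table
}
```

For example, `snapshot_extension_state rg-app-prod vm-producao-01` should show the extension once the first backup has been attempted; an empty result suggests the extension was never installed.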


3.2 Restore Types for VMs

This is the most important concept in this section. Azure Backup offers four restore modalities for VMs:

1. Restore the VM (Create New VM): creates a new VM from the recovery point. The original VM remains intact. Useful for comparing states, testing restoration without impacting production, or recovering in parallel.

2. Restore Disks: restores the VM's disks without automatically creating a VM, using a storage account as the staging location. You can then create a VM from the disks manually or attach them to an existing VM. More flexible, but requires additional steps.

3. Replace Existing (In-Place Restore): replaces the OS disk or data disks of the existing VM with data from the recovery point. The VM must be running for this operation. More direct for problem fixing, but destructive: current disk data is replaced.

4. Cross Region Restore (CRR): restores to a secondary region from a replicated recovery point via GRS. Available only when the vault uses GRS and CRR is enabled. Used in regional disaster scenarios.
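The two CRR prerequisites mentioned above (GRS redundancy and the CRR flag) can be set from the CLI, and replicated recovery points can then be listed from the secondary region. This is a hedged sketch: the helper names `enable_crr` and `list_secondary_rps` are hypothetical, and the container and item name format follows the convention used in the CLI examples later in this document.

```shell
# Turn on Cross Region Restore for a geo-redundant vault.
# Usage: enable_crr <vault-resource-group> <vault-name>
enable_crr() {
  az backup vault backup-properties set \
    --resource-group "$1" \
    --name "$2" \
    --backup-storage-redundancy GeoRedundant \
    --cross-region-restore-flag true
}

# List the recovery points already replicated to the secondary region.
# Usage: list_secondary_rps <vault-rg> <vault-name> <vm-rg> <vm-name>
list_secondary_rps() {
  az backup recoverypoint list \
    --resource-group "$1" \
    --vault-name "$2" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$3;$4" \
    --item-name "VM;iaasvmcontainerv2;$3;$4" \
    --backup-management-type AzureIaasVM \
    --use-secondary-region \
    --output table
}
```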


3.3 File-Level Recovery (Item-Level Restore)

Besides restoring the entire VM or disks, Azure Backup allows recovery of individual files without restoring the complete VM.

The process works as follows: the portal generates an executable script that mounts the recovery point as a volume on the machine where you run it; you then browse the files and copy only what you need. The volume remains accessible for 12 hours by default (maximum of 24 hours).

This method is drastically faster when you need a specific file, like an accidentally deleted document or a corrupted configuration.
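The mount script can also be requested from the Azure CLI instead of the portal. A minimal sketch, assuming the recovery point name is already known (for example from `az backup recoverypoint list`); the helper names are hypothetical.

```shell
# Download the script that mounts a recovery point for file recovery.
# Usage: mount_recovery_point <vault-rg> <vault-name> <vm-rg> <vm-name> <rp-name>
mount_recovery_point() {
  az backup restore files mount-rp \
    --resource-group "$1" \
    --vault-name "$2" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$3;$4" \
    --item-name "VM;iaasvmcontainerv2;$3;$4" \
    --rp-name "$5"
}

# Release the mounted recovery point once the files have been copied.
# Usage: unmount_recovery_point <vault-rg> <vault-name> <vm-rg> <vm-name> <rp-name>
unmount_recovery_point() {
  az backup restore files unmount-rp \
    --resource-group "$1" \
    --vault-name "$2" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$3;$4" \
    --item-name "VM;iaasvmcontainerv2;$3;$4" \
    --rp-name "$5"
}
```

Unmounting explicitly, rather than waiting for the mount to expire, keeps the recovery point from staying attached longer than needed.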


3.4 On-Demand Backup

An on-demand backup is a backup executed immediately, outside the policy schedule. It does not replace the next scheduled backup; both will be executed.

The retention of an on-demand backup is configured at execution time. You define how many days that specific point should be retained, regardless of policy rules.

Common use cases:

  • Before applying a critical patch or operating system update
  • Before migrating data or executing a destructive script
  • To create a recovery point outside the standard scheduling window
  • To meet an audit requirement for a specific point in time
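Because the retention of an on-demand backup is given as an explicit date, scripts usually compute it relative to today. The sketch below assumes GNU `date` (the `-d` option differs on macOS/BSD); `retain_until` is a hypothetical helper name.

```shell
# Print a date N days from now, in UTC, in the dd-mm-yyyy format
# accepted by `az backup protection backup-now --retain-until`.
# Usage: retain_until <days>
retain_until() {
  date -u -d "+$1 days" +%d-%m-%Y
}
```

The result can be passed directly to the CLI, for example `--retain-until "$(retain_until 30)"`.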

3.5 Stop Protection

Azure Backup offers two ways to stop item protection:

Stop protection and retain data: stops backup scheduling but keeps all existing recovery points. You continue to be charged for storing them. Useful when you temporarily decommission a resource but want to keep the ability to restore it.

Stop protection and delete data: stops scheduling and deletes all recovery points. With Soft Delete enabled, the data is retained for 14 more days before it is permanently deleted.
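Both variants map to the same CLI command and differ only in the `--delete-backup-data` flag. The wrapper below is a sketch; the function name `stop_protection` and its `retain`/`delete` argument are hypothetical conveniences, not part of the CLI.

```shell
# Stop protection for a VM, either keeping or deleting its recovery points.
# Usage: stop_protection <vault-rg> <vault-name> <vm-rg> <vm-name> <retain|delete>
stop_protection() {
  case "$5" in
    retain) delete_flag=false ;;
    delete) delete_flag=true ;;
    *) echo "mode must be 'retain' or 'delete'" >&2; return 2 ;;
  esac
  az backup protection disable \
    --resource-group "$1" \
    --vault-name "$2" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$3;$4" \
    --item-name "VM;iaasvmcontainerv2;$3;$4" \
    --backup-management-type AzureIaasVM \
    --delete-backup-data "$delete_flag" \
    --yes
}
```

Requiring the mode to be spelled out makes the destructive variant a deliberate choice rather than a default.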


4. Structural View


5. Practical Functionality

Scheduled backup flow for VM


Quiescing is the process of putting the file system in a consistent state before the snapshot. On Windows, this is done via VSS (Volume Shadow Copy Service). On Linux, it's done via configurable pre/post scripts or via file system freeze. Without quiescing, the snapshot may capture data in an inconsistent state, which can cause corruption when restoring.


Non-obvious behaviors

The first backup can take hours: the initial backup is a full backup of all disk data, regardless of size. A VM with a 500 GB disk will take much longer than one with 30 GB. Subsequent backups are incremental and much faster.

The backup job has two phases: the first phase creates the snapshot (fast, usually minutes). The second phase transfers data to the vault (slow, can take hours depending on size and bandwidth). Instant Restore becomes available after the first phase.

Restore isn't instantaneous from the vault tier: restoring from the vault (not from snapshot) can take hours for large VMs. Restore from Instant Restore Tier (snapshot) is much faster, but only available during the snapshot retention period (1 to 5 days for Standard, 1 to 30 for Enhanced).

Replace Existing requires the VM to be running: this behavior surprises many. Azure needs the VM running to safely coordinate disk replacement.
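The two phases of a backup job can be inspected in the job details. This sketch assumes the job JSON exposes its sub-tasks under `properties.extendedInfo.tasksList`, as the REST representation of VM backup jobs does; the helper name `job_phases` is hypothetical.

```shell
# Show each sub-task (for example the snapshot and the transfer to the vault)
# of a backup job, with its status and duration.
# Usage: job_phases <vault-rg> <vault-name> <job-name>
job_phases() {
  az backup job show \
    --resource-group "$1" \
    --vault-name "$2" \
    --name "$3" \
    --query "properties.extendedInfo.tasksList[].{task:taskId, status:status, duration:duration}" \
    --output table
}
```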


6. Implementation Methods

6.1 Azure Portal

Enabling protection on a VM:

  1. Access the Recovery Services Vault
  2. Click "Backup"
  3. In "Where is your workload running?": select "Azure"
  4. In "What do you want to backup?": select "Virtual machine"
  5. Click "Backup"
  6. Select the backup policy
  7. Select the VMs to protect
  8. Click "Enable Backup"

Executing on-demand backup:

  1. In the vault, access "Backup Items" > "Azure Virtual Machine"
  2. Select the desired VM
  3. Click "Backup Now"
  4. Define the retention date for the created point
  5. Confirm

Executing VM restore (Create New VM):

  1. In the vault, access "Backup Items" > "Azure Virtual Machine"
  2. Select the VM
  3. Click "Restore VM"
  4. Select the recovery point
  5. Choose "Create new" as restore type
  6. Configure: new VM name, Resource Group, Virtual Network, Subnet, staging storage account
  7. Click "Restore"

Executing File-Level Recovery:

  1. In the vault, access "Backup Items" > "Azure Virtual Machine"
  2. Select the VM
  3. Click "File Recovery"
  4. Select the recovery point
  5. Download the generated executable script
  6. Execute the script on the target VM (Windows: .exe, Linux: .sh)
  7. The script mounts the backup disk as a local volume
  8. Copy the necessary files
  9. Click "Unmount Disks" in the portal for unmounting

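6.2 Azure CLI

Enable protection on a VM:

# Names used throughout the CLI examples
VAULT_NAME="rsv-prod-brazilsouth"
RG="rg-backup-prod"
VM_NAME="vm-producao-01"
VM_RG="rg-app-prod"
POLICY_NAME="policy-vm-prod-daily"

# The VM lives in a different resource group than the vault,
# so pass its full resource ID instead of its name
VM_ID=$(az vm show --resource-group "$VM_RG" --name "$VM_NAME" --query id --output tsv)

# Enable backup on VM
az backup protection enable-for-vm \
--resource-group "$RG" \
--vault-name "$VAULT_NAME" \
--vm "$VM_ID" \
--policy-name "$POLICY_NAME"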

Execute on-demand backup:

# Trigger on-demand backup
az backup protection backup-now \
--resource-group $RG \
--vault-name $VAULT_NAME \
--container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--backup-management-type AzureIaasVM \
--retain-until "31-12-2025"

List recovery points:

az backup recoverypoint list \
--resource-group $RG \
--vault-name $VAULT_NAME \
--container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--backup-management-type AzureIaasVM \
--workload-type VM \
--output table

Restore disks:

# Get recovery point ID
RECOVERY_POINT_ID=$(az backup recoverypoint list \
--resource-group $RG \
--vault-name $VAULT_NAME \
--container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--backup-management-type AzureIaasVM \
--workload-type VM \
--query "[0].name" -o tsv)

# Execute disk restore to storage account
az backup restore restore-disks \
--resource-group $RG \
--vault-name $VAULT_NAME \
--container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
--rp-name $RECOVERY_POINT_ID \
--storage-account "stagebackuprestore" \
--restore-to-staging-storage-account true

Monitor jobs:

# List recent jobs
az backup job list \
--resource-group $RG \
--vault-name $VAULT_NAME \
--output table

# Check specific job details
# JOB_ID is a placeholder: set it to a job name (GUID) from the list above
az backup job show \
--resource-group $RG \
--vault-name $VAULT_NAME \
--name "$JOB_ID"
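When monitoring, it is often more useful to list only the jobs that need attention. A small sketch using the `--operation` and `--status` filters of `az backup job list`; the helper name `failed_backup_jobs` is hypothetical.

```shell
# List only failed backup jobs in a vault.
# Usage: failed_backup_jobs <vault-rg> <vault-name>
failed_backup_jobs() {
  az backup job list \
    --resource-group "$1" \
    --vault-name "$2" \
    --operation Backup \
    --status Failed \
    --output table
}
```

Changing `--operation` to `Restore` gives the same view for restore jobs.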

6.3 Azure PowerShell

Enable protection:

$vault = Get-AzRecoveryServicesVault `
-ResourceGroupName "rg-backup-prod" `
-Name "rsv-prod-brazilsouth"

Set-AzRecoveryServicesVaultContext -Vault $vault

$policy = Get-AzRecoveryServicesBackupProtectionPolicy `
-Name "policy-vm-prod-daily"

$vm = Get-AzVM `
-ResourceGroupName "rg-app-prod" `
-Name "vm-producao-01"

Enable-AzRecoveryServicesBackupProtection `
-ResourceGroupName "rg-app-prod" `
-Name "vm-producao-01" `
-Policy $policy

On-demand backup:

$container = Get-AzRecoveryServicesBackupContainer `
-ContainerType AzureVM `
-FriendlyName "vm-producao-01"

$item = Get-AzRecoveryServicesBackupItem `
-Container $container `
-WorkloadType AzureVM

# ExpiryDateTimeUTC expects a UTC timestamp
Backup-AzRecoveryServicesBackupItem `
-Item $item `
-ExpiryDateTimeUTC (Get-Date).ToUniversalTime().AddDays(30)

List points and restore:

# List recovery points
$rps = Get-AzRecoveryServicesBackupRecoveryPoint `
-Item $item

# Select the most recent
$rp = $rps[0]

# Restore disks to storage account
$storageAccount = Get-AzStorageAccount `
-ResourceGroupName "rg-restore-staging" `
-Name "stagebackuprestore"

# -RestoreOnlyOSDisk is a switch parameter; omit it to restore the OS disk and all data disks
Restore-AzRecoveryServicesBackupItem `
-RecoveryPoint $rp `
-StorageAccountName $storageAccount.StorageAccountName `
-StorageAccountResourceGroupName $storageAccount.ResourceGroupName

7. Control and Security

RBAC for backup and restore operations

| Operation | Minimum required role |
|---|---|
| Enable protection on VM | Backup Contributor + VM Contributor |
| Execute on-demand backup | Backup Operator |
| Restore VM (Create New) | Backup Operator + permission on destination RG |
| Restore disks | Backup Operator + Storage Account Contributor on staging storage |
| Replace Existing | Backup Operator + VM Contributor on VM |
| File Recovery | Backup Operator |
| Stop protection | Backup Contributor |

Multi-user Authorization (MUA)

Azure Backup supports Multi-user Authorization: critical operations like disabling soft delete, stopping protection with data deletion, or modifying vault security settings may require approval from a second administrator.

To enable MUA, a Resource Guard configured in a separate subscription is required, managed by a different security team from the backup team. This implements role segregation for destructive operations.

Protection against unauthorized restoration

Restore creates real resources (VMs, disks, files) that consume cost and may expose sensitive data. Therefore:

  • Limit access to the Backup Operator role only to those who truly need to execute restores
  • Monitor restore jobs via Azure Monitor and configure alerts for any restore operation
  • Use Resource Locks on production vaults to prevent accidental deletion
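Two of the measures above, the delete lock and the scoped role assignment, can be sketched with the Azure CLI. The helper names and the lock naming pattern are hypothetical; the assignee and scope values are placeholders to be supplied by the caller.

```shell
# Put a CanNotDelete lock on a Recovery Services vault.
# Usage: lock_vault <vault-rg> <vault-name>
lock_vault() {
  az lock create \
    --name "lock-$2-nodelete" \
    --lock-type CanNotDelete \
    --resource-group "$1" \
    --resource-name "$2" \
    --resource-type Microsoft.RecoveryServices/vaults
}

# Grant Backup Operator on a single vault instead of the whole subscription.
# Usage: grant_backup_operator <user-or-object-id> <vault-resource-id>
grant_backup_operator() {
  az role assignment create \
    --assignee "$1" \
    --role "Backup Operator" \
    --scope "$2"
}
```

Scoping the role to the vault resource ID keeps restore rights limited to the vaults an operator actually works with.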

8. Decision Making

Which restore type to use

| Scenario | Restore Modality | Reason |
|---|---|---|
| Corrupted VM, need to restore everything without losing the original | Create New VM | Creates parallel VM; original intact for comparison |
| Individual file accidentally deleted | File-Level Recovery | Much faster; no need to restore entire VM |
| Corrupted OS disk, VM exists but won't boot | Replace Existing | Replaces existing VM disk; more direct |
| Migration or VM reconfiguration before restoring | Restore Disks | More control; allows manual VM reconfiguration |
| Primary region unavailable due to disaster | Cross Region Restore | Only available method to restore in secondary region |
| Need to test restore without impacting production | Create New VM | Restoration in isolated environment |
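The decision table above can be encoded as a small lookup for use in runbooks or incident checklists. This is purely illustrative: the scenario keywords are hypothetical labels invented for this sketch, not Azure terminology.

```shell
# Map a scenario keyword to the restore modality from the decision table.
# Usage: choose_restore_mode <scenario>
choose_restore_mode() {
  case "$1" in
    keep-original|test-restore) echo "Create New VM" ;;
    single-file)                echo "File-Level Recovery" ;;
    os-disk-corrupted)          echo "Replace Existing" ;;
    reconfigure-vm)             echo "Restore Disks" ;;
    region-down)                echo "Cross Region Restore" ;;
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}
```

For instance, `choose_restore_mode single-file` prints `File-Level Recovery`, and an unrecognized keyword returns a non-zero status so a calling script can stop.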

Scheduled vs on-demand backup

| Situation | Approach | Reason |
|---|---|---|
| Planned maintenance or critical deployment | On-demand | Guarantees recovery point at exact moment before change |
| Continuous workload protection | Scheduled | Automatic, consistent, no manual intervention |
| Audit or compliance on specific date | On-demand | Creates point with specific retention for that date |
| Maintenance window outside scheduled hours | On-demand | Complements standard scheduling |

9. Best Practices

Always perform an on-demand backup before significant changes: critical security patches, application updates, configuration changes, and migrations all call for a current recovery point, regardless of the schedule.

Test restores regularly in an isolated environment: untested backups are of unknown value. Schedule quarterly restore tests using "Create New VM" in a sandbox Resource Group, verifying data integrity and execution time.

Use File-Level Recovery for targeted recoveries: recovering a single file via File Recovery is far faster than restoring an entire VM. Train the team to try this path before triggering a complete restore.

Monitor the first backup of each new item: the first backup is the most critical and the most time-consuming. Actively verify the first backup job when enabling protection on new resources.

Document the actual RTO: the theoretical RTO (based on estimates) frequently differs from the actual RTO (based on tests). Execute a real restore and measure the time so that the documented RTO is reliable.
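A measured RTO starts from timestamps. The helper below is a sketch that converts two ISO 8601 timestamps into elapsed minutes; it assumes GNU `date`, and the name `elapsed_minutes` is hypothetical. The timestamps could come, for example, from the `properties.startTime` and `properties.endTime` fields of a restore job, although the job duration alone does not include boot and validation time.

```shell
# Print the whole minutes elapsed between two ISO 8601 timestamps.
# Usage: elapsed_minutes <start> <end>
elapsed_minutes() {
  start_s=$(date -u -d "$1" +%s) || return 1
  end_s=$(date -u -d "$2" +%s) || return 1
  echo $(( (end_s - start_s) / 60 ))
}
```

For example, `elapsed_minutes 2025-01-01T10:00:00Z 2025-01-01T11:30:00Z` prints `90`.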

Use a separate staging Resource Group for restores: disk restores produce artifacts (VMs, disks, configuration files) that need to be managed. Use a dedicated Resource Group for restore staging, with automatic cleanup after validation.


10. Common Errors

Error: expecting restore to be instantaneous
Why it happens: confusion between the speed of Instant Restore (snapshot) and restore from the vault.
How to avoid: understand that Instant Restore is only available within the snapshot retention period (1 to 5 days for Standard). Outside this period, the restore comes from the vault and may take hours.

Error: using Replace Existing when the VM is shut down
Why it happens: the operator shuts down the VM before restoring, thinking it's safer.
How to avoid: remember that Replace Existing requires the VM to be running. For a VM that is shut down, use Restore Disks or Create New VM.

Error: not configuring the staging storage account before a disk restore
Why it happens: disk restore requires a storage account in the same region, and operators forget to create it beforehand.
How to avoid: include staging storage creation in the restoration playbook. Ideally, maintain a dedicated staging storage account that is always available.

Error: using File Recovery where a complete restore would suit large files better
Why it happens: File Recovery mounts the disk as a volume; for very large files (tens of GB), copying over the network can be as slow as a complete restore.
How to avoid: for large file recovery (more than a few GB), evaluate whether Restore Disks would be more efficient.

Error: not verifying integrity after restore
Why it happens: the operator assumes "restore completed" means "data is intact".
How to avoid: always validate the restore: start the VM, verify services, and confirm critical data integrity before declaring the restore successful.

Error: restoring directly to production without parallel testing
Why it happens: pressure for speed during incidents.
How to avoid: use "Create New VM" to validate the recovery point in parallel before executing Replace Existing or decommissioning the current VM.


11. Operation and Maintenance

Monitoring backup jobs

In the portal, access the vault and navigate to Backup Jobs. Possible states:

| Job Status | Meaning | Required Action |
|---|---|---|
| Completed | Job finished successfully | None |
| Completed with warnings | Job completed but with warnings (e.g., open file not included) | Investigate warning |
| In Progress | Job running | Wait; monitor duration |
| Failed | Job completely failed | Investigate cause; check logs |
| Cancelled | Job manually cancelled | Verify if intentional |

Common causes of backup failure

| Cause | Symptom | Resolution |
|---|---|---|
| VMSnapshot extension outdated or corrupted | Failure in snapshot phase | Re-install extension in portal |
| VM without connectivity to Azure Backup endpoint | Timeout during transfer | Check NSG, UDR and Private Endpoints |
| Disk exceeding supported limit | Disk backup failure | Check maximum supported size (32 TB) |
| Staging storage account deleted | Disk restore failure | Recreate staging storage account |
| Insufficient permissions | 403 error in logs | Review RBAC for operator and extension |

Important operational limits

| Limit | Value |
|---|---|
| Maximum disk size for VM backup | 32 TB |
| Recovery points per protected item (VMs) | 9999 |
| Instant Restore window (Standard) | 1 to 5 days |
| Instant Restore window (Enhanced) | 1 to 30 days |
| File Recovery mounted volume duration | 12 hours (default), maximum 24 hours |
| Maximum on-demand backups per day per item | 9 |
| Minimum retention time for on-demand | 1 day |

12. Integration and Automation

On-demand backup automation before deploys

A common pattern is to integrate an on-demand backup into CI/CD pipelines, so that a recovery point is created before any production deploy.
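A pipeline step for this pattern could look like the sketch below: it triggers the backup, waits for the job, and fails the step unless the job completed. The function name `pre_deploy_backup` and the two-hour timeout are assumptions for illustration; waiting for full completion includes the transfer to the vault, which may be longer than a deploy window tolerates.

```shell
# Trigger an on-demand backup and succeed only if the job completes.
# Usage: pre_deploy_backup <vault-rg> <vault-name> <vm-rg> <vm-name> <retain-until dd-mm-yyyy>
pre_deploy_backup() {
  # backup-now returns the job object; its name is the job ID
  job_name=$(az backup protection backup-now \
    --resource-group "$1" \
    --vault-name "$2" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$3;$4" \
    --item-name "VM;iaasvmcontainerv2;$3;$4" \
    --backup-management-type AzureIaasVM \
    --retain-until "$5" \
    --query name --output tsv) || return 1

  # Block until the job finishes or the timeout (in seconds) expires
  az backup job wait \
    --resource-group "$1" \
    --vault-name "$2" \
    --name "$job_name" \
    --timeout 7200 || return 1

  job_status=$(az backup job show \
    --resource-group "$1" \
    --vault-name "$2" \
    --name "$job_name" \
    --query properties.status --output tsv) || return 1

  # Non-zero exit status fails the pipeline step
  [ "$job_status" = "Completed" ]
}
```

In a pipeline, the deploy stage would run only if `pre_deploy_backup` exits with status 0.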


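Automated alerts via Azure Monitor

Azure Backup raises built-in Azure Monitor alerts when backup or restore jobs fail. To be notified without manual monitoring, route those alerts to an action group with an alert processing rule:

# Route Azure Backup alerts fired on the vault to an action group
# (the alert-processing-rule commands may require the alertsmanagement CLI extension; {sub} is a placeholder)
az monitor alert-processing-rule create \
--name "apr-backup-failure" \
--resource-group rg-backup-prod \
--rule-type AddActionGroups \
--scopes /subscriptions/{sub}/resourceGroups/rg-backup-prod/providers/Microsoft.RecoveryServices/vaults/rsv-prod-brazilsouth \
--filter-monitor-service Equals "Azure Backup" \
--action-groups /subscriptions/{sub}/resourceGroups/rg-backup-prod/providers/microsoft.insights/actionGroups/ag-backup-ops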

Runbooks for automated restore

For DR (Disaster Recovery) scenarios with aggressive RTO, it's possible to create Azure Automation Runbooks that execute restore automatically when triggered by an alert or webhook. The runbook calls Azure Backup APIs via PowerShell to:

  1. Identify the most recent recovery point
  2. Trigger disk restore to staging
  3. Create new VM from disks
  4. Execute post-restore validation scripts
  5. Redirect traffic (via Azure Load Balancer or DNS) to the new VM

13. Final Summary

What are backup and restore operations: the set of actions that enable protection on Azure resources, execute scheduled and manual backups, and recover data from recovery points stored in the vault.

Essential points:

  • Enabling protection associates the resource with the vault and policy; the first backup is always full and may take hours
  • On-demand backup creates an immediate recovery point with retention defined at execution; does not cancel the next scheduled backup
  • There are four restore modes for VMs: Create New VM, Restore Disks, Replace Existing and Cross Region Restore
  • File-Level Recovery mounts the recovery point as local volume for up to 24 hours for individual file recovery
  • Instant Restore uses the snapshot tier for fast restoration; outside the snapshot period, restore comes from vault and is slower

Critical differences between restore modes:

| Mode | Original VM affected | VM needs to be on | Speed |
|---|---|---|---|
| Create New VM | No | Indifferent | Moderate |
| Restore Disks | No | Indifferent | Moderate |
| Replace Existing | Yes (disks replaced) | Yes, mandatory | More direct |
| Cross Region Restore | No | Indifferent | Slower (data from secondary region) |
| File Recovery | No | Indifferent | Fast for small files |

What needs to be remembered for AZ-104:

  • Replace Existing requires VM to be running
  • File Recovery mounts volume for maximum 24 hours
  • Cross Region Restore requires vault with GRS and CRR enabled
  • First backup is always full; subsequent ones are incremental
  • Maximum 9 on-demand backups per day per protected item
  • On-demand backup doesn't replace next scheduled backup; both occur
  • Instant Restore is only available within snapshot retention period configured in policy