Theoretical Foundation: Perform Backup and Restore Operations by Using Azure Backup

1. Initial Intuition

In the previous topics, you created the vault and configured the policies. Now it's time to put everything into operation: execute backups and restore data when needed.

Think of the complete cycle like this: the vault is the safe, the policy is the storage contract, and the backup and restore operations are the acts of depositing and withdrawing. Knowing how to create the safe and sign the contract isn't enough; you need to know how to deposit at the right time, how to withdraw in an emergency, and what withdrawal options exist.

In practice, Azure Backup offers two operational paths:

Scheduled backup: executed automatically according to the configured policy. You don't need to do anything after configuring the protected item.
On-demand backup: executed manually outside of the schedule. Used before critical changes, for testing, or to meet specific requirements.

Restoration, in turn, isn't simply "undoing everything." There are multiple restore modalities, each suited to a specific scenario. Choosing the wrong option can cost extra time or overwrite data that is still good.

2. Context

Backup and restore operations are at the heart of the data protection journey. All the effort to create vaults and policies exists so that when something goes wrong, you can restore with confidence.


The context for AZ-104 is clear: you need to know not only that Azure Backup exists, but how to enable protection on specific items, trigger manual backups, choose the correct type of restore, and monitor backup and restore jobs.

3. Concept Construction

3.1 Enabling Protection on an Item

Before any backup happens, you need to associate the resource with a vault and a backup policy. This process is called Enable Backup or Configure Backup.

For Azure VMs, the process involves:

Select the vault (which must be in the same region as the VM)
Choose the applicable backup policy
Confirm the association

From this moment, the first backup is scheduled automatically. The first backup is always a full copy of the data; subsequent backups are incremental, transferring only the blocks that changed since the previous backup.

An important behavior: after you enable protection on a VM, Azure Backup installs (or reuses) the VMSnapshot (Windows) or VMSnapshotLinux (Linux) extension on the VM. This extension coordinates with the operating system to capture a consistent snapshot. If the extension cannot be installed, backups fail.

3.2 Restore Types for VMs

This is the most important concept in this section. Azure Backup offers four restore modalities for VMs:

1. Restore the VM (Create New VM): creates a new VM from the recovery point. The original VM remains intact. Useful for comparing states, testing restoration without impacting production, or recovering in parallel.

2. Restore Disks: restores VM disks to a storage account without automatically creating the VM. You can then manually create a VM from the disks, or attach the disks to an existing VM. More flexible, but requires additional steps.

3. Replace Existing (In-Place Restore): replaces the OS disk or data disks of the existing VM with data from the recovery point. The VM must be running for this operation. It is the most direct way to fix a broken VM, but it is destructive: the current disk data is replaced.

4. Cross Region Restore (CRR): restores to the secondary (paired) region using a recovery point replicated there by GRS. Available only when the vault uses GRS and CRR is enabled. Used in regional disaster scenarios.
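Some of the prerequisites above can be expressed as a small pre-flight check that a restore script runs before calling Azure. This is a minimal sketch: the function name `check_restore_mode`, its mode keywords, and its yes/no inputs are hypothetical, and it encodes only two rules from this list (Replace Existing operates on the existing VM, so that VM must still exist; Cross Region Restore needs a GRS vault with CRR enabled). The caller is assumed to have looked those facts up already.

```shell
# Hypothetical pre-flight check for the VM restore modes.
# Args: mode (create-new | restore-disks | replace-existing | cross-region),
#       source VM still exists (yes/no), vault redundancy (LRS/ZRS/GRS),
#       Cross Region Restore enabled on the vault (yes/no).
# Prints "ok" or the reason the mode cannot be used; non-zero on failure.
check_restore_mode() {
  local mode="$1" vm_exists="$2" redundancy="$3" crr_enabled="$4"
  case "$mode" in
    create-new|restore-disks)
      # Both leave the original VM untouched, so no extra prerequisite here
      echo "ok"
      ;;
    replace-existing)
      # In-place restore overwrites disks of a VM that must still exist
      if [ "$vm_exists" = "yes" ]; then
        echo "ok"
      else
        echo "replace-existing needs the original VM; use create-new or restore-disks"
        return 1
      fi
      ;;
    cross-region)
      # CRR reads recovery points replicated to the paired region by GRS
      if [ "$redundancy" = "GRS" ] && [ "$crr_enabled" = "yes" ]; then
        echo "ok"
      else
        echo "cross-region restore needs a GRS vault with CRR enabled"
        return 1
      fi
      ;;
    *)
      echo "unknown mode: $mode"
      return 2
      ;;
  esac
}
```

For example, `check_restore_mode cross-region yes LRS no` prints the CRR reason and returns 1, which lets a runbook stop before submitting a restore that Azure would reject.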


3.3 File-Level Recovery (Item-Level Restore)

Besides restoring the entire VM or disks, Azure Backup allows recovery of individual files without restoring the complete VM.

The process works like this: you download a script generated in the portal and run it; the script mounts the recovery point's disks as local volumes on that machine. You then browse the files and copy only what you need. The volumes remain accessible for 12 hours by default (maximum of 24 hours).

This method is drastically faster when you need a specific file, like an accidentally deleted document or a corrupted configuration.

3.4 On-Demand Backup

An on-demand backup is a backup executed immediately, outside the policy schedule. It does not replace the next scheduled backup; both will be executed.

The retention of an on-demand backup is set when you trigger it: you choose a retain-until date for that specific recovery point, independent of the policy's retention rules.
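The Azure CLI examples in section 6.2 pass this date through `--retain-until` in `dd-mm-yyyy` form. A minimal sketch of a helper that turns "keep for N days" into that string; it assumes GNU `date` (BSD/macOS `date` does not accept `-d` with relative dates), and the function name `retain_until_date` is hypothetical. It also rejects retention below the 1-day minimum listed under the operational limits.

```shell
# Hypothetical helper: convert "retain for N days" into the d-m-Y date string
# used by `az backup protection backup-now --retain-until`.
# Assumes GNU date. Optional 2nd arg (YYYY-MM-DD) fixes the start date so the
# result is reproducible; by default the start date is today in UTC.
retain_until_date() {
  local days="$1"
  local start="${2:-$(date -u +%Y-%m-%d)}"
  # On-demand recovery points must be kept for at least 1 day
  if [ "$days" -lt 1 ]; then
    echo "retention must be at least 1 day" >&2
    return 1
  fi
  date -u -d "$start +$days days" +%d-%m-%Y
}
```

Usage: `--retain-until "$(retain_until_date 30)"` keeps the on-demand point for 30 days from today.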

Common use cases:

Before applying a critical patch or operating system update
Before migrating data or executing a destructive script
To create a recovery point outside the standard scheduling window
To meet an audit requirement for a specific point in time

3.5 Stop Protection

Azure Backup offers two ways to stop item protection:

Stop protection and retain data: stops backup scheduling but keeps all existing recovery points. You continue to be charged for storing those points. Useful when you are temporarily decommissioning a resource but want to keep the ability to restore it.

Stop protection and delete data: stops scheduling and schedules deletion of all recovery points. With Soft Delete enabled, the data is retained for 14 days by default before it is permanently deleted.
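Both options correspond to one Azure CLI command, `az backup protection disable`, whose `--delete-backup-data` flag selects between them. The wrapper below is a hypothetical sketch that makes the choice explicit; the function name and its `retain`/`delete` keywords are assumptions, container and item names follow the format used in the CLI examples of section 6.2, and `--yes` skips the confirmation prompt, so use it deliberately.

```shell
# Hypothetical wrapper around `az backup protection disable`.
# MODE "retain": stop scheduling, keep existing recovery points.
# MODE "delete": stop scheduling and delete recovery points (with soft delete
#                enabled they stay recoverable for a retention period first).
stop_protection() {
  local mode="$1" rg="$2" vault="$3" container="$4" item="$5"
  local delete_flag
  case "$mode" in
    retain) delete_flag=false ;;
    delete) delete_flag=true ;;
    *) echo "mode must be 'retain' or 'delete'" >&2; return 2 ;;
  esac
  az backup protection disable \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --container-name "$container" \
    --item-name "$item" \
    --backup-management-type AzureIaasVM \
    --delete-backup-data "$delete_flag" \
    --yes
}
```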

4. Structural View


5. Practical Functionality

Scheduled backup flow for VM


Quiescing is the process of putting the file system in a consistent state before the snapshot. On Windows, this is done via VSS (Volume Shadow Copy Service). On Linux, it's done via configurable pre/post scripts or via file system freeze. Without quiescing, the snapshot may capture data in an inconsistent state, which can cause corruption when restoring.

Non-obvious behaviors

The first backup can take hours: the initial backup is a full copy of all data on the disks, so its duration grows with the amount of data. A VM with a 500 GB disk will take much longer than one with 30 GB. Subsequent backups are incremental and much faster.

The backup job has two phases: the first phase creates the snapshot (fast, usually minutes). The second phase transfers data to the vault (slow, can take hours depending on size and bandwidth). Instant Restore becomes available after the first phase.

Restore isn't instantaneous from the vault tier: restoring from the vault (not from snapshot) can take hours for large VMs. Restore from Instant Restore Tier (snapshot) is much faster, but only available during the snapshot retention period (1 to 5 days for Standard, 1 to 30 for Enhanced).

Replace Existing requires the VM to be running: this behavior surprises many. Azure needs the VM running to safely coordinate disk replacement.
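The Instant Restore window described above determines where a restore reads from. A minimal sketch, assuming you know the recovery point's age in whole days and the policy's snapshot retention: it validates the retention against the ranges stated in this section (Standard 1 to 5 days, Enhanced 1 to 30) and treats a point as snapshot-backed while its age is below that retention. The exact expiry boundary and the function name `restore_tier` are assumptions.

```shell
# Hypothetical helper: predict whether a restore can use the Instant Restore
# (snapshot) tier or must read from the slower vault tier.
# Args: recovery point age in days, policy type (Standard|Enhanced),
#       snapshot retention configured in the policy (days).
restore_tier() {
  local age_days="$1" policy="$2" snapshot_days="$3"
  local max
  case "$policy" in
    Standard) max=5 ;;
    Enhanced) max=30 ;;
    *) echo "unknown policy type: $policy" >&2; return 2 ;;
  esac
  # Reject retention values outside the range the policy type allows
  if [ "$snapshot_days" -lt 1 ] || [ "$snapshot_days" -gt "$max" ]; then
    echo "snapshot retention for $policy must be 1-$max days" >&2
    return 2
  fi
  if [ "$age_days" -lt "$snapshot_days" ]; then
    echo "snapshot"
  else
    echo "vault"
  fi
}
```

A result of `vault` is the cue to plan for a restore measured in hours rather than minutes.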

6. Implementation Methods

6.1 Azure Portal

Enabling protection on a VM:

Access the Recovery Services Vault
Click "Backup"
In "Where is your workload running?": select "Azure"
In "What do you want to backup?": select "Virtual machine"
Click "Backup"
Select the backup policy
Select the VMs to protect
Click "Enable Backup"

Executing on-demand backup:

In the vault, access "Backup Items" > "Azure Virtual Machine"
Select the desired VM
Click "Backup Now"
Define the retention date for the created point
Confirm

Executing VM restore (Create New VM):

In the vault, access "Backup Items" > "Azure Virtual Machine"
Select the VM
Click "Restore VM"
Select the recovery point
Choose "Create new" as restore type
Configure: new VM name, Resource Group, Virtual Network, Subnet, staging storage account
Click "Restore"

Executing File-Level Recovery:

In the vault, access "Backup Items" > "Azure Virtual Machine"
Select the VM
Click "File Recovery"
Select the recovery point
Download the generated executable script
Run the script on the machine where you want to recover the files (Windows: .exe; Linux: Python script)
The script mounts the backup disk as a local volume
Copy the necessary files
Click "Unmount Disks" in the portal for unmounting

6.2 Azure CLI

Enable protection on a VM:

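# Names used by the examples in this section
VAULT_NAME="rsv-prod-brazilsouth"
RG="rg-backup-prod"
VM_NAME="vm-producao-01"
VM_RG="rg-app-prod"
POLICY_NAME="policy-vm-prod-daily"

# Enable backup on VM. The VM lives in a different resource group from the
# vault, so pass its full resource ID instead of just its name.
VM_ID=$(az vm show --resource-group "$VM_RG" --name "$VM_NAME" --query id -o tsv)

az backup protection enable-for-vm \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --vm "$VM_ID" \
  --policy-name "$POLICY_NAME"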

Execute on-demand backup:

# Trigger on-demand backup
az backup protection backup-now \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --retain-until "31-12-2025"
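The same container and item names are repeated in every command of this section. A hypothetical pair of helpers (`iaasvm_container_name`, `iaasvm_item_name`) can build them from the VM's resource group and name, following the pattern shown above. Because the names contain semicolons, which the shell treats as command separators, always pass the result in double quotes.

```shell
# Hypothetical helpers: build the Azure VM container and item names in the
# "IaasVMContainer;iaasvmcontainerv2;<vm-rg>;<vm-name>" and
# "VM;iaasvmcontainerv2;<vm-rg>;<vm-name>" format used by the az backup commands.
iaasvm_container_name() {
  printf 'IaasVMContainer;iaasvmcontainerv2;%s;%s\n' "$1" "$2"
}

iaasvm_item_name() {
  printf 'VM;iaasvmcontainerv2;%s;%s\n' "$1" "$2"
}
```

For example, `CONTAINER=$(iaasvm_container_name "$VM_RG" "$VM_NAME")` followed by `--container-name "$CONTAINER"` works in any of the commands here.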

List recovery points:

az backup recoverypoint list \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --workload-type VM \
  --output table

Restore disks (Restore Disks):

# Get the name of the most recent recovery point (sorted explicitly by time)
RECOVERY_POINT_ID=$(az backup recoverypoint list \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --workload-type VM \
  --query "sort_by(@, &properties.recoveryPointTime)[-1].name" -o tsv)

# Execute disk restore. The staging storage account receives the VM
# configuration and deployment template; --target-resource-group restores
# managed disks as managed disks into that resource group
az backup restore restore-disks \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --rp-name $RECOVERY_POINT_ID \
  --storage-account "stagebackuprestore" \
  --target-resource-group "rg-restore-staging"

Monitor jobs:

# List recent jobs
az backup job list \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --output table

# Check specific job details
az backup job show \
  --resource-group $RG \
  --vault-name $VAULT_NAME \
  --name <job-id>
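To block a script until a job ends, you can poll its status. A hypothetical sketch (`wait_for_job` is not part of the CLI) that re-reads the status with `az backup job show`, assuming the status string is exposed at `properties.status` in the job JSON; the CLI also has a built-in `az backup job wait` that can be used instead.

```shell
# Hypothetical polling loop: re-reads a backup/restore job's status until it
# leaves the in-progress states. Prints the final status; returns 0 for
# Completed / CompletedWithWarnings and 1 for anything else.
wait_for_job() {
  local rg="$1" vault="$2" job="$3" interval="${4:-60}"
  local status
  while :; do
    status=$(az backup job show \
      --resource-group "$rg" \
      --vault-name "$vault" \
      --name "$job" \
      --query properties.status -o tsv) || return 1
    case "$status" in
      InProgress|Cancelling) sleep "$interval" ;;
      Completed|CompletedWithWarnings) echo "$status"; return 0 ;;
      *) echo "$status"; return 1 ;;
    esac
  done
}
```

Treating CompletedWithWarnings as success mirrors the job status table in section 11, where warnings call for investigation rather than a failed run.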

6.3 Azure PowerShell

Enable protection:

$vault = Get-AzRecoveryServicesVault `
  -ResourceGroupName "rg-backup-prod" `
  -Name "rsv-prod-brazilsouth"

Set-AzRecoveryServicesVaultContext -Vault $vault

$policy = Get-AzRecoveryServicesBackupProtectionPolicy `
  -Name "policy-vm-prod-daily"

$vm = Get-AzVM `
  -ResourceGroupName "rg-app-prod" `
  -Name "vm-producao-01"

Enable-AzRecoveryServicesBackupProtection `
  -ResourceGroupName $vm.ResourceGroupName `
  -Name $vm.Name `
  -Policy $policy

On-demand backup:

$container = Get-AzRecoveryServicesBackupContainer `
  -ContainerType AzureVM `
  -FriendlyName "vm-producao-01"

$item = Get-AzRecoveryServicesBackupItem `
  -Container $container `
  -WorkloadType AzureVM

Backup-AzRecoveryServicesBackupItem `
  -Item $item `
  -ExpiryDateTimeUTC (Get-Date).ToUniversalTime().AddDays(30)

List points and restore:

# List recovery points
$rps = Get-AzRecoveryServicesBackupRecoveryPoint `
  -Item $item

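# Select the most recent point (sort explicitly instead of relying on list order)
$rp = $rps | Sort-Object -Property RecoveryPointTime -Descending | Select-Object -First 1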

# Restore disks to storage account
$storageAccount = Get-AzStorageAccount `
  -ResourceGroupName "rg-restore-staging" `
  -Name "stagebackuprestore"

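# -TargetResourceGroupName restores managed disks as managed disks;
# -RestoreOnlyOSDisk is a switch, so omitting it restores all disks
Restore-AzRecoveryServicesBackupItem `
  -RecoveryPoint $rp `
  -StorageAccountName $storageAccount.StorageAccountName `
  -StorageAccountResourceGroupName $storageAccount.ResourceGroupName `
  -TargetResourceGroupName "rg-restore-staging" `
  -VaultId $vault.ID `
  -VaultLocation $vault.Location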

7. Control and Security

RBAC for backup and restore operations

Operation	Minimum required roles
Enable protection on VM	Backup Operator on the vault + Virtual Machine Contributor on the VM
Execute on-demand backup	Backup Operator
Restore VM (Create New)	Backup Operator + write permissions on the destination resource group
Restore disks	Backup Operator + Storage Account Contributor on the staging storage account
Replace Existing	Backup Operator + Virtual Machine Contributor on the VM
File Recovery	Backup Operator
Stop protection	Backup Contributor
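Role assignments like these can be scripted with `az role assignment create`. A hypothetical sketch for the "Restore disks" row, assuming you already have the operator's object ID or sign-in name and the full resource IDs of the vault and of the staging storage account. It uses the built-in role names Backup Operator and Storage Account Contributor, each scoped to a single resource rather than a whole subscription.

```shell
# Hypothetical helper: grant the roles needed to restore disks.
# Args: assignee (object ID or sign-in name), vault resource ID,
#       staging storage account resource ID.
grant_disk_restore_roles() {
  local assignee="$1" vault_id="$2" staging_sa_id="$3"
  # Role 1: trigger backups and restores against this vault only
  az role assignment create \
    --assignee "$assignee" \
    --role "Backup Operator" \
    --scope "$vault_id" || return 1
  # Role 2: write restored disks and VM configuration to the staging account
  az role assignment create \
    --assignee "$assignee" \
    --role "Storage Account Contributor" \
    --scope "$staging_sa_id"
}
```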

Multi-user Authorization (MUA)

Azure Backup supports Multi-user Authorization: critical operations like disabling soft delete, stopping protection with data deletion, or modifying vault security settings may require approval from a second administrator.

MUA relies on a Resource Guard, a separate resource that should live in a different subscription (or tenant) and be managed by a security team distinct from the backup team. This enforces segregation of duties for destructive operations.

Protection against unauthorized restoration

Restore creates real resources (VMs, disks, files) that consume cost and may expose sensitive data. Therefore:

Grant the Backup Operator role only to people who actually need to run restores
Monitor restore jobs via Azure Monitor and configure alerts for any restore operation
Use Resource Locks on production vaults to prevent accidental deletion

8. Decision Making

Which restore type to use

Scenario	Restore Modality	Reason
Corrupted VM, need to restore everything without losing the original	Create New VM	Creates parallel VM; original intact for comparison
Individual file accidentally deleted	File-Level Recovery	Much faster; no need to restore entire VM
Corrupted OS disk, VM exists but won't boot	Replace Existing	Replaces existing VM disk; more direct
Migration or VM reconfiguration before restoring	Restore Disks	More control; allows manual VM reconfiguration
Primary region unavailable due to disaster	Cross Region Restore	Only available method to restore in secondary region
Need to test restore without impacting production	Create New VM	Restoration in isolated environment

Scheduled vs on-demand backup

Situation	Approach	Reason
Planned maintenance or critical deployment	On-demand	Guarantees recovery point at exact moment before change
Continuous workload protection	Scheduled	Automatic, consistent, no manual intervention
Audit or compliance on specific date	On-demand	Creates point with specific retention for that date
Maintenance window outside scheduled hours	On-demand	Complements standard scheduling

9. Best Practices

Always run an on-demand backup before significant changes: critical security patches, application updates, configuration changes, and migrations all call for a current recovery point, regardless of the schedule.

Test restores regularly in an isolated environment: untested backups are of unknown value. Schedule quarterly restore tests using "Create New VM" in a sandbox Resource Group, verifying data integrity and execution time.

Use File-Level Recovery for targeted recoveries: recovering a single file via File Recovery is much faster than restoring an entire VM. Train the team to use this path before triggering a complete restore.

Monitor the first backup of each new item: the first backup is the most critical and the most time-consuming. Actively verify the first backup job when enabling protection on new resources.

Document actual RTO: a theoretical RTO (based on estimates) frequently differs from the actual RTO (based on tests). Run a real restore and time it, so the documented RTO reflects measured behavior.

Use a separate staging Resource Group for restores: disk restores produce artifacts (VMs, disks, configuration files) that need to be managed. Keep them in a dedicated Resource Group for restore staging, and clean it up automatically after validation.
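To turn the "Document actual RTO" practice into numbers, record the start and end time of each test restore and compute the elapsed time. A minimal sketch, assuming GNU `date` for parsing ISO 8601 timestamps; the function name is hypothetical, and the timestamps could come from the restore job's details (property names such as `properties.startTime` and `properties.endTime` in `az backup job show` output are an assumption).

```shell
# Hypothetical helper: whole minutes elapsed between two ISO-8601 timestamps,
# e.g. the start and end of a test restore. Assumes GNU date for parsing.
elapsed_minutes() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 60 ))
}
```

Recording this value after every quarterly test gives a measured RTO trend instead of a single estimate.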

10. Common Errors

Error: expecting restore to be instantaneous
Why it happens: confusion between the speed of Instant Restore (snapshot) and restore from the vault.
How to avoid: remember that Instant Restore is only available within the snapshot retention period (1 to 5 days for Standard, 1 to 30 for Enhanced). Outside this period, the restore comes from the vault and may take hours.

Error: using Replace Existing when the VM is shut down
Why it happens: the operator shuts down the VM before restoring, thinking it's safer.
How to avoid: remember that Replace Existing requires the VM to be running. For a VM that is shut down, use Restore Disks or Create New VM.

Error: not preparing a staging storage account before a disk restore
Why it happens: disk restore requires a storage account in the same region as the vault, and operators forget to create it beforehand.
How to avoid: include staging storage creation in the restore playbook. Ideally, keep a dedicated staging storage account permanently available.

Error: using File Recovery instead of a full restore for very large files
Why it happens: File Recovery mounts the disks as volumes, and for very large files (tens of GB) copying over the network can be as slow as a complete restore.
How to avoid: for large file recoveries (more than a few GB), evaluate whether Restore Disks would be more efficient.

Error: not verifying integrity after restore
Why it happens: the operator assumes that "restore completed" means "data is intact".
How to avoid: always validate a restore: start the VM, check services, and confirm the integrity of critical data before declaring the restore successful.

Error: restoring directly over production without parallel testing
Why it happens: pressure for speed during incidents.
How to avoid: use "Create New VM" to validate the recovery point in parallel before running Replace Existing or decommissioning the current VM.

11. Operation and Maintenance

Monitoring backup jobs

In the portal, access the vault and navigate to Backup Jobs. Possible states:

Job Status	Meaning	Required Action
Completed	Job finished successfully	None
Completed with warnings	Job completed but with warnings (e.g., open file not included)	Investigate warning
In Progress	Job running	Wait; monitor duration
Failed	Job completely failed	Investigate cause; check logs
Cancelled	Job manually cancelled	Verify if intentional
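The table above can drive a simple triage step in scripts. A hypothetical sketch that reads one job status per line on stdin (for example, the output of `az backup job list --resource-group "$RG" --vault-name "$VAULT_NAME" --query "[].properties.status" -o tsv`; the `properties.status` path and the `CompletedWithWarnings` spelling are assumptions about the CLI output), prints a count and the table's action for each status, and exits non-zero when any job failed.

```shell
# Hypothetical triage helper for backup job statuses read from stdin.
# Output: one line per status -> status<TAB>count<TAB>required action.
# Exit status: 1 if any job Failed, 0 otherwise (usable as a pipeline gate).
triage_jobs() {
  awk '
    NF { count[$1]++ }
    END {
      action["Completed"] = "none"
      action["CompletedWithWarnings"] = "investigate warning"
      action["InProgress"] = "wait; monitor duration"
      action["Failed"] = "investigate cause; check logs"
      action["Cancelled"] = "verify if intentional"
      for (s in count) {
        a = (s in action) ? action[s] : "unknown status"
        printf "%s\t%d\t%s\n", s, count[s], a
      }
      exit (("Failed" in count) ? 1 : 0)
    }'
}
```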

Common causes of backup failure

Cause	Symptom	Resolution
VMSnapshot extension outdated or corrupted	Failure in snapshot phase	Re-install extension in portal
VM without connectivity to Azure Backup endpoint	Timeout during transfer	Check NSG, UDR and Private Endpoints
Disk exceeding supported limit	Disk backup failure	Check maximum supported size (32 TB)
Staging storage account deleted	Disk restore failure	Recreate staging storage account
Insufficient permissions	403 error in logs	Review RBAC for operator and extension

Important operational limits

Limit	Value
Maximum disk size for VM backup	32 TB
Recovery points per protected item (VMs)	9999
Instant Restore window (Standard)	1 to 5 days
Instant Restore window (Enhanced)	1 to 30 days
File Recovery mounted volume duration	12 hours (default), maximum 24h
Maximum on-demand backups per day per item	9
Minimum retention time for on-demand	1 day

12. Integration and Automation

On-demand backup automation before deploys

A common pattern is integrating on-demand backup in CI/CD pipelines, ensuring a recovery point is created before any production deploy:
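A minimal sketch of such a pipeline step in Bash, assuming an authenticated `az` session and the container/item naming used in section 6.2. The function name `predeploy_backup` is hypothetical, and the sketch assumes that `az backup job wait` accepts a `--timeout` in seconds and that the job status is at `properties.status`.

```shell
# Hypothetical pipeline step: take an on-demand backup of a VM before a
# deploy and fail the step unless the backup job finishes successfully.
# Args: vault RG, vault name, VM RG, VM name, retain-until date (d-m-Y).
predeploy_backup() {
  local rg="$1" vault="$2" vm_rg="$3" vm="$4" retain_until="$5"
  local job status
  # Start the on-demand backup and capture the job name (job ID)
  job=$(az backup protection backup-now \
    --resource-group "$rg" \
    --vault-name "$vault" \
    --container-name "IaasVMContainer;iaasvmcontainerv2;$vm_rg;$vm" \
    --item-name "VM;iaasvmcontainerv2;$vm_rg;$vm" \
    --backup-management-type AzureIaasVM \
    --retain-until "$retain_until" \
    --query name -o tsv) || return 1
  # Block until the job ends or the timeout (seconds) is reached
  az backup job wait --resource-group "$rg" --vault-name "$vault" \
    --name "$job" --timeout 7200 >/dev/null || return 1
  # Decide whether the deploy may proceed based on the final job status
  status=$(az backup job show --resource-group "$rg" --vault-name "$vault" \
    --name "$job" --query properties.status -o tsv) || return 1
  case "$status" in
    Completed|CompletedWithWarnings) echo "backup $job: $status" ;;
    *) echo "backup $job: $status; aborting deploy" >&2; return 1 ;;
  esac
}
```

Waiting for the whole job includes the slow transfer-to-vault phase, which can take hours on large VMs. Because Instant Restore is available once the snapshot phase finishes, a pipeline might reasonably proceed earlier; this sketch does not detect the phase boundary and simply waits for the full job.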


Automated alerts via Azure Monitor

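Route alerts raised on the vault (such as backup job failures) to an action group so failures are noticed without manual monitoring. Azure Backup can raise built-in Azure Monitor alerts for job failures on the vault; an alert processing rule scoped to the vault attaches an action group to them:

# Resource ID of the vault whose alerts will be routed
VAULT_ID=$(az backup vault show \
  --name rsv-prod-brazilsouth \
  --resource-group rg-backup-prod \
  --query id -o tsv)

# Attach the operations action group to alerts fired on the vault
az monitor alert-processing-rule create \
  --name "apr-backup-failure" \
  --resource-group rg-backup-prod \
  --scopes "$VAULT_ID" \
  --rule-type AddActionGroups \
  --action-groups /subscriptions/{sub}/resourceGroups/rg-backup-prod/providers/microsoft.insights/actionGroups/ag-backup-ops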

Runbooks for automated restore

For DR (Disaster Recovery) scenarios with aggressive RTO, it's possible to create Azure Automation Runbooks that execute restore automatically when triggered by an alert or webhook. The runbook calls Azure Backup APIs via PowerShell to:

Identify the most recent recovery point
Trigger disk restore to staging
Create new VM from disks
Execute post-restore validation scripts
Redirect traffic (via Azure Load Balancer or DNS) to the new VM

13. Final Summary

What backup and restore operations are: the set of actions that enable protection on Azure resources, run scheduled and on-demand backups, and recover data from the recovery points stored in the vault.

Essential points:

Enabling protection associates the resource with the vault and policy; the first backup is always full and may take hours
On-demand backup creates an immediate recovery point with retention defined at execution; does not cancel the next scheduled backup
There are four restore modes for VMs: Create New VM, Restore Disks, Replace Existing and Cross Region Restore
File-Level Recovery mounts the recovery point as local volume for up to 24 hours for individual file recovery
Instant Restore uses the snapshot tier for fast restoration; outside the snapshot period, restore comes from vault and is slower

Critical differences between restore modes:

Mode	Original VM affected	VM must be running	Speed
Create New VM	No	Not required	Moderate
Restore Disks	No	Not required	Moderate
Replace Existing	Yes (disks replaced)	Yes, mandatory	Most direct
Cross Region Restore	No	Not required	Slower (data from secondary region)
File Recovery	No	Not required	Fast for small files

What needs to be remembered for AZ-104:

Replace Existing requires VM to be running
File Recovery mounts volume for maximum 24 hours
Cross Region Restore requires vault with GRS and CRR enabled
First backup is always full; subsequent ones are incremental (under both Standard and Enhanced policies)
Maximum 9 on-demand backups per day per protected item
On-demand backup doesn't replace next scheduled backup; both occur
Instant Restore is only available within snapshot retention period configured in policy
