Theoretical Foundation: Perform Backup and Restore Operations by Using Azure Backup
1. Initial Intuition
In the previous topics, you created the vault and configured the policies. Now it's time to put everything into operation: execute backups and restore data when needed.
Think of the complete cycle like this: the vault is the safe, the policy is the storage contract, and the backup and restore operations are the acts of depositing and withdrawing. Knowing how to create the safe and sign the contract isn't enough; you need to know how to deposit at the right time, how to withdraw in an emergency, and what withdrawal options exist.
In practice, Azure Backup offers two operational paths:
- Scheduled backup: executed automatically according to the configured policy. You don't need to do anything after configuring the protected item.
- On-demand backup: executed manually outside of the schedule. Used before critical changes, for testing, or to meet specific requirements.
Restoration, in turn, isn't simply "undoing everything." There are multiple restore modalities, each suitable for a specific scenario. Choosing the wrong option can be more time-consuming or may overwrite data that's still good.
2. Context
Backup and restore operations are at the heart of the data protection journey. All the effort to create vaults and policies exists so that when something goes wrong, you can restore with confidence.
The context for AZ-104 is clear: you need to know not only that Azure Backup exists, but how to enable protection on specific items, trigger manual backups, choose the correct type of restore, and monitor backup and restore jobs.
3. Concept Construction
3.1 Enabling Protection on an Item
Before any backup happens, you need to associate the resource with a vault and a policy. This process is called Enable Backup or Configure Backup.
For Azure VMs, the process involves:
- Select the vault (which must be in the same region as the VM)
- Choose the applicable backup policy
- Confirm the association
From this moment, the first backup is scheduled automatically according to the policy. The first backup is always a full copy; subsequent backups are incremental, transferring only the data that changed since the previous backup.
An important behavior: when protection is enabled on a VM, Azure Backup installs the VMSnapshot (Windows) or VMSnapshotLinux (Linux) extension through the Azure VM Agent, so the agent must be installed and running. This extension coordinates consistent snapshot capture with the operating system. If the extension cannot be installed, the backup fails.
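To verify the extension from the command line, a minimal Bash sketch could look like the one below. The function names are hypothetical, and it assumes the extension instance on the VM is named after its type (`VMSnapshot` or `VMSnapshotLinux`); confirm that with `az vm extension list` before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the Azure Backup VM extension.

# Prints the backup extension name for an OS type as reported by Azure (Windows or Linux).
backup_extension_for_os() {
  case "$1" in
    Windows) echo "VMSnapshot" ;;
    Linux)   echo "VMSnapshotLinux" ;;
    *)       echo "unknown OS type: $1" >&2; return 1 ;;
  esac
}

# Prints the provisioning state (for example "Succeeded") of the backup extension.
# Assumption: the extension instance on the VM carries the same name as its type.
backup_extension_state() {
  local vm_rg="$1" vm_name="$2" os_type ext
  os_type=$(az vm show --resource-group "$vm_rg" --name "$vm_name" \
    --query storageProfile.osDisk.osType --output tsv) || return 1
  ext=$(backup_extension_for_os "$os_type") || return 1
  az vm extension show --resource-group "$vm_rg" --vm-name "$vm_name" \
    --name "$ext" --query provisioningState --output tsv
}
```

Anything other than `Succeeded` is worth investigating before the first scheduled backup runs.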
3.2 Restore Types for VMs
This is the most important concept in this section. Azure Backup offers four restore modalities for VMs:
1. Restore the VM (Create New VM): creates a new VM from the recovery point. The original VM remains intact. Useful for comparing states, testing restoration without impacting production, or recovering in parallel.
2. Restore Disks: restores VM disks to a storage account without automatically creating the VM. You can then manually create a VM from the disks, or attach the disks to an existing VM. More flexible, but requires additional steps.
3. Replace Existing (In-Place Restore): replaces the OS disk or data disks of the existing VM with data from the recovery point. The VM must be running for this operation. More direct for problem fixing, but destructive: current disk data is replaced.
4. Cross Region Restore (CRR): restores to a secondary region from a replicated recovery point via GRS. Available only when the vault uses GRS and CRR is enabled. Used in regional disaster scenarios.
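The CRR precondition fits in a small check. This Bash sketch uses a hypothetical function name and assumes you already read the vault's storage redundancy (`GeoRedundant`, `LocallyRedundant` or `ZoneRedundant`) and its CRR flag, for example from the portal or from `az backup vault backup-properties show`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when Cross Region Restore can be used.
# $1: vault storage redundancy; $2: "true" if CRR is enabled on the vault.
crr_available() {
  local redundancy="$1" crr_enabled="$2"
  if [ "$redundancy" != "GeoRedundant" ]; then
    echo "CRR unavailable: vault redundancy is $redundancy, GRS is required" >&2
    return 1
  fi
  if [ "$crr_enabled" != "true" ]; then
    echo "CRR unavailable: vault uses GRS but CRR is not enabled" >&2
    return 1
  fi
  return 0
}
```

Running this check in a DR runbook before attempting CRR turns a late failure into an immediate, explicit message.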
3.3 File-Level Recovery (Item-Level Restore)
Besides restoring the entire VM or disks, Azure Backup allows recovery of individual files without restoring the complete VM.
The process is: the portal generates a script; when you run it on a machine, it mounts the recovery point as local volumes, and you browse the files and copy only what you need. The volume remains accessible for 12 hours by default (maximum of 24 hours).
This method is drastically faster when you need a specific file, like an accidentally deleted document or a corrupted configuration.
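The mount window is simple time arithmetic. A hypothetical Bash helper that computes when a mount is expected to end, assuming GNU `date` and the 12-hour default and 24-hour maximum stated above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints when a File Recovery mount is expected to end.
# $1: mount start time (any timestamp GNU date can parse, interpreted as UTC)
# $2: requested duration in hours (defaults to 12; capped at the 24-hour maximum)
file_recovery_unmount_time() {
  local start_epoch hours="${2:-12}"
  if [ "$hours" -gt 24 ]; then
    hours=24
  fi
  start_epoch=$(date -u -d "$1" +%s) || return 1
  date -u -d "@$((start_epoch + hours * 3600))" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, `file_recovery_unmount_time "2025-01-01 08:00:00"` prints `2025-01-01T20:00:00Z`.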
3.4 On-Demand Backup
An on-demand backup is a backup executed immediately, outside the policy schedule. It does not replace the next scheduled backup; both will be executed.
The retention of an on-demand backup is configured at execution time. You define how many days that specific point should be retained, regardless of policy rules.
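Because the retention is fixed when the backup is triggered, scripts usually compute an expiry date first. This hypothetical Bash helper produces a date in the `dd-mm-yyyy` UTC format that `az backup protection backup-now --retain-until` expects, and rejects values under the 1-day minimum; it assumes GNU `date`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a --retain-until value (dd-mm-yyyy, UTC) lying
# $1 days after the base date $2 (defaults to now, UTC).
retain_until_date() {
  local days="$1" base="${2:-now}" base_epoch
  if [ "$days" -lt 1 ]; then
    echo "retention must be at least 1 day" >&2
    return 1
  fi
  base_epoch=$(date -u -d "$base" +%s) || return 1
  date -u -d "@$((base_epoch + days * 86400))" +%d-%m-%Y
}
```

A typical call is `--retain-until "$(retain_until_date 30)"` on the `backup-now` command.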
Common use cases:
- Before applying a critical patch or operating system update
- Before migrating data or executing a destructive script
- To create a recovery point outside the standard scheduling window
- To meet an audit requirement for a specific point in time
3.5 Stop Protection
Azure Backup offers two ways to stop item protection:
Stop protection and retain data: stops backup scheduling but keeps all existing recovery points. You continue to be charged for storing the points. Useful when you decommission a resource temporarily but want to keep the ability to restore it.
Stop protection and delete data: stops scheduling and schedules deletion of all recovery points. With Soft Delete enabled, data is retained for 14 days by default before permanent deletion.
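Both options map to a single CLI command, `az backup protection disable`, differing only in `--delete-backup-data`. The wrapper below is a hypothetical sketch; the `DRY_RUN` switch is an assumption added so the command can be printed and reviewed before it runs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `az backup protection disable` for an Azure VM.
# $1: "retain" or "delete"; $2/$3: vault resource group and name; $4/$5: VM resource group and name.
# Set DRY_RUN=1 to print the command instead of running it.
stop_vm_protection() {
  local mode="$1" vault_rg="$2" vault="$3" vm_rg="$4" vm="$5" delete_flag
  case "$mode" in
    retain) delete_flag=false ;;
    delete) delete_flag=true ;;
    *) echo "mode must be 'retain' or 'delete'" >&2; return 1 ;;
  esac
  local cmd=(az backup protection disable
    --resource-group "$vault_rg" --vault-name "$vault"
    --container-name "IaasVMContainer;iaasvmcontainerv2;$vm_rg;$vm"
    --item-name "VM;iaasvmcontainerv2;$vm_rg;$vm"
    --backup-management-type AzureIaasVM
    --delete-backup-data "$delete_flag" --yes)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Requiring an explicit `retain` or `delete` argument keeps the destructive path from being the default.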
4. Structural View
5. Practical Functionality
Scheduled backup flow for VM
Quiescing is the process of putting the file system in a consistent state before the snapshot. On Windows, this is done via VSS (Volume Shadow Copy Service). On Linux, it's done via configurable pre/post scripts or via file system freeze. Without quiescing, the snapshot may capture data in an inconsistent state, which can cause corruption when restoring.
Non-obvious behaviors
The first backup can take hours: the initial backup is a full copy of the disk data, so its duration grows with the amount of data. A VM with a 500 GB disk will take much longer than one with 30 GB. Subsequent backups are incremental and much faster.
The backup job has two phases: the first phase creates the snapshot (fast, usually minutes). The second phase transfers data to the vault (slow, can take hours depending on size and bandwidth). Instant Restore becomes available after the first phase.
Restore isn't instantaneous from the vault tier: restoring from the vault (not from a snapshot) can take hours for large VMs. Restore from the Instant Restore tier (snapshot) is much faster, but only available during the snapshot retention period (1 to 5 days for Standard, 1 to 30 days for Enhanced).
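The snapshot-versus-vault rule can be written as two small checks. This Bash sketch uses hypothetical function names and simplifies recovery-point age to whole days; treating a point whose age equals the retention as already expired is an assumption, not documented behavior.

```bash
#!/usr/bin/env bash
# Succeeds if the snapshot retention fits the policy type's range
# (1 to 5 days for Standard, 1 to 30 days for Enhanced).
snapshot_retention_valid() {
  local policy_type="$1" days="$2" max
  case "$policy_type" in
    Standard) max=5 ;;
    Enhanced) max=30 ;;
    *) return 1 ;;
  esac
  [ "$days" -ge 1 ] && [ "$days" -le "$max" ]
}

# Prints "snapshot" if the point should still be in the Instant Restore tier,
# otherwise "vault". $1: recovery point age in days; $2: snapshot retention in days.
restore_source_tier() {
  local rp_age_days="$1" snapshot_retention_days="$2"
  if [ "$rp_age_days" -lt "$snapshot_retention_days" ]; then
    echo "snapshot"
  else
    echo "vault"
  fi
}
```

When `restore_source_tier` prints `vault`, plan for a restore measured in hours rather than minutes.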
Replace Existing requires the VM to be running: this behavior surprises many. Azure needs the VM running to safely coordinate disk replacement.
6. Implementation Methods
6.1 Azure Portal
Enabling protection on a VM:
- Access the Recovery Services Vault
- Click "Backup"
- In "Where is your workload running?": select "Azure"
- In "What do you want to backup?": select "Virtual machine"
- Click "Backup"
- Select the backup policy
- Select the VMs to protect
- Click "Enable Backup"
Executing on-demand backup:
- In the vault, access "Backup Items" > "Azure Virtual Machine"
- Select the desired VM
- Click "Backup Now"
- Define the retention date for the created point
- Confirm
Executing VM restore (Create New VM):
- In the vault, access "Backup Items" > "Azure Virtual Machine"
- Select the VM
- Click "Restore VM"
- Select the recovery point
- Choose "Create new" as restore type
- Configure: new VM name, Resource Group, Virtual Network, Subnet, staging storage account
- Click "Restore"
Executing File-Level Recovery:
- In the vault, access "Backup Items" > "Azure Virtual Machine"
- Select the VM
- Click "File Recovery"
- Select the recovery point
- Download the generated executable script
- Run the script on a machine with a compatible operating system (Windows: executable; Linux: script)
- The script mounts the backup disk as a local volume
- Copy the necessary files
- Click "Unmount Disks" in the portal for unmounting
6.2 Azure CLI
Enable protection on a VM:
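```bash
# Variables used by the commands in this section
VAULT_NAME="rsv-prod-brazilsouth"
RG="rg-backup-prod"
VM_NAME="vm-producao-01"
VM_RG="rg-app-prod"
POLICY_NAME="policy-vm-prod-daily"
# Enable backup on the VM. The VM is in a different resource group than the
# vault, so pass its full resource ID instead of just its name.
VM_ID=$(az vm show --resource-group "$VM_RG" --name "$VM_NAME" --query id --output tsv)
az backup protection enable-for-vm \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --vm "$VM_ID" \
  --policy-name "$POLICY_NAME"
```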
Execute on-demand backup:
```bash
# Trigger on-demand backup; --retain-until uses dd-mm-yyyy (UTC)
az backup protection backup-now \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --retain-until "31-12-2025"
```
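The container and item names used in these commands follow a fixed pattern, so a small hypothetical helper can build them for any VM instead of repeating the strings. Recent CLI versions may also accept the VM's friendly name for `--container-name` and `--item-name`; check `az backup container list` output if unsure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the container name (line 1) and item name (line 2)
# that az backup commands expect for an Azure VM.
# $1: resource group of the VM; $2: VM name.
vm_backup_names() {
  local vm_rg="$1" vm_name="$2"
  printf 'IaasVMContainer;iaasvmcontainerv2;%s;%s\n' "$vm_rg" "$vm_name"
  printf 'VM;iaasvmcontainerv2;%s;%s\n' "$vm_rg" "$vm_name"
}
```

For example: `CONTAINER=$(vm_backup_names "$VM_RG" "$VM_NAME" | sed -n 1p)` and `ITEM=$(vm_backup_names "$VM_RG" "$VM_NAME" | sed -n 2p)`.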
List recovery points:
```bash
az backup recoverypoint list \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --workload-type VM \
  --output table
```
Restore disks (Restore Disks):
```bash
# Get the name of the most recent recovery point
RECOVERY_POINT_ID=$(az backup recoverypoint list \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --backup-management-type AzureIaasVM \
  --workload-type VM \
  --query "[0].name" -o tsv)
# Restore the disks through the staging storage account; for a VM with managed
# disks, --target-resource-group receives the restored managed disks
az backup restore restore-disks \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --item-name "VM;iaasvmcontainerv2;$VM_RG;$VM_NAME" \
  --rp-name "$RECOVERY_POINT_ID" \
  --storage-account "stagebackuprestore" \
  --target-resource-group "rg-restore-staging" \
  --restore-to-staging-storage-account true
```
Monitor jobs:
```bash
# List recent jobs
az backup job list \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --output table
# Check specific job details (replace <job-id> with a job name from the list)
az backup job show \
  --resource-group "$RG" \
  --vault-name "$VAULT_NAME" \
  --name <job-id>
```
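For scripts that must block until a job finishes, a polling loop over `az backup job show` is one option (the CLI also provides `az backup job wait`). The function below is a hypothetical sketch; it assumes `--query properties.status` returns one of the API's job states (`InProgress`, `Completed`, `CompletedWithWarnings`, `Failed`, `Cancelled`, `Cancelling`), and `POLL_SECONDS` is an assumed tuning variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: polls a backup job until it reaches a terminal state.
# Returns 0 for Completed or CompletedWithWarnings, 1 for Failed or Cancelled.
# POLL_SECONDS controls the wait between checks (default 60).
wait_for_backup_job() {
  local rg="$1" vault="$2" job_id="$3" status
  while true; do
    status=$(az backup job show --resource-group "$rg" --vault-name "$vault" \
      --name "$job_id" --query properties.status --output tsv) || return 1
    case "$status" in
      Completed) return 0 ;;
      CompletedWithWarnings) echo "job $job_id completed with warnings" >&2; return 0 ;;
      Failed|Cancelled) echo "job $job_id ended as $status" >&2; return 1 ;;
      *) sleep "${POLL_SECONDS:-60}" ;;  # InProgress, Cancelling: keep waiting
    esac
  done
}
```

Treating `CompletedWithWarnings` as success but logging it mirrors the job table in section 11: the job finished, yet the warning still deserves investigation.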
6.3 Azure PowerShell
Enable protection:
```powershell
$vault = Get-AzRecoveryServicesVault `
  -ResourceGroupName "rg-backup-prod" `
  -Name "rsv-prod-brazilsouth"
Set-AzRecoveryServicesVaultContext -Vault $vault
$policy = Get-AzRecoveryServicesBackupProtectionPolicy `
  -Name "policy-vm-prod-daily"
# Look up the VM first so a wrong name fails here, then reuse its properties
$vm = Get-AzVM `
  -ResourceGroupName "rg-app-prod" `
  -Name "vm-producao-01"
Enable-AzRecoveryServicesBackupProtection `
  -ResourceGroupName $vm.ResourceGroupName `
  -Name $vm.Name `
  -Policy $policy
```
On-demand backup:
```powershell
$container = Get-AzRecoveryServicesBackupContainer `
  -ContainerType AzureVM `
  -FriendlyName "vm-producao-01"
$item = Get-AzRecoveryServicesBackupItem `
  -Container $container `
  -WorkloadType AzureVM
# -ExpiryDateTimeUTC expects a UTC time, so convert before adding the retention
Backup-AzRecoveryServicesBackupItem `
  -Item $item `
  -ExpiryDateTimeUTC (Get-Date).ToUniversalTime().AddDays(30)
```
List points and restore:
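```powershell
# List recovery points (returned newest first)
$rps = Get-AzRecoveryServicesBackupRecoveryPoint `
  -Item $item
# Select the most recent
$rp = $rps[0]
# Restore disks through the staging storage account. All disks are restored
# unless the -RestoreOnlyOSDisk switch is passed; for a VM with managed disks,
# -TargetResourceGroupName receives the restored managed disks.
$storageAccount = Get-AzStorageAccount `
  -ResourceGroupName "rg-restore-staging" `
  -Name "stagebackuprestore"
Restore-AzRecoveryServicesBackupItem `
  -RecoveryPoint $rp `
  -StorageAccountName $storageAccount.StorageAccountName `
  -StorageAccountResourceGroupName $storageAccount.ResourceGroupName `
  -TargetResourceGroupName "rg-restore-staging"
```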
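7. Control and Security
RBAC for backup and restore operations
| Operation | Minimum required roles (scope) |
|---|---|
| Enable protection on VM | Backup Operator (vault) + Virtual Machine Contributor (VM) |
| Execute on-demand backup | Backup Operator (vault) |
| Restore VM (Create New) | Backup Operator (vault) + Virtual Machine Contributor (source VM) + Contributor (destination resource group) |
| Restore disks | Backup Operator (vault) + Virtual Machine Contributor (source VM) + Storage Account Contributor (staging storage account) + Contributor (target resource group, for managed disks) |
| Replace Existing | Backup Operator (vault) + Virtual Machine Contributor (VM) |
| File Recovery | Backup Operator (vault) + Virtual Machine Contributor (source VM) |
| Stop protection | Backup Contributor (vault) |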
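Role assignments are easiest to keep minimal when scoped to the vault itself rather than the subscription. A hypothetical Bash wrapper for the Backup Operator assignment, with an assumed `DRY_RUN` switch to print the command; the resource group, VM and storage roles in the table need their own assignments.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: grants the Backup Operator role on a single vault only.
# $1: user, group or service principal; $2: full resource ID of the vault.
# Set DRY_RUN=1 to print the command instead of running it.
grant_backup_operator() {
  local assignee="$1" vault_id="$2"
  local cmd=(az role assignment create --assignee "$assignee"
    --role "Backup Operator" --scope "$vault_id")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

The printed form in dry-run mode drops the quoting around `Backup Operator`; the real invocation passes it as a single argument.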
Multi-user Authorization (MUA)
Azure Backup supports Multi-user Authorization: critical operations like disabling soft delete, stopping protection with data deletion, or modifying vault security settings may require approval from a second administrator.
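MUA relies on a Resource Guard, which should be placed in a separate subscription (or tenant) and managed by a security team distinct from the backup team. Protected operations then require permissions on the Resource Guard in addition to the vault, which implements role segregation for destructive operations.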
Protection against unauthorized restoration
Restore creates real resources (VMs, disks, files) that consume cost and may expose sensitive data. Therefore:
- Limit access to the Backup Operator role only to those who truly need to execute restores
- Monitor restore jobs via Azure Monitor and configure alerts for any restore operation
- Use Resource Locks on production vaults to prevent accidental deletion
8. Decision Making
Which restore type to use
| Scenario | Restore Modality | Reason |
|---|---|---|
| Corrupted VM, need to restore everything without losing the original | Create New VM | Creates parallel VM; original intact for comparison |
| Individual file accidentally deleted | File-Level Recovery | Much faster; no need to restore entire VM |
| Corrupted OS disk, VM exists but won't boot | Replace Existing | Replaces existing VM disk; more direct |
| Migration or VM reconfiguration before restoring | Restore Disks | More control; allows manual VM reconfiguration |
| Primary region unavailable due to disaster | Cross Region Restore | Only available method to restore in secondary region |
| Need to test restore without impacting production | Create New VM | Restoration in isolated environment |
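In a runbook or incident checklist, the table can be encoded as a lookup. The scenario keywords below are hypothetical labels chosen for this sketch; only the restore modes come from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the restore decision table.
# $1: one of the scenario keywords matched in the case statement.
choose_restore_mode() {
  case "$1" in
    single-file-deleted)        echo "File-Level Recovery" ;;
    os-disk-corrupted)          echo "Replace Existing" ;;
    reconfigure-before-restore) echo "Restore Disks" ;;
    primary-region-down)        echo "Cross Region Restore" ;;
    vm-corrupted-keep-original|test-restore) echo "Create New VM" ;;
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}
```

Failing on unknown scenarios forces the operator back to the table instead of silently picking a destructive default.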
Scheduled vs on-demand backup
| Situation | Approach | Reason |
|---|---|---|
| Planned maintenance or critical deployment | On-demand | Guarantees recovery point at exact moment before change |
| Continuous workload protection | Scheduled | Automatic, consistent, no manual intervention |
| Audit or compliance on specific date | On-demand | Creates point with specific retention for that date |
| Maintenance window outside scheduled hours | On-demand | Complements standard scheduling |
9. Best Practices
Always perform on-demand backup before significant changes: critical security patches, application updates, configuration changes, or migration are moments that require a current recovery point, regardless of scheduling.
Test restore regularly in isolated environment: untested backups are of unknown value. Schedule quarterly restore tests using "Create New VM" in a sandbox Resource Group, verifying data integrity and execution time.
Use File-Level Recovery for granular recoveries: recovering a single file via File Recovery is typically far faster than restoring an entire VM. Train the team to use this path before triggering a complete restore.
Monitor the first backup of each new item: the first backup is the most critical and the most time-consuming. Actively verify the first backup job when enabling protection on new resources.
Document actual RTO: theoretical RTO (based on estimates) frequently differs from actual RTO (based on tests). Execute a real restore and measure the time to have documented and reliable RTO.
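Measuring RTO from a test restore amounts to subtracting the restore job's start time from its end time (the `startTime` and `endTime` properties returned by `az backup job show`). A hypothetical helper, assuming GNU `date` can parse the timestamps you pass in:

```bash
#!/usr/bin/env bash
# Hypothetical helper: whole minutes elapsed between two timestamps,
# for example a restore job's start and end times (interpreted as UTC).
elapsed_minutes() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 60 ))
}
```

Recording this number after each quarterly restore test gives a measured RTO trend instead of a single estimate.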
Use separate staging Resource Group for restores: disk restores produce artifacts (VMs, disks, configuration files) that need to be managed. Use a dedicated Resource Group for restore staging, with automatic cleanup after validation.
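Automatic cleanup is safer with a guard that refuses to delete anything outside the staging naming convention. This sketch assumes staging groups are named with the prefix `rg-restore-staging` (override with the hypothetical `STAGING_PREFIX` variable) and uses an assumed `DRY_RUN` switch.

```bash
#!/usr/bin/env bash
# Hypothetical guard: deletes a restore staging resource group only if its
# name starts with the agreed prefix. Set DRY_RUN=1 to print the command instead.
cleanup_restore_staging() {
  local rg="$1" prefix="${STAGING_PREFIX:-rg-restore-staging}"
  case "$rg" in
    "$prefix"*) ;;
    *) echo "refusing to delete $rg: not a staging resource group" >&2; return 1 ;;
  esac
  local cmd=(az group delete --name "$rg" --yes --no-wait)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Run it only after the restore has been validated; deleting the group removes every restored artifact in it.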
10. Common Errors
Error: expecting restore to be instantaneous. Why it happens: confusion between the speed of Instant Restore (snapshot) and restore from the vault. How to avoid: remember that Instant Restore is only available within the snapshot retention period (1 to 5 days for Standard, 1 to 30 days for Enhanced). Outside this period, restore comes from the vault and may take hours.
Error: using Replace Existing when the VM is shut down. Why it happens: the operator shuts down the VM before restoring, thinking it's safer. How to avoid: remember that Replace Existing requires the VM to be running. For a shut-down VM, use Restore Disks or Create New VM.
Error: not creating a staging storage account before a disk restore. Why it happens: disk restore requires a storage account in the same region as the vault, and operators forget to create it beforehand. How to avoid: include staging storage creation in the restoration playbook or, ideally, maintain a dedicated staging storage account that is always available.
Error: using File Recovery instead of a complete restore for large files. Why it happens: File Recovery mounts the disk as a volume, so copying very large files (tens of GB) over the network can be as slow as a complete restore. How to avoid: for files above a few GB, evaluate whether Restore Disks would be more efficient.
Error: not verifying integrity after restore. Why it happens: the operator assumes "restore completed" means "data intact". How to avoid: always validate the restore: start the VM, verify services, and confirm critical data integrity before declaring the restore successful.
Error: restoring directly over production without parallel testing. Why it happens: pressure for speed during incidents. How to avoid: use "Create New VM" to validate the recovery point in parallel before executing Replace Existing or decommissioning the current VM.
11. Operation and Maintenance
Monitoring backup jobs
In the portal, access the vault and navigate to Backup Jobs. Possible states:
| Job Status | Meaning | Required Action |
|---|---|---|
| Completed | Job finished successfully | None |
| Completed with warnings | Job completed but with warnings (e.g., open file not included) | Investigate warning |
| In Progress | Job running | Wait; monitor duration |
| Failed | Job completely failed | Investigate cause; check logs |
| Cancelled | Job manually cancelled | Verify if intentional |
Common causes of backup failure
| Cause | Symptom | Resolution |
|---|---|---|
| VMSnapshot extension outdated or corrupted | Failure in snapshot phase | Uninstall the extension; Azure Backup reinstalls it on the next backup |
| VM without connectivity to Azure Backup endpoint | Timeout during transfer | Check NSG, UDR and Private Endpoints |
| Disk exceeding supported limit | Disk backup failure | Check maximum supported size (32 TB) |
| Staging storage account deleted | Disk restore failure | Recreate staging storage account |
| Insufficient permissions | 403 error in logs | Review RBAC for operator and extension |
Important operational limits
| Limit | Value |
|---|---|
| Maximum disk size for VM backup | 32 TB |
| Recovery points per protected item (VMs) | 9999 |
| Instant Restore window (Standard) | 1 to 5 days |
| Instant Restore window (Enhanced) | 1 to 30 days |
| File Recovery mounted volume duration | 12 hours (default), maximum 24h |
| Maximum on-demand backups per day per item | 9 |
| Minimum retention time for on-demand | 1 day |
12. Integration and Automation
On-demand backup automation before deploys
A common pattern is integrating on-demand backup into CI/CD pipelines, ensuring a recovery point is created before any production deploy.
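A minimal sketch of such a pipeline step in Bash, reusing the CLI conventions from section 6.2. The function name and `DRY_RUN` switch are hypothetical. Note the design trade-off: waiting for the whole job also waits for the vault-transfer phase, which can take hours; a pipeline that only needs the snapshot may prefer a shorter wait.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy step: take an on-demand VM backup and fail the step
# unless the backup job ends as Completed or CompletedWithWarnings.
# $5: retention date in dd-mm-yyyy (UTC). Set DRY_RUN=1 to print the backup command only.
predeploy_backup() {
  local vault_rg="$1" vault="$2" vm_rg="$3" vm="$4" retain_until="$5" job_id status
  local backup_cmd=(az backup protection backup-now
    --resource-group "$vault_rg" --vault-name "$vault"
    --container-name "IaasVMContainer;iaasvmcontainerv2;$vm_rg;$vm"
    --item-name "VM;iaasvmcontainerv2;$vm_rg;$vm"
    --backup-management-type AzureIaasVM
    --retain-until "$retain_until"
    --query name --output tsv)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${backup_cmd[*]}"
    return 0
  fi
  job_id=$("${backup_cmd[@]}") || return 1
  # Block until the job (snapshot plus transfer to the vault) finishes
  az backup job wait --resource-group "$vault_rg" --vault-name "$vault" --name "$job_id" || return 1
  status=$(az backup job show --resource-group "$vault_rg" --vault-name "$vault" \
    --name "$job_id" --query properties.status --output tsv) || return 1
  case "$status" in
    Completed|CompletedWithWarnings) return 0 ;;
    *) echo "backup job $job_id ended as $status" >&2; return 1 ;;
  esac
}
```

Because the function returns non-zero on any failure, a pipeline runner that stops on a failed step will skip the deploy automatically.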
Automated alerts via Azure Monitor
Azure Backup can raise built-in Azure Monitor alerts for job failures. To be notified without manual monitoring, route the alerts raised for the vault to an action group with an alert processing rule:
```bash
# Route Azure Backup alerts raised for the vault to the backup operations action group
az monitor alert-processing-rule create \
  --name "apr-backup-failure" \
  --resource-group rg-backup-prod \
  --scopes "/subscriptions/{sub}/resourceGroups/rg-backup-prod/providers/Microsoft.RecoveryServices/vaults/rsv-prod-brazilsouth" \
  --rule-type AddActionGroups \
  --action-groups "/subscriptions/{sub}/resourceGroups/rg-backup-prod/providers/microsoft.insights/actionGroups/ag-backup-ops" \
  --filter-monitor-service Equals "Azure Backup"
```
Runbooks for automated restore
For DR (Disaster Recovery) scenarios with aggressive RTO, it's possible to create Azure Automation Runbooks that execute restore automatically when triggered by an alert or webhook. The runbook calls Azure Backup APIs via PowerShell to:
- Identify the most recent recovery point
- Trigger disk restore to staging
- Create new VM from disks
- Execute post-restore validation scripts
- Redirect traffic (via Azure Load Balancer or DNS) to the new VM
13. Final Summary
What backup and restore operations are: the set of actions that enable protection on Azure resources, execute scheduled and manual backups, and recover data from recovery points stored in the vault.
Essential points:
- Enabling protection associates the resource with the vault and a policy; the first backup is always full and may take hours
- On-demand backup creates an immediate recovery point with retention defined at execution; does not cancel the next scheduled backup
- There are four restore modes for VMs: Create New VM, Restore Disks, Replace Existing and Cross Region Restore
- File-Level Recovery mounts the recovery point as local volume for up to 24 hours for individual file recovery
- Instant Restore uses the snapshot tier for fast restoration; outside the snapshot period, restore comes from vault and is slower
Critical differences between restore modes:
| Mode | Original VM affected | VM needs to be on | Speed |
|---|---|---|---|
| Create New VM | No | Not required | Moderate |
| Restore Disks | No | Not required | Moderate |
| Replace Existing | Yes (disks replaced) | Yes, mandatory | More direct |
| Cross Region Restore | No | Not required | Slower (data from secondary region) |
| File Recovery | No | Not required | Fast for small files |
What needs to be remembered for AZ-104:
- Replace Existing requires VM to be running
- File Recovery mounts volume for maximum 24 hours
- Cross Region Restore requires vault with GRS and CRR enabled
- First backup is always full; subsequent ones are incremental
- Maximum 9 on-demand backups per day per protected item
- On-demand backup doesn't replace next scheduled backup; both occur
- Instant Restore is only available within snapshot retention period configured in policy