Theoretical Foundation: Create a Recovery Services Vault
1. Initial Intuition
Imagine you have a critical server running your application. One day a disk fails, or someone accidentally deletes important files. Without a protection plan, the data is lost. With a plan, you restore everything in minutes.
The Recovery Services Vault is exactly this "protection vault" in Azure. It is a managed container that stores backup and disaster recovery data for Azure and on-premises resources.
The most direct analogy: think of a bank vault. You deposit your most valuable assets there (VM backups, databases, files) and when you need them, you withdraw them with guaranteed integrity. The bank (Azure) manages the vault infrastructure; you manage what goes in and the access rules.
In practice, the Recovery Services Vault serves two main purposes:
- Azure Backup: protect data against deletion, corruption, or failure
- Azure Site Recovery (ASR): replicate virtual machines to another region and ensure business continuity in case of regional disaster
2. Context
Within the Azure ecosystem, data protection is organized in layers. The Recovery Services Vault is the central element of this structure.
The vault exists because Azure needed a unified resource that:
- Manages metadata and backup data with automatic redundancy
- Centralizes retention and scheduling policies
- Provides granular access control via RBAC
- Ensures compliance with soft delete and immutability
- Allows centralized monitoring and alerts
Without the vault, each backup solution would be isolated, without unified visibility and without integrated security guarantees.
3. Concept Construction
3.1 What composes a Recovery Services Vault
Before creating the vault, you need to understand its fundamental elements.
Region: the vault is a regional resource. For Azure Backup, the vault must be in the same region as the resources it protects: you cannot back up a VM in Brazil South to a vault in East US. For Azure Site Recovery, the vault lives in the target region that resources are replicated to.
Storage redundancy: defines how backup data is physically replicated. This is configured in the vault and affects cost and resilience.
| Type | Acronym | Copies | Use case |
|---|---|---|---|
| Locally Redundant Storage | LRS | 3 copies in the same zone/datacenter | Low cost, acceptable if there is ASR |
| Geo-Redundant Storage | GRS | 6 copies in 2 distinct regions | Protection against regional disaster |
| Zone-Redundant Storage | ZRS | 3 copies in different zones | High zonal availability |
The default is GRS. If the vault is used only with ASR and replication data is already in another region, LRS may be sufficient, reducing cost.
Cross Region Restore (CRR): functionality that allows restoring backups from a secondary region, even if the primary region is unavailable. Only available when redundancy is GRS.
Soft Delete: additional protection that retains backup data for 14 days after deletion, preventing accidental or malicious loss. Enabled by default since 2020.
Immutability: prevents alteration or deletion of backup data during the defined retention period. Critical for regulatory compliance.
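The constraints above can be captured in a few lines. The following is a minimal Python sketch, not Azure SDK code: `Redundancy`, `VaultSettings`, and `validate` are hypothetical names used only for illustration. It encodes the copy counts from the table and the rule that Cross Region Restore needs GRS.

```python
from dataclasses import dataclass
from enum import Enum


class Redundancy(Enum):
    LRS = "LocallyRedundant"
    ZRS = "ZoneRedundant"
    GRS = "GeoRedundant"


# Copy counts as listed in the redundancy table above.
COPIES = {Redundancy.LRS: 3, Redundancy.ZRS: 3, Redundancy.GRS: 6}


@dataclass(frozen=True)
class VaultSettings:
    """Hypothetical model of the settings discussed in this section."""
    region: str
    redundancy: Redundancy = Redundancy.GRS  # GRS is the vault default
    cross_region_restore: bool = False
    soft_delete: bool = True


def validate(settings: VaultSettings) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if settings.cross_region_restore and settings.redundancy is not Redundancy.GRS:
        problems.append("Cross Region Restore requires GRS redundancy")
    if not settings.soft_delete:
        problems.append("Soft delete is disabled")
    return problems
```

Running such a check before any deployment is one way to catch an invalid combination early, while the settings are still cheap to change.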
3.2 Backup Policies
A Backup Policy is a set of rules that defines:
- How often to back up (scheduling frequency)
- What time to execute
- How long to retain recovery points
Policies are associated with the vault and applied to individual protected items.
There are two types of policy:
- Standard Policy: one scheduled backup per day, with granular retention options by day, week, month, and year
- Enhanced Policy: supports multiple backups per day (hourly schedules, as frequent as every 4 hours for Azure VMs), offering a lower RPO (Recovery Point Objective) at a higher cost
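To make the link between backup frequency and RPO concrete, here is a small Python sketch. It is a deliberately simplified model: `BackupPolicy` is a hypothetical class, backups are assumed to be evenly spaced, only daily retention is modeled, and the value of six backups per day is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical, simplified policy: evenly spaced backups, daily retention only."""
    name: str
    backups_per_day: int
    daily_retention_days: int

    def worst_case_rpo_hours(self) -> float:
        # With evenly spaced backups, at most one full interval of data is lost.
        return 24 / self.backups_per_day

    def expires_on(self, recovery_point: date) -> date:
        # A recovery point is kept for the configured number of days.
        return recovery_point + timedelta(days=self.daily_retention_days)


# Illustrative policies; names and values are assumptions, not Azure defaults.
standard = BackupPolicy("daily-30d", backups_per_day=1, daily_retention_days=30)
enhanced = BackupPolicy("multi-30d", backups_per_day=6, daily_retention_days=30)
```

In this model the once-a-day policy can lose up to 24 hours of changes, while the six-per-day policy bounds the loss at 4 hours, which is the trade-off against the higher cost.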
4. Structural View
The Recovery Services Vault positions itself as a central hub between data sources and protection destinations.
5. Practical Operation
Recovery Services Vault lifecycle
A critical and frequently overlooked behavior: storage redundancy can only be changed before any item is protected in the vault, that is, before the first backup. After that, the only way to change it is to remove all protected items (deleting their backup data), change the setting, and re-protect everything.
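The lifecycle rule can be illustrated with a tiny in-memory model. This is a sketch of the behavior described above, not of how Azure implements it: `Vault`, `RedundancyLockedError`, and the method names are hypothetical.

```python
class RedundancyLockedError(Exception):
    """Raised when redundancy is changed after protection has started."""


class Vault:
    """Hypothetical in-memory model of the lifecycle rule described above."""

    def __init__(self, name: str, redundancy: str = "GeoRedundant") -> None:
        self.name = name
        self.redundancy = redundancy
        self.protected_items: set[str] = set()

    def set_redundancy(self, redundancy: str) -> None:
        # Allowed only while the vault protects nothing.
        if self.protected_items:
            raise RedundancyLockedError(
                "remove all protected items before changing redundancy"
            )
        self.redundancy = redundancy

    def protect(self, item: str) -> None:
        self.protected_items.add(item)

    def stop_protection_and_delete_data(self, item: str) -> None:
        self.protected_items.discard(item)
```

The model makes the ordering explicit: configure redundancy first, then call `protect`; reversing the order forces the expensive remove-and-re-protect path.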
Prerequisites for creation
Before creating the vault, you need to have:
- An active subscription in Azure
- An existing Resource Group or permission to create one
- RBAC permission: minimum of Contributor in the Resource Group or Backup Contributor in the desired scope
- A target region, chosen based on the resources to be protected
6. Implementation Methods
6.1 Azure Portal (Graphical Interface)
When to use: one-time creation, learning environments, visual validation of configurations.
Steps:
- Access portal.azure.com
- Search for "Recovery Services vaults" in the search bar
- Click "Create"
- Fill in the fields:
  - Subscription: select the correct subscription
  - Resource Group: create a new one or select an existing one
  - Vault name: unique name in the Resource Group (2 to 50 characters; letters, numbers, and hyphens; must start with a letter)
  - Region: same region as the resources to protect
- Click "Review + Create", then "Create"
After creation, access the vault and configure immediately in Properties > Backup Configuration:
- Storage Replication Type (LRS, GRS, or ZRS)
- Cross Region Restore (requires GRS)
- Security Settings (Soft Delete, immutability)
Limitation: manual process, not replicable, subject to human error at scale.
6.2 Azure CLI
When to use: automation scripts, CI/CD pipelines, batch creation.
# Create Resource Group (if necessary)
az group create \
  --name rg-backup-prod \
  --location brazilsouth

# Create the Recovery Services Vault
az backup vault create \
  --resource-group rg-backup-prod \
  --name rsv-prod-brazilsouth \
  --location brazilsouth

# Configure storage redundancy (do this before protecting any item)
az backup vault backup-properties set \
  --resource-group rg-backup-prod \
  --name rsv-prod-brazilsouth \
  --backup-storage-redundancy GeoRedundant

# Enable Cross Region Restore (requires GeoRedundant)
az backup vault backup-properties set \
  --resource-group rg-backup-prod \
  --name rsv-prod-brazilsouth \
  --cross-region-restore-flag true
Advantage: fast, scriptable, integrable into pipelines. Limitation: requires Azure CLI installed and authenticated.
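When the CLI is driven from a script, it can help to build the commands as data first and review them before anything runs. The Python sketch below assembles the same `az` invocations shown above as argument lists and prints them as a dry run. `vault_create_commands` is a hypothetical helper, and the sketch assumes the Azure CLI is installed and authenticated if the commands are actually executed.

```python
import shlex


def vault_create_commands(resource_group: str, vault: str, location: str,
                          redundancy: str = "GeoRedundant",
                          cross_region_restore: bool = False) -> list[list[str]]:
    """Build the argv lists for the az commands shown above, in order."""
    if cross_region_restore and redundancy != "GeoRedundant":
        raise ValueError("Cross Region Restore requires GeoRedundant storage")
    commands = [
        ["az", "backup", "vault", "create",
         "--resource-group", resource_group,
         "--name", vault,
         "--location", location],
        ["az", "backup", "vault", "backup-properties", "set",
         "--resource-group", resource_group,
         "--name", vault,
         "--backup-storage-redundancy", redundancy],
    ]
    if cross_region_restore:
        commands.append(
            ["az", "backup", "vault", "backup-properties", "set",
             "--resource-group", resource_group,
             "--name", vault,
             "--cross-region-restore-flag", "true"])
    return commands


# Dry run: print the commands instead of executing them.
for argv in vault_create_commands("rg-backup-prod", "rsv-prod-brazilsouth", "brazilsouth"):
    print(shlex.join(argv))
```

To execute instead of printing, each list can be passed to `subprocess.run(argv, check=True)`. Keeping the redundancy command directly after creation mirrors the rule that redundancy must be set before the first backup.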
6.3 Azure PowerShell
When to use: corporate Windows environments, automation integrated with existing PowerShell scripts.
# Create Resource Group
New-AzResourceGroup -Name "rg-backup-prod" -Location "brazilsouth"

# Create Recovery Services Vault
New-AzRecoveryServicesVault `
    -ResourceGroupName "rg-backup-prod" `
    -Name "rsv-prod-brazilsouth" `
    -Location "brazilsouth"

# Get vault reference
$vault = Get-AzRecoveryServicesVault `
    -ResourceGroupName "rg-backup-prod" `
    -Name "rsv-prod-brazilsouth"

# Configure redundancy (do this before protecting any item)
Set-AzRecoveryServicesBackupProperty `
    -Vault $vault `
    -BackupStorageRedundancy GeoRedundant
6.4 ARM Template (Azure Resource Manager)
When to use: Infrastructure as Code, environments with strict governance, repeatable and versioned deployments.
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.RecoveryServices/vaults",
      "apiVersion": "2023-01-01",
      "name": "rsv-prod-brazilsouth",
      "location": "[resourceGroup().location]",
      "sku": {
        "name": "RS0",
        "tier": "Standard"
      },
      "properties": {}
    }
  ]
}
Advantage: versionable, auditable, reusable in multiple environments. Limitation: higher learning curve; post-creation configurations (redundancy, soft delete) require additional resources in the template.
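Reuse across environments usually means varying only the vault name. As a rough illustration, the Python sketch below generates the same template for several environments. `vault_template` is a hypothetical helper, and the environment list and the `rsv-{env}-brazilsouth` pattern are assumptions; in a real template the name would more likely be an ARM parameter.

```python
import json


def vault_template(vault_name: str) -> dict:
    """Return an ARM template equivalent to the one above for a given vault name."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.RecoveryServices/vaults",
                "apiVersion": "2023-01-01",
                "name": vault_name,
                "location": "[resourceGroup().location]",
                "sku": {"name": "RS0", "tier": "Standard"},
                "properties": {},
            }
        ],
    }


# One serialized template per environment (illustrative environment names).
templates = {
    env: json.dumps(vault_template(f"rsv-{env}-brazilsouth"), indent=2)
    for env in ("dev", "staging", "prod")
}
```

Each value in `templates` is a JSON string that can be written to a file and kept under version control alongside the rest of the infrastructure code.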
6.5 Terraform
When to use: multi-cloud environments, teams already using Terraform as IaC standard.
resource "azurerm_resource_group" "backup" {
  name     = "rg-backup-prod"
  location = "Brazil South"
}

resource "azurerm_recovery_services_vault" "main" {
  name                = "rsv-prod-brazilsouth"
  location            = azurerm_resource_group.backup.location
  resource_group_name = azurerm_resource_group.backup.name
  sku                 = "Standard"
  soft_delete_enabled = true
  storage_mode_type   = "GeoRedundant"
}
Advantage: state management, execution plan, native integration with other Azure resources. Limitation: requires Terraform installed and AzureRM provider configured.
7. Control and Security
RBAC in Recovery Services Vault
The vault supports specific roles to separate responsibilities:
| Role | Capability |
|---|---|
| Backup Contributor | Create and manage backups and vaults; cannot delete a vault or grant access to others |
| Backup Operator | Enable backup, trigger jobs, restore; cannot remove protection or manage backup policies |
| Backup Reader | Read-only. View backups and jobs |
| Site Recovery Contributor | Manage ASR completely, except create vaults |
| Site Recovery Operator | Execute failover and failback |
Soft Delete
When soft delete is enabled (the default) and a protected item is deleted:
- Data is retained for 14 additional days in "soft deleted" state
- During this period, deletion can be undone (undelete)
- After 14 days, data is permanently removed
- With enhanced soft delete, the retention period can be extended up to 180 days
Attention: even when the vault has no active items, it cannot be deleted while soft-deleted items remain. You need to purge (permanently delete) the soft-deleted items first.
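The soft delete timeline is simple date arithmetic. The Python sketch below computes when soft-deleted data is permanently removed and whether an undelete is still possible. `purge_date` and `can_undelete` are hypothetical helper names; the 14-day default and 180-day maximum come from the text above.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 14
MAX_RETENTION_DAYS = 180


def purge_date(deleted_on: date, retention_days: int = DEFAULT_RETENTION_DAYS) -> date:
    """Date on which soft-deleted data is permanently removed."""
    if not DEFAULT_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 14 and 180 days")
    return deleted_on + timedelta(days=retention_days)


def can_undelete(deleted_on: date, today: date,
                 retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True while the item is still in the soft-deleted state."""
    return deleted_on <= today < purge_date(deleted_on, retention_days)
```

For an item deleted on 1 March with the default retention, `purge_date` returns 15 March, so an undelete is possible through 14 March.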
Immutability
Three possible states:
| State | Behavior |
|---|---|
| Disabled | No immutability protection |
| Enabled (unlocked) | Immutable data, but can be disabled |
| Enabled (locked) | Immutable data, cannot be reverted. Irreversible |
The "locked" state is commonly chosen where compliance frameworks such as LGPD, SOC 2, or Brazilian banking regulations call for evidence that backups were not tampered with.
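The three states behave like a small state machine with a one-way door. The Python sketch below encodes that reading of the table. `Immutability`, `ALLOWED`, and `transition` are hypothetical names, and the assumption that locking must go through the unlocked state first is an illustrative modeling choice.

```python
from enum import Enum


class Immutability(Enum):
    DISABLED = "Disabled"
    UNLOCKED = "Enabled (unlocked)"
    LOCKED = "Enabled (locked)"


# Allowed transitions per the table above: locking is a one-way door.
# Assumption: a vault is enabled (unlocked) before it can be locked.
ALLOWED = {
    Immutability.DISABLED: {Immutability.UNLOCKED},
    Immutability.UNLOCKED: {Immutability.DISABLED, Immutability.LOCKED},
    Immutability.LOCKED: set(),
}


def transition(current: Immutability, target: Immutability) -> Immutability:
    """Return the new state, or raise if the change is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

The empty set for `LOCKED` is the whole point: once a vault reaches that state, no call to `transition` can succeed.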
8. Decision Making
Storage redundancy
| Situation | Best choice | Reason |
|---|---|---|
| Critical VM without ASR configured | GRS | Protection against regional disaster via backup |
| VM with ASR replicating to another region | LRS | ASR already ensures regional recovery; LRS reduces cost |
| Zonal availability requirement | ZRS | Protects against zone failure within the same region |
| Dev/test environment | LRS | Minimum cost; tolerable loss |
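The decision table can be expressed as a small function. This is only a sketch of the table's logic: `choose_redundancy` and its parameters are hypothetical, and the order in which the rules are checked (environment first, then ASR, then zonal requirements) is an assumption, since the table does not rank overlapping situations.

```python
def choose_redundancy(*, production: bool, asr_to_other_region: bool,
                      needs_zone_resilience: bool) -> str:
    """Suggest a storage redundancy following the decision table above."""
    if not production:
        return "LRS"  # dev/test: minimum cost, tolerable loss
    if asr_to_other_region:
        return "LRS"  # ASR already covers regional recovery
    if needs_zone_resilience:
        return "ZRS"  # survive a zone failure within the region
    return "GRS"      # critical workload with no other regional protection
```

For example, a production VM without ASR and without a zonal requirement falls through to GRS, matching the first row of the table.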
Number of vaults per environment
| Scenario | Recommendation | Reason |
|---|---|---|
| Separated prod, staging, dev environments | One vault per environment | Policy and access isolation |
| Multiple regions | One vault per region | Vault is regional; data should stay close to resources |
| Compliance with multiple departments | One vault per department or BU | RBAC and retention policy isolation |
| Small organization, simple resources | Single vault | Operational simplicity |
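Combining the first two rows (one vault per environment, one per region) gives a simple cross product. The Python sketch below enumerates the resulting vaults. `plan_vaults` is a hypothetical helper, and the `rsv-[environment]-[region]` naming pattern is the convention this document recommends, applied here as an assumption.

```python
from itertools import product


def plan_vaults(environments: list[str], regions: list[str]) -> list[str]:
    """One vault name per (environment, region) pair."""
    return [f"rsv-{env}-{region}" for env, region in product(environments, regions)]


for name in plan_vaults(["prod", "dev"], ["brazilsouth", "eastus2"]):
    print(name)
```

Two environments across two regions yield four vaults, which makes the management overhead of each isolation choice visible before anything is created.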
9. Best Practices
Standardized naming: use a clear and consistent convention. Example: rsv-[environment]-[region] like rsv-prod-brazilsouth or rsv-dev-eastus2.
Mandatory tags: apply tags like Environment, CostCenter, Owner, and Application on the vault to facilitate governance and chargeback.
Minimum access policy: use specific backup roles (Backup Operator, Backup Contributor) instead of giving Contributor or Owner to backup operators.
Soft Delete always enabled: never disable it in production environments. The cost of 14 additional days of retention is negligible compared with the risk of irreversible loss.
Separation by region: never try to centralize backups from multiple regions in a single vault. This is not supported and compromises latency and data residency compliance.
Proactive monitoring: configure alerts in Azure Monitor for backup job failures. A backup silently failing for weeks is a serious risk.
Test restore regularly: creating backups without ever testing restoration is a false security practice. Schedule periodic restore tests in isolated environments.
10. Common Errors
Error: creating the vault in a different region than the resources
Why it happens: the operator creates the vault in a "default" region without checking where the VMs are.
How to avoid: check the region of the protected resources before creating the vault. The rule is: vault and resource in the same region for backup.

Error: forgetting to configure redundancy before the first backup
Why it happens: the vault is created and protection is activated immediately, without reviewing storage settings.
How to avoid: create the vault, immediately configure storage redundancy and soft delete, and only then activate item protection.

Error: trying to delete a vault that still has protected or soft-deleted items
Why it happens: the operator removes VMs from Azure and assumes the vault is empty.
How to avoid: before deleting the vault, check Backup Items, Replication Items, and Backup Infrastructure for active items. Purge any soft-deleted items.

Error: using a single vault for all environments
Why it happens: excessive simplification to reduce management effort.
How to avoid: separate vaults by environment (prod, staging, dev) to prevent dev retention policies from affecting prod and to simplify RBAC.

Error: not configuring backup failure alerts
Why it happens: assuming "if there's no alert, it's working".
How to avoid: immediately after creating the vault, configure backup notifications via Azure Monitor or the vault's email alerts.
11. Operation and Maintenance
Daily monitoring
In the portal, access the vault and check:
- Backup Jobs: jobs with Failed or Warning status require immediate attention
- Backup Alerts: active alerts that need investigation
- Backup Reports (via Azure Monitor Workbooks): historical view of backup compliance
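The daily check on Backup Jobs amounts to filtering by status. The Python sketch below shows that filter over a list of job records. The record shape (`item`, `operation`, `status`) and the sample data are hypothetical; real job data exported from Azure uses different field names and would need to be mapped into this form first.

```python
ATTENTION_STATUSES = {"Failed", "Warning"}


def jobs_needing_attention(jobs: list[dict]) -> list[dict]:
    """Keep only jobs whose status calls for immediate follow-up."""
    return [job for job in jobs if job.get("status") in ATTENTION_STATUSES]


# Hypothetical sample data; item names and statuses are made up.
sample_jobs = [
    {"item": "vm-app-01", "operation": "Backup", "status": "Completed"},
    {"item": "vm-db-01", "operation": "Backup", "status": "Failed"},
    {"item": "vm-web-01", "operation": "Backup", "status": "Warning"},
]

for job in jobs_needing_attention(sample_jobs):
    print(f"{job['item']}: {job['status']}")
```

A filter like this is a natural building block for a scheduled report, so that failures surface daily instead of being discovered weeks later.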
Important Recovery Services Vault limits
| Limit | Value |
|---|---|
| Vaults per subscription | Up to 500 per region per subscription |
| VMs protected per vault | Up to 1,000 Azure VMs per vault |
| Backup policy per vault | Up to 200 policies |
| Recovery points per item | Varies by type, up to 9999 for VMs |
| Maximum protected disk size (VM) | 32 TB |
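These limits are easy to check during capacity planning. The Python sketch below uses the per-vault values from the table to estimate how many vaults a fleet needs and to flag violations. `vaults_needed` and `limit_violations` are hypothetical helpers, and the constants should be re-checked against current Azure documentation before being relied on.

```python
import math

# Values taken from the limits table above.
MAX_VMS_PER_VAULT = 1000
MAX_POLICIES_PER_VAULT = 200
MAX_DISK_SIZE_TB = 32


def vaults_needed(vm_count: int) -> int:
    """Minimum number of vaults required to hold vm_count VMs."""
    return math.ceil(vm_count / MAX_VMS_PER_VAULT)


def limit_violations(vm_count: int, policy_count: int,
                     largest_disk_tb: float) -> list[str]:
    """List every per-vault limit that a planned vault would exceed."""
    problems = []
    if vm_count > MAX_VMS_PER_VAULT:
        problems.append(f"{vm_count} VMs exceeds {MAX_VMS_PER_VAULT} per vault")
    if policy_count > MAX_POLICIES_PER_VAULT:
        problems.append(f"{policy_count} policies exceeds {MAX_POLICIES_PER_VAULT} per vault")
    if largest_disk_tb > MAX_DISK_SIZE_TB:
        problems.append(f"{largest_disk_tb} TB disk exceeds {MAX_DISK_SIZE_TB} TB")
    return problems
```

A fleet of 2,500 VMs, for instance, needs at least three vaults under the 1,000-VM value, independent of any split by environment or region.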
Cost management
Recovery Services Vault costs are composed of:
- Protected instance: charged per protected VM, based on disk size
- Backup storage: charged by data volume stored, with different costs for LRS, ZRS and GRS
- Transactions: read/write operations on storage
Monitor via Azure Cost Management filtering by Resource Group or vault tags for cost visibility per environment.
12. Integration and Automation
Integration with Azure Policy
You can use Azure Policy to ensure that all VMs in a subscription or Resource Group are protected by a specific vault. The built-in policy "Configure backup on VMs of a location to an existing central Vault in the same location" automates the association of new VMs with the vault.
Automation with Azure Automation / Logic Apps
Common automation patterns:
- Trigger backup on-demand via runbook when a significant change is detected (e.g., before a deployment)
- Weekly compliance report sent by email via Logic App querying the vault API
- Auto-register new VMs to the vault via event-driven automation with Azure Event Grid
REST API
The vault exposes complete REST APIs. Example of creation via API:
PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.RecoveryServices/vaults/{vaultName}?api-version=2023-01-01
With body:
{
  "location": "brazilsouth",
  "sku": {
    "name": "RS0",
    "tier": "Standard"
  },
  "properties": {}
}
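The request above can be assembled programmatically before being sent with any HTTP client. The Python sketch below only builds the URL and body; it does not call Azure. `vault_put_request` is a hypothetical helper, the all-zero subscription ID is a placeholder, and an actual call would also need a bearer token in the `Authorization` header.

```python
import json


def vault_put_request(subscription_id: str, resource_group: str,
                      vault_name: str, location: str,
                      api_version: str = "2023-01-01") -> tuple[str, str]:
    """Return (url, json_body) for the vault creation PUT shown above."""
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.RecoveryServices/vaults/{vault_name}"
        f"?api-version={api_version}"
    )
    body = {
        "location": location,
        "sku": {"name": "RS0", "tier": "Standard"},
        "properties": {},
    }
    return url, json.dumps(body)


# Placeholder subscription ID; replace with a real one before sending.
url, body = vault_put_request(
    "00000000-0000-0000-0000-000000000000",
    "rg-backup-prod",
    "rsv-prod-brazilsouth",
    "brazilsouth",
)
print(url)
```

Separating request construction from sending keeps the part that can be unit tested free of credentials and network access.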
13. Final Summary
What it is: regional container in Azure that stores backup data and disaster recovery configurations (ASR) for Azure and on-premises resources.
Essential points:
- The vault is always regional: must be in the same region as the protected resources
- Storage redundancy (LRS, GRS, ZRS) can only be changed before the first backup
- Soft Delete is enabled by default and retains data for 14 days after deletion. Vaults with soft deleted items cannot be deleted without purge
- Cross Region Restore is only available with GRS redundancy
- A vault cannot be deleted while there are protected, replicated, or soft deleted items
- Use specific backup roles (Backup Contributor, Backup Operator) instead of generic roles to follow the principle of least privilege
Critical differences:
| Point | Detail |
|---|---|
| GRS vs LRS | GRS for primary backup; LRS when ASR already provides regional resilience |
| Soft Delete vs Immutability | Soft delete protects against accidental deletion; immutability protects against tampering |
| Locked vs Unlocked Immutability | Locked is irreversible. Use only when required by regulation |
| Standard vs Enhanced Policy | Enhanced supports hourly backup (lower RPO), with higher cost |
What needs to be remembered for AZ-104:
- Vault created before any backup configuration
- Redundancy configured immediately after creation, before the first backup
- Vault and protected resource must be in the same region
- Soft delete prevents immediate data deletion; requires manual purge for permanent deletion
- Granular RBAC available with specific backup and site recovery roles