
Theoretical Foundation: Configure Azure Storage Redundancy


1. Initial Intuition​

Imagine you have an extremely important document: a company contract. You don't keep just one copy. You make one copy in the office safe, another at home, and another at a notary office in another city. If the office catches fire, you still have the other two copies.

Storage redundancy is exactly this principle applied to Azure: the service automatically maintains multiple copies of your data in different physical locations, so that hardware failures, datacenter failures, or even geographic disasters don't cause data loss.

The difference between redundancy options lies in how many copies exist and where they are physically located.


2. Context​

Every Storage Account in Azure has a redundancy configuration. This configuration determines:

  • The account's availability SLA
  • Data durability (probability of no loss)
  • Storage cost
  • Recovery capability in case of regional failure

Azure Storage is designed for at least 11 nines (99.999999999%) of durability with every redundancy option. What varies between options is the type of failure each one protects against.
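To make the "nines" concrete, the sketch below turns a durability figure into an expected number of objects lost per year. Reading the figure as an annual, per-object loss probability is an assumption made for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def annual_loss_probability(nines: int) -> Fraction:
    """Loss probability implied by an N-nines durability figure (1 - 0.99...9)."""
    return Fraction(1, 10 ** nines)


def expected_objects_lost(object_count: int, nines: int) -> Fraction:
    """Expected number of objects lost in a year, assuming independent losses."""
    return object_count * annual_loss_probability(nines)


if __name__ == "__main__":
    # Durability figures used in this document: LRS 11, ZRS 12, GRS/GZRS 16 nines.
    for option, nines in [("LRS", 11), ("ZRS", 12), ("GRS/GZRS", 16)]:
        lost = expected_objects_lost(1_000_000_000, nines)
        print(f"{option}: about {float(lost):.0e} objects lost per year per billion stored")
```

Even at 11 nines the expected loss is far below one object per billion per year, which is why the choice between options is driven by failure scope and availability rather than by raw durability.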

Other Azure services that depend on the configured redundancy:

  • Azure Backup
  • Azure Site Recovery
  • VM diagnostics
  • Azure Functions (internal storage)
  • Any application that uses Azure Storage SDK

3. Building the Concepts​

3.1 The physical structure you need to understand first​

Before talking about redundancy, you need to understand Azure's physical hierarchy:

  • Region: A set of datacenters in a geographic area (e.g., East US, Brazil South).
  • Availability Zone (AZ): One or more physically separate datacenters within a region, with independent power, cooling, and network. Not every region has zones.
  • Fault Domain: A rack or set of racks that share power and network switch. A failure in one fault domain (e.g., power outage in a rack) doesn't affect another.

Each redundancy option spreads copies differently within this hierarchy.


3.2 The six redundancy options​

Azure offers six configurations. They are organized into two groups: local (within a region) and geographic (between regions).


3.3 LRS: Locally Redundant Storage​

What it is: Maintains 3 synchronous copies of data within a single datacenter, in different fault domains.

Analogy: Three copies of the contract stored in three different cabinets in the same office.

Protects against: Hardware failure (disk, server, rack).

Doesn't protect against: Entire datacenter failure (fire, flood, prolonged power outage).

| Attribute | Value |
| --- | --- |
| Copies | 3 |
| Location | 1 datacenter, multiple fault domains |
| Durability | 11 nines (99.999999999%) |
| Read availability SLA | 99.9% |
| Write availability SLA | 99.9% |
| Relative cost | Cheapest |

When write is confirmed: The write is considered successful only after being replicated to all 3 copies synchronously.


3.4 ZRS: Zone Redundant Storage​

What it is: Maintains 3 synchronous copies of data, each in a different Availability Zone within the same region.

Analogy: Three copies of the contract in three different buildings in the same city, each with its own power generation.

Protects against: Entire datacenter failure, local network failure, disasters affecting one zone.

Doesn't protect against: Disaster affecting the entire region (earthquake, regional blackout).

| Attribute | Value |
| --- | --- |
| Copies | 3 |
| Location | 3 availability zones in the same region |
| Durability | 12 nines (99.9999999999%) |
| Read availability SLA | 99.9% |
| Write availability SLA | 99.9% |
| Relative cost | Moderate |

Important restriction: ZRS is only available in regions that have Availability Zones. Check regional availability before choosing.


3.5 GRS: Geo-Redundant Storage​

What it is: Combines LRS in the primary region (3 synchronous copies) with asynchronous replication to a geographically distant secondary region (where 3 more LRS copies are maintained). Total of 6 copies.

Analogy: Three copies in the office in São Paulo and three copies in an archive in Miami. The Miami copies are updated with a small delay.

Protects against: Complete regional disasters.

Doesn't protect against: Recently written data that hasn't been replicated yet (RPO is not zero).

| Attribute | Value |
| --- | --- |
| Copies | 6 (3 primary LRS + 3 secondary LRS) |
| Location | 2 geographically separated regions |
| Durability | 16 nines (99.99999999999999%) |
| Read availability SLA | 99.9% (primary only) |
| Write availability SLA | 99.9% |
| Relative cost | High |

Critical behavior: By default, the secondary region in GRS does not accept reads. It exists only for failover. You cannot access that data unless you initiate a failover or use RA-GRS.

RPO (Recovery Point Objective): Since replication is asynchronous, in case of disaster there may be data loss from the last few minutes. Microsoft documents typical RPO as less than 15 minutes, but there's no contractual guarantee of this value.


3.6 GZRS: Geo-Zone Redundant Storage​

What it is: Combines ZRS in the primary region (3 copies in 3 different zones) with asynchronous replication to the secondary region (where LRS maintains 3 additional copies). Total of 6 copies.

Analogy: Three copies in three different buildings in São Paulo and three copies in Miami.

This is the highest resilience option available in Azure Storage.

| Attribute | Value |
| --- | --- |
| Copies | 6 (3 primary ZRS + 3 secondary LRS) |
| Location | 3 primary zones + 1 secondary region |
| Durability | 16 nines |
| Read availability SLA | 99.99% (with RA-GZRS) |
| Write availability SLA | 99.9% |
| Relative cost | Highest |

3.7 RA-GRS and RA-GZRS: Read-Access​

The RA-GRS and RA-GZRS variants are identical to GRS and GZRS respectively, with one difference: they enable reading from the secondary region at any time, without needing failover.

The secondary region read endpoint has the -secondary suffix:

https://mystorageaccount-secondary.blob.core.windows.net

This is useful for:

  • Distributing read load geographically
  • Maintaining read availability even if the primary region goes down
  • Applications that accept slightly outdated data

Consistency model in secondary: Data read from the secondary region may be slightly behind the primary. This is called eventual consistency. Don't use secondary for reads that require absolutely current data.
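The endpoint naming and the consistency caveat can be sketched together. This is a minimal illustration, assuming the public-cloud `core.windows.net` suffix; `service_endpoints` and `read_order` are hypothetical helper names, not part of any Azure SDK.

```python
def service_endpoints(account: str, service: str = "blob") -> dict:
    """Build the primary and secondary URLs for one storage service."""
    suffix = "core.windows.net"  # assumption: Azure public cloud
    return {
        "primary": f"https://{account}.{service}.{suffix}",
        "secondary": f"https://{account}-secondary.{service}.{suffix}",
    }


def read_order(account: str, require_latest: bool) -> list:
    """Endpoints to try for a read, in order.

    Reads that need current data use the primary only; reads that tolerate
    slightly stale data can fall back to the secondary (RA-GRS / RA-GZRS).
    """
    endpoints = service_endpoints(account)
    if require_latest:
        return [endpoints["primary"]]
    return [endpoints["primary"], endpoints["secondary"]]


if __name__ == "__main__":
    print(read_order("mystorageaccount", require_latest=False))
```

Keeping the "needs current data" decision explicit at the call site makes it harder to route a consistency-sensitive read to the secondary by accident.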


4. Comparative Structural View​

[Diagram: comparative view of the redundancy options]

5. How It Works in Practice​

5.1 How synchronous replication works (LRS and ZRS)​

When your application writes to Azure Storage with LRS or ZRS, the success response is returned only after all 3 copies confirm the write. This ensures that a failure immediately after the write causes no data loss.


5.2 How asynchronous replication works (GRS and GZRS)​

With GRS and GZRS, the application receives confirmation as soon as the write is replicated synchronously within the primary region. Replication to the secondary region happens in the background, which is why the RPO is not zero.
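The difference between the synchronous and asynchronous steps can be shown with a toy in-memory model. This is only a sketch of the acknowledgement order described above, not of how Azure Storage is implemented; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GeoReplicatedAccount:
    """Toy model: 3 synchronous primary replicas plus an asynchronous secondary."""
    primary: list = field(default_factory=lambda: [[], [], []])
    secondary: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # acknowledged, not yet in secondary

    def write(self, item: str) -> str:
        # Synchronous step: every primary replica stores the item before the ack.
        for replica in self.primary:
            replica.append(item)
        self.pending.append(item)  # geo-replication is deferred
        return "acknowledged"

    def replicate(self) -> None:
        # Asynchronous step: background copy to the secondary region.
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lose_primary(self) -> list:
        """Simulate a regional disaster; return acknowledged writes that are lost."""
        lost = list(self.pending)
        self.primary = [[], [], []]
        self.pending.clear()
        return lost


if __name__ == "__main__":
    account = GeoReplicatedAccount()
    account.write("order-1")
    account.replicate()          # order-1 reaches the secondary
    account.write("order-2")     # acknowledged, but not yet geo-replicated
    print("lost:", account.lose_primary(), "survived:", account.secondary)
```

The write that was acknowledged but still pending at the moment of the disaster is exactly the data the RPO window describes.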


5.3 Failover and what happens​

When the primary region fails with GRS/GZRS, a failover promotes the secondary region to become the new primary.

Critical behavior: After failover, the account returns to LRS state. You need to manually reconfigure to GRS/GZRS after the original region recovers.


6. Implementation Methods​

6.1 Azure Portal​

During Storage Account creation, the option is in the Basics tab, Redundancy field.

To change in an existing account: Storage Account > Configuration > Replication.

Advantage: Visual, with descriptions of each option. Limitation: Not scalable for multiple accounts.


6.2 Azure CLI​

During creation:

az storage account create \
  --name mystorageaccount \
  --resource-group myRG \
  --location eastus \
  --sku Standard_RAGZRS \
  --kind StorageV2

Changing existing account:

az storage account update \
  --name mystorageaccount \
  --resource-group myRG \
  --sku Standard_GRS

Valid values for --sku:

| SKU | Redundancy option |
| --- | --- |
| Standard_LRS | LRS |
| Standard_ZRS | ZRS |
| Standard_GRS | GRS |
| Standard_GZRS | GZRS |
| Standard_RAGRS | RA-GRS |
| Standard_RAGZRS | RA-GZRS |
| Premium_LRS | Premium LRS |
| Premium_ZRS | Premium ZRS |
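The SKU names encode the tier and the redundancy properties, so a script can derive them instead of hard-coding a lookup. A minimal sketch covering only the SKUs listed above; `Redundancy` and `describe_sku` are hypothetical names.

```python
from typing import NamedTuple


class Redundancy(NamedTuple):
    tier: str
    zone_redundant: bool   # primary copies spread across availability zones
    geo_redundant: bool    # asynchronous copy in a secondary region
    secondary_read: bool   # RA- variant: secondary endpoint is readable


STANDARD_CODES = {"LRS", "ZRS", "GRS", "GZRS", "RAGRS", "RAGZRS"}
PREMIUM_CODES = {"LRS", "ZRS"}


def describe_sku(sku: str) -> Redundancy:
    """Split a SKU such as 'Standard_RAGZRS' into its redundancy properties."""
    tier, _, code = sku.partition("_")
    allowed = {"Standard": STANDARD_CODES, "Premium": PREMIUM_CODES}.get(tier)
    if allowed is None or code not in allowed:
        raise ValueError(f"SKU not in the table above: {sku!r}")
    secondary_read = code.startswith("RA")
    base = code[2:] if secondary_read else code
    return Redundancy(
        tier=tier,
        zone_redundant=base in {"ZRS", "GZRS"},
        geo_redundant=base in {"GRS", "GZRS"},
        secondary_read=secondary_read,
    )


if __name__ == "__main__":
    print(describe_sku("Standard_RAGZRS"))
```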

6.3 Azure PowerShell​

# Create with redundancy
New-AzStorageAccount `
  -ResourceGroupName "myRG" `
  -Name "mystorageaccount" `
  -Location "eastus" `
  -SkuName "Standard_RAGZRS" `
  -Kind "StorageV2"

# Change existing redundancy
Set-AzStorageAccount `
  -ResourceGroupName "myRG" `
  -Name "mystorageaccount" `
  -SkuName "Standard_GRS"

6.4 Bicep / ARM​

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'mystorageaccount'
  location: 'eastus'
  sku: {
    name: 'Standard_RAGZRS'
  }
  kind: 'StorageV2'
  properties: {}
}

6.5 Azure Policy (compliance automation)​

To ensure every Storage Account uses at least ZRS:

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Storage/storageAccounts"
      },
      {
        "field": "Microsoft.Storage/storageAccounts/sku.name",
        "in": ["Standard_LRS", "Premium_LRS"]
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}
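To see which resources this rule would reject, the sketch below evaluates it against sample resource dictionaries. It is a toy evaluator written for this one rule, not the Azure Policy engine: it handles only `allOf`, `equals`, and `in`, resolves the alias by taking the part after the last `/`, and compares strings case-insensitively as an assumption. All function names are hypothetical.

```python
POLICY_RULE = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Storage/storageAccounts"},
            {"field": "Microsoft.Storage/storageAccounts/sku.name",
             "in": ["Standard_LRS", "Premium_LRS"]},
        ]
    },
    "then": {"effect": "deny"},
}


def field_value(resource: dict, field: str):
    """Resolve 'type' or an alias like '.../sku.name' against a resource dict."""
    path = field.rsplit("/", 1)[-1]          # 'sku.name' or 'type'
    value = resource
    for key in path.split("."):
        value = value.get(key) if isinstance(value, dict) else None
    return value


def matches(condition: dict, resource: dict) -> bool:
    if "allOf" in condition:
        return all(matches(item, resource) for item in condition["allOf"])
    value = str(field_value(resource, condition["field"])).lower()
    if "equals" in condition:
        return value == condition["equals"].lower()
    if "in" in condition:
        return value in [item.lower() for item in condition["in"]]
    raise NotImplementedError(f"unsupported condition: {condition}")


def evaluate(rule: dict, resource: dict):
    """Return the rule's effect if the resource matches, otherwise None."""
    return rule["then"]["effect"] if matches(rule["if"], resource) else None


if __name__ == "__main__":
    account = {"type": "Microsoft.Storage/storageAccounts",
               "sku": {"name": "Standard_LRS"}}
    print(evaluate(POLICY_RULE, account))
```

Note that the rule works by denying the LRS SKUs rather than by listing the allowed ones, so any SKU not named in the `in` list passes.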

7. Control and Security​

7.1 Which changes are allowed (and which are not)​

Not every redundancy transition is directly possible:

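| From | To | Possible? | How |
| --- | --- | --- | --- |
| LRS | ZRS | Yes (some regions) | Live migration or recreate |
| LRS | GRS/RA-GRS | Yes | Direct in portal/CLI |
| LRS | GZRS/RA-GZRS | Yes | Not a single setting change: includes a local-to-zonal data migration |
| ZRS | LRS | Yes | Requires data migration (zonal to local) |
| ZRS | GZRS/RA-GZRS | Yes | Direct |
| GRS | RA-GRS | Yes | Direct |
| GRS | LRS | Yes | Direct |
| GRS | ZRS | Not direct | Recreate account or live migration |
| GZRS | RA-GZRS | Yes | Direct |
| RA-GRS | GRS | Yes | Direct |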

General rule: Conversions involving changes between local and zonal replication (LRS to ZRS and vice versa) require data migration, not just configuration changes.
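The general rule can be expressed as a one-line check: a change needs data migration when the primary region switches between local and zonal replication, and is a plain setting change otherwise. A minimal sketch of that rule only (it does not model regional availability or multi-step paths); the names are hypothetical.

```python
OPTIONS = {"LRS", "ZRS", "GRS", "RA-GRS", "GZRS", "RA-GZRS"}
ZONAL_PRIMARY = {"ZRS", "GZRS", "RA-GZRS"}   # primary copies spread across zones


def needs_data_migration(current: str, target: str) -> bool:
    """True when the change switches the primary between local and zonal replication."""
    for option in (current, target):
        if option not in OPTIONS:
            raise ValueError(f"unknown redundancy option: {option!r}")
    return (current in ZONAL_PRIMARY) != (target in ZONAL_PRIMARY)


if __name__ == "__main__":
    for pair in [("LRS", "GRS"), ("LRS", "ZRS"), ("GRS", "ZRS"), ("ZRS", "GZRS")]:
        print(pair, "migration" if needs_data_migration(*pair) else "setting change")
```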

7.2 Live Migration to ZRS​

When you request migration from LRS to ZRS via Microsoft support:

  • Data is copied to remaining zones transparently
  • No downtime for the application
  • Available only in regions with availability zones
  • Can take days depending on data volume

8. Decision Making​

8.1 Decision guide by scenario​

| Situation | Best choice | Reason |
| --- | --- | --- |
| Development/test data | LRS | Minimum cost, non-critical data |
| Production app in region with AZs | ZRS | Zone protection without geo cost |
| Regulatory continuity requirement | GRS or GZRS | Data replicated in second region |
| High global read availability | RA-GRS or RA-GZRS | Secondary read without failover |
| Maximum available resilience | GZRS or RA-GZRS | ZRS primary + geo replication |
| High volume analytics data | LRS | High volume, geo cost unviable |
| LGPD/GDPR regulatory compliance | LRS or ZRS | Data doesn't leave region/country |
| Critical data backup | GRS | Second region ensures recovery |
| Application tolerant to slightly outdated data | RA-GRS | Secondary read with eventual consistency |
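The table can be condensed into a small decision helper. This is a simplified sketch of the guidance above, not an official selection algorithm; the parameter names are hypothetical, and it ignores cost limits and per-region availability of each option.

```python
def recommend_redundancy(
    *,
    production: bool,
    region_has_zones: bool,
    data_must_stay_in_region: bool = False,
    needs_regional_recovery: bool = False,
    needs_secondary_reads: bool = False,
) -> str:
    """Pick a redundancy option from a few yes/no requirements."""
    wants_geo = needs_regional_recovery or needs_secondary_reads
    if data_must_stay_in_region and wants_geo:
        # Geo options copy data to another region, so the two goals conflict.
        raise ValueError("geo-replication conflicts with keeping data in one region")
    if wants_geo:
        base = "GZRS" if region_has_zones else "GRS"
        return f"RA-{base}" if needs_secondary_reads else base
    if production and region_has_zones:
        return "ZRS"
    return "LRS"


if __name__ == "__main__":
    print(recommend_redundancy(production=True, region_has_zones=True))
    print(recommend_redundancy(production=True, region_has_zones=False,
                               needs_regional_recovery=True))
```

Raising an error on conflicting requirements, rather than silently choosing one, forces the residency-versus-recovery trade-off to be decided explicitly.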

8.2 Cost versus resilience trade-offs​

Cost and resilience rise together, from LRS (cheapest, least resilient) through ZRS and GRS to GZRS (most expensive, most resilient).


8.3 Synchronous vs asynchronous replication: what this means for RPO​

| Replication | RPO | Means |
| --- | --- | --- |
| LRS / ZRS (synchronous) | 0 | No data lost on failure |
| GRS / GZRS (asynchronous for geo) | Typically < 15 min | May lose recent data in disaster |

RPO (Recovery Point Objective) is the maximum amount of data loss, expressed as a window of time, that the business can tolerate. Setting it is a business decision, not only a technical one.


9. Best Practices​

  • Define redundancy during creation: Changing later, especially to ZRS, requires migration and can generate operational complexity.
  • Use ZRS as minimum standard for production in regions that support availability zones.
  • Prefer RA-GRS or RA-GZRS instead of GRS/GZRS when the application can benefit from secondary reads.
  • Document expected RPO and RTO for each Storage Account and align with redundancy choice.
  • Test failover in staging environments before relying on it in production.
  • Use Azure Policy to ensure production accounts aren't inadvertently created with LRS.
  • Consider regulatory compliance: In some countries or sectors, data cannot leave the region. In that case, only LRS or ZRS is acceptable.
  • For ADLS Gen2 (HNS enabled), verify support: Not all redundancy options are available for all account type combinations.

10. Common Errors​

| Error | Why it happens | How to avoid |
| --- | --- | --- |
| Believing GRS guarantees zero RPO | Geo replication is asynchronous | Understand there's a data loss window |
| Trying to read from secondary without RA-GRS | Secondary isn't accessible by default with GRS | Use RA-GRS or RA-GZRS when secondary read is needed |
| Not reconfiguring geo after failover | After failover account remains in LRS | Manually reactivate GRS/GZRS after primary recovery |
| Using LRS in production out of habit | LRS is the cheapest option and is often chosen without review | Enforce ZRS as the minimum via Policy |
| Thinking SKU change is instantaneous | Migration to ZRS can take days | Plan ahead |
| Ignoring that ZRS isn't available in all regions | Regions without AZs don't support ZRS | Check regional availability before designing |
| Using GRS with regulatory data that can't leave country | GRS replicates to another region/country | Use LRS or ZRS for geographically restricted data |

11. Operation and Maintenance​

11.1 Monitoring replication health​

In Azure Monitor, you can monitor:

  • GeoReplicationStatus: Reports if geo replication is active and in what state.
  • LastSyncTime: The timestamp of the last moment when the secondary region was fully synchronized with the primary. Use this to understand current actual RPO.

To check LastSyncTime via CLI:

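az storage account show \
  --name mystorageaccount \
  --resource-group myRG \
  --expand geoReplicationStats \
  --query geoReplicationStats.lastSyncTime

Without --expand geoReplicationStats, the geoReplicationStats property is returned empty. To see the full replication status, query the whole object:

az storage account show \
  --name mystorageaccount \
  --resource-group myRG \
  --expand geoReplicationStats \
  --query geoReplicationStats

Returns:

```json
{
  "canFailover": true,
  "lastSyncTime": "2025-01-15T10:30:00Z",
  "status": "Live"
}
```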

11.2 How to initiate manual failover​

When the primary region fails and you need to promote the secondary:

az storage account failover \
  --name mystorageaccount \
  --resource-group myRG

Via PowerShell:

Invoke-AzStorageAccountFailover `
  -ResourceGroupName "myRG" `
  -Name "mystorageaccount"

Failover is not instantaneous. During the process, the account may become temporarily unavailable for writes.


11.3 Customer-managed vs Microsoft-managed failover​

| Type | Who initiates | When it occurs |
| --- | --- | --- |
| Customer-managed failover | Azure Administrator | Manually, when decided |
| Microsoft-managed failover | Microsoft | Automatically if Microsoft declares disaster and cannot recover the primary |

Customer-managed failover requires canFailover to be true in the geoReplicationStats return.
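A pre-check along these lines can run before any failover command is issued. The sketch assumes the input is the parsed geoReplicationStats object shown in section 11.1 (or None when the account returns no stats); `failover_blockers` is a hypothetical helper name.

```python
import json
from typing import Optional


def failover_blockers(stats: Optional[dict]) -> list:
    """Return the reasons a customer-managed failover should not be started."""
    if not stats:
        return ["no geoReplicationStats returned (is the account geo-redundant?)"]
    blockers = []
    if stats.get("canFailover") is not True:
        blockers.append("canFailover is not true")
    return blockers


if __name__ == "__main__":
    sample = json.loads(
        '{"canFailover": true, "lastSyncTime": "2025-01-15T10:30:00Z", "status": "Live"}'
    )
    problems = failover_blockers(sample)
    if problems:
        print("do not fail over:", problems)
    else:
        # Writes made after lastSyncTime may be missing from the secondary.
        print("failover possible; data after", sample["lastSyncTime"], "may be lost")
```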


12. Integration and Automation​

12.1 Azure Policy for redundancy compliance​

Assign a policy definition that denies LRS to the scopes that hold production resources:

az policy assignment create \
  --name "require-zrs-or-higher" \
  --policy "/providers/Microsoft.Authorization/policyDefinitions/<id>" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/production-rg"

12.2 Automated LastSyncTime monitoring​

Configure alerts in Azure Monitor when LastSyncTime exceeds a threshold:

  • Create a custom metric or use Log Analytics
  • Trigger alerts if LastSyncTime is more than 30 minutes delayed
  • Integrate with Azure Action Groups for email/Teams/PagerDuty notification
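The threshold check itself is simple date arithmetic. The sketch below assumes the lastSyncTime string has already been retrieved (for example from the geoReplicationStats output) and uses the 30-minute threshold from the list above; the function names are hypothetical, and wiring the result into an alerting system is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_sync_time(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2025-01-15T10:30:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def replication_lag(last_sync_time: str, now: Optional[datetime] = None) -> timedelta:
    """Time elapsed since the secondary was last known to be in sync."""
    now = now or datetime.now(timezone.utc)
    return now - parse_sync_time(last_sync_time)


def should_alert(
    last_sync_time: str,
    threshold: timedelta = timedelta(minutes=30),
    now: Optional[datetime] = None,
) -> bool:
    """True when the replication lag exceeds the alert threshold."""
    return replication_lag(last_sync_time, now) > threshold


if __name__ == "__main__":
    checked_at = datetime(2025, 1, 15, 11, 5, tzinfo=timezone.utc)
    print(replication_lag("2025-01-15T10:30:00Z", checked_at))
    print(should_alert("2025-01-15T10:30:00Z", now=checked_at))
```

Passing `now` explicitly keeps the check deterministic in tests; in a scheduled job it can be omitted so the current UTC time is used.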

12.3 Terraform (IaC)​

resource "azurerm_storage_account" "example" {
  name                     = "mystorageaccount"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = "eastus"
  account_tier             = "Standard"
  account_replication_type = "RAGZRS"
}

Valid values for account_replication_type: LRS, ZRS, GRS, GZRS, RAGRS, RAGZRS.
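Terraform splits the SKU into a tier and a replication type, while the CLI, PowerShell, and Bicep examples use a single SKU name. The sketch below joins the two parts into the `<tier>_<replication>` form used elsewhere in this document and rejects combinations the document rules out. The helper name is hypothetical, and the check reflects only the rules stated here.

```python
REPLICATION_TYPES = ("LRS", "ZRS", "GRS", "GZRS", "RAGRS", "RAGZRS")
PREMIUM_REPLICATION_TYPES = ("LRS", "ZRS")   # Premium accounts: no geo options


def sku_from_terraform(account_tier: str, account_replication_type: str) -> str:
    """Combine Terraform's two arguments into a SKU name like 'Standard_RAGZRS'."""
    if account_tier not in ("Standard", "Premium"):
        raise ValueError(f"unknown account_tier: {account_tier!r}")
    if account_replication_type not in REPLICATION_TYPES:
        raise ValueError(f"unknown account_replication_type: {account_replication_type!r}")
    if account_tier == "Premium" and account_replication_type not in PREMIUM_REPLICATION_TYPES:
        raise ValueError("Premium accounts support only LRS and ZRS")
    return f"{account_tier}_{account_replication_type}"


if __name__ == "__main__":
    print(sku_from_terraform("Standard", "RAGZRS"))
```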


13. Final Summary​

The six options and what they protect against:

  • LRS: 3 copies in 1 datacenter. Protects against hardware failure. Lowest cost.
  • ZRS: 3 copies in 3 zones. Protects against entire datacenter failure. Requires zones in the region.
  • GRS: Primary LRS + secondary LRS in another region (asynchronous). Protects against regional disaster. Secondary inaccessible by default.
  • GZRS: Primary ZRS + secondary LRS. Maximum resilience. Secondary inaccessible by default.
  • RA-GRS: Same as GRS, but with read access enabled on secondary.
  • RA-GZRS: Same as GZRS, but with read access enabled on secondary.

Critical differences:

  • Synchronous replication (LRS, ZRS): Zero RPO. Write only confirmed after all copies.
  • Asynchronous replication (geo): Non-zero RPO. Write confirmed before replication to secondary. Recent data may be lost in disaster.
  • GRS vs RA-GRS: Only the RA variant allows reading from secondary without failover.
  • After failover: Account returns to LRS. Manual reconfiguration of geo-redundancy is required.

What needs to be remembered:

  • Changing between LRS and ZRS may require data migration (not just configuration change).
  • ZRS is not available in all regions.
  • LastSyncTime is the practical indicator of current RPO for geo-redundant accounts.
  • Data with regulatory geographic location restrictions should use LRS or ZRS.
  • After a failover, reconfiguration to geo-redundancy is the administrator's responsibility.
  • Premium accounts only support LRS and ZRS (not GRS or GZRS).