Troubleshooting Lab: Configure Azure Storage redundancy

Diagnostic Scenarios

Scenario 1: Root Cause

A development team reports that the application is returning intermittent errors when trying to read blobs from a storage account. The administrator investigates and collects the following information:

  • The storage account is configured with RA-GRS
  • The application uses the secondary endpoint for all reads, as configured by the development team three weeks ago
  • The primary region is operational with no incidents registered in Azure Service Health
  • The storage account was created six months ago
  • The administrator checks the portal and sees that another member of the operations team completed a failover the previous night

The output of the command executed by the administrator is:

az storage account show \
  --name minhaconta \
  --resource-group producao-rg \
  --query "[sku.name, primaryEndpoints.blob, secondaryEndpoints]" \
  --output json

[
  "Standard_GRS",
  "https://minhaconta.blob.core.windows.net/",
  null
]

What is the root cause of the read errors in the application?

A) The account was downgraded from RA-GRS to GRS during the failover, eliminating the secondary read endpoint.

B) The primary region has high latency, causing intermittent failures on the secondary endpoint.

C) The completed failover promoted the secondary to primary, and the new state does not yet have a configured secondary region, making the secondary endpoint nonexistent.

D) The Standard_GRS SKU does not support reads on the secondary endpoint, requiring explicit RA-GRS.


Scenario 2: Action Decision

The operations team identified that a critical production storage account, currently configured with LRS, is located in an Azure region that experienced two datacenter incidents in the last twelve months. Leadership decided that the account should be migrated to GZRS to increase resilience.

The responsible administrator has the following information:

  • The account stores approximately 40 TB of historical log data that is rarely accessed
  • Migration to ZRS or GZRS can be done via direct conversion in the portal for LRS accounts in regions that support availability zones
  • The current region supports availability zones
  • The available maintenance window is 4 hours, early tomorrow morning
  • No applications are writing to the account during the maintenance window
  • The security team requested an audit report of the account before any changes, but has not yet delivered the report template

What is the correct action to take during the maintenance window?

A) Start the direct conversion from LRS to GZRS through the portal, as all technical requirements are met and the window is available.

B) Wait for the audit report from the security team before executing any changes, as the change cannot occur without this document.

C) Create a new account with GZRS, copy data with AzCopy, and update application connections during the window.

D) Convert the account to ZRS first and, in a future window, upgrade to GZRS to reduce the risk of each step.


Scenario 3: Root Cause

An administrator receives a ticket reporting that a storage account is not replicating data to the secondary region. The account was created two days ago with the following configurations, according to the ARM template used:

{
  "kind": "StorageV2",
  "sku": {
    "name": "Standard_ZRS"
  },
  "properties": {
    "accessTier": "Hot",
    "supportsHttpsTrafficOnly": true,
    "minimumTlsVersion": "TLS1_2"
  }
}

The administrator verifies in the portal that the account is operational, blobs are being written successfully, and TLS 1.2 is active. He also confirms that the subscription has available quota for geo-redundant accounts and that there are no blocking policies on the resource group.

The architecture team reports that they expected automatic geographic replication because the account was created in a region with a geographic pair defined by Azure.

What is the root cause of the absence of geographic replication?

A) The minimum TLS policy prevents geographic replication between regions with different TLS versions.

B) The account was deployed with Standard_ZRS SKU, which replicates between availability zones within the same region and does not perform geographic replication.

C) Geographic replication is not automatic in new accounts and must be manually enabled after 24 hours of creation.

D) The Hot access tier blocks geographic replication; it would be necessary to use Cool to enable GRS or GZRS.


Scenario 4: Collateral Impact

An administrator identifies that a storage account configured with GRS has failover available in the portal after an incident in the primary region. After confirming with leadership, he executes the failover manually. The process is completed successfully and the application returns to normal operation in the new primary region.

Three days later, the compliance team reports that blob access audit reports, which were automatically exported to a container in the same account, are incomplete and have gaps in the period immediately following the failover.

The administrator confirms that:

  • The main application is writing normally
  • The account's diagnostic settings point to the same destination container
  • The destination container exists and is accessible
  • No changes were made to diagnostic settings

Which collateral consequence of the failover explains the gaps in the reports?

A) The failover permanently interrupts diagnostic logs, requiring recreation of the storage account.

B) During the failover and synchronization period, diagnostic writes that occurred in the original primary region may not have been replicated to the secondary before promotion, generating gaps in audit data.

C) The log destination container was automatically deleted during failover to avoid corrupted data.

D) Diagnostic settings were reset to default values after failover, disabling log export.


Answer Key and Explanations

Answer Key: Scenario 1

Answer: C

The decisive clue is in the command output: the secondaryEndpoints field returns null, which confirms that the account no longer has a secondary endpoint. The failover promoted the secondary region to primary. Once the failover completes, the account operates in a single region, and no secondary exists until geo-redundancy is re-enabled on the account and the initial replication to the new secondary finishes, which can take hours or days depending on the amount of data.
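A read path that depends on the secondary endpoint can check that the endpoint exists before using it. The following is a minimal sketch under stated assumptions, not a definitive implementation: `pick_read_endpoint` is a hypothetical helper, and treating an empty string, `null`, or `None` as "no secondary" is an assumption about how a missing value may be rendered by different CLI output formats.

```shell
#!/bin/sh
# pick_read_endpoint PRIMARY SECONDARY
# Prints the endpoint that reads should use (hypothetical helper).
# Falls back to the primary endpoint when no secondary endpoint exists.
pick_read_endpoint() {
  primary="$1"
  secondary="$2"
  case "$secondary" in
    ""|null|None)
      # Assumption: a missing secondary appears as empty, null, or None.
      printf '%s\n' "$primary"
      ;;
    *)
      printf '%s\n' "$secondary"
      ;;
  esac
}

# Possible wiring with the account from the scenario (not executed here):
#   primary=$(az storage account show --name minhaconta \
#     --resource-group producao-rg \
#     --query "primaryEndpoints.blob" --output tsv)
#   secondary=$(az storage account show --name minhaconta \
#     --resource-group producao-rg \
#     --query "secondaryEndpoints.blob" --output tsv)
#   pick_read_endpoint "$primary" "$secondary"
```

With the output shown in the scenario, the secondary value is null, so the helper would return the primary blob endpoint instead of a secondary endpoint that no longer exists.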

The irrelevant information in the statement is that the account was created six months ago. This detail has no bearing on post-failover behavior and was included to mislead the reader into looking for causes related to account age.

Alternative A is the most dangerous distractor. The SKU displayed after the failover is Standard_GRS rather than Standard_RAGRS, but this is a consequence of the failover, not a separate downgrade. The real cause is not the SKU change but the absence of the secondary endpoint. Acting on alternative A would lead the administrator to try reconfiguring the SKU as a solution, which would not solve the immediate problem.


Answer Key: Scenario 2

Answer: B

The critical restriction of the scenario is explicit: the security team requested an audit report before any changes and has not yet delivered the template. Executing the conversion without this document violates a process control established by the organization itself. The technical window is available, infrastructure requirements are met, but the governance process has not been fulfilled.
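The decision above can be expressed as a pre-change gate that checks the process control before the technical prerequisite. This is only an illustrative sketch: `change_gate` and its "yes"/"no" inputs are hypothetical and do not correspond to any Azure feature.

```shell
#!/bin/sh
# change_gate AUDIT_DELIVERED REGION_HAS_ZONES
# Prints "proceed" only when both the process control and the technical
# prerequisite are satisfied (hypothetical helper; inputs are "yes" or "no").
change_gate() {
  audit_delivered="$1"
  region_has_zones="$2"
  if [ "$audit_delivered" != "yes" ]; then
    # The governance check comes first: it blocks even a valid technical change.
    echo "blocked: audit report pending"
  elif [ "$region_has_zones" != "yes" ]; then
    echo "blocked: region lacks availability zones"
  else
    echo "proceed"
  fi
}

# Scenario 2 state: zones are available, audit report not delivered.
change_gate no yes   # prints "blocked: audit report pending"
```

Ordering the checks this way mirrors the reasoning in the answer: the technical requirements are evaluated only after the organizational requirement is met.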

Alternative A applies the technical criteria correctly but ignores the governance restriction. In an exam or in production, ignoring an explicit process restriction is an error even when the technical action itself is correct.

Alternative C describes a valid approach for migrations that do not support direct conversion, but it is unnecessarily complex for this scenario, where direct conversion from LRS to GZRS is supported. It also ignores the audit restriction.

Alternative D splits the change into stages without any technical need in this context: direct conversion from LRS to GZRS is supported in regions with availability zones, so the split is unnecessary and adds risk by prolonging the exposure period.


Answer Key: Scenario 3

Answer: B

The ARM template is the direct evidence: the configured SKU is Standard_ZRS. ZRS replicates data between availability zones within a single region. It does not perform any form of geographic replication, regardless of whether the region has a geographic pair defined by Azure.
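The distinction can be made explicit by mapping each SKU name to where its copies live. The sketch below is a hypothetical helper (`replication_scope`), and the scope labels are illustrative wording rather than official terminology.

```shell
#!/bin/sh
# replication_scope SKU_NAME
# Maps a storage account SKU name to the scope of its replication
# (hypothetical helper; the labels are illustrative).
replication_scope() {
  case "$1" in
    Standard_LRS|Premium_LRS)      echo "single datacenter" ;;
    Standard_ZRS|Premium_ZRS)      echo "zones in one region" ;;
    Standard_GRS|Standard_RAGRS)   echo "two regions" ;;
    Standard_GZRS|Standard_RAGZRS) echo "zones plus second region" ;;
    *)                             echo "unknown" ;;
  esac
}

# Possible wiring (ACCOUNT and RG are placeholder variables, not executed here):
#   sku=$(az storage account show --name "$ACCOUNT" \
#     --resource-group "$RG" --query "sku.name" --output tsv)
#   replication_scope "$sku"
```

For the template in this scenario, `replication_scope Standard_ZRS` returns "zones in one region", which matches the observed absence of geographic replication.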

The irrelevant information is the confirmation that the subscription has quota for geo-redundant accounts and that there are no blocking policies on the resource group. These details were included to mislead the reader into looking for administrative or permission causes, when the cause is simply the chosen SKU.

Alternative A is technically unfounded: the minimum TLS version does not affect geographic replication. Alternative C describes behavior that does not exist on the platform. Alternative D is also false: the access tier has no bearing on geographic replication support. The distractors share one reasoning error: they look for the cause in secondary configuration settings when it lies in the SKU, which is the central and most obvious parameter of the template.


Answer Key: Scenario 4

Answer: B

GRS replicates asynchronously. This means that, at the time of failover, there may be a data delta between the primary and secondary regions that has not yet been synchronized. Recent writes, including diagnostic records generated immediately before and during the failover process, may not have been replicated in time. The observable result is gaps in audit logs corresponding exactly to the incident and failover period.
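One way to reason about which records were at risk is to compare the time of a write with the account's Last Sync Time as recorded before the failover. The sketch below is hedged: `written_after_sync` is a hypothetical helper, and it assumes both timestamps are UTC values in the same ISO 8601 layout, so that plain string comparison matches chronological order.

```shell
#!/bin/sh
# written_after_sync WRITE_TIME LAST_SYNC_TIME
# Succeeds (exit status 0) when WRITE_TIME is later than LAST_SYNC_TIME.
# Assumption: both values are UTC timestamps in the same ISO 8601 layout,
# so string comparison matches chronological order.
written_after_sync() {
  # expr compares non-integer operands as strings; LC_ALL=C keeps
  # the comparison byte-wise and independent of the user's locale.
  LC_ALL=C expr "$1" \> "$2" >/dev/null
}

# Possible way to read Last Sync Time on a geo-redundant account
# (ACCOUNT and RG are placeholder variables, not executed here):
#   last_sync=$(az storage account show --name "$ACCOUNT" \
#     --resource-group "$RG" --expand geoReplicationStats \
#     --query "geoReplicationStats.lastSyncTime" --output tsv)
#   if written_after_sync "$write_time" "$last_sync"; then
#     echo "write may be missing after failover"
#   fi
```

Writes with a timestamp later than the Last Sync Time are the ones that may not have reached the secondary region, which is consistent with gaps concentrated around the failover.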

The clue in the statement is the combination of two factors: the gaps occur in the period immediately following the failover, and all configurations are correct after the event. This rules out configuration causes and points directly to the loss of unsynchronized data inherent to asynchronous GRS replication.

Alternative D is the most dangerous distractor. The administrator could waste time inspecting and reconfiguring diagnostic settings that are already correct, which delays identification of the real cause and leads to an incorrect explanation of the gap in compliance reports. The real consequence of acting on it would be documenting a configuration incident that did not occur.


Troubleshooting Tree: Configure Azure Storage redundancy


Color legend (color: node type):

  • Dark blue: Initial symptom (entry point)
  • Medium blue: Diagnostic question
  • Red: Identified cause
  • Green: Recommended action or resolution
  • Orange: Validation or intermediate verification

To use this tree when facing a real problem, start with the root node describing the observed symptom. Answer each diagnostic question based on what you can verify directly, whether via portal, CLI, or command output. When you reach an identified cause node, read the corresponding action before executing any changes. If the path taken does not correspond to the actual symptom, return to the last decision node and follow the alternative branch. The tree covers the main failure patterns related to Azure storage redundancy and can be navigated in less than two minutes with the environment in front of you.