Troubleshooting Lab: Interpret an Azure Resource Manager template or a Bicep file
Diagnostic Scenarios
Scenario 1: Root Cause
An infrastructure team deploys an ARM template via Azure CLI to provision a Virtual Network with three subnets. The template was successfully validated using az deployment group validate and no errors were reported. However, when executing the actual deployment, the command fails with the following message:
(InvalidTemplateDeployment) The template deployment failed because of policy violation.
PolicyDefinitionName: 'allowed-locations'
Resource: 'Microsoft.Network/virtualNetworks/myVnet'
RequestedLocation: 'brazilsouth'
AllowedLocations: 'eastus, westeurope'
The team notes that the parameter file used in the deployment contains the following entry:
"parameters": {
  "vnetLocation": {
    "value": "brazilsouth"
  },
  "vnetName": {
    "value": "myVnet"
  },
  "addressPrefix": {
    "value": "10.0.0.0/16"
  }
}
The template uses [parameters('vnetLocation')] to define the VNet's location property. The template was also successfully used last week to create a VNet in eastus with a different parameter file. The Git repository shows no changes to the template since then.
What is the root cause of the deployment failure?
A) The template contains a syntax error in the [parameters('vnetLocation')] function that is only detected at runtime, not during validation.
B) The value "brazilsouth" passed in the parameter file violates an Azure Policy applied to the resource group scope, which restricts allowed locations.
C) The az deployment group validate command doesn't validate external parameters, so the parameter file was ignored and the deployment used an incorrect defaultValue.
D) The VNet myVnet already exists in eastus and Azure blocked the attempt to recreate the resource in a different region.
Scenario 2: Action Decision
An administrator identifies that an ARM template deployment failed in production with the following error:
(ResourceNotFound) The Resource 'Microsoft.Network/networkSecurityGroups/nsg-backend'
under resource group 'rg-prod' was not found.
The template deploys a Virtual Machine that references an existing NSG via resourceId(). Investigation confirms that the NSG nsg-backend was accidentally deleted by another team member 20 minutes ago. The production environment is partially degraded because new VMs cannot be provisioned. The team has a backup of the original template that created the NSG, stored in the repository. The Service Principal used by the pipeline has Contributor permission on the resource group.
Recreating the NSG immediately would restore the ability to provision new VMs, but the security rules associated with the NSG need to be reviewed by the security team before application. The security team is available and can review within 30 minutes.
What is the correct action to take at this moment?
A) Immediately execute the deployment of the backup template to recreate the NSG with all original rules, restoring the environment without waiting for review.
B) Wait for complete review by the security team and only then recreate the NSG using the backup template, accepting the degradation period.
C) Recreate the NSG manually via the Azure portal without any inbound or outbound rules, and wait for the security team review before applying rules via template.
D) Open a Microsoft support ticket to restore the deleted NSG from a platform-managed infrastructure snapshot.
Scenario 3: Root Cause
A Bicep file is compiled and deployed successfully. After deployment, a developer tries to access the output value storageKey via CLI with the following command:
az deployment group show \
  --resource-group rg-dev \
  --name main \
  --query properties.outputs.storageKey.value
The command returns null. The Bicep file contains the following snippet:
resource storageAccount 'Microsoft.Storage/storageAccounts@2022-09-01' = {
  name: storageAccountName
  location: resourceGroup().location
  sku: { name: 'Standard_LRS' }
  kind: 'StorageV2'
}

output storageKey string = storageAccount.listKeys().keys[0].value
The developer confirms that the deployment was completed with Succeeded status. The resource group rg-dev contains only this deployment. The storage account is visible in the portal and operational. The developer has the Reader role assigned on the resource group.
What is the root cause of the null return for the storageKey output?
A) The listKeys() function is not supported in output blocks of Bicep files and the compiler silently ignores it, generating an empty output.
B) The output name in the CLI command is incorrect; the correct name would be storagekeys in lowercase, as Bicep normalizes output names.
C) The developer doesn't have permission to see the output value, as listKeys() requires the Microsoft.Storage/storageAccounts/listKeys/action action, which is not included in the Reader role.
D) The deployment was executed by a different Service Principal and outputs generated by other principals are hidden from users with the Reader role.
Scenario 4: Diagnostic Sequence
An engineer receives the following error when executing the deployment of a complex ARM template with multiple interdependent resources:
(CircularDependency) Found circular dependency for resource
'Microsoft.Compute/virtualMachines/vm-app'.
Dependencies: vm-app -> nic-app -> pip-app -> vm-app
The engineer needs to investigate and fix the problem. The available steps are:
- S1: Remove or correct the dependency that closes the cycle, ensuring the dependency graph is a DAG (directed acyclic graph).
- S2: Locate the vm-app resource in the template and inspect its dependsOn block.
- S3: Check if the pip-app (Public IP) resource references or declares an explicit dependency on vm-app.
- S4: Confirm that the deployment executes successfully after the correction and that all resources were provisioned in the expected order.
- S5: Trace the declared dependency chain from vm-app to identify where the cycle closes.
What is the correct sequence for investigation and resolution?
A) S2, S5, S3, S1, S4
B) S3, S2, S5, S4, S1
C) S5, S3, S1, S2, S4
D) S2, S3, S5, S4, S1
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The error message is explicit: PolicyDefinitionName: 'allowed-locations' and RequestedLocation: 'brazilsouth'. This indicates that an Azure Policy applied to the scope blocked the deployment because the requested location is not in the list of locations allowed by the policy. The parameter file correctly passed "brazilsouth" to the template, and the template correctly used that value, but the platform policy prevented resource creation in that region.
The irrelevant information in the scenario is the successful usage history in eastus the previous week. This fact confirms that the template is syntactically valid, but has no relation to the current failure, which is about policy compliance, not syntax.
The main diagnostic error that the distractors represent is focusing on the template as the source of the problem. Alternative A confuses validation behavior with runtime syntax errors, which don't apply here. Alternative C incorrectly describes the behavior of validate, which does consider external parameter files when provided. Alternative D is technically unlikely and not supported by any evidence in the statement.
The most dangerous distractor is A, as it would lead the engineer to look for a syntax error in the template when the real problem is in the subscription or resource group policy configuration.
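The comparison that the policy performs in Scenario 1 can be approximated locally before a deployment is attempted. The following is a minimal Python sketch, not an Azure API: the function name `find_disallowed_locations` is hypothetical, the allowed list is copied by hand from the AllowedLocations field of the error message, and the sketch assumes that location parameters are the ones whose names end in `Location`. The policy assignment itself remains the source of truth.

```python
import json

# Allowed regions, copied manually from the error message in Scenario 1.
# Assumption: in practice this list would come from the policy assignment.
ALLOWED_LOCATIONS = {"eastus", "westeurope"}


def find_disallowed_locations(parameter_file_text, allowed=ALLOWED_LOCATIONS):
    """Return {parameter_name: value} for location parameters outside `allowed`.

    Assumes (for illustration only) that location parameters are those whose
    names end with "Location", as `vnetLocation` does in the scenario.
    """
    parameters = json.loads(parameter_file_text)["parameters"]
    return {
        name: entry["value"]
        for name, entry in parameters.items()
        if name.endswith("Location") and entry["value"].lower() not in allowed
    }


# The parameter entries from Scenario 1, wrapped in braces to form valid JSON.
scenario_parameters = """
{
  "parameters": {
    "vnetLocation": {"value": "brazilsouth"},
    "vnetName": {"value": "myVnet"},
    "addressPrefix": {"value": "10.0.0.0/16"}
  }
}
"""

violations = find_disallowed_locations(scenario_parameters)
print(violations)  # {'vnetLocation': 'brazilsouth'}
```

An empty result only means that the values match the hand-copied list; it does not prove that the deployment complies with every policy assigned to the scope.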
Answer Key: Scenario 2
Answer: B
The scenario establishes two critical constraints: the environment is degraded (pressure to act quickly) and the NSG rules need security review (governance constraint). Option B respects both constraints, accepting the degradation period of up to 30 minutes while the security team validates the rules before recreating the resource with verified configuration.
Option A ignores the explicit security constraint in the statement. In production, recreating an NSG with unreviewed rules can introduce a larger exposure window than the current degradation. Option C is technically possible, but creates an NSG without custom rules, which can block legitimate traffic and worsen the degradation. Option D is incorrect because Azure does not offer restoration of deleted NSGs through a support ticket.
The 30-minute time constraint for review is the key clue: it makes waiting viable and removes the justification for acting immediately without validation.
Answer Key: Scenario 3
Answer: C
The listKeys() function in Bicep is a list* operation that, when used in an output block, requires the principal executing the deployment query to have the Microsoft.Storage/storageAccounts/listKeys/action permission. This permission is not included in the Reader role. When the user executes az deployment group show, Azure evaluates whether they have permission to see the value of that specific output. Since they don't, it returns null instead of an explicit error.
The irrelevant information is the deployment's Succeeded status and the storage account's visibility in the portal. Both confirm that the resource was created correctly, but have no relation to the ability to read the output value that depends on a privileged list* operation.
Alternative A is false: listKeys() is supported in Bicep outputs and is not silently ignored. Alternative B is false: Bicep preserves the capitalization of declared output names. The most dangerous distractor is A, as it would lead the engineer to rewrite the Bicep instead of adjusting user permissions.
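A JMESPath query such as properties.outputs.storageKey.value evaluates to null both when no output with that name exists and when the output exists with a null value, so the query result alone cannot rule out alternative B. The Python sketch below separates the two cases by inspecting the full deployment JSON. The function name `classify_output` is hypothetical, and the sample document is a hand-written stand-in for the command's JSON output, not a capture from a real deployment.

```python
import json


def classify_output(deployment_json_text, output_name):
    """Classify why a query for a deployment output might evaluate to null.

    Returns one of:
      "missing"  - no output with that exact name exists
      "null"     - the output exists but its value is null
      "present"  - the output exists and has a non-null value
    """
    deployment = json.loads(deployment_json_text)
    outputs = deployment.get("properties", {}).get("outputs") or {}
    if output_name not in outputs:
        return "missing"
    if outputs[output_name].get("value") is None:
        return "null"
    return "present"


# Hypothetical sample, shaped after the path used in the --query argument.
sample = (
    '{"properties": {"outputs": '
    '{"storageKey": {"type": "String", "value": null}}}}'
)

print(classify_output(sample, "storageKey"))   # null
print(classify_output(sample, "storagekeys"))  # missing
```

A result of "missing" for the declared name would point to a naming problem; "null" moves the investigation away from the output name and toward how the value is produced or returned.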
Answer Key: Scenario 4
Answer: A (S2, S5, S3, S1, S4)
The correct sequence starts from the resource identified in the error (vm-app), inspects its declared dependencies, traces the complete chain to locate where the cycle closes, identifies the resource that creates the cycle closure (pip-app referencing vm-app), corrects the problematic dependency, and validates the deployment.
Starting with S3 (alternative B) would be investigating an intermediate resource before understanding the origin point declared in the error, which is disorderly diagnosis. Alternative C jumps directly to tracing the chain without first locating the entry point in the template, making tracing more difficult. Alternative D delays the correction until after S4, which doesn't make sense since S4 is the final validation.
The correct logic is always to start from the resource identified in the error, progressively expand the graph, and correct before validating.
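The tracing described in S2, S5, and S3 amounts to a depth-first walk of the dependency graph that stops when it revisits a resource already on the current chain. The following Python sketch illustrates that idea under stated assumptions: the function name `find_cycle` is hypothetical, and the graph is typed in by hand from the error message rather than parsed from a template.

```python
def find_cycle(depends_on, start):
    """Follow dependency edges depth-first from `start`.

    `depends_on` maps a resource name to the list of resources it depends on.
    Returns the cycle as a list of names (first == last), or None when the
    resources reachable from `start` form a DAG.
    """
    path = []         # chain currently being traced (step S5)
    on_path = set()   # fast membership test for `path`
    finished = set()  # resources fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # The chain has closed: slice out the loop and repeat its head.
            return path[path.index(node):] + [node]
        if node in finished:
            return None
        path.append(node)
        on_path.add(node)
        for dependency in depends_on.get(node, []):
            cycle = visit(dependency)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        finished.add(node)
        return None

    return visit(start)


# Dependency edges as reported in the Scenario 4 error message.
broken = {
    "vm-app": ["nic-app"],
    "nic-app": ["pip-app"],
    "pip-app": ["vm-app"],  # the edge that closes the cycle (step S3)
}
print(find_cycle(broken, "vm-app"))
# ['vm-app', 'nic-app', 'pip-app', 'vm-app']

# After step S1: the Public IP no longer depends on the VM.
fixed = {**broken, "pip-app": []}
print(find_cycle(fixed, "vm-app"))  # None
```

Starting the walk at vm-app mirrors the recommended sequence: the resource named in the error is the entry point, and the walk reports the edge from pip-app back to vm-app as the one that closes the loop.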
Troubleshooting Tree: Interpret an Azure Resource Manager template or a Bicep file
Color Legend:
| Color | Node Type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Orange | Intermediate validation or check |
| Green | Recommended action or resolution |
When facing a real problem with ARM templates or Bicep files, start with the root node describing the observed symptom. At each question node, answer based on what you can directly verify from the error message or deployment behavior. Follow the path that corresponds to your context until reaching a red node (identified cause) or orange node (required check), then proceed to the corresponding green node, which indicates the resolution action. The diagram covers the most frequent paths and can be traversed in any order based on the presented symptom.
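The entry step of the traversal, going from the observed error to a first diagnostic question, can be sketched as a lookup keyed on the error code in parentheses at the start of the message. This Python sketch is illustrative only: the function name `first_check` is hypothetical, and the table covers just the three error codes that appear in this lab, not the full set of Azure Resource Manager error codes.

```python
import re

# First check per error code, taken from the scenarios in this lab.
# Assumption: this table is deliberately incomplete.
FIRST_CHECK = {
    "InvalidTemplateDeployment": (
        "If the message reports a policy violation, compare the requested "
        "value with the values the policy allows."
    ),
    "ResourceNotFound": (
        "Confirm that the referenced existing resource is still present "
        "in the resource group."
    ),
    "CircularDependency": (
        "Start at the resource named in the error and trace its "
        "dependsOn chain."
    ),
}


def first_check(error_message):
    """Extract the leading "(Code)" from an error message and look it up."""
    match = re.match(r"\s*\((\w+)\)", error_message)
    if not match:
        return "No error code found; read the full message."
    return FIRST_CHECK.get(match.group(1), "Code not covered by this lab.")


print(first_check(
    "(CircularDependency) Found circular dependency for resource "
    "'Microsoft.Compute/virtualMachines/vm-app'."
))
```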