
Troubleshooting Lab: Implement and manage virtual network connectivity by using Azure Virtual Network Manager

Diagnostic Scenarios

Scenario 1: Root Cause

A platform team configured Azure Virtual Network Manager to manage connectivity between VNets from three product teams. The chosen topology was Hub and Spoke, with vnet-hub as the hub and vnet-app1, vnet-app2, and vnet-app3 as spokes, all in the East US region. The network group was created with static membership. The configuration was saved and successfully deployed to the East US region, as confirmed in the portal's deployment history.

Three days after deployment, the vnet-app3 team opens a ticket reporting that virtual machines in this VNet cannot reach any resources in vnet-hub. The teams for vnet-app1 and vnet-app2 have not reported any issues.

The administrator runs the following commands and gets the outputs below:

# Check peerings in vnet-hub
az network vnet peering list \
  --resource-group rg-network \
  --vnet-name vnet-hub \
  --output table

Name                          PeeringState  AllowForwardedTraffic
----------------------------  ------------  ---------------------
avnm-hub-to-vnet-app1-xxxxxx  Connected     True
avnm-hub-to-vnet-app2-xxxxxx  Connected     True

The administrator also observes that vnet-app3 was added to the network group two days ago, after the initial deployment. The VNet vnet-app3 was recently created from an ARM template and is located in East US. The subscription is within the scope of the Network Manager.

What is the root cause of the vnet-app3 connectivity failure?

A) The VNet vnet-app3 was created from an ARM template, which generates a resource identifier incompatible with Azure Virtual Network Manager
B) vnet-app3 was added to the network group after the configuration deployment; since the configuration was not redeployed after this addition, the peering was not created
C) Azure Virtual Network Manager does not automatically create peerings for VNets added to network groups with static membership after initial deployment
D) The peering between vnet-hub and vnet-app3 was created but is in "Initiated" state waiting for manual acceptance by the team responsible for vnet-app3


Scenario 2: Action Decision

The security team identified that a security admin rule created in Azure Virtual Network Manager is incorrectly blocking legitimate HTTPS traffic between vnet-app1 and an internal endpoint in vnet-hub. The rule has priority 100, a Deny action, and applies to destination port 443. The cause has been confirmed: the rule was created with an excessively broad scope during an emergency configuration window the previous night.

The environment has the following constraints:

  • It is 2:00 PM on a Tuesday
  • The affected service processes financial transactions in production
  • The business team reports active impact since 1:45 PM
  • Removing the rule requires redeployment of the security configuration, which takes between 5 and 15 minutes
  • There is a priority 90 rule in the same security configuration that allows HTTPS traffic from a specific CIDR; the source IP address of the affected traffic is within this CIDR

What is the correct action to take at this moment?

A) Remove the priority 100 rule immediately and redeploy the security configuration
B) Create a temporary NSG on the destination subnet in vnet-hub allowing HTTPS, as an immediate mitigation while the definitive fix is prepared
C) Change the blocking rule priority from 100 to 101 so that the priority 90 (Allow) rule precedes it, and redeploy the configuration
D) Document the incident and schedule the fix for the next maintenance window to avoid additional impact during peak hours


Scenario 3: Root Cause

An organization uses Azure Virtual Network Manager with scope on a management group containing three subscriptions. The Network Manager was created in subscription A. Production VNets are distributed across subscriptions A, B, and C. The Mesh connectivity configuration was created and successfully deployed according to the portal.

The chief network administrator reports that vnet-prod-c1 and vnet-prod-c2, both in subscription C, do not appear as members of the network group, although vnet-prod-a1, vnet-prod-b1, and vnet-prod-b2 are correctly listed. He mentions that he used dynamic membership with an Azure Policy that evaluates the tag env=production. He verified that all VNets have the correct tag applied.

The junior administrator suggests that the problem is caused by the fact that the Network Manager was created in subscription A, which would limit its ability to manage VNets from other subscriptions. The chief administrator dismisses this hypothesis.

Portal logs show:

Policy compliance scan: Last evaluated 6 hours ago
vnet-prod-c1: Tag env=production [Compliant]
vnet-prod-c2: Tag env=production [Compliant]
Assignment scope: /providers/Microsoft.Management/managementGroups/mg-corp

The administrator also verifies that the VNets in subscription C were created yesterday afternoon.

What is the root cause of the problem?

A) Azure Policy evaluates resources asynchronously; VNets in subscription C were created recently and have not yet gone through a complete compliance evaluation cycle that triggers dynamic membership
B) The Azure Policy scope is set to the management group, but Network Manager only processes dynamic memberships from subscriptions where it was created
C) VNets created within the last 24 hours are placed in temporary quarantine by Azure Resource Manager and are not available for dynamic membership policies
D) The env=production tag on VNets in subscription C has different capitalization than the condition defined in the policy, causing a false positive in the compliance report


Scenario 4: Diagnostic Sequence

An administrator receives the following report: VMs in vnet-spoke1 and vnet-spoke2 cannot communicate directly with each other, even though both VNets are spokes in a Hub and Spoke topology managed by Azure Virtual Network Manager. Communication from each spoke to the hub works normally. The configuration was deployed a week ago with no changes since then.

The available investigation steps are:

  • Step P: Verify if the "Enable direct connectivity between spokes" option is enabled in the connectivity configuration
  • Step Q: Confirm if the connectivity configuration was redeployed after any recent changes
  • Step R: Verify the existence and state of direct peerings between vnet-spoke1 and vnet-spoke2 using az network vnet peering list
  • Step S: Check if there are security admin rules blocking traffic between the spokes' CIDRs
  • Step T: Confirm if both VNets are in the same network group associated with the connectivity configuration

What is the correct investigation sequence?

A) T, P, R, Q, S
B) P, T, Q, R, S
C) R, P, T, S, Q
D) Q, T, P, S, R


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The command output confirms that only two peerings exist in vnet-hub, corresponding to vnet-app1 and vnet-app2. The peering for vnet-app3 is missing. The decisive clue is the chronology: vnet-app3 was added to the network group two days ago, after the initial deployment, and the scenario mentions no subsequent redeployment.
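The comparison between the peering list and the expected spokes can be sketched in a few lines of Python. This is only an illustration: the peering names are copied from the scenario output, while the function name `missing_spokes` and the name-matching rule (the spoke name appears between hyphens in the peering name) are assumptions made for this example, not documented AVNM behavior.

```python
# Peering names as shown in the scenario's `az network vnet peering list` output.
hub_peerings = [
    "avnm-hub-to-vnet-app1-xxxxxx",
    "avnm-hub-to-vnet-app2-xxxxxx",
]

# Spokes that belong to the network group according to the scenario.
expected_spokes = ["vnet-app1", "vnet-app2", "vnet-app3"]


def missing_spokes(peering_names, spokes):
    """Return the spokes for which no hub peering name contains '-<spoke>-'.

    Assumption: peering names embed the spoke name between hyphens,
    as in the sample output. Real environments may name peerings differently.
    """
    return [
        spoke
        for spoke in spokes
        if not any(f"-{spoke}-" in name for name in peering_names)
    ]


print(missing_spokes(hub_peerings, expected_spokes))  # prints ['vnet-app3']
```

A check like this only confirms the symptom (no hub peering for vnet-app3); the cause still has to come from the timeline described in the scenario.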

In Azure Virtual Network Manager, deploying a configuration is a point-in-time act that projects the desired state to selected regions. Adding a member to a network group updates the group definition but does not automatically trigger a new deployment. Explicit redeployment is required for peerings to reflect the new member.

The detail that the VNet was created from an ARM template is a deliberate distractor: the creation method does not affect AVNM compatibility.

Option C represents an important conceptual mistake: the limitation is not in the membership type (static), but in the absence of redeployment. Static membership with redeployment would have created the peering normally. Option D is plausible as a symptom in manual peerings, but AVNM-managed peerings do not require manual acceptance.

The most dangerous distractor is C, as it leads the administrator to conclude they need to change the membership type to dynamic, when the real solution is simply to redeploy.


Answer Key: Scenario 2

Answer: C

The critical constraint that defines the correct answer is the existence of the priority 90 rule with an Allow action covering exactly the CIDR of the affected traffic. In AVNM security admin rules, rules with lower priority numbers are evaluated first. Changing the blocking rule's priority from 100 to 101 ensures that the priority 90 Allow rule is evaluated first, restoring traffic without removing any rule.

This approach is less destructive than option A (complete rule removal) and equally effective given the constraint set, while preserving the original intent of the rule for other scopes that may depend on it.

Option B (temporary NSG) would not restore traffic: security admin rules are evaluated before NSG rules, so an NSG Allow rule cannot override a security admin Deny rule. Option A is correct in intent but more invasive when a less destructive alternative exists. Option D ignores the active production impact, which is unacceptable given the described context.

The most common reasoning error is going directly to rule removal (option A) without analyzing if the existing structure already offers a more surgical solution.


Answer Key: Scenario 3

Answer: A

The decisive clue is in the combination of two facts: VNets in subscription C were created yesterday afternoon, and the last policy evaluation cycle occurred 6 hours ago. Although the compliance report shows VNets as "Compliant," this does not mean dynamic membership has been processed yet. Compliance and effective membership in the network group depend on the evaluation cycle and AVNM processing, which may have latency.

The policy compliance report marking VNets as "Compliant" is intentionally distracting information: it indicates the tag was evaluated correctly but does not confirm that AVNM has already processed the resulting membership.

Option B is exactly the misconception the chief administrator correctly dismissed: the Network Manager creation location does not limit its ability to manage VNets within the defined scope. Option C is technically fictional; there is no 24-hour quarantine mechanism in Azure Resource Manager. Option D is the most dangerous distractor: it suggests investigating tag capitalization, which would consume time without results, since the report shows VNets as "Compliant" with the correct value.

Acting on distractor D would lead the administrator to edit production tags unnecessarily, potentially causing configuration drift.


Answer Key: Scenario 4

Answer: A (T, P, R, Q, S)

The correct sequence follows progressive elimination logic from most fundamental to most specific:

  • T confirms whether both VNets are in the same network group associated with the configuration, as without this none of the following steps make sense.
  • P checks whether the direct connectivity option is enabled, as its absence explains the symptom without need for additional investigation.
  • R verifies whether direct peerings were actually created, confirming or refuting the configuration's effectiveness.
  • Q checks whether there was a redeployment after any changes, covering the case where the configuration exists but was not projected correctly.
  • S investigates security rule blocking, which is the most specific and least likely cause given that communication with the hub works.

Sequence B makes the error of checking the connectivity option (P) before confirming that the VNets are in the correct group (T), investigating a configuration that may not apply to the problem scope. Sequence C starts with peering verification (R), which is a confirmation step, not an initial diagnosis. Sequence D starts with the redeployment check (Q), which is premature before group membership and the configuration settings have been validated.

The underlying principle is: validate scope and configuration before verifying resulting state, and verify resulting state before investigating external blocks.
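The ordering principle can be expressed as a short Python sketch that runs the checks in the sequence T, P, R, Q, S and stops at the first one that fails. The observation keys (`same_network_group`, `direct_connectivity_enabled`, and so on) and the sample values are hypothetical; in practice each value would come from the portal or CLI inspection described in the corresponding step.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Observations collected by the administrator; keys and values are hypothetical.
Observations = Dict[str, bool]

# Each entry: (step letter, what the step confirms, check returning True when it passes).
STEPS: List[Tuple[str, str, Callable[[Observations], bool]]] = [
    ("T", "both spokes are in the same network group",
     lambda o: o["same_network_group"]),
    ("P", "direct connectivity between spokes is enabled",
     lambda o: o["direct_connectivity_enabled"]),
    ("R", "direct peerings between the spokes exist",
     lambda o: o["spoke_peering_present"]),
    ("Q", "the configuration was redeployed after the last change",
     lambda o: o["redeployed_after_change"]),
    ("S", "no security admin rule blocks the spoke CIDRs",
     lambda o: o["no_blocking_admin_rule"]),
]


def first_failing_step(obs: Observations) -> Optional[str]:
    """Run the checks in order T, P, R, Q, S and return the first failing letter.

    Returns None when every check passes.
    """
    for letter, _description, check in STEPS:
        if not check(obs):
            return letter
    return None


# Hypothetical case: group membership is fine, but direct connectivity is off.
sample = {
    "same_network_group": True,
    "direct_connectivity_enabled": False,
    "spoke_peering_present": False,
    "redeployed_after_change": True,
    "no_blocking_admin_rule": True,
}
print(first_failing_step(sample))  # prints P
```

Stopping at the first failure mirrors the elimination logic: in the sample, the missing peering (R) is a consequence of the disabled option (P), so reporting P first avoids chasing a downstream symptom.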


Troubleshooting Tree: Implement and manage virtual network connectivity by using Azure Virtual Network Manager


Color Legend:

Color      Node Type
---------  -------------------------------------------------
Dark Blue  Initial symptom (entry point)
Blue       Diagnostic question (binary decision or by state)
Red        Identified cause
Green      Recommended action or resolution
Orange     Intermediate verification or validation

When facing a real problem, start at the root node and answer each question based on what is observable in the environment, without assuming causes. Follow the path corresponding to the obtained answer. Intermediate validation nodes (orange) indicate that a practical verification should be executed before proceeding. Diagnosis ends when you reach a red node (identified cause) or green node (recommended action), never before that.
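The traversal rule can be illustrated with a small Python sketch. The tree contents below are hypothetical placeholders, since the original diagram is not reproduced in this text; only the node types and the stopping rule (finish at a cause or action node, never earlier) come from the legend and the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Diagnosis may only end on these node types (red and green in the legend).
TERMINAL_KINDS = {"cause", "action"}


@dataclass
class Node:
    kind: str  # "symptom", "question", "validation", "cause", or "action"
    text: str
    branches: Dict[str, str] = field(default_factory=dict)  # answer -> next node id


# Hypothetical tree; node ids, texts, and answers are invented for illustration.
TREE: Dict[str, Node] = {
    "start": Node("symptom", "Spoke VMs cannot reach the hub", {"next": "q_peering"}),
    "q_peering": Node("question", "Does the hub list a peering for the spoke?",
                      {"no": "v_group", "yes": "a_rules"}),
    "v_group": Node("validation", "Confirm the spoke is in the network group",
                    {"done": "c_not_redeployed"}),
    "c_not_redeployed": Node("cause", "Configuration not redeployed after the change"),
    "a_rules": Node("action", "Review security admin rules for the spoke CIDR"),
}


def walk(tree: Dict[str, Node], answers: List[str], start: str = "start") -> List[str]:
    """Follow the answers from the start node and return the visited node ids.

    Raises ValueError if the answers run out before a cause or action node
    is reached, because the diagnosis would be incomplete.
    """
    path = [start]
    node_id = start
    remaining = iter(answers)
    while tree[node_id].kind not in TERMINAL_KINDS:
        answer = next(remaining, None)
        if answer is None:
            raise ValueError(f"diagnosis incomplete at node '{node_id}'")
        node_id = tree[node_id].branches[answer]
        path.append(node_id)
    return path


print(walk(TREE, ["next", "no", "done"]))
# prints ['start', 'q_peering', 'v_group', 'c_not_redeployed']
```

Raising an error when the answers stop short of a terminal node encodes the "never before that" rule: a path that ends on a question or validation node is not a finished diagnosis.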