
Troubleshooting Lab: Create and configure virtual network peering

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team reports that two virtual machines, vm-app (in VNet-East, East US region) and vm-db (in VNet-West, West US region), lost connectivity to each other after a maintenance window performed the previous night.

The administrator responsible for maintenance reports that the activities performed were:

  • Update of NSG rules for the subnet where vm-db is allocated
  • Addition of a new address space 10.2.128.0/18 to VNet-West
  • Restart of vm-db to apply operating system patches

The East-to-West peering existed before maintenance and was not directly touched. When checking the current state in the portal, the administrator observes:

Peering: East-to-West
Status: Disconnected

Peering: West-to-East
Status: Connected

vm-app resolves vm-db's DNS name normally, but ping between the two VMs gets no response.

What is the root cause of connectivity loss?

A) The updated NSG rules on vm-db's subnet are blocking ICMP traffic from VNet-East.

B) Adding a new address space to VNet-West put the East-to-West peering in Disconnected state, requiring manual resynchronization.

C) Global peering between different regions requires both sides to be in Connected state simultaneously; since one side is Disconnected, traffic is blocked in both directions.

D) Restarting vm-db during the maintenance window corrupted the effective route table associated with the peering, blocking inbound traffic.


Scenario 2: Action Decision

The cause of an incident has been identified: the peering between VNet-Hub and VNet-Spoke3 is in Initiated state on the hub side, and no corresponding peering exists on the spoke side. Investigation revealed that a junior engineer created only half of the peering during a network expansion procedure.

The environment is production. Both networks are in the same subscription and region. Currently, no resources in VNet-Spoke3 have connectivity to the hub. The business team is waiting for normalization to resume operations of a payment system that depends on access to a service hosted in the hub.

Current restrictions are:

  • Not permitted to change the address space of any VNet at this time
  • Official maintenance window only opens at 10 PM, but the situation is classified as a critical incident with authorization for immediate action
  • The administrator has Network Contributor role on both VNets

What is the correct action to take at this moment?

A) Delete the incomplete peering on the hub side and recreate both sides of the peering within the maintenance window at 10 PM to ensure consistency.

B) Create the corresponding peering on VNet-Spoke3 side pointing to VNet-Hub, completing the pair and putting both sides in Connected state.

C) Wait for the maintenance window and recreate the complete peering on both sides simultaneously, as asynchronously created peering can cause instability.

D) Escalate to an administrator with Owner role on the subscription, as completing a peering in Initiated state requires elevated permissions beyond Network Contributor.


Scenario 3: Root Cause

A company operates a hub-and-spoke topology with three networks: VNet-Hub, VNet-A, and VNet-B. The peering between VNet-Hub and VNet-A, and between VNet-Hub and VNet-B, is active and in Connected state on all sides.

The administrator receives a complaint: resources in VNet-A cannot communicate with resources in VNet-B. The infrastructure team confirms that no NSG is blocking traffic between the two spokes and that the route tables of the involved subnets have no custom routes.

The administrator executes the following command and gets the output below:

az network nic show-effective-route-table \
  --resource-group rg-prod \
  --name nic-vm-spoke-a \
  --output table

Source    State    Address Prefix    Next Hop Type    Next Hop IP
--------  -------  ----------------  ---------------  -----------
Default   Active   10.0.0.0/16       VnetLocal        -
Default   Active   10.1.0.0/16       VNetPeering      -
Default   Active   10.2.0.0/16       VNetPeering      -
Default   Active   0.0.0.0/0         Internet         -

VNet-B's address space is 10.2.0.0/16. The route to VNet-B appears in the effective table of the VM in VNet-A.

What is the root cause of communication failure between spokes?

A) The effective routes show the next hop as VNetPeering, but traffic between spokes requires the next hop to be an NVA or gateway IP address; therefore, the routes are configured incorrectly.

B) The route to VNet-B in the effective table of the VM in VNet-A indicates that peering is active, but Azure peering is not transitive; without a routing mechanism in the hub (NVA or Azure Route Server), traffic will not be forwarded from VNet-A to VNet-B through the hub.

C) The destination subnet's NSG in VNet-B is blocking traffic, as the absence of custom rules doesn't mean default rules allow peering traffic between different spokes.

D) The fact that the effective table shows routes to both spokes indicates there's a route conflict in the hub, which needs to be resolved with an explicit UDR before communication is possible.


Scenario 4: Diagnostic Sequence

An administrator receives the following report: "VM vm-finance, located in VNet-Finance, cannot access a file server hosted in VNet-Core. The peering between the two networks was created two weeks ago and worked normally until yesterday."

The administrator has the following investigation steps available, listed out of order:

  1. Verify peering state (Connected or Disconnected) on both sides in portal or via CLI
  2. Analyze NSG rules on the destination subnet in VNet-Core to identify inbound blocks
  3. Check if there was any address space change in either VNet since the last time connection worked
  4. Run az network nic show-effective-route-table on vm-finance's NIC to confirm if the route to VNet-Core is present and active
  5. Test connectivity with Test-NetConnection or ping from vm-finance to confirm the exact scope of failure

Which diagnostic sequence represents the most logical and efficient approach?

A) 5 → 1 → 3 → 4 → 2

B) 1 → 4 → 3 → 5 → 2

C) 3 → 1 → 5 → 4 → 2

D) 5 → 3 → 1 → 4 → 2


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue in the scenario is the asymmetric peering state: East-to-West is Disconnected while West-to-East remains Connected. This asymmetry is the expected behavior when a VNet's address space is modified after the peering was established. Azure invalidates the peering side that belongs to the network that did not undergo the change, because that side must be resynchronized to learn the new prefix. Manual resynchronization, through the portal (Sync button) or the CLI, restores the Connected state.

The information about vm-db restart is purposely irrelevant and represents a common trap: attributing network failure to an action that affects only the VM's operating system, not Azure's routing infrastructure.

The NSG update is a plausible distractor, but an NSG would explain blocked traffic over a Connected peering, not the Disconnected peering state itself. The most dangerous distractor is alternative C, because it pairs a correct observation (the asymmetry) with a wrong conclusion: no platform rule makes one side's state block traffic in both directions. Traffic fails because routing is broken until the peering is resynchronized.
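
The check-then-sync remediation described above could be scripted roughly as follows. This is a minimal sketch, not a definitive procedure: it assumes the Azure CLI is installed and logged in, and the resource group name rg-east is hypothetical because the scenario does not name one. The function names are illustrative.

```shell
# Sketch only: requires the Azure CLI and an authenticated session (az login).
# rg-east is a hypothetical resource group; the scenario does not give one.

# Print the peeringState of one peering (for example Connected or Disconnected).
peering_state() {
  # $1 = resource group, $2 = VNet name, $3 = peering name
  az network vnet peering show \
    --resource-group "$1" \
    --vnet-name "$2" \
    --name "$3" \
    --query peeringState \
    --output tsv
}

# Request a resynchronization unless the peering already reports Connected.
sync_if_not_connected() {
  state=$(peering_state "$1" "$2" "$3")
  if [ "$state" = "Connected" ]; then
    echo "$3: Connected, nothing to do"
  else
    echo "$3: $state, requesting sync"
    az network vnet peering sync \
      --resource-group "$1" \
      --vnet-name "$2" \
      --name "$3"
  fi
}
```

For the scenario, the call would be `sync_if_not_connected rg-east VNet-East East-to-West`, followed by a second `peering_state` check to confirm the result.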


Answer Key: Scenario 2

Answer: B

A peering in Initiated state means only one of the two sides was created. The technical solution is to create the complementary peering on the missing side, which moves both sides to Connected state. This requires no address space change, so it does not touch the restriction stated in the scenario.

The scenario is classified as a critical incident with explicit authorization for immediate action. That directly invalidates alternatives A and C, both of which propose waiting for the 10 PM window. Waiting would be the correct decision for planned maintenance, but not for an incident that already has authorization.

Alternative D is the most dangerous distractor: the Network Contributor role covers the Microsoft.Network/virtualNetworks/peer/action permission (along with virtualNetworkPeerings/write), which is what creating a peering requires. Escalating to an Owner would be unnecessary and would add delay to a critical incident without any technical benefit.
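
Creating the missing spoke-side peering could look roughly like the sketch below. It assumes the Azure CLI is installed and logged in, and that both VNets live in the same resource group so the remote VNet can be passed by name (otherwise its full resource ID would be needed). The resource group rg-network and the peering name Spoke3-to-Hub are hypothetical; the scenario does not name them.

```shell
# Sketch only: requires the Azure CLI and an authenticated session (az login).
# rg-network and Spoke3-to-Hub are hypothetical names used for illustration.

# Create the missing side of a half-built peering.
complete_peering() {
  # $1 = resource group, $2 = VNet missing its side,
  # $3 = remote VNet (name, assuming same resource group), $4 = new peering name
  az network vnet peering create \
    --resource-group "$1" \
    --vnet-name "$2" \
    --remote-vnet "$3" \
    --name "$4" \
    --allow-vnet-access
}

# List the peerings of a VNet with their current state, to confirm the result.
list_peering_states() {
  # $1 = resource group, $2 = VNet name
  az network vnet peering list \
    --resource-group "$1" \
    --vnet-name "$2" \
    --query "[].{name:name, state:peeringState}" \
    --output table
}
```

A possible invocation is `complete_peering rg-network VNet-Spoke3 VNet-Hub Spoke3-to-Hub`, then `list_peering_states` on both VNet-Hub and VNet-Spoke3 to verify that each side reports Connected.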


Answer Key: Scenario 3

Answer: B

The effective route table confirms that the VM in VNet-A (NIC nic-vm-spoke-a) has an active route to VNet-B's prefix with next hop VNetPeering. This means the control plane is correct: Azure knows VNet-B exists and that the path goes through peering. The problem is in the forwarding plane: Azure peering is not transitive. The packet reaches the hub, but the hub has no mechanism to forward it to the destination spoke. Without an NVA with IP forwarding enabled or a configured Azure Route Server, traffic between spokes is silently dropped at the hub.

The information about the absence of NSG blocks and UDRs is relevant because it eliminates competing hypotheses, but it also tempts the reader to conclude that "if there's no blocking, it should work." That reasoning is incorrect: the absence of blocking is not a substitute for an active forwarding path.

Alternative A represents a serious technical misconception: next hop VNetPeering is exactly the correct and expected value for routes injected by peering. Acting based on this distractor would lead the administrator to create unnecessary and potentially disruptive UDRs.
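
Once a forwarding device actually exists in the hub, a spoke-side route toward it becomes the legitimate use of a UDR. The sketch below is illustrative only and rests on several assumptions not stated in the scenario: an NVA with IP forwarding enabled already runs in VNet-Hub at the hypothetical private IP 10.1.0.4, the route table name rt-spoke-a and subnet name snet-app are made up, and the spoke peerings allow forwarded traffic.

```shell
# Sketch only: requires the Azure CLI and an authenticated session (az login).
# Assumptions: an NVA exists in the hub at a known private IP (10.1.0.4 is hypothetical),
# and rt-spoke-a / snet-app are illustrative names.

# Create a route table with one route that sends a remote spoke prefix to the NVA.
route_spoke_via_nva() {
  # $1 = resource group, $2 = route table name,
  # $3 = destination prefix (remote spoke), $4 = NVA private IP in the hub
  az network route-table create \
    --resource-group "$1" \
    --name "$2"
  az network route-table route create \
    --resource-group "$1" \
    --route-table-name "$2" \
    --name to-remote-spoke \
    --address-prefix "$3" \
    --next-hop-type VirtualAppliance \
    --next-hop-ip-address "$4"
}

# Associate the route table with a spoke subnet so its NICs pick up the route.
attach_route_table() {
  # $1 = resource group, $2 = VNet name, $3 = subnet name, $4 = route table name
  az network vnet subnet update \
    --resource-group "$1" \
    --vnet-name "$2" \
    --name "$3" \
    --route-table "$4"
}
```

For VNet-A this might be `route_spoke_via_nva rg-prod rt-spoke-a 10.2.0.0/16 10.1.0.4` followed by `attach_route_table rg-prod VNet-A snet-app rt-spoke-a`, with a mirror-image route in VNet-B for return traffic. Rerunning the effective route table command should then show the 10.2.0.0/16 prefix with next hop VirtualAppliance.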


Answer Key: Scenario 4

Answer: A

The correct sequence is 5 → 1 → 3 → 4 → 2.

Diagnostic reasoning should start from the concrete symptom and progress from most superficial to most granular:

  • Step 5 confirms the real scope of failure before any infrastructure investigation. Without this, the administrator might investigate a wrong hypothesis.
  • Step 1 checks if peering infrastructure is intact. A Disconnected peering ends routing investigation and points directly to the cause.
  • Step 3 investigates if there was address space change, which is the most common cause of a healthy peering suddenly entering Disconnected.
  • Step 4 examines effective routes to confirm if the forwarding plane is correct, even with Connected peering.
  • Step 2 analyzes NSGs only after confirming that routing and peering are correct, as NSG is a security control that operates on traffic that would already reach the destination.

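The distractors fail earlier in the sequence. Alternatives B and C begin infrastructure investigation before step 5 has confirmed the symptom, and alternative D looks for address space changes before checking the peering state that such a change would affect. All four alternatives place NSG analysis last, and for good reason: jumping to NSG rules first is the most common diagnostic error in practice, because the administrator starts from the most visible block without first verifying that the network path even exists.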
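
The CLI side of this sequence (steps 1, 3, 4, and 2) can be sketched as a single function. Step 5 is left out because Test-NetConnection or ping runs inside vm-finance itself. The sketch assumes the Azure CLI is installed and logged in, and that all resources share one resource group; rg-finance, nic-vm-finance, and nsg-core-files are hypothetical names, since the scenario names only the VNets and the VM. Step 3 here shows only the current address prefixes; comparing them against the last known-good values is left to the administrator.

```shell
# Sketch only: requires the Azure CLI and an authenticated session (az login).
# Run after step 5 (Test-NetConnection or ping from inside the VM) confirms the failure.

diagnose_peering() {
  # $1 = resource group, $2 = source VNet, $3 = destination VNet,
  # $4 = source VM NIC name, $5 = NSG on the destination subnet
  rg=$1; local_vnet=$2; remote_vnet=$3; nic=$4; dest_nsg=$5

  echo "Step 1: peering state on both sides"
  for vnet in "$local_vnet" "$remote_vnet"; do
    az network vnet peering list \
      --resource-group "$rg" \
      --vnet-name "$vnet" \
      --query "[].{name:name, state:peeringState}" \
      --output table
  done

  echo "Step 3: current address space of both VNets"
  for vnet in "$local_vnet" "$remote_vnet"; do
    az network vnet show \
      --resource-group "$rg" \
      --name "$vnet" \
      --query addressSpace.addressPrefixes \
      --output tsv
  done

  echo "Step 4: effective routes on the source NIC"
  az network nic show-effective-route-table \
    --resource-group "$rg" \
    --name "$nic" \
    --output table

  echo "Step 2: NSG rules on the destination subnet"
  az network nsg rule list \
    --resource-group "$rg" \
    --nsg-name "$dest_nsg" \
    --output table
}
```

An example call is `diagnose_peering rg-finance VNet-Finance VNet-Core nic-vm-finance nsg-core-files`. In practice the administrator would stop at the first step that reveals the cause rather than run all of them.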


Troubleshooting Tree: Create and configure virtual network peering

Legend:

  • Dark blue: initial symptom or entry point
  • Blue: diagnostic question with verifiable answer
  • Red: identified cause
  • Green: recommended action or resolution

To use this tree when facing a real problem, start from the root node, which describes the missing-connectivity symptom. Answer each diagnostic question based on what you can directly observe in the portal or via CLI, without assuming the cause. Follow the path indicated by your answer until you reach a red cause node, then apply the corresponding green action. If the action doesn't resolve the problem, return to the last question node and reevaluate your answer.