Troubleshooting Lab: Design a Virtual WAN architecture, including selecting types and services

Diagnostic Scenarios

Scenario 1 — Root Cause

A network team receives a ticket reporting that two branch offices, connected via S2S VPN to the same Virtual WAN hub in the East US region, can communicate with the spoke VNets connected to that hub but cannot reach each other. Latency to the spoke VNets is as expected, with no packet loss. The hub was provisioned three weeks ago and shows Succeeded status in the Azure portal. There have been no configuration changes in the last 72 hours. The team verified that the branch VPN devices are operational and that the IPsec tunnels appear as Connected in the portal.

Branch A (S2S VPN) --> Hub East US --> VNet Spoke 1  [OK]
Branch B (S2S VPN) --> Hub East US --> VNet Spoke 2  [OK]
Branch A           --> Branch B                      [FAILURE]

What is the root cause of the communication failure between Branch A and Branch B?

A) The branch IPsec tunnels are in a degraded state, preventing transitive routing even though the portal displays Connected status.

B) The hub was provisioned as Virtual WAN Basic, which does not support transitive routing between branches.

C) The branches are using overlapping address prefixes, causing the hub to silently drop traffic between them.

D) The hub's default route table has propagation disabled for the branch S2S VPN connections.

Scenario 2 — Action Decision

The cause has already been identified: a production Virtual WAN hub was created as Basic and needs to be converted to Standard to enable transitive routing required by a new project. The hub currently has 12 active S2S VPN connections with branches in operation. The tier conversion is supported by Azure. The architecture team has a maintenance window available next weekend, but the project manager requests that the change be made immediately since the new project is already delayed. There is no staging environment with Virtual WAN configured.

What is the correct action to take at this moment?

A) Execute the conversion immediately via Azure portal, as the operation is non-disruptive and existing S2S VPN connections are automatically preserved.

B) Create a new Standard hub in parallel, migrate connections manually, and deactivate the Basic hub after validation.

C) Wait for the maintenance window to execute the conversion, communicating the risk to the project manager and documenting the potential impact on the 12 active connections.

D) Execute the conversion immediately using Azure CLI to reduce operation time, without needing a maintenance window.

Scenario 3 — Root Cause

An organization operates a Virtual WAN Standard with two hubs: Hub-WestEurope and Hub-EastUS. Spoke VNets are connected to both hubs. The security team recently enabled Azure Firewall in Hub-WestEurope, converting it to a Secured Virtual Hub. After this change, application teams report that VMs in spoke VNets connected to Hub-WestEurope lost internet access. East-west traffic between VNets in the same hub continues working. Hub-EastUS was not changed and its spoke VNets maintain internet access normally. The security team confirms that no explicit deny rules were created in Azure Firewall.

Hub-WestEurope (Secured Virtual Hub)
  ├── VNet Spoke WE-1  --> Internet  [FAILURE]
  ├── VNet Spoke WE-2  --> Internet  [FAILURE]
  └── Azure Firewall   --> Status: Running

Hub-EastUS (Standard hub)
  └── VNet Spoke US-1  --> Internet  [OK]

What is the root cause of internet access loss in Hub-WestEurope spoke VNets?

A) Azure Firewall running in the hub is consuming all available throughput capacity, blocking outbound traffic from spoke VNets.

B) Converting the hub to a Secured Virtual Hub enabled internet traffic routing intent, which redirects outbound traffic to Azure Firewall, and the firewall does not yet have any allow rules for internet-bound traffic.

C) Converting to Secured Virtual Hub automatically removes managed peering between spoke VNets and the hub, requiring manual VNet reconnection.

D) Azure Firewall blocks all north-south traffic by default until a firewall policy with explicit DNAT rules is associated to the hub.

Scenario 4 — Diagnostic Sequence

A VM in a spoke VNet connected to Hub-BrazilSouth cannot reach a VM in a spoke VNet connected to Hub-EastUS, both belonging to the same Virtual WAN Standard. Traffic within the same hub works normally. A network engineer receives the ticket and needs to investigate the problem.

The available investigation steps are:

Step P: Verify that both spoke VNets are effectively connected to their respective hubs with Connected status.
Step Q: Confirm that the Virtual WAN resource is Standard type and not Basic.
Step R: Inspect the source VM NIC effective routes to verify if a route exists for the destination prefix.
Step S: Use Network Watcher Connection Troubleshoot between the two VMs to validate end-to-end connectivity and identify where traffic is being dropped.
Step T: Check if there are custom route tables applied to the hubs that might be blocking inter-hub route propagation.

What is the correct investigation sequence?

A) Q -> P -> R -> T -> S

B) S -> R -> P -> Q -> T

C) R -> S -> Q -> P -> T

D) P -> Q -> T -> R -> S

Answer Key and Explanations

Answer Key — Scenario 1

Answer: B

The definitive clue in the scenario is the observed behavior: the branches reach the spoke VNets but cannot reach each other. This pattern is the exact signature of the Virtual WAN Basic limitation: Basic supports S2S VPN connectivity from branch to VNet, but does not enable transitive routing between branches. The Basic tier lacks the control plane needed to propagate and resolve routes between branch connections.

The hub's Succeeded status and the tunnels' Connected state are accurate, but they do not point to the cause. They rule out infrastructure failures without contradicting the tier limitation. These details are included deliberately as distractors.

Option A represents the classic error of confusing symptom with cause: Connected status in the portal confirms that the tunnel is established, not that transitive routing is available. Option D is technically plausible, but in a Basic hub the default route table is not blocking propagation; the tier simply offers no transitive capability to propagate into. Acting on distractor D would lead to hours of routing investigation on a problem that is resolved only by tier conversion.
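The reasoning above can be sketched as a small symptom classifier. This is only an illustrative model, not an Azure API: the `Observation` fields, the `likely_cause` function, and the returned labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical summary of what a ticket reports (not an Azure object)."""
    wan_type: str              # "Basic" or "Standard"
    tunnels_connected: bool    # IPsec tunnels show Connected
    branch_to_vnet_ok: bool    # branches reach spoke VNets
    branch_to_branch_ok: bool  # branches reach each other


def likely_cause(obs: Observation) -> str:
    """Map a symptom pattern to the class of cause worth investigating first."""
    if not obs.tunnels_connected:
        return "tunnel failure"
    if obs.branch_to_vnet_ok and not obs.branch_to_branch_ok:
        # Branch-to-VNet works, branch-to-branch fails: check the tier first.
        if obs.wan_type == "Basic":
            return "tier limitation"
        return "routing configuration"
    if obs.branch_to_vnet_ok and obs.branch_to_branch_ok:
        return "no fault"
    return "undetermined"


scenario_1 = Observation("Basic", True, True, False)
print(likely_cause(scenario_1))  # tier limitation
```

With the same symptoms on a Standard Virtual WAN, the sketch returns "routing configuration", which is the situation where investigating option D would be reasonable.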

Answer Key — Scenario 2

Answer: C

The critical constraint in the scenario is the existence of 12 active S2S VPN connections in production and the absence of a staging environment. While Basic to Standard conversion is supported and technically possible at any time, the risk of impact on active connections without prior validation in an equivalent environment justifies waiting for the available maintenance window. The correct decision prioritizes production environment stability over project timeline pressure.

Option A presents the operation as non-disruptive, which is a dangerous oversimplification: Microsoft documentation describes the conversion as supported, but does not guarantee absence of impact in all scenarios, especially with multiple active connections. Option B would be valid in a planned migration with available resources, but creating a hub and manually migrating 12 connections without staging introduces greater risk than waiting for the weekend. Option D applies the right action at the wrong time: changing the execution method does not remove the risk of making the change outside a maintenance window.

Answer Key — Scenario 3

Answer: B

The root cause is the default behavior of routing intent in Secured Virtual Hubs. When internet traffic routing intent is enabled, the hub starts advertising the 0.0.0.0/0 prefix, with Azure Firewall as next hop, to all connected spoke VNets. Outbound traffic from these VNets is then redirected to the firewall before exiting to the internet. Since no application rule or network rule allowing this traffic was created in Azure Firewall, the firewall's default deny-all behavior drops the outbound traffic.
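The mechanism can be illustrated with a minimal sketch: a longest-prefix-match route lookup followed by a default-deny rule check. Everything here is an assumption made for illustration, including the prefixes, the next-hop labels ("AzureFirewall", "Internet", "VNetLocal"), and the function names. Real Azure Firewall rules also match on protocol, port, and FQDN, which this model ignores.

```python
import ipaddress


def next_hop(routes, destination):
    """Return the next hop of the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else None


def firewall_allows(allow_rules, destination):
    """Default deny: traffic passes only if some allow rule covers it."""
    dest = ipaddress.ip_address(destination)
    return any(dest in ipaddress.ip_network(p) for p in allow_rules)


def outbound_result(routes, allow_rules, destination):
    hop = next_hop(routes, destination)
    if hop is None:
        return "no route"
    if hop == "AzureFirewall":
        if firewall_allows(allow_rules, destination):
            return "allowed"
        return "dropped by firewall"
    return "allowed"


# Hypothetical effective routes for a spoke in each hub.
we_routes = [("10.1.0.0/16", "VNetLocal"), ("0.0.0.0/0", "AzureFirewall")]
us_routes = [("10.2.0.0/16", "VNetLocal"), ("0.0.0.0/0", "Internet")]

print(outbound_result(we_routes, [], "203.0.113.10"))             # dropped by firewall
print(outbound_result(us_routes, [], "203.0.113.10"))             # allowed
print(outbound_result(we_routes, ["0.0.0.0/0"], "203.0.113.10"))  # allowed
```

The third call shows the resolution in this simplified model: once an allow rule covers the destination, the same 0.0.0.0/0 route through the firewall no longer results in a drop.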

The fact that Hub-EastUS was not changed and still has working internet access is relevant because it confirms the scope of the problem, but it could lead the reader to focus on infrastructure differences between the hubs instead of on routing intent behavior.

Option A is incorrect because Azure Firewall does not silently saturate throughput under normal operating conditions. Option C is false: conversion to Secured Virtual Hub does not remove managed peerings. Option D confuses DNAT behavior with network and application rule behavior; internet outbound traffic does not require DNAT. The most dangerous distractor is C, as it would lead the engineer to unnecessarily try to reconnect VNets, postponing the real solution.

Answer Key — Scenario 4

Answer: A

The correct sequence is Q -> P -> R -> T -> S, which follows the logic of progressive elimination from simplest to most complex, from control plane to data plane.

Q validates the fundamental assumption: if the Virtual WAN is Basic, all subsequent investigation is irrelevant, because the problem is tier-related. P confirms that the spoke VNets are connected with a valid status, eliminating basic connectivity failures. R inspects the effective routes on the source VM NIC: if no route to the destination prefix exists, the problem is in the control plane, which leads to T. T checks for custom route tables that might be blocking inter-hub propagation. Only after the entire routing chain has been validated does it make sense to execute S, which consumes more time and resources and is a useful diagnostic only when the control plane is correct.
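The sequence can be sketched as an ordered investigation that stops as soon as a step explains the failure. This is a hypothetical model: the `state` dictionary keys and the conclusion strings are invented for illustration, and in practice each value would come from the portal, the CLI, or Network Watcher.

```python
def investigate(state: dict) -> tuple[list[str], str]:
    """Run steps Q -> P -> R -> T -> S, returning (steps executed, conclusion)."""
    steps = ["Q"]
    if state["wan_type"] != "Standard":
        return steps, "tier limitation"

    steps.append("P")
    if not all(state["vnet_connected"].values()):
        return steps, "VNet connection not established"

    steps.append("R")
    route_present = state["route_present"]

    # T follows R: a missing route points at the control plane.
    steps.append("T")
    if state["custom_table_blocks"]:
        return steps, "custom route table blocks inter-hub propagation"
    if not route_present:
        return steps, "route missing for another control-plane reason"

    # Only reached once the control plane looks correct.
    steps.append("S")
    return steps, "control plane looks correct; run data-plane test"


ticket = {
    "wan_type": "Standard",
    "vnet_connected": {"spoke-brazilsouth": True, "spoke-eastus": True},
    "route_present": False,
    "custom_table_blocks": True,
}
print(investigate(ticket))
```

For this example ticket, the investigation ends at T and never runs S, which matches the idea that the data-plane test is worth its cost only after the control plane has been cleared.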

Sequence B starts at the end, using Connection Troubleshoot before understanding the control plane, which wastes time and may return inconclusive results. Sequence C starts with route inspection without first confirming tier and VNet connectivity, which can lead to erroneous conclusions. Sequence D reverses P and Q, investigating VNet connectivity before confirming that the tier supports the intended traffic.

Troubleshooting Tree: Design a Virtual WAN architecture, including selecting types and services

Legend:

Color	Node type
Dark blue	Initial symptom (entry point)
Blue	Diagnostic question
Red	Identified cause
Green	Recommended action or resolution
Orange	Intermediate validation or verification

To use this tree when facing a real problem, start at the root node, which identifies the connectivity failure symptom in Virtual WAN. The first question always validates the tier, because it confirms or eliminates the largest class of limitations at once. Follow the branches by answering each question objectively, based on what is observable in the portal or via CLI. Each path ends in a named cause or a concrete action. If the path reaches the orange validation node, run Network Watcher Connection Troubleshoot before advancing to data plane hypotheses.
