Troubleshooting Lab: Recommend a route advertisement configuration
Diagnostic Scenarios
Scenario 1: Root Cause
A network team reports that VMs in an Azure VNet cannot reach servers at 172.16.50.0/24 on the on-premises network. The connection uses ExpressRoute with private peering. The circuit is in Provisioned state and private peering appears as Connected in the Azure portal.
The engineer runs the following command to check routes learned by the gateway:
Get-AzVirtualNetworkGatewayLearnedRoute `
-VirtualNetworkGatewayName gw-expressroute-prod `
-ResourceGroupName rg-network-prod | Format-Table
LocalAddress Network NextHop SourceType Origin AsPath
------------ ------- ------- ---------- ------ ------
10.0.0.1 10.0.0.0/16 - Network - -
10.0.0.1 192.168.0.0/24 10.0.0.254 EBgp Ibgp 65100
10.0.0.1 172.16.10.0/24 10.0.0.254 EBgp Ibgp 65100
10.0.0.1 172.16.20.0/24 10.0.0.254 EBgp Ibgp 65100
The engineer also verifies that the circuit has contracted bandwidth of 500 Mbps and current utilization is at 12%. The MTU configured on the on-premises side is 1500 bytes.
What is the root cause of the problem?
A) The ExpressRoute private peering is degraded and not propagating all routes correctly
B) The on-premises router is not advertising the 172.16.50.0/24 prefix via BGP to the ExpressRoute circuit
C) The Azure VNet does not have a User Defined Route (UDR) pointing 172.16.50.0/24 to the gateway
D) The low circuit utilization indicates a provisioning issue that limits the number of accepted prefixes
Scenario 2: Action Decision
The cause of the problem has been identified: Branch-to-Branch is disabled on the Azure Virtual WAN hub, preventing two branches connected via S2S VPN from communicating directly. Enabling this configuration resolves the problem immediately without needing a maintenance window according to internal documentation.
The current context is:
- Production environment with 99.9% SLA
- 15 other branches are operating normally and depend on the same hub
- The change affects the hub's route tables globally
- The security team has not yet been consulted about the impact of direct traffic between branches
- The incident ticket has been open for 40 minutes
What is the correct action to take at this moment?
A) Enable Branch-to-Branch immediately, since the cause is confirmed and the change doesn't require a maintenance window
B) Consult the security team about the impact before enabling Branch-to-Branch, given that the change affects the traffic model of all branches
C) Create a temporary static route on the vWAN hub pointing traffic between the two problematic branches, avoiding impact on the others
D) Reopen the ticket as a VPN configuration problem and escalate to the on-premises device vendor
Scenario 3: Root Cause
An organization has a VNet with an Azure Route Server and a Network Virtual Appliance (NVA) that BGP peers with the Route Server. The NVA learns on-premises routes via BGP and propagates them to the Route Server. There is also a spoke VNet connected via peering to the Route Server VNet.
The team reports that VMs in the spoke VNet cannot reach the on-premises network 10.200.0.0/16, although VMs in the same VNet as the Route Server can without problems.
The peering configuration between the two VNets is:
| Configuration | VNet Route Server (hub) | VNet Spoke |
|---|---|---|
| Allow Gateway Transit | Enabled | N/A |
| Use Remote Gateways | N/A | Disabled |
| Allow Forwarded Traffic | Enabled | Enabled |
| Allow Virtual Network Access | Enabled | Enabled |
The engineer verifies that the effective routes for VMs in the spoke VNet show only local routes and the default route 0.0.0.0/0. The Route Server in the hub VNet correctly shows the 10.200.0.0/16 route learned from the NVA.
The Route Server SKU is Standard and was deployed three weeks ago without changes.
What is the root cause of the problem?
A) Allow Forwarded Traffic is not enabled on the spoke VNet, preventing packets from the NVA from traversing the peering
B) Use Remote Gateways is disabled on the spoke VNet, preventing it from learning routes propagated by the Route Server via peering
C) The Route Server does not propagate routes to spoke VNets via peering without Branch-to-Branch being enabled
D) The Route Server Standard SKU has a prefix limit that was exceeded, causing partial route propagation
Scenario 4: Diagnostic Sequence
An engineer receives the following report: VMs in an Azure VNet stopped reaching the on-premises network after a maintenance change performed overnight. Before the maintenance, everything worked correctly. The connection uses a VPN Gateway with BGP enabled.
The available investigation steps are:
1. Check if the VPN tunnel is in Connected state in the Azure portal
2. Examine effective routes on NICs of affected VMs
3. Confirm the ASN and BGP peer address configured in the Local Network Gateway
4. Check routes advertised and learned by the VPN Gateway via Get-AzVirtualNetworkGatewayLearnedRoute
5. Test connectivity with ping or Test-NetConnection from a VM to an on-premises IP
What is the correct diagnostic sequence?
A) 5 → 1 → 4 → 2 → 3
B) 1 → 5 → 2 → 4 → 3
C) 1 → 4 → 3 → 2 → 5
D) 2 → 4 → 1 → 3 → 5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The output of the Get-AzVirtualNetworkGatewayLearnedRoute command is the central diagnostic clue. The gateway learned the prefixes 192.168.0.0/24, 172.16.10.0/24, and 172.16.20.0/24 via BGP, but the 172.16.50.0/24 prefix is absent. This indicates that the problem is not in Azure: the gateway is functioning, BGP peering is active (the learned routes are listed with source type EBgp), and other routes are being received normally. The cause therefore lies on the on-premises side, where the router is simply not advertising this specific prefix.
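The same check can be expressed as a small lookup. The sketch below is illustrative only: it copies the prefixes from the Scenario 1 output into a list and uses a hypothetical helper, `covering_prefixes`, to ask whether any learned prefix contains the destination network. It does not call Azure.

```python
import ipaddress

# Prefixes copied from the learned-route output shown in Scenario 1.
learned_prefixes = [
    "10.0.0.0/16",
    "192.168.0.0/24",
    "172.16.10.0/24",
    "172.16.20.0/24",
]


def covering_prefixes(destination, prefixes):
    """Return the learned prefixes that contain the destination network.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    target = ipaddress.ip_network(destination)
    return [p for p in prefixes if target.subnet_of(ipaddress.ip_network(p))]


# An empty list means the gateway has no route toward the destination.
missing = covering_prefixes("172.16.50.0/24", learned_prefixes)
present = covering_prefixes("172.16.10.0/24", learned_prefixes)
print("172.16.50.0/24 covered by:", missing)
print("172.16.10.0/24 covered by:", present)
```

An empty result for 172.16.50.0/24 alongside a match for 172.16.10.0/24 mirrors the reasoning above: the session delivers routes, but this one prefix never arrives.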
The information about circuit utilization (12%) and MTU (1500 bytes) is deliberately irrelevant. Neither factor affects which prefixes are advertised via BGP.
Alternative (A) is incorrect: peering appears as Connected and other routes are being learned, which rules out degradation. Alternative (C) would only be valid if the route existed in the gateway table but not on the VMs; since the route doesn't exist even in the gateway, the problem lies further upstream. Alternative (D) describes behavior that ExpressRoute does not have: circuit utilization has no bearing on how many prefixes are accepted.
The most dangerous distractor is (C): an engineer could create an unnecessary UDR pointing to an invalid next-hop, creating a routing black hole instead of solving the real problem.
Answer Key: Scenario 2
Answer: B
The cause is confirmed and the technical solution is clear, but the scenario introduces a critical constraint that cannot be ignored: the change affects the traffic model of all 15 branches and the security team has not yet been consulted. Enabling Branch-to-Branch without this validation could open communication paths between branches that violate network segmentation policies defined by the organization.
Alternative (A) represents the classic mistake of prioritizing technical speed over governance. The fact that no maintenance window is required doesn't eliminate the need for security approval for a broad-scope change. Alternative (C) is technically unfeasible: managed vWAN doesn't allow static routes between branches as described. Alternative (D) is incorrect because the cause has already been identified and is in Azure, not in the vendor device.
Answer Key: Scenario 3
Answer: B
The peering configuration table is the decisive clue. Use Remote Gateways is Disabled on the spoke VNet. Without this setting, the spoke VNet does not use the gateway or Route Server in the peered VNet to learn external routes. The result is exactly what was observed: VMs in the spoke see only local routes, while VMs in the hub VNet, where the Route Server is deployed, see the routes propagated by the NVA.
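As a rough illustration of this rule, the sketch below models only the two peering settings that matter for route learning. The `Peering` class and `spoke_learns_remote_routes` function are hypothetical names, and the model is deliberately simplified: it assumes both flags must be on and ignores every other peering option.

```python
from dataclasses import dataclass


@dataclass
class Peering:
    """Simplified model of a hub-spoke peering (hypothetical, for illustration)."""
    allow_gateway_transit: bool  # configured on the hub side
    use_remote_gateways: bool    # configured on the spoke side


def spoke_learns_remote_routes(peering: Peering) -> bool:
    """Assume the spoke learns hub-side routes only when both flags are enabled."""
    return peering.allow_gateway_transit and peering.use_remote_gateways


# Values taken from the peering table in Scenario 3.
scenario_3 = Peering(allow_gateway_transit=True, use_remote_gateways=False)
print(spoke_learns_remote_routes(scenario_3))

# The same peering after enabling Use Remote Gateways on the spoke.
fixed = Peering(allow_gateway_transit=True, use_remote_gateways=True)
print(spoke_learns_remote_routes(fixed))
```

Allow Forwarded Traffic is left out of the model on purpose, since it governs packet forwarding across the peering rather than route learning.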
The Route Server SKU and time since deployment are irrelevant information planted in the statement. Route Server Standard doesn't have a practical prefix limit that would cause this specific symptom.
Alternative (A) is incorrect because Allow Forwarded Traffic controls whether packets that originated outside the VNet can traverse the peering, not whether BGP routes are propagated. Alternative (C) confuses the Branch-to-Branch function: this feature controls route exchange between gateways (VPN/ExpressRoute) and the NVA, not propagation to spoke VNets. Alternative (D) is a distractor based on route volume, with no technical basis for the presented symptom.
Answer Key: Scenario 4
Answer: A
The correct sequence is 5 → 1 → 4 → 2 → 3, following the diagnostic logic from general to specific, starting with the observable symptom and advancing toward the cause.
Step 5 (test connectivity) confirms the symptom and delimits the scope: is it a routing or tunnel problem? Step 1 (check tunnel state) answers whether the transport layer is active. If the tunnel is Connected, the problem is routing, and step 4 (check learned routes) reveals whether BGP is propagating the expected prefixes. Step 2 (effective routes on VMs) confirms whether the routes reached the instances. Only then does it make sense to go to step 3 (check ASN and BGP peer in Local Network Gateway), as this would be the cause of a BGP session that doesn't establish, which would be visible earlier in the previous steps.
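The ordering can be sketched as a simple walk over recorded observations. This is a minimal illustration, not a diagnostic tool: the observation keys and the `diagnose` function are hypothetical, and the values would come from the portal, PowerShell, or CLI checks described in the steps.

```python
# (step number from the Scenario 4 list, hypothetical observation key),
# arranged in the order given by answer A.
ORDER = [
    (5, "vm_reaches_onprem"),
    (1, "tunnel_connected"),
    (4, "prefix_learned_by_gateway"),
    (2, "prefix_in_effective_routes"),
    (3, "lng_asn_and_peer_correct"),
]


def diagnose(observations):
    """Record each finding in order and report the first broken layer.

    The first entry (step 5) only confirms the symptom, so the search for
    the broken layer starts from the second entry.
    """
    findings = [(step, observations[key]) for step, key in ORDER]
    first_broken = next((step for step, ok in findings[1:] if not ok), None)
    return findings, first_broken


# Example: the tunnel is up, but the gateway never learned the prefix.
example = {
    "vm_reaches_onprem": False,
    "tunnel_connected": True,
    "prefix_learned_by_gateway": False,
    "prefix_in_effective_routes": False,
    "lng_asn_and_peer_correct": False,
}
findings, first_broken = diagnose(example)
print("first broken layer is step", first_broken)
```

In this example the first broken layer is step 4, which points at BGP rather than the tunnel and makes the Local Network Gateway settings in step 3 the natural place to look for the cause.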
Sequence (B) checks the tunnel before reproducing the symptom and examines effective routes on the VMs before the gateway's learned routes, so it looks at the impact before the source of the routes. Sequence (C) never reproduces the symptom up front: it leaves the connectivity test for last and checks the Local Network Gateway settings before confirming whether routes reached the VMs. Sequence (D) starts with effective routes on VMs, which verifies impact rather than cause and does not guide the diagnosis efficiently.
Troubleshooting Tree: Recommend a route advertisement configuration
Color Legend:
| Color | Node Type |
|---|---|
| Dark Blue | Initial symptom (entry point) |
| Medium Blue | Diagnostic question (decision) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate verification or validation |
To use this tree when facing a real problem, start with the root node (dark blue) that describes the observed symptom. At each question node (medium blue), answer based on what you can verify directly in the portal, via PowerShell, or via CLI. Follow the branch corresponding to the observed response. When you reach a red node, the cause is identified; the green node directly connected to it indicates the action to take. Never skip intermediate questions: each layer eliminates a class of hypotheses and prevents corrective actions from being applied in the wrong place.
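The traversal rule can be sketched in a few lines. The tree below is a hypothetical fragment assembled from Scenarios 1 and 3, not a reproduction of the full diagram; node names, wording, and the `traverse` function are assumptions made for illustration.

```python
# Node types follow the color legend: symptom, question, cause, action, verification.
TREE = {
    "start": {"type": "symptom", "text": "VMs cannot reach an on-premises prefix",
              "next": "q_learned"},
    "q_learned": {"type": "question", "text": "Is the prefix in the gateway's learned routes?",
                  "yes": "q_effective", "no": "cause_not_advertised"},
    "q_effective": {"type": "question", "text": "Is the prefix in the VM's effective routes?",
                    "yes": "verify_further", "no": "cause_not_propagated"},
    "cause_not_advertised": {"type": "cause", "text": "On-premises router does not advertise the prefix",
                             "next": "fix_advertise"},
    "fix_advertise": {"type": "action", "text": "Advertise the prefix via BGP from on-premises"},
    "cause_not_propagated": {"type": "cause", "text": "Use Remote Gateways is disabled on the spoke",
                             "next": "fix_peering"},
    "fix_peering": {"type": "action", "text": "Enable Use Remote Gateways on the spoke peering"},
    "verify_further": {"type": "verification", "text": "Routing looks correct; verify other layers"},
}


def traverse(tree, answers):
    """Follow the tree from the root, answering each question node in turn."""
    path = []
    node_id = "start"
    while node_id is not None:
        node = tree[node_id]
        path.append(node_id)
        if node["type"] == "question":
            # Every question on the path must be answered; none is skipped.
            node_id = node["yes"] if answers[node_id] else node["no"]
        else:
            node_id = node.get("next")
    return path


# Scenario 1: the prefix is missing from the gateway's learned routes.
for node_id in traverse(TREE, {"q_learned": False}):
    print(TREE[node_id]["type"], "-", TREE[node_id]["text"])
```

Because `traverse` raises a `KeyError` when a question on the path has no recorded answer, the sketch enforces the rule that intermediate questions are never skipped.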