
Troubleshooting Lab: Configure routing rules

Diagnostic Scenarios

Scenario 1: Root Cause

A hub VNet contains an NVA responsible for inspecting all traffic between spoke VNets and on-premises networks. The topology uses VNet Peering with Allow Gateway Transit on the hub and Use Remote Gateways on the spokes. The hub's VPN gateway is operational and BGP sessions with on-premises are established.

The team reports that VMs in Spoke A reach on-premises resources normally. VMs in Spoke B, however, reach resources within their own VNet and within the hub, but cannot reach any on-premises prefixes. Spoke B's peering was created three days ago, after a workload migration.

The effective route table of a VM in Spoke B shows:

Source    State     Address Prefix    Next Hop Type    Next Hop IP
Default   Active    10.0.0.0/8        VNetPeering      -
Default   Active    0.0.0.0/0         Internet         -

On-premises routes (172.16.0.0/12) do not appear in the effective table.

The team has already confirmed that the VPN gateway is learning the on-premises prefixes via BGP and that Spoke A shows these routes in its effective route tables.

What is the root cause of the problem in Spoke B?

A) Spoke B's peering does not have the Use Remote Gateways option enabled

B) The hub's VPN gateway has the BGP session for Spoke B suspended due to excessive advertised prefixes

C) The Route Table associated with Spoke B's subnet has BGP route propagation disabled

D) Spoke B does not have a UDR with next hop VirtualNetworkGateway for on-premises prefixes


Scenario 2: Action Decision

The operations team has identified that a Route Table configuration in production is causing a routing loop between two NVAs. Traffic destined for the 192.168.10.0/24 segment reaches NVA-1, whose subnet has a UDR pointing to NVA-2. NVA-2's subnet in turn has a UDR pointing back to NVA-1. VMs that depend on this segment have no connectivity at all.

The cause is confirmed: the UDR in NVA-2's subnet incorrectly points to NVA-1's IP as the next hop for 192.168.10.0/24, when it should point to the perimeter firewall (10.0.1.100).

The environment runs in production 24/7, and there is no scheduled maintenance window. The UDR correction does not require a resource restart and takes effect immediately after it is saved. The change team requires incident registration before any change to production Route Tables, but the operations manager is available to grant emergency authorization.

What is the correct action to take at this time?

A) Immediately remove the Route Table from NVA-2's subnet to eliminate the loop and then create the correct UDR

B) Request emergency authorization from the operations manager, register the incident, and correct the next hop of the UDR in NVA-2's subnet

C) Create a new Route Table with the correct configuration and associate it with NVA-2's subnet without modifying the existing one, to preserve the current state as rollback

D) Wait for a formal maintenance window, since changing production Route Tables without the complete change process may cause greater instability


Scenario 3: Root Cause

A team configured a workload subnet with the following Route Table to force Internet-bound traffic through an NVA:

Route Name     Prefix       Next Hop Type       Next Hop IP
ForceTunnel    0.0.0.0/0    VirtualAppliance    10.1.0.4

After associating the Route Table, VMs in the subnet lost Internet access. The NVA is operational, IP forwarding is enabled on the NVA's NIC, and the NVA can ping public addresses from its own interface. The subnet's NSG allows outbound to any destination on port 443.

An engineer inspects the NVA's subnet and observes that it also has an associated Route Table, with the following entry:

Route Name     Prefix       Next Hop Type       Next Hop IP
ToInternet     0.0.0.0/0    VirtualAppliance    10.1.0.4

VM outbound traffic reaches the NVA as confirmed by NVA logs. Internal connectivity between VMs continues to work normally.

What is the root cause of the Internet access failure?

A) The subnet's NSG is blocking return traffic from the Internet to VMs, as there is no inbound rule allowing responses

B) The Route Table associated with the NVA's subnet creates a loop: traffic leaving the NVA to the Internet is redirected back to the NVA itself

C) The NVA does not have an associated public IP address, preventing outbound NAT traffic from working correctly

D) The 0.0.0.0/0 route of type VirtualAppliance is not applied to Internet-bound traffic, only to inter-VNet traffic


Scenario 4: Diagnostic Sequence

A VM in a spoke subnet cannot reach an on-premises server (172.20.5.10). The VNet uses peering with a hub that contains an ExpressRoute gateway. No recent changes have been reported by the team.

Available investigation steps are:

P1: Verify whether the 172.20.0.0/16 route appears in the effective route table of the VM's NIC

P2: Verify BGP route state for the ExpressRoute connection, for example with az network express-route list-route-tables (which shows the current routing table of an ExpressRoute circuit peering)

P3: Confirm whether the Route Table of the VM's subnet has BGP route propagation enabled

P4: Test TCP connectivity from the VM to 172.20.5.10:443 using Test-NetConnection

P5: Verify whether the spoke's peering with the hub has Use Remote Gateways enabled

What is the correct diagnostic sequence?

A) P4, P1, P3, P5, P2

B) P1, P3, P5, P2, P4

C) P2, P5, P3, P1, P4

D) P3, P5, P1, P2, P4


Answer Key and Explanations

Answer Key: Scenario 1

Answer: A

The key clue is in the effective route table of the Spoke B VM. The peering route (10.0.0.0/8, VNetPeering) is present, but there are no routes learned through a virtual network gateway. As a result, on-premises destinations fall through to the default 0.0.0.0/0 route with next hop Internet. This pattern indicates that the peering exists but that Spoke B is not using the hub's gateway as a remote gateway.

When Use Remote Gateways is not enabled on Spoke B's peering, Azure does not propagate routes learned by the hub's gateway to this spoke's effective tables. Spoke A works because its peering was configured correctly before the migration.
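The route composition described above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not Azure's implementation. The prefixes come from the scenario, and the function names are hypothetical. Only the two peering flags are modeled; other factors, such as the subnet's Route Table settings, are left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    source: str         # "Default" (system route) or "VirtualNetworkGateway"
    prefix: str
    next_hop_type: str


# System routes observed on the Spoke B VM in the scenario.
SYSTEM_ROUTES = [
    Route("Default", "10.0.0.0/8", "VNetPeering"),
    Route("Default", "0.0.0.0/0", "Internet"),
]
# Prefix the hub gateway has learned from on-premises over BGP.
ONPREM_PREFIXES = ["172.16.0.0/12"]


def effective_routes(hub_allows_gateway_transit: bool,
                     spoke_uses_remote_gateways: bool) -> list:
    """Hypothetical model: gateway-learned prefixes reach the spoke only
    when both peering flags are set."""
    routes = list(SYSTEM_ROUTES)
    if hub_allows_gateway_transit and spoke_uses_remote_gateways:
        routes += [Route("VirtualNetworkGateway", p, "VirtualNetworkGateway")
                   for p in ONPREM_PREFIXES]
    return routes


def lookup(routes, destination: str) -> Route:
    """Longest-prefix match over the effective routes."""
    addr = ipaddress.ip_address(destination)
    matches = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    return max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)


spoke_a = effective_routes(True, True)   # peering configured correctly
spoke_b = effective_routes(True, False)  # Use Remote Gateways not enabled
print(lookup(spoke_a, "172.16.5.10").next_hop_type)  # VirtualNetworkGateway
print(lookup(spoke_b, "172.16.5.10").next_hop_type)  # Internet
```

For Spoke B, the on-premises destination matches only 0.0.0.0/0. This reproduces the Internet next hop seen in the scenario's effective route table.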

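Option C is the strongest distractor. Disabling BGP route propagation on a subnet's Route Table also keeps gateway-learned routes out of the effective table, so the table alone does not rule C out. The context points to A instead:

- the effective table shows only Default routes;
- the scenario mentions no Route Table on Spoke B's subnet;
- the only thing that differs from the working Spoke A is the peering created three days ago.

Checking the Use Remote Gateways setting on Spoke B's peering confirms the cause.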

Option D describes a workaround for a scenario where BGP route propagation is disabled, not the root cause here. Option B is implausible. Spokes do not hold their own BGP sessions with the hub gateway. The gateway's BGP sessions are with the on-premises peers, and the scenario states those sessions are established.

Acting on option D (manually adding UDRs) would mask the problem without fixing it. Every change to the prefixes advertised from on-premises would then require a manual UDR update, creating unnecessary operational fragility.


Answer Key: Scenario 2

Answer: B

The cause is identified, the production impact is active and severe, and the correction is surgical, with no additional risk of instability. The critical constraint in this scenario is procedural, not technical: the process requires incident registration and authorization before any change to production Route Tables. Because the manager is available to grant emergency authorization, the correct path is to start that process immediately and then apply the fix.
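The loop, and the effect of the one-line correction, can be illustrated with a small trace. This sketch assumes that each node's next hop for 192.168.10.0/24 is fully described by its subnet's UDR. The node labels and the `trace` helper are hypothetical.

```python
def trace(next_hop_by_node: dict, start: str, max_hops: int = 8):
    """Follow next hops for one prefix and report whether the path loops.

    next_hop_by_node maps a node to the next hop its subnet's UDR selects;
    a node absent from the map forwards normally (no further redirect).
    """
    path, seen, node = [start], {start}, start
    for _ in range(max_hops):
        nxt = next_hop_by_node.get(node)
        if nxt is None:
            return path, "forwarded"
        path.append(nxt)
        if nxt in seen:
            return path, "loop"
        seen.add(nxt)
        node = nxt
    return path, "hop limit reached"


# Current state: NVA-2's subnet UDR points back at NVA-1.
broken = {"NVA-1": "NVA-2", "NVA-2": "NVA-1"}
# Corrected state: only NVA-2's next hop changes, to the perimeter firewall.
fixed = {"NVA-1": "NVA-2", "NVA-2": "10.0.1.100"}

print(trace(broken, "NVA-1"))  # (['NVA-1', 'NVA-2', 'NVA-1'], 'loop')
print(trace(fixed, "NVA-1"))   # (['NVA-1', 'NVA-2', '10.0.1.100'], 'forwarded')
```

Only NVA-2's entry differs between the two maps. This mirrors why editing a single next hop is the least disruptive fix.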

Option A ignores the procedural constraint and introduces additional risk. Removing the entire Route Table would temporarily drop every other useful route it contains, and it bypasses the required process.

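Option C adds risk without a real benefit. A subnet can be associated with only one Route Table, so associating a new one replaces the current association. The original Route Table resource would still exist and could be re-associated. However, the swap is itself a production Route Table change, subject to the same registration and authorization requirement, and the new table must reproduce every other route correctly. Editing a single next hop is the smaller, safer change.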

Option D is the most dangerous distractor. Waiting for a formal window when there's active production impact and the manager can provide emergency authorization is a decision that unnecessarily prolongs the incident and ignores the emergency authorization mechanism explicitly described in the scenario.


Answer Key: Scenario 3

Answer: B

The relevant information confirming the cause is in the NVA subnet's Route Table: it contains a 0.0.0.0/0 route with next hop VirtualAppliance pointing to the NVA's own IP (10.1.0.4). This means when the NVA tries to forward traffic to the Internet, Azure consults the route table of the subnet where the NVA is located and redirects this traffic back to the NVA itself, creating a loop that exhausts packet TTL without reaching the Internet.
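Below is a toy simulation of the self-referencing route, under these simplifying assumptions:

- the NVA's OS forwards every packet it receives and decrements TTL once per pass;
- the route table is a plain prefix-to-next-hop map;
- 203.0.113.10 (a documentation address) stands in for an Internet destination.

The function names are hypothetical.

```python
import ipaddress

NVA_IP = "10.1.0.4"


def next_hop(route_table: dict, destination: str) -> str:
    """Longest-prefix match over UDRs; falls back to the system Internet route."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in route_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best[1] if best else "Internet"


def forward_from_nva(route_table: dict, destination: str, ttl: int = 64):
    """Return (outcome, passes through the NVA) for a packet the NVA forwards."""
    passes = 0
    while ttl > 0:
        hop = next_hop(route_table, destination)
        if hop != NVA_IP:
            return hop, passes
        # The route hands the packet back to the NVA, which forwards it again.
        ttl -= 1
        passes += 1
    return "dropped: TTL expired", passes


nva_subnet_udrs = {"0.0.0.0/0": NVA_IP}  # the ToInternet route from the scenario
print(forward_from_nva(nva_subnet_udrs, "203.0.113.10"))  # ('dropped: TTL expired', 64)
print(forward_from_nva({}, "203.0.113.10"))               # ('Internet', 0)
```

In this model, the packet never leaves the loop while the NVA's subnet points 0.0.0.0/0 at 10.1.0.4. Without that route, the NVA's traffic falls back to the system Internet route.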

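The irrelevant information in the scenario is the workload subnet's NSG allowing outbound traffic on port 443. The NSG plays no part in the problem: traffic never reaches the Internet, so there is no return traffic to block. Because NSGs are stateful, responses to an allowed outbound flow would be permitted without an explicit inbound rule, which also rules out option A.

The NVA's successful ping needs more care. Azure applies a NIC's effective routes, derived from its subnet's Route Table, to all traffic leaving that NIC, including traffic the NVA originates. A successful ping therefore suggests that it either left through a path not governed by this Route Table (for example, a separate Internet-facing NIC) or was run before the Route Table was associated. The effective routes of the NVA's NIC are the reliable evidence.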
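Why an NSG would not block the responses can be sketched with a toy stateful filter. This is not Azure's implementation. The `StatefulFilter` class, the addresses, and the port numbers are hypothetical, and only the flow-tracking idea is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    src_port: int
    dst: str
    dst_port: int


class StatefulFilter:
    """Allows replies to tracked outbound flows even with no inbound rule."""

    def __init__(self, outbound_ports, inbound_ports=()):
        self.outbound_ports = set(outbound_ports)
        self.inbound_ports = set(inbound_ports)
        self.flows = set()

    def allow_outbound(self, pkt: Packet) -> bool:
        if pkt.dst_port in self.outbound_ports:
            self.flows.add(pkt)  # remember the flow so its replies are accepted
            return True
        return False

    def allow_inbound(self, pkt: Packet) -> bool:
        # A reply is the reverse of a tracked outbound packet.
        reverse = Packet(pkt.dst, pkt.dst_port, pkt.src, pkt.src_port)
        return reverse in self.flows or pkt.dst_port in self.inbound_ports


nsg = StatefulFilter(outbound_ports=[443])  # outbound 443 only, no inbound rules
request = Packet("10.1.1.5", 50000, "203.0.113.10", 443)
reply = Packet("203.0.113.10", 443, "10.1.1.5", 50000)
unsolicited = Packet("203.0.113.10", 443, "10.1.1.5", 50001)

print(nsg.allow_outbound(request))     # True
print(nsg.allow_inbound(reply))        # True
print(nsg.allow_inbound(unsolicited))  # False
```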

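Option C is a plausible distractor in other contexts, since NVAs that perform outbound NAT usually need a public IP. However, the NVA logs confirm that traffic arrives, and the problem lies in what happens after the NVA tries to forward it, not in the absence of a public IP. Option D is false. A 0.0.0.0/0 UDR with next hop VirtualAppliance overrides the default Internet system route for every destination not matched by a more specific route, and that includes Internet-bound traffic.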
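The premise of option D can be checked against Azure's documented route selection. The longest matching prefix wins. Among routes with the same prefix, a user-defined route takes precedence over a BGP route, which in turn takes precedence over a system route. Below is a minimal sketch of that rule, assuming a hypothetical VNet address space of 10.1.0.0/16 for the workload subnet.

```python
import ipaddress

# Lower value wins among routes with the same prefix length.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "Default": 2}  # "Default" = system route


def select_route(routes, destination: str):
    """routes: (source, prefix, next_hop_type) tuples; returns the winning route."""
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r[1])]
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r[1]).prefixlen,
                              SOURCE_PRIORITY[r[0]]))


workload_routes = [
    ("Default", "10.1.0.0/16", "VnetLocal"),    # hypothetical VNet address space
    ("Default", "0.0.0.0/0", "Internet"),
    ("User", "0.0.0.0/0", "VirtualAppliance"),  # ForceTunnel -> 10.1.0.4
]

print(select_route(workload_routes, "203.0.113.10"))  # ('User', '0.0.0.0/0', 'VirtualAppliance')
print(select_route(workload_routes, "10.1.2.3"))      # ('Default', '10.1.0.0/16', 'VnetLocal')
```

The second lookup is consistent with the scenario's observation that connectivity between VMs inside the VNet keeps working.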


Answer Key: Scenario 4

Answer: B

The correct sequence is: P1, P3, P5, P2, P4.

Correct diagnosis follows the logic of first checking the state closest to the VM (effective route table), then progressively more distant local causes, and reserving end-to-end connectivity testing for when the routing plane has already been validated.

P1 immediately determines if the VM even knows a route to the destination. If the route doesn't exist in the effective table, subsequent steps identify why. P3 checks if BGP route propagation is enabled in the subnet's Route Table, as it's the most local cause for missing routes. P5 verifies if peering is configured to use the hub's remote gateway, which is the next link in the chain. P2 checks BGP state on the ExpressRoute gateway, which is the source of on-premises routes. P4 is the actual connectivity test, executed only after the routing plane is validated, to confirm no other layer (NSG, firewall, application) is blocking.
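The sequence can be expressed as a small decision procedure. This is a sketch that assumes each step is wrapped in a hypothetical check function returning True when the step passes. The functions here are stand-ins, not real Azure calls.

```python
from typing import Callable, Dict, Optional


def diagnose(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Follow the P1, P3, P5, P2, P4 order, skipping the cause checks when P1
    passes; return the step that explains the failure, or None if all pass."""
    if not checks["P1"]():
        # The VM has no route to the destination: find out why, nearest cause first.
        for step in ("P3", "P5", "P2"):
            if not checks[step]():
                return step
        return "P1"  # route is missing but no listed cause was found
    # Routing plane looks valid; only now is an end-to-end test informative.
    return None if checks["P4"]() else "P4"


# Example: the spoke peering lacks Use Remote Gateways.
observed = {"P1": False, "P3": True, "P5": False, "P2": True, "P4": False}
checks = {step: (lambda ok=ok: ok) for step, ok in observed.items()}
print(diagnose(checks))  # P5
```

A route that is present but still unreachable ends at P4. That points to the NSG, firewall, or application layer rather than routing.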

Option A starts with the connectivity test (P4), which is ineffective while the routing problem is still undiagnosed: a timeout says nothing about where traffic is being dropped. Option C starts with the ExpressRoute gateway and skips steps that could reveal much simpler local causes. Option D runs the configuration checks (P3, P5) before looking at the effective route table (P1). It therefore investigates possible causes of a missing route before confirming that the route is actually missing. P1 is the quickest check, and it decides whether P3 and P5 are relevant at all.


Troubleshooting Tree: Configure routing rules


Color Legend:

Color        Node Type
Dark Blue    Initial symptom (entry point)
Blue         Diagnostic question
Red          Identified cause
Green        Recommended action or resolution
Orange       Intermediate validation or verification

To use this tree when facing a real problem, start with the root node describing the connectivity symptom. Answer each diagnostic question based on what you can directly observe: effective route table, NVA logs, peering configurations, and gateway state. Follow the path corresponding to your answer until you reach an identified cause node. From there, apply the recommended action and return to validation nodes to confirm the cause has been eliminated before closing the diagnosis.
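The walk described above can be sketched as a small traversal. The `Node` class, the node kinds (mapped from the color legend), and the sample tree built from Scenario 1 are all hypothetical illustrations, not a transcription of the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "symptom", "question", "cause", "action", or "validation"
    text: str
    children: dict = field(default_factory=dict)  # answer (or "next") -> Node


def walk(root: Node, answers: dict) -> list:
    """Follow observations from the root until no further step applies."""
    path, node = [root], root
    while node.children:
        if node.kind == "question":
            answer = answers.get(node.text)
            if answer not in node.children:
                break  # no observation recorded for this question yet
            node = node.children[answer]
        else:
            node = next(iter(node.children.values()))  # single successor
        path.append(node)
    return path


validate = Node("validation", "On-premises prefixes now in effective routes?")
fix = Node("action", "Enable Use Remote Gateways on the spoke peering", {"next": validate})
cause = Node("cause", "Spoke peering does not use the hub gateway", {"next": fix})
q_peering = Node("question", "Use Remote Gateways enabled?", {"no": cause})
q_routes = Node("question", "On-premises prefixes in effective routes?", {"no": q_peering})
root = Node("symptom", "VM cannot reach on-premises", {"next": q_routes})

path = walk(root, {q_routes.text: "no", q_peering.text: "no"})
print([n.kind for n in path])
# ['symptom', 'question', 'question', 'cause', 'action', 'validation']
```

The walk stops at a question that has no recorded observation. It also stops at the final validation node, which is where the diagnosis is confirmed before closing.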