Troubleshooting Lab: Configure user-defined routes

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team reports that VMs in the app-subnet (10.2.1.0/24) lost internet access after a maintenance window the previous night. The environment has an NVA deployed in nva-subnet (10.2.2.0/24) with IP 10.2.2.10, responsible for inspecting and forwarding outbound traffic.

During maintenance, an administrator applied operating system patches to the NVA and restarted the VM. The administrator confirms that the NVA VM is powered on and that IP forwarding is enabled on the Azure NIC. The NSG associated with the nva-subnet allows inbound and outbound traffic on the necessary ports. The VNet peering was not changed.

The route table associated with the app-subnet contains the following entry:

Destination prefix: 0.0.0.0/0
Next hop type: VirtualAppliance
Next hop address: 10.2.2.10

When ping 8.8.8.8 is run from a VM in the app-subnet, no replies are received.

What is the root cause of the problem?

A) IP forwarding on the Azure NIC of the NVA was disabled during the VM restart

B) The NSG on the nva-subnet is blocking outbound ICMP traffic to the internet

C) The NVA operating system does not have IP forwarding enabled after the restart

D) The route table with the UDR was disassociated from the app-subnet during the maintenance window


Scenario 2: Action Decision

The team has already identified the cause of the problem: a route table was mistakenly associated with the GatewaySubnet of a hub VNet, and it contains a UDR with next hop None for the prefix 10.0.0.0/8. This is dropping all traffic routed through the VPN Gateway between the on-premises network and the spoke VNets.

The environment is production. Approximately 300 remote users depend on VPN connectivity at this time. The GatewaySubnet does not have any other legitimate route table associated that can be restored. The team has write permissions to the hub VNet resource group.

What is the correct action to take at this time?

A) Edit the existing UDR in the route table, changing the next hop type from None to VirtualNetworkGateway

B) Disassociate the route table from the GatewaySubnet immediately, without waiting for a maintenance window

C) Create a new route table with the correct route and associate it with the GatewaySubnet before removing the incorrect one

D) Restart the VPN Gateway to force route re-reading and restore connectivity


Scenario 3: Root Cause

An administrator configures a hub-spoke topology with centralized traffic inspection. The hub contains an NVA at 10.0.1.4. The spoke has the subnet spoke-app (10.1.2.0/24).

The route table associated with spoke-app contains:

Destination prefix: 0.0.0.0/0
Next hop type: VirtualAppliance
Next hop address: 10.0.1.4

The VNet Peering between hub and spoke is configured with Allow forwarded traffic enabled on both sides. The NVA VM is running, with IP forwarding enabled on the Azure NIC and operating system. The administrator reports that traffic from spoke-app to the internet works correctly, but traffic from spoke-app to a spoke-db subnet (10.1.3.0/24) within the same spoke VNet does not reach its destination.

The administrator suspects the peering has a problem and opens a ticket with the network team.

What is the root cause of the problem?

A) VNet Peering does not propagate routes to subnets within the same spoke VNet when a 0.0.0.0/0 UDR is active

B) The NVA in the hub does not have a return route to the spoke-db subnet after forwarding traffic

C) The 0.0.0.0/0 UDR is also capturing traffic destined for 10.1.3.0/24 and sending it to the NVA, which does not forward this traffic back to the spoke VNet

D) IP forwarding on the NVA operating system does not support forwarding traffic between subnets of the same remote VNet


Scenario 4: Diagnostic Sequence

A VM in the prod-app subnet cannot reach an internal endpoint at 10.5.0.20. The environment has custom route tables, NSGs, and an NVA in the path. You need to diagnose the problem efficiently.

The steps below are out of order:

  1. Verify in Azure Network Watcher with the Next Hop tool which next hop is being resolved for destination 10.5.0.20 from the source VM
  2. Confirm if the NVA is forwarding traffic at the operating system level, checking logs or interface counters
  3. Check the effective routes of the source VM's NIC to identify which route is being applied
  4. Confirm if there is an NSG rule blocking traffic between the source VM and destination 10.5.0.20
  5. Test direct connectivity from the source VM to 10.5.0.20 using Test-NetConnection or curl

What is the correct diagnostic sequence?

A) 5 -> 1 -> 3 -> 4 -> 2

B) 5 -> 4 -> 1 -> 3 -> 2

C) 5 -> 3 -> 1 -> 4 -> 2

D) 1 -> 3 -> 5 -> 4 -> 2


Answer Key and Explanations

Answer Key: Scenario 1

Answer: C

The central clue is the sequence of events: the problem appeared after the NVA's operating system was patched and restarted. IP forwarding in Azure (enabled on the NIC) and IP forwarding in the operating system are independent settings. The Azure setting only allows the interface to send and receive packets whose source or destination is not its own IP address; the OS kernel decides whether to actually forward them. After a restart, settings applied only at runtime (such as net.ipv4.ip_forward=1 on Linux without a persistent entry in /etc/sysctl.conf) are lost.
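The difference between a runtime setting and a persisted one can be checked directly on the NVA. The following is a minimal sketch, assuming a Linux-based NVA that exposes the runtime value in /proc/sys/net/ipv4/ip_forward and persists it in /etc/sysctl.conf; drop-in files under /etc/sysctl.d are ignored here, and the function names are hypothetical.

```python
from pathlib import Path


def runtime_forwarding_enabled(proc_value: str) -> bool:
    """Interpret the content of /proc/sys/net/ipv4/ip_forward ("1" means enabled)."""
    return proc_value.strip() == "1"


def forwarding_persisted(sysctl_conf_text: str) -> bool:
    """Return True if the last uncommented net.ipv4.ip_forward entry sets it to 1."""
    persisted = False
    for raw_line in sysctl_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "net.ipv4.ip_forward":
            persisted = value == "1"  # later entries override earlier ones
    return persisted


if __name__ == "__main__":
    # Reads the real files only when they exist (that is, on a Linux host).
    proc = Path("/proc/sys/net/ipv4/ip_forward")
    conf = Path("/etc/sysctl.conf")
    if proc.exists():
        print("runtime:", runtime_forwarding_enabled(proc.read_text()))
    if conf.exists():
        print("persisted:", forwarding_persisted(conf.read_text()))
```

A result of runtime enabled but not persisted would describe an NVA that works until its next reboot, which matches the failure pattern in this scenario.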

The distracting detail in the scenario is the NSG state on the nva-subnet: because the NVA is not forwarding traffic at all, the NSG is not the bottleneck, and its mention only serves to divert the diagnosis.

Alternative A is attractive because it plays on the distinction between the two settings, but the scenario explicitly confirms that IP forwarding is enabled on the Azure NIC. Alternative D would be catastrophic if true, but nothing in the description indicates a disassociation. The most dangerous distractor is A, because it would lead the analyst to recheck the NIC configuration (which is already correct) instead of logging in to the NVA and checking the forwarding state in the OS.


Answer Key: Scenario 2

Answer: B

The cause is already identified and clear: a route table that should not be there is associated with the GatewaySubnet, and its None route drops traffic. The correct action is to disassociate the route table immediately, which restores the gateway's default routing behavior. Microsoft documents that the GatewaySubnet should not have route tables whose UDRs interfere with gateway routes. Removing the association is the most direct action, carries the lowest risk, and takes effect right away for the 300 affected users.

Alternative A is incorrect because changing the UDR's next hop to VirtualNetworkGateway is not the proper configuration for the GatewaySubnet; the subnet should simply have no route table. Alternative C adds an unnecessary step that prolongs the impact. Alternative D is the most dangerous distractor: resetting the VPN Gateway reboots the gateway instances, causes additional interruption for users who are already affected, and would not solve the problem, because the route table would still be associated afterward.
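The disassociation itself can be scripted. The sketch below assumes the azure-mgmt-network package, whose NetworkManagementClient exposes subnets.get and subnets.begin_create_or_update; the function name and the resource names are hypothetical. The client is passed in as a parameter so that the logic can be exercised without an Azure subscription.

```python
def disassociate_route_table(network_client, resource_group, vnet_name,
                             subnet_name="GatewaySubnet"):
    """Remove the route table association from a subnet.

    Returns True if an association was removed, False if there was none.
    `network_client` is assumed to behave like
    azure.mgmt.network.NetworkManagementClient.
    """
    subnet = network_client.subnets.get(resource_group, vnet_name, subnet_name)
    if subnet.route_table is None:
        return False  # nothing to do

    # Clearing the reference and writing the subnet back removes the association.
    subnet.route_table = None
    poller = network_client.subnets.begin_create_or_update(
        resource_group, vnet_name, subnet_name, subnet
    )
    poller.result()  # block until the update completes
    return True
```

In a real run, the client would be built from a credential and a subscription ID, and the call would look like disassociate_route_table(client, "hub-rg", "hub-vnet"), where both names are placeholders.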


Answer Key: Scenario 3

Answer: C

The 0.0.0.0/0 UDR captures any traffic whose destination is not matched by a more specific route. The spoke-db prefix (10.1.3.0/24) falls within 10.1.0.0/16, but if no more specific route for that prefix is effective on spoke-app once the UDR is applied, traffic to 10.1.3.0/24 is also sent to the NVA in the hub. The NVA receives the packet but is not configured to forward it back into the spoke VNet: its path to 10.1.3.0/24 depends on a peering route that may be missing or not propagated for this return path.

The distracting detail is the confirmation that internet traffic works: it shows that the NVA is operational for external traffic, but it says nothing about the east-west traffic problem within the spoke.

The most dangerous distractor is B, because a missing return route on the NVA is a real problem in inspection topologies, but the root cause here lies earlier in the path: the traffic should not reach the NVA at all. The fix is to add specific UDRs for the local spoke subnets with next hop VnetLocal, which prevents the 0.0.0.0/0 route from capturing this traffic.
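The effect of adding a more specific UDR can be illustrated with a small longest-prefix-match model. This is a simplified sketch that considers only the user-defined routes named in the scenario; it does not model Azure system routes or the precedence rules between route sources, and the Route type and resolve function are hypothetical names.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    next_hop_type: str
    next_hop_ip: Optional[str] = None


def resolve(routes, destination):
    """Pick the matching route with the longest prefix (most specific match)."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    return max(
        matches,
        key=lambda r: ipaddress.ip_network(r.prefix).prefixlen,
        default=None,
    )


# Route table as described in the scenario.
before = [Route("0.0.0.0/0", "VirtualAppliance", "10.0.1.4")]
# Same table with the specific route for spoke-db added.
after = before + [Route("10.1.3.0/24", "VnetLocal")]

print(resolve(before, "10.1.3.5").next_hop_type)  # VirtualAppliance
print(resolve(after, "10.1.3.5").next_hop_type)   # VnetLocal
print(resolve(after, "8.8.8.8").next_hop_type)    # VirtualAppliance
```

With the /24 route in place, traffic to spoke-db stays inside the VNet while internet-bound traffic still goes through the NVA.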


Answer Key: Scenario 4

Answer: A

The correct sequence is: 5 -> 1 -> 3 -> 4 -> 2

The diagnostic reasoning moves progressively from the observable symptom toward the most specific cause:

  • Step 5: objectively confirm the symptom before investigating infrastructure. Without confirming the failure, any investigation may be premature.
  • Step 1: use Network Watcher's Next Hop to immediately understand which next hop is resolved by Azure's control plane. This reveals if the problem is in the routing layer.
  • Step 3: check the NIC's effective routes to get the complete view of applied routes, confirming or detailing what Next Hop already indicated.
  • Step 4: after confirming that the route leads to the NVA or correct destination, verify if an NSG is blocking traffic in this path.
  • Step 2: lastly, verify NVA behavior at the OS level, as this is the most costly step and should only be done when previous steps indicate that traffic is reaching the NVA but not being forwarded.

Alternative B is attractive but incorrect: checking NSG before effective routes would lead to investigating a filtering layer without yet knowing if the route is correct. Alternative D starts with Network Watcher without first confirming the symptom, skipping the initial validation step.
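The ordering above can be expressed as a small runner that confirms the symptom first and then stops at the first layer that looks unhealthy. This is only a sketch: the checks are hard-coded stand-ins, and real implementations would query Network Watcher, the NIC's effective routes, the NSG rules, and the NVA itself. All names are hypothetical.

```python
from typing import Callable, List, Tuple

# (step number from the scenario, layer name, callable returning True if healthy)
LayerCheck = Tuple[int, str, Callable[[], bool]]


def diagnose(symptom_present: Callable[[], bool], layers: List[LayerCheck]) -> str:
    """Confirm the symptom, then return the first layer whose check fails."""
    if not symptom_present():  # step 5: reproduce the failure first
        return "symptom not reproduced"
    for step, layer, healthy in layers:
        if not healthy():
            return f"step {step}: {layer}"
    return "no cause found"


# Stand-in results: routing and filtering look fine, the NVA does not forward.
layers = [
    (1, "next hop resolution", lambda: True),
    (3, "effective routes", lambda: True),
    (4, "NSG rules", lambda: True),
    (2, "NVA OS forwarding", lambda: False),
]

print(diagnose(lambda: True, layers))  # step 2: NVA OS forwarding
```

Keeping the most expensive check (logging in to the NVA) last means it only runs when the cheaper control plane checks have already passed.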


Troubleshooting Tree: Configure user-defined routes

Color legend:

Color         Node type
Dark blue     Initial symptom (entry point)
Medium blue   Diagnostic question (verifiable decision)
Red           Identified cause
Green         Recommended action or resolution

To use this tree when facing a real problem, start at the root node, which describes the observed symptom, and follow each branch by answering the decision node's question objectively, based on what you can observe or measure in the environment. Each answer eliminates a set of hypotheses and points toward the correct cause or action without the need to test every possibility. The shortest path to resolution usually starts with control plane verification (Next Hop, effective routes) before descending to the data plane (NVA, NSG, OS).