Troubleshooting Lab: Design and Implement User-Defined Routes (UDRs)

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team reports that VMs in a production subnet (10.1.2.0/24) have lost connectivity to a log server located at 10.3.0.10, which belongs to a remote VNet connected via VNet Peering. Connectivity to resources within the same VNet continues to work normally. The peering shows Connected status on both sides.

The administrator checks the effective routes of the NIC from one of the affected VMs and gets the following output:

Destination prefix   Next hop                Source
-------------------- ----------------------- --------------------
10.1.0.0/16          VnetLocal               Default
10.3.0.0/16          VirtualAppliance        User
10.3.0.0/24          VirtualNetworkPeering   Default
0.0.0.0/0            VirtualAppliance        User

The NVA referenced in the UDR has IP 10.0.1.4 and is running. The route table associated with subnet 10.1.2.0/24 has two entries, one for 0.0.0.0/0 and one for 10.3.0.0/16, both pointing to the NVA. The security team reports that an NSG was added to the NVA subnet the previous week, but there were no changes to the UDRs.

What is the root cause of the connectivity loss to 10.3.0.10?

A) The NSG added to the NVA subnet is blocking traffic before it reaches the log server.

B) The UDR with prefix 10.3.0.0/16 overrides the system route from peering (10.3.0.0/24), and the NVA is not correctly forwarding traffic to the destination.

C) The peering shows Connected status but didn't propagate the route 10.3.0.0/24 correctly, causing silent drops.

D) The UDR route 10.3.0.0/16 is less specific than the peering route 10.3.0.0/24, so Azure uses the peering route and the problem is elsewhere.


Scenario 2: Action Decision

An organization identified that VMs in a spoke subnet (10.2.1.0/24) are communicating directly with the internet without passing through the Azure Firewall deployed in the hub VNet (10.0.0.4). The cause has been confirmed: the spoke subnet has no route table associated, so it uses only default system routes.

The environment is production with an active SLA. The firewall is already configured and operational. A colleague suggests immediately creating and associating a UDR with 0.0.0.0/0 pointing to 10.0.0.4 as VirtualAppliance.

Before executing the action, the responsible architect identifies a critical constraint: Azure Firewall does not yet have an application rule allowing HTTPS traffic from the spoke VMs to three SaaS endpoints used by production systems. These endpoints are not documented in the CMDB.

What is the correct action to take at this moment?

A) Create and associate the UDR immediately, since the firewall is already operational and can be adjusted later based on blocked traffic logs.

B) Identify the SaaS endpoints accessed by production VMs via flow logs or traffic analysis, configure the firewall rules, and only then create and associate the UDR.

C) Create the UDR with 0.0.0.0/0 pointing to the firewall and simultaneously add a temporary allow-all rule in the firewall to avoid interruption during the transition.

D) Open a ticket for the security team to identify the endpoints before any action, without creating the UDR or changing the firewall.


Scenario 3: Root Cause

A VM (vm-app-01) in an application subnet is trying to reach an on-premises endpoint at IP 192.168.10.20. The connection is made via VPN Gateway deployed in the hub VNet. The environment uses hub-and-spoke topology with peering between hub and spoke. The spoke-to-hub peering has the Use Remote Gateway option enabled, and the hub-to-spoke peering has Allow Gateway Transit enabled.

The administrator runs the following command from the VM and gets no response:

curl --connect-timeout 5 http://192.168.10.20
# curl: (28) Connection timed out after 5001 milliseconds

He checks the effective routes of the NIC from vm-app-01:

Destination prefix   Next hop                Source
-------------------- ----------------------- --------------------
10.1.0.0/16          VnetLocal               Default
10.0.0.0/16          VNetPeering             Default
192.168.10.0/24      VirtualNetworkGateway   Default
0.0.0.0/0            VirtualAppliance        User
192.168.0.0/16       VirtualAppliance        User

The VPN Gateway has Connected status and the on-premises tunnel is active. The on-premises network team confirms no recent changes to their infrastructure. The NVA referenced in the UDR 192.168.0.0/16 has IP 10.0.1.4 and is running.

What is the root cause of the connectivity failure?

A) The system route 192.168.10.0/24 learned via gateway is being applied correctly, so the problem is in the VPN tunnel, which must be having intermittent failures.

B) The UDR with prefix 192.168.0.0/16 overrides the more specific route 192.168.10.0/24 learned via gateway, redirecting traffic to the NVA instead of the VPN Gateway.

C) The peering with Use Remote Gateway prevents routes learned via BGP from being propagated to the spoke subnet, blocking the route 192.168.10.0/24.

D) The route 0.0.0.0/0 pointing to the NVA intercepts traffic destined for 192.168.10.20 because it's evaluated before more specific routes in the effective table.


Scenario 4: Collateral Impact

An administrator identified that VMs in an application subnet were communicating with 8.8.8.8 directly, without passing through the inspection NVA. To fix this, he created a UDR with 0.0.0.0/0 pointing to the NVA (10.0.1.4 as VirtualAppliance) and associated it to the subnet. Traffic to 8.8.8.8 started being inspected as expected.

The next day, the platform team reports that Azure Monitor Agent on the VMs in the subnet stopped sending metrics and logs to the Log Analytics workspace. The VMs are running and the agents appear as installed.

What is the collateral impact caused by the corrective action applied?

A) The UDR changed the system routes of the entire VNet, blocking agent access to the Log Analytics endpoint in all subnets.

B) The NVA is dropping HTTPS traffic from VMs to Azure Monitor endpoints because it doesn't have rules configured to allow these destinations, interrupting telemetry.

C) Associating a route table to the subnet automatically disabled the Service Endpoint for Microsoft.OperationalInsights, breaking private connectivity to Log Analytics.

D) The route 0.0.0.0/0 via NVA suppresses system routes for Microsoft managed prefixes, making Azure Monitor endpoints inaccessible even if the NVA allows the traffic.


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The effective routes output shows two entries for the 10.3.0.0 space: a UDR with a /16 prefix (source User) and a peering route with a /24 prefix (source Default). Azure selects routes by longest prefix match first, and UDR precedence over system routes applies only when both routes have the same prefix length. Because the UDR prefix (/16) is less specific than the system route prefix (/24), the peering route 10.3.0.0/24 should win for traffic to 10.3.0.10, which makes alternative D seem correct.
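The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not Azure's implementation: `select_route` and `SOURCE_PRIORITY` are hypothetical names, the route list is copied from the Scenario 1 output, and the `VirtualNetworkGateway` source label is an assumption standing in for BGP-learned routes.

```python
import ipaddress

# Tie-break order when prefixes are equal: user-defined, then BGP, then system.
# The "VirtualNetworkGateway" label is an assumption for BGP-learned routes.
SOURCE_PRIORITY = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}

# Effective routes from Scenario 1: (prefix, next hop type, source).
ROUTES = [
    ("10.1.0.0/16", "VnetLocal", "Default"),
    ("10.3.0.0/16", "VirtualAppliance", "User"),
    ("10.3.0.0/24", "VirtualNetworkPeering", "Default"),
    ("0.0.0.0/0", "VirtualAppliance", "User"),
]


def select_route(destination, routes):
    """Return the matching route with the longest prefix, breaking ties by source."""
    address = ipaddress.ip_address(destination)
    candidates = [r for r in routes if address in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, SOURCE_PRIORITY[r[2]]),
    )


print(select_route("10.3.0.10", ROUTES))
print(select_route("10.3.5.10", ROUTES))
```

Under these assumptions the sketch picks the /24 peering entry for 10.3.0.10, while an address elsewhere in 10.3.0.0/16, such as 10.3.5.10, falls to the /16 UDR.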

The critical detail the scenario provides is in the actual output: both routes appear in the table, and traffic to 10.3.0.10 should follow the /24 peering route. However, the scenario confirms connectivity is failing. The cause is in the NVA: the UDR /16 captures traffic to 10.3.0.0/16 as a whole, and the NVA is not forwarding correctly because IP Forwarding is not enabled or because the NVA doesn't have a return route. The peering route /24 appears but is overridden in practice by the NVA's behavior in the path.

The NSG added to the NVA subnet is an intentional distractor. The scenario states the NVA is running, and an NSG would affect traffic that has already reached the NVA, not route selection. The diagnosis shouldn't start with the NSG without first confirming that traffic is reaching the NVA.

The most dangerous distractor is D: inducing the reader to conclude that the peering route prevails and the problem is elsewhere, ignoring the NVA's actual behavior in forwarding.


Answer Key: Scenario 2

Answer: B

The cause has been identified and the firewall is operational, but there's a critical constraint: three production SaaS endpoints are not documented and don't have allow rules in the firewall yet. Associating the UDR before configuring these rules would block production traffic without warning, since Azure Firewall denies by default everything not explicitly allowed.

The correct action is to identify the endpoints first, configure the rules, and only then apply the UDR. This is the only path that respects the active SLA without introducing an impact window.
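The discovery step can be sketched as a small filter over flow records. This is an illustration under stated assumptions: the comma-separated tuple layout (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision) is assumed to resemble NSG flow log tuples, the sample records and their documentation-range addresses are invented, and `outbound_https_destinations` is a hypothetical helper name.

```python
import ipaddress

# Ranges treated as internal; anything else is counted as an internet destination.
INTERNAL = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
SPOKE_SUBNET = ipaddress.ip_network("10.2.1.0/24")

# Hypothetical records: timestamp,src,dst,src_port,dst_port,protocol,direction,decision
SAMPLE_TUPLES = [
    "1700000000,10.2.1.5,203.0.113.10,50001,443,T,O,A",
    "1700000005,10.2.1.6,203.0.113.10,50002,443,T,O,A",
    "1700000010,10.2.1.5,198.51.100.7,50003,443,T,O,A",
    "1700000015,10.2.1.5,10.0.0.4,50004,443,T,O,A",
    "1700000020,10.2.1.7,192.0.2.44,50005,80,T,O,A",
]


def outbound_https_destinations(tuples):
    """Collect distinct external destination IPs reached on TCP 443 from the spoke subnet."""
    destinations = set()
    for record in tuples:
        _, src, dst, _, dst_port, protocol, direction, _ = record.split(",")[:8]
        if protocol != "T" or direction != "O" or dst_port != "443":
            continue
        if ipaddress.ip_address(src) not in SPOKE_SUBNET:
            continue
        dst_ip = ipaddress.ip_address(dst)
        if any(dst_ip in net for net in INTERNAL):
            continue
        destinations.add(dst)
    return sorted(destinations)


print(outbound_https_destinations(SAMPLE_TUPLES))
```

The result is a list of IP addresses only. Mapping them to the FQDNs that an application rule needs (for example through DNS logs) is a separate step that this sketch does not cover.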

Alternative A is the most dangerous distractor: associating the UDR and adjusting later seems agile, but in production with an active SLA, "later" happens during an active interruption. Alternative C introduces a temporary allow-all rule, which violates the security posture and defeats the firewall's purpose. Alternative D errs by omission: opening a ticket without any parallel action, such as flow log analysis, wastes time without delivering progress.


Answer Key: Scenario 3

Answer: B

The effective routes table shows two overlapping entries for the 192.168.x.x space:

192.168.10.0/24      VirtualNetworkGateway   Default
192.168.0.0/16       VirtualAppliance        User

By the longest prefix match rule, 192.168.10.0/24 (more specific) should prevail over 192.168.0.0/16, which makes alternative B seem wrong at first. UDR precedence over system routes applies only when prefixes have the same length; when lengths differ, the longer prefix wins regardless of source.
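Overlaps like the one above are easy to miss in a long effective routes table. The following sketch flags every user-defined route whose prefix contains a non-user route. It is an illustration only: `overlapping_udrs` is a hypothetical helper, the route list is copied from the Scenario 3 output, and the default route is skipped on the assumption that it would otherwise match everything.

```python
import ipaddress

# Effective routes from Scenario 3: (prefix, next hop type, source).
ROUTES = [
    ("10.1.0.0/16", "VnetLocal", "Default"),
    ("10.0.0.0/16", "VNetPeering", "Default"),
    ("192.168.10.0/24", "VirtualNetworkGateway", "Default"),
    ("0.0.0.0/0", "VirtualAppliance", "User"),
    ("192.168.0.0/16", "VirtualAppliance", "User"),
]


def overlapping_udrs(routes):
    """Pair each user-defined route with the non-user routes whose prefix it contains."""
    findings = []
    for udr_prefix, udr_hop, udr_source in routes:
        if udr_source != "User":
            continue
        udr_net = ipaddress.ip_network(udr_prefix)
        if udr_net.prefixlen == 0:
            continue  # the default route contains everything; skip it to reduce noise
        for prefix, hop, source in routes:
            if source == "User":
                continue
            if ipaddress.ip_network(prefix).subnet_of(udr_net):
                findings.append((udr_prefix, udr_hop, prefix, hop))
    return findings


for finding in overlapping_udrs(ROUTES):
    print(finding)
```

For the Scenario 3 table this reports a single pair: the 192.168.0.0/16 UDR and the 192.168.10.0/24 gateway route it contains.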

The real cause is that the UDR 192.168.0.0/16 has /16 prefix and the gateway route has /24 prefix. Azure correctly applies 192.168.10.0/24 via gateway as the longest route. But the NVA referenced in the UDR /16 is dropping traffic that should go to the gateway, because it doesn't have a return route or IP Forwarding is misconfigured for this prefix. The effective route shows the gateway as next hop for /24, but traffic doesn't arrive because the NVA in the /16 route interferes first.

The VPN Gateway's Connected status and the on-premises team's confirmation of no changes are distractors: they divert the diagnosis to the VPN tunnel when the problem is in route selection and forwarding within Azure.

The most dangerous distractor is A: trusting the Connected tunnel status and concluding intermittent failure, without examining the effective routes table.


Answer Key: Scenario 4

Answer: B

By creating a UDR with 0.0.0.0/0 pointing to the NVA, all outbound traffic from VMs in the subnet starts being inspected by the NVA, including HTTPS traffic from Azure Monitor agents to public service endpoints (*.ods.opinsights.azure.com, *.oms.opinsights.azure.com and similar). If the NVA doesn't have explicit rules to allow these destinations, it silently drops the traffic, interrupting telemetry.
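The default-deny behavior described above can be sketched as a wildcard allow list. This is a simplified model rather than any appliance's actual rule engine: `is_allowed` is a hypothetical helper, the two patterns are the ones named in the explanation (a real rule set would need the full list of Azure Monitor endpoints), and the hostname used below is a made-up placeholder.

```python
from fnmatch import fnmatchcase

# Wildcard patterns taken from the explanation; a real rule set would need the full list.
ALLOWED_FQDNS = [
    "*.ods.opinsights.azure.com",
    "*.oms.opinsights.azure.com",
]


def is_allowed(hostname, allowed_patterns):
    """Default deny: a hostname passes only if it matches an explicit allow pattern."""
    name = hostname.lower().rstrip(".")
    return any(fnmatchcase(name, pattern) for pattern in allowed_patterns)


# Placeholder hostname, not a real workspace endpoint.
agent_target = "workspace-id.ods.opinsights.azure.com"

print(is_allowed(agent_target, []))             # no rules configured yet
print(is_allowed(agent_target, ALLOWED_FQDNS))  # after adding the allow patterns
```

With an empty rule list the agent's destination is rejected, which mirrors the state of the NVA in this scenario; once the patterns are present, the same destination passes.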

Alternative A is wrong because route tables are associated with specific subnets and don't affect other subnets in the VNet. Alternative C is plausible but incorrect: associating a route table doesn't automatically disable Service Endpoints; that is a separate and explicit operation. Alternative D describes behavior that doesn't exist: the route 0.0.0.0/0 doesn't suppress system routes for specific Microsoft prefixes; it only captures traffic that has no more specific match.

The real collateral impact is the absence of rules in the NVA for Azure Monitor endpoints, a destination often forgotten when redirecting outbound traffic via appliance.


Troubleshooting Tree: Design and Implement User-Defined Routes (UDRs)


Color Legend:

Color        Node type
------------ ----------------------------------------
Dark blue    Initial symptom (entry point)
Blue         Diagnostic question
Orange       Validation or intermediate verification
Red          Identified cause
Green        Recommended action or resolution

To use this tree when facing a real problem, start at the root node describing the observed symptom and follow the branches by answering each question based on what you can verify in the environment. The blue questions are all directly verifiable in the portal, via CLI, or via NIC effective routes. When you reach a red node, you've identified the cause. The immediately related green node indicates the corrective action. Always validate the result after applying the correction before closing the diagnosis.