
Troubleshooting Lab: Connect a virtual network to an ExpressRoute circuit

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team reports that VMs in a production virtual network lost connectivity to on-premises servers at 14:37. The ExpressRoute circuit still shows "Enabled" status in the Azure portal. The Virtual Network Gateway shows a "Succeeded" state. The connection resource shows Connection State: Connected.

The responsible engineer runs the following command to investigate:

# Get-AzVirtualNetworkGatewayLearnedRoute output
Network  NextHop   Origin   SourcePeer  Prefix       AsPath  Weight
-------  -------   ------   ----------  ------       ------  ------
VNet     10.0.0.1  Network              10.0.0.0/16          32768

The on-premises infrastructure team reports no scheduled maintenance. The circuit was contracted via a partner provider, and the physical link is active according to the provider. The security team reports that a new firewall policy was applied at the on-premises perimeter this morning but states that only outbound internet traffic was affected.

What is the root cause of the connectivity loss?

A. The Virtual Network Gateway entered automatic maintenance mode, temporarily suspending packet forwarding

B. The on-premises router stopped advertising local network prefixes via BGP to the circuit, which is why no routes to on-premises destinations appear in the gateway table

C. The firewall policy applied by the security team is blocking return traffic from VMs, as internet outbound rules can inadvertently affect asymmetric routing via ExpressRoute

D. The BGP peering between the gateway and the Microsoft Edge Router was dropped due to keepalive timer expiration, and the connection resource has not yet reflected this state


Scenario 2: Action Decision

The cause of a connectivity failure between a VNet and the on-premises network has been identified: the Authorization Key used in the connection resource has expired. The ExpressRoute circuit belongs to a different subscription than the one containing the VNet and gateway. The approved maintenance window is only 30 minutes and started 10 minutes ago. Any change to the connection resource drops the BGP session during reconfiguration. The on-premises team has confirmed availability for the procedure.

The engineer has Network Contributor access in the VNet subscription but only read access in the circuit subscription.

What is the correct action to take at this moment?

A. Delete the existing connection resource, request the circuit owner to generate a new Authorization Key, and recreate the connection resource with the new key within the maintenance window

B. Update the Authorization Key field directly in the existing connection resource without deleting it to minimize downtime

C. Immediately escalate to the circuit subscription owner and request generation of a new Authorization Key, as the reset action requires permission in that subscription, and the maintenance window still has sufficient time for the complete procedure

D. Wait for the current maintenance window to end and schedule a new procedure with a broader window, as the remaining 20 minutes are insufficient to safely recreate the connection


Scenario 3: Root Cause

A company recently connected a second VNet to the same Standard ExpressRoute circuit. The first VNet continued working normally. VMs in the second VNet can reach on-premises resources but cannot reach VMs in the first VNet via the private path.

The configuration of both VNets is as follows:

Configuration            VNet-A (original)   VNet-B (new)
Address space            10.1.0.0/16         10.2.0.0/16
Gateway SKU              ErGw2AZ             ErGw1AZ
Peering with other VNet  Not configured      Not configured
Circuit connection       Active              Active

The engineer verifies learned routes in VNet-B gateway and confirms that prefix 10.1.0.0/16 does not appear in the table. The network team reports that the circuit used is Standard type, not Premium. The on-premises router is correctly advertising both VNets' prefixes to local clients.

What is the root cause of the problem?

A. The SKU difference between gateways (ErGw2AZ vs ErGw1AZ) prevents routes from being exchanged between VNets connected to the same circuit

B. The ExpressRoute circuit does not function as a transit mechanism between VNets connected to it; communication between VNet-A and VNet-B requires VNet peering or another direct connectivity mechanism

C. Standard type circuits do not support multiple simultaneous VNet connections; an upgrade to Premium would be required before connecting VNet-B

D. Both VNets' prefixes need to be manually advertised via BGP for the circuit to propagate routes between them; the absence of explicit route advertisement configuration explains the failure


Scenario 4: Diagnostic Sequence

An administrator receives the following alert at 09:12:

ALERT: ExpressRoute connection 'conn-prod-expressroute'
Status: Degraded
Gateway: vnet-gw-prod (ErGw3AZ)
Circuit: circuit-prod-br
Symptom: Partial packet loss on private peering path
Time detected: 09:08 UTC

The administrator has the following investigation steps available:

  1. Verify BGP peer state in the gateway with Get-AzVirtualNetworkGatewayBGPPeerStatus
  2. Check for scheduled maintenance on the circuit by provider via portal or support
  3. Confirm if the circuit's provider provisioning state is still "Provisioned"
  4. Execute Connection Monitor or latency tests between a VNet VM and an on-premises host to quantify the loss
  5. Check gateway diagnostic logs in Log Analytics to identify reconnection events or control plane errors

What is the correct investigation sequence for this scenario?

A. 3 → 2 → 1 → 5 → 4

B. 4 → 1 → 3 → 5 → 2

C. 1 → 3 → 2 → 4 → 5

D. 2 → 4 → 3 → 1 → 5


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue is in the Get-AzVirtualNetworkGatewayLearnedRoute output: the table contains only the VNet's own route (10.0.0.0/16 with origin Network) and no routes with origin EBgp. This indicates that the gateway is not receiving route advertisements from the on-premises side. When the on-premises router stops advertising prefixes via BGP, the gateway has no route to on-premises destinations, so traffic from the VMs has no path to reach them.
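
The check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `LearnedRoute` class and the `has_bgp_learned_routes` helper are hypothetical names, and the single route is transcribed by hand from the scenario output rather than read from Azure.

```python
from dataclasses import dataclass


@dataclass
class LearnedRoute:
    prefix: str
    origin: str  # "Network" for the VNet's own routes, "EBgp" for routes learned from a BGP peer


def has_bgp_learned_routes(routes):
    """Return True if at least one route was learned from an external BGP peer."""
    return any(route.origin.lower() == "ebgp" for route in routes)


# The single row shown in the scenario's learned-route table.
routes = [LearnedRoute(prefix="10.0.0.0/16", origin="Network")]

if not has_bgp_learned_routes(routes):
    print("No EBgp routes: the gateway is not learning on-premises prefixes")
```

With the scenario's data the message is printed, which matches the reasoning for answer B.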

The firewall policy applied by the security team is a deliberate red herring. It describes outbound internet rules, which do not affect the ExpressRoute BGP control plane. Alternative C uses this detail to mislead the diagnosis.

Alternative A is incorrect because automatic gateway maintenance would not silently suspend forwarding; the gateway state would reflect this. Alternative D is plausible but is contradicted by the connection resource still showing "Connected"; a BGP peering drop would eventually be reflected in the connection state or be visible via Get-AzVirtualNetworkGatewayBGPPeerStatus. The most dangerous distractor is C, as it uses real information from the scenario to lead the engineer to investigate in the wrong place.


Answer Key: Scenario 2

Answer: C

The critical constraint in the scenario is that the engineer has only read access in the circuit subscription. The operation to generate a new Authorization Key requires write permission in that subscription. Therefore, the engineer cannot execute this step alone. The correct action is to immediately escalate to the circuit subscription owner, as the maintenance window still has 20 minutes available, sufficient time for a procedure involving key generation and connection resource recreation.
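
The two facts that drive this decision, the time left in the window and the engineer's permissions, can be laid out as a small Python sketch. The role label "Reader" and the `can_generate_authorization` helper are assumptions made for illustration; the scenario only says the engineer has read access in the circuit subscription.

```python
WINDOW_MINUTES = 30   # total maintenance window, from the scenario
ELAPSED_MINUTES = 10  # time already used, from the scenario

# Access held by the engineer in each subscription.
# "Reader" is an assumed label for the scenario's "read access".
roles = {
    "vnet_subscription": "Network Contributor",
    "circuit_subscription": "Reader",
}


def can_generate_authorization(role):
    # Simplifying assumption for this sketch: read-only access cannot create keys.
    return role != "Reader"


remaining_minutes = WINDOW_MINUTES - ELAPSED_MINUTES
must_escalate = not can_generate_authorization(roles["circuit_subscription"])

print(f"Remaining window: {remaining_minutes} min, escalation required: {must_escalate}")
```

The sketch does not estimate how long key generation or connection recreation takes; the answer key's claim that 20 minutes is sufficient is taken as given.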

Alternative A describes the correct technical sequence but ignores the permission constraint: the engineer cannot generate the key themselves. Alternative B would be the most elegant action if it were possible, but the Authorization Key is not an in-place editable field on an existing connection resource; the connection needs to be recreated. Alternative D is the most conservative distractor and represents analysis paralysis: 20 minutes are sufficient for the procedure if the on-premises team is already available.


Answer Key: Scenario 3

Answer: B

ExpressRoute does not function as a transit router between VNets. Even when two VNets are connected to the same circuit, traffic between them is not routed through the circuit. Each VNet learns on-premises routes from the circuit, but one VNet's prefix is not propagated to the other VNet via ExpressRoute. For VNet-A and VNet-B to communicate, VNet peering must be configured between them, or a hub-spoke architecture with gateway transit enabled must be used.
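
Since the remedy proposed here is VNet peering, one precondition is worth checking: peering requires that the two address spaces do not overlap. A minimal sketch using Python's standard `ipaddress` module, with the prefixes taken from the scenario table:

```python
import ipaddress

# Address spaces from the scenario's configuration table.
vnet_a = ipaddress.ip_network("10.1.0.0/16")
vnet_b = ipaddress.ip_network("10.2.0.0/16")

# VNet peering requires non-overlapping address spaces.
peering_possible = not vnet_a.overlaps(vnet_b)
print(f"Address spaces overlap: {vnet_a.overlaps(vnet_b)}")
print(f"Peering precondition met: {peering_possible}")
```

The two /16 ranges are disjoint, so nothing in the addressing plan prevents peering VNet-A with VNet-B.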

The clue confirming this cause is in the table: neither VNet has peering configured with the other, and prefix 10.1.0.0/16 does not appear in the routes learned by VNet-B's gateway. The fact that the on-premises router correctly advertises prefixes is a red herring: it confirms that the circuit works for the VNet-to-on-premises path but has no bearing on the absence of VNet-to-VNet communication.

Alternative A is a distractor that exploits real confusion about SKU compatibility: different SKUs do not prevent connection to the same circuit. Alternative C is incorrect because Standard type limits refer to the number of connections, not the ability to operate multiple simultaneous connections. Alternative D represents a misconception about how BGP works in ExpressRoute: VNet prefixes are advertised automatically.


Answer Key: Scenario 4

Answer: A

The correct sequence is: 3 → 2 → 1 → 5 → 4.

The diagnostic reasoning moves from the most fundamental to the most granular level:

First, confirm if the circuit still has provider provisioning state = "Provisioned" (step 3), as a change in this state would invalidate all subsequent investigation in Azure's control plane. Next, check for scheduled maintenance by the provider (step 2), which is an external and frequent cause of partial degradation not visible in the portal. Then, verify BGP peer state (step 1) to confirm if the control plane is intact. Subsequently, analyze gateway diagnostic logs (step 5) to identify reconnection events or errors that confirm or refute the hypothesis. Finally, execute latency and loss tests (step 4) to quantify and document the impact, which is useful for opening tickets with the provider but is not the first step in diagnosis under pressure.
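
The ordering above can be written out as a simple runbook. This Python sketch only maps the step numbers from the scenario to the order in answer A; the step descriptions are shortened paraphrases, and nothing here calls Azure.

```python
# Step numbers and descriptions as listed in the scenario (paraphrased).
steps = {
    1: "Verify BGP peer state on the gateway",
    2: "Check for scheduled provider maintenance on the circuit",
    3: "Confirm the provider provisioning state is still 'Provisioned'",
    4: "Run Connection Monitor or latency tests to quantify the loss",
    5: "Review gateway diagnostic logs in Log Analytics",
}

# Order from answer A: most fundamental check first, impact measurement last.
sequence = [3, 2, 1, 5, 4]

for position, step_id in enumerate(sequence, start=1):
    print(f"{position}. (step {step_id}) {steps[step_id]}")
```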

Alternative B errs by starting with latency testing (step 4), which measures the symptom instead of investigating the cause. Alternative C starts with BGP (step 1), which is premature without first validating if the circuit is provisioned. Alternative D starts with provider maintenance, which would be reasonable, but places latency testing as the second step, before any control plane validation.


Troubleshooting Tree: Connect a virtual network to an ExpressRoute circuit


Color Legend:

Color                  Node Type
Dark blue (#1a1a2e)    Initial symptom (entry point)
Blue (#0077b6)         Diagnostic question
Red (#d62828)          Identified cause
Green (#2d6a4f)        Recommended action or resolution

When facing an actual failure, always start at the root node and answer each question based on what is observable in the portal, PowerShell or CLI commands, or provider information. Each branch eliminates a class of causes and progressively leads to the identified cause node. Upon reaching a red node, the cause is confirmed and the following green node indicates immediate action. Never skip intermediate questions: a high-level symptom like "no connectivity" can have origins in the physical plane, BGP control plane, or VNet routing logic, and each level requires independent validation before acting.
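
The traversal rule described above (start at the root, answer every question, stop at a cause) can be sketched as a small Python walk. All node names, questions, causes, and actions below are hypothetical placeholders chosen to mirror the physical, BGP, and VNet routing layers; they are not the contents of the actual tree.

```python
# Hypothetical tree: node ids and texts are placeholders, not the real diagram.
TREE = {
    "start": {"kind": "symptom", "text": "No connectivity to on-premises", "next": "q_physical"},
    "q_physical": {"kind": "question", "text": "Is the circuit provisioned by the provider?",
                   "yes": "q_bgp", "no": "c_physical"},
    "q_bgp": {"kind": "question", "text": "Does the gateway learn on-premises prefixes via BGP?",
              "yes": "q_vnet", "no": "c_bgp"},
    "q_vnet": {"kind": "question", "text": "Do VNet routes send traffic to the gateway?",
               "yes": "c_other", "no": "c_vnet"},
    "c_physical": {"kind": "cause", "text": "Physical or provider issue", "action": "Contact the provider"},
    "c_bgp": {"kind": "cause", "text": "BGP advertisement issue", "action": "Check on-premises BGP advertisements"},
    "c_vnet": {"kind": "cause", "text": "VNet routing issue", "action": "Review route tables"},
    "c_other": {"kind": "cause", "text": "Cause outside this sketch", "action": "Escalate"},
}


def walk(tree, answers, node_id="start"):
    """Follow the tree from the root; every question on the path must have an answer."""
    path = []
    while True:
        node = tree[node_id]
        path.append(node_id)
        if node["kind"] == "symptom":
            node_id = node["next"]
        elif node["kind"] == "question":
            # A missing answer raises KeyError, so no question can be skipped.
            node_id = node["yes"] if answers[node_id] else node["no"]
        else:  # cause node: report it together with its recommended action
            return node["text"], node["action"], path


cause, action, path = walk(TREE, {"q_physical": True, "q_bgp": False})
print(cause, "->", action)
```

Raising an error on an unanswered question is a deliberate choice in this sketch: it enforces the rule that intermediate questions are never skipped.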