Troubleshooting Lab: Design and implement ExpressRoute options, including Global Reach, FastPath, and ExpressRoute Direct

Diagnostic Scenarios

Scenario 1: Root Cause

A network team at an industrial company reports that connectivity between the on-premises data center in Chicago and VMs hosted in an Azure VNet was lost after a maintenance window the previous night. In the Azure portal, the ExpressRoute circuit status shows Enabled and the provider status shows Provisioned.

During the investigation, the engineer collects the following information:

az network express-route show \
  --name er-chicago-prod \
  --resource-group rg-network \
  --query "{circuitStatus:circuitProvisioningState, provider:serviceProviderProvisioningState}"

{
  "circuitStatus": "Enabled",
  "provider": "Provisioned"
}

az network express-route peering list \
  --circuit-name er-chicago-prod \
  --resource-group rg-network \
  --query "[].{type:peeringType, state:state, primaryPrefix:primaryPeerAddressPrefix}"

[
  {
    "type": "AzurePrivatePeering",
    "state": "Disabled",
    "primaryPrefix": "10.0.0.0/30"
  }
]

The engineer also verifies that the virtual network gateway associated with the VNet is in Succeeded state and that no routes were changed in the on-premises routing tables during maintenance. The circuit was migrated to a new resource group during this maintenance window.

What is the root cause of the connectivity loss?

A) The circuit was moved to a new resource group, which automatically disassociates the virtual network gateway from the connection resource
B) Private peering was disabled, preventing the establishment of BGP sessions between Microsoft edge routers and on-premises equipment
C) The virtual network gateway lost the static route to the on-premises network after the circuit migration
D) The peering prefix 10.0.0.0/30 conflicts with the VNet address space, blocking the BGP session


Scenario 2: Action Decision

A logistics company uses ExpressRoute Global Reach to connect their data centers in São Paulo and Buenos Aires through Microsoft's backbone. The security team identified that, due to regulatory requirements in the sector, traffic between the two on-premises sites can no longer transit through Microsoft's network starting at midnight on the next business day. Both circuits must continue operating normally to access VNets in Azure, where critical cargo tracking systems run in continuous production.

The cause is identified: Global Reach between the two circuits needs to be removed. The approved maintenance window is 30 minutes, starting at 11:30 PM the night before the regulatory deadline.

What is the correct action to take within the maintenance window?

A) Delete both ExpressRoute circuits and reprovision them without enabling Global Reach, ensuring full compliance
B) Remove only the Global Reach association between the two circuits, preserving the circuits and their connections to VNets intact
C) Disable private peering on both circuits during the window and re-enable it without reconfiguring Global Reach
D) Reconfigure on-premises routing tables to block prefixes advertised via Global Reach, without changing Azure configuration


Scenario 3: Root Cause

A financial services company provisioned ExpressRoute Direct with 100 Gbps ports at a peering location in New York. Over this port pair, three logical circuits were created for three different business units. The team reports that the third business unit's circuit, named er-direct-bu3, has NotProvisioned status and no traffic flows through it.

The engineer collects the following data:

az network express-route list \
  --resource-group rg-direct-ny \
  --query "[].{name:name, bandwidth:serviceProviderProperties.bandwidthInMbps, state:circuitProvisioningState}"

[
  { "name": "er-direct-bu1", "bandwidth": 50000, "state": "Enabled" },
  { "name": "er-direct-bu2", "bandwidth": 40000, "state": "Enabled" },
  { "name": "er-direct-bu3", "bandwidth": 20000, "state": "NotProvisioned" }
]

The engineer also verifies that the private peering of circuit er-direct-bu3 is configured correctly, that BGP sessions are attempting to establish, and that there are no health alerts in the ExpressRoute Direct Port dashboard in the Azure portal.

What is the root cause of the NotProvisioned state on the third circuit?

A) The private peering of circuit er-direct-bu3 was configured before the circuit was enabled, which corrupts the provisioning state
B) BGP sessions cannot be established while the circuit is in NotProvisioned state, which indicates physical layer failure of the port
C) The sum of the three circuits' bandwidth exceeds the 100 Gbps port capacity, preventing the third circuit from being provisioned
D) Circuit er-direct-bu3 belongs to a different resource group than the other two, which blocks shared provisioning over the same ExpressRoute Direct Port


Scenario 4: Diagnostic Sequence

An engineer receives the following report: "VMs in a VNet peered via VNet Peering with the main VNet cannot be reached from the on-premises environment. The main VNet has normal on-premises connectivity via ExpressRoute. FastPath is enabled on the main VNet's gateway."

The engineer has the following investigation steps available:

  • Step P: Verify if VNet Peering is configured with "Allow Gateway Transit" option on the main VNet and "Use Remote Gateway" on the peered VNet
  • Step Q: Confirm that the virtual network gateway is in Succeeded state and associated with the correct ExpressRoute circuit
  • Step R: Check effective route tables on NICs of peered VNet VMs to confirm if the on-premises route is present
  • Step S: Confirm that FastPath is enabled on the connection and that the gateway SKU supports FastPath (for example, UltraPerformance or ErGw3AZ)
  • Step T: Test direct connectivity between on-premises and a main VNet VM to isolate if the problem is general or restricted to the peered VNet

Which diagnostic sequence is the most efficient and technically correct for this scenario?

A) S, Q, P, T, R
B) T, Q, P, R, S
C) Q, S, T, P, R
D) T, P, R, Q, S


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue is in the peering list command output: the AzurePrivatePeering has "state": "Disabled". Without active private peering, no BGP session can be established between the Microsoft Enterprise Edge routers (MSEEs) and the on-premises equipment, so no routes are exchanged and all data connectivity is interrupted. The circuit status (Enabled) and the provider status (Provisioned) show that the provisioning layer is healthy; it is the routing control plane that is inactive.

The information about resource group migration is the scenario's irrelevant distractor. Moving a circuit between resource groups in Azure does not alter peering states or disassociate connections; it's a management metadata operation, not a network configuration operation.

The most dangerous distractor is A, because the resource group migration is explicitly mentioned in the statement and creates a convincing but false temporal correlation. Acting on it would lead the engineer to recreate the connection resource and gateway while the peering stays disabled, so the failure would persist.
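
The check that exposes the disabled peering can be automated. The following is a minimal sketch in Python that post-processes the JSON printed by the scenario's `az network express-route peering list` query. The function name `find_inactive_peerings` is hypothetical, and the sample data is copied from the scenario output rather than fetched from Azure.

```python
import json

# JSON copied from the scenario's `az network express-route peering list` output.
PEERING_LIST_OUTPUT = """
[
  {
    "type": "AzurePrivatePeering",
    "state": "Disabled",
    "primaryPrefix": "10.0.0.0/30"
  }
]
"""


def find_inactive_peerings(cli_json: str) -> list[str]:
    """Return the peering types whose state is anything other than 'Enabled'."""
    peerings = json.loads(cli_json)
    return [p["type"] for p in peerings if p.get("state") != "Enabled"]


if __name__ == "__main__":
    for peering_type in find_inactive_peerings(PEERING_LIST_OUTPUT):
        print(f"{peering_type} is not enabled: no BGP session, no route exchange")
```

Running a check like this before looking at gateways or resource groups keeps the investigation on the evidence in the CLI output instead of on the maintenance narrative.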


Answer Key: Scenario 2

Answer: B

Global Reach is a point-to-point association between two specific circuits. Removing this association terminates only the path between the two on-premises sites, without affecting either circuit's connectivity to Azure VNets. This is exactly the surgical action the scenario requires: comply with the regulatory requirement without introducing downtime to critical production systems.

Alternative A is the most drastic action: deleting the circuits would interrupt all Azure connectivity, including the production tracking systems, which the scenario constraints explicitly prohibit. Alternative C, disabling private peering, would also bring down VNet connectivity and violate the same constraint. Alternative D only filters routes on the on-premises side and leaves the Global Reach association active in Azure; it also does not guarantee that both sites block the traffic.
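
To show how narrow the change in answer B is, the sketch below builds (without running) the Azure CLI call that deletes a single Global Reach connection under private peering. It assumes the `az network express-route peering connection delete` command with the flags shown; confirm them with `--help` before use. The function name and every argument value (resource group, circuit name, connection name) are hypothetical placeholders, since the scenario does not name them.

```python
import shlex


def global_reach_delete_command(resource_group: str, circuit_name: str,
                                connection_name: str) -> list[str]:
    """Build (but do not run) the CLI call that deletes one Global Reach connection.

    Only the connection under private peering is targeted; the circuit, its
    peering, and its VNet connections are not referenced by this command.
    """
    return [
        "az", "network", "express-route", "peering", "connection", "delete",
        "--resource-group", resource_group,
        "--circuit-name", circuit_name,
        "--peering-name", "AzurePrivatePeering",
        "--name", connection_name,
    ]


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    cmd = global_reach_delete_command("rg-network", "er-saopaulo", "gr-sao-bue")
    print(shlex.join(cmd))
```

Printing the command first gives the change reviewer something concrete to approve before the 30-minute window opens.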


Answer Key: Scenario 3

Answer: C

The sum of the three circuits' bandwidth is 50,000 + 40,000 + 20,000 = 110,000 Mbps (110 Gbps), which exceeds the 100 Gbps port capacity that the scenario treats as the ceiling. The combined bandwidth of the logical circuits provisioned over one port pair cannot exceed the limit that applies to that pair, so the third circuit, which was the last to be created, remains in NotProvisioned state. In a real deployment, compare the total against the bandwidth limit documented for the ExpressRoute Direct resource, which can differ from the nominal port speed.
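
The arithmetic can be reproduced with a short Python sketch. The bandwidth values are copied from the scenario's CLI output; the function name `circuits_over_capacity` is hypothetical, and the code assumes the list order matches creation order, as the explanation states for the third circuit.

```python
# Values copied from the scenario's `az network express-route list` output (Mbps).
CIRCUITS = [
    {"name": "er-direct-bu1", "bandwidth": 50000},
    {"name": "er-direct-bu2", "bandwidth": 40000},
    {"name": "er-direct-bu3", "bandwidth": 20000},
]

PORT_CAPACITY_MBPS = 100_000  # the 100 Gbps limit the scenario applies


def circuits_over_capacity(circuits, capacity_mbps):
    """Walk the circuits in creation order and return those that no longer fit."""
    allocated = 0
    rejected = []
    for circuit in circuits:
        if allocated + circuit["bandwidth"] > capacity_mbps:
            rejected.append(circuit["name"])
        else:
            allocated += circuit["bandwidth"]
    return rejected


if __name__ == "__main__":
    total = sum(c["bandwidth"] for c in CIRCUITS)
    print(f"requested {total} Mbps against {PORT_CAPACITY_MBPS} Mbps")
    print("does not fit:", circuits_over_capacity(CIRCUITS, PORT_CAPACITY_MBPS))
```

The capacity is a parameter so the same check can be rerun against whatever limit applies to the port pair being investigated.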

The information about BGP sessions attempting to establish and private peering being configured correctly is irrelevant in this context: the problem occurs at the bandwidth capacity layer, before any BGP negotiation. The ExpressRoute Direct Port dashboard without alerts might confuse diagnosis, but physical port health alerts are distinct from allocated bandwidth capacity alerts.

The most dangerous distractor is B, as the NotProvisioned state does indeed prevent BGP, but the engineer would be confusing symptom with cause. Investigating the physical layer would be an expensive and time-consuming path with no result.


Answer Key: Scenario 4

Answer: B

The correct sequence is T, Q, P, R, S, which follows progressive diagnostic logic from general to specific:

  • T first: confirming that the main VNet has normal on-premises connectivity isolates whether the problem is general (circuit or gateway) or restricted to the peered VNet. If T fails, the scope changes completely.
  • Q second: validating gateway state ensures that the base infrastructure is healthy before investigating peering configurations.
  • P third: checking the "Allow Gateway Transit" and "Use Remote Gateway" settings is the step most likely to reveal the root cause in this scenario, because without these options the peered VNet doesn't receive on-premises routes via the main VNet's gateway.
  • R fourth: inspecting effective routes on the peered VNet NICs confirms whether on-premises routes are actually being propagated once P has been validated.
  • S last: verifying FastPath compatibility is relevant, but FastPath doesn't affect peered VNet reachability, so it should be the last step, only to rule out a secondary hypothesis.

Starting with S (alternative A) would be the most common error, as FastPath is explicitly mentioned in the statement and attracts attention, but it's not the cause of peered VNet failures.
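
The general-to-specific ordering can be expressed as an ordered list of checks that stops at the first failure. The following Python sketch is illustrative only: `run_sequence`, `build_sequence`, and the sample findings are hypothetical, and each check is a stand-in for the manual verification described in steps T, Q, P, R, and S.

```python
from typing import Callable, Optional

# Each check returns True when the component looks healthy.
Check = Callable[[], bool]


def run_sequence(steps: list[tuple[str, Check]]) -> Optional[str]:
    """Run checks in order and return the label of the first one that fails."""
    for label, check in steps:
        if not check():
            return label
    return None


def build_sequence(findings: dict[str, bool]) -> list[tuple[str, Check]]:
    """Order the steps as T, Q, P, R, S (answer B)."""
    order = ["T", "Q", "P", "R", "S"]
    return [(step, (lambda s=step: findings[s])) for step in order]


if __name__ == "__main__":
    # Hypothetical findings: gateway transit is not configured on the peering.
    findings = {"T": True, "Q": True, "P": False, "R": False, "S": True}
    print("first failing step:", run_sequence(build_sequence(findings)))
```

With these sample findings the sequence stops at P, before the route inspection in R or the FastPath check in S is needed.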


Troubleshooting Tree: Design and Implement ExpressRoute Options

Color Legend:

Color | Node Type
Dark Blue | Initial symptom (entry point)
Blue | Diagnostic question
Red | Identified cause or failure state
Green | Recommended action or resolution
Orange | Validation or intermediate verification

To use this tree when facing a real problem, start with the root node by identifying the connectivity symptom. At each question node, observe what is directly verifiable: provisioning state in the portal, CLI command output, BGP state, peering configurations. Follow the path corresponding to what you observe until you reach a red node (identified cause) or green node (recommended action). Never skip steps: the question order was designed to eliminate hypotheses progressively, from most general to most specific, avoiding premature corrective actions that may amplify impact.