Troubleshooting Lab: Select an ExpressRoute Connectivity Model

Diagnostic Scenarios

Scenario 1: Root Cause

A company's network team reports that an ExpressRoute circuit provisioned three days ago shows status "Provider status: Not provisioned" in the Azure portal, even after the connectivity partner confirmed that the physical link is already established and active on their side.

The circuit uses the Point-to-point Ethernet model. The company is located in São Paulo and the chosen peering location was Campinas. The service key was successfully generated in the Azure portal four days ago. The operations team confirms that BGP has not yet been configured on either side.

Information collected:

ExpressRoute Circuit
Provider status : Not provisioned
Circuit status : Enabled
Peering location : Campinas
Bandwidth : 1 Gbps
SKU : Standard
Service key : a1b2c3d4-xxxx-xxxx-xxxx-xxxxxxxxxxxx

The responsible engineer suspects the problem is the absence of BGP configuration and opens a ticket to configure the sessions immediately.

What is the root cause of the observed status?

A) The absence of BGP configuration prevents the provider from signaling the circuit as provisioned, since BGP is necessary to complete the provisioning handshake.

B) The provider has not yet configured their end of the circuit in Microsoft's provisioning system, which is a mandatory step that precedes any routing configuration by the customer.

C) The Standard SKU does not support the Point-to-point Ethernet model at the Campinas peering location, generating an inconsistent provisioning state.

D) The service key was generated before the physical link was available, making it invalid and requiring a new circuit to be created.


Scenario 2: Action Decision

A financial organization identified that their 10 Gbps ExpressRoute circuit based on ExpressRoute Direct is consistently operating above 85% utilization during business hours, causing packet drops in bursts. The cause was confirmed by the NOC team: traffic volume has grown beyond the originally planned capacity.

The environment has the following constraints:

  • The physical port pair at the peering location is 100 Gbps
  • There are currently 3 logical circuits provisioned over this port pair, totaling 10 Gbps of allocated bandwidth
  • Budget for new circuits has been approved
  • Any change that causes total production traffic interruption needs to be scheduled with a 72-hour advance maintenance window
  • The problem is currently causing production impact

What is the correct action to take at this moment?

A) Request the provider to immediately replace the port pair with a higher capacity one, since the bottleneck is in the physical infrastructure.

B) Provision a new additional logical circuit over the same existing physical port pair, increasing available bandwidth without the need for interruption and without requiring a maintenance window for existing traffic.

C) Wait for the 72-hour maintenance window opening to reconfigure existing circuits with higher bandwidth individually.

D) Immediately migrate the environment to the Any-to-any (IPVPN) model, which distributes load across multiple WAN paths and solves the saturation problem without requiring a maintenance window.


Scenario 3: Root Cause

A company with equipment installed in a colocation datacenter in São Paulo tries to establish connectivity with Azure using the CloudExchange colocation model. The Exchange provider confirmed that the virtual cross-connection was created between the customer router and the MSEE. However, BGP sessions remain in Idle state after 48 hours.

Information collected by the network engineer:

# Customer router output (IOS-XE)

show bgp summary
Neighbor   AS     State  Up/Down  Prefixes Rcvd
12.0.0.1   12076  Idle   never    0
12.0.0.2   12076  Idle   never    0

# Applied peering configuration
neighbor 12.0.0.1 remote-as 12076
neighbor 12.0.0.1 ebgp-multihop 5
neighbor 12.0.0.2 remote-as 12076
neighbor 12.0.0.2 ebgp-multihop 5

The security team reports that, during the previous week, it updated the router ACLs to block inbound TCP connections originating from outside the corporate AS. The engineer dismisses this information, reasoning that BGP connections are initiated by the router itself.

The router can ping addresses 12.0.0.1 and 12.0.0.2. The service key was sent to the provider and the circuit status is Enabled / Provisioned.

What is the root cause of BGP sessions in Idle state?

A) The ebgp-multihop 5 parameter is configured incorrectly; for direct eBGP connections with MSEE, TTL should be 1 (without multihop), and this value prevents session establishment.

B) The circuit has private peering not yet configured in the Azure portal, since Provisioned status only reflects physical circuit provisioning, not peering configuration.

C) The ACLs updated by the security team are blocking inbound TCP connections on port 179, preventing MSEE from initiating the BGP session back to the customer router.

D) AS 12076 is reserved for Microsoft and cannot be used as remote-as in private peering configurations; the customer should use the public ASN assigned to their circuit.


Scenario 4: Diagnostic Sequence

An engineer receives a report that, after migrating the ExpressRoute connectivity model from Point-to-point Ethernet to CloudExchange colocation, routes that Azure previously advertised correctly to the on-premises network stopped appearing in the routing table of the local equipment.

The engineer has the following investigation steps available:

  1. Verify if private peering is configured and active in the Azure portal for the new circuit
  2. Confirm if the virtual cross-connection was completed by the Exchange provider and if the physical layer is active
  3. Check the BGP routing table on the customer router to identify which prefixes, if any, are being received
  4. Confirm if the old circuit (Point-to-point) was deprovisioned before the new circuit was completely validated
  5. Verify if routes advertised by Azure are present in the MSEE routing table using the portal route verification tool

What is the correct diagnostic sequence?

A) 2 → 1 → 5 → 3 → 4

B) 1 → 3 → 2 → 4 → 5

C) 3 → 5 → 1 → 2 → 4

D) 4 → 2 → 1 → 5 → 3


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The status "Provider status: Not provisioned" specifically indicates that the connectivity provider has not yet completed the provisioning step of their circuit end in Microsoft's system. In the Point-to-point Ethernet model, after the customer generates the service key and shares it with the provider, it's the provider who must signal to Microsoft that the physical link and their layer configuration are ready. Until this signaling occurs, the status remains "Not provisioned", regardless of what the provider reports verbally.

The decisive clue in the scenario is that the circuit shows "Circuit status: Enabled" but "Provider status: Not provisioned", which is an intermediate state documented in the ExpressRoute circuit lifecycle. This specific state points directly to the pending action from the provider in Microsoft's provisioning system, not to customer configurations.

The information about the absence of BGP configuration is irrelevant for this diagnosis. BGP only comes into play after the circuit is completely provisioned by both sides. The engineer in the scenario made the classic mistake of advancing to the routing layer (layer 3) before validating that the provisioning layer (operational) was completed. Acting based on distractor A and configuring BGP before provisioning is complete neither resolves nor advances the problem.
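The reasoning above can be sketched as a small lookup that maps the two status fields to the party that has to act next. This is an illustrative sketch only: `next_action` is a hypothetical helper, and the status strings are the ones quoted in the scenarios, not values read from any Azure API.

```python
def next_action(circuit_status: str, provider_status: str) -> str:
    """Map the two circuit states to the party that has to act next.

    Hypothetical helper; the status strings mirror the scenario text.
    """
    circuit = circuit_status.strip().lower()
    provider = provider_status.strip().lower().replace(" ", "")

    if circuit != "enabled":
        return "customer: the circuit is not enabled yet"
    if provider == "notprovisioned":
        # The provider still has to complete its side using the service key.
        return "provider: complete provisioning with the service key"
    if provider == "provisioned":
        # Only now does routing (peering and BGP) become relevant.
        return "customer: configure peering and BGP"
    return "unknown: state not covered by this sketch"


print(next_action("Enabled", "Not provisioned"))
```

For the state in Scenario 1 the result begins with `provider:`, which is why opening a BGP ticket does not move the circuit forward.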


Answer Key: Scenario 2

Answer: B

The central technical differentiator of ExpressRoute Direct is precisely the capability to create multiple independent logical circuits over the same physical port pair. Since the port pair is 100 Gbps and only 10 Gbps are currently allocated, physical capacity is available. Provisioning a new additional logical circuit resolves the saturation problem without interrupting existing circuits, making the action possible without requiring a 72-hour maintenance window for production traffic.

Alternative A is incorrect because the bottleneck is not in the physical port pair (100 Gbps with only 10 Gbps in use), but in the capacity of logical circuits provisioned over it. Replacing the port pair would be an unnecessary and disruptive action.

Alternative C represents a technically valid action applied at the wrong time: waiting 72 hours while production is impacted and a non-disruptive solution is available is the wrong decision under the scenario's constraints.

Alternative D is the most dangerous: migrating the connectivity model during an active production incident introduces high risk and operational complexity, and would certainly require a maintenance window, violating the scenario's own constraint.
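The capacity argument for answer B comes down to simple arithmetic, sketched below. The function names are hypothetical, the 10 Gbps size of the new circuit is an assumed example, and the sketch treats the physical port size as the ceiling, which is a simplification: the scenario does not state the actual allocation limits.

```python
def port_headroom_gbps(port_gbps: int, allocated_gbps: int) -> int:
    """Physical capacity not yet allocated to logical circuits."""
    return port_gbps - allocated_gbps


def can_add_circuit(port_gbps: int, allocated_gbps: int, new_circuit_gbps: int) -> bool:
    """True if a new logical circuit fits within the remaining headroom."""
    return new_circuit_gbps <= port_headroom_gbps(port_gbps, allocated_gbps)


# Values from the scenario: 100 Gbps port pair, 10 Gbps allocated in total.
headroom = port_headroom_gbps(port_gbps=100, allocated_gbps=10)
print(headroom)                       # 90
print(can_add_circuit(100, 10, 10))   # True (assumed 10 Gbps new circuit)
```

With 90 Gbps unallocated, the bottleneck is in the logical allocation rather than the port pair, which is what rules out alternative A.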


Answer Key: Scenario 3

Answer: C

The root cause is that the updated ACLs block inbound TCP on port 179. BGP runs over TCP port 179, and the engineer's assumption that the router initiates the session does not help: the TCP three-way handshake requires traffic in both directions. An ACL that drops inbound TCP from outside the corporate AS, applied as a stateless filter, discards the MSEE's SYN-ACK replies to the router's own connection attempts as well as any SYN the MSEE sends to open the session from its side. Since the handshake cannot complete in either direction, the sessions remain in Idle state indefinitely.

The decisive clue is the combination of two facts: ping to MSEE addresses works (layer 3 OK, ACL doesn't block ICMP) and ACLs were recently changed to block inbound TCP from outside the corporate AS. AS 12076 (Microsoft) is external to the corporate AS, therefore its TCP packets would be blocked.

The ebgp-multihop 5 setting is a deliberate distractor. In a CloudExchange colocation, the customer router and the MSEE are in the same facility and directly connected at layer 2, so ebgp-multihop is not necessary, but its presence does not by itself prevent BGP session establishment.

The most dangerous distractor is A, since ebgp-multihop draws attention for being an atypical configuration, leading the engineer to focus on TTL adjustment while the real problem is in the network security policy. Acting based on A would generate rework without solving the failure.
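The two clues (ping works, TCP does not) can be illustrated with a toy model of a stateless inbound filter. This is a sketch under stated assumptions: `Packet` and `inbound_permitted` are hypothetical names, the customer ASN 65010 is invented for illustration, and the rule is modeled by source AS as the scenario describes it, whereas a real ACL matches address ranges and ports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    protocol: str      # "tcp" or "icmp"
    source_asn: int    # AS the packet originates from
    description: str


CORPORATE_ASN = 65010  # hypothetical customer ASN, not from the scenario
MICROSOFT_ASN = 12076  # from the scenario


def inbound_permitted(packet: Packet) -> bool:
    """Stateless rule: drop inbound TCP sourced outside the corporate AS."""
    if packet.protocol == "tcp" and packet.source_asn != CORPORATE_ASN:
        return False
    return True


inbound = [
    Packet("icmp", MICROSOFT_ASN, "echo reply from MSEE"),
    Packet("tcp", MICROSOFT_ASN, "SYN-ACK from MSEE, source port 179"),
    Packet("tcp", MICROSOFT_ASN, "SYN from MSEE to port 179"),
]

for packet in inbound:
    verdict = "permit" if inbound_permitted(packet) else "deny"
    print(f"{verdict:6} {packet.description}")
```

In this model the ICMP reply passes while both TCP packets are dropped, matching the symptom of a reachable neighbor whose BGP session never establishes.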


Answer Key: Scenario 4

Answer: A

The correct sequence is 2 → 1 → 5 → 3 → 4, which follows the logic of progressive diagnosis by layers: physical before logical, control plane before data plane, and current state validation before investigating past actions.

Step 2 first: confirming that the virtual cross-connection exists and the physical layer is active is the prerequisite for any subsequent analysis. Without physical connectivity, the remaining steps are irrelevant.

Step 1 next: verifying if private peering is configured in the Azure portal for the new circuit. Without configured peering, no routes will be advertised regardless of physical state.

Step 5 after: verifying routes from the MSEE perspective in the Azure portal to confirm that Azure is indeed advertising the expected prefixes. This isolates whether the problem is on the Azure side or in the path to the customer.

Step 3 after MSEE validation: with confirmation that routes exist in MSEE, verifying what's reaching the customer router points to where the problem is occurring in the return path.

Step 4 last: investigating if the old circuit was prematurely deprovisioned is a hypothesis about a past action that only makes sense to verify after exhausting current technical causes.

Sequences B, C, and D make the mistake of advancing to higher layers (routing, customer BGP table) before validating foundations (physical layer and peering configuration), which can lead to incorrect diagnoses or rework due to lack of visibility into the real infrastructure state.
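The layered order can be expressed as a walk that stops at the first failing check, sketched below. The step numbers and descriptions come from the scenario; `first_failing_step` and the sample findings are hypothetical and only illustrate the ordering.

```python
from typing import Dict, List, Optional, Tuple

# (step number from the scenario, what the step confirms), in diagnostic order
SEQUENCE: List[Tuple[int, str]] = [
    (2, "cross-connection completed and physical layer active"),
    (1, "private peering configured and active on the new circuit"),
    (5, "expected routes present in the MSEE routing table"),
    (3, "expected prefixes received on the customer router"),
    (4, "old circuit kept until the new one was validated"),
]


def first_failing_step(results: Dict[int, bool]) -> Optional[int]:
    """Walk the steps in order and return the first one that fails."""
    for step, _description in SEQUENCE:
        if not results.get(step, False):
            return step
    return None


# Hypothetical findings: lower layers are fine, but the MSEE has no routes.
findings = {2: True, 1: True, 5: False, 3: False, 4: True}
print(first_failing_step(findings))  # 5
```

Because step 3 also fails in this example, starting from the customer router would surface a symptom without showing that the cause already appears one step earlier, on the Azure side.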


Troubleshooting Tree: Select an ExpressRoute Connectivity Model


Color legend:

Color        : Node type
Dark blue    : Initial symptom (entry point)
Medium blue  : Diagnostic question
Red          : Identified cause
Green        : Recommended action or resolution
Orange       : Validation or intermediate verification

To use this tree when facing a real problem, start with the root node and identify which of the four main symptoms best describes what was observed. Follow the branches by answering diagnostic questions based on what can be directly verified in the environment: Azure portal, customer router, BGP command outputs, and provider confirmations. Each path leads to an identified cause in red, followed by a recommended action in green and a validation step in orange that confirms whether the correction was effective before closing the diagnosis.