
Troubleshooting Lab: Configure Azure private peering

Diagnostic Scenarios

Scenario 1: Root Cause

A company has just completed migrating its database servers to VMs in Azure. The ExpressRoute circuit was provisioned by the provider three days ago and the status returned by the portal is Provisioned. The responsible engineer configured Azure private peering with the following parameters:

PeeringType          : AzurePrivatePeering
PeerASN              : 65515
PrimaryPeerAddress   : 192.168.100.0/30
SecondaryPeerAddress : 192.168.100.4/30
VlanId               : 200
State                : Enabled

A connection was created between the circuit and the virtual network gateway of the production VNet. The gateway is of type ExpressRoute with the Standard SKU and was provisioned a week ago. The VNet contains subnets in the 10.10.0.0/16 range.

The engineer runs a connectivity test from an on-premises server and cannot reach any VM in the VNet. When checking the routes learned by the gateway with the command below, the result is empty:

Get-AzVirtualNetworkGatewayLearnedRoute `
    -ResourceGroupName "rg-prod" `
    -VirtualNetworkGatewayName "gw-expressroute-prod"

# Output:
# (empty - no routes learned)

The provider confirmed that the physical layer of the circuit is operational and that BGP on their side is configured with ASN 65515.

What is the root cause of the absence of routes learned by the gateway?

A) The prefix 192.168.100.0/30 used as PrimaryPeerAddress overlaps with the VNet address space, causing a routing conflict.

B) The Standard SKU of the ExpressRoute gateway does not support route learning via BGP; the HighPerformance SKU is required.

C) The provider is configured with ASN 65515, but this ASN is reserved by Microsoft for internal use in Azure private peering, making the BGP session invalid.

D) The on-premises router is advertising prefixes to Azure, but Azure cannot return the VNet routes because the provider is configuring BGP with the client's ASN and not with the ASN designated for the provider side.


Scenario 2: Action Decision

The network team has identified the cause of an ExpressRoute connectivity failure as a VLAN ID mismatch: the provider assigned VLAN 300 to the private peering, but the current configuration in Azure uses VLAN 200. For this reason, the BGP session was never established.

The environment is production. The ExpressRoute circuit is the only network path between the on-premises datacenter and the VMs in Azure. A two-hour maintenance window has been approved for tonight. There is no backup circuit or site-to-site VPN configured as fallback. The application team confirmed that all dependent services have already been notified and are in maintenance mode.

What is the correct action to take at this moment?

A) Delete the ExpressRoute circuit and recreate a new one with the correct VLAN ID, to ensure that no residual configuration causes future problems.

B) Wait for the approved maintenance window and then update the peering VLAN ID in Azure from 200 to 300, aligning with the value configured by the provider.

C) Immediately update the VLAN ID in Azure without waiting for the maintenance window, since the BGP session is already inactive and there is no additional impact to production.

D) Request the provider to reconfigure their side to VLAN 200, keeping the Azure configuration unchanged, to minimize the number of changes in the environment.


Scenario 3: Root Cause

An ExpressRoute circuit with Azure private peering has been operational for months. No changes were made in the last 30 days. On Monday morning, the operations team receives connectivity outage alerts from all on-premises servers to VMs in a specific VNet. Other VNets connected to the same circuit continue working normally.

The engineer checks the peering state and finds it Enabled. The ExpressRoute gateway of the affected VNet shows the Succeeded state. The gateway logs show no BGP errors. The identity team reports that they performed a permission cleanup in Microsoft Entra ID the previous Friday, unrelated to networking.

The engineer then checks the connections associated with the affected VNet gateway:

Get-AzVirtualNetworkGatewayConnection `
    -ResourceGroupName "rg-vnet-app" `
    -Name "conn-expressroute-app"

# ProvisioningState       : Succeeded
# ConnectionStatus        : Unknown
# EgressBytesTransferred  : 0
# IngressBytesTransferred : 0

The impacted VNet has addressing 172.20.0.0/16. The other functional VNets use addresses in the 10.x.x.x space.

What is the most likely root cause for the isolated failure in this VNet?

A) The gateway BGP was automatically restarted by Azure during a platform update, and the VNet 172.20.0.0/16 prefixes have not yet been re-advertised.

B) The connection between the gateway and the ExpressRoute circuit was deleted or became invalid, as the ConnectionStatus Unknown with zero bytes transferred indicates absence of data plane, not just routing failure.

C) The address space 172.20.0.0/16 conflicted with prefixes advertised by the provider, causing Azure to suppress the route for this VNet.

D) The permission cleanup performed in Microsoft Entra ID removed the role assignment necessary for the gateway to maintain the connection active with the circuit.


Scenario 4: Diagnostic Sequence

An engineer receives a ticket reporting that on-premises servers cannot reach VMs in Azure via ExpressRoute. The circuit was recently configured. The engineer has access to the Azure portal, PowerShell, and the colocation provider team.

The available investigation steps are:

  1. Verify if a connection exists between the Virtual Network Gateway and the ExpressRoute circuit.
  2. Confirm with the provider if the physical status of the circuit (layer 1) is operational.
  3. Execute Get-AzVirtualNetworkGatewayLearnedRoute to verify if the gateway is learning routes via BGP.
  4. Check the Azure private peering state on the circuit and confirm if the BGP session is Enabled.
  5. Validate if on-premises prefixes are being correctly advertised by the customer router to the provider.

What is the correct progressive diagnostic sequence?

A) 2 -> 4 -> 1 -> 3 -> 5

B) 4 -> 2 -> 5 -> 1 -> 3

C) 1 -> 3 -> 2 -> 4 -> 5

D) 2 -> 1 -> 4 -> 3 -> 5


Answer Key and Explanations

Answer Key: Scenario 1

Answer: C

ASNs 65515 through 65520 are reserved by Microsoft for internal use and cannot be used as the peer ASN in Azure private peering. When the provider configures BGP with ASN 65515 on the peer side, Azure does not accept the session, as it interprets the announcement as a loop or internal conflict. The practical result is that no routes are learned by the gateway, which is exactly the symptom described.

The clue in the statement is the provider's confirmation of the ASN configured on their side, which is the same value entered as PeerASN. The peer side must use its own ASN, public or private, that falls outside the range reserved by Microsoft.
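The reserved-range check described above can be sketched as a small helper. This is a minimal illustration, not part of the Az module: the function name `Test-PeerAsnAllowed` is hypothetical, and the only rule it encodes is that ASNs 65515 through 65520 are reserved by Microsoft.

```powershell
# Sketch: flag peer ASNs that fall in the range Microsoft reserves for internal use.
# Test-PeerAsnAllowed is a hypothetical helper name, not an Az cmdlet.
function Test-PeerAsnAllowed {
    param(
        [Parameter(Mandatory)]
        [uint32]$PeerAsn
    )

    # 65515 through 65520 cannot be used as the peer ASN in private peering.
    $isReserved = ($PeerAsn -ge 65515) -and ($PeerAsn -le 65520)
    return (-not $isReserved)
}

Test-PeerAsnAllowed -PeerAsn 65515   # False: reserved by Microsoft
Test-PeerAsnAllowed -PeerAsn 64512   # True: private ASN outside the reserved range
```

Running a check like this before submitting the peering configuration catches the problem earlier than waiting for an empty learned-routes table.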

Alternative A is a classic distractor: the /30 addresses used in peering (192.168.100.x) are the point-to-point BGP session addresses, completely separate from the VNet space (10.10.0.0/16). There is no overlap. Alternative B is false because the Standard SKU fully supports BGP and route learning. Alternative D describes an asymmetric routing scenario that doesn't apply here, as the problem occurs before BGP session establishment.
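To confirm that the peering /30 prefixes and the VNet address space really do not overlap, the ranges can be compared numerically. The sketch below is one way to do that for IPv4 only; the helper names `ConvertTo-IPv4Range` and `Test-CidrOverlap` are hypothetical and not part of any Azure module.

```powershell
# Sketch: check whether two IPv4 CIDR blocks overlap.
# ConvertTo-IPv4Range and Test-CidrOverlap are hypothetical helper names.
function ConvertTo-IPv4Range {
    param([Parameter(Mandatory)][string]$Cidr)

    $address, $prefixLength = $Cidr -split '/'
    $octets = [System.Net.IPAddress]::Parse($address).GetAddressBytes()

    # Fold the four octets into a single integer (a 64-bit value avoids sign issues).
    [long]$ip = 0
    foreach ($octet in $octets) { $ip = $ip * 256 + $octet }

    # Block size is 2^(32 - prefix length); align the address down to the block start.
    [long]$size = [math]::Pow(2, 32 - [int]$prefixLength)
    [long]$rangeStart = $ip - ($ip % $size)

    return [pscustomobject]@{
        Start = $rangeStart
        End   = $rangeStart + $size - 1
    }
}

function Test-CidrOverlap {
    param(
        [Parameter(Mandatory)][string]$First,
        [Parameter(Mandatory)][string]$Second
    )

    $a = ConvertTo-IPv4Range -Cidr $First
    $b = ConvertTo-IPv4Range -Cidr $Second

    # Two ranges overlap when each one starts before the other ends.
    return (($a.Start -le $b.End) -and ($b.Start -le $a.End))
}

Test-CidrOverlap -First "192.168.100.0/30" -Second "10.10.0.0/16"   # False
Test-CidrOverlap -First "192.168.100.4/30" -Second "10.10.0.0/16"   # False
```

Both peering prefixes from the scenario fall outside 10.10.0.0/16, which is why alternative A can be ruled out from the statement alone.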

The most dangerous distractor is A, as it leads the engineer to investigate peering addressing instead of checking the validity of configured ASNs.


Answer Key: Scenario 2

Answer: B

The cause is identified, the fix is technically simple (correct the VLAN ID in Azure), and a maintenance window has already been approved for tonight. Waiting for the window is the correct action: even though the BGP session is inactive, the fix is still a change to a production environment with declared dependencies, and the window was formally approved in alignment with the application team.

Alternative C seems reasonable at first glance because the BGP session is already inactive. However, ignoring the approved maintenance window violates the established change management process, especially in an environment without fallback. Alternative A is technically incorrect because deleting the circuit is a destructive and unnecessary action to correct only the VLAN ID. Alternative D might work in theory, but introduces a second change in the provider environment when the simpler and more controlled correction is on the Azure side.

Choosing alternative C would expose the organization to the risk of an unplanned production change made without the formal support that the window provides, even if the technical impact is minimal in this case.
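For the change itself, to be run during the approved window, the VLAN ID correction might look like the sketch below. It assumes the Az.Network module and an authenticated session. The wrapper name `Update-PrivatePeeringVlanId` and the resource names in the usage comment are hypothetical, since the scenario does not name the circuit. The existing peering values are read back and re-applied so that only the VLAN ID changes.

```powershell
# Sketch: change only the VLAN ID of the Azure private peering on a circuit.
# Requires the Az.Network module and a signed-in session (Connect-AzAccount).
# Update-PrivatePeeringVlanId is a hypothetical wrapper name.
function Update-PrivatePeeringVlanId {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$CircuitName,
        [Parameter(Mandatory)][int]$VlanId
    )

    $circuit = Get-AzExpressRouteCircuit -Name $CircuitName -ResourceGroupName $ResourceGroupName
    $peering = Get-AzExpressRouteCircuitPeeringConfig -Name "AzurePrivatePeering" -ExpressRouteCircuit $circuit

    # Re-apply the current peering values and replace only the VLAN ID.
    Set-AzExpressRouteCircuitPeeringConfig -Name "AzurePrivatePeering" `
        -ExpressRouteCircuit $circuit `
        -PeeringType AzurePrivatePeering `
        -PeerASN $peering.PeerASN `
        -PrimaryPeerAddressPrefix $peering.PrimaryPeerAddressPrefix `
        -SecondaryPeerAddressPrefix $peering.SecondaryPeerAddressPrefix `
        -VlanId $VlanId | Out-Null

    # Commit the modified circuit object to Azure.
    Set-AzExpressRouteCircuit -ExpressRouteCircuit $circuit
}

# Example call (placeholder names, not taken from the scenario):
# Update-PrivatePeeringVlanId -ResourceGroupName "rg-prod" -CircuitName "er-circuit-prod" -VlanId 300
```

Because the update modifies the peering in place, the circuit, its service key, and the gateway connection are preserved, which is the main reason alternative A is unnecessary.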


Answer Key: Scenario 3

Answer: B

The central symptom is ConnectionStatus: Unknown combined with zero bytes transferred in both directions. This specific state indicates that the connection's data plane is completely inactive, which is consistent with a deleted, corrupted, or invalid connection. The fact that peering is Enabled and the gateway is in the Succeeded state indicates that the problem is isolated to the connection, not the circuit or the gateway itself.

The irrelevant information in the statement is the permission cleanup in Microsoft Entra ID. Identity permissions do not control the state of network connections in ExpressRoute. This detail was purposely inserted to induce alternative D, which would be the most dangerous distractor: leading the engineer to open a ticket to the identity team while the real problem is in the network connection.

Alternative A is plausible in theory, but it would be refuted by checking the learned routes and would not explain the ConnectionStatus of Unknown. Alternative C can be dismissed because a prefix conflict would cause route suppression, not a total absence of data plane with this specific status.
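The reasoning used in this answer, where an Unknown status plus zero counters in both directions points to a missing data plane, can be written as a small classifier. This is only a sketch of the scenario's logic: `Get-ConnectionDataPlaneHint` and its return labels are hypothetical, and the sample object simply mirrors the output shown in the scenario rather than a live call.

```powershell
# Sketch: interpret a connection object the way the answer key does.
# Get-ConnectionDataPlaneHint and its return labels are hypothetical.
function Get-ConnectionDataPlaneHint {
    param([Parameter(Mandatory)]$Connection)

    $zeroTraffic = ($Connection.EgressBytesTransferred -eq 0) -and
                   ($Connection.IngressBytesTransferred -eq 0)

    # Unknown status with no bytes in either direction: treat the connection itself as suspect.
    if (($Connection.ConnectionStatus -eq 'Unknown') -and $zeroTraffic) {
        return 'NoDataPlane'
    }
    if ($zeroTraffic) {
        return 'NoTrafficObserved'
    }
    return 'TrafficObserved'
}

# Sample object with the values from the scenario output.
$sample = [pscustomobject]@{
    ProvisioningState       = 'Succeeded'
    ConnectionStatus        = 'Unknown'
    EgressBytesTransferred  = 0
    IngressBytesTransferred = 0
}
Get-ConnectionDataPlaneHint -Connection $sample   # NoDataPlane
```

In practice the input would be the object returned by Get-AzVirtualNetworkGatewayConnection for the affected connection.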


Answer Key: Scenario 4

Answer: A

The correct sequence is 2 -> 4 -> 1 -> 3 -> 5, which follows the logic of progressive layer-by-layer diagnosis:

| Order | Step | Justification |
|---|---|---|
| 1 | Confirm physical layer with provider | Without physical connectivity, no logical step will help |
| 2 | Check private peering state and BGP | Confirms if BGP session was established |
| 3 | Check if connection exists | Validates the link between circuit and gateway |
| 4 | Check routes learned by gateway | Confirms if control plane is working end-to-end |
| 5 | Validate prefixes advertised by on-premises router | Investigates what is being advertised by the client |

Alternative C starts checking the connection before confirming if BGP was even established, skipping layers. Alternative B starts with peering state without first confirming the physical layer, which can generate incorrect diagnosis if the problem is physical. Alternative D reverses the order between checking the connection and checking BGP state, leading to investigating the data plane before confirming the control plane.
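The layer-by-layer order can be captured as an ordered list of checks in which the first failure marks where to stop and investigate. The sketch below assumes each check has already been reduced to a true/false result gathered from the provider, the portal, or PowerShell. The function name `Get-FirstFailedCheck` and the sample values are hypothetical.

```powershell
# Sketch: walk the diagnostic checks in order and report the first one that fails.
# Get-FirstFailedCheck is a hypothetical helper; the results below are sample values.
function Get-FirstFailedCheck {
    param(
        [Parameter(Mandatory)]
        [System.Collections.Specialized.OrderedDictionary]$Checks
    )

    foreach ($name in $Checks.Keys) {
        if (-not $Checks[$name]) { return $name }
    }
    return $null   # every check passed
}

# Order follows the sequence 2 -> 4 -> 1 -> 3 -> 5 from the answer.
$checks = [ordered]@{
    'Physical layer confirmed by provider'      = $true
    'Private peering state and BGP'             = $true
    'Gateway connection to circuit exists'      = $false
    'Gateway is learning routes'                = $false
    'On-premises prefixes are being advertised' = $false
}

Get-FirstFailedCheck -Checks $checks   # Gateway connection to circuit exists
```

Using an ordered dictionary matters here: a plain hashtable does not preserve insertion order, so the first failure reported could belong to a later layer.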


Troubleshooting Tree: Configure Azure private peering


Color legend:

| Color | Node type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or verification |

To use this tree when facing a real problem, start with the root node that describes the observed symptom and answer each diagnostic question based on what you can verify directly in the portal, via PowerShell, or with the provider. Each answer eliminates a branch and brings the diagnosis closer to the real cause. When you reach a red node, the cause is identified; the green node immediately connected to it indicates the corrective action to execute. Never skip a level: each layer of the tree validates a different plane of the solution, from the physical layer to the BGP control plane and data plane.