Troubleshooting Lab: Create and configure a local network gateway
Diagnostic Scenarios
Scenario 1: Root Cause
A network engineer reports that the Site-to-Site VPN connection between Azure and the São Paulo branch office was successfully established three weeks ago and was operating normally. After maintenance performed by the telecommunications operator last weekend, the connection now shows Connected status in the Azure portal, but traffic between VMs in the Azure VNet and on-premises servers no longer flows in either direction.
During the investigation, the engineer collects the following information:
Virtual Network Gateway:
SKU: VpnGw1
Gateway type: Vpn
VPN type: Route-based
Status: Running
Connection:
Status: Connected
Shared Key: (configured and unchanged)
IKE Protocol: IKEv2
Local Network Gateway:
IP address: 177.23.45.10
Address space: 10.50.0.0/16
On-premises VPN Device (after operator maintenance):
Current public IP: 177.23.45.88
Managed prefix: 10.50.0.0/16
The engineer also verifies that the on-premises firewall was not changed during maintenance and that the rule allowing IKE traffic (UDP 500 and 4500) remains active.
What is the root cause of the problem?
A) The VpnGw1 SKU does not support automatic reconnection after remote peer IP change, requiring an upgrade to VpnGw2.
B) The public IP address registered in the local network gateway does not match the current public IP of the on-premises VPN device after the operator's maintenance.
C) The IKEv2 protocol lost compatibility with the on-premises device after maintenance, and the connection should be reconfigured to IKEv1.
D) The shared key was invalidated by Azure after the inactivity period during the operator's maintenance.
Scenario 2: Action Decision
The infrastructure team identified that the local network gateway of a production Site-to-Site VPN connection is configured with the prefix 172.16.0.0/12, which represents the entire RFC 1918 private addressing range used by that company. This was done originally to simplify the configuration.
Recently, the company created a new development environment in Azure with VNet 172.20.0.0/16. VMs in this VNet cannot communicate with other resources in the same VNet when the VPN gateway is active in the production VNet.
The cause has been confirmed: the 172.16.0.0/12 prefix configured in the local network gateway encompasses the 172.20.0.0/16 range, causing Azure to route traffic destined for the new VNet through the VPN tunnel instead of keeping it local.
The production VPN connection cannot be interrupted. The team has permission to edit network resources but cannot cause downtime to the application that depends on the VPN.
What is the correct action to take at this moment?
A) Delete the current local network gateway and recreate it with the correct and specific prefixes, since generic prefixes cannot be replaced by specific prefixes in an existing resource.
B) Add peering between the production VNet and the development VNet to work around the routing problem without changing the local network gateway.
C) Edit the address space of the existing local network gateway, replacing 172.16.0.0/12 with the actual specific prefixes of the on-premises network, without recreating the resource or connection.
D) Create a UDR (User Defined Route) in the development VNet subnet pointing 172.20.0.0/16 to the Internet next hop, forcing traffic to ignore the VPN gateway.
Scenario 3: Root Cause
An administrator configures a new Site-to-Site VPN connection with BGP enabled. He creates the local network gateway with the following settings:
Local Network Gateway:
Name: lng-filial-rj
IP address: 200.100.50.25
Address space: 10.100.0.0/24
BGP settings:
ASN: 65001
BGP peer IP address: 10.100.0.1
The virtual network gateway was created with ASN 65515. The connection is established and the status shows Connected. However, the on-premises network routes do not appear in the effective route table of VMs in the Azure VNet, even after waiting 15 minutes.
The administrator verifies that the on-premises VPN device is correctly advertising the prefixes via BGP and that the IPsec tunnel is active. The shared key is correct on both sides. The virtual network gateway SKU is VpnGw1.
Partial output from on-premises device:
BGP session state: Active (trying to establish)
Peer address configured: 10.100.0.1
Local BGP IP: 10.100.0.1
What is the root cause of the BGP route exchange failure?
A) The VpnGw1 SKU does not support BGP, requiring at least VpnGw2 to enable this functionality.
B) The address space 10.100.0.0/24 configured in the local network gateway conflicts with the BGP peer address and prevents BGP session establishment.
C) The BGP peer address configured on the on-premises device (10.100.0.1) is the same as the device's own BGP session source IP, causing the device to try to open a BGP session with itself.
D) The virtual network gateway and local network gateway are using different ASNs, and BGP requires both sides to use the same ASN to establish the session.
Scenario 4: Diagnostic Sequence
An engineer receives the ticket: "The Site-to-Site VPN has had Disconnected status since this morning. Yesterday it was working normally."
No changes were recorded on the Azure side. The engineer has access to the Azure portal and the on-premises VPN device.
The available investigation steps are:
[P] Check if the public IP of the on-premises VPN device has changed
[Q] Check the connection status in the Azure portal and collect diagnostic logs
[R] Confirm if the shared key is identical on both sides
[S] Check if the IP registered in the local network gateway matches the device's current IP
[T] Try to recreate the VPN connection in the Azure portal
What is the correct sequence for progressive diagnosis?
A) T, Q, P, S, R
B) Q, P, S, R, T
C) S, P, Q, R, T
D) P, R, S, Q, T
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The determining clue is in the collected data: the current public IP of the on-premises VPN device after the operator's maintenance is 177.23.45.88, but the local network gateway still registers 177.23.45.10. Azure uses the local network gateway IP address to identify the remote peer for IKE negotiation. When the actual device IP doesn't match the registered one, Azure doesn't recognize IKE packets originating from the new address, and the session cannot be properly renegotiated.
The Connected status that remains in the portal is a residual state from the previous tunnel, which hasn't yet expired by timeout. This detail is deliberately misleading and represents the most common diagnostic error in this scenario: concluding that the connection is active because the portal didn't immediately display Disconnected.
The information about the on-premises firewall (UDP 500 and 4500 allowed) is irrelevant to this diagnosis: the problem isn't traffic blocking, but peer identity.
Alternative A is incorrect because the gateway SKU has no relation to reconnection after IP change. Alternative C has no technical foundation in the scenario: nothing indicates protocol incompatibility. Alternative D is incorrect because Azure doesn't invalidate shared keys due to inactivity.
Acting based on alternative C would be the most dangerous error: reconfiguring the IKE protocol on an active production connection would introduce unnecessary downtime without solving the real problem.
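The comparison behind this diagnosis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an Azure API call: the function name `peer_ip_mismatch` is hypothetical, and the two addresses are the values given in the scenario, which in practice would be read from the local network gateway and from the on-premises device.

```python
import ipaddress


def peer_ip_mismatch(lng_ip: str, device_ip: str) -> bool:
    """Return True when the IP registered in the local network gateway
    differs from the current public IP of the on-premises device."""
    return ipaddress.ip_address(lng_ip) != ipaddress.ip_address(device_ip)


# Values taken from Scenario 1.
registered = "177.23.45.10"  # IP stored in the local network gateway
current = "177.23.45.88"     # public IP of the device after maintenance

if peer_ip_mismatch(registered, current):
    print(f"Mismatch: local network gateway has {registered}, device uses {current}")
```

A mismatch reported here points to updating the IP address of the local network gateway, not to changing the SKU, the IKE version, or the shared key.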
Answer Key: Scenario 2
Answer: C
The address space of the local network gateway is an editable field at any time, without needing to recreate the resource or connection. The correct action is to replace the generic prefix 172.16.0.0/12 with the actual specific prefixes of the on-premises network, correcting the improper routing without causing downtime.
The critical constraint of the scenario is that the production VPN connection cannot be interrupted. This eliminates alternative A, which requires deletion and recreation of the local network gateway (an operation that brings down the connection). The premise that generic prefixes cannot be replaced by specific prefixes is technically false.
Alternative B works around the problem but doesn't solve it: peering between VNets doesn't correct the overly broad route that the local network gateway introduces, and it may add unnecessary complexity. Alternative D is dangerous: pointing 172.20.0.0/16 at the Internet next hop would break the development VMs' connectivity inside Azure itself, not just over the VPN.
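The overlap that causes the misrouting can be checked with Python's standard `ipaddress` module. The sketch below is an assumption-laden illustration: the helper name `overlapping_prefixes` is hypothetical, and the two /24 prefixes in the second call are invented placeholders standing in for the real on-premises prefixes, which the scenario does not list.

```python
import ipaddress


def overlapping_prefixes(lng_prefixes, vnet_prefixes):
    """Return (lng_prefix, vnet_prefix) pairs whose address ranges overlap."""
    pairs = []
    for lng in lng_prefixes:
        lng_net = ipaddress.ip_network(lng)
        for vnet in vnet_prefixes:
            if lng_net.overlaps(ipaddress.ip_network(vnet)):
                pairs.append((lng, vnet))
    return pairs


# Current state: the broad prefix swallows the development VNet.
print(overlapping_prefixes(["172.16.0.0/12"], ["172.20.0.0/16"]))
# [('172.16.0.0/12', '172.20.0.0/16')]

# After editing the local network gateway (placeholder on-premises prefixes).
print(overlapping_prefixes(["172.16.10.0/24", "172.16.20.0/24"], ["172.20.0.0/16"]))
# []
```

Running a check like this before saving the new address space helps confirm that none of the specific prefixes still collides with an Azure VNet.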
Answer Key: Scenario 3
Answer: C
The on-premises device output reveals that the configured BGP peer address is 10.100.0.1 and the local BGP IP is also 10.100.0.1. That is, the device was configured to open a BGP session with itself. The session remains in Active state (trying to establish) because it never finds an external peer to respond.
The root cause is an incorrect configuration on the on-premises device: its BGP peer address should be the Azure BGP peer IP address (obtained from the virtual network gateway settings), not the device's own local address.
The address space 10.100.0.0/24 configured in the local network gateway (alternative B) doesn't interfere with BGP session establishment. When BGP is active, the recommendation is to use the peer IP as /32 in the address space, but this only affects initial static routing, not the BGP session itself.
Alternative A is incorrect: VpnGw1 fully supports BGP. Alternative D reveals a common misconception: BGP is a routing protocol between different autonomous systems (eBGP in this context), so different ASNs are expected and correct.
The information that the IPsec tunnel is active and the shared key is correct is irrelevant to the BGP failure: IPsec and BGP are independent layers. The tunnel can be active without the BGP control session being established.
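The two checks discussed above (self-peering, and whether the ASNs differ) can be expressed as a small validation routine. This is a hedged sketch: the `BgpPeerConfig` class and `find_bgp_misconfigurations` function are hypothetical names, and the values are those shown in the scenario, typed in by hand instead of being read from a real device.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class BgpPeerConfig:
    local_asn: int      # ASN of the on-premises device
    local_bgp_ip: str   # source IP of the BGP session on the device
    peer_asn: int       # ASN of the remote side (the Azure gateway)
    peer_address: str   # address the device tries to peer with


def find_bgp_misconfigurations(cfg: BgpPeerConfig) -> list:
    """Return human-readable descriptions of the problems found."""
    issues = []
    if ipaddress.ip_address(cfg.peer_address) == ipaddress.ip_address(cfg.local_bgp_ip):
        issues.append("peer address equals local BGP IP: device is peering with itself")
    if cfg.local_asn == cfg.peer_asn:
        issues.append("both sides use the same ASN: this eBGP setup expects different ASNs")
    return issues


# Values from Scenario 3 (on-premises ASN 65001, Azure ASN 65515).
scenario = BgpPeerConfig(local_asn=65001, local_bgp_ip="10.100.0.1",
                         peer_asn=65515, peer_address="10.100.0.1")
for issue in find_bgp_misconfigurations(scenario):
    print(issue)
```

With the scenario's values only the self-peering check fires; the differing ASNs raise no issue, which matches the reasoning for rejecting alternative D.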
Answer Key: Scenario 4
Answer: B
The correct sequence is: Q, P, S, R, T.
Progressive diagnostic reasoning always goes from broadest to most specific, avoiding destructive actions before validating the diagnosis.
Q comes first because collecting connection status and diagnostic logs in the portal is the starting point: it defines whether the problem is IPsec tunneling, authentication, or routing, and guides all subsequent steps.
P comes second because a change in the on-premises public IP is the most frequent cause of sudden disconnection without changes on the Azure side. Checking this early eliminates or confirms the main hypothesis.
S comes third as a direct consequence of P: if the IP changed, the local network gateway needs to be updated. This step only makes sense after confirming the current IP (P).
R comes fourth because the shared key rarely changes on its own, and checking it before confirming the IP would shift focus to a less likely cause. But it should be verified before any corrective action.
T comes last because recreating the connection is a corrective action that should only occur after exhausting identifiable causes. Recreating before diagnosing can mask the real cause and introduce unnecessary downtime.
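The ordering rules stated above can be turned into a quick check against the four answer options. This is a minimal sketch, assuming the rules reduce to four constraints (Q first, P before S, P before R, T last); the function name `follows_progressive_order` is hypothetical.

```python
def follows_progressive_order(sequence: str) -> bool:
    """Check a sequence of step letters against the ordering rules:
    Q (collect status and logs) first, P (has the IP changed?) before
    S and R, and T (recreate the connection) last."""
    pos = {step: index for index, step in enumerate(sequence)}
    return (
        pos["Q"] == 0
        and pos["P"] < pos["S"]
        and pos["P"] < pos["R"]
        and pos["T"] == len(sequence) - 1
    )


# The four options from Scenario 4.
options = {"A": "TQPSR", "B": "QPSRT", "C": "SPQRT", "D": "PRSQT"}
valid = [label for label, seq in options.items() if follows_progressive_order(seq)]
print(valid)  # ['B']
```

The constraints are intentionally loose: they encode only what the explanation states, so other orderings outside the four options could also pass.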
Troubleshooting Tree: Create and configure a local network gateway
Color Legend:
| Color | Node Type |
|---|---|
| Dark blue (almost black) | Initial symptom or entry point |
| Blue | Diagnostic question or decision point |
| Red | Identified cause or confirmed failure state |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or checkpoint |
To use this tree when facing a real problem, start with the root node describing the observed symptom and follow the branches by answering each question based on what you can directly verify in the Azure portal or on the on-premises device. Each branch eliminates a hypothesis and narrows the diagnosis. Never advance to a corrective action (green node) without having gone through all decision nodes in the corresponding path.