Troubleshooting Lab: Plan and implement network segmentation and address spaces

Diagnostic Scenarios

Scenario 1 — Root Cause

An operations team receives a ticket reporting that VMs in subnet 10.2.4.0/24 of VNet-App cannot communicate with VMs in subnet 10.2.8.0/24 of VNet-Data. Both VNets are in the same region and were peered three weeks ago; connectivity worked normally until yesterday.

The responsible engineer collects the following information:

The peering between VNet-App and VNet-Data shows Connected status in the Azure portal
No NSG rules have been changed in the last seven days
Last night, the architecture team performed an address space expansion of VNet-Data, adding the 10.3.0.0/16 block to support new environments
The VPN gateway associated with VNet-Data is operational with Connected status
Connectivity tests with Test-NetConnection between the VMs time out on port 3389

PS C:\> Test-NetConnection -ComputerName 10.2.8.15 -Port 3389

ComputerName     : 10.2.8.15
RemoteAddress    : 10.2.8.15
RemotePort       : 3389
InterfaceAlias   : Ethernet
SourceAddress    : 10.2.4.10
TcpTestSucceeded : False

The team confirms that the destination subnet's NSG allows RDP from 10.2.4.0/24 and that the destination VM is running.

What is the root cause of the connectivity loss?

A) An NSG rule was changed without being logged and is blocking traffic between the subnets
B) Adding a new address space to VNet-Data invalidated the existing peering, which now needs to be removed and recreated
C) VNet-Data's VPN gateway is competing with the peering for traffic routing, causing a loop
D) Subnet 10.2.8.0/24 ran out of available addresses after the expansion, preventing communication

Scenario 2 — Diagnostic Sequence

An engineer receives the following report: VMs in a new subnet 10.0.5.0/24, recently created in an existing VNet, cannot resolve internal DNS names or access the internet. VMs in other subnets of the same VNet work normally. No NSG has been associated with the new subnet yet.

The available investigation steps are:

1. Check if the new subnet has an associated UDR with default route pointing to an NVA that is not operational
2. Confirm if the range 10.0.5.0/24 overlaps with any other existing subnet in the VNet
3. Check if VMs in the new subnet received a valid IP address within the expected range
4. Test connectivity from a VM in the new subnet using ping 168.63.129.16 to validate reach to Azure control plane
5. Compare the VNet's DNS settings with the effective settings on VMs in the new subnet

What is the correct diagnostic sequence?

A) 2 → 3 → 5 → 4 → 1
B) 3 → 2 → 4 → 5 → 1
C) 1 → 4 → 3 → 2 → 5
D) 5 → 3 → 1 → 4 → 2

Scenario 3 — Root Cause

A company operates a hub-and-spoke architecture. VNet-Hub contains an Azure Firewall and is peered with VNet-Spoke-A and VNet-Spoke-B. The three VNets were deployed two months ago, and communication between the spokes, routed through the firewall in the hub, was working correctly.

Today, after a new team member performed a routine maintenance task on VNet-Spoke-A, VMs in that VNet lost access to the internet and VNet-Spoke-B. Access to VNet-Hub continues working normally from VNet-Spoke-A.

The engineer observes:

The peering between VNet-Spoke-A and VNet-Hub shows Connected status
The "Use remote gateways" configuration is enabled on the peering from VNet-Spoke-A side
The "Allow gateway transit" configuration is enabled on the peering from VNet-Hub side
The UDR associated with the VM subnet in VNet-Spoke-A contains a 0.0.0.0/0 route with the next hop pointing to the Azure Firewall's private IP

After investigation, the engineer discovers that during maintenance a new subnet called snet-management was added to VNet-Spoke-A, without any associated UDR.

Users report that the VMs still without external access are those in the original VM subnet, not those in the new subnet.

What is the root cause of the problem affecting the VMs in the original subnet?

A) Adding the new subnet corrupted the effective routing table of the original VM subnet
B) The peering was automatically restarted when the new subnet was created and lost its gateway transit configuration
C) The UDR associated with the original VM subnet was disassociated or modified during the new subnet creation
D) The new subnet without a UDR is generating a more specific route in the VNet that overrides the default route of the other subnets

Scenario 4 — Action Decision

The cause has already been identified: during an address reorganization process, an administrator removed the peering between VNet-Prod and VNet-Shared to add a new address block to VNet-Shared. The new block was successfully added, but the administrator forgot to recreate the peering. The production environment has been degraded for 40 minutes. Critical applications depend on services hosted in VNet-Shared, including an internal DNS server and a license server.

The administrator now has the following information:

The new block added to VNet-Shared is 172.20.0.0/16
The original address space of VNet-Prod is 10.10.0.0/16
The original address space of VNet-Shared is 10.20.0.0/16
There is no overlap between any of the blocks
The administrator has Network Contributor permissions on both VNets
Documentation of the original peering exists, including the Allow forwarded traffic and Allow gateway transit configurations that were enabled

What is the correct action to take at this moment?

A) Open a ticket for the architecture team to review the new address space before recreating the peering, to avoid future problems
B) Immediately recreate the peering in both directions between VNet-Prod and VNet-Shared, restoring the original documented configurations
C) Recreate the peering only from VNet-Prod to VNet-Shared side, as the reverse direction is optional in urgent cases
D) Remove the newly added 172.20.0.0/16 block, recreate the peering with the original space and then plan the expansion correctly

Answer Key and Explanations

Answer Key — Scenario 1

Answer: B

The decisive clue is that VNet-Data's address space was expanded the night before, which is exactly when connectivity was lost. Azure originally required that active peerings be removed before the address space of a peered VNet was modified. When the change is made with the peering still in place, the peering can be left in an inconsistent state even though the portal displays Connected: that status reflects the control plane, not necessarily the validity of the propagated routes. The fix is to remove the peering and recreate it so that Azure synchronizes the new address space into both VNets' routing tables. Current Azure releases also allow the address space of a peered VNet to be updated in place, provided each peering is then synchronized (for example with Sync-AzVirtualNetworkPeering); either way, the peering must pick up the new address space before routing is consistent again.

The information about the operational VPN gateway is irrelevant to this diagnosis and was included deliberately to make alternative C tempting. Nothing in the scenario indicates a gateway transit conflict, so the gateway does not interfere with peering routing. Alternative A is plausible, but the scenario explicitly states that no NSG rules were changed. Alternative D fails because adding an address block to a VNet does not consume addresses in an existing subnet. Investigating the gateway, as alternative C suggests, would delay resolution of an active incident without benefit.
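The mismatch behind this answer can be pictured as a comparison between a VNet's current address space and the copy of it that the remote peering still holds. The following is a minimal sketch of that comparison, not an Azure API call: the function name, the variable names, and the 10.2.8.0/22 prefix for VNet-Data's original address space are assumptions, since the scenario only states the subnet ranges and the added 10.3.0.0/16 block.

```python
from ipaddress import ip_network


def missing_prefixes(actual_space, peer_view):
    """Return prefixes of a VNet that the peer's recorded view does not cover."""
    peer_nets = [ip_network(p) for p in peer_view]
    missing = []
    for prefix in actual_space:
        net = ip_network(prefix)
        # A prefix is known to the peer if it sits inside any recorded prefix.
        if not any(net.subnet_of(known) for known in peer_nets):
            missing.append(prefix)
    return missing


# Assumed values: 10.2.8.0/22 is a hypothetical original address space.
vnet_data_space = ["10.2.8.0/22", "10.3.0.0/16"]  # after the expansion
peering_view_from_app = ["10.2.8.0/22"]           # what VNet-App's peering still records

stale = missing_prefixes(vnet_data_space, peering_view_from_app)
print(stale)
```

A non-empty result means the peering has not yet picked up the new address space, which is the condition that recreating or synchronizing the peering resolves.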

Answer Key — Scenario 2

Answer: A

The correct sequence is 2 → 3 → 5 → 4 → 1, because it moves from the most basic, structural checks to the most specific, operational ones.

The first step (2) validates that the subnet range is structurally valid within the VNet, since an overlapping range would undermine communication from the moment of creation. Next (3), confirming that the VMs received valid IP addresses rules out provisioning failures before any functional test. Step (5) then compares DNS settings to determine whether the name-resolution problem originates in the VNet configuration or on the VMs. Step (4) tests reachability of the Azure platform address, which depends on correct routing. Only once it is known whether basic routing works does it make sense to investigate (1) whether a misconfigured UDR is directing traffic to an invalid destination.

Starting with step 1 (alternative C) would be a classic mistake of going directly to the most complex cause without validating foundations. Starting with step 5 (alternative D) ignores that addressing problems or overlap would invalidate any DNS test results.
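Steps 2 and 3 of the sequence are address arithmetic and can be checked offline. Below is a minimal sketch using Python's standard ipaddress module; the list of existing subnets and the VM addresses are hypothetical values chosen for illustration, because the scenario names only the new 10.0.5.0/24 range.

```python
from ipaddress import ip_address, ip_network


def overlapping_subnets(new_subnet, existing_subnets):
    """Step 2: list existing subnets whose range overlaps the new one."""
    new_net = ip_network(new_subnet)
    return [s for s in existing_subnets if new_net.overlaps(ip_network(s))]


def addresses_outside(subnet, vm_addresses):
    """Step 3: list VM addresses that do not belong to the expected range."""
    net = ip_network(subnet)
    return [a for a in vm_addresses if ip_address(a) not in net]


# Hypothetical inventory; 10.0.4.0/23 spans 10.0.4.0 through 10.0.5.255.
existing = ["10.0.1.0/24", "10.0.2.0/24", "10.0.4.0/23"]
vm_ips = ["10.0.5.4", "169.254.10.20"]

print(overlapping_subnets("10.0.5.0/24", existing))
print(addresses_outside("10.0.5.0/24", vm_ips))
```

An empty list from both functions clears the structural checks and lets the investigation move on to DNS and routing.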

Answer Key — Scenario 3

Answer: C

The root cause is that the UDR was disassociated from the original VM subnet, or modified, during the maintenance operation. The observed behavior is consistent with this: access to VNet-Hub works (direct peering routing, with no dependency on the UDR), but access to the internet and to VNet-Spoke-B fails (both depend on the 0.0.0.0/0 route pointing to the Azure Firewall, configured via the UDR). VNet peering is not transitive, so Spoke-A has no direct route to Spoke-B. Without the UDR, traffic to external destinations follows the system default route, which may not provide a valid path to the internet or to the remote spoke.

The information about "Use remote gateways" and "Allow gateway transit" is irrelevant to this diagnosis and serves as a distractor for alternative B. Alternative A describes behavior that doesn't exist in Azure: creating a subnet doesn't alter routing tables of other subnets. Alternative D confuses the concept of more specific route with the behavior of subnets without UDR, which simply inherit system routes without influencing other subnets.
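The effect of losing the UDR can be illustrated with a simplified model of route selection: the longest matching prefix wins, and when prefix lengths tie, a user-defined route takes precedence over a system route. This is a sketch under stated assumptions rather than a reproduction of Azure's effective route table: the VNet prefixes, the test addresses, and the route dictionaries are hypothetical, and the table omits other system routes that a real subnet has.

```python
from ipaddress import ip_address, ip_network

# Lower value wins when prefix lengths tie: user-defined beats BGP beats system.
SOURCE_PRIORITY = {"User": 0, "BGP": 1, "Default": 2}


def select_route(routes, destination):
    """Pick the route for a destination by longest prefix, then by source."""
    dest = ip_address(destination)
    matches = [r for r in routes if dest in ip_network(r["prefix"])]
    if not matches:
        return None
    return min(
        matches,
        key=lambda r: (-ip_network(r["prefix"]).prefixlen, SOURCE_PRIORITY[r["source"]]),
    )


# Hypothetical prefixes: Spoke-A 10.1.0.0/16, Hub 10.0.0.0/16, Spoke-B 10.2.0.0/16.
system_routes = [
    {"prefix": "10.1.0.0/16", "source": "Default", "next_hop": "VnetLocal"},
    {"prefix": "10.0.0.0/16", "source": "Default", "next_hop": "VNetPeering"},
    {"prefix": "0.0.0.0/0", "source": "Default", "next_hop": "Internet"},
]
udr = [{"prefix": "0.0.0.0/0", "source": "User", "next_hop": "VirtualAppliance"}]

hub_vm, spoke_b_vm, public_host = "10.0.1.4", "10.2.0.4", "203.0.113.10"

with_udr = system_routes + udr
for dest in (hub_vm, spoke_b_vm, public_host):
    print("with UDR   ", dest, select_route(with_udr, dest)["next_hop"])

for dest in (hub_vm, public_host):
    print("without UDR", dest, select_route(system_routes, dest)["next_hop"])
```

In this model the hub stays reachable through the peering route in both cases, while traffic for Spoke-B and for public addresses reaches the firewall only while the UDR is associated.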

The most dangerous distractor is A, as it would lead the engineer to investigate a supposed system routing table corruption, consuming time on a path without real solution.

Answer Key — Scenario 4

Answer: B

The cause is identified, permissions are available, there's no address overlap, and documentation of original configurations exists. All preconditions to recreate the peering immediately are satisfied. The production environment has been degraded for 40 minutes with impact on critical services. The correct action is to recreate the peering in both directions with the original documented configurations, restoring service as quickly as possible.
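The no-overlap precondition can be confirmed in seconds before the peering is recreated. The sketch below checks every prefix of one VNet against every prefix of the other, using the address blocks given in the scenario; the function and variable names are illustrative.

```python
from ipaddress import ip_network
from itertools import product


def cross_vnet_overlaps(space_a, space_b):
    """Return every pair of prefixes, one per VNet, whose ranges overlap."""
    return [
        (a, b)
        for a, b in product(space_a, space_b)
        if ip_network(a).overlaps(ip_network(b))
    ]


vnet_prod = ["10.10.0.0/16"]
vnet_shared = ["10.20.0.0/16", "172.20.0.0/16"]  # original block plus the new one

conflicts = cross_vnet_overlaps(vnet_prod, vnet_shared)
print(conflicts)  # an empty list means the peering can be recreated
```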

Alternative A ignores that diagnosis is complete and the solution is known. Escalating to architecture without technical need prolongs the incident without justification. Alternative C is wrong because Azure peering requires configuration on both sides to be established; creating only one side leaves the peering in Initiated state, with no traffic flowing. Alternative D would be a setback that would discard a planned and valid modification, besides not resolving the incident faster, as it would require a new future expansion window.

The most dangerous distractor is C, as the administrator could interpret the Initiated status as partially functional and waste time trying to diagnose why traffic still doesn't flow.

Troubleshooting Tree: Plan and implement network segmentation and address spaces

Color Legend:

Color	Node Type
Dark blue	Initial symptom (entry point)
Blue	Diagnostic question (binary decision or observable)
Red	Identified cause
Green	Recommended action or resolution
Orange	Intermediate validation or verification

To use this tree when facing a real problem, start with the root node describing the connectivity symptom and answer each question based on what is directly observable in the Azure portal or via tools like Get-AzEffectiveRouteTable and Test-NetConnection. Follow the path until reaching a red identified cause node and then execute the corresponding green action. If the first path taken doesn't resolve the problem, return to the last orange validation node and reassess the answer given at that point.
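The traversal described above can be sketched as a small data structure. The node types follow the color legend, but the node texts and identifiers below are hypothetical illustrations assembled from the scenarios in this lab, not a transcription of the tree diagram.

```python
# Hypothetical fragment of a troubleshooting tree; node types follow the legend.
TREE = {
    "start": {"type": "symptom", "text": "VMs in peered VNets cannot communicate",
              "next": "q_peering"},
    "q_peering": {"type": "question", "text": "Does the peering exist on both sides?",
                  "yes": "q_udr", "no": "c_peering"},
    "c_peering": {"type": "cause", "text": "Peering missing or incomplete",
                  "next": "a_peering"},
    "a_peering": {"type": "action", "text": "Recreate the peering in both directions"},
    "q_udr": {"type": "question", "text": "Is the expected UDR associated with the subnet?",
              "yes": "v_routes", "no": "c_udr"},
    "c_udr": {"type": "cause", "text": "UDR disassociated or modified",
              "next": "a_udr"},
    "a_udr": {"type": "action", "text": "Reassociate the documented UDR"},
    "v_routes": {"type": "validation", "text": "Review the effective routes of the NIC"},
}


def walk(tree, answers, node_id="start"):
    """Follow the tree from the symptom, using yes/no answers at each question."""
    path = []
    while node_id is not None:
        node = tree[node_id]
        path.append((node["type"], node["text"]))
        if node["type"] == "question":
            node_id = node["yes"] if answers[node_id] else node["no"]
        else:
            node_id = node.get("next")  # None ends the walk
    return path


for node_type, text in walk(TREE, {"q_peering": True, "q_udr": False}):
    print(f"{node_type:10} {text}")
```

Recording the path as a list makes it straightforward to return to the last validation node and answer differently if the first resolution does not fix the problem.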
