
Troubleshooting Lab: Create and configure virtual networks and subnets

Diagnostic Scenarios

Scenario 1: Root Cause

An administrator is configuring the network infrastructure for a new production environment in the East US region. The virtual network vnet-prod was created three days ago with the address space 172.16.0.0/16. Two subnets were added later:

snet-web: 172.16.10.0/24
snet-app: 172.16.20.0/24

One VM was provisioned in snet-web and another in snet-app. The administrator reports that the VM in snet-web can communicate normally with external resources, but cannot reach the VM in snet-app by private IP, even though both VMs are in the same resource group and region.

Information collected during investigation:

# Connectivity test executed from the VM in snet-web
ping 172.16.20.4
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1

# NSG associated with snet-web: no custom outbound rules
# NSG associated with snet-app: no custom inbound rules
# Effective routes on the NIC of the VM in snet-web: no entries for 172.16.20.0/24

The administrator also mentions that the resource group was created with a mandatory tagging policy that has not yet been applied to all resources.

What is the root cause of the communication failure between the two VMs?

A) The NSG of snet-app is blocking inbound traffic from other subnets by default.
B) The address space 172.16.0.0/16 does not cover the ranges of the two subnets, preventing internal routing.
C) The subnet snet-app was created after the VNet was provisioned and was not correctly associated with the address space, resulting in a missing system route.
D) The subnets snet-web and snet-app were not added to the VNet's address space, and therefore no system route exists between them.


Scenario 2: Action Decision

The cause of the problem has been identified: the address space of VNet vnet-hub is configured as 10.0.0.0/24, but the team needs to add a new subnet snet-analytics with the range 10.0.1.0/24 to host a data processing cluster.

The environment has the following known constraints:

  • VNet vnet-hub is actively peered with vnet-spoke-01 and vnet-spoke-02
  • There are production VMs running on existing subnets of vnet-hub
  • vnet-spoke-01 uses the address space 10.1.0.0/16 and vnet-spoke-02 uses 10.2.0.0/16
  • The security team requires approval for any change that affects production connectivity
  • The scheduled maintenance window is in 48 hours

What is the correct action to take at this time?

A) Immediately add the space 10.0.0.0/16 to VNet vnet-hub, expanding the current range to accommodate the new subnet.
B) Wait for the maintenance window, expand the VNet's address space to 10.0.0.0/16 with security team approval, and then create the subnet snet-analytics.
C) Recreate VNet vnet-hub with the correct space, migrating existing resources before the maintenance window.
D) Create a separate VNet with the space 10.0.1.0/24 and connect it via peering immediately, without impact on production VMs.


Scenario 3: Root Cause

A development team reports that a newly deployed application cannot resolve internal DNS names of other resources in the same VNet. VNet vnet-dev was configured one week ago. The VMs can access the internet normally and ping private IP addresses of other VMs without problems.

Output collected from one of the affected VMs:

# DNS resolution fails for internal hostname
nslookup vm-backend.internal.contoso.com
Server: 168.63.129.16
Address: 168.63.129.16

** server can't find vm-backend.internal.contoso.com: NXDOMAIN

# Ping by IP works normally
ping 10.5.2.10
Reply from 10.5.2.10: bytes=32 time=1ms TTL=128

# VNet DNS configuration (via portal)
DNS servers: 10.5.0.5, 10.5.0.6

The administrator reports that servers 10.5.0.5 and 10.5.0.6 are custom Windows Server DNS instances provisioned in subnet snet-infra. The VNet was created from an ARM template reused from another project. The resource group has 12 resources in total.

What is the root cause of the DNS resolution failure?

A) The default Azure DNS (168.63.129.16) is being queried even with custom servers configured, indicating that VMs have not received updated DNS configurations via DHCP.
B) The custom DNS servers at 10.5.0.5 and 10.5.0.6 were not configured to forward unresolved queries to Azure DNS, and therefore internal names from the private zone are not resolved.
C) The Azure private DNS zone was not linked to VNet vnet-dev, preventing resolution of .internal.contoso.com names.
D) The reused ARM template created the VNet with incorrect DNS configuration and the change made in the portal has not yet propagated to the VMs because they were not restarted.


Scenario 4: Diagnostic Sequence

An administrator receives the following alert from a monitoring system:

"Connectivity failure detected between vnet-onprem-linked and vnet-core. Traffic from 192.168.10.0/24 cannot reach 10.100.5.0/24."

Both environments were communicating normally until yesterday. Application teams registered no changes via Change Management in the last 24 hours. VNet vnet-core is peered with four other VNets in addition to vnet-onprem-linked.

The following investigation steps are available:

  • Step P: Verify the effective routes on the NIC of a VM in vnet-core to confirm if the route to 192.168.10.0/24 is present.
  • Step Q: Verify the peering status between vnet-onprem-linked and vnet-core in the Azure portal.
  • Step R: Confirm if the address spaces of the VNets are still correct and have not been changed.
  • Step S: Inspect the NSGs applied to relevant subnets for rules that block traffic between the two ranges.
  • Step T: Attempt a test ping from a VM in 192.168.10.0/24 to an IP in 10.100.5.0/24 to confirm and reproduce the failure.

Which diagnostic sequence follows the correct logic of progressive investigation?

A) T -> Q -> R -> P -> S
B) Q -> T -> P -> R -> S
C) P -> Q -> T -> S -> R
D) T -> S -> Q -> P -> R


Answer Key and Explanations

Answer Key: Scenario 1

Answer: D

The definitive clue is in the effective routes output: no entries for 172.16.20.0/24. In a correctly configured VNet, Azure automatically creates system routes that cover every subnet prefix within the VNet's address space. If no such route appears in the effective route table, the subnets are not being recognized as belonging to the same routing space.

Distractor A is the most common diagnostic error: blaming the NSG before checking the routing plane. An NSG blocking traffic would appear as dropped packets, not as a missing route. Distractor B is factually incorrect, as 172.16.0.0/16 covers both 172.16.10.0/24 and 172.16.20.0/24. Distractor C mistakes a normal situation for a fault: subnets created after the VNet work normally as long as the address space accommodates them.

The information about the tagging policy is deliberately irrelevant and has no relation to routing between subnets. Acting based on distractor A would lead the administrator to create unnecessary permissive NSG rules, introducing security risk without solving the real problem.
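
The claim that rules out distractor B is plain prefix arithmetic and can be checked offline. The sketch below uses Python's standard `ipaddress` module with the prefixes from Scenario 1; the helper name `covered_by` is illustrative and not part of any Azure tooling, and the check says nothing about what Azure itself reports.

```python
import ipaddress

# Prefixes taken from Scenario 1.
vnet_space = ipaddress.ip_network("172.16.0.0/16")
subnets = {
    "snet-web": ipaddress.ip_network("172.16.10.0/24"),
    "snet-app": ipaddress.ip_network("172.16.20.0/24"),
}


def covered_by(space, subnet):
    """Return True when every address of `subnet` lies inside `space`."""
    return subnet.subnet_of(space)


# Map each subnet name to whether the VNet address space contains it.
coverage = {name: covered_by(vnet_space, net) for name, net in subnets.items()}

for name, ok in coverage.items():
    print(f"{name}: {'covered' if ok else 'NOT covered'} by {vnet_space}")
```

Both subnets fall inside 172.16.0.0/16, which is why distractor B can be discarded without touching the portal.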


Answer Key: Scenario 2

Answer: B

The critical constraint that eliminates other alternatives is the combination of two factors: there are active production VMs and there is a mandatory security approval process, with a maintenance window available in 48 hours. Expanding a VNet's address space in Azure can be performed without downtime for existing resources, making it a safe operation when properly planned.

Distractor A represents the technically correct action, but applied without respecting the governance process. Security team approval exists precisely for changes that affect production connectivity, and ignoring it would be a process violation even if the technical action were valid. Distractor C is unnecessarily destructive: recreating the VNet would interrupt production and discard the configured peerings, when expanding the address space solves the problem without either impact. Distractor D creates unnecessary complexity and potential address overlap if not carefully planned.
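
Before the change is submitted for approval, the address plan itself can be sanity-checked offline. The following is a minimal sketch, assuming only the prefixes given in Scenario 2; the function `plan_is_consistent` is a hypothetical helper, and the code performs prefix arithmetic only, without calling Azure.

```python
import ipaddress

# Prefixes taken from Scenario 2.
current_space = ipaddress.ip_network("10.0.0.0/24")
proposed_space = ipaddress.ip_network("10.0.0.0/16")
new_subnet = ipaddress.ip_network("10.0.1.0/24")
peered_spaces = {
    "vnet-spoke-01": ipaddress.ip_network("10.1.0.0/16"),
    "vnet-spoke-02": ipaddress.ip_network("10.2.0.0/16"),
}


def plan_is_consistent(space, subnet, peers):
    """Check that `subnet` fits in `space` and list peers that `space` overlaps."""
    fits = subnet.subnet_of(space)
    conflicts = [name for name, net in peers.items() if space.overlaps(net)]
    return fits, conflicts


before = plan_is_consistent(current_space, new_subnet, peered_spaces)
after = plan_is_consistent(proposed_space, new_subnet, peered_spaces)

print("current space :", before)  # (False, [])
print("proposed space:", after)   # (True, [])
```

With the current /24 the new subnet does not fit; with the proposed /16 it fits and overlaps neither spoke. A wider choice such as 10.0.0.0/8 would show both spokes in the conflict list.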


Answer Key: Scenario 3

Answer: D

The central clue is in the nslookup output: the responding server is 168.63.129.16, which is the address of the Azure-provided recursive DNS resolver. If the custom servers 10.5.0.5 and 10.5.0.6 were actually being queried, one of them would appear as the server in the response. The fact that default Azure DNS is responding indicates that the VMs still hold the old DNS configuration received via DHCP; a restart or a forced DHCP lease renewal would apply the new settings.

Distractor B describes a real and common problem in environments with custom DNS, but is incompatible with the observed evidence: if custom servers were queried and didn't know how to respond, the server shown in nslookup would be one of them, not 168.63.129.16. Distractor C confuses Azure private DNS with custom DNS hosted on Windows Server. Distractor A describes the symptom, not the cause. This is a classic diagnostic error: confusing what is observed with why it occurs.

The information about the number of resources in the resource group is deliberately irrelevant and should not influence diagnosis.
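
The reasoning above hinges on a single line of the nslookup output: which server answered. A small sketch of that check follows, assuming output shaped like the sample in Scenario 3; the function names `answering_server` and `classify` and the returned labels are illustrative, not part of any real tool.

```python
import re

AZURE_DNS = "168.63.129.16"


def answering_server(nslookup_output):
    """Return the address on the first 'Server:' line, or None if absent."""
    match = re.search(r"^Server:\s*(\S+)", nslookup_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def classify(nslookup_output, configured_servers):
    """Label the responder as 'custom', 'azure-default', or 'unknown'."""
    server = answering_server(nslookup_output)
    if server in configured_servers:
        return "custom"
    if server == AZURE_DNS:
        return "azure-default"
    return "unknown"


# Output copied from Scenario 3.
sample = """Server: 168.63.129.16
Address: 168.63.129.16

** server can't find vm-backend.internal.contoso.com: NXDOMAIN
"""

result = classify(sample, {"10.5.0.5", "10.5.0.6"})
print(result)  # azure-default
```

A result of `azure-default` while custom servers are configured on the VNet matches the answer key: the VM has not yet picked up the new DNS settings.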


Answer Key: Scenario 4

Answer: A

The correct sequence is T -> Q -> R -> P -> S, which follows the principle of progressive diagnosis: confirm the problem before investigating causes, check the most likely failure component (the peering), validate underlying network configurations, verify the resulting routing plane, and lastly, inspect filtering rules.

The first step should always be to reproduce and confirm the failure (T), which avoids investigating a problem that may have resolved itself or be intermittent. Next, since traffic between VNets stopped working without recorded changes, the peering status (Q) is the most likely candidate, as peerings can fail due to changes in address spaces. Verifying address spaces (R) next confirms whether a silent change caused a conflict. Effective routes (P) validate whether the data plane reflects the expected state. NSGs (S) come last because, if peering and routes are correct and the problem persists, filtering is the remaining hypothesis.

Distractor D is the most dangerous error: placing NSG inspection as the second step wastes time investigating filtering when the routing plane may be completely broken.
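
The ordering argument can be sketched as data plus a loop that stops at the first abnormal result. This is only an illustration of the sequence T -> Q -> R -> P -> S: the names `SEQUENCE`, `investigate`, and `findings` are hypothetical, and the findings are supplied by hand rather than collected from Azure.

```python
# Steps in the order given by the answer key.
SEQUENCE = [
    ("T", "reproduce the failure with a test ping"),
    ("Q", "check the peering status"),
    ("R", "confirm the VNet address spaces are unchanged"),
    ("P", "check effective routes on a NIC in vnet-core"),
    ("S", "inspect NSG rules on the relevant subnets"),
]


def investigate(findings):
    """Walk SEQUENCE in order and stop at the first abnormal result.

    `findings` maps a step letter to True when that step shows a healthy
    state. For step T, healthy means the ping succeeded, so the failure
    did not reproduce. Steps missing from `findings` count as healthy.
    """
    visited = []
    for letter, description in SEQUENCE:
        visited.append(letter)
        healthy = findings.get(letter, True)
        if letter == "T":
            if healthy:
                return visited, "failure not reproduced"
            continue  # failure confirmed, keep investigating
        if not healthy:
            return visited, f"abnormal result at step {letter} ({description})"
    return visited, "no fault found in the checked layers"


# Example: the failure reproduces, peering looks fine, address space changed.
visited, outcome = investigate({"T": False, "Q": True, "R": False})
print(" -> ".join(visited))
print(outcome)
```

In this example the walk stops at R, so the routing and NSG checks are never reached. That is the point of the ordering: each later step is only worth running once the earlier layers are known to be healthy.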


Troubleshooting Tree: Create and configure virtual networks and subnets


Color Legend:

Color      Meaning
Dark blue  Initial symptom (entry point)
Blue       Diagnostic question (binary or verifiable decision)
Orange     Intermediate verification or validation
Red        Identified cause
Green      Recommended action or resolution

To use this tree when facing a real problem, start with the root node describing the observed symptom and answer each question based on what you can verify directly in the portal, via CLI, or within the VM. Follow the path corresponding to your answer until you reach a red cause identification node, then execute the associated green action. If the action doesn't resolve the problem, return to the previous branching point and evaluate if any previous answer may have been incorrect or incomplete.
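
The traversal described above can be sketched with a small data structure whose node kinds mirror the color legend. The tree built here is a hypothetical fragment based on Scenario 3, not a reproduction of the lab's actual diagram, and the names `Node` and `walk` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str  # "symptom", "question", "check", "cause", or "action"
    text: str
    yes: Optional["Node"] = None   # followed when a question is answered yes
    no: Optional["Node"] = None    # followed when a question is answered no
    next: Optional["Node"] = None  # followed from any non-question node


def walk(root, answers):
    """Follow the tree from `root`; `answers` maps question text to True/False."""
    path = []
    node = root
    while node is not None:
        path.append((node.kind, node.text))
        if node.kind == "question":
            node = node.yes if answers[node.text] else node.no
        else:
            node = node.next
    return path


# Hypothetical fragment derived from Scenario 3.
action = Node("action", "Restart the VM or renew its DHCP lease")
cause = Node("cause", "VM still holds the previous DNS settings", next=action)
check = Node("check", "Review forwarding on the custom DNS servers")
question = Node(
    "question",
    "Does nslookup show a configured custom DNS server as the responder?",
    yes=check,
    no=cause,
)
tree = Node("symptom", "VM cannot resolve an internal hostname", next=question)

path = walk(tree, {question.text: False})
for kind, text in path:
    print(f"[{kind}] {text}")
```

Answering "no" to the question leads through the cause node to its action, which corresponds to following a path until a red node is reached and then executing the associated green node.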