Troubleshooting Lab: Troubleshoot Network Connectivity

Diagnostic Scenarios

Scenario 1 — Root Cause

A Windows VM hosted in the prod-app subnet of a VNet in the East US region is inaccessible via RDP (port 3389) from the corporate network. The team reports that the VM was working normally until yesterday afternoon, when a colleague performed a security update on the subnet NSG.

The administrator checks the VM status in the portal and confirms it is Running. Boot diagnostics show no errors. The public IP associated with the VM NIC is active and responds to ping.

The IP Flow Verify output returns the following:

Access : Deny
RuleName : DenyAllInbound
Direction : Inbound
Protocol : TCP
SourcePort : *
DestinationPort : 3389

The administrator also notes that the NSG associated with the VM NIC has the following rule:

Priority : 100
Name     : Allow-RDP-NIC
Access   : Allow
Protocol : TCP
Direction: Inbound
Source   : 203.0.113.0/24 (corporate network)
DestPort : 3389

What is the root cause of RDP inaccessibility?

A) The VM's public IP is being routed through an invalid route, preventing packets from reaching the NIC before NSG evaluation.

B) The subnet NSG has no allow rule for port 3389. Because inbound traffic is evaluated against the subnet NSG before the NIC NSG, it is denied before it reaches the allow rule on the NIC.

C) The Allow-RDP-NIC rule in the NIC NSG is configured with an incorrect source IP range that does not cover the corporate network.

D) The VM's Windows Firewall is blocking RDP traffic after the security update applied yesterday.

Scenario 2 — Action Decision

The infrastructure team has identified that a production VM, responsible for processing real-time orders, lost connectivity to the database hosted in another VNet. The cause has been confirmed: the VNet Peering between the two VNets was accidentally removed by an automation script executed 20 minutes ago. The database is healthy and accessible from other VMs.

The order system is degraded but still operational via local cache. The SLA provides for maintenance windows only on Sundays. The responsible engineer has the Network Contributor role in the subscription.

What is the correct action to take at this moment?

A) Open an emergency ticket for the security team to review the automation script before recreating the peering, as the removal may have been intentional.

B) Immediately recreate the VNet Peering between the two VNets, configuring parameters according to the documented previous state, without waiting for the maintenance window.

C) Wait for Sunday's maintenance window to recreate the peering, as network connectivity changes in production outside the window may cause additional impact.

D) Add a temporary static route in the Route Tables of both VNets pointing to each other's address prefixes, bypassing the need for peering.

Scenario 3 — Root Cause

A Linux VM in the data-subnet started receiving errors when trying to access the Azure Storage Account via public endpoint. The administrator confirms that the Storage Account is accessible from their local machine and that other VMs in different subnets of the same VNet can access it normally.

The VM in question was recently migrated from the old-subnet to the data-subnet without changes to the VM's network configurations. The administrator verifies and confirms that the data-subnet has a Service Endpoint enabled for Microsoft.Storage.

The error returned on the VM is:

curl https://mystorageaccount.blob.core.windows.net/container/file.txt
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to
mystorageaccount.blob.core.windows.net:443

The administrator also verifies that the effective routes for the VM's NIC in the data-subnet contain the following entry, created automatically by the Service Endpoint:

Prefix  : Storage.EastUS
NextHop : VirtualNetworkServiceEndpoint

The Storage Account has the following firewall configuration:

Public network access : Enabled from selected virtual networks and IP addresses
Virtual networks      : old-subnet (VNet: prod-vnet)
IP rules              : (none)

What is the root cause of the problem?

A) The Service Endpoint enabled on the data-subnet is causing a routing conflict with the default internet egress route, preventing TLS traffic from being established correctly.

B) The Storage Account firewall configuration still authorizes only the old-subnet, and the VM now originates traffic through the data-subnet, which is not in the allowed networks list.

C) The VM needs to be restarted after subnet migration so that the new subnet's Service Endpoint is recognized by the operating system network stack.

D) The SSL error indicates that the Storage Account certificate has expired, which is independent of network and subnet configurations.

Scenario 4 — Diagnostic Sequence

A VM in the frontend-subnet cannot reach an external API on the internet via HTTPS (port 443). The VM has a public IP associated with its NIC. The environment uses a custom Route Table associated with the frontend-subnet. No recent changes have been reported.

The following investigation steps are available, out of order:

1. Check if a 0.0.0.0/0 route exists in the Route Table with a valid next hop for egress traffic
2. Use Network Watcher's Connection Troubleshoot to test connectivity from the VM to the external endpoint on port 443
3. Check if the subnet and NIC NSGs have outbound rules that allow traffic to port 443
4. Confirm that the VM is in Running state and that the Network Watcher agent is installed and active
5. Analyze the Connection Troubleshoot result and identify if the problem is in NSG, routing, or destination

What is the correct investigation sequence?

A) 1 → 3 → 2 → 4 → 5

B) 4 → 2 → 5 → 3 → 1

C) 2 → 4 → 1 → 3 → 5

D) 4 → 3 → 1 → 2 → 5

Answer Key and Explanations

Answer Key — Scenario 1

Answer: B

IP Flow Verify evaluates NSGs in the following order for inbound traffic: first the subnet NSG, then the NIC NSG. If the subnet NSG does not have any explicit allow rule for port 3389, the default DenyAllInbound rule (priority 65500) is applied, and traffic is dropped before being evaluated by the NIC NSG. The IP Flow Verify output confirms this: the rule name responsible for blocking is DenyAllInbound, which belongs to the subnet NSG.

The decisive clue in the scenario is that the colleague updated the subnet NSG, not the NIC, and that IP Flow Verify cites the DenyAllInbound rule without indicating a custom rule name.

Alternative D represents the most dangerous reasoning error: the administrator could connect via console or Bastion to inspect Windows Firewall, wasting time on the wrong layer. Windows Firewall would only be relevant if the NSGs were allowing the traffic and the connection still failed. The information about ping responding to the public IP is irrelevant: ping uses ICMP, not TCP 3389, and validates nothing about the RDP policy.
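
The inbound evaluation order described in this answer can be illustrated with a minimal Python sketch. It is a simplified model, not Azure's implementation: the `Rule` class and both functions are hypothetical names, the other default rules (such as AllowVnetInBound) are omitted, and the source address 203.0.113.10 is an assumed host inside the corporate range from the scenario.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    access: str               # "Allow" or "Deny"
    source: str               # CIDR prefix; "0.0.0.0/0" stands in for "Any"
    dest_port: Optional[int]  # None stands in for "any port"


# Default catch-all rule present at the end of every NSG.
DENY_ALL_INBOUND = Rule(65500, "DenyAllInbound", "Deny", "0.0.0.0/0", None)


def first_match(rules, src_ip, port):
    """Return the matching rule with the lowest priority number."""
    for rule in sorted([*rules, DENY_ALL_INBOUND], key=lambda r: r.priority):
        in_source = ip_address(src_ip) in ip_network(rule.source)
        if in_source and rule.dest_port in (None, port):
            return rule


def inbound_verdict(subnet_rules, nic_rules, src_ip, port):
    """Evaluate the subnet NSG first, then the NIC NSG; both must allow."""
    for layer, rules in (("subnet", subnet_rules), ("nic", nic_rules)):
        rule = first_match(rules, src_ip, port)
        if rule.access == "Deny":
            return ("Deny", layer, rule.name)
    return ("Allow", "nic", rule.name)


# The NIC NSG from the scenario; the subnet NSG has no custom allow rule.
nic_nsg = [Rule(100, "Allow-RDP-NIC", "Allow", "203.0.113.0/24", 3389)]
print(inbound_verdict([], nic_nsg, "203.0.113.10", 3389))
# prints ('Deny', 'subnet', 'DenyAllInbound')
```

In this model the allow rule on the NIC is never consulted, which mirrors why the IP Flow Verify output names DenyAllInbound rather than a custom rule.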

Answer Key — Scenario 2

Answer: B

The cause has already been identified and confirmed: the peering was accidentally removed. The system is degraded but not completely stopped. The Network Contributor role is sufficient to recreate peerings. None of the scenario constraints prevent immediate action: the SLA mentions maintenance windows for planned changes, not for restoring connectivity lost due to accidents.

Waiting for Sunday's window (alternative C) would have the most severe consequence, as the system would run in degraded mode for days unnecessarily. Alternative A ignores that the removal has already been confirmed as accidental and introduces an unjustified delay.

Alternative D represents a technically invalid solution to replace peering: static routes between VNets without peering do not establish connectivity, as Azure's data plane requires peering to exist for routing between VNets to work. This alternative is the most dangerous distractor as it appears to be a legitimate workaround.

Answer Key — Scenario 3

Answer: B

The Service Endpoint redirects subnet traffic to the service via Azure backbone, but the Storage Account firewall controls which subnets have authorized access. The VM was migrated to the data-subnet, but the Storage Account firewall still lists only the old-subnet as an allowed network. Traffic reaches the Storage Account coming from the data-subnet, which is not authorized, resulting in rejection.

The SSL error shown on the VM is a symptom of the connection being refused or reset on the server side, not of a certificate problem. This detail is deliberately misleading and steers the reader toward alternative D. The VirtualNetworkServiceEndpoint route shown in the scenario confirms that the Service Endpoint is correctly configured on the subnet, which eliminates alternative A.

The information that other VMs in different subnets can access the Storage is irrelevant to the diagnosis: those subnets may or may not be in the permissions list, and the scenario doesn't specify. The correct focus is the comparison between the subnet from which the VM now originates traffic and the list of authorized subnets in the Storage Account firewall.
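
The comparison recommended above, between the subnet the VM now sits in and the subnets the firewall authorizes, can be sketched in a few lines of Python. This is an illustrative model only: `storage_firewall_allows` is a hypothetical helper, and representing a subnet as a `(vnet, subnet)` tuple is an assumption made for the example.

```python
from ipaddress import ip_address, ip_network


def storage_firewall_allows(source_subnet, allowed_subnets,
                            public_ip=None, ip_rules=()):
    """Model of 'Enabled from selected virtual networks and IP addresses'.

    A request passes if it arrives from an allowed (vnet, subnet) pair or,
    failing that, from a public IP covered by one of the IP rules.
    """
    if source_subnet in allowed_subnets:
        return True
    if public_ip is not None:
        return any(ip_address(public_ip) in ip_network(rule)
                   for rule in ip_rules)
    return False


# Firewall configuration from the scenario: only old-subnet, no IP rules.
allowed = {("prod-vnet", "old-subnet")}

print(storage_firewall_allows(("prod-vnet", "old-subnet"), allowed))   # True
print(storage_firewall_allows(("prod-vnet", "data-subnet"), allowed))  # False
```

Adding `("prod-vnet", "data-subnet")` to the allowed set makes the second call return True, which corresponds to adding the data-subnet to the Storage Account's virtual network rules.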

Answer Key — Scenario 4

Answer: B

The correct sequence is: 4 → 2 → 5 → 3 → 1.

The progressive diagnostic reasoning follows this logic:

Step 4: Before using any Network Watcher diagnostic tool, it's necessary to confirm that the VM is Running and that the agent is active. Without the agent, Connection Troubleshoot will fail with an infrastructure error, not a network error.
Step 2: With the agent confirmed, execute Connection Troubleshoot directly to the external endpoint on port 443. This tool aggregates multiple checks in a single execution.
Step 5: Analyze the result. If the problem is identified as NSG or routing, investigation deepens into that specific layer.
Step 3: With the result pointing to NSG, check outbound rules for the subnet and NIC NSG.
Step 1: If NSG is correct, check the Route Table to identify if the 0.0.0.0/0 route next hop is valid.

Alternative D seems reasonable but inverts the logic by manually checking NSG and routing before using the diagnostic tool that already performs this analysis in an integrated way, duplicating effort and increasing the risk of human error in reading rules.
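
The Route Table check in Step 1 can be pictured with a minimal Python sketch of longest-prefix-match route selection. It is a simplification under stated assumptions: `select_route` is a hypothetical helper, the route entries and the destination address are invented for illustration, and tie-breaking between user-defined, BGP, and system routes is not modeled.

```python
from ipaddress import ip_address, ip_network


def select_route(routes, destination):
    """Return the (prefix, next_hop) pair with the longest matching prefix.

    `routes` is a list of (prefix, next_hop) pairs. Returns None when no
    prefix contains the destination.
    """
    dest = ip_address(destination)
    candidates = [(ip_network(prefix), hop) for prefix, hop in routes
                  if dest in ip_network(prefix)]
    if not candidates:
        return None
    network, hop = max(candidates, key=lambda c: c[0].prefixlen)
    return (str(network), hop)


# Hypothetical custom Route Table for the frontend-subnet.
routes = [
    ("10.0.0.0/16", "VnetLocal"),
    ("0.0.0.0/0", "None"),  # a next hop of None drops matching traffic
]

print(select_route(routes, "198.51.100.7"))  # ('0.0.0.0/0', 'None')
print(select_route(routes, "10.0.1.4"))      # ('10.0.0.0/16', 'VnetLocal')
```

In this example the external destination falls under the 0.0.0.0/0 entry, whose next hop drops the traffic, which is the kind of finding Step 1 is meant to surface.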

Troubleshooting Tree: Troubleshoot Network Connectivity

Color Legend:

Dark blue: initial symptom, investigation entry point
Blue: diagnostic question, observation-based decision
Red: identified cause
Green: corrective action or resolution state
Orange: intermediate validation or diagnostic tool

To use this tree when facing a real problem, start at the root node and answer each question based on what is observable in the environment: VM state in the portal, IP Flow Verify result, Route Tables configuration, subnet list in service firewall, and VNet Peering status. Each answer eliminates a set of hypotheses and directs to the cause with the fewest possible steps. Never skip VM state verification before triggering network diagnostic tools.
