
Theoretical Foundation: Troubleshoot network connectivity


1. Initial Intuition

Imagine you call someone and the call doesn't connect. To find the problem, you follow a progressive logic: does your phone have signal? Is the number correct? Is the other person's line active? Is call blocking configured? Each check eliminates a hypothesis and points to the next one.

Diagnosing network connectivity in Azure follows exactly this structured reasoning. When a VM cannot communicate with another, when a user cannot access a service, or when an application is timing out, the cause can be at any point in the path: an NSG blocking traffic, an incorrect route, a disconnected peering, a firewall dropping packets, DNS resolving to the wrong address, or simply a destination service that is not listening on the expected port.

Network connectivity troubleshooting in Azure is the ability to use a set of tools and a systematic methodology to locate exactly where in the path communication is being interrupted.


2. Context

All concepts studied in previous modules converge here: VNets, subnets, peerings, NSGs, UDRs, public IPs. A connectivity problem can originate from any of these components.

Azure Network Watcher is the central network diagnostic service. It contains a set of specific tools for each type of problem. Understanding what each tool does and when to use it is the core of this module.


3. Building the Concepts

3.1 Azure Network Watcher

Network Watcher is a regional service that must be enabled in each region where you want to use its diagnostic tools. It is enabled automatically when you create or update a VNet in a region (as the NetworkWatcher_<region> resource in the NetworkWatcherRG resource group), but it can be missing if automatic enablement was opted out of or the resource was deleted.

Path: Monitor > Network Watcher or search for "Network Watcher" in the portal.

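Network Watcher organizes its tools into three categories: monitoring (Topology, Connection Monitor), network diagnostics (IP Flow Verify, Next Hop, Effective Security Rules, Connection Troubleshoot, Packet Capture, VPN Troubleshoot), and traffic (NSG Flow Logs, Traffic Analytics).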

3.2 Main Tools and What Each One Answers

| Tool | Question it answers | Layer |
|---|---|---|
| IP Flow Verify | "Would this packet be allowed or blocked by an NSG?" | L4: NSG |
| Next Hop | "Where does this traffic go? Which route is being used?" | L3: Routing |
| Effective Security Rules | "Which NSG rules are currently active on this NIC?" | L4: NSG |
| Connection Troubleshoot | "Does this TCP/ICMP connection work between these two points?" | L3-L7: End-to-end |
| VPN Troubleshoot | "Why is the VPN Gateway or VPN connection having problems?" | VPN |
| Packet Capture | "What exactly is passing through this NIC?" | L2-L7: Raw capture |
| NSG Flow Logs | "What traffic passed (or was blocked) in recent days?" | L4: History |
| Topology | "What does the network topology of this region look like?" | Overview |
| Connection Monitor | "Does this connection keep working over time?" | Continuous monitoring |

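3.3 Diagnostic Methodology: From Simple to Specific

Before opening tools, it's important to have a methodology: random investigation wastes time. The most effective model is to divide the problem into layers and check them in order: NSG rules first (IP Flow Verify, Effective Security Rules), then routing (Next Hop), then any firewall or NVA in the path, and finally the destination VM's operating system and service (Connection Troubleshoot, Packet Capture).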

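4. Structural View

The Path of a Packet and Where It Can Be Blocked

Before a packet is even sent, DNS must resolve the destination name to the right address. On its way from one VM to another, the packet then passes through the NSGs on the source NIC and subnet, the route table (system routes and UDRs), any firewall or NVA that a route sends it through, the peering or gateway that connects the networks, the NSGs on the destination subnet and NIC, and finally the destination VM's operating system firewall and the service listening on the port. Each of these is a potential blocking point, and each one has a tool suited to diagnosing it.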


5. Practical Operation

Tool 1: IP Flow Verify

What it does: checks whether a packet with given parameters (protocol, port, direction, source and destination IP) would be allowed or denied by the NSG rules applied to a specific NIC.

Path: Network Watcher > IP Flow Verify

Required parameters:

  • VM (and its NIC)
  • Direction: Inbound or Outbound
  • Protocol: TCP or UDP
  • Local IP (from the VM)
  • Local port
  • Remote IP (source or destination)
  • Remote port

Result: indicates "Access Allowed" or "Access Denied", and which specific NSG rule made the decision.

# Via CLI (--resource-group is the VM's resource group)
az network watcher test-ip-flow \
--resource-group rg-producao \
--vm vm-web \
--nic nic-vm-web \
--direction Inbound \
--protocol TCP \
--local 10.0.1.4:80 \
--remote 40.68.100.50:12345

Non-obvious behavior: IP Flow Verify only checks NSGs. It doesn't consider Azure Firewall, UDRs, or VM operating system firewalls. If the answer is "Access Allowed" but the connection still fails, the problem is at another layer.

Tool 2: Next Hop

What it does: for a source VM and a destination IP, it reports the next hop type, the next-hop IP address (when there is one), and the route table that supplied the route.

az network watcher show-next-hop \
--resource-group rg-producao \
--vm vm-app \
--source-ip 10.0.1.4 \
--dest-ip 8.8.8.8

Typical result:

{
  "nextHopIpAddress": "",
  "nextHopType": "Internet",
  "routeTableId": "System Route"
}

If the next hop type is None, the traffic is being dropped by routing (a blackhole). If Next Hop returns an unexpected IP address, a UDR is redirecting the traffic to the wrong place, for example to a firewall or NVA you didn't expect.
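Next Hop's answer comes from Azure's route selection: the route with the longest matching prefix wins, and when a UDR, a BGP route, and a system route share the same prefix, the UDR is preferred, then BGP, then the system route. The sketch below models only that rule (it ignores special cases described in the Azure routing documentation). The route table, the NVA address, and the `source` labels (mimicking the Default/User/VirtualNetworkGateway sources shown in effective routes) are hypothetical.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    next_hop_type: str           # "VnetLocal", "Internet", "VirtualAppliance", "None", ...
    next_hop_ip: Optional[str]   # set only for VirtualAppliance in this sketch
    source: str                  # "User" (UDR), "VirtualNetworkGateway" (BGP), "Default" (system)


# Same prefix length: UDR beats BGP, which beats the system route.
SOURCE_RANK = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}


def select_route(routes: List[Route], dest_ip: str) -> Optional[Route]:
    """Longest prefix match first, then route source as the tie-breaker."""
    dest = ip_address(dest_ip)
    matches = [r for r in routes if dest in ip_network(r.prefix)]
    if not matches:
        return None
    return min(matches, key=lambda r: (-ip_network(r.prefix).prefixlen, SOURCE_RANK[r.source]))


# Hypothetical effective routes for vm-app's NIC
routes = [
    Route("10.0.0.0/16", "VnetLocal", None, "Default"),
    Route("0.0.0.0/0", "Internet", None, "Default"),
    Route("0.0.0.0/0", "VirtualAppliance", "10.0.100.4", "User"),  # Internet traffic via an NVA
    Route("10.0.2.0/24", "None", None, "User"),                    # blackhole route
]

for dest in ("10.0.1.4", "10.0.2.4", "8.8.8.8"):
    route = select_route(routes, dest)
    print(dest, "->", route.next_hop_type, route.next_hop_ip or "")
```

In this table, 8.8.8.8 goes to the NVA rather than straight to the Internet, and 10.0.2.4 hits the blackhole: exactly the two "unexpected" answers described above.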

Tool 3: Effective Security Rules

What it does: displays all NSG rules currently in effect on a NIC, covering both the NSG associated with the NIC and the one associated with its subnet, with the priority of each rule. It distinguishes Azure's default rules from user-created rules.

# Effective NSG rules for a NIC (the NIC must be attached to a running VM)
az network nic list-effective-nsg \
--resource-group rg-producao \
--name nic-vm-web

Difference from IP Flow Verify: Effective Security Rules shows all rules (complete view); IP Flow Verify simulates a specific packet and tells which rule applies. Use Effective Security Rules for auditing; use IP Flow Verify for diagnosing a specific flow.

Tool 4: Connection Troubleshoot

What it does: tests end-to-end connectivity between a source (an Azure VM) and a destination (a VM, URL, or IP address). It checks whether a TCP or ICMP connection (or an HTTP/HTTPS request) succeeds and, if it fails, indicates where along the path the problem occurs.

az network watcher test-connectivity \
--resource-group rg-producao \
--source-resource /subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-app \
--dest-address 10.0.2.4 \
--dest-port 1433 \
--protocol Tcp

Result includes:

  • connectionStatus: Reachable or Unreachable
  • avgLatencyInMs: average latency
  • hops: the list of hops with the status at each one, showing where the block occurs

# Test connectivity to an external URL
az network watcher test-connectivity \
--resource-group rg-producao \
--source-resource /subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-app \
--dest-address "https://www.microsoft.com" \
--dest-port 443 \
--protocol Https

Limitation: Connection Troubleshoot requires the Network Watcher Agent VM extension (publisher Microsoft.Azure.NetworkWatcher) on the source VM. The extension type is NetworkWatcherAgentWindows on Windows VMs and NetworkWatcherAgentLinux on Linux VMs; it is often installed under the name AzureNetworkWatcherExtension. The portal prompts you to install it if needed.
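Because `test-connectivity` returns JSON, the `hops` list can be searched programmatically for the first hop that reports a problem. This sketch assumes each hop carries `type`, `address`, and an `issues` list whose entries have `origin`, `severity`, and `type` (for example `NetworkSecurityRule`, `UserDefinedRoute`, or `GuestFirewall`); treat that shape as an assumption to verify against real output. The sample result is hypothetical.

```python
from typing import Dict, Optional, Tuple


def first_blocking_hop(result: Dict) -> Optional[Tuple[int, str, str, str]]:
    """Return (hop index, hop type, address, issue type) for the first hop
    with an Error-severity issue, or None if no hop reports one."""
    for index, hop in enumerate(result.get("hops", [])):
        for issue in hop.get("issues", []):
            if issue.get("severity") == "Error":
                return index, hop.get("type", ""), hop.get("address", ""), issue.get("type", "")
    return None


# Hypothetical result shaped like `az network watcher test-connectivity` output
result = {
    "connectionStatus": "Unreachable",
    "hops": [
        {"type": "Source", "address": "10.0.1.4", "issues": []},
        {"type": "VirtualAppliance", "address": "10.0.100.4",
         "issues": [{"origin": "Outbound", "severity": "Error", "type": "NetworkSecurityRule"}]},
        {"type": "VirtualMachine", "address": "10.0.2.4", "issues": []},
    ],
}

if result["connectionStatus"] != "Reachable":
    print(first_blocking_hop(result))
```

Stopping at the first Error keeps the report focused on where the path first breaks; later hops are often downstream symptoms.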

Tool 5: VPN Troubleshoot

What it does: diagnoses problems in VPN Gateways and site-to-site VPN connections. It analyzes detailed gateway logs and returns a report with the identified problem.

az network watcher troubleshooting start \
--resource-group rg-networking \
--resource vpn-gw-prod \
--resource-type vnetGateway \
--storage-account <storage-account-id> \
--storage-path "https://minhaconta.blob.core.windows.net/diagnostics"

VPN diagnosis saves results to a Storage Account (mandatory) and can take several minutes. The result includes a status (Healthy, NotConnected, Unknown) and problem details.

Tool 6: Packet Capture

What it does: captures network traffic on a VM NIC and saves it to a .cap file that Wireshark can open. It's the last-resort tool when the others haven't identified the problem.

# --time-limit is in seconds; on Linux VMs a local --file-path must start with /var/captures
az network watcher packet-capture create \
--resource-group rg-producao \
--vm vm-web \
--name capture-vm-web-01 \
--storage-account /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Storage/storageAccounts/diagnosticstore \
--file-path /var/captures/capture-vm-web-01.cap \
--time-limit 120 \
--filters '[{"protocol": "TCP", "remotePort": "80"}]'

Key points:

  • Requires Network Watcher Agent on the VM
  • Runs through the Network Watcher Agent extension and is started remotely, so you don't need to sign in to the VM or install a capture tool yourself
  • Can be filtered by protocol, port, IP to reduce volume
  • Time or file size limit automatically stops capture

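Tool 7: NSG Flow Logs

What it does: logs the IP flows (allowed and denied) passing through an NSG, with timestamps, IPs, ports, protocol, and decision. It's a historical log, not a real-time diagnostic tool. Microsoft has announced the retirement of NSG flow logs in favor of virtual network flow logs (new NSG flow logs can no longer be created after June 30, 2025), so check the current status before planning new deployments around them.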

Versions:

  • Version 1: logs basic flow (allowed/denied, IPs, ports)
  • Version 2: adds bytes and packets information per flow

Logs are saved to a Storage Account in JSON format and can be integrated with Traffic Analytics (Log Analytics) for visualization and KQL queries.

# Enable NSG Flow Logs version 2
az network watcher flow-log create \
--resource-group NetworkWatcherRG \
--name flow-log-nsg-backend \
--nsg /subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Network/networkSecurityGroups/nsg-subnet-backend \
--storage-account /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Storage/storageAccounts/flowlogstore \
--enabled true \
--format JSON \
--log-version 2 \
--retention 30 \
--traffic-analytics true \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/law-monitoring \
--interval 10
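The JSON written to the Storage Account nests flow tuples under `records[].properties.flows[].flows[].flowTuples`, with the NSG rule name one level up. A version 2 tuple is a comma-separated string: Unix timestamp, source IP, destination IP, source port, destination port, protocol (T/U), direction (I/O), decision (A/D), flow state (B = begin, C = continuing, E = end), then packets and bytes source-to-destination and packets and bytes destination-to-source (empty on B tuples). The minimal parser below assumes that layout; the sample record is hypothetical.

```python
import json
from typing import Iterator, NamedTuple, Optional


class FlowTuple(NamedTuple):
    rule: str
    timestamp: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str   # "T" (TCP) or "U" (UDP)
    direction: str  # "I" (inbound) or "O" (outbound)
    decision: str   # "A" (allowed) or "D" (denied)
    state: str      # "B", "C" or "E" (version 2)
    bytes_src_to_dst: Optional[int]
    bytes_dst_to_src: Optional[int]


def _int_or_none(value: str) -> Optional[int]:
    return int(value) if value else None


def parse_flow_log(document: dict) -> Iterator[FlowTuple]:
    for record in document.get("records", []):
        for rule_group in record["properties"]["flows"]:
            for mac_group in rule_group["flows"]:
                for raw in mac_group["flowTuples"]:
                    f = raw.split(",")  # keeps trailing empty fields
                    yield FlowTuple(
                        rule=rule_group["rule"], timestamp=int(f[0]),
                        src_ip=f[1], dst_ip=f[2], src_port=int(f[3]), dst_port=int(f[4]),
                        protocol=f[5], direction=f[6], decision=f[7], state=f[8],
                        bytes_src_to_dst=_int_or_none(f[10]),
                        bytes_dst_to_src=_int_or_none(f[12]),
                    )


sample = json.loads("""
{"records": [{"properties": {"Version": 2, "flows": [
  {"rule": "DefaultRule_DenyAllInBound", "flows": [{"mac": "000D3AF87856", "flowTuples": [
    "1710000000,203.0.113.50,10.0.1.4,51515,22,T,I,D,B,,,,"]}]},
  {"rule": "UserRule_Allow-HTTPS", "flows": [{"mac": "000D3AF87856", "flowTuples": [
    "1710000060,198.51.100.7,10.0.1.4,50000,443,T,I,A,E,12,3400,10,9800"]}]}
]}}]}
""")

for t in parse_flow_log(sample):
    if t.decision == "D":
        print(f"denied {t.src_ip}:{t.src_port} -> {t.dst_ip}:{t.dst_port} by {t.rule}")
```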

Tool 8: Connection Monitor

What it does: monitors connections continuously and periodically, with alerts when connectivity changes. Unlike Connection Troubleshoot (point-in-time), Connection Monitor runs tests at regular intervals.

Use cases:

  • Continuously monitor connectivity between application layers
  • Detect intermittent problems that don't appear in point-in-time tests
  • Measure latency over time for SLAs
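Periodic checks catch what a single test misses because they produce a series that can be summarized. The sketch below computes a failure rate, a 95th-percentile latency over the successful checks, and the number of reachable/unreachable transitions (a high count suggests flapping rather than a hard outage). It is an illustrative calculation on hypothetical samples, not Connection Monitor's own algorithm.

```python
from statistics import quantiles
from typing import List, Optional, Tuple

Sample = Tuple[bool, Optional[float]]  # (reachable, round-trip latency in ms)


def summarize_checks(samples: List[Sample]) -> dict:
    if not samples:
        raise ValueError("no samples")
    failures = sum(1 for ok, _ in samples if not ok)
    latencies = [ms for ok, ms in samples if ok and ms is not None]
    p95 = None
    if len(latencies) >= 2:
        # "inclusive" keeps the estimate within the observed min..max range
        p95 = quantiles(latencies, n=100, method="inclusive")[94]
    elif latencies:
        p95 = latencies[0]
    state_changes = sum(1 for a, b in zip(samples, samples[1:]) if a[0] != b[0])
    return {
        "checks": len(samples),
        "failure_pct": 100.0 * failures / len(samples),
        "p95_latency_ms": p95,
        "state_changes": state_changes,
    }


# Hypothetical 30-second checks over three minutes
samples = [(True, 12.0), (True, 14.0), (False, None), (True, 13.0), (False, None), (True, 15.0)]
print(summarize_checks(samples))
```

A single point-in-time test run at any of the successful moments above would have reported the connection as healthy.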

6. Implementation Methods

6.1 Azure Portal

When to use: interactive diagnosis, exploratory investigation, cases where the CLI is less convenient.

The portal offers all Network Watcher tools with a visual interface and immediate feedback. The interactive network diagram produced by the Topology tool is specific to the portal; the CLI command az network watcher show-topology returns the resources and their relationships as JSON, but not as a diagram.

Portal advantage for Topology: visually shows how VNets, subnets, VMs, NSGs and peerings are connected in the region, facilitating topology understanding before diagnosing.

6.2 Azure CLI

When to use: automated diagnostic scripts, environments without portal access, pipeline integration.

CLI covers all Network Watcher tools with JSON output that can be processed by scripts. It's the most efficient approach for batch diagnosis or automation.

Check if Network Watcher is enabled in a region:

az network watcher list --output table

Enable Network Watcher in a region:

az network watcher configure \
--resource-group NetworkWatcherRG \
--locations brazilsouth \
--enabled true
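Since Network Watcher must exist in every region you diagnose, a script can compare the regions you use against the output of `az network watcher list --output json`. This sketch assumes each entry has `location` and `provisioningState` fields; the region list and the sample output are hypothetical. It only prints the `configure` commands rather than running them.

```python
import json
from typing import Iterable, List


def regions_without_watcher(watchers_json: str, regions_in_use: Iterable[str]) -> List[str]:
    watchers = json.loads(watchers_json)
    enabled = {w["location"].lower() for w in watchers
               if w.get("provisioningState") == "Succeeded"}
    return sorted({r.lower() for r in regions_in_use} - enabled)


# Hypothetical output of: az network watcher list --output json
output = json.dumps([
    {"name": "NetworkWatcher_brazilsouth", "location": "brazilsouth", "provisioningState": "Succeeded"},
    {"name": "NetworkWatcher_eastus", "location": "eastus", "provisioningState": "Succeeded"},
])

for region in regions_without_watcher(output, ["brazilsouth", "eastus2", "westeurope"]):
    print(f"az network watcher configure --resource-group NetworkWatcherRG --locations {region} --enabled true")
```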

6.3 PowerShell

# IP Flow Verify
Test-AzNetworkWatcherIPFlow `
-NetworkWatcher (Get-AzNetworkWatcher -Location "brazilsouth") `
-TargetVirtualMachineId (Get-AzVM -Name "vm-web" -ResourceGroupName "rg-producao").Id `
-Direction "Inbound" `
-Protocol "TCP" `
-RemoteIPAddress "40.68.100.50" `
-LocalIPAddress "10.0.1.4" `
-LocalPort "80" `
-RemotePort "54321"

# Next Hop
Get-AzNetworkWatcherNextHop `
-NetworkWatcher (Get-AzNetworkWatcher -Location "brazilsouth") `
-TargetVirtualMachineId (Get-AzVM -Name "vm-app" -ResourceGroupName "rg-producao").Id `
-SourceIPAddress "10.0.1.4" `
-DestinationIPAddress "8.8.8.8"

7. Control and Security

Required Permissions for Network Watcher

| Operation | Minimum permission |
|---|---|
| IP Flow Verify, Next Hop, Effective Security Rules | Network Contributor on the VM and network resources |
| Packet Capture | Network Contributor + access to the Storage Account |
| NSG Flow Logs | Network Contributor + Storage Blob Data Contributor on the Storage Account |
| Connection Troubleshoot | Network Contributor + access to the source VM |
| VPN Troubleshoot | Network Contributor on the gateway + access to the Storage Account |

NSG Flow Logs and Sensitive Data

NSG Flow Logs record the source and destination IP addresses of all traffic. In environments that process personal data (LGPD/GDPR), these logs may contain sensitive information. Consider:

  • Retention no longer than necessary (it's configurable; 90 days is typically enough for analysis)
  • Storage Account encryption
  • Restricted access via RBAC to Storage containing logs

8. Decision Making

Which tool to use for each symptom?

| Symptom | Initial tool | Next step if not resolved |
|---|---|---|
| VM not responding on port X | IP Flow Verify (check NSG) | Connection Troubleshoot (check the complete path) |
| Traffic going to the wrong place | Next Hop (check route) | Effective Security Rules (check whether an NSG interferes) |
| "Works sometimes, not always" | Connection Monitor (continuous monitoring) | NSG Flow Logs (flow history) |
| VPN doesn't connect | VPN Troubleshoot | Check IKE/IPsec configuration |
| Need to see what's actually going through | Packet Capture | Analyze the capture file in Wireshark |
| Traffic security audit | NSG Flow Logs + Traffic Analytics | KQL queries in Log Analytics |
| Don't know where to start | Topology (overview) | IP Flow Verify + Next Hop in sequence |

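Is the problem in the OS or in the Azure network?

If IP Flow Verify allows the flow in both directions (Outbound at the source, Inbound at the destination), Next Hop shows the expected route with no unexpected firewall or NVA in the path, and Connection Troubleshoot still reports the destination as unreachable, the remaining suspects are inside the VM: the operating system firewall (for example, Windows Firewall) or a service that is not listening on the expected port.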

9. Best Practices

Enable NSG Flow Logs on all production NSGs from the start: historical logs are invaluable for retroactive diagnosis. When someone reports that "it happened yesterday at 2 PM," without Flow Logs it's impossible to see what occurred. The storage cost of logs is low compared to their diagnostic value.

Use Traffic Analytics for pattern visualization: raw NSG Flow Logs are difficult to analyze. With Traffic Analytics sending the data to Log Analytics, you get visual flow dashboards and can write KQL queries to investigate specific patterns.

Treat network diagnosis as code: standardized diagnostic scripts that run IP Flow Verify, Next Hop, and Connection Troubleshoot in sequence for an IP pair reduce diagnosis time and ensure no step is forgotten.

Install Network Watcher Agent on all production VMs: without the agent, Connection Troubleshoot and Packet Capture don't work. Install the agent at VM creation time via extension, don't wait until you need it.

Document expected network topology: having a reference diagram of expected topology makes it easier to identify when Next Hop returns an unexpected value. Without reference, it's hard to know if a specific next hop is correct or wrong.


10. Common Errors

Using IP Flow Verify and concluding that "NSG is ok" without checking other layers

IP Flow Verify says the packet would be allowed by the NSG. The administrator concludes the problem isn't in the Azure network. But the connection still fails because there's an Azure Firewall in the path (UDR redirects to the Firewall) with a rule blocking the traffic. IP Flow Verify doesn't analyze Azure Firewall, only NSGs. The complete sequence should include Next Hop to check if there's an NVA/Firewall in the path.

Not checking the correct direction in IP Flow Verify

The administrator checks IP Flow Verify in the Inbound direction to the destination VM and finds "Allowed". But the problem is in the Outbound traffic from the source VM (an NSG on the source subnet is blocking outbound). It's necessary to check both directions: Outbound at the source AND Inbound at the destination.

Testing with Connection Troubleshoot without Network Watcher Agent installed

The tool returns an "agent not found" error or fails silently. The administrator interprets this as a network problem when it's actually the absence of the agent. Always confirm the agent is installed before using Connection Troubleshoot.

Confusing "Access Allowed" in IP Flow Verify with "connection works"

"Access Allowed" means the NSG would permit the traffic. It doesn't mean the connection will work: a UDR may redirect the traffic to a firewall that blocks it, a route may blackhole it, or the VM's own firewall (such as Windows Firewall) may drop it. IP Flow Verify is just one piece of the diagnosis.

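Treating NSG Flow Logs as a complete, packet-level record

NSG Flow Logs record flow tuples (addresses, ports, protocol, decision and, in version 2, packet and byte counters), not packet contents, and they shouldn't be assumed to contain every flow without gaps, especially during high-traffic periods. For security forensics where every packet matters, use Packet Capture together with Flow Logs.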


11. Operation and Maintenance

KQL Queries for Traffic Analytics

With NSG Flow Logs and Traffic Analytics enabled, queries in Log Analytics reveal patterns:

Traffic blocked by NSG in the last 24 hours:

AzureNetworkAnalytics_CL
| where TimeGenerated > ago(24h)
| where SubType_s == "FlowLog"
| where FlowStatus_s == "D" // D = Denied
| summarize count() by NSGList_s, NSGRule_s, DestIP_s, DestPort_d
| top 20 by count_

Top 10 traffic sources for a specific VM:

AzureNetworkAnalytics_CL
| where TimeGenerated > ago(1h)
| where SubType_s == "FlowLog"
| where DestIP_s == "10.0.1.4"
| summarize Bytes = sum(OutboundBytes_d) by SrcIP_s
| top 10 by Bytes

Flows from a specific IP, allowed or denied (incident investigation):

AzureNetworkAnalytics_CL
| where TimeGenerated between (datetime(2024-03-01) .. datetime(2024-03-02))
| where SubType_s == "FlowLog"
| where SrcIP_s == "203.0.113.50"
| project TimeGenerated, SrcIP_s, DestIP_s, DestPort_d, FlowStatus_s, NSGList_s, NSGRule_s
| sort by TimeGenerated asc

Latency Diagnosis Between Regions

For latency problems between VMs in different regions (Global VNet Peering), Connection Monitor with continuous measurement is the most suitable tool:

# Create a Connection Monitor between VMs in different regions (--frequency is in seconds)
az network watcher connection-monitor create \
--name cm-prod-dr-connectivity \
--location brazilsouth \
--endpoint-source-name vm-prod \
--endpoint-source-resource-id /subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-prod \
--endpoint-dest-name vm-dr \
--endpoint-dest-resource-id /subscriptions/<sub-id>/resourceGroups/rg-dr/providers/Microsoft.Compute/virtualMachines/vm-dr \
--test-config-name tcp-443 \
--protocol Tcp \
--tcp-port 443 \
--frequency 30

Important Limits

| Item | Limit |
|---|---|
| Active Packet Capture sessions per region | 10 simultaneous |
| Maximum Packet Capture duration | 5 hours |
| Maximum Packet Capture file size | 1 GB |
| NSG Flow Logs maximum retention | 365 days |
| Connection Monitor minimum test interval | 30 seconds |

12. Integration and Automation

Automated Diagnosis with Azure Functions

For environments where connectivity problems are frequent, an Azure Function can run diagnostics automatically in response to alerts: when an alert fires, the function runs IP Flow Verify, Next Hop, and Connection Troubleshoot for the affected source and destination and records the results in a report.

This can turn a manual diagnosis that takes 20-30 minutes into a report generated automatically as soon as the alert is triggered.

Integration with Microsoft Sentinel

NSG Flow Logs sent to Log Analytics can be connected to Microsoft Sentinel for security analysis and anomaly detection:

```kql
// Detect port scanning (many different ports from the same IP)
AzureNetworkAnalytics_CL
| where TimeGenerated > ago(1h)
| where SubType_s == "FlowLog"
| where FlowStatus_s == "D"
| summarize PortsAttempted = dcount(DestPort_d) by SrcIP_s
| where PortsAttempted > 20
| sort by PortsAttempted desc
```

Systematic Diagnosis Script

A script that executes the complete diagnosis sequence for a source-destination pair:

#!/bin/bash
SOURCE_VM="vm-app"
SOURCE_IP="10.0.1.4"
DEST_VM="vm-db"      # hypothetical name of the destination VM
DEST_IP="10.0.2.4"
DEST_PORT="1433"
RG="rg-producao"     # resource group of both VMs

# First NIC of each VM (--nic matters when a VM has more than one NIC)
SOURCE_NIC=$(az vm show --name "$SOURCE_VM" --resource-group "$RG" \
--query "networkProfile.networkInterfaces[0].id" -o tsv)
DEST_NIC=$(az vm show --name "$DEST_VM" --resource-group "$RG" \
--query "networkProfile.networkInterfaces[0].id" -o tsv)

echo "=== 1. IP Flow Verify (Outbound from source) ==="
az network watcher test-ip-flow \
--resource-group "$RG" \
--vm "$SOURCE_VM" \
--nic "$SOURCE_NIC" \
--direction Outbound \
--protocol TCP \
--local "$SOURCE_IP:12345" \
--remote "$DEST_IP:$DEST_PORT"

echo "=== 2. IP Flow Verify (Inbound at destination) ==="
az network watcher test-ip-flow \
--resource-group "$RG" \
--vm "$DEST_VM" \
--nic "$DEST_NIC" \
--direction Inbound \
--protocol TCP \
--local "$DEST_IP:$DEST_PORT" \
--remote "$SOURCE_IP:12345"

echo "=== 3. Next Hop from source to destination ==="
az network watcher show-next-hop \
--resource-group "$RG" \
--vm "$SOURCE_VM" \
--source-ip "$SOURCE_IP" \
--dest-ip "$DEST_IP"

echo "=== 4. Connection Troubleshoot ==="
az network watcher test-connectivity \
--resource-group "$RG" \
--source-resource "$SOURCE_VM" \
--dest-address "$DEST_IP" \
--dest-port "$DEST_PORT" \
--protocol Tcp
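The outbound IP Flow Verify, Next Hop, and Connection Troubleshoot steps from the script above can also be driven from Python, which makes it easier to collect the JSON results for a report or a pipeline step. This is a sketch under stated assumptions: it builds the `az` argument lists and, only when `run=True`, executes them with `subprocess`, which requires the Azure CLI to be installed and signed in. The resource names are the same illustrative ones used in the script.

```python
import json
import subprocess
from typing import Dict, List


def diagnosis_commands(rg: str, source_vm: str, source_ip: str,
                       dest_ip: str, dest_port: int) -> Dict[str, List[str]]:
    """Build the az CLI argument lists for the checks, in execution order."""
    return {
        "ip_flow_outbound": [
            "az", "network", "watcher", "test-ip-flow",
            "--resource-group", rg, "--vm", source_vm,
            "--direction", "Outbound", "--protocol", "TCP",
            "--local", f"{source_ip}:12345", "--remote", f"{dest_ip}:{dest_port}",
        ],
        "next_hop": [
            "az", "network", "watcher", "show-next-hop",
            "--resource-group", rg, "--vm", source_vm,
            "--source-ip", source_ip, "--dest-ip", dest_ip,
        ],
        "connectivity": [
            "az", "network", "watcher", "test-connectivity",
            "--resource-group", rg, "--source-resource", source_vm,
            "--dest-address", dest_ip, "--dest-port", str(dest_port), "--protocol", "Tcp",
        ],
    }


def run_diagnosis(commands: Dict[str, List[str]], run: bool = False) -> Dict[str, object]:
    results: Dict[str, object] = {}
    for step, argv in commands.items():  # dicts preserve insertion order
        if not run:
            results[step] = " ".join(argv)  # dry run: show what would execute
            continue
        proc = subprocess.run(argv + ["--output", "json"], capture_output=True, text=True)
        results[step] = json.loads(proc.stdout) if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    cmds = diagnosis_commands("rg-producao", "vm-app", "10.0.1.4", "10.0.2.4", 1433)
    for step, value in run_diagnosis(cmds).items():
        print(f"{step}: {value}")
```

The dry-run default lets the sequence be reviewed or unit-tested in a pipeline before it touches a subscription.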

13. Final Summary

Essential points:

  • Network Watcher is the central network diagnostics service in Azure. It needs to be enabled per region.
  • Each tool diagnoses a specific layer: IP Flow Verify (NSG), Next Hop (routing), Connection Troubleshoot (end-to-end), Packet Capture (raw), NSG Flow Logs (historical).
  • The correct methodology is progressive: check NSG, then routing, then firewall/NVA, then destination VM OS.
  • NSG Flow Logs is a historical monitoring tool, not real-time diagnosis. Enable preventively.

Critical differences:

  • IP Flow Verify checks only NSGs. Doesn't analyze Azure Firewall, UDRs, or OS firewalls. "Allowed" doesn't mean the connection works.
  • Connection Troubleshoot vs. Connection Monitor: Troubleshoot is point-in-time (executes once); Monitor is continuous (executes periodically with alerts).
  • Effective Security Rules vs. IP Flow Verify: Effective Security Rules shows all active rules (complete view); IP Flow Verify simulates a specific packet and says which rule decides.
  • Next hop type "None" means a blackhole: the matching route explicitly drops the traffic. It is an active routing decision, not the absence of a route.

What needs to be remembered:

  • Connection Troubleshoot and Packet Capture require the Network Watcher Agent installed on the VM. Install as extension during provisioning.
  • The most efficient diagnosis order: IP Flow Verify → Next Hop → Connection Troubleshoot → Packet Capture.
  • NSG Flow Logs save to Storage Account and can be integrated with Log Analytics via Traffic Analytics for KQL queries.
  • VPN Troubleshoot saves results to Storage Account (mandatory) and can take several minutes to complete the analysis.
  • For intermittent problems, Connection Monitor with continuous monitoring is much more effective than point-in-time tests.
  • IP Flow Verify evaluates one direction per test: always test Outbound at the source AND Inbound at the destination.