Troubleshooting Lab: Create and configure network security groups (NSGs) and application security groups

Diagnostic Scenarios

Scenario 1 - Root Cause

A development team reports that application VMs cannot communicate with database VMs on port 5432. The environment uses ASGs for segmentation: asg-app for application VMs and asg-db for database VMs. All VMs are in the same VNet, in distinct subnets.

The administrator checks the NSG associated with the database subnet and finds the following inbound rules:

Priority | Name             | Source         | Destination | Port | Action
---------|------------------|----------------|-------------|------|-------
100      | Allow-App-To-DB  | asg-app        | asg-db      | 5432 | Allow
200      | Allow-Monitoring | 10.0.10.5/32   | asg-db      | 9100 | Allow
65000    | AllowVnetInBound | VirtualNetwork | Any         | Any  | Allow
65500    | DenyAllInBound   | Any            | Any         | Any  | Deny

The administrator confirms that the application VM NICs are associated with asg-app and the database VM NICs are associated with asg-db. The NSG is correctly associated with the database subnet. The team mentions that the problem started after a VM migration to new SKU sizes performed the previous week, but there were no changes to the NSG rules.

What is the root cause of the connectivity failure?

A) The AllowVnetInBound rule with priority 65000 is being overridden by the Allow-App-To-DB rule, creating a conflict between ASG and service tag in the same NSG.

B) During the SKU migration, the VM NICs were recreated and lost their association with the ASG, causing traffic from application VMs to no longer match the Allow-App-To-DB rule.

C) The Allow-Monitoring rule is consuming processing slots and causing timeouts on port 5432 connections.

D) The NSG associated with the subnet does not evaluate ASG-based rules for traffic between distinct subnets; the NSG should be associated with the database VM NICs.


Scenario 2 - Action Decision

The security team identified that an NSG rule created six months ago allows unrestricted access to port 22 (SSH) from any source for a group of development VMs:

Priority | Name          | Source | Destination | Port | Action
---------|---------------|--------|-------------|------|-------
100      | Allow-SSH-All | Any    | Any         | 22   | Allow

The cause is known: the rule was created temporarily for a project and was never removed. The NSG is associated with the development subnet, which contains 12 active VMs used by three different teams during business hours. It's 2 PM on a Friday. The security team requires that unrestricted access be eliminated today. The development teams have not been notified and there is no scheduled maintenance window.

What is the correct action to take at this moment?

A) Immediately delete the Allow-SSH-All rule and create a new rule allowing SSH only from known corporate IP addresses, without prior notification, as the security risk justifies immediate action.

B) Change the priority of the Allow-SSH-All rule to 4096, reducing its effectiveness without removing it, while waiting for a maintenance window next week.

C) Notify the development teams immediately, create the restrictive rule in parallel at a lower precedence than the existing rule (a priority number above 100, since 100 is the lowest value an NSG accepts), validate that legitimate connections continue working, and only then remove the Allow-SSH-All rule.

D) Associate the NSG to a specific NIC instead of the subnet to limit the rule's scope while the problem is evaluated more calmly next week.


Scenario 3 - Root Cause

An administrator receives the following alert from the monitoring system at 3 AM:

ALERT: VM 'vm-webfront-03' unreachable on port 443
Source: Azure Monitor
Time: 03:12 UTC
Affected resource: /subscriptions/.../vm-webfront-03
Last successful probe: 02:47 UTC

The administrator accesses the portal and uses the IP flow verify tool from Network Watcher with the following parameters:

VM:             vm-webfront-03
Direction:      Inbound
Protocol:       TCP
Local address:  10.0.1.15
Local port:     443
Remote address: 203.0.113.42
Remote port:    54231

Result: DENY - Rule: DenyAllInBound (NSG: nsg-frontend)

The administrator checks the nsg-frontend NSG and finds the rules below:

Priority | Name           | Source | Destination | Port | Action
---------|----------------|--------|-------------|------|-------
100      | Allow-HTTPS    | Any    | Any         | 443  | Allow
200      | Allow-HTTP     | Any    | Any         | 80   | Allow
65500    | DenyAllInBound | Any    | Any         | Any  | Deny

The VM vm-webfront-03 has two NICs: nic-primary (10.0.1.15) and nic-mgmt (10.0.2.30). The nsg-frontend NSG is associated with the subnet-frontend subnet (10.0.1.0/24). The administrator confirms that there were no changes to the NSG rules in the last 48 hours. The VM was working normally until 02:47 UTC.

What is the most likely root cause of the blocking?

A) The Allow-HTTPS and Allow-HTTP rules have the destination field configured as Any, which prevents the NSG from correctly associating them with the VM's IP.

B) The nsg-frontend NSG was disassociated from the subnet or was additionally associated with nic-primary, and the Allow-HTTPS rule is not present in the NSG that effectively processes the NIC traffic.

C) The VM has two NICs and Azure is routing inbound traffic through nic-mgmt (10.0.2.30), which belongs to a different subnet where an equivalent rule to Allow-HTTPS does not exist.

D) The IP flow verify returned an incorrect result because the tool does not consider rules in NSGs associated with subnets, only NSGs associated with NICs.


Scenario 4 - Diagnostic Sequence

A production VM stopped responding on port 8080 after a maintenance window in which three simultaneous activities were performed: adding a new NSG rule, associating a second NSG to the VM's NIC, and including the NIC in a new ASG.

The administrator needs to diagnose which change caused the blocking. Below are five available investigation steps:

  1. Use IP flow verify in Network Watcher to confirm which rule is blocking traffic and in which NSG it is located
  2. Check change logs via Activity Log to identify the exact sequence of changes made during maintenance
  3. Confirm if the VM can communicate on other ports to determine if the blocking is total or partial
  4. Review all rules in the NSG associated with the subnet and the NSG associated with the NIC, comparing priorities and overlaps
  5. Verify if the NIC is associated with the correct ASG and if any rule depends on this ASG as source or destination

What is the correct diagnostic sequence?

A) 2 → 1 → 3 → 4 → 5

B) 3 → 1 → 4 → 5 → 2

C) 1 → 4 → 2 → 3 → 5

D) 3 → 2 → 1 → 4 → 5


Answer Key and Explanations

Answer Key - Scenario 1

Answer: B

The central clue is in the combination of two facts: the problem started after the VM SKU migration, and there were no changes to the NSG rules. When a VM is resized to a new SKU in a way that requires NIC recreation, the association between the NIC and the ASG is lost. Without this association, traffic originating from application VMs does not match the Allow-App-To-DB rule, which uses asg-app as a source filter. The traffic then falls to the default DenyAllInBound rule.

The Allow-Monitoring rule is deliberately irrelevant: it only affects port 9100 and does not interfere with rule processing for port 5432. Its presence in the statement tempts the reader to look for a configuration problem in the rules and draws attention away from the actual event.

Alternative D describes a restriction that does not exist: NSGs associated with subnets evaluate ASG rules normally for traffic between subnets. Acting based on this alternative would lead the administrator to reorganize NSG associations without solving the real problem.
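
The matching behaviour described above can be illustrated with a minimal Python sketch. It is a simplified model, not Azure's implementation: the `Rule` class and the `first_match` function are hypothetical names, only ASG membership and destination port are modelled, and the rule list is deliberately reduced to rule 100 and the final deny (service-tag rules such as AllowVnetInBound are left out of the model).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    source_asg: Optional[str]  # None stands for "Any"
    dest_asg: Optional[str]    # None stands for "Any"
    port: Optional[int]        # None stands for "Any"
    action: str


def first_match(rules, src_asgs, dst_asgs, port):
    """Return the first rule, by ascending priority number, that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source_asg is not None and rule.source_asg not in src_asgs:
            continue
        if rule.dest_asg is not None and rule.dest_asg not in dst_asgs:
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule
    return None


# Reduced rule list (assumption: only rule 100 and the final deny are modelled).
RULES = [
    Rule(100, "Allow-App-To-DB", "asg-app", "asg-db", 5432, "Allow"),
    Rule(65500, "DenyAllInBound", None, None, None, "Deny"),
]

# Source NIC still a member of asg-app: rule 100 matches.
with_asg = first_match(RULES, {"asg-app"}, {"asg-db"}, 5432)
# Source NIC no longer in any ASG: rule 100 is skipped.
without_asg = first_match(RULES, set(), {"asg-db"}, 5432)

print(with_asg.name)     # Allow-App-To-DB
print(without_asg.name)  # DenyAllInBound
```

The point of the sketch is that the rule itself never changes between the two calls; only the ASG membership of the source NIC does, which is why reviewing the rule table alone does not reveal the problem.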


Answer Key - Scenario 2

Answer: C

The scenario presents two critical constraints: the teams are working during business hours and have not been notified. Alternative A ignores the second constraint and risks interrupting legitimate SSH sessions to the 12 VMs in use, without prior validation. Alternative B does not eliminate the security risk within the required timeframe: the rule keeps allowing SSH from any source regardless of its priority number unless a higher-priority rule denies that traffic, so the change is an incomplete action disguised as caution. Alternative D ignores the deadline imposed by the security team and does not solve the problem.

Alternative C is the only one that satisfies all constraints at once: it eliminates the risk within the required timeframe, notifies those affected, validates the continuity of legitimate access before removing the problematic rule, and follows the correct sequence of adding before removing. The most dangerous distractor is A: it seems technically correct and urgent, but removal without prior validation in an environment with active users can cause immediate interruption for teams that depend on SSH.
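
The add-before-remove sequence can be sketched as a small Python simulation. This is a model under stated assumptions rather than a procedure taken from the lab: the corporate range `198.51.100.0/24`, the client addresses, the rule name `Allow-SSH-Corp`, and the priority 110 are all hypothetical, and validation is done against the rule set as it will look after the removal, because the broad rule would otherwise mask the result.

```python
import ipaddress


def ssh_allowed(rules, source_ip):
    """First matching rule by ascending priority number decides the outcome."""
    addr = ipaddress.ip_address(source_ip)
    for _priority, _name, source, action in sorted(rules):
        if source == "Any" or addr in ipaddress.ip_network(source):
            return action == "Allow"
    return False  # nothing matched: treated as the final deny


# Hypothetical values, chosen only for illustration.
CORPORATE_RANGE = "198.51.100.0/24"
KNOWN_CLIENTS = ["198.51.100.10", "198.51.100.77"]
OUTSIDER = "203.0.113.9"

current = [(100, "Allow-SSH-All", "Any", "Allow")]
restrictive = (110, "Allow-SSH-Corp", CORPORATE_RANGE, "Allow")

# Step 1: add the restrictive rule while the broad rule is still in place.
current.append(restrictive)

# Step 2: validate against the rule set as it will exist after the removal.
candidate = [rule for rule in current if rule[1] != "Allow-SSH-All"]
validated = all(ssh_allowed(candidate, ip) for ip in KNOWN_CLIENTS)

# Step 3: remove the broad rule only if every known client still gets in.
if validated:
    current = candidate

print(ssh_allowed(current, KNOWN_CLIENTS[0]))  # True
print(ssh_allowed(current, OUTSIDER))          # False
```

If a known client fell outside the assumed range, `validated` would be `False` and the broad rule would stay in place until the restrictive rule was corrected, which is the behaviour Alternative C is after.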


Answer Key - Scenario 3

Answer: C

The decisive clue is that the VM has two NICs in different subnets. The monitoring alert and IP flow verify use the IP 10.0.1.15, which belongs to nic-primary in subnet-frontend. However, inbound traffic routing in VMs with multiple NICs depends on how routes are configured. If external traffic destined for the VM is being received by nic-mgmt (10.0.2.30), the NSG applied to that NIC's subnet is what processes the traffic, not nsg-frontend. If there is no equivalent rule to Allow-HTTPS in the subnet-mgmt NSG, blocking occurs by the default DenyAllInBound of that NSG.

The fact that there were no changes to the nsg-frontend rules in the last 48 hours rules out problems in that NSG. Alternative A describes non-existent behavior: the Any field in the destination is valid and does not prevent matching. Alternative D is false: IP flow verify considers NSGs from both subnet and NIC.
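
The idea that each NIC is filtered by the NSG of its own subnet, plus any NSG attached to the NIC itself, can be sketched in Python. This is a simplified model: for inbound traffic the subnet-level NSG is evaluated first and the NIC-level NSG second, and both must allow the flow. The function names are hypothetical, only the destination port is modelled, and the contents of the management subnet's NSG are an assumption, since the scenario does not list them.

```python
def nsg_allows(nsg, port):
    """Evaluate one NSG; None means no NSG is associated at that level."""
    if nsg is None:
        return True  # no filtering at this level
    for _priority, rule_port, action in sorted(nsg, key=lambda r: r[0]):
        if rule_port is None or rule_port == port:  # None stands for "Any"
            return action == "Allow"
    return False


def inbound_allowed(subnet_nsg, nic_nsg, port):
    """Inbound traffic must pass the subnet NSG and then the NIC NSG."""
    return nsg_allows(subnet_nsg, port) and nsg_allows(nic_nsg, port)


# Rules of nsg-frontend as listed in the scenario: (priority, port, action).
nsg_frontend = [(100, 443, "Allow"), (200, 80, "Allow"), (65500, None, "Deny")]
# Assumption: the management subnet's NSG has no rule for port 443.
nsg_mgmt = [(65500, None, "Deny")]

# Each NIC maps to (subnet-level NSG, NIC-level NSG).
nics = {
    "nic-primary": (nsg_frontend, None),
    "nic-mgmt": (nsg_mgmt, None),
}

for name, (subnet_nsg, nic_nsg) in nics.items():
    print(name, inbound_allowed(subnet_nsg, nic_nsg, 443))
# nic-primary True
# nic-mgmt False
```

The same port on the same VM yields different outcomes depending on which NIC receives the traffic, which is why the NIC and its subnet have to be identified before any rule table is reviewed.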


Answer Key - Scenario 4

Answer: B

The correct diagnostic reasoning starts from the observable symptom and works step by step toward the cause, without assuming the origin before confirming the problem. Step 3 (verify if blocking is total or partial) establishes the problem scope before any other action. Step 1 (IP flow verify) precisely identifies the blocking rule and responsible NSG, eliminating the work of manually reviewing all configurations. Step 4 (review rules and priorities of both NSGs) deepens the analysis in the identified NSG. Step 5 (verify ASG association) tests the specific ASG hypothesis. Step 2 (Activity Log) confirms the sequence of changes and ends the diagnosis with audit evidence.

Alternative D starts with step 3 correctly, but jumps to Activity Log before using IP flow verify, which means reviewing a change history without yet knowing which component to investigate. Alternative A starts with Activity Log without confirming the symptom, which wastes time on auditing before reproducing and locating the problem.
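
Step 3, checking whether the blocking is total or partial, can be approximated from a machine that has network access to the VM with a short Python sketch. It only uses the standard `socket` module; the function names `probe_ports` and `classify` are hypothetical, and a failed TCP connection here can also mean that no service is listening on that port, so the result narrows the scope rather than proving that an NSG is responsible.

```python
import socket


def probe_ports(host, ports, timeout=2.0):
    """Attempt a TCP connection to each port and record whether it succeeded."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, or unreachable
            results[port] = False
    return results


def classify(results):
    """Summarise the probe results as 'reachable', 'partial', or 'total'."""
    if all(results.values()):
        return "reachable"
    if not any(results.values()):
        return "total"
    return "partial"
```

For example, calling `classify(probe_ports(host, [22, 443, 8080]))` with the VM's address returns `"partial"` when only some of the ports answer, which points toward a port-specific rule rather than a missing association or a stopped VM.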


Troubleshooting Tree: Create and configure network security groups (NSGs) and application security groups


Legend

Color     | Node Type
----------|----------------------------------------
Dark blue | Initial symptom (entry point)
Blue      | Diagnostic question
Red       | Identified cause
Green     | Recommended action or resolution
Orange    | Intermediate verification or validation

To use this tree when facing a real problem, start with the root node describing the blocked connectivity symptom. Answer each question based on what you directly observe in the environment, without assuming the cause. Intermediate verification nodes indicate where to collect evidence before proceeding. Each path ends with a named cause and a specific action, eliminating the need to test unfounded hypotheses.