
Troubleshooting Lab: Configure an NSG for Remote Server Administration, Including Azure Bastion

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team deployed Azure Bastion with the Standard SKU in a VNet called vnet-mgmt-prod. The Bastion subnet was created with the name AzureBastionSubnet and CIDR 10.0.255.0/27, and an NSG called nsg-bastion was associated with it. The team reports that when they attempt to start an RDP session through the Azure portal to a VM in subnet 10.0.1.0/24, the connection fails immediately with the message:

Failed to connect. Please check your Bastion configuration and try again.

The analyst verifies the NSG nsg-bastion and collects the following inbound rules:

Name                    Priority  Direction  Access  Protocol  Source          Port
----------------------- --------- ---------- ------- --------- --------------- -----
Allow-HTTPS-Inbound     100       Inbound    Allow   TCP       Internet        443
Allow-GatewayMgr        200       Inbound    Allow   TCP       GatewayManager  443
DenyAll-Inbound         300       Inbound    Deny    *         *               *

And the following outbound rules:

Name                    Priority  Direction  Access  Protocol  Source          Destination     Port
----------------------- --------- ---------- ------- --------- --------------- --------------- -----
Allow-SSH-RDP-Out       100       Outbound   Allow   TCP       *               VirtualNetwork  22,3389
Allow-AzureCloud-Out    200       Outbound   Allow   TCP       *               AzureCloud      443
DenyAll-Outbound        300       Outbound   Deny    *         *               *               *

Bastion shows a provisioning state of Succeeded in the portal. The target VM is running and has an NSG that allows RDP traffic from the Bastion subnet.

What is the root cause of the problem?

A) The CIDR 10.0.255.0/27 of the AzureBastionSubnet is insufficient for the Standard SKU, which requires a subnet of at least /26.

B) The outbound rule Allow-AzureCloud-Out should use the AzureCloud service tag on port 443, but is configured with the source as * instead of AzureBastionSubnet.

C) There is a missing inbound rule that allows traffic from the AzureBastionHostCommunication service tag for communication between Bastion instances.

D) The outbound rule Allow-SSH-RDP-Out is configured correctly, but the DenyAll-Outbound rule with priority 300 is evaluated before the allow rules, blocking all traffic.


Scenario 2: Action Decision

A company's security team found that the NSG nsg-admin-vms, associated with subnet-admin (the subnet containing the administration VMs), has the following inbound rule:

Name: Allow-RDP-Any
Priority: 100
Direction: Inbound
Access: Allow
Protocol: TCP
Source: *
Destination: VirtualNetwork
Port: 3389

The cause is confirmed: this rule exposes the RDP port directly to the internet. The company has decided to move remote access to Azure Bastion as a permanent solution, but the Bastion deployment is expected to take 5 business days. The administration VMs are in active use by a team of 8 administrators working remotely, and the VNet has no VPN Gateway configured.

What is the correct action to take at this time?

A) Remove the Allow-RDP-Any rule immediately and ask administrators to wait until the Bastion deployment is complete.

B) Replace the Allow-RDP-Any rule with a rule that restricts the source to known public IP addresses of remote administrators, maintaining necessary access until Bastion is available.

C) Associate a new NSG without inbound rules to the subnet-admin to immediately block all inbound traffic.

D) Reduce the priority of the Allow-RDP-Any rule from 100 to 4000 to decrease its impact while Bastion is being deployed.


Scenario 3: Root Cause

A VM called vm-db-01 in the subnet-data of vnet-core-prod is not responding to SSH connections initiated from Azure Bastion. Bastion is properly configured in the AzureBastionSubnet with CIDR 10.0.255.0/26 and its NSG has all mandatory rules applied. The analyst verifies the NSG nsg-data, associated with subnet-data, and obtains the following inbound rules:

Name                     Priority  Direction  Access  Protocol  Source              Port
------------------------ --------- ---------- ------- --------- ------------------- -----
Allow-Bastion-SSH        100       Inbound    Allow   TCP       10.0.255.0/26       22
Allow-App-to-DB          200       Inbound    Allow   TCP       10.0.1.0/24         5432
Allow-HTTPS-Internal     300       Inbound    Allow   TCP       VirtualNetwork      443
DenyAll-Inbound          1000      Inbound    Deny    *         *                   *

The team confirms that Bastion initiates the SSH session without configuration errors in the portal, but the connection stays in a loading state and times out after 60 seconds. vm-db-01 is running, and the SSH service is active and listening on port 22, as verified through the serial console. The VM disk was expanded from 128 GB to 256 GB two days ago.

What is the root cause of the problem?

A) The Allow-Bastion-SSH rule uses CIDR 10.0.255.0/26 as source, but Bastion on Standard SKU can originate traffic from IPs outside this CIDR when using advanced session features.

B) The VM disk expansion caused a temporary instability state that prevents the SSH daemon from accepting new external connections, even with the service apparently active.

C) The NSG nsg-data does not have an explicit outbound rule allowing subnet-data to return SSH traffic to Bastion, and the default outbound rules block this return traffic.

D) The operating system firewall of vm-db-01 is blocking SSH connections originating from the Bastion subnet CIDR, despite the NSG allowing the traffic.


Scenario 4: Diagnostic Sequence

An administrator reports losing SSH access to a Linux VM called vm-ops-02 in vnet-ops-prod. Access was exclusively via Azure Bastion. No changes were made to the VM in the last 24 hours. Bastion was working normally yesterday. When attempting to start the session in the portal, the error returned is:

Connection failed. Unable to reach the virtual machine.

The available investigation steps are:

  • [P] Verify if VM vm-ops-02 is running and if its NIC is in connected state.
  • [Q] Verify if the NSG associated with the VM's NIC or subnet has an inbound rule allowing SSH on port 22 from the AzureBastionSubnet CIDR.
  • [R] Verify if Azure Bastion has operational status and if the resource was not accidentally deleted or restarted.
  • [S] Verify if the AzureBastionSubnet maintains its original CIDR and if the Bastion NSG has all mandatory rules intact.
  • [T] Use Azure serial console to verify if the SSH daemon is active and if the OS firewall was not altered.

What is the correct investigation sequence?

A) R, S, P, Q, T

B) P, Q, R, S, T

C) Q, R, P, S, T

D) S, R, Q, P, T


Answer Key and Explanations

Answer Key: Scenario 1

Answer: A

The Azure Bastion Standard SKU requires the AzureBastionSubnet to be /26 or larger, that is, at least 64 addresses. The CIDR 10.0.255.0/27 provides only 32 addresses; older Basic SKU deployments accepted this size, but it does not meet the Standard SKU requirement. The Standard SKU supports host scaling (additional instances to serve more concurrent sessions), and the undersized subnet prevents proper allocation of Bastion instances, so connections fail even though the resource appears as Succeeded in the portal.
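The size arithmetic can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the /26 threshold is the requirement stated above, and `meets_bastion_minimum` is a hypothetical helper, not part of any Azure SDK.

```python
import ipaddress

# Minimum AzureBastionSubnet size for the Standard SKU as stated in this lab (/26 = 64 addresses).
BASTION_MIN_PREFIX = 26


def meets_bastion_minimum(cidr: str, min_prefix: int = BASTION_MIN_PREFIX) -> bool:
    """Return True if the subnet is at least as large as a /min_prefix network."""
    network = ipaddress.ip_network(cidr)
    # A shorter prefix length means a larger subnet, so /25 and /24 also qualify.
    return network.prefixlen <= min_prefix


for cidr in ("10.0.255.0/27", "10.0.255.0/26"):
    net = ipaddress.ip_network(cidr)
    print(cidr, net.num_addresses, meets_bastion_minimum(cidr))
# 10.0.255.0/27 32 False
# 10.0.255.0/26 64 True
```

Note that Azure also reserves five addresses in every subnet, so a /26 leaves 59 usable addresses.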

The decisive clue is the combination of the declared Standard SKU and the /27 CIDR. The Succeeded status in the portal does not guarantee that Bastion is operational: it is the provisioning state, which reflects that the resource was created, not that it can carry sessions.

Distractor C names a service tag, AzureBastionHostCommunication, that Azure does not define, so the rule it proposes cannot be written as described. Additionally, the scenario does not mention multiple configured instances, and an immediate failure of every connection points to the SKU and subnet mismatch rather than to communication between instances.

Distractor D reflects a fundamental misunderstanding of how NSGs work: rules are evaluated in ascending priority order and processing stops at the first match, so an allow rule with priority 100 is always evaluated before a deny rule with priority 300. The distractor would only hold if the deny rule had a lower priority number than the allow rules.
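The first-match behavior can be illustrated with a deliberately simplified Python model of the outbound rules from Scenario 1. It is a sketch, not Azure's implementation: service tags are treated as plain labels rather than the IP prefix sets Azure resolves them to, protocol and source matching are left out, and `Rule` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    access: str                 # "Allow" or "Deny"
    destination: str            # service-tag label, or "*" for any
    ports: Optional[frozenset]  # None stands for "*" (any port)


def evaluate(rules, destination: str, port: int) -> tuple:
    """Return (access, rule_name) of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        dest_ok = rule.destination in ("*", destination)
        port_ok = rule.ports is None or port in rule.ports
        if dest_ok and port_ok:
            return rule.access, rule.name  # processing stops at the first match
    return "Deny", "(no match)"  # stand-in for Azure's default rules, not modeled here


# Outbound rules of nsg-bastion, listed deny-first on purpose: list order does not matter.
outbound = [
    Rule("DenyAll-Outbound", 300, "Deny", "*", None),
    Rule("Allow-SSH-RDP-Out", 100, "Allow", "VirtualNetwork", frozenset({22, 3389})),
    Rule("Allow-AzureCloud-Out", 200, "Allow", "AzureCloud", frozenset({443})),
]

print(evaluate(outbound, "VirtualNetwork", 3389))  # ('Allow', 'Allow-SSH-RDP-Out')
print(evaluate(outbound, "Internet", 80))          # ('Deny', 'DenyAll-Outbound')
```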

Acting based on distractor D would lead to creating unnecessary additional rules, without solving the undersized subnet problem that is the real cause.


Answer Key: Scenario 2

Answer: B

The critical constraint of the scenario is that the 8 administrators need continuous remote access during the 5 days until Bastion is available, and there is no VPN Gateway as an alternative. Removing the rule without replacement (distractor A) would leave the team unable to work for days, which is unacceptable. The correct solution is to reduce exposure as far as the constraints allow: limiting the RDP source to the administrators' known public IPs removes exposure to the entire internet while maintaining necessary access.
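In NSG terms the replacement rule keeps port 3389 but lists explicit source prefixes instead of `*`. The matching it performs can be sketched with Python's `ipaddress` module. The prefixes below are hypothetical placeholders taken from documentation address ranges, and `rdp_source_allowed` is an illustrative name, not an Azure API.

```python
import ipaddress

# Hypothetical public IPs of the remote administrators (RFC 5737 documentation ranges).
ADMIN_SOURCES = [ipaddress.ip_network(p) for p in ("198.51.100.10/32", "203.0.113.0/29")]


def rdp_source_allowed(source_ip: str) -> bool:
    """True if the source falls inside one of the prefixes of the restricted RDP rule."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ADMIN_SOURCES)


print(rdp_source_allowed("198.51.100.10"))  # True: a listed administrator
print(rdp_source_allowed("203.0.113.5"))    # True: inside the /29
print(rdp_source_allowed("192.0.2.77"))     # False: any other internet address
```

Traffic from a source that does not match falls through to later rules and, in the absence of another allow, to the default DenyAllInBound rule.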

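Distractor A ignores the operational continuity constraint. Distractor C is even more extreme: a subnet can have only one NSG associated, so the new NSG would replace nsg-admin-vms, and with no custom rules only Azure's default inbound rules would apply. Those defaults still allow traffic from within the virtual network and from the Azure load balancer but deny everything else, so the remote administrators would be locked out exactly as in A, along with any other legitimate inbound traffic from outside the VNet.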

Distractor D is the most dangerous because it looks like a mitigation: changing a rule's priority does not restrict its source. An Allow-RDP-Any rule with priority 4000 still allows traffic from any source unless a rule with a lower priority number matches that traffic first. If the NSG has no explicit deny rule for port 3389 with a priority number below 4000, the exposure remains identical.
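The effect of only moving the rule can be sketched with a simplified first-match evaluator in Python. Assumptions: every rule is taken to apply to TCP 3389, only Azure's default DenyAllInBound (priority 65500) is modeled among the default rules, and the addresses are hypothetical documentation-range IPs; `verdict` is an illustrative name.

```python
import ipaddress

# Each rule: (priority, name, source prefix or "*", access). Port 3389 is implied for all rules.
DENY_ALL = (65500, "DenyAllInBound", "*", "Deny")  # Azure's default inbound deny


def verdict(rules, source_ip: str) -> str:
    """Return the decision of the first rule (lowest priority number) whose source matches."""
    addr = ipaddress.ip_address(source_ip)
    for priority, name, source, access in sorted(rules + [DENY_ALL]):
        if source == "*" or addr in ipaddress.ip_network(source):
            return f"{access} via {name}"
    raise RuntimeError("unreachable: DenyAllInBound matches everything")


attacker = "192.0.2.77"  # hypothetical arbitrary internet address
print(verdict([(100, "Allow-RDP-Any", "*", "Allow")], attacker))   # Allow via Allow-RDP-Any
print(verdict([(4000, "Allow-RDP-Any", "*", "Allow")], attacker))  # Allow via Allow-RDP-Any
print(verdict([(100, "Allow-RDP-Admins", "198.51.100.10/32", "Allow")], attacker))  # Deny via DenyAllInBound
```

Only narrowing the source changes the outcome for an arbitrary address; the priority value alone does not.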


Answer Key: Scenario 3

Answer: D

NSGs are stateful: once an inbound flow is allowed, the return traffic for that flow is allowed automatically, without any outbound rule. In addition, the default outbound rules of an NSG allow traffic to VirtualNetwork and to Internet. The inbound rules of nsg-data allow SSH from the correct source, Bastion is configured correctly and initiates the session without error, and the SSH service is active on the VM. With these verifications ruling out the NSG and Bastion, the cause lies in the VM's operating system.
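Because NSGs are stateful, a reply to an admitted inbound connection is not evaluated against the outbound rules. The idea can be pictured with a toy connection-tracking model in Python. This illustrates the concept only, not how Azure implements it: protocol is ignored, `StatefulFilter` is a hypothetical name, and the IP addresses (including the one used for vm-db-01) are hypothetical.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class StatefulFilter:
    """Toy model: replies to an admitted inbound flow bypass the outbound rules."""

    def __init__(self, inbound_allow_ports, outbound_allow_ports):
        self.inbound_allow_ports = set(inbound_allow_ports)
        self.outbound_allow_ports = set(outbound_allow_ports)
        self.tracked = set()  # inbound flows that were admitted

    def inbound(self, flow: Flow) -> bool:
        if flow.dst_port in self.inbound_allow_ports:
            self.tracked.add(flow)  # remember the connection
            return True
        return False

    def outbound(self, flow: Flow) -> bool:
        reverse = Flow(flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)
        if reverse in self.tracked:
            return True  # return traffic of an admitted flow
        return flow.dst_port in self.outbound_allow_ports


# Inbound SSH allowed; the outbound rule set deliberately allows nothing.
fw = StatefulFilter(inbound_allow_ports={22}, outbound_allow_ports=set())
ssh_in = Flow("10.0.255.4", 50000, "10.0.2.10", 22)  # Bastion -> vm-db-01
reply = Flow("10.0.2.10", 22, "10.0.255.4", 50000)   # vm-db-01 -> Bastion
print(fw.inbound(ssh_in))  # True
print(fw.outbound(reply))  # True, even with no outbound allow rule
print(fw.outbound(Flow("10.0.2.10", 40000, "10.0.255.4", 22)))  # False: a new outbound flow
```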

The decisive clue is the combination of a connection that initiates but times out after 60 seconds and an SSH daemon confirmed as active via the serial console. This pattern indicates that the NSG lets the packets through and the daemon is listening, but something inside the OS silently drops the connection: a silent drop produces a timeout, whereas an active rejection would usually fail immediately. The OS firewall (such as iptables, nftables, or ufw) may have been altered by an automated process or a security policy, independently of any manual change.
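The difference between a silent drop and an active rejection can be observed from another VM in the same VNet with a small Python probe. This is a minimal sketch: `probe_tcp` is a hypothetical helper, and a timeout is consistent with a firewall DROP rule but does not by itself prove where the packets are dropped.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', or 'timeout'.

    Other socket errors (for example, no route to host) are re-raised.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # the TCP handshake completed
    except ConnectionRefusedError:
        return "refused"  # actively rejected: the host or a firewall answered with an error
    except socket.timeout:
        return "timeout"  # no answer at all, consistent with packets being silently dropped


# Example (hypothetical private IP of vm-db-01): print(probe_tcp("10.0.2.10", 22))
```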

The disk expansion information is irrelevant: disk operations in Azure do not affect the SSH daemon or OS firewall state.

Distractor C is factually incorrect: NSGs are stateful, so return traffic for an allowed inbound SSH flow needs no outbound rule, and in any case the default outbound rules allow traffic to VirtualNetwork, which includes the Bastion subnet.

Distractor A invents non-existent behavior: Bastion Standard SKU does not originate traffic outside the AzureBastionSubnet CIDR.


Answer Key: Scenario 4

Answer: A

The correct sequence is: R, S, P, Q, T.

The correct diagnostic reasoning for a Bastion access failure starts with the intermediate component, Bastion itself, and progressively advances to the target components.

  1. R verifies if Bastion is operational. If the resource was deleted or is degraded, all other verifications are irrelevant.
  2. S verifies Bastion configuration integrity: subnet CIDR and mandatory NSG rules. Problems here affect all sessions, not just one VM.
  3. P verifies if the target VM is running and connected. A stopped VM or one with disconnected NIC would be inaccessible regardless of Bastion.
  4. Q verifies if the VM's NSG allows SSH traffic originating from the Bastion subnet. This is a common cause of connectivity failure when Bastion is healthy.
  5. T uses serial console to verify the VM's internal state, like SSH daemon and OS firewall, only after ruling out all external causes.
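The ordering amounts to a short-circuiting checklist: run the checks from the outermost component inward and stop at the first failure. A minimal Python sketch follows; the boolean results are hypothetical placeholders for the manual verifications R through T, and `first_failure` is an illustrative name.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, str, Callable[[], bool]]  # (step id, description, returns True if healthy)


def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order, stop at the first one that fails, and return its step id."""
    for step, description, is_healthy in checks:
        if not is_healthy():
            print(f"[{step}] FAILED: {description}")
            return step
        print(f"[{step}] ok: {description}")
    return None  # every check passed


# Hypothetical observations; in practice each lambda stands for a real inspection.
checks = [
    ("R", "Bastion resource exists and is operational", lambda: True),
    ("S", "AzureBastionSubnet CIDR and Bastion NSG rules intact", lambda: True),
    ("P", "vm-ops-02 running, NIC connected", lambda: True),
    ("Q", "VM NSG allows TCP 22 from AzureBastionSubnet", lambda: False),
    ("T", "sshd active and OS firewall unchanged (serial console)", lambda: True),
]
print(first_failure(checks))  # stops at Q; T is never run
```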

Alternative B starts with the VM before verifying Bastion, which can lead to wasted effort or erroneous conclusions when the failure is actually in Bastion. Alternative C starts with the VM's NSG before confirming Bastion is active, wasting time on configurations that might be correct. Alternative D starts with the Bastion subnet, which is reasonable, but inspects configuration details before confirming that the Bastion resource exists and is operational, and it checks the VM's NSG before confirming that the VM is running.


Troubleshooting Tree: Configure an NSG for Remote Server Administration, Including Azure Bastion


Color Legend:

  • Dark blue: initial symptom, entry point to the tree
  • Blue: diagnostic question node, requires observation or active verification
  • Red: identified cause, confirmed problem root
  • Green: recommended action or resolution applicable to the context
  • Orange: validation or intermediate verification after a corrective action

To use this tree when facing a real problem, start by identifying whether access failed through Azure Bastion or through a direct connection governed by NSG rules. For the Bastion path, verify the resource and its subnet before inspecting the target VM. For the direct path, verify the port, protocol, and rule source in the NSG. In both paths, the operating system firewall is investigated last, after ruling out all infrastructure causes. Each branch eliminates a class of problems and leads to the most specific next verification, avoiding premature jumps to internal diagnostics before external ones are confirmed.
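The Bastion path described above can be encoded as data and walked mechanically. This is a minimal Python sketch: the node wording paraphrases the steps in this lab rather than reproducing the tree diagram, only question (blue) and cause (red) nodes are modeled, and `Question`, `Cause`, and `walk` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Cause:
    text: str  # red node: identified root cause


@dataclass
class Question:
    text: str  # blue node: requires an observation
    yes: Union["Question", Cause]
    no: Union["Question", Cause]


def walk(node, answers: dict) -> str:
    """Follow yes/no answers from the entry question until a cause is reached."""
    while isinstance(node, Question):
        node = node.yes if answers[node.text] else node.no
    return node.text


# Hypothetical fragment of the Bastion path: infrastructure first, the guest OS last.
bastion_path = Question(
    "Bastion resource operational?",
    yes=Question(
        "AzureBastionSubnet CIDR and Bastion NSG rules intact?",
        yes=Question(
            "Target VM running with NIC connected?",
            yes=Question(
                "VM NSG allows SSH/RDP from AzureBastionSubnet?",
                yes=Cause("Guest OS: inspect sshd and the OS firewall via serial console"),
                no=Cause("Missing inbound rule on the target VM's NSG"),
            ),
            no=Cause("Target VM stopped or NIC disconnected"),
        ),
        no=Cause("Bastion subnet or NSG misconfigured"),
    ),
    no=Cause("Bastion resource deleted or degraded"),
)

answers = {
    "Bastion resource operational?": True,
    "AzureBastionSubnet CIDR and Bastion NSG rules intact?": True,
    "Target VM running with NIC connected?": True,
    "VM NSG allows SSH/RDP from AzureBastionSubnet?": False,
}
print(walk(bastion_path, answers))  # Missing inbound rule on the target VM's NSG
```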