
Troubleshooting Lab: Plan and configure subnetting for services, including virtual network gateways, private endpoints, service endpoints, firewalls, application gateways, VNet-integrated platform services, and Azure Bastion

Diagnostic Scenarios

Scenario 1: Root Cause

A platform team deployed an Azure Firewall Premium in a hub VNet. The subnet created for the firewall is named AzureFirewallSubnet with a /26 prefix. The resource was successfully provisioned and network and application rules were configured according to the approved design.

Two days after deployment, the operations team enables IDPS (Intrusion Detection and Prevention System) on Azure Firewall Premium and begins to observe high latency on connections passing through the firewall. An analyst checks the logs and notices that legitimate packets are being dropped intermittently. The network team reports that no rules have been changed since the initial deployment. The security team confirms that the TLS intermediate certificate used for inspection is valid and has not expired.

The firewall logs show entries like:

Action: Deny
Rule: IDPS Signature Match
Signature ID: 2031234
Severity: Medium
Protocol: TCP
Source: 10.1.2.45:54321
Destination: 20.45.132.11:443

The source application subnet (app-subnet) has a User Defined Route (UDR) with default route 0.0.0.0/0 pointing to the Azure Firewall's private IP. The team confirms that the same UDR was in use before IDPS activation without any issues.

What is the root cause of the observed behavior?

A) The Azure Firewall /26 subnet is insufficient for Premium mode with active IDPS, causing packet drops due to lack of internal processing resources.

B) IDPS signatures are classifying legitimate traffic as threats, and the IDPS policy is configured in Alert and Deny mode instead of Alert Only.

C) The UDR in app-subnet is causing asymmetric routing because there's no explicit return route for the source subnet range in Azure Firewall.

D) The TLS intermediate certificate, although valid, is not being recognized by Azure Firewall Premium's trust chain, causing packet inspection failures.


Scenario 2: Action Decision

The network team identified that a VPN Gateway in production is not processing Point-to-Site client connections. After investigation, they confirmed the cause: the GatewaySubnet has a /29 prefix, and the gateway was upgraded to VpnGw2 SKU three days ago to support a larger volume of simultaneous connections. The increase in internal gateway instances exhausted the available IP addresses in the subnet, and new instances cannot be provisioned.

The environment has the following constraints:

  • The VNet has available address space for expansion
  • There are 47 active Site-to-Site connections on this gateway, all in use by critical branch offices
  • The approved maintenance window starts in 6 hours
  • The team does not have permission to recreate the gateway outside the maintenance window

What is the correct action to take at this time?

A) Immediately recreate the VPN Gateway with a new SKU and a /27 GatewaySubnet, on the grounds that the issue has already caused partial disruption.

B) Expand the VNet address space now and resize the GatewaySubnet to /27 during the approved maintenance window in 6 hours.

C) Immediately downgrade the SKU to VpnGw1 to reduce the number of internal instances and free up IPs in the current subnet.

D) Create a second GatewaySubnet with /27 prefix in parallel and migrate the gateway to it before the maintenance window.


Scenario 3: Root Cause

A development team reports that an application hosted on a VM in the app-subnet (10.2.1.0/24) cannot connect to an Azure Storage Account via Private Endpoint. The Private Endpoint was created a week ago and, according to the platform team, worked correctly in initial tests.

The VM can resolve the name mystorageaccount.blob.core.windows.net but the TCP connection on port 443 times out without response. The security team reports that no changes have been made to the NSG rules of the app-subnet since the initial tests. The administrator verifies the Private Endpoint configuration and confirms that the approval status is Approved.

The administrator runs the following command from the VM:

nslookup mystorageaccount.blob.core.windows.net

Server: 168.63.129.16
Address: 168.63.129.16

Non-authoritative answer:
Name: mystorageaccount.privatelink.blob.core.windows.net
Address: 20.45.132.90

The network team confirms that the provisioned Private Endpoint IP is 10.2.3.5. The app-subnet and the Private Endpoint subnet (pe-subnet, 10.2.3.0/24) are in the same VNet. A UDR was added to the pe-subnet two days ago to route outbound traffic to an NVA.

What is the root cause of the observed problem?

A) The NSG of app-subnet is blocking outbound traffic on port 443 to the 10.2.3.0/24 range, despite the security team stating there were no changes.

B) DNS resolution is returning the Storage Account's public IP instead of the Private Endpoint's private IP, indicating that the private DNS zone is not properly integrated with the VNet.

C) The UDR added to pe-subnet is redirecting return traffic to the NVA, causing asymmetric routing and TCP connection drops.

D) The Approved status of the Private Endpoint is necessary but not sufficient; the connection fails because the Private Endpoint was not associated with a DNS Group after approval.


Scenario 4: Collateral Impact

A team identified that the Azure Application Gateway v2 was experiencing intermittent failures when provisioning new instances during traffic spikes. The cause was confirmed: the Application Gateway subnet (appgw-subnet, /26) had only 3 free IP addresses, insufficient to accommodate new instances during autoscaling.

To resolve the issue, the team took the following action: expanded the appgw-subnet prefix from /26 to /24, which added 192 addresses to the subnet. The Application Gateway resumed normal scaling and the failures ceased.

What secondary consequence can this action cause?

A) Expanding the subnet prefix may overwrite addresses already allocated to other resources in the VNet if the new range conflicts with existing subnets or the VNet's address space.

B) The Application Gateway v2 will lose its autoscaling configuration upon detecting the subnet change and will need to be manually reconfigured.

C) NSG rules associated with appgw-subnet will be automatically removed by Azure upon detecting the prefix change, requiring manual recreation.

D) The Application Gateway frontend's private IP will be automatically reallocated to a new address within the expanded range, breaking references in firewall rules and UDRs.


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The central clue is in the logs: packets are being denied with the rule IDPS Signature Match, and the drops began right after IDPS was enabled. With IDPS running in Alert and Deny mode, traffic that matches a signature set to deny is blocked. When legitimate traffic matches a known threat signature (a false positive), the firewall drops it just as it would drop a real threat. In Alert Only mode the same match would be logged but the traffic would pass.

The information about the valid TLS certificate is deliberately irrelevant. The problem is not in TLS inspection, and the logs don't indicate certificate failures; they indicate IDPS signature matches. This information exists to attract readers to option D.

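Option C about asymmetric routing would be plausible in another context, but the UDR was working before IDPS activation and no routing changes were made. The symptom is also not consistent with asymmetry, which normally causes complete connection timeouts, not intermittent drops accompanied by signature-match log entries. Option A does not hold either: /26 is the size documented for AzureFirewallSubnet regardless of SKU, and the subnet size determines how many IP addresses are available for scaling, not how much inspection capacity the firewall has.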

The most dangerous distractor is option C, as asymmetric routing is a real and common cause in Azure Firewall environments. Acting on it would lead to changing UDRs without solving the real problem, potentially breaking routing for other subnets.
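The reasoning above depends on reading the log entry before touching routing. As a minimal sketch, assuming the simplified "Key: Value" layout shown in the scenario (real Azure Firewall logs use a different schema), the following Python parses an entry and flags drops caused by IDPS. The function names parse_log_entry and is_idps_deny are hypothetical and exist only for illustration.

```python
def parse_log_entry(text: str) -> dict[str, str]:
    """Parse 'Key: Value' lines into a dict (simplified scenario format)."""
    entry = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so 'Source: 10.1.2.45:54321' keeps its port.
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def is_idps_deny(entry: dict[str, str]) -> bool:
    """Return True when the entry is a deny caused by an IDPS signature match."""
    return entry.get("Action") == "Deny" and "IDPS" in entry.get("Rule", "")


# Log entry copied from the scenario.
SAMPLE = """
Action: Deny
Rule: IDPS Signature Match
Signature ID: 2031234
Severity: Medium
Protocol: TCP
Source: 10.1.2.45:54321
Destination: 20.45.132.11:443
"""

entry = parse_log_entry(SAMPLE)
print(is_idps_deny(entry), entry["Signature ID"])
```

If most dropped flows are flagged this way, the signature IDs are the place to investigate, not the UDRs.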


Answer Key: Scenario 2

Answer: B

The critical constraint in the scenario is clear: the team does not have permission to recreate the gateway outside the maintenance window, and the window starts in 6 hours. The correct action is to prepare the groundwork now (expand the VNet address space, as this causes no disruption) and execute the destructive operation during the approved window.

Option A ignores the permission constraint and would tear down the 47 active Site-to-Site connections. Recreating a VPN Gateway can take 45 minutes or more, and all existing connections are interrupted for the duration.

Option C seems like a quick fix, but downgrading the SKU on a gateway with 47 active connections also causes disruption. It also leaves the structural problem in place: the /29 GatewaySubnet remains too small for the capacity the team needs.

Option D describes something technically impossible: a VNet can have only one GatewaySubnet. The gateway subnet must be named exactly GatewaySubnet, and subnet names are unique within a VNet, so a second one cannot be created.

The correct reasoning here is to distinguish between what can be done now without impact (preparation) and what must wait for the maintenance window (execution).
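The sizing argument behind the /29 to /27 change can be checked with simple arithmetic. The sketch below assumes the standard Azure behavior of reserving five addresses in every subnet. The address ranges are hypothetical, since the scenario does not state the actual GatewaySubnet range.

```python
import ipaddress

# Azure reserves five addresses per subnet: the network address, the default
# gateway, two addresses for Azure DNS mapping, and the broadcast address.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return the number of addresses Azure can assign in a subnet."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


# Hypothetical ranges used only to compare the two prefix lengths.
for cidr in ("10.0.255.0/29", "10.0.255.0/27"):
    print(cidr, usable_addresses(cidr))
```

A /29 leaves only 3 assignable addresses, while a /27 leaves 27, which is why the larger prefix is the usual planning choice for a gateway subnet.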


Answer Key: Scenario 3

Answer: B

The decisive proof is in the nslookup output: the name mystorageaccount.blob.core.windows.net is resolving to 20.45.132.90, which is a public IP, and not to 10.2.3.5, which is the Private Endpoint IP. When DNS resolution returns the public IP, the TCP connection never reaches the Private Endpoint, regardless of any other configuration.

For the Private Endpoint to work correctly, the private DNS zone privatelink.blob.core.windows.net must be linked to the VNet, and the A record must point to the private IP. The nslookup output shows that the CNAME to privatelink is being resolved, but to the public IP, indicating that the private zone is not linked to the VNet or has an incorrect record.
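The DNS check described above can be expressed as a small test: does the resolved address fall inside the Private Endpoint subnet? This is a minimal sketch using the addresses from the scenario. The helper name resolves_to_private_endpoint is hypothetical.

```python
import ipaddress


def resolves_to_private_endpoint(resolved_ip: str, pe_subnet: str) -> bool:
    """Return True when the resolved address lies inside the Private Endpoint subnet."""
    return ipaddress.ip_address(resolved_ip) in ipaddress.ip_network(pe_subnet)


# Address returned by nslookup in the scenario: a public IP.
print(resolves_to_private_endpoint("20.45.132.90", "10.2.3.0/24"))

# Address the platform team provisioned for the Private Endpoint.
print(resolves_to_private_endpoint("10.2.3.5", "10.2.3.0/24"))
```

On a real VM the first argument could come from a live lookup such as socket.gethostbyname on the storage account name. A result outside the Private Endpoint subnet points to the private DNS zone link or its A record, not to NSGs or routes.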

The UDR in pe-subnet is the irrelevant information in the scenario. While UDRs in Private Endpoint subnets can cause problems in some contexts, the symptom here is incorrect DNS resolution, not routing. The packet doesn't even reach the correct IP for routing to be relevant.

The most dangerous distractor is option A, as a pressured analyst might ignore the nslookup output and investigate the NSG first. This would consume time and solve nothing, as traffic is not reaching the private endpoint to be filtered by the NSG.


Answer Key: Scenario 4

Answer: A

Expanding a subnet prefix (/26 to /24) enlarges the IP range that subnet occupies within the VNet's address space. If the VNet's address space was not previously expanded to accommodate the new range, or if other subnets already occupy part of the intended 10.x.x.0/24 range, the operation may fail or, in incorrect configurations, create address overlap.

More specifically: Azure prevents creating overlapping subnets in the same VNet, so the operation would fail. But if the VNet space was expanded without checking conflicts with subnets in peered VNets, there may be address overlap at the inter-VNet routing level, which Azure doesn't automatically block at expansion time.
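The overlap checks described above can be rehearsed offline before changing a prefix. The sketch below is illustrative only: every address range is hypothetical because the scenario gives none, and it assumes the expanded range is the enclosing supernet of the current subnet. The same function can be pointed at the address spaces of peered VNets.

```python
import ipaddress


def check_expansion(current, new_prefixlen, vnet_spaces, other_ranges):
    """Return (new_range, fits_in_vnet, conflicting_ranges) for a planned expansion."""
    new = ipaddress.ip_network(current).supernet(new_prefix=new_prefixlen)
    # The expanded range must sit entirely inside one of the VNet address spaces.
    fits = any(new.subnet_of(ipaddress.ip_network(s)) for s in vnet_spaces)
    # Any existing subnet or peered range that intersects the new range is a conflict.
    conflicts = [r for r in other_ranges if new.overlaps(ipaddress.ip_network(r))]
    return new, fits, conflicts


# Hypothetical layout: appgw-subnet at 10.3.0.64/26 in a 10.3.0.0/16 VNet.
new, fits, conflicts = check_expansion(
    "10.3.0.64/26",
    24,
    vnet_spaces=["10.3.0.0/16"],
    other_ranges=["10.3.0.0/26", "10.3.1.0/24"],
)
print(new, fits, conflicts)
```

In this hypothetical layout the /24 fits the VNet address space but collides with an existing 10.3.0.0/26 subnet, so the expansion would have to use a different range.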

Option B is false: Application Gateway doesn't lose autoscaling configurations due to subnet size changes. Option C is false: Azure doesn't automatically remove NSGs during subnet resizing. Option D is the most plausible of the three, but it does not hold here: the Application Gateway frontend IP is allocated statically at creation time and is not automatically reassigned when the subnet prefix changes.


Troubleshooting Tree: Plan and configure subnetting for services


Color Legend:

Color       Node Type
Dark Blue   Initial symptom (entry point)
Blue        Diagnostic question
Red         Identified cause
Green       Recommended action or resolution
Orange      Intermediate validation or verification

To use this tree when facing a real problem, start with the root node describing the general symptom and answer each diagnostic question based on what is observable in the environment. Orange validation nodes indicate where to collect evidence before proceeding. Follow the path until reaching a red identified cause node, then execute the corresponding green action. Never skip intermediate validation steps, as the visible symptom often points to the wrong cause.