Troubleshooting Lab: Create a Backend Pool

Diagnostic Scenarios

Scenario 1 — Root Cause

The operations team reports that the Azure Application Gateway for a production web application started returning HTTP 502 errors for all users. The gateway was deployed three weeks ago and was working normally until yesterday afternoon.

The responsible engineer accesses the health probes dashboard and finds the following state:

Backend Pool: pool-webapp
  Target: webapp-prod.azurewebsites.net
  Status: Unhealthy
  Probe response: TCP connect OK | HTTP 200 received | Host header mismatch

Backend Settings: bs-webapp
  Protocol: HTTPS
  Port: 443
  Pick hostname from backend target: Disabled
  Host name override: appgw-internal.contoso.com

Additional information collected during the investigation:

The App Service responds normally when accessed directly via browser
The App Service TLS certificate is valid and not expired
The Application Gateway subnet has an NSG associated, but the inbound rules for ports 65200-65535 are correctly configured
The App Service Plan is on Standard S2 tier

What is the root cause of the 502 error?

A) The Application Gateway subnet NSG is blocking communication with the backend
B) The backend settings configuration is sending an incorrect host header, which the App Service rejects because it doesn't match its configured hostname
C) The App Service Plan on Standard tier doesn't support integration with Application Gateway
D) The HTTPS protocol in backend settings requires the App Service certificate to be manually uploaded to the gateway

Scenario 2 — Action Decision

During an architecture review, it was identified that an internal Standard Azure Load Balancer is configured with the backend pool in NIC mode, directly associating the NICs of four VMs located in VNet-Production. The issue is that two new VMs need to be added to the pool, but these VMs are in VNet-DR, which is connected through an active, functional VNet peering.

The environment has the following restrictions:

The Load Balancer is critical and processes real-time financial transactions
The available maintenance window is 30 minutes, starting in 10 minutes
Reconfiguring the backend pool to IP mode requires removal and recreation of the existing pool, with brief interruption
Adding VMs from VNet-DR directly in the current NIC mode is not technically possible

What is the correct action to take at this moment?

A) Immediately start recreating the backend pool in IP mode to include the VMs from VNet-DR within the available maintenance window
B) Add the VMs from VNet-DR via IP address directly in the current NIC mode, leveraging the active peering
C) Postpone the change to a planned maintenance window with more time, documenting the risk and keeping the current pool in operation
D) Remove the four current VMs from the pool, recreate the pool in IP mode and re-add all six VMs during the 30-minute window

Scenario 3 — Root Cause

An engineer is configuring a new public Standard Azure Load Balancer to distribute traffic among three VMs. After completing the configuration, they run a connectivity test from an external machine and get no response. During the investigation, they collect the following information:

# Connectivity test
$ curl -v http://52.178.45.10:80
* Trying 52.178.45.10:80...
* Connection timed out after 30000 milliseconds
* Closing connection 0

# Backend pool status
$ az network lb address-pool show \
    --resource-group rg-producao \
    --lb-name lb-frontend \
    --name pool-vms \
    --query "loadBalancerBackendAddresses"
[
  { "name": "vm01-ip", "ipAddress": "10.0.1.4" },
  { "name": "vm02-ip", "ipAddress": "10.0.1.5" },
  { "name": "vm03-ip", "ipAddress": "10.0.1.6" }
]

# Health probe
Probe: probe-http-80
Status: All backends Unhealthy
Protocol: HTTP | Port: 80 | Path: /healthcheck

Additional information:

All three VMs have IIS installed and the default website active on port 80
The NSG associated with the VM NICs allows inbound traffic on port 80 from any source
The Load Balancer was created with Standard SKU and Standard SKU public IP
The VMs don't have individual public IPs associated
The engineer configured the backend pool in IP address mode using the private IPs of the VMs

What is the root cause of the failure?

A) The health probe is configured with the /healthcheck path, which doesn't exist on the VMs, causing the Load Balancer to mark all backends as unhealthy and not forward traffic
B) VMs without public IPs cannot be members of backend pools in public Load Balancers in IP address mode
C) The VM NICs NSG blocks traffic from the Load Balancer, as the source IP of health probes is the public frontend IP
D) Standard SKU Load Balancer requires the backend pool in IP address mode to be associated with a VNet explicitly declared in the pool configuration

Scenario 4 — Diagnostic Sequence

An Azure Application Gateway with WAF enabled presents the following behavior: legitimate user requests are being intermittently blocked with HTTP 403, while the backend pool reports all members as Healthy. The environment includes a backend pool with two App Service instances and correctly configured health probes.

The engineer is given the following investigation steps:

1. Check WAF logs in Log Analytics to identify which rules are triggering and for which URIs
2. Confirm that health probes are returning HTTP 200 for all backend pool members
3. Verify whether the backend pool is configured with an FQDN or an IP address as the destination
4. Analyze the headers of blocked requests to identify common patterns among them
5. Check whether WAF mode is set to Detection or Prevention

What is the correct investigation sequence for this symptom?

A) 2 -> 1 -> 5 -> 4 -> 3
B) 5 -> 1 -> 4 -> 2 -> 3
C) 3 -> 2 -> 1 -> 4 -> 5
D) 2 -> 3 -> 5 -> 1 -> 4

Answer Key and Explanations

Answer Key — Scenario 1

Answer: B

The central clue is in the health probe output: HTTP 200 received | Host header mismatch. The backend accepts the TCP connection and even returns HTTP 200, but the Application Gateway registers a host header mismatch. App Service is a multitenant platform that routes requests based on the Host header. Because the backend settings send appgw-internal.contoso.com as the host header, App Service doesn't recognize the hostname and rejects the request in a way that the gateway interprets as a backend failure, which produces the 502.
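The host header selection described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not the gateway's actual implementation: the function names and the client hostname www.contoso.com are hypothetical, and the App Service side is reduced to a set of bound hostnames.

```python
from typing import Optional, Set


def effective_host_header(
    pick_from_backend_target: bool,
    host_name_override: Optional[str],
    backend_target: str,
    incoming_host: str,
) -> str:
    """Return the Host header sent to the backend in this simplified model."""
    if pick_from_backend_target:
        # The backend's own FQDN becomes the Host header.
        return backend_target
    if host_name_override:
        # A fixed override replaces whatever the client sent.
        return host_name_override
    # Neither option set: the client's Host header is passed through.
    return incoming_host


def app_service_accepts(host: str, bound_hostnames: Set[str]) -> bool:
    """A multitenant front end only routes hostnames bound to the app."""
    return host.lower() in bound_hostnames


BACKEND = "webapp-prod.azurewebsites.net"
bound = {BACKEND}

# Settings from the scenario: pick-from-target disabled, override set.
current = effective_host_header(
    False, "appgw-internal.contoso.com", BACKEND, "www.contoso.com"
)
# One possible correction: pick the hostname from the backend target.
fixed = effective_host_header(True, None, BACKEND, "www.contoso.com")

print(current, app_service_accepts(current, bound))
print(fixed, app_service_accepts(fixed, bound))
```

With the scenario's settings the override is sent and is not among the app's bound hostnames; picking the hostname from the backend target (or binding the override hostname as a custom domain on the App Service) removes the mismatch.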

The deliberately irrelevant detail is the state of the NSG rules for ports 65200-65535: this configuration is required for the Application Gateway's internal infrastructure, but it has no relation to the described failure, since the probe reaches the backend and receives a response. Focusing on the NSG would be the classic diagnostic error of confusing an infrastructure check with the actual cause.

The most dangerous distractor is D, as it leads the engineer to look for a TLS certificate problem when the log itself indicates that the TLS layer worked and HTTP 200 was received.

Answer Key — Scenario 2

Answer: C

The constraint that decides the answer is the available time. Recreating the backend pool in IP mode is the technically correct solution, but it requires removing and recreating the pool, with a service interruption. With only 10 minutes before the window opens and a 30-minute window, the risk of not completing the operation in time in a financial transactions environment is too high to accept without proper planning.

Alternative A fails by ignoring the time constraint and production interruption risk. Alternative B is technically impossible: NIC mode doesn't accept resources outside the Load Balancer's local VNet, and active peering doesn't change this association limitation. Alternative D aggravates the risk by including removal of the four existing VMs before ensuring recreation will be completed on time.

The real consequence of attempting alternative A or D and failing to finish would be a Load Balancer with no operational backend pool, interrupting financial transaction processing indefinitely.

Answer Key — Scenario 3

Answer: A

The evidence points to a single cause: the /healthcheck path configured in the health probe doesn't exist on the VMs, which run only the IIS default website. IIS returns HTTP 404 for this path, and Standard Load Balancer HTTP probes treat any response other than HTTP 200 as a failure, so all backends are marked Unhealthy. With every backend marked unhealthy, Standard Load Balancer stops forwarding new traffic, and external client connections time out.
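The probe evaluation above can be illustrated with a minimal sketch. It assumes a simulated IIS default site that serves only the root path; the site map and function names are hypothetical, and the sketch applies the strict HTTP 200 check used by Azure Load Balancer HTTP probes (a 404 fails under any reading of the rule).

```python
from typing import Dict, List

# Hypothetical stand-in for the IIS default website: only "/" is served.
IIS_DEFAULT_SITE = {"/": 200}


def http_status(path: str) -> int:
    """Status the simulated site returns for a path (404 when missing)."""
    return IIS_DEFAULT_SITE.get(path, 404)


def probe_healthy(status_code: int) -> bool:
    """Strict check: only HTTP 200 counts as a healthy probe response."""
    return status_code == 200


def pool_state(backends: List[str], probe_path: str) -> Dict[str, bool]:
    """Map each backend IP to its probe result for the given path."""
    return {ip: probe_healthy(http_status(probe_path)) for ip in backends}


# Backend addresses taken from the scenario's pool output.
backends = ["10.0.1.4", "10.0.1.5", "10.0.1.6"]

with_healthcheck = pool_state(backends, "/healthcheck")
with_root = pool_state(backends, "/")

print("probe /healthcheck:", with_healthcheck)
print("probe /:", with_root)
```

Either pointing the probe at a path the site actually serves or publishing a /healthcheck endpoint on the VMs would bring the backends back to a healthy state in this model.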

The irrelevant detail is that the VMs don't have individual public IPs. In IP address backend pool mode, Standard Load Balancer doesn't require public IPs on the destination VMs; traffic reaches them over the virtual network. This information was included to lead the reader to alternative B, which reflects a common misconception about how IP mode works.

The most dangerous distractor is C, as NSG behavior regarding health probes is an area of real confusion. However, the scenario itself states that the NSG allows inbound traffic on port 80 from any source, which rules out this hypothesis based on the available information.

Answer Key — Scenario 4

Answer: A

The correct sequence is: 2 -> 1 -> 5 -> 4 -> 3.

The progressive diagnostic reasoning starts from the most basic state to the most specific:

First confirm that the backend pool is healthy (step 2), which rules out the hypothesis that the 403s come from a backend failure rather than from the WAF. Then analyze the WAF logs (step 1), which are the direct source of evidence for 403 blocks. With the logs in hand, check the WAF mode (step 5), since a WAF in Prevention mode actively blocks requests, while Detection mode only logs them. With the mode confirmed, analyze patterns in the blocked requests (step 4) to identify the triggered rule. Step 3, checking whether the destination is an FQDN or an IP address, is relevant for other types of failure but not for 403 blocks generated by the WAF, so it has the lowest priority in this context.
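The log analysis in steps 1 and 4 amounts to grouping blocked requests by rule and URI. The sketch below shows that grouping on a handful of invented records; the field names (action_s, ruleId_s, requestUri_s) are assumed to follow the columns of the Application Gateway firewall log in Log Analytics, and every value is hypothetical.

```python
from collections import Counter

# Invented sample records; field names are an assumption about the
# firewall log schema, and all values are illustrative only.
records = [
    {"action_s": "Blocked", "ruleId_s": "942130", "requestUri_s": "/api/search"},
    {"action_s": "Blocked", "ruleId_s": "942130", "requestUri_s": "/api/search"},
    {"action_s": "Matched", "ruleId_s": "920300", "requestUri_s": "/"},
    {"action_s": "Blocked", "ruleId_s": "949110", "requestUri_s": "/api/search"},
    {"action_s": "Blocked", "ruleId_s": "942130", "requestUri_s": "/api/orders"},
]


def top_blocking_rules(rows, limit=3):
    """Count (rule, URI) pairs among blocked requests, most frequent first."""
    blocked = (r for r in rows if r["action_s"] == "Blocked")
    counts = Counter((r["ruleId_s"], r["requestUri_s"]) for r in blocked)
    return counts.most_common(limit)


for (rule_id, uri), hits in top_blocking_rules(records):
    print(f"rule {rule_id} blocked {hits} request(s) to {uri}")
```

A rule that dominates the counts for one URI is a candidate for a targeted exclusion or rule tuning, once the blocked requests are confirmed to be legitimate.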

Alternative B starts with the WAF mode before consulting the logs, missing the opportunity to have concrete evidence before forming hypotheses. Alternative C starts by checking the backend pool destination type, which is irrelevant for WAF blocking symptoms. The common reasoning error in the distractors is starting the investigation from hypotheses without first collecting the most direct evidence available, which in this case is the WAF logs themselves.

Troubleshooting Tree: Create a Backend Pool

Color Legend:

Color	Node Type
Dark Blue	Initial symptom (entry point)
Blue	Diagnostic question
Orange	Intermediate verification or validation
Red	Identified cause
Green	Recommended action or resolution

When facing a real problem, start with the root node identifying whether the symptom is backends marked as unhealthy or traffic not forwarded even with healthy backends. Follow the branches answering each question based on what is directly observable in the portal or via CLI, without skipping steps. Each red path indicates a cause that must be confirmed before executing the corresponding green action. Orange nodes indicate points where additional verification is needed before concluding the diagnosis.