Troubleshooting Lab: Map requirements to features and capabilities of Azure Application Gateway

Diagnostic Scenarios

Scenario 1: Root Cause

The operations team reports that after a successful migration to Azure, the portal.contoso.com application is inaccessible via HTTPS. The Application Gateway was deployed three days ago and worked correctly for the first two days. There have been no changes to the application or backend pools since deployment.

The analyst checks the diagnostic logs and finds the following entry:

[ERROR] Backend health probe failed
Target: 10.1.2.10:443
Protocol: HTTPS
Status: Connection timeout
Probe path: /healthcheck
Last success: 2026-03-25T14:32:00Z

Information collected by the analyst:

  • The TLS certificate configured on the Application Gateway listener expires in 2027
  • The Network Security Group (NSG) associated with the Application Gateway subnet allows inbound traffic on ports 80 and 443
  • The backend pool contains two VM instances (10.1.2.10 and 10.1.2.11)
  • The security team reports they applied a new NSG policy to the backend subnet yesterday afternoon
  • The application on the VMs uses port 443 with a self-signed certificate

What is the root cause of the observed failure?

A) The TLS certificate of the Application Gateway listener expired, preventing the establishment of new HTTPS connections.

B) The NSG applied to the backend subnet is blocking health probes originating from the Application Gateway.

C) The health probe is configured with the incorrect protocol; since the backend uses a self-signed certificate, the probe should use HTTP instead of HTTPS.

D) The Application Gateway has reached the backend instance limit supported by the current SKU.


Scenario 2: Action Decision

The cause of the problem has been identified: the main routing rule of the Application Gateway is associated with an HTTP Settings object that points to port 8080 of the backend, but the backend instances were reconfigured by the development team to listen on port 443. The environment is production with an active SLA and there are users currently connected.

The responsible engineer has Contributor permission on the Application Gateway resource and needs to restore service as quickly as possible without causing additional interruption to users who have active sessions.

What is the correct action to take at this moment?

A) Delete the current HTTP Settings object and create a new one pointing to port 443, as HTTP Settings objects do not allow editing after creation.

B) Edit the backend port in the existing HTTP Settings object from 8080 to 443 and apply the change, which will take effect without gateway restart.

C) Restart the Application Gateway to force renegotiation of connections with the backend on the new port.

D) Create a new Application Gateway with the correct configuration and update DNS to point to the new resource, ensuring zero downtime.


Scenario 3: Root Cause

An e-commerce application hosted behind an Application Gateway with the WAF_v2 SKU begins returning HTTP 403 errors to a specific subset of users. Other users can access the application normally. The development team confirms that there have been no changes to the application code in the last 72 hours.

The analyst collects the following data from the Application Gateway logs:

[WAF] Action: Blocked
RuleId: 942100
RuleGroup: REQUEST-942-APPLICATION-ATTACK-SQLI
ClientIP: 189.45.12.33
URI: /busca?q=produto%27+OR+%271%27%3D%271
Message: SQL Injection Attack Detected via libinjection

[WAF] Action: Blocked
RuleId: 942100
RuleGroup: REQUEST-942-APPLICATION-ATTACK-SQLI
ClientIP: 201.76.88.14
URI: /busca?q=notebook%27+OR+%271%27%3D%271
Message: SQL Injection Attack Detected via libinjection

The product manager reports that the complaints come from users of the advanced search functionality, which allows the apostrophe character in product name search terms.

The WAF is operating in Prevention mode. The security team confirms that the managed rule set (OWASP CRS 3.2) is active without customizations.

What is the root cause of the 403 errors?

A) The Application Gateway has a bug in the current firmware version that causes false positives in rule 942100 for any request with special characters in the query string.

B) The WAF is blocking legitimate requests because rule 942100 interprets the search pattern with apostrophe as an SQL injection attempt, and there is no rule exclusion configured for this path.

C) Prevention mode is incorrectly activated; the expected behavior would be to only log requests, not block them, indicating a configuration error in the operation mode.

D) The affected users are being blocked by a geo-restriction policy automatically applied by the WAF when detecting Brazilian IPs.


Scenario 4: Diagnostic Sequence

An engineer receives the following report: "The Application Gateway is returning HTTP 502 Bad Gateway for all requests on a specific listener, but other listeners on the same gateway work normally."

The available investigation steps are:

  1. Check the health status of the backend pool associated with the failing listener in the "Backend health" tab of the portal
  2. Confirm whether the failing listener is associated with an active routing rule
  3. Check the diagnostic logs for probe errors on the specific backend pool instances
  4. Validate whether the HTTP Settings object associated with the rule points to the correct backend port and protocol
  5. Access one of the backend pool instances directly via its private IP to confirm whether the application responds

What is the correct investigation sequence for this symptom?

A) 1 → 3 → 4 → 5 → 2

B) 2 → 1 → 3 → 4 → 5

C) 3 → 1 → 2 → 5 → 4

D) 5 → 4 → 1 → 3 → 2


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue in the scenario is the timing of the failure: the service worked correctly for two days and failed only after the security team applied a new NSG policy to the backend subnet yesterday afternoon. The log confirms that the health probes are timing out rather than returning certificate or protocol errors, which is consistent with traffic being dropped at the network layer.

The Application Gateway sends health probes from IP addresses within its own subnet. For these probes to reach backend instances in another subnet, the backend subnet's NSG must allow inbound traffic originating from the Application Gateway subnet. A new restrictive policy applied to the backend subnet is the only change temporally correlated with the failure onset.
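The effect of the new policy can be illustrated with a minimal sketch of priority-ordered rule evaluation. This is a simplified model, not the Azure NSG schema: the `NsgRule` class, the gateway subnet `10.1.1.0/24`, and the probe source `10.1.1.4` are assumptions made for illustration, and the default NSG rules are deliberately left out.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NsgRule:
    """Simplified inbound rule (illustrative model, not the Azure schema)."""
    priority: int   # lower numbers are evaluated first
    source: str     # CIDR prefix, or "*" for any source
    dest_port: str  # a single port as a string, or "*" for any port
    access: str     # "Allow" or "Deny"


def evaluate_inbound(rules, source_ip, dest_port):
    """Return the access decision of the first matching rule, by priority."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        source_ok = rule.source == "*" or addr in ipaddress.ip_network(rule.source)
        port_ok = rule.dest_port in ("*", str(dest_port))
        if source_ok and port_ok:
            return rule.access
    return "Deny"  # nothing matched; treated as denied in this sketch


# Hypothetical addressing: gateway subnet 10.1.1.0/24, probe sent from 10.1.1.4.
restrictive = [NsgRule(4096, "*", "*", "Deny")]
fixed = [NsgRule(200, "10.1.1.0/24", "443", "Allow")] + restrictive

print(evaluate_inbound(restrictive, "10.1.1.4", 443))  # Deny
print(evaluate_inbound(fixed, "10.1.1.4", 443))        # Allow
```

In this model, a higher-priority allow rule scoped to the gateway subnet and port 443 is enough to let the probes through while the rest of the restrictive policy stays in place.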

Alternative A is a clear distractor: the scenario explicitly states that the listener certificate expires in 2027. Alternative C represents a technical misconception: the Application Gateway supports backends with self-signed certificates via trusted root certificate configuration in HTTP Settings; this does not require changing the probe protocol to HTTP. Alternative D is technically implausible given the context and is not supported by any presented data. Acting on alternative C would be especially dangerous because it would remove encryption between the gateway and backend without solving the real cause.


Answer Key: Scenario 2

Answer: B

The HTTP Settings object in Application Gateway is an editable resource. The backend port can be changed directly in the existing object without needing deletion and recreation. The change takes effect after the configuration application process, which in Application Gateway v2 occurs without data plane process restart, preserving active connections.
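As a rough sketch of what the in-place edit could look like from a script, the helper below only assembles an Azure CLI command line and prints it; it does not execute anything. The resource names `rg-prod`, `agw-prod`, and `http-settings-main` are hypothetical, and the flags should be checked against `az network application-gateway http-settings update --help` for the installed CLI version before use.

```python
import shlex


def build_port_update_command(resource_group, gateway_name, settings_name, port):
    """Build the Azure CLI argv that changes the backend port in place."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return [
        "az", "network", "application-gateway", "http-settings", "update",
        "--resource-group", resource_group,
        "--gateway-name", gateway_name,
        "--name", settings_name,
        "--port", str(port),
    ]


# Hypothetical resource names; replace them with the real ones.
command = build_port_update_command("rg-prod", "agw-prod", "http-settings-main", 443)
print(shlex.join(command))
```

The scenario only mentions the port. If the backend now expects TLS on 443, the protocol in the same HTTP Settings object may need to change as well.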

Alternative A is wrong because it's based on a false premise: HTTP Settings objects can be edited. Deleting and recreating the object would be unnecessary and introduce risk of inconsistency during the recreation window. Alternative C is the most dangerous: restarting the Application Gateway in production with active users would cause immediate interruption of all sessions, violating the explicit scenario constraint. Alternative D would be a valid solution in a context of catastrophic failure without editing possibility, but represents unnecessary overengineering here and introduces operational complexity and DNS propagation risk without justification.


Answer Key: Scenario 3

Answer: B

The logs are the central evidence: the WAF is operating in Prevention mode and rule 942100 from the SQLI group is blocking requests containing the ' OR '1'='1 pattern in the query string. This pattern is indeed characteristic of classic SQL Injection and is detected by the libinjection engine used in OWASP CRS.

The real problem is a false positive: legitimate users who type apostrophes into the advanced search produce query strings that, once decoded, match the pattern the rule detects. The correct solution would be to create a rule exclusion for the query string argument q in the /busca path, or to configure a custom rule that explicitly allows this pattern in this context.
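To see what the rule is evaluating, the logged URIs can be decoded with the Python standard library. The sketch below only performs URL decoding; it does not reproduce libinjection or any WAF logic, and the helper name `decoded_query_values` is an illustrative choice.

```python
from urllib.parse import parse_qs, urlsplit


def decoded_query_values(uri, param):
    """Return the decoded values of one query-string parameter."""
    query = urlsplit(uri).query
    return parse_qs(query).get(param, [])


# URIs copied from the WAF log entries in the scenario.
logged_uris = [
    "/busca?q=produto%27+OR+%271%27%3D%271",
    "/busca?q=notebook%27+OR+%271%27%3D%271",
]

for uri in logged_uris:
    print(decoded_query_values(uri, "q"))
# ["produto' OR '1'='1"]
# ["notebook' OR '1'='1"]
```

Here `%27` decodes to an apostrophe, `+` to a space, and `%3D` to an equals sign, which yields the quoted pattern named in the explanation above.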

The information about no code changes in the last 72 hours is deliberately irrelevant: the problem is not in the application, but in the interaction between the legitimate behavior of the search functionality and the WAF rules.

Alternative C represents a serious conceptual error: Prevention mode is the correct documented and expected behavior; active blocking is its function, not a configuration defect. Alternative D introduces a non-existent concept: the WAF does not apply automatic geo-restriction by country without explicit configuration. The most dangerous distractor is A, as it could lead the operator to look for a firmware update instead of investigating and fixing the rule exclusion.


Answer Key: Scenario 4

Answer: B

The correct diagnostic sequence for HTTP 502 on a specific listener follows the principle of progressive elimination from outermost to innermost, validating each layer before advancing.

The first step (2) confirms whether the listener is actually linked to an active rule: without this confirmation, every subsequent step may examine a configuration that is itself correct while overlooking a resource association problem. The second step (1) uses the consolidated backend health view, which is the most direct indicator for HTTP 502. The third step (3) goes deeper into the logs to identify the specific probe failure pattern. The fourth step (4) validates whether the port and protocol configured in HTTP Settings match what the backend expects. The fifth step (5), directly accessing the backend instance, is the most operationally costly and should be reserved for confirming that the problem is in the backend and not in the gateway configuration chain.
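The ordering can be sketched as a small triage runner that executes the checks in the sequence 2, 1, 3, 4, 5 and records each outcome. The check results here are stubbed lambdas chosen purely for illustration; real checks would query the portal, the diagnostic logs, or the backend itself.

```python
from typing import Callable, List, Tuple

Check = Tuple[int, str, Callable[[], bool]]


def run_in_order(checks: List[Check]) -> List[Tuple[int, str, bool]]:
    """Run every check in the given order and record whether it passed."""
    return [(step, description, check()) for step, description, check in checks]


# Stubbed outcomes for illustration only.
checks: List[Check] = [
    (2, "listener is associated with an active routing rule", lambda: True),
    (1, "backend health reports healthy instances", lambda: False),
    (3, "diagnostic logs show no probe errors", lambda: False),
    (4, "HTTP Settings port and protocol match the backend", lambda: True),
    (5, "backend responds when accessed via private IP", lambda: True),
]

results = run_in_order(checks)
for step, description, passed in results:
    print(f"step {step}: {'ok' if passed else 'FAILED'} ({description})")
```

Keeping the checks in a list makes the order explicit and easy to review, and the cheapest gateway-side checks naturally run before the costlier direct backend access.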

Sequence A skips routing rule validation and starts with health status without context. Sequence C begins with logs before validating rule existence, inverting triage logic. Sequence D starts with the most costly step (direct backend access) before any triage on the gateway itself, which wastes time and may lead to incorrect conclusions if the backend is healthy and the problem is gateway configuration.


Troubleshooting Tree: Map requirements to features and capabilities of Azure Application Gateway


Color legend:

Color         Node type
Dark blue     Initial symptom (entry point)
Medium blue   Diagnostic question (binary or state decision)
Orange        Validation or intermediate verification
Red           Identified cause
Green         Recommended action or resolution

To use this tree when facing a real problem, start at the root node by identifying the observed symptom: the gateway is returning an HTTP error, not responding, or blocking specific users. From there, answer each diagnostic question based on what can be verified in the portal or logs. Each branch progressively eliminates hypotheses until it reaches an identified cause (in red), which leads to a concrete resolution action (in green). Orange nodes indicate where to collect more data before proceeding.