Troubleshooting Lab: Configure HTTP Settings

Diagnostic Scenarios​

Scenario 1: Root Cause

A production Application Gateway routes traffic to a backend hosted on Azure App Service. The application was working normally until the security team renewed the App Service TLS certificate, replacing a certificate issued by an internal private CA with a certificate issued by DigiCert. After the renewal, the gateway started returning HTTP 502 Bad Gateway for all users.

The team checks the Application Gateway logs and finds the following entry:

BackendHttpStatusCode: 0
BackendResponseLatency: 0ms
Reason: BackendConnectionError
BackendServer: myapp.azurewebsites.net:443

During the investigation, the team also observes that:

  • The Application Gateway SKU is Standard V2
  • The Request timeout in the HTTP Setting is configured for 20 seconds
  • The HTTP Setting uses HTTPS protocol on port 443
  • The Use well-known CA certificate option is disabled
  • A trusted root certificate from the previous private CA is still uploaded to the HTTP Setting
  • The App Service responds normally when accessed directly through the browser

What is the root cause of the observed 502 error?

A) The 20 second timeout is insufficient to establish TLS connections with the App Service after certificate renewal

B) The Application Gateway is trying to validate the backend certificate using the previous private CA certificate, which doesn't match the new certificate issued by DigiCert

C) The Standard V2 SKU doesn't support HTTPS communication with Azure App Service backends

D) Certificate renewal on the App Service changed the listening port from 443 to a non-standard port, causing connection failure


Scenario 2: Action Decision

The operations team identified the cause of a production incident: the Override backend path field in an HTTP Setting is configured with the value /legacy/, overriding the correct path that the new API version expects to receive. The application returns HTTP 404 for all requests because the /legacy/ endpoint was removed in the latest deployment.

The incident context is as follows:

  • The Application Gateway is in production and processes approximately 800 requests per minute
  • There's no scheduled maintenance window for the next 4 hours
  • Changing the HTTP Setting causes a brief gateway re-provisioning period, estimated between 1 and 3 minutes
  • The team has write permission on the Application Gateway resource
  • An identical staging environment is available but isn't receiving real traffic at the moment
  • The contracted availability SLA is 99.9%

What is the correct action to take at this moment?

A) Apply the Override backend path correction immediately in the production environment, as the 1 to 3 minute re-provisioning represents less impact than keeping the HTTP 404 active

B) Fix first in the staging environment, validate the behavior, and wait for the scheduled maintenance window to apply to production

C) Create a new HTTP Setting with the correct path and associate it to the routing rules without removing the current HTTP Setting, eliminating re-provisioning

D) Revert the API deployment to restore the /legacy/ endpoint while the definitive fix is planned with a maintenance window


Scenario 3: Root Cause

A team migrated a set of microservices to separate backends in the Application Gateway. Each microservice has its own backend pool and its own HTTP Setting. After migration, the authentication service starts showing intermittent failures, while other services operate normally.

The Application Gateway logs for the authentication service show:

BackendHttpStatusCode: 200
BackendResponseLatency: 4200ms
Reason: Timeout
BackendServer: auth-service.internal:8443

The team collects the following additional information:

  • The authentication service HTTP Setting has Request timeout configured for 30 seconds
  • The authentication service performs queries to an external LDAP directory that, under load, can take up to 45 seconds to respond
  • Cookie-based affinity is enabled in the authentication service HTTP Setting
  • Other HTTP Settings have timeout configured for 60 seconds
  • The Health probe associated with the HTTP Setting uses HTTP protocol, while the backend listens on HTTPS

The team concludes that Cookie-based affinity is causing overload on specific instances in the pool. Is this hypothesis correct?

A) Yes, Cookie-based affinity concentrates requests on specific instances, causing elevated latency and intermittent timeouts

B) No. The root cause is that the Request timeout of 30 seconds is lower than the backend's maximum response time under load, causing premature request termination

C) No. The root cause is that the Health probe uses HTTP while the backend listens on HTTPS, causing healthy instances to be marked as unavailable

D) Yes. Cookie-based affinity combined with HTTPS on the backend prevents correct balancing between pool instances


Scenario 4: Diagnostic Sequence

An engineer receives an alert indicating that the Application Gateway is returning HTTP 502 for requests destined to a specific backend. The environment consists of an Application Gateway with HTTP Setting configured for HTTPS on port 443, an associated Custom probe, and a backend pool with three instances.

The available investigation steps are:

  1. Check if backend pool instances are marked as Unhealthy in the backend health view in the Azure portal
  2. Access one of the backend instances directly on port 443 to confirm the service is responding
  3. Verify the protocol and path configured in the Custom probe to ensure they match what the backend expects to receive
  4. Verify if the backend TLS certificate is valid and if the Application Gateway is configured to trust it
  5. Review Application Gateway diagnostic logs to identify the specific error code returned when attempting to connect to the backend

What is the correct investigation sequence?

A) 2, 1, 5, 3, 4

B) 1, 5, 2, 3, 4

C) 5, 1, 3, 4, 2

D) 1, 2, 5, 4, 3


Answer Key and Explanations​

Answer Key: Scenario 1

Answer: B

Explanation:

  • The decisive clue is in the combination of two facts: the HTTP Setting has the Use well-known CA certificate option disabled and still loads the certificate from the previous private CA. This means the Application Gateway continues trying to validate the backend certificate against that private CA, but the new App Service certificate was issued by DigiCert. Validation fails before any HTTP data is exchanged, resulting in BackendConnectionError with zero latency, exactly what the logs show.
  • The information about the 20-second timeout is deliberately irrelevant in this scenario. When the failure occurs during the TLS handshake, before any HTTP request is sent, the request timeout is never reached. BackendResponseLatency: 0ms confirms this.
  • Alternative C is technically incorrect: the Standard V2 SKU fully supports HTTPS with App Service backends.
  • Alternative D is ruled out by the evidence: the log shows the gateway targeting port 443, and the App Service responds normally when accessed directly.
  • Alternative A (increasing the timeout) is the most dangerous distractor, because it would lead the team to modify irrelevant settings while the 502 persists.
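
The reasoning about the zero-latency log entry can be sketched in Python. This is a minimal illustration, not an Azure API: `parse_log_entry` and `failed_before_http_exchange` are hypothetical helper names, and the input is simply the key/value text shown in the scenario.

```python
def parse_log_entry(text):
    """Parse 'Key: value' lines into a dict, skipping surrounding blank lines."""
    entry = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so 'host:443' stays intact in the value.
        key, _, value = line.partition(":")
        entry[key.strip()] = value.strip()
    return entry


def failed_before_http_exchange(entry):
    """Return True when the entry shows no HTTP response from the backend.

    A status code of 0 together with 0 ms latency means the gateway never
    received a backend response, so the request timeout cannot be the cause.
    """
    status = int(entry["BackendHttpStatusCode"])
    latency_text = entry["BackendResponseLatency"]
    if latency_text.endswith("ms"):
        latency_text = latency_text[:-2]
    return status == 0 and int(latency_text) == 0


# Log text copied from Scenario 1.
scenario_1_log = """
BackendHttpStatusCode: 0
BackendResponseLatency: 0ms
Reason: BackendConnectionError
BackendServer: myapp.azurewebsites.net:443
"""

entry = parse_log_entry(scenario_1_log)
print(failed_before_http_exchange(entry))  # True
```

A result of True points the investigation at connectivity or certificate trust rather than at timeout values.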

Answer Key: Scenario 2

Answer: A

Explanation:

  • The critical constraint in the scenario is that the application is returning HTTP 404 for all requests at this exact moment, in production, with 800 requests per minute. Maintaining the current state means continuous and measurable impact on the SLA. The 1 to 3 minute re-provisioning is a finite and controlled impact, while the 404 is an indefinite and growing impact.
  • Alternative B ignores the most important constraint: waiting 4 hours with active 404 is incompatible with 99.9% SLA.
  • Alternative C rests on a technical misconception: re-pointing the routing rules to a new HTTP Setting is itself a gateway configuration change, so the switch cannot be made atomically and the transition would still cause re-provisioning.
  • Alternative D is the most dangerous, as it proposes reverting the application as a solution to a gateway configuration problem, creating technical debt and ignoring the real cause.
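
The trade-off between fixing now and waiting can be made concrete with a little arithmetic. The sketch below rests on two assumptions that are not stated in the scenario: the SLA is measured over a 30-day period, and every request fails for the full duration of an outage (the worst case for re-provisioning). The function names are illustrative.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed 30-day SLA measurement period


def allowed_downtime_minutes(sla_percent, period_minutes=MINUTES_PER_30_DAYS):
    """Downtime budget, in minutes, for an availability SLA over the period."""
    return round(period_minutes * (100 - sla_percent) / 100, 2)


def failed_requests(requests_per_minute, outage_minutes):
    """Requests lost if every request fails for the whole outage (worst case)."""
    return requests_per_minute * outage_minutes


budget = allowed_downtime_minutes(99.9)   # downtime budget for 99.9%
fix_now = failed_requests(800, 3)         # worst-case 3-minute re-provisioning
wait = failed_requests(800, 4 * 60)       # 404s for at least 4 more hours

print(budget, fix_now, wait)  # 43.2 2400 192000
```

Under these assumptions, a 3-minute re-provisioning fits inside the 43.2-minute budget, while waiting 240 minutes exceeds it several times over.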

Answer Key: Scenario 3

Answer: B

Explanation:

  • The log shows BackendResponseLatency: 4200ms with Reason: Timeout, and the Request timeout in the HTTP Setting is 30 seconds. The scenario explicitly states that the backend can take up to 45 seconds to respond under load. The cause is direct: the gateway terminates the request before the backend completes processing.
  • The Cookie-based affinity hypothesis is the main distractor and represents a classic reasoning error: focusing on a visible configuration that's different from other services, ignoring the quantitative correlation between configured timeout and backend response time.
  • Alternative C is plausible in another context, but the scenario indicates that failures are intermittent and correlated with load, not with constant instance unavailability. That rules out the probe protocol mismatch as the root cause here.
  • The information about other HTTP Settings having a 60 second timeout is relevant by contrast, not irrelevant: it confirms that the 30 second timeout is atypically low for this environment.
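
The quantitative comparison behind answer B can be sketched as a small check. `HttpSetting` and `settings_at_risk` are hypothetical names for illustration, and the worst-case latency of the second service is an assumed value, since the scenario only gives the figure for the authentication service.

```python
from dataclasses import dataclass


@dataclass
class HttpSetting:
    name: str
    request_timeout_s: int
    backend_worst_case_s: int  # slowest expected backend response under load


def settings_at_risk(settings):
    """Return names of settings whose timeout is below the backend's worst case."""
    return [s.name for s in settings if s.request_timeout_s < s.backend_worst_case_s]


settings = [
    # Values from Scenario 3: 30 s timeout, LDAP-bound responses up to 45 s.
    HttpSetting("auth-service", request_timeout_s=30, backend_worst_case_s=45),
    # Hypothetical peer service: 60 s timeout, assumed 45 s worst case.
    HttpSetting("other-service", request_timeout_s=60, backend_worst_case_s=45),
]

print(settings_at_risk(settings))  # ['auth-service']
```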

Answer Key: Scenario 4

Answer: B

Explanation:

  • The correct sequence follows progressive diagnostic logic: from most comprehensive to most specific, and from visible symptom to underlying cause.
  • Step 1 (check health in portal) immediately answers the most important question: does the gateway know the backend is inaccessible? If all instances are marked as Unhealthy, the investigation follows the probe and connectivity path. If they're Healthy, the problem more likely lies in the routing or HTTP Setting configuration.
  • Step 5 (diagnostic logs) refines the diagnosis with the specific error code, which distinguishes TLS failure, timeout, connection refusal, or backend HTTP error.
  • Step 2 (direct backend access) validates whether the problem is with the gateway or the backend itself.
  • Step 3 (verify probe) and step 4 (verify TLS certificate) are specific investigations that only make sense after confirming the problem is in the connectivity layer between gateway and backend.
  • Alternative A, by starting with direct backend access, consumes time on infrastructure verification before even confirming how the gateway is interpreting the pool state.
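
The ordering in answer B can be expressed as a small runner that executes checks in a fixed sequence. This is only a sketch: the check functions are stubs returning made-up findings, and `run_diagnosis` is a hypothetical helper, not part of any Azure tooling.

```python
# Step numbers match the numbered list in Scenario 4; order follows answer B.
DIAGNOSTIC_ORDER = [1, 5, 2, 3, 4]

STEP_NAMES = {
    1: "Check backend health in the portal",
    2: "Access a backend instance directly on port 443",
    3: "Verify Custom probe protocol and path",
    4: "Verify backend TLS certificate and gateway trust",
    5: "Review Application Gateway diagnostic logs",
}


def run_diagnosis(checks):
    """Run checks in the recommended order and collect their findings.

    `checks` maps a step number to a zero-argument callable that returns a
    finding string, or None when the step reveals nothing.
    """
    findings = []
    for step in DIAGNOSTIC_ORDER:
        result = checks[step]()
        if result is not None:
            findings.append((step, STEP_NAMES[step], result))
    return findings


# Stub checks with illustrative findings (assumed, not real output).
stub_checks = {
    1: lambda: "all instances Unhealthy",
    2: lambda: None,
    3: lambda: None,
    4: lambda: "certificate issuer not trusted",
    5: lambda: "BackendConnectionError",
}

for step, name, finding in run_diagnosis(stub_checks):
    print(f"Step {step}: {name} -> {finding}")
```

Keeping the order in a single list makes the broad-to-specific progression explicit and easy to review.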

Troubleshooting Tree: Configure HTTP Settings​


Color Legend:

  • Dark Blue: Initial symptom (root)
  • Blue: Diagnostic question
  • Red: Identified cause
  • Green: Recommended action or resolution
  • Orange: Intermediate validation or verification

To use this tree when facing a real problem, start at the root node by identifying the observed symptom (502 or 404). Follow the first diagnostic question and answer it with the observable state of the environment, without presuming the cause. Each branch eliminates a class of hypotheses and narrows the diagnostic space. When you reach a red node, the cause is identified; the green node that immediately follows indicates the corrective action. Orange nodes signal that an intermediate verification is necessary before proceeding, which avoids acting on an incomplete diagnosis.
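
The traversal described above can be sketched as a walk over a dictionary of nodes. The miniature tree below is hypothetical and far smaller than the diagram: its node names and texts are invented for illustration, loosely based on the scenarios in this lab.

```python
# Hypothetical miniature tree; node kinds mirror the color legend.
TREE = {
    "start": {"kind": "symptom", "text": "Gateway returns 502", "next": "q_health"},
    "q_health": {"kind": "question", "text": "Are backend instances Unhealthy?",
                 "yes": "q_cert", "no": "v_logs"},
    "q_cert": {"kind": "question", "text": "Does the gateway trust the backend certificate?",
               "yes": "v_probe", "no": "c_cert"},
    "c_cert": {"kind": "cause", "text": "Backend certificate not trusted", "next": "a_cert"},
    "a_cert": {"kind": "action", "text": "Update the trusted certificate in the HTTP Setting"},
    "v_probe": {"kind": "verification", "text": "Verify probe protocol and path"},
    "v_logs": {"kind": "verification", "text": "Review diagnostic logs"},
}


def walk(tree, answers, node_id="start"):
    """Follow the tree from the root, using observed yes/no answers.

    `answers` maps a question node id to True (yes) or False (no).
    Returns the visited nodes as (kind, text) pairs.
    """
    path = []
    while node_id is not None:
        node = tree[node_id]
        path.append((node["kind"], node["text"]))
        if node["kind"] == "question":
            node_id = node["yes"] if answers[node_id] else node["no"]
        else:
            node_id = node.get("next")  # None ends the walk
    return path


observed = {"q_health": True, "q_cert": False}
for kind, text in walk(TREE, observed):
    print(f"{kind}: {text}")
```

With the observations shown, the walk ends at a cause node followed by its action node, matching the red-then-green pattern in the legend.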