Troubleshooting Lab: Identify appropriate use cases for Azure Front Door
Diagnostic Scenarios
Scenario 1: Root Cause
An operations team reports that users in the Asia-Pacific region are experiencing high latency when accessing a global web application. The environment uses Azure Front Door with two configured backends: one in East US and another in West Europe.
The health probes return status 200 for both backends, and Front Door logs confirm that traffic originating from Tokyo is consistently routed to West Europe. The team also reports that they updated the TLS certificate on the East US backend three days ago. The profile uses the Standard SKU.
The relevant routing configuration is as follows:
```json
{
  "routingRules": [
    {
      "name": "global-route",
      "routeConfiguration": {
        "routeType": "Forward",
        "backendPool": "primary-pool"
      }
    }
  ],
  "loadBalancingSettings": {
    "sampleSize": 4,
    "successfulSamplesRequired": 2,
    "additionalLatencyMilliseconds": 0
  },
  "backends": [
    { "address": "app-eastus.azurewebsites.net", "weight": 50, "priority": 1 },
    { "address": "app-westeurope.azurewebsites.net", "weight": 50, "priority": 1 }
  ]
}
```
What is the root cause of the inefficient routing for Asia-Pacific users?
A) The TLS certificate update in East US corrupted the health probe for that backend, causing Front Door to treat it as degraded
B) The additionalLatencyMilliseconds field is configured as zero, eliminating the latency tolerance that would allow Front Door to consider the geographically closer backend
C) Both backends have the same weight and priority, causing Front Door to distribute traffic via round-robin without considering latency
D) The Standard SKU does not support latency-based routing; this feature is only available in the Premium SKU
Scenario 2: Action Decision
The security team found that the WAF policy associated with the Azure Front Door profile is running in Detection mode, and that requests containing SQL injection patterns are reaching the production backend without being blocked.
The cause is confirmed: Detection mode only logs requests that match WAF rules; it does not block them. The technical fix is to switch the policy to Prevention mode.
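The behavioral difference between the two modes can be sketched as a small Python model. It is a simplification under stated assumptions: a single boolean stands in for "a managed rule matched", and per-rule actions and the anomaly-score thresholds that real managed rulesets apply are left out. `WafMode` and `waf_action` are hypothetical names, not part of any Azure SDK.

```python
from enum import Enum


class WafMode(Enum):
    """Hypothetical stand-in for the WAF policy mode setting."""
    DETECTION = "Detection"
    PREVENTION = "Prevention"


def waf_action(mode: WafMode, rule_matched: bool) -> str:
    """Return what this simplified WAF does with a single request.

    Assumption: any rule match is treated as block-worthy; per-rule
    actions and anomaly-score thresholds are deliberately ignored.
    """
    if not rule_matched:
        return "allow"  # clean traffic is forwarded in both modes
    if mode is WafMode.DETECTION:
        return "log"    # the match is recorded, but the request still reaches the backend
    return "block"      # Prevention mode stops the request at the edge


# The SQL injection request from this scenario, evaluated under each mode.
for mode in WafMode:
    print(f"{mode.value}: {waf_action(mode, rule_matched=True)}")
```

Run as-is, it prints `Detection: log` followed by `Prevention: block`, which is exactly the gap the security team observed.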
The operational context is as follows:
- The application processes real-time financial transactions
- A maintenance window is scheduled for later this week, 48 hours from now
- The development team reported that some managed rules in the Microsoft_DefaultRuleSet_2.1 ruleset produce false positives on critical application functionality; in earlier tests in the staging environment, they blocked legitimate operations
- The security team has permission to change the WAF policy immediately
What is the correct action to take at this moment?
A) Immediately change the WAF policy mode to Prevention, as application security supersedes any risk of false positives in production
B) Wait for the maintenance window, review and adjust the rules that generate false positives before activating Prevention mode
C) Create specific rule exclusions for the functionality that generates false positives, then activate Prevention mode right away, outside the maintenance window
D) Replace the Microsoft_DefaultRuleSet_2.1 ruleset with custom rules before changing the mode to Prevention
Scenario 3: Root Cause
An architect receives a call reporting that Azure Front Door is returning 503 Service Unavailable errors to all users of an e-commerce application. The environment has three backends configured in different regions.
The Front Door logs show the following output:
```text
2026-03-15T10:42:11Z | OriginHealthStatus: Unhealthy | Backend: app-eastus      | ProbeStatus: 404
2026-03-15T10:42:11Z | OriginHealthStatus: Unhealthy | Backend: app-westeurope  | ProbeStatus: 404
2026-03-15T10:42:11Z | OriginHealthStatus: Unhealthy | Backend: app-brazilsouth | ProbeStatus: 404
```
The architect verifies that all three backends respond normally when accessed directly in a browser by their FQDNs. The DevOps team mentions that a deployment the previous afternoon included a refactoring of the application routes. The Azure subscription is within its quota limits, and the Azure Service Health dashboard shows no regional availability alerts.
What is the root cause of the 503 error returned by Front Door?
A) The previous afternoon's deployment introduced a failure in the backends that prevents them from responding to traffic routed by Front Door, while direct responses still work due to browser cache
B) The path configured for Front Door health probes no longer exists after the application route refactoring, causing all backends to be marked as unhealthy
C) All three backends are simultaneously overloaded, and Front Door interprets response timeouts as 404 status in the health probes
D) The subscription reached the Front Door simultaneous connections limit, and the 503 error is generated by the service itself before reaching the backends
Scenario 4: Diagnostic Sequence
An engineer is alerted that users are receiving outdated content from a web application served by Azure Front Door with caching enabled. The application was updated with new product prices, but some users still see the old values. The Front Door profile uses the Premium SKU and has active cache rules on product routes.
The available investigation steps are:
1. Verify whether a cache purge was executed after the price update was deployed
2. Confirm whether the backends return the updated values when accessed directly
3. Verify the Cache-Control and max-age values configured in the Front Door routing rules
4. Confirm that the Premium SKU is active and that the Front Door profile is associated with the correct endpoint
5. Analyze the response headers the user receives to determine whether the response came from cache (an X-Cache value such as TCP_HIT)
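Step 5 can be scripted once the response headers have been captured (for example with `curl -I` against the affected product URL). The sketch below is a minimal, hypothetical helper: it looks up the header case-insensitively and maps the values Front Door documents (`TCP_HIT` and `TCP_REMOTE_HIT` for cache hits, `TCP_MISS` for origin fetches) to a coarse verdict. Treat the exact value set as an assumption to confirm against current Front Door documentation; unrecognized values are reported as `other`.

```python
from typing import Mapping, Optional

# Assumed X-Cache values; confirm against current Azure Front Door documentation.
CACHE_HIT_VALUES = {"TCP_HIT", "TCP_REMOTE_HIT"}
CACHE_MISS_VALUES = {"TCP_MISS"}


def get_x_cache(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Cache value, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            return value
    return None


def classify_x_cache(value: Optional[str]) -> str:
    """Map an X-Cache value to 'cache', 'origin', 'absent', or 'other'."""
    if value is None:
        return "absent"  # header missing: the response may not have come through Front Door
    normalized = value.strip().upper()
    if normalized in CACHE_HIT_VALUES:
        return "cache"   # served from a Front Door cache
    if normalized in CACHE_MISS_VALUES:
        return "origin"  # fetched from the backend for this request
    return "other"       # e.g. values reporting that caching was disabled or not applicable


# Hypothetical headers captured from a user's stale response.
captured = {"x-cache": "TCP_HIT", "Content-Type": "text/html"}
print(classify_x_cache(get_x_cache(captured)))
```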
What is the correct investigation sequence for this symptom?
A) 4 → 1 → 2 → 5 → 3
B) 2 → 5 → 3 → 1 → 4
C) 5 → 2 → 3 → 1 → 4
D) 4 → 5 → 2 → 3 → 1
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The key clue is the value "additionalLatencyMilliseconds": 0. Azure Front Door uses this parameter to define a latency tolerance window within which multiple backends are considered equally close. With the value set to zero, any latency difference between backends, however small, results in strict selection of the backend with the lowest latency as measured from the PoP that served the request. That PoP is not necessarily the one closest to the end user, and without a tolerance window a geographically better-suited backend can be bypassed by an insignificant margin.
The information about the TLS certificate update is irrelevant: health probes return 200 for both backends, ruling out any degradation hypothesis. It was intentionally included to divert diagnosis toward A.
Distractor C reflects a common misconception: equal weight and priority do enable load balancing, but only among backends that are already eligible, and latency-based selection narrows that eligible set before weights are applied.
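The filtering order behind this answer (health, then priority, then the latency window, then weights) can be modelled in a few lines of Python. This is a simplified sketch, not Front Door's implementation: the `Backend` class and both functions are hypothetical, and the latency figures are invented to represent measurements from the PoP that served the Tokyo request, with West Europe ahead by a small margin.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    address: str
    priority: int
    weight: int
    latency_ms: float  # hypothetical latency measured from the serving PoP
    healthy: bool = True


def eligible_backends(backends: List[Backend], additional_latency_ms: float) -> List[Backend]:
    """Apply the health, priority, and latency-window filters, in that order."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return []
    best_priority = min(b.priority for b in healthy)  # lower value = higher priority
    top_tier = [b for b in healthy if b.priority == best_priority]
    fastest = min(b.latency_ms for b in top_tier)
    # Only backends within the tolerance window of the fastest one stay eligible.
    return [b for b in top_tier if b.latency_ms <= fastest + additional_latency_ms]


def pick_backend(backends: List[Backend], additional_latency_ms: float,
                 rng: random.Random) -> Optional[str]:
    """Weights are applied last, only among backends that survived filtering."""
    candidates = eligible_backends(backends, additional_latency_ms)
    if not candidates:
        return None
    chosen = rng.choices(candidates, weights=[b.weight for b in candidates], k=1)[0]
    return chosen.address


backends = [
    Backend("app-eastus.azurewebsites.net", priority=1, weight=50, latency_ms=182.0),
    Backend("app-westeurope.azurewebsites.net", priority=1, weight=50, latency_ms=180.5),
]

for tolerance in (0, 50):
    print(tolerance, [b.address for b in eligible_backends(backends, tolerance)])
```

With a tolerance of 0, only West Europe survives the latency filter, so the equal 50/50 weights never come into play; with a 50 ms window, both backends are eligible and the weights split traffic between them.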
Distractor D is factually incorrect: latency-based routing is available in the Standard SKU.
Acting on A would lead the engineer to revoke and reissue the TLS certificate unnecessarily, wasting time without solving the actual problem.
Answer Key: Scenario 2
Answer: B
The critical constraint in the scenario is the proven existence of false positives in critical functionalities, documented in previous staging tests. Activating Prevention mode immediately, as proposed by A, would block legitimate financial operations in production, causing direct business impact.
The correct action is to use the 48 hours available until the maintenance window to review and adjust the rules that generate false positives before activating effective blocking.
Distractor C is technically valid as an approach, but it carries operational risk: adding rule exclusions in production outside a maintenance window, without complete validation, could leave security gaps and could also introduce new, unforeseen false positives.
Distractor D is disproportionate to the problem: replacing the managed ruleset with fully custom rules requires significant effort, gives up the automatic coverage of emerging vulnerabilities that Microsoft provides, and is not justified by a few specific false positives.
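One way to carry out the review that answer B calls for is to replay labelled staging requests against a candidate configuration and count two numbers: legitimate requests that would be blocked (false positives) and attacks that would slip through. The sketch below is hypothetical throughout: `SQLI_PATTERN` is a deliberately crude stand-in for a managed rule, and excluding a field by name only approximates how WAF exclusions skip selected request attributes.

```python
import re
from typing import Dict, List, Set, Tuple

# Crude, hypothetical stand-in for a managed SQL injection rule.
SQLI_PATTERN = re.compile(r"'|--|\bunion\s+select\b", re.IGNORECASE)

Request = Dict[str, str]


def would_block(fields: Request, excluded: Set[str]) -> bool:
    """True if any non-excluded field matches the rule (Prevention mode assumed)."""
    return any(SQLI_PATTERN.search(value) for name, value in fields.items()
               if name not in excluded)


def review(samples: List[Tuple[Request, bool]], excluded: Set[str]) -> Tuple[int, int]:
    """Return (false_positives, missed_attacks); each sample is (fields, is_legitimate)."""
    false_positives = sum(1 for fields, legit in samples
                          if legit and would_block(fields, excluded))
    missed_attacks = sum(1 for fields, legit in samples
                         if not legit and not would_block(fields, excluded))
    return false_positives, missed_attacks


# Hypothetical staging traffic: one legitimate payment, one injection attempt.
staging = [
    ({"payee": "O'Neil Holdings", "amount": "250.00"}, True),
    ({"account": "1' OR '1'='1", "amount": "0"}, False),
]

print(review(staging, excluded=set()))                 # no exclusions yet
print(review(staging, excluded={"payee"}))             # narrow exclusion
print(review(staging, excluded={"payee", "account"}))  # over-broad exclusion
```

The three calls print `(1, 0)`, `(0, 0)`, and `(0, 1)`: the narrow exclusion removes the false positive, while the over-broad one silently reopens a security gap. Under this approach, Prevention mode would be enabled only once a review returns `(0, 0)` on representative traffic.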
The correct reasoning here is: the cause is identified and the solution is known, but the documented false positive constraint requires that activation be done in a controlled manner within the planned window.
Answer Key: Scenario 3
Answer: B
The definitive evidence is in the logs: all backends return 404 for health probes, but respond normally when accessed directly through the browser. This specific pattern indicates that the problem is not in the backends themselves, but in the path that Front Door is querying to verify backend health.
After the route refactoring in the previous afternoon's deployment, the path configured for the health probes (for example, /health or /status) was likely moved or removed. Front Door interprets the 404 as an unhealthy backend and, with all backends marked Unhealthy, returns 503 to users.
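The step from "probe returns 404" to "backend marked Unhealthy" can be sketched as follows. The sketch reuses the `sampleSize` and `successfulSamplesRequired` settings from Scenario 1's configuration (4 and 2). That reuse is an assumption, because Scenario 3 does not show its probe settings. Only HTTP 200 counts as a successful probe. The function name and the probe histories are hypothetical.

```python
from typing import Dict, List


def is_healthy(probe_statuses: List[int], sample_size: int,
               successful_samples_required: int) -> bool:
    """Healthy if enough of the most recent probes returned HTTP 200."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    recent = probe_statuses[-sample_size:]  # only the latest samples count
    successes = sum(1 for status in recent if status == 200)
    return successes >= successful_samples_required


# Hypothetical probe history: 200s before the deployment, 404s after the
# health endpoint disappeared in the route refactoring.
history: Dict[str, List[int]] = {
    "app-eastus": [200, 200, 200, 404, 404, 404],
    "app-westeurope": [200, 200, 404, 404, 404, 404],
    "app-brazilsouth": [200, 200, 200, 404, 404, 404],
}

for backend, statuses in history.items():
    state = "Healthy" if is_healthy(statuses, 4, 2) else "Unhealthy"
    print(backend, state)
```

Every backend prints `Unhealthy` even though the application itself still serves pages, because the probes are judged only on the status code of the probe path.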
The information about the subscription being within quota limits and Service Health without alerts is irrelevant and was included to divert diagnosis toward D.
Distractor A is the most dangerous because it sounds plausible, but it does not fit the evidence: browser cache has no effect on Front Door's health probes, and each backend is answering the probes with a specific 404 (resource not found) rather than timing out or returning a server error. A direct request by FQDN also normally does not share a cache with requests routed through Front Door.
Acting on A would lead the team to investigate the backends exhaustively without finding a fault, wasting time while the real problem, the health probe path, remains unresolved.
Answer Key: Scenario 4
Answer: B
The correct sequence is: 2 → 5 → 3 → 1 → 4.
Progressive diagnostic reasoning starts by confirming the source of the data and then moves outward along the delivery path, eliminating hypotheses before escalating the investigation:
| Step | Action | Justification |
|---|---|---|
| 2 | Confirm if backend returns updated values | Verify if the problem exists at the source before investigating cache |
| 5 | Analyze X-Cache header in user response | Confirm if content comes from Front Door cache or origin |
| 3 | Verify cache rules and max-age | Identify if configured TTL is keeping old content |
| 1 | Verify if purge was executed | Confirm whether cache invalidation was performed after deployment |
| 4 | Confirm SKU and profile association | Verify infrastructure only if previous steps don't identify the cause |
Distractor A starts by verifying the SKU and profile, which are stable infrastructure elements and unlikely to cause a problem that appeared after a specific deployment.
Distractor D also starts with the SKU and, although it checks the X-Cache header second, delays backend verification to the third step, investigating configuration before confirming whether the origin is correct.
Starting with the backend (step 2) is the correct principle: if the origin still returns the old value, the problem is not in the cache, and all TTL and purge investigation would be wasted time. The same principle rules out distractor C, which inspects the X-Cache header before confirming that the origin is serving the new prices.
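The sequence in the table can be expressed as a short decision function. This makes the stopping rule explicit: each step either identifies a cause or hands over to the next one. The `Observations` fields and the returned labels are hypothetical. The cache-hit check assumes Front Door's `TCP_HIT` and `TCP_REMOTE_HIT` X-Cache values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    origin_returns_new_prices: bool  # step 2: backend queried directly
    x_cache: Optional[str]           # step 5: X-Cache value seen by the user
    cache_ttl_seconds: int           # step 3: effective cache duration on the route
    seconds_since_deploy: int
    purge_executed: bool             # step 1


def diagnose(obs: Observations) -> str:
    """Walk steps 2, 5, 3, 1, 4 in order and return the first finding."""
    if not obs.origin_returns_new_prices:
        return "origin_stale"  # the cache is not the problem; fix the deployment
    served_from_cache = (obs.x_cache is not None
                         and obs.x_cache.strip().upper() in {"TCP_HIT", "TCP_REMOTE_HIT"})
    if served_from_cache:
        still_within_ttl = obs.cache_ttl_seconds > obs.seconds_since_deploy
        if still_within_ttl and not obs.purge_executed:
            return "purge_needed"  # old copy is valid per TTL and was never invalidated
    # Nothing above explains the symptom: escalate to SKU/profile/endpoint checks.
    return "check_sku_and_endpoint"


# Hypothetical case: origin is correct, a cache hit, a one-day TTL, no purge.
print(diagnose(Observations(True, "TCP_HIT", 86400, 600, False)))
```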
Troubleshooting Tree: Identify appropriate use cases for Azure Front Door
Color Legend:
| Color | Node Type |
|---|---|
| Dark Blue | Initial symptom (investigation entry point) |
| Medium Blue | Diagnostic question (binary decision or observable) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or verification |
To use this tree when facing a real problem, start with the root node by identifying the type of symptom observed: unhealthy backends, inefficient routing, outdated content, or malicious traffic reaching backends. Follow the diagnostic questions by answering yes or no based on what you can verify directly in the Azure portal, Front Door logs, or HTTP response headers. Each path leads to a named cause and a specific action, preventing plausible but incorrect hypotheses from consuming investigation time.
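The tree's structure maps naturally onto a small data type: question nodes branch on a yes/no answer, and every other node has at most one successor. The sketch below is hypothetical (node keys, texts, and the `walk` helper are illustrative). It encodes only one branch, derived from Scenario 3. The remaining symptom types would be added the same way.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class NodeKind(Enum):
    # Values mirror the color legend.
    SYMPTOM = "Dark Blue"
    QUESTION = "Medium Blue"
    CAUSE = "Red"
    ACTION = "Green"
    VALIDATION = "Orange"


@dataclass
class Node:
    key: str
    kind: NodeKind
    text: str
    yes: Optional["Node"] = None   # followed when a QUESTION is answered yes
    no: Optional["Node"] = None    # followed when a QUESTION is answered no
    then: Optional["Node"] = None  # single successor for every other node kind


def walk(start: Node, answers: Dict[str, bool]) -> List[str]:
    """Follow the tree from `start`, using `answers` keyed by question key."""
    path: List[str] = []
    node: Optional[Node] = start
    while node is not None:
        path.append(f"[{node.kind.value}] {node.text}")
        if node.kind is NodeKind.QUESTION:
            node = node.yes if answers[node.key] else node.no
        else:
            node = node.then
    return path


# Hypothetical branch built from Scenario 3.
fix = Node("fix_probe", NodeKind.ACTION, "Point the health probe at a path that exists")
cause = Node("probe_path", NodeKind.CAUSE, "Probe path removed by route refactoring", then=fix)
verify = Node("verify", NodeKind.VALIDATION, "Check the backends directly")
question = Node("probe_404", NodeKind.QUESTION,
                "Probes return 404 while direct FQDN access works?", yes=cause, no=verify)
root = Node("unhealthy", NodeKind.SYMPTOM, "All backends marked Unhealthy", then=question)

for line in walk(root, {"probe_404": True}):
    print(line)
```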