
Troubleshooting Lab: Configure caching

Diagnostic Scenarios

Scenario 1: Root Cause

A global web application is distributed via Azure Front Door with caching enabled. The operations team opens a ticket reporting that users in Europe are seeing the same latency as requests sent directly to the origin, as if the cache were not being used. Users in North America do not report the same issue.

The responsible engineer collects the following information:

GET /products/list HTTP/1.1
Host: app.contoso.com
Accept-Encoding: gzip
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

Response observed in Front Door logs for European requests:

X-Cache: MISS
X-Azure-Ref: 0ABC123...
Cache-Control: no-cache

The Front Door profile was created two days ago. The team confirms that there has been no configuration change in the last 24 hours. The endpoint's TLS certificate is valid and the origin responds with HTTP 200 in all cases.

What is the root cause of the observed behavior?

A) The newly configured TLS certificate has not yet been propagated to European POPs, causing fallback to the origin
B) The origin response includes the Cache-Control: no-cache header, preventing storage in Front Door POPs
C) Front Door is still in cache warm-up period for European POPs, and the behavior will normalize automatically
D) The Authorization header in the request causes Front Door to classify the request as non-cacheable by default


Scenario 2: Action Decision

The cause of an incident has been identified: Azure CDN is delivering an outdated version of a critical JavaScript file (/static/app.v2.js) after an emergency update to the payment system. The correct file is already available at the origin. The environment is production, with a high volume of transactions in progress.

The team has the following permissions and restrictions:

| Restriction | Detail |
| --- | --- |
| Maintenance window | Not available for the next 4 hours |
| CDN purge permission | Available to the responsible engineer |
| Origin access | Available |
| CDN profile configuration change | Requires change management approval |

What is the correct action to take at this time?

A) Change the Cache-Control: max-age at the origin to zero and wait for natural expiration of existing cache in POPs
B) Execute a targeted purge of the /static/app.v2.js path in the CDN profile to force immediate revalidation
C) Disable caching in the CDN profile for the affected endpoint until the maintenance window is available
D) Redirect traffic directly to the origin by changing CDN routing rules


Scenario 3: Root Cause

A platform team configures Azure Front Door to serve a financial reports API. Caching is enabled with a cacheDuration of 6 hours. After a week in production, the security team finds that in approximately 3% of requests, clients receive reports that belong to other clients.

Excerpt of the applied configuration:

{
  "cacheConfiguration": {
    "queryStringCachingBehavior": "IgnoreQueryString",
    "dynamicCompression": "Disabled",
    "cacheDuration": "0.06:00:00"
  }
}

Example of requests that generate the problem:

GET /api/reports?year=2024 HTTP/1.1
X-Customer-ID: 1001
Authorization: Bearer token_client_1001

GET /api/reports?year=2024 HTTP/1.1
X-Customer-ID: 1002
Authorization: Bearer token_client_1002

The team reports that the API was migrated from an on-premises environment where there was no cache layer. TLS is configured correctly and the origin returns HTTP 200 with Content-Type: application/json for all requests.

What is the root cause of data leakage between clients?

A) The cacheDuration of 6 hours is excessive for financial data, causing expired tokens to continue being accepted by the cache
B) Disabling dynamicCompression forces Front Door to treat all responses as static, ignoring authentication headers
C) The IgnoreQueryString behavior combined with the absence of X-Customer-ID and Authorization headers in the cache key causes requests from different clients to share the same cache entry
D) The API returns Content-Type: application/json, and Front Door interprets this content type as dynamic, causing instability in the cache key


Scenario 4: Diagnostic Sequence

An engineer receives the following report: users of an application distributed via Azure CDN Standard from Microsoft are receiving outdated HTML content after a redeploy. The CDN is configured with a TTL of 1 hour. The redeploy occurred 40 minutes ago.

The available investigation steps are:

P1: Verify if the origin is already serving updated content by accessing it directly by IP
P2: Execute a purge of the affected content in the CDN profile
P3: Verify the Cache-Control and Expires headers returned by the origin in the current response
P4: Confirm if the TTL configured in the CDN profile is in Override or Honor origin mode
P5: Analyze CDN logs to verify if requests are resulting in HIT or MISS

What is the correct investigation sequence?

A) P5, P1, P3, P4, P2
B) P1, P5, P4, P3, P2
C) P3, P4, P1, P5, P2
D) P2, P1, P3, P4, P5


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The definitive clue is in the logs collected by the engineer: the response includes Cache-Control: no-cache and the cache status is MISS. When the profile is configured in Honor origin mode, Azure Front Door respects the no-cache directive sent by the origin. The response is therefore not stored in the POP, and every such request is forwarded to the origin; the mechanism does not depend on the region.
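The decision described above can be sketched as a small Python model. It is a simplified, hypothetical model, not Front Door's implementation: `CacheBehavior`, `parse_cache_control`, and `pop_stores_response` are illustrative names, Override is assumed to ignore origin directives entirely, and `no-cache` is treated as non-storable, as this lab does. Strictly, RFC 9111 allows a cache to store a `no-cache` response as long as it revalidates with the origin before every reuse, which still costs one origin round trip per request.

```python
from __future__ import annotations

from enum import Enum


class CacheBehavior(Enum):
    """Hypothetical model of the two modes named in this lab."""
    HONOR_ORIGIN = "Honor origin"
    OVERRIDE = "Override"


# Directives that, in this simplified model, keep a POP from storing the response.
NON_STORABLE = ("no-cache", "no-store", "private")


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control header into {directive: argument or None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def pop_stores_response(cache_control: str, behavior: CacheBehavior) -> bool:
    """Return True if, in this model, the POP keeps a copy of the origin response."""
    if behavior is CacheBehavior.OVERRIDE:
        # Assumption: the profile's own duration wins and origin directives are ignored.
        return True
    directives = parse_cache_control(cache_control)
    return not any(d in directives for d in NON_STORABLE)


# The origin response from Scenario 1: never stored, so every request is a MISS.
print(pop_stores_response("no-cache", CacheBehavior.HONOR_ORIGIN))  # False
```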

The irrelevant details in the scenario are the TLS certificate, confirmed as valid, and the profile creation time. Neither is related to the described cache behavior.

Alternative A is technically implausible as a cause: TLS certificate propagation is independent of cache behavior and would not cause systematic MISS responses. Alternative C is the most dangerous distractor: cache warm-up is a real effect, but it does not explain why the origin response carries Cache-Control: no-cache. Alternative D is partially plausible, since Front Door may not cache responses to requests that carry an Authorization header, but the logs confirm that the no-cache directive in the origin response is the direct cause.
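The Authorization rule behind alternative D comes from HTTP caching semantics: RFC 9111, section 3.5, says a shared cache must not reuse a response to a request carrying Authorization unless the response contains `must-revalidate`, `public`, or `s-maxage`. The sketch below checks only that rule; whether Front Door applies it in exactly this form is an assumption this lab does not confirm, and `authorization_rule_allows_reuse` is a hypothetical name. In Scenario 1 the `no-cache` directive prevents caching regardless, which is why D is not the confirmed cause.

```python
from __future__ import annotations

# Response directives that RFC 9111 section 3.5 lists as allowing a shared cache
# to reuse a response to a request that carried an Authorization header.
AUTH_REUSE_DIRECTIVES = {"must-revalidate", "public", "s-maxage"}


def directive_names(cache_control: str) -> set[str]:
    """Lower-cased directive names from a Cache-Control value, arguments dropped."""
    names = set()
    for part in cache_control.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def authorization_rule_allows_reuse(request_headers: dict[str, str], response_cache_control: str) -> bool:
    """Apply only the Authorization rule; other rules (e.g. no-cache) are not checked here."""
    if not any(name.lower() == "authorization" for name in request_headers):
        return True  # the rule does not apply to requests without Authorization
    return bool(AUTH_REUSE_DIRECTIVES & directive_names(response_cache_control))


# Scenario 1's request carries a Bearer token; the response has only no-cache.
print(authorization_rule_allows_reuse({"Authorization": "Bearer ..."}, "no-cache"))  # False
```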

Acting on alternative C would be the costliest mistake: the engineer would wait for a recovery that never comes, because the cause is at the origin, not in POP warm-up time.


Answer Key: Scenario 2

Answer: B

A targeted purge of the specific path is the only action that meets all of the scenario's restrictions at once: it requires neither a maintenance window nor change management approval, it does not affect the rest of production traffic, and it takes effect immediately.
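Why a targeted purge is both immediate and narrow can be shown with a toy, single-POP cache in Python. `ToyPopCache` and the second path `/static/vendor.js` are hypothetical; a real purge applies to the endpoint's POPs, not to one in-memory dictionary.

```python
from __future__ import annotations

from typing import Callable


class ToyPopCache:
    """Hypothetical stand-in for one POP's cache, keyed by request path."""

    def __init__(self) -> None:
        self.entries: dict[str, bytes] = {}

    def purge(self, paths: list[str]) -> None:
        # A targeted purge drops only the listed paths; every other entry stays cached.
        for path in paths:
            self.entries.pop(path, None)

    def get(self, path: str, fetch_from_origin: Callable[[str], bytes]) -> tuple[bytes, str]:
        if path in self.entries:
            return self.entries[path], "HIT"
        body = fetch_from_origin(path)  # MISS: go to the origin and cache the result
        self.entries[path] = body
        return body, "MISS"


origin = {"/static/app.v2.js": b"fixed build", "/static/vendor.js": b"vendor"}
pop = ToyPopCache()
pop.entries = {"/static/app.v2.js": b"outdated build", "/static/vendor.js": b"vendor"}

pop.purge(["/static/app.v2.js"])
print(pop.get("/static/app.v2.js", origin.__getitem__))  # (b'fixed build', 'MISS')
print(pop.get("/static/vendor.js", origin.__getitem__))  # (b'vendor', 'HIT')
```

In practice the purge would be issued from the Azure portal or, for example, with `az cdn endpoint purge` and its `--content-paths` argument.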

Alternative A fails because changing max-age at the origin does not remove content already stored in POPs: the outdated file would continue to be delivered until the cached entries expire naturally, which can take up to the full TTL in effect when they were stored. Alternative C requires a profile configuration change, which cannot be made without change management approval. Alternative D also requires a routing configuration change and is subject to the same restriction. Both are valid actions in other contexts, but neither is applicable under the restrictions presented.
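Alternative A's flaw is that an entry's expiry is fixed when the POP stores it. The toy model below (hypothetical `ToyTtlCache`, times in seconds from an arbitrary start, and a 3600-second TTL chosen only for illustration) shows the origin switching to `max-age=0` while the stale copy keeps being served as a HIT until its original expiry.

```python
from __future__ import annotations


class ToyTtlCache:
    """Hypothetical single-POP cache: each entry's expiry is fixed when it is stored."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, float]] = {}  # path -> (body, expires_at)

    def get(self, path: str, now: float, origin: dict[str, tuple[str, float]]) -> tuple[str, str]:
        cached = self.entries.get(path)
        if cached is not None and now < cached[1]:
            return cached[0], "HIT"  # origin headers are not consulted on a HIT
        body, max_age = origin[path]  # origin answers with (body, max-age in seconds)
        if max_age > 0:
            self.entries[path] = (body, now + max_age)
        else:
            self.entries.pop(path, None)  # max-age=0: do not keep a copy
        return body, "MISS"


origin = {"/static/app.v2.js": ("outdated", 3600.0)}
cache = ToyTtlCache()
cache.get("/static/app.v2.js", now=0.0, origin=origin)  # stored until t=3600

origin["/static/app.v2.js"] = ("fixed", 0.0)  # emergency fix plus max-age=0 at the origin
print(cache.get("/static/app.v2.js", now=1800.0, origin=origin))  # ('outdated', 'HIT')
print(cache.get("/static/app.v2.js", now=3600.0, origin=origin))  # ('fixed', 'MISS')
```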


Answer Key: Scenario 3

Answer: C

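The root cause is the combination of two factors: the IgnoreQueryString behavior removes the query string from the cache key, and the X-Customer-ID and Authorization headers are not included in the cache key by default. As a result, requests from different clients to the same path produce the same cache key, and Front Door delivers the response cached for the first client to the second. In the example, both requests also share the same query string, so the missing headers alone explain that collision; IgnoreQueryString widens the problem, because requests with different parameters (for example, year=2023 and year=2024) would also share one entry.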
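The collision can be reproduced with a hypothetical cache-key function in Python. It models only the two query-string behaviors `IgnoreQueryString` and `UseQueryString` plus an optional list of request headers; Front Door builds its real cache key internally, and whether a given header can be added to it depends on the product tier and its rules, which this lab does not cover.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def cache_key(
    url: str,
    query_behavior: str,
    key_headers: tuple[str, ...] = (),
    headers: dict[str, str] | None = None,
) -> tuple[str, str, tuple[tuple[str, str], ...]]:
    """Hypothetical cache key: path, optional query string, optional header values."""
    parts = urlsplit(url)
    query = parts.query if query_behavior == "UseQueryString" else ""  # IgnoreQueryString drops it
    lowered = {name.lower(): value for name, value in (headers or {}).items()}
    header_part = tuple((h.lower(), lowered.get(h.lower(), "")) for h in key_headers)
    return parts.path, query, header_part


client_1001 = {"X-Customer-ID": "1001", "Authorization": "Bearer token_client_1001"}
client_1002 = {"X-Customer-ID": "1002", "Authorization": "Bearer token_client_1002"}
url = "/api/reports?year=2024"

# Scenario 3 as configured: no header in the key, so both clients share one entry.
print(cache_key(url, "IgnoreQueryString", headers=client_1001)
      == cache_key(url, "IgnoreQueryString", headers=client_1002))  # True

# IgnoreQueryString also merges different report years into that same entry.
print(cache_key("/api/reports?year=2023", "IgnoreQueryString")
      == cache_key(url, "IgnoreQueryString"))  # True

# A key that includes the customer header separates the two clients.
print(cache_key(url, "UseQueryString", ("X-Customer-ID",), client_1001)
      == cache_key(url, "UseQueryString", ("X-Customer-ID",), client_1002))  # False
```

Header names are lower-cased before lookup because HTTP header names are case-insensitive.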

The irrelevant detail in the scenario is the migration from an on-premises environment. It explains why the team had no prior experience with a cache layer, but it is not a technical cause of the problem.

Alternative A confuses TTL with token security: cacheDuration controls how long an entry remains in cache, not whether tokens are revalidated. Alternative B is technically incorrect: dynamicCompression only affects payload compression and has no effect on cache key logic. Alternative D is a sophisticated distractor: Front Door does not choose its caching strategy based on the response's Content-Type, but on cache-control headers and profile configuration.
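To make the point about alternative A concrete: cacheDuration is only a lifetime. Reading the excerpt's value `0.06:00:00` as days.hours:minutes:seconds (an assumption about the format, consistent with the 6 hours stated in the scenario), it yields a duration and nothing else; no client identity or token is involved. `parse_cache_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Assumed format: days.hours:minutes:seconds, as in "0.06:00:00".
_DURATION = re.compile(r"^(\d+)\.(\d{2}):(\d{2}):(\d{2})$")


def parse_cache_duration(value: str) -> timedelta:
    """Convert a d.hh:mm:ss string into a timedelta."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"expected d.hh:mm:ss, got {value!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(parse_cache_duration("0.06:00:00"))  # 6:00:00
```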

In a financial environment, the real impact of leaving this uncorrected goes beyond a technical SLA breach: it is a data privacy violation that may have regulatory implications.


Answer Key: Scenario 4

Answer: B

The correct sequence follows progressive diagnostic logic: first confirm that the problem exists in the cache layer (not at the origin), then observe the current cache behavior, then understand why the cache is behaving this way, and only then act.

P1 confirms that the origin already has the correct content, isolating the problem to the CDN layer. P5 confirms that requests are resulting in HIT, meaning cached content is being delivered instead of origin content. P4 identifies whether the profile TTL overrides or honors the origin headers, which determines whether the behavior is expected or a misconfiguration. P3 checks which caching directives the origin is currently sending, information needed to choose the corrective action. P2 is the final corrective action, executed only after the diagnosis is complete.
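The sequence can be sketched as a small decision function in Python. `Findings`, `diagnose`, and the returned labels are hypothetical; the 60-minute TTL and the 40 minutes since the redeploy come from the scenario. The arithmetic assumes Override mode with a fixed TTL and stale entries stored before the redeploy: the newest such entry expires at most 60 - 40 = 20 minutes from now, so the stale HTML is expected behavior that a purge merely shortens.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical record of what steps P1, P5, P4 and P3 found."""
    origin_updated: bool       # P1: origin already serves the new HTML
    cache_status: str          # P5: "HIT" or "MISS" in the CDN logs
    ttl_mode: str              # P4: "Override" or "Honor origin"
    origin_cache_control: str  # P3: Cache-Control currently sent by the origin
    ttl_minutes: float = 60.0             # profile TTL from the scenario
    minutes_since_redeploy: float = 40.0  # from the scenario


def max_remaining_stale_minutes(ttl_minutes: float, minutes_since_redeploy: float) -> float:
    # Worst case: an entry stored just before the redeploy expires ttl minutes after storage.
    return max(0.0, ttl_minutes - minutes_since_redeploy)


def diagnose(f: Findings) -> tuple[str, str]:
    if not f.origin_updated:
        return "fix-origin", "the origin still serves old content; a purge would re-cache it"
    if f.cache_status != "HIT":
        return "look-outside-cdn", "requests miss the cache, so the CDN is not serving the stale copy"
    if f.ttl_mode == "Override":
        left = max_remaining_stale_minutes(f.ttl_minutes, f.minutes_since_redeploy)
        return "purge", (f"expected under a {f.ttl_minutes:g}-minute override TTL; "
                         f"up to {left:g} more minutes without a purge")
    return "purge", f"TTL follows origin headers ({f.origin_cache_control}); review them before purging"


print(diagnose(Findings(True, "HIT", "Override", "max-age=3600")))
```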

Alternative D is the most common error under pressure: purging immediately, without first confirming through P1 that the origin is updated, means that if the origin is still serving the old version, the POPs simply fetch and re-cache the outdated content. Alternative A changes the order of the validation steps in a way that can lead to incorrect conclusions about where the cause lies. Alternative C starts with the origin headers and profile configuration (P3, P4) before confirming that the problem is in the cache layer at all.


Troubleshooting Tree: Configure caching


Color Legend:

| Color | Node type |
| --- | --- |
| Dark Blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or redirection |

To use this tree when facing a real problem, start at the root node describing the observed symptom. At each question node, answer based on what you can verify directly: CDN logs, origin HTTP headers, profile configuration. Follow the path corresponding to your answer until you reach an identified cause node, then proceed to the associated recommended action. If the problem persists after the action, return to the nearest validation node and reassess the hypothesis.
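The walk described above can be sketched in Python. Only the node kinds come from the color legend; the node texts are hypothetical examples built from Scenario 1, not the contents of the diagram. `Node`, `walk`, and `max_steps` (a guard in case a validation node loops back) are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    kind: str                    # symptom, question, cause, action or validation
    text: str
    yes: Optional[Node] = None   # question nodes: branch for a "yes" answer
    no: Optional[Node] = None    # question nodes: branch for a "no" answer
    then: Optional[Node] = None  # all other nodes: the single next node


def walk(root: Node, answer: Callable[[str], bool], max_steps: int = 50) -> list[str]:
    """Follow the tree from the symptom, asking `answer` at each question node."""
    visited: list[str] = []
    node: Optional[Node] = root
    while node is not None and len(visited) < max_steps:
        visited.append(f"{node.kind}: {node.text}")
        if node.kind == "action":
            break  # a recommended action ends the walk
        if node.kind == "question":
            node = node.yes if answer(node.text) else node.no
        else:
            node = node.then
    return visited


fix = Node("action", "Correct the origin's Cache-Control header or review the cache behavior")
cause = Node("cause", "Origin sends no-cache, no-store or private under Honor origin", then=fix)
headers_q = Node("question", "Does the origin response carry no-cache, no-store or private?",
                 yes=cause, no=Node("validation", "Review cache key and query string behavior"))
miss_q = Node("question", "Do the logs show MISS for repeated requests?", yes=headers_q,
              no=Node("validation", "Latency is not cache-related; check the origin and network path"))
root = Node("symptom", "Responses are as slow as direct origin requests", then=miss_q)

answers = {miss_q.text: True, headers_q.text: True}
for step in walk(root, answers.__getitem__):
    print(step)
```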