
Troubleshooting Lab: Configure an Azure Front Door, including routing, origins, and endpoints

Diagnostic Scenarios​

Scenario 1: Root Cause

A global application is served by Azure Front Door Standard. The endpoint is configured with a validated custom domain and a /* route pointing to an origin group with two Azure App Service instances, one in East US and another in West Europe. Session affinity is disabled.

Over the past two days, the support team has received sporadic reports from users in Europe who, after authenticating to the application, are redirected back to the login screen after a few clicks. Users in the United States do not report this issue. The team verifies that health probes mark both origins as Healthy. App Service logs show sessions being created and destroyed repeatedly for the same European users.

Additional information collected by the team:

  • The custom domain TLS certificate was renewed three weeks ago without incidents
  • Average probe latency for West Europe is 12ms, within expected range
  • Application authentication state is stored in memory within the App Service process, without distributed cache
  • The Front Door SKU was migrated from Classic to Standard 45 days ago

What is the root cause of the observed behavior?

A) The renewed TLS certificate was not properly propagated to the West Europe origin, causing intermittent TLS session renegotiation.

B) Front Door's latency-based routing directs requests from the same European user to different origins between requests, and session state is not shared between them.

C) The Classic to Standard SKU migration altered health probe behavior, which now validates response content and is causing silent failover.

D) The custom domain lost its binding with the endpoint during certificate renewal, and Front Door is routing some requests directly to the origin without processing session cookies.


Scenario 2: Action Decision

The operations team identified that a Rule Set configured in Azure Front Door is blocking legitimate requests from a business partner. The rule in question evaluates the User-Agent header and rejects values associated with crawlers, but the partner's automated system uses a User-Agent that matches the blocked pattern.

The cause is confirmed: the rule condition is too broad and captures legitimate traffic. The partner needs access restored within 30 minutes maximum. The environment is production. The Rule Set affects all profile routes. No staging environment is available for prior testing.

No other rule in the Rule Set depends on the problematic rule. The team has full access to the Front Door profile in the Azure portal.

What is the correct action to take at this moment?

A) Delete the complete Rule Set from the profile to immediately remove the block, and recreate the other rules after restoring partner access.

B) Edit the problematic rule condition to add an exception based on a specific header or IP from the partner, save and validate the behavior.

C) Disassociate the Rule Set from all profile routes, restore access and reschedule the rule correction for a planned maintenance window.

D) Create a new dedicated route for the partner's IP with higher priority, without an associated Rule Set, and keep the original route unchanged.


Scenario 3: Root Cause

An engineer configures a new Azure Front Door Premium profile for an internal API. The origin is an Azure API Management (APIM) instance deployed in internal VNet mode. The engineer enables Private Link in the origin configuration within the origin group and waits for the connection to be approved.

After approval and complete deployment, external tests via https://api-global.contoso.com return HTTP 200 correctly. However, health probes in the Front Door dashboard show the origin as Unhealthy persistently, even with the API responding normally.

Logs collected during investigation:

[Front Door Health Probe] GET /health
Host: api-global.contoso.com
Response: Connection refused (TCP)
Probe interval: 30s
Probe protocol: HTTPS
Probe path: /health

The engineer verifies that:

  • The /health endpoint exists in APIM and returns HTTP 200 when called manually via curl from within the VNet
  • The Private Endpoint was approved and has Succeeded status
  • The APIM TLS certificate is issued by an internal corporate CA
  • The public DNS record for api-global.contoso.com correctly points to the Front Door endpoint

What is the root cause of the Unhealthy status in probes?

A) The /health path is not publicly exposed in APIM; since probes originate from Front Door IPs and don't traverse Private Link, they attempt direct connection and are blocked.

B) The TLS certificate issued by an internal CA is not trusted by Front Door, which rejects the TLS connection during probing and logs the error as TCP connection refusal.

C) Azure Front Door Premium health probes do not support origins accessed via Private Link; they always attempt direct connection through the origin's public IP.

D) APIM's internal mode blocks any incoming connection that doesn't originate from within the VNet, and the approved Private Link doesn't cover health probe traffic.


Scenario 4: Diagnostic Sequence

An engineer receives the following production alert:

"Users report HTTP 404 on https://loja.contoso.com/checkout. All other application routes respond normally."

The Azure Front Door profile has the following structure:

Endpoint: loja.contoso.com
Configured routes:
- Route A: domain=loja.contoso.com, path=/* -> Origin Group: BackendGeral
- Route B: domain=loja.contoso.com, path=/checkout* -> Origin Group: BackendCheckout

Available investigation steps are:

  1. Verify if Origin Group BackendCheckout has at least one Healthy origin
  2. Confirm if Route B is enabled and associated with the correct endpoint
  3. Test https://loja.contoso.com/checkout directly via curl with correct Host header
  4. Check Front Door logs to see if requests to /checkout are being routed to Route A or Route B
  5. Access the BackendCheckout backend directly and confirm that the /checkout path exists and returns 200

What is the correct diagnostic sequence for this symptom?

A) 3 -> 1 -> 5 -> 2 -> 4

B) 4 -> 2 -> 1 -> 5 -> 3

C) 2 -> 4 -> 1 -> 5 -> 3

D) 1 -> 5 -> 2 -> 4 -> 3


Answer Key and Explanations​

Answer Key: Scenario 1

Answer: B

The central symptom is repeated session destruction for European users, combined with authentication state stored in memory within the App Service process. Without a distributed cache and without session affinity, Front Door can send consecutive requests from the same user to different origins. Since each App Service instance keeps its own state in memory, the session created on the first instance doesn't exist on the second, forcing the user to authenticate again.

The decisive clue in the statement is the combination of two facts: in-memory state and disabled session affinity. American users don't report the problem possibly because, due to latency, they tend to be routed more consistently to the same origin, while European users are geographically closer to West Europe but can oscillate between the two origins.
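The failure mode can be reproduced with a small model. The sketch below is illustrative only: the `Origin` class, the `replay` function, and the session ID format are hypothetical and not part of any Azure API. It assumes each origin keeps sessions in its own process memory and that a failed lookup forces a new login on whichever origin received the request.

```python
class Origin:
    """Hypothetical App Service instance with sessions held in process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> user, never shared with other origins

    def login(self, user):
        # Issue a session ID that only this instance knows about.
        session_id = f"{self.name}-{len(self.sessions) + 1}"
        self.sessions[session_id] = user
        return session_id

    def is_authenticated(self, session_id):
        return session_id in self.sessions


def replay(origins_per_request, user="eu-user"):
    """Count forced logins when requests land on the given origins in order.

    The first entry is the origin that handles the initial login.
    """
    cookie = origins_per_request[0].login(user)
    forced_logins = 0
    for origin in origins_per_request[1:]:
        if not origin.is_authenticated(cookie):
            # The session lives on another instance: the user is sent back to login.
            forced_logins += 1
            cookie = origin.login(user)
    return forced_logins


if __name__ == "__main__":
    east, west = Origin("eastus"), Origin("westeurope")
    print("alternating origins:", replay([west, east, west, east]))

    east, west = Origin("eastus"), Origin("westeurope")
    print("sticky to one origin:", replay([west, west, west, west]))
```

In this model, every switch between origins costs one login, while requests that stay on one origin never lose the session. Enabling session affinity or moving state to a shared store removes the dependency on which instance answers.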

The information about the renewed TLS certificate is intentionally irrelevant; the renewal occurred three weeks ago without reported incidents. Distractor A tries to exploit this detail.

Distractor C is technically incorrect: the SKU migration changes which configuration options are available, but it does not make health probes validate response content, and a probe-driven failover would not explain sessions being lost while both origins remain Healthy. Distractor D describes a domain unbinding scenario that would break access for all users, not just Europeans.

Acting based on distractor A would lead the team to investigate TLS for hours without finding the real problem, while users continue getting logged out.


Answer Key: Scenario 2

Answer: B

The cause is identified and confirmed: the rule condition is too broad. The critical constraint is the 30-minute deadline in production. The correct action is minimal surgery: edit only the problematic rule to add a precise exception (by custom header, source IP, or condition combination), save and validate. This solves the problem without affecting other rules and without removing existing protections.
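The logic of the narrowed rule can be sketched as follows. This is a model of the condition only, not Front Door Rule Set syntax. The crawler pattern, the `x-partner-key` header, and its value are assumptions made for illustration; the scenario does not specify them. Header names are assumed to be lowercased already.

```python
import re

# Hypothetical values: the scenario names neither the pattern nor a partner header.
CRAWLER_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)
PARTNER_HEADER = "x-partner-key"
PARTNER_HEADER_VALUE = "example-shared-value"


def should_block(headers, partner_exception=True):
    """Return True if the request would be rejected by the User-Agent rule.

    `headers` is a dict with lowercase header names.
    """
    user_agent = headers.get("user-agent", "")
    if not CRAWLER_PATTERN.search(user_agent):
        return False  # the rule does not apply to this request
    if partner_exception and headers.get(PARTNER_HEADER) == PARTNER_HEADER_VALUE:
        return False  # narrow exception: only requests carrying the partner marker
    return True


if __name__ == "__main__":
    partner = {"user-agent": "PartnerSyncBot/2.1", PARTNER_HEADER: PARTNER_HEADER_VALUE}
    crawler = {"user-agent": "GenericCrawler/1.0"}
    print("partner, original rule:", should_block(partner, partner_exception=False))
    print("partner, edited rule:  ", should_block(partner))
    print("crawler, edited rule:  ", should_block(crawler))
```

The point of the design is that the exception is an additional condition on the existing rule: crawler traffic without the partner marker is still blocked, so no protection is removed while access is restored.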

Distractor A is the most dangerous: deleting the complete Rule Set removes all other security rules from the production profile. Even temporarily, this exposes the application. The word "recreate" indicates a time-consuming operation prone to error.

Distractor C is also dangerous to a lesser degree: disassociating the Rule Set from all routes removes all protections, not just the problematic rule, and postpones the correction to an indefinite window.

Distractor D is technically invalid in this context: Front Door doesn't use client source IP as route selection criteria; routes are based on domain and path, not on requester IP.


Answer Key: Scenario 3

Answer: A

The log is the diagnostic key: Connection refused (TCP). This indicates the probe can't even initiate TCP handshake with the origin, meaning the problem occurs before any TLS validation. If the problem were the certificate (distractor B), the log would show TLS error or failed handshake, not TCP refusal.
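The distinction between a TCP-level and a TLS-level failure can be checked from any client that has network access to the origin. The sketch below uses only the Python standard library; the function names and the returned labels are hypothetical, and it says nothing about how Front Door itself records probe failures.

```python
import socket
import ssl


def classify_probe_failure(exc):
    """Map an exception from a connection attempt to the layer where it failed."""
    # ssl.SSLError subclasses OSError, so the TLS checks must come first.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls: certificate not trusted"
    if isinstance(exc, ssl.SSLError):
        return "tls: handshake failed"
    if isinstance(exc, ConnectionRefusedError):
        return "tcp: connection refused"
    if isinstance(exc, socket.timeout):
        return "tcp: timed out"
    if isinstance(exc, OSError):
        return "network: other error"
    return "unknown"


def probe(host, port=443, timeout=5.0):
    """Attempt TCP connect followed by a TLS handshake and report the outcome."""
    context = ssl.create_default_context()
    try:
        # Step 1: TCP handshake. A refusal here means TLS was never attempted.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # Step 2: TLS handshake, including certificate chain validation.
            with context.wrap_socket(raw, server_hostname=host):
                return "tls established"
    except OSError as exc:
        return classify_probe_failure(exc)
```

A certificate from an untrusted internal CA would surface in step 2 as a certificate verification error. A refusal in step 1 never reaches certificate validation, which is why the log entry rules out distractor B.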

The real cause is APIM's internal mode architecture: it has no public interface. Azure Front Door health probes are triggered from Microsoft infrastructure and, even with Private Link configured for data traffic, health probes follow a different path. In this scenario, probes try to reach the origin through the configured hostname, which doesn't resolve to an accessible public IP, resulting in TCP refusal.

Distractor C is false as an absolute statement: Front Door Premium with Private Link supports probes via Private Link, but requires correct hostname configuration and probe path compatible with the private endpoint. Distractor D is partially true (internal APIM blocks external connections), but the technical explanation of "Private Link doesn't cover probes" is imprecise; the real problem is that probes still attempt direct connection when the hostname isn't resolving to the correct private endpoint.

The information about public DNS pointing to Front Door is irrelevant for probe diagnosis: public DNS serves client traffic, not probe traffic.


Answer Key: Scenario 4

Answer: C

The correct sequence is: confirm if Route B is active (2), check logs to see which route requests are being routed to (4), check if the origin group has healthy origins (1), validate if the backend responds on the correct path (5), and finally confirm via external curl (3).

The diagnostic reasoning moves from the control component (routing) to the data component (origin) and ends with external validation. Starting with curl (alternative A) or with the backend (alternative D) without first confirming that routing is correct means investigating in the wrong place: the 404 can come from Front Door (non-existent or disabled route) or from the origin (path doesn't exist), and this distinction determines where to act.

Alternative B starts with logs (4), which would be reasonable, but jumping straight to logs before verifying if the route is even enabled (2) means you might be analyzing logs from a component the system isn't even using correctly. Sequence C ensures you validate the configuration layer before interpreting observed behavior.

Step 3 (external curl) is the final validation because it confirms end-to-end behavior after all internal hypotheses have been ruled out or confirmed.
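The ordering in sequence C can be expressed as a checklist that stops at the first failing step, so later checks are never interpreted on top of an unverified layer. This is a minimal sketch: `run_diagnostics` and the lambda checks are hypothetical placeholders, and in practice each check would be answered from the portal, the Front Door logs, or a manual request.

```python
def run_diagnostics(checks):
    """Run (name, check) pairs in order and stop at the first failure.

    Returns the names of the checks that passed and the name of the
    failing check, or None if every check passed.
    """
    passed = []
    for name, check in checks:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


if __name__ == "__main__":
    # Sequence C from the scenario: 2 -> 4 -> 1 -> 5 -> 3.
    # The boolean results below are made-up inputs for illustration.
    checks = [
        ("2: Route B enabled and bound to the endpoint", lambda: True),
        ("4: logs show /checkout handled by Route B", lambda: True),
        ("1: BackendCheckout has a Healthy origin", lambda: True),
        ("5: backend returns 200 for /checkout", lambda: False),
        ("3: external curl returns 200", lambda: True),
    ]
    passed, failed = run_diagnostics(checks)
    print("passed:", passed)
    print("first failure:", failed)
```

With these made-up inputs the run stops at step 5, which would point to the origin rather than to Front Door routing as the source of the 404.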


Troubleshooting Tree: Configure an Azure Front Door, including routing, origins, and endpoints​


Color Legend:

  • Dark blue: Initial symptom (entry point)
  • Medium blue: Diagnostic question (decision based on observation)
  • Red: Identified cause
  • Green: Recommended action or resolution
  • Orange: Intermediate validation or verification

To use this tree when facing a real problem, start at the root node by identifying whether the error is visible to the client or restricted to internal probes. Follow each branch by answering questions based on what you can observe directly: the returned HTTP code, Front Door logs, the behavior of the origin when accessed directly, and health probe status. Each red node indicates where to focus the correction. Orange nodes indicate that you still need to collect more data before acting. Never jump ahead to a cause before answering the question at the previous node.