
Troubleshooting Lab: Configure Transport Layer Security (TLS)

Diagnostic Scenarios​

Scenario 1: Root Cause

An operations team receives calls from users reporting that the corporate website, published via Azure Application Gateway (SKU v2), shows certificate errors in the browser only for a subset of users. Initial investigation reveals that affected users are in branch offices that use older clients. Users in headquarters offices with updated browsers do not report any issues.

The responsible engineer checks the portal and confirms that the SSL certificate installed on the HTTPS listener is valid, issued by a recognized public CA, and within its validity period. The certificate uses a 2048-bit RSA key. The team also confirms that DNS is resolving correctly to the gateway's public IP.

The SSL policy applied to the Application Gateway is configured as follows:

{
  "sslPolicy": {
    "policyType": "Custom",
    "minProtocolVersion": "TLSv1_2",
    "cipherSuites": [
      "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
      "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"
    ]
  }
}

What is the root cause of the error observed in branch office clients?

A) The 2048-bit RSA certificate is not supported by the configured ECDHE cipher suites, causing handshake failures.

B) The custom SSL policy sets TLS 1.2 as the minimum version and allows only ECDHE cipher suites, excluding older clients that support only TLS 1.0 or 1.1 with legacy RSA cipher suites.

C) Application Gateway SKU v2 does not support custom SSL policies; a predefined policy must be used for the handshake to complete correctly.

D) The certificate installed on the listener does not include the Subject Alternative Name (SAN) required for branch office clients, causing validation to fail in those environments.


Scenario 2: Diagnostic Sequence

An engineer receives the following alert from an application published via Azure Front Door Premium:

Error: SSL handshake failed for origin 'api-backend-eastus.azurewebsites.net'
Code: OriginTlsError
Details: The origin presented a certificate that could not be validated.
Timestamp: 2025-10-14T03:22:11Z

Front Door is configured with end-to-end TLS. The backend is an Azure App Service with a certificate managed by the platform itself. The custom domain in Front Door was configured six months ago without issues. There have been no Front Door configuration changes in the last 30 days.

The available investigation steps are:

  1. Verify if the App Service certificate was renewed correctly and if the Common Name matches the origin hostname configured in Front Door.
  2. Confirm that the origin certificate name verification option is enabled in the origin group configuration.
  3. Check the renewal history of the App Service managed certificate in the Azure portal.
  4. Test HTTPS connectivity directly to the App Service endpoint from an external client to confirm the certificate is presented correctly.
  5. Review App Service access logs to identify if Front Door requests are reaching the backend.

What is the correct investigation sequence for this symptom?

A) 5 → 2 → 1 → 3 → 4

B) 2 → 5 → 3 → 1 → 4

C) 4 → 1 → 3 → 2 → 5

D) 1 → 3 → 4 → 2 → 5


Scenario 3: Root Cause

An organization exposes internal APIs via Azure API Management (APIM) integrated with a virtual network in external mode. The security team enabled mutual TLS authentication (mTLS) and configured the policy below to validate the presented client certificate:

<inbound>
    <validate-client-certificate
        validate-revocation="true"
        validate-trust="true"
        validate-not-before="true"
        validate-not-after="true"
        certificate-ids="valid-client-cert" />
</inbound>

After deployment, external partners report that their requests are being rejected with HTTP 403, even when presenting valid certificates issued by the corporate CA. The team confirms that:

  • The Negotiate client certificate option is enabled in APIM.
  • Partner certificates were added to the APIM portal as trusted client certificates.
  • APIM is running on Premium SKU and virtual network integration is active and healthy.
  • The virtual network firewall allows inbound traffic on port 443.

What is the root cause of the rejections?

A) APIM Premium SKU does not support real-time revocation validation via validate-revocation="true" in external virtual network mode, causing policy failure.

B) Partner certificates were added to APIM as client certificates, but the policy references a certificate-ids value that does not match the identifier used in the portal when registering these certificates.

C) Revocation validation (validate-revocation="true") requires APIM to access CRL or OCSP endpoints of the corporate CA, and the virtual network configuration is blocking this outbound traffic.

D) Virtual network integration in external mode prevents APIM from processing validate-client-certificate policies; this feature is exclusive to internal mode.


Scenario 4: Action Decision

The cause has been identified: an Application Gateway in production is experiencing handshake failures because the SSL certificate on the listener expired two hours ago. The certificate was self-managed and no renewal alert was configured. The affected environment processes financial transactions and the current impact is total for end users.

The team has the following information available:

  • A new valid certificate, issued by the corporate CA, has already been generated and is available in PFX format with known password.
  • The normal certificate update process in Application Gateway via portal takes approximately 3 minutes and does not require gateway restart.
  • The team also considers migrating certificate management to Azure Key Vault with automatic renewal to prevent recurrence.
  • There is a scheduled maintenance window next Saturday at 2 AM.

What is the correct action to take at this moment?

A) Wait for next Saturday's maintenance window to replace the certificate and simultaneously configure Key Vault integration for automatic renewal, performing both changes together.

B) Immediately replace the expired certificate on the Application Gateway listener with the available new PFX certificate, restoring the service, and plan Key Vault migration for later.

C) First configure Key Vault integration and point the listener to the new vault-managed certificate, as this approach solves the immediate problem and prevents recurrence in a single action.

D) Temporarily redirect traffic to HTTP while Key Vault integration is configured, ensuring service continuity during the definitive fix.


Answer Key and Explanations​

Answer Key: Scenario 1

Answer: B

The decisive clue is in the combination of two elements: the policy defines TLS 1.2 as the minimum version and restricts cipher suites exclusively to ECDHE algorithms. Older clients, typically found in corporate branch environments with less frequent updates, often only support TLS 1.0 or 1.1, in addition to depending on legacy RSA cipher suites like TLS_RSA_WITH_AES_256_CBC_SHA. Since none of these combinations are accepted by the configured policy, the handshake fails before the certificate is even presented.
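The negotiation logic described above can be sketched in a few lines of Python. This is a simplified model rather than a real TLS implementation: the policy values are copied from the scenario, while the function name `can_negotiate` and the two client profiles are hypothetical and exist only for illustration.

```python
# Protocol versions ordered from oldest to newest (simplified model).
TLS_VERSIONS = ["TLSv1_0", "TLSv1_1", "TLSv1_2"]

# Values copied from the custom policy shown in Scenario 1.
POLICY_MIN_VERSION = "TLSv1_2"
POLICY_CIPHER_SUITES = {
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}


def can_negotiate(client_max_version, client_cipher_suites):
    """Return True if the client meets the minimum version and shares a cipher suite."""
    version_ok = (
        TLS_VERSIONS.index(client_max_version)
        >= TLS_VERSIONS.index(POLICY_MIN_VERSION)
    )
    shared_suites = POLICY_CIPHER_SUITES & set(client_cipher_suites)
    return version_ok and bool(shared_suites)


# Hypothetical client profiles, for illustration only.
legacy_client = ("TLSv1_0", ["TLS_RSA_WITH_AES_256_CBC_SHA"])
modern_client = ("TLSv1_2", ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"])

print(can_negotiate(*legacy_client))  # False
print(can_negotiate(*modern_client))  # True
```

Both conditions must hold at once: a client that speaks TLS 1.2 but offers only legacy RSA key-exchange suites is rejected just like a TLS 1.0 client.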

The information about DNS resolving correctly and the certificate being within validity is deliberately irrelevant in this scenario. The problem is not the certificate itself, but the protocol negotiation.

Alternative A represents a common technical misconception: RSA certificates are completely compatible with ECDHE cipher suites, as RSA is used for authentication while ECDHE is the key exchange mechanism. Alternative C is false: SKU v2 fully supports custom policies. Alternative D describes a real problem in other contexts, but there is no indication in the statement that the SAN is missing or incorrect.

The most dangerous distractor is D, as it could lead the engineer to replace a valid certificate without solving the real problem, wasting time in production.


Answer Key: Scenario 2

Answer: C

The correct sequence is 4 → 1 → 3 → 2 → 5, following the diagnostic logic of validating what is closest to the failure before diving deeper.

The error indicates that Front Door cannot validate the origin certificate. The first step is to confirm directly (step 4) if the App Service is presenting a valid and externally accessible certificate, isolating whether the problem is in the certificate itself or in the Front Door configuration. Next, verify (step 1) if the certificate's Common Name matches the hostname configured as origin, as a mismatch here would cause exactly this error. Step 3 then deepens the investigation into the renewal history, since the certificate is platform-managed and may have been renewed with a new Common Name. Step 2 checks the validation configuration in the Front Door origin group. Step 5 checks App Service logs and is last, as it's only necessary if previous steps don't reveal the cause.
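The name comparison in step 1 can be illustrated with a small Python sketch. It assumes the names on the certificate have already been extracted, and it handles only exact names and a single left-most wildcard label, which is a simplification of real hostname verification. The helper names and the `api.contoso.example` certificate name are hypothetical; the origin hostname is the one from the alert.

```python
def hostname_matches(cert_name, hostname):
    """Compare one certificate name (optionally '*.' prefixed) to a hostname."""
    cert_labels = cert_name.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # A wildcard stands in for exactly one left-most label.
        return cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


def certificate_covers(cert_names, origin_hostname):
    """Return True if any name on the certificate matches the origin hostname."""
    return any(hostname_matches(name, origin_hostname) for name in cert_names)


origin = "api-backend-eastus.azurewebsites.net"

print(certificate_covers(["*.azurewebsites.net"], origin))   # True
print(certificate_covers(["api.contoso.example"], origin))   # False
```

A mismatch like the second call is the kind of result that would explain the origin validation error when certificate name verification is enabled.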

Alternative A starts by checking access logs, which is not useful before confirming if the certificate is valid. Alternative B starts with Front Door configuration, ignoring that the error is explicitly related to the origin certificate. Alternative D inspects certificate details and renewal history before confirming what the App Service actually presents to an external client. Alternative C is correct: it starts at the declared failure point and progresses from most likely to least likely cause.

The detail that there were no Front Door changes in the last 30 days is the irrelevant information in the scenario, inserted to induce the reader to dismiss Front Door as the source of the problem. The most likely cause is automatic App Service certificate renewal with a Common Name different from the configured hostname.


Answer Key: Scenario 3

Answer: C

The root cause is in the APIM's outbound network path. Revocation validation (validate-revocation="true") requires APIM to actively query CRL (Certificate Revocation List) or OCSP (Online Certificate Status Protocol) endpoints of the corporate CA at validation time. In environments with virtual network integration, this outbound traffic to CA endpoints needs to be explicitly allowed. If the network security group or route table blocks this traffic, revocation validation fails, and the policy rejects the certificate even though it is technically valid.
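To make the outbound dependency concrete, the following Python sketch derives the host and port pairs that would need to be reachable from a list of CRL and OCSP URLs. The URLs are hypothetical placeholders for a corporate CA, and `required_outbound_endpoints` is an illustrative helper, not part of any Azure tooling. CRL and OCSP endpoints are commonly served over plain HTTP, so the required port is often 80 rather than 443.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def required_outbound_endpoints(revocation_urls):
    """Return the sorted, de-duplicated (host, port) pairs behind the given URLs."""
    endpoints = set()
    for url in revocation_urls:
        parts = urlsplit(url)
        # Use the explicit port if present, otherwise the scheme default.
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        endpoints.add((parts.hostname, port))
    return sorted(endpoints)


# Hypothetical CRL and OCSP URLs, for illustration only.
urls = [
    "http://crl.corp.example/ca.crl",
    "http://ocsp.corp.example/",
    "http://crl.corp.example/ca-delta.crl",
]

print(required_outbound_endpoints(urls))
# [('crl.corp.example', 80), ('ocsp.corp.example', 80)]
```

Comparing such a list against the outbound rules of the network security group and the route table is one way to confirm or rule out this cause.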

The statement mentions that the firewall allows inbound traffic on port 443, but says nothing about outbound traffic to CRL/OCSP endpoints. This is the relevant detail that distinguishes the real cause from the alternatives.

Alternative B is a plausible distractor, but the statement confirms that certificates were added to APIM. Alternative A is false: Premium SKU supports revocation validation regardless of virtual network mode. Alternative D is technically incorrect: the validate-client-certificate policy works in both modes.

The most dangerous distractor is B, as it would lead the engineer to reconfigure certificate-ids repeatedly without solving the real problem, keeping the service degraded longer.


Answer Key: Scenario 4

Answer: B

The context clearly defines the critical constraint: the impact is total and immediate in a financial production environment. The correct action is the one that solves the problem in the shortest time possible without introducing additional risks. Replacing the PFX certificate on the Application Gateway listener is approximately a 3-minute operation, without restart, with the new certificate already available. This restores the service immediately.

Key Vault migration is a relevant improvement, but it's an architectural change that should not be performed under active incident pressure. Mixing emergency correction with structural change increases error risk and prolongs impact.
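The missing renewal alert mentioned in the scenario amounts to a simple date comparison. The sketch below assumes the certificate's expiry timestamp is already known; the function name `needs_renewal`, the 30-day warning threshold, and the sample timestamps are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now, warning_days=30):
    """Return True if the certificate is expired or expires within warning_days."""
    return not_after - now <= timedelta(days=warning_days)


# Hypothetical reference time and expiry dates, for illustration only.
now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
expired_two_hours_ago = now - timedelta(hours=2)
valid_for_90_days = now + timedelta(days=90)

print(needs_renewal(expired_two_hours_ago, now))  # True
print(needs_renewal(valid_for_90_days, now))      # False
```

A check like this, run on a schedule, would have flagged the listener certificate well before it expired, independently of any later Key Vault migration.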

Alternative A ignores urgency: waiting for Saturday's maintenance window with total production impact is unacceptable. Alternative C seems efficient, but configuring Key Vault integration during an active incident is a more complex operation prone to errors, potentially prolonging downtime. Alternative D is the most dangerous: redirecting financial transactions to HTTP, even temporarily, violates security and compliance requirements and is likely blocked by organizational policies.


Troubleshooting Tree: Configure Transport Layer Security (TLS)​


Color Legend:

Color        Node Type
Dark Blue    Initial symptom (entry point)
Blue         Diagnostic question
Orange       Intermediate verification or validation
Red          Identified cause
Green        Recommended action or resolution

To use this tree when facing a real problem, start at the root node with the observed symptom and objectively answer each diagnostic question. The "Yes" and "No" branches progressively lead to the cause, eliminating hypotheses at each level. When an orange intermediate verification node is reached, collect the indicated data before proceeding. Upon reaching a red identified cause node, the corresponding green recommended action indicates the precise correction for that diagnostic path.
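The traversal described above can be modeled as a small data structure. The sketch below is illustrative only: the `Node` class, the `walk` function, and the sample nodes are hypothetical (loosely based on Scenario 4) and do not reproduce the actual tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str  # "symptom", "question", "verification", "cause", or "action"
    text: str
    yes: Optional["Node"] = None        # followed when a question is answered Yes
    no: Optional["Node"] = None         # followed when a question is answered No
    follow_up: Optional["Node"] = None  # followed from any non-question node


def walk(root, answers):
    """Follow the tree from the root, consuming one answer per question node."""
    path = []
    remaining = iter(answers)
    node = root
    while node is not None:
        path.append(node)
        if node.kind == "question":
            node = node.yes if next(remaining) else node.no
        else:
            node = node.follow_up
    return path


# Hypothetical nodes, for illustration only.
action = Node("action", "Replace the listener certificate")
cause = Node("cause", "Listener certificate expired", follow_up=action)
verify = Node("verification", "Collect the SSL policy and client TLS versions")
question = Node("question", "Is the listener certificate expired?", yes=cause, no=verify)
root = Node("symptom", "TLS handshake failures", follow_up=question)

print([n.kind for n in walk(root, [True])])
# ['symptom', 'question', 'cause', 'action']
```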