Troubleshooting Lab: Diagnose and Resolve Client-Side and Authentication Issues
Diagnostic Scenarios
Scenario 1: Root Cause
A development team reports that the application stopped communicating with Azure SQL Database after an environment migration. The application is hosted on a VM within the vnet-app VNet (10.10.0.0/16). The database was configured with a Private Endpoint in the snet-data subnet (10.10.2.0/24). The administrator confirms that the Private Endpoint status is Approved and that the NSG of the snet-data subnet allows traffic on port 1433 from any source within the VNet.
During investigation, the developer runs the following test from the VM:
# DNS resolution test
nslookup meu-sql.database.windows.net
Server: 168.63.129.16
Address: 168.63.129.16#53
Non-authoritative answer:
Name: meu-sql.database.windows.net
Address: 20.40.180.12
# TCP connectivity test
Test-NetConnection -ComputerName meu-sql.database.windows.net -Port 1433
TcpTestSucceeded : False
The administrator mentions that the VNet DNS server is configured as Default (Azure-provided) and that a Private DNS Zone called privatelink.database.windows.net was created in the same Resource Group. The SQL Server firewall allows only connections via Private Endpoint (public access disabled).
What is the root cause of the connectivity failure?
A) The NSG of the snet-data subnet is blocking port 1433, as inbound rules don't apply to traffic within the same VNet when the destination is a Private Endpoint.
B) The Private DNS Zone privatelink.database.windows.net is not linked to the vnet-app VNet, causing the resolution to return the public IP instead of the private IP of the endpoint.
C) The Azure default DNS server (168.63.129.16) doesn't support name resolution for Private Endpoints and must be replaced with a custom DNS.
D) The Azure SQL Database firewall is blocking the connection even with public access disabled, because the Virtual Network rule wasn't added for the snet-data subnet.
Scenario 2: Action Decision
The security team identified that a misconfigured Microsoft Entra Conditional Access policy is preventing all users in a critical group from accessing the Azure portal in production. The cause was confirmed: the policy applies location-based access control that blocks corporate IPs due to an error in the trusted IP list. The policy is in Enabled mode.
The environment has the following constraints:
- It's 2 PM on a Friday; the identity team finishes at 3 PM and there's no on-call support over the weekend
- The affected group includes platform administrators responsible for production monitoring
- A second emergency access account (break glass account) is available, with no Conditional Access applied
- Changing the trusted IP list requires approval from a second Entra ID administrator, who is available now
- Reverting the policy to Report-only mode would restore access immediately without removing the policy
What is the correct action to take at this moment?
A) Use the break glass account to access the portal, fix the trusted IP list in the policy, and wait for the second administrator's approval before reactivating the policy in Enabled mode.
B) Change the policy mode to Report-only immediately to restore access for the affected group, then fix the IP list with the second administrator still available.
C) Delete the problematic Conditional Access policy now and recreate a new correct policy next week, after complete review with the security team.
D) Wait until Monday to fix the policy together with the complete security team, using the break glass account to cover critical operations over the weekend.
Scenario 3: Root Cause
A network administrator receives complaints that users in a branch office connected via Site-to-Site VPN can reach servers in the hub VNet normally, but cannot reach a specific system hosted in a second VNet that is connected to the hub via VNet Peering. The topology is hub-and-spoke.
Branch (on-premises)
|
VPN S2S
|
VNet Hub (10.0.0.0/16) <---> VNet Spoke-A (10.1.0.0/16)
| [inaccessible system]
VNet Gateway
The administrator verifies and confirms:
- The peering between Hub and Spoke-A is Connected on both sides
- The Allow Forwarded Traffic option is enabled on the peering from the Hub side
- The NSG of the system subnet in Spoke-A allows the necessary port from any source
- The Local Network Gateway contains the 10.1.0.0/16 prefix in the address space
- The latency reported by users for accessible servers in the Hub is normal (12ms)
Virtual Network Gateway diagnostic logs show:
[INFO] Route learned via BGP: 10.0.0.0/16 (Hub VNet)
[INFO] Route learned via BGP: 192.168.1.0/24 (on-premises)
[WARN] No route advertised for prefix: 10.1.0.0/16
What is the root cause of the problem?
A) The NSG of the system subnet in Spoke-A is blocking traffic originated outside the VNet, as the VirtualNetwork service tag doesn't encompass networks connected via VPN.
B) The Use Remote Gateways option is not enabled on the peering from the Spoke-A side, preventing the Hub gateway from advertising the 10.1.0.0/16 prefix to the branch via VPN.
C) The Local Network Gateway already contains the 10.1.0.0/16 prefix, which creates a conflict with BGP-learned routing and causes traffic to be dropped.
D) The Allow Gateway Transit option is not enabled on the peering from the Hub side, preventing the gateway from forwarding traffic between the VPN and Spoke-A.
Scenario 4: Diagnostic Sequence
A user reports being unable to authenticate to an application published via Microsoft Entra Application Proxy. The reported symptom is: after entering credentials in the Microsoft Entra ID login portal, the browser displays a generic 500 Internal Server Error and the user is not redirected to the application.
The available investigation steps are:
- Step P: Check sign-in logs in Microsoft Entra ID to identify if authentication in Entra ID was completed successfully before the error
- Step Q: Check if the Application Proxy connector is active and has recent heartbeat in the portal
- Step R: Test access to the application's internal URL directly from the server where the connector is installed
- Step S: Check if the Application Proxy external URL is correctly mapped to the internal URL in the application settings
- Step T: Restart the connector service on the on-premises server
Which diagnostic sequence is correct?
A) T -> P -> Q -> S -> R
B) P -> Q -> S -> R -> T
C) Q -> P -> R -> S -> T
D) P -> S -> Q -> R -> T
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The nslookup output is the decisive clue: DNS returned 20.40.180.12, a public IP, instead of a private address inside snet-data (10.10.2.0/24). This confirms that the query is not being answered from the Private DNS Zone. For Azure-provided DNS to resolve Private Endpoint names to their private IPs, the privatelink.database.windows.net zone must be linked to the VNet from which the query originates (vnet-app). Creating the zone in the same Resource Group has no effect without that virtual network link.
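The check described above, whether the resolved address belongs to the Private Endpoint subnet, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and 10.10.2.4 is an assumed private address used for contrast (the scenario does not state the endpoint's actual IP).

```python
import ipaddress


def resolves_to_private_endpoint(resolved_ip: str, endpoint_subnet: str) -> bool:
    """Return True when the resolved address lies inside the Private Endpoint subnet."""
    return ipaddress.ip_address(resolved_ip) in ipaddress.ip_network(endpoint_subnet)


SNET_DATA = "10.10.2.0/24"  # subnet hosting the Private Endpoint in the scenario

# Address returned by nslookup in the scenario: outside snet-data, so DNS is
# answering with the public endpoint.
print(resolves_to_private_endpoint("20.40.180.12", SNET_DATA))

# Assumed example of what a healthy resolution would look like (hypothetical IP).
print(resolves_to_private_endpoint("10.10.2.4", SNET_DATA))
```

A result of False for the address that DNS actually returns points to the name-resolution path (zone link, DNS settings) rather than to the NSG or the endpoint itself.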
The irrelevant information in this scenario is the NSG configuration of the snet-data subnet. Traffic doesn't even reach the Private Endpoint because the client is trying to connect to the public IP, which is blocked by the SQL firewall. The NSG has no role in this failure.
Alternative C is the most dangerous distractor: the Azure default DNS server (168.63.129.16) supports resolution via Private DNS Zones normally; the problem is not the DNS server, but the absence of the zone link to the VNet. Acting based on this alternative would lead to an unnecessary change in the VNet DNS without solving the real problem.
Answer Key: Scenario 2
Answer: B
The set of constraints clearly defines the action window: there's less than an hour before the team finishes, the affected group includes production administrators, and the weekend without on-call makes any prolonged impact critical. Changing the policy to Report-only restores access immediately, preserves the existing policy (no risk of configuration loss), and still allows fixing the IP list with the second administrator available at the same time.
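As a rough sketch of what the mode change looks like outside the portal, the snippet below builds (but does not send) a Microsoft Graph request that sets a Conditional Access policy to report-only, which Graph represents as the state value `enabledForReportingButNotEnforced`. The policy ID and token are placeholders, the helper name is hypothetical, and the caller is assumed to hold a permission such as `Policy.ReadWrite.ConditionalAccess`.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_report_only_request(policy_id: str, access_token: str) -> urllib.request.Request:
    """Build a PATCH request that switches a Conditional Access policy to report-only."""
    body = json.dumps({"state": "enabledForReportingButNotEnforced"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GRAPH_BASE}/identity/conditionalAccess/policies/{policy_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real call needs the actual policy ID and a valid token.
request = build_report_only_request("00000000-0000-0000-0000-000000000000", "<access-token>")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because only the `state` property is patched, the policy's conditions and grant controls stay intact, which matches the reasoning for preferring this action over deletion.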
Alternative A makes a sequencing error: using the break glass account to fix the IPs and waiting for approval keeps the production group blocked during the entire approval process and through the weekend if something fails.
Alternative C is the most dangerous: deleting the policy removes a security layer with no immediate replacement, and recreating it next week leaves the environment without location-based control for days. Alternative D is the least responsible option given the context, because it leaves production administrators without direct access for the entire weekend.
Answer Key: Scenario 3
Answer: D
The Virtual Network Gateway log is the definitive clue: No route advertised for prefix: 10.1.0.0/16. The gateway is not advertising the Spoke-A prefix to the branch via VPN. The reason is that the Allow Gateway Transit option on the peering from the Hub side is not enabled. This option allows the Hub gateway to forward traffic between external connections (like the VPN) and peered VNets. Without it, the gateway ignores Spoke-A as a reachable destination.
The irrelevant information is the presence of the 10.1.0.0/16 prefix in the Local Network Gateway. This prefix indicates what the branch wants to reach, but doesn't guarantee that Azure advertises this route back. The problem is on the Azure side, not the on-premises side.
Alternative B describes a complementary condition: Use Remote Gateways on the spoke is necessary for the spoke to use the hub gateway, but the log confirms that the problem is in route advertisement by the gateway, which points to Allow Gateway Transit on the hub. Both options need to be enabled together, but given the log, the absence of Allow Gateway Transit on the hub is the confirmed root cause.
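The relationship between the two peering flags and the log evidence can be modeled with a small Python sketch. It is a simplified model under stated assumptions, not Azure's actual routing logic: the class and function names are hypothetical, and the parser assumes the log format shown in the scenario.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HubSpokePeering:
    allow_gateway_transit_on_hub: bool   # set on the hub side of the peering
    use_remote_gateways_on_spoke: bool   # set on the spoke side of the peering


def spoke_prefix_advertised(peering: HubSpokePeering) -> bool:
    """Simplified rule: the spoke prefix is advertised only when both flags are on."""
    return peering.allow_gateway_transit_on_hub and peering.use_remote_gateways_on_spoke


def unadvertised_prefixes(log_lines: List[str]) -> List[str]:
    """Extract prefixes from warning lines in the scenario's log format."""
    marker = "No route advertised for prefix:"
    return [line.split(marker, 1)[1].strip() for line in log_lines if marker in line]


gateway_log = [
    "[INFO] Route learned via BGP: 10.0.0.0/16 (Hub VNet)",
    "[INFO] Route learned via BGP: 192.168.1.0/24 (on-premises)",
    "[WARN] No route advertised for prefix: 10.1.0.0/16",
]

print(unadvertised_prefixes(gateway_log))
print(spoke_prefix_advertised(HubSpokePeering(False, True)))
```

In this model, the hub-side flag being off is enough to keep 10.1.0.0/16 out of the advertised routes, regardless of the spoke-side setting.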
Answer Key: Scenario 4
Answer: B
The correct sequence follows the logic of progressive diagnosis of the Application Proxy flow, from the outermost point to the innermost:
| Step | Action | Justification |
|---|---|---|
| P | Check sign-in logs in Entra ID | Determines if authentication was completed before the 500 error |
| Q | Check connector status | Confirms if the transport plane between cloud and on-premises is active |
| S | Check URL mapping | Confirms if the connector knows where to forward the request |
| R | Test internal URL from connector server | Confirms if the internal application is responding |
| T | Restart connector | Corrective action, never first step |
The 500 error appears after the user submits credentials on the Entra ID login page, so the first step is to confirm in the sign-in logs whether authentication completed and the token was issued (P). Restarting the connector (T) before any diagnosis (alternative A) is the most common troubleshooting error: it applies a blind corrective action that can mask the real problem without solving it, and if the connector was not the cause, the diagnosis has to start again from scratch.
Troubleshooting Tree: Diagnose and Resolve Client-Side and Authentication Issues
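The outer-to-inner ordering can be expressed as a short Python sketch that runs read-only checks in sequence and stops at the first failure, leaving the restart (T) as a corrective action outside the check list. The check functions here are hypothetical stubs that return fixed values; real checks would query the sign-in logs, the connector status, and so on.

```python
from typing import Callable, List, Optional, Tuple

# (step label, description, check returning True when the step passes)
DiagnosticStep = Tuple[str, str, Callable[[], bool]]


def first_failing_step(steps: List[DiagnosticStep]) -> Optional[str]:
    """Run checks in order and return the label of the first one that fails."""
    for label, _description, check in steps:
        if not check():
            return label
    return None  # every check passed


# Stubbed results for illustration only: here the URL mapping check fails.
steps: List[DiagnosticStep] = [
    ("P", "Sign-in log shows authentication completed", lambda: True),
    ("Q", "Connector is active with a recent heartbeat", lambda: True),
    ("S", "External URL maps to the correct internal URL", lambda: False),
    ("R", "Internal URL responds from the connector server", lambda: True),
]

failed = first_failing_step(steps)
print(failed)
```

Stopping at the first failing check keeps the evidence intact; a restart would only be considered once the checks point at the connector itself.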
Color legend:
| Color | Node type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question (binary decision or observable) |
| Orange | Intermediate validation or verification |
| Red | Identified cause |
| Green | Recommended action or resolution (implicit in cause nodes) |
When facing a real problem, start from the root node describing the symptom and choose the branch that corresponds to what you can directly observe: is DNS resolving to public or private IP? Did authentication get processed by Entra ID? Is the VPN tunnel up? Each answer eliminates an entire set of hypotheses. Orange nodes indicate that evidence needs to be collected before advancing. Red nodes end the reasoning with the precise cause, which directly leads to corrective action.