
Troubleshooting Lab: Configure private endpoints for Azure PaaS

Diagnostic Scenarios

Scenario 1: Root Cause

A development team reports that an application hosted on a VM within vnet-app (East US region) cannot connect to an Azure Storage Account after creating a private endpoint. The administrator verifies the following:

  • The private endpoint was created in subnet snet-pe within vnet-app.
  • The Private DNS Zone privatelink.blob.core.windows.net exists in the environment.
  • The VM can ping other resources in the same VNet normally.
  • The Storage Account still has public access enabled.
  • The application reports a timeout when trying to access storprod001.blob.core.windows.net.

The administrator runs the following test from the VM:

nslookup storprod001.blob.core.windows.net

Output:

Server:  168.63.129.16
Address: 168.63.129.16#53

Non-authoritative answer:
Name: storprod001.blob.core.windows.net
Address: 20.150.47.131

The team mentions that the Storage Account was created six months ago and never presented problems before the attempt to migrate to private access. The private endpoint was created last night.

What is the root cause of the problem?

A. The NSG applied to subnet snet-pe is blocking outbound traffic from the VM to port 443.

B. The Private DNS Zone is not linked to vnet-app, so DNS continues resolving the FQDN to the Storage Account's public IP.

C. The Storage Account's public access needs to be disabled before the private endpoint starts working.

D. The private endpoint was created in the wrong subnet; it should be in the VM's subnet and not in snet-pe.


Scenario 2: Action Decision

The cause of the problem has already been identified: a production Azure Key Vault has an active private endpoint, but the Private DNS Zone privatelink.vaultcore.azure.net was accidentally deleted by a team member. As a consequence, all applications accessing the Key Vault via private name are failing with connection refused errors, impacting a production payment system.

The administrator has Contributor permissions on the Resource Group where the DNS Zone existed, but does not have Private DNS Zone Contributor permissions at the subscription level. The security team requires that any changes to private DNS zones be logged via change ticket, a process that takes 48 hours for approval.

The payment system has a 99.9% SLA, and each minute of downtime incurs a contractual penalty. The senior engineer responsible for the Key Vault is available and has the necessary permissions.

What is the correct action to take at this moment?

A. Immediately recreate the Private DNS Zone and link it to the VNet using your own Contributor permissions, without awaiting approval, given the production impact.

B. Open the change ticket, document the impact and wait for the 48-hour approval to maintain compliance with the security process.

C. Immediately contact the senior engineer to recreate the Private DNS Zone with the correct permissions, while the change ticket is opened in parallel for retroactive compliance.

D. Temporarily revert the Key Vault to public access until the DNS Zone is recreated within the formal change process.


Scenario 3: Root Cause

An administrator configures a private endpoint for an Azure SQL Database in snet-data within vnet-hub. The Private DNS Zone privatelink.database.windows.net was created and linked to vnet-hub. The A record was automatically created by Azure.

Two days later, the data team reports that queries to the database from VMs in vnet-spoke (peered with vnet-hub) work, but an application running on an Azure Kubernetes Service (AKS) cluster in the same vnet-spoke consistently fails with the following error:

dial tcp: lookup sql-prod.database.windows.net on 10.0.0.10:53: no such host

The administrator verifies:

  • The peering between vnet-hub and vnet-spoke is active and bidirectional.
  • VMs in vnet-spoke resolve sql-prod.database.windows.net correctly to 10.1.2.5 (private IP).
  • The AKS cluster uses Azure CNI and pods receive IPs directly from subnet snet-aks in vnet-spoke.
  • The AKS cluster was created with the default Kubernetes DNS option, which uses CoreDNS.
  • The Private DNS Zone privatelink.database.windows.net is linked to both vnet-hub and vnet-spoke.

What is the root cause of the DNS resolution error in the AKS pods?

A. The peering between vnet-hub and vnet-spoke does not propagate Private DNS Zone records, so pods cannot resolve the name.

B. The AKS CoreDNS is not configured to forward queries for the privatelink.database.windows.net domain to Azure DNS (168.63.129.16), so queries don't reach the Private DNS Zone.

C. The Private DNS Zone should be linked to subnet snet-aks specifically, not just to the VNet.

D. The error indicates that the A record in the Private DNS Zone was automatically deleted after 48 hours, which is Azure's default behavior for records created by private endpoints.


Scenario 4: Diagnostic Sequence

An administrator receives the following alert: an Azure Function with VNet Integration enabled cannot access an Azure Service Bus that has a private endpoint configured. The function returns SocketException: Connection refused when trying to publish messages.

The available investigation steps are out of order:

  1. Verify if the Private DNS Zone privatelink.servicebus.windows.net is linked to the VNet integrated by the Azure Function.
  2. Confirm if the Service Bus public access was disabled and if the firewall rule allows the Function's IP.
  3. Test DNS resolution of the Service Bus FQDN from the Function's execution environment using nslookup or Resolve-DnsName.
  4. Verify if the Azure Function's VNet Integration is configured and pointing to the correct VNet.
  5. Confirm that the Service Bus private endpoint was created in the correct subnet and that the private IP is allocated.

What is the correct diagnostic sequence?

A. 5 -> 1 -> 4 -> 3 -> 2

B. 4 -> 5 -> 1 -> 3 -> 2

C. 3 -> 1 -> 4 -> 5 -> 2

D. 2 -> 4 -> 1 -> 5 -> 3


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The definitive clue is in the nslookup output: the FQDN resolves to 20.150.47.131, which is a public IP. If the Private DNS Zone were linked to vnet-app, Azure DNS (168.63.129.16) would return the private endpoint's private IP, which belongs to the address range of snet-pe. The resolver is 168.63.129.16 itself, yet it returns a public IP. This confirms that it fell back to public DNS because it found no private zone for this name associated with the VNet.
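The reading of the nslookup output described above can be sketched in Python. This is a minimal illustration, not a diagnostic tool: it extracts the answer address from the output shown in the scenario and classifies it with the standard ipaddress module. The prefix 10.0.2.0/24 for snet-pe is an assumption made for the example; the scenario does not state the subnet's range.

```python
import ipaddress
import re
from typing import List

# nslookup output copied from Scenario 1.
NSLOOKUP_OUTPUT = """\
Server:  168.63.129.16
Address: 168.63.129.16#53

Non-authoritative answer:
Name: storprod001.blob.core.windows.net
Address: 20.150.47.131
"""

# Assumed prefix for snet-pe; the scenario does not give one.
ASSUMED_PE_SUBNET = "10.0.2.0/24"


def answer_addresses(output: str) -> List[str]:
    """Return the addresses listed after the 'answer:' line of nslookup output."""
    _, _, answer = output.partition("answer:")
    return re.findall(r"^Address(?:es)?:\s*(\S+)", answer, flags=re.MULTILINE)


def classify(address: str, pe_subnet: str = ASSUMED_PE_SUBNET) -> str:
    """Classify a resolved address relative to the private endpoint subnet."""
    ip = ipaddress.ip_address(address)
    if ip in ipaddress.ip_network(pe_subnet):
        return "private-endpoint"
    if ip.is_private:
        return "private-other"
    return "public"


for addr in answer_addresses(NSLOOKUP_OUTPUT):
    print(addr, classify(addr))
```

A "public" result for a name that has a private endpoint points to the DNS layer, which matches the reasoning for alternative B.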

The irrelevant information in the scenario is the Storage Account's age ("created six months ago"). It has no relation to the private endpoint or DNS Zone behavior.

Alternative C represents the most dangerous misconception: disabling public access is not a prerequisite for the private endpoint to work. It can coexist with public access during a migration. Acting on this alternative would break public access without solving the DNS problem.

Alternative D is technically incorrect: the private endpoint must be in a VNet subnet, but it doesn't need to be the same subnet as the VM. Any subnet within the same VNet (or VNet with access via peering) is valid.


Answer Key: Scenario 2

Answer: C

The scenario presents two constraints that make the other alternatives unfeasible. First, the administrator lacks the Private DNS Zone Contributor role that the scenario treats as required for this change, and acting alone would also bypass the mandatory change record (eliminates A). Second, the formal process takes 48 hours (eliminates B as an immediate action, given the ongoing contractual penalty). Alternative D is technically valid as emergency mitigation, but it exposes the production Key Vault publicly, which violates the security model and creates a greater risk than the original problem.

Alternative C is the only one that respects all constraints simultaneously: it solves the problem urgently through someone who has the required permissions, it maintains the compliance record through the parallel ticket, and it doesn't expose the resource publicly.

The correct reasoning here is to identify that "correct action" doesn't mean "technically possible action," but "action that meets the set of security, permission, and SLA constraints simultaneously."


Answer Key: Scenario 3

Answer: B

The critical clue is that VMs in the same vnet-spoke resolve the name correctly, but AKS pods don't. This immediately eliminates alternative A: if peering didn't propagate the DNS Zone, VMs would also fail. The problem is specific to the pods' execution environment.

CoreDNS, the default DNS resolver in Kubernetes, handles DNS queries from pods. In this scenario, it is not forwarding queries for the privatelink zone to Azure DNS. The fix is to add a forward rule for the privatelink.database.windows.net domain, pointing to 168.63.129.16, through a CoreDNS ConfigMap. Without this rule, queries reach CoreDNS, which cannot resolve the name and returns no such host.
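The forward rule described above can be sketched as a small Python generator for the ConfigMap manifest. This is an illustration under stated assumptions: it follows the AKS convention of a ConfigMap named coredns-custom in the kube-system namespace with keys ending in .server, and the key name privatelink.server is hypothetical. Which zone needs forwarding depends on how the cluster's upstream resolver is configured; the zone used here is the one named in the text.

```python
import textwrap

AZURE_DNS = "168.63.129.16"  # Azure-provided resolver named in the text


def forward_block(zone: str, upstream: str = AZURE_DNS) -> str:
    """Return a CoreDNS server block that forwards one zone to an upstream."""
    return (
        f"{zone}:53 {{\n"
        f"    errors\n"
        f"    cache 30\n"
        f"    forward . {upstream}\n"
        f"}}\n"
    )


def configmap_yaml(zone: str, key: str = "privatelink.server") -> str:
    """Wrap the server block in a ConfigMap manifest (names are assumptions)."""
    body = textwrap.indent(forward_block(zone), "    ")
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        "  name: coredns-custom\n"   # assumed AKS customization ConfigMap
        "  namespace: kube-system\n"
        "data:\n"
        f"  {key}: |\n"
        f"{body}"
    )


print(configmap_yaml("privatelink.database.windows.net"))
```

The printed manifest can be reviewed before it is applied to the cluster. After a change like this, repeating the lookup from inside a pod confirms whether the name now resolves to the private IP.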

Alternative C is technically incorrect: Private DNS Zones are linked to entire VNets, not individual subnets. Alternative D is false: A records created by private endpoints are permanent while the endpoint exists.

The potentially distracting information is the detail about Azure CNI: it's relevant to understand that pods have VNet IPs, but it's not the cause of the DNS problem. Azure CNI solves network connectivity, not name resolution.


Answer Key: Scenario 4

Answer: B

The correct sequence follows progressive diagnostic logic: first verify that network integration exists (step 4), then confirm that the private endpoint is correctly provisioned with an allocated IP (step 5), then check that the Private DNS Zone is linked to the VNet integrated by the Function (step 1), then test actual FQDN resolution in the Function's execution environment (step 3), and finally verify the firewall and public access configurations (step 2).

Sequence B (4 -> 5 -> 1 -> 3 -> 2) respects the diagnostic pyramid: confirm infrastructure before testing behavior, and test behavior before adjusting policies. Starting with step 3 (alternative C) or step 2 (alternative D) inverts the logic: there's no point testing DNS before confirming that VNet Integration and the endpoint are working, and there's no point reviewing firewall before confirming that DNS resolves to the correct IP.
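The ordering of sequence B can be sketched as a list of checks evaluated in order, stopping at the first one that fails. This is a minimal sketch: the step numbers and descriptions come from the scenario, while the results dictionary is a hypothetical stand-in for whatever the administrator observes in the environment.

```python
from typing import Dict, List, Optional, Tuple

# Steps in the order of sequence B (4 -> 5 -> 1 -> 3 -> 2).
SEQUENCE_B: List[Tuple[int, str]] = [
    (4, "VNet Integration is configured and points to the correct VNet"),
    (5, "Private endpoint is in the correct subnet with a private IP allocated"),
    (1, "Private DNS Zone is linked to the integrated VNet"),
    (3, "FQDN resolves correctly from the Function's execution environment"),
    (2, "Public access and firewall settings are as intended"),
]


def first_failing_step(results: Dict[int, bool]) -> Optional[int]:
    """Return the first step in sequence B that did not pass, or None.

    A step missing from `results` counts as not yet verified, so the
    walk stops there instead of skipping ahead.
    """
    for step, _description in SEQUENCE_B:
        if not results.get(step, False):
            return step
    return None


# Hypothetical observations: infrastructure is fine, the zone link is missing.
observed = {4: True, 5: True, 1: False, 3: False, 2: True}
failing = first_failing_step(observed)
print(failing, dict(SEQUENCE_B).get(failing))
```

In this example the walk stops at step 1, so the DNS test in step 3 is not interpreted until the zone link has been fixed.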

The most dangerous distractor is alternative C, which starts with DNS testing. It seems reasonable, but without confirming that VNet Integration is active (step 4), the nslookup result can be misleading, as the Function could be resolving DNS outside the VNet context.


Troubleshooting Tree: Configure private endpoints for Azure PaaS

Legend:

Color | Meaning
Dark blue | Initial symptom (entry point)
Blue | Diagnostic question
Red | Identified cause or corrective action
Green | Confirmed resolution
Orange | Intermediate verification before concluding

To use this tree when facing a real problem, start at the root node (observed symptom) and answer each question based on what you can verify directly in the environment. The first step is always to test DNS resolution of the resource's FQDN: if the result is a public IP, the problem is in the DNS layer. If it's a private IP but access still fails, the problem is in the network layer or the resource's firewall. Follow the corresponding path until you reach an identified cause before executing any corrective action.
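The first branch of the tree can be sketched as a small Python function. This is a simplified illustration of the rule stated above, not a replacement for the full tree: it takes the address returned by the DNS test and whether access succeeded, and names the layer to investigate. The function name and the returned labels are hypothetical.

```python
import ipaddress


def triage(resolved_ip: str, access_ok: bool) -> str:
    """Map the DNS test result and the access result to a layer to investigate."""
    ip = ipaddress.ip_address(resolved_ip)
    if not ip.is_private:
        # A public address for a name with a private endpoint: DNS layer.
        return "dns layer"
    if not access_ok:
        # Private address but access fails: network layer or resource firewall.
        return "network layer or resource firewall"
    return "no fault found"


print(triage("20.150.47.131", access_ok=False))
print(triage("10.1.2.5", access_ok=False))
print(triage("10.1.2.5", access_ok=True))
```

The addresses in the usage lines are the public IP from Scenario 1 and the private IP from Scenario 3.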