Troubleshooting Lab: Integrate Private Link and Private Endpoint with DNS

Diagnostic Scenarios

Scenario 1: Root Cause

An operations team reports that applications on a VM in the Spoke VNet started failing to connect to an Azure SQL Database through its Private Endpoint after a DNS migration performed the previous week. The migration moved private DNS zone management from individual per-team zones to a centralized zone in the Hub VNet, managed by the platform team.

The network team confirms that:

  • The peering between Hub and Spoke is active and working
  • The privatelink.database.windows.net zone exists in the Hub and contains the correct A record with the private IP of the endpoint
  • The Private Endpoint is provisioned with Succeeded status
  • The VM can ping other resources within the Spoke VNet without problems
  • The DNS governance Azure Policy was applied three days ago

The team executes the following command from the VM:

nslookup myserver.database.windows.net

Observed output:

Server:  168.63.129.16
Address: 168.63.129.16

Non-authoritative answer:
Name: myserver.privatelink.database.windows.net
Address: 52.179.xxx.xxx

What is the root cause of the private resolution failure?

A) The A record in the privatelink.database.windows.net zone is pointing to the SQL server's public IP.

B) The private DNS zone privatelink.database.windows.net is not linked to the Spoke VNet where the VM is located.

C) The Azure Policy applied three days ago is blocking the creation of A records in private DNS zones.

D) The peering between Hub and Spoke was not configured with the "Use remote gateways" option, preventing DNS propagation.


Scenario 2: Action Decision

The cause of a production incident has been accurately identified: an Azure DNS Private Resolver was deployed in a hub VNet to allow on-premises DNS servers to forward privatelink.* zone queries to Azure. However, the resolver's inbound endpoint was created in a subnet that lacks the mandatory Microsoft.Network/dnsResolvers delegation.

The environment has the following constraints at the moment:

  • The subnet in question contains other workloads (monitoring VMs) that cannot be moved immediately
  • The inbound endpoint has Failed status and is not responding to queries
  • A second maintenance window is available in four hours
  • The platform team has permission to create new subnets and modify the DNS Resolver
  • It's not possible to delete and recreate the inbound endpoint in the same subnet without fixing the delegation

What is the correct action to take now?

A) Add the Microsoft.Network/dnsResolvers delegation to the existing subnet and wait for automatic recovery of the inbound endpoint.

B) Create a new subnet with the correct delegation, recreate the inbound endpoint in that subnet, and update the on-premises conditional forwarders to the new IP.

C) Wait for the next maintenance window to move the monitoring VMs and only then fix the delegation on the original subnet.

D) Remove the inbound endpoint with Failed status, fix the delegation on the subnet, and recreate the endpoint at the same IP address.


Scenario 3: Root Cause

An engineer is investigating why a VM in an isolated VNet, with no peering and no hub integration, cannot resolve a storage account's FQDN to its Private Endpoint IP. The engineer configured the environment manually two days ago.

The engineer shares the following configuration inventory:

| Component | Status |
|---|---|
| Private Endpoint | Provisioned, IP: 10.0.1.5 |
| Private DNS zone privatelink.blob.core.windows.net | Exists |
| A record mystorageaccount in the zone | Present, points to 10.0.1.5 |
| Zone link to isolated VNet | Present |
| NSG on Private Endpoint subnet | Allows outbound traffic on port 443 |

The engineer runs diagnostics from the VM:

nslookup mystorageaccount.blob.core.windows.net 168.63.129.16

Output:

Server:  168.63.129.16
Address: 168.63.129.16

Non-authoritative answer:
Name: mystorageaccount.privatelink.blob.core.windows.net
Address: 52.239.xxx.xxx

The engineer notes that the endpoint subnet NSG was recently modified to block traffic on port 53. The storage account has public access disabled.

What is the root cause of the observed behavior?

A) The NSG is blocking DNS traffic on port 53, preventing Azure DNS from reaching the private zone.

B) The A record name in the private DNS zone does not match the subdomain expected by the Azure resolution process.

C) The zone link to the VNet does not have the auto-registration option enabled, which prevents resolution of manual records.

D) The private DNS zone exists but is not correctly linked to the VNet where the VM performs the query.


Scenario 4: Diagnostic Sequence

A team receives the following report from a monitoring system:

Production application returns a connection error when trying to access an Azure Key Vault via FQDN. The error occurs only for on-premises clients. Clients within the Azure VNet access the same Key Vault normally using the same FQDN.

The team has the following investigation steps available:

  1. Verify if the Azure DNS Private Resolver inbound endpoint has an IP accessible by the on-premises network
  2. Confirm that the privatelink.vaultcore.azure.net zone exists and contains the correct A record
  3. Verify if on-premises DNS servers have a conditional forwarder for privatelink.vaultcore.azure.net
  4. Confirm that the on-premises conditional forwarder points to the inbound endpoint IP, not to 168.63.129.16
  5. Test FQDN resolution from an on-premises server using the configured DNS server

What is the correct diagnostic sequence for this scenario?

A) 2 -> 1 -> 3 -> 4 -> 5

B) 3 -> 2 -> 1 -> 4 -> 5

C) 1 -> 2 -> 4 -> 3 -> 5

D) 2 -> 3 -> 1 -> 4 -> 5


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue is in the nslookup output. Azure DNS (168.63.129.16) responds, so the resolver itself is working, but it returns a public IP. Azure DNS answers from a private zone only when that zone is linked to the VNet the query comes from. In this case it followed the public CNAME to myserver.privatelink.database.windows.net, found no linked zone for that name, and returned the public address. Alternative A is ruled out by the network team's confirmation that the zone holds the correct A record with the endpoint's private IP. The problem is the missing virtual network link between the centralized private DNS zone and the Spoke VNet where the VM resides.
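The first check, whether the address returned by 168.63.129.16 is private, can be scripted when triaging many hosts. The following is a minimal sketch. It assumes the VNets draw their address space from the RFC 1918 ranges; Private Endpoint IPs come from the VNet's own prefixes, which can in principle be non-RFC 1918. `classify_answer` is a hypothetical helper, and 203.0.113.10 is a documentation-range placeholder, not the redacted address from the output.

```python
import ipaddress

# RFC 1918 private ranges. Assumption: the VNets (and therefore the
# Private Endpoint IPs) use address space from these ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_answer(address: str) -> str:
    """Classify the address that 168.63.129.16 returned for the queried FQDN.

    "private" suggests the answer came from a private DNS zone record;
    "public" suggests the query was answered from public DNS.
    """
    ip = ipaddress.ip_address(address)
    if any(ip in network for network in RFC1918_NETWORKS):
        return "private"
    return "public"


# 203.0.113.10 is a documentation-range placeholder for the redacted public IP.
for answer in ("10.0.1.5", "203.0.113.10"):
    print(answer, classify_answer(answer))
```

A "public" result only tells you that no private record answered. Separating a missing link from a wrong record still depends on the other evidence, such as the network team's confirmation of the A record.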

The information about Azure Policy is deliberately irrelevant. The Policy may well have been applied, but the nslookup evidence points to a zone link problem, not a record problem. Alternative C is also contradicted by the network team's confirmation that the A record exists. Focusing on the Policy would be the classic mistake of confusing temporal correlation with causality.

Alternative D is the most dangerous distractor. The "Use remote gateways" option in peering lets the Spoke send traffic through the Hub's VPN or ExpressRoute gateway, and it has nothing to do with DNS. Resolution of a private DNS zone through Azure DNS is governed by virtual network links, independently of any gateway setting in the peering.
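The rule that private answers depend only on virtual network links can be illustrated with a toy model. This is a simplified sketch, not Azure's implementation:

- It assumes the querying VNet uses Azure-provided DNS (168.63.129.16).
- Peering deliberately does not appear as an input.
- The VNet names, the `resolve` helper, the private IP 10.1.2.4 and the public placeholder 203.0.113.10 are all hypothetical.

```python
from dataclasses import dataclass, field

# Simplified public DNS view: the public CNAME into privatelink.* and a
# placeholder public IP from the documentation range (not a real address).
PUBLIC_CNAME = {
    "myserver.database.windows.net": "myserver.privatelink.database.windows.net",
}
PUBLIC_A = {"myserver.privatelink.database.windows.net": "203.0.113.10"}


@dataclass
class PrivateZone:
    name: str                                      # e.g. "privatelink.database.windows.net"
    records: dict                                  # relative record name -> private IP
    linked_vnets: set = field(default_factory=set)


def resolve(fqdn: str, source_vnet: str, zones: list) -> str:
    """Toy model of Azure DNS answering a query that originates in source_vnet.

    Peering is intentionally not an input: only virtual network links decide
    whether a private zone is consulted.
    """
    target = PUBLIC_CNAME.get(fqdn, fqdn)  # follow the public CNAME into privatelink.*
    for zone in zones:
        suffix = "." + zone.name
        if source_vnet in zone.linked_vnets and target.endswith(suffix):
            label = target[: -len(suffix)]
            # A linked zone answers the query; by default a missing record
            # yields NXDOMAIN rather than a public answer.
            return zone.records.get(label, "NXDOMAIN")
    # No linked zone matched the name: the public record is returned.
    return PUBLIC_A.get(target, "NXDOMAIN")


zone = PrivateZone("privatelink.database.windows.net", {"myserver": "10.1.2.4"}, {"hub-vnet"})
print(resolve("myserver.database.windows.net", "spoke-vnet", [zone]))  # 203.0.113.10
zone.linked_vnets.add("spoke-vnet")  # the fix: link the zone to the Spoke VNet
print(resolve("myserver.database.windows.net", "spoke-vnet", [zone]))  # 10.1.2.4
```

Linking the zone to the Spoke VNet is the only change between the two calls. No peering option would alter the first result.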


Answer Key: Scenario 2

Answer: B

The critical constraint of the scenario is that the subnet contains monitoring VMs that cannot be moved immediately. A DNS Private Resolver inbound endpoint needs a subnet that is delegated to Microsoft.Network/dnsResolvers and dedicated to that endpoint. Azure can therefore block adding the delegation to a subnet that already hosts other workloads, and even if it were accepted, the subnet would still not be dedicated. This makes alternative A unfeasible. The same constraint rules out alternative C as the action to take now, because it leaves resolution broken until the VMs can be moved in a later window.
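The two conditions that rule out alternative A can be expressed as a pre-flight check before recreating the endpoint. This is a hypothetical sketch. The `Subnet` type and `inbound_endpoint_blockers` helper are illustrative and not an Azure SDK API. The "dedicated subnet" rule is taken as an assumption from the resolver's subnet requirements, and the subnet and NIC names are made up.

```python
from dataclasses import dataclass, field

RESOLVER_DELEGATION = "Microsoft.Network/dnsResolvers"


@dataclass
class Subnet:
    name: str
    delegations: list = field(default_factory=list)         # delegated service names
    attached_resources: list = field(default_factory=list)  # e.g. VM NICs already in the subnet


def inbound_endpoint_blockers(subnet: Subnet) -> list:
    """Return the reasons a subnet cannot host a resolver inbound endpoint (empty = OK)."""
    blockers = []
    if RESOLVER_DELEGATION not in subnet.delegations:
        blockers.append(f"missing delegation {RESOLVER_DELEGATION}")
    if subnet.attached_resources:
        # Assumption: the endpoint subnet must be dedicated, so any other
        # workload (here, the monitoring VMs) disqualifies it.
        blockers.append("subnet is not dedicated: " + ", ".join(subnet.attached_resources))
    return blockers


original = Subnet("snet-shared", attached_resources=["monitor-vm-01-nic"])
replacement = Subnet("snet-dns-inbound", delegations=[RESOLVER_DELEGATION])
print(inbound_endpoint_blockers(original))     # two blockers
print(inbound_endpoint_blockers(replacement))  # []
```

Running the same check against the original subnet and a freshly created one makes the choice between A and B explicit: only the new subnet comes back clean.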

Alternative D fails for two reasons. First, it still requires fixing the delegation on the original subnet, which the monitoring VMs block. Second, deleting the inbound endpoint releases its private IP, and a recreated endpoint is not guaranteed to receive the same address. The on-premises conditional forwarders might then need updating anyway, so D offers no real advantage over B.

The correct action is to create a new subnet with the proper delegation, recreate the inbound endpoint there, and point the on-premises conditional forwarders to the new endpoint IP. This fixes the problem within the current window, without touching the monitoring VMs and without waiting for a future window.
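Updating the forwarders is the step most easily left half-done when several privatelink zones are forwarded. A hedged sketch of a drift check follows. The input format, the `stale_forwarders` helper and all IP addresses are hypothetical. The input maps each forwarded zone to its target IPs, as exported from the on-premises DNS servers. Here 10.0.2.4 stands for whatever address the forwarders used before and 10.0.3.4 for the new endpoint.

```python
import ipaddress


def stale_forwarders(forwarders: dict, inbound_ips: list) -> dict:
    """Return the conditional forwarders that do not target only current inbound endpoint IPs.

    forwarders maps a forwarded zone (e.g. "privatelink.database.windows.net")
    to the list of target IPs configured on the on-premises DNS servers.
    """
    allowed = {ipaddress.ip_address(ip) for ip in inbound_ips}
    stale = {}
    for zone, targets in forwarders.items():
        configured = {ipaddress.ip_address(t) for t in targets}
        # Stale when nothing is configured, or when any target is outside the
        # allowed set, such as an old endpoint IP or 168.63.129.16, which
        # on-premises servers cannot reach.
        if not configured or not configured <= allowed:
            stale[zone] = sorted(targets)
    return stale


current = {
    "privatelink.database.windows.net": ["10.0.2.4"],
    "privatelink.blob.core.windows.net": ["10.0.3.4"],
    "privatelink.vaultcore.azure.net": ["168.63.129.16"],
}
print(stale_forwarders(current, ["10.0.3.4"]))
```

Accepting a list of inbound IPs, not a single one, leaves room for forwarders that target more than one inbound endpoint.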


Answer Key: Scenario 3

Answer: B

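The inventory shows that the zone is linked to the VNet and that the A record exists, which directly eliminates alternative D. Alternative C fails on its own terms. Auto-registration only controls whether VM records are created automatically in the zone, and it has no effect on resolving records that were created manually.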

The clue about the NSG blocking port 53 is deliberately irrelevant, and the nslookup output itself rules it out. 168.63.129.16 answered the query, so DNS traffic from the VM is reaching Azure DNS. The modified NSG governs the Private Endpoint subnet, and the endpoint plays no part in answering DNS queries. Filtering port 53 there therefore cannot change the answer the VM receives.

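The real cause lies in the A record name. Azure's resolution process for Private Endpoints works as follows:

1. The public FQDN mystorageaccount.blob.core.windows.net is redirected via CNAME to mystorageaccount.privatelink.blob.core.windows.net.
2. Azure DNS looks up that name in the linked privatelink.blob.core.windows.net zone.

Record names are relative to the zone, so the A record must be named exactly mystorageaccount, without the .privatelink.blob.core.windows.net suffix. Suppose the record was created with a different name, for example the complete FQDN mystorageaccount.blob.core.windows.net. It then actually defines mystorageaccount.blob.core.windows.net.privatelink.blob.core.windows.net, the lookup misses, and the VM never receives 10.0.1.5.

With the link confirmed and the NSG excluded, a mismatch between the record's actual name and the CNAME target is the remaining explanation. By default, a miss in a linked privatelink zone returns NXDOMAIN. A public answer like the one in the output is returned when the virtual network link has fallback to internet resolution enabled.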
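The naming rule is easy to verify mechanically: append the zone name to the record name and compare the result with the CNAME target. The following is a minimal sketch. The helper names `record_fqdn` and `expected_record_name` are hypothetical, and the service suffix defaults to blob.core.windows.net as in this scenario.

```python
def record_fqdn(record_name: str, zone: str) -> str:
    """Record names in a DNS zone are relative: the zone name is appended."""
    return f"{record_name}.{zone}"


def expected_record_name(public_fqdn: str, service_suffix: str = "blob.core.windows.net") -> str:
    """Derive the relative record name for privatelink.<service_suffix> from the public FQDN."""
    suffix = "." + service_suffix
    if not public_fqdn.endswith(suffix):
        raise ValueError(f"{public_fqdn} is not under {service_suffix}")
    return public_fqdn[: -len(suffix)]


zone = "privatelink.blob.core.windows.net"
cname_target = "mystorageaccount.privatelink.blob.core.windows.net"

good = expected_record_name("mystorageaccount.blob.core.windows.net")  # "mystorageaccount"
bad = "mystorageaccount.blob.core.windows.net"  # complete FQDN typed as the record name

print(good, record_fqdn(good, zone) == cname_target)  # mystorageaccount True
print(bad, record_fqdn(bad, zone))                    # doubled suffix, never queried
```

Comparing `record_fqdn(name, zone)` against the CNAME target, instead of eyeballing the record list, catches the doubled-suffix mistake directly.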

The most dangerous distractor is alternative A, because the NSG detail on port 53 was deliberately inserted to attract hasty diagnoses.


Answer Key: Scenario 4

Answer: A

The correct diagnostic sequence works in three stages:

1. Validate the data first (the DNS zone and record).
2. Validate the forwarding infrastructure (the inbound endpoint).
3. Confirm that the on-premises forwarding path exists and points to the right place.

Sequence A (2 -> 1 -> 3 -> 4 -> 5) is correct because:

  • Step 2 confirms that the zone and record exist correctly in Azure, establishing that the problem is path-related, not data-related
  • Step 1 verifies if the inbound endpoint has an IP accessible from on-premises, which is the routing infrastructure needed before any forwarder configuration
  • Step 3 verifies if the conditional forwarder exists in on-premises DNS
  • Step 4 validates if the forwarder points to the correct destination (inbound endpoint) and not to 168.63.129.16, which is inaccessible from on-premises
  • Step 5 is the final test that validates the entire chain
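Sequence A can be read as an ordered, fail-fast chain: run the checks in order and stop at the first failure. The sketch below is hypothetical. The `state` dictionary stands in for real evidence from the portal or CLI, the on-premises DNS configuration, and nslookup. The inbound IP 10.0.4.4 is a placeholder, and the observed state illustrates the common mistake from step 4.

```python
from typing import Callable, List, Optional, Tuple

# (step number from the list above, description, check returning True if the step passes)
Check = Tuple[int, str, Callable[[], bool]]


def first_failure(checks: List[Check]) -> Optional[Tuple[int, str]]:
    """Run the checks in order and stop at the first failing one.

    Returns (step, description) of the failing step, or None when every check
    passes, meaning the whole resolution chain is intact.
    """
    for step, description, passed in checks:
        if not passed():
            return step, description
    return None


# Hypothetical observed state for the on-premises Key Vault incident.
state = {
    "zone_and_record_ok": True,
    "inbound_ip_reachable": True,
    "forwarder_exists": True,
    "forwarder_target": "168.63.129.16",
    "inbound_ip": "10.0.4.4",
    "onprem_lookup_private": False,
}

sequence_a = [
    (2, "zone and A record exist", lambda: state["zone_and_record_ok"]),
    (1, "inbound endpoint reachable from on-premises", lambda: state["inbound_ip_reachable"]),
    (3, "conditional forwarder exists", lambda: state["forwarder_exists"]),
    (4, "forwarder targets the inbound endpoint", lambda: state["forwarder_target"] == state["inbound_ip"]),
    (5, "on-premises lookup returns the private IP", lambda: state["onprem_lookup_private"]),
]

print(first_failure(sequence_a))  # (4, 'forwarder targets the inbound endpoint')
```

Because the checks are lazy callables, reordering the list reproduces sequences B, C or D. The reported failure then shows which hypotheses each ordering would have tested prematurely.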

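Each alternative sequence breaks the order in a specific way:

- Sequence B is a common mistake. Checking the forwarder before confirming that its destination exists and is reachable leads to circular diagnoses.
- Sequence C checks the endpoint IP before confirming that the DNS data is correct. It also validates the forwarder's target (step 4) before confirming that the forwarder exists (step 3). Validating the path without correct data at the destination is inefficient.
- Sequence D starts correctly with the data but checks the forwarder before confirming that its target, the inbound endpoint, is reachable, which is the same problem as B.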


[Figure: diagnostic decision tree]

Color legend:

| Color | Meaning |
|---|---|
| Dark blue | Initial symptom or entry point |
| Blue | Diagnostic question (verifiable decision) |
| Red | Identified cause or configuration failure |
| Green | Corrective action or confirmed resolution |

When facing a real problem, start at the root node (observed symptom) and answer each diagnostic question based on what is verifiable in the environment: resource existence, configuration status, command result. Each branch eliminates a class of hypothesis. Follow the path that corresponds to the actual observed state until reaching a red node, which identifies the cause, or green node, which confirms that the chain is intact. Never skip an intermediate question: the preceding verification node often eliminates the most plausible distractor of the next level.
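The walking procedure just described can be sketched as a small decision-tree interpreter. The tree below is illustrative, assembled from Scenarios 1 and 3, and is not a reproduction of the original diagram. The class names and fact keys are hypothetical. A missing fact raises `KeyError` by design, which enforces the rule of never skipping an intermediate question.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    color: str  # "red" = identified cause, "green" = chain confirmed intact
    text: str


@dataclass
class Question:
    text: str   # blue node: a verifiable yes/no question
    fact: str   # key of the observed fact that answers the question
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]


def walk(node: Node, facts: Dict[str, bool]) -> Leaf:
    """Follow the branch matching each observed fact until reaching a red or green leaf.

    A missing fact raises KeyError: every intermediate question must be answered.
    """
    while isinstance(node, Question):
        node = node.yes if facts[node.fact] else node.no
    return node


# Illustrative tree assembled from Scenarios 1 and 3 (not the original diagram).
tree = Question(
    "Does 168.63.129.16 answer the query?", "dns_answers",
    yes=Question(
        "Is the returned IP the Private Endpoint's private IP?", "private_ip",
        yes=Leaf("green", "Private resolution intact"),
        no=Question(
            "Is the privatelink zone linked to the querying VNet?", "zone_linked",
            yes=Leaf("red", "Record name does not match the CNAME target (Scenario 3)"),
            no=Leaf("red", "Missing virtual network link (Scenario 1)"),
        ),
    ),
    no=Leaf("red", "No answer from Azure DNS (not covered by these scenarios)"),
)

print(walk(tree, {"dns_answers": True, "private_ip": False, "zone_linked": False}).text)
```

Keeping facts separate from the tree means the same walker can follow any tree built from verifiable questions. The facts dictionary then doubles as a record of what was actually checked.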