Troubleshooting Lab: Integrate Private Link and Private Endpoint with DNS
Diagnostic Scenarios
Scenario 1: Root Cause
An operations team reports that applications on a VM in the Spoke VNet started failing when trying to connect to an Azure SQL Database via Private Endpoint after a DNS migration performed the previous week. The migration consisted of moving private DNS zone management from individual zones per team to a centralized zone in the Hub VNet, managed by the platform team.
The network team confirms that:
- The peering between Hub and Spoke is active and working
- The `privatelink.database.windows.net` zone exists in the Hub and contains the correct A record with the private IP of the endpoint
- The Private Endpoint is provisioned with `Succeeded` status
- The VM can ping other resources within the Spoke VNet without problems
- The DNS governance Azure Policy was applied three days ago
The team executes the following command from the VM:
nslookup myserver.database.windows.net
Observed output:
Server: 168.63.129.16
Address: 168.63.129.16
Non-authoritative answer:
Name: myserver.privatelink.database.windows.net
Address: 52.179.xxx.xxx
What is the root cause of the private resolution failure?
A) The A record in the privatelink.database.windows.net zone is pointing to the SQL server's public IP.
B) The private DNS zone privatelink.database.windows.net is not linked to the Spoke VNet where the VM is located.
C) The Azure Policy applied three days ago is blocking the creation of A records in private DNS zones.
D) The peering between Hub and Spoke was not configured with the "Use remote gateways" option, preventing DNS propagation.
Scenario 2: Action Decision
The cause of a production incident has been accurately identified: an Azure DNS Private Resolver was deployed in a hub VNet to allow on-premises DNS servers to forward privatelink.* zone queries to Azure. However, the resolver's inbound endpoint was created in a subnet that lacks the mandatory Microsoft.Network/dnsResolvers delegation.
The environment has the following constraints at the moment:
- The subnet in question contains other workloads (monitoring VMs) that cannot be moved immediately
- The inbound endpoint has `Failed` status and is not responding to queries
- A second maintenance window is available in four hours
- The platform team has permission to create new subnets and modify the DNS Resolver
- It's not possible to delete and recreate the inbound endpoint in the same subnet without fixing the delegation
What is the correct action to take now?
A) Add the Microsoft.Network/dnsResolvers delegation to the existing subnet and wait for automatic recovery of the inbound endpoint.
B) Create a new subnet with the correct delegation, recreate the inbound endpoint in that subnet, and update the on-premises conditional forwarders to the new IP.
C) Wait for the next maintenance window to move the monitoring VMs and only then fix the delegation on the original subnet.
D) Remove the inbound endpoint with Failed status, fix the delegation on the subnet, and recreate the endpoint at the same IP address.
Scenario 3: Root Cause
An engineer is investigating why a VM in an isolated VNet, without peering and without hub integration, cannot resolve the FQDN of a Private Endpoint for a storage account. The environment was manually configured by the engineer two days ago.
The engineer shares the following configuration inventory:
| Component | Status |
|---|---|
| Private Endpoint | Provisioned, IP: 10.0.1.5 |
| Private DNS zone `privatelink.blob.core.windows.net` | Exists |
| A record `mystorageaccount` in the zone | Present, points to 10.0.1.5 |
| Zone link to isolated VNet | Present |
| NSG on Private Endpoint subnet | Allows outbound traffic port 443 |
The engineer runs diagnostics from the VM:
nslookup mystorageaccount.blob.core.windows.net 168.63.129.16
Output:
Server: 168.63.129.16
Address: 168.63.129.16
Non-authoritative answer:
Name: mystorageaccount.privatelink.blob.core.windows.net
Address: 52.239.xxx.xxx
The engineer notes that the endpoint subnet NSG was recently modified to block traffic on port 53. The storage account has public access disabled.
What is the root cause of the observed behavior?
A) The NSG is blocking DNS traffic on port 53, preventing Azure DNS from reaching the private zone.
B) The A record name in the private DNS zone does not match the subdomain expected by the Azure resolution process.
C) The zone link to the VNet does not have the auto-registration option enabled, which prevents resolution of manual records.
D) The private DNS zone exists but is not correctly linked to the VNet where the VM performs the query.
Scenario 4: Diagnostic Sequence
A team receives the following report from a monitoring system:
Production application returns connection error when trying to access an Azure Key Vault via FQDN. The error occurs only for on-premises clients. Clients within the Azure VNet access the same Key Vault normally using the same FQDN.
The team has the following investigation steps available:
1. Verify if the Azure DNS Private Resolver inbound endpoint has an IP accessible by the on-premises network
2. Confirm that the `privatelink.vaultcore.azure.net` zone exists and contains the correct A record
3. Verify if on-premises DNS servers have a conditional forwarder for `privatelink.vaultcore.azure.net`
4. Confirm that the on-premises conditional forwarder points to the inbound endpoint IP, not to `168.63.129.16`
5. Test FQDN resolution from an on-premises server using the configured DNS server
What is the correct diagnostic sequence for this scenario?
A) 2 -> 1 -> 3 -> 4 -> 5
B) 3 -> 2 -> 1 -> 4 -> 5
C) 1 -> 2 -> 4 -> 3 -> 5
D) 2 -> 3 -> 1 -> 4 -> 5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The decisive clue is in the nslookup output: Azure DNS (168.63.129.16) is responding, but it returns a public IP. The resolver is working, but it finds no private zone linked to the VM's VNet, so the privatelink name is resolved through public DNS. If the A record were incorrect (alternative A), the returned IP would be private, but wrong. The problem lies in the missing virtual network link between the centralized private DNS zone in the Hub and the Spoke VNet where the VM resides.
The information about Azure Policy is purposely irrelevant: the Policy may have been applied, but the technical evidence in the nslookup points directly to a zone link problem, not a record issue. Focusing on the Policy would be the classic mistake of confusing temporal correlation with causality.
Alternative D is the most dangerous distractor: the "Use remote gateways" option in peering controls data traffic via gateway, not DNS propagation. Azure private DNS is governed exclusively by zone links, independent of gateway configurations in peering.
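The elimination logic used in this explanation can be sketched as a small decision rule. The snippet below is a minimal illustration, not a diagnostic tool: the function name `classify_resolution`, the returned labels, and the sample addresses are assumptions introduced here (the scenario elides the real public IP).

```python
import ipaddress


def classify_resolution(resolved_ip: str, expected_private_ip: str) -> str:
    """Map the IP returned by nslookup to the most likely failure class."""
    if not ipaddress.ip_address(resolved_ip).is_private:
        # A public answer means the query never matched a record in a
        # private zone visible to the querying VNet.
        return "no_private_match"
    if resolved_ip != expected_private_ip:
        # A private answer that differs from the endpoint IP points to
        # a wrong A record instead (the pattern alternative A describes).
        return "wrong_a_record"
    return "ok"


# Hypothetical addresses, for illustration only.
print(classify_resolution("52.179.10.20", "10.1.0.5"))
print(classify_resolution("10.1.0.9", "10.1.0.5"))
```

A public answer lands in the first branch, which is why the output in Scenario 1 rules out alternative A before any record is inspected.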
Answer Key: Scenario 2
Answer: B
The critical constraint of the scenario is that the subnet in question contains monitoring VMs that cannot be moved immediately. This eliminates alternative C (wait for maintenance to move the VMs) as an immediate action and also makes alternative A unfeasible, since adding delegation to a subnet with existing workloads can be blocked by Azure when there are allocated resources that conflict with the delegation.
Alternative D ignores two operational facts. As with alternative A, fixing the delegation runs into the monitoring VMs that still occupy the subnet. In addition, deleting the inbound endpoint releases its private IP, which will probably not be the same upon recreation, so the conditional forwarders would have to be updated anyway, with no guarantee that the address stays consistent.
The correct action is to create a new subnet with proper delegation, recreate the inbound endpoint, and update the on-premises forwarders. This approach solves the problem within the available window without impacting the monitoring VMs and without depending on a future window.
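Before recreating the inbound endpoint, the delegation on the new subnet can be checked programmatically. The sketch below assumes the subnet description is available as a dictionary with a `delegations` list whose entries carry a `serviceName` field (the shape is assumed here to resemble the JSON printed by `az network vnet subnet show`); the function name and sample dictionaries are hypothetical.

```python
REQUIRED_DELEGATION = "Microsoft.Network/dnsResolvers"


def has_resolver_delegation(subnet: dict) -> bool:
    """Return True if the subnet is delegated to Azure DNS Private Resolver."""
    delegations = subnet.get("delegations") or []
    return any(d.get("serviceName") == REQUIRED_DELEGATION for d in delegations)


# Hypothetical subnet descriptions, for illustration only.
original_subnet = {"name": "snet-monitoring", "delegations": []}
new_subnet = {
    "name": "snet-dns-inbound",
    "delegations": [{"name": "resolver", "serviceName": REQUIRED_DELEGATION}],
}

print(has_resolver_delegation(original_subnet))
print(has_resolver_delegation(new_subnet))
```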
Answer Key: Scenario 3
Answer: B
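The inventory shows that the zone link to the VNet exists and that an A record exists. This eliminates alternative D directly. Alternative C is also eliminated: auto-registration only controls whether VM records are created automatically in the zone, and manually created records resolve through any linked VNet regardless of that setting.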
The clue about the NSG blocking port 53 is purposely irrelevant. Azure DNS (168.63.129.16) is a virtual resolver that the Azure platform provides through the host node, and by default traffic to it is not subject to NSG rules. In addition, the modified NSG is attached to the Private Endpoint subnet, so blocking port 53 there doesn't affect DNS resolution from VMs in other subnets.
The real cause lies in the A record name. Azure's resolution process for Private Endpoints works as follows: the FQDN mystorageaccount.blob.core.windows.net is redirected via CNAME to mystorageaccount.privatelink.blob.core.windows.net. The A record in the private zone must have the exact name mystorageaccount, without the .blob.core.windows.net suffix. If the record was created with a different name (for example, the complete FQDN as name), the match fails and Azure DNS falls back to public resolution.
The most dangerous distractor is alternative A, because the NSG detail on port 53 was deliberately inserted to attract hasty diagnoses.
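The naming rule described above can be expressed as a short check. This is a minimal sketch under the assumption that the private zone name is the public suffix prefixed with `privatelink.`, as in this scenario; the helper names `expected_record_name` and `record_matches` are introduced here for illustration.

```python
def expected_record_name(public_fqdn: str, private_zone: str) -> str:
    """Derive the relative A record name the private zone must contain."""
    fqdn = public_fqdn.rstrip(".").lower()
    zone = private_zone.rstrip(".").lower()
    prefix = "privatelink."
    if not zone.startswith(prefix):
        raise ValueError("zone is not a privatelink zone")
    public_suffix = zone[len(prefix):]
    if not fqdn.endswith("." + public_suffix):
        raise ValueError("FQDN does not belong to this zone's service suffix")
    # Strip ".<public suffix>" so only the host label remains.
    return fqdn[: -(len(public_suffix) + 1)]


def record_matches(record_name: str, public_fqdn: str, private_zone: str) -> bool:
    """Compare an existing record name with the expected host label."""
    return record_name.rstrip(".").lower() == expected_record_name(public_fqdn, private_zone)


zone = "privatelink.blob.core.windows.net"
fqdn = "mystorageaccount.blob.core.windows.net"
print(expected_record_name(fqdn, zone))
print(record_matches("mystorageaccount.blob.core.windows.net", fqdn, zone))
```

A record created with the complete FQDN as its name fails this comparison, which is the mismatch the explanation attributes to the fallback to public resolution.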
Answer Key: Scenario 4
Answer: A
The correct diagnostic sequence follows the logic of first validating the data infrastructure (the DNS zone and record) before investigating the forwarding path, and only then confirming if the on-premises path is pointing to the right place.
Sequence A (2 -> 1 -> 3 -> 4 -> 5) is correct because:
- Step 2 confirms that the zone and record exist correctly in Azure, establishing that the problem is path-related, not data-related
- Step 1 verifies if the inbound endpoint has an IP accessible from on-premises, which is the routing infrastructure needed before any forwarder configuration
- Step 3 verifies if the conditional forwarder exists in on-premises DNS
- Step 4 validates if the forwarder points to the correct destination (inbound endpoint) and not to `168.63.129.16`, which is inaccessible from on-premises
- Step 5 is the final test that validates the entire chain
Sequence B is a common mistake: checking the forwarder before confirming that the destination exists and is reachable leads to circular diagnoses. Sequence C reverses the logical order by checking the endpoint IP before confirming that the DNS data is correct. Validating the path without having correct data at the destination is inefficient.
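The ordering logic of sequence A can be sketched as a runner that executes the checks in order and stops at the first failure. The check callables are placeholders: in a real investigation each one would wrap a manual verification or a query against the environment. The names `STEPS`, `SEQUENCE_A`, and `run_sequence` are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Optional

# Step numbers follow the investigation list in Scenario 4.
STEPS: Dict[int, str] = {
    1: "Inbound endpoint IP reachable from on-premises",
    2: "Private zone exists with correct A record",
    3: "Conditional forwarder exists on-premises",
    4: "Forwarder targets the inbound endpoint IP",
    5: "End-to-end resolution test from on-premises",
}
SEQUENCE_A: List[int] = [2, 1, 3, 4, 5]


def run_sequence(checks: Dict[int, Callable[[], bool]],
                 order: List[int] = SEQUENCE_A) -> Optional[int]:
    """Run checks in order; return the first failing step, or None if all pass."""
    for step in order:
        if not checks[step]():
            return step
    return None


# Placeholder results: here the forwarder points to the wrong destination.
observed = {1: True, 2: True, 3: True, 4: False, 5: False}
checks = {step: (lambda ok=ok: ok) for step, ok in observed.items()}

failed = run_sequence(checks)
print(STEPS[failed] if failed is not None else "chain intact")
```

Stopping at the first failing step keeps the diagnosis anchored to the earliest broken link, so later checks are not interpreted on top of an unverified prerequisite.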
Troubleshooting Tree: Integrate Private Link and Private Endpoint with DNS
Color legend:
| Color | Meaning |
|---|---|
| Dark blue | Initial symptom or entry point |
| Blue | Diagnostic question (verifiable decision) |
| Red | Identified cause or configuration failure |
| Green | Corrective action or confirmed resolution |
When facing a real problem, start at the root node (observed symptom) and answer each diagnostic question based on what is verifiable in the environment: resource existence, configuration status, command result. Each branch eliminates a class of hypothesis. Follow the path that corresponds to the actual observed state until reaching a red node, which identifies the cause, or green node, which confirms that the chain is intact. Never skip an intermediate question: the preceding verification node often eliminates the most plausible distractor of the next level.
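The traversal rule described above can be illustrated with a small data structure. The tree below is a hypothetical fragment assembled from the scenarios in this lab, not a transcription of the full diagram; the class names, keys, and node texts are assumptions. The walker refuses to continue when a question has no recorded answer, mirroring the rule of never skipping an intermediate question.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Outcome:
    kind: str  # "cause" (red node) or "resolved" (green node)
    text: str


@dataclass
class Question:
    key: str   # identifier used to look up the observed answer
    text: str
    yes: Union["Question", Outcome]
    no: Union["Question", Outcome]


def walk(node: Union[Question, Outcome], answers: Dict[str, bool]) -> Outcome:
    """Follow yes/no answers from the root until an outcome node is reached."""
    while isinstance(node, Question):
        if node.key not in answers:
            raise KeyError(f"unanswered question: {node.text}")
        node = node.yes if answers[node.key] else node.no
    return node


# Hypothetical fragment of a troubleshooting tree.
TREE = Question(
    "private_ip", "Does the FQDN resolve to a private IP?",
    yes=Outcome("resolved", "DNS chain intact"),
    no=Question(
        "zone_linked", "Is the private DNS zone linked to the querying VNet?",
        yes=Question(
            "name_matches", "Does the A record name match the host label?",
            yes=Outcome("cause", "Query path does not reach the private zone"),
            no=Outcome("cause", "A record name mismatch"),
        ),
        no=Outcome("cause", "Missing virtual network link"),
    ),
)

result = walk(TREE, {"private_ip": False, "zone_linked": False})
print(result.kind, "-", result.text)
```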