Troubleshooting Lab: Design private DNS zones
Diagnostic Scenarios
Scenario 1: Root Cause
An operations team reports that newly created VMs in the virtual network vnet-app-eastus cannot resolve the name db01.data.internal, which points to a private endpoint of an Azure SQL Database. The environment was configured three weeks ago and worked normally until yesterday.
The responsible engineer verifies the following:
```
# Executed inside a VM in vnet-app-eastus
nslookup db01.data.internal
Server: 168.63.129.16
Address: 168.63.129.16
*** db01.data.internal: Non-existent domain
```
The team reports that, the day before, a new administrator replaced the virtual network link between vnet-app-eastus and the data.internal zone: the original link had been created without auto registration, and the administrator wanted to "recreate it with the correct settings". The original link was removed and a new one was added. The data.internal zone contains 47 manually created A records. The application key vault was rotated on the same day, which generated alerts that distracted the team during the investigation.
The DNS server configured in the vnet-app-eastus settings points to 168.63.129.16.
What is the root cause of the resolution failure?
A) The key vault rotation corrupted the credentials used by Azure's internal DNS resolver.
B) Removing and recreating the virtual network link did not automatically restore the existing A records in the zone, which need to be recreated manually.
C) The new virtual network link is still in propagation state and has not been activated by Azure DNS yet.
D) The db01.data.internal record doesn't exist in the data.internal zone because it was never created; the previous link with auto registration disabled never generated this record.
Scenario 2: Action Decision
The platform team identified that VMs in vnet-spoke-02 cannot resolve names from the private DNS zone services.hub.internal, which is only linked to vnet-hub. The cause was confirmed: there is no virtual network link between vnet-spoke-02 and the services.hub.internal zone. The environment uses hub-spoke topology with bidirectional peering configured and working. The application in vnet-spoke-02 is in production and processes financial transactions. The security team requires that any changes to private DNS zones be approved by a change advisory board (CAB), whose next cycle occurs in 48 hours. The agreed maintenance window for this VNet is at 2 AM the following day.
What is the correct action to take at this moment?
A) Immediately create the virtual network link between vnet-spoke-02 and the services.hub.internal zone, since the cause is already confirmed and creating a link does not cause interruption to existing traffic.
B) Configure a custom DNS server in vnet-spoke-02 with a conditional forwarder for services.hub.internal pointing to 168.63.129.16 as a temporary workaround, without requesting CAB approval.
C) Document the cause, open the approval process in the CAB and wait for the approval cycle before any changes to the zone or links.
D) Create the virtual network link during the 2 AM maintenance window the following day, without submitting to CAB, since it's a non-destructive action.
Scenario 3: Root Cause
A company recently migrated their workloads to Azure and configured private endpoints for Azure Blob Storage. The private DNS zone privatelink.blob.core.windows.net was created and linked to the virtual network vnet-migration. The A record for contosostorage.privatelink.blob.core.windows.net points to 10.1.4.5, which is the private endpoint IP.
A VM in vnet-migration executes the following test:
```
nslookup contosostorage.blob.core.windows.net
Server: 10.0.0.4
Address: 10.0.0.4
Non-authoritative answer:
Name: contosostorage.blob.core.windows.net
Address: 20.38.98.100
```
The IP 20.38.98.100 is the service's public IP. The administrator confirms that the virtual network link is active, the A record exists in the zone, and the private endpoint is successfully provisioned. The VM has internet access enabled via NAT Gateway, configured two months ago without issues. The subnet's Network Security Group does not block internal traffic.
What is the root cause of the observed behavior?
A) The NAT Gateway is intercepting DNS queries and redirecting them to external resolvers before Azure private DNS can respond to them.
B) The DNS server configured in the VNet points to 10.0.0.4, which is a custom server that does not forward queries to 168.63.129.16, preventing resolution by the private zone.
C) The A record in the private zone has the incorrect name; it should be contosostorage.blob.core.windows.net and not contosostorage.privatelink.blob.core.windows.net.
D) The virtual network link was created without auto registration, which prevents resolution of manually created A records in the zone.
Scenario 4: Collateral Impact
A team resolves a DNS resolution problem in vnet-analytics by enabling auto registration on the existing virtual network link between this network and the private zone analytics.internal. Before the change, the zone contained 9,800 A records created manually over six months for private endpoints, legacy VMs, and internal aliases. The change succeeds and the affected VMs now resolve correctly.
What is the most relevant secondary consequence of this action?
A) The existing 9,800 records are automatically removed by Azure DNS, since auto registration assumes exclusive control of the zone and deletes conflicting manual entries.
B) The limit of 10,000 records generated by auto registration will be reached quickly, since each VM in the network now contributes additional records, potentially preventing registration of new VMs.
C) Auto registration changes the TTL of all existing records in the zone to 10 seconds, increasing load on the Azure DNS resolver.
D) Zones with auto registration enabled no longer accept manual creation of new A records, blocking future updates to private endpoints.
Answer Key and Explanations
Answer Key: Scenario 1
Answer: D
The central clue is in the description of the original link: it was created without auto registration enabled. The record db01.data.internal points to a private endpoint, so it would be a manually created record, not an automatically generated one. The original link was therefore never responsible for this record; any such record exists in the zone independently of the link.
What matters, then, is whether the record is still in the zone. The statement says the zone contains "47 manually created A records" and that nslookup returns Non-existent domain, which indicates that db01.data.internal is simply not among those 47. The most defensible hypothesis is that it was never created, or was removed at some point, and the previous link never generated it because auto registration was disabled.
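The independence between links and manually created records can be illustrated with a small model. This is a minimal sketch, not Azure's implementation: the `PrivateZone` class and its methods are hypothetical names introduced here, and the `app01` record with address 10.0.0.10 is invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateZone:
    """Toy model of a private DNS zone (hypothetical, not an Azure API)."""
    name: str
    records: dict = field(default_factory=dict)  # relative name -> IP
    links: dict = field(default_factory=dict)    # VNet name -> auto registration flag

    def add_link(self, vnet: str, auto_registration: bool = False) -> None:
        self.links[vnet] = auto_registration

    def remove_link(self, vnet: str) -> None:
        # In this model, removing a link leaves manual records untouched.
        self.links.pop(vnet, None)

    def resolve(self, fqdn: str, from_vnet: str) -> str:
        suffix = "." + self.name
        if from_vnet not in self.links or not fqdn.endswith(suffix):
            return "NXDOMAIN"
        label = fqdn[: -len(suffix)]
        return self.records.get(label, "NXDOMAIN")


# Assumed starting state: one illustrative manual record, no db01 record.
zone = PrivateZone("data.internal", records={"app01": "10.0.0.10"})
zone.add_link("vnet-app-eastus", auto_registration=False)

before = dict(zone.records)
zone.remove_link("vnet-app-eastus")                       # administrator removes the link
zone.add_link("vnet-app-eastus", auto_registration=True)  # and recreates it

records_unchanged = zone.records == before
db01_answer = zone.resolve("db01.data.internal", "vnet-app-eastus")
app01_answer = zone.resolve("app01.data.internal", "vnet-app-eastus")
print(records_unchanged, db01_answer, app01_answer)
```

In this model the link swap changes nothing about the record set, so a name that was never in the zone keeps failing while existing manual records keep resolving.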
The information about key vault rotation is the intentional noise in the scenario: it created operational distractions, but has no causal relationship with DNS resolution. Rotating secrets or application credentials does not affect the existence of records in private DNS zones.
Alternative C is the most dangerous distractor: it attributes the failure to a transient propagation state, leading the engineer to wait instead of investigating. In practice, links typically become active shortly after creation, and the failure has persisted since the link was recreated the day before.
Answer Key: Scenario 2
Answer: C
The scenario presents a confirmed cause and an explicit critical restriction: the governance process requires CAB approval for changes to private DNS zones. This restriction is organizational rather than technical, but it is equally binding.
Alternative A is technically correct in an environment without governance restrictions: creating a virtual network link really doesn't interrupt existing traffic. However, it completely ignores the mandatory approval process, making it incorrect in the given context.
Alternative B attempts to circumvent the process with a workaround, but a custom DNS server with a forwarder is also an infrastructure change subject to the same governance; additionally, it introduces unnecessary complexity when the correct solution is simple.
Alternative D is the most dangerous: it uses the maintenance window as justification to ignore the CAB, which violates the governance process even though the action is technically harmless.
The correct discipline here is that an identified cause is not authorization to act. In regulated environments, the sequence is document, approve, execute.
Answer Key: Scenario 3
Answer: B
The definitive evidence is in the nslookup output: the queried server is 10.0.0.4, not 168.63.129.16. This means the VNet is configured with a custom DNS server that, on receiving the query for contosostorage.blob.core.windows.net, resolves it through public DNS instead of forwarding it to the Azure resolver.
The correct resolution chain for private endpoints depends on the query reaching 168.63.129.16, which then consults the private zone and returns the endpoint IP. A custom server that lacks a conditional forwarder for the privatelink suffixes breaks this chain and returns the public IP.
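Reading the responding server and the answer type out of nslookup output can be automated. The sketch below is one possible approach, not an established tool: `classify_nslookup` and its result strings are hypothetical, it handles IPv4 addresses only, and it assumes the output layout shown in this lab.

```python
import ipaddress
import re

AZURE_RESOLVER = "168.63.129.16"

# nslookup output taken from Scenario 3.
SCENARIO_3_OUTPUT = """\
Server: 10.0.0.4
Address: 10.0.0.4
Non-authoritative answer:
Name: contosostorage.blob.core.windows.net
Address: 20.38.98.100
"""


def classify_nslookup(output: str) -> str:
    """Turn nslookup text into a coarse diagnostic hint (IPv4 only)."""
    addresses = re.findall(
        r"^\s*Address(?:es)?:\s*([0-9.]+)", output, flags=re.MULTILINE
    )
    if not addresses:
        return "unparseable output"
    # The first Address line belongs to the server; the rest are answers.
    server, answers = addresses[0], addresses[1:]
    if "Non-existent domain" in output or "NXDOMAIN" in output:
        return "name not found via " + server
    if not answers:
        return "no answer from " + server
    if all(ipaddress.ip_address(a).is_private for a in answers):
        return "private IP returned"
    if server != AZURE_RESOLVER:
        return "public IP via custom DNS server " + server
    return "public IP via Azure resolver"


print(classify_nslookup(SCENARIO_3_OUTPUT))
```

A public answer that arrives through a server other than 168.63.129.16 is the combination that points at a missing forwarder rather than at the zone, the link, or the record.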
The NAT Gateway is the irrelevant noise in the scenario: it operates at layer 3 and doesn't intercept or alter DNS queries. The fact that it's been configured for two months without issues reinforces that it's not the cause.
Alternative C confuses the record name in the zone with the queried name: the public CNAME redirects to contosostorage.privatelink.blob.core.windows.net, which is exactly the name that should exist as an A record in the private zone. This is correct in the statement.
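The CNAME hop can be traced with a toy resolver. This is a simplified sketch under stated assumptions: the two dictionaries stand in for public DNS and the private zone, the real public chain has more hops than shown, and `resolve` is a hypothetical helper rather than any Azure or library API. The names and addresses are the ones given in the scenario.

```python
# Simplified stand-in for public DNS (the real chain has additional hops).
PUBLIC_DNS = {
    "contosostorage.blob.core.windows.net":
        ("CNAME", "contosostorage.privatelink.blob.core.windows.net"),
    "contosostorage.privatelink.blob.core.windows.net":
        ("A", "20.38.98.100"),
}

# Stand-in for the private zone privatelink.blob.core.windows.net.
PRIVATE_ZONE = {
    "contosostorage.privatelink.blob.core.windows.net": ("A", "10.1.4.5"),
}


def resolve(name, use_private_zone, max_hops=5):
    """Follow CNAMEs, preferring the private zone when it is consulted."""
    for _ in range(max_hops):
        record = PRIVATE_ZONE.get(name) if use_private_zone else None
        if record is None:
            record = PUBLIC_DNS.get(name)
        if record is None:
            return None
        record_type, value = record
        if record_type == "A":
            return value
        name = value  # follow the CNAME to the next name
    return None


via_azure_resolver = resolve("contosostorage.blob.core.windows.net", True)
via_custom_server = resolve("contosostorage.blob.core.windows.net", False)
print(via_azure_resolver, via_custom_server)
```

The queried name never appears in the private zone; only the privatelink name reached through the CNAME does, which is why the record name in the statement is correct.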
Alternative D represents a classic misconception: auto registration is irrelevant for manually created A records, which exist in the zone independently of this configuration.
Answer Key: Scenario 4
Answer: B
Once auto registration is enabled, Azure DNS automatically creates an A record for each VM in the linked network. The limit for records generated by auto registration in a single private zone is 10,000, and the zone already holds 9,800 manually created records. Each VM in the network now contributes additional records, so the total can quickly reach the limit, after which new VMs cannot have their records created in the zone.
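The remaining headroom is simple arithmetic. The sketch below takes the 10,000 limit and the 9,800 existing records from this scenario as given, without checking them against current Azure limits, and assumes one registered record per VM; `registration_headroom` is a hypothetical helper.

```python
def registration_headroom(limit: int, existing: int, records_per_vm: int = 1) -> int:
    """Return how many more VMs can register before the limit is reached."""
    if records_per_vm <= 0:
        raise ValueError("records_per_vm must be positive")
    return max(limit - existing, 0) // records_per_vm


# Figures from the scenario; one record per VM is an assumption.
remaining_vms = registration_headroom(limit=10_000, existing=9_800)
print(remaining_vms)  # 200
```

Under these assumptions the zone has room for only 200 more VMs, a margin that a single scale-out event could consume.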
Alternative A is the most seductive, but is false: Azure DNS does not remove manual records when enabling auto registration. Manually created records and automatically generated records coexist in the zone; Azure only adds the automatic ones without touching the existing ones.
Alternative D is also false: zones with auto registration continue to accept manual A records normally. The two modalities are not mutually exclusive.
Alternative C invents behavior that doesn't exist: the TTL of existing records is not altered by enabling auto registration.
The real impact is silent: there's no immediate error, but as new VMs are created, their records simply don't appear in the zone, reproducing exactly the symptom from Scenario 2 of the previous technical lab.
Troubleshooting Tree: Design private DNS zones
Color Legend:
| Color | Node Type |
|---|---|
| Dark Blue | Initial symptom (entry point) |
| Blue | Diagnostic question (binary or verifiable decision) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or verification |
To use this tree when facing a real problem, always start from the root node, the DNS resolution failure symptom, and follow the branches, answering each question based on what is observable in the environment: the VNet DNS configuration, the existence and state of the virtual network link, the presence of the record in the zone, and the zone name. Each answer eliminates a class of causes and narrows the path to the real cause. Orange nodes mark points where the investigation requires additional verification before the diagnosis is concluded, which avoids hasty actions based on unconfirmed hypotheses.
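The walk through the tree can also be expressed as a short decision function. This is a sketch under assumptions: the tree's actual branches are not reproduced in the text, so the order of checks simply follows the order in which the paragraph lists the observable items, and `Observation` and `diagnose` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the engineer can verify in the environment (assumed fields)."""
    dns_reaches_azure_resolver: bool  # VNet DNS is, or forwards to, 168.63.129.16
    link_exists: bool                 # virtual network link present and active
    record_exists: bool               # queried record is present in the zone
    zone_name_matches: bool           # zone name matches the queried suffix


def diagnose(obs: Observation) -> str:
    """Return the first cause that the observations do not rule out."""
    if not obs.dns_reaches_azure_resolver:
        return "custom DNS server does not forward to 168.63.129.16"
    if not obs.link_exists:
        return "no virtual network link between the VNet and the zone"
    if not obs.record_exists:
        return "record is missing from the zone"
    if not obs.zone_name_matches:
        return "zone name does not match the queried name"
    return "no cause identified; verify further before acting"


# Scenario 3: everything is in place except the DNS path.
print(diagnose(Observation(False, True, True, True)))
```

The final branch plays the role of an orange node: when every check passes, the function returns a request for more verification instead of a guess.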