Troubleshooting Lab: Link a private DNS zone to a VNet
Diagnostic Scenarios
Scenario 1: Root Cause
A development team reports that VMs provisioned in the Spoke-VNet cannot resolve the name api.corp.internal, but VMs in the Hub-VNet resolve normally. The network team confirms that:
- Peering between Hub-VNet and Spoke-VNet is active and bidirectional
- The private DNS zone corp.internal exists and contains the A record for api
- Spoke-VNet VMs can access Hub-VNet resources directly via IP
- The Spoke-VNet NSG allows unrestricted outbound traffic on port 53
- The Spoke-VNet was created three days ago as part of an environment expansion
The output of the command executed on the Spoke-VNet VM is:
$ nslookup api.corp.internal
Server: 168.63.129.16
Address: 168.63.129.16
** server can't find api.corp.internal: NXDOMAIN
What is the root cause of the resolution failure?
A) The NSG is blocking DNS queries despite the outbound rule, because an inbound rule for responses on port 53 is missing.
B) The Spoke-VNet does not have a resolution link configured for the private DNS zone corp.internal.
C) The peering between VNets was not configured to propagate DNS routes, preventing queries from reaching the private zone.
D) The A record for api was created while the Spoke-VNet did not yet exist, making it invisible to VNets added later.
Scenario 2: Action Decision
The infrastructure team identified that the private DNS zone prod.internal is linked to VNet-Prod with auto-registration enabled. The same VNet was recently linked to a second zone, monitoring.internal, also with auto-registration enabled. Since then, new VMs provisioned in VNet-Prod are not being registered in either zone.
The cause has already been identified by the team: VNet-Prod violates the limit of a single zone with auto-registration per VNet, and the second link with auto-registration is blocking the expected behavior.
The environment is an active production environment with dozens of VMs already registered in the prod.internal zone. The next maintenance window is 48 hours away. The team is not permitted to remove or recreate the prod.internal zone link without change committee approval.
What is the correct action to take now?
A) Immediately remove the prod.internal zone link and recreate it to force auto-registration record resynchronization.
B) Disable auto-registration on the monitoring.internal zone link, restoring correct behavior in the prod.internal zone without affecting existing records.
C) Delete the monitoring.internal zone and create a new zone with a different name to work around the registration conflict.
D) Wait for the maintenance window and, within it, recreate both links with auto-registration enabled simultaneously to force conflict resolution.
Scenario 3: Root Cause
A network engineer reports that VMs in a VNet are creating duplicate DNS records. The private zone dev.internal shows multiple A records for the same hostname, with different IPs, some of which are no longer in use. The engineer suspects a problem caused by a high TTL.
The environment has:
- 1 VNet with a link to dev.internal with auto-registration enabled
- VMs with dynamic IPs that are frequently reallocated
- Default TTL for zone records configured at 10 seconds
- 3 VM snapshots restored in the last week from base images
The portal output shows:
dev-vm-01 A 10.0.1.4 TTL: 10s
dev-vm-01 A 10.0.1.17 TTL: 10s
dev-vm-01 A 10.0.1.29 TTL: 10s
What is the root cause of the duplicate records?
A) The 10-second TTL is too low and causes Azure to recreate the record before the previous one expires, accumulating entries.
B) VMs restored from snapshots retained the original hostname and, when provisioned with new IPs, Azure registered new entries without removing previous ones associated with the original instances.
C) The dev.internal zone reached the maximum record limit and is accepting new entries without processing pending deletions.
D) The auto-registration link created a conflict because the VNet has subnets in different availability zones, generating one entry per zone.
Scenario 4: Diagnostic Sequence
A production VM stops resolving names in the private zone infra.internal after a routine operation performed by the identity and access team. The network engineer receives the ticket and needs to diagnose the cause.
The available investigation steps are:
1. Verify if the infra.internal zone exists and contains the expected records
2. Confirm if the VM can resolve public names like microsoft.com
3. Verify if the link between the VM's VNet and the infra.internal zone still exists and is active
4. Check if there were recent changes to access policies or control plane locks on the VNet
5. Test resolution of another name in the same zone from a different VM in the same VNet
Which diagnostic sequence represents the correct progressive approach?
A) 1, 3, 5, 2, 4
B) 2, 1, 3, 5, 4
C) 3, 1, 5, 2, 4
D) 4, 2, 1, 3, 5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The determining clue is in the nslookup output: the queried server is 168.63.129.16, which is Azure's recursive resolver. This means the VM is correctly configured to use Azure DNS, but the resolver returns NXDOMAIN. This behavior indicates that Azure's resolver does not associate the Spoke-VNet with any private zone that contains corp.internal, which occurs exactly when there is no resolution link configured between that VNet and the zone.
The information about NSG and port 53 is irrelevant in this scenario: traffic to the resolver 168.63.129.16 is Azure platform traffic that the default NSG rules do not block, and the query clearly reached the server, since an NXDOMAIN response came back. Alternative A confuses DNS behavior with application traffic subject to NSG. Alternative C represents a classic misconception: peering has no relationship with private DNS link propagation. Alternative D is technically unfounded, as DNS records are visible to any VNet linked to the zone, regardless of when they were created.
The most dangerous distractor is alternative C, as the active bidirectional peering leads the diagnosis to the network layer instead of examining the DNS resolution plane.
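The reading of the nslookup output described above can be sketched as a small triage helper. This is a minimal illustration in Python, not an Azure tool: `interpret_nslookup` and its hint messages are hypothetical names introduced here, and the text patterns assume output shaped like the one shown in the scenario.

```python
import re

# Azure-provided DNS virtual IP, as it appears in the scenario output.
AZURE_DNS_IP = "168.63.129.16"


def interpret_nslookup(output: str) -> str:
    """Return a coarse diagnostic hint for nslookup text output (hypothetical helper)."""
    # A timeout means no answer came back at all, so check it first.
    if "timed out" in output.lower():
        return "no response; check connectivity to the DNS server"

    server = re.search(r"^Server:\s*(\S+)", output, re.MULTILINE)
    if server is None:
        return "no DNS server reported; check the VM's DNS configuration"

    if "NXDOMAIN" in output:
        # An NXDOMAIN answer proves the query reached the server.
        if server.group(1) == AZURE_DNS_IP:
            return "Azure DNS answered NXDOMAIN; check the VNet link to the private zone"
        return "a custom DNS server answered NXDOMAIN; check its forwarding to Azure DNS"

    return "no failure pattern recognised"


sample = """Server: 168.63.129.16
Address: 168.63.129.16
** server can't find api.corp.internal: NXDOMAIN"""

print(interpret_nslookup(sample))
```

The helper only encodes the order of reasoning: a response of any kind rules out a blocked query, and the server address then tells you which resolution plane to inspect.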
Answer Key: Scenario 2
Answer: B
The cause is already stated: there is a second link with auto-registration enabled in conflict with the first. The correct action is to eliminate the conflict with the least possible impact within the scenario constraints. Disabling auto-registration on the monitoring.internal zone link resolves the conflict without removing any link, without requiring change committee approval, and without affecting existing records in the prod.internal zone.
Alternative A directly violates the scenario constraint: removing the prod.internal link requires approval and may impact the dozens of VMs already registered. Alternative C is a destructive and unnecessary action that is not justified by the presented constraints. Alternative D ignores that the problem can already be resolved now with zero impact, and waiting 48 hours keeps the environment in a degraded state unnecessarily.
The most dangerous distractor is alternative D, as it seems prudent to wait for the maintenance window, but confuses a low-risk immediate action with an intervention that would require a formal window.
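The conflict and the effect of alternative B can be modelled with plain data. The sketch below is an assumption-laden illustration in Python: `VNetLink` and `registration_conflicts` are hypothetical names, and the link list is typed in by hand from the scenario rather than read from Azure.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VNetLink:
    """Hypothetical in-memory view of a private DNS zone link."""
    zone: str
    vnet: str
    registration_enabled: bool


def registration_conflicts(links):
    """Map each VNet to its auto-registration zones when it has more than one."""
    zones_by_vnet = {}
    for link in links:
        if link.registration_enabled:
            zones_by_vnet.setdefault(link.vnet, []).append(link.zone)
    return {vnet: zones for vnet, zones in zones_by_vnet.items() if len(zones) > 1}


# State described in the scenario.
links = [
    VNetLink("prod.internal", "VNet-Prod", True),
    VNetLink("monitoring.internal", "VNet-Prod", True),
]
print(registration_conflicts(links))

# Alternative B: disable auto-registration on the monitoring.internal link only.
fixed = [
    replace(link, registration_enabled=False)
    if link.zone == "monitoring.internal" else link
    for link in links
]
print(registration_conflicts(fixed))
```

Note that the fix leaves both links in place and does not touch the prod.internal link at all, which is why it stays inside the scenario's constraints.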
Answer Key: Scenario 3
Answer: B
When a VM is restored from a snapshot, it assumes the same hostname as the base image. If this VM receives a new IP through dynamic allocation, the auto-registration mechanism creates a new A record for the hostname with the new IP. However, previous records associated with instances that were replaced or terminated in an unconventional way, such as snapshot restorations, are not automatically removed by Azure, because the normal deallocation lifecycle that triggers record removal did not occur.
The information about the 10-second TTL is irrelevant and purposefully included to divert the diagnosis to the cache expiration plane, which has no relationship with record accumulation in the zone. A low TTL only affects how long external resolvers keep the record in cache, not the number of records in the zone. Alternative A represents exactly this diagnostic error. Alternatives C and D have no technical foundation for the described symptom.
The most dangerous distractor is alternative A, as TTL is visible and numerical data that attracts attention and seems related to the problem of records that "don't go away."
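Spotting this pattern in a zone export is a matter of grouping records by hostname, independent of TTL. The following is a minimal Python sketch under stated assumptions: `hostnames_with_multiple_ips` is a hypothetical helper, the records are hand-copied from the portal output above, and `dev-vm-02` is an invented entry added only to show a healthy hostname.

```python
import ipaddress
from collections import defaultdict


def hostnames_with_multiple_ips(records):
    """Group (hostname, ip) A records and keep hostnames that map to more than one IP."""
    ips_by_host = defaultdict(set)
    for hostname, ip in records:
        ips_by_host[hostname].add(ip)
    return {
        host: sorted(ips, key=ipaddress.ip_address)  # numeric order, not string order
        for host, ips in ips_by_host.items()
        if len(ips) > 1
    }


records = [
    ("dev-vm-01", "10.0.1.4"),
    ("dev-vm-01", "10.0.1.17"),
    ("dev-vm-01", "10.0.1.29"),
    ("dev-vm-02", "10.0.1.8"),  # invented record, not part of the scenario
]

suspects = hostnames_with_multiple_ips(records)
print(suspects)
```

The function deliberately ignores TTL: the number of records per hostname is what signals stale entries, and each flagged IP still has to be checked against the VMs that actually exist.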
Answer Key: Scenario 4
Answer: B
The correct sequence is 2, 1, 3, 5, 4.
The first step is to verify if the VM can resolve public names. This immediately separates a general DNS problem, such as an inaccessible DNS server, from a problem specific to the private zone. If public names resolve, DNS works and the problem is localized. Next, confirming that the zone exists and contains the expected records eliminates the hypothesis of accidental deletion. The third step verifies if the link between the VNet and the zone is active, which is the most common cause of private resolution loss. The fourth step tests another VM in the same VNet to determine if the problem is with the specific VM or the entire VNet. Lastly, investigating control plane changes closes the diagnosis with operational context; this step requires more effort and is more efficient after the simpler hypotheses have been eliminated.
Alternative D starts with the most complex investigative step before validating simpler hypotheses, which represents the diagnostic error of going directly to the suspected cause without progressive validation.
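The progressive order 2, 1, 3, 5, 4 can be sketched as an ordered list of checks that stops at the first failure. This Python example is a simplification for illustration: `first_failing_check` is a hypothetical helper, every step is reduced to a pass or fail value, and the observed results are invented rather than taken from the scenario.

```python
def first_failing_check(checks):
    """Run (label, check) pairs in order and return the label of the first failure."""
    for label, check in checks:
        if not check():
            return label
    return None  # every check passed


# Invented observations, listed in the progressive order 2, 1, 3, 5, 4.
observed = {
    "2: VM resolves public names": True,
    "1: zone exists with expected records": True,
    "3: VNet link exists and is active": False,
    "5: another VM in the VNet resolves the zone": False,
    "4: no recent control plane changes": False,
}

# Bind each result as a default argument so every lambda keeps its own value.
checks = [(label, lambda ok=ok: ok) for label, ok in observed.items()]

print(first_failing_check(checks))
```

Because cheaper checks run first, the search stops at step 3 here and the more expensive control plane review is only needed to explain why the link disappeared.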
Troubleshooting Tree: Link a private DNS zone to a VNet
Color Legend:
| Color | Node Type |
|---|---|
| Dark Blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
To use this tree when facing a real problem, start at the root node, which describes the observed symptom, and follow each branch by answering the questions with what you can directly verify in the environment. Each answer eliminates a set of hypotheses and points you to the next check. When you reach a red node, the cause is identified; the green node immediately below it indicates the corresponding corrective action. Never skip intermediate questions, as their order reflects a progression of investigative complexity from the simplest check to the most specific.