Troubleshooting Lab: Design Public DNS Zones

Diagnostic Scenarios

Scenario 1 — Root Cause

A team migrated the hosting of the contoso.com domain from a legacy DNS provider to Azure DNS. The zone was created, all records were manually recreated in the portal, and delegation was updated at the registrar with the four name servers provided by Azure.

Three days after the migration, the monitoring team opens a ticket reporting that users in certain regions of Europe can resolve www.contoso.com normally, while users in Brazil and the United States receive the old IP address from the previous provider.

The responsible engineer checks the Azure portal and confirms that the A record for www is correct and points to the new IP. The TTL for the record in Azure is configured as 300 seconds. The zone at the legacy provider still exists, but the team guarantees that access credentials to the old panel were revoked two days ago.

What is the root cause of the observed behavior?

A) The TTL of 300 seconds is too low, causing inconsistent resolutions between recursive servers that ignore values below 600 seconds.

B) The A record in Azure is correct, but the legacy provider is still authoritative for some resolvers because the TTL of the previous SOA record has not yet expired in the caches of recursive resolvers in the affected regions.

C) The zone at the legacy provider still exists and keeps answering queries, so resolvers that still hold the old NS delegation in cache continue to query the old name servers, which return the original IP.

D) Microsoft Entra ID has not completed the propagation of DNS permissions after revoking the legacy provider credentials, blocking resolution in the affected regions.

Scenario 2 — Action Decision

The operations team identified that the cause of intermittent email delivery failures to fabrikam.com is an incorrect MX record in the public zone hosted in Azure DNS. The current record points to mail.fabrikam.com with priority 10, but should point to mailrelay.fabrikam.net with priority 10.

The environment has the following constraints:

The fabrikam.com domain is in production with active email traffic
The current TTL of the MX record is 3600 seconds
The approved maintenance window starts in 4 hours
The team has permission to edit DNS records at any time, without need for additional approval
There is a contingency email server configured and functional at mailbackup.fabrikam.net

What is the correct action to take now, before the maintenance window?

A) Immediately correct the MX record pointing to mailrelay.fabrikam.net and wait for natural propagation of the 3600-second TTL.

B) Reduce the TTL of the current MX record to 60 seconds now, so that when the correction is applied during the maintenance window, propagation will be fast.

C) Immediately add a second MX record pointing to mailbackup.fabrikam.net with priority 20 to reduce email loss until the maintenance window, and correct the main record during the window.

D) Wait for the maintenance window to reduce the TTL and only then correct the MX record, avoiding any changes in production outside the approved time.

Scenario 3 — Root Cause

A company configured the public zone dev.contoso.com in Azure DNS as a separate zone from contoso.com. The goal was to delegate the dev subdomain for a development team to manage their own records.

After configuration, records within dev.contoso.com like api.dev.contoso.com work correctly when tested directly against the child zone's name servers. However, when any external user tries to resolve api.dev.contoso.com from public resolvers, the response is NXDOMAIN.

The engineer verifies the following information:

; Direct query to child zone name server
$ nslookup api.dev.contoso.com ns1-05.azure-dns.com
Server:  ns1-05.azure-dns.com
Address: 150.171.11.5

Name:    api.dev.contoso.com
Address: 10.0.1.50

; Query via public resolver
$ nslookup api.dev.contoso.com 8.8.8.8
Server:  dns.google
Address: 8.8.8.8

** server can't find api.dev.contoso.com: NXDOMAIN

The team confirms that the contoso.com zone is hosted in Azure DNS and is working correctly for other records. The network team reports that no firewall rules block external DNS queries to Azure name servers.

What is the root cause of the observed behavior?

A) The IP address returned by the child zone name server (10.0.1.50) is private, and public resolvers filter responses with RFC 1918 addresses by default.

B) The dev.contoso.com zone was created in Azure DNS, but no NS delegation record was added in the parent zone contoso.com pointing to the child zone's name servers.

C) Azure DNS does not support subdomain delegation within the same subscription; the child zone must be in a different subscription for delegation to work correctly.

D) The child zone dev.contoso.com is using different name servers from the parent zone contoso.com, and public resolvers require name servers to be identical to accept delegation.

Scenario 4 — Diagnostic Sequence

An engineer receives the following report: "The website shop.tailwindtraders.com stopped resolving externally right after a DNS change."

The engineer has access to the Azure portal, the domain registrar, and command-line tools. The following investigation steps are available, but are listed out of order:

Step P: Query the zone's authoritative name servers directly with nslookup shop.tailwindtraders.com <nameserver> to verify if the record exists and is correct at the source.
Step Q: Check in the Azure portal if the tailwindtraders.com zone exists and if the A record for shop is present with the correct value.
Step R: Check at the registrar if the four Azure name servers are correctly configured as authoritative servers for the domain.
Step S: Use an external tool like dig +trace shop.tailwindtraders.com to observe the complete resolution path and identify at which level the failure occurs.
Step T: Query a public resolver like 8.8.8.8 and compare the result with the direct query to the authoritative name server.

Which investigation sequence follows the correct logic of progressive diagnosis, from most basic to most specific?

A) Q, R, P, T, S

B) S, T, R, Q, P

C) R, Q, P, S, T

D) Q, P, R, S, T

Answer Key and Explanations

Answer Key — Scenario 1

Answer: C

The decisive clue in the statement is that the zone at the legacy provider still exists. When the registrar is updated with the new Azure name servers, but the old zone remains active at the previous provider, the expected behavior is that resolvers that still have the old NS record cached will query the legacy provider's name servers and receive the old IP. This explains the regional inconsistency: resolvers that renewed their cache after delegation resolve correctly; others still point to the previous provider.

The information about credential revocation is deliberately irrelevant: DNS is not controlled by user authentication; the zone continues responding regardless of who can access the administrative panel. Option B describes a real phenomenon related to SOA TTL, but SOA does not control NS delegation; what matters is the TTL of NS records at the registrar and TLD level. Option A reverses the logic: low TTL favors faster propagation, not inconsistency. Option D is technically nonsensical for this context: Microsoft Entra ID does not participate in public DNS resolution.

The most dangerous distractor is option B, because the reasoning about SOA TTL seems technical and plausible, leading the engineer to wait passively instead of acting to remove the legacy zone.
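The cache behavior behind this answer can be sketched with a small model. This is an illustration only, not a real resolver: the class, the function, the legacy name server name, and the 172800-second NS TTL are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CachedDelegation:
    """NS set a recursive resolver holds in cache (hypothetical model)."""
    name_servers: Tuple[str, ...]
    cached_at: int  # seconds on an arbitrary clock
    ttl: int        # seconds


def name_servers_queried(cache: Optional[CachedDelegation],
                         now: int,
                         current_delegation: Tuple[str, ...]) -> Tuple[str, ...]:
    """Return the name servers a resolver would ask at time `now`."""
    # An unexpired cache entry is used without re-checking the parent zone.
    if cache is not None and now < cache.cached_at + cache.ttl:
        return cache.name_servers
    # Otherwise the resolver walks the hierarchy and gets the updated delegation.
    return current_delegation


# Hypothetical values for illustration only.
LEGACY_NS = ("ns1.legacy-provider.example",)
AZURE_NS = ("ns1-05.azure-dns.com",)

stale = CachedDelegation(LEGACY_NS, cached_at=0, ttl=172_800)
print(name_servers_queried(stale, now=100_000, current_delegation=AZURE_NS))
print(name_servers_queried(stale, now=200_000, current_delegation=AZURE_NS))
print(name_servers_queried(None, now=100_000, current_delegation=AZURE_NS))
```

In this model, a resolver with an unexpired legacy delegation keeps asking the old name servers, and because the legacy zone still answers, it keeps receiving the old IP. That is why the answer key favors removing the legacy zone over waiting.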

Answer Key — Scenario 2

Answer: B

The correct logic here is to prepare the correction before executing it. With a TTL of 3600 seconds, any change made now will take up to an hour to reach resolvers that have already cached the record. Reducing the TTL to 60 seconds now means that resolvers still holding the record with the old TTL will have refreshed it within an hour, well before the maintenance window opens in 4 hours. When the correction is applied during the window, every cache will expire the old value within about a minute, minimizing impact.

Option A is a technically valid action, but applied at the wrong time: correcting the record now, with a TTL of 3600, means propagation will take up to an hour, during which delivery will continue failing or being inconsistent, outside the team's control. Option C seems reasonable, but adds unnecessary complexity and doesn't solve the main failure; the contingency server would only receive emails that the incorrect primary server cannot deliver, which depends on the sending server's retry behavior. Option D ignores a useful window for preparatory action and leaves the problem unmitigated for 4 more hours.
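The TTL arithmetic behind this answer can be worked through with a short sketch. The function name and its parameters are hypothetical, and the model assumes resolvers honor TTLs exactly, which real resolvers do not always do.

```python
def worst_case_stale_seconds(old_ttl: int, new_ttl: int,
                             lowered_at: int, changed_at: int) -> int:
    """Longest time after the record change that a cache may still serve the old value.

    Times are seconds on an arbitrary clock; `lowered_at` is when the TTL was
    reduced and `changed_at` is when the record value was corrected.
    """
    if changed_at < lowered_at:
        raise ValueError("the record change must not precede the TTL reduction")
    # A resolver that cached just before the TTL was lowered keeps the entry
    # until lowered_at + old_ttl.
    remaining_old = lowered_at + old_ttl - changed_at
    # A resolver that cached just before the change keeps it for new_ttl.
    return max(new_ttl, remaining_old)


# Lower the TTL now (t = 0), correct the record at the window (t = 4 hours).
print(worst_case_stale_seconds(3600, 60, lowered_at=0, changed_at=4 * 3600))
# Correct the record now without lowering the TTL first.
print(worst_case_stale_seconds(3600, 3600, lowered_at=0, changed_at=0))
```

Under these assumptions the first call yields 60 seconds and the second 3600 seconds, which is the difference between option B and option A.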

Answer Key — Scenario 3

Answer: B

The evidence in the statement points directly to the cause: direct query to the child zone name server works perfectly, but public resolvers return NXDOMAIN. This means the record exists and is correct in the child zone, but the delegation path is broken. The classic cause of this symptom is the absence of NS records in the parent zone that point to the child zone's name servers. Without these NS records in contoso.com, resolvers traversing the hierarchy reach the parent zone, find no delegation for dev.contoso.com, and return NXDOMAIN.

The information about the private IP address (10.0.1.50) is deliberately irrelevant to the diagnosis: public resolvers do not filter responses with RFC 1918 IPs at the DNS layer; they return what the authoritative server responds. NXDOMAIN is an authoritative statement that the name does not exist, in this case from the parent zone's name servers, which have no delegation for dev.contoso.com; it does not indicate content filtering. Option C is false: Azure DNS supports subdomain delegation within the same subscription without restriction. Option D describes behavior that doesn't exist in the DNS protocol; parent and child zones routinely use different name servers.
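The missing-delegation symptom can be reproduced with a toy model of the parent zone. This is a simplified sketch, not how Azure DNS is implemented: the function, the record layout, and the 203.0.113.10 address are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Records = Dict[Tuple[str, str], List[str]]


def answer_from_parent(records: Records, zone: str, qname: str) -> Tuple[str, List[str]]:
    """Toy model of how the parent zone's server answers a query."""
    qname = qname.rstrip(".")
    labels = qname.split(".")
    # Check each ancestor of the name, below the zone apex, for an NS delegation.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate == zone:
            break
        if (candidate, "NS") in records:
            return ("REFERRAL", records[(candidate, "NS")])
    # No delegation: the parent answers authoritatively from its own data.
    if (qname, "A") in records:
        return ("NOERROR", records[(qname, "A")])
    return ("NXDOMAIN", [])


# Parent zone without delegation for dev.contoso.com (hypothetical data).
parent = {("www.contoso.com", "A"): ["203.0.113.10"]}
print(answer_from_parent(parent, "contoso.com", "api.dev.contoso.com"))

# Same zone after adding the NS record set for the child zone.
fixed = dict(parent)
fixed[("dev.contoso.com", "NS")] = ["ns1-05.azure-dns.com"]
print(answer_from_parent(fixed, "contoso.com", "api.dev.contoso.com"))
```

Without the NS record set the parent has nothing to say about names under dev.contoso.com, so the lookup ends in NXDOMAIN. With it, the resolver is referred to the child zone's name servers, where the record already exists.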

Answer Key — Scenario 4

Answer: A

The correct sequence is Q, R, P, T, S, which follows the reasoning of progressive elimination from simplest to most complex:

Q — First check if the record exists in Azure DNS. If it doesn't exist, the cause is identified immediately.
R — Confirm if delegation at the registrar is correct. There's no point in having the record exist if name servers aren't delegated correctly.
P — Query the authoritative name server directly to confirm it responds correctly for the record.
T — Compare the result from the authoritative name server with that from a public resolver to isolate whether the problem is at the source or in propagation.
S — Use dig +trace as the final step to map exactly at which level of the hierarchy resolution fails, confirming the diagnosis.

Option B starts with the most complex tool (dig +trace), which is inefficient: if the record simply doesn't exist in Azure, the entire trace is unnecessary. Option C starts at the registrar before confirming that the record exists in the portal, the most basic check, and runs dig +trace before the simpler comparison against a public resolver. Option D queries the authoritative name server before confirming delegation at the registrar, one of the most common causes of external resolution failure, and also places dig +trace ahead of the public resolver comparison.
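The progressive-elimination order can be expressed as a short sketch that stops at the first failing check. The step descriptions paraphrase the steps in the scenario; the function and the result dictionary are hypothetical, and the check outcomes are supplied by hand rather than gathered from live DNS queries.

```python
from typing import Dict, Optional, Tuple

# Checks in the order of answer A, from most basic to most specific.
STEPS = [
    ("Q", "record exists in the Azure DNS zone"),
    ("R", "registrar lists the Azure name servers"),
    ("P", "authoritative name server returns the record"),
    ("T", "public resolver matches the authoritative answer"),
    ("S", "dig +trace completes the full resolution path"),
]


def first_failure(results: Dict[str, bool]) -> Optional[Tuple[str, str]]:
    """Return the first step whose check failed, or None if every check passed."""
    for step, description in STEPS:
        if not results[step]:
            return (step, description)
    return None


# Hypothetical outcome: the record exists, but the registrar delegation is wrong.
observed = {"Q": True, "R": False, "P": True, "T": False, "S": False}
print(first_failure(observed))
```

Stopping at the earliest failed check points to the delegation problem directly, while the later failures at T and S are consequences of it.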

Troubleshooting Tree: Design Public DNS Zones

Color Legend:

Color	Node Type
Dark Blue	Initial symptom (entry point)
Blue	Diagnostic question
Red	Identified cause
Green	Recommended action or resolution
Orange	Validation or intermediate verification

To use this tree when facing a real problem, start at the root node describing the observed symptom and follow the branches by answering each question based on what you can directly verify in the environment. Blue questions require active verification before proceeding; don't skip steps. When you reach a red node, the cause is identified; the immediately connected green node indicates the corrective action. Orange nodes indicate intermediate validations that confirm or rule out hypotheses before proceeding to the next branch.
