Troubleshooting Lab: Design and Implement Azure DNS Private Resolver

Diagnostic Scenarios

Scenario 1 — Root Cause

A company's network team reports that VMs hosted in a Spoke VNet cannot resolve domain names belonging to the prod.azure.internal private DNS zone, which is linked to the Hub VNet. VMs in the Hub VNet itself resolve the same names without any issues.

The environment was configured as follows:

Hub VNet (10.1.0.0/16)
  └── Private zone "prod.azure.internal" linked (auto-registration: disabled)
  └── DNS Private Resolver
        ├── Inbound Endpoint: 10.1.1.4 (subnet: snet-dns-inbound)
        └── Outbound Endpoint (subnet: snet-dns-outbound)
              └── Ruleset: forwards "corp.local" to 192.168.0.53

Spoke VNet (10.2.0.0/16)
  └── Peering with Hub: enabled (bidirectional)
  └── DNS Server: 10.1.1.4

During investigation, the team executes the following test from a VM in the Spoke VNet:

$ nslookup app01.prod.azure.internal 10.1.1.4
Server:  10.1.1.4
Address: 10.1.1.4#53

** server can't find app01.prod.azure.internal: NXDOMAIN

The team confirms that peering is active, that the Spoke VNet DNS server correctly points to the inbound endpoint, and that the VM can ping 10.1.1.4. The forwarding ruleset for corp.local was validated and works normally from the Spoke VNet.

What is the root cause of the prod.azure.internal resolution failure?

A) The inbound endpoint does not have the capability to resolve private DNS zones when the query originates from a VNet different from the one hosting the resolver

B) The private zone prod.azure.internal is not linked to the Spoke VNet, therefore the resolver cannot see it when responding to queries from this network

C) Auto-registration is disabled in the private zone, preventing VM records from the Spoke VNet from being created and resolved

D) Peering between Hub and Spoke does not propagate DNS configuration, requiring the DNS server to be configured directly in the Hub VNet

Scenario 2 — Action Decision

The operations team identified that DNS queries for the finance.corp.local domain are failing from all spoke VNets in a hub-and-spoke topology. The cause has been confirmed: the DNS Forwarding Ruleset associated with the Hub's outbound endpoint contains a rule for finance.corp.local pointing to an on-premises DNS server that was decommissioned. The correct address for the new server is 10.0.200.20.

The environment is in production. The ruleset-corp ruleset is shared by four spoke VNets and contains 11 other active rules that are working normally. A maintenance window is scheduled in 6 hours.

What is the correct action to take at this moment?

A) Delete the entire ruleset-corp ruleset and recreate all 12 rules with the correct address during the maintenance window

B) Remove the ruleset association from all spoke VNets immediately to stop incorrect forwarding while the fix is planned

C) Edit only the finance.corp.local rule in the existing ruleset, updating the destination address to 10.0.200.20, without waiting for the maintenance window

D) Create a new ruleset with the correct rule and associate it with the spoke VNets, keeping the old ruleset active to preserve the other rules

Scenario 3 — Root Cause

A network engineer receives a ticket reporting that on-premises servers cannot resolve names of VMs registered in the vm.internal private zone in Azure. Hybrid connectivity is established via Site-to-Site VPN, and the tunnel is active with normal latency.

The engineer collects the following information:

On-premises DNS server: 192.168.1.10 (Windows Server)
Conditional forwarder configured on 192.168.1.10:
  Domain: vm.internal
  Forwards to: 10.0.2.5

DNS Private Resolver (Hub VNet):
  Inbound Endpoint: 10.0.1.4 (subnet: snet-inbound /28)
  Outbound Endpoint: 10.0.2.5 (subnet: snet-outbound /28)
  Zone "vm.internal" linked to Hub VNet

Test executed on the on-premises server:

C:\> nslookup web01.vm.internal 10.0.2.5
DNS request timed out.
    timeout was 2 seconds.
*** Request to 10.0.2.5 timed out

The engineer verifies that the VPN tunnel is active, that 10.0.1.4 responds correctly when queried directly from the on-premises server, and that VMs within the Hub VNet resolve web01.vm.internal without problems. The Network Security Group (NSG) of the snet-inbound subnet allows UDP/53 traffic from any source.

What is the root cause of the failure?

A) The NSG of the snet-inbound subnet is blocking queries from the on-premises server, as the allow rule is not specific enough

B) The conditional forwarder on the on-premises server is pointing to the outbound endpoint (10.0.2.5) instead of the inbound endpoint (10.0.1.4), which is the correct component to receive external queries

C) The vm.internal zone needs to be linked also to the outbound endpoint subnet so the resolver can respond to it

D) The on-premises DNS server needs to use TCP protocol instead of UDP to forward queries through the VPN tunnel

Scenario 4 — Diagnostic Sequence

A VM in a Spoke VNet cannot resolve names from the legacy.corp domain forwarded to an on-premises DNS server. Below are five possible investigation steps, presented out of order:

[P1] Verify if the DNS Forwarding Ruleset is associated with the Spoke VNet

[P2] Query the destination on-premises DNS server directly to confirm
     it responds to the "legacy.corp" domain

[P3] Execute nslookup for "legacy.corp" from the affected VM and capture
     the returned error

[P4] Confirm the outbound endpoint is provisioned and associated with the
     correct ruleset

[P5] Verify if the rule for "legacy.corp" exists in the ruleset and if the
     destination IP is correct

What is the correct investigation sequence?

A) P3 → P1 → P4 → P5 → P2

B) P1 → P3 → P5 → P4 → P2

C) P3 → P4 → P1 → P5 → P2

D) P2 → P3 → P1 → P5 → P4

Answer Key and Explanations

Answer Key — Scenario 1

Answer: B

The decisive clue is in the asymmetric behavior: the Hub VNet resolves the zone normally, but the Spoke VNet does not. The resolver's inbound endpoint responds to queries in the context of zones linked to the VNet where the resolver is provisioned. For a Spoke VNet to also see a private zone when querying this resolver, the zone needs to be linked to the Spoke VNet independently.

The information about the correct functioning of forwarding for corp.local is intentionally irrelevant: it confirms that the resolver is operational and that the ruleset works, but has no relation to private zone visibility.

Alternative A is false: the inbound endpoint resolves private zones for any origin, as long as the zone is correctly linked. Alternative C confuses auto-registration (the mechanism that automatically creates A records for VMs) with the ability to resolve the zone: disabling auto-registration does not prevent queries for manually created records. Alternative D describes behavior that does not exist: DNS server settings are configured per VNet and are not affected by peering, and the Spoke VNet already points to the inbound endpoint.

The most dangerous distractor is C: a disabled auto-registration setting looks suspicious and can steer the diagnosis toward the zone's records instead of the zone's virtual network link.
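The nslookup output in this scenario is informative in its own right: an NXDOMAIN reply means the queried server answered, which points away from a reachability problem, whereas a timeout (as in Scenario 3) means no reply arrived at all. A minimal sketch of that distinction follows. The function name `classify_nslookup` and the string heuristics are assumptions made for illustration, not part of any Azure tooling; the sample outputs are copied from the scenarios in this lab.

```python
import re

# Outputs copied from Scenario 1 and Scenario 3 of this lab.
SCENARIO_1_OUTPUT = (
    "Server:  10.1.1.4\n"
    "Address: 10.1.1.4#53\n"
    "\n"
    "** server can't find app01.prod.azure.internal: NXDOMAIN\n"
)
SCENARIO_3_OUTPUT = (
    "DNS request timed out.\n"
    "    timeout was 2 seconds.\n"
    "*** Request to 10.0.2.5 timed out\n"
)


def classify_nslookup(output: str) -> str:
    """Return 'timeout', 'nxdomain', 'answer', or 'unknown' for raw nslookup text.

    Hypothetical helper: the matching rules are simple heuristics, not a parser.
    """
    text = output.lower()
    if "timed out" in text:
        return "timeout"    # no reply came back from the queried address
    if "nxdomain" in text or "non-existent domain" in text:
        return "nxdomain"   # the server replied, but the name is not visible to it
    if re.search(r"^name:\s*\S+", text, flags=re.MULTILINE):
        return "answer"     # a Name: line typically appears on a successful lookup
    return "unknown"


if __name__ == "__main__":
    print(classify_nslookup(SCENARIO_1_OUTPUT))
    print(classify_nslookup(SCENARIO_3_OUTPUT))
```

Classifying the symptom first keeps the investigation on the right layer: a reply of any kind moves attention to what the server can see, while a timeout moves it to where the query was sent.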

Answer Key — Scenario 2

Answer: C

The cause is known, the impact is confined to a single domain rule, and the other 11 rules in the ruleset are working normally. Editing only the faulty rule is the precise, proportionate action and carries no risk to the rest of the environment. Modifying a rule in an existing ruleset does not require a service interruption or a maintenance window, because it does not affect the other rules or the ruleset's associations with the spoke VNets.
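The scope of option C can be sketched with a small in-memory model. This is not the Azure API: the `Ruleset` type, the `update_rule_target` helper, the old server address `10.0.200.10`, and the eleven placeholder domains are all hypothetical and exist only to show that a single-rule edit leaves every other rule untouched.

```python
from typing import Dict, List

# Hypothetical model: domain name -> list of target DNS server IPs.
Ruleset = Dict[str, List[str]]


def update_rule_target(ruleset: Ruleset, domain: str, new_targets: List[str]) -> Ruleset:
    """Return a copy of the ruleset with only one rule's targets replaced."""
    if domain not in ruleset:
        raise KeyError(f"no rule for {domain!r}")
    updated = {name: list(targets) for name, targets in ruleset.items()}
    updated[domain] = list(new_targets)
    return updated


def build_example_ruleset() -> Ruleset:
    """Build a 12-rule ruleset; every value except the domain name is a placeholder."""
    rules: Ruleset = {f"app{i:02d}.corp.local": ["192.168.0.53"] for i in range(1, 12)}
    rules["finance.corp.local"] = ["10.0.200.10"]  # assumed address of the decommissioned server
    return rules


if __name__ == "__main__":
    before = build_example_ruleset()
    after = update_rule_target(before, "finance.corp.local", ["10.0.200.20"])
    changed = [name for name in before if before[name] != after[name]]
    print(changed)
```

In this model the only entry that differs between `before` and `after` is the `finance.corp.local` rule, which mirrors why the edit does not need to wait for the maintenance window.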

Alternative A is destructive and unnecessary: deleting the entire ruleset would interrupt the 11 working rules and require recreating everything, causing far greater impact than the original problem. Alternative B removes forwarding for every domain in the ruleset from all spoke VNets until the fix is made, trading a single-domain problem for a broad outage. Alternative D does not solve the problem cleanly: the old ruleset would still hold the stale finance.corp.local rule, and two active rulesets covering the same VNets create ambiguity about which rule applies.

Answer Key — Scenario 3

Answer: B

The clue is in the collected data itself: the inbound endpoint 10.0.1.4 responds correctly when queried directly, but the conditional forwarder on the on-premises server points to 10.0.2.5, which is the outbound endpoint. The outbound endpoint does not accept external DNS queries: it is exclusively an outbound interface for forwarding queries from Azure to external destinations. Sending queries to it from outside the virtual network results in timeout, exactly as observed.
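The comparison that exposes this misconfiguration can be sketched in a few lines using Python's standard `ipaddress` module. The helper name `check_forwarder_target` and its return labels are hypothetical, and the outbound subnet range `10.0.2.0/28` is an assumption: the scenario gives only the /28 prefix length and the address 10.0.2.5.

```python
import ipaddress


def check_forwarder_target(target: str, inbound_ip: str, outbound_subnet: str) -> str:
    """Compare a conditional forwarder target with the resolver's endpoints.

    Returns 'ok' when the target is the inbound endpoint,
    'targets-outbound-subnet' when it falls inside the outbound subnet,
    and 'unknown-target' otherwise.
    """
    addr = ipaddress.ip_address(target)
    if addr == ipaddress.ip_address(inbound_ip):
        return "ok"
    if addr in ipaddress.ip_network(outbound_subnet):
        return "targets-outbound-subnet"
    return "unknown-target"


if __name__ == "__main__":
    # Values from Scenario 3; the subnet range is assumed, not given in the lab.
    print(check_forwarder_target("10.0.2.5", "10.0.1.4", "10.0.2.0/28"))
    print(check_forwarder_target("10.0.1.4", "10.0.1.4", "10.0.2.0/28"))
```

Run against the collected data, the first call flags the forwarder as pointing into the outbound subnet, and the second shows the result once the forwarder targets 10.0.1.4.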

The VPN tunnel status and normal latency are distractors: they confirm network connectivity but do not rule out a misconfigured forwarder destination. The NSG on the snet-inbound subnet is also a false clue: the forwarder sends its queries to 10.0.2.5 in snet-outbound, so they never reach snet-inbound, and the direct test against 10.0.1.4 already shows that the inbound path is open.

Alternative D is the most dangerous distractor: the TCP vs UDP protocol issue in VPN tunnels is a legitimate concern in other contexts, but does not explain the observed timeout when the inbound endpoint responds normally through the same network path.

Answer Key — Scenario 4

Answer: A

The correct sequence is P3 → P1 → P4 → P5 → P2.

Progressive diagnostic reasoning always starts from the observed symptom before any assumption about the cause. P3 establishes the starting point: what error the VM returns. Then, P1 verifies the most common failure condition in this scenario (ruleset not associated with Spoke VNet), as without this association no forwarding rule is applied. P4 confirms that the outbound endpoint exists and is linked to the correct ruleset, validating the forwarding chain. P5 goes down to the specific rule level to verify if the domain and destination IP are correct. Finally, P2 validates the destination server in isolation, confirming if the problem is in Azure or the on-premises server.
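The ordering above can be expressed as a short runner that observes the symptom first and then walks the remaining checks until one fails. This is a sketch under stated assumptions: the function `run_diagnosis`, the callables passed to it, and the result dictionary are hypothetical stand-ins for whatever portal, CLI, or nslookup checks the engineer actually performs.

```python
from typing import Callable, Dict, Optional

# Order of the checks that follow the symptom capture (P3), per the answer key.
CHECK_ORDER = ["P1", "P4", "P5", "P2"]


def run_diagnosis(
    observe_symptom: Callable[[], Optional[str]],
    checks: Dict[str, Callable[[], bool]],
) -> Dict[str, Optional[str]]:
    """Run P3 first, then P1, P4, P5, P2, stopping at the first failing check.

    observe_symptom returns the captured error text, or None if the lookup works.
    Each check returns True when that layer looks healthy.
    """
    symptom = observe_symptom()  # P3: capture the error before assuming a cause
    if symptom is None:
        return {"symptom": None, "failed_step": None}
    for step in CHECK_ORDER:
        if not checks[step]():
            return {"symptom": symptom, "failed_step": step}
    return {"symptom": symptom, "failed_step": None}


if __name__ == "__main__":
    # Hypothetical outcome: the ruleset is not linked to the Spoke VNet (P1 fails).
    result = run_diagnosis(
        lambda: "NXDOMAIN",
        {"P1": lambda: False, "P4": lambda: True, "P5": lambda: True, "P2": lambda: True},
    )
    print(result)
```

Stopping at the first failing step reflects the reasoning in the answer key: there is little value in testing the on-premises server (P2) while a link earlier in the Azure forwarding chain is already known to be broken.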

Alternative D makes the classic mistake of validating the external server before confirming that the resolver's internal chain is intact, wasting diagnostic effort on a component that may be working normally.

Troubleshooting Tree: Design and Implement Azure DNS Private Resolver

Color Legend:

Color	Node Type
Dark Blue	Initial symptom (entry point)
Blue	Diagnostic question (binary decision)
Red	Identified cause
Green	Recommended action or resolution
Orange	Intermediate validation or verification

When facing a real failure involving Azure DNS Private Resolver, always start with the root node and answer each question based on what is directly observable: connectivity tests, nslookup outputs, configurations visible in the portal or via CLI. Follow the path that corresponds to the observed behavior, without skipping steps. Orange nodes indicate points where intermediate validation is necessary before continuing diagnosis. When reaching a red node, the cause is identified and the corresponding green node indicates the precise action to execute.
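The traversal rule described above (start at the root, answer each question from observations, stop at a cause and its action) can be sketched as a small tree walker. The node kinds come from the color legend; the tree contents below are a hypothetical fragment modeled on Scenario 3 and are not the actual troubleshooting tree of this lab.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment; node kinds follow the color legend.
TREE: Dict[str, dict] = {
    "start": {"kind": "symptom", "text": "On-premises lookup fails", "next": "q_timeout"},
    "q_timeout": {
        "kind": "question",
        "text": "Does nslookup against the forwarder target time out?",
        "yes": "v_target",
        "no": "other",
    },
    "v_target": {
        "kind": "validation",
        "text": "Compare the forwarder target with the inbound endpoint IP",
        "next": "c_outbound",
    },
    "c_outbound": {"kind": "cause", "text": "Forwarder points to the outbound endpoint", "next": "a_fix"},
    "a_fix": {"kind": "action", "text": "Point the forwarder to the inbound endpoint", "next": None},
    "other": {"kind": "validation", "text": "Continue on another branch (not modeled)", "next": None},
}


def walk(tree: Dict[str, dict], answers: Dict[str, bool], start: str = "start") -> List[Tuple[str, str]]:
    """Follow the tree from the root, using observed yes/no answers at question nodes."""
    path: List[Tuple[str, str]] = []
    node_id: Optional[str] = start
    while node_id is not None:
        node = tree[node_id]
        path.append((node["kind"], node["text"]))
        if node["kind"] == "question":
            node_id = node["yes"] if answers[node_id] else node["no"]
        else:
            node_id = node["next"]
    return path


if __name__ == "__main__":
    for kind, text in walk(TREE, {"q_timeout": True}):
        print(f"{kind}: {text}")
```

Because every question node requires an explicit answer in `answers`, the walker cannot skip a step, which is the discipline the paragraph above asks for.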
