Troubleshooting Lab: Integrate a Private Link service with on-premises clients

Diagnostic Scenarios

Scenario 1: Root Cause

A company connects its on-premises environment to Azure via ExpressRoute with private peering. A Private Endpoint was created in VNet vnet-prod (prefix 10.2.8.0/24) to expose an Azure SQL Database. The private DNS zone privatelink.database.windows.net was created and linked to vnet-prod. An Azure DNS Private Resolver was deployed in the same VNet with an Inbound Endpoint on IP 10.2.8.100, and on-premises DNS servers were configured to forward queries for the database.windows.net domain to this IP.

The team reports that DNS queries issued from on-premises servers return the correct Private Endpoint IP (10.2.8.50). However, TCP connections to this IP on port 1433 fail with a timeout.

Additional information collected by the team:

# Test executed from on-premises server
Test-NetConnection -ComputerName 10.2.8.50 -Port 1433

ComputerName : 10.2.8.50
RemoteAddress : 10.2.8.50
RemotePort : 1433
InterfaceAlias : Ethernet0
SourceAddress : 192.168.10.25
PingSucceeded : False
TcpTestSucceeded : False

# Route verified on on-premises gateway
Network Gateway Interface
10.2.0.0/16 via ExpressRoute GW-ER

The MTU configured on ExpressRoute links is 1500 bytes. The ExpressRoute circuit shows bandwidth utilization below 20%.

What is the root cause of the connectivity failure?

A) The DNS Private Resolver does not support forwarding queries for privatelink zones; a custom DNS server running on a VM is required.

B) The 10.2.0.0/16 route covers the Private Endpoint prefix, but the NSG associated with the Private Endpoint subnet is blocking traffic from source 192.168.10.0/24 on port 1433, and endpoint network policies are enabled on the subnet.

C) The ExpressRoute circuit does not advertise the 10.2.8.0/24 prefix to the on-premises environment because this prefix is not being propagated by the VNet gateway.

D) The Private Endpoint was created without manual approval of the connection by the service provider, and therefore the connection state is Pending, not Approved.


Scenario 2: Action Decision

The network team identified that on-premises clients cannot access a Private Endpoint because the Private Endpoint network policy (PrivateEndpointNetworkPolicies) is enabled on the subnet and an NSG is blocking traffic originating from the on-premises prefix. The cause was confirmed through NSG flow log analysis.

The environment is production, with an active SLA and a maintenance window only on Saturdays. Access to the Private Endpoint is needed only by a batch application that runs on-premises at 3 AM. Today is Thursday. The team has permission to modify NSGs and subnet policies without additional approval.

What is the correct action to take at this moment?

A) Immediately disable PrivateEndpointNetworkPolicies on the subnet to remove NSG restrictions and restore access without impact to other resources.

B) Add a permit rule to the NSG for the on-premises prefix on the correct port, applying the change now, since adding a permit rule does not cause interruption to existing connections.

C) Wait for Saturday's maintenance window to disable PrivateEndpointNetworkPolicies and adjust the NSG simultaneously, avoiding any changes in production outside the window.

D) Recreate the Private Endpoint in a subnet without an associated NSG, redirecting traffic immediately.


Scenario 3: Root Cause

A team configured a Private Link Service to expose an internal application hosted behind a Standard Internal Load Balancer. The service was published and a consumer in another subscription created a Private Endpoint pointing to this service. The connection status on the provider side appears as Pending.

The provider engineer checks the portal and observes:

Private Link Service: pls-app-prod
Alias: pls-app-prod.abc12345.brazilsouth.azure.privatelinkservice

Private Endpoint Connections:
Name: pe-consumer-001
State: Pending
Description: Awaiting approval
Consumer Subscription: sub-parceiro-01

The engineer reports that subscription sub-parceiro-01 was in the auto-approval list when the service was configured. He also confirms that the Load Balancer is healthy, that backends respond normally, and that the Private Link Service subnet has PrivateLinkServiceNetworkPolicies disabled.

What is the root cause of the connection's Pending state?

A) The Private Link Service subnet has PrivateLinkServiceNetworkPolicies enabled, preventing external connections from being automatically approved.

B) The Private Link Service alias was used by the consumer instead of the complete Resource ID, and connections via alias always require manual approval, regardless of the auto-approval list.

C) Subscription sub-parceiro-01 is not correctly included in the Private Link Service auto-approval list, either due to a typo in the ID or absence of the record.

D) The Private Endpoint was created in a different region from the Private Link Service region, and inter-regional connections require mandatory manual approval.


Scenario 4: Diagnostic Sequence

An engineer receives the following report: on-premises clients connected via site-to-site VPN are trying to access a service exposed by a Private Endpoint in the Azure VNet, but connections fail with a timeout. DNS correctly resolves the private IP of the endpoint.

The following investigation steps are available, out of order:

  1. Verify if the Private Endpoint subnet prefix is being advertised by the VPN gateway to the on-premises environment via BGP or static route.
  2. Confirm that the IP returned by on-premises DNS corresponds to the Private Endpoint private IP and not the service's public IP.
  3. Verify if there is an NSG on the Private Endpoint subnet with PrivateEndpointNetworkPolicies enabled that blocks the on-premises source prefix.
  4. Test TCP connectivity of the service port from a VM within the Azure VNet itself to the Private Endpoint IP.
  5. Review NSG flow logs to identify if traffic reaches the subnet and is dropped, or if it doesn't reach the subnet.

What is the correct investigation sequence?

A) 2 → 1 → 4 → 5 → 3

B) 1 → 2 → 3 → 4 → 5

C) 4 → 2 → 1 → 3 → 5

D) 2 → 4 → 1 → 5 → 3


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The decisive clue is the combination of three data points: DNS resolves correctly (ruling out a DNS problem), the 10.2.0.0/16 route on the on-premises gateway covers the destination (ruling out a missing route), and both the ping and the TCP test get no response (consistent with traffic being silently dropped rather than refused). The explanation that fits a TCP timeout while routing is apparently correct is an NSG blocking the traffic. For the NSG to act on traffic destined for the Private Endpoint IP, PrivateEndpointNetworkPolicies must be enabled on the subnet; this is the prerequisite for NSG rules to apply to Private Endpoints.

The information about MTU and circuit utilization is intentionally irrelevant: traffic doesn't even reach the endpoint for MTU to matter. Alternative C can be dismissed because the on-premises gateway already holds a 10.2.0.0/16 route via ExpressRoute, and that route covers 10.2.8.0/24. Alternative D would be visible in the Private Endpoint connection state rather than appearing as a network-level TCP timeout. Alternative A is factually incorrect: DNS Private Resolver supports privatelink zones, and the scenario states that resolution already returns the correct IP.
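The route-coverage argument used to dismiss alternative C can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module with the prefixes from the scenario; the helper name `route_covers` is an illustrative assumption, not part of any Azure tooling.

```python
import ipaddress

# Values taken from the scenario.
advertised_route = ipaddress.ip_network("10.2.0.0/16")
pe_subnet = ipaddress.ip_network("10.2.8.0/24")
pe_ip = ipaddress.ip_address("10.2.8.50")


def route_covers(route, destination):
    """Return True if the destination (address or network) falls inside the route prefix."""
    if isinstance(destination, (ipaddress.IPv4Network, ipaddress.IPv6Network)):
        return destination.subnet_of(route)
    return destination in route


print(route_covers(advertised_route, pe_subnet))  # True
print(route_covers(advertised_route, pe_ip))      # True
```

This only shows that the on-premises gateway has a matching route toward ExpressRoute. It says nothing about what happens to the packets once they arrive at the subnet, which is where the NSG comes in.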

The most dangerous distractor is C, as the engineer could spend hours investigating the ExpressRoute circuit while the real problem is in an NSG local to the subnet.


Answer Key: Scenario 2

Answer: B

The cause has already been identified and confirmed: the NSG is blocking the traffic while the policy is enabled. The scenario's constraints are an active production SLA and a Saturday-only maintenance window. The correct action is to add a permit rule to the NSG now, because adding an Allow rule to an NSG does not interrupt existing connections. It is a non-destructive operation that solves the problem immediately and falls within the team's existing permissions.

Alternative A (disabling PrivateEndpointNetworkPolicies) would be a broader scope change: it would remove the ability to apply NSGs and UDRs to all Private Endpoints in the subnet, potentially affecting other endpoints and security controls. In production, this change has collateral impact and should be carefully evaluated. Alternative C ignores the fact that the batch application fails every night until Saturday, which is unacceptable. Alternative D recreates the endpoint, which causes complete service downtime during recreation.
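To see why an added Allow rule resolves the block without touching anything else, it helps to model how NSG rules are evaluated: in priority order, lowest number first, with the first match deciding the outcome. The sketch below is a simplified model, not the Azure implementation. The rule names, priorities, and the deny rule itself are hypothetical, and the addresses and port are borrowed from Scenario 1 purely for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number is evaluated first
    source: str    # source CIDR prefix
    port: int      # destination port
    access: str    # "Allow" or "Deny"


def evaluate(rules, source_ip, port):
    """Return (access, rule name) of the first matching rule in priority order."""
    src = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        if src in ipaddress.ip_network(rule.source) and rule.port == port:
            return rule.access, rule.name
    # Simplification: the real default rules are collapsed into one implicit deny.
    return "Deny", "implicit-deny"


# Hypothetical existing rule: a broad deny that covers the on-premises range.
rules = [Rule("deny-onprem", 200, "192.168.0.0/16", 1433, "Deny")]
before = evaluate(rules, "192.168.10.25", 1433)

# The fix from alternative B: an Allow rule with a lower priority number.
rules.append(Rule("allow-onprem-sql", 150, "192.168.10.0/24", 1433, "Allow"))
after = evaluate(rules, "192.168.10.25", 1433)

print(before)  # ('Deny', 'deny-onprem')
print(after)   # ('Allow', 'allow-onprem-sql')
```

The model also shows the one way this fix can fail: if the new Allow rule is given a higher priority number than the rule that currently denies the traffic, the deny still matches first and nothing changes.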


Answer Key: Scenario 3

Answer: C

The statement says the subscription was in the auto-approval list "when the service was configured". This doesn't guarantee that the subscription ID is correct today. The most accurate cause is that subscription sub-parceiro-01 is not effectively in the Private Link Service auto-approval list with the correct ID. This can occur due to typos in the subscription GUID, inadvertent removal from the list, or using a subscription alias instead of the ID.
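The failure modes listed above (a typo in the GUID, a missing entry, or a name used in place of the ID) can be illustrated with a small comparison routine. This is a sketch only: the GUIDs are made up, the function names are hypothetical, and in practice the list would be read from the Private Link Service configuration rather than hard-coded.

```python
import uuid


def normalize_subscription_id(value):
    """Return the canonical lowercase GUID string, or None if the value is not a GUID."""
    try:
        return str(uuid.UUID(value.strip()))
    except (ValueError, AttributeError):
        return None


def is_auto_approved(consumer_id, auto_approval_list):
    """True only if the consumer GUID matches a valid GUID entry in the list."""
    consumer = normalize_subscription_id(consumer_id)
    if consumer is None:
        return False
    approved = {normalize_subscription_id(entry) for entry in auto_approval_list}
    return consumer in approved


# Hypothetical consumer subscription ID.
consumer = "11111111-2222-3333-4444-555555555555"

# Hypothetical list with a one-character typo and a display name instead of an ID.
configured = ["11111111-2222-3333-4444-555555555556", "sub-parceiro-01"]

print(is_auto_approved(consumer, configured))   # False
print(is_auto_approved(consumer, [consumer]))   # True
```

Comparing the consumer's actual subscription GUID against each configured entry, character by character, is usually faster than re-reading the list by eye.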

Alternative A is dismissed by the statement itself, which confirms that PrivateLinkServiceNetworkPolicies is disabled. Alternative B is factually incorrect: connections via alias can also be automatically approved if the subscription is in the list. Alternative D is incorrect: Private Link supports inter-regional connectivity and doesn't require manual approval for this reason.

The information about Load Balancer and backend health is irrelevant to the Pending state, which is an access control mechanism in the management plane, not a matter of data-plane connectivity.


Answer Key: Scenario 4

Answer: A

The correct sequence is: 2 → 1 → 4 → 5 → 3.

The progressive diagnostic reasoning starts from what has already been partially confirmed (DNS resolves) and advances in layers:

Step 2 confirms that the returned IP is indeed the private one and not the public one, validating the starting point.

Step 1 verifies whether the endpoint prefix is being advertised to on-premises, because without a route, traffic never leaves the client.

Step 4 isolates whether the problem is in the network between on-premises and Azure or in the endpoint itself, by testing from within the VNet. If it works from inside, the problem is in the on-premises path.

Step 5 uses flow logs to determine whether traffic reaches the subnet or is dropped before reaching it, distinguishing a routing problem from an NSG problem.

Step 3 is the most specific check and only makes sense after step 5 has confirmed that traffic does reach the subnet.
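The layered reasoning above can be written down as an ordered checklist that stops at the first check that has not passed. The sketch below is one possible encoding; the observation keys and finding texts are hypothetical labels chosen for this example, and a missing observation is treated as "not yet verified".

```python
# Checks in the order of answer A (2 → 1 → 4 → 5 → 3).
# Each entry: (step number, observation key, finding if the check does not pass).
SEQUENCE = [
    (2, "dns_returns_private_ip", "DNS returns the public IP instead of the private one"),
    (1, "prefix_advertised_on_prem", "Endpoint prefix is not advertised to on-premises"),
    (4, "tcp_ok_from_inside_vnet", "Endpoint fails even from inside the VNet"),
    (5, "flow_logs_show_traffic_at_subnet", "Traffic never reaches the subnet"),
    (3, "nsg_allows_on_prem_source", "NSG blocks the on-premises source prefix"),
]


def first_failure(observations):
    """Return (step, finding) for the first check that has not passed, or None."""
    for step, key, finding in SEQUENCE:
        if not observations.get(key, False):
            return step, finding
    return None


# Example: everything checks out until the NSG rules are inspected.
observed = {
    "dns_returns_private_ip": True,
    "prefix_advertised_on_prem": True,
    "tcp_ok_from_inside_vnet": True,
    "flow_logs_show_traffic_at_subnet": True,
    "nsg_allows_on_prem_source": False,
}
print(first_failure(observed))  # (3, 'NSG blocks the on-premises source prefix')
```

Calling `first_failure({})` returns step 2, which matches the idea that the sequence always starts by validating what DNS actually returned.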

Alternative B starts with routing verification before confirming DNS state, losing the reference that DNS has already been partially validated in the statement. Alternative C reverses logic by testing from within the VNet before validating if on-premises routing exists, wasting the fastest elimination step.


Color Legend:

Color | Node Type
Dark Blue | Initial symptom (entry point)
Blue | Diagnostic question
Red | Identified cause
Green | Recommended action or resolution
Orange | Intermediate validation or verification

To use this tree when facing a real problem, start at the root node and answer each diagnostic question based on what is observable in the environment, without assuming causes. Each branch eliminates a hypothesis and directs to the next verification. When you reach a red node (identified cause), execute the corresponding action in the green node connected to it. In validation nodes (orange), the test result determines which path to follow, preventing an incorrect assumption from consuming diagnostic time.