Troubleshooting Lab: Create a Private Link service
Diagnostic Scenarios
Scenario 1: Root Cause
A company's platform team exposed an internal service through a Private Link service associated with an internal Standard Load Balancer. The consumer team created a Private Endpoint in their VNet, the provider approved the connection, and the status shows Approved. However, every attempt to connect to the application from a consumer VM times out.
The responsible engineer collected the following information:
# Connectivity test from consumer VM
Test-NetConnection -ComputerName 10.100.5.4 -Port 443
TcpTestSucceeded : False
PingSucceeded : False
# Private Endpoint status (portal / CLI)
ProvisioningState : Succeeded
ConnectionState : Approved
# NSG applied to Private Endpoint subnet (consumer)
Rule: AllowVnetInBound -- Allow -- Any -- Any
Rule: DenyAllInBound -- Deny -- Any -- Any
# NSG applied to provider NAT subnet
Rule: AllowAzureLoadBalancerInBound -- Allow
Rule: DenyAllInBound -- Deny -- Any -- Any
Rule: AllowVnetInBound -- Allow -- Any -- Any
The provider's Load Balancer is healthy, backends respond on port 443, and the TLS certificate is valid. The Private Endpoint subnet has only one provisioned endpoint, with no other resources.
What is the root cause of the connectivity failure?
A) The Approved status is displayed before DNS propagation is complete, and the timeout occurs because the Private Endpoint IP has not yet been resolved correctly.
B) The NSG applied to the provider's NAT subnet is blocking return traffic, as the Private Link service network policy requires that the NAT subnet NSG be disabled or that explicit rules allow traffic originating from NAT IPs.
C) The NSG applied to the consumer's Private Endpoint subnet is blocking outbound traffic from the VM toward the endpoint, as there is no explicit outbound rule allowing port 443.
D) The Private Link Network Policy is enabled on the provider's NAT subnet, which prevents NAT IP allocation and causes timeouts in connections.
Scenario 2: Action Decision
The cause of the problem has been identified: the Private Link service in a production environment was configured with Public visibility, allowing any subscription in the organization to discover the alias and create Private Endpoints. The security team detected three pending connections from unknown, unauthorized subscriptions.
The service is in production and serves legitimate connections from two approved consumers. Any interruption must be avoided. The engineer has write permission on the Private Link service resource and needs to act immediately.
What is the correct action to take at this time?
A) Recreate the Private Link service with Restricted visibility, listing only authorized subscriptions, and recreate the Private Endpoints of legitimate consumers pointing to the new service.
B) Reject the three unauthorized pending connections and then change the Private Link service visibility to Restricted, adding only legitimate subscriptions, without interrupting already approved connections.
C) Immediately delete the Private Link service and recreate with the correct configuration, as visibility cannot be changed on an existing service.
D) Keep visibility as Public and add an NSG on the NAT subnet blocking IPs from unauthorized subscriptions, using pending connections as reference.
Scenario 3: Root Cause
A provider configured a Private Link service with automatic approval enabled for the consumer's subscription. The consumer created a Private Endpoint and received Approved status immediately. Even so, the application returns a connection refused error on port 8080.
# Output from consumer command
curl -v http://10.100.7.10:8080/health
* Trying 10.100.7.10:8080...
* connect to 10.100.7.10 port 8080 failed: Connection refused
* Failed to connect to 10.100.7.10 port 8080
# Private Link service configuration (summary)
Frontend IP Configuration: 10.0.2.20 (Standard Internal LB)
Load Balancer: lb-producao-001
NAT Subnet: pls-nat-subnet (10.0.3.0/28)
Private Link Network Policy: Disabled
# Load Balancer lb-producao-001 -- Load balancing rules
| Name | Frontend IP | Frontend Port | Backend Port | Protocol |
|---------------|--------------|----------------|---------------|----------|
| rule-https | 10.0.2.20 | 443 | 443 | TCP |
| rule-http | 10.0.2.20 | 80 | 80 | TCP |
The platform team reports that the service certificate was renewed two days ago and that the NAT subnet has 12 available IPs.
What is the root cause of the observed error?
A) The certificate renewal two days ago caused a reset in active Private Link service connections, and new attempts fail until the service is restarted.
B) The Private Link service is associated with the correct Load Balancer frontend IP, but there is no load balancing rule configured for port 8080, causing traffic to reach the Load Balancer and be dropped due to the absence of a corresponding rule.
C) The NAT subnet with /28 prefix is insufficient for the volume of simultaneous connections, and NAT IP exhaustion causes refusal of new connections.
D) The Private Link Network Policy is disabled on the NAT subnet, which prevents traffic on port 8080 from being forwarded correctly to the backend.
Scenario 4: Diagnostic Sequence
A consumer reports that the Private Endpoint created for a provider service has Disconnected status. The provider confirms that the Private Link service exists and is operational for other consumers. Neither side reports any recent changes.
The investigation steps below are out of order:
1. Verify if the provider's Private Link service still exists and is in Succeeded state
2. Confirm if the consumer is using the correct Resource ID or alias in the Private Endpoint configuration
3. Check the Private Endpoint activity history to identify if there was any recent delete or recreation operation
4. Confirm if the connection on the provider side was rejected, removed, or was never approved
5. Verify if the consumer's private DNS is resolving the endpoint FQDN to the correct IP
Which sequence represents the correct diagnostic reasoning, from most comprehensive to most specific?
A) 2 → 1 → 4 → 3 → 5
B) 1 → 4 → 3 → 2 → 5
C) 5 → 2 → 1 → 4 → 3
D) 3 → 1 → 4 → 2 → 5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: D
The decisive clue is what the collected data does not show: nothing indicates that Private Link Network Policy is disabled on the provider's NAT subnet. For the Private Link service to function correctly, this policy must be disabled on the subnet designated as the NAT subnet. When it is enabled, Azure cannot allocate the NAT IPs needed to perform SNAT for consumer connections, which results in a timeout even though the connection status is Approved.
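As a minimal sketch of this check, the function below inspects a subnet description and reports whether it is ready to host Private Link service NAT IPs. The dictionary mimics the shape of a subnet's `properties` object with its `privateLinkServiceNetworkPolicies` field. The subnet name, the address prefix, and the choice to treat a missing value as "Enabled" are assumptions made for illustration, not data from the scenario.

```python
def nat_subnet_ready(subnet: dict) -> bool:
    """Return True when Private Link service network policies are disabled."""
    props = subnet.get("properties", {})
    # Assumption: a missing value is treated as "Enabled", the blocking state.
    policy = props.get("privateLinkServiceNetworkPolicies", "Enabled")
    return policy == "Disabled"


# Hypothetical sample data, not real command output.
provider_nat_subnet = {
    "name": "provider-nat-subnet",
    "properties": {
        "addressPrefix": "10.0.9.0/28",
        "privateLinkServiceNetworkPolicies": "Enabled",
    },
}

print(nat_subnet_ready(provider_nat_subnet))  # False
```

A `False` result corresponds to the state described in alternative D, regardless of what the connection status shows.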
The Approved status reflects only the control plane (the connection was accepted administratively), not the data plane. This eliminates any hypothesis based on approval or DNS as a cause.
Alternative C is the most dangerous distractor: the consumer-side NSG seems suspicious at first glance, but the rules shown apply to inbound traffic to the subnet, not outbound traffic from the VM. Traffic that the consumer VM initiates toward the Private Endpoint is outbound traffic and would not be blocked by the listed rules. Acting on the consumer NSG would waste time without solving the real problem.
The information about the valid TLS certificate and healthy backends is deliberately irrelevant: because traffic never reaches the backend (a data plane failure in Private Link), the TLS state has no influence on the symptom.
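Because the symptom itself carries diagnostic weight, it helps to classify a failed connection precisely. The sketch below is a rough Python counterpart to the `Test-NetConnection` check, built only on the standard `socket` module. The function name `probe_tcp` and the result labels are illustrative choices, not part of any Azure tooling.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # An active refusal: something answered and rejected the connection.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: packets are likely being dropped.
        return "timeout"
    except OSError:
        # Anything else, such as an unreachable network or a name resolution failure.
        return "error"


# Example call (needs network access to the consumer VNet):
# probe_tcp("10.100.5.4", 443)
```

In this scenario the expected classification would be a timeout, which points to the data plane rather than to the application or its certificate.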
Answer Key: Scenario 2
Answer: B
The scenario imposes two simultaneous constraints: do not interrupt active legitimate connections and act immediately. Alternative B is the only one that satisfies both. The visibility and auto-approval list of a Private Link service can be edited in-place without recreating the resource or impacting already approved connections. Rejecting pending connections and restricting visibility are control plane operations that do not affect ongoing traffic.
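The decision logic can be sketched as a small planning function: it selects only pending connections from subscriptions outside the allowlist and never touches approved ones. All names, subscription identifiers, and the `Connection` type below are hypothetical, and the sketch models the reasoning only. It does not call any Azure API.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Connection:
    name: str
    subscription_id: str
    status: str  # "Pending" or "Approved"


def plan_remediation(
    connections: List[Connection], allowed: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (connections to reject, subscriptions for Restricted visibility)."""
    to_reject = [
        c.name
        for c in connections
        if c.status == "Pending" and c.subscription_id not in allowed
    ]
    return to_reject, sorted(allowed)


# Hypothetical inventory: two legitimate consumers and three unknown requests.
connections = [
    Connection("pe-consumer-a", "sub-legit-a", "Approved"),
    Connection("pe-consumer-b", "sub-legit-b", "Approved"),
    Connection("pe-unknown-1", "sub-unknown-1", "Pending"),
    Connection("pe-unknown-2", "sub-unknown-2", "Pending"),
    Connection("pe-unknown-3", "sub-unknown-3", "Pending"),
]

to_reject, visibility = plan_remediation(connections, {"sub-legit-a", "sub-legit-b"})
print(to_reject)   # ['pe-unknown-1', 'pe-unknown-2', 'pe-unknown-3']
print(visibility)  # ['sub-legit-a', 'sub-legit-b']
```

The approved connections never appear in the rejection list, which mirrors the no-interruption constraint.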
Alternative A is technically valid as a long-term solution, but recreates the service and requires legitimate consumers to recreate their Private Endpoints, generating a downtime window. This violates the no-interruption constraint.
Alternative C is factually incorrect: visibility is an editable attribute of the existing service.
Alternative D is the most dangerous distractor because it seems to solve the problem without recreation, but NSGs on the NAT subnet control the data plane, not service visibility. Unauthorized subscriptions would continue to be able to discover the alias and submit new pending connections indefinitely.
Answer Key: Scenario 3
Answer: B
The cause is explicit in the provided data, but requires the reader to cross-reference information: the error is Connection refused on port 8080, and the Load Balancer rules table shows only rules for ports 443 and 80. The Private Link service forwards traffic to the Load Balancer frontend exactly as received; if the Load Balancer has no rule for the requested port, it drops the packet and the connection is refused.
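The cross-reference can be made mechanical. The sketch below encodes the two rules from the scenario's table and looks up the port the consumer requested. The helper name `find_rule` and the dictionary layout are illustrative assumptions.

```python
from typing import List, Optional

# Load balancing rules copied from the scenario's table for lb-producao-001.
RULES = [
    {"name": "rule-https", "frontend_ip": "10.0.2.20", "frontend_port": 443,
     "backend_port": 443, "protocol": "TCP"},
    {"name": "rule-http", "frontend_ip": "10.0.2.20", "frontend_port": 80,
     "backend_port": 80, "protocol": "TCP"},
]


def find_rule(rules: List[dict], frontend_ip: str, port: int,
              protocol: str = "TCP") -> Optional[dict]:
    """Return the rule that matches the frontend IP, port and protocol, if any."""
    for rule in rules:
        if (rule["frontend_ip"] == frontend_ip
                and rule["frontend_port"] == port
                and rule["protocol"] == protocol):
            return rule
    return None


print(find_rule(RULES, "10.0.2.20", 8080))         # None
print(find_rule(RULES, "10.0.2.20", 443)["name"])  # rule-https
```

The lookup for port 8080 returns nothing, which is the gap that alternative B describes.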
The certificate renewal two days ago is irrelevant information included purposefully. Certificate renewal does not affect the Load Balancer or Private Link service data plane, and the symptom (connection refused) is structurally different from a TLS error.
Alternative C, NAT IP exhaustion, would be plausible for a timeout, not for connection refused. Azure reserves 5 of the 16 addresses in a /28 subnet, leaving 11 usable IPs, which is sufficient for most scenarios. In addition, the statement explicitly reports that the NAT subnet still has available IPs, which should eliminate this hypothesis.
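The usable-address figure for the NAT subnet can be worked out with Python's standard `ipaddress` module. This is a small sketch that assumes the usual Azure behavior of reserving five addresses in every subnet. The constant and function names are illustrative.

```python
import ipaddress

# Azure reserves five addresses per subnet: the network address, the default
# gateway, two addresses for Azure DNS mapping, and the broadcast address.
AZURE_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses in the subnet remain assignable."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


print(usable_ips("10.0.3.0/28"))  # 11
```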
Alternative D inverts the correct logic: disabling Private Link Network Policy is the necessary state for service operation, not a cause of failure.
Answer Key: Scenario 4
Answer: B
The correct sequence is 1 → 4 → 3 → 2 → 5, following the logic of eliminating hypotheses from most comprehensive to most specific:
- Step 1: confirming that the provider's Private Link service still exists is the first filter. If the service was deleted, all Private Endpoints pointing to it become Disconnected regardless of any other configuration.
- Step 4: if the service exists, checking the connection state on the provider side determines if there was explicit rejection or removal of approval.
- Step 3: the Private Endpoint activity history reveals if the resource was recreated or underwent some operation that broke the link with the service.
- Step 2: it only makes sense to verify the Resource ID or alias after confirming that the service exists and there is no recent operation explaining the problem.
- Step 5: DNS resolution is validated last, as Disconnected status indicates the problem is in the control plane, not name resolution. DNS resolution only becomes relevant after confirming the control plane is intact.
Sequence C (5 → ...) is the most common distractor: it starts with DNS resolution, which is a second-level symptom and not the cause of a Disconnected status. Starting with DNS would lead the engineer down the wrong path before checking if the service still exists.
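The ordering can be expressed as a short runner that evaluates the checks in the sequence 1, 4, 3, 2, 5 and stops at the first one that fails. The check results below are hypothetical observations standing in for real portal or CLI queries, and the function name `first_failing_check` is an illustrative choice.

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[str]:
    """Run checks in order and return the label of the first that fails."""
    for label, check in checks:
        if not check():
            return label
    return None


# Hypothetical observations; replace each lambda with a real lookup.
checks: List[Check] = [
    ("1: Private Link service exists and is Succeeded", lambda: True),
    ("4: connection on the provider side is still approved", lambda: False),
    ("3: no recent delete or recreation of the Private Endpoint", lambda: True),
    ("2: Resource ID or alias is correct", lambda: True),
    ("5: private DNS resolves the FQDN to the endpoint IP", lambda: True),
]

print(first_failing_check(checks))
# 4: connection on the provider side is still approved
```

Stopping at the first failure keeps the investigation on the broadest unresolved hypothesis, so DNS is only examined once every control plane check has passed.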
Troubleshooting Tree: Create a Private Link service
Color Legend:
| Color | Node Type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question (binary or state decision) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or verification |
To use this tree when facing a real problem, start at the root node, which describes the connectivity symptom. Answer each question based on what you observe, not what you suspect. Each branch eliminates an entire class of causes. When you reach a red identified-cause node, the corresponding green action is the next operational step. If the orange validation node confirms that traffic is flowing but the application still fails, the path continues through DNS resolution, so that the diagnosis covers both the control plane and the data plane.