Troubleshooting Lab: Configure service endpoints for Azure platform as a service (PaaS)
Diagnostic Scenarios
Scenario 1: Root Cause
An operations team reports that an application hosted on a VM within the snet-app subnet (VNet vnet-prod, East US region) started receiving errors when attempting to write files to an Azure Storage Account. The environment was working correctly until yesterday afternoon.
The administrator checks the Storage Account and finds the following configuration:
Networking > Firewalls and virtual networks
Public network access: Enabled from selected virtual networks and IP addresses
Virtual networks:
vnet-prod / snet-app Status: Succeeded
Firewall:
(no IPs added)
Default action: Deny
Exceptions:
Allow trusted Microsoft services: Enabled
The service endpoint for Microsoft.Storage is listed as provisioned on the snet-app subnet. The VM has private IP 10.1.1.10 and associated public IP 20.50.80.100. The subnet NSG has the following outbound rules:
| Priority | Name | Port | Protocol | Source | Destination | Action |
|---|---|---|---|---|---|---|
| 100 | Allow-HTTPS-Internet | 443 | TCP | VirtualNetwork | Internet | Allow |
| 200 | Allow-Storage-Tag | 443 | TCP | VirtualNetwork | Storage | Allow |
| 65000 | AllowVnetOutbound | * | * | VirtualNetwork | VirtualNetwork | Allow |
| 65001 | AllowInternetOutbound | * | * | * | Internet | Allow |
| 65500 | DenyAllOutbound | * | * | * | * | Deny |
The security team reports they performed an access key rotation on the Storage Account yesterday. The network administrator reports no changes were made to VNet or subnet configurations.
What is the root cause of the access failure?
A) The service endpoint was inadvertently disabled during key rotation, as this process resets the Storage Account network configurations.
B) The application is using an access key that was invalidated by the rotation, causing requests to be rejected with authentication errors, not network errors.
C) The NSG rule with priority 100 is intercepting traffic before the specific Storage Service Tag rule, incorrectly redirecting it to the internet.
D) The VM's public IP (20.50.80.100) is not listed in the Storage Account firewall, and traffic is being routed through the public path after key rotation.
Scenario 2: Root Cause
An administrator receives the following alert: a production application hosted on VMs in the snet-backend subnet (VNet vnet-hub) can no longer access an Azure Key Vault. The Key Vault is configured to accept connections only via service endpoint.
The administrator runs the following command to check the endpoint status:
az network vnet subnet show \
  --resource-group rg-network \
  --vnet-name vnet-hub \
  --name snet-backend \
  --query "serviceEndpoints"
Output:
[
  {
    "locations": ["eastus"],
    "provisioningState": "Succeeded",
    "service": "Microsoft.KeyVault"
  }
]
The administrator then checks the Key Vault network rules:
az keyvault network-rule list --name kv-prod-001 --resource-group rg-app
Output:
{
  "bypass": "AzureServices",
  "defaultAction": "Deny",
  "ipRules": [],
  "virtualNetworkRules": [
    {
      "id": "/subscriptions/xxx/resourceGroups/rg-network/providers/Microsoft.Network/virtualNetworks/vnet-hub/subnets/snet-backend",
      "ignoreMissingVnetServiceEndpoint": false
    }
  ]
}
The infrastructure team reports that two days ago, the snet-backend subnet was deleted and recreated with the same name and address range to resolve a NAT Gateway configuration issue. The Key Vault Service Level Agreement (SLA) is 99.9% and no incidents were registered on Azure Status.
What is the root cause of the failure?
A) The NAT Gateway associated with the subnet is intercepting traffic destined for the Key Vault and changing the source IP, causing the endpoint to stop working.
B) The service endpoint for Microsoft.KeyVault was lost when the subnet was recreated and, despite being successfully reprovisioned, the subnet resource registration in the Key Vault still points to the old subnet ID, which no longer exists.
C) The ignoreMissingVnetServiceEndpoint: false field is blocking access because the service endpoint is in a reprovisioning state after the subnet recreation.
D) The AzureServices bypass configured in the Key Vault is conflicting with the VNet rule, causing rejection of connections originated from internal Azure services.
Scenario 3: Action Decision
The problem cause has been identified: the Azure SQL Database in a production environment is accepting connections from any public IP address because the service endpoint was enabled on the snet-api subnet, but no network rules were created in the SQL Database to restrict access. The Default action remains as Allow.
The operational context is as follows:
- The application is in active production with approximately 800 simultaneous users
- Any interruption of database connectivity would completely bring down the application
- The security team requires public access to be blocked today
- The administrator has permission to modify SQL Database firewall rules
- There is no scheduled maintenance window for the next 6 hours
What is the correct action to take at this moment?
A) Immediately change the SQL Database Default action to Deny, as the service endpoint already ensures that the snet-api subnet will continue to have access and no applications will be impacted.
B) First add the VNet rule for snet-api to the SQL Database firewall, validate that the application maintains connectivity, and only then change the Default action to Deny.
C) Enable Private Endpoint on the SQL Database before any changes, as the service endpoint does not offer sufficient security for production environments with this access profile.
D) Wait for the next maintenance window, which is at least 6 hours away, to make the changes, as modifying database firewall rules in production without a scheduled window violates ITIL best practices.
Scenario 4: Diagnostic Sequence
A VM in the snet-data subnet cannot access an Azure Storage Account. The service endpoint for Microsoft.Storage should be enabled. An administrator needs to investigate the problem.
The available diagnostic steps are:
[P1] Check if the Storage Account has Default action configured as Deny
and if the snet-data subnet is listed in the allowed VNet rules
[P2] Verify if the Microsoft.Storage service endpoint is successfully
provisioned on the snet-data subnet via az network vnet subnet show
[P3] Review the outbound rules of the NSG associated with snet-data to confirm
that traffic on port 443 to the Storage Service Tag is allowed
[P4] Test connectivity from the VM using curl or Test-NetConnection
to confirm if the problem is network or authentication related
[P5] Check Storage Account diagnostic logs (StorageBlobLogs)
to identify if requests reach the service and with what error
Which sequence represents the correct order of progressive diagnosis?
A) P4 → P2 → P1 → P3 → P5
B) P2 → P1 → P3 → P4 → P5
C) P1 → P3 → P2 → P5 → P4
D) P3 → P2 → P4 → P1 → P5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The decisive clue in the problem statement is the information that the security team performed an access key rotation on the Storage Account the previous day, exactly when the problem began. The network configuration (service endpoint, VNet rules, NSG) is intact and correct, which eliminates any network cause as the primary hypothesis.
Regenerating an access key invalidates the previous value immediately. If the application still uses the old key (in a connection string, an environment variable, or a stale secret in Azure Key Vault), every request is rejected with an authentication error (403 AuthenticationFailed), not a network connectivity error. This symptom is frequently mistaken for a network failure because the practical effect, access denied, looks identical.
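One way to separate the two failure classes is to look at the error code that accompanies the HTTP 403 rather than at the status alone. The sketch below assumes that Storage reports a rejected key as `AuthenticationFailed` and that a request blocked by firewall or VNet rules typically surfaces as `AuthorizationFailure`; the function name and the returned labels are hypothetical, chosen only for illustration.

```python
def classify_storage_failure(status_code: int, error_code: str) -> str:
    """Map an Azure Storage error response to a likely failure class."""
    if status_code != 403:
        return "other"
    if error_code == "AuthenticationFailed":
        # The key or signature was rejected: look for a stale access key.
        return "credentials"
    if error_code == "AuthorizationFailure":
        # Assumption: this code usually points at a firewall or VNet rule block.
        return "network-rules"
    return "unknown-403"


print(classify_storage_failure(403, "AuthenticationFailed"))
```

In Scenario 1 the first branch is the one that applies, which is why the network configuration can be left untouched.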
The information about the VM's public IP (20.50.80.100) is deliberately irrelevant: with the service endpoint active and the subnet correctly listed in the VNet rules, traffic reaches Storage over the Microsoft backbone regardless of the public IP. Alternative A invents behavior that does not exist: key rotation does not affect network configuration. Alternative C is incorrect because rule 100 (Allow-HTTPS-Internet) and rule 200 (Allow-Storage-Tag) do not conflict: both allow outbound TCP 443, so the traffic is permitted whichever rule matches first, and NSG rules only allow or deny traffic; they never redirect it. Alternative D returns to the public IP as the cause, ignoring that the endpoint is functioning.
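The reasoning about the NSG can be checked with a small first-match simulation. This is a simplified sketch, not how Azure implements NSGs: source and protocol are omitted, the `Rule` type and `evaluate` function are hypothetical, and which service tags a given flow belongs to is passed in as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    port: str         # "443" or "*"
    destination: str  # service tag name or "*"
    action: str       # "Allow" or "Deny"


# Outbound rules from Scenario 1 (source and protocol omitted for brevity).
RULES = [
    Rule(100, "Allow-HTTPS-Internet", "443", "Internet", "Allow"),
    Rule(200, "Allow-Storage-Tag", "443", "Storage", "Allow"),
    Rule(65000, "AllowVnetOutbound", "*", "VirtualNetwork", "Allow"),
    Rule(65001, "AllowInternetOutbound", "*", "Internet", "Allow"),
    Rule(65500, "DenyAllOutbound", "*", "*", "Deny"),
]


def evaluate(rules, port, destination_tags):
    """Return the first rule, in ascending priority order, that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.port in ("*", port)
        dest_ok = rule.destination == "*" or rule.destination in destination_tags
        if port_ok and dest_ok:
            return rule
    return None


# Try both assumptions about which tags the Storage flow matches.
for tags in ({"Storage"}, {"Internet", "Storage"}):
    hit = evaluate(RULES, "443", tags)
    print(sorted(tags), hit.name, hit.action)
```

Under either assumption the matching rule has action Allow, so the NSG cannot be what blocks the writes.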
The most dangerous distractor is alternative D, as it leads the administrator to add a public IP to the Storage Account firewall as a solution, which would worsen the security posture without solving the real problem.
Answer Key: Scenario 2
Answer: B
The root cause lies in the combination of two facts stated in the problem: the subnet was deleted and recreated two days ago, and the ignoreMissingVnetServiceEndpoint field is set to false.
When a subnet is recreated, it receives a new internal Resource ID, even if the name and address range are identical. The VNet rule in the Key Vault references the Resource ID of the original subnet, which no longer exists. With ignoreMissingVnetServiceEndpoint: false, the Key Vault treats this invalid reference as lack of permission and rejects connections.
The output of the az keyvault network-rule list command confirms this: the id field in virtualNetworkRules points to the resource path, and although the name is the same, the underlying ID has changed. The solution is to remove the old rule and add the recreated snet-backend subnet again.
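The remove-and-re-add step can be sketched as two Azure CLI invocations. The helper below only builds the argument lists and does not execute anything; its name is hypothetical, the vault and resource group names come from the scenario, and the subscription segment of the subnet ID is left as the `xxx` placeholder shown in the command output. It assumes `az keyvault network-rule remove` and `add` accept a full subnet resource ID through `--subnet`.

```python
def keyvault_rule_refresh_commands(vault: str, resource_group: str, subnet_id: str):
    """Build (but do not run) the CLI calls that drop and re-create a VNet rule."""
    base = ["az", "keyvault", "network-rule"]
    target = ["--name", vault, "--resource-group", resource_group, "--subnet", subnet_id]
    return [base + ["remove"] + target, base + ["add"] + target]


# Placeholder subscription ID kept exactly as it appears in the scenario output.
subnet_id = (
    "/subscriptions/xxx/resourceGroups/rg-network/providers/Microsoft.Network"
    "/virtualNetworks/vnet-hub/subnets/snet-backend"
)
for cmd in keyvault_rule_refresh_commands("kv-prod-001", "rg-app", subnet_id):
    print(" ".join(cmd))
```

Passing the full resource ID rather than a subnet name matters here because the subnet lives in rg-network while the vault lives in rg-app.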
The information about Key Vault SLA and absence of incidents on Azure Status is deliberately irrelevant: it serves to divert reasoning toward a Microsoft service failure, when the cause is local. Alternative A is incorrect because NAT Gateway changes the source IP of outbound traffic to the internet, but traffic via service endpoint does not pass through NAT Gateway. Alternative C misinterprets the ignoreMissingVnetServiceEndpoint field, which controls behavior when the endpoint is absent, not when it's reprovisioning. Alternative D describes a non-existent conflict between bypass and VNet rules.
Answer Key: Scenario 3
Answer: B
The critical constraint of the scenario is active production with 800 simultaneous users and no maintenance window. Changing the Default action to Deny before confirming that the VNet rule is correctly applied and functional would immediately interrupt all application connectivity to the database.
The correct sequence is: add the VNet rule for snet-api, validate that the application continues operating normally with this rule active, and only after confirmation, change the Default action to Deny. This approach ensures zero downtime for users and meets the security team's requirement within the deadline.
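The ordering constraint can be expressed as a guard that refuses to make the disruptive change until validation passes. The sketch below works on a hypothetical in-memory model (`SqlFirewallState`) rather than on a real database, and `app_is_healthy` stands in for whatever connectivity check the team actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SqlFirewallState:
    """Hypothetical in-memory model of the database's network settings."""
    default_action: str = "Allow"
    vnet_rules: set = field(default_factory=set)


def lock_down(state, subnet, app_is_healthy):
    """Apply the add-rule, validate, then deny sequence; return the steps taken."""
    steps = []
    if subnet not in state.vnet_rules:
        state.vnet_rules.add(subnet)
        steps.append(f"added VNet rule for {subnet}")
    if not app_is_healthy():
        # Stop before the disruptive change; public access is still open here.
        steps.append("validation failed, default action left unchanged")
        return steps
    state.default_action = "Deny"
    steps.append("default action set to Deny")
    return steps


state = SqlFirewallState()
print(lock_down(state, "snet-api", lambda: True))
```

If validation fails, the function leaves the default action at Allow, which keeps the application online while the rule is investigated.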
Alternative A makes the mistake of assuming the service endpoint automatically guarantees access after Deny, without considering that any propagation or configuration issue during the transition would bring down the application. Alternative C is a technically valid solution in another context, but introduces complexity and implementation time incompatible with urgency and lack of window. Alternative D is the most dangerous distractor in organizational terms: the security requirement is for today, and waiting 6 hours with the database publicly exposed violates the requirement explicitly stated in the scenario.
Answer Key: Scenario 4
Answer: B
The correct sequence is P2 → P1 → P3 → P4 → P5, following the logic of progressive diagnosis from simplest and most fundamental to most specific and costly.
The correct reasoning is:
- P2 first: checking if the service endpoint exists and has provisioningState: Succeeded is the prerequisite for everything. If the endpoint is not provisioned, other checks are irrelevant.
- P1 next: with the endpoint confirmed, check if the Storage Account has the subnet listed and Default action configured correctly. This is the most common cause of rejection after the endpoint is active.
- P3 after: with endpoint and Storage rules correct, investigate NSG, which may be blocking outbound traffic to the Storage Service Tag.
- P4 in sequence: with configurations validated, test actual connectivity from the VM to confirm if the problem persists and whether it's network or authentication related.
- P5 last: consulting diagnostic logs is the most time-consuming step and should be used to confirm the hypothesis already formed or investigate causes that previous steps did not reveal.
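The ordering above amounts to running the checks in a fixed sequence and stopping at the first one that fails. A minimal sketch follows; the check results are hypothetical stand-ins for the real commands and log queries, and the function name is illustrative.

```python
def first_failing_step(checks):
    """Run checks in order and return the label of the first one that fails."""
    for label, check in checks:
        if not check():
            return label
    return None


# Hypothetical results standing in for real commands and log queries.
observed = {
    "P2 endpoint provisioned": True,
    "P1 subnet in Storage VNet rules": False,
    "P3 NSG allows 443 to Storage": True,
    "P4 connectivity test succeeds": False,
    "P5 logs show no rejected requests": False,
}

# Bind each result as a default argument so every lambda keeps its own value.
checks = [(label, lambda ok=ok: ok) for label, ok in observed.items()]
print(first_failing_step(checks))
```

With these sample results the run stops at P1, so the later and more expensive steps are never needed.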
Alternative A starts with connectivity testing, which may generate useful information, but doesn't guide diagnosis without first knowing the configuration state. Alternative C starts with the Storage Account before checking if the endpoint even exists. Alternative D starts with NSG, which is relevant but not the correct starting point for structured investigation.
Troubleshooting Tree: Configure service endpoints for Azure platform as a service (PaaS)
Color Legend:
| Color | Node Type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question (binary decision) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate verification or validation |

Given a real problem, start from the root node (PaaS service access failure symptom) and answer each diagnostic question based on what you observe in the environment. Follow the path corresponding to the obtained answer, traversing the levels until reaching a red node (identified cause) and then the corresponding green node (action to take). Orange nodes indicate points where additional verification, such as consulting logs or executing a command, is necessary before proceeding. The tree covers the most common failure paths and should be traversed from top to bottom, never skipping a level without confirming the actual state of the environment.
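The traversal rule can be illustrated with a very small tree. The nodes below are hypothetical examples drawn from the scenarios in this lab, not a reproduction of the full tree, and the `walk` function is only a sketch of the "one answer per level" rule.

```python
# Each question node maps an answer (True/False) to the next node;
# nodes without a "question" key are leaves holding a cause and an action.
TREE = {
    "question": "Is the service endpoint provisioned on the subnet?",
    False: {"cause": "Service endpoint missing", "action": "Enable the endpoint on the subnet"},
    True: {
        "question": "Is the subnet listed in the service's VNet rules?",
        False: {"cause": "Subnet not allowed by the service firewall", "action": "Add the VNet rule"},
        True: {"cause": "Network path is configured", "action": "Check credentials and service logs"},
    },
}


def walk(node, answers):
    """Follow one answer per level until a leaf (cause and action) is reached."""
    answers = iter(answers)
    while "question" in node:
        node = node[next(answers)]
    return node["cause"], node["action"]


print(walk(TREE, [True, False]))
```

Because each level consumes exactly one observed answer, a level cannot be skipped without supplying what was actually seen in the environment.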