Troubleshooting Lab: Associate public IP addresses to resources
Diagnostic Scenarios
Scenario 1: Root Cause
A development team reports that a newly created virtual machine in Azure is not accessible from the internet on port 443. The VM hosts an internal HTTPS service that was promoted to production this week.
The responsible engineer checks the configuration and collects the following information:
VM: vm-prod-api-01
Region: East US 2
Size: Standard_D2s_v3
OS: Ubuntu 22.04 LTS
NIC: nic-prod-api-01
Associated public IP: pip-prod-api-01 (Standard SKU, Static)
Assigned public IP: 20.x.x.x (confirmed in portal)
Subnet: snet-backend (10.10.2.0/24)
NSG associated to NIC: nsg-prod-api-01
Active inbound rules in the NSG:
| Priority | Name | Port | Protocol | Source | Destination | Action |
|---|---|---|---|---|---|---|
| 100 | Allow-SSH | 22 | TCP | Internet | Any | Allow |
| 65000 | AllowVnetIn | Any | Any | VNet | VNet | Allow |
| 65500 | DenyAllIn | Any | Any | Any | Any | Deny |
The engineer confirms that the HTTPS service is active and listening on port 443 inside the VM, and that the public IP is correctly associated to the NIC. The resource group has its environment and cost center tags properly filled in.
What is the root cause of the inaccessibility on port 443?
A) The Standard SKU of the public IP blocks inbound traffic by default while there is no Load Balancer in front of the VM
B) There is no explicit NSG rule allowing inbound traffic on port 443 from the internet
C) The snet-backend subnet is not suitable for hosting VMs with public IP; the resource should be in a front-end subnet
D) The Standard SKU Static public IP requires association to an accelerated network interface to work correctly
Scenario 2: Action Decision
The platform team identified that a production public Standard Load Balancer is using a Basic SKU public IP address that has been associated for over a year. The load on the environment is high and the Load Balancer serves critical applications with no maintenance window available for the next three weeks.
The cause of the problem is clear: the association of a Basic IP to a Standard Load Balancer is an invalid configuration according to current Microsoft documentation, and the platform entered this state due to a Load Balancer SKU migration performed without complete review of dependent resources.
The team has Contributor permissions on the Resource Group and access to the Azure portal and CLI.
What is the correct action to take at this moment?
A) Immediately delete the Basic public IP and create a new Standard SKU IP, associating it to the Load Balancer without waiting for a maintenance window
B) Convert the Basic public IP to Standard SKU using the command az network public-ip update --sku Standard and reassociate it to the Load Balancer
C) Register the problem in the backlog with high priority, document the current configuration and wait for the maintenance window to execute the replacement with impact control
D) Associate a second Standard SKU public IP to the Load Balancer now and remove the Basic IP after confirming traffic has been redirected
Scenario 3: Root Cause
A network engineer reports that they tried to associate an existing public IP address to a new network interface (NIC) of a VM and received the following error in the Azure portal:
Cannot associate public IP address 'pip-shared-01' to network interface 'nic-vm-new-03'.
The public IP address is already associated to resource '/subscriptions/.../networkInterfaces/nic-vm-legacy-05'.
Operation failed with status: 'Conflict'. Error code: 'PublicIPAddressAlreadyInUse'
The engineer verifies in the portal that the VM associated to nic-vm-legacy-05 has been in the Stopped state for 45 days. The public IP pip-shared-01 is Basic SKU with Dynamic allocation. The capacity team confirms that public IP addresses are available in the subscription pool and that the public IP limit has not been reached.
The engineer believes the problem is caused by the public IP limit per subscription and opens a support ticket to increase quota.
What is the real root cause of the problem?
A) Basic SKU with Dynamic allocation cannot be associated to NICs of new VMs without first being updated to Standard SKU
B) A VM in Stopped state still maintains the NIC association with the public IP; the IP needs to be disassociated before being reassigned
C) The subscription's public IP limit has been reached and the new IP needs to be requested via quota increase
D) Basic SKU public IP with Dynamic allocation is automatically released only when the VM is deallocated, not when it's in Stopped state; it's necessary to deallocate the VM to release the IP
Scenario 4: Diagnostic Sequence
An operator receives an alert that an application exposed via Azure Firewall stopped responding to external connections. The Azure Firewall is in a hub VNet with multiple public IPs associated via Public IP Prefix. No infrastructure changes have been registered in Change Management in the last 24 hours.
The available investigation steps are:
1. Verify that the DNAT rules in Azure Firewall are active and point to the correct internal IP of the application
2. Confirm in the portal that the public IP addresses from the Public IP Prefix are still associated to Azure Firewall
3. Check the Azure Firewall logs (via Log Analytics) to see whether traffic is being received and dropped or whether no traffic is reaching the firewall
4. Test connectivity from outside the VNet directly to the Azure Firewall public IP using Test-NetConnection or curl
5. Check the effective route table of the Azure Firewall subnet to confirm that outbound traffic is being routed correctly
What is the correct progressive diagnostic sequence for this scenario?
A) 2, 1, 4, 3, 5
B) 4, 3, 2, 1, 5
C) 4, 2, 3, 1, 5
D) 1, 4, 3, 2, 5
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The NSG rules table shows that there is no rule allowing inbound traffic on port 443. NSG rules are evaluated in ascending priority order and the first match wins, so internet traffic to port 443 skips Allow-SSH (wrong port) and AllowVnetIn (wrong source) and is caught by the default DenyAllIn rule at priority 65500. A Standard SKU public IP is secure by default: inbound traffic stays closed until an NSG allows it. This reinforces alternative B: Standard does not block by itself, it only requires that the NSG be configured correctly. The absence of an allow rule for 443 is the direct cause.
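The evaluation described above can be sketched in a few lines of Python. This is a simplified model, not Azure's implementation: sources are treated as plain labels rather than IP ranges or service tags, the destination column is ignored, and the `Allow-HTTPS` rule at priority 110 is a hypothetical fix that does not appear in the scenario.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    port: str       # "Any" or a single port number as text
    protocol: str   # "Any" or "TCP"
    source: str     # "Any", "Internet", or "VNet" (plain labels in this sketch)
    action: str     # "Allow" or "Deny"


# Inbound rules copied from the Scenario 1 table.
SCENARIO_RULES = [
    Rule(100, "Allow-SSH", "22", "TCP", "Internet", "Allow"),
    Rule(65000, "AllowVnetIn", "Any", "Any", "VNet", "Allow"),
    Rule(65500, "DenyAllIn", "Any", "Any", "Any", "Deny"),
]


def evaluate(rules: List[Rule], port: int, protocol: str, source: str) -> Rule:
    """Return the first rule that matches, scanning in ascending priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (
            rule.port in ("Any", str(port))
            and rule.protocol in ("Any", protocol)
            and rule.source in ("Any", source)
        ):
            return rule
    raise LookupError("no rule matched")


# Hypothetical fix: an explicit allow rule for HTTPS from the internet.
FIXED_RULES = SCENARIO_RULES + [
    Rule(110, "Allow-HTTPS", "443", "TCP", "Internet", "Allow"),
]

before = evaluate(SCENARIO_RULES, 443, "TCP", "Internet")
after = evaluate(FIXED_RULES, 443, "TCP", "Internet")
print(before.name, before.action)  # DenyAllIn Deny
print(after.name, after.action)    # Allow-HTTPS Allow
```

With the scenario's rules, the request falls through to DenyAllIn; adding the hypothetical allow rule at any priority below 65500 changes the outcome.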
Alternative A is the most dangerous distractor: it mixes the real secure-by-default behavior of Standard SKU with an incorrect explanation. Standard SKU does not require a Load Balancer to work with VMs; it requires an NSG that allows the traffic. Alternative C is irrelevant: there is no topology restriction preventing VMs with a public IP in backend subnets. Alternative D is technically false: accelerated networking is a performance feature and is not a prerequisite for associating a public IP.
The information about environment and cost center tags is purposely irrelevant and should not influence the diagnosis.
Answer Key: Scenario 2
Answer: C
The scenario explicitly states that no maintenance window is available for the next three weeks and that the environment is production with high load. The correct action given this constraint is to register, document, and wait for the safe moment to execute the replacement with impact control.
Alternative A represents a technically necessary action, but applied at the wrong time: deleting the IP in production without a window would cause immediate interruption. Alternative B is the most dangerous distractor, as it seems like an elegant and impactless solution. A SKU change cannot be applied in place while the IP is in use: the public IP must use static allocation and be disassociated from every resource before its SKU can be changed, so the command would fail against the attached IP, and disassociating it first would interrupt production traffic outside a window. Alternative D is a valid approach in scenarios with operational flexibility, but adding a second IP and removing the other still represents a production change outside the window, which contradicts the central constraint of the statement.
Answer Key: Scenario 3
Answer: B
The error message is direct: PublicIPAddressAlreadyInUse. The public IP is still associated to NIC nic-vm-legacy-05. The Stopped state of the VM does not release the IP association; the NIC remains linked to the resource and, consequently, to the public IP. To reuse the IP, it's necessary to explicitly disassociate it from the current NIC, regardless of the VM state.
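Because the error text already names the resource that holds the IP, reading it carefully is the first diagnostic step. The sketch below extracts the error code and the holding NIC from the message exactly as it is quoted in the scenario; the message format is taken from this lab and is an assumption, not a guaranteed Azure output format. The elided subscription path is kept as-is.

```python
import re

# Error text as quoted in Scenario 3 (the "..." in the resource ID is original).
ERROR_TEXT = (
    "Cannot associate public IP address 'pip-shared-01' to network interface "
    "'nic-vm-new-03'.\n"
    "The public IP address is already associated to resource "
    "'/subscriptions/.../networkInterfaces/nic-vm-legacy-05'.\n"
    "Operation failed with status: 'Conflict'. "
    "Error code: 'PublicIPAddressAlreadyInUse'"
)


def parse_conflict(text: str) -> dict:
    """Pull the error code and the current holder of the IP out of the message."""
    code = re.search(r"Error code: '([^']+)'", text)
    holder = re.search(r"already associated to resource '([^']+)'", text)
    holder_id = holder.group(1) if holder else None
    return {
        "error_code": code.group(1) if code else None,
        "holder_id": holder_id,
        # The last path segment of the resource ID is the NIC name.
        "holder_name": holder_id.rsplit("/", 1)[-1] if holder_id else None,
    }


details = parse_conflict(ERROR_TEXT)
print(details["error_code"])
print(details["holder_name"])
```

The parsed holder, nic-vm-legacy-05, is the NIC from which the IP has to be disassociated; nothing in the message points to a quota problem.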
Alternative D is the most sophisticated and most dangerous distractor: it plays on the real distinction between the Stopped and Deallocated states of a VM, which is a common source of confusion. Deallocating the VM does release the address value of a Basic IP with Dynamic allocation, but the public IP resource stays attached to the NIC, so the association error would persist. The root cause of the error is not the VM state: it is the active association of the IP to the NIC, which can be removed directly in the portal without changing the VM state.
Alternative C is exactly the irrelevant information included purposely: the statement confirms that the IP limit has not been reached, but the fictional engineer in the scenario acts as if this were the cause, illustrating the diagnostic error of confusing symptom with superficial cause.
Answer Key: Scenario 4
Answer: C
The correct sequence is 4, 2, 3, 1, 5, which follows progressive diagnostic logic: from external to internal, from simplest to most specific.
Step 4 answers the most fundamental question: is traffic reaching the public IP? If the external connectivity test fails completely, the problem may be before even the firewall. Step 2 confirms if public IPs are still associated to Azure Firewall, eliminating the hypothesis of accidental disassociation. Step 3 analyzes logs to understand if traffic reaches the firewall and what happens to it, separating connectivity problems from rule problems. Step 1 investigates DNAT rules only after confirming that traffic arrives and is being processed. Step 5, subnet route table verification, is the most specific and relevant only if there are indications that internal routing is compromised.
Alternative B (4, 3, 2, 1, 5) skips IP association verification before analyzing logs, which may lead to conclusions based on logs from a different configuration state than the current one. Alternative A starts with IP association without first confirming if the problem is external or internal. Alternative D starts with DNAT rules, which is investigating the most specific cause before validating previous layers.
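The outside-in ordering can be sketched as a small runner that executes the checks in the order 4, 2, 3, 1, 5 and reports the first layer that fails. The probe functions here are stubs with hypothetical outcomes chosen for illustration; in practice each one would wrap a real check (an external connectivity test, a portal or CLI lookup, a log query).

```python
from typing import Callable, List, Optional, Tuple

Check = Tuple[int, str, Callable[[], bool]]

# Checks listed in the order of the correct answer (4, 2, 3, 1, 5).
# Each lambda is a stub; the True/False values are hypothetical outcomes.
CHECKS: List[Check] = [
    (4, "external connectivity test to the firewall public IP", lambda: True),
    (2, "public IPs still associated to Azure Firewall", lambda: True),
    (3, "firewall logs show the traffic arriving", lambda: True),
    (1, "DNAT rules active and pointing to the correct internal IP", lambda: False),
    (5, "effective routes of the firewall subnet are correct", lambda: True),
]


def run_sequence(checks: List[Check]) -> List[Tuple[int, bool]]:
    """Run every check in the given order and record its outcome."""
    results = []
    for step, description, probe in checks:
        ok = probe()
        print(f"step {step}: {description}: {'ok' if ok else 'FAILED'}")
        results.append((step, ok))
    return results


def first_failure(results: List[Tuple[int, bool]]) -> Optional[int]:
    """Return the step number of the first failed check, or None if all passed."""
    for step, ok in results:
        if not ok:
            return step
    return None


results = run_sequence(CHECKS)
print("first failing layer:", first_failure(results))
```

With these stubbed outcomes the outer layers pass and the first failure lands on step 1, the DNAT rules, which is only meaningful because the earlier layers were validated first.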
Troubleshooting Tree: Associate public IP addresses to resources
Legend:
- Dark blue: initial symptom, investigation entry point
- Medium blue: objective diagnostic question, requires active verification
- Green: recommended action or identified resolution
- Red: identified cause without direct resolution in Azure (external problem)
To use this tree when facing a real problem, start with the root node describing the observed symptom and answer each question based on what is directly verifiable in the portal, CLI, or logs. Follow the branch corresponding to each answer without skipping levels. Each question node represents a specific verification that eliminates or confirms a hypothesis. The goal is to reach a green node (corrective action) or red node (external cause) by traversing only the branches that the real environment state confirms.
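The traversal rule described above can be sketched as a small decision-tree walk. The nodes below are hypothetical examples built from the scenarios in this lab, not the contents of the actual tree; the node kinds mirror the legend (symptom, question, action, external).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    kind: str                      # "symptom", "question", "action", or "external"
    text: str
    then: Optional["Node"] = None  # only child of the symptom (root) node
    yes: Optional["Node"] = None   # branch taken when a question is answered yes
    no: Optional["Node"] = None    # branch taken when a question is answered no


# Hypothetical tree for illustration only.
Q_ASSOC = "Is the public IP associated to the resource?"
Q_NSG = "Does an NSG rule allow the required port?"

TREE = Node(
    "symptom",
    "Resource with a public IP is unreachable",
    then=Node(
        "question",
        Q_ASSOC,
        no=Node("action", "Associate the public IP to the resource"),
        yes=Node(
            "question",
            Q_NSG,
            no=Node("action", "Add an inbound allow rule to the NSG"),
            yes=Node("external", "Investigate outside Azure (client network, DNS)"),
        ),
    ),
)


def traverse(root: Node, answers: Dict[str, bool]) -> Tuple[Node, List[str]]:
    """Walk from the symptom to a leaf, answering one question per level.

    A missing answer raises KeyError, so no level can be skipped.
    """
    path = [root.text]
    node = root.then
    while node.kind == "question":
        path.append(node.text)
        node = node.yes if answers[node.text] else node.no
    path.append(node.text)
    return node, path


leaf, path = traverse(TREE, {Q_ASSOC: True, Q_NSG: False})
print(leaf.kind, "->", leaf.text)
```

Requiring an explicit answer for every question on the path reflects the instruction to follow each branch without skipping levels.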