Troubleshooting Lab: Implement VNet Peering
Diagnostic Scenarios
Scenario 1: Root Cause
An operations team reports that two VMs in different VNets cannot communicate with each other, despite the peering appearing as Connected on both sides in the Azure portal. The VNets involved are:
| Attribute | VNet-Alpha | VNet-Beta |
|---|---|---|
| Region | East US | East US |
| Address space | 10.10.0.0/16 | 10.20.0.0/16 |
| Subscription | Sub-Prod | Sub-Prod |
| Peering status | Connected | Connected |
The source VM (vm-alpha-01, IP 10.10.1.4) attempts to reach the destination VM (vm-beta-01, IP 10.20.1.4) via ICMP. The test fails consistently. The team reports that the NSG associated with the destination subnet has been reviewed and has no deny rules for ICMP. The team also mentions that the destination VM was migrated to a different subnet within VNet-Beta three days ago, without any changes to the peering.
The engineer executes the following command on the source VM:
traceroute 10.20.1.4
traceroute to 10.20.1.4 (10.20.1.4), 30 hops max, 60 byte packets
1 * * *
2 * * *
3 * * *
Then checks the effective routes on the NIC of vm-alpha-01:
| Address prefix | Next hop | Next hop type |
|---|---|---|
| 10.10.0.0/16 | - | VnetLocal |
| 10.20.0.0/16 | - | VNetPeering |
| 0.0.0.0/0 | - | Internet |
What is the root cause of the communication failure?
A) The peering between VNet-Alpha and VNet-Beta is in an inconsistent state on the data plane, despite showing Connected on the control plane, and needs to be recreated.
B) The NSG associated with the destination VM's NIC has an implicit rule blocking ICMP that was not checked by the team, different from the subnet NSG.
C) The route for 10.20.0.0/16 is correctly present, but an NSG associated with the NIC of vm-alpha-01 blocks outbound traffic before it leaves the source VM.
D) The address space of the VNets is overlapping in a way not detected by the portal, causing silent packet drops on the data plane.
Scenario 2: Action Decision
The cause of a connectivity failure in a hub-and-spoke environment has been identified: the peering between VNet-Hub and VNet-Spoke-Finance has the Use remote gateways option enabled on the Spoke side, but the Allow gateway transit option is not enabled on the Hub side.
The environment has the following constraints:
- The VPN Gateway in VNet-Hub is actively in use by 6 other spoke VNets in production
- The scheduled maintenance window starts in 4 hours
- The network team has the Network Contributor role on VNet-Hub and VNet-Spoke-Finance
- The security team needs to be notified before any changes that affect the gateway
What is the correct action to take at this time?
A) Disable Use remote gateways on the Spoke-Finance side, recreate the peering from scratch on both sides and enable both options simultaneously during the maintenance window.
B) Immediately enable Allow gateway transit on the Hub side, as the change is non-disruptive for other already connected spokes and does not require a maintenance window.
C) Notify the security team, wait for the maintenance window and enable Allow gateway transit on the Hub side within the approved window.
D) Create a new VPN Gateway directly in VNet-Spoke-Finance to eliminate the dependency on the Hub and solve the problem without needing to change the existing peering.
Scenario 3: Root Cause
An engineer attempts to create a peering between VNet-Prod and VNet-Analytics, which are located in different regions and in distinct subscriptions within the same Microsoft Entra ID tenant. When trying to create the peering through the Azure portal, the engineer receives the following error message:
Error: AuthorizationFailed
Message: "The client 'eng-user@contoso.com' with object id '...' does not have
authorization to perform action
'Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write'
over scope '/subscriptions/sub-analytics-id/resourceGroups/rg-analytics/
providers/Microsoft.Network/virtualNetworks/VNet-Analytics'
or the scope is invalid."
The engineer checks their permissions and finds the following:
Subscription: sub-prod
Role: Network Contributor (scope: /subscriptions/sub-prod)
Subscription: sub-analytics
Role: Reader (scope: /subscriptions/sub-analytics)
The team additionally reports that the peering on the VNet-Prod side was already successfully created by the same engineer yesterday, and that both VNets have non-overlapping address spaces. The analytics team reported that no changes were made to the VNets in the last 72 hours.
What is the root cause of the observed error?
A) Peering between different subscriptions requires the engineer to have the Owner role on at least one of the subscriptions, and the Network Contributor role is not sufficient.
B) The engineer does not have the Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write permission over VNet-Analytics, as their role in the analytics subscription is only Reader.
C) The error occurs because the peering on the VNet-Prod side was created without the VNet-Analytics side existing simultaneously, and Azure invalidates the first side after 24 hours without confirmation.
D) Peering between different regions in distinct subscriptions requires both VNets to be in the same resource group for cross permissions to be resolved correctly.
Scenario 4: Diagnostic Sequence
An engineer receives the following report: resources in VNet-Spoke-Dev cannot access the on-premises network through the VPN Gateway located in VNet-Hub, although other spokes are working normally. The peering between Hub and Spoke-Dev shows Connected status on both sides.
The available investigation steps are:
[P] Check effective routes on the NIC of a VM within VNet-Spoke-Dev
[Q] Verify if Allow gateway transit is enabled on the Hub side peering
[R] Confirm if Use remote gateways is enabled on the Spoke-Dev side peering
[S] Confirm that the VPN Gateway in VNet-Hub is in Running state
[T] Compare Spoke-Dev peering configurations with those of a working spoke
Which investigation sequence represents the most efficient diagnostic reasoning?
A) S, Q, R, T, P
B) P, S, Q, R, T
C) T, Q, R, P, S
D) Q, R, S, P, T
Answer Key and Explanations
Answer Key: Scenario 1
Answer: B
The decisive clue is the distinction between the subnet NSG and the NIC NSG. The team checked the NSG associated with the destination subnet and found no blocks, but did not check the NSG associated directly with the destination VM's NIC. In Azure, an NSG can be associated with a subnet, with a NIC, or with both. For inbound traffic, the subnet NSG is evaluated first and the NIC NSG second, and the traffic must be allowed by both. An NSG on the NIC can therefore block traffic that has already passed the subnet NSG.
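The two-level check can be illustrated with a small model. This is a simplified sketch, not Azure's implementation: the `Rule` class, the rule sets, and the priority values are hypothetical, and real NSG rules also match on source, destination, and port.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int   # lower number is evaluated first
    protocol: str   # "Icmp", "Tcp", "Udp", or "*" for any protocol
    access: str     # "Allow" or "Deny"

def evaluate(rules, protocol):
    """Return the access of the first rule, in priority order, that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.protocol in ("*", protocol):
            return rule.access
    return "Deny"  # nothing matched: treat as denied in this model

def inbound_allowed(subnet_nsg, nic_nsg, protocol):
    """Inbound traffic must be allowed by the subnet NSG and by the NIC NSG."""
    return all(evaluate(nsg, protocol) == "Allow" for nsg in (subnet_nsg, nic_nsg))

# Hypothetical rule sets: the subnet NSG allows everything,
# while the NIC NSG carries a higher-priority ICMP deny.
subnet_nsg = [Rule(65000, "*", "Allow")]
nic_nsg = [Rule(200, "Icmp", "Deny"), Rule(65000, "*", "Allow")]

print(inbound_allowed(subnet_nsg, nic_nsg, "Icmp"))  # False
print(inbound_allowed(subnet_nsg, nic_nsg, "Tcp"))   # True
```

Reviewing only `subnet_nsg` in this model would show no ICMP block, which mirrors what the team observed.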
The information about the VM migration to another subnet three days ago is intentionally irrelevant: the effective routes confirm that 10.20.0.0/16 is accessible via VNetPeering, which rules out any routing problem resulting from the subnet change. The peering is working on both the control plane and data plane for the correct prefix.
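As a rough illustration of how the effective routes apply to the destination address, the sketch below selects the route with the longest matching prefix. The prefixes and next hop types come from the scenario; the function name `next_hop_type` is hypothetical, and the model ignores route sources and precedence between user-defined, BGP, and system routes.

```python
import ipaddress

# Effective routes from the scenario: (address prefix, next hop type)
EFFECTIVE_ROUTES = [
    ("10.10.0.0/16", "VnetLocal"),
    ("10.20.0.0/16", "VNetPeering"),
    ("0.0.0.0/0", "Internet"),
]

def next_hop_type(destination, routes=EFFECTIVE_ROUTES):
    """Pick the route with the longest prefix that contains the destination."""
    ip = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if ip in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    _, hop = max(candidates, key=lambda item: item[0].prefixlen)
    return hop

print(next_hop_type("10.20.1.4"))  # VNetPeering
print(next_hop_type("10.10.1.4"))  # VnetLocal
```

Because the /16 prefix covers every subnet in VNet-Beta, moving vm-beta-01 to another subnet does not change which route is selected.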
Alternative A represents the diagnostic error of resorting to the most drastic action (recreating the peering) without exhausting less invasive hypotheses. Alternative D is impossible given that the address spaces are visibly non-overlapping (10.10.0.0/16 and 10.20.0.0/16). Alternative C cannot be confirmed or ruled out from the effective routes alone, because effective routes do not reflect NSG rules: an outbound block on the source NIC would leave the routes unchanged and produce the same inconclusive traceroute. However, the scenario points to the destination, where two levels of NSG exist and only one was verified.
The most dangerous distractor is A: acting by recreating the peering would interrupt connectivity for both VNets during the process, causing unnecessary impact in production.
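The non-overlap claim used to dismiss alternative D can be checked directly. This is a minimal sketch using Python's standard `ipaddress` module; the helper name `address_spaces_overlap` is hypothetical, and the second call uses a made-up prefix only to show what an overlap looks like.

```python
import ipaddress

def address_spaces_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share at least one address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

# Address spaces from the scenario table
print(address_spaces_overlap("10.10.0.0/16", "10.20.0.0/16"))    # False

# Hypothetical overlapping pair, for contrast
print(address_spaces_overlap("10.10.0.0/16", "10.10.128.0/17"))  # True
```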
Answer Key: Scenario 2
Answer: C
The cause has already been identified and the technical correction is simple: enable Allow gateway transit on the Hub side. However, the scenario establishes two critical constraints that make immediate action (alternative B) incorrect:
- The security team needs to be notified before any changes that affect the gateway.
- There is a scheduled maintenance window in 4 hours.
Although enabling Allow gateway transit is technically non-disruptive for already connected spokes, ignoring the security team notification process violates an explicit scenario constraint. In regulated environments or those with change management processes, executing changes outside the approved process is an operational error regardless of technical impact.
Alternative A adds unnecessary complexity: recreating the peering from scratch would involve temporary unavailability and is not required by the identified cause. Alternative D solves the problem through a technically valid but disproportionate path: creating a second VPN Gateway has high cost, significant provisioning time and contradicts the adopted hub-and-spoke design. The most dangerous distractor is B: the action is technically correct, but executed ignoring the governance process defined in the statement.
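The mismatch identified in this scenario can be expressed as a consistency check between the two sides of a peering. The sketch below is illustrative only: `PeeringSide` and `gateway_transit_issues` are hypothetical names, and the two boolean fields simply mirror the Allow gateway transit and Use remote gateways options.

```python
from dataclasses import dataclass

@dataclass
class PeeringSide:
    vnet: str
    allow_gateway_transit: bool = False   # set on the side that owns the gateway
    use_remote_gateways: bool = False     # set on the side that borrows the gateway

def gateway_transit_issues(hub_side, spoke_side):
    """Return human-readable mismatches between the two gateway options."""
    issues = []
    if spoke_side.use_remote_gateways and not hub_side.allow_gateway_transit:
        issues.append(
            f"{spoke_side.vnet} uses remote gateways but "
            f"{hub_side.vnet} does not allow gateway transit"
        )
    if hub_side.allow_gateway_transit and not spoke_side.use_remote_gateways:
        issues.append(
            f"{hub_side.vnet} allows gateway transit but "
            f"{spoke_side.vnet} does not use remote gateways"
        )
    return issues

# State described in the scenario
hub = PeeringSide("VNet-Hub", allow_gateway_transit=False)
spoke = PeeringSide("VNet-Spoke-Finance", use_remote_gateways=True)
for issue in gateway_transit_issues(hub, spoke):
    print(issue)
```

A check like this only detects the misconfiguration. When to apply the fix is still governed by the notification and maintenance window constraints.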
Answer Key: Scenario 3
Answer: B
The error message is direct and self-explanatory: AuthorizationFailed for the action virtualNetworkPeerings/write over the VNet-Analytics scope in the analytics subscription. The permissions listing confirms that the engineer has only the Reader role in the sub-analytics subscription, which does not grant write permission over network resources.
Creating a peering between subscriptions requires the responsible party to have write permission over the VNet on each side of the peering. The VNet-Prod side was created successfully because the engineer has Network Contributor on that subscription. The VNet-Analytics side fails because Reader does not include the necessary write action.
The information about 72 hours without changes and non-overlapping address spaces is intentionally irrelevant: neither factor influences the authorization error. Alternative A represents a requirement escalation error: Owner is not necessary, only the write action over the specific VNet. Alternative C describes a behavior that does not exist in Azure: peerings successfully created on one side do not expire and are not automatically invalidated. Alternative D is also incorrect: peered VNets do not need to share a resource group. The most dangerous distractor is A: it would lead the engineer to request excessive permissions, violating the principle of least privilege.
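The authorization outcome can be reasoned about with a simplified model of role assignments. This sketch is an approximation, not Azure's RBAC engine: `ROLE_ACTIONS` lists only an illustrative subset of each built-in role's actions, the subscription IDs are placeholders, and the `rg-prod` resource group in the VNet-Prod path is hypothetical.

```python
from fnmatch import fnmatchcase

# Illustrative subset of the actions granted by each role (assumption)
ROLE_ACTIONS = {
    "Reader": ["*/read"],
    "Network Contributor": ["Microsoft.Network/*"],
}

def is_authorized(assignments, action, scope):
    """True if any assignment covers the scope and grants the action."""
    action, scope = action.lower(), scope.lower()
    for role, assigned_scope in assignments:
        assigned_scope = assigned_scope.lower().rstrip("/")
        # An assignment applies to its own scope and to every child scope.
        inherited = scope == assigned_scope or scope.startswith(assigned_scope + "/")
        granted = any(fnmatchcase(action, p.lower()) for p in ROLE_ACTIONS[role])
        if inherited and granted:
            return True
    return False

PEERING_WRITE = "Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write"
ASSIGNMENTS = [
    ("Network Contributor", "/subscriptions/sub-prod-id"),
    ("Reader", "/subscriptions/sub-analytics-id"),
]
VNET_ANALYTICS = ("/subscriptions/sub-analytics-id/resourceGroups/rg-analytics/"
                  "providers/Microsoft.Network/virtualNetworks/VNet-Analytics")
VNET_PROD = ("/subscriptions/sub-prod-id/resourceGroups/rg-prod/"
             "providers/Microsoft.Network/virtualNetworks/VNet-Prod")

print(is_authorized(ASSIGNMENTS, PEERING_WRITE, VNET_PROD))       # True
print(is_authorized(ASSIGNMENTS, PEERING_WRITE, VNET_ANALYTICS))  # False
```

In this model, granting a role that includes the peering write action at the VNet-Analytics scope would be enough to flip the second result, without requiring Owner.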
Answer Key: Scenario 4
Answer: A
The correct sequence is S, Q, R, T, P, and the reasoning follows the logic of progressive elimination from simplest to most specific:
- S first: confirming that the VPN Gateway is operational eliminates the shared infrastructure failure hypothesis before investigating specific peering configurations.
- Q second: verifying Allow gateway transit on the Hub identifies if transit permission is granted by the side that owns the gateway.
- R third: verifying Use remote gateways on Spoke-Dev confirms if the spoke is configured to use the remote gateway.
- T fourth: comparing with a working spoke is a validation step by contrast, useful after confirming whether individual options are configured or not, revealing subtle configuration discrepancies.
- P last: checking effective routes is the final confirmation step, which validates if the data plane correctly reflects the control plane configurations.
Alternative B (P, S, Q, R, T) starts with effective routes, which would show the symptom but not guide the investigation before checking the base infrastructure. Alternative C starts with comparison, which is useful but assumes the engineer already knows what to compare. Alternative D inverts the logical order by investigating peering configurations before confirming the gateway is available.
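Step T, the comparison with a working spoke, amounts to a field-by-field diff of two peering configurations. The sketch below assumes the configurations have already been collected into dictionaries. The keys and values are hypothetical, since the scenario does not state which setting differs on Spoke-Dev.

```python
def config_diff(working, suspect):
    """Return {setting: (working value, suspect value)} for settings that differ."""
    keys = sorted(set(working) | set(suspect))
    return {
        key: (working.get(key), suspect.get(key))
        for key in keys
        if working.get(key) != suspect.get(key)
    }

# Hypothetical peering settings, for illustration only
working_spoke = {
    "allow_virtual_network_access": True,
    "allow_forwarded_traffic": True,
    "use_remote_gateways": True,
}
spoke_dev = {
    "allow_virtual_network_access": True,
    "allow_forwarded_traffic": True,
    "use_remote_gateways": False,
}

print(config_diff(working_spoke, spoke_dev))
# {'use_remote_gateways': (True, False)}
```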
Troubleshooting Tree: Implement VNet Peering
Color legend:
| Color | Node type |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Intermediate validation or verification |
To use this tree when facing a real problem, start with the root node describing the connectivity failure symptom. Answer each question based on what you observe in the environment: portal status, effective routes on the NIC, active NSG rules, peering configurations on both sides. Follow the path that corresponds to what you see, without skipping validation steps. Each orange node represents a point where you collect evidence before deciding the next step. Red nodes confirm the cause and green nodes indicate the corrective action to execute.