Troubleshooting Lab: Integrate a virtual hub with a third-party NVA for cloud connectivity
Diagnostic Scenarios
Scenario 1: Root Cause
A company operates a Virtual WAN hub in the East US region with an integrated third-party NVA. The environment was configured three weeks ago and was functioning normally. After a configuration update performed by the security team on Friday night, branches connected via Site-to-Site VPN began reporting total loss of connectivity with workloads in spoke VNets on Monday morning.
The network team verified the following:
Hub: eastus-vwan-hub
NVA Status: Running (control plane healthy)
VPN Gateway: Active-Active, both instances UP
BGP Sessions (branch -> hub): Established
Spoke VNet A (10.10.0.0/16): Connected to hub
Spoke VNet B (10.20.0.0/16): Connected to hub
Routing Intent: Private Traffic -> NVA (enabled)
Connectivity test (branch 192.168.1.0/24 -> 10.10.1.5):
ICMP: Request timeout
Traceroute: Last hop reached = NVA internal IP (10.0.255.4)
Next hop: no response
The security team reported that the update involved reviewing the Network Security Groups associated with the NVA management subnet and rotating the credentials for the appliance's administrative panel. The vendor confirmed that the NVA is receiving packets correctly.
What is the root cause of the connectivity loss?
A) The administrative panel credential rotation corrupted the BGP session state between the NVA and the hub, bringing down the routes advertised to the spokes.
B) A rule in the NSG of the management subnet is blocking the data traffic passing through the NVA, preventing packet forwarding to the spoke VNets.
C) The NVA is receiving packets, but a forwarding configuration or internal appliance policy was altered during maintenance, preventing packets from being forwarded to the next destination.
D) The Routing Intent was automatically disabled after the hub update, causing routes to the spokes to stop being programmed in the VPN connections.
Scenario 2: Action Decision
The cause of a failure has been accurately identified: the Routing Intent of a production Virtual WAN hub was configured with the private traffic policy pointing to the wrong NVA. There are two NVAs integrated into the hub: NVA-A, intended for branch traffic, and NVA-B, intended exclusively for internet traffic. The private policy is pointing to NVA-B.
The operational context is as follows:
Environment: Active production
Current impact: Branches can reach spokes, but traffic is being inspected by the wrong NVA (NVA-B), with no visible connectivity loss for users
Approved maintenance window: Next Friday, 11 PM
Current time: Tuesday, 2 PM
Emergency change permission: Not granted
Critical dependency: NVA-B sustains all outbound internet traffic inspection for 3 mission-critical applications
What is the correct action to take at this moment?
A) Immediately redirect the Routing Intent for private traffic to NVA-A, as the incorrect configuration represents a security risk that justifies a change outside the maintenance window.
B) Wait for the approved maintenance window and document the current state, as there is no active connectivity loss, the change has no emergency approval, and a Routing Intent reconfiguration may temporarily affect NVA-B and the applications that depend on it.
C) Ask the NVA-B vendor to create a separate inspection policy within the appliance so that it can handle both types of traffic simultaneously until the maintenance window.
D) Create a static route in the hub pointing the spoke prefixes directly to NVA-A, bypassing the incorrectly configured Routing Intent without modifying it.
Scenario 3: Root Cause
An engineer is integrating a third-party NVA recently approved by Microsoft's partner program into an existing Virtual WAN hub in the West Europe region. The NVA resource provisioning in the portal was completed with Succeeded status. However, when trying to configure the Routing Intent for private traffic pointing to the new NVA, the portal shows the following behavior:
Routing Intent configuration attempt:
Policy: Private Traffic
Next-hop: NVA-WE-01 (recently provisioned)
Result:
Error: "The selected resource cannot be used as a next-hop for Routing Intent. Verify the NVA infrastructure unit configuration."
Existing hub:
Azure Firewall: Not configured
Previous Routing Intent: Not configured
NVA scale units: 0 (recently provisioned, no active units)
Active VPN connections: 4 branches
VNet connections: 6 spokes
The engineer verifies that the NVA resource appears in the hub resource list, and the vendor has confirmed that the Marketplace offering is at the latest version. The hub was created 18 months ago and has never had Routing Intent configured.
What is the root cause of the error when configuring Routing Intent?
A) The hub was created more than 12 months ago and requires an infrastructure version update before supporting Routing Intent with third-party NVAs.
B) The NVA was provisioned with zero active scale units, and Routing Intent requires that the NVA has at least one operational scale unit to be eligible as a next-hop.
C) The presence of active VPN and VNet connections in the hub prevents Routing Intent activation, as existing connections need to be removed and reconnected after intent configuration.
D) Routing Intent for private traffic can only be configured when there is also an Azure Firewall present in the hub, acting as a mandatory fallback.
Scenario 4: Collateral Impact
An engineer identifies that traffic between branches and spokes is being routed through the NVA integrated into the hub, but with abnormally high latency (an average of 180 ms of additional delay). After investigating, the engineer concludes that the NVA's scale units are undersized for the current traffic volume: the hub is processing 4.2 Gbps, and the NVA is configured with 2 scale units, each supporting up to 1 Gbps.
To resolve this, the engineer increases the scale units from 2 to 6 directly in the portal, during business hours, without notifying dependent teams.
The scale unit increase is applied successfully and the latency returns to normal.
What secondary consequence can this action cause?
A) Increasing scale units automatically reconfigures the BGP ASNs used by the NVA, requiring all BGP sessions with branches to be manually reestablished.
B) During the process of adding new instances to the scale units, a brief redistribution of active connections between instances may occur, causing microinterruptions or TCP session resets in long-duration flows.
C) Increasing scale units above 4 units automatically disables the Routing Intent for private traffic, requiring manual reconfiguration after the operation.
D) The new scale units are added to a different subnet within the hub, changing the NVA's internal IPs and invalidating all static routes configured in the spoke VNets.
Answer Key and Explanations
Answer Key: Scenario 1
Answer: C
The decisive clue is in the traceroute: packets reach the NVA's internal IP and stop there, with no response from the next hop. This confirms that the NVA's control plane is healthy (BGP established, Running status, NVA receiving packets), but the data plane is failing in the appliance's internal forwarding.
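The traceroute reasoning above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `locate_failure`, the hop-list representation (one responding IP per hop, `None` for a timeout), and the returned messages are hypothetical and not part of any Azure tooling. The IP addresses are the ones given in the scenario.

```python
from typing import Optional, Sequence


def locate_failure(hops: Sequence[Optional[str]], nva_ip: str, destination: str) -> str:
    """Classify where a traceroute stops relative to the NVA.

    hops holds the responding IP for each hop, or None when the hop timed out.
    """
    responding = [hop for hop in hops if hop is not None]
    if responding and responding[-1] == destination:
        return "path healthy"
    if nva_ip in responding:
        if responding[-1] == nva_ip:
            # Packets arrive at the appliance and go no further.
            return "stops at NVA: check appliance forwarding and policy"
        return "fails after NVA: check the path beyond the appliance"
    return "never reaches NVA: check hub routing and the VPN path"


# Scenario 1: the last responding hop is the NVA internal IP, then silence.
print(locate_failure(["10.0.255.4", None], nva_ip="10.0.255.4", destination="10.10.1.5"))
```

The point of the sketch is that the last responding hop, not the list of recently changed components, decides where to look first.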
The information about administrative panel credential rotation is the relevant clue pointing to an alteration in the NVA's internal configuration during maintenance: a forwarding policy, an internal ACL, or a routing rule within the appliance may have been modified or reset during administrative access.
The irrelevant information in the scenario is the review of management subnet NSGs: NSGs on management subnets control access to the NVA's control plane, not the data flow passing through the appliance. Since BGP is established and the NVA is receiving packets, the NSG is clearly not the blocker.
The most dangerous distractor is Alternative B, which focuses on the NSG precisely because it was mentioned in the maintenance. This is the classic pattern of confusing what was changed with what caused the problem. Acting on this hypothesis would lead to modifying NSGs without effect, delaying the real diagnosis.
Answer Key: Scenario 2
Answer: B
The cause is already identified and stated in the problem. The question is exclusively about which action to take given the set of constraints. The critical constraints are: no approval for emergency change, no active connectivity loss for users, and a real dependency on NVA-B by mission-critical applications.
Reconfiguring the Routing Intent is an operation that momentarily affects the hub's data plane. Executing it without an approved window and without notifying dependent teams puts the three applications that depend on NVA-B for internet inspection at risk, even though the goal is only to move private traffic to NVA-A.
Alternative A reaches the correct end state but is wrong in timing and context, because it ignores the governance constraints and the real risk to NVA-B. This is the most common error in this type of scenario: prioritizing technical correction over operational risk management.
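The way these constraints combine can be expressed as a short decision function. It is a sketch of the reasoning in this scenario, not a change-management standard: the `ChangeContext` fields, the `decide` function, and the "request emergency approval" branch are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeContext:
    active_outage: bool              # are users currently losing connectivity?
    emergency_approved: bool         # has an emergency change been granted?
    inside_window: bool              # are we inside the approved maintenance window?
    shared_dependency_at_risk: bool  # does the change touch something others rely on?


def decide(ctx: ChangeContext) -> str:
    """Return the next step for a change whose root cause is already known."""
    if not (ctx.inside_window or ctx.emergency_approved):
        if ctx.active_outage:
            # Assumed policy: an outage justifies asking for approval, not skipping it.
            return "request emergency approval"
        return "wait for window and document current state"
    if ctx.shared_dependency_at_risk:
        return "apply change after notifying dependent teams"
    return "apply change"


# Scenario 2: Tuesday 2 PM, no outage, no emergency approval, NVA-B is shared.
scenario_2 = ChangeContext(
    active_outage=False,
    emergency_approved=False,
    inside_window=False,
    shared_dependency_at_risk=True,
)
print(decide(scenario_2))
```

Note that the correctness of the target configuration never appears as an input: the decision depends only on authorization, impact, and risk.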
Alternative D is technically problematic because creating static routes that overlap with Routing Intent introduces inconsistency in the hub's routing model and can generate unpredictable behaviors when Routing Intent is corrected in the maintenance window.
Answer Key: Scenario 3
Answer: B
The error returned by the portal is direct: "verify the NVA infrastructure unit configuration". The confirmed evidence in the hub state is that scale units are configured as 0. An NVA provisioned without active scale units has no data plane capacity and therefore cannot be eligible as a next-hop for Routing Intent. The "Succeeded" provisioning status indicates only that the resource was registered in the hub, not that it is operational for traffic forwarding.
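The distinction between "provisioned" and "eligible" can be captured in a one-line check. This is a minimal sketch of the rule as the scenario states it; `next_hop_eligible` is a hypothetical helper and does not correspond to an Azure API call.

```python
def next_hop_eligible(provisioning_state: str, scale_units: int) -> bool:
    """An NVA qualifies as a next hop only if it is registered AND has capacity."""
    return provisioning_state == "Succeeded" and scale_units >= 1


# NVA-WE-01 in the scenario: provisioning succeeded, but zero scale units.
print(next_hop_eligible("Succeeded", 0))
```

Checking both conditions separately avoids reading a "Succeeded" status as proof that the appliance can forward traffic.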
The irrelevant information is the hub's age (18 months without Routing Intent configured). Compatibility with Routing Intent is not determined by the hub's creation date, but by the internal infrastructure version, which is automatically managed by Microsoft. Focusing on this detail would lead the engineer to open an unnecessary ticket.
The most dangerous distractor is Alternative C, which implies that existing connections would need to be removed. Acting on this hypothesis would cause real connectivity disruption for the 4 branches and 6 spokes connected, without solving the problem. The presence of active connections does not prevent Routing Intent activation in an existing hub.
Answer Key: Scenario 4
Answer: B
When new instances are added to the scale units pool of an NVA integrated into a Virtual WAN hub, the internal load balancing redistributes active flows between existing and new instances. Long-duration flows, such as persistent TCP sessions, ongoing file transfers, or stateful application connections, may experience microinterruptions or resets during this redistribution.
This is the real and relevant collateral impact for teams operating applications sensitive to session interruptions, and is exactly the type of consequence that justifies prior notification and execution in a maintenance window, even for operations that seem to just "increase capacity".
Alternative A is incorrect because scale units do not alter BGP configurations or ASNs; the control plane remains stable. Alternative C is incorrect because Routing Intent does not have a scale unit limit that automatically disables it. Alternative D is incorrect because new instances are added to the same logical pool managed by the hub, with no addressing changes visible to the spoke VNets.
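The sizing figures from Scenario 4 can be checked with simple arithmetic. The sketch below uses the scenario's numbers (4.2 Gbps of traffic, 1 Gbps per scale unit, 2 units configured); the `units_needed` helper and the 25% headroom value are assumptions added for illustration, not figures from the scenario.

```python
import math


def units_needed(traffic_gbps: float, per_unit_gbps: float, headroom: float = 0.0) -> int:
    """Smallest number of scale units that covers the traffic plus optional headroom."""
    return math.ceil(traffic_gbps * (1 + headroom) / per_unit_gbps)


traffic = 4.2    # Gbps processed by the hub (from the scenario)
per_unit = 1.0   # Gbps per scale unit (from the scenario)
configured = 2   # scale units before the change (from the scenario)

utilization = traffic / (configured * per_unit)
print(f"utilization with {configured} units: {utilization:.0%}")
print("minimum units:", units_needed(traffic, per_unit))
print("units with assumed 25% headroom:", units_needed(traffic, per_unit, headroom=0.25))
```

With these inputs the appliance was running at more than twice its rated capacity, and at least 5 units are required just to match the current load. The capacity decision itself was sound; the collateral risk comes from how and when the change was applied.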
Troubleshooting Tree: Integrate a virtual hub with a third-party NVA for cloud connectivity
Color Legend:
| Color | Meaning |
|---|---|
| Dark blue | Initial symptom (root node) |
| Blue | Diagnostic question |
| Red | Identified cause |
| Green | Recommended action or resolution |
To use this tree when facing a real problem, start with the root node that describes the observed symptom and answer each diagnostic question based on what you can verify directly in the portal, NVA logs, or traceroute results. Each branch eliminates a class of hypotheses and progressively leads you to the cause or corrective action, avoiding hasty interventions based on the most obvious cause or the last item changed in the environment.
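The traversal described above can be sketched as a small data structure. The node types mirror the color legend (question, cause, action), but the specific questions and outcomes below are hypothetical, loosely based on Scenarios 1 and 3, and are not a reproduction of the tree itself.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Outcome:
    cause: str   # "identified cause" node in the legend
    action: str  # "recommended action" node in the legend


@dataclass
class Question:
    text: str    # "diagnostic question" node in the legend
    yes: "Node"
    no: "Node"


Node = Union[Question, Outcome]


def walk(node: Node, answers: Dict[str, bool]) -> Outcome:
    """Follow verified yes/no answers from the root until a cause is reached."""
    while isinstance(node, Question):
        node = node.yes if answers[node.text] else node.no
    return node


# Illustrative branches only; a real tree would have more questions.
TREE = Question(
    "NVA has at least one active scale unit?",
    yes=Question(
        "Traceroute stops at the NVA internal IP?",
        yes=Outcome("NVA receives but does not forward packets",
                    "Review forwarding configuration and policies inside the appliance"),
        no=Outcome("Failure is outside the appliance",
                   "Review the Routing Intent policy and hub routes"),
    ),
    no=Outcome("NVA has no data plane capacity",
               "Add scale units before selecting it as a next hop"),
)

result = walk(TREE, {
    "NVA has at least one active scale unit?": True,
    "Traceroute stops at the NVA internal IP?": True,
})
print(result.cause, "->", result.action)
```

Each answer must come from something directly observed; the structure deliberately has no place to record "what was changed last".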