Troubleshooting Lab: Specify Azure Requirements for Always On VPN

Diagnostic Scenarios​

Scenario 1: Root Cause

An organization has deployed Always On VPN with User Tunnel for approximately 200 remote employees. The VPN gateway in Azure uses the VpnGw2 SKU and is configured for authentication via Microsoft Entra ID with OpenVPN protocol. VPN profiles were distributed via Intune three weeks ago and worked correctly until yesterday.

Today, approximately 40% of users report that the User Tunnel is not automatically established after logon. The remaining 60% report no issues. The network team verified that there were no changes to the VPN gateway or to the profiles distributed by Intune. The security team reports that a Microsoft Entra Conditional Access policy was updated last night to require compliant devices for access to all corporate applications, including the Azure VPN application registered in Entra ID.

Affected users report the following behavior when trying to connect manually:

Connecting to VPN...
Authentication failed. Error: 0x80070522
The VPN connection requires device compliance. Contact your administrator.

The network administrator checks the gateway logs and finds no rejected connection attempts on the Azure side. The helpdesk team reports that all affected devices are newly acquired models that are still completing enrollment in Intune.

What is the root cause of the observed problem?

A) The VpnGw2 SKU has reached the P2S simultaneous connection limit, causing intermittent rejection of new connections.

B) The affected devices do not satisfy the compliance condition required by the updated Microsoft Entra Conditional Access policy, preventing User Tunnel authentication.

C) The OpenVPN protocol is failing at the gateway after the Conditional Access policy update, affecting only part of the users randomly.

D) The VPN profiles distributed by Intune expired after three weeks of use, requiring redistribution to affected devices.


Scenario 2: Action Decision

The infrastructure team identified that the Device Tunnel of Always On VPN stopped working on a group of devices after an internal CA migration. The cause was confirmed: the root certificate of the new CA was not added as a trusted root certificate on the Azure VPN gateway. The affected devices continue with the machine certificate issued by the new CA, but the gateway rejects authentication.

The environment has the following constraints:

  • It's 8:30 AM on a Monday and the production environment is in full use
  • The affected Device Tunnel is the only channel through which domain controllers are reached to process Group Policies for remote devices
  • The PKI team has the new CA root certificate in .cer format ready for use
  • Recreating the VPN gateway would take approximately 45 minutes of total downtime
  • Adding a trusted root certificate to the VPN gateway does not require recreation and does not cause interruption of active connections

What is the correct action to take at this moment?

A) Recreate the Azure VPN gateway with the updated configuration including the new CA root certificate, accepting the 45 minutes of downtime.

B) Revoke certificates issued by the new CA and reissue all machine certificates from the previous CA until a maintenance window is scheduled.

C) Add the new CA root certificate as a trusted root certificate on the Azure VPN gateway immediately, without interrupting active connections.

D) Wait for a nighttime maintenance window to apply any changes to the VPN gateway, as production modifications during the day are contraindicated.


Scenario 3: Root Cause

An administrator is deploying Always On VPN for the first time in a branch office with 30 Windows 11 devices joined to a local Active Directory domain. The configuration includes Device Tunnel and User Tunnel. The Azure VPN gateway uses IKEv2 for Device Tunnel and OpenVPN for User Tunnel.

After distributing profiles via Intune, the User Tunnel works correctly on all devices. The Device Tunnel, however, is not established on any of them. The administrator runs the following command on one of the affected devices:

Get-VpnConnection -AllUserConnection

Name : CorpDeviceTunnel
ServerAddress : vpngw-corp.vpn.azure.com
TunnelType : Ikev2
AuthenticationMethod : MachineCertificate
ConnectionStatus : Disconnected

Next, they check the certificate store:

Get-ChildItem -Path Cert:\LocalMachine\My | Select-Object Subject, EnhancedKeyUsageList

Subject           EnhancedKeyUsageList
-------           --------------------
CN=DESKTOP-A1B2C3 {Client Authentication}

The administrator also confirms that the Device Tunnel profile was created as an All User Connection and that the VPN service is running. The help desk reports that the devices are running Windows 11 version 22H2 and that the antivirus was updated the previous week.

What is the root cause of the Device Tunnel failure?

A) The recently updated antivirus is blocking IKEv2 communication on UDP port 500, preventing tunnel negotiation.

B) The Device Tunnel profile was created as All User Connection, which is an incorrect configuration that prevents its functioning; it should be created as an individual user connection.

C) The machine certificate in the LocalMachine\My store has only the Client Authentication EKU, but Device Tunnel also requires the Server Authentication EKU on that certificate.

D) The Device Tunnel requires that the account running the VPN service has local system (SYSTEM) privileges, and the All User Connection profile is the correct configuration; the cause is that the machine certificate does not have the Subject Alternative Name (SAN) attribute corresponding to the device's DNS name.


Scenario 4: Diagnostic Sequence

An administrator receives a report that a specific user's Always On VPN does not automatically reconnect after the device exits hibernation mode. The User Tunnel appears as disconnected and does not attempt reconnection. The Device Tunnel remains active.

The steps below represent valid diagnostic actions, but they are out of order:

  1. Check VPN client logs in %ProgramData%\Microsoft\VpnClient\Logs to identify the error code returned on reconnection attempt
  2. Confirm if the User Tunnel profile has the AlwaysOn property set to true in the XML profile distributed by Intune
  3. Verify if there is a Microsoft Entra Conditional Access policy that requires periodic reauthentication and if the user's token expired during hibernation
  4. Test manual User Tunnel connectivity to confirm if the problem is in automatic reconnection or in authentication itself
  5. Confirm that the RasMan (Remote Access Connection Manager) service restarted correctly after exiting hibernation and is in Running state

What is the correct investigation sequence?

A) 2 -> 5 -> 1 -> 4 -> 3

B) 1 -> 3 -> 2 -> 5 -> 4

C) 5 -> 1 -> 2 -> 4 -> 3

D) 4 -> 1 -> 5 -> 2 -> 3


Answer Key and Explanations​

Answer Key: Scenario 1

Answer: B

The decisive clue in the statement is the combination of two facts: the Microsoft Entra Conditional Access policy was updated to require device compliance, and the affected devices are newly acquired models that have not yet completed enrollment in Intune. Devices that have not completed enrollment have no compliance state that Entra ID can verify, so they fail the Conditional Access condition and cannot complete User Tunnel authentication via OpenVPN.

The error message confirms the diagnosis: The VPN connection requires device compliance. The fact that the gateway does not register rejected attempts reinforces that the rejection occurs before reaching the gateway, at the Entra ID authentication layer.
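The compliance state that Conditional Access evaluates can be read from Intune for a single device. The following is a minimal sketch, assuming the Microsoft.Graph.DeviceManagement module is installed and the signed-in account is allowed to read managed devices; the device name is a hypothetical placeholder.

```powershell
# Assumption: Microsoft.Graph.DeviceManagement is installed and the account
# has consent for the scope below.
Connect-MgGraph -Scopes 'DeviceManagementManagedDevices.Read.All'

# Hypothetical device name, used only for illustration.
$deviceName = 'DESKTOP-EXAMPLE'

# Show the enrollment date, last sync, and compliance state Intune holds for the device.
Get-MgDeviceManagementManagedDevice -Filter "deviceName eq '$deviceName'" |
    Select-Object DeviceName, ComplianceState, EnrolledDateTime, LastSyncDateTime
```

If the query returns nothing, or a state other than compliant, the device cannot satisfy a policy that requires compliant devices, which matches the behavior described in this scenario.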

The irrelevant information in the statement is the three weeks during which the Intune profiles worked correctly. It leads the reader to look for a cause related to profile expiration or change, but Intune profiles do not expire based on how long they have been in use.

Alternative A is a plausible distractor, but the VpnGw2 SKU supports up to 500 simultaneous P2S connections over OpenVPN, well above the roughly 200 users in the scenario, and the failure is confined to a specific group of devices rather than occurring at random. Alternative C is incorrect because Conditional Access operates at the identity layer, not at the gateway protocol layer. Alternative D describes behavior that does not exist in the platform.

The most dangerous distractor is A: an administrator who starts by checking gateway capacity will waste considerable time before identifying that the problem is device compliance.


Answer Key: Scenario 2

Answer: C

The cause was identified and stated in the scenario: the new CA root certificate is not present as trusted in the VPN gateway. The correct solution is to add this certificate to the gateway immediately, as the scenario itself informs that this operation does not cause interruption of active connections and does not require gateway recreation.
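One way to perform this operation is with the Az.Network module. The sketch below assumes an authenticated session (Connect-AzAccount); the file path, gateway name, resource group, and certificate name are hypothetical placeholders, not values from the scenario.

```powershell
# Assumption: Az.Network is installed and Connect-AzAccount has already been run.
# All names and the file path below are hypothetical placeholders.
$cerPath       = 'C:\certs\NewRootCA.cer'
$gatewayName   = 'vpngw-example'
$resourceGroup = 'rg-network-example'

# The gateway expects the public certificate data as Base64 text.
$cert     = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2($cerPath)
$certData = [System.Convert]::ToBase64String($cert.RawData)

# Add the new CA root as a trusted root for point-to-site certificate authentication.
Add-AzVpnClientRootCertificate `
    -VpnClientRootCertificateName 'NewRootCA' `
    -VirtualNetworkGatewayName $gatewayName `
    -ResourceGroupName $resourceGroup `
    -PublicCertData $certData

# List the trusted roots afterwards to confirm the new entry is present.
Get-AzVpnClientRootCertificate -VirtualNetworkGatewayName $gatewayName -ResourceGroupName $resourceGroup
```

The previous CA root can stay in place during the transition so that devices still holding certificates from the old CA keep authenticating.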

The scenario constraints eliminate the other alternatives:

Alternative A proposes recreating the gateway, causing 45 minutes of total downtime during production hours. The correct operation does not require this, therefore alternative A represents choosing a maximum impact action when a no-impact alternative exists.

Alternative B proposes revoking the new CA certificates and reissuing them from the previous CA. Besides being complex and time-consuming, this action generates a larger PKI problem and does not adequately resolve the root cause.

Alternative D applies a non-existent restriction. Adding a trusted root certificate to the gateway is a non-disruptive operation that can and should be performed immediately when the cause is confirmed and the resource is available.

The real risk of choosing D is keeping the Device Tunnel inactive throughout the entire workday, preventing remote devices from reaching domain controllers and processing Group Policies, with cumulative impact growing throughout the day.


Answer Key: Scenario 3

Answer: D

The scenario contains irrelevant information purposely included: the antivirus update the previous week. The Device Tunnel fails on all 30 devices uniformly, which rules out a localized cause like antivirus blocking, which rarely affects 100% of devices identically.

The decisive clue is in the collected data: the machine certificate exists in the correct store (LocalMachine\My), has the Client Authentication EKU, and the profile is correctly configured as All User Connection. What the Get-ChildItem output does not show is whether the certificate contains a Subject Alternative Name (SAN). The Device Tunnel via IKEv2 requires that the machine certificate contain a SAN that corresponds to the computer name, as it is through this attribute that the gateway validates the machine's identity during IKEv2 negotiation.
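The command used in the scenario can be extended to display the SAN as well. This is a minimal sketch that reads the Subject Alternative Name extension (OID 2.5.29.17) from each certificate in the machine store; the output column names are illustrative choices.

```powershell
$sanOid = '2.5.29.17'   # OID of the Subject Alternative Name extension

Get-ChildItem -Path Cert:\LocalMachine\My | ForEach-Object {
    $cert = $_
    # Pick the SAN extension, if the certificate has one.
    $san  = $cert.Extensions | Where-Object { $_.Oid.Value -eq $sanOid }

    [pscustomobject]@{
        Subject = $cert.Subject
        EKU     = ($cert.EnhancedKeyUsageList.FriendlyName -join ', ')
        # Format($false) renders the extension as a single line of text.
        SAN     = if ($san) { $san.Format($false) } else { '(none)' }
    }
}
```

A certificate that shows (none) in the SAN column, or a SAN that does not match the computer name, would be consistent with the cause described above.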

Alternative A (antivirus) is the distractor based on irrelevant information. Alternative B is wrong because All User Connection is exactly the correct requirement for Device Tunnel. Alternative C is technically incorrect because Device Tunnel does not require Server Authentication EKU in the client's machine certificate; this EKU is necessary in the VPN server certificate.

The most dangerous distractor is A, as it leads the administrator to investigate and potentially disable antivirus in production without justification, creating an unnecessary security risk.


Answer Key: Scenario 4

Answer: A

The correct sequence is: 2 -> 5 -> 1 -> 4 -> 3

The correct diagnostic reasoning goes from the simplest and most verifiable to the most complex:

Step 2 comes first because it verifies if the profile is correctly configured with AlwaysOn = true. If this property is absent or false, no automatic reconnection will occur, and the other steps become unnecessary.
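As an illustration of step 2, the check can be scripted against the ProfileXML text. The function name Test-AlwaysOnEnabled and the sample profile are hypothetical; the sketch assumes the profile uses the VPNProfile root element with an AlwaysOn child element.

```powershell
function Test-AlwaysOnEnabled {
    param([Parameter(Mandatory)][string]$ProfileXml)

    $doc = [xml]$ProfileXml
    # A missing <AlwaysOn> element yields $null, which is treated as "not enabled".
    return ([string]$doc.VPNProfile.AlwaysOn).Trim() -eq 'true'
}

# Hypothetical, heavily shortened profile used only to exercise the function.
$sample = @'
<VPNProfile>
  <AlwaysOn>true</AlwaysOn>
  <DeviceTunnel>false</DeviceTunnel>
</VPNProfile>
'@

Test-AlwaysOnEnabled -ProfileXml $sample
```

Parsing the XML instead of searching for the text "AlwaysOn" avoids false positives when the word appears in a comment or another element.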

Step 5 comes next because the RasMan service is the system dependency that manages VPN connections. If it did not restart correctly after hibernation, the tunnel will not attempt to reconnect regardless of profile configuration.
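The service state in step 5 can be confirmed on the device with a single command, shown here as a minimal sketch:

```powershell
# RasMan is the service name of Remote Access Connection Manager.
Get-Service -Name RasMan | Select-Object Name, Status, StartType
```

A Status other than Running after the device resumes points to the service rather than to the profile or the identity layer.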

Step 1 follows because, with basic conditions confirmed, the logs will provide the exact error code, narrowing down the remaining hypotheses.
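In addition to the log files named in step 1, Windows records connection attempts from the built-in VPN client under the RasClient provider in the Application event log. The sketch below assumes the tunnel in question is handled by that client; the number of events retrieved is an arbitrary choice.

```powershell
# Retrieve the most recent RasClient events; the messages include the error code
# returned when a connection attempt fails.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'RasClient' } `
    -MaxEvents 20 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message
```

Comparing the event timestamps with the moment the device left hibernation shows whether a reconnection was attempted at all.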

Step 4 comes after the logs because it empirically confirms whether the problem is automatic reconnection or authentication, information necessary to decide whether to investigate the profile or the identity layer.

Step 3 is last because it requires correlating behavior with Entra ID policies, a more laborious verification that only makes sense after eliminating more local and verifiable causes.

Alternative B reverses the order and starts with logs before checking basic conditions, which is inefficient. Alternative C starts with the service, skipping the fastest and most fundamental profile verification. Alternative D starts with manual testing, which consumes more time and does not provide diagnostic information without the context of previous steps.


Troubleshooting Tree: Specify Azure Requirements for Always On VPN​

Color Legend:

Color: Node Type

  • Dark Blue: Initial symptom (entry point)
  • Blue: Diagnostic question
  • Red: Identified cause
  • Green: Recommended action or resolution
  • Orange: Intermediate verification or validation

To use this tree when facing a real problem, start at the root node by identifying the observed symptom, then answer each diagnostic question based on what is verifiable in the environment. Certificate questions should be validated in the device's certificate store with PowerShell before any action. Compliance questions should be validated in the Intune portal. Gateway questions should be verified directly in the Azure portal. Follow the path to an identified cause node before executing any corrective action, so that no intervention is based on assumptions.