Troubleshooting Lab: Configure encryption at host for Azure virtual machines

Diagnostic Scenarios

Scenario 1: Root Cause

An administrator receives a security request to enable encryption at host on an existing production VM called vm-prod-app01, which is in the East US 2 region and uses the Standard_D4s_v3 size. The VM is configured with two managed data disks and an OS disk, all using SSE with PMK. The subscription's Key Vault is configured with soft delete and purge protection enabled.

The administrator runs the following command:

az vm update \
--resource-group rg-producao \
--name vm-prod-app01 \
--set securityProfile.encryptionAtHost=true

The returned output is:

(BadRequest) EncryptionAtHost is not supported for the VM size Standard_D4s_v3 
in this region. Please deallocate the VM and retry with a supported VM size.
Code: BadRequest

The administrator verifies that the EncryptionAtHost feature flag has the status Registered on the subscription and that the Microsoft.Compute provider was re-registered correctly after the feature was registered. The infrastructure team confirms that another VM in the same subscription, with size Standard_E4s_v3, had encryption at host enabled without problems the previous week.

What is the root cause of the observed error?

A) The VM was not deallocated before running the az vm update command, which prevents any changes to the security profile.

B) The Standard_D4s_v3 size does not support encryption at host in the East US 2 region, and the cause is unrelated to feature registration or Key Vault configuration.

C) The Key Vault needs to have an explicit access policy for the VM's managed identity before encryption at host can be enabled.

D) The Microsoft.Compute provider was registered, but the EncryptionAtHost feature flag requires a second propagation cycle of up to 24 hours before being fully operational.


Scenario 2: Action Decision

The security team identified that a critical database VM (vm-sqlprod-01) is operating without encryption at host enabled. The cause was confirmed: the feature has never been enabled on the VM since its creation eight months ago. The subscription already has the feature flag registered and the provider updated. The VM size is Standard_M32ms, which appears on the list of supported SKUs for encryption at host.

The operational context is as follows:

Factor | Detail
Current VM state | Running
Available maintenance window | Next window in 6 days
Impact of immediate stop | Interruption of active production transactions
Deallocation permission | Requires change management approval
Urgency declared by security | High, but without immediate SLA deadline

What is the correct action to take at this moment?

A) Execute az vm update with the securityProfile.encryptionAtHost=true parameter immediately, since VMs with a managed OS disk accept this change without deallocation.

B) Deallocate the VM now without formal approval, since the high security risk declared by the security team justifies bypassing the change management process.

C) Submit a request through the change management process, await approval, and execute the deallocation and encryption at host enablement within the maintenance window in 6 days.

D) Enable Azure Disk Encryption (ADE) immediately as a substitute for encryption at host, since ADE can be applied without deallocation and covers the same protection scope.


Scenario 3: Root Cause

A Windows VM called vm-dev-win01 is configured with encryption at host enabled and uses a Disk Encryption Set (des-cmk-dev) with customer-managed keys stored in the Key Vault kv-dev-eastus. The VM was provisioned three weeks ago without problems.

This morning, the operations team reports that they cannot start the VM. The Azure portal displays the following status:

Provisioning State: Failed
Status Message: The key vault key used for disk encryption
is currently not accessible. Ensure that the key vault is
not soft-deleted and has not had its access revoked.

The administrator verifies in the portal that the Key Vault kv-dev-eastus is visible and active. They also confirm that the Disk Encryption Set des-cmk-dev appears as configured on the VM. Yesterday afternoon, a security engineer rotated the Key Vault keys as part of a quarterly credential renewal process. The same engineer also disabled public network access to the Key Vault as an additional hardening measure, so that the vault would be reachable only through a Private Endpoint. The Private Endpoint has not yet been provisioned.

What is the root cause of the VM start failure?

A) Key rotation in the Key Vault automatically invalidated the previous version of the key referenced by the Disk Encryption Set, making the disks inaccessible.

B) Public network access to the Key Vault was removed without the Private Endpoint being available, making the Key Vault inaccessible to the Azure Compute control plane.

C) The Disk Encryption Set lost its association with the Key Vault after key rotation, requiring manual reconfiguration of the key reference.

D) The key rotation process corrupted the encryption metadata of the managed disks, requiring restoration from snapshot.


Scenario 4: Diagnostic Sequence

An administrator receives an alert: the creation of a new VM with encryption at host enabled is failing in a recently provisioned subscription. They need to diagnose the cause efficiently.

The available investigation steps are:

  • P1: Verify if the selected VM SKU is on the list of sizes that support encryption at host in the target region.
  • P2: Execute az feature show --namespace Microsoft.Compute --name EncryptionAtHost to check the feature flag status.
  • P3: Execute az provider show --namespace Microsoft.Compute --query "registrationState" to verify the provider state after feature registration.
  • P4: Try creating the VM again with the same parameter and capture the complete error message to identify the specific failure code.
  • P5: Check in the portal if there's an Azure Policy assigned to the subscription scope that denies or audits VM creation without encryption at host, as this could block the operation even with everything configured.

Which diagnostic sequence is the most efficient and logically correct?

A) P4 → P2 → P3 → P1 → P5

B) P2 → P3 → P1 → P4 → P5

C) P1 → P5 → P2 → P4 → P3

D) P5 → P1 → P4 → P2 → P3


Answer Key and Explanations

Answer Key: Scenario 1

Answer: B

The error message is explicit in indicating that the Standard_D4s_v3 size does not support encryption at host in the East US 2 region. Encryption at host support depends on both the VM SKU and regional availability, and not all D-series sizes support the feature in all regions.

The confirming clue is in the command output itself: the BadRequest code with the description "EncryptionAtHost is not supported for the VM size Standard_D4s_v3". The information about the Standard_E4s_v3 VM in the same subscription reinforces that the problem is specific to the SKU, not related to the subscription or the environment.

The information about the Key Vault with soft delete and purge protection is irrelevant in this scenario. These attributes are prerequisites for using customer-managed keys (CMK), but this VM is using PMK, and the failure occurred before any interaction with the Key Vault.

The most dangerous distractor is A, which attributes the error to the absence of deallocation. The returned error is BadRequest due to SKU incompatibility, not a VM state error. Acting based on distractor A would make the administrator unnecessarily deallocate the production VM and still encounter the same error.
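The SKU dependence described above can be checked before attempting the update. The following is a minimal sketch, assuming the Azure CLI is installed and signed in and that the SKU listing reports a capability named `EncryptionAtHostSupported`; the function name `eah_capability` is hypothetical and not part of the lab.

```shell
# Hypothetical helper: print the EncryptionAtHostSupported capability value
# that Azure reports for a VM size in a region.
# Assumes the Azure CLI is installed and logged in.
eah_capability() {
  location="$1"
  size="$2"
  az vm list-skus \
    --location "$location" \
    --size "$size" \
    --resource-type virtualMachines \
    --query "[?name=='$size'] | [0].capabilities[?name=='EncryptionAtHostSupported'].value | [0]" \
    --output tsv
}

# Example call (not executed here):
#   eah_capability eastus2 Standard_D4s_v3
```

A value other than a true-like string, or no value at all, would be consistent with the SKU error in this scenario and would rule out deallocation as the fix.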


Answer Key: Scenario 2

Answer: C

The critical constraint in the scenario is that VM deallocation requires formal change management approval and there's a maintenance window available in 6 days. The urgency declared by security is high, but there is no immediate SLA deadline, meaning the risk doesn't justify bypassing established operational controls.

Enabling encryption at host on an existing VM requires the VM to be deallocated first. Therefore, any action that ignores this requirement, such as the one proposed in distractor A, results in an error.

Distractor D represents the most dangerous diagnostic error: ADE and encryption at host don't cover the same scope. Enabling ADE doesn't eliminate the protection gap of disk caches on the host and doesn't substitute encryption at host. Adopting this alternative would generate a false sense of compliance without solving the real problem.

Distractor B violates the principle of operational governance. Even facing security risks, bypassing change management in production without approval can create greater risks than the problem it tries to solve.
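Once the change is approved, the deallocation requirement discussed above translates into a fixed order of operations. The sketch below is illustrative only: `enable_eah` is a hypothetical function name, the resource group is a placeholder because the scenario does not name one, and the Azure CLI is assumed to be installed and signed in.

```shell
# Hypothetical helper: enable encryption at host on an existing VM.
# Run only inside an approved maintenance window, because deallocating
# the VM stops the workload. Each step runs only if the previous one succeeded.
enable_eah() {
  rg="$1"
  vm="$2"
  az vm deallocate --resource-group "$rg" --name "$vm" &&
  az vm update --resource-group "$rg" --name "$vm" \
    --set securityProfile.encryptionAtHost=true &&
  az vm start --resource-group "$rg" --name "$vm" &&
  # Final check: read the setting back from the VM's security profile.
  az vm show --resource-group "$rg" --name "$vm" \
    --query "securityProfile.encryptionAtHost" --output tsv
}

# Example call (not executed here):
#   enable_eah <resource-group> vm-sqlprod-01
```

Chaining the steps with `&&` means a failed deallocation or update stops the sequence instead of restarting a VM in an unknown state.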


Answer Key: Scenario 3

Answer: B

The root cause is that network access to the Key Vault was blocked before the replacement Private Endpoint was provisioned. The Azure Compute control plane needs to access the Key Vault to retrieve encryption keys when the VM starts. Without network connectivity to the Key Vault, this process fails regardless of whether the keys are intact.

The confirming clue is in the sequence of events: the failure occurred after yesterday's change, which combined key rotation with the removal of public access. Key rotation, when done correctly with versioning in Key Vault and an updated reference in the Disk Encryption Set, doesn't invalidate existing disks. Network blocking, on the other hand, is immediate and total.

The fact that the Disk Encryption Set appears as configured in the portal does not point to the cause: it only confirms that the association wasn't lost, which eliminates distractor C. The Key Vault's visibility in the portal is not evidence of reachability either, as the portal accesses the Key Vault via the management plane, not through the same path that Azure Compute uses when starting the VM.

The most dangerous distractor is A. An administrator who assumes that key rotation caused the problem might try to revert the rotation or create a new key version, wasting time while the real network blocking remains active.
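The network condition described above can be inspected directly instead of being inferred from the portal. This is a rough sketch, assuming the Azure CLI is installed and signed in and that `az keyvault show` returns `properties.publicNetworkAccess` and `properties.privateEndpointConnections`; all three function names are hypothetical.

```shell
# Hypothetical helpers for checking whether a Key Vault has a usable
# network path. Assumes the Azure CLI is installed and logged in.

# Prints the vault's public network access setting (for example Disabled).
kv_public_access() {
  az keyvault show --name "$1" \
    --query "properties.publicNetworkAccess" --output tsv
}

# Prints how many private endpoint connections the vault has.
kv_private_endpoint_count() {
  az keyvault show --name "$1" \
    --query 'length(properties.privateEndpointConnections || `[]`)' --output tsv
}

# Pure logic: flag the combination described in Scenario 3.
kv_network_hint() {
  public_access="$1"
  endpoint_count="${2:-0}"
  if [ "$public_access" = "Disabled" ] && [ "$endpoint_count" -eq 0 ]; then
    echo "blocked: public access disabled and no private endpoint"
  else
    echo "network path present: check key state and permissions next"
  fi
}

# Example call (not executed here):
#   kv_network_hint "$(kv_public_access kv-dev-eastus)" \
#                   "$(kv_private_endpoint_count kv-dev-eastus)"
```

Checking the network settings first keeps the investigation away from the key rotation, which the explanation above identifies as the misleading lead.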


Answer Key: Scenario 4

Answer: A

The correct sequence is P4 → P2 → P3 → P1 → P5.

The first step should be capturing the complete error message (P4), as the specific failure code directs all subsequent investigation. Without this information, diagnosis is blind.

With the error in hand, verify the feature flag status (P2), as this is the most common prerequisite to fail in new subscriptions. Next, confirm provider registration (P3), which is the second mandatory step frequently forgotten. Only then does it make sense to check SKU restrictions (P1), as this error has its own distinctive message.

P5 (Azure Policy verification) is the last step because it's least likely in a newly provisioned subscription, and its verification only adds value after basic prerequisites have been ruled out.

Sequence B seems logical at first glance because it starts with prerequisites, but it ignores a risk: without the exact error (P4), the administrator might check the feature flag and the provider and erroneously conclude that everything is correct, without identifying that the real problem might be in the SKU or a policy. Starting with the concrete symptom, P4, is the correct diagnostic discipline.
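Steps P2 and P3 from the sequence above can be combined into one check that runs after the error message has been captured. The sketch assumes the Azure CLI is installed and signed in; `eah_prereq_status` is a hypothetical name, and the `properties.state` query path is an assumption about the shape of the `az feature show` output.

```shell
# Hypothetical helper: after capturing the error (P4), run P2 and P3 and
# report the first prerequisite that is not in the Registered state.
eah_prereq_status() {
  feature_state=$(az feature show --namespace Microsoft.Compute \
    --name EncryptionAtHost --query "properties.state" --output tsv)
  provider_state=$(az provider show --namespace Microsoft.Compute \
    --query "registrationState" --output tsv)

  if [ "$feature_state" != "Registered" ]; then
    echo "P2: feature flag state is '$feature_state'"
  elif [ "$provider_state" != "Registered" ]; then
    echo "P3: provider state is '$provider_state'"
  else
    echo "P2 and P3 pass: continue with P1 (SKU) and P5 (policy)"
  fi
}

# Example call (not executed here):
#   eah_prereq_status
```

If the function reports P2 or P3, the usual remedy is `az feature register --namespace Microsoft.Compute --name EncryptionAtHost` followed by `az provider register --namespace Microsoft.Compute`, then a retry of the VM creation.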


Troubleshooting Tree: Configure encryption at host for Azure virtual machines

Color Legend:

  • Dark blue: initial symptom, investigation entry point
  • Medium blue: diagnostic question, decision node
  • Red: identified cause requiring correction
  • Green: recommended action or problem resolution
  • Orange: validation or intermediate verification before acting

To use this tree when facing a real problem, start at the root node and answer each question based on what you observe in the environment, not what you assume. Follow the path that corresponds to the current system state until reaching a red node (cause) or green node (action). If the path taken doesn't match the observed symptom, return to the last decision node and review the premise before proceeding.