Troubleshooting Lab: Configure and interpret monitoring of virtual machines, storage accounts, and networks by using Azure Monitor Insights
Diagnostic Scenarios
Scenario 1 - Root Cause
The operations team opened a ticket reporting that the VM Insights dashboard for a Windows Server 2022 VM has not displayed data in the Performance tab for approximately two hours. The VM is responding normally to pings and RDP connections. The administrator accesses the portal and verifies the following:
VM: prod-win-app01
Region: East US
OS: Windows Server 2022 Datacenter
VM Insights: enabled 3 weeks ago
Log Analytics Workspace: la-prod-eastus (East US)
Extensions installed on VM:
- AzureMonitorWindowsAgent | Status: Provisioning succeeded
- DependencyAgentWindows | Status: Provisioning succeeded
Latest logs in workspace (KQL):
Heartbeat
| where Computer == "prod-win-app01"
| where TimeGenerated > ago(3h)
| order by TimeGenerated desc
| take 5
Result: 0 records in the last 3 hours
The administrator also observes that the VM has an associated Data Collection Rule (DCR) created three weeks ago. The VM's resource group was moved to another subscription yesterday afternoon as part of an administrative reorganization. The Storage Account used for the VM's boot diagnostics is in a different region, but this resource was never used by VM Insights.
What is the root cause of the absence of data in VM Insights?
A) The Dependency Agent entered a silent failure state after the resource group move, interrupting only performance metrics collection.
B) Moving the resource group to another subscription unlinked the association between the VM and the Data Collection Rule, interrupting data transmission to the workspace.
C) The Log Analytics Workspace is in a different region than the new subscription, and regional compliance restrictions automatically blocked data ingestion.
D) The AzureMonitorWindowsAgent extension needs to be manually reinstalled whenever a VM is moved between resource groups.
Scenario 2 - Action Decision
The security team identified that blob access logs from a critical Storage Account are not reaching the linked Log Analytics Workspace. The cause was confirmed by the administrator: the Storage Account's Diagnostic Settings were configured only for aggregate metrics, without including any resource log categories.
The environment has the following constraints:
- The Storage Account is actively used by a production payments application with a 99.9% SLA
- Diagnostic Settings configuration is a control plane operation and does not affect the Storage Account's data plane
- The audit team requires logs to start being collected within a maximum of 4 hours
- The administrator has the Monitoring Contributor role scoped to the Storage Account
What is the correct action to take at this moment?
A) Create a new Diagnostic Setting on the Storage Account that sends the StorageRead, StorageWrite, and StorageDelete categories to the workspace, without interrupting the application.
B) Recreate the Storage Account in a new resource group with the correct diagnostic configuration from the start, migrating existing data so that history is not lost.
C) Request a maintenance window from the security team for the weekend, as changes to Diagnostic Settings of production resources require approval and may cause instability.
D) Enable Microsoft Defender for Storage as a temporary alternative, as it generates access logs automatically without needing to alter existing Diagnostic Settings.
Scenario 3 - Root Cause
An analyst is investigating why Network Insights does not display traffic flow data for a specific Network Security Group, even though the network topology appears correctly in the dashboard. She shares the following assessment:
NSG: nsg-prod-backend
Region: Brazil South
Network Watcher: enabled in Brazil South
NSG Flow Logs:
Status: Enabled
Storage Account: stflowlogsprod (Brazil South)
Retention: 7 days
Flow Log Version: Version 2
Traffic Analytics:
Status: Disabled
Workspace linked to Network Insights: la-prod-brazilsouth
Workspace created: 6 months ago
Latest KQL query in workspace: returned data from other sources normally
The analyst also mentions that the Network Watcher was recreated in the region two weeks ago after an accidental deletion, and that the Flow Logs Storage Account was created on the same day. The workspace is intact and receiving data from other sources without problems.
What is the root cause of the absence of flow data in Network Insights?
A) The Network Watcher was recently recreated and is still in its initialization period, which prevents flow data from being displayed in Network Insights for up to 30 days.
B) The NSG Flow Log is configured with Version 2, which is not compatible with Network Insights in the Brazil South region.
C) Traffic Analytics is disabled, and without it the NSG Flow Log data is not processed or sent to the workspace for display in Network Insights.
D) The Flow Logs Storage Account was recently created and has not yet accumulated sufficient data for Network Insights to generate aggregated visualizations.
Scenario 4 - Diagnostic Sequence
An administrator receives the following report: the Map tab of VM Insights for a Linux VM is empty, but the Performance tab displays data normally. The administrator needs to diagnose the problem following a logical and efficient sequence.
The available investigation steps are:
1. Verify in the portal that the Dependency Agent is installed on the VM and has a successful provisioning status
2. Confirm that VM Insights is enabled and that the VM is associated with a Log Analytics Workspace
3. Query the workspace with KQL to verify whether the VMConnection table contains recent records from the VM
4. Verify that the VM's Linux operating system is a distribution supported by the Dependency Agent
5. Access the VM via SSH and check the Dependency Agent process status directly in the operating system
Which sequence represents the most efficient and progressive diagnostic reasoning?
A) 2, 1, 4, 3, 5
B) 1, 2, 5, 3, 4
C) 3, 1, 4, 5, 2
D) 5, 4, 2, 1, 3
Answer Key and Explanations
Answer Key - Scenario 1
Answer: B
The decisive clue in the scenario is the absence of Heartbeat records in the last three hours, combined with the event of the previous afternoon: the resource group was moved to another subscription. When a VM is moved between subscriptions, its Data Collection Rule associations are automatically unlinked because the DCR is a resource scoped to the source subscription. Without an associated DCR, the Azure Monitor Agent has no instructions on what to collect or where to send it, even though the extension still shows a "Provisioning succeeded" status in the portal.
The irrelevant information in the scenario is the boot diagnostics Storage Account in a different region. This resource has no relation to VM Insights or the Azure Monitor Agent, and was included deliberately to misdirect the diagnosis.
The most dangerous distractor is D, which would lead the administrator to reinstall the extension without first investigating the DCR, wasting time without solving the real problem. Distractor C is superficially plausible, but regional compliance restrictions are not automatically applied by the platform based on resource movement.
Acting on distractor D would mean reinstalling the extension, confirming that it provisions correctly, and still seeing no data, because the real problem (the broken DCR association) would remain unresolved.
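The reasoning above can be sketched as a small triage function. This is only an illustration under stated assumptions: the `VmMonitoringState` fields and the `triage_missing_vm_data` helper are hypothetical names, and the values would have to be gathered by hand from the portal and the workspace. It does not call any Azure API.

```python
from dataclasses import dataclass


@dataclass
class VmMonitoringState:
    """Observations gathered manually from the portal and the workspace (hypothetical model)."""
    agent_provisioned: bool   # AzureMonitorWindowsAgent shows "Provisioning succeeded"
    recent_heartbeats: int    # Heartbeat rows returned for the lookback window
    dcr_associations: int     # Data Collection Rule associations found on the VM


def triage_missing_vm_data(state: VmMonitoringState) -> str:
    """Return the next thing to investigate, checking the cheapest explanation first."""
    if state.recent_heartbeats > 0:
        return "agent is reporting; check the VM Insights view and time range"
    if not state.agent_provisioned:
        return "agent extension is missing or failed; fix the extension first"
    if state.dcr_associations == 0:
        return "no DCR association; re-associate the VM with its Data Collection Rule"
    return "agent and DCR look fine; check network egress and workspace health"


# Scenario 1: healthy extension, no heartbeats, association lost after the move.
scenario_1 = VmMonitoringState(agent_provisioned=True, recent_heartbeats=0, dcr_associations=0)
print(triage_missing_vm_data(scenario_1))
```

Ordering the checks this way keeps the administrator from reinstalling a healthy extension (distractor D) before the DCR association has been looked at.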
Answer Key - Scenario 2
Answer: A
The scenario explicitly states the cause: the Diagnostic Settings contain no resource log categories. The correct action is to add the necessary categories to the existing Diagnostic Setting or to create a new one, an operation that does not disrupt the Storage Account's data plane. The scenario itself confirms this by stating that "Diagnostic Settings configuration is a control plane operation and does not affect the Storage Account's data plane".
Distractor B would be correct in a greenfield scenario, but recreating the Storage Account in production would cause severe interruption to the payments application, violating the SLA. Distractor C represents the error of overestimating the risk of a non-disruptive operation and would miss the 4-hour deadline imposed by the audit team. Distractor D is technically plausible as an additional security layer, but Defender for Storage does not replace diagnostic logs for detailed auditing of individual operations.
The Monitoring Contributor role has sufficient permission to create and modify Diagnostic Settings, so there is no permission block that would justify any other approach.
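As an illustration of what the fix changes, the sketch below edits a metrics-only diagnostic setting body so that it also enables the three blob log categories. It assumes the general shape of an Azure Resource Manager diagnostic setting (`properties.workspaceId`, `properties.logs`, `properties.metrics`). The `add_blob_log_categories` helper and the placeholder workspace ID are hypothetical, and nothing here is sent to Azure.

```python
import json

BLOB_LOG_CATEGORIES = ("StorageRead", "StorageWrite", "StorageDelete")

# Placeholder only; the scenario does not name the workspace.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)


def add_blob_log_categories(setting: dict, workspace_id: str) -> dict:
    """Return a copy of the setting body with the blob resource log categories enabled."""
    props = dict(setting.get("properties", {}))
    # Keep unrelated log entries, then (re)add the three categories as enabled.
    kept = [dict(e) for e in props.get("logs", []) if e.get("category") not in BLOB_LOG_CATEGORIES]
    props["logs"] = kept + [{"category": c, "enabled": True} for c in BLOB_LOG_CATEGORIES]
    props["workspaceId"] = workspace_id
    return {**setting, "properties": props}


# The misconfigured state from the scenario: aggregate metrics only, no resource logs.
metrics_only = {
    "properties": {
        "workspaceId": WORKSPACE_ID,
        "metrics": [{"category": "Transaction", "enabled": True}],
    }
}

updated = add_blob_log_categories(metrics_only, WORKSPACE_ID)
print(json.dumps(updated, indent=2))
```

The existing metrics entry is left untouched, which mirrors the point of option A: the change only adds log categories and does not recreate or interrupt anything.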
Answer Key - Scenario 3
Answer: C
The assessment explicitly shows Traffic Analytics with a status of Disabled. Traffic Analytics is the component that processes the raw NSG Flow Log data stored in the Storage Account and sends it to the Log Analytics workspace in a format that Network Insights can query. Without Traffic Analytics enabled, the flow data never reaches the workspace, even when every other component is configured correctly.
The irrelevant information is the Network Watcher recreation two weeks ago. This event does not affect data display in Network Insights when Traffic Analytics is correctly configured. The recreated Network Watcher is already functional, as evidenced by the NSG Flow Log being enabled.
Distractor A is the most dangerous because it creates a false expectation that the problem will resolve itself with time, making the administrator wait without taking action. There is no 30-day initialization period for Network Watcher. Distractor B is factually incorrect: Version 2 is the recommended and supported version in all regions. Distractor D confuses the absence of Traffic Analytics with absence of data in the Storage Account, which are completely distinct problems.
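The data path described above can be modeled in a few lines. This is a simplified sketch, not an Azure SDK call: `FlowLogConfig` and `flow_data_destinations` are hypothetical names, and the values are taken from the analyst's assessment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlowLogConfig:
    """Simplified view of an NSG flow log configuration (hypothetical model)."""
    flow_logs_enabled: bool
    storage_account: str
    traffic_analytics_enabled: bool
    workspace: str


def flow_data_destinations(cfg: FlowLogConfig) -> List[str]:
    """List where flow records end up for a given configuration."""
    if not cfg.flow_logs_enabled:
        return []
    destinations = [f"storage:{cfg.storage_account}"]
    # Only Traffic Analytics forwards processed flow data to the workspace.
    if cfg.traffic_analytics_enabled:
        destinations.append(f"workspace:{cfg.workspace}")
    return destinations


scenario_3 = FlowLogConfig(
    flow_logs_enabled=True,
    storage_account="stflowlogsprod",
    traffic_analytics_enabled=False,
    workspace="la-prod-brazilsouth",
)
print(flow_data_destinations(scenario_3))  # ['storage:stflowlogsprod']
```

With Traffic Analytics disabled, the workspace never appears among the destinations, which is why the topology renders but the flow views stay empty.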
Answer Key - Scenario 4
Answer: A
The correct sequence is 2, 1, 4, 3, 5, which follows the principle of progressive diagnosis: start from the outermost and most visible level before advancing to more invasive or specific checks.
Step 2 confirms the basic state: VM Insights is enabled and the workspace is linked. Without this, all subsequent steps are irrelevant.
Step 1 verifies the presence and status of the Dependency Agent in the portal, quickly identifying if the component is missing or has provisioning failure.
Step 4 verifies Linux distribution compatibility with the Dependency Agent, as an apparently successful installation may be inactive on an unsupported distro.
Step 3 queries the VMConnection table in the workspace to confirm if any data is arriving, objectively validating the diagnosis before accessing the VM.
Step 5 is the last step because it requires direct VM access via SSH, which is more invasive, requires credentials, and is unnecessary if previous steps already located the problem.
Sequence B makes the error of going to SSH (step 5) before checking OS compatibility (step 4), wasting time on an operational check when the problem may be architectural. Sequence C starts with the KQL query, which only makes sense after confirming the basic environment is configured. Sequence D starts with the most invasive step, completely inverting the progressive diagnostic logic.
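The progressive order can be expressed as a list of checks that stops at the first failure. The sketch below is a minimal illustration: `first_failing_check` is a hypothetical helper, and the `observations` dictionary holds made-up values for a VM running an unsupported distribution. In practice each probe would be a manual portal check, a KQL query against VMConnection, or an SSH session.

```python
from typing import Callable, List, Optional, Tuple

# (step number from the scenario, description, probe returning True when the check passes)
Check = Tuple[int, str, Callable[[], bool]]


def first_failing_check(checks: List[Check]) -> Optional[Tuple[int, str]]:
    """Run checks in order and return (step, description) for the first one that fails."""
    for step, description, probe in checks:
        if not probe():
            return step, description
    return None


# Hypothetical observations for a VM whose distribution is not supported.
observations = {
    "insights_enabled": True,
    "dependency_agent_provisioned": True,
    "distro_supported": False,
    "vmconnection_rows": 0,
    "agent_process_running": False,
}

# Sequence A from the answer key: 2, 1, 4, 3, 5.
checks: List[Check] = [
    (2, "VM Insights enabled and workspace linked", lambda: observations["insights_enabled"]),
    (1, "Dependency Agent provisioned", lambda: observations["dependency_agent_provisioned"]),
    (4, "Linux distribution supported", lambda: observations["distro_supported"]),
    (3, "VMConnection has recent rows", lambda: observations["vmconnection_rows"] > 0),
    (5, "Dependency Agent process running (SSH)", lambda: observations["agent_process_running"]),
]

print(first_failing_check(checks))  # (4, 'Linux distribution supported')
```

Because evaluation stops at step 4, the more invasive SSH check in step 5 is never run, which is the efficiency argument behind sequence A.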
Troubleshooting Tree: Configure and interpret monitoring of virtual machines, storage accounts, and networks by using Azure Monitor Insights
Legend:
| Color | Meaning |
|---|---|
| Dark blue | Initial symptom (entry point) |
| Blue | Diagnostic question (path decision) |
| Red | Identified cause |
| Green | Recommended action or resolution |
| Orange | Validation or intermediate verification |
When facing a real problem, start at the root node by identifying which resource is missing data: VM, network, or storage. From there, follow the blue questions, answering each one based on what you observe in the portal or in the logs. Each answer eliminates a set of hypotheses and narrows the path to a red cause node or a green action node. Orange nodes indicate that waiting or validation is required before concluding the diagnosis. Never skip levels: progressive diagnosis prevents corrective actions from being applied to the wrong component.
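One way to picture this traversal is as a walk over typed nodes, taking one answer per level. The fragment below is a hypothetical sketch built only from Scenario 3 of this lab. It does not reproduce the full tree, and the `Node` class and `walk` function are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    kind: str   # "symptom", "question", "cause", "action", or "validation"
    text: str
    branches: Dict[str, "Node"] = field(default_factory=dict)


def walk(root: Node, answers: List[str]) -> List[Node]:
    """Follow one answer per level and return the path taken, never skipping a level."""
    path = [root]
    node = root
    for answer in answers:
        if answer not in node.branches:
            raise KeyError(f"no branch {answer!r} under {node.text!r}")
        node = node.branches[answer]
        path.append(node)
    return path


# Hypothetical fragment covering only the network branch from Scenario 3.
tree = Node("symptom", "Which resource is missing data?", {
    "network": Node("question", "Are NSG Flow Logs enabled?", {
        "yes": Node("question", "Is Traffic Analytics enabled?", {
            "no": Node("cause", "Flow data never reaches the workspace", {
                "fix": Node("action", "Enable Traffic Analytics on the flow log"),
            }),
        }),
    }),
})

for node in walk(tree, ["network", "yes", "no", "fix"]):
    print(f"[{node.kind}] {node.text}")
```

An answer that does not match a branch at the current level raises an error instead of jumping ahead, which reflects the rule about never skipping levels.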