Theoretical Foundation: Configure and interpret monitoring of virtual machines, storage accounts, and networks by using Azure Monitor Insights
1. Initial Intuition
Imagine you are responsible for a physical data center. You have panels with lights indicating each server's status, audio alerts when a disk is full, and daily network consumption reports. Without this, you would only discover problems when an application went down.
In Azure, the equivalent of this control panel is Azure Monitor. It collects performance, availability, and behavior data from all your resources, and Azure Monitor Insights is the specialized layer that transforms this raw data into ready-to-use visualizations and analyses, specific by resource type.
The difference between generic Azure Monitor and Azure Monitor Insights is this: Monitor is the collection and analysis engine; Insights is the dashboard already configured and optimized for each resource type (VMs, Storage, Networks).
2. Context
Azure Monitor exists because cloud resources are dynamic, distributed, and ephemeral. A VM might be consuming 100% CPU without anyone noticing. A storage account might be generating abnormal latency. A VNet might have traffic silently blocked by an NSG.
Without observability, you operate in the dark.
Azure Monitor positions itself as the central observability platform of Azure, collecting three types of data:
- Metrics: numerical values collected at regular intervals (e.g., CPU %, network bytes).
- Logs: structured records of events and operations (e.g., VM login, blob operation).
- Traces: distributed execution traces in applications (more relevant for Application Insights).
Azure Monitor Insights consumes this data and presents it in pre-built visual experiences, without you needing to create dashboards from scratch.
3. Building Concepts
3.1 The Foundation: Metrics vs Logs
Before diving into specific Insights, it's essential to distinguish these two pillars:
| Characteristic | Metrics | Logs |
|---|---|---|
| Data type | Numerical, time series | Structured text, records |
| Default retention | 93 days | Configurable (30 to 730 days) |
| Latency | Seconds | 1 to 2 minutes |
| Cost | Included (basic) | Based on ingested volume |
| Query | Metrics Explorer | Log Analytics (KQL) |
| Examples | CPU %, Disk IOPS, Bytes/s | User login, application error, resource creation |
Metrics are ideal for real-time alerts and trend visualizations. Logs are ideal for deep diagnosis, event correlation, and auditing.
3.2 Log Analytics Workspace
The Log Analytics Workspace is the central repository where Azure Monitor logs are stored and queried. It's an Azure resource that you create in a region and subscription.
Important characteristics:
- Multiple resources can send data to the same workspace.
- Queries are made in KQL (Kusto Query Language).
- The workspace defines data retention policy.
- VM and Network Insights depend on a configured workspace.
3.3 Diagnostic Settings
For logs and some detailed metrics to reach the Log Analytics Workspace, you need to configure Diagnostic Settings on each resource.
Each Diagnostic Setting defines:
- What to collect: log and metric categories.
- Where to send: Log Analytics Workspace, Storage Account, Event Hub, or Partner Solutions.
Non-obvious point: platform metrics are automatically collected by Azure without configuration. But resource logs and detailed metrics require explicit Diagnostic Settings.
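As a rough illustration of the two halves of a Diagnostic Setting (what to collect and where to send it), the sketch below assembles the settings body as a plain Python dictionary. The property names (`logs`, `metrics`, `workspaceId`, `storageAccountId`) are believed to follow the ARM schema for `Microsoft.Insights/diagnosticSettings`; the function name and the placeholder resource ID are assumptions made for this example, and nothing here calls Azure.

```python
import json


def build_diagnostic_setting(log_categories, metric_categories,
                             workspace_id=None, storage_account_id=None):
    """Return a Diagnostic Setting 'properties' body as a plain dict (illustrative only)."""
    if not (workspace_id or storage_account_id):
        raise ValueError("a Diagnostic Setting needs at least one destination")
    if not (log_categories or metric_categories):
        raise ValueError("nothing to collect: no log or metric category given")

    properties = {
        # What to collect
        "logs": [{"category": c, "enabled": True} for c in log_categories],
        "metrics": [{"category": c, "enabled": True} for c in metric_categories],
    }
    # Where to send (only the destinations that were supplied)
    if workspace_id:
        properties["workspaceId"] = workspace_id
    if storage_account_id:
        properties["storageAccountId"] = storage_account_id
    return properties


setting = build_diagnostic_setting(
    log_categories=["StorageRead", "StorageWrite", "StorageDelete"],
    metric_categories=["Transaction"],
    # Placeholder ID, same style as the other placeholders in this guide
    workspace_id="/subscriptions/{sub}/resourceGroups/{rg}/providers/"
                 "Microsoft.OperationalInsights/workspaces/{workspaceName}",
)
print(json.dumps(setting, indent=2))
```

A setting with categories but no destination is rejected by the sketch, which mirrors the idea that a Diagnostic Setting always pairs a selection with at least one target.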
4. VM Insights
4.1 What it is
VM Insights is the specialized monitoring experience for virtual machines and Virtual Machine Scale Sets. It provides ready-made performance visualizations and dependency maps between processes and network connections.
4.2 Prerequisite: Azure Monitor Agent
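For VM Insights to work, each VM needs to have the Azure Monitor Agent (AMA) installed. AMA replaces the legacy Log Analytics agent (MMA) and is the currently recommended approach. The Map feature additionally needs the Dependency Agent, which no longer works standalone with MMA but sends its data through AMA.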
AMA collects:
- Operating system performance metrics (CPU, memory, disk, network).
- Windows Event logs and Linux Syslog.
- Process and network connection data for the Map feature (gathered by the Dependency Agent and forwarded through AMA).
AMA installation can be done via:
- VM extension in the portal.
- Azure Policy (for scale deployment).
- Bicep/ARM/Terraform.
4.3 Data Collection Rules (DCR)
AMA doesn't collect data autonomously. It needs a Data Collection Rule (DCR) that defines what to collect, how to transform, and where to send.
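To make the "what, where" structure of a DCR more concrete, here is a minimal sketch of a rule expressed as a Python dictionary, plus a small consistency check. The layout (`dataSources`, `destinations`, `dataFlows`, the `Microsoft-Perf` stream) is believed to match the DCR schema; the region, the names `perfCounters` and `centralWorkspace`, and the `validate_dcr` helper are hypothetical. Transformations are not shown.

```python
def validate_dcr(dcr):
    """Return a list of problems: data flows that point at undeclared destinations."""
    props = dcr["properties"]
    declared = {d["name"] for dests in props["destinations"].values() for d in dests}
    problems = []
    for flow in props["dataFlows"]:
        for dest in flow["destinations"]:
            if dest not in declared:
                problems.append(f"data flow references unknown destination '{dest}'")
    return problems


dcr = {
    "location": "eastus",  # hypothetical region
    "properties": {
        # What to collect
        "dataSources": {
            "performanceCounters": [{
                "name": "perfCounters",
                "streams": ["Microsoft-Perf"],
                "samplingFrequencyInSeconds": 60,
                "counterSpecifiers": [
                    "\\Processor Information(_Total)\\% Processor Time",
                    "\\Memory\\Available Bytes",
                ],
            }],
        },
        # Where to send
        "destinations": {
            "logAnalytics": [{
                "name": "centralWorkspace",
                "workspaceResourceId": "/subscriptions/{sub}/resourceGroups/{rg}/providers/"
                                       "Microsoft.OperationalInsights/workspaces/{workspaceName}",
            }],
        },
        # Which stream goes to which destination
        "dataFlows": [{
            "streams": ["Microsoft-Perf"],
            "destinations": ["centralWorkspace"],
        }],
    },
}

print(validate_dcr(dcr))
```

Because the rule is a separate object from the VM, the same DCR can be associated with many machines, which is why centralized, reusable rules are practical.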
4.4 What VM Insights shows
VM Insights presents three main tabs:
Performance Tab: Displays ready-made performance charts for each VM or group of VMs:
- CPU Utilization %
- Available Memory (MB)
- Bytes Sent/Received per second
- Disk I/O (reads and writes)
- Logical Disk Space Used %
This data comes from metrics collected by AMA and can be filtered by time range, subscription, resource group, or individual VM.
Map Tab: Displays a visual dependency map: what processes are running on the VM and which active TCP network connections exist, including source, destination, and port. This is valuable for:
- Discovering undocumented communications between services.
- Mapping dependencies before a migration.
- Diagnosing connectivity failures between applications.
Overview Tab (Get Started): Shows VM Insights enablement status per VM, indicating which have the agent installed and configured.
4.5 Relevant KQL tables for VMs
When querying VM data in Log Analytics, the main tables are:
| Table | Content |
|---|---|
| Perf | Performance metrics collected by agent (CPU, memory, disk) |
| Event | Windows Event Log events |
| Syslog | Linux system logs |
| VMConnection | TCP connections to and from the VM (requires Map feature) |
| VMProcess | Processes running on the VM |
| Heartbeat | Agent heartbeat signal every minute |
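Example KQL query to find VMs whose last heartbeat is older than 5 minutes. The lookback window (here 1 day) must be wider than the threshold; filtering to the last 5 minutes first would discard exactly the silent agents the query is looking for:
Heartbeat
| where TimeGenerated > ago(1d)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)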
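The logic of a stale-heartbeat check can be sketched in plain Python: take the latest heartbeat per computer within a lookback window, then keep the computers whose latest heartbeat is older than the threshold. The function name, the sample computer names, and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_computers(heartbeats, now, threshold=timedelta(minutes=5),
                    lookback=timedelta(days=1)):
    """Return {computer: last_heartbeat} for agents silent longer than `threshold`.

    heartbeats: iterable of (computer, time_generated) tuples.
    """
    latest = {}
    for computer, ts in heartbeats:
        if ts < now - lookback:
            continue  # outside the query window
        if computer not in latest or ts > latest[computer]:
            latest[computer] = ts  # equivalent of max(TimeGenerated) by Computer
    return {c: ts for c, ts in latest.items() if ts < now - threshold}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ("vm-web-01", now - timedelta(minutes=1)),
    ("vm-web-01", now - timedelta(minutes=2)),
    ("vm-db-01", now - timedelta(minutes=42)),  # agent stopped 42 minutes ago
]

print(stale_computers(rows, now))
# A lookback equal to the threshold hides the stale machine entirely:
print(stale_computers(rows, now, lookback=timedelta(minutes=5)))
```

The second call shows the pitfall: when the lookback is no wider than the threshold, a machine that stopped reporting simply has no rows left to summarize.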
5. Storage Insights
5.1 What it is
Storage Insights (part of Azure Monitor for Storage) provides a unified view of Storage Account performance, capacity, and availability. It works for Blob, File, Queue, and Table storage.
5.2 What it monitors
Storage Insights collects three types of data:
Transaction metrics (collected automatically, without Diagnostic Settings):
- Availability %
- Total transactions
- Server latency and end-to-end latency (SuccessServerLatency, SuccessE2ELatency)
- Error rate (ServerErrors, ClientErrors)
Capacity metrics (collected automatically):
- Used Capacity
- Blob Count
- Container Count
Resource logs (require enabled Diagnostic Settings):
- StorageRead, StorageWrite, StorageDelete: detailed operations by blob, file, or queue.
- Includes: source IP, operation, status, response time, object size.
5.3 Storage Insights Structural View
5.4 Relevant KQL tables for Storage
| Table | Content |
|---|---|
| StorageBlobLogs | Blob operations (read, write, delete) |
| StorageFileLogs | Azure Files operations |
| StorageQueueLogs | Queue operations |
| StorageTableLogs | Table operations |
Example KQL query to identify the top 10 IPs with most blob read operations:
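StorageBlobLogs
| where OperationName == "GetBlob"
// CallerIpAddress is recorded as ip:port; keep only the IP so one client is not split across ports
| extend CallerIp = tostring(split(CallerIpAddress, ":")[0])
| summarize Count = count() by CallerIp
| top 10 by Count desc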
5.5 Non-obvious Storage Insights behavior
Storage Insights aggregates data from multiple storage accounts in a single view, allowing performance comparison between accounts. This is valuable in environments with dozens of storage accounts.
Another important point: transaction metrics granularity is 1 minute. For latency analysis in critical operations, this may be sufficient to identify spikes, but not for exact correlation with specific log events.
6. Network Insights
6.1 What it is
Network Insights (Azure Monitor for Networks) provides a topological and health view of all Azure network resources in a subscription or defined scope. It's the centralized experience for monitoring VNets, NSGs, Load Balancers, Application Gateways, VPN Gateways, ExpressRoute, Private Endpoints, and much more.
6.2 Network Insights Overview
Network Insights is organized in tabs:
Overview: Presents a health summary of all network resources grouped by type: how many are healthy, in alert, or critical state.
Connectivity: Allows testing and visualizing connectivity between resources using Connection Monitor. Shows latency, packet loss, and reachability status between source and destination.
Traffic: Integrates data from NSG Flow Logs and Traffic Analytics to visualize traffic patterns. Shows the most active IP pairs, protocols used, and blocked vs allowed flows.
Diagnostic Toolkit: Groups diagnostic tools like IP Flow Verify, Next Hop, Effective Routes, Security Group View, and Packet Capture, all provided by Network Watcher.
Topology: Displays an interactive visual diagram of network topology: VNets, subnets, VMs, Load Balancers, Gateways, and their connections.
6.3 NSG Flow Logs and Traffic Analytics
NSG Flow Logs is a Network Watcher feature that records information about IP traffic passing through an NSG. Each record includes:
- Source and destination IP
- Source and destination port
- Protocol
- NSG decision (allow/deny)
- Direction (inbound/outbound)
- Volume of bytes and packets (version 2)
Flow Logs are sent to a Storage Account. For interactive analysis, you enable Traffic Analytics, which processes flow logs and sends them to Log Analytics Workspace.
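Inside the JSON files written to the Storage Account, each flow is stored as a comma-separated tuple. The sketch below parses such a tuple into named fields, assuming the commonly documented field order (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision, and for version 2 the flow state plus packet and byte counters). The class name, function names, and the sample tuples are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowTuple:
    timestamp: int                      # Unix epoch seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str                       # "T" = TCP, "U" = UDP
    direction: str                      # "I" = inbound, "O" = outbound
    decision: str                       # "A" = allowed, "D" = denied
    flow_state: Optional[str] = None    # version 2 only: "B", "C" or "E"
    packets_src_to_dst: Optional[int] = None
    bytes_src_to_dst: Optional[int] = None
    packets_dst_to_src: Optional[int] = None
    bytes_dst_to_src: Optional[int] = None


def _opt_int(value):
    """Version 2 counters are empty when a flow begins; map '' to None."""
    return int(value) if value else None


def parse_flow_tuple(raw):
    parts = raw.strip().split(",")
    if len(parts) not in (8, 13):
        raise ValueError(f"expected 8 (v1) or 13 (v2) fields, got {len(parts)}")
    parts += [""] * (13 - len(parts))   # pad version 1 tuples
    return FlowTuple(
        timestamp=int(parts[0]),
        src_ip=parts[1],
        dst_ip=parts[2],
        src_port=int(parts[3]),
        dst_port=int(parts[4]),
        protocol=parts[5],
        direction=parts[6],
        decision=parts[7],
        flow_state=parts[8] or None,
        packets_src_to_dst=_opt_int(parts[9]),
        bytes_src_to_dst=_opt_int(parts[10]),
        packets_dst_to_src=_opt_int(parts[11]),
        bytes_dst_to_src=_opt_int(parts[12]),
    )


# Illustrative samples: a denied inbound UDP flow and an ended outbound TCP flow
denied = parse_flow_tuple("1542110402,94.102.49.190,10.5.16.4,28746,443,U,I,D,B,,,,")
ended = parse_flow_tuple("1542110437,10.0.0.4,13.67.143.118,44931,443,T,O,A,E,25,4409,15,4633")
print(denied.decision, denied.direction, denied.dst_port)
print(ended.bytes_src_to_dst, ended.bytes_dst_to_src)
```

Parsing raw tuples like this is rarely needed in practice, since Traffic Analytics does the processing; the sketch is mainly useful for understanding what each record contains.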
Traffic Analytics KQL table:
| Table | Content |
|---|---|
| AzureNetworkAnalytics_CL | Flows processed by Traffic Analytics |
Example query to find traffic blocked by NSGs:
AzureNetworkAnalytics_CL
| where SubType_s == "FlowLog"
| where FlowStatus_s == "D"
| summarize BlockedFlows = count() by NSGList_s, DestPort_d
| sort by BlockedFlows desc
6.4 Connection Monitor
Connection Monitor is a Network Watcher feature that performs continuous connectivity tests between sources and destinations. It replaces legacy Connection Monitor (classic) and Network Performance Monitor features.
Components:
- Test Group: set of sources and destinations with success criteria.
- Test Configuration: protocol (TCP, HTTP, ICMP), port, test frequency.
- Sources: Azure VMs and scale sets with the Network Watcher Agent extension, and Arc-enabled servers with Azure Monitor Agent.
- Destinations: IP address, FQDN, URL, or Azure resource.
Connection Monitor produces metrics for:
- Checks Failed %: percentage of tests that failed.
- Round Trip Time (ms): measured latency.
This data appears in the Connectivity tab of Network Insights.
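How the two metrics relate to individual probe results can be sketched as follows. This is a simplified model, not Connection Monitor's actual aggregation: the function name and the result format (a success flag plus a round-trip time in milliseconds) are assumptions for the example.

```python
from statistics import mean


def summarize_checks(results):
    """results: list of (succeeded, rtt_ms) pairs; rtt_ms is None for failed checks."""
    if not results:
        raise ValueError("no check results to summarize")
    failed = sum(1 for ok, _ in results if not ok)
    rtts = [rtt for ok, rtt in results if ok and rtt is not None]
    return {
        "checks_failed_percent": 100.0 * failed / len(results),
        "avg_round_trip_ms": mean(rtts) if rtts else None,
    }


# Four probes in one evaluation period: three succeeded, one failed
results = [(True, 12.0), (True, 18.0), (False, None), (True, 15.0)]
summary = summarize_checks(results)
print(summary)
```

A Test Group's success criteria would then compare values like these against thresholds, for example a maximum failed percentage or a maximum round-trip time.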
6.5 Network Watcher: Diagnostic Tools
Network Watcher is the underlying service that provides network diagnostic tools in Azure. It's automatically enabled in a region when you create or update a virtual network there.
| Tool | What it does | When to use |
|---|---|---|
| IP Flow Verify | Tests if traffic would be allowed or blocked by NSG | VM can't connect on a port |
| Next Hop | Shows next routing hop for a destination | Diagnose incorrect routing |
| Effective Routes | Lists all effective routes for a NIC | Understand VM's actual routing |
| Security Group View | Lists all effective NSG rules for a NIC | Security auditing |
| Packet Capture | Captures network packets from a VM | Deep protocol diagnosis |
| Connection Troubleshoot | Point-in-time connectivity test between source and destination | Quick connectivity check |
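The idea behind IP Flow Verify can be sketched as a priority-ordered rule lookup: rules are checked from the lowest priority number upward, and the first rule that matches decides the outcome. The model below is deliberately simplified (it matches only on direction, protocol, and a single port, and ignores source and destination addresses); the `Rule` class, the function name, and the `AllowHTTPS` rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int     # lower number = evaluated first
    direction: str    # "Inbound" or "Outbound"
    protocol: str     # "TCP", "UDP" or "*"
    port: str         # a single port such as "443", or "*"
    access: str       # "Allow" or "Deny"


def ip_flow_verify(rules, direction, protocol, port):
    """Return (access, rule_name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction != direction:
            continue
        if rule.protocol not in ("*", protocol):
            continue
        if rule.port not in ("*", str(port)):
            continue
        return rule.access, rule.name
    return "Deny", "no matching rule"


rules = [
    Rule("DenyAllInBound", 65500, "Inbound", "*", "*", "Deny"),  # default catch-all rule
    Rule("AllowHTTPS", 100, "Inbound", "TCP", "443", "Allow"),
]

print(ip_flow_verify(rules, "Inbound", "TCP", 443))
print(ip_flow_verify(rules, "Inbound", "TCP", 3389))
```

The useful part of the real tool's answer is the same as in this sketch: not only whether traffic is allowed, but which rule made the decision.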
7. Azure Monitor Alerts
Alerts are Azure Monitor's automated response layer. They monitor conditions and trigger actions when those conditions are met.
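7.1 Alert Components
An alert is composed of an Alert Rule, which defines the condition to evaluate, and an Action Group, which defines the actions to run when that condition is met.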
7.2 Alert Types
| Type | Data source | Latency | Typical use |
|---|---|---|---|
| Metric Alert | Platform Metrics | Seconds | High CPU, full disk, storage latency |
| Log Alert | Log Analytics (KQL) | 1 to 5 minutes | Error events, missing heartbeat |
| Activity Log Alert | Activity Log | Minutes | Resource deletion, configuration change |
| Smart Detection Alert | Application Insights | Automatic | Application anomalies |
7.3 Action Groups
An Action Group defines actions that will be executed when an alert is triggered. The same Action Group can be reused across multiple Alert Rules.
Action types:
- Notification: Email, SMS, Azure app push, Voice call.
- Automated action: Azure Function, Logic App, Webhook, Automation Runbook, ITSM.
Action Groups best practices:
- Create Action Groups by team or severity, not by resource.
- A "Critical" Action Group can notify multiple channels simultaneously.
- Action Groups support Rate Limiting to avoid notification floods.
8. Workbooks
Workbooks are interactive and parameterizable reports within Azure Monitor. VM Insights, Storage Insights, and Network Insights use Workbooks internally to display their visualizations.
You can:
- Use ready-made Workbooks (templates) provided by Azure.
- Customize existing Workbooks.
- Create Workbooks from scratch combining metrics, KQL logs, text, and parameters.
Workbooks support:
- Time series charts.
- KQL result tables.
- Geographic maps.
- KPI tiles.
- Interactive filter parameters (subscription, resource group, time range).
9. Implementation Methods
9.1 Azure Portal
When to use: initial configuration, dashboard exploration, ad-hoc analysis, interactive Workbooks.
Path to VM Insights:
Azure Monitor > Insights > Virtual Machines
Path to Storage Insights:
Azure Monitor > Insights > Storage Accounts
Path to Network Insights:
Azure Monitor > Insights > Networks
Limitations: manual configuration for each resource, no repeatability.
9.2 Azure CLI
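Enable Diagnostic Settings via CLI. The example targets the blob service of a storage account, because a VM resource exposes no resource log categories through Diagnostic Settings (guest logs flow through AMA and a DCR instead), and "Administrative" is an Activity Log category rather than a resource log category:
az monitor diagnostic-settings create \
  --name "MyDiagSettings" \
  --resource "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default" \
  --workspace "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}" \
  --logs '[{"category": "StorageRead", "enabled": true}, {"category": "StorageWrite", "enabled": true}, {"category": "StorageDelete", "enabled": true}]' \
  --metrics '[{"category": "Transaction", "enabled": true}]'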
Create an Alert Rule via CLI:
az monitor metrics alert create \
--name "HighCPUAlert" \
--resource-group MyRG \
--scopes "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachines/{vmName}" \
--condition "avg Percentage CPU > 90" \
--window-size 5m \
--evaluation-frequency 1m \
--action "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Insights/actionGroups/{agName}"
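What a condition such as `avg Percentage CPU > 90` means with a 5-minute window evaluated every minute can be modeled in a few lines. This is a simplified sketch that assumes one sample per minute and no missing data; it is not Azure's evaluation engine, and the function name and sample values are invented.

```python
from collections import deque


def evaluate_avg_threshold(samples, threshold=90.0, window=5):
    """Return (minute_index, window_average, fired) for each evaluation.

    samples: one metric value per minute, oldest first.
    """
    recent = deque(maxlen=window)   # sliding window of the latest `window` samples
    evaluations = []
    for minute, value in enumerate(samples):
        recent.append(value)
        if len(recent) < window:
            continue                # not enough data for a full window yet
        avg = sum(recent) / window
        evaluations.append((minute, avg, avg > threshold))
    return evaluations


cpu = [50, 95, 96, 97, 98, 99, 60]
for minute, avg, fired in evaluate_avg_threshold(cpu):
    print(minute, avg, fired)
```

Two behaviors are visible in the sample: a short spike does not fire until the whole window's average crosses the threshold, and an average of exactly 90 does not satisfy a strict `>` condition.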
9.3 Azure Policy for scale enablement
To ensure all VMs in a subscription or management group have AMA installed and VM Insights enabled, use Azure Policy with DeployIfNotExists effect.
Relevant policies available in the library:
- Configure Windows virtual machines to run Azure Monitor Agent
- Configure Linux virtual machines to run Azure Monitor Agent
- Configure VM Insights data collection rule association
This creates a remediation task that automatically applies the agent and DCR to existing and new VMs.
9.4 Bicep / ARM for Diagnostic Settings
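// blobService is assumed to be the storage account's blobServices/default child resource,
// declared elsewhere in the template. The Storage* log categories exist on the service, not on the account.
resource diagnosticSetting 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'MyDiagSettings'
  scope: blobService
  properties: {
    workspaceId: logAnalyticsWorkspace.id
    logs: [
      {
        category: 'StorageRead'
        enabled: true
      }
      {
        category: 'StorageWrite'
        enabled: true
      }
      {
        category: 'StorageDelete'
        enabled: true
      }
    ]
    metrics: [
      {
        category: 'Transaction'
        enabled: true
      }
    ]
  }
}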
10. Control and Security
Log Analytics Workspace Access:
- The workspace has independent RBAC from the resources that send data to it.
- The Log Analytics Reader role allows querying data but not modifying configurations.
- The Log Analytics Contributor role allows full workspace management.
- You can configure table-level RBAC to restrict access to specific tables within the workspace.
Sensitive data in logs:
- Storage logs may contain client IPs, blob names, and access patterns.
- NSG Flow Logs contain network traffic information, including internal and external IPs.
- Define appropriate retention policies (minimum necessary) for sensitive data.
Private network access to the workspace:
- Use Azure Monitor Private Link Scope (AMPLS) to ensure that resource data is sent to the workspace via private network, without traffic over the public internet.
11. Decision Making
When to enable each type of collection
| Situation | What to enable | Reason |
|---|---|---|
| Alert when VM CPU exceeds 90% | Platform Metrics + Metric Alert | Metrics have seconds latency, ideal for reactive alerts |
| Diagnose why a VM went offline | VM Insights + Heartbeat table | Heartbeat records agent availability minute by minute |
| Audit who deleted a blob in Storage | Diagnostic Settings with StorageDelete logs | Resource logs record operations with identity and timestamp |
| Identify which IPs are being blocked by NSG | NSG Flow Logs + Traffic Analytics | Flow Logs record each NSG decision |
| Monitor latency between two services | Connection Monitor | Continuous tests with latency and loss history |
| Understand VM dependencies before migration | VM Insights Map feature | Shows TCP connections and active processes in real time |
| Monitor availability of multiple storage accounts | Storage Insights | Aggregated view without creating individual dashboards |
When to use metrics vs logs for alerts
| Criteria | Metrics | Logs (KQL) |
|---|---|---|
| Required latency | Seconds | Minutes |
| Condition type | Numeric threshold | Complex condition, correlation |
| Example | CPU > 90% for 5 min | More than 10 login failures in 1 hour |
| Evaluation cost | Included | Based on volume of data analyzed |
12. Best Practices
Log Analytics Workspace:
- Use a centralized workspace per environment (prod, dev) instead of one per resource. Facilitates event correlation and reduces management cost.
- Configure data retention according to compliance requirements. Default is 30 days; many regulations require 90 or 365 days.
- Monitor data ingestion cost. Use Log Analytics Workspace Insights (yes, there's an Insights for the workspace itself) to identify tables generating more volume.
VM Insights:
- Deploy AMA via Azure Policy to ensure automatic coverage of new VMs.
- Use centralized and reusable DCRs, not one per VM.
- Monitor the `Heartbeat` table to detect VMs that are offline or whose agent has stopped.
Storage Insights:
- Enable resource logs only for storage accounts with sensitive data or audit requirements. Storage logs cost per volume.
- For general-purpose storage accounts, automatic transaction metrics already cover most availability and latency monitoring cases.
Network Insights:
- Enable NSG Flow Logs version 2 (not version 1). Version 2 includes bytes and packets data, essential for Traffic Analytics.
- Configure Traffic Analytics with 10-minute processing interval for more up-to-date visualizations.
- Use Connection Monitor to monitor critical connectivity paths (VM to database, VM to external endpoint).
Alerts:
- Avoid creating alerts directly on individual resources. Prefer alerts on resource groups or subscriptions when possible.
- Configure alert processing rules (formerly called action rules) to suppress notifications during planned maintenance windows.
- Use consistent alert severities (Sev 0 = critical, Sev 3 = informational, Sev 4 = verbose) and map them to different Action Groups.
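A severity-to-Action-Group convention is easy to keep consistent when it is written down once. The sketch below uses Azure's five severity labels (Critical, Error, Warning, Informational, Verbose); the Action Group names and the routing choices are hypothetical examples, not a recommendation from Azure.

```python
SEVERITY_NAMES = {
    0: "Critical",
    1: "Error",
    2: "Warning",
    3: "Informational",
    4: "Verbose",
}

# Hypothetical Action Group names, grouped by severity rather than by resource
ACTION_GROUP_BY_SEVERITY = {
    0: "ag-oncall-critical",
    1: "ag-oncall-critical",
    2: "ag-team-email",
    3: "ag-team-email",
    4: None,  # visible in the portal only, no notification
}


def route_alert(severity):
    """Return (severity_name, action_group_or_None) for a fired alert."""
    if severity not in SEVERITY_NAMES:
        raise ValueError(f"severity must be 0-4, got {severity!r}")
    return SEVERITY_NAMES[severity], ACTION_GROUP_BY_SEVERITY[severity]


print(route_alert(0))
print(route_alert(4))
```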
13. Common Errors
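Not configuring Diagnostic Settings and expecting logs to arrive automatically: Platform Metrics arrive without configuration, but Resource Logs don't. A typical case: a storage account's metrics appear in Storage Insights, yet StorageBlobLogs stays empty because no Diagnostic Setting was created. For VMs the equivalent mistake is a missing DCR association, since guest data flows through AMA and DCRs rather than Diagnostic Settings.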
Confusing Log Analytics Workspace with Storage Account as log destination: You can send logs to Storage Account (for cheap archival) or to Log Analytics (for interactive analysis). Logs in Storage don't appear in Network Insights or VM Insights. Logs need to go to Log Analytics Workspace to be queryable via KQL and displayed in Insights.
Installing the legacy agent (MMA) instead of AMA: The Microsoft Monitoring Agent (MMA, also called the Log Analytics agent) was retired on August 31, 2024. Always use Azure Monitor Agent (AMA) with DCRs.
Forgetting to link Connection Monitor to a Log Analytics Workspace: Without this link, Connection Monitor test data is not stored and doesn't appear in Network Insights.
NSG Flow Logs enabled but Traffic Analytics not configured: Flow Logs stay in Storage Account as JSON files, but don't appear in Network Insights Traffic tab. Traffic Analytics needs to be enabled separately and pointed to a Log Analytics Workspace.
Creating an Alert Rule without Action Group: The alert triggers, is visible in the portal, but no one gets notified. Action Group is mandatory for notifications or automated actions.
Misinterpreting "Available Memory": In VM Insights, "Available Memory" is memory available for new allocations. In Linux, low values don't always indicate a problem, as the kernel uses free memory as disk cache (buffer/cache). The relevant metric in Linux is MemAvailable, not MemFree.
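The gap between MemFree and MemAvailable is easy to see in `/proc/meminfo`. The sketch below parses a made-up sample in that file's format (the numbers are invented for illustration) and compares both values with total memory; on a real Linux VM the same parser could be fed the contents of `/proc/meminfo`.

```python
# Invented sample in /proc/meminfo format: most "used" memory is page cache
SAMPLE_MEMINFO = """\
MemTotal:        8148220 kB
MemFree:          312456 kB
MemAvailable:    5921340 kB
Buffers:          204800 kB
Cached:          5632100 kB
"""


def parse_meminfo(text):
    """Return {field_name: value_in_kB} from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


info = parse_meminfo(SAMPLE_MEMINFO)
free_pct = 100 * info["MemFree"] / info["MemTotal"]
available_pct = 100 * info["MemAvailable"] / info["MemTotal"]
print(f"MemFree:      {free_pct:.1f}% of total")
print(f"MemAvailable: {available_pct:.1f}% of total")
```

In this sample, judging by MemFree alone would suggest the machine is nearly out of memory, while MemAvailable shows that most of it can be reclaimed from cache.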
14. Operation and Maintenance
Check monitoring coverage:
- In VM Insights (Get Started tab), verify which VMs have the agent installed and configured.
- Use the query `Heartbeat | summarize LastHeartbeat = max(TimeGenerated) by Computer` to identify VMs without a recent heartbeat.
Check data ingestion:
Usage
| where TimeGenerated > ago(24h)
| summarize TotalGB = sum(Quantity) / 1000 by DataType
| sort by TotalGB desc
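The aggregation performed by this kind of ingestion query can be mirrored in Python, assuming `Quantity` is reported in megabytes, so dividing by 1000 yields gigabytes. The function name and the sample rows are invented for illustration.

```python
from collections import defaultdict


def ingestion_gb_by_type(usage_rows):
    """usage_rows: iterable of (data_type, quantity_mb). Returns [(data_type, gb)] sorted descending."""
    totals_mb = defaultdict(float)
    for data_type, quantity_mb in usage_rows:
        totals_mb[data_type] += quantity_mb          # sum(Quantity) by DataType
    ranked = sorted(totals_mb.items(), key=lambda kv: kv[1], reverse=True)
    return [(data_type, mb / 1000) for data_type, mb in ranked]


rows = [
    ("Perf", 1200.0),
    ("Syslog", 300.0),
    ("Perf", 800.0),
    ("StorageBlobLogs", 4500.0),
]
for data_type, gb in ingestion_gb_by_type(rows):
    print(f"{data_type}: {gb} GB")
```

Ranking tables by volume this way points directly at the candidates for cost reduction, such as trimming a noisy log category or shortening a DCR's collection scope.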
Important limits:
| Resource | Limit |
|---|---|
| Maximum retention in Log Analytics | 730 days (with Archive up to 12 years) |
| Metric alerts per subscription | 5,000 |
| Action Groups per subscription | 2,000 |
| NSG Flow Logs retained in Storage | Defined by you (storage cost) |
| Minimum metric alert evaluation frequency | 1 minute |
| Minimum log alert evaluation frequency | 1 minute |
15. Integration and Automation
Azure Monitor + Azure Automation:
- Alerts can trigger Azure Automation Runbooks for automatic remediation actions.
- Example: Persistently high CPU triggers a Runbook that automatically increases VM SKU.
Azure Monitor + Logic Apps:
- Action Groups can call Logic Apps for ITSM flows, ticket opening, Teams or Slack notifications.
Azure Monitor + Grafana:
- Azure Managed Grafana integrates natively with Azure Monitor as data source.
- Allows creating Grafana dashboards using Azure Monitor metrics and logs without exporting data.
Azure Monitor + Microsoft Sentinel:
- Sentinel (SIEM/SOAR) consumes data from Log Analytics Workspace.
- NSG Flow Logs, Activity Logs and Storage Resource Logs can feed Sentinel for threat detection.
Continuous data export:
- Use Data Export Rules in Log Analytics to continuously export specific tables to a Storage Account or Event Hub.
- This enables integration with external data pipelines (Azure Data Explorer, third-party SIEM, etc.).
16. Final Summary
Essential points:
- Azure Monitor is the central observability platform, collecting metrics, logs and traces.
- Azure Monitor Insights are pre-built experiences for VMs, Storage and Networks, consuming Azure Monitor data.
- Metrics have seconds latency and 93-day retention. Logs have minutes latency and configurable retention.
- VM Insights requires Azure Monitor Agent (AMA) and a Data Collection Rule (DCR) on each VM.
- Storage Insights uses automatic metrics for transaction and capacity; detailed logs require Diagnostic Settings.
- Network Insights integrates NSG Flow Logs, Traffic Analytics, Connection Monitor and Network Watcher.
- NSG Flow Logs record allow/deny decisions per IP flow. Traffic Analytics processes these logs for visualization in Log Analytics.
- Alerts are composed of Alert Rule (condition) and Action Group (action). An Action Group can be reused across multiple rules.
- Workbooks are the interactive reporting technology used internally by all Insights.
Critical differences:
- Platform Metrics arrive automatically; Resource Logs need Diagnostic Settings.
- AMA replaces MMA; new deployments should always use AMA with DCR.
- NSG Flow Logs in Storage don't appear in Network Insights; Traffic Analytics is required for that.
- Log Analytics is for interactive analysis; Storage Account is for cheap archival.
- Metric Alert has seconds latency; Log Alert has minutes latency.
What needs to be remembered for AZ-104:
- VM Insights requires AMA installed and DCR associated.
- Diagnostic Settings must be explicitly configured for resource logs.
- Traffic Analytics requires NSG Flow Logs enabled AND a Log Analytics Workspace configured.
- Connection Monitor uses Network Watcher and needs an agent on the sources: the Network Watcher Agent extension on Azure VMs, AMA on Arc-enabled servers.
- Alerts without Action Group trigger but don't notify anyone.
- Storage Insights shows data from multiple storage accounts in a unified view.
- The `Heartbeat` table is the primary way to check agent availability on VMs.