
Theoretical Foundation: Set Up Alert Rules, Action Groups, and Alert Processing Rules in Azure Monitor


1. Initial Intuition

Imagine that you manage a physical datacenter. You don't stare at the servers 24 hours a day. Instead, you install sensors: a temperature sensor that triggers a siren if the air conditioning fails, a load sensor that activates an alarm if the disk reaches 90% usage, cameras that alert about suspicious movement outside business hours.

Alert Rules in Azure are these sensors and trigger conditions. You define: "when the VM's CPU stays above 85% for more than 5 minutes, notify me".

Action Groups are the contact list and actions to execute when the alarm triggers: send email to the operations team, send SMS to the on-call manager, call a webhook that automatically opens a ticket in Jira.

Alert Processing Rules are the silence and routing policies: "during scheduled maintenance windows, don't send alerts", or "all production alerts should also trigger the security team".

Together, these three components form Azure's complete alert system.


2. Context

2.1 The three pillars of the alert system


2.2 Types of Alert Rules by data source

Azure has different types of alert rules depending on where the data comes from:

| Type | Data source | Latency | Example usage |
|---|---|---|---|
| Metric alerts | Azure Monitor Metrics | 1-5 min | CPU > 85%, Storage > 90% |
| Log alerts | Log Analytics (KQL) | 5-15 min | 10+ login failures in 5 min |
| Activity log alerts | Azure Activity Log | 5-10 min | VM deleted, NSG modified |
| Service Health alerts | Azure Service Health | Variable | Region incidents, maintenance |
| Resource health alerts | Azure Resource Health | Variable | VM became unavailable |

3. Building the Concepts

3.1 Alert Rule: complete anatomy

An Alert Rule is composed of:

1. Scope: The resource or set of resources being monitored. Can be a VM, an entire Resource Group, or a Subscription.

2. Condition: What is being measured and when the alert should fire. Defines:

  • Signal: which metric, log or activity to monitor
  • Operator: greater than, less than, equal to
  • Threshold: the value that triggers the alert
  • Aggregation: how values are combined (Average, Maximum, etc.)
  • Evaluation period: time window of data evaluated
  • Frequency: how often to evaluate the condition

3. Action Groups: Which Action Group(s) to trigger when the condition is satisfied. A rule can reference more than one.

4. Alert Details: Name, description, severity and other alert properties.


3.2 Alert severity

| Level | Name | Typical usage |
|---|---|---|
| Sev 0 | Critical | Critical failure with immediate production impact |
| Sev 1 | Error | Serious problem requiring quick action |
| Sev 2 | Warning | Concerning condition needing attention |
| Sev 3 | Informational | Relevant information without urgency |
| Sev 4 | Verbose | Detailed diagnostics |

3.3 Stateful vs Stateless alerts

Stateful (default for metric alerts): The alert has states: Fired, Resolved. When the condition is no longer true, the alert is automatically resolved and a resolution notification is sent.

Stateless (default for log alerts): Each evaluation that satisfies the condition fires a notification, regardless of previous state. Useful for events that don't have a natural "resolved" concept.
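The difference can be made concrete with a toy model in Python. This is an illustration of the notification pattern only, not Azure's implementation. The function names and the list-of-booleans input are hypothetical. Each list element says whether the condition held at one evaluation:

```python
def stateful_notifications(evaluations):
    """Stateful: notify only on state changes (Fired on entry, Resolved on exit)."""
    events = []
    fired = False
    for i, condition_met in enumerate(evaluations):
        if condition_met and not fired:
            fired = True
            events.append((i, "Fired"))
        elif not condition_met and fired:
            fired = False
            events.append((i, "Resolved"))
    return events


def stateless_notifications(evaluations):
    """Stateless: notify on every evaluation where the condition holds; no Resolved event."""
    return [(i, "Fired") for i, condition_met in enumerate(evaluations) if condition_met]


if __name__ == "__main__":
    history = [False, True, True, True, False, False, True]
    print(stateful_notifications(history))   # [(1, 'Fired'), (4, 'Resolved'), (6, 'Fired')]
    print(stateless_notifications(history))  # [(1, 'Fired'), (2, 'Fired'), (3, 'Fired'), (6, 'Fired')]
```

For the same condition history, the stateful model sends three notifications, one of them a resolution. The stateless model sends four firing notifications and never a resolution.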


3.4 Metric Alert: advanced conditions

Static threshold: Fixed value compared with the metric.

Percentage CPU > 85  (average over 5 minutes)
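The static-threshold check can be sketched in Python. It aggregates the samples in the evaluation window, then compares the result with the fixed threshold. The helper and dictionary names are hypothetical. The aggregation and operator names mirror the rule fields described in 3.1. Treating an empty window as "not met" is a simplification of this sketch.

```python
import operator
from statistics import mean

# Aggregation and operator names follow the alert rule fields (Average, Maximum, GreaterThan, ...)
AGGREGATIONS = {"Average": mean, "Maximum": max, "Minimum": min, "Total": sum, "Count": len}
OPERATORS = {
    "GreaterThan": operator.gt,
    "GreaterThanOrEqual": operator.ge,
    "LessThan": operator.lt,
    "LessThanOrEqual": operator.le,
}


def static_condition_met(samples, threshold, aggregation="Average", op="GreaterThan"):
    """Aggregate the samples of one evaluation window and compare with a fixed threshold."""
    if not samples:
        return False  # this sketch treats an empty window as "condition not met"
    value = AGGREGATIONS[aggregation](samples)
    return OPERATORS[op](value, threshold)


if __name__ == "__main__":
    window = [80, 82, 90, 81, 83]  # five 1-minute CPU samples
    print(static_condition_met(window, 85))                         # False: average is 83.2
    print(static_condition_met(window, 85, aggregation="Maximum"))  # True: maximum is 90
```

The aggregation choice matters. The same window stays below 85 on Average but breaches on Maximum.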

Dynamic threshold: Azure learns the metric's historical behavior and automatically sets thresholds based on deviations from normality. Useful when normal behavior varies by time of day or day of week.
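Azure does not publish its dynamic-threshold model. The following Python sketch therefore uses a deliberately simple stand-in to show the idea of time-dependent bands. It learns a mean ± k·stdev band per hour of day from history. The function names, the (hour, value) input format and the parameter k are assumptions of this sketch, not Azure settings; Azure exposes High/Medium/Low sensitivity instead.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_bands(history, k=3.0):
    """Learn a (low, high) band per hour of day from (hour, value) pairs.
    Simple mean +/- k * stdev stand-in, not Azure's actual model."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        center, spread = mean(values), pstdev(values)
        bands[hour] = (center - k * spread, center + k * spread)
    return bands


def is_anomalous(bands, hour, value):
    """A value is anomalous when it falls outside the band learned for that hour."""
    low, high = bands[hour]
    return value < low or value > high


if __name__ == "__main__":
    history = [(3, v) for v in (10, 12, 11, 13, 9)] + [(14, v) for v in (70, 75, 72, 73, 70)]
    bands = hourly_bands(history)
    print(is_anomalous(bands, 3, 74))   # True: 74% CPU is unusual at 03:00
    print(is_anomalous(bands, 14, 74))  # False: 74% CPU is normal at 14:00
```

The same 74% reading is flagged at night and accepted in the afternoon. A single static threshold cannot express this.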

Dimensions in metric alerts: You can create an alert that monitors a metric filtered by dimension. Example: alert for 500 errors in only a specific region.
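Filtering or splitting by a dimension amounts to evaluating one time series per dimension value. Below is a minimal Python sketch. It assumes hypothetical records shaped like {"Region": ..., "value": ...} and uses the Total aggregation, which suits error counts. The function name and the `include` parameter are illustrative.

```python
from collections import defaultdict


def breaching_series(points, dimension, threshold, include=None):
    """Group points by a dimension, sum each group (Total aggregation) and return the
    dimension values whose total exceeds the threshold. `include` restricts the
    evaluation to specific values, like a dimension filter on the alert rule."""
    totals = defaultdict(int)
    for point in points:
        key = point[dimension]
        if include is None or key in include:
            totals[key] += point["value"]
    return sorted(key for key, total in totals.items() if total > threshold)


if __name__ == "__main__":
    http_500_counts = [
        {"Region": "eastus", "value": 3},
        {"Region": "eastus", "value": 4},
        {"Region": "westeurope", "value": 10},
        {"Region": "westeurope", "value": 12},
    ]
    print(breaching_series(http_500_counts, "Region", 10))                     # ['westeurope']
    print(breaching_series(http_500_counts, "Region", 5, include={"eastus"}))  # ['eastus']
```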


3.5 Log Alert: KQL query configuration

Log alerts execute a KQL query at regular intervals and fire when the result satisfies a condition:

Number of results: Alert if the query returns more or fewer than X rows.

// EventID 4625 = failed Windows logon
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(5m)

Configuration: "If count > 10, fire alert"

Metric measurement: The query calculates a numeric value per resource/time, and the alert fires when this value crosses a threshold.

// One numeric value per computer; the "> 85" threshold is set in the alert rule, not in the query
Perf
| where CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
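The two measures differ in what is compared with the threshold. The Python sketch below works over query results represented as lists of dicts. The row shapes reuse column names from the queries above (EventID, Computer, AvgCPU). The function names are hypothetical.

```python
def number_of_results_fires(rows, threshold):
    """'Number of results': fire when the query returns more than `threshold` rows."""
    return len(rows) > threshold


def metric_measurement_breaches(rows, measure, threshold, resource_column="Computer"):
    """'Metric measurement': compare the numeric `measure` column of each result row
    with the threshold; every breaching resource is evaluated on its own."""
    return sorted(row[resource_column] for row in rows if row[measure] > threshold)


if __name__ == "__main__":
    failed_logons = [{"EventID": 4625}] * 12           # 12 rows from the SecurityEvent query
    print(number_of_results_fires(failed_logons, 10))  # True: 12 > 10

    cpu_by_computer = [                                # output shape of the Perf query
        {"Computer": "vm1", "AvgCPU": 91.5},
        {"Computer": "vm2", "AvgCPU": 40.0},
    ]
    print(metric_measurement_breaches(cpu_by_computer, "AvgCPU", 85))  # ['vm1']
```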

3.6 Action Groups: action types

Notifications:

| Type | Description | Limitation |
|---|---|---|
| Email (Azure Resource Manager role) | Email to RBAC role members | Up to 1,000 emails/hour |
| Email/SMS/Push/Voice | Email, SMS, Azure app notification, call | Limits per subscriber |
| Azure App Push | Notification in Azure mobile app | Requires installed app |

Actions:

| Type | Description | When to use |
|---|---|---|
| Automation Runbook | Executes a PowerShell/Python runbook | Automatic remediation |
| Azure Function | Invokes a Function App | Custom logic |
| Event Hub | Publishes to Event Hub | Streaming integration |
| ITSM | Opens ticket in ServiceNow, Cherwell | Incident management |
| Logic App | Starts a Logic App workflow | Complex orchestration |
| Secure Webhook | Calls HTTPS endpoint with AAD auth | External systems |
| Webhook | Calls HTTPS endpoint | Simple integrations |

3.7 Alert Processing Rules: the orchestrator

Alert Processing Rules act on already fired alerts. They can:

1. Suppress notifications: Silence alert notifications during maintenance windows. The alert is still fired and recorded, but notifications are not sent.

2. Apply additional Action Group: Add an Action Group to alerts matching defined filters. Example: all severity 0 and 1 alerts in production should also trigger the manager via SMS.

Available filters in Alert Processing Rules:

  • Subscription
  • Resource Group
  • Resource Type
  • Resource
  • Alert Rule
  • Severity
  • Monitor Condition (Fired/Resolved)
  • Alert Context

Scheduling: Can be configured to act always or only in specific time windows (e.g. Saturday and Sunday from 22h to 06h for maintenance windows).
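The filter-plus-schedule logic can be modeled in Python. This sketch rests on stated assumptions, not documented Azure semantics:

  • The FiredAlert fields and the default filter values (taken from the weekend-maintenance example) are illustrative.
  • The scope check is simplified to a resource group name comparison.
  • A window that crosses midnight is treated as belonging to the day it starts on.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

SATURDAY, SUNDAY = 5, 6  # values returned by datetime.weekday()


@dataclass
class FiredAlert:
    resource_group: str
    severity: str       # "Sev0" .. "Sev4"
    fired_at: datetime  # assumed to be already in the rule's time zone


def in_recurring_window(ts, days=(SATURDAY, SUNDAY), start=time(22, 0), end=time(6, 0)):
    """True if ts falls inside the daily window on one of the given weekdays.
    A window that crosses midnight is attributed to the day it starts on (assumption)."""
    t = ts.time()
    if start <= end:                      # same-day window, e.g. 01:00-05:00
        return ts.weekday() in days and start <= t < end
    if t >= start:                        # evening part of an overnight window
        return ts.weekday() in days
    if t < end:                           # early-morning part: check the previous day
        return (ts - timedelta(days=1)).weekday() in days
    return False


def is_suppressed(alert, resource_group="production-rg", severities=("Sev0", "Sev1", "Sev2")):
    """All filters (scope and severity) and the schedule must match to suppress notifications."""
    return (
        alert.resource_group == resource_group
        and alert.severity in severities
        and in_recurring_window(alert.fired_at)
    )
```

Under these assumptions, a Sev1 alert in production-rg fired on Saturday 4 January 2025 at 23:00 is suppressed. A Sev3 alert fired at the same moment still notifies, because it fails the severity filter.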


4. Structural View


5. Practical Operation

5.1 Metric Alert: evaluation behavior

When you configure a metric alert, two parameters define the behavior:

Evaluation frequency: How often the condition is checked. Example: every 5 minutes.

Aggregation granularity (window): What data period is considered in each evaluation. Example: 15-minute window.

If frequency = 5m and window = 15m: every 5 minutes, Azure evaluates the last 15 minutes of aggregated data.

Non-obvious behavior: an alert with window = 15m can take roughly 20 minutes to fire after a problem starts. Three delays add up: the averaged window must first fill with enough breaching data (up to 15 min), the next evaluation can be up to 5 min away, and metric collection latency comes on top. For critical alerts that need to fire quickly, use window = 5m and frequency = 1m.
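The effect of window and frequency can be reproduced with a small Python simulation. The function name is hypothetical, and the model assumes:

  • one CPU sample per minute
  • a sudden jump from 20% to 100% at minute `onset`
  • evaluations at multiples of the frequency, each over the last `window` complete minutes
  • no ingestion latency

```python
from statistics import mean


def first_fire_delay(frequency, window, onset, baseline=20, spike=100, threshold=85, horizon=240):
    """Minutes from the start of a CPU spike until the first evaluation whose
    window average exceeds the threshold (None if it never fires within horizon)."""
    def cpu(minute):
        return spike if minute >= onset else baseline

    for t in range(frequency, horizon + 1, frequency):    # evaluation times, in minutes
        samples = [cpu(m) for m in range(t - window, t)]  # the last `window` 1-minute samples
        if mean(samples) > threshold:
            return t - onset
    return None


if __name__ == "__main__":
    delays = [first_fire_delay(5, 15, onset) for onset in range(5)]
    print(min(delays), max(delays))         # 13 17
    print(first_fire_delay(1, 5, onset=2))  # 5
```

With these assumptions, frequency = 5m and window = 15m detects the spike 13 to 17 minutes after it starts. The exact delay depends on where the spike falls relative to the evaluation schedule. Frequency = 1m and window = 5m detects it after 5 minutes. A spike that only just exceeds the threshold needs a fuller window, which pushes the delay toward window + frequency.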


5.2 Suppression vs Disable alert

There's an important distinction:

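Suppress via Alert Processing Rule: The alert is still created and recorded in history; only the notifications are suppressed. Suppressed notifications are not replayed when the window ends. After that point, only new alert events notify: for example, a stateless alert that fires again, or a stateful alert that changes state (such as to Resolved).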

Disable the Alert Rule: The condition is no longer evaluated. There's no alert record during the disabled period. Not recommended for maintenance because you lose visibility.


6. Implementation Methods

6.1 Azure Portal

When to use: Initial creation, exploring available options, visual troubleshooting.

Creating metric alert: Azure Monitor > Alerts > + Create > Alert rule

Creating Action Group: Azure Monitor > Alerts > Action groups > + Create

Creating Alert Processing Rule: Azure Monitor > Alerts > Alert processing rules > + Create


6.2 Azure CLI

Creating Action Group:

az monitor action-group create \
  --resource-group myRG \
  --name "ops-team-ag" \
  --short-name "OpsTeam" \
  --action email OnCall oncall@company.com usecommonalertschema \
  --action sms Manager 55 11999999999 \
  --action webhook PagerDuty "https://events.pagerduty.com/integration/xxx/enqueue" usecommonalertschema

Creating Metric Alert:

az monitor metrics alert create \
  --resource-group myRG \
  --name "High-CPU-Alert" \
  --scopes <vm-resource-id> \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 2 \
  --action <action-group-id> \
  --description "Average CPU above 85% for 5 minutes" \
  --auto-mitigate true

Creating Log Alert (the condition counts the rows the query returns, so the query itself should not summarize them):

az monitor scheduled-query create \
  --resource-group myRG \
  --name "Failed-Login-Alert" \
  --scopes <workspace-resource-id> \
  --condition "count 'FailedLogons' > 10" \
  --condition-query FailedLogons="SecurityEvent | where EventID == 4625" \
  --evaluation-frequency 5m \
  --window-size 5m \
  --severity 1 \
  --action-groups <action-group-id>

Creating Activity Log Alert:

az monitor activity-log alert create \
  --resource-group myRG \
  --name "VM-Delete-Alert" \
  --scope /subscriptions/<sub-id> \
  --condition category=Administrative and operationName=Microsoft.Compute/virtualMachines/delete \
  --action-group <action-group-id>

Creating Alert Processing Rule (maintenance suppression):

az monitor alert-processing-rule create \
  --resource-group myRG \
  --name "Weekend-Maintenance" \
  --rule-type RemoveAllActionGroups \
  --scopes /subscriptions/<sub-id>/resourceGroups/production-rg \
  --filter-severity Equals Sev0 Sev1 Sev2 \
  --schedule-recurrence-type Weekly \
  --schedule-recurrence Saturday Sunday \
  --schedule-recurrence-start-time "22:00:00" \
  --schedule-recurrence-end-time "06:00:00" \
  --schedule-time-zone "UTC"

6.3 Bicep

// Assumption: the monitored VM already exists in the deployment's resource group
param vmName string

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' existing = {
  name: vmName
}

// Action Group
resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'ops-team-ag'
  location: 'global'
  properties: {
    groupShortName: 'OpsTeam'
    enabled: true
    emailReceivers: [
      {
        name: 'OnCall'
        emailAddress: 'oncall@company.com'
        useCommonAlertSchema: true
      }
    ]
    smsReceivers: [
      {
        name: 'Manager'
        countryCode: '55'
        phoneNumber: '11999999999'
      }
    ]
    webhookReceivers: [
      {
        name: 'Automation'
        serviceUri: 'https://prod.webhook.office.com/...'
        useCommonAlertSchema: true
      }
    ]
  }
}

// Metric Alert on the VM declared above
resource cpuAlert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'High-CPU-Alert'
  location: 'global'
  properties: {
    description: 'Average CPU above 85% for 5 minutes'
    severity: 2
    enabled: true
    scopes: [vm.id]
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'HighCPU'
          metricName: 'Percentage CPU'
          operator: 'GreaterThan'
          threshold: 85
          timeAggregation: 'Average'
          criterionType: 'StaticThresholdCriterion'
        }
      ]
    }
    autoMitigate: true
    actions: [
      {
        actionGroupId: actionGroup.id
      }
    ]
  }
}

// Alert Processing Rule (maintenance suppression)
resource maintenanceRule 'Microsoft.AlertsManagement/actionRules@2021-08-08' = {
  name: 'Weekend-Maintenance'
  location: 'global'
  properties: {
    scopes: [resourceGroup().id]
    conditions: [
      {
        field: 'Severity'
        operator: 'Equals'
        values: ['Sev0', 'Sev1', 'Sev2']
      }
    ]
    actions: [
      {
        actionType: 'RemoveAllActionGroups'
      }
    ]
    schedule: {
      timeZone: 'UTC'
      recurrences: [
        {
          recurrenceType: 'Weekly'
          daysOfWeek: ['Saturday', 'Sunday']
          startTime: '22:00:00'
          endTime: '06:00:00'
        }
      ]
    }
    enabled: true
  }
}

7. Control and Security

7.1 Required permissions

| Operation | Minimum role |
|---|---|
| Create/edit Alert Rules | Monitoring Contributor |
| Create/edit Action Groups | Monitoring Contributor |
| Create/edit Alert Processing Rules | Monitoring Contributor |
| Only view fired alerts | Monitoring Reader |
| Acknowledge alerts | Monitoring Contributor |

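7.2 Common Alert Schema

The Common Alert Schema is a standardized JSON format for notifications from all alert types (metric, log, activity log). It can be enabled on Action Group receivers such as email, webhook, Logic App, Azure Function and Automation runbook. When it is enabled, those receivers get the same structure regardless of alert type: a fixed essentials section, plus an alertContext section whose content depends on the signal type.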

{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": {
      "alertId": "/subscriptions/.../alerts/...",
      "alertRule": "High-CPU-Alert",
      "severity": "Sev2",
      "signalType": "Metric",
      "monitorCondition": "Fired",
      "monitoringService": "Platform",
      "alertTargetIDs": ["/subscriptions/.../virtualMachines/myVM"],
      "firedDateTime": "2025-01-15T14:32:00Z"
    },
    "alertContext": {
      "properties": {},
      "conditionType": "SingleResourceMultipleMetricCriteria",
      "condition": {
        "windowSize": "PT5M",
        "allOf": [
          {
            "metricName": "Percentage CPU",
            "metricNamespace": "Microsoft.Compute/virtualMachines",
            "operator": "GreaterThan",
            "threshold": "85",
            "timeAggregation": "Average",
            "dimensions": [],
            "metricValue": 92.3
          }
        ]
      }
    }
  }
}
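Every alert type shares the essentials section, so a webhook or Function handler can be written once. Below is a minimal Python sketch. It assumes the request body arrives as a JSON string shaped like the example above. summarize_alert is a hypothetical helper name, and only the Metric branch of alertContext is interpreted here.

```python
import json


def summarize_alert(body):
    """Extract the commonly used fields from a Common Alert Schema payload (JSON string)."""
    payload = json.loads(body)
    if payload.get("schemaId") != "azureMonitorCommonAlertSchema":
        raise ValueError("payload does not use the Common Alert Schema")
    essentials = payload["data"]["essentials"]
    summary = {
        "rule": essentials["alertRule"],
        "severity": essentials["severity"],
        "condition": essentials["monitorCondition"],  # "Fired" or "Resolved"
        "signal": essentials["signalType"],
        "targets": essentials["alertTargetIDs"],
    }
    # alertContext differs per signal type; only metric alerts are interpreted here
    if essentials["signalType"] == "Metric":
        criteria = payload["data"].get("alertContext", {}).get("condition", {}).get("allOf", [])
        summary["metric_values"] = [c.get("metricValue") for c in criteria]
    return summary


if __name__ == "__main__":
    sample = json.dumps({
        "schemaId": "azureMonitorCommonAlertSchema",
        "data": {
            "essentials": {
                "alertRule": "High-CPU-Alert", "severity": "Sev2", "signalType": "Metric",
                "monitorCondition": "Fired", "alertTargetIDs": ["/subscriptions/.../virtualMachines/myVM"],
            },
            "alertContext": {"condition": {"allOf": [{"metricName": "Percentage CPU", "metricValue": 92.3}]}},
        },
    })
    print(summarize_alert(sample)["metric_values"])  # [92.3]
```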

8. Best Practices

8.1 Alert fatigue prevention

Start with higher thresholds: Begin with conservative thresholds and adjust based on false positive rates.

Use dynamic thresholds for seasonal patterns: Services with predictable daily/weekly patterns benefit from machine learning-based thresholds.

Implement alert hierarchy: Use different severities and Action Groups. Not every alert needs to wake someone up.


8.2 Action Group organization

By team: Create Action Groups per responsible team (network-team-ag, database-team-ag).

By urgency: Create Action Groups for different response levels (critical-24x7-ag, business-hours-ag, info-only-ag).

By environment: Separate production from non-production notifications.


8.3 Alert Processing Rules strategy

Maintenance windows: Create recurring suppression rules for known maintenance windows.

Environment-specific routing: Use Alert Processing Rules to add environment-specific Action Groups (e.g., all prod alerts also go to management).

Geographic considerations: For global services, route alerts to the appropriate on-call team based on time zones.




Always use Common Alert Schema whenever possible. It greatly simplifies processing in Logic Apps and webhooks.


7.3 Notification rate limits

| Channel | Limit |
|---|---|
| Email | 100 emails/hour per email address |
| SMS | 1 SMS every 5 minutes per phone number |
| Voice | 1 call every 5 minutes per phone number |
| Webhook | No explicit limit (depends on endpoint) |
| Azure Function | No explicit limit |

Important behavior: rate limiting applies per email address or phone number. If many alerts fire at the same time, an address that hits its limit stops receiving notifications: further notifications to it are dropped, not queued, until the limit period passes. Azure sends a single notice that rate limiting is in effect. For high-load scenarios, use webhook or Event Hub as the primary channel.
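The per-address behavior can be modeled with a sliding window in Python. This models the documented limits; it is not Azure's implementation. The class name and the choice of a sliding (rather than fixed) window are assumptions.

```python
from collections import defaultdict, deque


class PerAddressRateLimiter:
    """At most `limit` notifications per `window_s` seconds for each address.
    Notifications over the limit are dropped, not queued."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.sent = defaultdict(deque)  # address -> timestamps of delivered notifications

    def try_send(self, address, now_s):
        delivered = self.sent[address]
        while delivered and now_s - delivered[0] >= self.window_s:
            delivered.popleft()  # forget deliveries that left the window
        if len(delivered) >= self.limit:
            return False         # rate limited: this notification is dropped
        delivered.append(now_s)
        return True


if __name__ == "__main__":
    email = PerAddressRateLimiter(limit=100, window_s=3600)
    burst = [email.try_send("oncall@company.com", t) for t in range(150)]  # 150 alerts in 150 s
    print(sum(burst))  # 100 delivered, 50 dropped
    sms = PerAddressRateLimiter(limit=1, window_s=300)
    print([sms.try_send("+5511999999999", t) for t in (0, 120, 300)])  # [True, False, True]
```

In the model, the 50 dropped emails are never delivered later. This is why the text recommends a webhook or Event Hub for bursts.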


8. Decision Making

8.1 Which alert type to use for each scenario

| Scenario | Alert type | Reason |
|---|---|---|
| VM CPU > 85% | Metric Alert | Platform metric with fixed threshold |
| Anomalous CPU pattern (no known threshold) | Metric Alert with Dynamic Threshold | Azure learns the historical pattern |
| 10+ login failures in 5 min | Log Alert | Requires a KQL query over logs |
| VM deleted (audit) | Activity Log Alert | Control plane event |
| Incident in an Azure region you use | Service Health Alert | Azure service health |
| OS disk over 90% full | Metric Alert (guest metrics) | Requires Azure Monitor Agent on the VM |

8.2 When to use Alert Processing Rules vs configuring Action Group in Alert Rule

| Situation | Approach | Reason |
|---|---|---|
| Recurring maintenance window | Alert Processing Rule (Suppression) | Centralized configuration, no need to modify each Alert Rule |
| All prod alerts should cc the manager | Alert Processing Rule (Add Action Group) | Avoids duplicating the Action Group in each Alert Rule |
| Specific alert with specific action | Action Group directly in Alert Rule | Simpler, no need for additional layer |
| Silence only during specific deployment | Alert Processing Rule with single window | Flexibility without modifying Alert Rules |

8.3 Frequency and Window Size: balancing speed and cost

| Requirement | Frequency | Window | Consideration |
|---|---|---|---|
| Ultra-fast detection (critical) | 1 min | 5 min | Higher evaluation cost |
| Moderate detection (standard) | 5 min | 15 min | Cost/speed balance |
| Trend alert (not urgent) | 15 min | 1 hour | Lower cost, more smoothed |
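The trade-offs in this table can be quantified with a little arithmetic in Python. The worst-case delay is the window + frequency upper bound discussed in 5.1, with ingestion latency excluded. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvaluationProfile:
    name: str
    frequency_min: int
    window_min: int

    def evaluations_per_day(self):
        return 24 * 60 // self.frequency_min

    def evaluations_per_data_point(self):
        # How many consecutive evaluations see the same 1-minute data point (window overlap)
        return self.window_min // self.frequency_min

    def worst_case_delay_min(self):
        # Upper bound: a full window of breaching data plus one evaluation interval
        return self.window_min + self.frequency_min


PROFILES = [
    EvaluationProfile("Ultra-fast detection (critical)", 1, 5),
    EvaluationProfile("Moderate detection (standard)", 5, 15),
    EvaluationProfile("Trend alert (not urgent)", 15, 60),
]

if __name__ == "__main__":
    for p in PROFILES:
        print(f"{p.name}: {p.evaluations_per_day()} evaluations/day, "
              f"each point seen {p.evaluations_per_data_point()}x, "
              f"worst-case delay ~{p.worst_case_delay_min()} min")
```

The evaluations-per-day figure is a rough proxy for the "evaluation cost" column: 1,440 for the ultra-fast profile versus 96 for the trend profile. The worst-case delay of about 75 minutes shows why the trend profile is unsuitable for urgent conditions.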

9. Best Practices

  • Create reusable Action Groups separated by team (ops-team, security-team, management) and reference them in multiple Alert Rules instead of creating one Action Group per alert.
  • Use Common Alert Schema in all webhooks and Logic Apps to simplify processing.
  • Separate alerts by severity and configure different Action Groups: Sev0/Sev1 triggers SMS + immediate email; Sev2/Sev3 sends email only.
  • Configure auto-mitigate = true in metric alerts to receive automatic resolution notification when the condition is no longer satisfied.
  • Use Alert Processing Rules for maintenance instead of disabling Alert Rules. The alert continues to be evaluated and logged.
  • Document the meaning of each alert in the Alert Rule description: what it means when it fires, what action is expected, what response runbook to use.
  • Group related alerts using the "Alert Rule Name" field as a filter in Alert Processing Rules to apply consistent logic.
  • Test Action Groups using the "Test" button in the portal before trusting them in production. Check if emails arrive, if webhooks respond with 200.
  • Configure Service Health alerts to receive notifications of incidents and planned maintenance in the Azure regions you use.
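The severity split in the third practice above can be expressed as a small routing table. Below is a Python sketch with illustrative channel and alert names. The practice does not say what Sev4 should do, so this sketch sends nothing for it; that choice is an assumption.

```python
SEVERITY_CHANNELS = {
    "Sev0": ("sms", "email"),  # immediate: SMS + email
    "Sev1": ("sms", "email"),
    "Sev2": ("email",),        # email only
    "Sev3": ("email",),
    "Sev4": (),                # not covered by the practice: no notification in this sketch
}


def plan_notifications(alerts):
    """Map (alert_name, severity) pairs to the notifications each one should trigger."""
    plan = []
    for name, severity in alerts:
        if severity not in SEVERITY_CHANNELS:
            raise ValueError(f"unknown severity {severity!r} for alert {name!r}")
        plan.extend((name, channel) for channel in SEVERITY_CHANNELS[severity])
    return plan


if __name__ == "__main__":
    fired = [("VM-Delete-Alert", "Sev1"), ("High-CPU-Alert", "Sev2"), ("Disk-Trend", "Sev4")]
    print(plan_notifications(fired))
    # [('VM-Delete-Alert', 'sms'), ('VM-Delete-Alert', 'email'), ('High-CPU-Alert', 'email')]
```

In Azure, this mapping would be realized by attaching different Action Groups per severity. The table form simply makes the policy reviewable in one place.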

10. Common Errors

| Error | Why it happens | How to avoid |
|---|---|---|
| Alert constantly firing (alert fatigue) | Threshold too sensitive or window too small | Adjust threshold; use dynamic thresholds |
| Didn't receive critical alert notification | Email rate limiting active | Use webhook or Azure Function as primary channel for Sev0/Sev1 |
| Alert doesn't fire during real incident | Alert Rule disabled or threshold too high | Review rule state and thresholds regularly; test notifications with Test Action Group |
| Alert fires during maintenance | No suppression Alert Processing Rule | Create APR with maintenance window |
| Log alert doesn't fire | Incorrect KQL query or data still in transit | Test query separately in Log Analytics; check ingestion latency |
| Too many duplicate alerts | Short evaluation interval combined with a large window, so consecutive windows overlap | Adjust frequency and window; use stateful alerts |
| Action Group doesn't trigger webhook | Endpoint with invalid certificate or timeout | Fix the endpoint's TLS certificate and response time; use Secure Webhook when the endpoint requires AAD authentication |
| Alert Processing Rule doesn't suppress | Incorrect filters (severity, scope) | Verify filters exactly match the fired alert |

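11. Operation and Maintenance

11.1 Viewing fired alerts

# List alerts in a resource group whose condition is currently Fired (Alerts Management REST API)
az rest --method get \
  --url "https://management.azure.com/subscriptions/<sub-id>/providers/Microsoft.AlertsManagement/alerts?api-version=2019-05-05-preview&targetResourceGroup=myRG&monitorCondition=Fired"

# View alert history for the subscription (last 24h)
az rest --method get \
  --url "https://management.azure.com/subscriptions/<sub-id>/providers/Microsoft.AlertsManagement/alerts?api-version=2019-05-05-preview&timeRange=1d"

In portal: Azure Monitor > Alerts shows current state of all alerts.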


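11.2 Acknowledging alerts

Each alert has two separate states:

  • Monitor condition (set by Azure): Fired while the condition is true, Resolved when it no longer is
  • Alert state (managed manually): New, Acknowledged (someone is investigating) or Closed

# Mark an alert as Acknowledged via the Alerts Management REST API
az rest --method post \
  --url "https://management.azure.com/subscriptions/<sub-id>/providers/Microsoft.AlertsManagement/alerts/<alert-id>/changestate?api-version=2019-05-05-preview&newState=Acknowledged"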

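11.3 Testing Action Groups

# Send a test notification through an Action Group receiver
az monitor action-group test-notifications create \
  --resource-group myRG \
  --action-group-name "ops-team-ag" \
  --alert-type servicehealth \
  --add-action email OnCall oncall@company.com usecommonalertschema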

Verify that:

  • Email arrived in inbox (not spam)
  • SMS was received
  • Webhook returned 200 OK
  • Azure Function was invoked

11.4 Important limits

| Resource | Limit |
|---|---|
| Alert Rules per subscription | 5,000 |
| Action Groups per subscription | 2,000 |
| Receivers per Action Group | 10 per type (email: 1,000) |
| Alert Processing Rules per subscription | 1,000 |
| Alert history retention | 30 days |
| Minimum metric alert frequency | 1 minute |
| Minimum log alert frequency | 1 minute |

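12. Integration and Automation

12.1 Automatic remediation with Automation Runbook

Configure an Action Group that calls a runbook for auto-remediation. The Action Group passes the alert payload to the runbook in the $WebhookData parameter, so the runbook reads the target VM from that payload. The example assumes the Common Alert Schema is enabled on the runbook receiver:

# Runbook: restart the VM targeted by a high-CPU alert
param(
    [Parameter(Mandatory = $false)]
    [object]$WebhookData
)

# Authenticate with the Automation account's managed identity
Connect-AzAccount -Identity | Out-Null

# Read the first target resource from the Common Alert Schema payload
$payload  = $WebhookData.RequestBody | ConvertFrom-Json
$targetId = $payload.data.essentials.alertTargetIDs[0]

# /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vm>
$parts         = $targetId.Split('/')
Set-AzContext -Subscription $parts[2] | Out-Null
$resourceGroup = $parts[4]
$vmName        = $parts[8]

Write-Output "Restarting VM $vmName in response to high CPU alert"
Restart-AzVM -ResourceGroupName $resourceGroup -Name $vmName
Write-Output "VM restarted successfully"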

12.2 Microsoft Teams integration via Logic App

  1. Create a Logic App with HTTP trigger
  2. Configure the Logic App to post message to Teams
  3. Add the Logic App endpoint as Webhook in the Action Group

The Common Alert Schema payload allows creating rich messages in Teams with all alert details.


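12.3 Azure Policy to ensure alert coverage

# Policy: ensure all VMs have CPU alert rule.
# DeployIfNotExists assignments need a managed identity (and therefore a location) to deploy resources.
az policy assignment create \
  --name "vm-cpu-alert-required" \
  --policy "<policy-definition-id>" \
  --scope "/subscriptions/<sub-id>" \
  --mi-system-assigned \
  --location <region>

DeployIfNotExists policies can automatically create Alert Rules on new resources. Existing resources are brought into compliance with a remediation task. The assignment's managed identity also needs the roles listed in the policy definition at the assignment scope.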


13. Final Summary

Essential concepts:

  • Alert Rule defines the firing condition: which signal to monitor (metric, log, activity log), which threshold, in which time window and with which frequency.
  • Action Group defines what happens when the alert fires: notifications (email, SMS, push) and actions (runbook, function, webhook, ITSM).
  • Alert Processing Rule modifies the behavior of already fired alerts: suppresses notifications in maintenance windows or adds additional Action Groups based on filters.

Critical differences:

  • Metric Alert vs Log Alert: Metric operates over numerical values in near real-time (1-5 min). Log Alert executes KQL query over logs with higher latency (5-15 min).
  • Stateful vs Stateless: Stateful fires once when reaching the condition and resolves automatically. Stateless fires at each evaluation that satisfies the condition.
  • Suppress vs Disable: Suppress (via APR) keeps the alert being evaluated and logged. Disabling the Alert Rule completely stops evaluation.
  • Frequency vs Window: Frequency is how often the condition is evaluated. Window is the period of data considered in each evaluation.

What needs to be remembered:

  • An alert can take up to window + frequency + collection latency to fire after the condition is met.
  • Action Groups have rate limits: 100 emails/hour per email address, 1 SMS or voice call every 5 minutes per phone number.
  • Use Common Alert Schema in webhooks and Logic Apps to receive standardized format regardless of alert type.
  • Alert Processing Rules filter by subscription, resource group, resource type, resource, severity, monitor condition and alert rule name.
  • For maintenance, use Alert Processing Rules with time window instead of disabling Alert Rules.
  • Test Action Groups with the "Test" button before trusting them in production.
  • Alert history is retained for 30 days in the Azure Monitor portal.