Theoretical Foundation: Query and Analyze Logs in Azure Monitor
1. Initial Intuitionβ
Imagine that your company maintains a central file with records of everything that happens: who entered the building, which systems were accessed, what errors occurred, what changes were made. This file grows terabytes per day. To extract value from it, you need a query language that allows you to ask things like: "Show me all access denials from the last 6 hours, grouped by user, ordered by who tried the most times."
Kusto Query Language (KQL) is this query language for logs stored in Azure Monitor Log Analytics. It's a language designed specifically to query large volumes of time-series data quickly and expressively.
The beauty of KQL is that it reads like a chain of transformations: you take a table of data, filter what doesn't matter, select the relevant fields, group, calculate, and present the result. It's intuitive once you understand the pattern table | operator | operator | operator.
2. Contextβ
2.1 KQL within Azure Monitorβ
2.2 Main tables you need to knowβ
Before writing queries, you need to know where the data is. The most important tables in AZ-104:
| Table | Content |
|---|---|
AzureActivity | Activity Log: resource creations, modifications, deletions |
AzureDiagnostics | Diagnostic logs from multiple resources (legacy) |
StorageBlobLogs | Blob Storage operations |
AzureMetrics | Metrics exported to Log Analytics |
SecurityEvent | Windows security events (Event Log) |
Syslog | Linux system logs |
Heartbeat | Periodic heartbeat from VMs monitored by agent |
SigninLogs | Login attempts to Azure AD |
AuditLogs | Administrative actions in Azure AD |
Event | Generic Windows Event Log events |
Perf | VM performance counters (CPU, memory, disk) |
3. Concept Buildingβ
3.1 KQL mental model: data pipelineβ
The central concept of KQL is the pipe operator (|): each operator receives a table as input, applies a transformation, and passes the result to the next operator.
table | operator1 | operator2 | operator3
It's exactly like Unix/Linux pipes: cat file.log | grep ERROR | sort | uniq -c
3.2 Fundamental operatorsβ
where: Filters rows based on a condition. The most used operator.
AzureActivity
| where TimeGenerated > ago(24h)
| where ActivityStatus == "Failed"
project: Selects only the columns you want (like SELECT in SQL).
AzureActivity
| where TimeGenerated > ago(24h)
| project TimeGenerated, Caller, OperationName, ActivityStatus
extend: Adds a new calculated column without removing existing ones.
Perf
| where CounterName == "% Processor Time"
| extend CPUCategory = iff(CounterValue > 80, "High", "Normal")
summarize: Aggregates data, like GROUP BY in SQL. Always used with aggregation functions.
AzureActivity
| where TimeGenerated > ago(7d)
| summarize OperationCount = count() by Caller, ActivityStatus
| order by OperationCount desc
order by / sort by: Orders results.
SecurityEvent
| where EventID == 4625 // Failed logon
| summarize FailedAttempts = count() by Account
| order by FailedAttempts desc
| take 10
take / limit: Returns only the first N results. Useful for quick exploration.
distinct: Returns unique values from a column.
AzureActivity
| distinct Caller
count: Counts the total number of records.
AzureActivity
| where TimeGenerated > ago(24h)
| count
3.3 Time functions: essential for logsβ
ago(): Returns a timestamp in the past relative to the current moment.
| where TimeGenerated > ago(1h) // last hour
| where TimeGenerated > ago(7d) // last 7 days
| where TimeGenerated > ago(30m) // last 30 minutes
bin(): Groups timestamps into regular intervals. Essential for creating time series.
AzureActivity
| where TimeGenerated > ago(24h)
| summarize EventCount = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
startofday(), startofweek(), startofmonth(): Returns the beginning of the period relative to a timestamp.
3.4 String functionsβ
contains: Checks if a string contains a substring (case-insensitive by default).
| where OperationName contains "delete"
startswith, endswith: Check prefix or suffix.
tolower(), toupper(): Normalize capitalization.
split(): Splits a string into an array by separator.
strcat(): Concatenates strings.
extract(): Extracts a pattern via regex.
| extend IPAddress = extract(@"(\d+\.\d+\.\d+\.\d+)", 1, CallerIpAddress)
3.5 Aggregation functionsβ
| Function | Description | Example |
|---|---|---|
count() | Counts records | summarize Total = count() |
sum(field) | Sums values | summarize TotalBytes = sum(ResponseBodySize) |
avg(field) | Average | summarize AvgLatency = avg(DurationMs) |
max(field) | Maximum value | summarize PeakCPU = max(CounterValue) |
min(field) | Minimum value | summarize MinMemory = min(Available_MBytes) |
percentile(field, N) | N percentile | summarize P95 = percentile(DurationMs, 95) |
dcount(field) | Distinct count | summarize UniqueUsers = dcount(Caller) |
makeset(field) | Creates an array of distinct values | summarize ResourceList = makeset(Resource) |
3.6 join: joining tablesβ
The join combines two tables based on a common field. Similar to SQL JOIN but with specific types:
// Cross-reference application errors with security events in the same period
let errorTimes = AppExceptions
| where TimeGenerated > ago(1h)
| project ErrorTime = TimeGenerated, ProblemId;
SecurityEvent
| where TimeGenerated > ago(1h)
| join kind=inner errorTimes on $left.TimeGenerated == $right.ErrorTime
Most used join types:
inner: Only records with match on both sidesleftouter: All from left, with right match when existsanti: Records from left that DON'T have match on right
3.7 let: variables and subqueriesβ
The let defines named variables or subqueries for reuse.
let threshold = 80;
let highCPUVMs = Perf
| where CounterName == "% Processor Time"
| where CounterValue > threshold
| distinct Computer;
Heartbeat
| where Computer in (highCPUVMs)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
3.8 union: combining results from multiple tablesβ
// See all Storage and SQL errors in the same result
union StorageBlobLogs, AzureDiagnostics
| where TimeGenerated > ago(1h)
| where Level == "Error"
| project TimeGenerated, ResourceType, OperationName, Level
| order by TimeGenerated desc
4. Structural Viewβ
5. Practical Operationβ
5.1 Essential queries for AZ-104β
Who deleted resources in the last 24 hours?
AzureActivity
| where TimeGenerated > ago(24h)
| where OperationName contains "delete"
| where ActivityStatus == "Succeeded"
| project TimeGenerated, Caller, OperationName, ResourceGroup, Resource
| order by TimeGenerated desc
Which VMs are without heartbeat (potentially offline)?
Heartbeat
| where TimeGenerated > ago(5m)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)
| order by LastHeartbeat asc
Top 10 IPs with Windows login failures:
SecurityEvent
| where EventID == 4625
| where TimeGenerated > ago(24h)
| summarize FailedAttempts = count() by IpAddress
| top 10 by FailedAttempts
VM CPU usage in the last 4 hours (time series):
Perf
| where TimeGenerated > ago(4h)
| where CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by Computer, bin(TimeGenerated, 15m)
| render timechart
Storage Account errors in the last 6 hours:
StorageBlobLogs
| where TimeGenerated > ago(6h)
| where StatusCode >= 400
| summarize ErrorCount = count() by StatusCode, OperationName
| order by ErrorCount desc
Azure changes (Activity Log) by operation type:
AzureActivity
| where TimeGenerated > ago(7d)
| where ActivityStatus == "Succeeded"
| summarize OperationCount = count() by OperationName
| top 20 by OperationCount
| render barchart
Which users performed most actions in the subscription:
AzureActivity
| where TimeGenerated > ago(30d)
| summarize ActionCount = count() by Caller
| where Caller !contains "microsoft" // exclude internal Microsoft services
| top 10 by ActionCount
5.2 The render operator: inline visualizationβ
The render transforms query results into a chart directly in Log Analytics Explorer:
// Time series of events per hour
AzureActivity
| where TimeGenerated > ago(24h)
| summarize Count = count() by bin(TimeGenerated, 1h)
| render timechart
// Bar chart by operation type
AzureActivity
| summarize Count = count() by Category
| render barchart
// Pie chart of distribution
AzureActivity
| summarize Count = count() by ActivityStatus
| render piechart
Render types: timechart, barchart, columnchart, piechart, scatterchart, table.
5.3 Non-obvious behaviorsβ
Case-sensitivity: KQL is case-sensitive by default. "Error" is not equal to "error". Use =~ for case-insensitive comparison:
| where ActivityStatus =~ "succeeded" // case-insensitive
| where ActivityStatus == "Succeeded" // case-sensitive
Timestamps: The TimeGenerated field is always in UTC. If you're in UTC-3, an event at 2 PM local time appears as 5 PM in KQL.
Dynamic data types: Some fields are of type dynamic (embedded JSON). To access properties:
| extend statusCode = tostring(Properties.statusCode)
| extend vmName = tostring(Properties.entity.name)
6. Implementation Methodsβ
6.1 Log Analytics Workspace (Azure Portal)β
When to use: Interactive investigation, troubleshooting, data exploration, creating visualizations, saving queries for reuse.
Access: Azure Monitor > Logs or Log Analytics Workspace > Logs
The portal offers:
- IntelliSense to autocomplete tables and fields
- History of executed queries
- Save queries in "Saved Queries" to share with team
- Pin to dashboard
- Export results to CSV or JSON
6.2 Azure CLIβ
# Execute KQL query via CLI
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "AzureActivity | where TimeGenerated > ago(1h) | take 10" \
--output table
# Query with time parameters
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "AzureActivity | where ActivityStatus == 'Failed' | count" \
--start-time 2025-01-15T00:00:00Z \
--end-time 2025-01-15T23:59:59Z
When to use: Automation scripts, periodic reports, CI/CD pipeline integration.
6.3 Azure PowerShellβ
# Execute KQL query
$query = @"
AzureActivity
| where TimeGenerated > ago(24h)
| where ActivityStatus == 'Failed'
| project TimeGenerated, Caller, OperationName
| order by TimeGenerated desc
"@
Invoke-AzOperationalInsightsQuery `
-WorkspaceId <workspace-id> `
-Query $query |
Select-Object -ExpandProperty Results |
Format-Table
6.4 REST API (Azure Monitor Query API)β
For integration with applications and external systems:
curl -X POST \
"https://api.loganalytics.io/v1/workspaces/<workspace-id>/query" \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"query": "AzureActivity | where TimeGenerated > ago(1h) | count",
"timespan": "PT1H"
}'
6.5 Azure Monitor Workbooksβ
Workbooks are interactive dashboards that combine text, KQL queries, metrics, and visualizations. They're ideal for repeatable operations or security reports.
Access: Azure Monitor > Workbooks > + New
You can add multiple sections, each with its own KQL query and visualization type.
7. Control and Securityβ
7.1 Log access controlβ
| Role | Access |
|---|---|
| Log Analytics Reader | Query logs, view configurations |
| Log Analytics Contributor | Everything in Reader + create alerts and workbooks |
| Monitoring Reader | Query logs and metrics (read-only) |
| Owner/Contributor on subscription | Full access |
Table-level RBAC: For sensitive data, you can restrict access by table. A user can have workspace access but not be able to query SecurityEvent or SigninLogs.
az monitor log-analytics workspace table update \
--resource-group myRG \
--workspace-name myLAW \
--name SecurityEvent \
--plan Analytics
7.2 Queries that expose sensitive dataβ
Some tables contain PII (Personally Identifiable Information) or sensitive data:
SigninLogs: IP addresses, usernames, user locationAuditLogs: User actions including data accessSecurityEvent: Usernames and login activityStorageBlobLogs: File names accessed, client IPs
Restrict access to these tables to only teams that need them.
8. Decision Makingβ
8.1 Which operator to use for each needβ
| Need | Operator | Example |
|---|---|---|
| Filter by condition | where | where Level == "Error" |
| Select columns | project | project Time, User, Action |
| Add calculated column | extend | extend Severity = iff(Code>400,"High","Low") |
| Count and group | summarize + count() | summarize Count = count() by User |
| Sort results | order by | order by Count desc |
| Limit results | take / top | top 10 by Count |
| Unique values | distinct | distinct User |
| Join tables | join | join kind=inner T2 on Field |
| Union results | union | union T1, T2 |
| Create variable | let | let threshold = 80; |
| Visualize chart | render | render timechart |
8.2 Incident investigation strategyβ
| Phase | Recommended query | Objective |
|---|---|---|
| Detection | where Level == "Error" with count() | Quantify the problem |
| Time context | bin(TimeGenerated, 5m) with render timechart | See when it started |
| Identify source | summarize count() by Computer/User/Resource | Discover origin |
| Details | project with specific fields + take 100 | Examine individual events |
| Correlation | join with another table | Cross with other events |
9. Best Practicesβ
- Always filter by
TimeGeneratedfirst. Queries without time filter scan entire history and are slow and expensive. Putwhere TimeGenerated > ago(Xh)as the first line after the table. - Use
projectto reduce columns beforesummarize. Fewer columns processed = faster queries. - Use
top N by fieldinstead oforder by | take N.topis optimized for this pattern. - Save useful queries in "Saved Queries" in the portal to share with the team and reuse during incidents.
- Use
letfor complex subqueries and to define thresholds that can be easily adjusted at the beginning of the query. - Test queries with
take 100first to validate logic before executing aggregations on large volumes. - Use
render timechartfor any temporal analysis. Patterns become immediately visible in charts that would be invisible in a table of numbers. - Document investigation queries in a repository (GitHub, Azure DevOps) to build an operations runbook.
10. Common Errorsβ
| Error | Why it happens | How to avoid |
|---|---|---|
| Slow query or timeout | No TimeGenerated filter | Always start with where TimeGenerated > ago(Xh) |
| Incorrect results due to case | Comparison with == on field with variable capitalization | Use =~ for case-insensitive |
dynamic field not accessible | Trying to compare JSON field directly | Use tostring(Properties.field) before comparing |
summarize removing unexpected columns | summarize only keeps columns listed in by | Use project to select before summarize |
join returning Cartesian product | Many matches on both sides | Pre-filter tables before join or use kind=leftouter |
ago() returning wrong period | Confusing ago(7d) with "date X" | Use datetime(2025-01-15) for absolute dates |
| Results with UTC vs local time | TimeGenerated in UTC | Use datetime_local_to_utc() or adjust in presentation |
| Correct query but no results | Data still in transit (ingestion latency) | Wait 5-10 minutes and retry |
11. Operation and Maintenanceβ
11.1 Log Analytics health queriesβ
Ingestion volume per table:
Usage
| where TimeGenerated > ago(7d)
| summarize TotalGB = round(sum(Quantity) / 1024, 2) by DataType
| order by TotalGB desc
Ingestion latency:
AzureActivity
| where TimeGenerated > ago(1h)
| extend IngestDelay = ingestion_time() - TimeGenerated
| summarize AvgDelayMin = avg(IngestDelay / 1m)
Slowest executed queries (requires LAW diagnostics enabled):
LAQueryLogs
| where TimeGenerated > ago(1d)
| order by ResponseDurationMs desc
| take 20
| project TimeGenerated, RequestClientApp, ResponseDurationMs, QueryText
11.2 Important limitsβ
| Aspect | Limit |
|---|---|
| Maximum query execution time | 10 minutes |
| Maximum result size | 64 MB |
| Maximum number of rows in result | 500,000 |
| Maximum time window per query | Limited by workspace retention |
| Concurrent queries per workspace | 200 |
join columns on right side | Maximum 1 million records |
12. Integration and Automationβ
12.1 Alert Rules based on KQLβ
# Create alert based on KQL query
az monitor scheduled-query alert create \
--resource-group myRG \
--name "Failed-Login-Alert" \
--scopes <workspace-resource-id> \
--condition-query "SecurityEvent | where EventID == 4625 | where TimeGenerated > ago(5m) | summarize count() | where count_ > 10" \
--condition-threshold 0 \
--condition-operator "GreaterThan" \
--evaluation-frequency 5m \
--window-size 5m \
--severity 2 \
--action-groups <action-group-id>
12.2 Automated workbook for daily reportβ
Create Workbooks with queries parameterized by time interval. Configure for automatic publishing or email sending via Logic Apps that execute KQL queries via API.
12.3 Exporting query results via Pipelineβ
# Export query result to CSV via CLI
az monitor log-analytics query \
--workspace <workspace-id> \
--analytics-query "AzureActivity | where TimeGenerated > ago(7d) | where ActivityStatus == 'Failed' | project TimeGenerated, Caller, OperationName" \
--output json | \
jq -r '.[] | [.TimeGenerated, .Caller, .OperationName] | @csv' > failed-operations.csv
13. Final Summaryβ
Essential concepts:
- KQL uses the pipeline pattern:
table | operator | operator. Each operator transforms the output of the previous one. - The most important operators are:
where(filter),project(select columns),extend(add columns),summarize(aggregate),order by(sort),take/top(limit). - Always filter by
TimeGeneratedas the first operator for performance and cost. rendertransforms results into inline charts (timechart,barchart,piechart).
Critical differences:
==vs=~:==is case-sensitive;=~is case-insensitive.projectvsextend:projectselects only the listed columns (removes others).extendadds new columns without removing existing ones.takevstop:take Nreturns the first N without specific ordering.top N by fieldreturns the N largest/smallest by a field (more efficient thanorder by | take).count(tabular operator) vscount()(aggregation function):| countcounts everything and returns one row.summarize count() by fieldgroups and counts by category.ago()vsdatetime():ago(7d)is relative to current moment.datetime(2025-01-15)is absolute.
What needs to be remembered:
- The
TimeGeneratedfield is always in UTC. dynamictype fields (JSON) requiretostring(field.subfield)before comparing or using.- The
summarizeoperator removes all columns that don't appear in theby. Useprojectbeforehand to select what you need. - Queries without time filter are slow, expensive, and often timeout.
letis powerful for creating named subqueries and defining constants that make query adjustments easier.- Log ingestion latency is 2 to 5 minutes. Queries for very recent events may not return results yet.
- Save useful queries in Saved Queries of the workspace for reuse during incidents, when time is critical.