Theoretical Foundation: Manage Virtual Machine Sizes
1. Initial Intuition
Imagine you're setting up an office for employees with very different profiles. A graphic designer needs a powerful workstation with lots of memory and GPU. A financial analyst needs lots of CPU to run models. An administrative employee who only uses email and simple spreadsheets works well with a basic computer.
Putting everyone on identical machines would be wasteful or insufficient. The right approach is to size each machine for the function it will perform.
In Azure, VM Size is exactly this sizing decision: choosing how many virtual CPUs, how much memory, what type and amount of temporary storage, and what network capacity each VM will have. Each combination receives a standardized name like Standard_D4s_v5 (4 vCPUs, 16 GB RAM, optimized for general use).
Managing VM Sizes means not only choosing the right size at creation, but also knowing how to change that size when the workload evolves, without needing to recreate the VM from scratch.
2. Context
Why VM Sizes exist as a formal concept
Microsoft operates hundreds of datacenters with specific physical hardware. Customers reserve a slice of that hardware through a standardized catalog of the configurations it supports. This catalog is the set of available VM Sizes.
From the customer's perspective, VM Sizes exist for:
- Cost control: pay exactly for what the workload needs
- Performance guarantee: select hardware optimized for the workload type
- Scalability: increase or decrease resources without recreating infrastructure
What depends on correct VM Size choice
- Cost: most of a VM's cost comes from its size
- Performance: applications with inadequate sizing are slow or fail
- Availability: some SKUs aren't available in all regions
- Functionality: certain capabilities (Ultra SSD, Accelerated Networking, GPU) are only available in specific SKUs
3. Building the Concepts
3.1 VM Size nomenclature
Understanding a VM SKU name is essential for navigating the catalog without needing to memorize each option. The structure is:
Standard_[Family][vCPUs][Sub-family][Features]_v[N]
Example: Standard_D4ds_v5
| Segment | Value | Meaning |
|---|---|---|
| Tier | Standard | Level (Basic was discontinued) |
| Family | D | General purpose |
| vCPU count | 4 | 4 vCPUs |
| Sub-family | d | Local temporary disk |
| Features | s | Premium Storage support |
| Version | v5 | 5th generation of this SKU |
3.2 Subfamily and feature letters
The s suffix is especially important: without it, the SKU doesn't support Premium SSD. For example, Standard_D4_v5 doesn't support Premium Storage, but Standard_D4s_v5 does.
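The naming rules above can be sketched as a small parser. This is a minimal illustration, assuming only the simple `Standard_<Family><vCPUs><letters>[_v<N>]` form described in this section; the names `parse_sku` and `VmSku` are hypothetical, and constrained-vCPU or accelerator-tagged names are deliberately rejected rather than guessed at.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the simple form Standard_<Family><vCPUs><letters>[_v<N>].
# Assumption: constrained-vCPU names (e.g. "E4-2s") and accelerator
# segments (e.g. "_T4_") are out of scope for this sketch.
SKU_PATTERN = re.compile(r"^(Standard|Basic)_([A-Z]+)(\d+)([a-z]*)(?:_v(\d+))?$")


class VmSku(NamedTuple):
    tier: str
    family: str
    vcpus: int
    letters: str            # lowercase sub-family/feature letters, e.g. "ds"
    version: Optional[int]  # None when the name has no _vN suffix

    @property
    def premium_storage(self) -> bool:
        # Per the text above, the "s" letter marks Premium Storage support.
        return "s" in self.letters


def parse_sku(name: str) -> VmSku:
    """Split a SKU name into its segments, or raise ValueError."""
    match = SKU_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported SKU name format: {name!r}")
    tier, family, vcpus, letters, version = match.groups()
    return VmSku(tier, family, int(vcpus), letters,
                 int(version) if version else None)


if __name__ == "__main__":
    for sku_name in ("Standard_D4ds_v5", "Standard_D4_v5", "Standard_B2s"):
        sku = parse_sku(sku_name)
        print(sku_name, sku, "premium:", sku.premium_storage)
```

Keeping the parser strict means an unfamiliar name fails loudly instead of being silently misread.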
3.3 VM families by workload type
Azure organizes SKUs into families by purpose:
| Family | Letter | Characteristic | Typical workloads |
|---|---|---|---|
| General Purpose | B, D, D(a/s), DC | Balanced CPU:Memory (1:4) | Web servers, dev/test, small databases |
| Compute Optimized | F, FX | High CPU, less memory (1:2) | Intensive web servers, batch processing |
| Memory Optimized | E, E(a/s), M, Mv2 | Lots of memory (1:8 or more) | SAP HANA, SQL Server, Redis, large caches |
| Storage Optimized | L | High I/O on local disk | NoSQL, data warehouses, big data |
| GPU | N (NC, ND, NV) | NVIDIA GPUs | Machine learning, rendering, visualization |
| High Performance Compute | H, HB, HC | InfiniBand, high CPU | Scientific simulations, CFD, molecular dynamics |
| Confidential Computing | DC | Trusted Execution Env | Workloads with highly sensitive data |
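The CPU:memory ratios in the table suggest a rough first filter when a workload's needs are known. The sketch below is only a heuristic built from those ratios (1:2, 1:4, 1:8); the function name and the exact cut-off points are assumptions, and it deliberately ignores families chosen for other reasons (burstable, storage, GPU, HPC, confidential).

```python
def classify_by_ratio(vcpus: int, memory_gb: float) -> str:
    """Map a GB-per-vCPU ratio to one of the three ratio-based categories.

    Assumed cut-offs, derived from the ratios quoted in the table:
      <= 2 GB per vCPU  -> compute optimized
      <  8 GB per vCPU  -> general purpose
      >= 8 GB per vCPU  -> memory optimized
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    gb_per_vcpu = memory_gb / vcpus
    if gb_per_vcpu <= 2:
        return "compute optimized"
    if gb_per_vcpu < 8:
        return "general purpose"
    return "memory optimized"


if __name__ == "__main__":
    # A workload that needs 4 vCPUs and 16 GB lands in the 1:4 band.
    print(classify_by_ratio(4, 16))
```

The result narrows the catalog; it does not replace checking the actual SKU specifications.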
3.4 B series: Burstable VMs
The B family deserves special explanation as it works differently from others. Burstable VMs accumulate CPU credits when utilization is below baseline and consume them when they need more CPU.
B VMs are ideal for workloads with low average CPU usage and sporadic peaks: development servers, test environments, CI/CD servers with periodic builds. They are not suitable for workloads with continuous high CPU usage.
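The credit mechanism can be illustrated with a toy hour-by-hour simulation. This is a simplified model, not the exact platform accounting: it assumes one credit equals one vCPU at 100% for one minute, treats usage and baseline as percentages of a single vCPU, and uses a hypothetical 40% baseline purely for the example.

```python
from typing import Iterable, List


def simulate_credits(hourly_usage_pct: Iterable[float],
                     baseline_pct: float,
                     start_credits: float = 0.0,
                     max_credits: float = float("inf")) -> List[float]:
    """Return the credit balance at the end of each hour.

    Below the baseline the VM banks credits; above it the VM spends them.
    The balance is clamped between 0 and max_credits.
    """
    credits = start_credits
    history = []
    for usage in hourly_usage_pct:
        # (baseline - usage) percent of a vCPU, sustained for 60 minutes.
        credits += (baseline_pct - usage) * 60 / 100
        credits = max(0.0, min(credits, max_credits))
        history.append(credits)
    return history


if __name__ == "__main__":
    # Two quiet hours followed by two hours at full load
    # (the 40% baseline is a hypothetical value).
    print(simulate_credits([10, 10, 100, 100], baseline_pct=40))
```

In this model two quiet hours pay for only one hour of full load, which is why sustained high CPU exhausts a burstable VM quickly.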
3.5 Resizing: how and when it happens
Changing the size of an existing VM is called resize. There are two technical scenarios:
Resize within the same hardware family (same cluster):
- Can often be done without deallocation
- Azure tries to resize without moving the VM to another physical host
- A running VM is still restarted to apply the new size
Resize to another hardware family:
- Requires deallocation of the VM (stop + deallocate, not just stop)
- The VM is moved to a different physical host that supports the new SKU
- Downtime of several minutes
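The two scenarios above can be expressed as a small planning helper. This is a sketch under one assumption: that the list of sizes currently offered for the VM (for example, the names returned by a resize-options query) is the signal for whether deallocation is needed. The function name and step labels are hypothetical.

```python
from typing import Iterable, List


def plan_resize(target_size: str,
                sizes_offered_for_vm: Iterable[str]) -> List[str]:
    """Return the ordered steps for moving a VM to target_size.

    If the target is offered on the hardware the VM currently runs on,
    an in-place resize (with a restart) is enough. Otherwise the VM must
    be deallocated so it can be placed on hardware that supports the size.
    """
    if target_size in set(sizes_offered_for_vm):
        return ["resize"]
    return ["deallocate", "resize", "start"]


if __name__ == "__main__":
    offered = ["Standard_D2s_v5", "Standard_D4s_v5", "Standard_D8s_v5"]
    print(plan_resize("Standard_D8s_v5", offered))  # in-place resize
    print(plan_resize("Standard_E4s_v5", offered))  # needs deallocation
```

Deciding from the offered list rather than from the family letter alone also covers the case where a same-family size is not available on the current cluster.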
4. Structural Overview
Decision hierarchy for VM Size choice
Comparison of common general-purpose SKUs (B and D families)
| SKU | vCPU | RAM | Temp Disk | Premium Storage | Typical use |
|---|---|---|---|---|---|
| Standard_B2s | 2 | 4 GB | 8 GB | Yes | Dev/test, small sites |
| Standard_D2s_v5 | 2 | 8 GB | None | Yes | Light general purpose |
| Standard_D4s_v5 | 4 | 16 GB | None | Yes | Medium general purpose |
| Standard_D4ds_v5 | 4 | 16 GB | 150 GB local SSD | Yes | General purpose with temp disk |
| Standard_D8s_v5 | 8 | 32 GB | None | Yes | Intensive general purpose |
| Standard_D16s_v5 | 16 | 64 GB | None | Yes | High general purpose |
5. Practical Operation
VM resize process
Important non-obvious behaviors
Resize without deallocation depends on capacity in the current cluster. Even if the new SKU is from the same family, the resize fails when the cluster currently hosting the VM has no capacity for the larger size. The error message indicates that no capacity is available. The solution is to deallocate and try again, so the VM can be placed on different hardware that supports the size.
Deallocation releases dynamic public IPs. If the VM uses a public IP with Dynamic allocation, the IP is released when the VM is deallocated for the resize, and a new IP is assigned when the VM starts. To avoid this, switch the public IP to Static allocation before deallocating.
Temporary disk changes size with the SKU. The temporary disk (D: on Windows, /dev/sdb on Linux) has its size determined by the SKU. When resizing to a smaller SKU, the temporary disk shrinks. Data on the temporary disk is lost in any deallocation or resize.
Some functionalities become unavailable when downsizing. If the VM has Accelerated Networking enabled and you resize to a SKU that doesn't support this functionality, it's automatically disabled. The same applies to high-performance disk caches.
VMs in Availability Sets have limited possible SKUs. An Availability Set is mapped to a specific physical cluster. Only SKUs available in that cluster can be used. To use a SKU from another family, it may be necessary to recreate the VM outside the Availability Set.
VMs in Availability Zones have more SKU flexibility. Zones use hardware distributed across multiple datacenters, offering greater variety of available SKUs compared to Availability Sets.
6. Implementation Methods
Azure Portal
When to use: one-time resize, visual verification of available SKUs
To resize via portal:
- Portal > Virtual Machines > select the VM
- Side menu > Size (in Settings)
- The portal displays available SKUs for that VM in that region
- Select the new SKU
- Resize
The portal will automatically show only SKUs that are available for that VM in that configuration (respecting Availability Set, zone, etc.).
Limitation: not reproducible, no version control, impractical for multiple VMs.
Azure CLI
# List all available SKUs in a region
az vm list-sizes \
--location "brazilsouth" \
--output table
# Filter SKUs by family (e.g., D family)
az vm list-sizes \
--location "brazilsouth" \
--query "[?starts_with(name, 'Standard_D')]" \
--output table
# Check available SKUs for resizing a specific VM
# (considers Availability Set and location)
az vm list-vm-resize-options \
--resource-group "rg-producao" \
--name "vm-web-01" \
--output table
# See current VM SKU
az vm show \
--resource-group "rg-producao" \
--name "vm-web-01" \
--query "hardwareProfile.vmSize" \
--output tsv
# Resize without deallocation (tries first without stopping)
az vm resize \
--resource-group "rg-producao" \
--name "vm-web-01" \
--size "Standard_D4s_v5"
# Resize that requires deallocation (different family)
# Step 1: Preserve public IP if it's dynamic
# Check IP type
az network public-ip show \
--resource-group "rg-producao" \
--name "pip-vm-web-01" \
--query "publicIPAllocationMethod" \
--output tsv
# If Dynamic, change to Static before deallocating
az network public-ip update \
--resource-group "rg-producao" \
--name "pip-vm-web-01" \
--allocation-method Static
# Step 2: Deallocate
az vm deallocate \
--resource-group "rg-producao" \
--name "vm-web-01"
# Step 3: Resize
az vm resize \
--resource-group "rg-producao" \
--name "vm-web-01" \
--size "Standard_E4s_v5"
# Step 4: Start
az vm start \
--resource-group "rg-producao" \
--name "vm-web-01"
# Verify new size
az vm show \
--resource-group "rg-producao" \
--name "vm-web-01" \
--query "hardwareProfile.vmSize" \
--output tsv
# Script: Batch resize of multiple VMs
# Abort on the first failed step so a VM is never reported as resized when it was not
set -euo pipefail
for vm in vm-web-01 vm-web-02 vm-web-03; do
  echo "Resizing $vm to Standard_D4s_v5..."
  az vm deallocate \
    --resource-group "rg-producao" \
    --name "$vm"
  az vm resize \
    --resource-group "rg-producao" \
    --name "$vm" \
    --size "Standard_D4s_v5"
  az vm start \
    --resource-group "rg-producao" \
    --name "$vm"
  echo "$vm resized successfully."
done
# Check CPU usage to determine if resize is necessary
# (via Azure Monitor)
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Percentage CPU" \
--interval PT1H \
--start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--aggregation Average \
--output table
Azure PowerShell
# List available SKUs in a region
Get-AzVMSize -Location "brazilsouth" |
Sort-Object Name |
Format-Table Name, NumberOfCores, MemoryInMB, MaxDataDiskCount
# Filter D family SKUs
Get-AzVMSize -Location "brazilsouth" |
Where-Object { $_.Name -like "Standard_D*" } |
Format-Table Name, NumberOfCores, MemoryInMB
# See available SKUs for a specific VM
Get-AzVMSize `
-ResourceGroupName "rg-producao" `
-VMName "vm-web-01" |
Format-Table Name, NumberOfCores, MemoryInMB
# See current SKU
(Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01").HardwareProfile.VmSize
# Resize without deallocation (same family)
$vm = Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"
$vm.HardwareProfile.VmSize = "Standard_D4s_v5"
Update-AzVM -ResourceGroupName "rg-producao" -VM $vm
# Resize with deallocation (different family)
# Step 1: Deallocate
Stop-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01" -Force
# Step 2: Change size
$vm = Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"
$vm.HardwareProfile.VmSize = "Standard_E4s_v5"
Update-AzVM -ResourceGroupName "rg-producao" -VM $vm
# Step 3: Start
Start-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"
# Report: list all VMs and their sizes in an RG
Get-AzVM -ResourceGroupName "rg-producao" |
Select-Object Name, @{N="Size"; E={$_.HardwareProfile.VmSize}}, Location |
Format-Table
# Script: Automate resize based on average CPU
$resourceGroup = "rg-producao"
$vmName = "vm-web-01"
$threshold = 80 # CPU % to trigger scale up
$newSize = "Standard_D8s_v5"
$metric = Get-AzMetric `
-ResourceId (Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName).Id `
-MetricName "Percentage CPU" `
-StartTime (Get-Date).AddHours(-4) `
-EndTime (Get-Date) `
-TimeGrainInMinutes 60 `
-AggregationType Average
$avgCpu = ($metric.Data | Measure-Object -Property Average -Average).Average
if ($avgCpu -gt $threshold) {
Write-Output "Average CPU: $avgCpu%. Above $threshold%. Starting resize to $newSize..."
Stop-AzVM -ResourceGroupName $resourceGroup -Name $vmName -Force
$vm = Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName
$vm.HardwareProfile.VmSize = $newSize
Update-AzVM -ResourceGroupName $resourceGroup -VM $vm
Start-AzVM -ResourceGroupName $resourceGroup -Name $vmName
Write-Output "Resize completed."
} else {
Write-Output "Average CPU: $avgCpu%. Within limits. Resize not necessary."
}
Bicep
// Define VM Size as parameter for flexibility
@description('VM Size SKU')
@allowed([
'Standard_B2s'
'Standard_D2s_v5'
'Standard_D4s_v5'
'Standard_D8s_v5'
'Standard_E4s_v5'
])
param vmSize string = 'Standard_D4s_v5'
resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' = {
name: 'vm-web-01'
location: 'brazilsouth'
properties: {
hardwareProfile: {
vmSize: vmSize // SKU as parameter, not hardcoded
}
// ... rest of configurations
}
}
7. Control and Security
Azure Policy to limit allowed SKUs
To avoid using very expensive SKUs or inappropriate SKUs for an environment:
# Built-in policy: "Allowed virtual machine size SKUs"
# Policy ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3
az policy assignment create \
--name "allowed-vm-sizes" \
--display-name "Only approved SKUs for production" \
--policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
--scope "/subscriptions/<sub-id>/resourceGroups/rg-producao" \
--params '{
"listOfAllowedSKUs": {
"value": [
"Standard_D2s_v5",
"Standard_D4s_v5",
"Standard_D8s_v5",
"Standard_E4s_v5",
"Standard_E8s_v5"
]
}
}'
This ensures that no VM can be created with, or resized to, a SKU outside the approved list, which prevents accidental use of expensive SKUs.
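Hand-writing the JSON passed to the policy assignment is error-prone once the list grows. The following sketch builds the same parameter document from a Python list and offers a local pre-check of a size against that list. Only the `listOfAllowedSKUs` key comes from the command above; the helper names are hypothetical.

```python
import json
from typing import Iterable, List


def allowed_sku_params(skus: Iterable[str]) -> str:
    """Return the JSON parameter document for the allowed-SKU policy.

    Duplicates are removed and the list is sorted so the output is stable
    across runs (useful when the value is kept under version control).
    """
    unique: List[str] = sorted(set(skus))
    if not unique:
        raise ValueError("at least one SKU must be allowed")
    return json.dumps({"listOfAllowedSKUs": {"value": unique}}, indent=2)


def is_allowed(size: str, params_json: str) -> bool:
    """Local pre-check: would this size pass the list in params_json?"""
    allowed = json.loads(params_json)["listOfAllowedSKUs"]["value"]
    return size in allowed


if __name__ == "__main__":
    params = allowed_sku_params(
        ["Standard_D4s_v5", "Standard_D2s_v5", "Standard_D2s_v5"])
    print(params)
    print(is_allowed("Standard_M128s", params))
```

The pre-check is only a convenience for scripts; enforcement still happens in Azure Policy.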
vCPU quotas per subscription
Each subscription has vCPU limits per VM family and per region. Before planning resizes that significantly increase vCPUs, check and request increases:
# See current vCPU quota per family in a region
az vm list-usage \
--location "brazilsouth" \
--query "[?contains(name.value, 'cores')].{Name: name.localizedValue, Current: currentValue, Limit: limit}" \
--output table
# View quota for specific family
az vm list-usage \
--location "brazilsouth" \
--query "[?name.value=='standardDSv5Family'].{Name: name.localizedValue, Current: currentValue, Limit: limit}" \
--output table
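Once the current usage and limit are known, the quota check itself is simple arithmetic. The sketch below assumes the resize stays within one VM family (a cross-family resize moves usage from one family quota to another, and the total regional vCPU quota also applies); the names and the 70% alert level are illustrative, the latter taken from the guidance later in this document.

```python
from dataclasses import dataclass


@dataclass
class QuotaCheck:
    projected_usage: int   # vCPUs in use after the resize
    fits: bool             # True if the resize stays within the limit
    above_alert: bool      # True if usage reaches the alert threshold


def check_family_quota(current_usage: int, limit: int,
                       old_vcpus: int, new_vcpus: int,
                       vm_count: int = 1,
                       alert_fraction: float = 0.7) -> QuotaCheck:
    """Project family vCPU usage after resizing vm_count VMs."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    projected = current_usage + (new_vcpus - old_vcpus) * vm_count
    return QuotaCheck(
        projected_usage=projected,
        fits=projected <= limit,
        above_alert=projected >= alert_fraction * limit,
    )


if __name__ == "__main__":
    # Three VMs going from 4 to 8 vCPUs, with 20 of 32 vCPUs already used.
    print(check_family_quota(20, 32, old_vcpus=4, new_vcpus=8, vm_count=3))
```

A result that fits but sits above the alert level is a signal to request an increase before the next resize.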
8. Decision Making
Family choice by workload type
| Workload | Recommended family | Reason |
|---|---|---|
| Web server, REST API | D-series (D2s-D8s v5) | Balanced 1:4 CPU:RAM ratio, Premium SSD |
| SQL Server database | E-series or M-series | High memory for buffer pool |
| Redis, Memcached, in-memory cache | E-series | High RAM for in-memory data |
| Machine Learning training | N-series (NC, ND) | GPU required for acceleration |
| Machine Learning inference | N-series or F-series | GPU or high CPU per query |
| Java/JVM application server | E-series | JVM benefits from high RAM |
| CI/CD build server | F-series or B-series | High CPU or burstable |
| Dev/test with sporadic usage | B-series | Low cost, burstable CPU |
| SAP HANA | M-series (Mv2) | Up to TBs of RAM |
| NFS file server | L-series or D-series | High local or network I/O |
When to resize vs. add more VMs (scale up vs. scale out)
| Situation | Approach | Reason |
|---|---|---|
| Single-threaded application with high CPU | Scale up (larger VM) | Application doesn't parallelize |
| Stateless web app with many requests | Scale out (more VMs + LB) | More efficient load distribution |
| Database with insufficient memory | Scale up (more RAM) | DB is stateful, difficult to distribute |
| Batch processing without critical deadline | Scale out with Spot VMs | Minimized cost |
| Development VM with build spikes | Keep B-series | Burstable covers the spikes |
| Application on B-series VM with continuous high CPU | Scale up to D-series | B-series without credits is inefficient |
9. Best Practices
Always check available SKUs before planning a resize. The az vm list-vm-resize-options command only shows SKUs that are available for that specific VM at that moment. Available SKUs vary by region, Availability Set, and current cluster capacity.
Use Static Public IP for VMs that will be frequently deallocated. Dynamic IPs change with each deallocate/start. For VMs that go through frequent resizing or are shut down at night, configure static IP to maintain consistent addressing.
Monitor CPU and memory for at least 2 weeks before resizing. One week may not capture weekly patterns (Monday with more load, quieter weekends). Use Azure Monitor to get sufficient historical data before sizing decisions.
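One way to turn two weeks of hourly samples into a sizing hint is to look at a high percentile rather than the average, so short spikes neither dominate nor disappear. This is a sketch only: the 80% and 20% thresholds, the use of the 95th percentile, and the function names are assumptions for illustration, not Azure guidance.

```python
import math
from typing import Sequence

MIN_HOURLY_SAMPLES = 14 * 24  # two weeks of hourly data, per the text above


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile (pct between 1 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def sizing_hint(cpu_samples: Sequence[float],
                high: float = 80.0, low: float = 20.0) -> str:
    """Suggest a direction based on the 95th percentile of CPU usage."""
    if len(cpu_samples) < MIN_HOURLY_SAMPLES:
        return "collect more data"
    p95 = percentile(cpu_samples, 95)
    if p95 >= high:
        return "consider scaling up"
    if p95 <= low:
        return "consider scaling down"
    return "keep current size"


if __name__ == "__main__":
    steady = [50.0] * MIN_HOURLY_SAMPLES
    print(sizing_hint(steady))
```

Memory metrics deserve the same treatment before any decision, since CPU alone can hide a memory-bound workload.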
For production VMs, plan maintenance windows for resizes that require deallocation. Communicate downtime in advance. A resize to a different family takes only a few minutes, but those minutes need to be planned.
Use tags to record why a VM was resized. Tags like LastResizeDate, ResizeReason, and OriginalSize help audit sizing decisions over time. Without them, six months later no one knows why a VM was made larger.
Consider Reserved Instances after sizing is stabilized. Buying a Reserved Instance (RI) of the wrong size is wasteful. Wait for sizing to stabilize for 2-3 months before purchasing reservations. Savings can reach 72% compared to Pay-As-You-Go for the same SKU with a 3-year commitment.
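The reservation trade-off can be checked with simple arithmetic: a reservation is paid for the whole term whether or not the VM runs, while Pay-As-You-Go is paid only for hours used. The sketch below assumes a flat discount applied to the hourly rate; the rate and hours in the example are hypothetical, and the 72% figure is the maximum quoted above, not a typical value.

```python
from typing import Tuple


def break_even_utilization(discount: float) -> float:
    """Fraction of the term the VM must run for the reservation to pay off."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def compare_costs(hourly_rate: float, hours_in_term: float,
                  hours_used: float, discount: float) -> Tuple[float, float]:
    """Return (pay_as_you_go_cost, reserved_cost) for the same term."""
    pay_as_you_go = hourly_rate * hours_used
    reserved = hourly_rate * (1.0 - discount) * hours_in_term
    return pay_as_you_go, reserved


if __name__ == "__main__":
    # Hypothetical rate; a VM that runs only 20% of the term.
    payg, reserved = compare_costs(hourly_rate=0.20, hours_in_term=1000,
                                   hours_used=200, discount=0.72)
    print(f"pay-as-you-go: {payg:.2f}, reserved: {reserved:.2f}")
    print(f"break-even utilization: {break_even_utilization(0.72):.0%}")
```

A VM that is shut down most of the time, or whose size is still changing, can end up costing more under a reservation than without one.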
For dev/test environments, use Azure Dev/Test subscriptions and B-series. Dev/Test subscriptions have reduced pricing, and B-series sizes have a low base cost. A B2s VM for development can cost 60-70% less than a D2s_v5 with the same vCPU count in a production subscription.
10. Common Errors
| Error | Why it happens | How to avoid |
|---|---|---|
| Resize fails due to lack of capacity in cluster | Desired SKU without available hardware in current cluster | Deallocate VM first; reallocation to new host has more options |
| Public IP changes after resize that required deallocation | Dynamic IP released during deallocation | Change to static IP before deallocating |
| Temporary disk data lost after resize | Temporary disk is ephemeral, reset on any host operation | Never store persistent data on temporary disk |
| B-series VMs at 100% CPU constantly with no credits left | Treating burstable capacity as if it were sustained capacity | B-series is for workloads with low average usage; change to D-series if usage is continuous |
| Choosing a SKU without the s suffix and being unable to use Premium SSD | Not understanding nomenclature | Always verify that the SKU has s if Premium Storage is needed |
| VM resize in Availability Set failing | AS limits SKUs to physical cluster | Check list-vm-resize-options to see available SKUs |
| vCPU quota exhausted when trying to scale up | Not checking quota before planning resize | Monitor quota with alerts at 70% and request increase preventively |
| Accelerated Networking disabled after downsize | Smaller SKU doesn't support the feature | Check feature compatibility before downsizing |
The most expensive error
Choosing a production SKU without researching options and staying on it for years without review. A VM created 3 years ago on D_v3 series can be migrated to D_v5 of the same family with more performance for the same price or less. Microsoft regularly launches new generations with better cost-performance. An annual sizing review can generate significant savings.
11. Operation and Maintenance
Inventory and sizing analysis of all VMs
# List all VMs with current SKU
az vm list \
--query "[].{Name: name, RG: resourceGroup, Size: hardwareProfile.vmSize, Location: location}" \
--output table
# Via Resource Graph: all VMs and their SKUs across the entire organization
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| project name, resourceGroup, location, subscriptionId, vmSize=properties.hardwareProfile.vmSize
| order by vmSize"
# Identify VMs with old generation SKUs for upgrade
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| where properties.hardwareProfile.vmSize contains '_v3'
or properties.hardwareProfile.vmSize contains '_v2'
| project name, resourceGroup, vmSize=properties.hardwareProfile.vmSize"
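The same generation check can be applied offline to an exported inventory. The sketch below assumes each record is a dict with `name` and `vmSize` keys (matching the columns projected by the queries above) and reads the generation from a trailing `_vN`; names without that suffix are skipped rather than guessed at, so they need a separate review.

```python
import re
from typing import Dict, Iterable, List, Optional

VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def sku_version(vm_size: str) -> Optional[int]:
    """Return N from a trailing _vN, or None when there is no such suffix."""
    match = VERSION_SUFFIX.search(vm_size)
    return int(match.group(1)) if match else None


def older_than(vms: Iterable[Dict[str, str]], min_version: int) -> List[str]:
    """Names of VMs whose SKU generation is below min_version."""
    candidates = []
    for vm in vms:
        version = sku_version(vm["vmSize"])
        if version is not None and version < min_version:
            candidates.append(vm["name"])
    return candidates


if __name__ == "__main__":
    inventory = [
        {"name": "vm-web-01", "vmSize": "Standard_D4s_v3"},
        {"name": "vm-web-02", "vmSize": "Standard_D4s_v5"},
        {"name": "vm-dev-01", "vmSize": "Standard_B2s"},
    ]
    print(older_than(inventory, min_version=5))
```

Comparing the parsed number avoids the substring matches that a plain `contains '_v3'` filter can produce on unusual names.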
Monitor metrics for resize decisions
# Average CPU from last 24 hours
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Percentage CPU" \
--interval PT1H \
--aggregation Average \
--start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output table
# Available memory (for memory resize decisions)
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Available Memory Bytes" \
--interval PT1H \
--aggregation Average \
--start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output table
Azure Advisor for sizing recommendations
Azure Advisor analyzes VM CPU and memory usage and generates automatic resize recommendations when it detects overprovisioning:
# View Advisor Cost recommendations (includes VM resizing)
az advisor recommendation list \
--category Cost \
--query "[?contains(shortDescription.problem, 'virtual machine') || contains(shortDescription.problem, 'VM')].{
VM: resourceMetadata.resourceId,
Problem: shortDescription.problem,
Solution: shortDescription.solution,
AnnualSavings: extendedProperties.annualSavingsAmount
}" \
--output table
12. Integration and Automation
Auto-sizing with Azure Automation and Azure Monitor
For environments where sizing needs to be automatically adjusted based on historical metrics:
Terraform with parametric sizing by environment
variable "environment" {
type = string
default = "dev"
}
locals {
vm_sizes = {
dev = "Standard_B2s"
staging = "Standard_D2s_v5"
prod = "Standard_D4s_v5"
}
vm_size = local.vm_sizes[var.environment]
}
resource "azurerm_linux_virtual_machine" "main" {
name = "vm-app-${var.environment}"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = local.vm_size # B2s in dev, D4s_v5 in prod
# ... rest of configurations
}
13. Final Summary
Essential points:
- VM Size defines the amount of vCPU, memory, temporary disk, and network capabilities of a VM
- Nomenclature follows the pattern Standard_[Family][vCPUs][subfamily][features]_v[N]; the s suffix indicates Premium Storage support
- Main families: B (burstable), D (general purpose), E (memory optimized), F (compute optimized), N (GPU), L (storage optimized), M (extreme memory)
- Resize within the same family can be done without deallocation (with possible brief reboot); resize to another family requires deallocation
- Deallocation releases dynamic public IPs; use static IP to avoid address change
Critical differences:
- Stop vs. Deallocate: Stop keeps VM on physical host (doesn't change SKU, charges for VM); Deallocate releases host (allows SKU change, doesn't charge for stopped VM)
- Resize vs. Scale out: Resize increases resources of one VM (vertical scaling); Scale out adds more VMs (horizontal scaling)
- B-series vs. D-series: B-series is burstable (variable CPU with credits, lower base cost); D-series has dedicated CPU (consistent performance)
- SKU with s vs. without s: with s supports Premium SSD and disk caching; without s limits to Standard SSD/HDD
What needs to be remembered for AZ-104:
- The CLI command to list resize options for a specific VM is: az vm list-vm-resize-options
- The command to resize is: az vm resize --size <new-size>
- For resize that requires deallocation: az vm deallocate + az vm resize + az vm start
- The built-in policy to restrict SKUs is: "Allowed virtual machine size SKUs" (ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3)
- Available SKUs vary by region; what exists in East US may not exist in brazilsouth
- VMs in Availability Sets have SKUs limited to the AS physical cluster
- The temporary disk has its size determined by the SKU and loses data on every deallocation or resize