Theoretical Foundation: Manage Virtual Machine Sizes


1. Initial Intuition​

Imagine you're setting up an office for employees with very different profiles. A graphic designer needs a powerful workstation with lots of memory and GPU. A financial analyst needs lots of CPU to run models. An administrative employee who only uses email and simple spreadsheets works well with a basic computer.

Putting everyone on identical machines would be wasteful or insufficient. The right approach is to size each machine for the function it will perform.

In Azure, VM Size is exactly this sizing decision: choosing how many virtual CPUs, how much memory, what type and amount of temporary storage, and what network capacity each VM will have. Each combination receives a standardized name like Standard_D4s_v5 (4 vCPUs, 16 GB RAM, optimized for general use).

Managing VM Sizes means not only choosing the right size at creation, but also knowing how to change that size when the workload evolves, without needing to recreate the VM from scratch.


2. Context​

Why VM Sizes exist as a formal concept​

Microsoft operates hundreds of datacenters with specific physical hardware. For a customer to reserve a slice of this hardware, a standardized catalog of configurations that the hardware supports is necessary. This catalog is the set of available VM Sizes.

From the customer's perspective, VM Sizes exist for:

  • Cost control: pay exactly for what the workload needs
  • Performance guarantee: select hardware optimized for the workload type
  • Scalability: increase or decrease resources without recreating infrastructure

What depends on correct VM Size choice​

  • Cost: most of a VM's cost comes from its size
  • Performance: applications with inadequate sizing are slow or fail
  • Availability: some SKUs aren't available in all regions
  • Functionality: certain capabilities (Ultra SSD, Accelerated Networking, GPU) are only available in specific SKUs

3. Building the Concepts​

3.1 VM Size nomenclature​

Understanding a VM SKU name is essential for navigating the catalog without needing to memorize each option. The structure is:

Standard_[Family][vCPUs][Sub-family][Features]_v[N]

Example: Standard_D4ds_v5

| Segment | Value | Meaning |
| --- | --- | --- |
| Tier | Standard | Service tier (Basic was discontinued) |
| Family | D | General purpose |
| vCPU count | 4 | 4 vCPUs |
| Sub-family | d | Local temporary disk |
| Features | s | Premium Storage support |
| Version | v5 | Version 5 of this series |
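
As a rough illustration of how the segments above fit together, the following Python sketch splits a size name with a regular expression. The pattern and the `parse_sku` helper are assumptions made for this example and cover only the simple form shown here; real catalog names have more variants (constrained-vCPU sizes, accelerator names, and so on) that it does not handle.

```python
import re

# Simplified, assumed pattern: Standard_<Family><vCPUs><feature letters>[_v<N>]
SKU_PATTERN = re.compile(r"^Standard_([A-Z]+)(\d+)([a-z]*)(?:_v(\d+))?$")


def parse_sku(name):
    """Split a VM size name into family, vCPU count, feature letters and version."""
    match = SKU_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized SKU name: {name}")
    family, vcpus, features, version = match.groups()
    return {
        "family": family,
        "vcpus": int(vcpus),
        "features": features,
        # Sizes such as Standard_B2s carry no version suffix.
        "version": int(version) if version else None,
        # Letter-based checks only: they reflect the name, not the live catalog.
        "premium_storage": "s" in features,
        "local_temp_disk_letter": "d" in features,
    }


if __name__ == "__main__":
    print(parse_sku("Standard_D4ds_v5"))
```

The flags are derived purely from the letters in the name, so they are a quick reading aid rather than a substitute for checking the size's documented capabilities.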

3.2 Subfamily and feature letters​

The s suffix is especially important: without it, the SKU doesn't support Premium SSD. For example, Standard_D4_v5 doesn't support Premium Storage, but Standard_D4s_v5 does.

3.3 VM families by workload type​

Azure organizes SKUs into families by purpose:

| Family | Letter | Characteristic | Typical workloads |
| --- | --- | --- | --- |
| General Purpose | B, D, D(a/s), DC | Balanced CPU:Memory (1:4) | Web servers, dev/test, small databases |
| Compute Optimized | F, FX | High CPU, less memory (1:2) | Intensive web servers, batch processing |
| Memory Optimized | E, E(a/s), M, Mv2 | Lots of memory (1:8 or more) | SAP HANA, SQL Server, Redis, large caches |
| Storage Optimized | L | High I/O on local disk | NoSQL, data warehouses, big data |
| GPU | N (NC, ND, NV) | NVIDIA GPUs | Machine learning, rendering, visualization |
| High Performance Compute | H, HB, HC | InfiniBand, high CPU | Scientific simulations, CFD, molecular dynamics |
| Confidential Computing | DC | Trusted Execution Environment | Workloads with highly sensitive data |
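
The CPU:memory ratios in the table can be turned into a first-pass family suggestion. The sketch below is only an illustration: the `suggest_family` name and the exact cut-off values are assumptions, and it ignores GPU, storage, HPC and confidential-computing needs, which have to be decided separately.

```python
def suggest_family(vcpus, memory_gib):
    """Suggest F, D or E from the memory-per-vCPU ratio of a workload.

    Cut-offs follow the ratios listed above (1:2, 1:4, 1:8 or more);
    treating them as hard thresholds is an assumption of this sketch.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    ratio = memory_gib / vcpus
    if ratio <= 2:
        return "F"  # compute optimized
    if ratio <= 4:
        return "D"  # general purpose
    return "E"      # memory optimized


if __name__ == "__main__":
    for need in [(4, 8), (4, 16), (4, 32)]:
        print(need, "->", suggest_family(*need))
```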

3.4 B series: Burstable VMs​

The B family deserves special explanation as it works differently from others. Burstable VMs accumulate CPU credits when utilization is below baseline and consume them when they need more CPU.

B VMs are ideal for workloads with low average CPU usage and sporadic peaks: development servers, test environments, CI/CD servers with periodic builds. They are not suitable for workloads with continuous high CPU usage.
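
To make the credit mechanism more concrete, here is a deliberately simplified simulation. It assumes one credit equals one minute of a vCPU at 100%, and that each minute the balance changes by the difference between baseline and actual usage; the function name, the cap, and all numbers are hypothetical and do not reproduce Azure's exact accounting.

```python
def simulate_credits(usage_pct, baseline_pct, start_credits=0.0, max_credits=100.0):
    """Return the credit balance after each minute of a usage trace.

    usage_pct: per-minute CPU utilization values (0-100).
    baseline_pct: baseline CPU percentage of the size (hypothetical value).
    """
    balance = start_credits
    history = []
    for used in usage_pct:
        # Below baseline the balance grows; above baseline it shrinks.
        balance += (baseline_pct - used) / 100.0
        # The balance can neither go negative nor exceed the cap.
        balance = max(0.0, min(max_credits, balance))
        history.append(round(balance, 2))
    return history


if __name__ == "__main__":
    # Ten quiet minutes at 10% CPU, then a burst at 100% CPU.
    trace = [10] * 10 + [100] * 2
    print(simulate_credits(trace, baseline_pct=20))
```

In this toy trace the credits saved during ten quiet minutes are consumed in under two minutes of full load, which is why continuous high CPU usage is a poor fit for the B family.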

3.5 Resizing: how and when it happens​

Changing the size of an existing VM is called resize. There are two technical scenarios:

Resize within the same hardware family (same cluster):

  • Can often be done without deallocation
  • Azure tries to resize without moving the VM to another physical host
  • A running VM is still restarted to apply the new size

Resize to another hardware family:

  • Requires deallocation of the VM (stop + deallocate, not just stop)
  • The VM is moved to a different physical host that supports the new SKU
  • Downtime of several minutes
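
The two scenarios above can be sketched as a small planning function. It assumes the caller already has the list of sizes offered for the running VM (for example from `az vm list-vm-resize-options`); the `resize_steps` helper and the sample sizes are hypothetical.

```python
def resize_steps(target_size, sizes_on_current_cluster):
    """Return the ordered operations needed to reach target_size.

    sizes_on_current_cluster: sizes the running VM can take without moving
    to other hardware (assumed to be supplied by the caller).
    """
    if target_size in sizes_on_current_cluster:
        # Same cluster: resize in place (the VM restarts).
        return ["resize"]
    # Different hardware: release the host, change the size, start again.
    return ["deallocate", "resize", "start"]


if __name__ == "__main__":
    available = {"Standard_D2s_v5", "Standard_D4s_v5", "Standard_D8s_v5"}
    print(resize_steps("Standard_D4s_v5", available))
    print(resize_steps("Standard_E4s_v5", available))
```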

4. Structural Overview​

Decision hierarchy for VM Size choice​

Comparison of common general-purpose SKUs (B and D families)

| SKU | vCPU | RAM | Temp disk | Premium Storage | Typical use |
| --- | --- | --- | --- | --- | --- |
| Standard_B2s | 2 | 4 GB | 8 GB | Yes | Dev/test, small sites |
| Standard_D2s_v5 | 2 | 8 GB | None | Yes | Light general purpose |
| Standard_D4s_v5 | 4 | 16 GB | None | Yes | Medium general purpose |
| Standard_D4ds_v5 | 4 | 16 GB | 150 GB (local SSD) | Yes | General purpose with temp disk |
| Standard_D8s_v5 | 8 | 32 GB | None | Yes | Intensive general purpose |
| Standard_D16s_v5 | 16 | 64 GB | None | Yes | High general purpose |
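
Using only the vCPU and RAM figures from the comparison above, a minimal sketch of "pick the smallest size that fits" could look like this. The `CATALOG` list and `smallest_fit` helper are illustrative names; a real selection would also consider price, burstable behaviour, temp disk needs and regional availability.

```python
# (name, vCPUs, RAM in GB), copied from the comparison above.
CATALOG = [
    ("Standard_B2s", 2, 4),
    ("Standard_D2s_v5", 2, 8),
    ("Standard_D4s_v5", 4, 16),
    ("Standard_D4ds_v5", 4, 16),
    ("Standard_D8s_v5", 8, 32),
    ("Standard_D16s_v5", 16, 64),
]


def smallest_fit(min_vcpus, min_ram_gb, catalog=CATALOG):
    """Return the smallest size meeting both minimums, or None if none does."""
    candidates = [
        row for row in catalog
        if row[1] >= min_vcpus and row[2] >= min_ram_gb
    ]
    if not candidates:
        return None
    # Order by vCPUs, then RAM; ties keep the earlier catalog entry.
    return min(candidates, key=lambda row: (row[1], row[2]))[0]


if __name__ == "__main__":
    print(smallest_fit(2, 6))
    print(smallest_fit(3, 8))
    print(smallest_fit(32, 64))
```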

5. Practical Operation​

VM resize process​

Important non-obvious behaviors​

Resize without deallocation depends on capacity in current cluster. Even if the new SKU is from the same family, if the current physical host doesn't have capacity to allocate the larger size, the resize will fail. The error message will indicate no capacity is available. The solution is to deallocate and try again (the VM will be allocated on a different host).

Deallocation releases dynamic public IP. If the VM uses a public IP with Dynamic allocation, when deallocating the VM for resize, the IP is released and a new IP is assigned when the VM starts. To avoid this, use public IP with Static allocation before deallocating.

Temporary disk changes size with the SKU. The temporary disk (D: on Windows, /dev/sdb on Linux) has its size determined by the SKU. When resizing to a smaller SKU, the temporary disk shrinks. Data on the temporary disk is lost in any deallocation or resize.

Some functionalities become unavailable when downsizing. If the VM has Accelerated Networking enabled and you resize to a SKU that doesn't support this functionality, it's automatically disabled. The same applies to high-performance disk caches.

VMs in Availability Sets have limited possible SKUs. An Availability Set is mapped to a specific physical cluster. Only SKUs available in that cluster can be used. To use a SKU from another family, it may be necessary to recreate the VM outside the Availability Set.

VMs in Availability Zones have more SKU flexibility. Zones use hardware distributed across multiple datacenters, offering greater variety of available SKUs compared to Availability Sets.
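
The behaviors above lend themselves to a pre-resize checklist. The sketch below assumes the VM is described by a plain dictionary whose keys (`public_ip_allocation`, `data_on_temp_disk`, `accelerated_networking`, `availability_set`) are hypothetical names invented for this example; nothing here queries Azure.

```python
def pre_resize_warnings(vm, target_supports_accelerated_networking):
    """Collect warnings to review before resizing a VM.

    vm: dict describing the VM (hypothetical keys, see lead-in).
    """
    warnings = []
    if vm.get("public_ip_allocation") == "Dynamic":
        warnings.append("Dynamic public IP changes on deallocation; switch to Static first.")
    if vm.get("data_on_temp_disk"):
        warnings.append("Temporary disk data is lost on deallocation or resize; move it first.")
    if vm.get("accelerated_networking") and not target_supports_accelerated_networking:
        warnings.append("Target size does not support Accelerated Networking.")
    if vm.get("availability_set"):
        warnings.append("Availability Set limits sizes to its cluster; check resize options.")
    return warnings


if __name__ == "__main__":
    vm = {
        "public_ip_allocation": "Dynamic",
        "data_on_temp_disk": False,
        "accelerated_networking": True,
        "availability_set": None,
    }
    for line in pre_resize_warnings(vm, target_supports_accelerated_networking=False):
        print("-", line)
```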


6. Implementation Methods​

Azure Portal​

When to use: one-time resize, visual verification of available SKUs

To resize via portal:

  1. Portal > Virtual Machines > select the VM
  2. Side menu > Size (in Settings)
  3. The portal displays available SKUs for that VM in that region
  4. Select the new SKU
  5. Resize

The portal will automatically show only SKUs that are available for that VM in that configuration (respecting Availability Set, zone, etc.).

Limitation: not reproducible, no version control, impractical for multiple VMs.


Azure CLI​

# List all available SKUs in a region
az vm list-sizes \
--location "brazilsouth" \
--output table

# Filter SKUs by family (e.g., D family)
az vm list-sizes \
--location "brazilsouth" \
--query "[?starts_with(name, 'Standard_D')]" \
--output table

# Check available SKUs for resizing a specific VM
# (considers Availability Set and location)
az vm list-vm-resize-options \
--resource-group "rg-producao" \
--name "vm-web-01" \
--output table

# See current VM SKU
az vm show \
--resource-group "rg-producao" \
--name "vm-web-01" \
--query "hardwareProfile.vmSize" \
--output tsv

# Resize without deallocation (tries first without stopping)
az vm resize \
--resource-group "rg-producao" \
--name "vm-web-01" \
--size "Standard_D4s_v5"

# Resize that requires deallocation (different family)
# Step 1: Preserve public IP if it's dynamic
# Check IP type
az network public-ip show \
--resource-group "rg-producao" \
--name "pip-vm-web-01" \
--query "publicIPAllocationMethod" \
--output tsv

# If Dynamic, change to Static before deallocating
az network public-ip update \
--resource-group "rg-producao" \
--name "pip-vm-web-01" \
--allocation-method Static

# Step 2: Deallocate
az vm deallocate \
--resource-group "rg-producao" \
--name "vm-web-01"

# Step 3: Resize
az vm resize \
--resource-group "rg-producao" \
--name "vm-web-01" \
--size "Standard_E4s_v5"

# Step 4: Start
az vm start \
--resource-group "rg-producao" \
--name "vm-web-01"

# Verify new size
az vm show \
--resource-group "rg-producao" \
--name "vm-web-01" \
--query "hardwareProfile.vmSize" \
--output tsv

# Script: Batch resize of multiple VMs
# Each step runs only if the previous one succeeded; the loop stops on the first failure
for vm in vm-web-01 vm-web-02 vm-web-03; do
  echo "Resizing $vm to Standard_D4s_v5..."

  if az vm deallocate \
       --resource-group "rg-producao" \
       --name "$vm" &&
     az vm resize \
       --resource-group "rg-producao" \
       --name "$vm" \
       --size "Standard_D4s_v5" &&
     az vm start \
       --resource-group "rg-producao" \
       --name "$vm"; then
    echo "$vm resized successfully."
  else
    echo "Resize of $vm failed; stopping." >&2
    break
  fi
done

# Check CPU usage to determine if resize is necessary
# (via Azure Monitor)
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Percentage CPU" \
--interval PT1H \
--start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--aggregation Average \
--output table

Azure PowerShell​

# List available SKUs in a region
Get-AzVMSize -Location "brazilsouth" |
Sort-Object Name |
Format-Table Name, NumberOfCores, MemoryInMB, MaxDataDiskCount

# Filter D family SKUs
Get-AzVMSize -Location "brazilsouth" |
Where-Object { $_.Name -like "Standard_D*" } |
Format-Table Name, NumberOfCores, MemoryInMB

# See available SKUs for a specific VM
Get-AzVMSize `
-ResourceGroupName "rg-producao" `
-VMName "vm-web-01" |
Format-Table Name, NumberOfCores, MemoryInMB

# See current SKU
(Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01").HardwareProfile.VmSize

# Resize without deallocation (same family)
$vm = Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"
$vm.HardwareProfile.VmSize = "Standard_D4s_v5"
Update-AzVM -ResourceGroupName "rg-producao" -VM $vm

# Resize with deallocation (different family)
# Step 1: Deallocate
Stop-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01" -Force

# Step 2: Change size
$vm = Get-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"
$vm.HardwareProfile.VmSize = "Standard_E4s_v5"
Update-AzVM -ResourceGroupName "rg-producao" -VM $vm

# Step 3: Start
Start-AzVM -ResourceGroupName "rg-producao" -Name "vm-web-01"

# Report: list all VMs and their sizes in an RG
Get-AzVM -ResourceGroupName "rg-producao" |
Select-Object Name, @{N="Size"; E={$_.HardwareProfile.VmSize}}, Location |
Format-Table

# Script: Automate resize based on average CPU
$resourceGroup = "rg-producao"
$vmName = "vm-web-01"
$threshold = 80 # CPU % to trigger scale up
$newSize = "Standard_D8s_v5"

# -TimeGrain takes a TimeSpan; 01:00:00 means one-hour granularity
$metric = Get-AzMetric `
-ResourceId (Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName).Id `
-MetricName "Percentage CPU" `
-StartTime (Get-Date).AddHours(-4) `
-EndTime (Get-Date) `
-TimeGrain 01:00:00 `
-AggregationType Average

$avgCpu = ($metric.Data | Measure-Object -Property Average -Average).Average

if ($avgCpu -gt $threshold) {
Write-Output "Average CPU: $avgCpu%. Above $threshold%. Starting resize to $newSize..."
Stop-AzVM -ResourceGroupName $resourceGroup -Name $vmName -Force
$vm = Get-AzVM -ResourceGroupName $resourceGroup -Name $vmName
$vm.HardwareProfile.VmSize = $newSize
Update-AzVM -ResourceGroupName $resourceGroup -VM $vm
Start-AzVM -ResourceGroupName $resourceGroup -Name $vmName
Write-Output "Resize completed."
} else {
Write-Output "Average CPU: $avgCpu%. Within limits. Resize not necessary."
}

Bicep​

// Define VM Size as parameter for flexibility
@description('VM Size SKU')
@allowed([
'Standard_B2s'
'Standard_D2s_v5'
'Standard_D4s_v5'
'Standard_D8s_v5'
'Standard_E4s_v5'
])
param vmSize string = 'Standard_D4s_v5'

resource vm 'Microsoft.Compute/virtualMachines@2023-03-01' = {
name: 'vm-web-01'
location: 'brazilsouth'
properties: {
hardwareProfile: {
vmSize: vmSize // SKU as parameter, not hardcoded
}
// ... rest of configurations
}
}

7. Control and Security​

Azure Policy to limit allowed SKUs​

To avoid using very expensive SKUs or inappropriate SKUs for an environment:

# Built-in policy: "Allowed virtual machine size SKUs"
# Policy ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3

az policy assignment create \
--name "allowed-vm-sizes" \
--display-name "Only approved SKUs for production" \
--policy "cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
--scope "/subscriptions/<sub-id>/resourceGroups/rg-producao" \
--params '{
"listOfAllowedSKUs": {
"value": [
"Standard_D2s_v5",
"Standard_D4s_v5",
"Standard_D8s_v5",
"Standard_E4s_v5",
"Standard_E8s_v5"
]
}
}'

This ensures that no VM can be created or resized to a SKU outside the approved list, avoiding accidental use of expensive SKUs.

vCPU quotas per subscription​

Each subscription has vCPU limits per VM family and per region. Before planning resizes that significantly increase vCPUs, check and request increases:

# See current vCPU quota per family in a region
# (family quota names end in 'Family', e.g. standardDSv5Family)
az vm list-usage \
--location "brazilsouth" \
--query "[?contains(name.value, 'Family')].{Name: name.localizedValue, Current: currentValue, Limit: limit}" \
--output table

# View quota for specific family
az vm list-usage \
--location "brazilsouth" \
--query "[?name.value=='standardDSv5Family'].{Name: name.localizedValue, Current: currentValue, Limit: limit}" \
--output table
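
Once the current usage and limit are known, the quota check itself is simple arithmetic. The following sketch assumes the caller supplies the numbers read from the quota listing; `fits_quota` and its parameters are hypothetical names, and it checks a single family quota only, not the regional total.

```python
def fits_quota(current_usage, limit, old_vcpus, new_vcpus, same_family=True):
    """Check whether a resize stays within the target family's vCPU quota.

    current_usage / limit: values for the TARGET family in the region.
    same_family: if True, the VM's old vCPUs are already counted in
    current_usage and are released by the resize.
    """
    released = old_vcpus if same_family else 0
    return current_usage - released + new_vcpus <= limit


if __name__ == "__main__":
    # 20 of 24 vCPUs used; growing a 4-vCPU VM to 8 vCPUs in the same family.
    print(fits_quota(current_usage=20, limit=24, old_vcpus=4, new_vcpus=8))
    # Moving the same VM into another family that already uses 20 of 24.
    print(fits_quota(current_usage=20, limit=24, old_vcpus=4, new_vcpus=8, same_family=False))
```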

8. Decision Making​

Family choice by workload type​

| Workload | Recommended family | Reason |
| --- | --- | --- |
| Web server, REST API | D-series (D2s-D8s v5) | Balanced 1:4 CPU:RAM ratio, Premium SSD |
| SQL Server database | E-series or M-series | High memory for buffer pool |
| Redis, Memcached, in-memory cache | E-series | High RAM for in-memory data |
| Machine Learning training | N-series (NC, ND) | GPU required for acceleration |
| Machine Learning inference | N-series or F-series | GPU or high CPU per query |
| Java/JVM application server | E-series | JVM benefits from high RAM |
| CI/CD build server | F-series or B-series | High CPU or burstable |
| Dev/test with sporadic usage | B-series | Low cost, burstable CPU |
| SAP HANA | M-series (Mv2) | Up to TBs of RAM |
| NFS file server | L-series or D-series | High local or network I/O |

When to resize vs. add more VMs (scale up vs. scale out)​

| Situation | Approach | Reason |
| --- | --- | --- |
| Single-threaded application with high CPU | Scale up (larger VM) | Application doesn't parallelize |
| Stateless web app with many requests | Scale out (more VMs + LB) | More efficient load distribution |
| Database with insufficient memory | Scale up (more RAM) | DB is stateful, difficult to distribute |
| Batch processing without critical deadline | Scale out with Spot VMs | Minimized cost |
| Development VM with build spikes | Keep B-series | Burstable covers the spikes |
| Application on B-series VM with continuous high CPU | Scale up to D-series | B-series without credits is inefficient |

9. Best Practices​

Always check available SKUs before planning a resize. The az vm list-vm-resize-options command only shows SKUs that are available for that specific VM at that moment. Available SKUs vary by region, Availability Set, and current cluster capacity.

Use Static Public IP for VMs that will be frequently deallocated. Dynamic IPs change with each deallocate/start. For VMs that go through frequent resizing or are shut down at night, configure static IP to maintain consistent addressing.

Monitor CPU and memory for at least 2 weeks before resizing. One week may not capture weekly patterns (Monday with more load, quieter weekends). Use Azure Monitor to get sufficient historical data before sizing decisions.

For production VMs, plan maintenance windows for resizes that require deallocation. Communicate downtime in advance. A resize to a different family takes only a few minutes, but those minutes need to be planned.

Use tags to track justified sizing. Tags like LastResizeDate, ResizeReason, OriginalSize help audit sizing decisions over time. Without tracking, in 6 months no one knows why a VM was increased in size.

Consider Reserved Instances after sizing is stabilized. Buying a Reserved Instance (RI) of the wrong size is wasteful. Wait for sizing to stabilize for 2-3 months before purchasing reservations. Savings can reach 72% compared to Pay-As-You-Go for the same SKU with a 3-year commitment.

For dev/test environments, use Azure Dev/Test subscriptions and B-series. Dev/Test subscriptions have reduced pricing and B-series VMs have a low base cost. A B2s VM for development can cost roughly 60-70% less than a D2s_v5 with the same vCPU count in a production subscription, at the price of half the memory and burstable rather than dedicated CPU.


10. Common Errors​

| Error | Why it happens | How to avoid |
| --- | --- | --- |
| Resize fails due to lack of capacity in cluster | Desired SKU without available hardware in current cluster | Deallocate VM first; reallocation to new host has more options |
| Public IP changes after resize that required deallocation | Dynamic IP released during deallocation | Change to static IP before deallocating |
| Temporary disk data lost after resize | Temporary disk is ephemeral, reset on any host operation | Never store persistent data on temporary disk |
| B-series VMs with 100% CPU constantly and no credits | Understanding burstable as continuous usage | B-series is for workloads with low average usage; change to D-series if usage is continuous |
| Choose SKU without s suffix and can't use Premium SSD | Not understanding nomenclature | Always verify that SKU has s if Premium Storage is needed |
| VM resize in Availability Set failing | AS limits SKUs to physical cluster | Check list-vm-resize-options to see available SKUs |
| vCPU quota exhausted when trying to scale up | Not checking quota before planning resize | Monitor quota with alerts at 70% and request increase preventively |
| Accelerated Networking disabled after downsize | Smaller SKU doesn't support the feature | Check feature compatibility before downsizing |

The most expensive error​

Choosing a production SKU without researching options and staying on it for years without review. A VM created 3 years ago on D_v3 series can be migrated to D_v5 of the same family with more performance for the same price or less. Microsoft regularly launches new generations with better cost-performance. An annual sizing review can generate significant savings.


11. Operation and Maintenance​

Inventory and sizing analysis of all VMs​

# List all VMs with current SKU
az vm list \
--query "[].{Name: name, RG: resourceGroup, Size: hardwareProfile.vmSize, Location: location}" \
--output table

# Via Resource Graph: all VMs and their SKUs across the entire organization
# (az graph requires the resource-graph CLI extension)
# tostring() is needed because properties.* values are dynamic and cannot be sorted directly
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| extend vmSize = tostring(properties.hardwareProfile.vmSize)
| project name, resourceGroup, location, subscriptionId, vmSize
| order by vmSize asc"

# Identify VMs with old generation SKUs for upgrade
az graph query -q "
Resources
| where type == 'microsoft.compute/virtualmachines'
| extend vmSize = tostring(properties.hardwareProfile.vmSize)
| where vmSize contains '_v3' or vmSize contains '_v2'
| project name, resourceGroup, vmSize"
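
If the inventory has already been exported as a list of size names, the same "old generation" filter can be sketched offline. The `older_than` helper below is a hypothetical illustration; it only understands names ending in `_v<N>` and skips sizes without a version suffix rather than guessing their generation.

```python
import re

# Matches a trailing version suffix such as _v3 or _v5.
VERSION_SUFFIX = re.compile(r"_v(\d+)$")


def older_than(sizes, min_version):
    """Return the sizes whose version suffix is lower than min_version."""
    result = []
    for size in sizes:
        match = VERSION_SUFFIX.search(size)
        if match is None:
            # No suffix (e.g. Standard_B2s): generation unknown, skip it.
            continue
        if int(match.group(1)) < min_version:
            result.append(size)
    return result


if __name__ == "__main__":
    inventory = ["Standard_D4s_v3", "Standard_D4s_v5", "Standard_B2s", "Standard_DS2_v2"]
    print(older_than(inventory, min_version=4))
```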

Monitor metrics for resize decisions​

# Average CPU from last 24 hours
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Percentage CPU" \
--interval PT1H \
--aggregation Average \
--start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output table

# Available memory (for memory resize decisions)
az monitor metrics list \
--resource "/subscriptions/<sub-id>/resourceGroups/rg-producao/providers/Microsoft.Compute/virtualMachines/vm-web-01" \
--metric "Available Memory Bytes" \
--interval PT1H \
--aggregation Average \
--start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output table
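
Hourly averages like the ones returned above still need an explicit decision rule. The sketch below is one possible rule, not an official one: it waits for two weeks of hourly samples, takes a simple index-based 95th percentile, and compares it with thresholds. The 80% upper threshold mirrors the PowerShell example earlier; the 20% lower threshold and the function name are assumptions.

```python
def sizing_decision(hourly_cpu_avg, high=80.0, low=20.0, min_samples=14 * 24):
    """Suggest 'scale up', 'scale down' or 'keep' from hourly CPU averages."""
    if len(hourly_cpu_avg) < min_samples:
        # Less than two weeks of data may miss weekly patterns.
        return "collect more data"
    ordered = sorted(hourly_cpu_avg)
    # Simple index-based 95th percentile (no interpolation).
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    if p95 > high:
        return "scale up"
    if p95 < low:
        return "scale down"
    return "keep"


if __name__ == "__main__":
    two_weeks = 14 * 24
    print(sizing_decision([90.0] * 24))
    print(sizing_decision([10.0] * two_weeks))
    print(sizing_decision([50.0] * two_weeks))
```

Using a high percentile instead of the plain mean keeps short but recurring peaks visible, which matters when the goal is to avoid undersizing.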

Azure Advisor for sizing recommendations​

Azure Advisor analyzes VM CPU and memory usage and generates automatic resize recommendations when it detects overprovisioning:

# View Advisor Cost recommendations (includes VM resizing)
az advisor recommendation list \
--category Cost \
--query "[?contains(shortDescription.problem, 'virtual machine') || contains(shortDescription.problem, 'VM')].{
VM: resourceMetadata.resourceId,
Problem: shortDescription.problem,
Solution: shortDescription.solution,
AnnualSavings: extendedProperties.annualSavingsAmount
}" \
--output table

12. Integration and Automation​

Auto-sizing with Azure Automation and Azure Monitor​

For environments where sizing needs to be adjusted automatically based on historical metrics, one common pattern is an Azure Monitor alert that triggers an Azure Automation runbook to perform the resize.

Terraform with parametric sizing by environment​

variable "environment" {
type = string
default = "dev"
}

locals {
vm_sizes = {
dev = "Standard_B2s"
staging = "Standard_D2s_v5"
prod = "Standard_D4s_v5"
}

vm_size = local.vm_sizes[var.environment]
}

resource "azurerm_linux_virtual_machine" "main" {
name = "vm-app-${var.environment}"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = local.vm_size # B2s in dev, D4s_v5 in prod

# ... rest of configurations
}

13. Final Summary​

Essential points:

  • VM Size defines the amount of vCPU, memory, temporary disk, and network capabilities of a VM
  • Nomenclature follows the pattern Standard_[Family][vCPUs][subfamily][features]_v[N]; the s suffix indicates Premium Storage support
  • Main families: B (burstable), D (general purpose), E (memory optimized), F (compute optimized), N (GPU), L (storage optimized), M (extreme memory)
  • Resize within the same family can be done without deallocation (the running VM is restarted); resize to another family requires deallocation
  • Deallocation releases dynamic public IPs; use static IP to avoid address change

Critical differences:

  • Stop vs. Deallocate: Stop keeps VM on physical host (doesn't change SKU, charges for VM); Deallocate releases host (allows SKU change, doesn't charge for stopped VM)
  • Resize vs. Scale out: Resize increases resources of one VM (vertical scaling); Scale out adds more VMs (horizontal scaling)
  • B-series vs. D-series: B-series is burstable (variable CPU with credits, lower base cost); D-series has dedicated CPU (consistent performance)
  • SKU with s vs. without s: with s supports Premium SSD and disk caching; without s limits to Standard SSD/HDD

What needs to be remembered for AZ-104:

  • The CLI command to list resize options for a specific VM is: az vm list-vm-resize-options
  • The command to resize is: az vm resize --size <new-size>
  • For resize that requires deallocation: az vm deallocate + az vm resize + az vm start
  • The built-in policy to restrict SKUs is: "Allowed virtual machine size SKUs" (ID: cccc23c7-8427-4f53-ad12-b6a63eb452b3)
  • Available SKUs vary by region; what exists in East US may not exist in brazilsouth
  • VMs in Availability Sets have SKUs limited to the AS physical cluster
  • The temporary disk has its size determined by the SKU and loses data on every deallocation or resize