
Theoretical Foundation: Configure an internal or public load balancer


1. Initial Intuition​

Imagine a popular restaurant with a single cashier. During peak hours, the line grows, the cashier becomes overwhelmed, and customers wait too long. The obvious solution is to open more cash registers. But then a new problem arises: how do you get customers to choose the registers in a balanced way? You place a host at the entrance who directs the next customers to the register with the shortest line.

That host is the Load Balancer. It receives all incoming connections and distributes them among a set of servers (the registers), so that no server becomes overwhelmed while others are idle.

The Azure Load Balancer operates at layer 4 (transport) of the OSI model: it works with IP addresses and TCP/UDP ports, without inspecting the content of requests. This makes it extremely fast and efficient, but it has no awareness of HTTP, cookies, or URLs. For content-aware HTTP load balancing, the correct service is Application Gateway.


2. Context​

The Azure Load Balancer is one of the pillars of high availability in Azure. It works together with other components covered in this guide, such as public IPs, NSGs, Availability Zones, and VM Scale Sets.


Two SKUs exist: Basic (legacy) and Standard (recommended). Just like with public IPs, the Standard SKU is the only one that should be used in new projects: Microsoft scheduled the Basic SKU for retirement on September 30, 2025.


3. Building the Concepts​

3.1 The Five Components of a Load Balancer​

An Azure Load Balancer is composed of five elements that need to be configured in logical order:


1. Frontend IP Configuration

The IP that clients use to connect to the Load Balancer. For a public LB, it's a public IP. For an internal LB, it's a private IP from the VNet.

A single Load Balancer can have multiple frontend configurations (multiple IPs), useful for hosting multiple services on the same LB.

2. Backend Pool

The set of resources that receive traffic. It can contain individual VMs (through their NICs) or VM Scale Sets. VMs in the pool need to be in the same VNet as the Load Balancer.

In the Standard SKU, pool members can be defined either by NIC (tied to a specific network interface) or by the combination of IP address and VNet.

3. Health Probe

The mechanism that the Load Balancer uses to verify if each VM in the backend is healthy and available to receive traffic.

Probe types:

| Type | Protocol | What it checks |
| --- | --- | --- |
| TCP | TCP | Whether the TCP port accepts connections |
| HTTP | HTTP | Whether the HTTP response is code 200 |
| HTTPS | HTTPS | Whether the HTTPS response is code 200 |

The TCP probe is the simplest: it only tests whether the port is open. The HTTP/HTTPS probe is more accurate: it can target a specific application endpoint (for example /healthcheck) that validates whether the application is actually functional, not just whether the port is open.

If a VM fails the probe a configurable number of consecutive times (default: 2), it's removed from the distribution pool. When the probe succeeds again, the VM is automatically re-added.
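The threshold behavior can be sketched as a small state machine. This is an illustrative simulation rather than Azure's implementation: the class name `ProbeTracker` is hypothetical, and re-adding a VM after a single successful probe is an assumption made to match the description above.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Tracks one backend VM's rotation status from a stream of probe results."""

    unhealthy_threshold: int = 2  # consecutive failures before removal
    consecutive_failures: int = 0
    in_rotation: bool = True

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result and return whether the VM is in rotation."""
        if probe_succeeded:
            # Assumption for this sketch: one success is enough to re-add the VM.
            self.consecutive_failures = 0
            self.in_rotation = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.in_rotation = False
        return self.in_rotation


if __name__ == "__main__":
    tracker = ProbeTracker()
    for result in [True, False, True, False, False, True]:
        print(result, "->", tracker.record(result))
```

A single failed probe between two successes does not remove the VM; only the second consecutive failure does, which is why a short probe interval combined with a low threshold reacts quickly but is more sensitive to transient errors.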

4. Load Balancing Rule

Defines the mapping between the frontend and backend: "when traffic arrives at the public IP on port 80, distribute it among the VMs in the backend pool on port 80".

Components of a rule:

| Field | Description | Example |
| --- | --- | --- |
| Frontend IP | Which frontend IP receives traffic | Load Balancer's public IP |
| Protocol | TCP or UDP | TCP |
| Frontend port | Port on the frontend | 80 |
| Backend port | Port on backend VMs | 80 (can be different) |
| Backend pool | Which pool receives traffic | pool-vms-web |
| Health probe | Which probe monitors VMs | probe-http-80 |
| Session persistence | Whether connections from the same client go to the same VM | None / Client IP / Client IP and Protocol |

Port mapping is powerful: you can receive on port 443 on the frontend and redirect to port 8443 on the backend, without changing VM configuration.

Session Persistence (also called sticky sessions or session affinity):

| Option | Behavior |
| --- | --- |
| None (default) | Each connection is distributed independently (5-tuple hash) |
| Client IP | Connections from the same client IP always go to the same VM (2-tuple hash) |
| Client IP and Protocol | Connections from the same IP+protocol go to the same VM (3-tuple hash) |

5. Inbound NAT Rules

Unlike load balancing rules that distribute to multiple VMs, NAT rules map a specific frontend port to a port on a specific backend VM. They're used for direct access to a specific VM.

Example: port 50001 on LB → RDP on vm-01; port 50002 → RDP on vm-02. This allows accessing VMs individually via RDP without exposing each VM with its own public IP.
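The one-to-one nature of NAT rules can be made concrete with a short sketch that generates the port plan for a list of VMs. The helper name `build_nat_rules` and the VM names are hypothetical; the starting port 50001 and backend port 3389 follow the example above.

```python
def build_nat_rules(vm_names, first_frontend_port=50001, backend_port=3389):
    """Return {frontend_port: (vm_name, backend_port)}, one entry per VM.

    Unlike a load balancing rule, each frontend port maps to exactly one VM.
    """
    return {
        first_frontend_port + offset: (vm_name, backend_port)
        for offset, vm_name in enumerate(vm_names)
    }


if __name__ == "__main__":
    for port, (vm, target) in build_nat_rules(["vm-01", "vm-02", "vm-03"]).items():
        print(f"frontend port {port} -> {vm}:{target}")
```

Every VM added to the pool needs its own entry, which is the manual bookkeeping that makes NAT rules harder to maintain as the pool grows.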

3.2 Public vs. Internal Load Balancer​

| Characteristic | Public Load Balancer | Internal Load Balancer (ILB) |
| --- | --- | --- |
| Frontend IP | Public IP | Private IP from VNet |
| Who accesses | Internet (any source) | Resources within the VNet or connected networks |
| Use cases | Web tier, public APIs | Application tier, databases, internal services |
| NSG required | Yes, to control frontend access | Yes, to control backend access |

The ILB is fundamental in multi-tier architectures: the application tier (backend) is balanced by an ILB, accessible only from the web tier. The web tier is balanced by a public LB.

3.3 Outbound Rules: Internet Exit​

The Standard Load Balancer also manages outbound traffic to the internet from VMs in the backend pool, through Outbound Rules and the SNAT (Source Network Address Translation) mechanism.

When a VM without its own public IP needs to make an outbound connection to the internet (e.g., download updates, call an external API), the Load Balancer translates the VM's private IP to one of the frontend's public IPs.

Outbound Rules control how many SNAT ports are allocated per VM and which frontend IP is used for outbound traffic.
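The port budget behind an Outbound Rule is simple arithmetic. The sketch below assumes the commonly documented figures of 64,000 SNAT ports per public frontend IP and manual allocation in multiples of 8; both should be checked against current Azure documentation before sizing a real deployment. The function names are hypothetical.

```python
PORTS_PER_FRONTEND_IP = 64_000  # assumed SNAT ports contributed by each public frontend IP


def max_ports_per_vm(frontend_ips: int, backend_vms: int) -> int:
    """Largest per-VM allocation (a multiple of 8) that fits the port budget."""
    if frontend_ips < 1 or backend_vms < 1:
        raise ValueError("need at least one frontend IP and one backend VM")
    raw = frontend_ips * PORTS_PER_FRONTEND_IP // backend_vms
    return raw - raw % 8  # round down to a multiple of 8


def max_vms_supported(frontend_ips: int, ports_per_vm: int) -> int:
    """How many VMs a given per-VM allocation can serve."""
    if frontend_ips < 1 or ports_per_vm < 1:
        raise ValueError("need at least one frontend IP and one port per VM")
    return frontend_ips * PORTS_PER_FRONTEND_IP // ports_per_vm


if __name__ == "__main__":
    print(max_ports_per_vm(frontend_ips=1, backend_vms=10))
    print(max_vms_supported(frontend_ips=1, ports_per_vm=1024))
```

Under these assumptions, allocating 1,024 ports per VM on a single frontend IP leaves room for 62 VMs; growing the pool beyond that means adding frontend IPs or lowering the per-VM allocation.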


4. Structural View​


5. How It Works in Practice​

How the Distribution Algorithm Works​

By default (Session Persistence: None), the Load Balancer uses a 5-tuple hash to determine which VM receives each flow:

  1. Source IP
  2. Source port
  3. Destination IP
  4. Destination port
  5. Protocol

The hash result is deterministic: the same set of 5 values will always map to the same VM (while the VM is healthy in the pool). This means that an existing TCP session continues on the same VM throughout its duration, even without explicit session persistence.

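What changes when a VM is taken out of rotation (health probe failure): new flows are distributed only among the remaining healthy VMs. Established TCP connections to the failed VM are not moved to another VM; the Standard Load Balancer lets them continue until the client or server closes them or the idle timeout expires, as long as other healthy instances remain in the pool. UDP flows, being connectionless, shift to a healthy VM.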
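The three distribution modes differ only in which fields feed the hash. The sketch below illustrates that idea; SHA-256 and the modulo step are stand-ins chosen for the example, since Azure's actual hash function is not described in this document. The mode names `Default`, `SourceIP`, and `SourceIPProtocol` correspond to the load distribution values None, Client IP, and Client IP and Protocol.

```python
import hashlib


def pick_backend(flow, backends, mode="Default"):
    """Map a flow to a backend by hashing the fields the chosen mode uses.

    flow is (src_ip, src_port, dst_ip, dst_port, protocol).
    """
    src_ip, src_port, dst_ip, dst_port, protocol = flow
    if mode == "Default":  # 5-tuple
        key = (src_ip, src_port, dst_ip, dst_port, protocol)
    elif mode == "SourceIP":  # 2-tuple: source IP and destination IP
        key = (src_ip, dst_ip)
    elif mode == "SourceIPProtocol":  # 3-tuple: adds the protocol
        key = (src_ip, dst_ip, protocol)
    else:
        raise ValueError(f"unknown mode: {mode}")
    digest = hashlib.sha256("|".join(map(str, key)).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


if __name__ == "__main__":
    vms = ["vm-01", "vm-02", "vm-03"]
    for mode in ("Default", "SourceIP"):
        # Same client, 200 different source ports (200 separate connections).
        chosen = {
            pick_backend(("203.0.113.7", port, "20.0.0.10", 443, "TCP"), vms, mode)
            for port in range(50000, 50200)
        }
        print(mode, "->", sorted(chosen))
```

With the 5-tuple, each new connection from the same client uses a different source port and can land on a different VM, while the 2-tuple ignores ports and pins the client to one VM.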

Health Probe: Critical Behaviors​


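Critical security point: the Load Balancer probe originates from address 168.63.129.16, the Azure platform's virtual public IP, which is also used for Azure DNS. The default NSG rule AllowAzureLoadBalancerInBound already permits this traffic, but any custom deny rule with a higher priority (a lower priority number) overrides it. In that case the NSG needs an explicit rule allowing the probe source on the probe ports, otherwise VMs will be marked as unhealthy even when they are healthy.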

The service tag AzureLoadBalancer in NSGs represents this probe IP and should be used instead of the direct IP for greater robustness.
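Why a custom deny rule breaks probes can be illustrated with a simplified model of NSG evaluation: rules are checked in ascending priority number and the first match decides. The sketch treats service tags as opaque labels and ignores direction, protocol, and destination, so it is a teaching aid rather than a faithful NSG engine. The custom rule names are hypothetical; the three default rules mirror the inbound defaults Azure adds to every NSG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # lower number = evaluated first
    source: str    # service tag, or "*" for any source
    port: str      # port as text, or "*" for any port
    access: str    # "Allow" or "Deny"


DEFAULT_RULES = [
    Rule("AllowVnetInBound", 65000, "VirtualNetwork", "*", "Allow"),
    Rule("AllowAzureLoadBalancerInBound", 65001, "AzureLoadBalancer", "*", "Allow"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]


def evaluate(rules, source, port):
    """Return the first rule, in priority order, that matches the traffic."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in ("*", str(port)):
            return rule
    raise LookupError("no rule matched")


if __name__ == "__main__":
    locked_down = DEFAULT_RULES + [Rule("deny-all-inbound", 4000, "*", "*", "Deny")]
    fixed = locked_down + [Rule("allow-lb-probe", 100, "AzureLoadBalancer", "443", "Allow")]
    for label, rules in [("defaults", DEFAULT_RULES), ("locked down", locked_down), ("fixed", fixed)]:
        hit = evaluate(rules, "AzureLoadBalancer", 443)
        print(f"{label}: {hit.access} by {hit.name}")
```

In this model the hardening rule at priority 4000 matches probe traffic before the default allow at 65001 is ever reached, and an explicit allow at priority 100 restores the probes.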

Floating IP (Direct Server Return)​

The Standard Load Balancer supports Floating IP (also called Direct Server Return): when enabled, the destination IP in the packet that reaches the VM is the Load Balancer frontend IP, not the VM's private IP. The VM needs to have the frontend IP configured on a loopback or second interface.

This is necessary for high availability scenarios such as SQL Server Always On availability group listeners and for some NVAs. For most web scenarios, it's not necessary.


6. Implementation Methods​

6.1 Azure Portal​

When to use: initial creation, configuration exploration, visual diagnostics.

Path: Create a resource > Networking > Load Balancer

The portal wizard guides through creation in logical order: Basics (name, SKU, type) → Frontend IP → Backend Pools → Inbound Rules (Load Balancing Rules and Health Probes) → Outbound Rules.

Portal advantage: visually shows relationships between components and validates conflicts in real time.

6.2 Azure CLI​

Create Standard public Load Balancer:

# 1. Create public IP for frontend
az network public-ip create \
--name pip-lb-web \
--resource-group rg-networking \
--sku Standard \
--allocation-method Static \
--zone 1 2 3

# 2. Create Load Balancer
az network lb create \
--name lb-web-public \
--resource-group rg-networking \
--sku Standard \
--frontend-ip-name fe-web \
--public-ip-address pip-lb-web \
--backend-pool-name bp-vms-web

# 3. Create HTTPS Health Probe
az network lb probe create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name probe-https-443 \
--protocol Https \
--port 443 \
--path "/health" \
--interval 5 \
--threshold 2

# 4. Create Load Balancing Rule
az network lb rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name rule-https-443 \
--frontend-ip-name fe-web \
--frontend-port 443 \
--backend-pool-name bp-vms-web \
--backend-port 443 \
--protocol Tcp \
--probe-name probe-https-443 \
--idle-timeout 15 \
--load-distribution Default

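# 5. Add VMs to backend pool
# The NICs live in rg-producao while the Load Balancer lives in rg-networking,
# so the pool is referenced by its full resource ID instead of by name.
pool_id=$(az network lb address-pool show --lb-name lb-web-public --name bp-vms-web --resource-group rg-networking --query id --output tsv)

az network nic ip-config address-pool add \
--address-pool "$pool_id" \
--ip-config-name ipconfig1 \
--nic-name nic-vm-web-01 \
--resource-group rg-producao

az network nic ip-config address-pool add \
--address-pool "$pool_id" \
--ip-config-name ipconfig1 \
--nic-name nic-vm-web-02 \
--resource-group rg-producao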

Create Internal Load Balancer:

az network lb create \
--name lb-app-internal \
--resource-group rg-networking \
--sku Standard \
--frontend-ip-name fe-app \
--private-ip-address 10.0.2.100 \
--vnet-name vnet-producao \
--subnet subnet-application \
--backend-pool-name bp-vms-app

Create Inbound NAT Rule for RDP access to specific VM:

az network lb inbound-nat-rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name nat-rdp-vm01 \
--frontend-ip-name fe-web \
--protocol Tcp \
--frontend-port 50001 \
--backend-port 3389

# Bind the rule to vm-01's NIC (by resource ID, since the NIC is in another resource group)
nat_rule_id=$(az network lb inbound-nat-rule show --lb-name lb-web-public --resource-group rg-networking --name nat-rdp-vm01 --query id --output tsv)
az network nic ip-config inbound-nat-rule add \
--inbound-nat-rule "$nat_rule_id" \
--ip-config-name ipconfig1 \
--nic-name nic-vm-web-01 \
--resource-group rg-producao

Create Outbound Rule:

# If fe-web is also used by a load balancing rule, create that rule with --disable-outbound-snat true
az network lb outbound-rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name outbound-rule-internet \
--frontend-ip-configs fe-web \
--address-pool bp-vms-web \
--protocol All \
--outbound-ports 1024 \
--idle-timeout 15

6.3 PowerShell​

# Public frontend IP
$pip = Get-AzPublicIpAddress -Name "pip-lb-web" -ResourceGroupName "rg-networking"
$feIp = New-AzLoadBalancerFrontendIpConfig -Name "fe-web" -PublicIpAddress $pip

# Backend pool
$backendPool = New-AzLoadBalancerBackendAddressPoolConfig -Name "bp-vms-web"

# Health probe
$probe = New-AzLoadBalancerProbeConfig `
-Name "probe-https-443" `
-Protocol Https `
-Port 443 `
-RequestPath "/health" `
-IntervalInSeconds 5 `
-ProbeCount 2

# Load balancing rule
$rule = New-AzLoadBalancerRuleConfig `
-Name "rule-https-443" `
-FrontendIPConfiguration $feIp `
-BackendAddressPool $backendPool `
-Probe $probe `
-Protocol Tcp `
-FrontendPort 443 `
-BackendPort 443 `
-IdleTimeoutInMinutes 15 `
-LoadDistribution Default

# Create Load Balancer
$lb = New-AzLoadBalancer `
-Name "lb-web-public" `
-ResourceGroupName "rg-networking" `
-Location "brazilsouth" `
-Sku "Standard" `
-FrontendIpConfiguration $feIp `
-BackendAddressPool $backendPool `
-Probe $probe `
-LoadBalancingRule $rule

6.4 Bicep​

// Assumed declarations so the template compiles on its own
param location string = resourceGroup().location

// Existing public IP created earlier (assumed to be in the deployment's resource group)
resource publicIp 'Microsoft.Network/publicIPAddresses@2023-05-01' existing = {
  name: 'pip-lb-web'
}

resource loadBalancer 'Microsoft.Network/loadBalancers@2023-05-01' = {
  name: 'lb-web-public'
  location: location
  sku: {
    name: 'Standard'
    tier: 'Regional'
  }
  properties: {
    frontendIPConfigurations: [
      {
        name: 'fe-web'
        properties: {
          publicIPAddress: {
            id: publicIp.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: 'bp-vms-web'
      }
    ]
    probes: [
      {
        name: 'probe-https-443'
        properties: {
          protocol: 'Https'
          port: 443
          requestPath: '/health'
          intervalInSeconds: 5
          numberOfProbes: 2
        }
      }
    ]
    loadBalancingRules: [
      {
        name: 'rule-https-443'
        properties: {
          // Child resources are referenced by ID because they are defined inline
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', 'lb-web-public', 'fe-web')
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/loadBalancers/backendAddressPools', 'lb-web-public', 'bp-vms-web')
          }
          probe: {
            id: resourceId('Microsoft.Network/loadBalancers/probes', 'lb-web-public', 'probe-https-443')
          }
          protocol: 'Tcp'
          frontendPort: 443
          backendPort: 443
          idleTimeoutInMinutes: 15
          loadDistribution: 'Default'
        }
      }
    ]
  }
}

7. Control and Security​

NSG and Load Balancer​

For a Standard public Load Balancer to function, the NSG associated with backend pool VMs must have rules that:

  1. Allow traffic from the Load Balancer IP (AzureLoadBalancer tag) to probe ports
  2. Allow traffic on business ports (80, 443, etc.) to VMs

An NSG that blocks AzureLoadBalancer will cause probes to fail, and VMs will be marked as unhealthy, even when they are functional. This is one of the most common errors.

# NSG rule to allow health probes
az network nsg rule create \
--nsg-name nsg-subnet-web \
--resource-group rg-networking \
--name allow-lb-probe \
--priority 100 \
--source-address-prefixes AzureLoadBalancer \
--destination-port-ranges 443 \
--protocol Tcp \
--access Allow

Standard Load Balancer and Availability Zones​

The Standard Load Balancer with a zone-redundant public IP distributes traffic among VMs in different Availability Zones. As long as the backend VMs are themselves spread across zones, a zone failure doesn't bring down the service.


8. Decision Making​

Load Balancer vs. Application Gateway: which to choose?​

| Criteria | Azure Load Balancer | Application Gateway |
| --- | --- | --- |
| OSI Layer | L4 (TCP/UDP) | L7 (HTTP/HTTPS) |
| HTTP content awareness | No | Yes (URL, headers, cookies) |
| SSL/TLS Termination | No | Yes |
| URL-based routing | No | Yes (/api/* → backend-api) |
| WAF (Web Application Firewall) | No | Yes |
| Cookie-based session affinity | No | Yes |
| Protocols | TCP, UDP | HTTP, HTTPS, WebSocket, gRPC |
| Performance | Microsecond latency | Milliseconds (more processing) |
| Cost | Lower | Higher |
| Typical use case | Database VMs, gaming, non-HTTP layer | REST APIs, websites, HTTP microservices |

Public vs. internal Load Balancer?​

| Scenario | Type | Reason |
| --- | --- | --- |
| Internet-accessible website | Public | External frontend |
| Internal API between microservices | Internal | No external exposure |
| Database layer in multi-tier | Internal | Access only from application layer |
| RDP/SSH VMs via Bastion | Internal (no LB) or NAT Rules | Controlled access |
| UDP streaming service | Public | UDP protocol supported by L4 LB |

9. Best Practices​

Use the /health or /healthcheck endpoint in applications for HTTP/HTTPS probes: a TCP probe only checks if the port is open. An application with port 443 open but returning error 500 on all requests will remain in the pool. An HTTP probe against /health can verify database connection, queues, and other internal dependencies, removing the VM from the pool if it's not actually functional.
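A health endpoint of this kind can be very small. The sketch below uses only the Python standard library and returns 200 when every dependency check passes and 503 otherwise. The `check_dependencies` function is a hypothetical placeholder for real database or queue pings, and port 8080 over plain HTTP is an assumption for the example; a probe configured for HTTPS on 443 would need the application's TLS listener instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_dependencies() -> dict:
    """Hypothetical checks; replace with real database and queue pings."""
    return {"database": True, "queue": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        checks = check_dependencies()
        # Anything other than 200 makes the Load Balancer probe count a failure.
        status = 200 if all(checks.values()) else 503
        body = json.dumps(checks).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the log


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Keeping the checks cheap matters, because the endpoint is called at every probe interval; a slow dependency check can itself cause probe timeouts.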

Configure Availability Zones for production: use VMs distributed across zones and a Load Balancer with zone-redundant frontend. This ensures that a zone failure (hardware, power, network in a datacenter) doesn't bring down the service.

Avoid NAT rules for administrative access in production: use Azure Bastion instead of NAT rules for RDP/SSH. NAT rules expose ports to the internet and require manual mapping management. Bastion is more secure and centralized.

Configure Outbound Rules explicitly for SNAT control: instead of relying on automatic SNAT (which can exhaust ports at high scale), create explicit Outbound Rules with port numbers calculated for the expected volume of outbound connections.

Create a separate probe for each type of verification: if you have both HTTP on port 80 and HTTPS on port 443, create separate probes and link each rule to the appropriate probe. Shared probes between rules for different ports can give false positives.


10. Common Errors​

NSG blocking Load Balancer probe

VMs are added to the backend pool, but the Load Balancer marks them as unhealthy. The administrator verifies that the service on the VMs is working (by testing directly via private IP). The problem is that the subnet NSG blocks traffic from AzureLoadBalancer. The probes never arrive, the LB considers the VMs offline, and no traffic is distributed. Adding an NSG rule that allows AzureLoadBalancer on the probe port resolves the issue immediately.

Using Basic Load Balancer SKU with Standard IP (or vice versa)

The Load Balancer and its associated public IP must use the same SKU. Trying to create a Basic LB with a Standard IP generates an error. Migration from Basic to Standard requires recreating the Load Balancer.

Creating probe on application port but forgetting to open that port in NSGs

The probe is configured for HTTPS 443, but the NSG only allows 443 traffic from the internet, not from the probe address (AzureLoadBalancer). VMs become unhealthy for the same reason as the previous error.

Not configuring Outbound Rules and exhausting SNAT ports

With many VMs making many simultaneous outbound connections to the internet, the automatic SNAT ports are exhausted. Outbound connections start failing with timeout. The solution is to create explicit Outbound Rules or use a NAT Gateway (recommended for high-scale outbound scenarios).

Using Session Persistence unnecessarily

For stateless applications (that don't depend on session on the same VM), enabling Session Persistence reduces balancing efficiency: if a client makes many requests, they all go to the same VM while others remain idle. Use Session Persistence only when the application actually requires it.


11. Operations and Maintenance​

Monitor Load Balancer Health​

# List the members of the backend pool (membership only; health is reported by the Health Probe Status metric)
az network lb address-pool show \
--lb-name lb-web-public \
--name bp-vms-web \
--resource-group rg-networking \
--query "loadBalancerBackendAddresses"

Available metrics in Azure Monitor for Load Balancer:

| Metric | What it measures |
| --- | --- |
| Data Path Availability | Availability of the data path from within the region to the Load Balancer frontend |
| Health Probe Status | Probe success rate of backend instances (aggregate or per instance) |
| Byte Count | Bytes processed by LB |
| Packet Count | Packets processed |
| SYN Count | SYN packets received |
| SNAT Connection Count | Active and failed SNAT connections |

SNAT Connection Count is especially important: when Failed SNAT Connections starts increasing, it indicates SNAT port exhaustion and the need for more ports via Outbound Rules or NAT Gateway.

Check Effective NSG Rules on a NIC

az network nic list-effective-nsg \
--name nic-vm-web-01 \
--resource-group rg-producao

Important Limits​

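| Item | Basic Limit | Standard Limit |
| --- | --- | --- |
| VMs per backend pool | 300 | 1,000 |
| Frontend IPs per LB | 200 | 600 |
| Load Balancing Rules | 250 | 1,500 |
| Inbound NAT Rules | 350 (per VM Scale Set) | 1,500 |
| Health Probes | 25 | 600 |
| Default SNAT ports per VM (without Outbound Rule) | Automatic | 1,024 for pools of up to 50 VMs, fewer for larger pools (configurable through Outbound Rules) |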

12. Integration and Automation​

Load Balancer with VM Scale Sets​

The most important Load Balancer integration is with VM Scale Sets: when the VMSS scales (adds VMs), they are automatically added to the Load Balancer backend pool, and when it scales down, they are removed.

# Create VMSS already integrated with Load Balancer
az vmss create \
--name vmss-web \
--resource-group rg-producao \
--image Ubuntu2204 \
--vm-sku Standard_D2s_v3 \
--instance-count 3 \
--vnet-name vnet-producao \
--subnet subnet-web \
--lb lb-web-public \
--backend-pool-name bp-vms-web \
--upgrade-policy-mode automatic

Integration with Azure Monitor and Autoscale​

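Combine Load Balancer with VMSS Autoscale based on traffic metrics, so that the scale set grows or shrinks with demand while the Load Balancer spreads traffic across whichever instances currently exist.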


13. Final Summary​

Essential points:

  • Azure Load Balancer operates at Layer 4 (TCP/UDP). For HTTP-aware balancing, use Application Gateway.
  • A Load Balancer consists of: Frontend IP, Backend Pool, Health Probe, Load Balancing Rules and optionally Inbound NAT Rules and Outbound Rules.
  • Public LB: frontend with public IP, receives internet traffic. Internal LB: frontend with private IP, receives traffic from within the VNet.
  • Standard SKU is the only one recommended for new projects. Basic is scheduled for retirement on September 30, 2025.

Critical differences:

  • Load Balancing Rule vs. Inbound NAT Rule: the first distributes traffic among all VMs in the pool; the second maps a frontend port to a specific backend VM.
  • Session Persistence None (5-tuple hash): each connection is routed independently. Client IP (2-tuple hash): all connections from the same client go to the same VM.
  • TCP Probe vs. HTTP/HTTPS Probe: TCP checks if the port is open; HTTP/HTTPS checks if the application responds with code 200, enabling more precise health checks.
  • NSG must allow AzureLoadBalancer on probe ports. Without this, all VMs become unhealthy even when functioning.

What needs to be remembered:

  • The probe originates from 168.63.129.16 (AzureLoadBalancer tag). NSGs that block this IP cause the most frequent error in LB configurations.
  • VMs in the Standard Load Balancer backend pool don't need their own public IP. The LB manages access and outbound SNAT.
  • Standard LB with zone-redundant frontend distributes traffic across zones automatically; cross-zone high availability requires VMs in multiple zones.
  • SNAT port exhaustion is a problem in high-scale environments without configured Outbound Rules. Monitor the SNAT Connection Count metric.
  • For administrative access to individual VMs, prefer Azure Bastion and treat NAT Rules as a fallback; never expose RDP/SSH ports directly with public IPs on VMs.