Theoretical Foundation: Configure an internal or public load balancer
1. Initial Intuition
Imagine a popular restaurant with a single cashier. During peak hours, the line grows, the cashier becomes overwhelmed, and customers wait too long. The obvious solution is to open more cash registers. But then a new problem arises: how do you get customers to choose the registers in a balanced way? You place a host at the entrance who directs the next customers to the register with the shortest line.
That host is the Load Balancer. It receives all incoming connections and distributes them among a set of servers (the registers), so that no server becomes overwhelmed while others are idle.
The Azure Load Balancer operates at layer 4 (transport) of the OSI model: it works with IP addresses and TCP/UDP ports, without inspecting the content of requests. This makes it extremely fast and efficient, but without intelligence about HTTP, cookies, or URLs. For content-aware HTTP load balancing, the correct service is Application Gateway.
2. Context
The Azure Load Balancer is one of the pillars of high availability in Azure and works together with other networking components.
Two SKUs exist: Basic (legacy) and Standard (recommended). As with public IPs, only the Standard SKU should be used in new projects; the Basic SKU is scheduled for retirement in September 2025.
3. Building the Concepts
3.1 The Five Components of a Load Balancer
An Azure Load Balancer is composed of five elements that need to be configured in logical order:
1. Frontend IP Configuration
The IP that clients use to connect to the Load Balancer. For a public LB, it's a public IP. For an internal LB, it's a private IP from the VNet.
A single Load Balancer can have multiple frontend configurations (multiple IPs), useful for hosting multiple services on the same LB.
2. Backend Pool
The set of resources that receive traffic. Can contain individual VMs (via NIC) or VM Scale Sets. VMs in the pool need to be in the same VNet as the Load Balancer.
In the Standard SKU, the backend pool can contain VMs by NIC (more specific) or by IP + VNet combination.
3. Health Probe
The mechanism that the Load Balancer uses to verify if each VM in the backend is healthy and available to receive traffic.
Probe types:
| Type | Protocol | What it checks |
|---|---|---|
| TCP | TCP | If the TCP port is accepting connections |
| HTTP | HTTP | If the HTTP response is code 200 |
| HTTPS | HTTPS | If the HTTPS response is code 200 |
The TCP probe is the simplest: it tests whether the port accepts connections. The HTTP/HTTPS probe is more accurate: it can target a specific application endpoint (such as /healthcheck) that validates whether the application is actually functional, not just whether the port is open.
If a VM fails the probe a configurable number of consecutive times (default: 2), it is removed from the distribution pool. When the probe succeeds again, the VM is automatically added back.
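The removal and re-addition logic described above can be sketched as a small state machine. This is a simplified model, not the service's implementation: the class name `ProbeState` and its fields are hypothetical, and it assumes a single successful probe is enough to bring a VM back into rotation.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Tracks one backend VM's health as seen by a simplified probe."""
    unhealthy_threshold: int = 2   # consecutive failures before removal
    consecutive_failures: int = 0
    healthy: bool = True

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result and return whether the VM is in rotation."""
        if probe_succeeded:
            # Assumption: one success is enough to add the VM back.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


state = ProbeState()
results = [True, False, True, False, False, True]
history = [state.record(ok) for ok in results]
print(history)  # → [True, True, True, True, False, True]
```

A single isolated failure does not remove the VM; only the second consecutive failure does, and the next success restores it.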
4. Load Balancing Rule
Defines the mapping between the frontend and backend: "when traffic arrives at the public IP on port 80, distribute it among the VMs in the backend pool on port 80".
Components of a rule:
| Field | Description | Example |
|---|---|---|
| Frontend IP | Which frontend IP receives traffic | Load Balancer's public IP |
| Protocol | TCP or UDP | TCP |
| Frontend port | Port on the frontend | 80 |
| Backend port | Port on backend VMs | 80 (can be different) |
| Backend pool | Which pool receives traffic | pool-vms-web |
| Health probe | Which probe monitors VMs | probe-http-80 |
| Session persistence | If connections from the same client go to the same VM | None / Client IP / Client IP and Protocol |
Port mapping is powerful: you can receive on port 443 on the frontend and redirect to port 8443 on the backend, without changing VM configuration.
Session Persistence (also called sticky sessions or session affinity):
| Option | Behavior |
|---|---|
| None (default) | Each connection is distributed independently (5-tuple hash) |
| Client IP | Connections from the same client IP always go to the same VM (2-tuple hash) |
| Client IP and Protocol | Connections from the same IP+protocol go to the same VM (3-tuple hash) |
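The three options differ only in which fields feed the hash. The sketch below illustrates that idea under stated assumptions: SHA-256 and the modulo step are stand-ins (the hash Azure actually uses is not described here), and `Flow`, `affinity_key`, and `pick_backend` are hypothetical names.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def affinity_key(flow: Flow, mode: str) -> tuple:
    """Return the fields that feed the hash for a given persistence mode."""
    if mode == "None":                      # 5-tuple
        return tuple(flow)
    if mode == "Client IP":                 # 2-tuple
        return (flow.src_ip, flow.dst_ip)
    if mode == "Client IP and Protocol":    # 3-tuple
        return (flow.src_ip, flow.dst_ip, flow.protocol)
    raise ValueError(f"unknown session persistence mode: {mode}")


def pick_backend(flow: Flow, backends: Sequence[str], mode: str = "None") -> str:
    # SHA-256 is only a deterministic stand-in for the real hash function.
    raw = "|".join(str(field) for field in affinity_key(flow, mode)).encode()
    index = int.from_bytes(hashlib.sha256(raw).digest()[:8], "big") % len(backends)
    return backends[index]


backends = ["vm-01", "vm-02", "vm-03"]
# One client opening 200 connections that differ only in source port.
flows = [Flow("203.0.113.7", 40000 + i, "198.51.100.10", 443, "TCP") for i in range(200)]

sticky = {pick_backend(f, backends, "Client IP") for f in flows}
spread = {pick_backend(f, backends, "None") for f in flows}
print(len(sticky), len(spread))
```

With Client IP, the source port is ignored, so all 200 connections land on one VM; with None, the changing source port spreads them across the pool.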
5. Inbound NAT Rules
Unlike load balancing rules that distribute to multiple VMs, NAT rules map a specific frontend port to a port on a specific backend VM. They're used for direct access to a specific VM.
Example: port 50001 on LB → RDP on vm-01; port 50002 → RDP on vm-02. This allows accessing VMs individually via RDP without exposing each VM with its own public IP.
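Conceptually, an inbound NAT rule is a one-to-one lookup rather than a distribution. A minimal sketch of the example above, with the hypothetical names `NatTarget`, `nat_rules`, and `resolve_nat`:

```python
from typing import Dict, NamedTuple


class NatTarget(NamedTuple):
    vm: str
    backend_port: int


# Frontend port on the Load Balancer -> one specific VM and port.
nat_rules: Dict[int, NatTarget] = {
    50001: NatTarget("vm-01", 3389),
    50002: NatTarget("vm-02", 3389),
}


def resolve_nat(frontend_port: int) -> NatTarget:
    """Return the single backend target for a frontend port."""
    try:
        return nat_rules[frontend_port]
    except KeyError:
        raise LookupError(f"no inbound NAT rule for frontend port {frontend_port}") from None


print(resolve_nat(50001))  # → NatTarget(vm='vm-01', backend_port=3389)
```

Unlike a load balancing rule, there is no hashing and no pool: each frontend port has exactly one destination.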
3.2 Public vs. Internal Load Balancer
| Characteristic | Public Load Balancer | Internal Load Balancer (ILB) |
|---|---|---|
| Frontend IP | Public IP | Private IP from VNet |
| Who accesses | Internet (any source) | Resources within the VNet or connected networks |
| Use cases | Web tier, public APIs | Application tier, databases, internal services |
| NSG required | Yes, to control frontend access | Yes, to control backend access |
The ILB is fundamental in multi-tier architectures: the application tier (backend) is balanced by an ILB, accessible only from the web tier. The web tier is balanced by a public LB.
3.3 Outbound Rules: Internet Exit
The Standard Load Balancer also manages outbound traffic to the internet from VMs in the backend pool, through Outbound Rules and the SNAT (Source Network Address Translation) mechanism.
When a VM without its own public IP needs to make an outbound connection to the internet (e.g., download updates, call an external API), the Load Balancer translates the VM's private IP to one of the frontend's public IPs.
Outbound Rules control how many SNAT ports are allocated per VM and which frontend IP is used for outbound traffic.
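Sizing an outbound rule is mostly arithmetic. The sketch below assumes that each frontend public IP contributes 64,000 SNAT ports and that per-VM allocations must be a multiple of 8; both figures are assumptions taken from Azure's published guidance and should be checked against the current documentation. The function names are hypothetical.

```python
PORTS_PER_FRONTEND_IP = 64_000  # assumption: SNAT ports provided by one public IP


def max_backend_instances(frontend_ips: int, allocated_ports: int) -> int:
    """How many VMs can each receive `allocated_ports` SNAT ports."""
    if frontend_ips < 1 or allocated_ports < 1:
        raise ValueError("frontend_ips and allocated_ports must be positive")
    return (frontend_ips * PORTS_PER_FRONTEND_IP) // allocated_ports


def ports_per_instance(frontend_ips: int, instances: int) -> int:
    """Largest even split per VM, rounded down to a multiple of 8 (assumed rule)."""
    if frontend_ips < 1 or instances < 1:
        raise ValueError("frontend_ips and instances must be positive")
    share = (frontend_ips * PORTS_PER_FRONTEND_IP) // instances
    return share - share % 8


print(max_backend_instances(1, 1024))  # → 62
print(ports_per_instance(2, 30))       # → 4264
```

Under these assumptions, one frontend IP with 1,024 ports per VM supports at most 62 VMs; adding a second frontend IP doubles the budget.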
4. Structural View
5. How It Works in Practice
How the Distribution Algorithm Works
By default (Session Persistence: None), the Load Balancer uses a 5-tuple hash to determine which VM receives each packet:
- Source IP
- Source port
- Destination IP
- Destination port
- Protocol
The hash result is deterministic: the same set of 5 values will always map to the same VM (while the VM is healthy in the pool). This means that an existing TCP session continues on the same VM throughout its duration, even without explicit session persistence.
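What changes when a VM is removed from the pool (health probe failure): new connections are distributed only among the remaining healthy VMs. Established TCP connections to that VM are not cut by the Load Balancer; they continue until the client or the VM closes them or the idle timeout expires.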
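To see where new connections land before and after a VM leaves the pool, the following sketch hashes 3,000 synthetic 5-tuples onto the list of healthy backends. It is a toy model: SHA-256 with modulo is a stand-in for the real algorithm, it tracks no per-flow state, and all names and addresses are illustrative.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence, Tuple

FiveTuple = Tuple[str, int, str, int, str]


def pick_backend(flow: FiveTuple, healthy: Sequence[str]) -> str:
    """Hash the 5-tuple and map it onto the list of healthy backends."""
    raw = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(raw).digest()
    return healthy[int.from_bytes(digest[:8], "big") % len(healthy)]


def distribute(flows: Iterable[FiveTuple], healthy: Sequence[str]) -> Counter:
    """Count how many new connections each healthy backend would receive."""
    return Counter(pick_backend(flow, healthy) for flow in flows)


# 3,000 synthetic connections: (source IP, source port, dest IP, dest port, protocol)
flows = [
    (f"203.0.113.{i % 250 + 1}", 1024 + i, "198.51.100.10", 443, "TCP")
    for i in range(3000)
]

all_healthy = distribute(flows, ["vm-01", "vm-02", "vm-03"])
one_removed = distribute(flows, ["vm-01", "vm-03"])  # vm-02 failed its probe
print(all_healthy)
print(one_removed)
```

The same 5-tuple always maps to the same VM for a given pool, and a VM that is no longer healthy receives no new connections.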
Health Probe: Critical Behaviors
Critical security point: the Load Balancer probe originates from address 168.63.129.16, the same Azure infrastructure IP used by Azure DNS. NSGs on backend VMs must allow traffic from this address on the probe ports. The default NSG rule AllowAzureLoadBalancerInBound already permits it, but a custom deny rule with a higher priority (lower number) overrides that default, and VMs are then marked as unhealthy even when they are healthy.
The service tag AzureLoadBalancer in NSGs represents this probe IP and should be used instead of the literal IP address for greater robustness.
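Why a blocked probe is so easy to cause becomes clearer when NSG evaluation is modeled as "first match by priority wins". The sketch below is a simplified model (source tag and destination port only); `NsgRule` and `evaluate` are hypothetical names, and the rule set is an illustrative assumption, apart from the default AllowAzureLoadBalancerInBound rule at priority 65001.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NsgRule:
    name: str
    priority: int   # lower number = evaluated first
    source: str     # service tag, or "*" for any source
    port: str       # destination port, or "*" for any port
    access: str     # "Allow" or "Deny"


def evaluate(rules: Iterable[NsgRule], source: str, port: int) -> str:
    """Return the access of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in ("*", str(port)):
            return rule.access
    return "Deny"


default_rule = NsgRule("AllowAzureLoadBalancerInBound", 65001, "AzureLoadBalancer", "*", "Allow")

# A custom "deny everything else" rule shadows the default allow rule.
blocked = [
    NsgRule("allow-internet-https", 110, "Internet", "443", "Allow"),
    NsgRule("deny-all-inbound", 4000, "*", "*", "Deny"),
    default_rule,
]
# An explicit allow for the probe, placed ahead of the deny rule, fixes it.
fixed = blocked + [NsgRule("allow-lb-probe", 100, "AzureLoadBalancer", "443", "Allow")]

print(evaluate(blocked, "AzureLoadBalancer", 443))  # → Deny
print(evaluate(fixed, "AzureLoadBalancer", 443))    # → Allow
```

In the first rule set, client traffic on 443 still works, which is why the service looks fine when tested directly while the probe is silently denied.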
Floating IP (Direct Server Return)
The Standard Load Balancer supports Floating IP (also called Direct Server Return): when enabled, the destination IP in the packet that reaches the VM is the Load Balancer frontend IP, not the VM's private IP. The VM needs to have the frontend IP configured on a loopback or second interface.
This is necessary for high availability scenarios with SQL Server AlwaysOn and for some NVAs. For most web scenarios, it's not necessary.
6. Implementation Methods
6.1 Azure Portal
When to use: initial creation, configuration exploration, visual diagnostics.
Path: Create a resource > Networking > Load Balancer
The portal wizard guides through creation in logical order: Basics (name, SKU, type) → Frontend IP → Backend Pools → Inbound Rules (Load Balancing Rules and Health Probes) → Outbound Rules.
Portal advantage: visually shows relationships between components and validates conflicts in real time.
6.2 Azure CLI
Create Standard public Load Balancer:
# 1. Create public IP for frontend
az network public-ip create \
--name pip-lb-web \
--resource-group rg-networking \
--sku Standard \
--allocation-method Static \
--zone 1 2 3
# 2. Create Load Balancer
az network lb create \
--name lb-web-public \
--resource-group rg-networking \
--sku Standard \
--frontend-ip-name fe-web \
--public-ip-address pip-lb-web \
--backend-pool-name bp-vms-web
# 3. Create HTTPS Health Probe
az network lb probe create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name probe-https-443 \
--protocol Https \
--port 443 \
--path "/health" \
--interval 5 \
--threshold 2
# 4. Create Load Balancing Rule
az network lb rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name rule-https-443 \
--frontend-ip-name fe-web \
--frontend-port 443 \
--backend-pool-name bp-vms-web \
--backend-port 443 \
--protocol Tcp \
--probe-name probe-https-443 \
--idle-timeout 15 \
--load-distribution Default
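# 5. Add VMs to backend pool
# The NICs are in rg-producao while the Load Balancer is in rg-networking,
# so the pool is referenced by its full resource ID, which works across resource groups.
POOL_ID=$(az network lb address-pool show \
--lb-name lb-web-public \
--resource-group rg-networking \
--name bp-vms-web \
--query id \
--output tsv)
az network nic ip-config address-pool add \
--address-pool "$POOL_ID" \
--ip-config-name ipconfig1 \
--nic-name nic-vm-web-01 \
--resource-group rg-producao
az network nic ip-config address-pool add \
--address-pool "$POOL_ID" \
--ip-config-name ipconfig1 \
--nic-name nic-vm-web-02 \
--resource-group rg-producao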
Create Internal Load Balancer:
az network lb create \
--name lb-app-internal \
--resource-group rg-networking \
--sku Standard \
--frontend-ip-name fe-app \
--private-ip-address 10.0.2.100 \
--vnet-name vnet-producao \
--subnet subnet-application \
--backend-pool-name bp-vms-app
Create Inbound NAT Rule for RDP access to specific VM:
az network lb inbound-nat-rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name nat-rdp-vm01 \
--frontend-ip-name fe-web \
--protocol Tcp \
--frontend-port 50001 \
--backend-port 3389
Create Outbound Rule:
az network lb outbound-rule create \
--lb-name lb-web-public \
--resource-group rg-networking \
--name outbound-rule-internet \
--frontend-ip-configs fe-web \
--address-pool bp-vms-web \
--protocol All \
--outbound-ports 1024 \
--idle-timeout 15
6.3 PowerShell
# Public frontend IP
$pip = Get-AzPublicIpAddress -Name "pip-lb-web" -ResourceGroupName "rg-networking"
$feIp = New-AzLoadBalancerFrontendIpConfig -Name "fe-web" -PublicIpAddress $pip
# Backend pool
$backendPool = New-AzLoadBalancerBackendAddressPoolConfig -Name "bp-vms-web"
# Health probe
$probe = New-AzLoadBalancerProbeConfig `
-Name "probe-https-443" `
-Protocol Https `
-Port 443 `
-RequestPath "/health" `
-IntervalInSeconds 5 `
-ProbeCount 2
# Load balancing rule
$rule = New-AzLoadBalancerRuleConfig `
-Name "rule-https-443" `
-FrontendIPConfiguration $feIp `
-BackendAddressPool $backendPool `
-Probe $probe `
-Protocol Tcp `
-FrontendPort 443 `
-BackendPort 443 `
-IdleTimeoutInMinutes 15 `
-LoadDistribution Default
# Create Load Balancer
$lb = New-AzLoadBalancer `
-Name "lb-web-public" `
-ResourceGroupName "rg-networking" `
-Location "brazilsouth" `
-Sku "Standard" `
-FrontendIpConfiguration $feIp `
-BackendAddressPool $backendPool `
-Probe $probe `
-LoadBalancingRule $rule
6.4 Bicep
param location string = resourceGroup().location

// Assumes the public IP pip-lb-web already exists in this resource group
// (for example, created with the CLI steps in section 6.2).
resource publicIp 'Microsoft.Network/publicIPAddresses@2023-05-01' existing = {
  name: 'pip-lb-web'
}

resource loadBalancer 'Microsoft.Network/loadBalancers@2023-05-01' = {
  name: 'lb-web-public'
  location: location
  sku: {
    name: 'Standard'
    tier: 'Regional'
  }
  properties: {
    frontendIPConfigurations: [
      {
        name: 'fe-web'
        properties: {
          publicIPAddress: {
            id: publicIp.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: 'bp-vms-web'
      }
    ]
    probes: [
      {
        name: 'probe-https-443'
        properties: {
          protocol: 'Https'
          port: 443
          requestPath: '/health'
          intervalInSeconds: 5
          numberOfProbes: 2
        }
      }
    ]
    loadBalancingRules: [
      {
        name: 'rule-https-443'
        properties: {
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', 'lb-web-public', 'fe-web')
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/loadBalancers/backendAddressPools', 'lb-web-public', 'bp-vms-web')
          }
          probe: {
            id: resourceId('Microsoft.Network/loadBalancers/probes', 'lb-web-public', 'probe-https-443')
          }
          protocol: 'Tcp'
          frontendPort: 443
          backendPort: 443
          idleTimeoutInMinutes: 15
          loadDistribution: 'Default'
        }
      }
    ]
  }
}
7. Control and Security
NSG and Load Balancer
For a Standard public Load Balancer to function, the NSG associated with backend pool VMs must have rules that:
- Allow traffic from the Load Balancer probe source (AzureLoadBalancer tag) to probe ports
- Allow traffic on business ports (80, 443, etc.) to VMs
An NSG that blocks AzureLoadBalancer will cause probes to fail, and VMs will be marked as unhealthy, even when they are functional. This is one of the most common errors.
# NSG rule to allow health probes
az network nsg rule create \
--nsg-name nsg-subnet-web \
--resource-group rg-networking \
--name allow-lb-probe \
--priority 100 \
--source-address-prefixes AzureLoadBalancer \
--destination-port-ranges 443 \
--protocol Tcp \
--access Allow
Standard Load Balancer and Availability Zones
The Standard Load Balancer with zone-redundant public IP distributes traffic among VMs in different Availability Zones, ensuring that a zone failure doesn't bring down the service.
8. Decision Making
Load Balancer vs. Application Gateway: which to choose?
| Criteria | Azure Load Balancer | Application Gateway |
|---|---|---|
| OSI Layer | L4 (TCP/UDP) | L7 (HTTP/HTTPS) |
| HTTP content awareness | No | Yes (URL, headers, cookies) |
| SSL/TLS Termination | No | Yes |
| URL-based routing | No | Yes (/api/* → backend-api) |
| WAF (Web Application Firewall) | No | Yes |
| Cookie-based session affinity | No | Yes |
| Protocols | TCP, UDP | HTTP, HTTPS, WebSocket, gRPC |
| Performance | Microsecond latency | Milliseconds (more processing) |
| Cost | Lower | Higher |
| Typical use case | Database VMs, gaming, non-HTTP layer | REST APIs, websites, HTTP microservices |
Public vs. internal Load Balancer?
| Scenario | Type | Reason |
|---|---|---|
| Internet-accessible website | Public | External frontend |
| Internal API between microservices | Internal | No external exposure |
| Database layer in multi-tier | Internal | Access only from application layer |
| RDP/SSH VMs via Bastion | Internal (no LB) or NAT Rules | Controlled access |
| UDP streaming service | Public | UDP protocol supported by L4 LB |
9. Best Practices
Use the /health or /healthcheck endpoint in applications for HTTP/HTTPS probes: a TCP probe only checks if the port is open. An application with port 443 open but returning error 500 on all requests will remain in the pool. An HTTP probe against /health can verify database connection, queues, and other internal dependencies, removing the VM from the pool if it's not actually functional.
Configure Availability Zones for production: use VMs distributed across zones and a Load Balancer with zone-redundant frontend. This ensures that a zone failure (hardware, power, network in a datacenter) doesn't bring down the service.
Avoid NAT rules for administrative access in production: use Azure Bastion instead of NAT rules for RDP/SSH. NAT rules expose ports to the internet and require manual mapping management. Bastion is more secure and centralized.
Configure Outbound Rules explicitly for SNAT control: instead of relying on automatic SNAT (which can exhaust ports at high scale), create explicit Outbound Rules with port numbers calculated for the expected volume of outbound connections.
Create a separate probe for each type of verification: if you have both HTTP on port 80 and HTTPS on port 443, create separate probes and link each rule to the appropriate probe. Shared probes between rules for different ports can give false positives.
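The first practice in this list, a /health endpoint that checks real dependencies, could look roughly like the following. It is a minimal sketch using only the standard library: the dependency checks are hypothetical placeholders, TLS is omitted, and port 8080 is an arbitrary choice. Any status other than 200 causes an HTTP/HTTPS probe to count a failure, so the sketch returns 503 when a dependency is down.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Optional, Tuple


def database_reachable() -> bool:
    """Hypothetical placeholder: replace with a real connectivity check."""
    return True


def queue_reachable() -> bool:
    """Hypothetical placeholder: replace with a real connectivity check."""
    return True


CHECKS: Dict[str, Callable[[], bool]] = {
    "database": database_reachable,
    "queue": queue_reachable,
}


def health_response(
    path: str, checks: Optional[Dict[str, Callable[[], bool]]] = None
) -> Tuple[int, bytes]:
    """Return (status, body): 200 only when every dependency check passes."""
    if path != "/health":
        return 404, b"not found"
    active = CHECKS if checks is None else checks
    failed = [name for name, check in active.items() if not check()]
    if failed:
        return 503, ("failing: " + ", ".join(failed)).encode()
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted (not called automatically)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the endpoint; keeping the decision in `health_response` makes the probe logic testable without opening a socket.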
10. Common Errors
NSG blocking Load Balancer probe
VMs are added to the backend pool, but the Load Balancer marks them as unhealthy. The administrator verifies that the service on the VMs is working (tests directly via private IP). The problem is that the subnet NSG blocks traffic from AzureLoadBalancer. The probes never arrive, the LB considers the VMs offline, and no traffic is distributed. Adding the NSG rule for AzureLoadBalancer resolves immediately.
Using Basic Load Balancer SKU with Standard IP (or vice versa)
The SKU of the Load Balancer and the SKU of its public IP must match. Trying to create a Basic LB with a Standard IP generates an error. Migrating from Basic to Standard requires recreating the Load Balancer.
Creating probe on application port but forgetting to open that port in NSGs
The probe is configured for HTTPS 443, but the NSG only allows 443 traffic from the internet, not from the probe address (AzureLoadBalancer). VMs become unhealthy for the same reason as the previous error.
Not configuring Outbound Rules and exhausting SNAT ports
With many VMs making many simultaneous outbound connections to the internet, the automatic SNAT ports are exhausted. Outbound connections start failing with timeout. The solution is to create explicit Outbound Rules or use a NAT Gateway (recommended for high-scale outbound scenarios).
Using Session Persistence unnecessarily
For stateless applications (that don't depend on session on the same VM), enabling Session Persistence reduces balancing efficiency: if a client makes many requests, they all go to the same VM while others remain idle. Use Session Persistence only when the application actually requires it.
11. Operations and Maintenance
Monitor Load Balancer Health
# List the members configured in the backend pool (membership only; probe health comes from the metrics below)
az network lb address-pool show \
--lb-name lb-web-public \
--name bp-vms-web \
--resource-group rg-networking \
--query "loadBalancerBackendAddresses"
Available metrics in Azure Monitor for Load Balancer:
| Metric | What it measures |
|---|---|
| Data Path Availability | Whether the data path to the LB frontend is available (measured by Azure's own probing, not by your health probes) |
| Health Probe Status | Percentage of healthy VMs in backend pool |
| Byte Count | Bytes processed by LB |
| Packet Count | Packets processed |
| SYN Count | SYN packets received |
| SNAT Connection Count | Active and failed SNAT connections |
SNAT Connection Count is especially important: when Failed SNAT Connections starts increasing, it indicates SNAT port exhaustion and the need for more ports via Outbound Rules or NAT Gateway.
Check Effective NSG Rules on a NIC
az network nic list-effective-nsg \
--name nic-vm-web-01 \
--resource-group rg-producao
Important Limits
| Item | Basic Limit | Standard Limit |
|---|---|---|
| VMs per backend pool | 300 | 1,000 |
| Frontend IPs per LB | 200 | 600 |
| Load Balancing Rules | 250 | 1,500 |
| Inbound NAT Rules | 350 (per VM Scale Set) | 1,500 |
| Health Probes | 25 | 600 |
| Default SNAT ports per VM (without Outbound Rule) | Automatic | 1,024 (configurable) |
12. Integration and Automation
Load Balancer with VM Scale Sets
The most important Load Balancer integration is with VM Scale Sets: when the VMSS scales (adds VMs), they are automatically added to the Load Balancer backend pool, and when it scales down, they are removed.
# Create VMSS already integrated with Load Balancer
az vmss create \
--name vmss-web \
--resource-group rg-producao \
--image Ubuntu2204 \
--vm-sku Standard_D2s_v3 \
--instance-count 3 \
--vnet-name vnet-producao \
--subnet subnet-web \
--lb lb-web-public \
--backend-pool-name bp-vms-web \
--upgrade-policy-mode automatic
Integration with Azure Monitor and Autoscale
Combine Load Balancer with VMSS Autoscale based on traffic metrics.
13. Final Summary
Essential points:
- Azure Load Balancer operates at Layer 4 (TCP/UDP). For HTTP-aware balancing, use Application Gateway.
- A Load Balancer consists of: Frontend IP, Backend Pool, Health Probe, Load Balancing Rules and optionally Inbound NAT Rules and Outbound Rules.
- Public LB: frontend with public IP, receives internet traffic. Internal LB: frontend with private IP, receives traffic from within the VNet.
- Standard SKU is the only one recommended for new projects. Basic is being deprecated in September 2025.
Critical differences:
- Load Balancing Rule vs. Inbound NAT Rule: the first distributes traffic among all VMs in the pool; the second maps a frontend port to a specific backend VM.
- Session Persistence None (5-tuple hash): each connection is routed independently. Client IP (2-tuple hash): all connections from the same client go to the same VM.
- TCP Probe vs. HTTP/HTTPS Probe: TCP checks if the port is open; HTTP/HTTPS checks if the application responds with code 200, enabling more precise health checks.
- NSG must allow AzureLoadBalancer on probe ports. Without this, all VMs become unhealthy even when functioning.
What needs to be remembered:
- The probe originates from 168.63.129.16 (AzureLoadBalancer tag). NSGs that block this IP cause the most frequent error in LB configurations.
- VMs in the Standard Load Balancer backend pool don't need their own public IP. The LB manages access and outbound SNAT.
- Standard LB with zone-redundant frontend distributes traffic across zones automatically; cross-zone high availability requires VMs in multiple zones.
- SNAT port exhaustion is a problem in high-scale environments without configured Outbound Rules. Monitor the SNAT Connection Count metric.
- For administrative access to individual VMs, use Azure Bastion or NAT Rules; never expose RDP/SSH ports directly with public IPs on VMs.