Theoretical Foundation: Configure an internal or public load balancer

1. Initial Intuition

Imagine a popular restaurant with a single cashier. During peak hours, the line grows, the cashier becomes overwhelmed, and customers wait too long. The obvious solution is to open more cash registers. But then a new problem arises: how do you get customers to choose the registers in a balanced way? You place a host at the entrance who directs the next customers to the register with the shortest line.

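That host is the Load Balancer. It receives all incoming connections and distributes them among a set of servers (the registers), so that no server is overwhelmed while others sit idle. One caveat to the analogy: Azure Load Balancer does not look at queue length; it picks a server by hashing connection details, as section 5 describes.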

The Azure Load Balancer operates at layer 4 (transport) of the OSI model: it works with IP addresses and TCP/UDP ports and never inspects the content of requests. This makes it extremely fast and efficient, but it has no awareness of HTTP, cookies, or URLs. For content-aware HTTP load balancing, the appropriate service is Application Gateway.

2. Context

The Azure Load Balancer is one of the pillars of high availability in Azure. It works together with other components covered in this document: virtual networks, backend VMs or VM Scale Sets, public IPs, and NSGs.

Two SKUs matter for this scenario: Basic (legacy) and Standard (recommended). As with public IPs, Standard is the only SKU that should be used in new projects: Microsoft set the retirement of the Basic SKU for September 30, 2025.

3. Building the Concepts

3.1 The Five Components of a Load Balancer

An Azure Load Balancer is composed of five elements that need to be configured in logical order:

1. Frontend IP Configuration

The IP that clients use to connect to the Load Balancer. For a public LB, it's a public IP. For an internal LB, it's a private IP from the VNet.

A single Load Balancer can have multiple frontend configurations (multiple IPs), useful for hosting multiple services on the same LB.

2. Backend Pool

The set of resources that receive traffic. Can contain individual VMs (via NIC) or VM Scale Sets. VMs in the pool need to be in the same VNet as the Load Balancer.

In the Standard SKU, the backend pool can contain VMs by NIC (more specific) or by IP + VNet combination.

3. Health Probe

The mechanism that the Load Balancer uses to verify if each VM in the backend is healthy and available to receive traffic.

Probe types:

Type	Protocol	What it checks
TCP	TCP	If the TCP port is accepting connections
HTTP	HTTP	If the HTTP response is code 200
HTTPS	HTTPS	If the HTTPS response is code 200

The TCP probe is the simplest: it tests whether the port is open. The HTTP/HTTPS probe is more accurate: it can call a specific application endpoint (for example, /healthcheck) that validates whether the application is actually functional, not just whether the port is open.

If a VM fails the probe a configurable number of consecutive times (default: 2), it's removed from the distribution pool. When the probe succeeds again, the VM is automatically added back.
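The removal and re-add behavior described above can be sketched as a small state machine. This is a simplified illustration, not Azure's implementation: the class name `ProbeTracker` and its fields are hypothetical, and it assumes a single successful probe is enough to put the VM back in rotation.

```python
from dataclasses import dataclass


@dataclass
class ProbeTracker:
    """Tracks one backend instance's health from a stream of probe results."""
    unhealthy_threshold: int = 2   # consecutive failures before removal (assumed)
    healthy: bool = True
    consecutive_failures: int = 0

    def record(self, probe_succeeded: bool) -> bool:
        """Record one probe result and return whether the instance is in rotation."""
        if probe_succeeded:
            # A successful probe resets the failure streak and restores the instance.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


tracker = ProbeTracker(unhealthy_threshold=2)
history = [tracker.record(ok) for ok in (True, False, False, True)]
print(history)  # [True, True, False, True]
```

A single failed probe leaves the VM in rotation; only the second consecutive failure removes it, and the next success brings it back.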

4. Load Balancing Rule

Defines the mapping between the frontend and backend: "when traffic arrives at the public IP on port 80, distribute it among the VMs in the backend pool on port 80".

Components of a rule:

Field	Description	Example
Frontend IP	Which frontend IP receives traffic	Load Balancer's public IP
Protocol	TCP or UDP	TCP
Frontend port	Port on the frontend	80
Backend port	Port on backend VMs	80 (can be different)
Backend pool	Which pool receives traffic	pool-vms-web
Health probe	Which probe monitors VMs	probe-http-80
Session persistence	If connections from the same client go to the same VM	None / Client IP / Client IP and Protocol

Port mapping is powerful: you can receive on port 443 on the frontend and redirect to port 8443 on the backend, without changing VM configuration.

Session Persistence (also called sticky sessions or session affinity):

Option	Behavior
None (default)	Each connection is distributed independently (5-tuple hash)
Client IP	Connections from the same client IP always go to the same VM (2-tuple hash)
Client IP and Protocol	Connections from the same IP+protocol go to the same VM (3-tuple hash)

5. Inbound NAT Rules

Unlike load balancing rules that distribute to multiple VMs, NAT rules map a specific frontend port to a port on a specific backend VM. They're used for direct access to a specific VM.

Example: port 50001 on LB → RDP on vm-01; port 50002 → RDP on vm-02. This allows accessing VMs individually via RDP without exposing each VM with its own public IP.
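The one-port-per-VM mapping in this example can be pictured as a lookup table. The sketch below is only a mental model; the helper names `build_nat_rules` and `resolve` are hypothetical and are not part of any Azure SDK.

```python
def build_nat_rules(vm_names, first_frontend_port=50001, backend_port=3389):
    """Return {frontend_port: (vm_name, backend_port)} with one port per VM."""
    return {
        first_frontend_port + offset: (vm, backend_port)
        for offset, vm in enumerate(vm_names)
    }


nat_rules = build_nat_rules(["vm-01", "vm-02"])


def resolve(frontend_port):
    """Look up the single VM behind a frontend port, or None if no rule exists."""
    return nat_rules.get(frontend_port)


print(resolve(50001))  # ('vm-01', 3389)
print(resolve(50002))  # ('vm-02', 3389)
```

Unlike a load balancing rule, each frontend port resolves to exactly one VM, and a port with no rule resolves to nothing.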

3.2 Public vs. Internal Load Balancer

Characteristic	Public Load Balancer	Internal Load Balancer (ILB)
Frontend IP	Public IP	Private IP from VNet
Who accesses	Internet (any source)	Resources within the VNet or connected networks
Use cases	Web tier, public APIs	Application tier, databases, internal services
NSG required	Yes, to control frontend access	Yes, to control backend access

The ILB is fundamental in multi-tier architectures: the application tier (backend) is balanced by an ILB, accessible only from the web tier. The web tier is balanced by a public LB.

3.3 Outbound Rules: Internet Exit

The Standard Load Balancer also manages outbound traffic to the internet from VMs in the backend pool, through Outbound Rules and the SNAT (Source Network Address Translation) mechanism.

When a VM without its own public IP needs to make an outbound connection to the internet (e.g., download updates, call an external API), the Load Balancer translates the VM's private IP to one of the frontend's public IPs.

Outbound Rules control how many SNAT ports are allocated per VM and which frontend IP is used for outbound traffic.
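To size an Outbound Rule, it helps to estimate how many SNAT ports each VM can receive. The sketch below assumes two figures that are not stated in this document: that one public frontend IP offers 64,000 SNAT ports, and that per-VM allocations are made in multiples of 8. The function name `snat_ports_per_vm` is hypothetical.

```python
PORTS_PER_FRONTEND_IP = 64_000  # assumed: SNAT ports offered by one public IP
PORT_BLOCK = 8                  # assumed: allocations are made in multiples of 8


def snat_ports_per_vm(frontend_ips: int, backend_vms: int) -> int:
    """Largest per-VM allocation (a multiple of 8) that fits the available ports."""
    if frontend_ips < 1 or backend_vms < 1:
        raise ValueError("need at least one frontend IP and one backend VM")
    raw = (frontend_ips * PORTS_PER_FRONTEND_IP) // backend_vms
    return (raw // PORT_BLOCK) * PORT_BLOCK


print(snat_ports_per_vm(1, 10))  # 6400
print(snat_ports_per_vm(1, 3))   # 21328
print(snat_ports_per_vm(2, 50))  # 2560
```

Under these assumptions, growing the backend pool shrinks the budget of every VM, which is why adding frontend IPs or moving outbound traffic elsewhere becomes necessary at scale.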

4. Structural View

5. How It Works in Practice

How the Distribution Algorithm Works

By default (Session Persistence: None), the Load Balancer uses a 5-tuple hash to determine which VM receives each flow:

Source IP
Source port
Destination IP
Destination port
Protocol

The hash result is deterministic: the same set of 5 values will always map to the same VM (while the VM is healthy in the pool). This means that an existing TCP session continues on the same VM throughout its duration, even without explicit session persistence.
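The following sketch illustrates hash-based distribution and how the session persistence modes change which fields feed the hash. It is a toy model: SHA-256 stands in for Azure's internal hash function, which this document does not specify, and the names `Flow`, `HASH_FIELDS` and `pick_backend` are hypothetical.

```python
import hashlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


# Which fields feed the hash for each session persistence mode.
HASH_FIELDS = {
    "None": ("src_ip", "src_port", "dst_ip", "dst_port", "protocol"),  # 5-tuple
    "ClientIP": ("src_ip", "dst_ip"),                                   # 2-tuple
    "ClientIPProtocol": ("src_ip", "dst_ip", "protocol"),               # 3-tuple
}


def pick_backend(flow: Flow, healthy_vms: list[str], mode: str = "None") -> str:
    """Map a flow to one healthy VM using a deterministic hash of selected fields."""
    key = "|".join(str(getattr(flow, name)) for name in HASH_FIELDS[mode])
    digest = hashlib.sha256(key.encode()).digest()
    return healthy_vms[int.from_bytes(digest[:8], "big") % len(healthy_vms)]


vms = ["vm-01", "vm-02", "vm-03"]
first = Flow("203.0.113.7", 50123, "198.51.100.10", 443, "TCP")
second = first._replace(src_port=50124)  # same client, new connection

# The same 5-tuple always lands on the same VM.
print(pick_backend(first, vms) == pick_backend(first, vms))  # True
# With Client IP persistence the source port is ignored, so both connections match.
print(pick_backend(first, vms, "ClientIP") == pick_backend(second, vms, "ClientIP"))  # True
```

With the default mode, the second connection may or may not land on the same VM, because its source port differs and therefore its hash input differs.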

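What changes when a VM fails its health probe: new connections are distributed only among the remaining healthy VMs. On a Standard Load Balancer, TCP connections already established to the failed VM are not cut by the Load Balancer; they continue until the client or the VM closes them or they reach the idle timeout.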

Health Probe: Critical Behaviors

Critical security point: the Load Balancer probe originates from address 168.63.129.16, the same Azure infrastructure IP used by Azure DNS. The NSGs that apply to backend VMs must allow traffic from this address on the probe ports. The default NSG rule AllowAzureLoadBalancerInBound already does so, but a custom deny rule with a higher priority (lower number) overrides it; in that case an explicit allow rule is required, otherwise VMs are marked as unhealthy even when they are working.

The service tag AzureLoadBalancer in NSGs represents this probe IP and should be used instead of the literal IP for greater robustness.
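Why a probe can be blocked is easier to see with a toy model of NSG evaluation, where the first matching rule in priority order decides. The sketch below is a simplification that matches only on source tag and port; the names `NsgRule` and `is_allowed` are hypothetical, and the two default rules (priorities 65001 and 65500) are modeled as assumptions about the built-in NSG defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NsgRule:
    priority: int          # lower number is evaluated first
    source: str            # service tag, or "*" for any source
    port: Optional[int]    # None matches any port
    allow: bool


def is_allowed(rules, source, port):
    """Return the decision of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in (source, "*") and rule.port in (port, None):
            return rule.allow
    return False  # nothing matched: treat as denied


# Assumed defaults: allow the load balancer tag, then deny everything else.
default_rules = [
    NsgRule(65001, "AzureLoadBalancer", None, True),
    NsgRule(65500, "*", None, False),
]
custom_deny = NsgRule(200, "*", None, False)              # a broad lockdown rule
probe_fix = NsgRule(100, "AzureLoadBalancer", 443, True)  # explicit probe allow

print(is_allowed(default_rules, "AzureLoadBalancer", 443))                             # True
print(is_allowed(default_rules + [custom_deny], "AzureLoadBalancer", 443))             # False
print(is_allowed(default_rules + [custom_deny, probe_fix], "AzureLoadBalancer", 443))  # True
```

The middle case is the common failure: a broad deny rule is evaluated before the default allow, so probes never reach the VM until an explicit allow with a lower priority number is added.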

Floating IP (Direct Server Return)

The Standard Load Balancer supports Floating IP (also called Direct Server Return): when enabled, the destination IP in the packet that reaches the VM is the Load Balancer frontend IP, not the VM's private IP. The VM needs to have the frontend IP configured on a loopback or second interface.

This is required for high-availability scenarios such as SQL Server Always On availability group listeners and for some NVAs. For most web scenarios, it's not necessary.

6. Implementation Methods

6.1 Azure Portal

When to use: initial creation, configuration exploration, visual diagnostics.

Path: Create a resource > Networking > Load Balancer

The portal wizard guides through creation in logical order: Basics (name, SKU, type) → Frontend IP → Backend Pools → Inbound Rules (Load Balancing Rules and Health Probes) → Outbound Rules.

Portal advantage: visually shows relationships between components and validates conflicts in real time.

6.2 Azure CLI

Create Standard public Load Balancer:

# 1. Create public IP for frontend
az network public-ip create \
  --name pip-lb-web \
  --resource-group rg-networking \
  --sku Standard \
  --allocation-method Static \
  --zone 1 2 3

# 2. Create Load Balancer
az network lb create \
  --name lb-web-public \
  --resource-group rg-networking \
  --sku Standard \
  --frontend-ip-name fe-web \
  --public-ip-address pip-lb-web \
  --backend-pool-name bp-vms-web

# 3. Create HTTPS Health Probe
az network lb probe create \
  --lb-name lb-web-public \
  --resource-group rg-networking \
  --name probe-https-443 \
  --protocol Https \
  --port 443 \
  --path "/health" \
  --interval 5 \
  --threshold 2

# 4. Create Load Balancing Rule
az network lb rule create \
  --lb-name lb-web-public \
  --resource-group rg-networking \
  --name rule-https-443 \
  --frontend-ip-name fe-web \
  --frontend-port 443 \
  --backend-pool-name bp-vms-web \
  --backend-port 443 \
  --protocol Tcp \
  --probe-name probe-https-443 \
  --idle-timeout 15 \
  --load-distribution Default

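# 5. Add VMs to backend pool
# The NICs are in rg-producao and the Load Balancer is in rg-networking,
# so reference the pool by its full resource ID instead of by name.
POOL_ID=$(az network lb address-pool show \
  --lb-name lb-web-public \
  --resource-group rg-networking \
  --name bp-vms-web \
  --query id \
  --output tsv)

az network nic ip-config address-pool add \
  --address-pool "$POOL_ID" \
  --ip-config-name ipconfig1 \
  --nic-name nic-vm-web-01 \
  --resource-group rg-producao

az network nic ip-config address-pool add \
  --address-pool "$POOL_ID" \
  --ip-config-name ipconfig1 \
  --nic-name nic-vm-web-02 \
  --resource-group rg-producao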

Create Internal Load Balancer:

az network lb create \
  --name lb-app-internal \
  --resource-group rg-networking \
  --sku Standard \
  --frontend-ip-name fe-app \
  --private-ip-address 10.0.2.100 \
  --vnet-name vnet-producao \
  --subnet subnet-application \
  --backend-pool-name bp-vms-app

Create Inbound NAT Rule for RDP access to specific VM:

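# Create the rule on the Load Balancer (frontend port 50001 -> backend port 3389).
# The rule only takes effect once it is associated with the target VM's NIC IP
# configuration, for example with `az network nic ip-config inbound-nat-rule add`.
az network lb inbound-nat-rule create \
  --lb-name lb-web-public \
  --resource-group rg-networking \
  --name nat-rdp-vm01 \
  --frontend-ip-name fe-web \
  --protocol Tcp \
  --frontend-port 50001 \
  --backend-port 3389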

Create Outbound Rule:

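# If a load balancing rule uses the same frontend IP, that rule typically needs
# --disable-outbound-snat true so that the outbound rule controls SNAT.
az network lb outbound-rule create \
  --lb-name lb-web-public \
  --resource-group rg-networking \
  --name outbound-rule-internet \
  --frontend-ip-configs fe-web \
  --address-pool bp-vms-web \
  --protocol All \
  --outbound-ports 1024 \
  --idle-timeout 15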

6.3 PowerShell

# Public frontend IP
$pip = Get-AzPublicIpAddress -Name "pip-lb-web" -ResourceGroupName "rg-networking"
$feIp = New-AzLoadBalancerFrontendIpConfig -Name "fe-web" -PublicIpAddress $pip

# Backend pool
$backendPool = New-AzLoadBalancerBackendAddressPoolConfig -Name "bp-vms-web"

# Health probe
$probe = New-AzLoadBalancerProbeConfig `
  -Name "probe-https-443" `
  -Protocol Https `
  -Port 443 `
  -RequestPath "/health" `
  -IntervalInSeconds 5 `
  -ProbeCount 2

# Load balancing rule
$rule = New-AzLoadBalancerRuleConfig `
  -Name "rule-https-443" `
  -FrontendIPConfiguration $feIp `
  -BackendAddressPool $backendPool `
  -Probe $probe `
  -Protocol Tcp `
  -FrontendPort 443 `
  -BackendPort 443 `
  -IdleTimeoutInMinutes 15 `
  -LoadDistribution Default

# Create Load Balancer
$lb = New-AzLoadBalancer `
  -Name "lb-web-public" `
  -ResourceGroupName "rg-networking" `
  -Location "brazilsouth" `
  -Sku "Standard" `
  -FrontendIpConfiguration $feIp `
  -BackendAddressPool $backendPool `
  -Probe $probe `
  -LoadBalancingRule $rule

6.4 Bicep

resource loadBalancer 'Microsoft.Network/loadBalancers@2023-05-01' = {
  name: 'lb-web-public'
  location: location
  sku: {
    name: 'Standard'
    tier: 'Regional'
  }
  properties: {
    frontendIPConfigurations: [
      {
        name: 'fe-web'
        properties: {
          publicIPAddress: {
            id: publicIp.id
          }
        }
      }
    ]
    backendAddressPools: [
      {
        name: 'bp-vms-web'
      }
    ]
    probes: [
      {
        name: 'probe-https-443'
        properties: {
          protocol: 'Https'
          port: 443
          requestPath: '/health'
          intervalInSeconds: 5
          numberOfProbes: 2
        }
      }
    ]
    loadBalancingRules: [
      {
        name: 'rule-https-443'
        properties: {
          frontendIPConfiguration: {
            id: resourceId('Microsoft.Network/loadBalancers/frontendIPConfigurations', 'lb-web-public', 'fe-web')
          }
          backendAddressPool: {
            id: resourceId('Microsoft.Network/loadBalancers/backendAddressPools', 'lb-web-public', 'bp-vms-web')
          }
          probe: {
            id: resourceId('Microsoft.Network/loadBalancers/probes', 'lb-web-public', 'probe-https-443')
          }
          protocol: 'Tcp'
          frontendPort: 443
          backendPort: 443
          idleTimeoutInMinutes: 15
          loadDistribution: 'Default'
        }
      }
    ]
  }
}

7. Control and Security

NSG and Load Balancer

For a Standard public Load Balancer to function, the NSG associated with backend pool VMs must have rules that:

Allow traffic from the Load Balancer IP (AzureLoadBalancer tag) to probe ports
Allow traffic on business ports (80, 443, etc.) to VMs

An NSG that blocks AzureLoadBalancer will cause probes to fail, and VMs will be marked as unhealthy, even when they are functional. This is one of the most common errors.

# NSG rule to allow health probes
az network nsg rule create \
  --nsg-name nsg-subnet-web \
  --resource-group rg-networking \
  --name allow-lb-probe \
  --priority 100 \
  --source-address-prefixes AzureLoadBalancer \
  --destination-port-ranges 443 \
  --protocol Tcp \
  --access Allow

Standard Load Balancer and Availability Zones

A Standard Load Balancer with a zone-redundant frontend IP keeps accepting traffic if a single zone fails, and it distributes that traffic among backend VMs placed in different Availability Zones, so a zone failure doesn't bring down the service.

8. Decision Making

Load Balancer vs. Application Gateway: which to choose?

Criteria	Azure Load Balancer	Application Gateway
OSI Layer	L4 (TCP/UDP)	L7 (HTTP/HTTPS)
HTTP content awareness	No	Yes (URL, headers, cookies)
SSL/TLS Termination	No	Yes
URL-based routing	No	Yes (`/api/*` → backend-api)
WAF (Web Application Firewall)	No	Yes
Cookie-based session affinity	No	Yes
Protocols	TCP, UDP	HTTP, HTTPS, HTTP/2, WebSocket
Performance	Microsecond latency	Milliseconds (more processing)
Cost	Lower	Higher
Typical use case	Database VMs, gaming, non-HTTP layer	REST APIs, websites, HTTP microservices

Public vs. internal Load Balancer?

Scenario	Type	Reason
Internet-accessible website	Public	External frontend
Internal API between microservices	Internal	No external exposure
Database layer in multi-tier	Internal	Access only from application layer
RDP/SSH access to individual VMs	No Load Balancer (Azure Bastion) or Inbound NAT Rules	Controlled administrative access
UDP streaming service	Public	UDP protocol supported by L4 LB

9. Best Practices

Use the /health or /healthcheck endpoint in applications for HTTP/HTTPS probes: a TCP probe only checks if the port is open. An application with port 443 open but returning error 500 on all requests will remain in the pool. An HTTP probe against /health can verify database connection, queues, and other internal dependencies, removing the VM from the pool if it's not actually functional.

Configure Availability Zones for production: use VMs distributed across zones and a Load Balancer with zone-redundant frontend. This ensures that a zone failure (hardware, power, network in a datacenter) doesn't bring down the service.

Avoid NAT rules for administrative access in production: use Azure Bastion instead of NAT rules for RDP/SSH. NAT rules expose ports to the internet and require manual mapping management. Bastion is more secure and centralized.

Configure Outbound Rules explicitly for SNAT control: instead of relying on automatic SNAT (which can exhaust ports at high scale), create explicit Outbound Rules with port numbers calculated for the expected volume of outbound connections.

Create a separate probe for each type of verification: if you have both HTTP on port 80 and HTTPS on port 443, create separate probes and link each rule to the appropriate probe. Shared probes between rules for different ports can give false positives.
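The /health endpoint recommended in the first practice can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: `check_dependencies` is a hypothetical placeholder for real database and queue checks, and the handler answers plain HTTP, so it matches an HTTP probe rather than the HTTPS probe used in the earlier examples.

```python
from http.server import BaseHTTPRequestHandler


def check_dependencies() -> dict[str, bool]:
    """Hypothetical dependency checks; replace with real database/queue pings."""
    return {"database": True, "queue": True}


def health_response(checks: dict[str, bool]) -> tuple[int, bytes]:
    """Return 200 only when every dependency is healthy, otherwise 503."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    if failed:
        return 503, ("unhealthy: " + ", ".join(failed)).encode()
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /health for the load balancer probe."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(check_dependencies())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To try it locally (port 8080 is an arbitrary choice):
#   from http.server import HTTPServer
#   HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Because the probe treats only HTTP 200 as healthy, returning 503 when a dependency is down is enough to take the VM out of rotation without stopping the process.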

10. Common Errors

NSG blocking Load Balancer probe

VMs are added to the backend pool, but the Load Balancer marks them as unhealthy. The administrator verifies that the service on the VMs is working (tests directly via private IP). The problem is that the subnet NSG blocks traffic from AzureLoadBalancer. The probes never arrive, the LB considers the VMs offline, and no traffic is distributed. Adding the NSG rule for AzureLoadBalancer resolves immediately.

Using Basic Load Balancer SKU with Standard IP (or vice versa)

The Load Balancer SKU and the SKU of its public IP must match. Trying to create a Basic LB with a Standard IP (or the reverse) generates an error. Migration from Basic to Standard requires recreating the Load Balancer as a Standard resource.

Creating probe on application port but forgetting to open that port in NSGs

The probe is configured for HTTPS 443, but the NSG only allows 443 traffic from the internet, not from the probe address (AzureLoadBalancer). VMs become unhealthy for the same reason as the previous error.

Not configuring Outbound Rules and exhausting SNAT ports

With many VMs making many simultaneous outbound connections to the internet, the automatic SNAT ports are exhausted. Outbound connections start failing with timeout. The solution is to create explicit Outbound Rules or use a NAT Gateway (recommended for high-scale outbound scenarios).

Using Session Persistence unnecessarily

For stateless applications (that don't depend on session on the same VM), enabling Session Persistence reduces balancing efficiency: if a client makes many requests, they all go to the same VM while others remain idle. Use Session Persistence only when the application actually requires it.

11. Operations and Maintenance

Monitor Load Balancer Health

# Check VM status in backend pool
az network lb address-pool show \
  --lb-name lb-web-public \
  --name bp-vms-web \
  --resource-group rg-networking \
  --query "loadBalancerBackendAddresses"

Available metrics in Azure Monitor for Load Balancer:

Metric	What it measures
Data Path Availability	Whether the data path to the frontend is available, as measured by Azure continuously testing that path
Health Probe Status	Percentage of healthy VMs in backend pool
Byte Count	Bytes processed by LB
Packet Count	Packets processed
SYN Count	SYN packets received
SNAT Connection Count	Active and failed SNAT connections

SNAT Connection Count is especially important: when Failed SNAT Connections starts increasing, it indicates SNAT port exhaustion and the need for more ports via Outbound Rules or NAT Gateway.

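Check Effective NSG Rules on a NIC

The command below lists the NSG rules that actually apply to the NIC, which helps confirm that probe traffic from AzureLoadBalancer is allowed: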

az network nic list-effective-nsg \
  --name nic-vm-web-01 \
  --resource-group rg-producao

Important Limits

Item	Basic Limit	Standard Limit
VMs per backend pool	300	1,000
Frontend IPs per LB	200	600
Load Balancing Rules	250	1,500
Inbound NAT Rules	350 (per VM Scale Set)	1,500
Health Probes	25	600
Default SNAT ports per VM (without Outbound Rule)	Automatic	1,024 (configurable)

12. Integration and Automation

Load Balancer with VM Scale Sets

The most important Load Balancer integration is with VM Scale Sets: when the VMSS scales (adds VMs), they are automatically added to the Load Balancer backend pool, and when it scales down, they are removed.

# Create VMSS already integrated with Load Balancer
az vmss create \
  --name vmss-web \
  --resource-group rg-producao \
  --image Ubuntu2204 \
  --vm-sku Standard_D2s_v3 \
  --instance-count 3 \
  --vnet-name vnet-producao \
  --subnet subnet-web \
  --lb lb-web-public \
  --backend-pool-name bp-vms-web \
  --upgrade-policy-mode automatic

Integration with Azure Monitor and Autoscale

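Combine the Load Balancer with VMSS Autoscale based on traffic metrics: as load rises, autoscale adds instances, and the Load Balancer starts sending traffic to them once they pass the health probe.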

13. Final Summary

Essential points:

Azure Load Balancer operates at Layer 4 (TCP/UDP). For HTTP-aware balancing, use Application Gateway.
A Load Balancer consists of: Frontend IP, Backend Pool, Health Probe, Load Balancing Rules and optionally Inbound NAT Rules and Outbound Rules.
Public LB: frontend with public IP, receives internet traffic. Internal LB: frontend with private IP, receives traffic from within the VNet.
Standard SKU is the only one recommended for new projects. The Basic SKU's retirement date is September 30, 2025.

Critical differences:

Load Balancing Rule vs. Inbound NAT Rule: the first distributes traffic among all VMs in the pool; the second maps a frontend port to a specific backend VM.
Session Persistence None (5-tuple hash): each connection is routed independently. Client IP (2-tuple hash): all connections from the same client go to the same VM.
TCP Probe vs. HTTP/HTTPS Probe: TCP checks if the port is open; HTTP/HTTPS checks if the application responds with code 200, enabling more precise health checks.
NSG must allow AzureLoadBalancer on probe ports. Without this, all VMs become unhealthy even when functioning.

What needs to be remembered:

The probe originates from 168.63.129.16 (AzureLoadBalancer tag). NSGs that block this IP cause the most frequent error in LB configurations.
VMs in the Standard Load Balancer backend pool don't need their own public IP. The LB manages access and outbound SNAT.
Standard LB with zone-redundant frontend distributes traffic across zones automatically; cross-zone high availability requires VMs in multiple zones.
SNAT port exhaustion is a problem in high-scale environments without configured Outbound Rules. Monitor the SNAT Connection Count metric.
For administrative access to individual VMs, prefer Azure Bastion, with Inbound NAT Rules as a fallback; never expose RDP/SSH ports directly through public IPs on the VMs.
