Load Balancing Is Not Just Distributing Traffic. Here Is What It Really Does.
by Arif Ikhsanudin, Backend Developer
What Engineers Think Load Balancers Do
Ask most engineers what a load balancer does and you get: "It distributes traffic across multiple servers." That is accurate the way "a database stores data" is accurate — technically correct, operationally incomplete.
The mental model of a load balancer as a simple traffic splitter causes real problems. Teams configure round-robin across three instances and consider the problem solved. Then they are surprised when a backend instance dies and requests keep hitting it for 30 seconds or more. Or when sticky sessions cause one instance to handle 70% of the traffic. Or when TLS configuration at the load balancer does not match their security requirements. These are not edge cases; they are the operational reality of running a load balancer in production.
What Load Balancers Actually Do
Health checking and failure removal. A load balancer continuously checks backend health and removes failing instances from the pool. The critical configuration is the health check parameters: interval (how often to check), threshold (how many consecutive failures before removal), and timeout (how long to wait for a response). A health check with a 30-second interval and a threshold of three failures means a dead backend can keep receiving traffic for up to 90 seconds before removal. For a system handling 500 req/s, 45,000 requests arrive during that window; with three equally loaded instances, roughly a third of them (about 15,000) go to the dead backend and fail.
# ALB health check configuration
# Slow settings (illustrative values; check your load balancer's actual defaults):
HealthCheckIntervalSeconds: 30 # checks every 30s
HealthyThresholdCount: 3 # needs 3 consecutive successes to return to the pool
UnhealthyThresholdCount: 3 # needs 3 consecutive failures to remove
# Worst case: about 90 seconds of traffic to a dead backend
# Aggressive settings for faster failover:
HealthCheckIntervalSeconds: 10
UnhealthyThresholdCount: 2
# Worst case: 20 seconds -- much more acceptable
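The arithmetic behind those worst-case numbers can be sketched in a few lines of Python. This is a rough model, not how any particular load balancer computes it: it assumes detection time is simply interval times unhealthy threshold, that traffic is split evenly across backends, and the function names are made up for illustration.

```python
def worst_case_detection_seconds(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough upper bound on how long a dead backend stays in the pool."""
    return interval_s * unhealthy_threshold


def requests_in_window(total_rps: int, window_s: int) -> int:
    """All requests that reach the load balancer during the window."""
    return total_rps * window_s


def requests_to_dead_backend(total_rps: int, backend_count: int, window_s: int) -> float:
    """Share of those requests routed to one dead backend (even split assumed)."""
    return total_rps * window_s / backend_count


slow = worst_case_detection_seconds(interval_s=30, unhealthy_threshold=3)
fast = worst_case_detection_seconds(interval_s=10, unhealthy_threshold=2)

for label, window in (("slow", slow), ("fast", fast)):
    total = requests_in_window(500, window)
    failed = requests_to_dead_backend(500, 3, window)
    print(f"{label}: {window}s window, {total} requests total, ~{failed:.0f} to the dead backend")
```

With three backends at 500 req/s, the slower settings expose about 15,000 requests to the dead instance, and the faster settings cut that to roughly 3,300.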
TLS termination. The load balancer handles the TLS handshake with the client and forwards decrypted traffic to backends over the internal network. This offloads CPU-intensive cryptographic operations from application servers. It also centralizes certificate management: you renew the certificate in one place rather than on every instance. The tradeoff: traffic between the load balancer and backends is unencrypted unless you configure the load balancer to re-encrypt traffic to the backends. Mutual TLS, where the load balancer and backends also authenticate each other with certificates, is a stricter variant. Either option adds complexity.
Connection pooling and HTTP/2 multiplexing. Load balancers such as nginx (when upstream keepalive is configured) and AWS ALB can maintain persistent connections to backends. A client makes a request; the load balancer may use an existing connection to the backend rather than opening a new one. With HTTP/2, many client requests travel as streams over a single client connection, and a load balancer that supports it can carry them over fewer backend connections. This matters for high-concurrency workloads where connection setup overhead is non-trivial.
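To make the reuse idea concrete, here is a toy connection pool in Python. It is only a sketch: BackendConnection and ConnectionPool are hypothetical names, no real sockets are opened, and production load balancers handle concurrency, timeouts, and broken connections that this ignores.

```python
class BackendConnection:
    """Stand-in for a TCP connection to a backend (hypothetical, no real I/O)."""

    opened = 0  # class-wide count of connections ever opened

    def __init__(self) -> None:
        BackendConnection.opened += 1


class ConnectionPool:
    """Keeps idle connections around so later requests can reuse them."""

    def __init__(self, max_idle: int) -> None:
        self.max_idle = max_idle
        self.idle = []

    def acquire(self) -> BackendConnection:
        # Reuse an idle connection when one exists; otherwise open a new one.
        return self.idle.pop() if self.idle else BackendConnection()

    def release(self, conn: BackendConnection) -> None:
        # Keep the connection for reuse unless the idle list is already full.
        if len(self.idle) < self.max_idle:
            self.idle.append(conn)


pool = ConnectionPool(max_idle=4)
for _ in range(1000):  # 1000 sequential requests
    conn = pool.acquire()
    pool.release(conn)

print(BackendConnection.opened)  # 1: every request after the first reused the connection
```

Without the pool, the same loop would open 1000 connections, each paying the setup cost again.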
Session affinity (sticky sessions). The load balancer can route all requests from the same client to the same backend instance, typically based on a cookie. This is required when backends hold session state locally. The problem: it creates uneven load distribution. A client that makes 10x more requests than average disproportionately loads one backend. It also complicates failover: if the pinned backend dies, that client's session is lost. The better solution is stateless backends with centralized session storage (for example, Redis), which makes sticky sessions unnecessary.
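A small simulation shows how quickly affinity skews load. The numbers are invented for illustration: nine clients, one of them ten times heavier than the rest, pinned evenly across three hypothetical backends.

```python
from collections import Counter

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical instance names

# Requests per client in some time window; "client-a" is 10x heavier than the rest.
requests_per_client = {"client-a": 1000, **{f"client-{i}": 100 for i in range(1, 9)}}

# Cookie-style affinity: each client is pinned to one backend on first contact.
# Pins are handed out in rotation, so every backend gets exactly three clients.
pinned = {
    client: BACKENDS[i % len(BACKENDS)]
    for i, client in enumerate(requests_per_client)
}

load = Counter()
for client, count in requests_per_client.items():
    load[pinned[client]] += count

total = sum(load.values())
for backend in BACKENDS:
    print(backend, load[backend], f"{load[backend] / total:.0%}")
```

Each backend serves the same number of clients, yet app-1 ends up with 1,200 of the 1,800 requests (about 67%) because it happens to hold the heavy client.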
Balancing Algorithms That Matter
Round robin: requests cycle through backends in order. Simple, even distribution when all requests have similar cost. Fails when backends have different capacities or when request cost varies significantly.
Least connections: new requests go to the backend with the fewest active connections. Better for variable-cost requests — a long-running query on one backend does not disproportionately load that backend because new short requests route elsewhere. AWS ALB's "least outstanding requests" algorithm is a variant of this.
Weighted: backends receive traffic proportional to assigned weights. Used during deployments — gradually shift traffic from old to new instances by adjusting weights rather than a hard cutover.
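The three strategies can be sketched side by side in Python. These are simplified illustrations rather than any product's implementation: the backend names and connection counts are made up, and the weighted version uses a naive list expansion, whereas real load balancers usually interleave weighted picks more smoothly.

```python
import itertools


def round_robin(backends):
    """Cycle through backends in a fixed repeating order."""
    return itertools.cycle(backends)


def least_connections(active):
    """Pick the backend with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


def weighted_round_robin(weights):
    """Naive weighting: repeat each backend `weight` times, then cycle."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


# Round robin ignores how busy each backend is.
rr = round_robin(["app-1", "app-2", "app-3"])
rr_picks = [next(rr) for _ in range(5)]
print(rr_picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']

# Least connections steers new work away from the backend stuck on slow requests.
active = {"app-1": 12, "app-2": 3, "app-3": 7}
lc_pick = least_connections(active)
print(lc_pick)  # app-2

# Weighted: send 10% of traffic to the new version during a rollout.
wrr = weighted_round_robin({"old": 9, "new": 1})
wrr_picks = [next(wrr) for _ in range(100)]
print(wrr_picks.count("new"))  # 10
```

Shifting a deployment forward is then a matter of changing the weights, for example from 9:1 to 5:5 to 0:10, rather than switching all traffic at once.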
What This Changes About Your Design
If you design your system assuming the load balancer is a dumb traffic splitter, you will be surprised by health check lag, session distribution problems, and TLS configuration gaps. Design instead with the assumption that the load balancer is a configurable policy layer between clients and your application.
Configure health checks aggressively. Use least-connections for variable-cost workloads. Remove local session state from application servers. Configure appropriate connection draining (the time the load balancer gives in-flight requests to complete before removing an instance during a deployment). These are not advanced configurations — they are the baseline for a load balancer configuration that behaves correctly under real conditions.
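Connection draining is easier to reason about with a small model. The sketch below is an assumption-laden illustration, not a real load balancer API: DrainingBackend and its methods are hypothetical, and in a real system requests would finish on other threads or processes while drain() waits.

```python
import time


class DrainingBackend:
    """Tracks in-flight requests so an instance can be removed gracefully."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.in_flight = 0
        self.draining = False

    def try_start_request(self) -> bool:
        # Once draining begins, new requests are refused and must go elsewhere.
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def finish_request(self) -> None:
        self.in_flight -= 1

    def drain(self, timeout_s: float, poll_s: float = 0.01) -> bool:
        """Stop accepting requests, then wait up to timeout_s for in-flight work.

        Returns True if everything finished, False if the timeout expired.
        """
        self.draining = True
        deadline = time.monotonic() + timeout_s
        while self.in_flight > 0 and time.monotonic() < deadline:
            time.sleep(poll_s)
        return self.in_flight == 0


# Clean case: in-flight requests finish before the instance is removed.
clean = DrainingBackend("app-1")
clean.try_start_request()
clean.finish_request()
clean_result = clean.drain(timeout_s=0.05)
accepted_after_drain = clean.try_start_request()

# Timeout case: a request is still running when the drain window closes.
stuck = DrainingBackend("app-2")
stuck.try_start_request()
stuck_result = stuck.drain(timeout_s=0.05)

print(clean_result, accepted_after_drain, stuck_result)  # True False False
```

The drain timeout is the knob to tune: too short and long requests are cut off mid-flight during deployments, too long and every rollout waits on the slowest request.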