Load Balancer: What Is It and How Does It Work?

What is a Load Balancer?

A load balancer is a physical or virtual appliance that acts as a proxy and distributes network traffic across several servers. Load balancers are used to increase the capacity (number of concurrent users) and reliability of applications. They improve the performance of shared applications and desktops by spreading network traffic across multiple servers, as well as by distributing application-specific tasks to individual servers.

Layer 4 vs Layer 7 Load Balancers

There are seven networking layers in the Open Systems Interconnection (OSI) model. Layer 4 load balancers (L4 LBs or TCP/UDP LBs) operate at the intermediate transport layer, whereas the Layer 7 load balancers (L7 LBs or HTTP LBs) operate at the highest level—the application layer.

Layer 4

Layer 4 load balancers do not have access to the content of the data packets. They deliver requests based on the source and destination IP addresses and ports specified in the first few packets of the TCP stream. They simply act as an intermediary that forwards packets between the source and the destination. Their input and output are mostly the same, which means that they do not manipulate the packet content.
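As a rough illustration of a decision made from connection metadata alone, here is a minimal Python sketch. It assumes the balancer hashes the source and destination addresses and ports to pick a backend. That is one common approach, not the only one, and the function name and addresses are placeholders.

```python
import hashlib

# Hypothetical backend pool; the addresses are placeholders.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]


def pick_backend_l4(src_ip, src_port, dst_ip, dst_port, backends=BACKENDS):
    """Choose a backend from connection metadata only (no payload inspection)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    # A stable hash keeps every packet of one connection on the same backend.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


first = pick_backend_l4("203.0.113.7", 51234, "198.51.100.1", 443)
again = pick_backend_l4("203.0.113.7", 51234, "198.51.100.1", 443)
print(first == again)  # True: the same connection tuple maps to the same backend
```

Nothing in the function reads the request body or headers, which is the defining limit of a Layer 4 decision.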

Layer 7

Operating at the application layer, Layer 7 load balancers have access to the actual content of the data packets. They terminate the TCP connection from the original source and establish a new connection with a backend server selected based on the packet content. For instance, L7 LBs can identify video streaming requests and forward them to specific servers that have high GPU capacity.

Unlike L4 LBs, L7 LBs can make intelligent load balancing or traffic distribution decisions. They can also be more expensive than L4 LBs in terms of computing power and latency. However, that’s rarely an issue with modern, powerful servers.
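Content-based routing of this kind can be sketched in a few lines of Python. The pool names, the /video/ path prefix, and the function name are assumptions made for the example. A real L7 balancer would typically read such rules from its configuration.

```python
# Hypothetical pools; names and path prefixes are placeholders.
POOLS = {
    "video": ["gpu-1", "gpu-2"],
    "default": ["web-1", "web-2", "web-3"],
}


def choose_pool_l7(path, headers):
    """Pick a pool from request content, which only an L7 balancer can see."""
    accept = headers.get("Accept", "")
    if path.startswith("/video/") or accept.startswith("video/"):
        return "video"
    return "default"


print(choose_pool_l7("/video/intro.mp4", {}))                  # video
print(choose_pool_l7("/index.html", {"Accept": "text/html"}))  # default
```

Once a pool is chosen, any of the balancing algorithms described later in this article can pick the individual server within it.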

Hardware vs Software-Based Load Balancers

There are two primary flavors of load balancers: Hardware Load Balancers (HLBs) and Software Load Balancers (SLBs).

Hardware

Hardware Load Balancers, also called Hardware Load Balancing Devices (HLDs), are rack-and-stack hardware appliances. These appliances are built from customized Application-Specific Integrated Circuits (ASICs) and Field-Programmable Gate Arrays (FPGAs) to help distribute traffic across multiple servers on a network.

HLBs depend entirely on specialized software (firmware) to deliver web traffic across a cluster of servers. One distinct advantage of HLBs over SLBs is that they have a minimal effect on CPU performance. Software Load Balancers (SLBs), on the other hand, are programs that you can install on any standard x86/64-bit server or virtual machine (VM) to help distribute traffic across many servers. Since SLBs run alongside other programs, they may have a significant impact on CPU performance.

Organizations typically provision HLBs in pairs to manage occasional peak traffic workloads. This ensures high availability if one HLB fails. While pairing HLBs achieves high availability, it may also leave capacity unused, since some balancers may sit idle during off-peak times.

Software

SLBs, in contrast, can scale elastically to meet growing demand. Since SLBs are software-defined, you can simply auto-scale them in real time during peak periods to achieve high availability. As such, an organization can eliminate the over-provisioning costs that are prevalent with HLBs.

Since virtually all HLBs come with over-provisioning requirements, an organization must have specialized personnel to configure and maintain the devices. SLBs, on the other hand, are simple to configure and manage because they are software-defined.

Load Balancer Algorithms

There are various techniques and algorithms that can be used to load balance client access requests across server pools.

Round-robin

Round-robin network load balancing rotates connection requests between servers in the order that requests are received. For example, if there are three servers: Server A, Server B, and Server C:

The first request goes to Server A
The second request goes to Server B
The third request goes to Server C

The fourth request goes back to Server A, and the cycle repeats. The round-robin load balancer keeps passing requests to servers in this order so that the load is distributed evenly when traffic is high.
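The rotation can be sketched in Python with the standard library's itertools.cycle. The variable names are illustrative only.

```python
from itertools import cycle

servers = ["Server A", "Server B", "Server C"]
rotation = cycle(servers)  # endless A, B, C, A, B, C, ...

# Assign five incoming requests in arrival order.
assignments = [next(rotation) for _ in range(5)]
print(assignments)
# ['Server A', 'Server B', 'Server C', 'Server A', 'Server B']
```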

Weighted round-robin

The weighted round-robin load-balancing algorithm assigns a weight to each server based on criteria such as hardware limitations or traffic-handling capacity. The higher a server’s weight, the larger the share of requests it receives. For example, if there are three servers: Server A, Server B, and Server C:

Server A accepts 20 requests per second
Server B accepts 10 requests per second
Server C accepts 15 requests per second

If the weighted round-robin load balancer receives 6 requests, it directs them in this sequence:

A, C, B, A, C, A, and the result would be:

3 requests go to Server A
1 request goes to Server B
2 requests go to Server C

In this manner, the weighted round-robin algorithm distributes the load according to each server’s capacity.
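The following Python sketch shows one way to produce such a distribution. It uses a variant often called smooth weighted round-robin and the three example capacities as weights. The function name is illustrative, and other variants may interleave the servers in a different order while giving the same per-server totals.

```python
from collections import Counter


def smooth_weighted_round_robin(weights, n):
    """Return the first n picks for a {server: weight} mapping."""
    total = sum(weights.values())
    current = {server: 0 for server in weights}
    picks = []
    for _ in range(n):
        # Every server earns credit in proportion to its weight.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the most credit is chosen and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


weights = {"A": 20, "B": 10, "C": 15}
picks = smooth_weighted_round_robin(weights, 6)
print(picks)           # ['A', 'C', 'B', 'A', 'C', 'A']
print(Counter(picks))  # Counter({'A': 3, 'C': 2, 'B': 1})
```

Over a full cycle of 9 requests (the weights reduce to 4 : 2 : 3), each server receives exactly its weighted share.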

Least Connections

Least Connections load balancers send each request to the server with the fewest active connections, which minimizes the chance of server overload, provided that the servers have equal specifications. The idea is to maintain an equal number of connections on each server, but the method can fail badly if the servers cannot all handle the same number of requests.
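A minimal Python sketch of the selection step, assuming the balancer keeps a per-server count of active connections. The counts and names below are made up for the example.

```python
# Hypothetical snapshot of active connections per server.
active = {"Server A": 12, "Server B": 7, "Server C": 9}


def pick_least_connections(active_connections):
    """Return the server that currently holds the fewest active connections."""
    return min(active_connections, key=active_connections.get)


target = pick_least_connections(active)
active[target] += 1  # the new request now counts against that server
print(target, active[target])  # Server B 8
```

A real balancer would also decrement the count when a connection closes, which this sketch leaves out.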

Least Response Time

With the Least Response Time method, the load balancer selects the server that has both the fewest active connections and the lowest average response time. The response time, also known as Time to First Byte (TTFB), is the time between sending a request to a server and receiving the first byte of its response.
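How the two criteria are combined varies between products. The Python sketch below assumes one simple rule: compare active connections first and use the average response time to break ties. The statistics and names are hypothetical.

```python
# Hypothetical per-server stats: (active connections, average TTFB in ms).
stats = {
    "Server A": (4, 120.0),
    "Server B": (2, 95.0),
    "Server C": (2, 40.0),
}


def pick_least_response_time(server_stats):
    """Fewest active connections first; lowest average response time breaks ties."""
    return min(server_stats, key=lambda name: server_stats[name])


print(pick_least_response_time(stats))  # Server C
```

Servers B and C tie on connections, so the faster average response time decides in favor of Server C.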

Smart Load Balancing with Parallels RAS

Parallels® Remote Application Server (RAS) supports round-robin or resource-based balancing. Parallels RAS resource-based load balancing dynamically distributes the traffic between servers based on counters, such as the number of existing user sessions, memory, and CPU utilization. In addition, it reconnects disconnected servers by default, so users don’t lose any work or data.
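To make the idea of counter-based balancing concrete, here is a generic Python sketch. It is not the Parallels RAS implementation or its actual formula. The equal weighting of the three counters, the session ceiling, and all names and values are assumptions chosen only for illustration.

```python
# Hypothetical counters per server; all values are illustrative.
counters = {
    "host-1": {"sessions": 18, "cpu": 0.72, "memory": 0.60},
    "host-2": {"sessions": 9, "cpu": 0.35, "memory": 0.80},
    "host-3": {"sessions": 11, "cpu": 0.40, "memory": 0.45},
}
MAX_SESSIONS = 50  # assumed per-host session ceiling


def load_score(c):
    """Average the three counters after scaling each to the 0..1 range."""
    return (c["sessions"] / MAX_SESSIONS + c["cpu"] + c["memory"]) / 3


# The next connection goes to the host with the lowest combined score.
least_loaded = min(counters, key=lambda host: load_score(counters[host]))
print(least_loaded)  # host-3
```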

Moreover, High Availability Load Balancing (HALB) is included with Parallels RAS at no extra cost. It eliminates the constraints of multi-gateway environments by distributing incoming connections based on workload and dynamically directing traffic among healthy gateways. HALB also enables administrators to run many HALB appliances simultaneously, reducing the possibility of downtime and ensuring that applications are always available.
