Load Balancing Without a Black Box

How to decide where each request goes without treating the load balancer like a magic box in the diagram.

Andrews Ribeiro

Founder & Engineer

Track

System Design Interviews - From Basics to Advanced

Step 11 / 19

The problem

A load balancer appears in almost every system design diagram.

The problem is that it usually shows up as an automatic box:

“put a load balancer here”

But that still explains almost nothing.

It leaves basic questions unanswered:

  • what is being distributed
  • by which rule
  • what happens to session state
  • what happens when one instance becomes unhealthy

“Distribute evenly” sounds good, but often it is not even the right problem to solve.

Mental model

A load balancer is the component that decides which instance receives each request or connection.

The most useful way to think about it is not by algorithm name first.

It is by four questions:

  1. am I distributing short requests or long-lived connections?
  2. is important state trapped in the instance?
  3. how do I detect that an instance became unhealthy?
  4. which routing rule fits this traffic shape?

If you can answer those four questions, the load-balancing choice becomes much less mysterious.

Because underneath, the decision affects:

  • latency
  • load distribution
  • session state
  • fault tolerance

Breaking it down

First: short requests or long-lived connections?

That difference changes a lot of what comes next.

If the traffic is made of short requests, like a normal HTTP API, distributing per request is often enough.

If the traffic is made of long-lived connections, like WebSocket, the conversation changes.

One instance may hold many more open connections for much longer than another.

So the first senior question is usually:

am I distributing requests or connections?

Then: is state trapped in the instance?

This point often separates a good answer from a memorized one.

If session, cart, or user context lives in local instance memory, you will drift toward affinity.

That is where sticky sessions come in.

They can provide quick relief.

But they also often come with a cost:

  • hotspots
  • worse failover
  • less predictable horizontal scaling

In many cases, the healthier answer is:

  • move critical state out of the instance
  • let balancing stay freer
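The idea of moving critical state out of the instance can be sketched in a few lines. This is a minimal illustration, not a real API: the `SessionStore` below is a plain dict standing in for an external store such as Redis, and all the names are made up.

```python
# Sketch: once session state lives in a shared store, any instance can
# serve any request, and the balancer stays free to route anywhere.

class SessionStore:
    """Shared store: every app instance reads and writes the same state."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = state


class AppInstance:
    """Stateless instance: it holds nothing a failover would lose."""
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, item):
        cart = self.store.get(session_id)
        items = cart.get("items", []) + [item]
        self.store.put(session_id, {"items": items})
        return items


store = SessionStore()
a, b = AppInstance("a", store), AppInstance("b", store)
a.handle("s1", "book")        # first request lands on instance a
print(b.handle("s1", "pen"))  # next request lands on b, state survives
```

Because neither instance traps state, there is no need for affinity: round robin, least connections, or anything else can route the next request anywhere.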

Only now does it make sense to talk about algorithms

The common names are:

  • round robin
  • least connections
  • hash or some form of affinity by key

Their value is not in the name. It is in the behavior.

Round robin works well when requests are short and similar.

Least connections is often better when connections can stay open for a long time.

Some form of hash helps when you truly need affinity by user, session, or resource.

The point is:

a good algorithm is the one that matches the traffic shape and the application state model

Health checks are part of the design

Without health checks, the balancer keeps sending traffic to nodes that are already unhealthy.

But a bad health check also causes problems:

  • if it is too shallow, it approves a broken node
  • if it is too aggressive, it removes healthy nodes during a spike

That matters because the load balancer can spread a problem instead of containing it.
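One common way to avoid both failure modes is to require several consecutive failures before removing a node and several consecutive successes before re-adding it. The sketch below shows that idea; the thresholds and class names are illustrative, not from any particular balancer.

```python
# Sketch: threshold-based health checking. A single failed probe does
# not eject a node (too aggressive), and a single passing probe does
# not restore it (too shallow).

class HealthChecker:
    def __init__(self, nodes, fail_threshold=3, recover_threshold=2):
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.fails = {n: 0 for n in nodes}
        self.oks = {n: 0 for n in nodes}
        self.healthy = set(nodes)

    def report(self, node, ok):
        if ok:
            self.fails[node] = 0
            self.oks[node] += 1
            if self.oks[node] >= self.recover_threshold:
                self.healthy.add(node)
        else:
            self.oks[node] = 0
            self.fails[node] += 1
            if self.fails[node] >= self.fail_threshold:
                self.healthy.discard(node)


hc = HealthChecker(["app-1", "app-2"])
hc.report("app-2", ok=False)   # one failed probe is not enough
print("app-2" in hc.healthy)   # True: still in rotation
for _ in range(2):
    hc.report("app-2", ok=False)
print("app-2" in hc.healthy)   # False: removed after 3 consecutive failures
```

The other half of the problem, how deep the probe itself goes (TCP connect vs. a real application check), is a separate decision, and a shallow probe will happily report a broken node as healthy.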

Layer 4 vs Layer 7 only matters when it changes the decision

This topic often gets treated like a networking exam, but it can stay simple.

  • L4 routes earlier, with less application context
  • L7 routes with more application context

If you need to route by host, path, header, or an application rule, L7 matters more.

If the focus is simple, fast connection handling with no application-level routing rules, L4 may be enough.

You do not need to turn this into a long lecture in every interview.
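The L7 side of that distinction can be shown with a tiny rule table. This is a sketch of the behavior, not any real balancer's configuration; the hostnames, paths, and pool names are invented.

```python
# Sketch: L7 routing can read the request (host, path, headers) and
# pick a backend pool by rule. An L4 balancer only sees the connection,
# so none of this information is available to it.

routes = [
    ("api.example.com", "/ws/", "websocket-pool"),
    ("api.example.com", "/",    "api-pool"),
    ("www.example.com", "/",    "web-pool"),
]

def pick_pool(host, path):
    for rule_host, prefix, pool in routes:
        if host == rule_host and path.startswith(prefix):
            return pool
    return "default-pool"

print(pick_pool("api.example.com", "/ws/chat"))  # websocket-pool
print(pick_pool("www.example.com", "/about"))    # web-pool
```

In an interview, naming one concrete rule you need ("route /ws/ to the WebSocket pool") is usually enough to justify L7 without a networking lecture.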

Simple example

Imagine a chat system that uses WebSocket.

If you use pure round robin, it may look fine on paper.

But some instances can end up full of long-lived connections while others do not.

A stronger answer could sound like this:

Because I have long-lived connections, I would first think about connection distribution, not only request distribution. Least connections probably fits better than round robin. If I am still trapping state in the instance, sticky sessions may show up as a short-term relief, but I would treat them as temporary. I also need health checks so the system stops sending traffic to unhealthy nodes.

That already shows judgment without turning the topic into a networking module.

Common mistakes

  • Putting a load balancer in the diagram and stopping there.
  • Choosing an algorithm before explaining the traffic shape.
  • Using sticky sessions without naming the cost.
  • Ignoring health checks.
  • Treating long-lived connections and short requests as if they were the same.

How a senior thinks

People with more experience often move in an order like this:

What am I distributing? Is state trapped? How do I detect unhealthy nodes? Which rule fits this traffic?

Maturity here also means understanding that a load balancer does not fix a badly designed application.

If the instance holds too much state, if one machine behaves much worse than another, or if the health check lies, the balancer just distributes an unresolved problem.

What the interviewer wants to see

In interviews, the interviewer usually wants to see whether you:

  • understand the traffic shape
  • know how algorithm choice maps to behavior
  • consider session and state
  • remember failure behavior

Load balancing is not a pretty box in a diagram. It is a decision about how to split traffic without creating a new bottleneck on the way.
