June 4 2025
Load Balancing Without a Black Box
How to decide where each request goes without treating the load balancer like a magic box in the diagram.
Andrews Ribeiro
Founder & Engineer
4 min · Intermediate · Systems
Track: System Design Interviews - From Basics to Advanced
Step 11 / 19
The problem
A load balancer appears in almost every system design diagram.
The problem is that it usually shows up as an automatic box:
“put a load balancer here”
But that still explains almost nothing.
It leaves basic questions unanswered:
- what is being distributed
- by which rule
- what happens to session state
- what happens when one instance becomes unhealthy
“Distribute evenly” sounds good, but often it is not even the right problem.
Mental model
A load balancer is the component that decides which instance receives each request or connection.
The most useful way to think about it is not by algorithm name first.
It is by four questions:
- am I distributing short requests or long-lived connections?
- is important state trapped in the instance?
- how do I detect that an instance became unhealthy?
- which routing rule fits this traffic shape?
If you answer those, the load-balancing choice becomes much less mysterious.
Because underneath, the decision affects:
- latency
- load distribution
- session state
- fault tolerance
Breaking it down
First: short requests or long-lived connections?
That difference changes a lot of what comes next.
If the traffic is made of short requests, like a normal HTTP API, distributing per request is often enough.
If the traffic is made of long-lived connections, like WebSocket, the conversation changes.
One instance may hold many more open connections for much longer than another.
So the first senior question is usually:
am I distributing requests or connections?
Then: is state trapped in the instance?
This point often separates a good answer from a memorized one.
If session, cart, or user context lives in local instance memory, you will drift toward affinity.
That is where sticky session enters.
It can work as a quick relief.
But it also often comes with a cost:
- hotspots
- worse failover
- less predictable horizontal scaling
In many cases, the healthier answer is:
- move critical state out of the instance
- let balancing stay freer
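A tiny sketch can make the affinity cost concrete. This is a minimal key-hash affinity sketch, not a real balancer: instance names are made up, and production systems usually use consistent hashing so that losing one instance does not remap most keys.

```python
import hashlib

def pick_by_key(key: str, instances: list[str]) -> str:
    """Map the same session key to the same instance every time.

    A simple modulo hash: deterministic affinity, but if the pool
    shrinks (an instance dies), most keys get remapped - that is
    the failover cost affinity brings.
    """
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]

instances = ["app-1", "app-2", "app-3"]

# The same key always lands on the same instance while the pool is stable.
assert pick_by_key("session-42", instances) == pick_by_key("session-42", instances)
```

If the critical state moves out of the instance, this whole mapping concern disappears and any instance can serve any key.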
Only now does it make sense to talk about algorithms
The common names are:
- round robin
- least connections
- hash
- some form of affinity by key
Their value is not in the name. It is in the behavior.
Round robin works well when requests are short and similar.
Least connections is often better when connections can stay open for a long time.
Some form of hash helps when you truly need affinity by user, session, or resource.
The point is:
a good algorithm is the one that matches the traffic shape and the application state model
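The behavioral difference between the first two rules fits in a few lines. This is a toy sketch, assuming in-memory counters; real balancers track connection counts on the data path.

```python
import itertools

class RoundRobin:
    """Cycle through instances in order - fine when requests are short and similar."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Send traffic to whichever instance holds the fewest open connections."""
    def __init__(self, instances):
        self.open = {name: 0 for name in instances}

    def pick(self):
        name = min(self.open, key=self.open.get)
        self.open[name] += 1   # a connection opens on this instance
        return name

    def release(self, name):
        self.open[name] -= 1   # a connection closes

rr = RoundRobin(["a", "b"])
assert [rr.pick() for _ in range(4)] == ["a", "b", "a", "b"]

lc = LeastConnections(["a", "b"])
lc.open["a"] = 10          # "a" is loaded with long-lived connections
assert lc.pick() == "b"    # least connections routes around the busy instance
```

Round robin never looks at the counters, which is exactly why it can pile long-lived connections onto one instance.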
Health checks are part of the design
Without health checks, the balancer keeps sending traffic to nodes that are already unhealthy.
But a bad health check also causes problems:
- if it is too shallow, it approves a broken node
- if it is too aggressive, it removes healthy nodes during a spike
That matters because the load balancer can spread a problem instead of containing it.
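One common way to balance those two failure modes is to require several consecutive results before changing a node's status. A minimal sketch, with made-up threshold values:

```python
class HealthTracker:
    """Eject a node only after N consecutive failed probes, and
    re-admit it only after M consecutive successes, to avoid
    flapping during a short spike.
    """
    def __init__(self, fail_after=3, recover_after=2):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.healthy = True
        self._fails = 0
        self._oks = 0

    def record(self, check_passed: bool):
        if check_passed:
            self._fails, self._oks = 0, self._oks + 1
            if not self.healthy and self._oks >= self.recover_after:
                self.healthy = True
        else:
            self._oks, self._fails = 0, self._fails + 1
            if self.healthy and self._fails >= self.fail_after:
                self.healthy = False

node = HealthTracker()
node.record(False)        # one failed probe is not enough to eject
assert node.healthy
node.record(False)
node.record(False)
assert not node.healthy   # three consecutive failures eject the node
node.record(True)
assert not node.healthy   # one success is not enough to re-admit
node.record(True)
assert node.healthy
```

The depth of the probe itself (TCP connect vs. a real dependency check) is the other half of the design, and the text's warning applies: a shallow probe approves a broken node.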
Layer 4 vs Layer 7 only matters when it changes the decision
This topic often gets treated like a networking exam, but it can stay simple.
- L4 routes earlier, with less application context
- L7 routes with more application context
If you need to route by host, path, header, or an application rule, L7 matters more.
If the problem is more direct and the focus is simple, fast connection handling, L4 may be enough.
You do not need to turn this into a long lecture in every interview.
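To see why the layer changes the decision, here is a toy sketch of an L7-style rule. Host names, paths, and pool names are illustrative; an L4 balancer, seeing only addresses and ports, could not branch on any of these fields.

```python
def route_l7(host: str, path: str) -> str:
    """Pick a backend pool from application-level fields (host, path) -
    information that is only visible at layer 7.
    """
    if host == "api.example.com" and path.startswith("/ws"):
        return "websocket-pool"
    if path.startswith("/static"):
        return "cdn-pool"
    return "default-pool"

assert route_l7("api.example.com", "/ws/chat") == "websocket-pool"
assert route_l7("www.example.com", "/static/app.js") == "cdn-pool"
assert route_l7("www.example.com", "/checkout") == "default-pool"
```

If no rule in your design ever needs those fields, that is usually the signal that L4 is enough.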
Simple example
Imagine a chat system that uses WebSocket.
If you use pure round robin, it may look fine on paper.
But some instances can end up full of long-lived connections while others do not.
A stronger answer could sound like this:
Because I have long-lived connections, I would first think about connection distribution, not only request distribution.
Least connections probably fits better than round robin. If I am still trapping state in the instance, sticky session may show up as a short-term relief, but I would treat it as temporary. I also need health checks so the system stops sending traffic to unhealthy nodes.
That already shows judgment without turning the topic into a networking module.
Common mistakes
- Putting a load balancer in the diagram and stopping there.
- Choosing an algorithm before explaining the traffic shape.
- Using sticky session without naming the cost.
- Ignoring health checks.
- Treating long-lived connections and short requests as if they were the same.
How a senior thinks
People with more experience often move in an order like this:
What am I distributing? Is state trapped? How do I detect unhealthy nodes? Which rule fits this traffic?
Maturity here also means understanding that a load balancer does not fix a badly designed application.
If the instance holds too much state, if one machine behaves much worse than another, or if the health check lies, the balancer just distributes an unresolved problem.
What the interviewer wants to see
In interviews, the interviewer usually wants to see whether you:
- understand the traffic shape
- know how algorithm choice maps to behavior
- consider session and state
- remember failure behavior
Load balancing is not a pretty box in a diagram. It is a decision about how to split traffic without creating a new bottleneck on the way.
Quick summary
What to keep in your head
- Balancing traffic is not the same as distributing real load well.
- Before you talk about an algorithm, you need to know whether you are distributing short requests or long-lived connections.
- Sticky session can relieve a local problem, but it often comes with hotspots and worse failover.
- Health checks are part of the design, not a detail to remember at the end.
Practice checklist
Use this when you answer
- Can I explain the traffic shape before I choose an algorithm?
- Do I know when sticky session helps and when it only hides badly placed state?
- Can I distinguish layer 4 from layer 7 without turning it into a networking lecture?
- Can I say how the system stops sending traffic to an unhealthy instance?