June 24 2025
Rate Limiting: When, How, and Why
How to think about rate limiting as shared capacity protection, which strategies exist, and what actually matters in practice.
Andrews Ribeiro
Founder & Engineer
5 min · Intermediate · Systems
Track: System Design Interviews - From Basics to Advanced (Step 10/19)
The problem
Rate limiting often shows up in conversations as a quick detail.
Someone says:
- “put a limiter at the edge”
and it sounds like the topic is done.
But the important part is not naming the limiter.
It is explaining:
- what it is protecting
- who the limit applies to
- what behavior it creates when traffic gets tight
Without that, the system usually falls into two common failure modes:
- one client consumes too much capacity and makes life worse for everyone else
- the system degrades chaotically instead of predictably
Mental model
Think of it this way:
rate limiting is a capacity contract.
In plain English, you are saying:
- above a certain pace, this client will have to wait, fail, or slow down
That can serve different goals:
- protect a public API
- reduce abuse
- distribute a shared resource
- absorb bursts
- limit expensive actions like login, SMS sending, or report generation
So the useful question is not:
- “do we need rate limiting?”
It is this one:
which capacity am I protecting, for whom, and what should happen when the limit is hit?
Breaking the problem down
Where rate limiting usually lives
The most common place is near the system entry point:
- API gateway
- load balancer with rules
- application HTTP layer
The earlier you block, the less wasted work the system does.
But that does not mean every limit belongs only at the edge.
Some limits make more sense closer to the rule itself:
- per user
- per specific action
- per expensive resource
- per external integration
Example:
- limiting requests by API key at the edge makes sense
- limiting “at most 3 SMS messages per hour for the same user” is more product logic
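To make the product-side version concrete, here is a minimal Python sketch of that “at most 3 SMS per hour per user” rule. The names (can_send_sms, MAX_SMS_PER_HOUR) and the in-memory dict are illustrative, not from the article; a real system would keep this state in a shared store.

```python
import time

MAX_SMS_PER_HOUR = 3
WINDOW_SECONDS = 3600

# In-memory stand-in for whatever store the real system would use.
sms_log: dict[str, list[float]] = {}

def can_send_sms(user_id: str, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    # Keep only the sends from the last hour for this user.
    recent = [t for t in sms_log.get(user_id, []) if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_SMS_PER_HOUR:
        sms_log[user_id] = recent
        return False
    recent.append(now)
    sms_log[user_id] = recent
    return True
```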
The algorithm changes behavior
You do not need to memorize formulas.
You need to understand how each option behaves.
Fixed window:
- simple
- easy to explain
- but creates odd behavior at the window boundary
The client can send a lot at the end of one minute and a lot again at the beginning of the next.
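A minimal sketch of a fixed-window counter (names are illustrative) makes that boundary problem easy to see: the count resets at every window edge, so a burst can land on both sides of it.

```python
import time

class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window number) -> count; a sketch, so old windows are never cleaned up.
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        # Requests are grouped by calendar window, e.g. "minute number N".
        window = int(now // self.window_seconds)
        bucket = (key, window)
        count = self.counts.get(bucket, 0)
        if count >= self.limit:
            return False
        self.counts[bucket] = count + 1
        return True
```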
Sliding window:
- smooths that edge
- tends to be fairer
- but is usually a bit more expensive to maintain, as shown in the sketch below
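One common implementation is a sliding-window log: keep the timestamps of recent requests per key and drop the ones that have slid out. This sketch is illustrative; it buys the smoother behavior at the cost of storing one timestamp per allowed request.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events: dict[str, deque[float]] = {}

    def allow(self, key: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        log = self.events.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while log and now - log[0] >= self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```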
Token bucket:
- fills a bucket with tokens over time
- each request spends one token
- allows controlled bursts
In interviews, token bucket is often a strong answer because it balances clarity with real behavior.
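As a hedged sketch, a token bucket fits in a few lines: tokens refill at a steady rate up to a burst capacity, and each request spends one. The names and parameters here are illustrative, not a prescribed implementation.

```python
import time

class TokenBucket:
    def __init__(self, rate_per_second: float, capacity: float):
        self.rate = rate_per_second      # long-run allowed rate
        self.capacity = capacity         # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```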
Distributed rate limiting usually needs shared state
If you have multiple instances and each one counts locally, the client can dodge the limit by landing on different instances.
That is why, in distributed systems, the counter usually lives in shared state.
Redis shows up here often because:
- it is fast
- it handles counters and expiration well
- it supports useful atomic operations for this case
It is not mandatory in every scenario.
But it is a common design that is easy to defend.
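One common pattern, sketched here with the redis-py client (the connection details and key naming are placeholders), is a shared counter per key built from INCR plus an expiry:

```python
import redis  # assumes the redis-py client

r = redis.Redis(host="localhost", port=6379)

def allow(key: str, limit: int, window_seconds: int) -> bool:
    # One counter per key in shared state, so every instance sees the same count.
    # INCR is atomic in Redis.
    count = r.incr(f"ratelimit:{key}")
    if count == 1:
        # First hit in this window: start the expiry clock.
        # (Production versions often wrap INCR + EXPIRE in a Lua script so a crash
        # between the two calls cannot leave a counter that never expires.)
        r.expire(f"ratelimit:{key}", window_seconds)
    return count <= limit
```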
The limit key changes the effect
You can rate limit by:
- IP
- user
- API key
- tenant
- endpoint
- action
The key choice changes who pays the price.
If you limit by IP only, you may punish many users behind the same NAT.
If you limit by user only, anonymous abuse becomes harder to control.
Good answers usually show that the key is part of the design, not a default afterthought.
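One way to make that explicit is to treat the key as something you compose deliberately. This tiny sketch is illustrative; none of the field names come from a real system:

```python
def limit_key(tenant: str, user_id: str | None, client_ip: str, endpoint: str) -> str:
    # Each choice changes who shares a budget.
    if user_id is not None:
        # Authenticated traffic: budget per tenant + user + endpoint.
        return f"{tenant}:user:{user_id}:{endpoint}"
    # Anonymous traffic: fall back to the IP, accepting that users behind the
    # same NAT will share the budget.
    return f"{tenant}:ip:{client_ip}:{endpoint}"
```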
What happens when the limit is exceeded
The limit is not finished when you block the request.
You still need to define system behavior:
- return 429 Too Many Requests
- include retry hints
- slow down instead of hard blocking
- queue some requests
- prioritize paid or internal traffic
This matters because the behavior becomes part of the product experience.
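As a sketch of what that contract could look like on the wire, here is one possible shape for the blocked response. The JSON fields are assumptions, not a standard; the Retry-After header is the standard HTTP hint:

```python
import json

def too_many_requests(retry_after_seconds: int) -> tuple[int, dict[str, str], str]:
    # Returns (status, headers, body) for whatever HTTP layer is in use.
    headers = {
        "Retry-After": str(retry_after_seconds),  # standard HTTP retry hint
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",                  # illustrative field names
        "retry_after_seconds": retry_after_seconds,
    })
    return 429, headers, body
```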
Simple example
A good interview answer could sound like this:
“I would treat rate limiting as protection for shared capacity. First I would define what I am protecting and who the limit applies to. At the edge, I would likely use rate limiting by API key or user to avoid wasting work early. For distributed counting, I would use shared state, often Redis, because local counters break across multiple instances. For the algorithm, token bucket is a good default when I want controlled bursts without unlimited spikes. And I would be explicit about the response, usually 429 with retry guidance, so the client sees a predictable contract instead of random failure.”
That works because it:
- explains the goal
- picks a place for the limiter
- shows awareness of distribution
- treats limit behavior as part of the system
Common mistakes
- Treating rate limiting as a generic abuse checkbox.
- Naming an algorithm without explaining its behavior.
- Counting locally in a distributed system and assuming it still works.
- Ignoring what the client sees when the limit is hit.
- Mixing product limits and infrastructure limits without saying so.
How a senior thinks about it
People with real production experience usually simplify the conversation into two questions:
Which capacity am I protecting?
What should this feel like for the client when traffic is too high?
That framing clears a lot of noise.
Instead of sounding theoretical, the answer starts sounding operational.
What the interviewer wants to see
In this scenario, the interviewer wants to see whether you:
- explain what is being protected
- understand why the algorithm changes behavior
- recognize the distributed counting problem
- make the client-visible behavior explicit
- keep the answer grounded in trade-offs instead of buzzwords
Good rate limiting is not just about stopping traffic. It is about turning overload into something predictable.
Quick summary
What to keep in your head
- Rate limiting protects shared capacity and helps keep fairness across clients.
- The algorithm matters because it changes how the limit behaves around bursts, windows, and multiple instances.
- In distributed systems, counting locally on each instance is almost never enough.
- The client response is part of the design too: a good limit should not feel like random failure.
Practice checklist
Use this when you answer
- Can I explain fixed window, sliding window, and token bucket without hiding behind formulas?
- Do I know where a rate limiter usually lives and when it also belongs near business rules?
- Can I explain why Redis shows up so often in this problem?
- Do I know what to return to the client when the limit is exceeded?