June 2 2025
CAP Theorem in Practice
The useful part of CAP starts when parts of the system stop seeing each other and you need to decide whether to wait or keep responding.
Andrews Ribeiro
Founder & Engineer
4 min Intermediate Systems
The problem
CAP theorem becomes a slogan very quickly.
Lots of people cite it. Very few explain it.
The most famous line is:
“You can only have two out of three.”
The problem is that this barely helps.
If you heard that sentence and still did not get it, the problem is not you.
It jumps straight to the slogan before showing the real situation the theorem is trying to explain.
Mental model
CAP is about what happens when a distributed system suffers a network failure.
Think about two parts of the system that are still up, but stop talking to each other properly.
Simple examples:
- two regions stop seeing each other
- a primary and a replica lose communication
- two nodes stay alive, but cannot confirm state with each other
When that happens, you lose the comfort of assuming both sides will answer with the same truth.
That is where CAP starts to matter.
What you actually need to choose
The three terms are these:
- C for consistency: I only answer when I can preserve a coherent view of the data
- A for availability: I keep answering even during the failure
- P for partition tolerance: the system keeps operating even when the network drops or delays messages between parts of it
The part that confuses people most is this:
In distributed systems, P is not a luxury. The network can fail whether you like it or not.
So the practical choice is usually not “which two do I want on a good day?”
The practical choice is usually:
- do I hold the response to protect consistency?
- or do I keep responding while accepting stale data or temporarily divergent behavior?
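The two options above can be sketched as a single read handler. This is a minimal illustration, not a real replication protocol: `PartitionPolicy`, `Unavailable`, `Answer`, and `handle_read` are hypothetical names, and `peer_reachable` stands in for whatever failure detection a real system would use.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionPolicy(Enum):
    WAIT = "wait"             # protect consistency: refuse until the peer is reachable
    KEEP_RESPONDING = "keep"  # protect availability: answer from local state


class Unavailable(Exception):
    """Raised when the node refuses to answer during a partition."""


@dataclass
class Answer:
    value: int
    may_be_stale: bool


def handle_read(local_value: int, peer_reachable: bool,
                policy: PartitionPolicy) -> Answer:
    if peer_reachable:
        # No partition: state can be confirmed, so there is no trade-off to make.
        return Answer(local_value, may_be_stale=False)
    if policy is PartitionPolicy.WAIT:
        # Hold the response rather than risk a wrong answer.
        raise Unavailable("cannot confirm state with peer; try again later")
    # Keep responding, and be honest that the value may be out of date.
    return Answer(local_value, may_be_stale=True)
```

Note that the policy only matters in the branch where the peer is unreachable. On a good day both policies give the same answer.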
Breaking the problem down
When it makes sense to wait
Some flows do not tolerate lies.
Examples:
- balance
- credit limit
- leader coordination
- distributed locking
If the network broke and you cannot confirm the state properly, it may be better to stop, wait, or return a temporary error.
The pain here is temporary unavailability.
But that trade-off may be worth it because wrong data is too expensive.
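One common way to implement "wait" is to require acknowledgement from a majority of replicas before accepting a write. The sketch below assumes that approach; the article itself does not prescribe it. `write_balance`, `QuorumNotReached`, and the in-memory `replicas` dictionary are hypothetical stand-ins for real storage and networking.

```python
from typing import Dict, Set


class QuorumNotReached(Exception):
    """Raised when too few replicas are reachable to accept the write safely."""


def write_balance(replicas: Dict[str, int], reachable: Set[str],
                  new_balance: int) -> int:
    """Apply the write only if a majority of replicas can be reached.

    Returns the number of replicas that were updated.
    """
    majority = len(replicas) // 2 + 1
    targets = [name for name in replicas if name in reachable]
    if len(targets) < majority:
        # Temporary unavailability: nothing is written, the caller must retry.
        raise QuorumNotReached(
            f"only {len(targets)} of {len(replicas)} replicas reachable, "
            f"need {majority}"
        )
    for name in targets:
        replicas[name] = new_balance
    return len(targets)
```

With three replicas, the side of a partition that still sees two of them keeps accepting writes, while the isolated side returns an error instead of a balance it cannot confirm.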
When it makes sense to keep responding
Other flows tolerate stale data better than silence.
Examples:
- feed
- like counters
- recommendations
- non-critical ranking
If one part of the system keeps answering with slightly stale data for a few seconds, that may be acceptable.
The pain here is temporary inconsistency.
But that trade-off may be worth it because the product does not disappear in front of the user.
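A minimal sketch of "keep responding" for something like a like counter: try the remote store, and if it is unreachable, serve the last value seen along with its age. `LikeCounter` and `fetch_remote` are hypothetical names, and using `ConnectionError` to signal the partition is an assumption made for the example.

```python
import time
from typing import Callable, Optional, Tuple


class LikeCounter:
    """Serves a like count, falling back to the last known value
    when the remote store cannot be reached."""

    def __init__(self, fetch_remote: Callable[[], int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_remote = fetch_remote
        self._clock = clock
        self._last_value: Optional[int] = None
        self._last_fetched_at = 0.0

    def read(self) -> Tuple[int, float]:
        """Return (count, age_in_seconds). Age is 0.0 for a fresh answer."""
        now = self._clock()
        try:
            value = self._fetch_remote()
        except ConnectionError:
            if self._last_value is None:
                raise  # nothing local to serve yet
            # Temporary inconsistency: answer with the stale value and its age.
            return self._last_value, now - self._last_fetched_at
        self._last_value = value
        self._last_fetched_at = now
        return value, 0.0
```

Returning the age alongside the value lets the caller decide how stale is too stale, for example by hiding the counter once the value is older than some limit.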
Only then do CP and AP help
If someone uses the labels CP and AP, they are much easier to read now:
- CP: under failure, the system leans more toward consistency
- AP: under failure, the system leans more toward availability
But those are already labels. The important part comes first.
Concrete example
Imagine a social network.
A user publishes a post. Some followers in another region still do not see it for a few seconds.
That is usually acceptable. The system keeps responding and the state converges later.
Now imagine a bank balance or coordination for a critical job.
If two sides of the system cannot talk to each other, it may be better to hold the response than to risk conflicting states.
That is the real point:
- in a feed, slightly stale data is often acceptable
- in a balance or critical coordination flow, waiting may be less dangerous than a wrong answer
CAP does not choose for you.
The business decides which mistake is acceptable during the failure.
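One way to make that decision explicit is to record it per flow instead of per system. The table below is only an illustration: the flow names and the `behavior_during_partition` helper are hypothetical, and defaulting unknown flows to waiting is an assumption, not a rule.

```python
# Which mistake each flow accepts while the network is partitioned.
PARTITION_BEHAVIOR = {
    "feed": "keep_responding",       # stale posts for a few seconds are fine
    "like_counter": "keep_responding",
    "balance": "wait",               # a wrong balance is too expensive
    "job_coordination": "wait",      # two leaders are worse than a pause
}


def behavior_during_partition(flow: str) -> str:
    # Assumption: flows nobody has classified yet take the cautious option.
    return PARTITION_BEHAVIOR.get(flow, "wait")
```

The same system can hold both kinds of flow. The choice belongs to each flow, not to the architecture diagram.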
What CAP does not explain
CAP is not for explaining every stale-data case.
Sometimes the real problem is:
- badly invalidated cache
- replica lag
- reading from the wrong region
- a badly designed async pipeline
So:
- not every stale-data problem is CAP
- not every consistency problem needs this theorem
CAP is a lens for system behavior under network failure in a distributed system.
Common mistakes
- Repeating “two out of three” without even mentioning partition.
- Talking as if P were optional.
- Using CP and AP before explaining the concrete problem.
- Assuming CAP explains every stale-data situation.
- Treating CAP as architecture decoration instead of a failure-behavior decision.
How this shows up in interviews
In interviews, a strong answer often sounds like this:
“If the network fails here, would I rather block this flow to protect consistency, or keep responding with a risk of stale data?”
That is much better than dropping “two out of three” and hoping it sounds deep.
The interviewer wants to see whether you:
- understand partition in human language
- connect consistency and availability to a real flow
- do not use the term as decoration
Closing
CAP gets much easier when you change the question.
Instead of asking:
“Which two out of three do I choose?”
Ask:
“When the network breaks, would I rather wait for the right answer or keep responding with a risk of stale data?”
Quick summary
What to keep in your head
- CAP only matters when there is a network failure between parts of the system.
- In distributed systems, partition is not something you simply choose to ignore.
- In practice, the useful choice is between waiting for a consistent answer and continuing to respond with a risk of stale data.
- CAP does not explain every consistency problem; bad cache invalidation and replica lag may be a different conversation.
Practice checklist
Use this when you answer
- Can I explain CAP without repeating 'two out of three' as empty trivia?
- Can I explain what a network partition is in simple language?
- Can I give one flow where I would rather wait and another where I would rather keep responding?
- Can I separate CAP from other stale-data problems such as bad cache invalidation or replica lag?