June 2, 2025

CAP Theorem in Practice

The useful part of CAP starts when parts of the system stop seeing each other and you need to decide whether to wait or keep responding.

Andrews Ribeiro

Founder & Engineer

4 min Intermediate Systems

#system-design #systems #distributed-systems #cap-theorem #consistency #availability

The problem

The CAP theorem turns into a slogan very quickly.

Lots of people cite it. Very few explain it.

The most famous line is:

“You can only have two out of three.”

The problem is that this barely helps.

If you heard that sentence and still did not get it, the problem is not you.

That sentence jumps straight to the conclusion without first showing the real situation the theorem describes.

Mental model

CAP is about what happens when a distributed system suffers a network partition: a failure that splits it into parts that cannot reach each other.

Picture two parts of the system that are both still running but can no longer talk to each other reliably.

Simple examples:

two regions stop seeing each other
a primary and a replica lose communication
two nodes stay alive but cannot confirm their state with each other

When that happens, you lose the comfort of assuming both sides will answer with the same truth.

That is where CAP starts to matter.

What you actually need to choose

The three terms are these:

C for consistency: every read sees the latest write, so I only answer when I can guarantee that view of the data
A for availability: every request that reaches a working node gets an answer, even during the failure
P for partition tolerance: the system keeps operating even when the network drops messages between parts of it

The part that confuses people most is this:

In distributed systems, P is not a luxury. The network can fail whether you like it or not.

So the practical choice is usually not “which two do I want on a good day?”

The practical choice is usually:

do I hold the response to protect consistency?
or do I keep responding while accepting stale data or temporarily divergent behavior?
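A minimal Python sketch of that choice, assuming a hypothetical `Replica` that already knows whether it is cut off from its peers (real systems have to infer this from timeouts). The names `Mode`, `Replica`, and `Unavailable` are illustrative and do not come from any library.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    CP = "cp"  # lean toward consistency under partition
    AP = "ap"  # lean toward availability under partition


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: Mode
    data: dict = field(default_factory=dict)
    partitioned: bool = False  # assumed known here; real systems infer it from timeouts

    def read(self, key):
        if self.partitioned and self.mode is Mode.CP:
            # Hold the response: the latest value cannot be confirmed with peers.
            raise Unavailable(f"cannot confirm latest value of {key!r}")
        # Healthy network, or availability chosen: answer from local state,
        # which may be stale while partitioned.
        return self.data.get(key)


# Usage: both replicas are cut off from their peers.
bank = Replica(Mode.CP, {"balance": 100}, partitioned=True)
feed = Replica(Mode.AP, {"likes": 41}, partitioned=True)
print(feed.read("likes"))  # 41, possibly behind other regions
try:
    bank.read("balance")
except Unavailable as err:
    print("error:", err)
```

The two modes differ only when `partitioned` is true: the CP replica raises an error, the AP replica returns its local, possibly stale, value. As a simplification, a replica that is not partitioned is treated as up to date.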

Breaking the problem down

When it makes sense to wait

Some flows do not tolerate lies.

Examples:

balance
credit limit
leader coordination
distributed locking

If the network breaks and you cannot confirm the current state, it may be better to stop, wait, or return a temporary error.

The pain here is temporary unavailability.

But that trade-off may be worth it because wrong data is too expensive.
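To make "stop, wait, or return a temporary error" concrete, here is a hedged sketch of a majority-quorum write for a balance. A majority quorum is one common way to lean toward consistency, swapped in here for illustration; `write_balance`, `quorum_size`, and `QuorumUnavailable` are hypothetical names, and plain dicts stand in for replicas.

```python
class QuorumUnavailable(Exception):
    """Raised when too few replicas are reachable to form a majority."""


def quorum_size(n_replicas: int) -> int:
    # Strict majority: two majorities always share at least one replica,
    # so two sides of a partition can never both accept writes.
    return n_replicas // 2 + 1


def write_balance(replicas, reachable, account, amount):
    """Hypothetical CP-leaning write.

    replicas:  list of dicts standing in for replica nodes
    reachable: list of bools, whether each replica can be contacted right now
    Returns the number of replicas that applied the write.
    """
    needed = quorum_size(len(replicas))
    live = [r for r, ok in zip(replicas, reachable) if ok]
    if len(live) < needed:
        # Refuse instead of risking conflicting balances on each side.
        raise QuorumUnavailable(f"need {needed} replicas, reached {len(live)}")
    for replica in live:
        replica[account] = amount
    return len(live)


replicas = [{}, {}, {}]
# A partition splits the cluster 2 + 1: the majority side can still write.
print(write_balance(replicas, [True, True, False], "acct-1", 100))  # 2
# The isolated replica alone cannot reach a quorum, so the write is refused.
try:
    write_balance(replicas, [False, False, True], "acct-1", 50)
except QuorumUnavailable as err:
    print("error:", err)
```

With three replicas split 2 and 1, only the side holding two can accept writes, and the isolated replica refuses. That refusal is the temporary unavailability described above. A real system would also need an atomic commit protocol, which this sketch leaves out.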

When it makes sense to keep responding

Other flows tolerate slightly stale data much better than silence.

Examples:

feed
like counters
recommendations
non-critical ranking

If one part of the system keeps answering with slightly stale data for a few seconds, that may be acceptable.

The pain here is temporary inconsistency.

But that trade-off may be worth it because the product does not disappear in front of the user.
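One common way to keep responding and converge later is a grow-only counter (G-Counter), a simple CRDT. It is swapped in here to illustrate like counters, not as the design this article prescribes; the `GCounter` class and the region names are hypothetical.

```python
from collections import defaultdict


class GCounter:
    """Grow-only counter: each region increments only its own slot."""

    def __init__(self, region: str):
        self.region = region
        self.counts = defaultdict(int)  # region -> increments recorded by that region

    def like(self):
        # Always succeeds locally, even if other regions are unreachable.
        self.counts[self.region] += 1

    def value(self) -> int:
        # May lag the global total until the regions exchange state.
        return sum(self.counts.values())

    def merge(self, other: "GCounter"):
        # Per-slot max makes merges order-independent and safe to repeat.
        for region, n in other.counts.items():
            self.counts[region] = max(self.counts[region], n)


eu, us = GCounter("eu"), GCounter("us")
eu.like()  # likes recorded in Europe during the partition
eu.like()
us.like()  # a like recorded in the US at the same time
print(eu.value(), us.value())  # 2 1 (temporarily inconsistent)
eu.merge(us)  # the partition heals and states are exchanged
us.merge(eu)
print(eu.value(), us.value())  # 3 3
```

Each region answers immediately from its own view, so the counts differ for a while. Because merging takes the maximum per region, repeating a merge or applying merges in a different order still ends at the same total.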

Only then do `CP` and `AP` help

Once that trade-off is clear, the labels CP and AP become much easier to read:

CP: under failure, the system leans more toward consistency
AP: under failure, the system leans more toward availability

But those are only labels. The decision behind them is what matters, and it comes first.

Concrete example

Imagine a social network.

A user publishes a post. For a few seconds, some followers in another region do not see it yet.

That is usually acceptable. The system keeps responding and the state converges later.

Now imagine a bank balance, or the coordination of a critical job.

If two sides of the system cannot talk to each other, it may be better to hold the response than to risk conflicting states.

That is the real point:

in a feed, a small delay is often acceptable
in a balance or critical coordination flow, delay may be less dangerous than a wrong answer

CAP does not choose for you.

The business decides which mistake is acceptable during the failure.
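That decision can live as explicit per-flow policy. Here is a hedged sketch, assuming hypothetical flow names and a hypothetical `handle_during_partition` function that runs only after a partition has been detected. The HTTP-style status codes are illustrative.

```python
from enum import Enum


class OnPartition(Enum):
    WAIT = "wait"                       # hold or fail the request (CP-leaning)
    KEEP_RESPONDING = "keep_responding"  # serve possibly stale data (AP-leaning)


# Hypothetical per-flow policy: a product decision, not something CAP picks.
PARTITION_POLICY = {
    "account_balance": OnPartition.WAIT,
    "credit_limit": OnPartition.WAIT,
    "job_leader_election": OnPartition.WAIT,
    "feed": OnPartition.KEEP_RESPONDING,
    "like_counter": OnPartition.KEEP_RESPONDING,
    "recommendations": OnPartition.KEEP_RESPONDING,
}


def handle_during_partition(flow: str, local_value):
    # Unknown flows default to waiting, treating an error as the safer mistake.
    policy = PARTITION_POLICY.get(flow, OnPartition.WAIT)
    if policy is OnPartition.WAIT:
        return {"status": 503, "body": "temporarily unavailable, retry later"}
    return {"status": 200, "body": local_value, "maybe_stale": True}


print(handle_during_partition("feed", ["post-1"])["status"])    # 200
print(handle_during_partition("account_balance", 100)["status"])  # 503
```

Defaulting unknown flows to `WAIT` is an assumption of this sketch. A team could reasonably choose the opposite default for a product where silence costs more than stale data.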

What CAP does not explain

CAP is not meant to explain every stale-data case.

Sometimes the real problem is:

a badly invalidated cache
replica lag
reads routed to the wrong region
a badly designed async pipeline
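Here is a small sketch of replica lag with no network failure at all. It assumes a hypothetical primary that acknowledges writes before an asynchronous queue delivers them to a single replica; the class and method names are illustrative.

```python
from collections import deque


class PrimaryWithAsyncReplica:
    """Hypothetical primary that ships writes to one replica asynchronously.

    The network never fails here. Reads from the replica can still be stale,
    simply because it has not applied the queued writes yet (replica lag).
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.replication_queue = deque()  # writes sent but not yet applied

    def write(self, key, value):
        self.primary[key] = value
        # Acknowledged now; the replica catches up later.
        self.replication_queue.append((key, value))

    def replicate_one(self):
        # Apply the oldest pending write on the replica.
        if self.replication_queue:
            key, value = self.replication_queue.popleft()
            self.replica[key] = value

    def read_from_replica(self, key):
        return self.replica.get(key)

    def lag(self) -> int:
        # Number of writes the replica has not applied yet.
        return len(self.replication_queue)


db = PrimaryWithAsyncReplica()
db.write("post:1", "hello")
print(db.read_from_replica("post:1"))  # None: the write has not arrived yet
db.replicate_one()
print(db.read_from_replica("post:1"))  # hello
```

Every node stays reachable throughout, yet the replica read is stale until the queue drains. That is a replication-pipeline question rather than a CAP trade-off.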

So:

not every stale-data problem is CAP
not every consistency problem needs this theorem

CAP is a lens for system behavior under network failure in a distributed system.

Common mistakes

Repeating “two out of three” without even mentioning partition.
Talking as if P were optional.
Using CP and AP before explaining the concrete problem.
Assuming CAP explains every stale-data situation.
Treating CAP as architecture decoration instead of a failure-behavior decision.

How this shows up in interviews

In interviews, a strong answer often sounds like this:

“If the network fails here, would I rather block this flow to protect consistency, or keep responding with a risk of stale data?”

That is much better than dropping “two out of three” and hoping it sounds deep.

The interviewer wants to see whether you:

understand what a partition is, in plain language
connect consistency and availability to a real flow
do not use the term as decoration

Closing

CAP gets much easier when you change the question.

Instead of asking:

“Which two out of three do I choose?”

Ask:

“When the network breaks, would I rather wait for the right answer or keep responding with a risk of stale data?”

Quick summary

What to keep in your head

CAP only matters when there is a network failure between parts of the system.
In distributed systems, partition is not something you simply choose to ignore.
In practice, the useful choice is between waiting for a consistent answer and continuing to respond with a risk of stale data.
CAP does not explain every consistency problem; bad cache invalidation and replica lag may be a different conversation.

Practice checklist

Use this when you answer

Can I explain CAP without repeating 'two out of three' as empty trivia?
Can I explain what a network partition is in simple language?
Can I give one flow where I would rather wait and another where I would rather keep responding?
Can I separate CAP from other stale-data problems such as bad cache invalidation or replica lag?
