
Request Cache vs Shared Cache Without Mixing Layers

When the team calls any kind of data reuse a cache, it starts mixing local deduplication, request memoization, and shared cache as if they were the same thing.

Andrews Ribeiro

Founder & Engineer

The problem

“Cache” becomes an elastic word too quickly.

The team sees the same data being fetched more than once and concludes:

  • “let’s add cache”

But it almost never stops to ask:

  • cache where?
  • for how long?
  • visible to whom?
  • invalidated by whom?

Without that, two very different things get mixed together:

  • reusing data inside the same request
  • reusing data across different requests

Mental model

Request cache usually means:

  • local memoization
  • deduplication inside one execution
  • avoiding the same fetch two or three times in the same flow

Shared cache usually means:

  • Redis
  • memory shared across instances
  • value reused by multiple requests and multiple processes

Both can be useful.

But they do not solve the same problem.

When request cache makes more sense

Request cache helps when:

  • the same request touches the same data more than once
  • multiple parts of the flow ask for the same entity
  • you want to reduce repeated work without creating external state

Simple example:

a request loads the current user in middleware, then the use case asks for the same user, then a permission check asks again.

That does not require Redis.

It only requires one local reuse layer during execution.
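
A minimal sketch of that layer in TypeScript. Everything here is illustrative: the `User` shape and `findUserById` stand in for your real types and data access.

```ts
type User = { id: string; name: string };

// Hypothetical data access call standing in for a real database query.
async function findUserById(id: string): Promise<User> {
  return { id, name: `user-${id}` };
}

// One instance per request, created in middleware and discarded with it.
class RequestCache {
  // Caching the promise (not the resolved value) also deduplicates
  // concurrent calls that start before the first fetch resolves.
  private users = new Map<string, Promise<User>>();

  getUser(id: string): Promise<User> {
    let hit = this.users.get(id);
    if (!hit) {
      hit = findUserById(id);
      this.users.set(id, hit);
    }
    return hit;
  }
}

// Middleware, use case, and permission check all share the same Map:
async function handleRequest(): Promise<void> {
  const cache = new RequestCache();
  await cache.getUser("42"); // fetches
  await cache.getUser("42"); // reuses, no second query
}
```

Because the cache dies with the request, there is nothing to invalidate.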

The gain here is usually:

  • simplicity
  • zero distributed invalidation
  • less internal repetition

When shared cache enters the picture

Shared cache starts making sense when:

  • multiple requests repeat one expensive read
  • the source cost is too high
  • the data tolerates controlled staleness
  • several instances need to reuse the same value

Examples:

  • configuration read all the time
  • low-churn catalog
  • expensive dashboard projection
  • aggregated response that does not need perfect real-time accuracy

Here, a different set of questions appears:

  • how old can this data get?
  • what triggers invalidation?
  • what happens if the cache disappears?
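
One common shape that answers those questions explicitly is cache-aside with a TTL. A sketch in TypeScript, assuming the ioredis client; the key, the 30-second TTL, and the `loadDashboard` loader are illustrative assumptions, not a prescription:

```ts
import Redis from "ioredis";

const redis = new Redis(); // shared by every instance of the service

// Hypothetical expensive read standing in for a real projection query.
async function loadDashboard(): Promise<object> {
  return { orders: 120, revenue: 9800 };
}

// Cache-aside: try the shared cache, fall back to the source, repopulate.
async function getDashboard(): Promise<object> {
  const key = "dashboard:v1";
  try {
    const hit = await redis.get(key);
    if (hit !== null) return JSON.parse(hit);
  } catch {
    // Cache unavailable: fall through to the source. Slower, still correct.
  }
  const fresh = await loadDashboard();
  // The TTL is the explicit answer to "how old can this data get?"
  await redis.set(key, JSON.stringify(fresh), "EX", 30).catch(() => {});
  return fresh;
}
```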

The common mistake

The common mistake is using shared cache to solve a local problem.

Example:

  • the same request makes three identical queries
  • instead of fixing the flow or using request cache, the team inserts Redis in the middle

That creates:

  • serialization
  • network hops
  • keys
  • invalidation
  • extra observability

to solve repetition that lived inside the same process.

The opposite mistake also exists.

Some systems try to reuse global data through a local process variable, as if that were enough in an environment with several instances.
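
A sketch of that second mistake, with hypothetical names. Each process keeps its own copy, so instances drift apart and nothing invalidates them together:

```ts
// Anti-pattern: a module-level variable pretending to be a shared cache.
let cachedFlags: Record<string, boolean> | null = null;

// Hypothetical loader standing in for a real settings read.
async function loadFeatureFlags(): Promise<Record<string, boolean>> {
  return { newCheckout: true };
}

async function getFeatureFlags(): Promise<Record<string, boolean>> {
  if (cachedFlags === null) {
    cachedFlags = await loadFeatureFlags();
  }
  // With N instances there are N independent copies: an update
  // invalidates none of them, and each instance serves its own
  // stale value until it restarts.
  return cachedFlags;
}
```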

Simple example

Imagine an orders listing API.

Inside the same request:

  • you need to translate a customerId into a customer name in several places

Maybe request cache is already enough to avoid fetching the same customer five times.

Now imagine:

  • the home page asks for the same operational indicators every few seconds

Then a shared cache with a short TTL may make sense.
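
Reusing the cache-aside shape from the earlier sketch (same assumed `redis` client), with a TTL matched to how often the page polls:

```ts
// Hypothetical aggregation behind the home page indicators.
async function loadIndicators(): Promise<object> {
  return { openOrders: 7, failedJobs: 0 };
}

// A few seconds of staleness is acceptable here, so the TTL is short.
async function getIndicators(): Promise<object> {
  const hit = await redis.get("indicators:v1");
  if (hit !== null) return JSON.parse(hit);
  const fresh = await loadIndicators();
  await redis.set("indicators:v1", JSON.stringify(fresh), "EX", 5);
  return fresh;
}
```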

Those are different problems.

Treating both as just “cache” without context only hides the real decision.

How a senior thinks

Engineers who make better choices usually ask:

  • does the reuse need to live only during this request?
  • can the value be shared across requests?
  • can the data get slightly old?
  • who invalidates it and with what semantics?
  • am I improving architecture or only hiding a bad query?

Those questions prevent automatic Redis-for-everything decisions.

Interview angle

This topic appears in performance, backend, and system design interviews.

The interviewer usually wants to see whether you distinguish:

  • local deduplication
  • distributed cache
  • consistency
  • operational cost

A strong answer often sounds like this:

“If the problem is repetition inside the same request, I would start with local cache for that execution. I would leave shared cache for repeated reads across requests, when the origin is expensive and the invalidation semantics are clear.”

Direct takeaway

Good cache is not the one that appears earlier.

It is the one that solves the right repetition in the right layer.
