September 6 2025
Workload Affinity Without Turning Scaling Into a Lottery
Not every workload should land on any worker at any time. Without some degree of affinity, the system wastes locality, heats up hotspots, and scales by luck.
Andrews Ribeiro
Founder & Engineer
3 min · Intermediate · Systems
The problem
Not all work is the same.
Some workloads benefit a lot when they stay close to:
- one key
- one partition
- one warm cache
- one already-established connection
- one already-loaded context
If the system distributes everything randomly all the time, every scale event becomes a bet:
- where will this work land?
- will it reuse anything or reheat everything?
- will it create a hotspot nobody predicted?
Scaling turns into a lottery.
Mental model
Workload affinity means keeping some stable proximity between the work and one useful resource.
That affinity can be by:
- tenant
- shard
- entity
- time partition
- work class
The goal is not to “pin it forever.”
The goal is to capture locality when that locality reduces enough cost to justify the extra control.
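The simplest form of that stable proximity is a deterministic mapping from affinity key to worker. A minimal sketch, assuming hypothetical worker names and an illustrative `affinity_worker` helper (none of these come from a real scheduler):

```python
import hashlib

def affinity_worker(affinity_key: str, workers: list[str]) -> str:
    """Deterministically map an affinity key (tenant, shard, entity,
    time partition, work class) to one worker, so repeated work for
    the same key keeps landing where its context already lives."""
    digest = hashlib.sha256(affinity_key.encode()).digest()
    return workers[int.from_bytes(digest[:8], "big") % len(workers)]

workers = ["worker-a", "worker-b", "worker-c"]

# The same key always routes to the same worker, so its warm cache
# and open connections get reused. Nothing is "pinned forever":
# this is a stable preference a scheduler can still override.
assert affinity_worker("tenant-42", workers) == affinity_worker("tenant-42", workers)
```

One caveat worth noticing: plain modulo remaps most keys whenever the worker list changes size, which is exactly why the techniques later in this article reach for consistent hashing instead.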
Simple example
Imagine recomputation jobs by tenant.
Each job needs to read:
- tenant configuration
- already-warmed local cache
- related connections and partitions
If each execution lands on a random worker:
- it warms context from scratch every time
- it competes for cache space with other tenants
- its behavior becomes more variable
If there is affinity by tenant or partition, the system reuses more context and becomes more predictable.
But if that tenant grows too much, the system also needs to rebalance.
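The warm-up cost above can be simulated with a toy model. In this sketch (tenant counts, worker names, and the warm-up rule are all made up for illustration), a worker pays a context warm-up the first time it sees a tenant; we compare random placement against tenant affinity:

```python
import random
from collections import defaultdict

def run_jobs(assign, jobs, rng):
    """Count context warm-ups: a worker warms a tenant's context
    only on the first job it receives for that tenant."""
    warmed = defaultdict(set)   # worker -> tenants with warm context
    warmups = 0
    for tenant in jobs:
        worker = assign(tenant, rng)
        if tenant not in warmed[worker]:
            warmed[worker].add(tenant)
            warmups += 1
    return warmups

workers = ["w0", "w1", "w2", "w3"]
jobs = [f"tenant-{i % 5}" for i in range(200)]   # 5 tenants, 200 jobs
rng = random.Random(0)

# Random placement: each tenant eventually warms context on many workers.
random_warmups = run_jobs(lambda t, r: r.choice(workers), jobs, rng)

# Tenant affinity: each tenant warms context exactly once, on one worker.
# (hash() is salted per process but stable within a run.)
pinned_warmups = run_jobs(lambda t, r: workers[hash(t) % len(workers)], jobs, rng)
```

With affinity there are exactly 5 warm-ups, one per tenant; random placement trends toward one warm-up per (tenant, worker) pair, i.e. up to 20, and its timing is far less predictable.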
The common mistake
The common mistake is going to one of the extremes:
- everything random
- everything rigidly pinned
In the first case, you lose useful locality.
In the second case, you create:
- a fixed hotspot
- difficulty rebalancing
- a loud failure when one node dies
Another common mistake is adopting affinity without knowing what is being preserved.
If there is no clear gain in cache, context, or partition locality, maybe you are only making scaling harder.
What usually helps
It usually helps to answer four questions:
- what is the natural affinity key?
- what real gain does it bring?
- when is the system allowed to break that affinity?
- how does redistribution happen without chaos?
In practice, that often appears as:
- partitioning by key
- consistent hashing
- preferred workers with fallback
- controlled rebalance
- skew observability
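Consistent hashing and "preferred workers with fallback" combine naturally: walk the ring from the key's position, and the first worker you meet is the affinity target while the rest form the fallback order. A minimal sketch, assuming an illustrative `Ring` class and an arbitrary virtual-node count:

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring with virtual nodes. The preference order
    for a key gives a preferred worker plus fallbacks for when the
    preferred one is hot or down."""

    def __init__(self, workers, vnodes=64):
        # Each worker owns many points on the ring so load spreads evenly.
        self._points = sorted(
            (self._hash(f"{w}#{i}"), w)
            for w in workers
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def preference(self, key: str) -> list[str]:
        """Distinct workers in ring order starting at the key's position:
        index 0 is the preferred worker, the rest are fallbacks."""
        start = bisect.bisect(self._points, (self._hash(key),))
        seen, order = set(), []
        for _, w in self._points[start:] + self._points[:start]:
            if w not in seen:
                seen.add(w)
                order.append(w)
        return order

ring = Ring(["w0", "w1", "w2"])
preferred, *fallbacks = ring.preference("tenant-42")
```

The design point: adding or removing one worker only remaps the keys whose ring segment that worker owned, so a controlled rebalance moves a slice of the keyspace instead of reshuffling everything.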
Good affinity improves predictability.
Bad affinity only ties the system to an arbitrary distribution.
How a senior thinks
Engineers who have already suffered through unpredictable scaling often ask:
- which context is actually worth keeping together?
- how much skew do I accept before I rebalance?
- if one worker disappears, does the work fit somewhere else cleanly?
- am I using affinity to gain locality or to hide some other inefficiency?
That conversation avoids a backend that scales in replica count but not in behavior.
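The skew question ("how much skew do I accept before I rebalance?") can be made concrete with a simple metric and threshold. A sketch, where the 2.0 threshold is an arbitrary placeholder you would tune against real load data:

```python
def skew(load_by_worker: dict[str, float]) -> float:
    """Max worker load divided by mean load; 1.0 means perfectly even."""
    loads = list(load_by_worker.values())
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0

def should_rebalance(load_by_worker: dict[str, float], threshold: float = 2.0) -> bool:
    """Break affinity for the hottest keys once skew passes the threshold."""
    return skew(load_by_worker) > threshold

# A hot tenant pushes one worker far above the mean:
# skew = 900 / (1100 / 3) ≈ 2.45, so this triggers a rebalance.
hot = {"w0": 900, "w1": 100, "w2": 100}
even = {"w0": 100, "w1": 100, "w2": 100}
```

Observing this number over time is what turns "affinity" from a frozen assignment into a policy: keep locality while it is cheap, and deliberately give some of it up when one worker gets hot.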
Interview angle
This topic appears in queues, jobs, multi-tenant systems, caching, and partitioning.
The interviewer wants to see whether you understand:
- that not every distribution needs to be fully random
- that locality is also an architectural decision
- that useful affinity must coexist with rebalance and fault tolerance
A strong answer often sounds like this:
“I would use affinity when it preserves context or partition locality in a measurable way, but I would make clear how the system rebalances and breaks affinity when one node gets hot or disappears. Otherwise we are just trading randomness for brittle rigidity.”
Direct takeaway
Scaling should not depend on luck.
If locality matters, affinity needs to be designed.
Quick summary
What to keep in your head
- Distributing everything completely at random may be simple, but sometimes it wastes locality and increases variability.
- Affinity makes sense when keeping context, cache, or partition nearby reduces real cost.
- Affinity that is too rigid creates hotspots and fragility. Good affinity still accepts rebalance.
- Healthy scaling should not depend on luck for work to land in the right place.
Practice checklist
Use this when you answer
- Can I say what real gain exists in keeping this workload near one key, partition, or worker?
- Can I rebalance when one node gets hot or disappears?
- Is my system using affinity for useful locality or only to hide a structural problem?
- If I shut one worker down right now, does the workload keep moving without unpredictable behavior?