Async Bugs and Race Conditions

How to make timing failures easier to understand by making ordering, concurrency, and shared state visible.

Andrews Ribeiro

Founder & Engineer

The problem

Race conditions feel scary because they rarely fail the exact same way twice.

The code works locally, fails in production, stops failing when you add a console.log, and only breaks when two responses land in one specific order.

That makes a lot of people treat async bugs like bad luck, when the real problem is usually much simpler: the system has no solid rule for which result is still allowed to update shared state.

Mental model

When you are hunting an async bug, looking only at “what the code does” is not enough.

You also need to look at:

  • the timeline of events
  • which operation finished before the other
  • whether the state was still valid when the result arrived

Once the investigation shifts from “reading lines of code” to “drawing the sequence of events,” the bug usually stops feeling like a ghost.

It also helps to replace a bad sentence with a better one:

  • bad: “the app went weird”
  • better: “two operations finished in an order the UI was not prepared to handle”

Breaking it down

A practical way to investigate this kind of bug looks like this:

  1. list the concurrent events involved
  2. draw the order in which they can finish
  3. find the point where two operations compete over the same state
  4. identify the missing guarantee: cancellation, locking, request versioning, or final validation

That turns a “random bug” into a predictable collision.

This matters because concurrency does not mean total chaos. It means there are multiple valid timelines, and your code has to stay correct in every one of them.
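
One way to make "multiple valid timelines" concrete is to enumerate them. The sketch below is a minimal model, not a real tool: the Arrival type, the seq field, and the "last write wins" handler are all assumptions made for illustration. It lists every order in which a set of operations can finish and reports the orders that leave shared state holding something other than the newest result.

```typescript
// Hypothetical model: each operation has an id and the order it was issued in.
type Arrival = { id: string; seq: number };

// Every order in which the operations could finish.
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items.slice()];
  const result: T[][] = [];
  items.forEach((item, i) => {
    const rest = [...items.slice(0, i), ...items.slice(i + 1)];
    for (const tail of permutations(rest)) result.push([item, ...tail]);
  });
  return result;
}

// Naive handler: whichever result arrives last overwrites shared state.
function lastWriteWins(order: Arrival[]): number {
  let state = -1;
  for (const arrival of order) state = arrival.seq;
  return state;
}

// Timelines where the final state is not the most recently issued operation.
function findInvalidTimelines(ops: Arrival[]): string[] {
  const newest = Math.max(...ops.map((op) => op.seq));
  return permutations(ops)
    .filter((order) => lastWriteWins(order) !== newest)
    .map((order) => order.map((op) => op.id).join(" -> "));
}

const invalidTimelines = findInvalidTimelines([
  { id: "A", seq: 1 },
  { id: "B", seq: 2 },
]);
console.log(invalidTimelines); // [ 'B -> A' ]
```

With two operations there are only two timelines, and exactly one of them is the collision. The list grows quickly with more operations, which is why writing it down tends to beat guessing.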

Simple example

Imagine an autocomplete input:

  • the user types re
  • request A is sent
  • the user keeps typing and reaches react
  • request B is sent
  • request B returns first and shows the correct results
  • request A returns later and overwrites the UI with stale data

The problem is not fetch.

The problem is that the frontend accepted an old response as if it were still the current truth.
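
The timeline above can be reproduced on purpose instead of waiting for it to happen. This is a minimal sketch under stated assumptions: fakeSearch is a hypothetical stand-in for the real request, and the 50 ms and 10 ms delays are arbitrary values chosen to force request A to land after request B.

```typescript
// Hypothetical stand-in for a search request: resolves after delayMs.
function fakeSearch(query: string, delayMs: number): Promise<string[]> {
  return new Promise((resolve) =>
    setTimeout(() => resolve([`results for "${query}"`]), delayMs),
  );
}

async function reproduceStaleOverwrite(): Promise<string[]> {
  let shown: string[] = []; // shared UI state
  const render = (results: string[]) => {
    shown = results; // no check that the results are still current
  };

  // Request A ("re") is slow, request B ("react") is fast.
  const a = fakeSearch("re", 50).then(render);
  const b = fakeSearch("react", 10).then(render);

  await Promise.all([a, b]);
  return shown; // what the user ends up seeing
}

reproduceStaleOverwrite().then((shown) => console.log(shown));
// [ 'results for "re"' ]  (the stale response won)
```

Once the ordering is forced like this, the bug fails the same way every time, which makes any fix easy to verify.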

Good fixes here are straightforward:

  • cancel the earlier request with AbortController
  • ignore responses with an outdated request ID
  • only update the UI if the response still matches the current input

None of these fixes exist to make the request “faster.” They exist to stop old state from winning after the world has already changed.
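
A rough sketch of how the first two fixes might be combined follows. The names createLatestOnlySearch, Search, and fakeSearch are hypothetical, and the fake search only simulates latency. A real version would probably pass the signal to fetch.

```typescript
// Hypothetical search signature; a real one might call fetch(url, { signal }).
type Search = (query: string, signal: AbortSignal) => Promise<string[]>;

function createLatestOnlySearch(
  search: Search,
  render: (results: string[]) => void,
) {
  let latestRequestId = 0;
  let controller: AbortController | null = null;

  return async function run(query: string): Promise<void> {
    const requestId = ++latestRequestId;
    controller?.abort(); // cancel the earlier request, if there is one
    controller = new AbortController();
    try {
      const results = await search(query, controller.signal);
      if (requestId !== latestRequestId) return; // outdated: ignore it
      render(results);
    } catch (error) {
      if (requestId !== latestRequestId) return; // failure of a stale request
      throw error; // failure of the current request still matters
    }
  };
}

// Simulated latency: "re" is slow, everything else is fast (assumed values).
const fakeSearch: Search = (query, signal) =>
  new Promise((resolve, reject) => {
    const delayMs = query === "re" ? 50 : 10;
    const timer = setTimeout(() => resolve([`results for "${query}"`]), delayMs);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer);
        reject(new Error("aborted"));
      },
      { once: true },
    );
  });

async function demoLatestOnly(): Promise<string[][]> {
  const rendered: string[][] = [];
  const run = createLatestOnlySearch(fakeSearch, (results) => {
    rendered.push(results);
  });
  await Promise.all([run("re"), run("react")]);
  return rendered;
}

demoLatestOnly().then((rendered) => console.log(rendered));
// [ [ 'results for "react"' ] ]
```

The request ID check is the part that guarantees correctness. The abort is an optimization on top of it, since a response can already be in flight when the cancellation happens.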

Common mistakes

  • trying to reproduce the bug by random clicking without mapping the timeline first
  • putting a setTimeout on top of the problem and hoping it disappears
  • assuming “async” means random and impossible to fix
  • forgetting that two perfectly valid responses can still break the UI if they arrive in the wrong order

How a senior thinks

More experienced engineers do not call an async bug flaky by reflex.

They draw the timeline and ask:

What sequence of events puts this system into an invalid state?

That question pulls the discussion out of superstition and back into causality.

Another useful question usually follows:

What guarantee is missing that should stop old state from becoming valid again?

Sometimes the answer is cancellation. Sometimes it is idempotency. Sometimes it is a lock. Sometimes it is just checking whether the state is still current before applying the result.
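
When the answer is a lock, a small promise-based mutex is often enough in single-threaded JavaScript. The sketch below is one possible shape and makes no claim to be a standard API: Mutex, runExclusive, and the counter demo are illustrative names. It shows a read-modify-write that loses an update without the lock and keeps both updates with it.

```typescript
// Minimal promise-chain mutex: tasks run one at a time, in call order.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Keep the chain usable even if a task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demoLock(): Promise<{ unlocked: number; locked: number }> {
  const counter = { value: 0 };
  const increment = async () => {
    const current = counter.value; // read
    await sleep(5); // another task can interleave here
    counter.value = current + 1; // write based on a possibly stale read
  };

  // Without a lock, both tasks read 0 and both write 1.
  await Promise.all([increment(), increment()]);
  const unlocked = counter.value;

  // With the lock, the second task starts only after the first has written.
  counter.value = 0;
  const mutex = new Mutex();
  await Promise.all([
    mutex.runExclusive(increment),
    mutex.runExclusive(increment),
  ]);
  const locked = counter.value;

  return { unlocked, locked };
}

demoLock().then((result) => console.log(result));
// { unlocked: 1, locked: 2 }
```

A lock serializes the competing operations, so it trades throughput for a simpler timeline. For the autocomplete case, where only the newest result matters, a currency check is usually the cheaper guarantee.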

What the interviewer wants to see

In frontend or systems interviews, concurrency reveals depth very quickly.

  • You understand that concurrency makes execution order less predictable.
  • You look for collision points over shared mutable state.
  • You talk about architectural guarantees, not just adding more await.

A strong answer often sounds like this:

I would draw the timeline and figure out which response or operation arrived too late but still managed to write into shared state. From there I would choose the right guarantee: cancellation, locking, versioning, request IDs, or a final validation check.

A race condition is not bad luck. It is a collision the architecture still does not know how to survive.
