
Debugging Rounds: How to Investigate Broken Code Like a Real Engineer

How to investigate broken code with order, a hypothesis, and a clear next step under uncertainty.

Andrews Ribeiro


Founder & Engineer

The problem

Many people do well in theory and then freeze in a debugging round.

Not because they do not know how to debug.

But because, in the interview, chaos arrives compressed:

  • little context
  • time pressure
  • incomplete symptoms
  • the expectation to reason live

Then the answer degrades into one of two extremes:

  • a random list of tools
  • an early solution without real investigation

Neither of those signals seniority.

What this round is really measuring

In general, the interviewer is not asking:

“Do you know how to use Datadog, Kibana, Sentry, or tool X?”

What they want to know is something more structural:

  • can you create order in the middle of the mess?
  • can you distinguish symptom from cause?
  • can you reduce the search space?
  • do you know which question needs to be answered first?
  • can you change direction when the hypothesis falls apart?

In other words:

a debugging round measures method under uncertainty

Mental model

A strong answer usually follows this logic:

  1. define the symptom
  2. reduce the scope
  3. form an initial hypothesis
  4. choose the most useful signal to validate it
  5. say what the next step would be depending on the result

That sounds simple.

And that is the point.

A good debugging answer almost never looks magical. It looks organized.

How to structure the answer

1. Start with what is broken

Before talking about tools, say what you understand about the symptom.

Examples:

  • 500 errors are increasing
  • one specific flow is getting slow
  • a job is processing twice
  • the failure affects only part of the users

That opening shows you are not treating “debugging” like a generic verb.

You are starting from observable behavior.

2. Reduce scope as early as possible

Then narrow the problem.

Useful questions:

  • since when has it been happening?
  • did it start after a deploy, feature flag, or config change?
  • does it affect everyone or only one segment?
  • does it happen in one flow or many?
  • is it intermittent or constant?

A good debugging answer does not try to cover the whole system too early.

It keeps removing variables from the table.
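As a sketch, the scoping questions above can be turned into a tiny check against error events. Everything here is invented for illustration: the event shape, the segment names, and the deploy timestamp are assumptions, not any real tool's schema.

```python
from datetime import datetime, timedelta

# Hypothetical deploy time and error events; in a real incident
# these would come from your logging or APM tool.
DEPLOY_AT = datetime(2024, 5, 1, 14, 0)

errors = [
    {"at": DEPLOY_AT + timedelta(minutes=5),  "segment": "card_checkout"},
    {"at": DEPLOY_AT + timedelta(minutes=9),  "segment": "card_checkout"},
    {"at": DEPLOY_AT - timedelta(minutes=30), "segment": "pix_checkout"},
]

def reduce_scope(events, deploy_at):
    """Answer two scoping questions: are the errors concentrated
    after the deploy, and are they concentrated in one segment?"""
    after = [e for e in events if e["at"] >= deploy_at]
    segments = {e["segment"] for e in after}
    return {
        "concentrated_after_deploy": len(after) > len(events) - len(after),
        "single_segment": len(segments) == 1,
        "segments": segments,
    }

print(reduce_scope(errors, DEPLOY_AT))
```

The point is not the code itself; it is that each field removes a variable from the table before any tool gets opened.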

3. State the current hypothesis

This changes how the answer feels.

Instead of saying:

“I would open the logs, then check the database, then look at the queue”

say something like:

“If the problem started after the deploy and is concentrated only in card checkout, my initial hypothesis is a regression in that flow or in one dependency connected to it.”

Now there is an investigation line.

It is no longer a checklist. It is reasoning.

4. Choose the first signal, not your favorite tool

Tools matter.

But what communicates maturity is the criterion.

Examples of signals:

  • error rate by endpoint
  • correlation with a deploy
  • errors concentrated in one provider
  • latency increasing at one specific step
  • failed requests sharing the same pattern

The interviewer wants to hear:

“I would look at this signal because it helps me confirm or weaken this hypothesis”

That “because” carries a lot of weight.
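A minimal sketch of the first signal in that list, error rate by endpoint, computed over an invented request log. The field names and endpoints are assumptions for illustration, not a real tool's output:

```python
from collections import Counter

# Hypothetical request log; real data would come from access logs or metrics.
requests = [
    {"endpoint": "/checkout", "status": 500},
    {"endpoint": "/checkout", "status": 500},
    {"endpoint": "/checkout", "status": 200},
    {"endpoint": "/profile",  "status": 200},
    {"endpoint": "/profile",  "status": 200},
]

def error_rate_by_endpoint(log):
    """Error rate per endpoint: a signal that confirms or weakens the
    hypothesis that the failure is concentrated in one flow."""
    total, failed = Counter(), Counter()
    for r in log:
        total[r["endpoint"]] += 1
        if r["status"] >= 500:
            failed[r["endpoint"]] += 1
    return {ep: failed[ep] / total[ep] for ep in total}

rates = error_rate_by_endpoint(requests)
print(rates)  # the failures are concentrated in /checkout
```

If one endpoint concentrates the failures, the hypothesis gets stronger. If the errors are spread evenly, that same number weakens it, which is exactly what a good signal should do.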

5. Say what would make you change direction

A weak answer sounds attached to its first hypothesis.

A strong answer shows abandonment criteria.

Example:

“If I saw that the failure also happens in flows without this integration, I would drop that line and go back to something more cross-cutting, like config, authentication, or infrastructure.”

That shows flexibility without sounding lost.

Simple example

Question:

“One critical feature started failing in production. How would you investigate it?”

Weak answer:

“I would open the logs, check the metrics, maybe roll back, and then look at the database.”

None of that is necessarily wrong.

But the answer is still loose.

Stronger answer:

“I would start by defining the symptom more precisely: error, slowness, or duplication? Then I would reduce scope to understand when it started, whether it matches a deploy or recent change, and whether it affects all users or one segment. With that slice, I would form the most likely initial hypothesis and choose the most useful signal to confirm it before changing the system. If the impact were high and there were a safe reversible mitigation, I would also consider containment in parallel.”

That already communicates:

  • order
  • judgment
  • awareness of impact
  • investigation before guessing

The most common mistake

The most common mistake in a debugging round is answering as if the goal were to impress by naming many things.

But debugging is not a buffet.

If you talk about:

  • logs
  • traces
  • queues
  • database
  • cache
  • restart
  • rollback
  • timeout

all at once, the impression is not depth.

It is technical anxiety.

How a senior answers differently

The strongest answers usually create this feeling:

“I do not know the cause yet, but I know exactly how to reduce uncertainty without making the system messier.”

That is the signal.

Not omniscience.

Control.

Seniority in a debugging round appears when the answer:

  • prioritizes what needs to be understood first
  • avoids changing everything at once
  • separates investigation from mitigation
  • makes clear why the next step is the next step

A short framework to keep in mind

If you want one simple format to use live:

  1. “First I would define the symptom better.”
  2. “Then I would reduce the scope.”
  3. “With that, I would form the initial hypothesis.”
  4. “I would look for the most useful signal to confirm or weaken that hypothesis.”
  5. “Depending on the result, I would decide whether to keep investigating, mitigate, or change direction.”
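Step 5 is the one people forget under pressure, so it can help to hold it as a small decision rule. The branches and wording below are an illustrative sketch, not a fixed recipe:

```python
def next_step(signal_confirms, impact_high, safe_mitigation):
    """Decide the next move based on what the chosen signal showed.
    Illustrative only: real incidents have more branches than this."""
    if impact_high and safe_mitigation:
        # Contain in parallel; investigation continues either way.
        return "mitigate in parallel while investigating"
    if signal_confirms:
        return "keep investigating along the current hypothesis"
    return "change direction: drop the hypothesis and re-scope"
```

Saying the decision rule out loud, even informally, is what separates a sequence of steps from a checklist.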

That already solves a large part of the round.

Interview angle

When this kind of question appears, the interviewer is checking whether you can:

  • think with incomplete context
  • stay clear under pressure
  • create an investigation sequence
  • communicate the trade-off between investigating and containing

So it is worth remembering:

in a debugging round, sounding methodical matters more than sounding heroic

You can learn tools later.

Mental order is what makes the answer level up.
