
Debugging Rounds: How to Investigate Broken Code Like a Real Engineer

How to investigate broken code with order, a hypothesis, and a clear next step under uncertainty.

Andrews Ribeiro


Founder & Engineer

The problem

Many people do well in theory and then freeze in a debugging round.

Not because they do not know how to debug.

But because, in the interview, chaos arrives compressed:

  • little context
  • time pressure
  • incomplete symptoms
  • the expectation to reason live

Then the answer degrades into one of two extremes:

  • a random list of tools
  • an early solution without real investigation

Neither of those signals seniority.

What this round is really measuring

In general, the interviewer is not asking:

“Do you know how to use Datadog, Kibana, Sentry, or tool X?”

What they want to know is something more structural:

  • can you create order in the middle of the mess?
  • can you distinguish symptom from cause?
  • can you reduce the search space?
  • do you know which question needs to be answered first?
  • can you change direction when the hypothesis falls apart?

In other words:

a debugging round measures method under uncertainty

Mental model

A strong answer usually follows this logic:

  1. define the symptom
  2. reduce the scope
  3. form an initial hypothesis
  4. choose the most useful signal to validate it
  5. say what the next step would be depending on the result

That sounds simple.

And that is the point.

A good debugging answer almost never looks magical. It looks organized.

How to structure the answer

1. Start with what is broken

Before talking about tools, say what you understand about the symptom.

Examples:

  • 500 errors are increasing
  • one specific flow is getting slow
  • a job is processing twice
  • the failure affects only part of the users

That opening shows you are not treating “debugging” like a generic verb.

You are starting from observable behavior.

2. Reduce scope as early as possible

Then narrow the problem.

Useful questions:

  • since when has it been happening?
  • did it start after a deploy, feature flag, or config change?
  • does it affect everyone or only one segment?
  • does it happen in one flow or many?
  • is it intermittent or constant?

A good debugging answer does not try to cover the whole system too early.

It keeps removing variables from the table.
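As a sketch, the scoping questions above can be turned into a tiny check against error events. Everything here is invented for illustration: the event shape, the segment names, and the deploy timestamp are assumptions, not any real tool's schema.

```python
from datetime import datetime, timedelta

# Hypothetical deploy time and error events; in a real incident
# these would come from your logging or APM tool.
DEPLOY_AT = datetime(2024, 5, 1, 14, 0)

errors = [
    {"at": DEPLOY_AT + timedelta(minutes=5),  "segment": "card_checkout"},
    {"at": DEPLOY_AT + timedelta(minutes=9),  "segment": "card_checkout"},
    {"at": DEPLOY_AT - timedelta(minutes=30), "segment": "pix_checkout"},
]

def reduce_scope(events, deploy_at):
    """Answer two scoping questions: are the errors concentrated
    after the deploy, and are they concentrated in one segment?"""
    after = [e for e in events if e["at"] >= deploy_at]
    segments = {e["segment"] for e in after}
    return {
        "concentrated_after_deploy": len(after) > len(events) - len(after),
        "single_segment": len(segments) == 1,
        "segments": segments,
    }

print(reduce_scope(errors, DEPLOY_AT))
```

The point is not the code itself; it is that each field removes a variable from the table before any tool gets opened.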

3. State the current hypothesis

This changes how the answer feels.

Instead of saying:

“I would open the logs, then check the database, then look at the queue”

say something like:

“If the problem started after the deploy and is concentrated only in card checkout, my initial hypothesis is a regression in that flow or in one dependency connected to it.”

Now there is an investigation line.

It is no longer a checklist. It is reasoning.

4. Choose the first signal, not your favorite tool

Tools matter.

But what communicates maturity is the criterion.

Examples of signals:

  • error rate by endpoint
  • correlation with a deploy
  • errors concentrated in one provider
  • latency increasing at one specific step
  • failed requests sharing the same pattern

The interviewer wants to hear:

“I would look at this signal because it helps me confirm or weaken this hypothesis”

That “because” carries a lot of weight.
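A minimal sketch of the first signal in that list, error rate by endpoint, computed over an invented request log. The field names and endpoints are assumptions for illustration, not a real tool's output:

```python
from collections import Counter

# Hypothetical request log; real data would come from access logs or metrics.
requests = [
    {"endpoint": "/checkout", "status": 500},
    {"endpoint": "/checkout", "status": 500},
    {"endpoint": "/checkout", "status": 200},
    {"endpoint": "/profile",  "status": 200},
    {"endpoint": "/profile",  "status": 200},
]

def error_rate_by_endpoint(log):
    """Error rate per endpoint: a signal that confirms or weakens the
    hypothesis that the failure is concentrated in one flow."""
    total, failed = Counter(), Counter()
    for r in log:
        total[r["endpoint"]] += 1
        if r["status"] >= 500:
            failed[r["endpoint"]] += 1
    return {ep: failed[ep] / total[ep] for ep in total}

rates = error_rate_by_endpoint(requests)
print(rates)  # the failures are concentrated in /checkout
```

If one endpoint concentrates the failures, the hypothesis gets stronger. If the errors are spread evenly, that same number weakens it, which is exactly what a good signal should do.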

5. Say what would make you change direction

A weak answer sounds attached to its first hypothesis.

A strong answer shows abandonment criteria.

Example:

“If I saw that the failure also happens in flows without this integration, I would drop that line and go back to something more cross-cutting, like config, authentication, or infrastructure.”

That shows flexibility without sounding lost.

Simple example

Question:

“One critical feature started failing in production. How would you investigate it?”

Weak answer:

“I would open the logs, check the metrics, maybe roll back, and then look at the database.”

None of that is necessarily wrong.

But the answer is still loose.

Stronger answer:

“I would start by defining the symptom more precisely: error, slowness, or duplication? Then I would reduce scope to understand when it started, whether it matches a deploy or recent change, and whether it affects all users or one segment. With that slice, I would form the most likely initial hypothesis and choose the most useful signal to confirm it before changing the system. If the impact were high and there were a safe reversible mitigation, I would also consider containment in parallel.”

That already communicates:

  • order
  • judgment
  • awareness of impact
  • investigation before guessing

The most common mistake

The most common mistake in a debugging round is answering as if the goal were to impress by naming many things.

But debugging is not a buffet.

If you talk about:

  • logs
  • traces
  • queues
  • database
  • cache
  • restart
  • rollback
  • timeout

all at once, the impression is not depth.

It is technical anxiety.

How a senior answers differently

The strongest answers usually create this feeling:

“I do not know the cause yet, but I know exactly how to reduce uncertainty without making the system messier.”

That is the signal.

Not omniscience.

Control.

Seniority in a debugging round appears when the answer:

  • prioritizes what needs to be understood first
  • avoids changing everything at once
  • separates investigation from mitigation
  • makes clear why the next step is the next step

A short framework to keep in mind

If you want one simple format to use live:

  1. “First I would define the symptom better.”
  2. “Then I would reduce the scope.”
  3. “With that, I would form the initial hypothesis.”
  4. “I would look for the most useful signal to confirm or weaken that hypothesis.”
  5. “Depending on the result, I would decide whether to keep investigating, mitigate, or change direction.”
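Step 5 is the one people forget under pressure, so it can help to hold it as a small decision rule. The branches and wording below are an illustrative sketch, not a fixed recipe:

```python
def next_step(signal_confirms, impact_high, safe_mitigation):
    """Decide the next move based on what the chosen signal showed.
    Illustrative only: real incidents have more branches than this."""
    if impact_high and safe_mitigation:
        # Contain in parallel; investigation continues either way.
        return "mitigate in parallel while investigating"
    if signal_confirms:
        return "keep investigating along the current hypothesis"
    return "change direction: drop the hypothesis and re-scope"
```

Saying the decision rule out loud, even informally, is what separates a sequence of steps from a checklist.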

That already solves a large part of the round.

Interview angle

When this kind of question appears, the interviewer is checking whether you can:

  • think with incomplete context
  • stay clear under pressure
  • create an investigation sequence
  • communicate the trade-off between investigating and containing

So it is worth remembering:

in a debugging round, sounding methodical matters more than sounding heroic

You can learn tools later.

Mental order is what makes the answer level up.
