March 20, 2025

Logs and Observability Without Noise

How to write logs that actually help during an investigation instead of flooding the system with expensive text.

Andrews Ribeiro

Founder & Engineer

3 min Intermediate Systems

#debugging-production #debugging #logs #observability

The problem

When teams realize they are blind in production, the reflex is often to log everything.

The result is predictable:

more text
more cost
less clarity
a slower investigation anyway

The on-call engineer opens the dashboard and gets a wall of noise instead of help.

Mental model

A good log is not the one that prints the most.

A good log helps someone answer, quickly:

what failed
in which flow
with what context
since when
with what impact

In other words:

Useful logs are investigation tools, not runtime panic translated into text.

Breaking it down

Log important boundaries

Not every internal function deserves a log.

The places that usually give the best signal are:

request entry
external dependency calls
meaningful business errors
job completion
important flow decisions

That reduces noise and increases the density of useful information.
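As a minimal sketch of boundary logging, the helper below wraps one external dependency call and emits exactly one line per call: success or failure, with latency. The helper name call_with_boundary_log, the logger name "checkout", and the event names are hypothetical choices for illustration, not something the article prescribes.

```python
import logging
import time

logger = logging.getLogger("checkout")  # hypothetical logger name


def call_with_boundary_log(dependency, operation, func):
    """Run func() and emit one log line describing the dependency call."""
    start = time.monotonic()
    try:
        result = func()
    except Exception as exc:
        latency_ms = int((time.monotonic() - start) * 1000)
        # One error line at the boundary; the exception still propagates.
        logger.error(
            "dependency_call_failed dependency=%s operation=%s error=%s latency_ms=%d",
            dependency, operation, type(exc).__name__, latency_ms,
        )
        raise
    latency_ms = int((time.monotonic() - start) * 1000)
    logger.info(
        "dependency_call_ok dependency=%s operation=%s latency_ms=%d",
        dependency, operation, latency_ms,
    )
    return result
```

The internal functions that func() calls stay silent, so the boundary line is the one place to look when that dependency misbehaves.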

Add context that helps connect events

A vague message like "error while processing" usually forces someone to open the code in the middle of an incident.

Better context looks like:

request_id
user_id or tenant_id when appropriate
flow name
dependency involved
returned status
latency

Context turns a log into an actual clue.
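One possible way to attach that context without passing it through every function is a context variable plus a logging filter. The sketch below uses only the Python standard library; request_id_var, RequestContextFilter, build_logger, and handle_request are hypothetical names, and the field values are made up for illustration.

```python
import contextvars
import logging

# Holds the current request's id; "-" marks lines logged outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestContextFilter(logging.Filter):
    """Copy the current request_id onto every record that passes through."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(name, stream=None):
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.addFilter(RequestContextFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(message)s request_id=%(request_id)s")
    )
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, request_id):
    token = request_id_var.set(request_id)
    try:
        # Code anywhere below this point logs with the same request_id.
        logger.info("payment_authorized flow=checkout dependency=stripe status=200 latency_ms=120")
    finally:
        request_id_var.reset(token)
```

With this in place, a single search for one request_id returns every line that request produced inside the process.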

Design for search, not just reading

Production logs are rarely read line by line.

They need to be searchable.

That is why structured fields and consistent names help more than clever wording.

The person investigating wants to filter, group, and correlate.

If they still need to open the code just to understand which flow failed, the log carries too little context.
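A sketch of what structured fields can look like in practice, assuming the log backend accepts one JSON object per line. JsonFormatter and log_event are hypothetical helpers built on the standard logging module; the field names are illustrative.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with stable field names."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} become top-level keys.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def log_event(logger, level, event, **fields):
    """Log a short event name plus searchable key/value fields."""
    logger.log(level, event, extra={"fields": fields})
```

A call such as log_event(logger, logging.ERROR, "checkout_failed", flow="checkout", provider="stripe", status="timeout") then yields a line that can be filtered by flow, provider, or status without parsing free text.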

Separate signal from spam

If everything is logged as an error, nothing feels urgent anymore.

If everything becomes a log, nothing stands out.

Use level and frequency deliberately.

Logging every loop iteration might calm you during local development.

In production, it usually becomes pollution.

It is also worth separating useful logs from debugging curiosity. Not everything that helps you locally deserves to become permanent production cost.
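One way to control frequency deliberately is to throttle chatty low-level lines while never dropping warnings or errors. The filter below is a hypothetical sketch on top of the standard logging module; it assumes message templates are plain strings, and the limit and window values are up to the caller.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Let through at most `limit` records per message template per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        super().__init__()
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._windows = {}  # message template -> (window_start, count)

    def filter(self, record):
        # Warnings and errors are signal; only throttle INFO and below.
        if record.levelno >= logging.WARNING:
            return True
        now = self.clock()
        start, count = self._windows.get(record.msg, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # a new window begins
        self._windows[record.msg] = (start, count + 1)
        return count < self.limit
```

Attached to a handler with handler.addFilter(RateLimitFilter(limit=5, window_seconds=60)), a per-iteration line inside a loop is emitted a handful of times per minute instead of thousands.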

Simple example

Compare these two cases.

Bad log:

Error happened

Better log:

checkout_failed request_id=9f2 order_id=8342 provider=stripe status=timeout latency_ms=4500 retryable=true

The second one is not better because it has more words.

It is better because the person opening the logs can already answer:

which flow failed
which dependency was involved
whether it looks like a timeout
whether a retry might help

That means the investigation starts closer to the cause.
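For illustration, a line in that shape can be produced by a very small helper. format_event is a hypothetical function, not part of any library, and this sketch assumes values contain no spaces (a real implementation would need quoting or a JSON encoder).

```python
def format_event(event, **fields):
    """Render an event name followed by key=value pairs in call order."""
    parts = [event]
    for key, value in fields.items():
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true", as in the example line
        parts.append(f"{key}={value}")
    return " ".join(parts)


line = format_event(
    "checkout_failed",
    request_id="9f2",
    order_id=8342,
    provider="stripe",
    status="timeout",
    latency_ms=4500,
    retryable=True,
)
```

Here line holds the same text as the better log above. Keeping the event name short and stable, and putting everything variable into fields, is what makes the line searchable.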

Common mistakes

logging too much state while still hiding the core information
writing vague messages that do not explain the event
swallowing the original stack trace or error code
failing to carry request_id or similar context across services
treating logs as a replacement for actual investigative thinking
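The stack trace mistake is usually cheap to avoid. As a sketch in Python, logger.exception records the active traceback, and raise ... from keeps the original error attached to the new one. OrderLookupError, load_order, and the dictionary-backed store are hypothetical, chosen only to keep the example self-contained.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


class OrderLookupError(Exception):
    """Domain-level error raised to callers."""


def load_order(store, order_id, request_id):
    try:
        return store[order_id]
    except KeyError as exc:
        # logger.exception logs at ERROR level and appends the active traceback.
        logger.exception(
            "order_lookup_failed request_id=%s order_id=%s", request_id, order_id
        )
        # "from exc" keeps the original error reachable via __cause__.
        raise OrderLookupError(f"order {order_id} not found") from exc
```

The caller sees a meaningful business error, and the log still contains the original exception type and where it was raised.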

How a senior thinks

More experienced engineers write logs with the worst possible moment in mind.

The reasoning sounds like this:

If this breaks at 3 AM, what would I want to find in one search so I can understand the problem without opening ten files?

That question improves system quality a lot, because the log stops being an application autobiography and becomes an operational tool.

What the interviewer wants to see

In interviews, this topic usually tests production maturity.

The interviewer wants to see whether you:

think in terms of context and traceability
know where logging is useful
avoid useless noise
understand that observability exists to reduce uncertainty

A strong answer often sounds like this:

I would design logs around the questions I need to answer during an incident. Instead of printing everything, I would prioritize important boundaries and structured context so I can filter by flow, dependency, request, and impact.

Good logs save investigation time. In a real incident, minutes matter.

Quick summary

What to keep in your head

Good logs answer investigation questions. Bad logs only add volume.
Context usually matters more than a pretty sentence or a stack trace with no surrounding signal.
Writing logs with the on-call engineer in mind reduces diagnosis time and team friction.
Not every line deserves a log. Boundaries and relevant events usually matter more.

Practice checklist

Use this when you answer

Can I say which fields need to exist in a useful production log?
Do I know the difference between local debugging logs and logs that help in a real incident?
Can I explain why structured context is more useful than vague text?
Can I explain in an interview how I would design logs to reduce investigation time?

