May 8 2025

SLO, SLA, and SLI: What They Are and How to Answer About Them in Interviews

How to distinguish these three concepts without buzzwords and explain the role of each one clearly in a real context.

Andrews Ribeiro

Founder & Engineer

6 min Intermediate Systems

#debugging-production #debugging #production #observability #reliability #interviews

The problem

These three acronyms show up a lot in interviews for platform, backend, and reliability roles, and for more mature product teams.

And many people answer badly for one simple reason:

they do not separate the roles clearly enough.

So the answer becomes something like:

“SLA is availability”
“SLO is the goal”
“SLI is the metric”

Technically that points in the right direction.

But it is still too shallow.

Because it does not show:

why these things exist
how they relate to each other
and how they influence engineering decisions

The real problem is this:

many answers stop at the short definition and never reach operational judgment.

Mental model

Think about it like this:

SLI measures, SLO guides, and SLA commits.

That sentence organizes almost everything.

In simple terms:

SLI is the observed indicator
SLO is the internal target the team is trying to hit
SLA is the formal commitment, usually with commercial or contractual consequences

Once you understand that order, you stop treating the three as synonyms with different clothes.

Breaking it down

SLI is what you observe

SLI means Service Level Indicator.

In practice, think of it as a measurement of relevant system behavior.

Examples:

percentage of successful requests
p95 latency of a critical endpoint
share of messages processed within a target time
share of login attempts that succeed
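
As a rough illustration, two of these indicators can be computed from raw request records. This is a minimal sketch, not a standard implementation: the `Request` shape, the choice to count only 5xx responses as failures, and the nearest-rank percentile method are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time the request took, in milliseconds


def success_ratio(requests: list[Request]) -> float:
    """Share of requests that did not end in a server error (5xx)."""
    if not requests:
        raise ValueError("no requests in the window")
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def p95_latency_ms(requests: list[Request]) -> float:
    """Nearest-rank 95th percentile of latency."""
    if not requests:
        raise ValueError("no requests in the window")
    ordered = sorted(r.latency_ms for r in requests)
    # Integer ceiling of 0.95 * n, which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical window: 18 fast successes, one slow failure, one slower success.
sample = [Request(200, 120.0)] * 18 + [Request(500, 900.0), Request(200, 300.0)]
print(success_ratio(sample))
print(p95_latency_ms(sample))
```

What counts as a "good" request is the real design decision here. The arithmetic is trivial; the definition is what makes the number an SLI rather than just a metric.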

The main point is:

an SLI is not just “any metric.”

It is a metric that represents something important about the experience or the service.

SLO is the internal target that guides decisions

SLO means Service Level Objective.

It is the target the team chooses to pursue on top of an SLI.

Simple example:

SLI: percentage of successful checkout requests
SLO: 99.9% monthly checkout success

Here is the important part:

an SLO is not just a pretty number.

It exists to influence decisions.

Things like:

is this release worth shipping now?
has this incident already consumed too much of our margin for failure?
are we trading away too much reliability for speed?

Without that layer, the SLO becomes only a dashboard.
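
One way to make that decision layer concrete is to turn the SLO into a budget of allowed failures and check how much of it is already spent. The sketch below assumes the 99.9% checkout SLO from the example above; the traffic numbers and the 80% freeze threshold are hypothetical values chosen for illustration, not a recommended policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetReport:
    allowed_failures: float   # failures the SLO tolerates in the window
    consumed_fraction: float  # share of that allowance already used


def error_budget(slo_target: float, total: int, failed: int) -> BudgetReport:
    """Translate an SLO target into a failure allowance for the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    allowed = (1.0 - slo_target) * total
    consumed = failed / allowed if allowed > 0 else float("inf")
    return BudgetReport(allowed, consumed)


def can_ship_risky_release(report: BudgetReport, freeze_at: float = 0.8) -> bool:
    """Hypothetical policy: hold risky releases once 80% of the budget is spent."""
    return report.consumed_fraction < freeze_at


# Hypothetical month: 2,000,000 checkout requests, 1,500 of them failed.
report = error_budget(slo_target=0.999, total=2_000_000, failed=1_500)
print(round(report.allowed_failures), round(report.consumed_fraction, 2))
print(can_ship_risky_release(report))
```

With these numbers the SLO tolerates roughly 2,000 failed checkouts in the month, so 1,500 failures means about three quarters of the margin is gone. That is the kind of fact a release discussion can actually use.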

SLA is an external commitment

SLA means Service Level Agreement.

Usually this is the more formal part:

a contract
a commercial promise
a customer commitment
a consequence if the promise is not met

Examples:

financial credits
contractual penalties
response obligations

That is why an SLA is usually more connected to the external relationship than to the team’s daily operation.

Mature teams do pay attention to the SLA, but they should not operate only by it.

Because waiting until the contractual limit is near is already too late.
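
The gap between the two thresholds can be sketched as a simple classification. The numbers are assumptions for illustration only: an internal SLO of 99.9% and a looser contractual SLA of 99.5%. Real values depend on the contract and the service.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "within SLO"
    SLO_MISSED = "SLO missed, SLA still met"
    SLA_BREACHED = "SLA breached"


def classify(measured: float, slo: float = 0.999, sla: float = 0.995) -> Status:
    """Place a measured success ratio relative to the internal and external lines."""
    if sla > slo:
        # Assumption of this sketch: the contract is looser than the internal target.
        raise ValueError("expected the SLA threshold to be at or below the SLO")
    if measured >= slo:
        return Status.HEALTHY
    if measured >= sla:
        return Status.SLO_MISSED
    return Status.SLA_BREACHED


for ratio in (0.9995, 0.997, 0.99):
    print(ratio, classify(ratio).value)
```

The middle state is the useful one: the team already knows it is off target while the customer commitment is still intact.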

SLO is usually more useful to engineering than SLA

This is a very good interview point.

SLA matters.

But in day-to-day work, engineering usually operates much more around the SLO.

Why?

Because the SLO gives room to maneuver before the problem becomes a commercial crisis.

It lets the team see:

degradation before a serious break
the reliability budget (often called the error budget) being consumed over time
the need to reduce risk before an external commitment explodes

In other words:

the SLO helps the team steer.

The SLA only warns when the team is already close to the wall.
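
A common way to see the budget being consumed before the wall is a burn rate: the observed error rate divided by the error rate the SLO allows. The sketch below assumes a 30-day window and a constant error rate, which is a simplification; the function names and figures are hypothetical.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'sustainable' the budget is being spent.

    A value of 1.0 means the budget lasts exactly one SLO window.
    """
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0.0:
        raise ValueError("an SLO of 100% leaves no budget to burn")
    return error_rate / allowed_error_rate


def days_until_exhausted(rate: float, window_days: float = 30.0,
                         budget_remaining: float = 1.0) -> float:
    """Days left if the current burn rate holds (a simplifying assumption)."""
    if rate <= 0.0:
        return float("inf")
    return window_days * budget_remaining / rate


# Hypothetical: 0.5% of checkouts failing against a 99.9% SLO.
rate = burn_rate(error_rate=0.005, slo_target=0.999)
print(round(rate, 2), round(days_until_exhausted(rate), 2))
```

With these inputs the budget is burning about five times faster than the SLO allows, so a full 30-day budget would last roughly six days. That is a signal to act on well before any contractual threshold is in sight.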

Without a good SLI, the rest gets weak

This mistake is also common.

The team defines an SLO without having a reliable indicator.

The result:

poorly measured target
false sense of health
vague discussion about reliability

If the indicator does not represent the experience or the service well, the target built on top of it loses value too.

So the first useful question is often:

does this SLI actually capture the behavior that matters?

Not every availability number summarizes reliability

Another bad shortcut is reducing everything to uptime.

A system can be “up” and still be:

too slow
failing in one critical flow
failing for an important subset of users
accumulating operational delay that stays invisible in a simple dashboard
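
A small sketch of how an aggregate number can hide a failing critical flow. The event shape (a flow name plus a success flag) and the traffic split are invented for the example.

```python
from collections import defaultdict


def success_by_segment(events: list[tuple[str, bool]]) -> dict[str, float]:
    """Success ratio computed separately for each segment (flow, region, tier...)."""
    totals: dict[str, int] = defaultdict(int)
    good: dict[str, int] = defaultdict(int)
    for segment, ok in events:
        totals[segment] += 1
        if ok:
            good[segment] += 1
    return {segment: good[segment] / totals[segment] for segment in totals}


# Hypothetical traffic: browsing dominates volume, checkout is half broken.
events = (
    [("browse", True)] * 980
    + [("checkout", True)] * 10
    + [("checkout", False)] * 10
)

overall = sum(1 for _, ok in events if ok) / len(events)
by_flow = success_by_segment(events)

print(round(overall, 3))  # the aggregate still looks healthy
print(by_flow)            # the per-flow view shows checkout failing
```

Here the overall success ratio is 99% while half of the checkout attempts fail, which is why the SLI should be defined on the flow that matters rather than on total traffic.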

That is why a mature answer does not treat reliability as only a raw availability percentage.

It thinks about relevant user experience.

In interviews, the best answer connects the concept to use

It is not enough to say:

SLI is a metric
SLO is an objective
SLA is an agreement

It is better to show:

one real example
why the SLO would be defined that way
how it helps a decision
and why the SLA is not the only ruler for the team

That is what moves the answer out of memorization.

Simple example

Imagine a product with a critical checkout flow.

A weak answer would be:

“SLI is the metric, SLO is the goal, and SLA is the contract.”

That is correct, but still superficial.

A better answer:

“I would first think about the SLI that actually represents the critical experience, for example the checkout success rate or the payment-confirmation latency. On top of that, the team defines an internal SLO, like 99.9% monthly success, to guide release decisions and reliability prioritization. The SLA would be the external commitment to the customer, possibly more conservative (a looser threshold than the internal SLO) and with contractual consequences. In practice, engineering operates by the SLO so it does not discover the problem only after the SLA has already been broken.”

That answer shows:

the difference between the three pieces
one concrete example
operational use
judgment

Common mistakes

treating SLI, SLO, and SLA as synonyms
assuming any metric is already a good SLI
operating the team only by the SLA
reducing reliability to raw uptime
memorizing the definition without explaining why it changes decisions

How a senior thinks

More mature engineers often think like this:

“Good reliability needs a clear way to measure, one internal target that guides trade-offs, and one external commitment whose risk of breach is not discovered only when things are already bad.”

That view is useful because it connects observability with product and operations.

Seniority here is not knowing the acronym expansion.

It is understanding how those pieces influence:

prioritization
release decisions
risk
investment in reliability

What the interviewer wants to see

When this topic comes up, the evaluator usually wants to understand whether you:

clearly separate indicator, objective, and agreement
can give a concrete example of an SLI and an SLO
understand why teams operate more around SLO than SLA
connect reliability to real engineering decisions
avoid answers that sound too bureaucratic

A strong answer usually has this shape:

explain the difference
give one real flow example
show how the SLO guides decisions
explain the more external role of the SLA

If that appears, the answer is already above average.

SLI, SLO, and SLA do not exist to make a team look process-heavy. They exist to make reliability measurable and negotiable.

When the team only remembers the SLA, it is usually already reacting too late.

Quick summary

What to keep in your head

SLI is the observed metric, SLO is the internal target, and SLA is the formal commitment with external consequence.
Without a reliable SLI, an SLO becomes opinion. Without a clear SLO, reliability turns into vague conversation.
An SLA should not be the team's main instrument for daily operation.
In interviews, a strong answer shows how these pieces help decision-making, prioritization, and trade-offs.

Practice checklist

Use this when you answer

Can I explain the difference between SLI, SLO, and SLA without blurring measurement, target, and contract?
Can I give one concrete example of an SLI and an SLO for a real flow?
Can I explain why engineering teams operate more around SLO than SLA?
Can I answer without turning reliability into corporate jargon?
