SLO, SLA, and SLI: What They Are and How to Answer About Them in Interviews

How to distinguish these three concepts without buzzwords and explain the role of each one clearly in a real context.

Andrews Ribeiro

Founder & Engineer

The problem

These three acronyms show up a lot in platform, backend, reliability, and more mature product-team interviews.

And many people answer badly for one simple reason:

they do not separate the roles clearly enough.

So the answer becomes something like:

  • “SLA is availability”
  • “SLO is the goal”
  • “SLI is the metric”

Technically that points in the right direction.

But it is still too shallow.

Because it does not show:

  • why these things exist
  • how they relate to each other
  • and how they influence engineering decisions

The real problem is this:

many answers stop at the short definition and never reach operational judgment.

Mental model

Think about it like this:

SLI measures, SLO guides, and SLA commits.

That sentence organizes almost everything.

In simple terms:

  • SLI is the observed indicator
  • SLO is the internal target the team is trying to hit
  • SLA is the formal commitment, usually with commercial or contractual consequences

Once you understand that order, you stop treating the three as the same idea in different clothes.

Breaking it down

SLI is what you observe

SLI means Service Level Indicator.

In practice, think of it as a measurement of relevant system behavior.

Examples:

  • percentage of successful requests
  • p95 latency of a critical endpoint
  • proportion of messages processed within a target time
  • proportion of login attempts that succeed

The main point is:

an SLI is not just “any metric.”

It is a metric that represents something important about the experience or the service.
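To make the idea concrete, here is a minimal sketch of how two of the SLIs above could be computed from raw request records. The `Request` type, its field names, and the sample data are assumptions made for illustration; a real system would usually derive these numbers from its monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical shape, for illustration only)."""
    ok: bool            # True if the request succeeded from the user's point of view
    latency_ms: float   # end-to-end latency in milliseconds


def success_ratio(requests: list[Request]) -> float:
    """Availability-style SLI: good events divided by all valid events."""
    if not requests:
        raise ValueError("no requests in the window")
    good = sum(1 for r in requests if r.ok)
    return good / len(requests)


def p95_latency_ms(requests: list[Request]) -> float:
    """Latency SLI: 95th percentile using the nearest-rank method."""
    if not requests:
        raise ValueError("no requests in the window")
    ordered = sorted(r.latency_ms for r in requests)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Illustrative window: 20 requests, one failure, one slow outlier.
sample = [Request(ok=True, latency_ms=100.0) for _ in range(18)]
sample.append(Request(ok=False, latency_ms=120.0))
sample.append(Request(ok=True, latency_ms=900.0))

print(success_ratio(sample))   # 0.95
print(p95_latency_ms(sample))  # 120.0
```

Both functions answer a question about user-visible behavior, which is what separates an SLI from an arbitrary internal metric such as CPU usage.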

SLO is the internal target that guides decisions

SLO means Service Level Objective.

It is the target the team sets for an SLI over a defined time window.

Simple example:

  • SLI: percentage of successful checkout requests
  • SLO: 99.9% monthly checkout success

Here is the important part:

an SLO is not just a pretty number.

It exists to influence decisions.

Things like:

  • is this release worth shipping now?
  • has this incident already consumed too much of our margin for failure?
  • are we trading away too much reliability for speed?

Without that layer, the SLO becomes only a dashboard.
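One common way to turn an SLO into decisions is to treat the gap between the target and 100% as an error budget. The sketch below assumes the 99.9% checkout SLO from the example above. The function names, the traffic numbers, and the 80% freeze threshold are hypothetical policy choices, not something a real team is required to use.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Summarize error-budget usage for one SLO window.

    slo_target is a fraction such as 0.999 (99.9%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    return {
        "allowed_bad": allowed_bad,                    # failures the SLO tolerates
        "consumed_fraction": bad_events / allowed_bad, # 1.0 means budget exhausted
        "remaining_bad": allowed_bad - bad_events,
    }


def risky_release_allowed(budget: dict, freeze_at: float = 0.8) -> bool:
    """Hypothetical policy: pause risky releases once most of the budget is gone."""
    return budget["consumed_fraction"] < freeze_at


# Illustrative month: 1,000,000 checkout requests, 400 of them failed.
budget = error_budget(slo_target=0.999, total_events=1_000_000, bad_events=400)
print(round(budget["allowed_bad"]))           # roughly 1000 failures allowed
print(round(budget["consumed_fraction"], 2))  # roughly 0.4 of the budget used
print(risky_release_allowed(budget))          # True under the assumed policy
```

The point of the sketch is the shape of the reasoning: the SLO defines how much failure is acceptable, and the remaining amount informs whether the team should ship or stabilize.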

SLA is an external commitment

SLA means Service Level Agreement.

Usually this is the more formal part:

  • a contract
  • a commercial promise
  • a customer commitment
  • a consequence if the promise is not met

Examples:

  • financial credits
  • contractual penalties
  • response obligations

That is why an SLA is usually more connected to the external relationship than to the team’s daily operation.

Mature teams do pay attention to the SLA, but they should not operate only by it.

By the time the contractual limit is close, there is little room left to react.

SLO is usually more useful to engineering than SLA

This is a very good interview point.

SLA matters.

But in day-to-day work, engineering usually operates much more around the SLO.

Why?

Because the SLO gives room to maneuver before the problem becomes a commercial crisis.

It lets the team see:

  • degradation before a serious break
  • the reliability budget (the amount of failure the target tolerates) being consumed over time
  • the need to reduce risk before an external commitment explodes

In other words:

the SLO helps the team steer.

The SLA only warns when the team is already close to the wall.
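The buffer between the two thresholds can be sketched as a simple classification. The 99.9% SLO comes from the checkout example; the 99.5% SLA and the status labels are assumptions chosen only to illustrate a looser external commitment.

```python
def reliability_status(sli: float, slo: float, sla: float) -> str:
    """Classify a measured SLI against an internal SLO and an external SLA.

    All three values are fractions (0.999 means 99.9%).
    """
    if sla > slo:
        # A stricter external promise than internal target leaves no warning room.
        raise ValueError("expected the SLA to be no stricter than the SLO")
    if sli < sla:
        return "sla_breached"   # contractual consequences may apply
    if sli < slo:
        return "slo_missed"     # internal alarm: act before the contract is at risk
    return "healthy"


SLO = 0.999  # internal target from the checkout example
SLA = 0.995  # assumed external commitment, looser than the SLO

for measured in (0.9995, 0.997, 0.990):
    print(measured, reliability_status(measured, SLO, SLA))
# 0.9995 healthy
# 0.997 slo_missed
# 0.99 sla_breached
```

The middle case is the useful one: the team has missed its own target and can still act before any customer commitment is broken.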

Without a good SLI, the rest gets weak

This mistake is also common.

The team defines an SLO without having a reliable indicator.

The result:

  • poorly measured target
  • false sense of health
  • vague discussion about reliability

If the indicator does not represent the experience or the service well, the target built on top of it loses value too.

So the first useful question is often:

  • does this SLI actually capture the behavior that matters?

Not every availability number summarizes reliability

Another bad shortcut is reducing everything to uptime.

A system can be “up” and still be:

  • too slow
  • failing in one critical flow
  • failing for an important subset of users
  • accumulating operational delay that stays invisible in a simple dashboard

That is why a mature answer does not treat reliability as only a raw availability percentage.

It thinks about relevant user experience.
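A small sketch shows how an aggregate number can hide a broken critical flow. The flow names and event counts are invented for illustration.

```python
from collections import defaultdict


def success_by_flow(events: list[tuple[str, bool]]) -> dict[str, float]:
    """Success ratio per flow, given (flow_name, succeeded) pairs."""
    totals: dict[str, int] = defaultdict(int)
    good: dict[str, int] = defaultdict(int)
    for flow, ok in events:
        totals[flow] += 1
        if ok:
            good[flow] += 1
    return {flow: good[flow] / totals[flow] for flow in totals}


# Invented traffic: browsing dominates, checkout is rare but half of it fails.
events = (
    [("browse", True)] * 980
    + [("checkout", True)] * 10
    + [("checkout", False)] * 10
)

overall = sum(1 for _, ok in events if ok) / len(events)
print(overall)                  # 0.99
print(success_by_flow(events))  # {'browse': 1.0, 'checkout': 0.5}
```

The overall ratio looks healthy while the flow that matters most to the business fails half the time, which is why the SLI should be scoped to the critical experience.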

In interviews, the best answer connects the concept to use

It is not enough to say:

  • SLI is a metric
  • SLO is an objective
  • SLA is an agreement

It is better to show:

  • one real example
  • why the SLO would be defined that way
  • how it helps a decision
  • and why the SLA is not the only ruler for the team

That is what moves the answer out of memorization.

Simple example

Imagine a product with a critical checkout flow.

A weak answer would be:

“SLI is the metric, SLO is the goal, and SLA is the contract.”

That is correct, but still superficial.

A better answer:

“I would first think about the SLI that actually represents the critical experience, for example the checkout success rate or the payment-confirmation latency. On top of that, the team defines an internal SLO, like 99.9% monthly success, to guide release decisions and reliability prioritization. The SLA would be the external commitment to the customer, usually set somewhat looser than the internal SLO and backed by contractual consequences. In practice, engineering operates by the SLO so it does not discover the problem only after the SLA has already been broken.”

That answer shows:

  • the difference between the three pieces
  • one concrete example
  • operational use
  • judgment

Common mistakes

  • treating SLI, SLO, and SLA as synonyms
  • assuming any metric is already a good SLI
  • operating the team only by the SLA
  • reducing reliability to raw uptime
  • memorizing the definition without explaining why it changes decisions

How a senior thinks

More mature engineers often think like this:

“Good reliability needs a clear way to measure, one internal target that guides trade-offs, and one external commitment whose breach is never the first sign that things are bad.”

That view is useful because it connects observability with product and operations.

Seniority here is not knowing the acronym expansion.

It is understanding how those pieces influence:

  • prioritization
  • release decisions
  • risk
  • investment in reliability

What the interviewer wants to see

When this topic comes up, the evaluator usually wants to understand whether you:

  • clearly separate indicator, objective, and agreement
  • can give a concrete example of an SLI and an SLO
  • understand why teams operate more around SLO than SLA
  • connect reliability to real engineering decisions
  • avoid answers that sound too bureaucratic

A strong answer usually has this shape:

  1. explain the difference
  2. give one real flow example
  3. show how the SLO guides decisions
  4. explain the more external role of the SLA

If that appears, the answer is already above average.

SLI, SLO, and SLA do not exist to make a team look process-heavy. They exist to make reliability measurable and negotiable.

When the team only remembers the SLA, it is usually already reacting too late.
