AI scenarios in production

How to think about an AI feature in a real product without treating the model like magic or hiding its cost, fallback paths, and likely failures.

Andrews Ribeiro

Founder & Engineer

The problem

Many AI features start out naive:

  • someone sees a good demo
  • the team picks a model
  • an integration ships
  • and that gets called a strategy

As long as the happy path works, everything looks fine.

The problem shows up when the system meets the real world:

  • the model is slower than the interface can tolerate
  • the response format changes
  • quality drops on an important class of cases
  • cost spikes
  • the model makes a confident mistake

That is when it becomes obvious that “adding AI” was never the main work.

The main work was designing a product that stays trustworthy when the AI behaves like AI.

Mental model

The most useful way to think about this scenario is:

an AI feature is a probabilistic component inside a system that users still expect to be reliable.

That sentence clears up a lot of confusion.

Your backend still needs contracts.

Your interface still needs acceptable response times.

Your product still needs to protect user trust.

So the main question stops being:

  • which model is most impressive?

And becomes:

  • which errors are acceptable?
  • which errors are unacceptable?
  • what happens when the AI fails, slows down, or returns something bad?

Breaking it down

First define the role of the AI

Not every AI feature carries the same level of risk.

Compare these cases:

  • summarizing a long support ticket
  • suggesting text for an email
  • classifying incident priority
  • responding automatically to a customer

In the first two, the user can still review the result easily.

In the last two, a bad answer can become operational damage or trust loss much faster.

So the first decision is not technical. It is about product:

  • is the AI assisting, recommending, or deciding?

The more irreversible the action is, the less room you have to let the model act alone.
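One way to make that decision concrete is a small policy that maps the AI's role and the reversibility of the action to whether a human must review the output. This is a minimal sketch with illustrative names, not a prescribed API:

```python
# Sketch: encode "assisting, recommending, or deciding" as an explicit policy.
from enum import Enum

class Role(Enum):
    ASSIST = "assist"        # user reviews everything (e.g. email suggestions)
    RECOMMEND = "recommend"  # user confirms before acting (e.g. priority hints)
    DECIDE = "decide"        # acts on its own (e.g. auto-replying to a customer)

def requires_human_review(role: Role, reversible: bool) -> bool:
    """The less reversible the action, the less room the model has to act alone."""
    if role is Role.DECIDE:
        # irreversible automatic actions always get a human in the loop
        return not reversible
    # recommendations get a confirmation step; assisted text is reviewed anyway
    return role is Role.RECOMMEND
```

Writing the policy down, even this crudely, forces the product conversation before the model choice.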

Then define the dangerous error

This step often separates mature teams from overexcited ones.

You need to know:

  • which error is merely annoying
  • which error is expensive
  • which error destroys trust

If the feature summarizes tickets, the dangerous error might be hiding a pending customer action.

If the feature classifies fraud, the dangerous error might be clearing the wrong case without review.

Without that clarity, the rest of the architecture stays too generic.

Output contracts matter

Many teams treat model output like free-form text and hope it works out.

In production, that is fragile.

The system needs some combination of:

  • an expected schema
  • response validation
  • handling for invalid format
  • fallback when the output is unusable

That is true even when the final experience is text. The UI, analytics, and downstream flows still depend on minimum predictability.
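A minimal sketch of what such a contract can look like, assuming the model is asked to return JSON. The schema and field names here are illustrative; the point is that invalid output yields `None` so the caller can fall back instead of crashing:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class TicketSummary:
    summary: str
    pending_actions: list[str]

def parse_summary(raw: str) -> Optional[TicketSummary]:
    """Validate model output against the expected schema.

    Returns None for anything unusable: not JSON, missing fields,
    or wrong types. The caller decides what the fallback looks like.
    """
    try:
        data = json.loads(raw)
        summary = data["summary"]
        actions = data["pending_actions"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(summary, str) or not isinstance(actions, list):
        return None
    return TicketSummary(summary=summary, pending_actions=[str(a) for a in actions])
```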

Fallback is not a detail

If the model:

  • times out
  • returns poor output
  • exceeds cost limits
  • or becomes unavailable

the product still needs to keep moving.

Fallback can mean:

  • hiding the feature
  • showing partial content
  • returning to a manual flow
  • asking for human review
  • using a simpler deterministic rule

The main point is simple: the user should not be trapped by the model’s mood.
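Those paths can be wrapped around the model call itself. A sketch, where `call_model` and `deterministic_summary` are hypothetical stand-ins for your own functions, and timeout, model error, and unusable output all degrade to the same safe path:

```python
from concurrent.futures import ThreadPoolExecutor

def summarize_with_fallback(text, call_model, deterministic_summary, timeout_s=2.0):
    """Return (summary, source); `source` records which path produced it."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(call_model, text)
    try:
        result = future.result(timeout=timeout_s)
    except Exception:  # timeout, model error, or unavailability
        pool.shutdown(wait=False)
        return deterministic_summary(text), "fallback"
    pool.shutdown(wait=False)
    if not result or not result.strip():
        return deterministic_summary(text), "fallback"  # unusable output
    return result, "model"
```

Returning the source alongside the result also feeds the fallback-rate metric discussed below: if you do not record which path answered, you cannot tell how often the model is actually doing the work.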

Evaluation and observability belong early

An AI feature without measurement becomes an argument about feelings.

You need to observe at least:

  • perceived quality
  • fallback rate
  • latency
  • cost
  • failure rate by case type
  • which prompt, context, or model version produced the output

That is what lets you know whether the feature is actually helping or only looking modern in a demo.

If quality drops after a model or prompt change, you need to be able to locate that change quickly.

Otherwise every conversation turns into opinion.
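One way to make those signals observable is a structured record per request. This is a sketch with illustrative field names and a pluggable `log` sink, not a specific telemetry library:

```python
import json
import time

def observe(model_call, prompt_version: str, model_version: str, log=print):
    """Wrap a model call so every request records latency, cost, fallback,
    and which prompt/model version produced the output."""
    def wrapped(text: str):
        start = time.monotonic()
        # model_call is assumed to return (output, used_fallback, cost_usd)
        output, used_fallback, cost_usd = model_call(text)
        log(json.dumps({
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
            "cost_usd": cost_usd,
            "fallback": used_fallback,
            "prompt_version": prompt_version,  # lets you locate a regression
            "model_version": model_version,    # after a prompt or model change
            "output_chars": len(output),
        }))
        return output
    return wrapped
```

Tagging every record with the prompt and model version is what turns "quality dropped last week" into a query instead of an argument.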

Simple example

Imagine a feature that summarizes long support conversations for an agent.

The weak answer would be:

“I would send the text to an LLM, get the summary back, and render it.”

The mature answer would be:

“I would treat the AI as an assistant, not as a single source of truth. The system receives the conversation, prepares relevant context, and asks for a structured summary. I validate the format, monitor latency, and cap cost. If the response is poor, too slow, or fails, the interface hides the summary and keeps the original flow available. I also evaluate quality against reference cases so the summary does not omit pending actions or invent a tone that never existed.”

That second answer is better because it:

  • defines the role of the AI
  • identifies the dangerous error
  • includes fallback
  • talks about evaluation and operations
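The mature answer can be sketched end to end. Every helper here (`call_llm`, `validate`) is a hypothetical stand-in; the point is the shape: prepare context, call the model, validate, and degrade safely.

```python
def show_summary(conversation: str, call_llm, validate, max_chars: int = 4000):
    """Return the text the agent sees: a validated summary, or None,
    which signals the UI to hide the feature and keep the original flow."""
    context = conversation[-max_chars:]  # prepare relevant context, cap cost
    try:
        raw = call_llm(context)
    except Exception:
        return None                      # model failed: hide the summary
    return validate(raw)                 # schema/format check; None if unusable
```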

Common mistakes

  • Talking only about prompts and not about product behavior.
  • Choosing a model before defining tolerable error.
  • Treating fallback as a UX detail for later.
  • Ignoring cost and latency as if they were separate infrastructure problems.
  • Trusting pretty output without schema, validation, or measurement.

How a senior thinks

Someone who has seen AI features in production tends to be less dazzled and more defensive.

The conversation does not revolve around “the model is smart.”

It revolves around:

  • where it genuinely helps
  • where it should not operate alone
  • how the system absorbs failure
  • how the team notices regression before the user does

That may sound less glamorous.

But it is exactly what lets the feature survive outside the demo.

What the interviewer wants to see

In this scenario, the interviewer wants to see whether you:

  • understand AI as a probabilistic dependency
  • connect output quality to business risk
  • talk naturally about fallback, validation, and observability
  • treat cost and latency as part of the design
  • avoid the naive posture of “just add a model”

An AI feature in production is not the one that looks magical. It is the one that remains useful when the model is slow, wrong, or unavailable.

Once you design control before you design shine, the architecture starts sounding real.

