Observability Isn't a Feature You Just Turn On
Rethinking "observability" as a continuous process...
The SaaS playbook for application monitoring broke when we started building AI products.
When engineering teams built traditional SaaS products, observability meant one thing: is the system behaving correctly?
Server up? Check. API responding? Check. Database latency within bounds? Check.
These were deterministic questions with deterministic answers. The system either worked or it didn’t. Observability was something you implemented and then largely forgot about. It ran in the background, paging engineers when thresholds got crossed.
Then LLMs arrived, and we applied the same trusty playbook.
We stood up observability tools. We tracked tokens, latency, and error rates. We built dashboards. And then we moved on, treating “LLM observability” as a box we had checked.
This is why most teams have no idea what’s actually happening in their AI products.
The Binary Trap
The term “LLM observability” implies a state you can achieve. You’re either doing it or you’re not. Install the tool, configure the traces, done.
But observability for non-deterministic systems isn’t a state. It’s a process. And treating it as a binary creates the illusion of insight.
Traditional SaaS observability worked because the questions were simple:
Did the request succeed?
How long did it take?
What resources did it consume?
These questions don’t translate to LLMs. An LLM can respond in 200ms with perfect uptime while consistently failing to understand what users actually need. The system is behaving correctly in the technical sense, but it’s not behaving usefully.
When teams treat observability as something they’ve “implemented,” they stop there. They have the traces. They have the dashboards. They have green checkmarks across the board.
What they don’t have is any understanding of whether customers are getting value.
Why This Model Is a Dead End
The SaaS observability model was designed for a specific purpose: to verify that deterministic systems were behaving deterministically. It answered a narrow question (is the system working?) and it answered it well.
For LLMs, that question is almost meaningless.
The output of an LLM isn’t “correct” or “incorrect” in any absolute sense. It’s somewhere on a spectrum of useful to useless, and where it lands depends entirely on context: what the user was trying to accomplish, what they already know, what happened earlier in the conversation.
Traditional observability can’t capture this. And no amount of additional tooling or eval-like solutions built on the same foundation will fix it.
You can add more traces. You can track more tokens. You can build more dashboards. But if you’re still treating observability as a binary, as something you’ve done rather than something you’re continuously doing, you’re building on a foundation that won’t support its weight.
Observability as an Organizational Process
What does observability actually mean for AI products?
It’s not a tool. It’s not a dashboard. It’s a continuous process that transforms raw customer interactions into organizational action.
This process has three stages, and most teams don’t get past the first.
Stage 1: Collect
This is where traditional LLM observability stops. You’re logging conversations, capturing traces, storing the data. The tools are set up. The data is flowing; maybe you even stood up some evals.
Most teams declare victory here.
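To make the stages concrete, here’s roughly what stage one amounts to: a minimal sketch in plain Python, with hypothetical field names and no particular vendor or tool assumed.

```python
# A minimal sketch of the "collect" stage: append every conversation turn,
# with basic metadata, to a JSONL log. Field names are hypothetical and
# not tied to any particular observability vendor.
import json
import time
import uuid

def log_turn(path, conversation_id, user_message, model_response,
             latency_ms, tokens):
    record = {
        "trace_id": str(uuid.uuid4()),
        "conversation_id": conversation_id,
        "timestamp": time.time(),
        "user_message": user_message,
        "model_response": model_response,
        "latency_ms": latency_ms,   # response latency in milliseconds
        "tokens": tokens,           # total tokens consumed by the turn
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```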
Stage 2: Classify
Raw data is useless without meaning. What was the customer trying to do? Did they accomplish it? Were they frustrated? Confused? Ready to buy?
Classification takes the firehose of conversation data and turns it into structured insight. It answers questions like:
What topics drive the most confusion?
Which customer segments struggle most?
Where does the AI consistently miss the mark?
This is where most teams stall, not because classification is technically hard, but because it requires continuous effort. It’s not a feature you ship. It’s a capability you refine.
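Classification can start simpler than most teams expect. The sketch below is illustrative only: the keyword heuristic is a placeholder for whatever classifier you actually use (an LLM prompt, a trained model), and the label names are assumptions rather than a standard schema.

```python
# A sketch of the "classify" stage: turn a raw transcript into structured
# labels. The keyword heuristic is a stand-in for a real classifier; the
# label names are assumptions, not a standard schema.
from dataclasses import dataclass

@dataclass
class ConversationLabels:
    topic: str        # what the customer was trying to do
    resolved: bool    # did they accomplish it?
    frustrated: bool  # friction or churn-risk signals
    segment: str      # customer segment, joined from your CRM

def classify(transcript: str, segment: str) -> ConversationLabels:
    text = transcript.lower()
    frustrated = any(w in text for w in ("frustrated", "not working", "cancel"))
    resolved = "thanks" in text and not frustrated
    topic = "billing" if "invoice" in text else "general"
    return ConversationLabels(topic, resolved, frustrated, segment)
```

The specific labels matter far less than the fact that someone with domain knowledge keeps reviewing and refining them. That’s the continuous effort, and it’s why this stage can’t be shipped once and forgotten.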
Stage 3: Activate
Insight without action is just expensive storage. The final stage turns labeled data into organizational response.
Product teams see which use cases are failing and prioritize fixes. Sales teams see which accounts are frustrated and intervene before renewal. Customer success sees patterns across segments and builds proactive outreach. Marketing sees what customers actually care about and refines messaging.
This stage is where observability delivers business results, and where the “install and forget” model completely fails.
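Activation is mostly plumbing once the labels exist. A rough sketch, with `notify` standing in for whatever channel each team already lives in (Slack, CRM tasks, tickets); the dict keys are hypothetical and follow the labels above.

```python
# A sketch of the "activate" stage: route labeled conversations to the teams
# that can act on them. `labeled` is a list of dicts with the (hypothetical)
# keys used below; `notify` stands in for whatever channel your teams use.
from collections import Counter

def activate(labeled, notify):
    # Success / sales: accounts showing frustration recently
    for account in {c["account"] for c in labeled if c["frustrated"]}:
        notify("success", f"{account} showed frustration this week - worth a call")

    # Product: the topics the AI most often fails to resolve
    failing = Counter(c["topic"] for c in labeled if not c["resolved"])
    for topic, count in failing.most_common(3):
        notify("product", f"{count} unresolved conversations about {topic}")
```

The point isn’t this exact routing; it’s that the loop closes without anyone filing a ticket.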
The Organizational Problem
The binary conception of observability persists because it fits neatly into existing organizational structures.
Engineering owns tools. They implement observability, check the box, and move on to the next project.
But AI observability (real observability? please help me with a better term here!) isn’t something engineering can own in isolation. It requires continuous classification that demands domain expertise. It requires activation that happens in product, sales, success, and marketing.
When observability is treated as a tool that engineering “implements,” it creates a structural disconnect. The people who can interpret and act on customer insights (product, success, sales) are dependent on the people who own the data (engineering). Information gets stuck. Insights arrive too late. Customers churn before anyone notices.
The fix isn’t better handoffs between teams. The fix is recognizing that observability isn’t a handoff at all. It’s a continuous process that runs across the organization.
What This Means in Practice
Stop asking “do we have observability?” Start asking “can we answer these questions right now?”
Which customers are struggling with our AI today?
What use cases are failing most often?
Which accounts should sales be talking to this week?
What should we build next to improve customer outcomes?
If your team can’t answer these questions without filing a ticket with engineering, your observability process is broken, regardless of what tools you have installed.
The goal isn’t more traces. It’s faster loops from customer signal to organizational action.
That requires thinking about observability not as infrastructure that engineering ships, but as an ongoing process that transforms how your entire organization learns from every customer conversation.
Gentle plug here: if you’re feeling this pain, we can help!
The Shift
The teams that figure this out will have a significant advantage. They’ll catch problems before customers give up. They’ll find upsell opportunities while deals are still winnable. They’ll build the right features because they understand what customers actually struggle with.
The teams that don’t will have beautiful dashboards showing them exactly how fast their AI responded to customers who eventually left.
Observability isn’t something you finish. It’s something you do.