Measure the agent on containment, not "deployed"

Trey Heath

22 Jun 2026

“Deployed” is a vanity metric. It tells you an agent exists. It tells you nothing about whether the agent does the work.

Thousands of CIOs left Info-Tech LIVE in Las Vegas this week asking one question: where’s the value? They shipped pilots. They counted deployments. And they came home with a stack of dashboards that measure activity and prove nothing.

There’s a better number. It’s called containment.

Containment is the share of interactions an agent resolves end to end without a human touching them. The 2026 service benchmarks report containment rates of 80 to 99.5%. That’s the spread between an agent that handles four of five cases alone and one that handles all but a handful. The number ties straight to cost and to what the customer feels, because every contained case is labor you didn’t pay for and a wait the customer didn’t sit through.
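Here is a minimal sketch of that arithmetic in Python. The `Interaction` record and its two fields are assumptions made for illustration. Your ticketing or logging system will have its own names for "closed" and "a person stepped in."

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical fields; map them to whatever your own logs record.
    resolved: bool        # the case reached a closed state
    human_touched: bool   # a person intervened at any point


def containment_rate(interactions):
    """Share of all interactions resolved end to end with no human involved."""
    if not interactions:
        return 0.0
    contained = sum(
        1 for i in interactions if i.resolved and not i.human_touched
    )
    return contained / len(interactions)


# Five cases: four closed by the agent alone, one closed after a handoff.
log = [
    Interaction(resolved=True, human_touched=False),
    Interaction(resolved=True, human_touched=False),
    Interaction(resolved=True, human_touched=False),
    Interaction(resolved=True, human_touched=False),
    Interaction(resolved=True, human_touched=True),
]

print(f"{containment_rate(log):.0%}")  # prints 80%
```

The denominator is every interaction, not just the closed ones. An open case the agent abandoned still counts against it.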

Containment is a customer-service term. Steal it. It works for any agent you run.

It works for the prosecutor’s agent sorting discovery. It works for the family-office agent reconciling statements. It works for the back-office agent in a regulated shop closing routine requests. The question is the same everywhere. What share of the work finishes without a human, and how good is the work that finishes?

Now the part the pundits skip.

A high containment rate is only safe when you pair it with the rule for when the agent must stop. Containment without a handoff rule is an agent guessing on cases it has no business touching. So you write the rule before you chase the number. One explicit line: when this happens, hand it to a person. Low confidence. A dollar threshold. A regulated decision. Pick the trigger and make it loud.
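As a sketch, the rule can be one small function that runs before the agent acts. Everything named here is an assumption for illustration: the `Case` fields, the 0.85 confidence floor, and the 500-dollar ceiling are placeholders, not recommended values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; set them for your own agent and risk tolerance.
CONFIDENCE_FLOOR = 0.85
DOLLAR_CEILING = 500.00


@dataclass
class Case:
    confidence: float   # the agent's own confidence score, 0.0 to 1.0
    amount_usd: float   # money at stake in the decision
    regulated: bool     # the decision falls under a regulatory requirement


def handoff_reason(case: Case) -> Optional[str]:
    """Return why the case must go to a person, or None if the agent may proceed."""
    if case.regulated:
        return "regulated decision"
    if case.amount_usd > DOLLAR_CEILING:
        return "dollar threshold"
    if case.confidence < CONFIDENCE_FLOOR:
        return "low confidence"
    return None


reason = handoff_reason(Case(confidence=0.95, amount_usd=1200.00, regulated=False))
if reason is not None:
    print(f"HANDOFF: {reason}")  # prints HANDOFF: dollar threshold
```

Returning the reason, not just a yes or no, is what makes the trigger loud. Every handoff lands in the log with the line of the rule that fired.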

The boring guardrail is what makes the headline number trustworthy.

This is why most pilots stall. Teams measure them on activity. Tickets touched. Messages sent. Hours “saved” by a formula nobody checks. The pilot runs for a quarter, produces a deck, and dies, because there was never a line that said pass or fail. Switch the metric and the next pilot has a verdict built in. Either containment climbs and the handoff stays clean, or it doesn’t.

Most teams can’t even get here yet. Roughly 20% of them have scaled a single agent to one full function. “Deployed” hides that. Containment exposes it, because you can’t fake a number that counts only the cases that closed.

On a multi-agent system we run, the metric that moved the room wasn’t how many agents shipped. It was how many cases closed without a human, paired with the rule for the ones that shouldn’t close that way. The number told us the system worked. The rule told us it was safe.

So here’s the Monday move.

Pick one agent already in production. Just one. Set a target containment rate. Write the single explicit rule for when it must hand off to a person. Then watch that number for two weeks.

Two numbers, really. The share it contains. The share it correctly refuses. Both have to hold.
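One way to keep both numbers side by side, sketched in Python. It assumes someone reviews the two weeks of cases and marks which ones the handoff rule says belonged with a person. That review step, the `ReviewedCase` fields, and the choice to define "correctly refuses" as handoffs made out of handoffs needed are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ReviewedCase:
    # Hypothetical fields filled in from logs plus a human review pass.
    resolved: bool         # the case reached a closed state
    handed_off: bool       # the agent passed the case to a person
    should_hand_off: bool  # reviewer's call, judged against the written rule


def scorecard(cases):
    """Return the containment share and the correct-refusal share."""
    total = len(cases)
    contained = sum(1 for c in cases if c.resolved and not c.handed_off)
    needed = [c for c in cases if c.should_hand_off]
    refused = sum(1 for c in needed if c.handed_off)
    return {
        "containment": contained / total if total else 0.0,
        # None when no case in the sample needed a handoff.
        "correct_refusal": refused / len(needed) if needed else None,
    }


cases = (
    # Seven routine cases the agent closed alone.
    [ReviewedCase(resolved=True, handed_off=False, should_hand_off=False)] * 7
    # Two cases it handed off, as the rule required.
    + [ReviewedCase(resolved=True, handed_off=True, should_hand_off=True)] * 2
    # One case it closed alone that the rule says belonged with a person.
    + [ReviewedCase(resolved=True, handed_off=False, should_hand_off=True)]
)

card = scorecard(cases)
print(f"contained {card['containment']:.0%}, "
      f"correctly refused {card['correct_refusal']:.0%}")
```

In this sample the containment share is 80%, and one of the contained cases is the one that slipped through. Only the second number, two of three, shows it.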

You’ll learn more in those two weeks than in a year of deployment counts. You’ll see where the agent is strong and where it’s bluffing. You’ll see whether the handoff rule fires when it should or sits idle while bad cases slip through. And you’ll have something no vanity dashboard gives you: a number that tells you the truth on its own.

Businesses don’t buy flash. They buy work that saves time, money, and mistakes, and they buy it because the result is measurable. Containment is that measure for agents.

Stop counting what you launched. Count what closed.
