Most AI initiatives don’t fail when they’re being built. They tend to fail later, once they’re already in use.
That’s not usually how people expect it to play out. A lot of the effort and focus goes into getting something working in the first place: proving the concept, showing value, and getting it into people’s hands. When that happens, it feels like real progress. People start using it, and there’s a sense of momentum behind it.
But that early success can be misleading, and the part people don’t always see coming is what happens next.
What you’ve really proven at that point is that it works once, in a relatively controlled environment, often with a limited set of inputs and a clear understanding of how it’s being used. What you haven’t proven yet is whether it can operate reliably as part of the business, day in and day out.
It worked in the pilot. That’s not the same as working in production.
Things start to shift quite quickly once it moves beyond that controlled setting.
In testing or early pilots, you’re usually quite close to it. You understand the inputs, you can sense-check the outputs, and you can step in if something doesn’t look right. But that level of control disappears once it’s in wider use.
People start interacting with it in ways you didn’t anticipate. They ask different questions, use it for slightly different purposes, or rely on it in situations you hadn’t really designed for. At that point, it stops being something you built and starts becoming something the organisation depends on. That’s where a lot of the real-world complexity starts to show up.
AI doesn’t behave like traditional software. The same prompt won’t always produce the same outcome, inputs aren’t always clean or predictable, and the way people use it isn’t always consistent. So something that looked stable in a pilot can start to behave very differently in production, not because it’s broken, but because the environment around it has changed.
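To make that concrete, here’s a deliberately toy sketch of why identical prompts can produce different answers. It isn’t any particular model or vendor API, just weighted random choice, but it captures the mechanism: generative models sample from a probability distribution, so variation between runs is a property of the design, not a defect.

```python
import random

# Toy illustration only: real models sample the next token from a
# probability distribution, so identical prompts can legitimately diverge.
COMPLETIONS = {
    "Summarise this invoice": [
        ("Total due: $4,200 by 30 June.", 0.6),
        ("Invoice for $4,200, payment terms 30 days.", 0.3),
        ("Amount owing is $4,200.", 0.1),
    ]
}

def answer(prompt: str) -> str:
    options, weights = zip(*COMPLETIONS[prompt])
    return random.choices(options, weights=weights, k=1)[0]

# Same input, run twice: the outputs may differ.
print(answer("Summarise this invoice"))
print(answer("Summarise this invoice"))
```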
One way to think about it is that you’re not just deploying a piece of technology; you’re introducing something that behaves more like a digital employee. And like any employee, it needs to be managed.
It needs feedback, it needs oversight, and it needs some way of understanding whether it’s making good decisions, not just producing plausible outputs. None of that happens automatically once something is live, and it’s often where the gap starts to appear.
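To illustrate what that management layer might look like, here’s a minimal sketch. Every name in it is hypothetical, but the shape is the point: record every decision, and route low-confidence ones, plus a random sample of the rest, to a person for review.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Decision:
    input_ref: str      # what the agent was asked to do
    action: str         # what it decided
    confidence: float   # the agent's own score, 0..1

@dataclass
class Oversight:
    """Hypothetical sketch: route risky decisions to people, audit the rest."""
    review_threshold: float = 0.8
    spot_check_rate: float = 0.05          # sample even confident decisions
    review_queue: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def handle(self, d: Decision) -> str:
        self.audit_log.append(d)           # every decision is recorded
        if d.confidence < self.review_threshold or random.random() < self.spot_check_rate:
            self.review_queue.append(d)    # a person approves before it executes
            return "pending_human_review"
        return "auto_approved"

ops = Oversight()
print(ops.handle(Decision("invoice-1042", "approve_payment", confidence=0.62)))
print(ops.handle(Decision("invoice-1043", "approve_payment", confidence=0.97)))
```

None of this is sophisticated, and that’s deliberate: the gap is rarely the technology, it’s whether anyone is looking at the queue.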
You can’t just switch it off and go back
Something else that gets underestimated is how quickly these solutions become embedded.
Once people start relying on them, they stop doing the underlying task themselves. Over time, they lose familiarity with the process, they forget the steps, and in some cases they lose the ability to step back in and do it manually if they need to.
That’s fine while everything is working, but it creates a problem if you ever need to pull the agent back or switch it off. At that point, you’re not just removing a tool, you’re removing a capability that the organisation has already started to depend on, and that’s not something you can rebuild overnight.
This is where the idea of “just turning it off” becomes more complicated than it sounds.
There’s a bit of an irony in that. You do need to be able to turn it off, and in some cases you need that ability immediately: a kind of AI kill switch for when something isn’t behaving as expected.
But the moment you do, you create a gap. The work still needs to happen, the decisions still need to be made, and the people who used to do that work may no longer be in a position to pick it back up in the same way.
So you end up in a situation where you’re exposed on both sides. The agent introduces risk while it’s running, but removing it introduces risk as well, and neither option is particularly comfortable.
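As a rough sketch of the kill-switch idea (all names here are hypothetical; in a real system the flag would live in a feature-flag or configuration service so it can be flipped without a deployment), the interesting part isn’t stopping the agent. It’s where the work goes next.

```python
from enum import Enum

class Mode(Enum):
    AGENT = "agent"            # the agent handles the work
    HUMAN_FALLBACK = "human"   # kill switch flipped: people pick it back up

# Hypothetical flag; in practice this sits in a config or feature-flag
# service so operations can flip it immediately.
KILL_SWITCH = {"mode": Mode.AGENT}

def process(task: str) -> str:
    if KILL_SWITCH["mode"] is Mode.HUMAN_FALLBACK:
        # The work still needs to happen: queue it for a person,
        # don't silently drop it.
        return f"queued for manual handling: {task}"
    return f"agent handled: {task}"

print(process("reconcile supplier statement"))
KILL_SWITCH["mode"] = Mode.HUMAN_FALLBACK   # flip the switch
print(process("reconcile supplier statement"))
```

If there’s no fallback path behind the switch, turning the agent off just moves the risk rather than removing it.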
This isn’t about slowing things down or avoiding AI. The speed organisations are moving at is part of the value.
But moving quickly without thinking about how these things are going to operate over time tends to create fragile solutions. They work well enough to get adopted, but not always well enough to be relied on.
What needs to be true for this to work
So a more useful question than “how quickly can we deploy this?” is what needs to be in place for it to keep working once it becomes part of day-to-day operations. In practice, that comes down to three questions:
- How do we know it’s making the right decisions?
- How do we see when it’s drifting?
- How do we step in when something doesn’t look right?
Most organisations don’t have clear answers to those questions yet, and that’s the gap that starts to show up once the pilot is over. You’ve proven it works, but you haven’t yet proven that you can run it.
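Those questions do have operational answers. As one illustrative example of the drift question (the metric and thresholds here are assumptions, not a standard), a monitor can be as simple as comparing a rolling window of live behaviour against a baseline captured during the pilot:

```python
import random
from collections import deque
from statistics import mean

class DriftMonitor:
    """Illustrative sketch: compare a rolling window of some per-decision
    metric (e.g. confidence or escalation rate) against a pilot baseline."""

    def __init__(self, baseline: float, window: int = 200, tolerance: float = 0.15):
        self.baseline = baseline            # measured during the pilot
        self.recent = deque(maxlen=window)  # rolling window of live values
        self.tolerance = tolerance          # allowed movement before alerting

    def record(self, value: float) -> bool:
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False                    # not enough live data yet
        return abs(mean(self.recent) - self.baseline) > self.tolerance

monitor = DriftMonitor(baseline=0.91)
# Simulated feed: confidence slowly sagging below the pilot baseline.
for step in range(500):
    score = random.gauss(0.91 - step * 0.0006, 0.03)
    if monitor.record(score):
        print(f"drift alert at decision {step}")
        break
```

The specific metric matters less than having a baseline at all; without one, “is it drifting?” has no answer.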
If any of this feels familiar, it’s probably because you’re already starting to see some of it in practice.
Agents are already finding their way into the business, whether that’s been planned or not. The harder part is making sure they operate reliably once they’re there, and that’s where most organisations are still finding their feet.
We’ve explored this in more detail in our Agent Operations Centre white paper, looking at what it takes to manage, monitor, and continuously improve agent use at scale.
Shannon Moir | Director of AI
Shannon leads Fusion5’s AI capability, helping organisations understand and prepare for the impact of artificial intelligence across their people, processes and technology.