
The big problem with the way we launch AI

Is the S-curve slowing down? Is there an AI bubble? Are our jobs going to disappear? You’d be forgiven for thinking, reading the news, that AI is on the brink of destroying life as we know it – first through malicious content, then a post-bubble recession, then as AGI.

My personal p(doom) is low. It’s low because my day-to-day work involves speaking with product leaders who are building and shipping real AI products. What I see, consistently, is not recklessness – but teams working hard to optimise their technology in the direction of safety, usefulness, and real customer value.

Over the past year, I’ve spoken with hundreds of AI product managers – from first-movers with names you'd recognize to established technology organisations integrating GenAI into existing products – about their evaluation frameworks for AI validation. Almost without exception, they’re doing the “right” things:

  • Building internal frameworks to assess usefulness, safety, and robustness

  • Using benchmarks and synthetic tests to validate core capabilities

  • Iterating rapidly to unlock new product value and market opportunities

Connect with me at MWC

We're going to be at Mobile World Congress in Barcelona next week, and it would be great to link up if this post describes you. Let's connect!


A three-headed challenge: speed, uncertainty, and scrutiny

However, while life-as-we-know-it seems assured, the effectiveness, engagement, and sentiment surrounding products entering the market are not. In my research with AI product leaders across our clients, they spoke at some length about their concerns about products going live.

From those conversations, we've identified three challenges which combine to raise a serious question about how to safely release robust AI products. As briefly as I can put them, they are: speed, uncertainty, and scrutiny.

Why those three?

  • Speed. Businesses are under intense pressure to get their products out quickly. Getting the best product to market fastest is the name of the game for many of our clients, and that pressure is relentless.

  • Uncertainty. Businesses are launching non-deterministic products to a global user base, which makes it exceptionally difficult to assess whether a product will be engaging and useful. This creates an uncomfortable environment in which to make decisions.

  • Exposure and scrutiny. Our clients are launching products that will face enormous scrutiny. Many B2B2C products will face scrutiny from enterprise buying cycles. Some will face scrutiny from regulators. But above all, products will face scrutiny from users, who will have to find them useful and engaging.


Why there's a gap in current assurance models

As GenAI moves from experimental features to customer-facing products, the limitations of purely internal evaluation become clearer. Benchmarks, synthetic tests, and internal reviews are invaluable – but they optimise for known scenarios. They tend to answer questions like:

  • Is the model capable?
  • Does it meet internal quality thresholds?
  • Does it behave as expected in controlled conditions?

What they struggle to answer is:

  • How does this behave when used differently than intended?
  • How does tone, trust, or usefulness shift across markets?
  • What happens when real users bring their own context, expectations, and habits?

That’s not a failure of internal teams – it’s a natural limitation of testing from inside the building. Increasingly, the AI leaders I speak to are looking for confidence that extends beyond the lab.

So what's next?

We're excited for next week, when we'll be at MWC speaking to even more fabulous product leaders ahead of a big announcement. It would be amazing to see you there. And in the meantime, watch this space!