Is the S-curve slowing down? Is there an AI bubble? Are our jobs going to disappear? You’d be forgiven for thinking, reading the news, that AI is on the brink of destroying life as we know it – first through malicious content, then through a post-bubble recession, and finally as AGI.
My personal p(doom) is low. It’s low because my day-to-day work involves speaking with product leaders who are building and shipping real AI products. What I see, consistently, is not recklessness – but teams working hard to optimise their technology in the direction of safety, usefulness, and real customer value.
Over the past year, I’ve spoken with hundreds of AI product managers – from first-movers with names you’d recognise to established technology organisations integrating GenAI into existing products – about how they evaluate and validate their AI products. Almost without exception, they’re doing the “right” things:
Building internal frameworks to assess usefulness, safety, and robustness
Using benchmarks and synthetic tests to validate core capabilities
Iterating rapidly to unlock new product value and market opportunities
However, while life as we know it seems assured, the effectiveness of, engagement with, and sentiment towards the products entering the market are far less so. In my research conversations with AI product leaders across our clients, they spoke at some length about their concerns around taking products live.
From those conversations, we've identified three challenges that together raise a serious question about how to release robust AI products safely. As briefly as I can put them, they are: speed, uncertainty, and exposure.
Why those three?

Speed. For many of our clients, getting the best product to market fastest is the name of the game – and that race puts teams under enormous pressure to ship quickly.
As GenAI moves from experimental features to customer-facing products, the limitations of purely internal evaluation become clearer. Benchmarks, synthetic tests, and internal reviews are invaluable – but they optimise for known scenarios. They tend to answer questions like:
Does the model meet the accuracy and safety thresholds we set?
Does it handle the use cases we designed it for?
What they struggle to answer is:
How will the product behave when real customers push it in directions we never anticipated?
Where will it fail in ways we never thought to test?
That’s not a failure of internal teams – it’s a natural limitation of testing from inside the building. Increasingly, the AI leaders I speak to are looking for confidence that extends beyond the lab.
We're excited for next week, when we'll be at MWC speaking to even more fabulous product leaders ahead of a big announcement. It would be absolutely amazing to see you there – and in the meantime, watch this space!