Strategically measuring AI's true impact in the enterprise
By Shayan Mohanty, Chief AI and Data Officer, Thoughtworks
In today's technological landscape, organisations are grappling with a fundamental question: how do we truly measure the success of our AI investments? Leaders understand that defining and evaluating AI's impact isn't static, but dynamic, demanding continuous adaptation and strategic foresight.
At a strategic level, AI success can be categorised across several key dimensions:
- Performance and accuracy – ensuring models consistently meet defined benchmarks like precision and recall;
- Adaptability to real-world scenarios – assessing how AI systems perform in dynamic, unpredictable environments;
- Ethical behaviour and bias mitigation – proactively addressing potential biases to uphold fairness and inclusivity; and critically,
- Business impact – quantifying the measurable return on investment (e.g., cost reduction, revenue generation, or improved customer satisfaction).
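To make the first dimension concrete, here is a minimal sketch of checking a model against precision and recall benchmarks. The function names and the threshold values are illustrative assumptions, not figures from any particular deployment.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_benchmark(y_true, y_pred, min_precision=0.9, min_recall=0.8):
    """True only if both metrics clear their (assumed) thresholds."""
    precision, recall = precision_recall(y_true, y_pred)
    return precision >= min_precision and recall >= min_recall


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))
print(meets_benchmark(y_true, y_pred))
```

Running a check like this on every release, rather than once at sign-off, is what turns "consistently" into something measurable.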
Bridging the revenue gap: proof-of-concept (PoC) to production
[Photo: Shayan Mohanty. Source: Thoughtworks.]
One pressing issue across the industry is the chasm between promising AI PoCs and their successful deployment in production environments. Much of the enthusiasm around agentic AI stems from its potential to cut costs by automating repeatable problems, so enterprises keen on leveraging it should focus on processes governed by well-defined workflows – what engineers often refer to as 'state machines', in which every step and every permitted transition between steps is explicit.
There is also a noticeable lack of conviction that most PoCs are ready for production, because the risks they carry are, at the moment, very difficult to quantify. This holding pattern significantly impedes the realisation of ROI, leaving the returns on billions in investment unrealised across the AI ecosystem – from chip manufacturers to vertical AI companies.
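A workflow modelled as a state machine might look like the sketch below. The invoice-style states and the transition table are hypothetical, chosen only to show how an agent's proposed next step can be checked against what the process allows.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "received"
    VALIDATED = "validated"
    APPROVED = "approved"
    REJECTED = "rejected"
    PAID = "paid"


# Hypothetical workflow: each state maps to the states it may move to.
TRANSITIONS = {
    State.RECEIVED: {State.VALIDATED, State.REJECTED},
    State.VALIDATED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.PAID},
    State.REJECTED: set(),  # terminal
    State.PAID: set(),      # terminal
}


def advance(current, proposed):
    """Accept an agent's proposed next state only if the workflow allows it."""
    if proposed not in TRANSITIONS[current]:
        raise ValueError(
            f"illegal transition: {current.value} -> {proposed.value}"
        )
    return proposed


def run(proposals, start=State.RECEIVED):
    """Apply a sequence of proposed states, failing on the first illegal one."""
    state = start
    for proposed in proposals:
        state = advance(state, proposed)
    return state


final = run([State.VALIDATED, State.APPROVED, State.PAID])
print(final.value)
```

Because the permitted transitions are data rather than model behaviour, an agent that proposes skipping validation is stopped by the workflow itself, which makes this class of process easier to automate with confidence.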
Unpacking the 'black box'
One of the most persistent hurdles in AI adoption is the perceived 'black box' nature of complex models. There is a growing imperative for techniques like input-sensitivity testing and neural activation analysis. These are proving invaluable in dissecting how AI systems reach conclusions, enabling the detection of inconsistencies or 'hallucinations' before they impact production.
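As a rough illustration of input-sensitivity testing, the sketch below applies small, meaning-preserving changes to an input and reports how often the output changes. The perturbation set and the deliberately brittle `toy_model` are assumptions made for the example; a real test would call the production model and use perturbations suited to its domain.

```python
def perturbations(text):
    """Small edits that should not change the meaning of the input."""
    return [
        text.upper(),
        text.lower(),
        "  " + text + "  ",
        text + "!",
        text.replace(",", ""),
    ]


def sensitivity(model, text):
    """Return (share of variants whose output differs, those variants)."""
    baseline = model(text)
    variants = perturbations(text)
    changed = [v for v in variants if model(v) != baseline]
    return len(changed) / len(variants), changed


def toy_model(text):
    # Hypothetical stand-in for a real model; brittle on purpose because
    # the keyword match is case-sensitive.
    return "refund" if "refund" in text else "other"


rate, failing = sensitivity(toy_model, "I want a refund, please")
print(rate, failing)
```

A non-zero rate flags inputs where a cosmetic change flips the decision, which is exactly the kind of inconsistency worth catching before production.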
While standardised benchmarks offer a baseline, deep dives through 'evals' (evaluations) should become the new norm. These provide granular, context-specific insights into practical performance, such as how a system handles ambiguous customer queries.
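A context-specific eval can be as simple as a list of realistic inputs, each paired with a check on the output. The sketch below assumes a hypothetical `toy_assistant` in place of a real model call, and the three cases are invented for illustration.

```python
def run_evals(model, cases):
    """Return (pass rate, names of failed cases) for a list of eval cases."""
    results = [(c["name"], bool(c["check"](model(c["input"])))) for c in cases]
    pass_rate = sum(ok for _, ok in results) / len(results)
    failures = [name for name, ok in results if not ok]
    return pass_rate, failures


def toy_assistant(query):
    # Hypothetical stand-in for a real model call: it only notices
    # ambiguity when the query contains the bare pronoun "it".
    if "it" in query.lower().split():
        return "Could you clarify what 'it' refers to?"
    return "Your order ships in 2 days."


# Assumed cases: ambiguous queries should trigger a clarifying question.
CASES = [
    {"name": "ambiguous_pronoun", "input": "Can you cancel it",
     "check": lambda out: out.endswith("?")},
    {"name": "clear_shipping", "input": "When does my order ship",
     "check": lambda out: "ships" in out},
    {"name": "ambiguous_reference", "input": "Change that for me",
     "check": lambda out: out.endswith("?")},
]

pass_rate, failures = run_evals(toy_assistant, CASES)
print(pass_rate, failures)
```

The value lies in the list of failed cases more than in the headline pass rate, since each failure names a concrete situation the system mishandles.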
Sovereign AI and regional nuances
The global nature of business means AI systems must contend with diverse linguistic, cultural, and behavioural patterns. This is particularly salient in regions like Southeast Asia, where models primarily trained on English or Chinese datasets often struggle with regional dialects and localised use cases. This challenge has driven the emergence of sovereign AI initiatives, where nations prioritise building independent AI capabilities tailored to their specific needs.
Projects like SEA-LION, a large language model (LLM) focused on Southeast Asia and developed through a partnership between AI Singapore and Thoughtworks, underscore the strategic importance of localisation. Such models, trained on the region's 11 official languages and their nuances, not only mitigate inherent biases but also ensure AI solutions deliver far more relevant insights in regional markets.
Cultivating trust and continuous evolution
Cultivating trust in AI is paramount, especially given the trust deficit highlighted by the 2025 Edelman Trust Barometer. Enabling AI agents to act across systems requires trusting them, which brings privacy and security to the forefront. That trust can be built by stress-testing models against real-world data, implementing fairness metrics, using diverse training data, and following rigorous ethical guidelines, as transparency is inextricably linked to trust.
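One commonly used fairness metric is the demographic parity gap: the difference between the highest and lowest rates of positive decisions across groups. The sketch below is a minimal version with made-up decisions and group labels; which metric is appropriate, and what gap is acceptable, depends on the use case.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Share of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(decisions, groups):
    """Largest difference in selection rate between any two groups."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model decisions (1 = approved) for two groups.
decisions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(decisions, groups))
print(demographic_parity_gap(decisions, groups))
```

A gap of zero means every group receives positive decisions at the same rate; tracking the figure over time shows whether retraining or new data is widening it.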
Unlike traditional IT, AI systems are dynamic; agentic AI is not a 'set it and forget it' system. Successful adoption demands an ongoing loop of monitoring, testing, and recalibration, evolving all 'appendages' around the agent as tasks change.
Addressing concerns about AI's impact on jobs requires frankness about its capabilities and contributions. Ultimately, a lower bar for AI knowledge will enable broader adoption, with core concepts abstracted away much as web development frameworks abstract the underlying technology. This paves the way for advanced agentic AI – a powerful and reliable driver of sustainable growth and competitive advantage for businesses.
Imperatives for AI success
For organisations looking to truly harness AI's power, these strategies are imperative:
- Define clear success metrics with measurable outcomes.
- Harmonise standard benchmarks with contextual evaluations to ensure real-world relevance, being wary of over-reliance on easily 'gamed' benchmarks.
- Demystify AI decisions through input-sensitivity tests to build transparency and trust.
- Conduct proactive bias and reliability stress-testing for fairness.
- Embrace continuous adjustment and smart guardrails by building robust feedback loops and exploring deeper architectural integrations for holistic remediation, moving beyond ad-hoc solutions.