Why reliable data is essential for trustworthy AI

May 28, 2024

Table of Contents

Quality of data
Loss of trust
Lineage of data
Building confidence

After decades in which artificial intelligence (AI) was largely confined to research projects, niche applications or even science fiction, it is now a mainstream business tool.

Driven especially by applications such as ChatGPT, Google’s Bard (now Gemini) and Mistral, generative AI (GenAI) is already having an impact on the workplace.

Industry analyst Gartner, for example, predicts 95% of workers will routinely use GenAI to complete their day-to-day tasks by 2026.

At the same time, more organisations are using GenAI to power “chatbots” and other services that let the public interact with technology in a more natural way. Large language models (LLMs) allow computers to communicate with users in something resembling human speech, and because the models are trained on vast amounts of internet content, they can produce answers to even the most obscure questions. And that’s where the problems can lie.

Unsurprisingly, AI, with its risks and benefits, was a key focus of both Gartner’s Data and Analytics Summit and the 2024 Tech.EU summit, both in London.

GenAI tools stand accused of producing biased results, or even results that are entirely untrue. Such “hallucinations” have already forced some businesses to compensate customers, and have caused reputational damage.

“Governance is even more critical when delivering AI-infused data products,” Gartner’s Alys Woodward told the firm’s Data and Analytics Summit. “With AI, unintended consequences can emerge rapidly. We’ve already seen some examples of successful implementations of GenAI. These organisations deploy the technology with appropriate guardrails and targeted use cases, but we never know when our AI-infused data products will lead us into trouble.”

Firms are already being held liable by regulators and courts for decisions made using AI. The European Union’s (EU’s) AI Act, whose provisions start to take effect from June, will create new obligations as well as new penalties. Fines for the most serious breaches of the law will be as high as 7% of global turnover, more than the 4% maximum under the GDPR.

But if the AI Act is a wake-up call for organisations to be more careful and transparent about their use of AI, it will also prompt them to look more closely at how AI models form the conclusions they do.

This, in turn, depends on the quality of data, both for training models and during the inference – or operational – phase of AI. Today’s large language models rely primarily on public data gathered from the internet. And although there are moves afoot to allow firms to use their own data for training as well as inference, the algorithms used by the AI models themselves remain opaque.

This “black box” approach by AI suppliers has led to concerns about bias and potential discrimination, both when dealing with customers and in areas such as recruitment. Organisations will also be concerned about whether their proprietary data is being used to train models (the main AI suppliers say they no longer do this), about privacy when sensitive information is used, and about whether data, including prompts, could leak out of AI tools.

“When organisations start deploying AI capabilities, the questions of trust, risk and compliance become very important,” said Nader Henein, a vice-president analyst at Gartner specialising in privacy.

He added that organisations are increasingly exposed to risk through AI tools they bring in from outside.

These include standalone AI tools, such as Gemini or ChatGPT, but also AI functionality built into other applications, from desktop tools and browsers to enterprise packages. “Almost everyone out there is using one or more SaaS [software-as-a-service] tool, and many have AI-enabled capabilities within them,” he said. “The AI Act is pointing to that and saying you need to understand, quantify and own that risk.”