Podcast: How to ensure data quality for AI


In this podcast we talk with Cody David, solutions architect with Syniti, which is part of Capgemini, about the importance of ensuring data quality for artificial intelligence (AI) workloads.

Being able to trust AI is core to its use, he says, and that means being sure its outcomes are reliable. That’s only going to be the case if AI is trained on data that isn’t riddled with duplicates and incomplete records.

Meanwhile, David says, AI can itself be used to help with data quality, such as by finding issues in datasets that would otherwise lead to erroneous outcomes.

The big takeaway is that organisations need a “data-first” attitude so that AI can do its work and produce reliable, trustworthy results, and he outlines the quick wins that can be gained along the way.

Antony Adshead: What are the key challenges in data quality in the enterprise for AI use cases?

Cody David: One of the biggest challenges in data quality for AI that I see is trust.

Many people view an AI system as a single black box. When it produces an incorrect insight or action, they call it an AI mistake and they lose confidence, sometimes permanently.

The real issue, however, often lies in poor data quality, and that is compounded by a lack of understanding of how AI solutions actually work.

Consider a sales organisation whose CRM has duplicate customer records. An AI solution ranks its top customers incorrectly because it isn’t rolling all the transactions up to one account.

So, the sales team blames the AI tool, never realising that the root cause is actually poor or inconsistent data. This is an example of what we call data quality for AI: ensuring that data is accurate and ready for those AI-driven processes.
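As a rough illustration of that failure mode, here is a minimal sketch (with invented data and column names, using pandas) of how transactions split across duplicate customer records distort a top-customer ranking until they are rolled up to a single master account:

```python
import pandas as pd

# Hypothetical CRM extract: the same customer exists twice under slightly
# different names, each record holding a share of the transactions.
transactions = pd.DataFrame({
    "customer_id":   ["C001", "C002", "C003"],
    "customer_name": ["Acme Corp", "ACME Corporation", "Globex Ltd"],
    "revenue":       [450_000, 380_000, 600_000],
})

# Ranked naively, Globex looks like the top customer...
print(transactions.sort_values("revenue", ascending=False))

# ...but once the duplicates are mapped to one master record, Acme is.
# The mapping below is an assumed survivorship rule, not a specific product feature.
master_map = {"C001": "C001", "C002": "C001", "C003": "C003"}
transactions["master_id"] = transactions["customer_id"].map(master_map)
rolled_up = transactions.groupby("master_id", as_index=False)["revenue"].sum()
print(rolled_up.sort_values("revenue", ascending=False))
```

The AI ranking is only as good as that mapping of duplicates to a master record, and building it is precisely the data quality work David describes.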

On the flip side, there’s also AI for data quality, where an AI solution can actually help detect and merge the duplicate records in the example we just gave.

One more challenge is that data quality has historically been an afterthought. Organisations often jump into AI without that data-first mentality and before ensuring they have a solid data foundation.

So, you have these legacy ERP systems with thousands of tables and decades of compounding data issues.

That all adds to the complexity, and that’s why it’s crucial to address data quality issues proactively rather than trying to retrofit solutions after AI initiatives fail. We’ve got to put data front and centre of these AI initiatives and establish a stable foundation that will support trustworthy AI outputs.

Adshead: What are the key steps that an organisation can take to ensure data quality for AI?

David: I think a systematic approach always begins with data governance.

That’s really the policies for how data is collected, stored, cleansed and shared, and finding out who the true owner of a particular business process or dataset is. It’s crucial to figure out who’s responsible for those standards.

Next, you want to prioritise. Rather than trying to fix everything at once, focus on the areas that deliver the biggest business impact. That’s a very key phrase: what’s the biggest business impact of what you’re trying to fix in terms of data quality? Then figure out which of those areas feed your AI solutions.

This is where you’re going to see those quick wins. Now, budget concerns often arise when you start talking about data quality and data governance programmes, yet ironically it’s more expensive to work with bad data over the long run.

I think a practical solution is to start small. Pick a critical business process with measurable financial impact and use it as a pilot to demonstrate real savings and return on investment (ROI).

And once you show those data quality improvements lead to tangible benefits, like cost reductions or freed-up working capital, you will have a stronger case with management for wider data governance investment. You should also embed data quality practices in your data workflows. For example, integrate validation rules into your data management so errors can be caught immediately, preventing bad data from reaching those AI solutions.

If you can’t put in validations like that at the point of data creation, you’ve got to put systems and processes in place to catch those errors immediately through automated reporting.
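As a minimal sketch of what validation rules at the point of data creation might look like, the rules and field names below are hypothetical rather than a specific product feature:

```python
import re

# Hypothetical validation rules applied before a customer record is saved.
def validate_customer(record: dict) -> list[str]:
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    if not re.fullmatch(r"[A-Z]{2}\d{9}", record.get("vat_number", "")):
        errors.append("vat_number must be two letters followed by nine digits")
    if record.get("country") not in {"GB", "DE", "FR", "US"}:
        errors.append("country is not on the approved list")
    return errors

new_record = {"name": "Acme Corp", "vat_number": "GB123456789", "country": "GB"}
problems = validate_customer(new_record)
if problems:
    # Reject the record or route it to a data steward rather than letting
    # bad data flow downstream into the AI pipeline.
    raise ValueError(f"Record rejected: {problems}")
```

The same checks can also be run in batch over existing tables and surfaced through automated reporting, which is the fallback David mentions when validation at entry isn’t possible.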

Lastly, I would say always focus on continual improvement. Measure data quality metrics and use them to drive iterative refinements. By weaving data governance into your organisation and proving its value through targeted pilots, you create a sustainable foundation for trustworthy AI initiatives.
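One way to make “measure data quality metrics” concrete is to track a few simple ratios per dataset after each cleansing cycle; the metrics below are common examples rather than a prescribed list:

```python
import pandas as pd

def quality_metrics(df: pd.DataFrame, key_columns: list[str]) -> dict:
    """Hypothetical scorecard: row count, completeness and duplicate rate."""
    return {
        "rows": len(df),
        "completeness_pct": round(100 * (1 - df[key_columns].isna().any(axis=1).mean()), 1),
        "duplicate_pct": round(100 * df.duplicated(subset=key_columns).mean(), 1),
    }

customers = pd.DataFrame({
    "name": ["Acme Corp", "Acme Corp", "Globex Ltd", None],
    "country": ["GB", "GB", "US", "US"],
})

# Re-run after each cleansing cycle and plot the trend to show improvement.
print(quality_metrics(customers, ["name", "country"]))
```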

Adshead: Finally, I wondered if you could give an example of one or two quick wins that enterprises can get in terms of data quality and improving data quality for AI?

David: There are a few examples of where we try to get quick wins for data quality, especially when targeting very quick ROI in high-impact business processes.

If you take an ERP system, we have what we call MRO [maintenance, repair and operations] materials. Those are spare parts for equipment in a manufacturing process, and when you have those materials, you usually keep a safety stock, an amount of those items that would allow you to repair those machines.

If a plant goes down, you’re potentially going to lose millions of dollars a day. And if you have duplicate materials, as an example, you’re actually storing more than you need. That’s working capital you free up if you correct that data quality issue.

And then, of course, you can use that working capital for other parts of your initiatives.
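A back-of-the-envelope sketch of that MRO effect, with invented numbers: if the same spare part sits under two material numbers, each carrying its own safety stock, consolidating the records releases the duplicated stock as working capital.

```python
# Hypothetical duplicate material masters for the same spare part.
unit_cost = 1_200               # cost per spare part
safety_stock_per_record = 10    # each duplicate record carries its own safety stock
duplicate_records = 2
consolidated_safety_stock = 10  # what one merged record would actually need

held = duplicate_records * safety_stock_per_record * unit_cost  # 24,000 tied up
needed = consolidated_safety_stock * unit_cost                  # 12,000 actually required
print(f"Working capital freed by merging duplicates: {held - needed:,}")  # 12,000
```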

Another one would be vendor discounts. If you have vendors that are duplicated in a system, and you’re entitled to rebates based upon the amount of money you’re spending with them, you’re not going to realise those particular rebates because the spend is split across records. That could be an area where you could have cost savings as well.
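A similarly rough illustration of the rebate case, with invented figures: spend split across duplicate vendor records can sit below a rebate threshold that the consolidated spend would comfortably clear.

```python
# Hypothetical rebate agreement: 2% back once annual spend reaches 600,000.
rebate_rate = 0.02
threshold = 600_000

spend_by_duplicate_record = [400_000, 350_000]  # same supplier, two vendor records

# Evaluated per duplicate record, neither reaches the threshold, so no rebate.
rebate_split = sum(s * rebate_rate for s in spend_by_duplicate_record if s >= threshold)

# Consolidated onto one vendor record, the full spend qualifies.
total_spend = sum(spend_by_duplicate_record)
rebate_merged = total_spend * rebate_rate if total_spend >= threshold else 0

print(rebate_split, rebate_merged)  # 0 versus 15,000
```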


