PBT Group: Why businesses must rethink their data lakes

While businesses race to implement Artificial Intelligence (AI), many overlook a critical factor in its success: the quality and structure of the data feeding their models. A model is only as good as the data it is trained on, according to Julian Thomas, Principal Consultant at PBT Group.

“If your data lake is unmanaged or full of unstructured, incomplete, insufficient, or unreliable data, even the most sophisticated AI will not deliver value,” he emphasises.

Thomas explains that too many organisations treat their data lakes as passive repositories, places to store everything, rather than as curated resources. This approach undermines governance, hinders usability, and creates downstream issues for data teams tasked with developing AI and machine learning solutions.

“To get AI right, we need to shift the mindset around data lakes. They should be active environments governed by frameworks like the Medallion architecture, which helps teams clean, refine, and enrich data in a structured, layered way.”

PBT Group often uses the Medallion architecture to bring structure to a data lake. It separates data into three layers: Bronze for raw, unfiltered data; Silver for data that has been cleaned and enriched and is more analytics-friendly; and Gold for curated, trusted datasets that are fully governed and ready for use in Business Intelligence or machine learning. This progression gives teams a consistent base to work from, lets them trace where data comes from, and helps ensure that what is delivered matches the needs of the people using it.
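To make the layering concrete, here is a minimal Python sketch of data moving from Bronze to Silver to Gold. It is not PBT Group's implementation: the sample records, the source names (`crm.csv`, `pos.json`), the field names and the functions `build_silver` and `build_gold` are all hypothetical, and plain lists of dictionaries stand in for what would normally be tables in a lakehouse engine. Bronze keeps every value exactly as received, Silver casts and standardises values while recording each row's origin, and Gold aggregates the result into a consumer-ready summary.

```python
from collections import defaultdict

# Hypothetical raw extracts from two source systems, landed as-is in Bronze.
# Every value is a string because Bronze keeps the data exactly as received.
bronze = [
    {"source": "crm.csv", "customer_id": "C1", "country": " za ", "spend": "120.50"},
    {"source": "crm.csv", "customer_id": "C2", "country": "ZA", "spend": "80"},
    {"source": "pos.json", "customer_id": "C3", "country": "ke", "spend": "n/a"},
    {"source": "pos.json", "customer_id": "C4", "country": "KE", "spend": "15"},
]

def build_silver(bronze_rows):
    """Silver: typed, standardised rows that keep a pointer back to Bronze."""
    silver_rows = []
    for idx, row in enumerate(bronze_rows):
        try:
            spend = float(row["spend"])
        except ValueError:
            continue  # unparseable value stays in Bronze only, for investigation
        silver_rows.append({
            "customer_id": row["customer_id"].strip(),
            "country": row["country"].strip().upper(),
            "spend": spend,
            # Lineage: the source system and the position of the Bronze record.
            "lineage": (row["source"], idx),
        })
    return silver_rows

def build_gold(silver_rows):
    """Gold: a curated, business-ready summary (total spend per country)."""
    totals = defaultdict(float)
    sources = defaultdict(set)
    for row in silver_rows:
        totals[row["country"]] += row["spend"]
        sources[row["country"]].add(row["lineage"][0])
    return {
        country: {"total_spend": totals[country], "sources": sorted(sources[country])}
        for country in sorted(totals)
    }

silver = build_silver(bronze)
gold = build_gold(silver)
for country, summary in gold.items():
    print(country, summary)
```

Because Bronze is left untouched, Silver and Gold can be rebuilt at any time. The `lineage` field lets an analyst trace each Silver row back to its exact Bronze record and each Gold figure back to its source systems, while the row with an unparseable `spend` value never reaches Silver but remains available in Bronze for investigation.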

But a layered structure is only part of the solution. The real differentiator, according to Thomas, is data wrangling.

“Data wrangling is not just a technical clean-up. It is a deliberate, skilled process of transforming messy, inconsistent data into something reliable and fit for purpose. That includes everything from deduplication to validation and enrichment.”
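The three wrangling steps Thomas names (deduplication, validation and enrichment) can be sketched in a few lines of Python. This is an illustrative assumption rather than a prescribed method: the transaction records, the validation rules and the `COUNTRY_REGION` lookup table are hypothetical. Records are first normalised so that trivially different duplicates (such as `T001` and `t001`) collide on the same key; each record is then checked against simple rules, with the reasons for any rejection kept; finally, valid records gain typed and derived fields from reference data.

```python
# Hypothetical reference table used for enrichment.
COUNTRY_REGION = {"ZA": "Southern Africa", "KE": "East Africa", "NG": "West Africa"}

# Hypothetical messy input: a near-duplicate, a bad email, a negative amount,
# an unknown country code and one clean record.
raw = [
    {"txn_id": "T001", "email": "Ann@Example.com ", "country": "za", "amount": "250"},
    {"txn_id": "t001", "email": "ann@example.com", "country": "ZA", "amount": "250"},
    {"txn_id": "T002", "email": "bob@example", "country": "KE", "amount": "40"},
    {"txn_id": "T003", "email": "cara@example.com", "country": "NG", "amount": "-5"},
    {"txn_id": "T004", "email": "dev@example.com", "country": "XX", "amount": "99"},
    {"txn_id": "T005", "email": "eve@example.co.ke", "country": "ke", "amount": "12.75"},
]

def normalise(rec):
    """Trim whitespace and standardise case so equivalent values compare equal."""
    return {
        "txn_id": rec["txn_id"].strip().upper(),
        "email": rec["email"].strip().lower(),
        "country": rec["country"].strip().upper(),
        "amount": rec["amount"].strip(),
    }

def validate(rec):
    """Return a list of rule violations (an empty list means the record passes)."""
    errors = []
    if "@" not in rec["email"] or "." not in rec["email"].split("@")[-1]:
        errors.append("invalid email")
    try:
        if float(rec["amount"]) <= 0:
            errors.append("non-positive amount")
    except ValueError:
        errors.append("amount not numeric")
    if rec["country"] not in COUNTRY_REGION:
        errors.append("unknown country code")
    return errors

def wrangle(records):
    clean, rejected, seen = [], [], set()
    for rec in map(normalise, records):
        # Deduplication on the normalised business key (first occurrence wins).
        if rec["txn_id"] in seen:
            continue
        seen.add(rec["txn_id"])
        # Validation: keep the reasons so rejects can be reviewed, not silently lost.
        errors = validate(rec)
        if errors:
            rejected.append((rec["txn_id"], errors))
            continue
        # Enrichment: cast the amount and add a region from reference data.
        clean.append({**rec, "amount": float(rec["amount"]),
                      "region": COUNTRY_REGION[rec["country"]]})
    return clean, rejected

clean, rejected = wrangle(raw)
print(clean)
print(rejected)
```

Keeping the rejected records and their reasons, rather than dropping them, is what makes the process auditable. This sketch keeps the first occurrence of a duplicate key; a real pipeline might instead prefer the most recent or most complete record.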

This approach is particularly important in industries like financial services, where it is essential to know exactly where your data comes from and how it has been handled. It is also crucial when training AI models, which depend on accurate historical data to perform reliably and fairly.

Thomas also stresses the importance of understanding how data wrangling differs from Extract, Transform and Load (ETL). “Data wrangling can be considered ‘informal ETL’, done in the context of machine learning for a given initiative. ETL is effectively the same activity; however, it is automated for long-term use. Once data wrangling has been completed and the resulting trained model has been approved for production implementation, the data wrangling solution must be handed over to a formal engineering team, where it can be converted into formal ETL.”

Thomas also cautions against viewing data quality as a once-off project.

“Data governance must be embedded into daily operations. From ingestion to output, quality controls, validation steps, and metadata tracking need to be built into every phase.”
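One way to build quality controls, validation steps and metadata tracking into every phase is to attach checks to each pipeline stage and record what happened at each step. The Python sketch below uses hypothetical names (`Stage`, `run_pipeline`, `QualityGateError`, `not_empty`, `no_blank`) and a toy two-stage pipeline; it is one possible pattern, not a specific governance product. Each stage transforms the rows, runs its checks, appends row counts and any failures to an audit log, and halts the pipeline if a check fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Rows = List[Dict]
Check = Callable[[Rows], Optional[str]]  # returns an error message, or None if OK

@dataclass
class Stage:
    name: str
    transform: Callable[[Rows], Rows]
    checks: List[Check] = field(default_factory=list)

class QualityGateError(Exception):
    def __init__(self, stage: str, failures: List[str], audit: List[Dict]):
        super().__init__(f"{stage}: {'; '.join(failures)}")
        self.audit = audit  # metadata collected up to and including the failed stage

def run_pipeline(rows: Rows, stages: List[Stage]):
    """Run each stage, apply its quality checks, and log metadata for every step."""
    audit: List[Dict] = []
    for stage in stages:
        rows_in = len(rows)
        rows = stage.transform(rows)
        failures = [msg for check in stage.checks if (msg := check(rows)) is not None]
        audit.append({"stage": stage.name, "rows_in": rows_in,
                      "rows_out": len(rows), "failures": failures})
        if failures:
            raise QualityGateError(stage.name, failures, audit)
    return rows, audit

# Hypothetical reusable checks.
def not_empty(rows: Rows) -> Optional[str]:
    return None if rows else "no rows"

def no_blank(column: str) -> Check:
    def check(rows: Rows) -> Optional[str]:
        missing = sum(1 for r in rows if not r.get(column))
        return f"{column} missing in {missing} row(s)" if missing else None
    return check

# Hypothetical input and a toy two-stage pipeline.
raw = [{"id": 1, "country": " za"}, {"id": 2, "country": "ke "}, {"id": 3, "country": ""}]
stages = [
    Stage("ingest", lambda rows: list(rows), [not_empty]),
    Stage("standardise",
          lambda rows: [{**r, "country": r["country"].strip().upper()} for r in rows],
          [not_empty, no_blank("country")]),
]

try:
    result, audit = run_pipeline(raw, stages)
except QualityGateError as err:
    audit = err.audit
    print("Pipeline halted:", err)
for entry in audit:
    print(entry)
```

In this run the third record has a blank country, so the `standardise` gate stops the pipeline, and the audit log shows exactly which stage failed and why instead of the gap surfacing later in a report or a trained model.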

The payoff? A structured data lake combined with rigorous wrangling makes data more accessible and AI-ready. It enables teams to experiment with confidence, deliver faster iterations, and avoid the costly rework that comes from poor input data.

“As AI becomes more integrated into business decisions, the pressure on data teams will only increase. Getting the fundamentals right now, especially how we wrangle and structure our data, will determine who actually succeeds in turning AI into value.”
