Data Management in the Modern Data Stack and the Age of AI: A Call to Action

04 Jul, 2025
Introduction

We’re seeing a surge of interest in Data Management from companies of all sizes. But this interest is often born out of struggle. They’re wrestling with fundamental questions: Can we trust this data? What’s the single source of truth? What does this data actually mean? How are these metrics calculated? Why are our pipelines and dashboards constantly breaking?

My belief is that the shift of the last decade – the move from ETL to ELT and from on-prem relational databases to the Modern Data Stack – has led to a neglect of Data Management.

We’ve actually gotten worse at it, despite the increasing complexity and volume of data.

And with the rise of AI, the stakes are even higher. Garbage in, garbage out has never been more relevant. The promise of AI hinges on the quality, trustworthiness, and understanding of the underlying data. Neglecting Data Management now is like building a skyscraper on a shaky foundation – it might look impressive at first, but it’s ultimately unsustainable.

This blog post will explore the consequences of this paradigm shift on Data Management, its impact on organizations, and propose a framework to help reclaim control and establish Data Management as a critical function in the modern data landscape.

Back in the Days

Before the rise of the “Modern Data Stack,” the landscape was dominated by Relational Databases, primarily designed to provide key figures and trends. Think: customer counts, product sales, and margin evolution. A central IT team was responsible for this Relational Database product, typically procured from a relational database vendor and deployed on-premises. This was the (enterprise) Data Warehouse, the primary source for reporting and dashboarding.

This setup had its limitations. Data users faced:

  1. Limited data type support: Lack of support for images, audio, and JSON severely restricted the use of new data sources and insights.
  2. Slow innovation: Difficulty adopting new technologies such as streaming or real-time data sources and APIs.
  3. Painful upgrades: Updating or upgrading the on-premise technology stack was a multi-week, or even month-long, project.

But this approach also offered significant benefits:

  1. Clearly defined schemas: Target tables were well-defined and structured, based on logical and physical data models.
  2. Source-to-target mapping: Collaboration was enforced with source system engineers and domain experts to agree on data usage, quality expectations, and delivery schedules.
  3. ETL tooling: Tools like Informatica PowerCenter or Ab Initio provided features like data lineage, data validation, business logic libraries, and data cataloging.

Let’s be clear: I’m not advocating for a return to the Relational Database era. The shift from ETL to ELT, schema-on-write to schema-on-read, and on-prem to cloud brought unprecedented freedom, speed, and technological advancements. However, we lost structure, oversight, and control. In the “Modern Data Stack,” full data lineage is often missing, data validation is absent, and documentation is often outdated. We’ve also abandoned practices vital to Data Management: data modeling, agreements with source systems, and a unified information layer. Instead, we see custom ETL jobs for every dashboard, hurting reusability and consistency.

This contributes directly to the issues many data teams face today: data trustworthiness concerns, lack of data ownership, a heavy data-ops burden, and difficulty in understanding data meaning and origin.


(Re)instate Data Management Practices

It’s time to redefine Data Management for this new data reality.

To overcome these challenges, we need to strategically incorporate Data Management practices into our current way of working and technology stacks.

Data Management Framework

The best place to begin is by defining your Data Management framework. Data Management is a broad topic with dozens of subcategories, so select the subcategories that address your main challenges and offer the most benefit.

These subcategories can be organizational (data strategy, processes, operating model), technical (metadata store, data quality engine, data lineage, automated governance workflows) or knowledge management focused (capturing knowledge, documenting output, creating data models).

Figure: Example DM framework. Categories should be fitted to your organization.

After defining these subcategories, align with your key stakeholders. Be clear about your intended purpose and objectives. Topics like metadata, data lineage, and data quality are open to interpretation, so spell out what you mean by each. Document your framework in a guideline that can be shared throughout the organization. This is also a great way to get new team members up to speed and teach them how they are expected to work with data. Also, do a “road tour” throughout your organization, presenting the framework and explaining how it will help achieve strategic objectives and address current challenges. Align closely with the team developing and implementing your data technology stack, as the components of the framework will have to be embedded in that stack.


Road Map

Implementing Data Management takes time. Your teams are already busy, so prioritize which framework categories to address first. Create a road map, starting with what’s most critical to your organization.

Furthermore, consider taking a use-case-driven approach. Find the use cases that are of strategic importance to your organization, convince the sponsors and key stakeholders of each use case to adopt proper data management practices, and work with the delivery team to make the use case a success. This will create buy-in and show value, both of which are needed to expand the effort more widely.

I believe four areas are the most important to start with:

  1. Metadata: Metadata is the foundation of Data Management. Capture metadata for all data. Semantic metadata includes definitions and explanations. Security metadata classifies data and indicates sensitivity. Consider data ownership and usage statistics. This enables a data-driven approach to Data Management. For AI specifically, rich and accurate metadata is critical for model explainability, bias detection, and ensuring responsible AI development.
  2. Organizational Model: Often, central or decentralized data teams shoulder all ownership and responsibility. They struggle to find context and people who can explain how data was created. Also, without proper ownership, it’s hard to make decisions about priorities and capacity allocation. Accountability for data assets should be on the business side, with people that have a mandate to make decisions, not solely with the data team.
  3. Data Contracts: A data contract is an agreement between data producers and consumers. It ranges from a simple template-based agreement to an enforced usage policy. Even in its simplest form, it forces communication and awareness of expectations. In more advanced forms, it enhances data security and provides an overview of data access. This ensures that the data conforms to expectations.
  4. Data Portfolio Management: Data teams are often overwhelmed by requests from various business units and domains. Simply decentralizing and giving each domain a data team is not a magic bullet, as this can lead to duplication and inconsistencies. A well-organized portfolio of use cases, prioritized based on business value, is crucial to bring focus and maximize impact. This ensures that efforts are directed towards the most impactful projects and prevents redundant development. Those are also the projects where proper data management practices are the most impactful and important. A well-managed data portfolio ensures that AI initiatives are aligned with strategic goals and avoids the pitfall of building AI solutions on top of poorly managed or duplicated data.
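To make the metadata and data contract ideas above a little more concrete, here is a minimal sketch in Python. It is not a specific tool or standard: the class names `ColumnSpec` and `DataContract`, the dataset name, and the owner address are all hypothetical, chosen only to show how semantic metadata (descriptions), security metadata (a sensitivity flag), ownership, and an enforceable expectation could live together in one agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One column in the contract (hypothetical structure, for illustration)."""
    name: str
    dtype: type
    description: str         # semantic metadata: what the column means
    sensitive: bool = False  # security metadata: classification flag
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Agreement between a data producer and its consumers."""
    dataset: str
    owner: str  # accountable owner on the business side
    columns: tuple[ColumnSpec, ...]

    def violations(self, rows: list[dict]) -> list[str]:
        """Return a readable message for every expectation a row breaks."""
        problems = []
        for i, row in enumerate(rows):
            for col in self.columns:
                if col.name not in row:
                    problems.append(f"row {i}: missing column '{col.name}'")
                    continue
                value = row[col.name]
                if value is None:
                    if not col.nullable:
                        problems.append(f"row {i}: '{col.name}' is null")
                elif not isinstance(value, col.dtype):
                    problems.append(
                        f"row {i}: '{col.name}' expected {col.dtype.__name__}, "
                        f"got {type(value).__name__}"
                    )
        return problems


# Example contract; all names and values are made up for this sketch.
contract = DataContract(
    dataset="sales.orders",
    owner="head-of-sales@example.com",
    columns=(
        ColumnSpec("order_id", int, "Unique order identifier"),
        ColumnSpec("customer_email", str, "Contact email of the buyer", sensitive=True),
        ColumnSpec("amount_eur", float, "Order total in euros"),
    ),
)

rows = [
    {"order_id": 1, "customer_email": "a@example.com", "amount_eur": 19.99},
    {"order_id": "2", "customer_email": None, "amount_eur": 5.0},
]

problems = contract.violations(rows)
for message in problems:
    print(message)
```

In this sketch the second row is reported twice: once for the wrong type of `order_id` and once for the missing email. Even a check this simple forces the producer and consumer to write down what they expect from each other, which is the point of the simplest form of data contract.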

With these areas in order, you can expand to other areas, based on your organization’s challenges and opportunities, such as:

  1. If data quality is a major issue, address that next. Data quality capabilities allow you to continuously monitor data and receive early warnings. They also build trust by demonstrating the quality of data used for critical use cases.
  2. Adopt data modelling practices, focussing on the data entities that are most crucial to your business model and context. Model these out well and turn those entities into reusable data assets.
  3. You might also consider data lineage. Many tools now offer column-level lineage, but it’s often scattered across tools. Providing data lineage across the entire chain gives you explainability: the origin of data, the source of data issues, the impact of changes, and so on.
  4. Another option is reference data, which provides standardized lists of values. Managing reference data reduces variety in how the same thing is recorded and makes values easier to recognize.
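As a rough illustration of how data quality monitoring and reference data can work together, the sketch below measures completeness and validity for one column against a reference list and raises early warnings when a metric drops below an agreed threshold. The function names, the country-code list, and the thresholds are assumptions made for this example, not part of any particular data quality tool.

```python
def quality_report(rows, column, reference_values):
    """Return completeness and validity ratios for one column.

    completeness: share of rows where the column is filled in
    validity:     share of filled-in values that appear in the reference list
    """
    total = len(rows)
    present = [r[column] for r in rows if r.get(column) is not None]
    valid = [v for v in present if v in reference_values]
    return {
        "completeness": len(present) / total if total else 1.0,
        "validity": len(valid) / len(present) if present else 1.0,
    }


def early_warnings(report, thresholds):
    """List every metric that falls below its agreed minimum."""
    return [
        f"{metric} {report[metric]:.2f} is below threshold {minimum:.2f}"
        for metric, minimum in thresholds.items()
        if report[metric] < minimum
    ]


# Hypothetical reference data: the only country codes we accept.
COUNTRY_CODES = frozenset({"NL", "DE", "BE"})

customers = [
    {"country": "NL"},
    {"country": "DE"},
    {"country": None},       # missing value lowers completeness
    {"country": "Holland"},  # not in the reference list, lowers validity
]

report = quality_report(customers, "country", COUNTRY_CODES)
alerts = early_warnings(report, {"completeness": 0.9, "validity": 0.95})
for alert in alerts:
    print(alert)
```

With this sample data both metrics fall below their thresholds, so two warnings are produced. In practice the thresholds would come out of the agreement between producer and consumer, and the reference list would be managed centrally rather than hard-coded.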

I could continue listing areas to add, but the core message is that metadata, your organizational model (especially data ownership), data portfolio management and data contracts are the typical starting points. With this foundation, expand with data quality, reference data, or data lineage.

And of course you may wonder: how do I actually implement this?

For many of these aspects we have successfully implemented solutions with our clients. In follow-up blogs we will dive deeper into specific aspects of implementation, and the tools and technologies we’ve used to achieve that.

Conclusion

We started this exploration highlighting the critical void in structure and knowledge that emerged with the modern data stack. This framework isn’t about abandoning the progress we’ve made; it’s about enhancing it. And now, with the pervasive influence of AI, it’s not just about better dashboards or more efficient pipelines – it’s about ensuring the responsible and effective application of AI.

The time to act is now. Let’s reclaim control, build a more robust data future, and finally realize the promise of data-driven and AI-powered success!
