How Mature Is Your Data? The Answer Is Key to GenAI Success

According to Gartner’s Hype Cycle for Emerging Technologies, generative AI (GenAI) has reached the “Peak of Inflated Expectations.” What does this mean in simple terms? Essentially, people’s expectations of what the technology can do don’t currently align with reality. As such, companies can make costly mistakes if they let hype or FOMO lead their GenAI strategy and investment. That said, even as GenAI is generating a lot of hype, it can also generate tremendous value if businesses do it right.

Doing it right starts with data. The models on which GenAI solutions are built are only as good as the data that feeds them. Businesses must get their data landscape in order if they want to drive real business value with GenAI.

When starting out on any GenAI initiative, in addition to defining the desired business outcomes or product features, I’d encourage business leaders to determine where their organizations fall on the path of data maturity.

This is an exercise we conduct with every Caylent customer to ensure the foundational elements are in place for GenAI initiatives to succeed. Most organizations show a mix of maturity at a more granular level: individual teams or data sets may be ahead of, or behind, the phase that best describes the organization as a whole.

Phase One: Transactional

The first phase of data maturity has data being used almost exclusively at a transactional level within an organization. As the name suggests, business events and transactions are captured, but ad hoc solutions such as spreadsheets are often used as databases. There tends to be a lack of data governance over the information stored in many disparate sources and living in silos, creating risk and making it difficult for data to yield any real business insight.

To progress beyond this phase, organizations can take several actions, starting with surveying their data consumers, cataloging data assets, and getting some initial quality metrics in place. Great data likely already exists, albeit in a silo or two, and this data can be used to start GenAI pilots. Prioritize the data sources that will be most effective together for the next phases of the journey.
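As a rough illustration of what "initial quality metrics" might look like for one cataloged data asset, the sketch below computes per-field completeness and a duplicate-key rate. The function name `quality_metrics`, the `id` and `email` fields, and the choice of these two metrics are assumptions made for the example, not a prescribed method.

```python
from collections import Counter


def quality_metrics(rows, key_field):
    """Return per-field completeness and the duplicate rate on key_field.

    rows is a list of dicts, one per record (a hypothetical export of a
    spreadsheet or table). Empty strings and None count as missing.
    """
    total = len(rows)
    if total == 0:
        return {"row_count": 0, "completeness": {}, "duplicate_key_rate": 0.0}

    # Collect every field name seen in any row.
    fields = sorted({field for row in rows for field in row})

    # Completeness: share of rows with a non-missing value for each field.
    completeness = {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
        for field in fields
    }

    # Duplicate rate: share of rows that repeat an already-seen key.
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicates = sum(count - 1 for count in key_counts.values() if count > 1)

    return {
        "row_count": total,
        "completeness": completeness,
        "duplicate_key_rate": duplicates / total,
    }


# Illustrative sample data only.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": None},
]
metrics = quality_metrics(sample, key_field="id")
```

Even simple numbers like these can help rank which siloed sources are ready for a pilot and which need cleanup first.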

Next, organizations should define their cloud data strategy, design a governed data platform, and begin building the foundational components of their data structure. For the cataloged business processes that are not storing data in a scalable fashion, prioritize enhancements to these data producers based on expected business value, and, if necessary, begin upskilling technical teams with cloud data and AI knowledge for the preferred data platform.

Phase Two: Insightful

In this stage of data maturity, you’ve created an initial foundation, and insights from historical data are used to inform organizational decision making. Here, we begin to see basic forms of data quality and governance. Information may be integrated into a single platform and enriched with third-party data sources. However, data is not available in real time, and any sort of applied machine learning is arduous and cost intensive.

Organizations in this phase should define standards for data producers, such as quality, recency, and availability, and begin to hold producers accountable so that data consumers can subscribe with confidence. They should consider creating a data management function that can own data governance and central engineering responsibilities for core patterns in the data ecosystem. They should also build initial metrics and notifications so they can begin to identify data issues in near real time.
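One way to make producer standards for quality, recency, and availability enforceable is to express them as thresholds and check each producer's reported numbers against them. The sketch below is a minimal example under that assumption; the class names, field names, and threshold values are hypothetical, and a real platform would feed the result into whatever notification mechanism it uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProducerStandard:
    """Hypothetical thresholds a data producer is expected to meet."""
    min_completeness: float   # quality: share of non-missing values
    max_staleness: timedelta  # recency: maximum age of the latest update
    min_availability: float   # availability: share of successful reads


@dataclass(frozen=True)
class ProducerReport:
    """Hypothetical measurements collected for one data producer."""
    name: str
    completeness: float
    last_updated: datetime
    availability: float


def violations(report, standard, now):
    """Return the names of the standards this producer currently misses."""
    problems = []
    if report.completeness < standard.min_completeness:
        problems.append("quality")
    if now - report.last_updated > standard.max_staleness:
        problems.append("recency")
    if report.availability < standard.min_availability:
        problems.append("availability")
    return problems


# Illustrative values only.
standard = ProducerStandard(
    min_completeness=0.95,
    max_staleness=timedelta(hours=6),
    min_availability=0.995,
)
report = ProducerReport(
    name="orders_feed",
    completeness=0.99,
    last_updated=datetime(2024, 1, 1, tzinfo=timezone.utc),
    availability=0.98,
)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
missed = violations(report, standard, now)
```

Publishing the thresholds alongside the check gives consumers something concrete to subscribe against and gives producers a clear definition of accountability.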

From a people perspective, continue to upskill data analysts and data scientists on cloud data and AI concepts. Organizations should also invest in operational accelerators like MLOps and LLMOps to increase development velocity and remove undifferentiated work so that data analysts and data scientists can focus on their specialization. When organizations do take their first few GenAI pilots into production, they should be sure to measure their alignment and outcomes to iteratively improve based on real-world experience.

Another best practice is publishing a data catalog that makes it easy for business users to identify enterprise data. Organizations should also direct their engineering efforts toward reducing the time it takes to generate insights through event-driven and streaming data patterns.

On an ongoing basis, project leaders should revisit possible uses of analytical AI and GenAI as new ideas emerge and new data lands on the platform. It’s also important that technical team members are enabled with tools that allow them to run experiments on data and, in turn, derive new insights.
