Reimagining Data Management for the Real-Time, AI-Powered Future

Apr 11, 2024

By Joe McKendrick

Page 1 of 2 next >>

Data management needs AI and machine learning (ML), and, just as important, AI/ML needs data management. As of now, the two are connected, with the path to successful AI “intrinsically linked to modern data management practices,” said Dan Soceanu, senior product manager for AI and data management at SAS. Blazing this path requires “prioritizing data quality, accessibility, and governance.”

What are the modern data management practices needed to help organizations build and move forward with AI? They range from agile practices to governance, architecture, and a rethinking of the tenets of data management.

Enterprises today are at various stages of readiness to support the demands of AI. “While some have made significant strides in integrating AI into their operations, many still deal with challenges that hinder their full adoption and utilization of AI technologies,” said Fawaz Ghali, principal data science architect at Hazelcast.

The challenge is that companies “are not looking at AI and data the right way,” said David De Cremer, dean of the D’Amore-McKim School of Business at Northeastern University and author of the forthcoming book The AI-Savvy Leader: 9 Ways to Take Back Control and Make AI Work. “Everybody is talking about how to get AI to make better decisions with better data, but this is missing the point. This perspective means we look for ‘gaps’ in data, lock data down, or try to identify bias in datasets. This is an old-world take on an entirely new class of capability.”

Adoption of data-driven AI varies by industry. Companies “with a strong tradition of innovation, like tech and telecoms firms, are actively preparing for AI through agile and scalable data architectures,” said Ted Lango, SVP at Intradiem. “But others that typically use complex legacy systems, like insurance companies, are moving more slowly. A foundational shift toward data architecture that can handle the complex data types used by AI will be necessary.”

Ultimately, De Cremer said, the question that needs to be asked is: “What do we want AI to be? Then, we look at how data will be a determinant of this outcome.”

MAKING AI ENTERPRISE-READY

Enterprises today are not ready to support the demands of AI, concur the industry watchers who were interviewed for this article. “The corporate world is not prepared, in terms of knowledge and expertise, to understand AI in the right ways,” said De Cremer. “For example, how to use AI in an augmenting, rather than automating, way. Or they underestimate the amount of energy or electricity needed to create powerful applications.”

In addition, organizations seeking AI solutions risk overspending on ineffective technology. De Cremer urged focusing on the “ROI of data collection, processing, and provision together with the cost of feeding it into AI. Firms that have an end-to-end understanding will deploy resources more effectively rather than blowing money on expensive cloud computing services just because it’s cool.”

This calls for a “cultural shift toward embracing AI as a strategic imperative rather than just another technological tool,” said Ghali. “Enterprises need to invest in building AI capabilities across their workforce. This includes hiring data scientists, machine learning engineers, and AI specialists, as well as upskilling existing employees to understand AI concepts and technologies.”

Large language models (LLMs), for instance, “are a game changer for AI, so it makes sense to backstop them with a data architecture capable of improving accuracy and speeding up results,” said Ghali. “Many are failing to realize this nirvana.”

Another crucial piece of AI readiness is establishing a framework for data governance and ensuring data quality, said Sharad Varshney, founder and CEO of OvalEdge. “AI consumes vast volumes of data. The more you feed it, the better it becomes. To that end, organizations must be able to find, access, and share data securely from every corner of the organization. Data governance helps make data accessible, while a data catalog enables enterprises to centralize all of their metadata, streamlining the data discovery process.”

This governance needs to extend to ensuring data privacy and data security. “It’s critical for customers to be informed of how businesses manage their data, establishing trust that their information is secure,” said Rich Sonnenblick, chief data scientist at Planview. This also means increasing AI literacy among employees.

Jeff Foster, director of technology and innovation at Redgate Software, offered up these questions that need to be addressed at the beginning of the process to assure governance in data-driven AI: “Where is the data? Whose data is it? What purpose are you using it for? AI adds another complexity on top—who owns the output? How is it fact-checked? Is the use of AI ethically acceptable?”

The key to successful AI implementations is a mix of “specialized expertise and significant amounts of contextualized data,” said Andrew Sellers, head of technology strategy at Confluent. “Most enterprises will need to invest in both of these requirements. The data engineering challenges of AI remain since LLMs are of limited utility and lack domain-specific information. Enterprises will need to invest in real-time technologies that make siloed operational data accessible.”

MAKING DATABASES AI-READY

The process of preparing databases and data environments for the AI era requires heightened degrees of collaboration. “First, spend time with your tech teams to understand what data must be extracted to properly fuel the AI and machine learning tools in action,” recommended Brian Lanehart, president, CTO, chief risk officer, and co-founder at Momnt. “Then, working with database managers or providers, ensure the right algorithms are in place to power the data.”

This planning is critical, as data itself—not the enabling technology—is today’s competitive weapon. “In an environment where organizations often use similar AI tools, proprietary data is the only distinguishing competitive advantage,” said Ansh Kanwar, EVP of technology, product, and strategy at Reltio. “This underscores the necessity for data to be trustworthy, reliable, and accessible to all users in near real time, ensuring that enterprises can fully leverage their unique data resources to stay ahead in the market.”

Enterprises “collecting vast amounts of data without properly validating that data for completeness, correctness, and with thorough documentation and context are setting themselves up for failure,” Sonnenblick continued. “Organizations may need to develop sophisticated processing pipelines to aggregate disparate data sources, remove spurious or incomplete records, and ensure data is scrubbed of intellec information before using data to generate insights or train downstream models and neural networks.”
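To make the idea of such a pipeline concrete, here is a minimal Python sketch of two of the steps Sonnenblick describes: aggregating disparate sources and removing incomplete records. The field names, the sample records, and the rule that the first complete record for a customer wins are all assumptions made for illustration; a production pipeline would be considerably more involved.

```python
# Hypothetical schema: fields a record must carry to count as complete.
REQUIRED_FIELDS = ("customer_id", "email")


def is_complete(record):
    """Return True when every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def merge_sources(*sources):
    """Combine records from several sources, dropping incomplete ones.

    Assumption: the first complete record seen for a customer_id is kept.
    """
    merged = {}
    for source in sources:
        for record in source:
            if not is_complete(record):
                continue  # skip records that fail validation
            merged.setdefault(record["customer_id"], record)
    return list(merged.values())


# Illustrative sample data standing in for two separate systems.
crm = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},  # incomplete: empty email
]
billing = [
    {"customer_id": 1, "email": "a@example.com"},  # duplicate of a CRM record
    {"customer_id": 3, "email": "c@example.com"},
]

clean = merge_sources(crm, billing)
```

In this sketch, `clean` ends up holding one record each for customers 1 and 3; the record with the empty email is dropped and the duplicate is collapsed.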

This means rethinking the roles of databases and data environments as something more than simply a means to store and retrieve data. “AI applications such as large language models rely extensively on the idea of semantic pattern matching—finding similarity between different data points to make decisions,” said Venkat Margapuri, assistant professor of computer science at Villanova University. “Traditional databases don’t provide the ability to perform pattern matching.”
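One common way to measure the kind of similarity Margapuri describes is cosine similarity between numeric vectors. The short Python sketch below is an illustration only: the three-dimensional vectors are hand-written stand-ins for real embeddings, which typically have hundreds or thousands of dimensions, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings used purely for illustration.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def most_similar(query, embeddings):
    """Return (name, score) for the entry closest to the query entry."""
    query_vec = embeddings[query]
    candidates = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return max(candidates, key=lambda pair: pair[1])


best_match = most_similar("cat", EMBEDDINGS)
```

With these sample vectors, the closest match to "cat" is "kitten" rather than "invoice." A vector database performs this sort of nearest-neighbor lookup at scale, usually with specialized indexes rather than the exhaustive comparison shown here.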

There are many choices in the data market, taking organizations in many directions. “Not all databases are created equal, so some are more equipped to run AI-powered tools than others,” said Lanehart. “While many databases today can handle the typical clerical asks of users, more sophisticated algorithms are needed to ensure that the tool can support AI efforts.”

Vector databases and graph databases are well-positioned to handle AI requirements, Margapuri said. “Vectorization converts text or images into numbers in a high-dimensional space,” he explained. “For instance, an image of a cat might be represented using a predetermined number of datapoints where each datapoint represents a region or pixel on the image. On a different note, the data can also be represented as a graph containing nodes and edges. The nodes represent an entity such as an object or person; edges represent the relationship between the nodes.”
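The pixel-based representation in Margapuri's cat example can be sketched in a few lines of Python. The tiny 2x2 grayscale grid and the scaling to the 0-to-1 range are assumptions chosen for illustration; production systems generally rely on learned embeddings rather than raw pixel values.

```python
def flatten_image(pixels):
    """Turn a 2D grid of 0-255 grayscale values into a flat vector in [0, 1]."""
    return [value / 255 for row in pixels for value in row]


# Hypothetical 2x2 grayscale "image": each number is one pixel's brightness.
image = [
    [0, 255],
    [128, 64],
]

vector = flatten_image(image)  # one number per pixel, in row-major order
```

Each image of the same size yields a vector with the same, predetermined number of datapoints, which is what allows two images to be compared numerically.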

There are distinctions between use cases for vector and graph databases. Applications “that deal primarily with text processing and image processing might benefit from vector databases, whereas applications such as recommender systems that match different entities to observe patterns might benefit from graph databases,” Margapuri explained.
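The recommender use case can be illustrated with a small graph of nodes and edges built from plain Python data structures. Everything here is hypothetical: the customer and product names, the edge list, and the simple rule of suggesting products bought by customers who share a purchase. A graph database would store and traverse such relationships natively.

```python
from collections import defaultdict

# Hypothetical edges: each pair links a customer node to a product node.
EDGES = [
    ("alice", "laptop"),
    ("alice", "mouse"),
    ("bob", "laptop"),
    ("bob", "keyboard"),
    ("carol", "desk"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighboring nodes."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def recommend(graph, customer):
    """Suggest products bought by customers who share a purchase."""
    owned = graph[customer]
    suggestions = set()
    for product in owned:
        for other in graph[product]:
            if other != customer:
                # Products the other customer has that this one lacks.
                suggestions |= graph[other] - owned
    return sorted(suggestions)


graph = build_graph(EDGES)
alice_suggestions = recommend(graph, "alice")
```

In this sample, Alice and Bob both bought a laptop, so Bob's keyboard is suggested to Alice. Carol shares no purchases with anyone and receives no suggestions.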

Still, others believe that today’s NoSQL databases may be suitable for many AI use cases. “Today’s most robust NoSQL databases are ideally suited to process and store the many data types, from text to image to video, that AI models use to create contextually accurate outputs,” said Varshney.

The key is to instill greater specialization in how data is organized and queried, said Sellers. “Many database technologies have been optimized for AI needs with features like high throughput query resolution and vector-based data models. Many of the most prolific data persistence technologies, such as Postgres, MySQL, MongoDB, and Elastic, incorporate data structures and secondary indexing capabilities that can empower enterprises to take advantage of generative AI.”

Another question that will arise is whether to blame back-end databases or LLMs for issues or inconsistencies that occur. “The databases are all fine in the sense they suffer known issues anyway,” said De Cremer. “AI will be much less fussy at ingestion than humans, but then, we won’t be able to distinguish between algorithm- and data-based errors. For example: Does that floating hand appear in the GenAI picture because of bad data or a bad LLM algorithm?”

While AI and machine learning technologies “bring exciting opportunities, they cannot solve the data problems that continue to stymie many companies,” said Kanwar. “As a result, digital transformation initiatives are stuck in reverse, costing companies valuable time and resources.”

The key is having a “normalized database built upon a modern technology stack, scrubbed and cataloged into well-documented data products,” said Sonnenblick. This “ensures an organization is pulling the highest-quality data to efficiently inform AI algorithms.”

Another risk to harnessing data assets to move forward with AI is data fragmentation—a result of rapidly expanding data environments. “As data volumes grow exponentially, companies are facing greater fragmentation and a dramatic increase in silos, hindering their ability to become truly data-driven,” Kanwar cautioned.

Data-powered AI “is often hindered by the prevalence of unstructured, inaccessible data across the enterprise,” Soceanu agreed. “The reality for most organizations today is an overwhelming accumulation of data across various sources—on-premises, cloud, data lakes—that remains largely untapped due to disorganization and fragmentation.”

Such fragmentation “complicates data access and analysis, posing significant challenges in achieving a unified view of customer information, operational insights, and strategic intelligence,” said Kanwar. “Without trusted, accurate data, AI/ML models are useless, or worse, as bad, incomplete, or biased data fed into AI poses an enormous risk for companies.”
