Change is happening, and it’s happening so fast that even the most seasoned data managers and professionals can’t keep up. New technologies, along with new approaches to data management, are restructuring—and reimagining—data-related jobs.
Industry leaders are expressing astonishment at the speed at which the data world is changing. “I’m witnessing something that’s going to make everything different,” said Milan Parikh, lead enterprise architect at Cytel. “That sensation of jumping around between five different tools just so that you can even get one basic report—an ETL tool over here, BI dashboard over there, then some AI somewhere in the middle? That’s going to be a thing of the past. New platforms are consolidating everything into one platform. BI, ETL, AI—the entire stack.”
As an example, Parikh offered this illustration: “We recently completed a project where we replaced our old configuration with one of these consolidated platforms. It used to take my team 3 full days to extract data, clean it up, duplicate it all over the place, and produce a weekly report. Now, that same report takes 4 hours. And half of that is simply double-checking because we can’t believe it was done that fast.”
Selling new developments to data teams is perhaps the greatest challenge, Parikh acknowledged. “I had to get my BI folks to learn data engineering. Got my data engineers to learn AI deployment. Go small. Choose something that won’t kill you if it blows up. Get your folks prepared for change. And for the love of all that is holy, consider security from Day One.”
Among the emerging technologies and methodologies reshaping the data landscape are the ones explored below.
RETRIEVAL-AUGMENTED GENERATION
Retrieval-augmented generation, or RAG, has emerged just in the last 2 years, addressing what has been the major Achilles’ heel of AI: accuracy and trust. When it comes to AI, “The starting point must remain ‘distrust and verify,’” said Dan Gaylin, president and CEO of NORC at the University of Chicago, and author of Fact Forward: The Perils of Bad Information and the Promise of a Data-Savvy Society. “Taking shortcuts with data sources and analytics can lead to bad outcomes.”
RAG can help improve the accuracy and reliability of generative AI responses “by having them retrieve information from trusted data sources outside of their training data,” Gaylin explained. “By providing AI systems with access to trusted, specialized knowledge specific to their domains of inquiry, RAG makes an AI less likely to provide bad information.”
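The pattern Gaylin describes reduces to a simple pipeline: retrieve the most relevant passages from a curated corpus, then ground the model’s prompt in them. The Python sketch below is a minimal, hypothetical illustration of that retrieval step; it uses a toy word-overlap ranker over an in-memory corpus, and every name in it (TRUSTED_CORPUS, retrieve, build_prompt) is illustrative. A production system would substitute embeddings, a vector store, and a real model API.

```python
# Minimal RAG sketch: pull passages from a curated, trusted corpus and
# prepend them to the prompt so answers are grounded in approved sources
# rather than the model's training data alone. All names are illustrative.

TRUSTED_CORPUS = [
    "Q3 revenue was $4.2M, per the audited finance report.",
    "The customer data retention policy is 24 months.",
    "Support SLA: critical tickets are answered within 4 hours.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive word overlap with the question.
    A real system would use embeddings and a vector index instead."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the trusted context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    question = "What is our data retention policy?"
    passages = retrieve(question, TRUSTED_CORPUS)
    print(build_prompt(question, passages))
    # The assembled prompt would then go to whatever LLM the org uses.
```

The key design point is that the model is steered toward the retrieved context, which is exactly how RAG makes an answer traceable back to an approved source.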
Employing a RAG architecture “could allow organizations to leverage their proprietary information while minimizing the risk of erroneous and fake content,” Gaylin said. Still, RAG comes with a number of challenges, starting with the need for reliable source data. Gaylin recommended that data managers “develop criteria for identifying key sources that could receive a seal of approval as being a higher-quality information source.”
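One lightweight way to operationalize such a “seal of approval,” sketched here purely as a hypothetical, is to attach provenance metadata to each document and let the retriever consider only vetted sources:

```python
# Hypothetical provenance filter: only passages from approved sources
# are eligible for retrieval. Field names are illustrative.
documents = [
    {"text": "Q3 revenue was $4.2M.", "source": "finance-audit", "approved": True},
    {"text": "Revenue rumor from a forum post.", "source": "web-scrape", "approved": False},
]

eligible = [d["text"] for d in documents if d["approved"]]
# `eligible` replaces the raw corpus passed to the retriever above.
```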
As Gaylin also noted, “RAG-enabled systems use more computing resources and are limited by the availability and quality of reliable source data. They also can misinterpret the relationship between the source data and the rest of the vast corpus of information on which the systems were trained.”
Another limitation of RAG “is the human resources involved in developing, activating, and maintaining the RAG processes,” he added. While RAG is a promising method for ensuring more trustworthy data in enterprises, it “isn’t a silver bullet,” Gaylin opined. “Rather, it’s a promising step toward AI systems that can better support the data integrity and transparency our society desperately needs.”