The past two years have been defining ones for enterprises seeking to become data-driven. COVID-19 wrought changes, of course, but even before the pandemic, companies were already on a path to better leverage the data streaming in from all corners of their organizations. With this heightened focus, new roles have been emerging for the caretakers of data, including database administrators, data engineers, data analysts, data scientists, and developers.
The question now is: What are the greatest opportunities and trends that will be impacting data environments in the year ahead? To get a sense of where things are heading in 2022, we canvassed industry experts.
EQUITY AS CODE
AI and machine learning are everywhere now, but there is work to be done to overcome the obstacles to widespread adoption. At the core of AI resistance, from both business leaders and society at large, is the issue of AI-induced bias. The organizations succeeding with AI in the year ahead will be those that find ways to make AI fairer and more accurate. “The problem is that algorithms can absorb and perpetuate racial, gender, ethnic, and other social inequalities,” said Chris Bergh, CEO of DataKitchen. Data leaders and professionals can start addressing this danger “by viewing AI systems from a manufacturing process perspective, treating AI bias as a quality problem,” Bergh advised. The fast-moving nature of AI and machine learning-bound data means continuous changes are required and, as a result, “a deployed model may drift out of the target range of accuracy.”
The key to keeping models aligned and as free from prejudice as possible is to continuously monitor data and algorithm behavior, along with other quality issues, while models are in operation, Bergh said, citing an emerging approach called “equity as code.” Tests that check for equity in AI and data are increasingly being built into automated applications that continuously test, deploy, and monitor models, he explained. As part of DevOps, equity as code “provides the approach and methodological tools to impose equity controls on AI algorithms.”
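To make the idea of an automated equity test concrete, here is a minimal Python sketch of the kind of check that could run alongside other quality tests in a deployment pipeline. Bergh does not name a specific metric; this example assumes demographic parity (comparing the rate of positive predictions across groups), and the function names, sample data, and 0.1 tolerance are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += 1 if pred == 1 else 0
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


def equity_check(predictions, groups, max_gap=0.1):
    """Pass only if the gap stays within the tolerance (0.1 is an assumed value)."""
    return demographic_parity_gap(predictions, groups) <= max_gap


# Illustrative data only: model outputs and a made-up group label per record.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, grps)
passed = equity_check(preds, grps)
```

In a pipeline, a failed check like this one would block deployment or raise an alert, the same way a failed unit test would. Which fairness metric and tolerance are appropriate depends on the application and is a decision for the team, not something this sketch settles.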
DATA VALUE INDEXES
While various modes of data storage have been around for decades, it’s getting more difficult to adequately store and manage the explosion of data that has taken place in recent years. The coming year will see the adoption of new approaches to data storage that tie it more directly to the data’s business value. There is also a growing consensus that not all data needs to be stored, and that some of it should not be. “When considering the value one is getting from big data assets, one should consider a number of factors including the cost of storing, managing, and securing the data; the quality of the data assets themselves; access patterns for that data, as well as the latency requirements and retention policies,” said Emma McGrattan, senior vice president of engineering at Actian. “Once a value can be placed on the data, then a determination can be made as to where to store it, if at all.”
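The factors McGrattan lists can be combined into a simple score that drives a storage decision. The Python sketch below is one possible way to do that, not a method described by Actian: the scoring formula, the thresholds, the tier names, and the sample datasets are all assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    name: str
    monthly_cost_usd: float  # cost of storing, managing, and securing the data
    quality: float           # 0.0 (poor) to 1.0 (excellent)
    reads_per_month: int     # access pattern
    max_latency_ms: int      # latency requirement of the data's consumers
    must_retain: bool        # True if a retention policy requires keeping it


def value_index(p: DatasetProfile) -> float:
    """Hypothetical score: quality-weighted reads per dollar of monthly cost."""
    return p.quality * p.reads_per_month / max(p.monthly_cost_usd, 1.0)


def placement(p: DatasetProfile, keep_threshold=1.0, fast_latency_ms=100) -> str:
    """Map a profile to a storage decision using assumed thresholds."""
    score = value_index(p)
    if score < keep_threshold and not p.must_retain:
        return "do not store"
    if score >= keep_threshold and p.max_latency_ms <= fast_latency_ms:
        return "fast storage"
    return "archive storage"


# Made-up profiles covering the three outcomes.
clickstream = DatasetProfile("raw clickstream", 500.0, 0.4, 200, 5000, False)
orders = DatasetProfile("orders", 200.0, 0.9, 10000, 50, False)
audit_logs = DatasetProfile("audit logs", 100.0, 0.8, 5, 60000, True)

decisions = {p.name: placement(p) for p in (clickstream, orders, audit_logs)}
```

Here the low-value clickstream data is not stored, the heavily used orders data goes to fast storage, and the rarely read audit logs are archived only because a retention policy requires it. A real index would need weights and thresholds calibrated to the organization's own costs and priorities.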