Game-Changing Technologies in the Data Environment of 2020


While these tools are widely available—and often can be accessed without IT—“the knowledge to use the tools appropriately is lagging,” Fritchey said. “Sure, you’re going to need to know the mechanics of the tool. However, you have to have an understanding of the operations of the organization and the data within that organization in order to arrive at the right kinds of analysis that will lead to competitive decision making. Developing and building this knowledge is the hard part. You have to appropriately educate the people using the tools in order to get it right.”


Logan Wilt, AI data scientist with DXC Technology, has a different take, promoting what he called “data watersheds”—which are “separate but interconnected forms of data storage. The term is a play on data lakes, of course, but it also draws on the imagery of watersheds as containing streams, rivers, lakes, ponds, deltas—all bodies of water that serve different ecological needs.” This varied landscape is still an emerging practice, Wilt stated, noting that “while software architectures have moved toward microservices and atomicity, data architectures are still often monolithic: There is a data lake, a data warehouse, or a database that is set up to be the one source of truth on data.”

Data watersheds, Wilt continued, “embrace data that exists in different forms on different systems. The key is that, although data exists in different forms and systems, it is not managed in isolation. Digital threads, metadata, linking keys, and so forth are deliberately and holistically managed when data is consistent and flexible to the problems it can solve.”
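Wilt's description suggests a pattern that can be sketched in code: a lightweight metadata registry that records where each dataset lives and which linking keys tie it to the others, so that physically separate stores are still managed as one landscape. The following Python sketch is purely illustrative; the names (`WatershedRegistry`, `DatasetRef`, the example systems and keys) are hypothetical, not drawn from DXC.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRef:
    """A pointer to one 'body of water' in the watershed: any system, any format."""
    name: str
    system: str          # e.g., "postgres", "s3", "kafka"
    location: str        # table name or URI
    linking_keys: set[str] = field(default_factory=set)

class WatershedRegistry:
    """Holistically managed metadata: datasets are stored apart but linked here."""
    def __init__(self):
        self._datasets: dict[str, DatasetRef] = {}

    def register(self, ref: DatasetRef) -> None:
        self._datasets[ref.name] = ref

    def joinable(self, a: str, b: str) -> set[str]:
        """Linking keys shared by two datasets: the 'digital thread' between them."""
        return self._datasets[a].linking_keys & self._datasets[b].linking_keys

registry = WatershedRegistry()
registry.register(DatasetRef("orders", "postgres", "sales.orders",
                             {"customer_id", "order_id"}))
registry.register(DatasetRef("clickstream", "s3", "s3://logs/clicks/",
                             {"customer_id", "session_id"}))
print(registry.joinable("orders", "clickstream"))  # prints {'customer_id'}
```

The point of the sketch is the deliberate, central management of linking metadata: each store keeps its native form, but the registry makes the connections between them explicit and queryable.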

The dilemma is simple: “There’s too much data, and too little time,” related Peter Bailis, CEO of Sisu and a professor at Stanford. “Our ability to collect rich, structured data has outstripped our ability to find useful answers in it. In many ways, the technology enabling this golden age of data—the cloud data warehouse—is also the root cause of this dilemma. Enterprises are capturing incredibly wide, granular datasets, describing every transaction and customer interaction in minute detail. An everyday retail transaction generates well over 60 features. A media streaming session is tracked with hundreds of variables.”

Unfortunately, Bailis added, “our BI tools were built for a simpler time. Dashboards expect nice, clean datasets with fewer than a dozen columns. Analysts themselves only have time to manually check a handful of hypotheses before they’re off to the next question. As a result, our data teams are reactive and use less and less of the data we capture every day.” 

The key is to be able to “automate the analysis of these vast tables and rapidly test millions of hypotheses to find actionable, relevant business opportunities,” Bailis said. “Instead of requiring an analyst to manually test features or carefully construct complex queries, these platforms start with a KPI and diagnose which populations in the data matter—and then proactively recommend to the analyst where to prioritize their limited attention.” 
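As a sketch of what this style of KPI-driven, automated hypothesis testing can look like (illustrative only, not Sisu's implementation; `scan_segments` and the toy data are invented), consider a brute-force scan that scores every low-cardinality (column, value) segment by how far its mean KPI deviates from the overall mean, weighted by segment size:

```python
import pandas as pd

def scan_segments(df: pd.DataFrame, kpi: str, max_card: int = 20) -> pd.DataFrame:
    """Test one hypothesis per (column, value) segment: does the segment's
    mean KPI differ from the overall mean? Rank by impact = |lift| * size."""
    overall = df[kpi].mean()
    rows = []
    for col in df.columns:
        if col == kpi or df[col].nunique() > max_card:
            continue  # skip the KPI itself and high-cardinality columns
        for value, group in df.groupby(col):
            lift = group[kpi].mean() - overall
            rows.append({"segment": f"{col}={value}",
                         "size": len(group),
                         "lift": lift,
                         "impact": abs(lift) * len(group)})
    return pd.DataFrame(rows).sort_values("impact", ascending=False)

# Toy data: conversion rate as the KPI to diagnose
df = pd.DataFrame({
    "region":    ["east", "east", "west", "west", "west", "east"],
    "device":    ["mobile", "web", "mobile", "web", "mobile", "mobile"],
    "converted": [1, 0, 0, 0, 0, 1],
})
top = scan_segments(df, "converted")
print(top.head(3).to_string(index=False))
```

Real platforms prune this combinatorial space far more aggressively, but the shape of the loop (start from a KPI, enumerate candidate populations, rank by impact, surface the top few) matches the workflow Bailis describes.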
