It’s not an accident that data is the bedrock of the Data, Information, Knowledge, Wisdom (DIKW) pyramid made famous by Russell Ackoff in 1989. The pyramid’s shape expresses the common conviction that data are the irreducible foundation on which all knowledge is built.
Except that’s not how data and knowledge work. It’s not even that there’s a more fundamental layer beneath data. Rather, the very shape of the pyramid reinforces a paradigm that has outlived its purpose. A pyramid is too stable, too linear, and way too one-way in its direction. It’s an Industrial Age model in which raw materials—data—are refined and transformed into a usable product.
In an age in which data is increasing exponentially, and our new technology, especially machine learning, can’t get enough of it, we need to flip the pyramid. Then we need to thoroughly rethink what data is, the context we want to use it in, and how to get the most value from it.
In the old DIKW pyramid, the line between data and information is actually a warehouse—another remnant of the Industrial Age. Data warehouses have served businesses’ predictable needs since the 1950s when the modern idea of data became prevalent in business.
Under this old definition, data were atomic elements of knowledge, siloed according to the department and application that developed them, and tagged with metadata that reflected how they were expected to be useful. Whatever value was extracted from them after delivery was likely to be lost forever.
This was sufficient when businesses moved slowly and steadily in a stable and predictable ecosystem—a pace determined in part by the slowness of the data processes.
Now, of course, data has been transformed in just about every dimension. Its quantity has grown exponentially. Thanks to the Internet, its connectedness has grown even faster than that. And machine learning systems have made the value of data incalculable, including data that once would have seemed too inconsequential to even record.
Now businesses want data to help them get deep, dynamic, and even real-time insights into their customers, markets, and supply chains, often via machine learning models. They want to make better predictions both of day-to-day realities and longer-term trends. They want data to help them create better, unique experiences for their customers. And they need fast access to data to support the rapid innovation that will let them thrive in the new hyper-competitive environment. Businesses can no longer afford to wait for the old data mills’ wheels to turn.
Fortunately, they now can move fast. But that requires not just the installation of new software and procedures. It requires a new basic understanding of data.
Data’s New Paradigm
Insight, prediction, and support for rapid innovation are not new objectives for businesses, but what they demand from data was unthinkable even a few years ago. To meet today’s needs:
- The time to construct and deliver datasets has to go from weeks or months to minutes or seconds.
- Data has to be fluidly interoperable across all of the old silos, including application silos.
- It has to be capable of being retrieved in ways literally no one anticipated.
- It has to support the most stringent security and compliance demands in a global and often inconsistent regulatory environment.
So what does the data that supports these requirements look like?
Let’s say a business wants to know how to stock its stores’ shelves for the big holiday sale. Traditionally, the manager gets a report on prior holiday sales, broken down by product category and store location. But so many contextual factors affect sale days. The manager wonders how past sales may have correlated with local ad buys. With changes in median income by region? With changes in leisure time? What the heck, with fuel prices, the weather, how the local sports team did, and all the rest of the small causes that together can have large effects?
The traditional data report discouraged active engagement in such questions. It was a glimpse through the slats of a fence. But now, the manager doesn’t want a report about data so much as a conversation with the data, including being able to ask questions that the data warehouse was not set up to answer. And, of course, the manager wants all this as close to instantly as possible.
Or perhaps this manager has decided that machine learning might provide a way to find hidden statistical relationships that can give finer-grained predictions about which products might be in surprisingly high demand. The machine learning model will need lots of data to analyze, data that cuts across many domains, including application data. Training a model is highly iterative and often involves many requests for data. Any impediment directly slows the pace of innovation, and can in turn seriously diminish a business’s competitiveness.
But more than speed is at issue here. For example, data needs to be able to interoperate with data from any other silo, from applications, from the Internet, from anywhere. It must not just be put into a common format but be transformed conditionally to meet an incoming request. The data warehouse begins to look less like a computer’s RAM and more like a massive computer.
In fact, that’s not just an analogy. At this new architecture’s core is an API that can programmatically retrieve and transform the data, draw inferences from it, and present it in whatever form will be of most use to the person requesting it, whether that’s a PDF, a JSON file designed for the machine learning system, a public-facing website, or an interactive tool for further inquiry and exploration.
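To make the idea of an API that retrieves, reshapes, and presents data on request a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the SALES records, the query_data function, and the two renderers are hypothetical stand-ins, not part of any particular product.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for data pulled from several silos.
SALES = [
    {"store": "north", "category": "toys", "units": 120},
    {"store": "north", "category": "games", "units": 45},
    {"store": "south", "category": "toys", "units": 80},
]


def to_json(rows, fields):
    """Render rows as a JSON array, e.g. for a machine learning pipeline."""
    return json.dumps(rows, sort_keys=True)


def to_csv(rows, fields):
    """Render rows as CSV text with a header row, e.g. for a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The requester picks the output form; more renderers could be registered here.
RENDERERS = {"json": to_json, "csv": to_csv}


def query_data(filters, fields, fmt):
    """Retrieve matching records, keep only the requested fields, and render."""
    if fmt not in RENDERERS:
        raise ValueError(f"unsupported format: {fmt}")
    # Retrieve: keep records that match every filter the caller supplied.
    matched = [
        row for row in SALES
        if all(row.get(key) == value for key, value in filters.items())
    ]
    # Transform: shape each record to the fields this request asked for.
    shaped = [{name: row[name] for name in fields} for row in matched]
    # Present: hand the shaped rows to the renderer chosen by the requester.
    return RENDERERS[fmt](shaped, fields)
```

A call such as query_data({"category": "toys"}, ["store", "units"], "csv") returns the toy sales as CSV text, while the same request with "json" returns the same rows for a program to consume. The point of the sketch is only that the shape of the answer is decided at request time rather than when the data was stored.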
This computational power can also ensure that data that’s delivered meets all of the relevant requirements for privacy and security. Because the regulatory environment is so fragmented, complex, and ever-changing, automating compliance through a programmable interface is crucial.
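One way to picture automated compliance is as a policy table that the API consults before any record leaves the system. The sketch below assumes two invented regional policies and hypothetical field names; it does not encode any real regulation, and the truncated, unsalted hash is only a placeholder for whatever pseudonymization method a real deployment would require.

```python
import hashlib

# Hypothetical per-region rules; real rules would come from legal review.
POLICIES = {
    "eu": {"drop": {"email"}, "pseudonymize": {"customer_id"}},
    "us": {"drop": set(), "pseudonymize": {"email"}},
}


def pseudonymize(value):
    """Replace a value with a short, stable token (illustrative only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def apply_policy(record, region):
    """Return a copy of the record that satisfies the region's policy."""
    policy = POLICIES.get(region)
    if policy is None:
        # Fail closed: with no known policy, release nothing.
        raise PermissionError(f"no policy defined for region: {region}")
    released = {}
    for field, value in record.items():
        if field in policy["drop"]:
            continue  # this field may not leave the system at all
        if field in policy["pseudonymize"]:
            released[field] = pseudonymize(value)
        else:
            released[field] = value
    return released
```

Because the rules live in data rather than in each report, a change in one region's requirements becomes a change to one table entry, which is the practical benefit of putting compliance behind a programmable interface.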
Because this is an API that interfaces between the data and its uses, it can even make the data smarter by enabling feedback loops that let the entire data collection learn from how it has been used. It enables data to speak to data from every other source, and learns from what those data say to one another.
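A feedback loop of this kind can start very simply: the API notes which fields are requested together and uses that history to suggest related data on later requests. The following is a rough sketch under that assumption; the UsageLog class and the field names are hypothetical, and a real system would likely learn from far richer signals.

```python
from collections import Counter
from itertools import combinations


class UsageLog:
    """Remembers which fields are requested together (illustrative only)."""

    def __init__(self):
        self.pair_counts = Counter()

    def record(self, fields):
        """Log one request by counting every pair of fields it asked for."""
        unique = sorted(set(fields))
        self.pair_counts.update(combinations(unique, 2))

    def suggest(self, field, top=3):
        """Return the fields most often requested alongside the given one."""
        related = Counter()
        for (first, second), count in self.pair_counts.items():
            if first == field:
                related[second] += count
            elif second == field:
                related[first] += count
        return [name for name, _ in related.most_common(top)]
```

If requests for sales data are repeatedly paired with weather data, suggest("sales") will begin to surface "weather" first, so each use of the collection leaves it slightly better prepared for the next one.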
Perhaps most important, this API-centric approach lets us make more of all of our data. Because it is responsive and works in real time, we are not tied to yesterday’s predictions of what would matter to the company today.
All of that is lost in the picture of our information environment as a pyramid. In that old paradigm, data is treated as a passive resource, rather than as an active business driver and agent of change. An organization only gets full value from its data when it embraces data that is connected and enhanced, amplified and focused, applied across projects and industries, and getting smarter with every use.