Big data has unceremoniously ended the era of the “all-purpose database.” The days of sticking uniform data into a single database and running all your business applications off it are gone. Business data today comes in a variety of formats, from countless sources, in huge volumes and at fantastic speeds. Some data is incredibly valuable the instant it arrives; other data is valuable only when combined with large amounts of additional data and analyzed over time.
Data today has a complex lifecycle. Many organizations have learned this the hard way with big data projects built around a single database (undoubtedly positioned by the vendor as providing “unprecedented scalability, performance and flexibility for all your big data needs”) that failed to deliver any meaningful business value. The big lesson learned from these big data trial-and-error exercises is simple: different stages of the data lifecycle require a database purpose-built to capture the value of the data present at that particular stage.
Today, the database universe is cluttered, noisy and exciting, with seemingly countless NoSQL databases, 10 to 15 NewSQLs and multiple layers of Hadoop systems now in existence. All players in the space are eager to talk about their technology, but few seem to understand where they fit in the big data continuum or when someone would want to use their system. This confusion leaves enterprises little choice but to continue the trial-and-error process.
Big Data Grows Up
The analyst community has done its best to create an organizational ontology of database systems, but has tended to categorize them according to their technological capabilities instead of by real-world use cases. Big data is a complex concept, and this approach is becoming less and less useful as big data matures from its hype-laden stage to its more sober “I’ve got to show business value with this” stage. To get business value from big data, databases need to be categorized according to their appropriateness for specific business purposes, not their raw technical capabilities. This requires a brand-new way of looking at big data itself. Specifically, we need to recognize the “lifecycle” of big data, and understand that as data ages, its value does not diminish, but the nature of that value does begin to change, and so too must the methods used to unlock it.
Individual Data Items vs. Aggregated Datasets
Over time, the focus of data value shifts from the individual item to the aggregate. When you purchase an iPad, for example, the individual transaction (data item) is highly valuable at that particular point in time – to you, to Apple and to your credit card company. Over the course of a week, Apple may run analytics on a dataset that includes that item and other similar transactions for the purposes of inventory control or regional trending. As time goes on, the value of that individual data item will continue to decline. In six months, there will be almost no value in knowing that you bought your iPad in a particular Apple store on a particular date. However, there will be tremendous value in including that purchase in a broader dataset that gives insight into consumer behavior, which the company can use to inform pricing, branding and product design.
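The shift from item-level to aggregate value can be sketched in a few lines of code. This is a minimal illustration using hypothetical transaction records (the fields, stores and prices are invented for the example, not real Apple data): the same rows that support an individual lookup at purchase time later feed aggregate queries like units sold per store or total revenue.

```python
from collections import Counter
from datetime import date

# Hypothetical point-of-sale records; all values are illustrative only.
transactions = [
    {"item": "iPad", "store": "SF", "date": date(2013, 1, 5), "price": 499},
    {"item": "iPad", "store": "NY", "date": date(2013, 1, 6), "price": 499},
    {"item": "iPad", "store": "SF", "date": date(2013, 1, 7), "price": 499},
]

# Item-level view: valuable at the moment of sale
# (receipt lookup, fraud check, warranty registration).
latest = max(transactions, key=lambda t: t["date"])

# Aggregate view: valuable over time
# (regional trending, inventory control, pricing decisions).
units_by_store = Counter(t["store"] for t in transactions)
total_revenue = sum(t["price"] for t in transactions)
```

The individual record matters most when it is fresh; the aggregates (`units_by_store`, `total_revenue`) only become meaningful as more records accumulate – which is exactly why the two stages tend to call for different storage and query systems.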
The Big Data Lifecycle: From Old to New
To date, enterprises have used analytics to gain insight into historical trends. Business intelligence, reporting, and historical trend analysis are all examples of technologies used to extract value from historical data. Additionally, there has been an increased use of “exploratory analytics” – applications that explore data for increasingly deeper insights. This category of analytics enables users not just to observe trends, but to discover them. This is the realm of the data scientist, where systems based on Hadoop, deep statistical analysis, and other exploratory analytics tools are used to discover these insights.