It has never been easier for businesses to collect significant amounts of data about their customers, their business, and their operations. And it will be easier still tomorrow, next month, and next year. By some accounts, global data storage is expected to reach 162 zettabytes within the next five years. That’s enough to store 2 billion years of continuous audio.
Year over year, more and more of this information is captured in centralized, structured enterprise warehouses. In this rapid reconsolidation of the storage market, IDC forecasts that enterprises will soon be the primary owners of this newly created data, capturing over 50% of the data stored by 2025. That’s roughly double their 2018 share, when entertainment and endpoint devices dominated storage.
How did we get here? Five years ago, CIOs and CDOs focused their teams on managing the operational complexity created by an explosion of relational data stores, Hadoop data lakes, time series databases, and more. And while this gave engineering and IT teams more agency over how they generated and captured data, it only served to increase the amount of effort it took for a business to get value from this information.
This convoluted landscape finally incited a backlash against fragmentation, and today a simpler, more scalable cloud data warehouse model is emerging. As organizations seek to simplify their data stores and pipelines—and to make the information more accessible to the rest of the business—they are accelerating their adoption of cloud data warehouses such as Snowflake, Redshift, and BigQuery.
The upshot of this convergence is that it’s now theoretically possible for a business to track changes to almost every critical metric at an increasingly fine granularity. Unfortunately, our newfound capability to capture and store hundreds of variables for every business interaction has produced far too many factors for anyone to reasonably check.
For a concrete example, let’s look at a typical online retailer with an annual total revenue of $50M. If you assume an average order value of $80, that’s approximately 625,000 transactions a year, or about 12,000 every week. Not too bad, but now consider all the factors that describe that transaction:
- 10-12 features for basic transaction variables (date, time, day part, success, total $, tax, discounts, etc.)
- 20 more describing customer traits (address, payment type, history, loyalty, gender, etc.)
- 10 for order details (SKUs, counts, categories, etc.)
- 10-20 custom variables describing marketing and acquisition (promos, ads, coupons, origination, customer segment, landing pages, time on site, page views, and more)
Without getting creative, that’s almost 60 features per transaction, each potentially taking hundreds of unique values. Put this all together, and you’re quickly looking at a search space of close to a billion possible combinations of data to explain a single KPI like AOV or customer lifetime value.
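To see how quickly the numbers compound, here is a minimal back-of-envelope sketch in Python. The transaction arithmetic comes straight from the figures above; the 60-feature and 100-values-per-feature inputs are illustrative assumptions, and a “hypothesis” is modeled simply as a segment defined by two or three feature-value conditions. Depending on how you count, the candidate explanations for a single KPI shift range from tens of millions into the tens of billions, bracketing the roughly one-billion figure above.

```python
import math

# Back-of-envelope sketch of the retail example above. All figures are
# illustrative assumptions drawn from the text, not measured values.

annual_revenue = 50_000_000   # $50M in total revenue
avg_order_value = 80          # $80 average order value

transactions_per_year = annual_revenue / avg_order_value
transactions_per_week = transactions_per_year / 52
print(f"{transactions_per_year:,.0f} transactions/year, ~{transactions_per_week:,.0f}/week")
# 625,000 transactions/year, ~12,019/week

# Assume roughly 60 features per transaction, each with ~100 distinct values
# (the text says "hundreds"; 100 keeps this estimate on the conservative side).
features = 60
values_per_feature = 100
conditions = features * values_per_feature   # 6,000 possible feature-value conditions

# Treat a "hypothesis" as a segment defined by combining 2 or 3 conditions,
# e.g. "payment type = gift card AND landing page = promo".
two_way_segments = math.comb(conditions, 2)
three_way_segments = math.comb(conditions, 3)
print(f"two-way segments:   {two_way_segments:,}")    # ~18 million
print(f"three-way segments: {three_way_segments:,}")  # ~36 billion
```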
And that’s just a basic retail scenario. For companies focused on media streaming, app engagement, or ad sales, that list can easily balloon to several hundred features.
Why Dashboards and BI Tools Aren’t Enough
In most businesses, it’s simply not possible to diagnose what’s happening from day to day with a set of static dashboards and reports. These descriptive tools can show a marketing operations team when conversion rates begin to fall off, but because they’re based on a very narrow slice of data, they’re ineffective at explaining the complex reasons why that change occurred.
When pre-built dashboards fail, companies fall back to the tried-and-true method of deploying an expert analytics team to tackle these recurring questions of “why?” Unfortunately, even in the basic retail scenario above, understanding changes to sales, customer acquisition, and operating expenses can take days or weeks for even the best teams to diagnose with current analytics tools. In today’s climate, we don’t have time to wait that long before acting.
The reason lies in the complexity of the data. Given a dataset with 5 or 10 key factors to analyze, an expert analyst can reasonably explore the data and compose custom queries to test a few dozen potential hypotheses. Results are presented to the business within a few hours to a few days.
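To make that concrete, a single hypothesis test of this kind might look something like the sketch below, which checks whether a week-over-week drop in average order value is concentrated in a particular acquisition channel. The file and column names are hypothetical, chosen only to illustrate the shape of the manual query an analyst would write.

```python
import pandas as pd

# Hypothetical transaction-level extract; the file name and column names
# (order_date, acquisition_channel, order_total) are assumptions for illustration.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
orders["week"] = orders["order_date"].dt.to_period("W")

# Hypothesis: "last week's drop in AOV is concentrated in one acquisition channel."
weeks = sorted(orders["week"].unique())[-2:]          # the two most recent weeks
recent = orders[orders["week"].isin(weeks)]

aov = (
    recent.groupby(["acquisition_channel", "week"])["order_total"]
          .mean()
          .unstack("week")                            # channels as rows, weeks as columns
)
aov["wow_change"] = aov[weeks[-1]] - aov[weeks[0]]
print(aov.sort_values("wow_change").head(10))         # channels with the largest drop first
```

Each such query answers exactly one question; checking every channel, promo, customer segment, and combination thereof is where the hypothesis count explodes.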
But with the state of enterprise data today, it’s not feasible to expect even a large team of analysts to comprehensively explore millions, let alone billions, of possible hypotheses in the data—particularly when speed and decisiveness are critical to success. Multiply that over the countless questions a dedicated analyst team fields each week, and you quickly create a scenario where most questions go unanswered, largely due to a deficit of time and attention.
Addressing the Challenge of Attention: Augmenting Diagnosis
Ultimately, this makes the task of operationalizing these diagnostic analyses one of prioritizing the team’s attention. Focusing on what will give you the results you need in order to understand why the metrics are changing will relieve your team from spending countless hours trying to make sense of data. While this sounds easy in theory, there must be proper processes and systems in place to streamline the approach. Otherwise, it becomes a needle-in-a-haystack exercise: the answer is in the data, but time and bandwidth limit your ability to find it.
Overall, business analysts and data teams need to identify what actually matters and offer solutions to their business units based on the changes that are taking place—whether positive or negative. It’s no longer about how much data you can pull in, but whether you’re looking at the data properly, finding solutions to operational challenges, and seeing clearly what’s right in front of you.
By cutting out the distraction and prioritizing your attention on the facts and metrics that matter, you can not only move faster but also avoid making bad decisions.