The Data Analytics Stack: Past, Present, and the AI-Driven Future

Data insights have never been more vital to business operations than they are today. Line-of-business (LOB) teams rely on data insights to drive sustainable growth, lead business operations, and understand customer behaviors and needs.

The growing demand for data analytics has spurred a flourishing $31 billion-plus industry of established and emerging tech companies—but is the data analytics tech stack working for enterprises? Despite years of investment and effort, research indicates that 95% of businesses still struggle with operational challenges around data and analytics, leaving only 5% with the competitive advantage data-driven decision making delivers.

Part of the issue is the deluge of data from software applications. Studies show that the average small business with 500 or fewer employees has 172 apps. Mid-market companies, with between 501 and 2,500 employees, have 255 apps on average. Large enterprises average more than twice that, at 664 apps. As more data accumulates, it’s easy for an organization to lose track of what data it has, where it is, and how to use it. Another problem is the complexity of the data analytics stack itself.

A Brief Look Back

To understand where the data stack should go to serve enterprises better, it helps to know where it has been. Data management began in the 1950s and ’60s, when organizations using mainframes needed entire floors to warehouse the punch cards storing their data. The data infrastructure was simple in the early days of mainframe data management; however, due to limited access to specialized data analytics skills and technology, making data-driven decisions was challenging.

Business intelligence (BI) and visualization tools emerged in waves: OLAP in the 1970s, then BI software such as BusinessObjects and MicroStrategy in the late 1980s and early 1990s. In the decades that followed, Tableau, Power BI, Looker, and others introduced BI and visualization tools with improved ease of use to democratize data and analytics. Later, as the deluge of data and the need for more management and analysis eclipsed the capabilities of on-prem systems, data management in the cloud became more popular.

Complexity Ensues

During the past decade, so many companies have introduced new data tools that most enterprises have spent billions assembling large, modular data stacks. However, this “stitch-it-together” approach has led to extreme complexity in integrating, managing, and coding for many point solutions. Additions to the data stack have also produced more data silos and less collaboration among data scientists, analysts, and LOB teams working on different platforms and processes.

Collaboration is further complicated because each BI tool has its own lightweight semantic layer containing the business context and semantic meanings that let users interact with data using business terms such as “product,” “customer,” or “revenue.” Having many different, lightweight semantic layers creates an enormous burden for the data team in updating, correcting, editing, and adding to the duplicate business logic dispersed throughout an organization. The impact on the business is even more concerning: Users get different answers, and trust in the data erodes due to inconsistency in metrics definitions within various semantic layers.
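To make the duplication problem concrete, the sketch below shows what a single shared metric definition might look like. This is a minimal illustration, not any specific product’s API; the names (“revenue,” the `orders` table) and the structure are assumptions for the example.

```python
# Minimal sketch of a shared semantic-layer metric definition.
# Names ("revenue", "orders") and the structure are illustrative,
# not any particular vendor's API.

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str          # business term exposed to users
    table: str         # physical table the metric reads from
    expression: str    # SQL aggregation that defines the metric
    description: str


# One definition, reused by every BI tool and AI agent instead of
# being re-declared (and drifting) inside each tool's own layer.
REVENUE = Metric(
    name="revenue",
    table="orders",
    expression="SUM(order_total)",
    description="Total order value, net of refunds",
)


def to_sql(metric: Metric, group_by: str) -> str:
    """Compile a metric into the query every downstream tool runs."""
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} GROUP BY {group_by}"
    )
```

Because every tool compiles queries from the same definition, “revenue” means the same thing everywhere, and a correction is made once rather than in each BI tool’s private layer.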

The Need for a Standard Semantic Layer

Companies now want more comprehensive platforms to reduce complexity, which has spurred some consolidation in the data stack. For example, cloud data warehouses are down to a few big players: Snowflake, Databricks, Google BigQuery, and Amazon Redshift. This is a welcome development because having data in a single place significantly reduces operational overhead. Cloud data warehouses allow organizations of every size and industry to collect and process more data easily and affordably.

However, the rest of the data stack remains fragmented. Visualization, the top layer, has become a commodity, and it is unlikely to consolidate because so many stakeholders have different needs and opinions about which tools they prefer. The middle layer—the semantic layers, catalogs, governance, security, and quality checks—is still fragmented, but it is poised for rapid consolidation.

The first step toward unification will be widespread adoption of a universal semantic layer that defines all the metrics and metadata for all possible data experiences. One standard semantic layer essentially decouples BI from the front end, enabling total data consistency across all data applications, as well as AI agents and chatbots.

Consolidation of semantic layers into one “master” layer is essential to pave the way for AI. AI needs the knowledge from your data warehouse and the context in your universal semantic layer to generate new reasoning and knowledge. The semantic layer serves as a single source of truth for AI models to understand standard business context and definitions—and avoid hallucinations. Because of AI, the semantic layer will become the center of gravity for consolidation and simplification across the middle of the data stack.

Further Democratizing Data Analytics

A universal semantic layer improves the accuracy of AI outputs, and AI, in turn, improves the semantic layer. Once trained, AI can enhance the data models and definitions within the semantic layer by suggesting improvements to the definitions and the code based on usage. AI agents backed by a semantic layer can further curate and democratize the data by allowing business users to query it in natural language.

For example, a salesperson in North America could say, “I’m looking for Q1 purchase data from customer A for my sales presentation.” An AI agent would not only generate commentary, such as, “The APAC team recently gathered that information. Customer A mostly purchased Product B, but in December bought much more of Product A,” but could also deliver a table of purchases, organized by date and grouped by product, as well as a chart showing the number of purchases per month. All this happens without the salesperson knowing where the information resides or how it is labeled in a BI tool. There is no need to figure out, for example, which of the 10 “purchase data” fields is the right one.
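The lookup step in that scenario can be sketched as follows. This is a hypothetical illustration of how an agent might resolve a business phrase against semantic-layer definitions rather than raw column names; the catalog, the synonyms, and the query shape are all assumptions, not a real agent framework.

```python
# Hypothetical sketch: an AI agent resolves a natural-language
# request against governed semantic-layer definitions instead of
# raw warehouse columns. All names here are illustrative.

SEMANTIC_CATALOG = {
    "purchases": {
        "table": "fact_orders",
        "synonyms": {"purchase data", "orders", "sales"},
        "dimensions": ["customer", "product", "order_date"],
    },
}


def resolve(request: str) -> dict:
    """Map a business phrase to the one governed definition, so the
    user never has to pick among lookalike 'purchase data' fields."""
    text = request.lower()
    for name, entry in SEMANTIC_CATALOG.items():
        if name in text or any(s in text for s in entry["synonyms"]):
            return {
                "dataset": name,
                "table": entry["table"],
                "dimensions": entry["dimensions"],
            }
    raise LookupError("no governed definition matches the request")


plan = resolve("I'm looking for Q1 purchase data from customer A")
```

The key design point is that the agent answers from one curated catalog entry, so the salesperson’s phrasing (“purchase data,” “orders,” “sales”) always lands on the same governed table and dimensions.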

Simplifying the Middle of the Stack

Once the use of a universal semantic layer becomes commonplace, it is logical to consider incorporating more functionality into the semantic layer, negating the need for multiple point solutions. Because models and metrics are defined and documented within the semantic layer, it makes sense to offer catalog features.

This layer in the data stack also ensures data consistency and accuracy by applying business rules and logic at the data layer, maintaining data integrity, and enforcing data governance policies. In addition, the semantic layer already centralizes and enforces data access controls, allowing organizations to control and manage data security and compliance based on user roles and permissions. By incorporating all this functionality into one platform, organizations will be able to simplify their data analytics stacks.
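A sketch of that centralized access control might look like the following: the role names and row-level rules are assumptions invented for the example, but the idea is that the policy is attached once, in the semantic layer, and every downstream tool inherits it.

```python
# Illustrative sketch of role-based row-level security enforced
# once in the semantic layer rather than per BI tool. Roles and
# predicates are assumptions for the example.

ROW_POLICIES = {
    "sales_na": "region = 'North America'",  # NA reps see only their region
    "finance": None,                         # no row filter for finance
}


def apply_policy(base_sql: str, role: str) -> str:
    """Attach the role's row-level predicate before the query leaves
    the semantic layer, so every tool inherits the same policy."""
    if role not in ROW_POLICIES:
        raise PermissionError(f"unknown role: {role}")
    predicate = ROW_POLICIES[role]
    if predicate is None:
        return base_sql
    return f"{base_sql} WHERE {predicate}"
```

Enforcing the rule at this single choke point is what lets governance consolidate into the semantic layer instead of being re-implemented in each visualization tool.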

The Next Evolutionary Step

The data stack is far too complex today, with new tools coming on the market every month. Each new tool compounds the work of data teams and increases the total cost of ownership, delaying enterprise data initiatives and fostering data silos. It’s time to simplify and consolidate the disjointed data stack as much as possible, starting with adopting a semantic layer.

Undoubtedly, new data tools will continue to come to market, but thankfully, the universal semantic layer will tame a significant portion of data stack complexity. Only then will business users and decision makers begin to get the insights needed to power day-to-day business processes and decisions and realize the full promise of AI initiatives.