Democratizing Data Starts With Establishing a Data-Centric Culture

Aug 11, 2022

By Navin Sharma

Thanks to significant innovation and recent disruption in the data management space, organizations are achieving unprecedented value from both internal and external data. Much emphasis has been placed on the importance of achieving digital transformation, yet thanks to these rich data assets, organizations are finding new ways to serve customers across both digital and physical boundaries.

By connecting all relevant data, savvy organizations are creating a comprehensive and accurate picture of these customers in the context of their use cases. Not only does this form the basis of an organization’s data-centric culture, but it also democratizes data across business functions and operations, generates faster and deeper insights, and creates actionable intelligence when and where the business needs it.

Creating this culture and establishing a universal, data-driven approach is often what determines whether a company becomes an innovator or a laggard. In fact, a recent Accenture study found that most leaders (77%) are using data across their organization to reimagine their products and services, 70% are using it to realize definitive operational excellence results and optimize customer experience, and 60% are using it to reduce internal costs by creating a justifiable business case. So, how are leaders exploiting their data to the fullest?

To quote the Accenture study above, they started by collecting, storing, analyzing, and reconfiguring massive amounts of new data, which amplify opportunities and help give them critical insights to deliver new business value. The problem is that, as Forrester points out, between 60% and 73% of all data within an enterprise goes unused for analytics.

Data Lakehouses: The First Step Forward to Attaining a Data-Centric Organization

Unlike antiquated data warehouses and data lakes, data lakehouses help companies co-locate data using cost-effective methods for storage and innovative approaches that enable them to operate on data at the computational layer so they can leverage the benefits of AI. Companies like Databricks are creating a data lakehouse framework that adopts standardized system designs (relying on open standards such as Delta Lake, Hudi, and Iceberg) that blend the benefits of a data lake with the structures and data management features of the data warehouse to unify an organization’s data, analytics, and AI.

Data science and analytics leaders continue to make great strides wrangling data from within and outside their organization, as these data lakehouses have diminished the need to support costly and rigid ETL pipelines into structured, expensive on-prem data warehouses. The issue is that, while access to the data is now possible, it still has to be democratized and made available to the non-technical users who need to self-serve the data and use it to derive rapid insights. Hooking up BI tools directly to the data lakehouse gets users closer to the last mile, but it usually falls short on reducing latency, supporting collaboration, and creating an easy-to-understand vocabulary that connects data across domains to encourage reuse and context. Enabling self-serve through data exploration and enriching analytics by inferring new insights also tend to suffer.

How a Semantic Layer Turns Complex Data Into Familiar Business Terms

The idea of a semantic layer is not new. Defined as a business representation of data, a semantic layer enables end users to easily access data using common, business-friendly terms and maps multifaceted data relationships into understandable business terms that deliver a unified, consolidated view of data across the organization.

The concept has been around for more than three decades, and it is an approach that BI vendors have been promoting by helping companies build purpose-built dashboards. But consumption has been hindered, given that the layer was typically embedded as part of a proprietary BI system.

The resulting rigidity and complexity create the same limitations as a physical relational database system, which models data to fit its structured query language rather than the way data is really needed and used, which is many-to-many.

As a result, organizations are adopting knowledge graph-powered semantic data layers that sit between the storage and consumption layers to remove last-mile latency, collaboration, and semantic-language search challenges. This is important because, despite an organization’s best efforts, it will likely still have data siloed in multi-cloud apps, need just-in-time data and insight, need to communicate and share business concepts rather than technical metadata, and need to understand and leverage connections across its data ecosystem to provide a complete and accurate understanding.
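To make the many-to-many point concrete, here is a minimal sketch of how a knowledge graph can hold connections as subject-predicate-object triples and answer a cross-domain question by following them. The entity names, predicates, and helper functions are hypothetical and chosen only for illustration; a production deployment would typically rely on a graph database rather than in-memory Python structures.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); illustrative data only.
TRIPLES = [
    ("Customer:42", "placed", "Order:1001"),
    ("Customer:42", "placed", "Order:1002"),
    ("Order:1001", "contains", "Product:A"),
    ("Order:1002", "contains", "Product:A"),
    ("Order:1002", "contains", "Product:B"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def follow(index, start, *predicates):
    """Follow a chain of predicates from a start node and return the nodes reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj
            for node in frontier
            for obj in index.get(node, {}).get(predicate, set())
        }
    return frontier


index = build_index(TRIPLES)
# "Which products has customer 42 bought?" expressed as a two-hop traversal.
products = follow(index, "Customer:42", "placed", "contains")
```

Because relationships are stored as data rather than baked into a table schema, adding a new kind of connection means adding triples, not redesigning the model.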

It begins with a semantic model that represents a canonical (logical) view of interrelated business concepts. That model provides the foundation for data uniformity and a basis for data storytelling and explainable AI, and it decouples users from the location and complexity of the underlying data structures across the available sources. End users can then ask questions in terms of business concepts and the interrelationships between them via the semantic layer. In doing so, organizations can map those concepts to the underlying metadata (tables, views, attributes) and enable data sharing across applications through the creation of metadata-informed data pipelines.
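The mapping from business concepts to underlying metadata can be sketched in a few lines. This is a simplified illustration, not how any particular semantic layer product works: the concept name, table, and column names below are invented, and the sketch handles only attributes that live in a single table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Physical location of one business attribute (hypothetical names)."""
    table: str
    column: str


# Hypothetical semantic model: business terms mapped to physical metadata.
SEMANTIC_MODEL = {
    "Customer": {
        "name": Mapping("crm.cust_master", "cust_nm"),
        "region": Mapping("crm.cust_master", "rgn_cd"),
    },
}


def build_query(concept, attributes, model=SEMANTIC_MODEL):
    """Translate a request phrased in business terms into a SQL string."""
    mappings = [model[concept][attribute] for attribute in attributes]
    tables = {m.table for m in mappings}
    if len(tables) != 1:
        raise ValueError("this sketch only handles attributes from one table")
    select = ", ".join(
        f"{m.column} AS {attribute}" for m, attribute in zip(mappings, attributes)
    )
    return f"SELECT {select} FROM {tables.pop()}"


# The user asks for "Customer name and region" and never sees cust_nm or rgn_cd.
query = build_query("Customer", ["name", "region"])
```

The user-facing vocabulary stays stable even if the physical table or column names change; only the model entries need updating.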

Crossing the Last Mile

A knowledge graph-powered semantic data layer is proving to be not only pivotal but also powerful in enabling rapid innovation. The ability to connect relevant data across functional domains for richer insights enables a data-sharing culture that promotes findability, accessibility, interoperability, and reusability.

Better yet, it closes the value gap from the potential to the realized value of data, allowing organizations to finally finish the last mile of their data-centric enterprise journey.