Driving Actionable Business Intelligence with a Data Catalog

Data has become a disruptive force for global businesses and a catalyst for digital transformation. The ability to combine data from different sources and to exploit the growing level of detail in that data offers unprecedented opportunities to unlock the value of organizational data and turn it into actionable insights and competitive advantage.

In fact, it’s not a stretch to say that knowing how to leverage data to drive insights can spell the difference between success and failure for an enterprise. Industry analyst firm Gartner predicts that “by 2018 most business users will have access to self-service business analytics tools, but that only one in 10 initiatives will be sufficiently well-governed to avoid data inconsistencies that negatively impact the business.”[1]

But data can only be leveraged for BI initiatives to the extent that it can be accessed and trusted. And while today’s self-service BI and analytics tools satisfy users’ craving for more “consumerized” technology, they often leave analysts stuck in neutral because, first and foremost, the analysts cannot find the data they need to perform any analysis.

This is where an effective data governance solution that incorporates a data catalog becomes essential to any successful BI initiative. Data governance breaks down data silos and creates new levels of collaboration for deriving intelligence from data that is reliable, transparent and productive. A data catalog driven by data governance serves as a single source where BI professionals can quickly find the trusted data they need to perform analyses that move the business forward.

The Benefits of a Data Catalog as Part of a Data Governance Solution

In any enterprise that lacks a data governance solution, a business analyst typically wastes weeks (even months) trying to find, understand and validate data. The analyst must then determine the meaning and the business logic behind the data and clean it before vetting it with key stakeholders (e.g., CXOs, users in other departments, IT) to ensure the data is valid, includes the proper quality controls, is compliant, and reflects the correct policies. This time-consuming, iterative process of data discovery too often results in BI project failure.

A data catalog creates a better way by enabling BI professionals to easily find and access trusted data.

The catalog is a single source that gives users a view of what data the organization possesses. It consists of metadata in which definitions of data store objects such as base tables, views (virtual tables), synonyms, value ranges, indexes, users, and user groups are stored. It enables users to continuously enrich that metadata by tagging, documenting and annotating datasets built from existing or new data sources, so that others in the enterprise can use them.
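To make the idea of a metadata-based catalog concrete, here is a minimal in-memory sketch in Python. The `CatalogEntry` and `Catalog` names, their fields, and the plain substring search are illustrative assumptions, not any product's API. The object types mirror the ones listed above (base tables, views, synonyms, indexes), and users enrich entries with tags and annotations that others can then search.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One metadata record for a data store object (hypothetical schema)."""
    name: str
    object_type: str  # e.g. "base_table", "view", "synonym", "index"
    source: str       # system the object lives in
    tags: set[str] = field(default_factory=set)
    annotations: list[str] = field(default_factory=list)


class Catalog:
    """In-memory stand-in for a data catalog; a real one would persist entries."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def tag(self, name: str, *tags: str) -> None:
        # Users enrich an existing entry by adding (lower-cased) tags.
        self._entries[name].tags.update(t.lower() for t in tags)

    def annotate(self, name: str, note: str) -> None:
        # Free-text documentation that other users will see and search.
        self._entries[name].annotations.append(note)

    def search(self, term: str) -> list[str]:
        # Case-insensitive match against names, tags and annotations.
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            text = " ".join(
                [entry.name.lower(), *entry.tags, *(a.lower() for a in entry.annotations)]
            )
            if term in text:
                hits.append(entry.name)
        return sorted(hits)


catalog = Catalog()
catalog.register(CatalogEntry("sales_orders", "base_table", "erp"))
catalog.register(CatalogEntry("v_monthly_revenue", "view", "warehouse"))
catalog.tag("v_monthly_revenue", "Finance", "revenue")
catalog.annotate("sales_orders", "One row per order line; refreshed nightly.")
print(catalog.search("finance"))  # ['v_monthly_revenue']
```

The point of the design is that enrichment (tags, annotations) lives next to the structural definition, so a search by business vocabulary can find objects whose technical names say nothing about finance.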

The data catalog is a key tool for a variety of business users, including:

  • Chief Data Officer/CXO: A C-level executive wants an overview of data assets across the entire business ecosystem (e.g., BI, CRM, Finance, Manufacturing).
    • The data catalog builds trust in the data by demonstrating that governance policies are in place and that compliance is applied across lines of business.
  • Data Analysts/BI Analysts: Data or business intelligence analysts seek better data for the analyses they’re tasked to perform.
    • The data catalog enables these analysts to find internal, “approved” datasets that meet their needs and to take part in the overall dataset onboarding process.
  • Data Stewards: Data stewards drive practical governance guidelines across the business. As the people closest to the data, they evangelize and communicate its value, stay current with how data can support changing regulatory demands, and serve as advocates for future data initiatives.
    • The data catalog helps data stewards understand what the data means, how it should be used, and for which purposes.
  • Data Producers and Data Consumers: Data producers (data and process owners) generate data in their daily operations, and data consumers use that data for follow-up work, such as process handovers and analysis.
    • The data catalog formally identifies the producers and the consumers, making responsibilities transparent through SLA-like data sharing agreements.

The 7 Benefits of a Data Catalog in a Big Data World

Here are seven benefits organizations can gain from an innovative, automated data catalog:

1. Supporting the Democratization of Data

In the past, organizations delivered data through a catalog of reports developed in a technology Center of Excellence. However, IT resource limitations created a constant backlog, and these reports were often repurposed as data sources for other types of analysis. With today’s easy-to-use analytical tools, BI professionals can create a catalog of data instead of reports. This gives users the freedom to use the data as they see fit while maintaining good governance over it.

Organizations also want ways to increase the value of their metadata beyond just structural information (e.g., column names, field lengths) to include meaning, relationships and business lineage. Again, this was traditionally IT’s purview, and resulted in time and resource bottlenecks. Today, an automated data catalog allows business users from across a company to “crowdsource” the necessary information to continuously add organizational value to metadata.

Over time, as the volume of data in the enterprise grows, machine learning will also become an important component of a data catalog, fully automating the process of enriching metadata and enforcing its consistency.

2. Creating Personalized Data for Intensive Data Users

“Power” data users can leverage catalog functionality to manipulate arbitrary combinations of data across various datasets to see personalized views of information. The data catalog enables a user to see the major characteristics of that data at a glance, such as certification level, quality level, ownership/stewardship and content. This gives users the knowledge to combine disparate datasets to run specific and more sophisticated BI analyses.

3. Supporting the “Amazon-ification” of Data

Businesses are waking up to the dichotomy between the simple technology we use in our personal lives vs. the complexity of technology in the workplace.

The process of identifying relevant information resources in the enterprise can be difficult without a consolidated source or directory across multiple reporting tools and environments. Often the information needed to effectively determine the suitability of a given report or dataset is lacking. 

But increasingly, savvy businesses are recognizing the value of taking an “Amazon-ified” approach to data. Think of it this way: sites like Amazon and Google not only offer easy-to-use search capabilities that span multiple information silos, but also provide relevant, curated metadata.

By considering data and search from a consumer’s perspective, particularly around mimicking the sort of one-stop-shop approach users enjoy with Amazon, businesses can deliver new levels of efficiency and a way for users to easily “shop” for the data they need, all from one central location, without needing to go through a technical intermediary.

A data catalog helps deliver on this promise by enabling users to identify the data they’re interested in, giving them access to it, and leveraging the power of the crowd to show which data has proven most useful to which people. The increased emphasis on machine learning will allow the data catalog to make specific recommendations to users, further speeding up the data search process.
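The crowd-based part of this idea can be illustrated without any machine learning. The sketch below is a deliberately simple stand-in for a recommendation engine: it assumes the catalog logs which user, in which role, used which dataset, ranks datasets by how many distinct peers in the same role have used them, and suggests ones the user has not tried yet. The log format, the dataset names and the ranking rule are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical usage log recorded by the catalog: (user, role, dataset).
usage_log = [
    ("ana", "bi_analyst", "revenue_by_region"),
    ("ben", "bi_analyst", "revenue_by_region"),
    ("ben", "bi_analyst", "churn_scores"),
    ("cara", "finance", "gl_balances"),
    ("dev", "bi_analyst", "revenue_by_region"),
    ("dev", "bi_analyst", "churn_scores"),
    ("eli", "bi_analyst", "web_sessions"),
]


def popularity_by_role(log):
    """Count how many distinct users in each role have used each dataset."""
    users = defaultdict(set)  # (role, dataset) -> set of users
    for user, role, dataset in log:
        users[(role, dataset)].add(user)
    counts = defaultdict(Counter)  # role -> Counter(dataset -> distinct users)
    for (role, dataset), who in users.items():
        counts[role][dataset] = len(who)
    return counts


def recommend(log, user, role, k=2):
    """Suggest up to k datasets most used by same-role peers that the user hasn't used."""
    seen = {dataset for u, _, dataset in log if u == user}
    ranked = popularity_by_role(log)[role].most_common()
    return [dataset for dataset, _ in ranked if dataset not in seen][:k]


print(recommend(usage_log, "eli", "bi_analyst"))  # ['revenue_by_region', 'churn_scores']
```

Counting distinct users rather than raw events keeps one heavy user from dominating the ranking; a machine learning model would replace `recommend` but could consume the same usage log.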

4. Easily Onboarding New and Trusted Data

Another challenge faced by data users is how to bring new data into the environment.

Enterprises grapple with “dark” data – data that’s stored (and siloed) on individual laptops or on departmental servers. All this has the potential to create a mess of unlabeled data.

A data catalog enables users to easily onboard new data with structured workflows and role-based approvals. A sophisticated catalog will include a template defining what information is required before data can be onboarded, along with automation to harvest the technical metadata (e.g., column headings, tables).
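A minimal sketch of the two onboarding pieces described above, assuming a hypothetical template of required fields and a SQLite source. The request is rejected if any template field is blank; otherwise the technical metadata (column names and declared types) is harvested automatically with SQLite's `pragma_table_info` table-valued function, available in SQLite 3.16 and later. Approval routing is left out, and the field names and the `onboard` helper are illustrative only.

```python
import sqlite3

# Hypothetical onboarding template: fields a requester must fill in.
REQUIRED_FIELDS = ("name", "source", "owner", "description")


def missing_fields(request: dict) -> list[str]:
    """Return template fields that are absent or blank in an onboarding request."""
    return [f for f in REQUIRED_FIELDS if not str(request.get(f, "")).strip()]


def harvest_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Harvest technical metadata (column name, declared type) for one table."""
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
    ).fetchall()
    return [(name, col_type) for name, col_type in rows]


def onboard(conn: sqlite3.Connection, request: dict) -> dict:
    """Validate the request against the template, then attach harvested columns."""
    missing = missing_fields(request)
    if missing:
        raise ValueError(f"onboarding request incomplete: {missing}")
    return {**request, "columns": harvest_columns(conn, request["name"])}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE partner_engagement "
    "(partner_id INTEGER, customer_id INTEGER, engaged_on TEXT)"
)
record = onboard(conn, {
    "name": "partner_engagement",
    "source": "alliances_share",
    "owner": "strategic_alliances",
    "description": "Partner touchpoints with customers",
})
print(record["columns"])
# [('partner_id', 'INTEGER'), ('customer_id', 'INTEGER'), ('engaged_on', 'TEXT')]
```

Splitting the human-supplied business fields (the template) from the machine-harvested technical fields keeps requesters from retyping information the source system already knows.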

5. Providing a Complete View of Data

A sophisticated data catalog puts complete information about the data at a user’s fingertips. It should help users understand the data’s meaning from the business glossary, retrieve technical information from the data dictionary, and view relationships, issues, and sources through lineage and traceability diagrams.

6. Benefiting from Enterprise-wide Collaboration

When it comes to data, it really does take a village. Increasingly, businesses will turn to their data catalog as a resource for enabling the enterprise-wide sharing of datasets and collaboration on the construction of datasets (e.g., creating workflow and collaboration around building datasets involving teams of stakeholders). Social functionality will be built into catalogs to enable interested parties to continue to enhance the availability and quality of data, and see which parties have used particular datasets and for which purposes.

7. Certifying and Authorizing Authoritative Data Sources

A data catalog helps certify data and authorize access to specific datasets for the appropriate users, all while offering a degree of flexibility and control.
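One way to picture certification and authorization together is to give each dataset a certification level plus a set of roles allowed to see it. The sketch below assumes a hypothetical three-step certification ladder and simple role-based access; it lists only the datasets a user is authorized to access that also meet a minimum certification level. Real catalogs define their own levels and access models.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certification(IntEnum):
    """Hypothetical certification ladder; higher values mean more trusted."""
    UNCERTIFIED = 0
    REVIEWED = 1
    CERTIFIED = 2


@dataclass(frozen=True)
class Dataset:
    name: str
    level: Certification
    allowed_roles: frozenset[str]  # roles a steward has authorized


def authorized(dataset: Dataset, user_roles: set[str]) -> bool:
    # Access is granted if the user holds at least one authorized role.
    return bool(dataset.allowed_roles & user_roles)


def usable_datasets(datasets, user_roles, min_level=Certification.CERTIFIED):
    """Names of datasets the user may access that meet the required certification."""
    return sorted(
        d.name for d in datasets
        if authorized(d, user_roles) and d.level >= min_level
    )


datasets = [
    Dataset("revenue_by_region", Certification.CERTIFIED, frozenset({"finance", "bi_analyst"})),
    Dataset("raw_web_clicks", Certification.UNCERTIFIED, frozenset({"bi_analyst"})),
    Dataset("payroll", Certification.CERTIFIED, frozenset({"hr"})),
]
print(usable_datasets(datasets, {"bi_analyst"}))  # ['revenue_by_region']
```

Lowering `min_level` is where the flexibility comes in: an exploratory analysis might accept uncertified data the user is authorized to see, while a board-level dashboard would insist on the top level.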

Data Catalog Use Cases

The beauty of a data catalog is that it can be used in a variety of ways, depending on a user’s role in the organization and particular needs. Here are two scenarios that show the data catalog in action.

In the first example, the catalog acts as a data infrastructure to guide BI analysts to discover and leverage approved datasets. For instance:

  • A Tableau user wants to build a dashboard using internal data in support of a particular business requirement.
  • Data stewards across the organization have already categorized or registered the internal data sources in the data catalog. The relevant dataset has undergone the approval process and has been certified at a level indicating that both its sources and its calculations (of periods, metrics, etc.) are verified and correct.
  • The Tableau user connects to the data catalog to easily identify and obtain the desired data. The catalog functionality uses a sophisticated machine learning-assisted engine to recommend useful datasets/models and shows which of these have been certified at which levels, as well as their quality, provenance, etc. Once chosen, the data is exported in native Tableau format.
  • The Tableau analyst uses the correct datasets to build business dashboards that meet his or her business intelligence needs. The analyst can be confident that he or she has the right data, from the right sources, to support the analysis and the critical business decisions it will inform.

In the second scenario, a BI analyst needs to onboard a new data source to the catalog:

  • An analyst has identified a dataset kept in a local format by the company’s strategic alliances group. This dataset has important information about partner engagement with the organization’s customers. The analyst believes it would be a useful dataset for many data citizens, and wants it to become a corporate asset.
  • The analyst starts the on-boarding process, identifies the dataset, its source, and basic information about what it is and why it is useful.
  • The workflow manager contacts the appropriate data owners, stewards and subject matter experts to discuss whether this data is useful as a corporate asset. Together they decide whether to include it in the corporate domain.
  • The dataset is approved, and its information is automatically imported into the governance process.
  • The data steward for that dataset can enrich the metadata in any way he/she chooses.
  • This data is now available for use by approved data citizens.
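The onboarding steps above can be modeled as a small state machine. In this sketch, which is an assumption rather than any specific tool's workflow, a request moves from proposed to under review, is approved or rejected once every consulted owner, steward and subject matter expert has voted (unanimity is required here, an arbitrary choice), can then be enriched with metadata by the steward, and is finally published for approved data citizens. All names are hypothetical.

```python
from dataclasses import dataclass, field

# Allowed transitions in a hypothetical onboarding workflow.
TRANSITIONS = {
    "proposed": {"under_review"},
    "under_review": {"approved", "rejected"},
    "approved": {"published"},
}


@dataclass
class OnboardingRequest:
    dataset: str
    source: str
    rationale: str
    reviewers: set[str]  # owners, stewards and SMEs to consult
    state: str = "proposed"
    votes: dict[str, bool] = field(default_factory=dict)
    metadata: dict[str, str] = field(default_factory=dict)

    def _move(self, new_state: str) -> None:
        # Reject any transition the workflow does not allow.
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

    def start_review(self) -> None:
        self._move("under_review")

    def vote(self, reviewer: str, approve: bool) -> None:
        if self.state != "under_review" or reviewer not in self.reviewers:
            raise ValueError(f"{reviewer} cannot vote now")
        self.votes[reviewer] = approve
        # Decide once every reviewer has voted; unanimity is an assumption.
        if set(self.votes) == self.reviewers:
            self._move("approved" if all(self.votes.values()) else "rejected")

    def enrich(self, key: str, value: str) -> None:
        # The data steward may add metadata once the dataset is approved.
        if self.state not in ("approved", "published"):
            raise ValueError("metadata can be enriched only after approval")
        self.metadata[key] = value

    def publish(self) -> None:
        self._move("published")


req = OnboardingRequest(
    dataset="partner_engagement",
    source="strategic alliances team spreadsheet",
    rationale="Partner engagement with our customers",
    reviewers={"data_owner", "data_steward", "sme"},
)
req.start_review()
for reviewer in ("data_owner", "data_steward", "sme"):
    req.vote(reviewer, approve=True)
req.enrich("definition", "One row per partner touchpoint with a customer")
req.publish()
print(req.state)  # published
```

Encoding the allowed transitions in one table makes it impossible, for example, to publish a dataset that was never reviewed.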

Data Catalogs Enable 3 Primary Benefits that Drive Business Advantage

As data catalog functionality continues to become more automated and sophisticated, enterprises will enjoy three primary benefits that combine to drive business advantage:

  • Maximizing the Value of Big Data: Data catalog functionality helps a range of data users, from the Chief Data Officer to BI analysts, data stewards and business users, find, trust and derive more business value from data. The catalog also fosters ongoing collaboration and reuse of data across the enterprise, turning siloed information into data that helps people make a real business impact.
  • Increasing Business Speed: Studies show that up to 75% of analytics time goes to data wrangling activities. Cataloging makes it simple to find the data a BI professional needs, tie it to business terms that can vary across different parts of the organization, and make smart decisions about how to use it correctly.
  • Ensuring Proper Data Control: Data scientists, owners and users can ensure that the correct data values, references and results are used. One of the biggest benefits of big data is the flexibility to use data in new and different ways. Sometimes this means changing or adding new attributes to the information an enterprise has. A data catalog supports the ability to add to and change the data, along with the underlying metadata, through simple collaborative processes to ensure the right stakeholders are involved in every decision.


[1] www.gartner.com/smarterwithgartner/managing-the-data-chaos-of-self-service-analytics


Subscribe to Big Data Quarterly E-Edition