Microsoft has largely left tools for data governance to independent software providers (ISPs) and partners in its ecosystem. (Full disclosure: I am employed by an ISP that sells a product in this space, S1 Doc.) Microsoft has put forward a few feature sets over the years, including Master Data Services and Data Quality Services, and, more recently, the new Azure Data Classification feature set.
The Data Governance Challenge
DBAs and data management teams have long struggled with supporting data governance functions. Naturally, a well-managed organization wants to extract maximum value from its data assets. But that is incredibly difficult for companies whose data assets were created over a period of years or decades without an integrated, top-down data architecture.
As these data management systems grow organically over time and as new data management systems are added, users of corporate data resources face a variety of challenges: Where to find the data they need? How to connect to the data? How to analyze and interpret the data, especially if metadata and documentation are limited or unavailable? Those who produce data for corporate use also face related challenges, such as creating documentation and security provisions for corporate data. And, as mentioned earlier, DBAs responsible for administering and securing data face a never-ending array of challenges, from storing and sharing data to data discovery and corporate security and compliance.
What Can Azure Purview Do for You?
Whereas the earlier Microsoft feature sets for master data management and data quality offer some help, Microsoft has now offered a preview release of Azure Purview. Azure Purview is a unified SaaS data governance toolkit designed to work with on-premises, cloud, and even other SaaS data platforms. Purview enables you to discover data assets in your organization, build a map of those assets, and classify the data according to its sensitivity, as well as build data lineage maps.
You can find three major tools within Purview: the Data Map, the Data Catalog, and Purview Studio. First, Data Map is a cloud-native PaaS tool that captures metadata from a wide variety of organizational data sources, from operational databases to data lakes and analytic systems. Once created, the Data Map is automatically kept up-to-date using automated scanning. In turn, the Data Map powers the Data Catalog tool and provides insights within the Purview Studio.
You can scan a wide variety of Azure data sources (such as Azure SQL DB, Azure Blob Storage, Azure Data Lakes, and Azure Cosmos DB) and many different structured file formats (JSON, Parquet, CSV, XML, and many more). In addition, you can scan at three different levels: L1–L3.
An L1 scan collects basic metadata such as the file name, size, and fully qualified name. An L2 scan collects the schema of database tables and file types. And an L3 scan extracts the schema and also applies system and custom classification rules.
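To make the three levels concrete, here is a minimal Python sketch of what each one adds. It is an illustration only and does not call Purview or reproduce its API: the scan function, the asset dictionary, the example URL, and the single email rule are all hypothetical names invented for this example.

```python
from enum import IntEnum


class ScanLevel(IntEnum):
    L1 = 1  # basic metadata only
    L2 = 2  # basic metadata plus schema
    L3 = 3  # schema plus classification rules


def scan(asset: dict, level: ScanLevel, rules: dict) -> dict:
    """Return the metadata a scan at the given level would collect (toy model)."""
    result = {
        "name": asset["name"],
        "size": asset["size"],
        "qualified_name": asset["qualified_name"],
    }
    if level >= ScanLevel.L2:
        # L2 and above also record the column names and types.
        result["schema"] = dict(asset["columns"])
    if level >= ScanLevel.L3:
        # L3 applies every rule (a predicate on the column name) to every column.
        result["classifications"] = {
            column: sorted(label for label, matches in rules.items() if matches(column))
            for column in asset["columns"]
        }
    return result


# Hypothetical asset and rule, made up for this illustration.
asset = {
    "name": "customers.csv",
    "size": 2048,
    "qualified_name": "https://example.invalid/container/customers.csv",
    "columns": {"id": "int", "email": "string"},
}
rules = {"Email Address": lambda column: "email" in column.lower()}

l1 = scan(asset, ScanLevel.L1, rules)
l2 = scan(asset, ScanLevel.L2, rules)
l3 = scan(asset, ScanLevel.L3, rules)
```

In this sketch each level is a superset of the one below it, so the L1 result carries only the name, size, and qualified name, while the L3 result also carries the schema and a classification label for the email column.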
The Data Catalog enables users to quickly and easily find relevant data using “lenses,” such as sensitivity labels, classifications (GDPR and others), glossary terms, and the like. You can also use the Data Catalog to curate your data in the business glossary manager. From there, you can see visual representations of your data lineage from origination, such as from an operational database, through ETL processes, and on to visualization and data science platforms such as Power BI or Azure Machine Learning.
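One way to picture a lineage map is as a directed graph in which each asset points to the assets it feeds. The short Python sketch below walks such a graph to list everything downstream of a source. The asset names and the downstream helper are hypothetical, and this says nothing about how Purview stores lineage internally.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it feeds.
lineage = {
    "sales_db.orders": ["etl.load_orders"],
    "etl.load_orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["powerbi.sales_report", "azureml.demand_model"],
}


def downstream(graph: dict, start: str) -> list:
    """Breadth-first walk returning every asset reachable from start."""
    seen = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited through another path
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


reachable = downstream(lineage, "sales_db.orders")
```

Here reachable lists the ETL step, the warehouse table, and then the Power BI report and the Azure Machine Learning model, which is the same origination-to-consumption path a lineage view traces visually.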
Not Just for DBAs
While extremely beneficial for any data management professional such as a DBA, Azure Purview is also very useful for data stewards, security officers, and compliance officers. In fact, anyone who would benefit from a top-down, enterprisewide data map with analytics will find this to be a very useful tool.
You begin by registering one or more data sources. Purview then copies and indexes the location information and metadata of each data source. You can further annotate the Data Map information, adding descriptions, tags, and so on. Once the Data Map is created, you can use the Purview Studio to browse data lineage, query for relationships between and among data sources, and do a variety of other useful activities.
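The register, annotate, and query workflow can be sketched as a small in-memory catalog. This is a toy model under stated assumptions, not the Purview API: the Catalog and Asset classes, their method names, and the source locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    qualified_name: str
    description: str = ""
    tags: set = field(default_factory=set)


class Catalog:
    """Toy stand-in for a data map: it indexes sources by their location."""

    def __init__(self):
        self._assets = {}

    def register(self, qualified_name: str) -> Asset:
        # Registering the same source twice returns the existing entry.
        return self._assets.setdefault(qualified_name, Asset(qualified_name))

    def annotate(self, qualified_name: str, description: str = "", tags=()):
        asset = self._assets[qualified_name]
        if description:
            asset.description = description
        asset.tags.update(tags)

    def find_by_tag(self, tag: str) -> list:
        return sorted(a.qualified_name for a in self._assets.values() if tag in a.tags)


catalog = Catalog()
# Hypothetical source locations, made up for this illustration.
catalog.register("mssql://hr-server/hr/employees")
catalog.register("https://example.invalid/lake/raw/clicks.parquet")
catalog.annotate(
    "mssql://hr-server/hr/employees",
    description="Employee master records",
    tags={"PII", "HR"},
)
pii_assets = catalog.find_by_tag("PII")
```

Only the annotated HR source comes back from the tag search, which is the kind of question a data steward or compliance officer would ask of the real Data Map.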
You can read more about the Preview of Azure Purview at https://docs.microsoft.com/en-us/azure/purview/overview#next-steps. You can read the Azure Purview documentation at https://docs.microsoft.com/en-us/azure/purview. There is also a starter kit and tutorials for each of the tools in the toolkit. Pricing is still to be determined.
Get started by creating an Azure account in which you own an Azure Active Directory tenant. You’ll also need permissions to create resources in your Azure account since Purview needs to create a managed Resource Group and a subordinate Storage account and EventHub namespace.