Data governance is sometimes viewed as a roadblock that keeps data scientists and analysts from turning data into business insights quickly and efficiently. Yet, in the enterprise data and analytics work that we’ve undertaken for clients across diverse industries, we’ve found that it’s often a lack of sound data governance that prevents organizations from realizing the full value of their data.
Data governance deals with such questions as the origins, or lineage, of data; who can access data and what they can do with it; how data is categorized or catalogued; and the quality and completeness of data. In addressing such questions clearly and directly, data governance increases the productivity of data scientists and analysts. That benefits the business decision-makers who count on them for insights.
Here are five ways that a modern approach to data governance can make your data scientists and analysts more productive, enabling your business leaders to gain insights more quickly.
1. Good business metadata is good for the business. Effectively governed metadata – that’s data that labels or categorizes other data – facilitates the discovery process for data scientists, helping them find the data they need, when they need it.
Tagging and cataloging data at the time of ingestion will help your organization keep its data lake clean while giving your data scientists a better understanding of what’s available to them.
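Here is a minimal Python sketch of tagging at ingestion. It assumes a simple in-memory catalog. `CatalogEntry`, `DataCatalog`, and the example data set names and paths are hypothetical; a production environment would use a dedicated catalog service. The sketch shows two things: untagged data is rejected at ingestion, and tagged data can then be found by business term rather than by file path.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Business metadata captured for one data set at ingestion time."""
    name: str
    path: str
    owner: str
    tags: set[str] = field(default_factory=set)
    ingested_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DataCatalog:
    """Hypothetical in-memory catalog standing in for a real catalog service."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Reject untagged data so nothing lands in the lake undocumented.
        if not entry.tags:
            raise ValueError(f"{entry.name}: at least one tag is required at ingestion")
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> list[CatalogEntry]:
        # Discovery by business term instead of by file path, sorted by name.
        return sorted(
            (e for e in self._entries.values() if tag in e.tags),
            key=lambda e: e.name,
        )


catalog = DataCatalog()
catalog.register(CatalogEntry("claims_2023", "lake/raw/claims/2023/", "claims-team", {"insurance", "claims"}))
catalog.register(CatalogEntry("policies", "lake/raw/policies/", "underwriting", {"insurance", "policy"}))
print([e.name for e in catalog.find_by_tag("insurance")])  # ['claims_2023', 'policies']
```

Enforcing tags in `register` makes cataloging part of ingestion rather than a cleanup task that happens later, if at all.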
2. Effective schema management saves time and money, especially in a big-data environment. Schemas define how data should be read, so data consumers must know which schema to apply when they look at a particular file. Yet managing schemas at big-data scale can be difficult. Programmatic discovery of technical and business schemas eases the problem.
When a new data set is ingested into a data lake, an open-source tool can help you determine the schema automatically and, in a mature environment, match the newly discovered data to existing business metadata, providing you with both the business and technical metadata immediately. Publishing, curating, and governing all known schemas will save your data scientists and analysts considerable time, freeing them to focus on their primary roles.
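The Python sketch below illustrates the two steps under simplifying assumptions. It is not any particular open-source tool, and it infers only three types (integer, double, string) from a CSV sample. `BUSINESS_GLOSSARY`, `infer_schema`, and `attach_business_metadata` are hypothetical names. The glossary lookup stands in for matching newly discovered columns to existing business metadata.

```python
import csv
import io

# Hypothetical business glossary: technical column name -> governed business term.
BUSINESS_GLOSSARY = {
    "cust_id": "Customer Identifier",
    "prem_amt": "Annual Premium Amount",
}


def infer_type(values: list) -> str:
    """Return the narrowest type that every non-empty sample value fits."""
    present = [v for v in values if v]  # skip empty strings and missing cells (None)
    if not present:
        return "string"
    for type_name, cast in (("integer", int), ("double", float)):
        try:
            for v in present:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text: str, sample_rows: int = 100) -> dict[str, str]:
    """Infer a technical schema (column -> type) from the first `sample_rows` rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for _, row in zip(range(sample_rows), reader)]
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames or []}


def attach_business_metadata(schema: dict[str, str]) -> dict[str, dict]:
    """Pair each technical column with its governed business term, if one exists."""
    return {
        col: {"type": col_type, "business_term": BUSINESS_GLOSSARY.get(col)}
        for col, col_type in schema.items()
    }


sample = "cust_id,prem_amt,region\n101,250.00,east\n102,310.50,west\n"
schema = infer_schema(sample)
print(schema)  # {'cust_id': 'integer', 'prem_amt': 'double', 'region': 'string'}
print(attach_business_metadata(schema)["region"])  # {'type': 'string', 'business_term': None}
```

Real schema-inference tools also handle dates, nested data, and schema evolution. The shape is the same, though: discover the technical schema, then attach business terms. Columns that come back with no business term are candidates for data stewards to review.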
3. Good data quality and profiling can accelerate time to insight. According to a report by Gartner Inc., poor data quality is among the key reasons that 40 percent of business initiatives fail to achieve their targeted benefits. The report also notes that data quality affects overall labor productivity by as much as 20 percent.
Developing a sound architecture and effective data-quality protocols will help you keep your data lake from becoming a data swamp. Establishing data-usage agreements between producers and consumers of data will also prove helpful, as these agreements give everyone a better idea of the level of data quality expected and how it will be documented. Profiling data and storing the profiles with metadata is also a useful practice, giving your data scientists a better understanding of the types of data contained in the system and allowing them to formulate hypotheses more quickly.
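Profiling doesn't have to be elaborate to be useful. The Python sketch below uses only the standard library. It computes a few common statistics per column (null count, distinct count, and numeric range) and stores the result with the data set's other metadata. The function names and the metadata layout are hypothetical.

```python
from statistics import mean


def profile_column(values: list) -> dict:
    """Summarize one column: completeness, cardinality and, if numeric, its range."""
    present = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
    }
    # bool is a subclass of int, so exclude it from numeric statistics.
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(present):
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return profile


def profile_table(rows: list[dict]) -> dict[str, dict]:
    """Profile every column that appears in any row."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"claim_id": 1, "amount": 100.0, "state": "VA"},
    {"claim_id": 2, "amount": None, "state": "VA"},
    {"claim_id": 3, "amount": 300.0, "state": "MD"},
]
# Store the profile next to the data set's other metadata (hypothetical layout).
metadata = {"name": "claims_2023", "owner": "claims-team", "profile": profile_table(rows)}
print(metadata["profile"]["amount"])
# {'row_count': 3, 'null_count': 1, 'distinct_count': 2, 'min': 100.0, 'max': 300.0, 'mean': 200.0}
```

With the profile stored as metadata, a data scientist can see that `amount` is one-third null before writing a line of analysis.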
4. Data lineage can help keep you from getting sued or fired. In an era of data breaches, data governance can provide important protections to your business and its employees. Data governance won’t stop determined hackers from gaining access to secure data, but, in the event of a breach, it will help you understand what has and hasn’t been compromised.
Data governance affords particular protections to people who work in regulated industries such as financial services and healthcare. In an audit, data lineage enables you to show exactly where your data came from and how particular figures were calculated.
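A lineage record can be as simple as a graph of which data set was derived from which, and how. The Python sketch below assumes that shape. `LineageGraph`, `LineageEdge`, and the data set names are hypothetical. Walking the graph upstream answers the audit question of where a figure came from and how it was calculated. Walking it downstream lists everything derived from a compromised source.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LineageEdge:
    source: str
    target: str
    transformation: str  # how the target was derived (job name, SQL, formula)


class LineageGraph:
    """Hypothetical lineage store: one edge per source -> target derivation."""

    def __init__(self) -> None:
        self.edges: list[LineageEdge] = []

    def record(self, source: str, target: str, transformation: str) -> None:
        self.edges.append(LineageEdge(source, target, transformation))

    def upstream(self, dataset: str) -> list[LineageEdge]:
        """Every derivation feeding `dataset`: where it came from and how it was calculated."""
        return self._walk(dataset, lambda e, node: e.target == node, lambda e: e.source)

    def downstream(self, dataset: str) -> set[str]:
        """Every data set derived from `dataset`, e.g. to scope the impact of a breach."""
        edges = self._walk(dataset, lambda e, node: e.source == node, lambda e: e.target)
        return {e.target for e in edges}

    def _walk(
        self,
        start: str,
        matches: Callable[[LineageEdge, str], bool],
        next_node: Callable[[LineageEdge], str],
    ) -> list[LineageEdge]:
        # Breadth-first traversal; `seen` keeps cycles from looping forever.
        seen, found, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for edge in self.edges:
                if matches(edge, node):
                    found.append(edge)
                    nxt = next_node(edge)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        return found


lineage = LineageGraph()
lineage.record("raw.customers", "staging.customers", "dedupe_customers job")
lineage.record("raw.transactions", "staging.transactions", "normalize_currency job")
lineage.record("staging.customers", "mart.customer_ltv", "JOIN on customer_id, SUM(amount)")
lineage.record("staging.transactions", "mart.customer_ltv", "JOIN on customer_id, SUM(amount)")

for edge in lineage.upstream("mart.customer_ltv"):
    print(f"{edge.source} -> {edge.target}: {edge.transformation}")
print(sorted(lineage.downstream("raw.customers")))  # ['mart.customer_ltv', 'staging.customers']
```

Lineage tools capture these edges automatically from jobs and queries. Recording them by hand, as here, only illustrates what the resulting graph lets you answer.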
5. Your models and analyses will run correctly in production. If your data governance program includes the measures discussed so far, your data will be of sufficiently high quality that you’ll experience fewer problems with models and analyses in production.
If you go a step further and establish preventive and detective controls, you’ll gain additional benefits. Preventive controls help ensure that low-quality data isn’t used by the business. Detective controls help the production and operations team troubleshoot jobs that fail as a result of data quality issues.
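Controls like these can take many forms. The Python sketch below shows one possible shape. A preventive gate quarantines rows that fail a set of rules and refuses to publish the batch if the failure rate exceeds a threshold, such as one set in a data-usage agreement. A detective check counts failures per rule and logs them so the operations team can see why a job went wrong. `Rule`, `preventive_gate`, `detective_check`, the rules, and the threshold are all hypothetical, not a specific product's API.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("dq_controls")


@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when a row passes


class DataQualityError(Exception):
    """Raised by the preventive gate when a batch must not reach the business."""


def preventive_gate(rows: list[dict], rules: list[Rule], max_failure_rate: float = 0.0):
    """Preventive control: quarantine failing rows and block the batch if too many fail."""
    passed, quarantined = [], []
    for row in rows:
        failures = [rule.name for rule in rules if not rule.check(row)]
        if failures:
            quarantined.append({"row": row, "failed_rules": failures})
        else:
            passed.append(row)
    failure_rate = len(quarantined) / len(rows) if rows else 0.0
    if failure_rate > max_failure_rate:
        raise DataQualityError(f"{failure_rate:.0%} of rows failed; batch not published")
    return passed, quarantined


def detective_check(job_name: str, rows: list[dict], rules: list[Rule]) -> dict[str, int]:
    """Detective control: count failures per rule and log them for the operations team."""
    counts = {rule.name: sum(1 for row in rows if not rule.check(row)) for rule in rules}
    failing = {name: n for name, n in counts.items() if n}
    if failing:
        logger.warning("job %s: data quality failures %s", job_name, failing)
    return failing


rules = [
    Rule("amount_present", lambda r: r.get("amount") is not None),
    Rule("amount_non_negative", lambda r: r.get("amount") is None or r["amount"] >= 0),
]
batch = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5.0},
    {"id": 4, "amount": 7.5},
]

passed, quarantined = preventive_gate(batch, rules, max_failure_rate=0.5)
print([row["id"] for row in passed])  # [1, 4]
print(detective_check("nightly_claims_load", batch, rules))  # {'amount_present': 1, 'amount_non_negative': 1}
```

With the default threshold of zero, the same batch would raise `DataQualityError` and never reach business users. The detective check names the failing rules, which shortens troubleshooting.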
Saving time, energy, and money
A well-managed data governance program provides data scientists with what they need to stay focused on delivering business value. That includes metadata as well as information about data schemas, quality, structure, and completeness. With a modern data governance program in place, data scientists needn’t spend their working hours looking for data, trying to understand definitions, wondering whether data sets are complete and accurate, or trying to determine where data originated. That saves time, energy, and money while improving the quality of business decision-making. A sound data governance program will also keep your organization safe and compliant, with full documentation of how data is used and by whom.
For more information on this topic, read the CapTech white paper, “5 Ways Modern Data Governance Will Make Your Organization More Productive.”