A Guide to Data Mapping Essentials in the GDPR Era

The first step in preparing for the General Data Protection Regulation (GDPR) is getting a full view of the entire information chain to ensure that regulations are met. To be fully prepared, businesses must establish data management best practices, and they need to know more about their data: where data that must remain private resides, where it comes from, where it travels within the organization, how it’s processed, and who consumes it. In today’s demanding data management environment, navigating this complexity requires an increasingly sophisticated mapping capability.

To use an analogy, before the advent of sophisticated mobile computing, people bought paper maps when they visited a new region. It was the only way to find places they had never been before, but this kind of map was static. It quickly became outdated because it lacked dynamic context: there was no way to account for roadwork, traffic problems, newly built roads, or any other issues that might surface. Making one change required redoing the whole map.

Put simply, there was a lack of transparency. Without any way to track and trace a journey, passengers wouldn’t know whether their taxi driver was taking the fastest route to their destination. The advent of GPS changed everything, giving travelers a more accurate and dynamic view of a region, with traffic, road, and weather conditions constantly updated. Today, people often use GPS even on familiar trips, because it can instantly alert them to any issues they may encounter along the way.

We’re seeing a similar shift in data management today. In the past, many businesses had no need for dynamic data mapping; a high-level, paper-based view of the landscape was sufficient. That has changed with the explosion of data, and the change will accelerate when GDPR comes into force. Beyond seeing where data is stored, organizations need dynamic data mapping capabilities to get a precise view of their data and to provide transparency around “the rights of the data subject,” such as the right to be forgotten, the right of access, and the right to rectification.

Navigating In-Depth Data Management

This is the rationale behind metadata management. Drill down further and a recurring theme emerges: drawing connections between disparate datasets is key to the kind of data management that GDPR demands.

In the past, organizations managed privacy-related data and eventually processed opt-ins, but typically within a specific context limited to one department. If the marketing department was responsible for managing a list of customers that potentially contained private data, it might have had to inform the relevant data protection authority about it. Similarly, the HR department would take exclusive responsibility for the privacy of employee data.

That’s all changed. Today, with GDPR on the horizon, businesses need a comprehensive view of the private data they manage. The fact that a business may know an individual in many different contexts further complicates the situation. If an individual bought a business’s products or services, the business knows them as a customer, and their details are stored in its CRM system. If they also hold a contract, they will be in the financial system; if they have taken out a subscription, their details will be stored in the support department’s systems; and for digital products or services, such as connected objects in the internet of things, everything they do might be tracked in yet another location.
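
To make the problem concrete, here is a minimal Python sketch of locating every record held about one person across several systems, as a business would need to do to answer an access or erasure request. The system names, the records, and the assumption that every system shares a single subject identifier (an email address) are all hypothetical; in practice, matching the same person across systems is usually much harder.

```python
from dataclasses import dataclass


@dataclass
class Record:
    system: str       # hypothetical system name, e.g. "crm"
    subject_key: str  # ASSUMPTION: one identifier shared by every system
    fields: dict


# Hypothetical inventory: the same person appears in several systems.
RECORDS = [
    Record("crm", "jane@example.com", {"name": "Jane Doe", "segment": "retail"}),
    Record("finance", "jane@example.com", {"contract_id": "C-101"}),
    Record("support", "jane@example.com", {"subscription": "premium"}),
    Record("iot_telemetry", "jane@example.com", {"device_id": "thermo-7"}),
    Record("crm", "sam@example.com", {"name": "Sam Roe", "segment": "b2b"}),
]


def locate_subject(subject_key, records):
    """Group every record held about one data subject by the system holding it."""
    found = {}
    for record in records:
        if record.subject_key == subject_key:
            found.setdefault(record.system, []).append(record.fields)
    return found


if __name__ == "__main__":
    for system, rows in locate_subject("jane@example.com", RECORDS).items():
        print(system, rows)
```

The point of the sketch is the shape of the answer: one person, four systems, four owners to coordinate with before a request can be fulfilled.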

This highlights the complexity of GDPR compliance and the broader view of data that businesses now need. The emphasis can no longer be on each department managing its own data requirements. Instead, the focus must shift to managing all of an individual’s private data across the entire enterprise, whether that individual is a customer or an employee. This is clearly a complex undertaking, so how can businesses go about it effectively?

Gaining a Holistic Data View

The first stage of the process is to create a full segmentation of your data, in other words, a data taxonomy. At this point, the focus should be on a high-level view of the private data that needs to be managed. For GDPR, that is likely to include data related to customers and employees. Employee data, in turn, is likely to include information about performance, salary, benefits, and even health or family circumstances. Dedicated business tools, such as a business glossary, may be needed to capture this taxonomy.
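
A taxonomy like this can be sketched as a simple tree. The Python below is a minimal illustration, not a business-glossary product: the `Term` class and the category names are hypothetical, and only health data is flagged as a special category of personal data, which GDPR Article 9 treats more strictly.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a hypothetical business-glossary taxonomy."""
    name: str
    personal: bool = False          # does it describe an identifiable person?
    special_category: bool = False  # e.g. health data (GDPR Article 9)
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def walk(term, path=()):
    """Yield (path, term) for every node in the tree, depth first."""
    path = path + (term.name,)
    yield path, term
    for child in term.children:
        yield from walk(child, path)


root = Term("Private data")
root.add(Term("Customer", personal=True))
employee = root.add(Term("Employee", personal=True))
for name in ("Performance", "Salary", "Benefits", "Family"):
    employee.add(Term(name, personal=True))
employee.add(Term("Health", personal=True, special_category=True))

if __name__ == "__main__":
    for path, term in walk(root):
        flag = " [special category]" if term.special_category else ""
        print(" / ".join(path) + flag)
```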

The next stage is to assign responsibility for the different data areas: deciding who takes care of employees’ health data, for example, or who looks after their performance details. In parallel, organizations can start to define the foundations of their data policy, which typically includes a data retention strategy: how long they need to keep certain types of data before archiving or deleting it.
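
Ownership and retention can be recorded alongside each data area. The sketch below assumes a hypothetical policy table keyed by data area; the owners, retention periods, and archive-or-delete choices are placeholders, not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical policy table. Owners and periods are illustrative only.
POLICIES = {
    "employee.health": {"owner": "HR medical officer", "retain_days": 365 * 2, "then": "delete"},
    "employee.performance": {"owner": "HR business partner", "retain_days": 365 * 5, "then": "archive"},
    "customer.contact": {"owner": "Marketing data steward", "retain_days": 365 * 3, "then": "delete"},
}


def owner_of(data_area):
    """Who is accountable for a given data area."""
    return POLICIES[data_area]["owner"]


def retention_action(data_area, created, today):
    """Return 'keep', 'archive' or 'delete' for a record in a data area."""
    policy = POLICIES[data_area]
    if today - created <= timedelta(days=policy["retain_days"]):
        return "keep"
    return policy["then"]
```

For example, a health record created on 1 January 2015 is past its two-year window by 1 January 2018, so `retention_action("employee.health", date(2015, 1, 1), date(2018, 1, 1))` returns `"delete"`.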

Once this process is complete, the business will understand the datasets it needs to control. It may not yet know where all of this data resides, but it does understand what information needs to be managed and which data must be considered when a customer asks for information to be changed or deleted. The business may also need to implement technology that connects to the data in order to maintain its quality and keep it consistently accurate and up-to-date.

Making the Right Connections

When it comes to connecting to the data, businesses will have to carry out “stitching,” a metadata management technique that connects logical data to the physical systems that manage it. If the organization is looking at identity data specifically, it should connect to the HR system and possibly also to the payroll system. Beyond that, it might need to consider that identity data will also be in the recruitment system, because before an employee was hired, they were a candidate. It should also consider the travel and expense management system, which may hold sensitive information such as credit card numbers.
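
One way to picture stitching is as a catalog that links each logical term to every physical location holding it. In this minimal sketch, the system, table, and column names are hypothetical placeholders for whatever the organization’s HR, payroll, recruitment, and expense systems actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    system: str
    table: str
    column: str


# Hypothetical stitching catalog: one logical term, many physical homes.
CATALOG = {
    "Employee identity": [
        PhysicalLocation("hr", "employees", "full_name"),
        PhysicalLocation("payroll", "payees", "full_name"),
        PhysicalLocation("recruitment", "candidates", "full_name"),
        PhysicalLocation("travel_expense", "cardholders", "holder_name"),
    ],
    "Payment card number": [
        PhysicalLocation("travel_expense", "cards", "card_number"),
    ],
}


def systems_for(term):
    """Every system that physically stores data for a logical term."""
    return sorted({loc.system for loc in CATALOG.get(term, [])})


if __name__ == "__main__":
    print(systems_for("Employee identity"))
```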

To ensure compliance, businesses will need to use stitching to make a physical connection to the actual data they manage. When the attribute name is identical, they can map a logical attribute directly to the physical data. Otherwise, they can create correspondences that link the logical, high-level data to the physical data. This means that when a candidate becomes an employee, a data integration project can be run to move the candidate’s data from the recruitment system into the HR system, effectively drawing the lineage between the different pieces of data.
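
The two stitching routes described above, direct matching on identical attribute names and explicit correspondences for everything else, can be sketched in a few lines of Python. The schemas and correspondence entries are hypothetical; `lineage_edges` then shows which column-to-column links a recruitment-to-HR integration job would draw.

```python
# Logical attributes from the glossary (hypothetical names).
LOGICAL_ATTRIBUTES = {"first_name", "last_name", "date_of_birth"}

# Physical columns found in each system (hypothetical schemas).
PHYSICAL_COLUMNS = {
    "recruitment": ["first_name", "last_name", "dob"],
    "hr": ["first_name", "surname", "date_of_birth"],
}

# Explicit correspondences for columns whose names differ from the glossary.
CORRESPONDENCES = {
    ("recruitment", "dob"): "date_of_birth",
    ("hr", "surname"): "last_name",
}


def stitch(logical, physical, correspondences):
    """Map each (system, column) to a logical attribute, or None if unmapped."""
    mapping = {}
    for system, columns in physical.items():
        for column in columns:
            if column in logical:  # identical name: map directly
                mapping[(system, column)] = column
            else:  # otherwise fall back to a declared correspondence
                mapping[(system, column)] = correspondences.get((system, column))
    return mapping


def lineage_edges(mapping, source, target):
    """Column-level edges an integration job from source to target would create."""
    src = {attr: col for (system, col), attr in mapping.items() if system == source and attr}
    tgt = {attr: col for (system, col), attr in mapping.items() if system == target and attr}
    return sorted(((source, src[a]), (target, tgt[a])) for a in src.keys() & tgt.keys())


if __name__ == "__main__":
    m = stitch(LOGICAL_ATTRIBUTES, PHYSICAL_COLUMNS, CORRESPONDENCES)
    for edge in lineage_edges(m, "recruitment", "hr"):
        print(edge)
```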

Foundations in Place

At this point, by following these data mapping best practices, the business will have come a long way on its metadata management journey. It will have developed the kind of dynamic mapping that is critical for seeing where data is located in relation to other data. All the finer-grained data elements will have been defined and linked to the systems that use them, and the dependencies and relationships between those systems will have been established. This solid mapping foundation makes it easier to make adjustments further down the line.

If the business needs to change the format of its data in any way, such as using four digits for the year instead of two, this is far easier to achieve. It can get answers to questions such as: Where does this data originate? If I change it in the HR system, what is the impact elsewhere? Should I change the data integration job that takes the data from the recruitment system, or should I simply propagate the four-digit year down to the HR application?
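
Questions like these are, in effect, traversals of a lineage graph: upstream to find where data originates, downstream to find what a change would affect. Here is a minimal breadth-first sketch, assuming a hypothetical field-level lineage graph in which each edge points from a field to the fields fed from it.

```python
from collections import deque

# Hypothetical lineage graph: field -> fields that are populated from it.
LINEAGE = {
    "recruitment.candidates.birth_year": ["hr.employees.birth_year"],
    "hr.employees.birth_year": ["payroll.payees.birth_year", "reporting.headcount.age_band"],
    "payroll.payees.birth_year": [],
    "reporting.headcount.age_band": [],
}


def downstream(start, graph):
    """Breadth-first search: every field affected by a change to `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def upstream(start, graph):
    """Every field that feeds `start`, found by searching the reversed graph."""
    reverse = {}
    for src, targets in graph.items():
        for tgt in targets:
            reverse.setdefault(tgt, []).append(src)
    return downstream(start, reverse)


if __name__ == "__main__":
    print("origin:", upstream("hr.employees.birth_year", LINEAGE))
    print("impact:", downstream("hr.employees.birth_year", LINEAGE))
```

Keeping lineage as a plain adjacency map means the same search answers both questions; only the direction of the edges changes.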

The same principles apply to data masking. The organization can use its mapping and data integration capabilities to start applying guidelines to the data. It might want to disguise the exact birth date of an individual within one system, or, to avoid segregating younger and older candidates in the recruitment system, mask the date of birth completely.
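
A masking step driven by per-system rules might look like the following sketch. The two rules mirror the options in the text, keeping only the birth year in one system and suppressing the date entirely in recruitment; the system names and the `date_of_birth` field name are assumptions.

```python
from datetime import date


def generalize_birth_date(dob):
    """Partial masking: keep only the year, hiding the exact birth date."""
    return str(dob.year)


def suppress_birth_date(_dob):
    """Full masking: drop the value entirely."""
    return None


# Hypothetical per-system masking rules.
MASKING_RULES = {
    "hr": generalize_birth_date,
    "recruitment": suppress_birth_date,
}


def mask_record(system, record):
    """Return a masked copy of the record; the original is left untouched."""
    rule = MASKING_RULES.get(system)
    masked = dict(record)
    if rule and "date_of_birth" in masked:
        masked["date_of_birth"] = rule(masked["date_of_birth"])
    return masked


if __name__ == "__main__":
    candidate = {"name": "Jane Doe", "date_of_birth": date(1990, 5, 17)}
    print(mask_record("recruitment", candidate))
```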

Good metadata management is about having a dynamic view of the data. To use the GPS analogy once again, you need to be able to see the route to your customers, roughly where their offices are, and how long it will take to drive there. But you also need to be able to act whenever an exception occurs. Metadata management is not simply about mapping and visualizing data; it is also about knowing how to act when there is a problem, and about helping to guide that action. The latest GPS systems don’t just tell you that there is a traffic jam; they also suggest an alternative route. Metadata management can offer the business the same kind of benefit: whenever a change or a new regulation is introduced, the metadata management tool should guide the business to apply the right action to its data.

Technology Whose Time Has Come

In the past, despite rapid growth in data volumes across many industry sectors, the market for metadata management remained largely restricted to banking, financial services, and other highly regulated industries. The advent of GDPR, and its demands on companies of all sizes and types, has moved metadata management up the priority list for all businesses.

What this means will vary from company to company. Some businesses can use existing software to document their data and then focus on keeping records accurate and up-to-date by evolving their systems over time. However, as time goes by and as the importance of data mapping increases, more and more companies will need to move to a metadata management approach and to the growing portfolio of technologies that support it.



Subscribe to Big Data Quarterly E-Edition