Between the new regulations and existing ones such as HIPAA, which governs medical information, compliance can quickly become complicated. Beyond CCPA, other states have passed or intend to pass their own data privacy laws, which may lead to a patchwork of frameworks with which enterprises must comply. Becoming fully versed in all of the various jurisdictions, and the geographies that accompany them, can be tricky.
Additionally, enterprises are held responsible if a third-party data management company that stores or uses their data gets hacked. The company that originally took possession of a consumer's data remains liable, regardless of whom it shared that data with.
Going forward, organizations should create a new position in their businesses with responsibility for compliance under the new regulations—similar to how GDPR requires the appointment of a data protection officer—to manage this complexity and ensure the entire organization can comply.
Data Protection Trends Keeping CEOs Out of Hot Water
Just as the data analytics market and government regulations are always changing, so are the technology trends around data protection. Staying up to speed on the latest technology trends is a must. The following is a look at emerging trends that can help keep C-level executives out of hot water.
Fine-Grained Protection
Fine-grained protection involves rendering unreadable only those specific fields that are sensitive, leaving the rest of a record in the clear. For example, a customer record may include general demographic information and transaction history, which are useful for analytics but become inaccessible if the whole record is encrypted. By protecting only the PII fields, the remaining fields stay available to analysts and data scientists. There are several approaches to fine-grained protection, as sketched below. Tokenization substitutes data with a token that preserves length, type, and referential integrity. Format-preserving encryption provides a similar benefit via Advanced Encryption Standard (AES)-256 encryption. Masking swaps a field's value for a "fake" one, such as replacing a person's name with a fictitious "John Doe."
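As a simple illustration of the idea rather than any particular vendor's implementation, the following Python sketch protects only the sensitive fields of a hypothetical customer record, using a keyed hash as a stand-in for a tokenization service; the field names, key handling, and record layout are assumptions made for the example.

```python
import hashlib
import hmac

# Illustrative secret; in practice the key would live in a key management system.
TOKEN_KEY = b"example-secret-key"

def tokenize(value: str) -> str:
    # A deterministic keyed hash: the same input always yields the same token,
    # which preserves referential integrity across tables. (Unlike a true
    # tokenization vault or format-preserving encryption, it is not reversible
    # and does not preserve length or type.)
    return hmac.new(TOKEN_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical customer record: only the PII fields are protected; the
# demographic and transaction fields stay in the clear for analytics.
record = {
    "name": "Jane Smith",
    "ssn": "123-45-6789",
    "age_band": "35-44",
    "region": "Midwest",
    "last_purchase_amount": 84.20,
}
SENSITIVE_FIELDS = {"name", "ssn"}

protected = {
    field: tokenize(str(value)) if field in SENSITIVE_FIELDS else value
    for field, value in record.items()
}
print(protected)  # name and ssn become tokens; the other fields remain usable
```

A commercial tokenization or format-preserving encryption product would also preserve the length and type of the original values and support authorized detokenization, which this sketch does not attempt.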
Anonymization
With very large customer datasets, it is sometimes possible to join the data with other publicly available data about the same people and re-identify them, even when the PII fields have fine-grained protection. For example, a healthcare provider's set of electronic health records, if joined with publicly available postal data or a telephone directory, could yield enough information to narrow down the identity of specific people even if the identifying fields are protected. A person with a rare medical condition could be identified if it is also known that a pharmacy dispensed medication for this condition in the very small town in which that person happens to live. HIPAA specifically notes that care should be taken to make this kind of re-identification more difficult. Emerging techniques such as differential privacy introduce just enough "noise" into a dataset to make re-identification harder without destroying the data's usefulness for analytics. This is an ongoing area of research that data scientists and analysts should continue to follow.
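For illustration, the sketch below applies the Laplace mechanism, the basic building block of differential privacy, to a counting query; the epsilon value and the example query are assumptions chosen only to show the trade-off between noise and privacy.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    # Laplace mechanism for a counting query (sensitivity 1): smaller epsilon
    # means more noise and stronger privacy.
    scale = 1.0 / epsilon
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical query: how many patients in a small town were dispensed a
# particular medication. The noisy answer remains useful in aggregate but is
# harder to tie back to one identifiable person.
print(dp_count(true_count=3, epsilon=0.5))
```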
Sensitive Data Discovery of Non-Traditional Datatypes
Tools that help discover sensitive fields within traditional structured datasets, such as a customer database, have existed for many years. However, more enterprises are capturing and using new datatypes such as social data (including audio, video, and other multimedia), biometric data, location data (such as smartphone GPS locations), IoT sensor data, and other “unstructured” data, all of which may contain sensitive elements and entail risk for the enterprise, including being subject to regulations. The demand for a broader array of discovery tools is growing, and devising techniques for recognizing and protecting the sensitive elements in such data is an ongoing area of development.
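A minimal sketch of the idea, assuming the unstructured data arrives as plain text: a few illustrative regular expressions flag candidate sensitive elements. Real discovery tools go much further, combining dictionaries, validation logic, and machine learning-based entity recognition, and handling audio, video, and sensor formats.

```python
import re

# Illustrative patterns only; a production tool would use far richer rules.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "gps_coordinate": re.compile(r"-?\d{1,3}\.\d{3,},\s*-?\d{1,3}\.\d{3,}"),
}

def discover_sensitive(text: str) -> dict:
    # Scan free text (e.g., a social post or support transcript) and report
    # candidate sensitive elements by type.
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found

sample = "Customer jane@example.com reported the issue from 41.8781, -87.6298"
print(discover_sensitive(sample))
```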
Data Usage Analytics
Even the best data protection is not 100% foolproof, as a privileged user’s account may still be hijacked, or encryption keys compromised. Additionally, a given sensitive dataset may have originally been intended for a specific business need, but that dataset’s usage may grow over time as business needs change, thus raising the risk profile of that dataset. For these reasons, data usage analytics is emerging as an important requirement. This includes trend analysis as well as anomalous behavior analysis. Machine learning can be leveraged to establish baselines of “normal” usage, define smarter alerts, and reduce the number of false positives over time.
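As a rough sketch of the baseline idea, the following snippet flags a day of unusually heavy access to a dataset using a simple z-score; the access counts and threshold are made-up values, and a production tool would replace this with per-user, per-application machine learning models.

```python
import statistics

def flag_anomalous_usage(daily_counts, today_count, threshold=3.0):
    # Compare today's access count with the historical baseline; a z-score
    # stands in for the richer machine learning models a real data usage
    # analytics tool would train per user, application, and dataset.
    mean = statistics.mean(daily_counts)
    stdev = statistics.stdev(daily_counts) or 1.0  # guard against zero spread
    z = (today_count - mean) / stdev
    return z > threshold, z

# Hypothetical history: a service account usually reads about 200 customer
# records a day, then suddenly reads 5,000.
history = [180, 210, 195, 205, 190, 220, 200, 185, 215, 198]
alert, score = flag_anomalous_usage(history, today_count=5000)
print(alert, round(score, 1))
```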
Named Subject Search
Many regulations, including GDPR and CCPA, include the “right to be forgotten” and other “opt-out” options that give consumers the right to request that their data not be used in certain ways, or even deleted outright. Yet, with data often being shared among multiple applications, analytics tools, and even third parties, it can be difficult to ensure that all of a given person’s data has been identified. Data quality issues (such as errors in a person’s name or address) as well as data-formatting differences across systems create additional challenges. Tools for uniquely identifying a given subject’s data records are coming to market, with ongoing research into machine learning techniques leading to improved performance and accuracy.
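A minimal sketch of the matching problem, using Python's standard-library SequenceMatcher as a stand-in for the machine learning techniques mentioned above; the record layout, names, and similarity threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Crude string similarity; production matching would combine many fields,
    # normalization rules, and trained models.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_subject_records(request_name: str, records, threshold: float = 0.85):
    # Return records that likely belong to the person making an opt-out or
    # deletion request, despite typos or formatting drift across systems.
    return [r for r in records if similarity(request_name, r["name"]) >= threshold]

# Hypothetical records from different systems with data quality issues.
records = [
    {"system": "crm", "name": "Jonathan Q. Smith"},
    {"system": "billing", "name": "Jonathon Smith"},  # misspelled
    {"system": "support", "name": "J. Smith"},        # too ambiguous to match
    {"system": "crm", "name": "Maria Lopez"},
]
print(find_subject_records("Jonathan Smith", records))
```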
What’s Ahead
The waters are becoming murky for the majority of organizations that use data for analytics. With hackers becoming more sophisticated in their techniques, and data handling regulations multiplying, it is easy to see how an organization can be quite vulnerable to data privacy breaches and to penalties for non-compliance. Taking a data-first security approach can help keep businesses on the right side of the law.