There are two types of businesses in the world today: those that run on data and those that will run on data. Data security now sits at the top of nearly every organization's priority list. But with such a high volume of data coming into most businesses every day, how can information security professionals quickly identify which data is the highest priority for protection? After all, security costs time and money, and not all types of data are as sensitive or vulnerable as others.
It is for this very reason that data discovery and classification techniques are making a significant resurgence. In fact, global analyst houses such as Forrester and Gartner have emphasized that a renewed focus on data classification is now foundational to establishing an effective data security strategy.
What is Data Classification?
Data classification is a process of consistently categorizing data based on specific and pre-defined criteria so that it can be efficiently and effectively protected. In addition to simplifying security strategies, data classification can greatly assist companies in meeting governance, compliance or regulation mandates such as PCI DSS and GDPR, as well as protecting important intellectual property.
Here are some of the most common misperceptions around data classification:
- It takes too long to provide value - Automated classification drives insights from day one. Automation for both context and content brings order to all your sensitive data both quickly and easily. Data collection and visibility can continue until the organization is prepared to deploy and operationalize a policy, and even without a policy, insights from automated data classification can drive security improvements.
- It is too complicated - Many data classification projects get bogged down because of overly complex classification schemes. When it comes to classification, adding more sets often adds complexity, not quality. Start with three categories to dramatically simplify getting your program off the ground. If more schemes or sets are needed after deployment, your decision will be driven by data, not speculation.
- It is just another layer of bureaucracy for my organization - Data classification can be an enabler and a way to simplify data protection. By understanding what portion of your data is sensitive, resources can be allocated appropriately and everyone understands what needs to be protected. Sensitive and regulated data is prioritized, and public data is given lower priority (or destroyed) to eliminate the future risk of its theft.
It’s easier to manage the data deluge with classification. Considering how fast the volume of new data is growing, an InfoSec professional responsible for protecting an organization’s digital assets must take a new approach to stay ahead of the challenge. Classification enables them to avoid both the inefficiency of a "one size fits all" approach and the risk of arbitrarily choosing which data to spend resources protecting.
How Can Businesses Implement an Effective Classification Strategy?
Every business has different data classification needs to address, so a strategy must be tailored accordingly. The following five-point action plan can be used to create the foundation of an effective strategy for nearly any business.
Define a data classification policy
What are the goals, objectives and strategic intent? Each organization must clearly communicate how classification can support increased revenue, trim costs and reduce risk to achieve buy-in from the executive leadership team. Once this has been accomplished, it is equally important to make sure users are aware of data classification policies and to ensure they understand why a program is being put in place. An effective policy must also balance the confidentiality and privacy of employees and users against the integrity and availability of the data being protected. A policy that is too stringent can alienate staff and impede their ability to carry out their jobs, but if it’s too lax, the very data the business is trying to protect could be put at risk.
Establish the scope
It is important to establish where the boundaries will be early on; otherwise, data classification efforts can quickly grow out of control. This is particularly important when considering partners and third parties. In setting up a scope for their data classification program, organizations must consider how far into their network they aim to reach, and whether it is even feasible. It is equally important to consider legacy and archived data. Where is this data and how will it be protected? Finally, make sure to note anything that’s out of scope and ensure this is evaluated and adjusted regularly.
There are a few key questions organizations must ask as they define their “buckets” in order to help guide these efforts and get the program started:
- What are the data types (structured vs. unstructured)?
- What data needs to be classified?
- Where does sensitive data live?
- What are some examples of classification levels?
- How can data be protected and which controls should be used?
- Who is accessing the data?
As they establish these buckets, organizations may need to weigh carefully the sensitivities around highly specific types of data, e.g.:
- Personal data and unique identifiers: this includes date of birth, name and address, credit card information and health records, as well as certain online identifiers such as location data and mobile device ID
- Pseudonymous data: this is personal information that can no longer be attributed to an individual without additional information, where that additional information is kept separately and the data is protected via measures like hashing or encryption
- Genetic and biometric data: information such as fingerprints, facial recognition data and retinal scans
- Other types of data, such as criminal records, which naturally carry more impact and more requirements
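To make the idea of pseudonymous data more concrete, here is a minimal Python sketch that replaces a direct identifier with a keyed hash. The function name, the record fields and the key are hypothetical and chosen purely for illustration; the point is that the key plays the role of the "additional information" and would be stored apart from the data it protects.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the identifier."""
    # Normalize so the same person always maps to the same token.
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key: in practice it would be kept separately from the data.
SECRET_KEY = b"example-key-stored-separately"

record = {"email": "jane.doe@example.com", "plan": "premium"}

# The stored record keeps the token, not the email address itself.
safe_record = {
    "user_token": pseudonymize(record["email"], SECRET_KEY),
    "plan": record["plan"],
}
```

Without the key, the token cannot easily be linked back to the email address, while anyone holding the key can still recompute the token to match records. A keyed hash is only one possible measure; encryption or a lookup table held elsewhere would serve the same purpose.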
“Discover” all sensitive data defined in the scope
Once data policy and scope have been established, the next task is to identify all the sensitive data that requires classification and protection within the business. First, understand what data you are looking for. This could take many forms, ranging from personally identifiable information (PII), payment card numbers and healthcare records through to business IP, source code, proprietary formulas, etc. Next, focus on where this data is likely to be found, from endpoints and servers to on-site databases and the cloud. Remember that discovery is not a one-time event: it should be continuously re-evaluated, taking into account data at rest, data in motion and data in use across all business platforms.
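As a rough illustration of what a discovery pass might look for, the Python sketch below scans text for payment card numbers. It is a simplified example rather than a production scanner: the pattern and function names are assumptions, and the Luhn checksum is used only to filter out digit runs that cannot be valid card numbers.

```python
import re
from typing import List

# Candidate pattern: 13 to 19 digits, optionally separated by single
# spaces or hyphens. This is an illustrative assumption, not a standard.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return candidate card numbers in the text that pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


sample = "Card: 4111 1111 1111 1111, ref 1234 5678 9012 3456"
hits = find_card_numbers(sample)
```

Here `hits` contains only the first number, a widely used test card number that passes the checksum, while the second digit run is discarded. A real discovery tool would apply many such detectors across endpoints, servers, databases and cloud storage, and would rerun them on a schedule.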
Evaluate appropriate solutions
When the time comes to identify an appropriate data classification solution, there are plenty of options to choose from. Many of the best solutions today are automated, and classification can be context-based (e.g., file type, location) and/or content-based (e.g., fingerprinting, regular expressions). This option can be expensive and may require a high degree of fine-tuning, but once up and running it is extremely fast and classification can be repeated as often as desired.
An alternative to automated solutions is a manual approach which allows users themselves to choose the classification of a file. This approach relies on a data expert to lead the classification process and can be time-intensive, but in businesses where the classification process is intricate and/or subjective, a manual approach can often be preferable.
A final option is to outsource the classification process to a service provider or consulting firm. This approach is rarely the most efficient or cost-effective, but can provide a one-time classification of data and give any business a good idea of where it stands in terms of compliance and risk.
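The difference between context-based and content-based classification can be sketched in a few lines of Python. Everything here is hypothetical and deliberately simple: the three labels, the folder and file-type rules, and the regular expressions are illustrative assumptions, not the rule set of any particular product.

```python
import re
from pathlib import PurePosixPath

# Hypothetical three-level scheme, ordered from least to most sensitive.
LEVELS = ["public", "internal-only", "regulated"]

# Content rules: look inside the file for telltale patterns.
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "regulated"),    # SSN-like number
    (re.compile(r"(?i)\bconfidential\b"), "internal-only"),  # marked documents
]

# Context rules: look only at where the file lives and what type it is.
CONTEXT_RULES = [
    (lambda path: "hr" in path.parts, "regulated"),
    (lambda path: path.suffix in {".sql", ".py"}, "internal-only"),
]


def classify(path_str: str, content: str) -> str:
    """Return the most sensitive label suggested by any matching rule."""
    path = PurePosixPath(path_str)
    labels = ["public"]  # default when no rule matches
    labels += [label for rule, label in CONTEXT_RULES if rule(path)]
    labels += [label for pattern, label in CONTENT_RULES if pattern.search(content)]
    return max(labels, key=LEVELS.index)


label = classify("/shares/marketing/notes.txt", "Customer SSN 123-45-6789")
```

In this example the file's location suggests nothing sensitive, but the content rule matches, so the file is labeled as regulated. Taking the most sensitive label from either kind of rule is one reasonable design choice; the fine-tuning mentioned above largely consists of adjusting rules like these until false positives and false negatives are acceptable.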
Ensure feedback mechanisms are in place
The final stage is to ensure there are effective feedback mechanisms in place that allow swift reporting both up and down the business hierarchy. As part of this, data flow should be analyzed regularly to ensure classified data isn’t moving in unauthorized ways or resting in places it shouldn’t be. Any issues or discrepancies should be immediately flagged for follow-up.
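One simple way to picture this kind of data flow analysis is as a check of observed file movements against a policy of permitted destinations for each classification label. The Python sketch below assumes a hypothetical event format, label set and destination names; a real deployment would draw these from its own monitoring tools and policy.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical policy: where data with each label is allowed to reside.
ALLOWED_DESTINATIONS = {
    "regulated": {"encrypted-db"},
    "internal-only": {"encrypted-db", "file-server"},
    "public": {"encrypted-db", "file-server", "public-website"},
}


@dataclass(frozen=True)
class FlowEvent:
    file_name: str
    label: str
    destination: str


def find_violations(events: List[FlowEvent]) -> List[FlowEvent]:
    """Return events whose destination is not permitted for their label."""
    return [
        event
        for event in events
        # Unknown labels have no allowed destinations, so they are flagged.
        if event.destination not in ALLOWED_DESTINATIONS.get(event.label, set())
    ]


events = [
    FlowEvent("payroll.csv", "regulated", "encrypted-db"),
    FlowEvent("payroll.csv", "regulated", "public-website"),
    FlowEvent("brochure.pdf", "public", "public-website"),
]
violations = find_violations(events)
```

Only the second event is flagged, because regulated data has moved somewhere the policy does not allow. The resulting list is the kind of discrepancy that would be reported for follow-up.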
While it’s tempting to think your organization has gotten along just fine without a classification strategy, taking a passive approach to the challenge is akin to saying, “I’ve never needed insurance in the past.” It reflects a lack of understanding of the importance of classification or a misperception that it is only for more mature organizations. While businesses can protect their data without classification, it comes at the expense of efficiency.
With a strong classification strategy, organizations will be able to better understand the difference between regulated, internal-only, and public data. This insight allows data risks to be prioritized according to the potential impact of a breach. Without classification, data protection solutions, including data loss prevention and advanced threat protection, will be more prone to false positives and false negatives, and alerts will be of lower fidelity.
With data now playing a pivotal role in nearly every business around the world, the ability to track, classify and protect it is no longer a luxury. An effective data classification strategy should sit as a cornerstone of any modern security initiative, allowing businesses to quickly identify which data is most valuable to the organization and ensure it is safe at all times.