Taking Control of Your Data

It's no secret that today's enterprises are drowning in data. The explosive growth of unstructured and semi-structured information in particular has significant ramifications for governance, protection and management practices. In this age of information overload, the following best practices help organizations of all sizes manage, control and protect unstructured data.

Unstructured data consists of disparate files such as documents, images and spreadsheets stored on file servers and NAS devices, along with content held in semi-structured repositories like SharePoint. This data type presents a unique challenge for enterprises. Unlike structured data repositories (such as databases), which have defined data types and rules that enforce where data is stored, unstructured data is in a constant state of flux. Documents are created around the clock by virtually all employees, and are often stored in user-defined directory structures with few rules about what data is stored, where it is stored, and what each file contains. Whether the data represents customer information, product design documents, HR and legal documentation, blueprints, images, or audio and video files that relate to the business, unstructured and semi-structured enterprise data must be effectively classified, protected and managed.

The Unstructured Data Challenge

With so many individuals creating and storing documents, files and other content, the volume of unstructured data is growing rapidly. Organizations can quickly become overwhelmed by the task of managing and protecting this large, constantly changing pool of data. The increasing number of teams and groups within an organization, combined with added management and security requirements, causes the number of folders and SharePoint sites to grow even faster than the information stored in them.

Access is key. As a rule of thumb, access to each data set and container, such as a folder or SharePoint site, should be restricted to a list of team members and other appropriate users. To remain effective, this list must reflect organizational changes and changes to the data's sensitivity. Each set of data therefore represents an organizational decision: "Who should have access to this data?" More containers mean more decisions, and more maintenance.

This increasing complexity widens an already sizable information gap between end users, data, and IT. Data containers without an owner can only be maintained on a best-effort basis by IT, which often lacks insight into who can and does access the data they contain.

Without knowing the appropriate owner of a particular data set, it is impossible for IT to answer pressing data governance questions such as "Who has access to a particular data set?", "Who has unnecessary permissions to each data set?", "What other data have they been accessing?", "Which data is sensitive?" and "Who is the likely data owner?" The job of defining ownership should lie primarily with the data owner.

Organizations today are beginning to understand that, second only to their employees, data is their most critical asset. Consequently, they need to approach data management as they approach capital management: by employing disciplined methodologies that use automation and actionable intelligence. Once employed, these methodologies secure and protect unstructured data in a scalable and repeatable fashion, without requiring additional intervention from IT personnel or disturbing business processes.

The following best practices present a set of guidelines organizations can use to begin taking control of their data.

Identify Data Owners

Many organizations lack a process to identify data owners. One of the most effective ways to determine data owners is to track which users access the data most frequently over time. Working with this top set of users, organizations can determine who should - and should not - have access, what type of protection the data should have, and when the data is redundant.
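
The frequency-based approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: it assumes access events are already available as (user, folder) pairs, and the function name likely_owners and the sample events are hypothetical.

```python
from collections import Counter, defaultdict

def likely_owners(access_events, top_n=3):
    """Rank the most frequent accessors of each folder.

    access_events: iterable of (user, folder) pairs, one per recorded access.
    Returns {folder: [(user, access_count), ...]} limited to the top_n users.
    """
    counts = defaultdict(Counter)
    for user, folder in access_events:
        counts[folder][user] += 1
    return {folder: c.most_common(top_n) for folder, c in counts.items()}

# Hypothetical sample events; a real audit log would cover weeks or months.
events = [
    ("alice", "/finance/reports"),
    ("alice", "/finance/reports"),
    ("bob", "/finance/reports"),
    ("carol", "/hr/reviews"),
    ("alice", "/finance/reports"),
    ("carol", "/hr/reviews"),
    ("bob", "/hr/reviews"),
]
candidates = likely_owners(events, top_n=2)
```

The top-ranked users are only candidates. Someone in the business still needs to confirm who the actual owner is.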

Define Data of Interest

Enterprise IT personnel should work closely with data owners, security and risk managers to identify keywords, phrases and patterns that distinguish the data of interest. Common types of information requiring special attention are intellectual property, customer data and employee information.
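
As a rough sketch of how agreed keywords, phrases and patterns might be applied to file contents, the Python example below matches text against a small pattern table. The category names follow the three types listed above; the specific patterns are hypothetical placeholders that a real project would replace with terms agreed with data owners and risk managers.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    # A number shaped like a US Social Security number
    "customer_data": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "intellectual_property": re.compile(
        r"\b(confidential|patent pending|trade secret)\b", re.IGNORECASE
    ),
    "employee_information": re.compile(
        r"\b(salary|performance review)\b", re.IGNORECASE
    ),
}

def classify(text):
    """Return the sorted names of all categories whose pattern appears in text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))

labels = classify("Salary record for employee 123-45-6789")
```

Simple patterns like these produce false positives, so the matches are best treated as leads for review rather than final classifications.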

Use Metadata to Focus and Accelerate

Metadata is data about enterprise data, such as file sizes, types and locations, and it can be used to focus and accelerate the classification process. Metadata that can enhance data classification includes access permissions (especially for data that is overly accessible); access activity (which identifies the folders that are used most frequently and those that are not used at all); and ownership information (which helps limit searches to data owned by specific people).
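
One way metadata could focus a classification effort is by deciding which folders to scan first. The sketch below is an assumption-laden illustration: the FolderMeta record, its fields and the 90-day window are hypothetical, and the ordering rule (widely exposed folders first, then the most active ones) is just one reasonable choice.

```python
from dataclasses import dataclass

@dataclass
class FolderMeta:
    path: str
    open_to_everyone: bool       # hypothetical flag for overly accessible data
    accesses_last_90_days: int   # hypothetical activity counter

def scan_order(folders):
    """Split folders into a prioritized scan list and a list of stale candidates.

    Active folders are ordered with widely exposed ones first, then by activity.
    Folders with no recent access are returned separately.
    """
    active = [f for f in folders if f.accesses_last_90_days > 0]
    stale = [f.path for f in folders if f.accesses_last_90_days == 0]
    active.sort(key=lambda f: (not f.open_to_everyone, -f.accesses_last_90_days))
    return [f.path for f in active], stale

folders = [
    FolderMeta("/projects/alpha", False, 10),
    FolderMeta("/shared/misc", True, 2),
    FolderMeta("/archive/2009", False, 0),
    FolderMeta("/shared/customers", True, 40),
]
to_scan, stale = scan_order(folders)
```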

Report & Remediate

As data access needs change, permissions are seldom revoked, and users accumulate more and more authorizations over time. The results of the ownership, classification and permissions analysis should be given to decision-makers - typically the data owners and governance/risk/compliance teams - so they understand the situation and can begin formulating remediation strategies and plans.
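
A simple report that supports this step compares who has been granted access with who has actually used it. The sketch below assumes both sets are already available per folder; the function name unused_permissions and the sample data are hypothetical.

```python
def unused_permissions(granted, accessed):
    """Find users who hold access to a folder but show no recorded activity.

    granted:  {folder: set of users with permission}
    accessed: {folder: set of users seen in the access log}
    Returns {folder: sorted list of idle users}, omitting folders with none.
    """
    report = {}
    for folder, users in granted.items():
        idle = users - accessed.get(folder, set())
        if idle:
            report[folder] = sorted(idle)
    return report

# Hypothetical sample data.
granted = {
    "/finance/reports": {"alice", "bob", "dave"},
    "/hr/reviews": {"carol"},
}
accessed = {
    "/finance/reports": {"alice"},
    "/hr/reviews": {"carol"},
}
report = unused_permissions(granted, accessed)
```

An idle permission is not automatically unnecessary, which is why the report goes to data owners for a decision rather than triggering automatic revocation.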

Reassess Data Often

Unstructured data keeps growing, is constantly modified and eventually goes stale. It should therefore be re-scanned periodically to ensure that an organization maintains an accurate view of its sensitive data.
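
Periodic re-scanning does not have to start from scratch each time. As a minimal sketch, assuming the time of the previous scan is recorded and that file modification times are trustworthy, the Python example below lists only the files changed since then. The function name files_to_rescan is hypothetical, and the demo runs on a throwaway directory.

```python
import os
import tempfile

def files_to_rescan(root, last_scan_time):
    """Return sorted paths under root modified after last_scan_time (epoch seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_scan_time:
                changed.append(path)
    return sorted(changed)

# Demo: two files with forced modification times around a scan at t=2000.
with tempfile.TemporaryDirectory() as root:
    old_file = os.path.join(root, "old.txt")
    new_file = os.path.join(root, "new.txt")
    for path in (old_file, new_file):
        with open(path, "w") as handle:
            handle.write("sample")
    os.utime(old_file, (1_000, 1_000))  # modified before the last scan
    os.utime(new_file, (3_000, 3_000))  # modified after the last scan
    changed_names = [os.path.basename(p) for p in files_to_rescan(root, 2_000)]
```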

Next Steps

Depending on the volume of data within a particular organization, implementing the above efforts can have a meaningful impact on an organization's data management policy. However, when one considers that a single terabyte of data will often contain 50,000 folders (of which 2,500, or 5%, will have unique permissions), the amount of work it takes to manage unstructured data can be overwhelming, and manual processes can quickly break down.
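
To get a feel for the scale, the per-terabyte figures above can be extrapolated to a larger estate. The sketch below assumes, purely for illustration, that folder counts scale linearly with data volume; the function name and the 20-terabyte example are hypothetical.

```python
FOLDERS_PER_TB = 50_000        # figure quoted in the text
UNIQUE_PERMISSION_PERCENT = 5  # figure quoted in the text

def estimate_folders(terabytes):
    """Return (total_folders, folders_with_unique_permissions).

    Assumes folder counts grow linearly with volume, which is a simplification.
    """
    total = FOLDERS_PER_TB * terabytes
    unique = total * UNIQUE_PERMISSION_PERCENT // 100
    return total, unique

total, unique = estimate_folders(20)  # 1,000,000 folders, 50,000 with unique permissions
```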

Automated solutions take much of the legwork out of the above data management requirements by leveraging metadata (data about the data). Metadata is critical because it helps expedite data classification projects and deliver results within the context required to quickly remediate problems.

Data and its associated protection and management requirements are growing at an extraordinary pace. Data-intensive organizations require a scalable framework and an automated process to manage critical data-related tasks such as complying with new regulatory requirements, establishing archive policies, meeting intellectual property requirements, and adhering to personal confidentiality laws that mandate additional protections.