<< back Page 2 of 2

Securing Your Cloud Data Lake With an In-Depth Defense Approach

Avoid Data Copies

Most of the data breaches that we have witnessed are the result of human error. However, most of those errors stem from the complexity of the infrastructure that people are dealing with. Most of the issues and breaches seen today happen because someone lacked a basic understanding of how changing a single security setting can affect an entire cloud environment.

The worst enemy of data security and governance is the lack of a self-service environment. By allowing users to utilize a self-service environment on the data lake, enterprises eliminate the need for users to create multiple copies of the same data every time they need to make a small change to it. Data lake engines allow users to query data directly from the data lake, thus eliminating the need for data copies, which are hard to secure.
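The idea of querying in place rather than copying can be sketched in a few lines. The example below is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data lake, and the `orders` table, the `emea_orders` view, and the helper names are hypothetical, not part of any data lake engine's API. The analyst's "small change" (a filter and a unit conversion) lives in a view, so there is still only one stored copy of the data to secure.

```python
import sqlite3


def build_lake():
    """Create an in-memory stand-in for a data lake table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (id, region, amount_cents) VALUES (?, ?, ?)",
        [(1, "emea", 1250), (2, "apac", 990), (3, "emea", 400)],
    )
    # The analyst's "small change" is a view over the source, not a new copy.
    conn.execute(
        "CREATE VIEW emea_orders AS "
        "SELECT id, amount_cents / 100.0 AS amount FROM orders WHERE region = 'emea'"
    )
    return conn


def emea_totals(conn):
    """Return (row count, total amount) as seen through the view."""
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM emea_orders").fetchone()


conn = build_lake()
before = emea_totals(conn)

# New data lands in the single source table; the view reflects it immediately.
conn.execute("INSERT INTO orders (id, region, amount_cents) VALUES (4, 'emea', 350)")
after = emea_totals(conn)

print(before, after)
```

Because the view stores no rows of its own, permissions and auditing only need to be applied to the one source table, and the analyst never works from a stale extract.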

Keep It Simple

There are two types of security risks: exogenous (driven by external attackers) and endogenous (driven by employees exposing data). Unfortunately, many of the systems put in place to protect against the former increase the latter. Security systems can be so complex that users try to work around them. To manage this type of risk, enterprises need to focus on simplicity and give users enough tools that they don’t attempt to work outside the system that is in place for them.

Enterprises should provide a governed mechanism for data sharing that avoids disconnected copies. They should also avoid restricting access to data unnecessarily, since doing so only stifles self-service and drives users to less-governed alternatives.

Enterprises should also enable coarse-grained ownership when possible. The scalability and elasticity of the cloud make it easier to create separate resources for different teams. Full resource isolation is emerging as a common model for data lakes and data warehouses, allowing data teams to use their resources without sharing them with other organizational units. Because each team’s resources are isolated, access control is also easier to set up and maintain.
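To make coarse-grained ownership concrete, here is a minimal sketch, assuming a model in which each team owns one isolated workspace and access is decided at the workspace level rather than per table or per column. The `Workspace` class, the `can_access` function, and the team and dataset names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """An isolated set of resources owned by exactly one team (hypothetical model)."""
    name: str
    owner_team: str
    datasets: frozenset = frozenset()


def can_access(user_team: str, workspace: Workspace) -> bool:
    """Coarse-grained rule: a team can use everything in the workspace it owns."""
    return user_team == workspace.owner_team


marketing_ws = Workspace("marketing-lake", "marketing", frozenset({"campaigns", "leads"}))
finance_ws = Workspace("finance-lake", "finance", frozenset({"ledger"}))

print(can_access("marketing", marketing_ws))  # marketing uses its own workspace
print(can_access("marketing", finance_ws))    # but not the finance workspace
```

The whole policy is a single comparison per workspace, which is the point: there are few rules to misconfigure, and an administrator can review who owns what at a glance.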

Self-Service Is the Foundation of Governance

Implementing security measures that keep attackers in check and prevent data from leaking out can be a daunting task, and measures that are implemented poorly can backfire: security policies do more harm than good if users perceive them as roadblocks. The good news is that self-service is emerging as the fundamental element of data security and governance. It gives users access to properly secured and curated data, so they have no need to work around security hurdles to complete their jobs. It also allows admins to keep track of which actions are being taken against which assets through features such as data lineage and activity monitoring.

When implementing a security model on your cloud data lake, always start simply, and add complexity only as needed while keeping the user experience in mind. This way, the number of security mishaps with endogenous causes can be kept to a minimum.
