Big Data Poses Security Risks
Big data refers to the massive amounts of structured and unstructured data that are difficult to process using traditional data management tools and techniques. Big data can inform enterprise operations and offer business advantages, but the present methods of mining and managing it are still evolving and pose serious security and privacy challenges. Confronting these challenges is essential if the potential of big data is to be fully exploited.
According to Gartner, 64% of enterprises already use or plan to use big data.1 Similarly, NewVantage Partners found, in a 2013 study that focused heavily on large finance companies, that 68% of respondents had spent at least $1 million on big data, about twice the number that gave the same answer in 2012.2 The rising popularity of big data, at least among large enterprises, may force other companies to keep pace. And even companies that do not choose to employ big data themselves may use cloud services that do. Like enterprises with their own big data programs, customers of cloud services will need to understand the security issues that big data creates and how they can be mitigated.
Understanding Big Data Security Threats
Big data environments are expansive and technically complex, characteristics that in themselves create security problems. The scope of big data can make it difficult, for instance, to control and monitor the rights that users have to access particular files and resources.
Discussing this problem in Forbes, Davi Ottenheimer, a senior director at EMC, explains that with the scale of big data, problems can easily emerge, but finding the cause of such problems is difficult.3 And a study by the Cloud Security Alliance (CSA) found that, in addition to their unwieldy size, these environments have a variety of data types, and that much of the data is streaming instead of static. Combined, these characteristics render many common security approaches ineffective.4
The CSA report divides big data security threats into the following four categories:
- Infrastructure Security - Big data infrastructures are distributed across many servers and often across multiple networks, so pulling data from them requires approaches, like MapReduce, that are not used in traditional environments. These mapping technologies are vulnerable to special types of attacks, in which hackers spy on transactions or alter the results of operations. The data sources themselves are also open to attack. NoSQL databases are commonly used in these environments, and they can be targeted by injection attacks in which hackers insert their own code into a database application.
- Data Privacy - Concerns about privacy loom over many big data security discussions. In most cases, a breach of a big data service will be a privacy breach. Big data projects typically store consumer data that most users will expect to be private. To maintain the confidentiality of data in such an environment, access control must be managed at a very granular level, which is difficult and takes significant effort.
- Data Management - The dispersed, often multi-tiered nature of big data architectures makes managing data difficult. In particular, it is difficult to determine data's "provenance," that is, the source from which it came and the history of its creation and modifications. Yet these factors are critical when evaluating the risks posed by a piece of data and when enforcing reputation-based security schemes. Provenance is also an important issue for complying with regulations like PCI and Sarbanes-Oxley.
- Integrity and Reactive Security - Data that enterprises gather can be dangerous, possibly because it has been planted by hackers. This threat compels organizations to find ways to validate data. One way is to use real-time analysis, which can alert enterprises to potential problems. Such analysis relies on algorithms that are constructed to filter out data that appears suspicious, either because of its content or its source.
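The injection risk described under Infrastructure Security can be made concrete with a small sketch. It assumes a MongoDB-style query document, in which a nested value such as {"$ne": None} is read as an operator rather than as data. The helper names and the login scenario are hypothetical and do not come from the CSA report, and no real database driver is called.

```python
from typing import Any, Dict


def build_filter_unsafe(username: Any, password_hash: Any) -> Dict[str, Any]:
    # Hypothetical helper: passes request values straight into the filter.
    return {"username": username, "password_hash": password_hash}


def build_filter_checked(username: Any, password_hash: Any) -> Dict[str, str]:
    # Hypothetical helper: accepts plain strings only, so a nested
    # document such as {"$ne": None} is rejected before any query runs.
    for field, value in (("username", username), ("password_hash", password_hash)):
        if not isinstance(value, str):
            raise ValueError(f"{field} must be a string")
    return {"username": username, "password_hash": password_hash}


if __name__ == "__main__":
    # Attacker-controlled JSON body: the password is an operator document
    # meaning "not equal to null", which would match any stored hash.
    payload = {"username": "admin", "password_hash": {"$ne": None}}
    print(build_filter_unsafe(**payload))
    try:
        build_filter_checked(**payload)
    except ValueError as err:
        print("rejected:", err)
```

Type checking is only one possible defense. A real deployment would likely combine it with schema validation and least-privilege database accounts.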
Apache Hadoop Poses Specific Security Issues
Specific risks emerge from Apache Hadoop, the most popular big data tool for dividing data sets across many servers. Hadoop has been pressed into a broader, more mission-critical range of uses than it was originally designed to address, and the industry has been playing catch-up to add security capabilities that Hadoop did not originally have. For instance, its authentication method, Kerberos, does not offer security features that many enterprises may demand, including the ability to control access based on roles and to integrate with directories like LDAP and Active Directory.5
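To show what controlling access based on roles means in practice, here is a minimal sketch of a role-based check that could sit on top of an authentication step. It is not Hadoop's or Kerberos's API. The role names, user names, and resource paths are invented for illustration, and a real system would typically load role assignments from a directory service rather than from in-memory tables.

```python
from typing import Dict, FrozenSet, Set, Tuple

Permission = Tuple[str, str]  # (resource path, action)

# Hypothetical mapping from roles to the permissions they grant.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "analyst": frozenset({("/data/sales", "read")}),
    "etl": frozenset({("/data/sales", "read"), ("/data/sales", "write")}),
}

# Hypothetical mapping from already-authenticated users to their roles.
USER_ROLES: Dict[str, Set[str]] = {"alice": {"analyst"}, "bob": {"etl"}}


def is_allowed(user: str, resource: str, action: str) -> bool:
    """Return True if any of the user's roles grants the requested action."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, frozenset())
        for role in USER_ROLES.get(user, set())
    )


if __name__ == "__main__":
    print(is_allowed("alice", "/data/sales", "read"))
    print(is_allowed("alice", "/data/sales", "write"))
```

Authentication establishes who a user is. A check like this one decides what that user may do, which is the capability the paragraph above says enterprises may demand.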