Shortening the Path to Innovation With Big Data as a Service

As the business value of big data increases, vendors are offering cloud-hosted big data technology, known as big data as a service (BDaaS). Why choose BDaaS instead of on-premises deployments? Do you lack internal IT expertise in big data? Is the upfront cost for a cluster an issue? Or do you not have the luxury of time to build a cluster? These are all important questions when considering BDaaS.

For data scientists and business users, the main benefits of BDaaS are faster spin-up of clusters, lower upfront costs, reduced IT overhead, and fewer software dependencies. For example, a product recommender system can be implemented on a 10-node Hadoop cluster in AWS for a reasonable cost (e.g., $21/day for hourly processing [1]). This compares favorably with the cost and lead time of procuring and setting up an internal cluster, which is particularly attractive if your internal IT team lacks big data experience. BDaaS can be ideal for handling new initiatives, particularly exploratory ones.

How should you choose a BDaaS provider? The type of analytics should be a key decision factor, whether it is business intelligence (BI), prepackaged predictive analytics solutions, or custom-built analytic applications. Providers typically support one or more of these types. With BI delivered through BDaaS, the user can focus on the features of the BI tool, such as geographic sales analysis, rather than on ensuring that the IT organization supports the long chain of tool dependencies. Predictive analytics solutions are automated modeling tools for which the underlying technologies matter less than the ease of deriving analytic scores for decision making. Here, BDaaS is ideal, as it allows the business to focus immediately on building models without waiting for hardware and software to be installed or for the necessary data science expertise to be acquired. Other enterprises have teams of experienced data scientists and developers who want to design custom applications, along with all the necessary data-handling, error-processing, and security requirements. For them, the available technology stack is critical, whether Spark, Java, R, PMML, NoSQL, or Docker. Even for experienced developers, getting big data technology to work often requires external support, so the quality and extent of the expert support the provider offers across its entire stack should be evaluated.

The compliance officer also needs to be involved in evaluating BDaaS. Many BDaaS providers have made considerable efforts to achieve regulatory compliance with PCI, HIPAA, etc. BDaaS allows less physical control over data, so it raises data governance concerns, both for data in transit and for data stored on the third-party system. Your BDaaS provider must meet the same stringent compliance and audit requirements that apply to your in-house systems. Ensure that the provider holds any applicable regulatory certifications, validate that access to your data is appropriately controlled, and ensure that the provider uses effective cybersecurity software to protect data assets. Data stored with a BDaaS provider is typically managed through a single account, and the employees who control those accounts must be carefully vetted. Recent news articles discuss cases of former employees attempting to take valuable data assets to new organizations (e.g., through control of AWS credentials [2]). When employees leave on unfriendly terms or plan harm to the enterprise, the single point of control over this data through the BDaaS account is troubling and potentially damaging to your enterprise.

Responsibility for data security cannot be fully entrusted to the BDaaS provider. Rather, the business has to take full responsibility for regularly auditing the provider to validate that the data is secure and protected against compromise. Further, the business may choose to minimize its risk by hashing all sensitive data elements (payment card numbers, PII, national ID, etc.) so that sensitive data is never stored in the cloud. Some solutions can also be designed from the outset of a big data project to avoid using data elements that pose a high risk if compromised.
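As a rough illustration of hashing sensitive elements before upload, the following Python sketch replaces selected fields with a keyed hash (HMAC-SHA256) while leaving other fields untouched. The field names, the sample record, and the choice of a keyed hash rather than a plain hash are assumptions made for this example, not a prescribed method; the key is assumed to stay on-premises and never be sent to the provider.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the schema of your own data.
SENSITIVE_FIELDS = frozenset({"card_number", "national_id", "email"})


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub_record(record: dict, key: bytes,
                 sensitive_fields=SENSITIVE_FIELDS) -> dict:
    """Copy a record, replacing sensitive fields with their keyed hash."""
    scrubbed = {}
    for name, value in record.items():
        if name in sensitive_fields and value is not None:
            scrubbed[name] = pseudonymize(str(value), key)
        else:
            scrubbed[name] = value
    return scrubbed


if __name__ == "__main__":
    # Illustrative key only; in practice it would come from a secrets store.
    secret_key = b"example-key-kept-on-premises"
    record = {"customer_id": 1001,
              "card_number": "4111111111111111",
              "purchase_total": 59.90}
    print(scrub_record(record, secret_key))
```

Because the same input and key always produce the same digest, hashed fields can still be used to join or group records in the cloud. A keyed hash is used here because values with a small range, such as card numbers, can be recovered from an unkeyed hash by exhaustive guessing.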

BDaaS can drive innovation where timeline, cost, or internal expertise in big data technology is a limiting factor. As such, BDaaS is a powerful tool if used with caution. Ultimately, each organization needs to weigh its own internal big data capabilities and skill sets against what BDaaS offers, while ensuring that data governance for assets transferred to the cloud sufficiently protects the organization.

1 Wu, Xing, Yan Liu, and Ian Gorton. “Scalability and Cost Evaluation of Incremental Data Processing Using Amazon’s Hadoop Service.” Big Data: Algorithms, Analytics, and Applications (2015): 21.

2 Fikes, Bradley J. “Deposition Set for USC Alzheimer’s Expert Aisen.” Union Tribune [San Diego] 14 July 2015.


