10 Big Data Predictions for 2017 from Oracle’s Balaji Thiagarajan

With the rise of smartphones, laptops, sensors on machines, vehicles, and appliances, massive amounts of data are being generated, according to Balaji Thiagarajan, group vice president of big data at Oracle.  For companies that can transform and manage it, he notes, data represents a huge opportunity as a source of competitive advantage and should be leveraged as such. Big data and cloud are two technologies driving dramatic transformations and, says Thiagarajan, organizations must be ready to react and take advantage of important new trends and technologies to make sure that they come out ahead next year. Here, Thiagarajan shares 10 key predictions for big data in 2017. 

1-The era of ubiquitous machine learning has arrived. According to Thiagarajan, machine learning is no longer the sole preserve of data scientists. The ability to apply machine learning to vast amounts of data is greatly increasing its importance and driving wider adoption. There will be a huge increase in the availability of machine learning capabilities in tools for both business analysts and end users, affecting how both corporations and governments conduct their business. "Machine learning will affect user interaction with everything from insurance and domestic energy to healthcare and parking meters," he notes.

2-When data can’t move, bring the cloud to the data. It’s not always possible to move data to an external data center, says Thiagarajan. Privacy issues, regulations, and data sovereignty concerns often preclude such actions, and sometimes, the volume of data is so great that the network cost of relocating it would exceed any potential benefits. In such instances, the answer is to bring the cloud to the data. In the future, more and more organizations will need to develop cloud strategies for handling data in multiple locations, he observes.

3-Applications, not just analytics, propel big data adoption. Early use cases for big data technologies focused primarily on IT efficiencies, data processing at massive scale, and analytic solution patterns, says Thiagarajan, but now a wide variety of industry-specific, business-driven needs are empowering a new generation of applications dependent on big data. Increasingly, applications are driving big data adoption, he says.

4-The Internet of Things will integrate with enterprise applications. The Internet of Things is for more than inanimate objects, points out Thiagarajan. Opportunities such as providing a higher level of healthcare for patients or enhancing customer experience via mobile applications require monitoring and acting upon the data that people generate through the devices they interact with. The enterprise must simplify IoT application development and quickly integrate this data with business applications. By blending new data sources with real-time analytics and behavioral inputs, enterprises are developing a new breed of cloud applications capable of adapting and learning on the fly, says Thiagarajan, who notes that the impact will be felt not only in the business world, but also in the exponential growth of smart city and smart nation projects across the globe.

5-Data virtualization will light up dark data. Data silos proliferate in the enterprise on platforms such as Hadoop, Spark and NoSQL databases, observes Thiagarajan. Potentially valuable data stays dark because it’s hard to access (and also hard to find). As a result, organizations are realizing that it’s not feasible to move everything into a single repository for unified access, and that a different approach is required.

6-Kafka looks set to be the runaway big data technology of 2017. Apache’s Kafka technology is already building momentum and is likely to hit peak growth in 2017, says Thiagarajan. “Kafka is a means of seamlessly publishing big data event topics, ingesting data into Hadoop/, and distributing data to enterprise data consumers. Kafka employs a traditional, well-proven bus-style architecture pattern, but with very large data sets and a wide variety of data structures. This makes it ideal for bringing data into your data lake and providing subscriber access to any events your consumers ought to know about,” he notes.
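
The bus-style pattern described here can be sketched as a toy, in-memory example. This is not Kafka and does not use any Kafka client API; the class name TopicBus, its methods, the topic name, and the subscriber names are all hypothetical, chosen only to illustrate how one published event stream can feed several independent consumers, each keeping track of its own read position.

```python
from collections import defaultdict


class TopicBus:
    """Toy in-memory stand-in for a publish/subscribe bus (NOT the Kafka API)."""

    def __init__(self):
        # Each topic is an append-only list of events.
        self._logs = defaultdict(list)
        # Each (subscriber, topic) pair tracks its own read position.
        self._offsets = defaultdict(int)

    def publish(self, topic, event):
        """Append an event to the end of the topic's log."""
        self._logs[topic].append(event)

    def poll(self, subscriber, topic):
        """Return the events this subscriber has not yet seen on the topic."""
        start = self._offsets[(subscriber, topic)]
        events = self._logs[topic][start:]
        self._offsets[(subscriber, topic)] = len(self._logs[topic])
        return events


# Hypothetical usage: one producer, two independent consumers.
bus = TopicBus()
bus.publish("orders", {"id": 1, "total": 40})
bus.publish("orders", {"id": 2, "total": 15})

lake_batch = bus.poll("data_lake_loader", "orders")   # e.g. ingest into a data lake
alert_batch = bus.poll("alerting_service", "orders")  # e.g. notify a business consumer
```

Because each subscriber has its own offset, both consumers receive every event, and neither affects what the other reads. A real deployment would add persistence, partitioning, and replication, none of which this sketch attempts.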

7-A boom in prepackaged integrated cloud data systems. Increasingly, organizations are seeing the value in data labs and end-to-end data platforms for experimenting with big data and driving innovation, but uptake has been slow, says Thiagarajan. “It isn’t easy to build these from scratch—whether on-premises or in the cloud. Prepackaged offerings including integrated cloud services such as data lake, data flow, data science, data wrangling, data integration and analytics are removing the complexity of do-it-yourself solutions. Expect a boom in prepackaged, integrated cloud data labs throughout the year.”

8-Cloud-based object stores become a viable alternative to Hadoop HDFS. “Object stores have many desirable attributes: availability, replication (across drives, racks, domains, and data centers), DR, and backup,” says Thiagarajan. “They’re the cheapest, simplest places to store large volumes of data, and can directly accommodate frameworks like Spark. We see object storage technologies becoming a repository for big data as they get more and more integrated with big data computing technologies and will provide a viable alternative to HDFS stores for a lot of use cases. All exist as part of the same data-tiering architecture.”

9-Next-generation compute architectures enable deep learning at cloud scale. Acceleration technologies, such as GPUs and NVMe; optimal placement of storage and compute; high-capacity, non-blocking networking—none of these things is new, but the convergence of all of them is, says Thiagarajan.  “Together, they enable cloud architectures that realize order of magnitude improvements in compute, I/O, and network performance. The result? Deep learning at scale, and easy integration with existing business applications and processes.”

10-Hadoop security is no longer optional. Hadoop deployments and use cases are no longer predominantly experimental. Increasingly, they’re business-critical to organizations, notes Thiagarajan. “As such, Hadoop security is non-optional.  You can expect to deploy multilevel security solutions for your big data projects in the future.”