Trends To Watch: Data Lakes in Clouds, Behavioral Analytics Goes Mainstream

May 25, 2016

By Ben Werther

Thanks to the cloud and other empowering technologies such as Hadoop and Apache Spark, we’re at the tipping point for big data. These technologies now provide a path to big data success for companies that otherwise lack the specialized big data skills or the heretofore proprietary (and expensive) infrastructure to do it themselves.

As 2016 progresses, we’ll see the broader market put big data capabilities to work, and the benefits of big data will, in turn, spread beyond the privileged few companies that were early adopters.

What’s exciting about this, beyond the fact that big data and its rewards are becoming more real for more companies, is how this momentum is accelerating innovation in big data analytics generally. Here are five big data analytics trends to watch over the next 12 months:

Trend #1: Citizen Data Scientists Rise Up!

Gartner predicts that the number of citizen data scientists (business users who derive insights without relying on traditional data scientists for data preparation help) will grow five times faster than the number of their highly trained data scientist counterparts between now and 2017.

Across the Platfora customer base, we are witnessing companies build environments that enable this user group to work with machine learning and data science techniques without being statisticians. This means increased use of tools and solutions that enable do-it-yourself data discovery.

Take Vivint, for example. As one of the largest home automation companies in North America, Vivint collects a variety of data (sensor, behavioral, weather and geodata) to deliver its smart home solutions. “Providing non-technical employees with self-service access to data in Hadoop is creating lots of new business opportunities for our company and is helping us deliver a better customer experience,” said Brandon Bunker, Senior Director of Customer Analytics and Intelligence.

Trend #2: Understanding Behavior is the Killer App

The ability to understand a broader pattern of behavior (people, web visitors, devices, etc.) beyond a single channel (such as web clickstream) is increasingly being viewed as mission-critical by companies of all stripes and sizes.

The need to understand attribution, cohort behavior, and conversion paths, and to flexibly segment populations by whatever behavior patterns a company wants to identify, at web scale (hundreds of millions or billions of visitors or devices), is no longer seen as a niche or enterprise-only “nice to have” but as a mainstream “gotta have it.”

And for good reason. This kind of behavioral analytics and the decision-making it supports are powerful differentiators for retail, web and gaming as well as more traditional industries. Classic segmentation based on demographics or hard-coded attributes is no longer enough against competitors with this kind of flexible understanding of their audience.
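To make the contrast with demographic segmentation concrete, here is a minimal Python sketch of segmenting users by what they did across channels rather than by who they are. Everything in it is an illustrative assumption: the event log, the channel and event names, and the segment rules are hypothetical and are not taken from any product or customer mentioned in this article.

```python
from collections import defaultdict

# Hypothetical cross-channel event log: (user_id, channel, event), in time order.
EVENTS = [
    ("u1", "web", "view_product"),
    ("u1", "mobile", "add_to_cart"),
    ("u1", "web", "purchase"),
    ("u2", "web", "view_product"),
    ("u2", "web", "view_product"),
    ("u3", "mobile", "view_product"),
    ("u3", "email", "click_offer"),
    ("u3", "mobile", "add_to_cart"),
]


def build_paths(events):
    """Group events into one ordered path per user, spanning all channels."""
    paths = defaultdict(list)
    for user_id, channel, event in events:
        paths[user_id].append((channel, event))
    return dict(paths)


def segment(path):
    """Assign a behavior-based segment (illustrative rules, not a standard)."""
    events = [event for _, event in path]
    channels = {channel for channel, _ in path}
    if "purchase" in events:
        return "converted"
    if "add_to_cart" in events:
        return "cart_abandoner"
    if len(channels) > 1:
        return "multi_channel_browser"
    return "single_channel_browser"


def segment_users(events):
    """Map each user to a segment derived only from observed behavior."""
    return {user: segment(path) for user, path in build_paths(events).items()}


if __name__ == "__main__":
    for user, label in sorted(segment_users(EVENTS).items()):
        print(user, label)
```

The point of the sketch is that the segment definitions live in one small function and can be changed at will, which is the flexibility that hard-coded demographic attributes lack. At the scale of hundreds of millions of visitors, the same grouping logic would run on a distributed engine rather than in a single process.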

Take Riot Games, for example, the 1,500-person gaming company best known for League of Legends. It uses big data analytics to deeply understand the experience of its more than 67 million monthly active players and to make daily data-driven decisions that optimize gameplay as it adds and refines features in the gaming environment.

Trend #3: Minding the Gaps in IoT

In 2015, dozens of major technology and manufacturing vendors introduced products and platforms for the Internet of Things (IoT), including General Electric, Google, Samsung and SAP.

These platforms all have interesting attributes; however, most of them suffer from significant data management shortcomings. They focus only on capturing their own IoT data and then hoard it away in silos. The result: a lack of connection to the broader world of data assets (including customer data, clickstreams and other digital footprints) and a corresponding analytics gap for organizations that want to ask bigger questions across silos.
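As a rough illustration of what asking a question across silos means, the Python sketch below joins device readings from one store with customer records from another. The record layouts, field names, and values are hypothetical assumptions made for the example; they do not reflect any vendor's platform.

```python
# Hypothetical silo 1: IoT readings keyed by device.
SENSOR_READINGS = [
    {"device_id": "d1", "temp_c": 21.5},
    {"device_id": "d1", "temp_c": 23.5},
    {"device_id": "d2", "temp_c": 18.0},
    {"device_id": "d3", "temp_c": 25.0},
]

# Hypothetical silo 2: customer records that reference the same devices.
CUSTOMERS = [
    {"customer_id": "c1", "device_id": "d1", "plan": "premium"},
    {"customer_id": "c2", "device_id": "d2", "plan": "basic"},
    {"customer_id": "c3", "device_id": "d3", "plan": "premium"},
]


def average_temp_by_plan(readings, customers):
    """Join readings to customers on device_id, then average per plan."""
    plan_by_device = {c["device_id"]: c["plan"] for c in customers}
    totals = {}  # plan -> (running sum, reading count)
    for reading in readings:
        plan = plan_by_device.get(reading["device_id"])
        if plan is None:
            continue  # reading from a device with no known customer
        total, count = totals.get(plan, (0.0, 0))
        totals[plan] = (total + reading["temp_c"], count + 1)
    return {plan: total / count for plan, (total, count) in totals.items()}


if __name__ == "__main__":
    print(average_temp_by_plan(SENSOR_READINGS, CUSTOMERS))
```

The question answered here (average temperature by subscription plan) cannot be asked of either dataset alone. It only becomes possible once the IoT data is reachable alongside the customer data.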

As 2016 continues, look for recognition that IoT platforms, as well as the devices themselves, need to be active participants in the modern data lake architecture, capable of externalizing assets for broader analysis. Enterprise providers and device makers will be pressured to provide solutions that can close these data and analytics gaps.

Trend #4: Apache Spark Gets Real

After nearly six years of development, testing and early-stage deployments, 2016 is the year in which Apache Spark needs to visibly deliver on its promise. Signs are positive, but expect some backlash and frustration as reality sets in.

Look for the Spark open source community to roll up its sleeves and address Spark’s rough edges, especially in performance and reliability, to spur broader adoption. Likewise, look for Spark to prove its value as a platform for data transformation, machine learning, and streaming analytics as those longer-term performance and reliability elements mature for the enterprise.

Trend #5: Data Prep Becomes Feature of Data Discovery

Until recently, data preparation was a time-consuming, “behind the scenes” task handled by ETL engineers. A newer crop of products is lowering the barrier, making much of this work self-service and increasingly guided by machine learning.

We are quickly seeing an evolution here, however, from data prep’s focus on working on a single dataset or transforming a handful of datasets into one, toward data prep as part of an overall data discovery lifecycle. This has been made possible by the advent of important features such as a browsable data catalog with well-traced lineage, a security and lifecycle management solution, and a feed into a variety of analytic workflows. These needs are now leading to the subsumption of data prep into the broader data discovery market.

Look for more companies to place strategic bets on vendors that embed more self-service data prep tools into larger discovery-oriented and analytic workflows. At the same time, expect increased emphasis on data discovery itself as part of a larger, end-to-end big data lifecycle in which assets are managed from requirement to retirement.

Making Big Data Deliver

We’ve seen a lot of new data technologies emerge over the past few years, and more radical shifts are coming on the longer horizon. However, 2016 will be a year to digest and “make it real,” with companies focused on simplification, on getting wins from their data with the people they have on staff, and on broadening imagination and outcomes well beyond old-school business intelligence and data warehousing. I consider this a foundational year in which the broader market lays the production building blocks that will quickly pay off and prepare companies for the next decade of change and competition.

A free eBook elaborating on these trends is available here for DBTA readers.