<< back Page 2 of 4 next >>

New Technologies in a Big Data World


With the rise of digital products comes a new approach to big data analytics—product analytics. “Every single person who uses a digital product is giving information about how a manufacturer can make it more successful,” said David Robinson, director of data science at Heap. “Digital products like SaaS, ecommerce, and mobile applications are able to track behavioral data—the stream of pageviews, clicks, and other interactions users perform as they engage with a product.”

Product analytics “turns data into actionable insights for improving a product,” he continued. “Just as a previous generation of analysts and database engineers learned to turn warehouses into business intelligence, we’re seeing a transformation in our ability to get value out of the massive amounts of behavioral data we’re collecting. Product managers and designers no longer have to rely on interviewing a few users at a time; they can see how thousands or millions of users are engaging with their product in the real world. Directors don’t have to guess where to apply strategic investment into a product; they can measure and compare the impact that each investment will have.”
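The behavioral data Robinson describes is, at its simplest, a stream of interaction events that gets rolled up into per-user engagement metrics. A minimal sketch of that rollup, using a hypothetical event schema of (user, event type, page):

```python
from collections import Counter

# Hypothetical behavioral events: each is (user_id, event_type, page).
# Product-analytics tools capture records like these automatically
# on every pageview, click, and other interaction.
events = [
    ("u1", "pageview", "/home"),
    ("u1", "click", "/home"),
    ("u2", "pageview", "/pricing"),
    ("u1", "pageview", "/checkout"),
    ("u2", "click", "/pricing"),
    ("u2", "pageview", "/home"),
]

def engagement_by_user(events):
    """Count interactions per user -- a basic product-analytics rollup."""
    return dict(Counter(user for user, _, _ in events))

print(engagement_by_user(events))  # {'u1': 3, 'u2': 3}
```

Real systems aggregate millions of such events per day, but the shape of the computation — group the event stream by user, then summarize — is the same.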

A potential issue that needs to be addressed, however, is causal reasoning, Robinson cautioned. “The science of finding actionable product insights is that of reasoning about causes: answering the question of ‘If I made this product change, what would happen to the business outcomes?’ And behavioral data is particularly rife with confounding traps where the unwary might mistake correlation for causation. This problem arises at every stage of the product development process. A product manager might be able to use behavioral data to measure the number of users affected by a bug, but even more important to them is to measure the impact the bug has on business outcomes. And if a product manager is careless with behavioral data, they might draw an absurd conclusion—like ‘Among all visitors, users who run into an error on the checkout page are 10x more likely to purchase’—simply because those are the users who reached the checkout page in the first place.”
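The checkout-error trap Robinson describes can be made concrete with a small simulation. In this hypothetical model, purchase intent drives both reaching the checkout page and buying, and errors only occur on checkout — so even though the error itself reduces conversion, the naive all-visitor comparison makes it look beneficial:

```python
import random

random.seed(0)

# Hypothetical simulation of the confounding trap: intent drives BOTH
# reaching checkout and purchasing, and errors occur only on checkout,
# so "saw an error" correlates with purchase among all visitors even
# though the error itself hurts conversion.
visitors = []
for _ in range(100_000):
    intent = random.random() < 0.2           # 20% arrive intending to buy
    reached_checkout = intent                # only intent users get that far
    saw_error = reached_checkout and random.random() < 0.3
    # The error *reduces* conversion for those who actually hit it.
    purchased = reached_checkout and random.random() < (0.5 if saw_error else 0.8)
    visitors.append((saw_error, purchased))

def purchase_rate(group):
    return sum(purchased for _, purchased in group) / len(group)

with_error = [v for v in visitors if v[0]]
without_error = [v for v in visitors if not v[0]]
print(f"purchase rate, saw error: {purchase_rate(with_error):.1%}")
print(f"purchase rate, no error:  {purchase_rate(without_error):.1%}")
# The naive comparison flatters the error: the no-error group is
# dominated by visitors who never reached checkout at all.
```

Conditioning on reaching checkout (or running a controlled experiment) reverses the naive conclusion — exactly the careful causal reasoning Robinson argues product analytics requires.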


Behind artificial intelligence is machine learning (ML), where data is used to train algorithms and models. However, these tasks—and the amount of data needed—can be overwhelming. “Data scientists still spend 45% of their time on data prep, and there’s a global shortage in trained data scientists,” said Eric Lundberg, senior product manager, Camunda. “Data systems that can provide clean, ML-ready data can reduce the amount of data scientist time and push the project’s ROI high enough to be worth investing in.”

Today’s software providers are increasingly offering ML-ready datasets and “are making it easier for anyone to create high-quality, meaningful machine learning models,” Lundberg said.

For basic use cases, providing ML-ready datasets can eliminate the need for a data pipeline entirely. “This also extends the reach of AI to citizen data scientists,” said Lundberg. “Rather than running your project through tight competition for precious data engineering or data science resources, any team can create their own machine learning models.”

At the same time, “it’s impossible to make guarantees about data quality if you aren’t the one collecting it,” he cautioned. “Missing data, inconsistent data collection, or human errors take time to correct, and they’re all specific to the data collection method.” But things are moving in the right direction.
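The problems Lundberg lists — missing data, inconsistent collection, human error — are exactly what ML-ready data services automate away. A minimal sketch of that cleanup, with hypothetical field names:

```python
# A minimal sketch of the cleanup that "ML-ready" data services
# automate: missing values, inconsistent labels, type coercion.
# The field names and alias table here are hypothetical.
raw = [
    {"age": "34", "country": "US",   "spend": "120.50"},
    {"age": "",   "country": "usa",  "spend": "95"},
    {"age": "29", "country": "U.S.", "spend": None},
]

COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US"}

def clean(rows):
    out = []
    for row in rows:
        out.append({
            "age": int(row["age"]) if row["age"] else None,
            "country": COUNTRY_ALIASES.get(row["country"].lower(), row["country"]),
            "spend": float(row["spend"]) if row["spend"] else None,
        })
    # Impute missing numeric fields with the column mean so models
    # that cannot handle nulls can still train on every row.
    for field in ("age", "spend"):
        known = [r[field] for r in out if r[field] is not None]
        mean = sum(known) / len(known)
        for r in out:
            if r[field] is None:
                r[field] = mean
    return out

for row in clean(raw):
    print(row)
```

Each of these steps is specific to how the data was collected, which is Lundberg’s point: a vendor that controls collection can guarantee the cleanup; a consumer of someone else’s data cannot.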

Operationalizing data for machine learning “has a great impact on how data is managed and delivered,” said Manasi Vartak, founder and CEO of Verta. “One example is the emergence of feature stores used by data scientists to ensure continuity between ML model features across model development and model production environments. As companies innovate to digitally transform, the first logical step is to get data in a position to research and experiment with potential AI or ML solutions to a business problem. Now that model building capabilities have matured and stabilized, operationalizing models has entirely unique requirements. Feature stores are one aspect of bridging the divide between model build and model operations, but there’s a whole slew of additional considerations for organizations operationalizing data for machine learning.”
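The continuity Vartak describes comes from defining feature logic once and reusing it in both training and serving, so the two environments cannot drift apart. A toy illustration of that idea — all names and feature definitions here are hypothetical, not any particular vendor’s API:

```python
from datetime import datetime, timezone

# Toy illustration of the feature-store idea: feature logic is defined
# once, then the training pipeline and the online serving path both
# call the same definitions. Names and features are hypothetical.
FEATURE_DEFS = {
    "days_since_signup":
        lambda user: (datetime.now(timezone.utc) - user["signup"]).days,
    "orders_per_month":
        lambda user: user["orders"] / max(user["months_active"], 1),
}

class FeatureStore:
    def __init__(self, defs):
        self.defs = defs

    def compute(self, entity):
        """Materialize every registered feature for one entity."""
        return {name: fn(entity) for name, fn in self.defs.items()}

store = FeatureStore(FEATURE_DEFS)

user = {
    "signup": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "orders": 18,
    "months_active": 6,
}
# Both model development and model production call the same code path:
features = store.compute(user)
print(features["orders_per_month"])  # 3.0
```

Production feature stores add materialization, low-latency lookup, and point-in-time correctness on top of this, but the core contract — one definition, two consumers — is what bridges model build and model operations.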
