ClearStory Data Advances Automated Data Preparation with Machine-Based Approach

ClearStory Data has announced an update to its data inference and data harmonization capabilities called Infinite Data Overlap Detection (IDOD). With this R&D innovation, ClearStory’s Spark-based analytics solution now detects and infers data patterns and customer-specific data types across all values in the data that a user connects to as part of their analysis.

According to the vendor, the new IDOD advance addresses a market need to blend and harmonize multiple complex, highly dimensional “categorical value” data sources. The complexity of this data analysis, and the diversity of the sources the data originates from, are the root cause of delays in reaching business insights. The new capability, the company says, addresses organizations’ expanding need for faster and more precise insights on large, complex data sources. “By adding the advanced IDOD capability to automatically recognize infinite categories, values and granularities in data sources,” said Dr. Tim Howes, CTO of ClearStory Data, “we speed the cycle of data to insights by addressing a significant pain point that enterprises across all industries face today: the intricate, tedious task and massive time sink caused by manual data wrangling on large, complex data.”

ClearStory’s aim is to replace traditional methods such as manually matching data or column headers, or sampling data, which, it contends, are labor-intensive and error-prone. The new IDOD capability can be used to determine how complex data from multiple sources should be blended, viewed, and visualized on the fly. IDOD plays the role of data modeling advisor to the business user, enabling them to blend data and discover insights without data modeling expertise or the time that manual processes require.
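
To make the contrast with header matching concrete, the general idea of comparing column values rather than column names can be sketched as follows. This is only an illustrative sketch, not ClearStory's IDOD implementation, which the announcement does not describe. The Jaccard overlap measure, the function names, the 0.5 threshold, and the sample tables are all assumptions made for the example.

```python
def value_overlap(left, right):
    """Jaccard overlap between the distinct, normalized values of two columns.

    Assumption: values are compared case-insensitively after trimming
    whitespace; a real system would likely use richer normalization.
    """
    a = {str(v).strip().lower() for v in left if v is not None}
    b = {str(v).strip().lower() for v in right if v is not None}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_join_columns(table_a, table_b, threshold=0.5):
    """Return (column_a, column_b, score) candidates, best score first.

    Tables are plain dicts mapping a column name to a list of values.
    The threshold is a hypothetical cutoff chosen for illustration.
    """
    candidates = []
    for name_a, values_a in table_a.items():
        for name_b, values_b in table_b.items():
            score = value_overlap(values_a, values_b)
            if score >= threshold:
                candidates.append((name_a, name_b, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical sample data: the headers differ, but the values overlap.
sales = {
    "sku": ["A-100", "A-200", "B-300"],
    "region": ["East", "West", "East"],
}
catalog = {
    "product_code": ["a-100", "a-200", "b-300", "c-400"],
    "brand": ["Acme", "Acme", "Bolt", "Core"],
}

print(suggest_join_columns(sales, catalog))
```

Here the columns named sku and product_code would never be paired by header matching, yet their values overlap enough to be suggested as a blend key. Comparing every distinct value instead of a sample is what makes this kind of check expensive on large data.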

In research it conducted in October 2015, ClearStory says it found that nearly 70% of companies need access to refreshed data insights either hourly or daily. Eighty-six percent of them regularly struggle with this challenge when four or more data sources and file formats are involved in the analysis. In addition, 68% said they experienced “data blindness” at least once per week because they could not identify “what’s happening now, and why” soon enough, hurting their ability to make decisions.

The most difficult part of the problem addressed by the new capabilities, ClearStory says, is the presence of customer-specific attributes and distinct, nuanced data values such as product names, category names, distinct phone numbers, product codes, and brand attributes. These attributes in particular have traditionally required heavy manual data wrangling to inspect and reconcile many thousands to millions of unique values with integrity and consistency.

For more information, go to