Five Minute Briefing - Information Management
May 18, 2022

A concise weekly report with key product news, market research and insight for data management professionals and IT executives.

News Flashes

Data consumers need data for BI and analytics to make business decisions. But for most organizations, their current data infrastructure isn't keeping up with demand. In a presentation at Data Summit 2022, titled "Building the Open Data Lakehouse," Mark Lyons, senior director, product management, Dremio, explained why more organizations are moving their analytics and BI to an open data lakehouse and how you can build a successful lakehouse strategy.

At Data Summit 2022, Sudha Viswanathan, staff engineer, Wayfair, presented a talk titled "Gaining Insights From Clickstream Data." Viswanathan explained that Wayfair's clickstream data captures customer actions on the Wayfair site, such as which pages were viewed, which products were clicked, what was added to the cart, and which URL brought the customer to Wayfair. This helps Wayfair make data-driven decisions about revenue attribution across marketing channels and improves traffic analysis, test analysis, and ad bidding.

A common pattern in data lake and lakehouse design is structuring data into zones, with bronze, silver, and gold being typical labels. Each zone is suitable for different workloads and different consumers. For instance, machine learning algorithms typically run against bronze or silver, while analytic dashboards often query gold. This prompts the question: Which layer is best suited for applying data quality rules and actions? The answer: All of them.
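
As a rough illustration of applying data quality rules in every zone, the following Python sketch runs records through bronze, silver, and gold checks in turn and quarantines whatever fails at each step. The rule names, the record fields (order_id, amount, currency), and the helper functions are all hypothetical and chosen only for this example; they are not taken from any particular product or from the article summarized above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named data quality check that returns True when a record passes."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules: each zone gets its own checks instead of one central gate.
ZONE_RULES = {
    # Bronze: raw data only needs to be identifiable.
    "bronze": [Rule("has_order_id", lambda r: r.get("order_id") is not None)],
    # Silver: cleaned data must have a valid, non-negative numeric amount.
    "silver": [Rule(
        "amount_is_non_negative_number",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    )],
    # Gold: curated data must use a currency the dashboards understand.
    "gold": [Rule("currency_is_known", lambda r: r.get("currency") in {"USD", "EUR"})],
}


def apply_rules(zone, records):
    """Split records into those that pass every rule for the zone and those that fail."""
    passed, quarantined = [], []
    for record in records:
        failures = [rule.name for rule in ZONE_RULES[zone] if not rule.check(record)]
        if failures:
            quarantined.append((record, failures))
        else:
            passed.append(record)
    return passed, quarantined


def promote(records):
    """Move records through bronze, silver, and gold, reporting rejections per zone."""
    report = {}
    current = records
    for zone in ("bronze", "silver", "gold"):
        current, rejected = apply_rules(zone, current)
        report[zone] = rejected
    return current, report
```

Calling promote on a list of raw records returns the records that survive all three zones along with a per-zone list of rejected records and the names of the rules they failed. In a real lakehouse these checks would more likely run inside the processing engine than in plain Python; the sketch only shows the shape of the idea.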

The coming decade is going to require a modern data warehouse to meet demanding new requirements for machine learning, data variety, and real-time analytics—while still satisfying the more traditional need for analysis of structured data at scale.

Modern data analytics platforms that fuel enterprisewide data hubs are critical for decision making and information sharing. The problem? Integrating legacy data stores into these hubs is just plain hard, and there is no magic bullet. However, the best data hubs include all enterprise data.

As the world becomes increasingly data-driven, AI/ML algorithms are being incorporated in most business applications. Historically, data in AI architectures was moved to a central location to perform both model training and inference. This centralized approach is becoming untenable due to cost, performance, and privacy reasons.

Companies now collect more data than ever before, but challenges remain in accessing and analyzing it. David Armlin, VP solution architect and customer success, ChaosSearch, discussed "Learn, Unlearn, Relearn: Embracing the Future of Cloud Analytics" during his Data Summit 2022 session.

As sensor technology becomes more affordable, companies of all sizes will have the ability to embrace IoT strategies to build innovative products and services and establish new revenue streams. Yet, as with any promising technology, challenges remain. At Data Summit 2022, Paul Scott-Murphy, CTO, WANdisco, discussed "Solving the IoT Data Management Puzzle With Gateways to the Cloud."

Wednesday's Data Summit 2022 keynotes opened with Laura Sebastian-Coleman, data quality director, Prudential Financial, who discussed "Data Quality Deniers & What We Learn From Them." One of the biggest organizational obstacles to data quality management is basic pessimism about the possibility of managing the quality of data. This is due to lack of clarity—the goals and processes for data quality management have not been defined or have not been understood—and disbelief that the quality of data could be subject to control.

According to Kathy Schneider, chief marketing officer, Kx, the definition of "real time" is changing as the window of opportunity for making decisions is shrinking. It is widely accepted that insights-driven businesses perform better than others, and how fast they can use those insights is more critical than ever.