
Reversing the 80/20 Ratio in Data Analytics


Before implementing any kind of technical tool, “organizations must realize the business value of the data,” said Dan Wu, privacy counsel and legal engineer at Immuta. “Internal champions can identify specific use cases and form a cross-functional coalition—including governance, risk, and compliance—to rally behind them. With this coalition, a team can identify solutions to balance utility and safety.”

It is important to start with a strong, intelligent data management foundation that can extract data intelligence through data integration, orchestration, metadata management, connectivity, and AI and machine learning services, while supporting both on-premises and cloud deployments, said Wesselmann. “Focus on collaboration between various roles such as data architects, data integration experts, developers, and even data scientists. This collaboration supports DataOps initiatives for using tools within an agile framework, enabling data preparation and automation of data workflows, and provides transparency across the data roles.”
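As a concrete illustration of that collaboration, here is a minimal sketch of an automated data-preparation workflow in Python; the file names, columns, and cleaning rules are assumptions for illustration, not any specific vendor's pipeline.

```python
# Minimal DataOps-style sketch: each preparation step is a small, testable
# unit that architects, integration experts, and data scientists can share.
# File and column names are illustrative assumptions.
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Pull raw records from a source extract (here, a CSV file)."""
    return pd.read_csv(path)

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize column names and drop rows missing the key field."""
    df = df.rename(columns=str.lower)
    return df.dropna(subset=["customer_id"])

def publish(df: pd.DataFrame, path: str) -> None:
    """Write the prepared dataset where downstream roles can reach it."""
    df.to_csv(path, index=False)

if __name__ == "__main__":
    # Automating the whole flow makes each run repeatable and transparent.
    publish(prepare(extract("raw_customers.csv")), "prepared_customers.csv")
```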

Tools and Platforms

A cohesive organizational approach also paves the way to adoption of current architectural methodologies involving DevOps and agile protocols. “Architectural trends like containerization and microservices provide the opening for moving fast and adapting, but those architectures are only successful if the organization running them has embraced the decentralized, bottom-up mentality that created them in the first place,” said Newton. Service APIs can also supplant mechanisms such as custom ETL logic, speeding development and enabling teams to build and use common dimensions. This “frees up analysts and data scientists embedded within each business unit to focus on collecting detailed atomic fact data to answer things like ‘Who is my most valuable customer?’ or ‘Which of my regions are underperforming?’” said Chakravarthi.
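As a sketch of what replacing custom ETL logic with a service API can look like, the snippet below exposes a shared customer dimension over HTTP; it assumes the FastAPI framework, and the endpoint, keys, and fields are hypothetical.

```python
# Hypothetical service API for a shared "customer" dimension: business units
# call one authoritative endpoint instead of copying data with custom ETL.
# Assumes FastAPI is installed (pip install fastapi uvicorn).
from fastapi import FastAPI, HTTPException

app = FastAPI()

# In practice this would be backed by the team's governed dimension store.
CUSTOMER_DIMENSION = {
    "c-1001": {"name": "Acme Corp", "region": "EMEA", "tier": "enterprise"},
    "c-1002": {"name": "Globex", "region": "AMER", "tier": "mid-market"},
}

@app.get("/dimensions/customer/{customer_id}")
def get_customer(customer_id: str) -> dict:
    """Serve one consistent dimension record to every consumer."""
    record = CUSTOMER_DIMENSION.get(customer_id)
    if record is None:
        raise HTTPException(status_code=404, detail="unknown customer")
    return record
```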

Emerging tools and platforms can help address challenges with data engineering, which is still characterized by manual tasks. “Solutions that can help accelerate this process, such as data lakes technology, can organize data from multiple sources by building relationships, removing duplicates, and automatically refreshing data,” said Alex Ough, senior CTO architect at Sungard Availability Services.
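The snippet below sketches the kind of consolidation Ough describes: combining extracts from multiple sources, linking them on a shared key, and removing duplicates. The source files, key, and refresh step are illustrative assumptions.

```python
# Illustrative consolidation of two source extracts into one deduplicated view.
import pandas as pd

crm = pd.read_csv("crm_customers.csv")          # e.g., id, name, email
billing = pd.read_csv("billing_customers.csv")  # e.g., id, plan, mrr

# Build the relationship between the two sources on their shared key...
combined = crm.merge(billing, on="id", how="outer")

# ...then remove duplicates so consumers see one row per customer.
deduped = combined.drop_duplicates(subset=["id"], keep="last")

# A scheduled job would rerun this to keep the consolidated view refreshed.
deduped.to_csv("customer_360.csv", index=False)
```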

“It’s easier than ever to collect and aggregate this data with a wide array of flexible cloud data warehouses and cloud-native data pipeline tools,” said Bailis. However, he added, “there’s a premium on every analyst’s time and attention. It just takes too long to properly diagnose and assess the impact of every potential change. To combat this imbalance between cheap data and expensive people, teams can look to adopt platforms that augment their ability to rapidly diagnose changing KPIs and recommend next steps collaboratively.”
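A minimal sketch of that kind of augmentation follows: it flags a KPI whose newest value departs sharply from its recent history, so an analyst investigates only metrics that actually changed. The window and threshold are illustrative assumptions, not any product's actual method.

```python
# Flag a KPI whose latest value deviates strongly from its trailing window,
# so scarce analyst attention goes only to metrics that actually moved.
import pandas as pd

def flag_kpi_change(series: pd.Series, window: int = 28,
                    z_threshold: float = 3.0) -> bool:
    """True if the newest point is more than z_threshold standard
    deviations from the mean of the preceding window."""
    history = series.iloc[-window - 1:-1]
    latest = series.iloc[-1]
    z = (latest - history.mean()) / history.std()
    return abs(z) > z_threshold

daily_revenue = pd.Series([100, 102, 98, 101, 99, 150])
print(flag_kpi_change(daily_revenue, window=5))  # True: 150 is an outlier
```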

Even existing formats can be repurposed to achieve a simpler architecture. Stevenson advocates a DataOps approach that supports a data mesh architecture, which provides for “discoverability, visibility, and governance backed by tooling that is supported by SQL, a ubiquitous data language. This simplifies developing and managing data-intensive applications running on the data infrastructure that is now a commoditized technology. When this happens, everyone can contribute.”
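The point about SQL's ubiquity can be made concrete with a small example: once a dataset is discoverable, anyone who knows SQL can query it, whatever team built it. Here sqlite3 stands in for any SQL-speaking data product, and the table and columns are illustrative.

```python
# Plain SQL over a discoverable data product; sqlite3 is a stand-in for any
# SQL interface in the mesh. Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("east", 120.0), ("west", 80.0), ("east", 95.5)])

# Any analyst, engineer, or scientist can contribute with the same language.
for region, total in conn.execute(
        "SELECT region, SUM(revenue) FROM orders GROUP BY region"):
    print(region, total)
```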

The challenge, of course, is working around legacy infrastructure, which may be too expensive to rip and replace. Bry Dillon, vice president, cloud, channels, and community for OSIsoft, calls for “purpose-built operational systems that can access the data” without “disrupting critical functions of systems which could be decades old.” He also recommends “systems that can normalize and contextualize the data to give as much color and depth to the data as possible. Some of this can be gathered from the source systems themselves, but many of these are quite old, and it is best to leverage the understanding by the people who operate them.” AI and machine learning technologies can also play a role, especially those that “are more suited for operational datasets so they can get a faster time to value.”
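The snippet below sketches the normalize-and-contextualize step Dillon describes: raw tags from a legacy source system are enriched with asset context captured from the people who operate it. The tag names and metadata are hypothetical.

```python
# Enrich opaque legacy tags with operator-supplied context so downstream
# analytics see meaningful data. All names here are hypothetical.
ASSET_CONTEXT = {  # knowledge gathered from operators, not the old system
    "TT-101": {"asset": "Boiler 3", "measure": "temperature", "unit": "degF"},
    "PT-204": {"asset": "Pump 7", "measure": "pressure", "unit": "psi"},
}

def contextualize(tag: str, value: float) -> dict:
    """Attach asset, measure, and unit to a raw reading, without touching
    the decades-old system that produced it."""
    meta = ASSET_CONTEXT.get(
        tag, {"asset": "unknown", "measure": "unknown", "unit": ""})
    return {"tag": tag, "value": value, **meta}

print(contextualize("TT-101", 412.6))
```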

Business data delivery can also be accelerated through self-serve data management platforms. “These platforms come in many forms—delivered on cloud or on-premises,” said Brian Sparks, product manager for data integrity at Vertex. “Some focus on simple user interfaces, while others focus on offering the most functionality. What they have in common is that they typically offer some form of ETL, some form of orchestration allowing users to set up data pipelines for repetitive data flows, and they have the ability to ensure the quality required for your business.”
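As a sketch of the quality guarantee Sparks mentions, the snippet below runs simple validation rules before a pipeline loads data; the checks shown are illustrative assumptions, not a specific platform's feature set.

```python
# Illustrative quality gate inside a repetitive data flow: validate records
# before load so bad data never reaches the business. Rules are assumptions.
import pandas as pd

def quality_checks(df: pd.DataFrame) -> list[str]:
    """Return human-readable quality failures (empty list means pass)."""
    failures = []
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    if df.duplicated(subset=["order_id"]).any():
        failures.append("duplicate order_id rows")
    return failures

orders = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, -5.0, 7.5]})
problems = quality_checks(orders)
print(problems or "all checks passed")  # a real flow would halt on failures
```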
