Empowering Modern Data Architectures for Improved Data Engineering Operations

Though it may sound simple, the task of producing fast, reliable, high-quality data is anything but. Data engineers' jobs are multifaceted, addressing a myriad of data areas and inevitably surfacing an equal myriad of data challenges.

Experts joined DBTA’s webinar, “The Top Data Engineering Challenges and How to Solve Them,” to discuss how organizations can navigate and best support data engineering functions, and where next-gen data architecture strategies may offer a wealth of opportunity for organizations attempting to build efficient, agile, and resilient data systems.

Aric Bandy, EVP, corporate development and CloudOps at Pythian, explained that “You cannot realize an impactful and strong data quality program without first establishing and promoting a cloud modernization strategy. It serves as the foundation for all future data insights and transformations.”

If a cloud modernization strategy lies at the base of any great data infrastructure, Bandy argued, then data issues can be eliminated by eradicating bespoke builds: the manual, custom, one-off systems that impede holistic infrastructure efficiency.

Through automating cloud environments, organizations can reap a range of benefits, including increased revenue through increased speed of releases, shorter development cycles, reduced costs in labor, increased market share, reduced risk and human error, faster security audits, and more.

For data in particular, automation allows for consistent pipeline builds, automated testing, faster deployment, agile scaling, and improved security and compliance, according to Bandy. These factors combine to create an adaptive data infrastructure underpinned by a modernized cloud, which ultimately supports continuous improvement.

“Cloud modernization and automation creates consistent, repeatable, and promotable configuration. It’s a framework that can compose, deploy, and manage multiple solutions,” said Bandy. Such a framework directly propels data engineering tasks into a more agile, efficient environment.

Francisco Alberini, product manager at Monte Carlo, highlighted the following data engineering trends that enterprises should keep their eye on:

  • Cost optimization
  • Data product managers
  • Semantic layers
  • Data contracts
  • Data observability

“It’s no secret that the data space is moving fast,” Alberini explained, and this rapidly evolving landscape requires a variety of tools and strategies to propel businesses toward success.

The movement to the cloud is a costly migration; cost optimization is fundamental for data teams attempting to modernize with minimal financial impact. Building practices around cost management is critical for any data team, and those practices should drive decision-making in a world where costs are continually growing.

The rise of the data product manager (DPM) introduces a leader to the data team—someone who answers data questions, optimizes stakeholder alignment, evaluates technology, and more—to “own” the responsibilities of enterprise data teams. This specialized skill set, which combines business knowledge with technological expertise, is expected to become an increasingly hot commodity for modernizing data architectures, Alberini explained.

The semantic layer, or the metrics layer, helps to solve complexity through consistent metrics. Imprecise reporting is a major roadblock for a data infrastructure; semantic layer tools make it possible to maintain consistent, holistic metrics across data teams.
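To illustrate the idea, here is a minimal, hypothetical sketch of a semantic layer: a metric is defined once in a shared registry, so every dashboard or team resolves the same definition instead of re-deriving it in ad hoc queries. All names here are illustrative, not from any specific semantic layer tool.

```python
# Hypothetical semantic layer sketch: one shared registry of metric
# definitions, so "active_users" means the same thing everywhere.

METRICS = {
    # Defined once: a user counts as active if they had any events.
    "active_users": lambda rows: len(
        {r["user_id"] for r in rows if r["events"] > 0}
    ),
}

def compute(metric_name, rows):
    """Resolve a metric through the shared layer rather than ad hoc logic."""
    return METRICS[metric_name](rows)

rows = [
    {"user_id": 1, "events": 3},
    {"user_id": 1, "events": 2},
    {"user_id": 2, "events": 0},
]
print(compute("active_users", rows))  # → 1 (user 1 is active, user 2 is not)
```

Commercial and open source semantic layer tools operate at far greater scale, but the principle is the same: the metric definition lives in one place, so reporting stays consistent across teams.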

Data contracts—similar to SLAs—are becoming more widespread in data organizations as governance continues to evolve. Alberini explained that this schema enables teams to know what data should be surfaced in accordance with governance guidelines, ultimately simplifying data management and data usage.
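A data contract can be sketched in miniature as a schema check between producers and consumers: records that violate the agreed shape are caught before they reach downstream users. This is a simplified, hypothetical illustration; the field names and the contract format are assumptions, not a real product's schema.

```python
# Hypothetical data contract sketch: producers agree to a schema, and a
# lightweight check surfaces violations before data flows downstream.

CONTRACT = {
    "order_id": int,
    "amount": float,
    "currency": str,
}

def validate(record, contract=CONTRACT):
    """Return a list of contract violations for one record."""
    errors = []
    for field, expected in contract.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

good = {"order_id": 1, "amount": 9.99, "currency": "USD"}
bad = {"order_id": "1", "amount": 9.99}
print(validate(good))  # []
print(validate(bad))   # two violations: wrong type, missing field
```

In practice, contracts also cover semantics, freshness, and ownership, but even a schema-level agreement like this simplifies data management by making expectations explicit.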

Finally, Alberini explained that data observability is critical in avoiding data downtime and, in turn, lost revenue. Instead of being tasked with firefighting, data observability allows data teams to anticipate data downtime by understanding the health of their ecosystems.
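One pillar of data observability, freshness monitoring, can be sketched as a simple check: flag a dataset whose most recent load is older than its expected window, so the team learns about stale data before a broken report does. This is a minimal illustration under assumed names, not any vendor's implementation.

```python
# Hypothetical freshness check: alert when a table has not been updated
# within its expected window, rather than firefighting after the fact.

from datetime import datetime, timedelta, timezone

def is_stale(last_updated, max_age_hours):
    """Flag a dataset whose latest load is older than its SLA window."""
    age = datetime.now(timezone.utc) - last_updated
    return age > timedelta(hours=max_age_hours)

now = datetime.now(timezone.utc)
print(is_stale(now - timedelta(hours=1), 24))   # False: within the window
print(is_stale(now - timedelta(hours=30), 24))  # True: stale, raise an alert
```

Full observability platforms track more dimensions than freshness (volume, schema, lineage, distribution), but each monitor follows this shape: a known expectation about ecosystem health, checked continuously.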

John de Saint Phalle, senior product manager at Precisely, noted that amid the massive hype around cloud migration and modernization, organizations need to be careful; data integrity, de Saint Phalle argued, is a business imperative.

According to a 2019 Data Trends survey, 68% of organizations say disparate data negatively impacts their organization, while an HBR study further highlights data quality challenges, finding that 47% of newly created data records have at least one critical error.

Poor data quality, then, can only lead to poor data engineering; de Saint Phalle explained that data with maximum accuracy, consistency, and context for confident business decision-making is key for empowering an efficient data architecture.

He introduced Precisely’s Data Integrity Suite, which is composed of numerous data services—including those for data integration, observability, governance, quality, enrichment, geo addressing, and spatial analytics—that enable enterprises to deliver the accurate, consistent, and contextual data necessary in modern industry.

In particular, the data integration piece is key toward driving an effective, modern data architecture; breaking down silos by quickly building modern data pipelines that drive innovation, de Saint Phalle explained, provides a data quality foundation that is capable of empowering data engineering tasks with positive outcomes.

You can view an archived version of this webinar here.