The new year is chock-full of exciting possibilities for data and the people who work with it, along with ongoing challenges as data needs continue to evolve. As data grows in volume and velocity, data engineering teams must evolve to keep pace.
Shane Murray, field CTO at Monte Carlo, and Samia Rahman, director of enterprise data strategy and governance at Seagen, came together for a DBTA webinar, “2023 Predictions: Challenges and Opportunities for Data Engineering Teams,” to discuss their predictions for the most critical challenges and opportunities facing data engineering teams in the coming year.
Rahman emphasized the need to invest in data governance, particularly in privacy and compliance, so that data teams can adapt to the constantly evolving rules and regulations across a variety of industries.
Additionally, Rahman described a surge in a new breed of engineer, which she dubbed the full stack data engineer: one who understands both the front end of the business, working with app dev teams and data product managers, and the back end of data engineering.
Murray’s prediction centered on the unsettled BI sphere; he explained that current BI solutions are not meeting the needs of data teams or their consumers. He expects to see increased adoption, among data science and analytics teams, of collaborative notebooks that are well integrated with the modern data stack workflow.
Murray further anticipates more direct integration between the data warehouse and end-user tools, such as Google Sheets and Excel, as opposed to increased investment in traditional BI tools. Data reliability practices are also expected to expand, Murray noted, particularly beyond detection and resolution into more preventative measures.
In terms of cost optimization, Murray remarked that many workers are being asked to do more with less. One developing trend he noted is using metadata on the warehouse, whether from Monte Carlo or other sources, to better manage the complexity and cost of storage and compute in customers’ environments. He also pointed to the increased demand for knowing the ROI of investments in solutions and tools, a shift toward cost management that he described as overdue and welcome.
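As a rough, hypothetical illustration of this metadata-driven approach, the sketch below sums recent compute spend per warehouse from Snowflake’s ACCOUNT_USAGE metering history; the credentials, per-credit price, and 30-day window are assumptions made for the example, not anything prescribed in the webinar.

```python
# Minimal sketch: rank warehouses by recent compute spend using
# Snowflake's ACCOUNT_USAGE metadata. Connection parameters, the
# per-credit price, and the 30-day window are illustrative assumptions.
import snowflake.connector

CREDIT_PRICE_USD = 3.00  # assumed contract rate; check your own bill

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical credentials
    user="cost_auditor",
    password="...",
)

query = """
    SELECT warehouse_name,
           SUM(credits_used) AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits DESC
"""

with conn.cursor() as cur:
    cur.execute(query)
    for warehouse, credits in cur.fetchall():
        # Surface the biggest line items so teams know where to optimize first.
        print(f"{warehouse}: {credits:.1f} credits (~${credits * CREDIT_PRICE_USD:,.2f})")
```

Even a simple report like this turns an opaque monthly bill into a ranked list of optimization targets, which is the essence of the trend Murray described.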
Rahman highlighted how “ways of working, the team cost and structure, are things we are continuously monitoring in relation to the ROI; what the cost of a data product running [is], and how we can optimize that by building out the right platforms and right technology selections to reduce that cost, is top of mind for us.”
Additionally, Rahman pointed out the necessity of streamlining onboarding and offboarding processes to navigate the “talent wars” that may impede enterprise-wide productivity.
Transitioning into a discussion of data mesh and its relevance for the new year, Rahman offered context by defining data mesh as a socio-technical approach for streamlining how organizations manage data and get value out of it. It is a sum of people, processes, and tech, not just the tech, structured around domain topologies as an alternative to centralized approaches.
Murray added that when users define what a data product is within their data mesh, whether a production table or a set of tables, it becomes remarkably easier for those data teams to standardize on the skills required to contribute.
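As a loose sketch of what such a definition might look like in code (all names, tables, and SLA values here are hypothetical, not drawn from the webinar), a team could declare a data product as a small, reviewable structure:

```python
# Illustrative sketch: a lightweight, declarative definition of a data
# product as a set of warehouse tables plus ownership and SLA metadata.
# All names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str                       # accountable domain team
    tables: tuple[str, ...]          # the production table(s) it comprises
    freshness_sla_hours: int = 24    # how stale the data may get
    consumers: tuple[str, ...] = ()  # known downstream users

orders = DataProduct(
    name="orders_daily",
    owner="commerce-domain-team",
    tables=("analytics.orders_fact", "analytics.orders_dim_customer"),
    freshness_sla_hours=6,
    consumers=("finance-dashboard", "ml-forecasting"),
)
```

Once every domain expresses its products in the same shape, contributors can move between teams without relearning what “done” means, which is the standardization benefit Murray pointed to.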
Designing and building an enterprise data platform to generate products at scale has become a significant challenge for modern enterprises.
Murray explained that “for most orgs, I see this in three phases. Often, especially at larger companies, you’re starting with a legacy state where there is a lack of platform and potentially incoherent standards; shifting from there into a transformation where you’re building out a platform to support a clear set of initiatives and goals that drive immediate value for the business; and then shifting from there to a third phase where you’re actually scaling that platform to be extensible for future, less-defined problems and generalizing the technology of that platform to support a wider array of use cases. I tend to think of it as going from legacy to transformation to scaling.”
He added that setting the outcomes you wish to achieve and communicating those goals are critical to fostering a scalable data platform within an enterprise’s infrastructure.
Self-service for data platforms, Murray explained, exists in a hybrid state today; most platform teams he encounters both provide the platform as a service and manage critical data products or parts of the workflow themselves.
Rahman further noted that centralized data ingestion teams become bottlenecks when domain product teams request ingested data, wait in a queue, receive data that does not fully meet their needs, and go through repeated turnarounds until they finally get adequate insights. When something breaks, the same inefficient pattern plays out. Rahman emphasized the need to evolve this cycle by embedding source data engineers closely with domain teams to avoid bottlenecks.
Ensuring that data products are trustworthy is a growing priority for 2023, and data observability must meet this need for reliability and quality. Data observability platforms, such as Monte Carlo, provide teams with the tools to avoid downtime and untrusted data, Murray noted.
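To make the idea concrete, here is a minimal, generic freshness check of the sort observability platforms automate at scale; the table, timestamp column, and SLA threshold are illustrative assumptions, and this is not Monte Carlo’s actual API.

```python
# Minimal sketch of one data observability check: alert when a table's
# freshness breaches its SLA. Table name, column, and threshold are
# illustrative; real platforms run many such checks automatically.
from datetime import datetime, timedelta, timezone
import sqlite3  # stand-in for a real warehouse connection

FRESHNESS_SLA = timedelta(hours=6)  # assumed SLA for the example

def check_freshness(conn, table: str, ts_column: str) -> bool:
    """Return True if the table's newest row is within the SLA."""
    (latest,) = conn.execute(
        f"SELECT MAX({ts_column}) FROM {table}"
    ).fetchone()
    if latest is None:
        return False  # an empty table counts as a breach
    age = datetime.now(timezone.utc) - datetime.fromisoformat(latest)
    return age <= FRESHNESS_SLA

# Tiny in-memory demo of the check in action.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (updated_at TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (?)",
    (datetime.now(timezone.utc).isoformat(),),
)
if not check_freshness(conn, "orders", "updated_at"):
    print("ALERT: orders is stale; notify the owning team")
```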