KDnuggets, a community site for data professionals, ranked “We Don’t Need Data Scientists, We Need Data Engineers,” by Mihail Eric, a venture capitalist, researcher, and educator, as its top story of 2021. That sentiment rings even truer today, as enterprises rush to apply both generative and predictive AI to their operations. Without the right data, AI is dead in the water.
Data engineering (which includes not only data engineers by title but also their counterparts in adjacent fields such as database administration, management, architecture, and analysis) is what keeps AI initiatives alive, well, and thriving. Accordingly, data engineers have become the new stars of the AI-driven organization. For the purposes of this article, we use “data engineering” to span this whole cluster of data professional categories. Collectively, as part of an overall data engineering team, these professionals are setting the tone and providing the guidance needed to develop fair, accurate, and business-viable AI models.
Enterprises want to embrace large language models in a big way, which only means more demand for data engineering. Data engineers are essential as a professional category because, in many organizations, data scientists have been left to vet and manage data resources themselves, which diverts their time away from building data-driven narratives for their businesses. In addition, because AI models consume enormous volumes of data, organizations need robust, reliable data pipelines to keep their AI efforts effective.
This is fueling significant shifts in both the theory and the practice of data engineering. Demand for real-time, AI-ready data is creating new challenges and opportunities for data engineers and their colleagues in adjacent fields such as database administration, management, and analysis. In the process, data engineers have moved into the spotlight as enablers of the 21st-century enterprise.
This shift also calls for more business savvy in the data engineering skills mix. Conversely, business teams need a better understanding of their data and what it can do for their organizations. “Data practitioners are being asked to expand their knowledge of the business—while functional teams are finding they require their own internal data expertise to leverage their data,” a recent report from MIT Technology Review Insights states.
In essence, organizations are leaning heavily on data engineering teams to turn their data assets into gold. Organizational structure, data platform and architecture, and data governance are all essential to this process, especially once AI is involved. Data engineers and their colleagues in related roles are the go-to people who can make this happen.
The role of data engineering teams has always been clear: to design, construct, and maintain data architectures and to ensure the viability of data moving through the organization’s systems. That remains their primary mission, and it includes ensuring that data is available to applications when and where the business needs it.
Practices and technologies such as DevOps, DataOps, AIOps, and collaborative pipeline tools have emerged to help data engineering teams deliver on this mission. Automation has lifted many of the burdens of database preparation, data modeling, quality assurance, and backup and storage.
As a result, data engineering teams are being elevated from backroom maintenance to the forefront of the business. Data engineering is evolving into a more strategic role for businesses seeking to monetize data, use it to gain an advantage in their markets, or boost innovation. It also involves serving as guardians of the data, ensuring compliance, cybersecurity, and privacy. Above all, data engineering means making sure the data is there and ready whenever it is needed. This new importance has resulted in “staggering growth in data engineering jobs,” the MIT report states.
The evolving nature of data engineering can be seen in recent job descriptions:
- Ensure data accessibility for all: “Focus on centralizing our existing data into a library to make it more accessible to our teams and departments. Combine traditional and loosely connected intelligence data, developing and maintaining ETL processes, and building custom data solutions. Build and maintain scalable pipelines and infrastructure using AWS Step Functions.”
- Oversee both data analysis and data governance: “Analyze business processes to identify areas for improvement and optimization. Collaborate with stakeholders to gather and document business requirements. Provide insights and recommendations based on data analysis and business needs. Develop and maintain data governance frameworks. Ensure data quality and integrity across various business processes.”
- Provide leadership to data teams: “Report to the chief technology officer and focus on driving the data strategy as a connected part of the brand ecosystem. Responsible for leading the development of data assets, privacy frameworks, and data standards and providing input into and managing the data roadmap. Contribute to privacy frameworks, terms, consents, and approaches to ensure we empower consumers, to effectively leverage data for their advantage, to abide by all laws and relevant best practices.”
- Take the lead on AI initiatives: “Collaborate with data scientists and analysts to understand data requirements and translate them into scalable, high performant data pipeline solutions. Support data discovery and data preparation for model development. Perform detailed analysis of raw data sources by applying business context and collaborate with cross-functional teams to transform raw data into curated and certified data assets to be used for machine learning and business intelligence use cases. Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues. Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data and machine learning lifecycle.”
Key to all data engineering roles is putting the business first and tying all activity directly to business requirements. Today’s data engineer needs to be a technologist, leader, facilitator, and troubleshooter.
This calls for close collaboration with data scientists and AI training specialists to ensure their models receive the data needed to support business decision-makers and automated decisioning systems. Just as importantly, data engineering teams need to work in tandem with data owners to ensure that the right data sources are being tapped, and with end users to ensure they are working with the best available information.
The task in today’s environment is to remove the obstacles to effective data engineering: keeping data pipelines flowing, automating much of the process, developing and operating pipelines collaboratively, and keeping data insights aligned with business requirements.
Even if your job title is something other than “data engineer,” you now have a role to play, as does everyone on the data team, in ensuring the viability of a data-driven and AI-driven business.