RESEARCH@DBTA: Data Mesh, Data Fabric: Ideas Whose Time Has Come, Survey Shows

The data lakehouse, data fabric, and data mesh are no longer just dreams shared by analysts at conference presentations.

A new survey of more than 200 IT leaders by Unisphere Research, a division of Information Today, Inc., finds considerable uptake of these modern data architectures. “The Move to Modern Data Architecture: 2022 Data Delivery and Consumption Patterns Survey,” May 2022, was conducted in partnership with ChaosSearch and included input from directors/managers of IT, directors/managers of analytics, and data architects.

Industries represented included technology, financial services, healthcare, manufacturing, and telecommunications. Two out of five survey respondents were from companies with 5,000 or more employees.

For the purposes of this survey, data lakehouses are defined as architectures that combine the structure of a data warehouse with the openness of a data lake and run on commodity infrastructure. Data mesh is a highly decentralized, self-service architecture in which datasets are managed or controlled by individual business units across the enterprise. Data fabric is a more centralized architecture that uses metadata to integrate multiple, disparate data platforms and pipelines and to simplify access to these assets.

Data warehouses still dominate the enterprise scene, employed at 82% of enterprises in the survey. In addition, 36% report they exclusively rely on data warehouses and have yet to adopt other forms of architecture, such as data lakes, data lakehouses, or data mesh. About half, 49%, currently use data lakes, while 21% are now employing data lakehouses. Another 19% say they currently have data mesh and/or data fabric architectures in place.

While data lakehouses are still relatively new on the scene, 43% of companies plan to start investing or increase investments in these types of platforms. Respondents already using data lakehouses are even more committed: 94% plan to increase their spending, and 40% intend to increase their levels of investment significantly, suggesting they are seeing value once these platforms are in place.

Training is of the essence as companies shift to these emerging data architectures. Tellingly, a high percentage of data lakehouse users are dedicating resources to skilled staffing. Seventy-eight percent see committing staffing and training resources to data lakehouse development as a priority, with 44% indicating this is a “high priority.”

While fewer than one in five enterprises are using some variation of data mesh and fabric architectures, many are cautiously eyeing the technologies. Because these are relatively new approaches, it’s no surprise that investment plans are relatively tepid overall: only 47% of respondents intend to start investing in them or to increase their levels of funding.

Those already implementing data meshes and fabrics, however, are bullish on ramping up their investments in these approaches: 92% intend to increase their funding.

As is the case with data lakehouse users, commitment to staffing and training among data mesh/fabric users runs extremely high. Sixty-one percent consider staffing allocation and skills development to be a “high priority,” far outpacing the levels of priority seen among those focusing on other data environments.

The growing diversity of data architecture environments, which now span data lakes, lakehouses, mesh, and fabric, means more sophistication and planning are required to manage and deliver data insights and reports. Data quality and timeliness are the most pressing issues cited: close to two-thirds of respondents (65%) report these issues have increased over the past 3 years, with 20% describing the increase as “substantial.”

Data quality is a more pressing issue in modern data architectures than in traditional data warehouses, where data is validated, loaded and transformed, and managed in a central repository. While only 27% of data warehouse users express concerns about data quality, this percentage doubles among those employing data mesh/fabrics, lakes, and lakehouses.

Data lakehouse users also report an increase in data delivery issues, cited by 75% of them. Data warehouse users are not immune to these data crunches either: 69% say these issues have increased.

IT leaders now have a plethora of choices in data platforms, architecture patterns, and enabling technologies for building modern analytics ecosystems, and cloud will play an increasingly central role. A growing number of enterprises are now adopting next-generation solutions such as data lakehouses, data fabric, and data mesh.

These new architectures are being pressed into service to support leading-edge initiatives involving artificial intelligence, machine learning, and the Internet of Things.