In a Data Summit 2022 keynote, Thomas Hazel, founder and CTO of ChaosSearch, examined the tools and technologies for getting more value from data and how to determine which ones are right for your organization.
When it comes to data analytics infrastructure, there is a wealth of options for storing and querying information, noted Hazel, who offered definitions of a data lake, a lakehouse, and a data mesh.
According to Hazel:
A data lake is:
- A centralized repository of data in its raw form
- As a service with security/scalability/durability
- The foundation for archival, ETL-ing, and analytics
A lakehouse is:
- A combination of data lakes and warehouses
- A subset of data platforms focused on SQL/BI
- Data lake storage with data warehouse rules
Data mesh is:
- In a way, the next generation of data fabric
- Data integration with distributed governance
- Data as a service driven by human experts
New technological approaches allow for more flexibility in cloud data management and are democratizing data for use across the organization. By stripping away data engineering complexity and lowering the total cost of infrastructure ownership and maintenance, these approaches are helping more organizations unlock the value of analytics at scale.
Although data mesh is still new, there is a lot of interest in it, said Hazel, citing a ChaosSearch/Unisphere Research survey that found that:
- Fewer than 20% of respondents are using mesh/fabric
- More than 80% of respondents are still using warehouses
- Less than 2 years of architecture maturity for mesh/fabric
- More than 5 years of architecture maturity for lake/lakehouse
- More than 10 years of architecture maturity for warehouses
In terms of spending, the report found that:
- The warehouse dominates and investment is still growing
- Data lakes and lakehouses are where the new investment is
- Data mesh/fabric is on the rise, but newcomers are cautious
- For data mesh/fabric in production, respondents are bullish
In closing, Hazel advised attendees who are considering a new technology, such as a data lake, lakehouse, or data mesh, to continue investing in and working with what has already been successful, but also to experiment with new architectures.
Many Data Summit 2022 presenters are also making their slide decks available for review.