Richard Winter, CEO & Principal Consultant, Wintercorp LLC and Faculty Member, TDWI (tdwi.org), talked about how data warehouses have changed since the 1990s, the confluence of trends affecting data warehouses today, and how to select one that works for you in a session at Data Summit 2022.
Winter described the origin of the data warehouse, which was created to bring data together for consistency and make it useful, and how it has evolved since then.
Despite the advances made by data warehouses over the years, a one-size-fits-all approach will not work for many companies. Scalability must be considered when selecting a data warehouse, and it depends on three factors: data complexity, data size, and query complexity. Serious mistakes can be made in choosing a data warehouse if these issues are not considered.
At the lower end of the spectrum, you can choose any garden-variety data warehouse and it will probably meet your requirements, but as the combination of scale and complexity increases, so does the risk, and you have to be very careful.
Where do you begin? You have to define quantified architectural requirements:
- The database macro structure
- Workload, e.g., query classes and frequencies
- User population
- Service levels
Then do a quantified evaluation with measurement: work from your estimate of macro requirements, and use the test results to inform your architectural evaluation. It is critical to test for the workload and database you anticipate in the future rather than for your current needs, Winter stressed.
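As a rough illustration of what "quantified" can mean in practice, the sketch below records projected query classes with service-level targets and checks measured test latencies against them. It is not from Winter's session: the class and function names, the choice of 95th-percentile latency as the service level, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class QueryClass:
    """One class of queries in the projected workload (all values hypothetical)."""
    name: str
    queries_per_hour: int       # projected future frequency, not today's
    p95_target_seconds: float   # assumed service level for this class


def p95(samples: list[float]) -> float:
    """95th percentile of measured latencies (needs at least two samples)."""
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def evaluate(workload: list[QueryClass],
             measured: dict[str, list[float]]) -> dict[str, bool]:
    """Return, per query class, whether the measured p95 meets its target."""
    return {
        qc.name: p95(measured[qc.name]) <= qc.p95_target_seconds
        for qc in workload
    }


# Requirements sized for the anticipated future workload (made-up figures).
workload = [
    QueryClass("dashboard_lookup", queries_per_hour=50_000, p95_target_seconds=2.0),
    QueryClass("complex_join_report", queries_per_hour=200, p95_target_seconds=60.0),
]

# Latencies in seconds, as they might come from a test run on a candidate platform.
measured = {
    "dashboard_lookup": [0.4, 0.6, 0.5, 0.9, 1.1, 0.7],
    "complex_join_report": [35.0, 48.0, 72.0, 90.0, 41.0, 55.0],
}

results = evaluate(workload, measured)
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'misses'} its service level")
```

With these made-up measurements, the simple lookups meet their target while the complex reports miss theirs, which is the kind of gap a test against the projected workload is meant to expose before a platform is chosen.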
The coming decade is going to require a modern data warehouse to meet demanding new requirements for machine learning, data variety, and real-time analytics—while still satisfying the more traditional need for analysis of structured data at scale.
In closing, Winter said, the 2020s will require much more of data warehouses.
It is important to:
- Define your requirements quantitatively, looking forward
- Consider service levels, scale, database complexity, and query complexity
- Avoid assuming that cloud elasticity is the answer
- Recognize that performance and cost differences are driven by architecture differences
- Measure and test rather than accepting vendor claims
- Use a systematic process
Many Data Summit 2022 presentations are available for review at https://www.dbta.com/DataSummit/2022/Presentations.aspx.