The Data Warehousing Sanity Check


Not only have the technologies of our data sources changed, but there are now new data sources, such as social media, sensor and machine data, system logs, and even video and audio. These sources are not only producing data at incredibly rapid rates and with an inherent mismatch to the relational model, but frequently there is also no internal ownership of the data, creating difficulties in governance and conformance to a rigid structure.
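To make the relational mismatch concrete, consider a single sensor reading. The event below is a hypothetical illustration (the device fields and metrics are invented for this sketch); a minimal Python example shows why nested, variable-shape records resist a fixed tabular schema.

```python
import json

# A hypothetical sensor reading, illustrating why such data resists a
# fixed relational schema: fields are nested, repeated, and vary by device.
event = {
    "device_id": "sensor-042",           # invented identifier, for illustration
    "ts": "2015-06-01T12:00:00Z",
    "readings": [                        # variable-length list of measurements
        {"metric": "temperature", "value": 21.7, "unit": "C"},
        {"metric": "humidity", "value": 0.43},
    ],
    "firmware": {"version": "1.4.2", "flags": ["low_power"]},
}

# Flattening this into rows requires splitting it across several tables
# (events, readings, flags) plus join keys -- schema work that must be
# redone whenever a device adds a new field.
print(json.dumps(event, indent=2))
```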

As business evolves from conventional and executive wisdom, through data-supported decision making, to become an analytics-driven enterprise, the technical data ecosystem must mature into a modern, engineered platform if the business is to remain competitive.

The Big Data Revolution Is Real

In response to new business demands, there has been substantial disruption in IT surrounding the tools and techniques used to store and process data. These innovations were created by relatively new technology companies and continue to evolve as they are embraced and expanded upon by other organizations with their own unique data challenges.  


The changes brought on by the big data era are not caused only by access to larger amounts of data; the true catalyst is that departments within organizations—with all types of data—now approach data problems in ways tailored to their specific needs. It is no longer a one-size-fits-all enterprise venture that requires each need to be molded into a traditional monolithic system. IT organizations must support departments in objectively designing and building analytics systems based on their specific business and data requirements, not on preconceived design approaches. This, of course, demands a larger variety of options in the technology landscape, which makes a comprehensive reference architecture for the new data ecosystem critical for manageability.

Engineer and author Martin Fowler coined the term “polyglot persistence” in reference to this movement. He defines polyglot persistence as the situation “where any decent sized enterprise will have a variety of different data storage technologies for diverse kinds of data. There will still be large amounts of it managed in relational stores, but increasingly we’ll be first asking how we want to manipulate the data and only then figuring out what technology is the best bet for it.”

In other words, it’s natural for an organization to adopt a variety of new storage and data processing technologies based on requirements. The concept is an extension of “polyglot programming” and “microservice architecture,” where languages and platforms are chosen based on the ability to tackle different types of problems.
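As a rough illustration of what polyglot persistence looks like in practice, the Python sketch below routes each kind of data to the store best suited to how it will be manipulated. The store classes and the persist routine are hypothetical stand-ins invented for this example, not the API of any particular product.

```python
# A minimal sketch of polyglot persistence: each kind of data goes to the
# store best suited to how it will be manipulated. The store names and
# interfaces are invented placeholders for real backends.

class RelationalStore:
    """Orders and other transactional records: joins, constraints, SQL."""
    def save(self, record):
        print("INSERT INTO orders ...", record)

class DocumentStore:
    """Product catalog: nested, schema-flexible documents."""
    def save(self, record):
        print("db.catalog.insert(...)", record)

class KeyValueStore:
    """Session state: simple lookups by key, high throughput."""
    def save(self, record):
        print("SET session:" + record["id"], record)

# Route by workload, not by a single enterprise-wide default.
STORES = {
    "order": RelationalStore(),
    "catalog_item": DocumentStore(),
    "session": KeyValueStore(),
}

def persist(kind, record):
    STORES[kind].save(record)

persist("order", {"id": "1001", "total": 42.50})
persist("session", {"id": "abc123", "user": "jdoe"})
```

The point of the routing table is Fowler's ordering: decide first how the data will be manipulated, and only then pick the store.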

The Lasting Relevance of the Data Warehouse

Rest assured that the data warehouse is still relevant in this new era—but it is not alone.

Once accepted by the mainstream, big data technologies such as Hadoop were picked up by various organizations to solve their most challenging data problems. Most deployments started out as proofs of concept and were then frequently launched in a production-like capacity. Unfortunately, many were built completely in silos, with no regard for enterprise architecture.

“Data quality” and “data stewardship” were treated as bad words, and concepts of the “old way” were almost completely ignored in design and implementation. Ultimately, many of these systems suffered from service-level issues and interoperability challenges with other systems, and consumers of the data came to distrust it.

The concepts behind data warehousing are now becoming critical again, particularly as they apply to big data systems. Analytic systems still need data governance, data quality, and data stewardship, and conformed master data and interoperability between applications still matter. The traditional EDW continues to have a place in modern data architecture, though its primary function has shifted.
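One way these warehousing disciplines carry over is as explicit quality gates in the ingest pipeline. The Python sketch below, with invented field names and rules, quarantines records that fail basic checks so data stewards can review them; it is an assumption-laden illustration, not any specific framework.

```python
# A minimal sketch of a data-quality gate on ingest -- the kind of check the
# "old way" formalized and early big data pipelines skipped. Required fields
# and rules are illustrative assumptions.

REQUIRED_FIELDS = {"customer_id", "event_time", "amount"}

def validate(record):
    """Return a list of quality problems; an empty list means the record passes."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    if "amount" in record and record["amount"] < 0:
        problems.append("amount must be non-negative")
    return problems

def ingest(records):
    good, quarantined = [], []
    for r in records:
        (quarantined if validate(r) else good).append(r)
    return good, quarantined  # quarantined records go to stewards for review

good, bad = ingest([
    {"customer_id": "C1", "event_time": "2015-06-01", "amount": 19.99},
    {"customer_id": "C2", "amount": -5.00},  # fails two checks
])
print(len(good), "accepted;", len(bad), "quarantined")
```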

Enter Agile and Collaboration

Historically, data solutions have been planned and managed with a waterfall approach: as development progresses through the phases of a project, data analysts, data engineers, and data scientists complete their work in separate tasks daisy-chained together. Genuine collaboration across these roles is the next step forward.
