When Data Virtualization?

Nov 30, 2010

By David Besemer and Robert Eve

"Evolution is not a force but a process; not a cause but a law." John Morley, On Compromise, published in 1886.

There comes a point in technology evolution when the question of adoption moves from "why implement" to "when to implement." Typically, we reach this inflection point when technology has developed sufficient functionality to represent a mature solution. Data virtualization has recently reached this inflection point, and the question is now "When data virtualization?" Its functionality has matured enough to suit both project-level and enterprise-scale implementations in a variety of use cases.

By avoiding the typical higher costs and longer lead-times associated with traditional approaches like data consolidation and replication, data virtualization addresses key enterprise-scale data integration challenges including incomplete information, data timeliness, source and consumer disparity, complexity, multiple physical locations, and governance. Let's look briefly at each of these challenges and at how data virtualization applies to them.

Challenges

In today's interconnected business environments, data more often than not needs to be related to other data to provide insight. By virtually combining multiple sources, both inside and outside the firewall, data virtualization overcomes the issue of incomplete information.

Up-to-the-minute data is now a key business requirement. Data virtualization uses high-performance query optimization algorithms to retrieve, on demand, the data required by consuming solutions while limiting the impact on source system performance, thus resolving the challenge of obtaining timely data.

Because data is diverse, with source structures and consumer needs rarely a match, the data abstraction capabilities of data virtualization transform raw data from its native structure and syntax into high-quality views and data services. Clearly described and standards-compliant, these objects are easy for solutions developers to understand and for solutions to consume.

In large enterprises, data is often difficult to identify. By automating entity and relationship identification, data virtualization accelerates data modeling to overcome complexity.

When information resides in multiple repositories and far-flung geographies, it may be hard for decision makers to reach. Data virtualization's wide data access securely exposes required data, making it available from a single virtual location, regardless of where it is physically stored.

Finally, data is a critical asset that must be governed. Data virtualization fully implements authentication, authorization, encryption and data quality policies and standards to ensure maximum control.

Evolving Needs, Evolving Solutions

Originally deployed to meet light data federation requirements in BI environments, today's data virtualization use cases span a range of consuming applications including customer experience management, risk management and compliance, supply chain management, mergers and acquisitions support, and more. Further, the range of data supported has grown beyond relational to include semi-structured XML, dimensional MDX, and the new NoSQL data types. Along the way, adoption has evolved from initial project-level deployments to enterprise-scale data virtualization layers that share data from multiple sources across multiple applications and uses.

At the same time, the data virtualization solutions themselves have evolved. From a vendor point of view, many of the early Enterprise Information Integration (EII) companies that entered the market in the early 2000s have been acquired or have exited the market, leaving a short list of suppliers able to meet today's more advanced data virtualization requirements. To fill this gap between supply and demand, new entrants from adjacent markets such as BI and Extract-Transform-Load (ETL) have recently announced data virtualization products that leverage these vendors' existing offerings.

How to Measure Data Virtualization Platform Maturity

For those not content to adopt an "I know it when I see it" attitude toward data virtualization, we have developed a Data Virtualization Platform Maturity Model. This model uses a five-stage maturity timeline to provide a common framework for measuring the various phases typical in software innovation, along with a chart identifying key functionality categories that, when combined, identify viable data virtualization platforms.

Maturity and Time

Figure 1 depicts the intersection of time (X axis) and product maturity (Y axis) in the first of two dimensions of the Data Virtualization Platform Maturity Model.

The entry-level phase is characterized by products with a minimal functionality set. The second, limited phase features product releases that satisfy initial customer demands within narrow (often vertical market) use cases. Moving up the maturity curve, the intermediate phase introduces expanded functionality in response to growing marketplace traction. These feature-rich products address an expanding set of use cases. As product releases mature, they enter the advanced phase, where products address more complex use cases and support large-scale, enterprise-wide infrastructure requirements. In the final, mature phase, products must increase their functional depth as they expand market penetration. Products in this phase may also incorporate functionality from adjacent functional areas.

Functionality Dimension

A fully mature data virtualization platform includes eight categories of functionality: query processing, transformation, information delivery, enterprise-scale operations, data source access, modeling and metadata management, caching, and security.

By overlaying the five stages in the maturity dimension across the eight categories on the functionality dimension as seen in Figure 2, enterprises and government agencies can use the Data Virtualization Platform Maturity Model to assess an offering's maturity.
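The overlay of stages and categories can be pictured as a small data structure. The following Python sketch is illustrative only: the stage and category names come from the model described here, but the `assess` function, the idea of summarizing by weakest category and mean stage, and the sample ratings are assumptions made for this example, not part of the published model.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The five maturity stages, ordered from least to most mature."""
    ENTRY_LEVEL = 1
    LIMITED = 2
    INTERMEDIATE = 3
    ADVANCED = 4
    MATURE = 5


# The eight functionality categories named in the model.
CATEGORIES = (
    "query processing",
    "transformation",
    "information delivery",
    "enterprise-scale operations",
    "data source access",
    "modeling and metadata management",
    "caching",
    "security",
)


def assess(ratings: dict[str, Stage]) -> dict:
    """Summarize one offering's ratings (hypothetical aggregation rule)."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"unrated categories: {missing}")
    weakest = min(CATEGORIES, key=lambda c: ratings[c])
    return {
        "weakest_category": weakest,
        "weakest_stage": ratings[weakest],
        "mean_stage": sum(ratings[c] for c in CATEGORIES) / len(CATEGORIES),
    }


# Made-up ratings for an imaginary offering: strong overall, weak on caching.
example = {category: Stage.ADVANCED for category in CATEGORIES}
example["caching"] = Stage.LIMITED
result = assess(example)
```

Reporting the weakest category alongside the average is one possible design choice; it keeps a single immature area from being hidden by strong scores elsewhere.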

At its core, data virtualization's primary purpose is on-demand, high-performance query of widely dispersed enterprise data. Consequently, data virtualization platforms must ensure that these queries are efficient and responsive. If the high-performance query processing engine is immature or poorly architected, the rest of the functionality is of little consequence. Maturity is typically measured by the breadth and efficiency of optimization algorithms.
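As a rough illustration of on-demand query across separate sources, the sketch below uses Python's built-in `sqlite3` module and attaches two independent in-memory databases as stand-ins for two source systems. The schema names (`crm`, `billing`), tables, and data are invented for this example. A real data virtualization platform would federate heterogeneous, remote sources and apply far more sophisticated optimization; this only shows the idea of a virtual view that joins sources at query time without copying data into a new store.

```python
import sqlite3

# One connection plays the role of the virtualization layer.
conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for two source systems.
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute(
    "CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
)
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?, ?)",
    [(1, "Acme", "EMEA"), (2, "Globex", "APAC")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 250.0), (2, 75.0)],
)

# A TEMP view may reference tables in any attached database, so it can
# act as a virtual, combined view over both sources.
conn.execute(
    """
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name, c.region, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name, c.region
    """
)

# The consumer queries the view; the join runs on demand.
rows = conn.execute(
    "SELECT name, total FROM customer_revenue WHERE region = ? ORDER BY name",
    ("EMEA",),
).fetchall()
```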

Because source data is rarely a 100 percent match with data consumer needs, data virtualization platforms must transform and improve data, typically abstracting disparate source data into standardized canonical models for easier sharing by multiple consumers. Maturity is measured in the ease of use, breadth, flexibility and extensibility of transformation functions.
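To make the idea of a canonical model concrete, here is a minimal Python sketch. The `Customer` class, the two mapping functions, and all field names (`CUST_NO`, `CTRY`, and so on) are hypothetical; they simply show two differently shaped source records being abstracted into one standardized form that multiple consumers could share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Canonical customer shape shared by all consumers (illustrative)."""
    customer_id: str
    name: str
    country: str


def from_crm(rec: dict) -> Customer:
    # Hypothetical CRM source: terse column names, inconsistent casing.
    return Customer(
        customer_id=str(rec["CUST_NO"]),
        name=rec["CUST_NM"].strip().title(),
        country=rec["CTRY"].upper(),
    )


def from_billing(rec: dict) -> Customer:
    # Hypothetical billing source: name split across two fields.
    return Customer(
        customer_id=str(rec["id"]),
        name=f'{rec["first"]} {rec["last"]}'.strip(),
        country=rec["country_code"].upper(),
    )


crm_row = {"CUST_NO": 42, "CUST_NM": "  ada lovelace ", "CTRY": "gb"}
billing_row = {"id": "42", "first": "Ada", "last": "Lovelace", "country_code": "GB"}

canonical_from_crm = from_crm(crm_row)
canonical_from_billing = from_billing(billing_row)
```

Once both sources map to the same canonical type, downstream consumers can compare, join, or deduplicate records without knowing either source's native structure.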

Enterprise end-users consume data using a wide variety of applications, visualization tools and analytics. Information consumers expect information delivered via standards-based data access mechanisms tailored to the enterprise or government agency information system requirements. Examples include XML documents via SOAP, or relational views via ODBC. Maturity is measured by the breadth of data consumer formats and protocols supported.

Because data virtualization serves critical business needs 24x7x365, enterprise-scale operational support is a core requirement. Data virtualization platforms should be highly deployable, reliable, available, scalable, manageable and maintainable. Maturity is measured in the breadth and depth of operational support capabilities.

Data virtualization platforms must reach and extract data efficiently from a wide variety of structured and semi-structured data sources. Further, they must include methods to programmatically extend data source access to handle unique, non-standard sources. Maturity in the data source access category is measured in the breadth of data source formats and protocols supported.
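Programmatic extension of data source access is often done through an adapter interface. The Python sketch below is one way this might look, not a description of any particular product: the `DataSource` protocol, both adapter classes, and the `key=value|key=value` line format are all invented for illustration.

```python
import csv
import io
from typing import Iterator, Protocol


class DataSource(Protocol):
    """Minimal contract every source adapter must satisfy (hypothetical)."""

    def rows(self) -> Iterator[dict]: ...


class CsvSource:
    """Adapter for a standard format: CSV text with a header row."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self._text))


class KeyValueLineSource:
    """Custom adapter for a non-standard 'key=value|key=value' line format."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines

    def rows(self) -> Iterator[dict]:
        for line in self._lines:
            yield dict(pair.split("=", 1) for pair in line.split("|"))


def read_all(sources: list[DataSource]) -> list[dict]:
    """Read every source through the same interface."""
    return [row for source in sources for row in source.rows()]


combined = read_all(
    [
        CsvSource("id,name\n1,Acme\n"),
        KeyValueLineSource(["id=2|name=Globex"]),
    ]
)
```

Supporting a new, unusual source then means writing one more class with a `rows` method, while the code that consumes sources stays unchanged.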

Modeling and development productivity, with its concomitant faster time to solution, is one of data virtualization's biggest benefits. To ensure data modeler and developer adoption, the tools must be intuitive to use and standards-based. Further, they must automate key work steps such as data discovery, code generation, and in-line testing, and provide tight links to the source control system, metadata repositories, and more. Maturity for modeling and metadata management is measured by the degree to which the data virtualization platform makes simple tasks easy and hard tasks possible.

In contrast to traditional data integration that periodically consolidated (or staged) data in physical stores, early stage data virtualization platforms dynamically combined data in-memory, on-demand. As data virtualization platform functionality has matured, the platform now delivers caching to address the middle ground between the two earlier approaches by enabling optional pre-materialization of queries' result sets. This flexibility typically improves query performance, works around unavailable sources, reduces source system loads, and more. Maturity is measured by the breadth of caching options for factors such as triggering, storage, distribution, update, etc.
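The caching behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `CachedView` class, its time-to-live refresh rule, and the fallback to stale results are choices made for this example, and the injectable `clock` parameter exists only to make the behavior easy to demonstrate. Real platforms offer many more options for triggering, storage, distribution, and update.

```python
import time
from typing import Any, Callable

_MISSING = object()  # sentinel: nothing has been cached yet


class CachedView:
    """Pre-materialized query result with a time-to-live (illustrative)."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = _MISSING
        self._loaded_at = 0.0

    def get(self) -> Any:
        now = self._clock()
        if self._value is not _MISSING and now - self._loaded_at < self._ttl:
            return self._value  # fresh: no load placed on the source
        try:
            self._value = self._fetch()
            self._loaded_at = now
        except Exception:
            if self._value is _MISSING:
                raise  # nothing cached to fall back on
            # Source unavailable: keep serving the last known result.
        return self._value


# Demonstration with a fake clock and a source that fails on its third call.
calls = []


def source() -> str:
    calls.append(1)
    if len(calls) == 3:
        raise ConnectionError("source down")
    return f"result-{len(calls)}"


now = [0.0]
view = CachedView(source, ttl_seconds=60, clock=lambda: now[0])

first = view.get()   # fetches from the source
second = view.get()  # served from cache
now[0] = 61.0
third = view.get()   # cache expired, fetches again
now[0] = 122.0
fourth = view.get()  # source fails, stale result is served
```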

Finally, security is always a concern and a major IT investment. But deploying data virtualization should not force reinvention of existing well-developed security policies. Instead, it should leverage existing standards and security frameworks. Maturity is measured in the breadth of authentication, authorization and encryption standards supported, as well as in the degree of transparency.

Evaluating Modeling and Metadata Management in Detail

Within each category, a number of specific capabilities can be evaluated. Figure 3 depicts evaluation criteria for Modeling and Metadata Management.

Conclusion

Data virtualization functionality has evolved to meet changing IT demands. This process has inevitably moved the adoption question from "why" to "when." To answer the question of "when data virtualization," we designed the Data Virtualization Platform Maturity Model. This detailed, systematic approach supports the initial evaluation of a data virtualization platform. This approach may also be applied during the subsequent development of a technology adoption roadmap; during the alignment process of executing the adoption roadmap; and/or to measure over time the viability of the selected data virtualization offering.

For a complete Data Virtualization Platform Maturity Model with all eight functional categories mapped to the five maturity stages, go here.
