VoltDB Announces Hadoop Integration

VoltDB, a provider of high-velocity data management systems, has announced the release of VoltDB Integration for Hadoop.  The new functionality, available in VoltDB Enterprise Edition, allows organizations to selectively stream high-velocity data from a VoltDB cluster into the Hadoop Distributed File System (HDFS).  It does so by leveraging Cloudera's Distribution Including Apache Hadoop (CDH), which includes Apache Sqoop, a SQL-to-Hadoop integration technology.

According to VoltDB, with the volume, velocity and variety of data exploding, fueled by social applications, sensor automation, mobile networking, and other data-intensive forces, organizations are increasingly turning to specialized, task-specific data management solutions.  VoltDB is designed to process high velocity data in real time, while Cloudera's Distribution Including Apache Hadoop (CDH) provides organizations with a reliable and elastic infrastructure for data processing and deep analytics.  VoltDB's Integration for Hadoop allows customers to rapidly move high velocity data from VoltDB to CDH for long-term storage and analysis.

Because VoltDB is purpose-built for high-velocity data, "we need to be very good at integrating with analytic databases so that customers that have high throughput, high velocity requirements on the front end, can then also take that data and move it into a long-term data store where they can get their deep analytics," VoltDB CEO Scott Jarr tells 5 Minute Briefing.

VoltDB Integration for Hadoop is designed to handle a variety of customer deployment scenarios, including end-user applications, site-based OEM installations and cloud-based deployments.

It combines VoltDB's enterprise-grade export environment with Apache Sqoop, a Cloudera-sponsored solution for integrating relational databases with Hadoop infrastructures, and delivers the following capabilities:

  • Simple, fast set-up.  Establishing integration between VoltDB and a Hadoop installation is fast and easy.  A user identifies which VoltDB data will be exported to Hadoop, then configures the VoltDB export client with the location of Hadoop, the location of the VoltDB cluster, Sqoop options such as output formatting, and other installation-specific instructions (e.g., frequency of import).  The VoltDB export client automatically manages periodic Sqoop jobs based on this configuration.  The entire set-up process can be completed in about 15 minutes.
  • Loosely coupled, push-pull operation.  VoltDB automatically pushes copies of export data, in real time, to the VoltDB export client, which in turn automatically queues that data.  The Sqoop receiver then pulls data from the VoltDB export client and imports it into HDFS at whatever frequency and in whatever amounts the user has defined.  VoltDB's export client manages its data buffer in a way that eliminates possible "impedance mismatches" (i.e., VoltDB exporting data faster than Sqoop imports that data).
  • Automatic overflow management.  VoltDB's export client also automatically writes overflow data to disk to optimize memory utilization.  This feature protects against large-scale overflows that could occur if the Sqoop receiver terminates, and allows export data to be retained across sessions if the VoltDB database is stopped.
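
The push-pull buffering and disk overflow described in the last two bullets can be illustrated with a minimal sketch.  This is not VoltDB's export client or Sqoop code; the class OverflowBuffer, its parameters, and the JSON-lines spill file are all hypothetical, chosen only to show how a producer can push rows at its own pace while a consumer pulls batches at a different pace, with excess rows spilled to disk instead of held in memory.

```python
import json
import os
import tempfile
from collections import deque


class OverflowBuffer:
    """Hypothetical FIFO buffer (not VoltDB's implementation).

    Keeps at most `max_in_memory` rows in RAM and appends any excess
    to a spill file on disk, one JSON document per line.
    """

    def __init__(self, max_in_memory, spill_path):
        self.max_in_memory = max_in_memory
        self.spill_path = spill_path
        self.memory = deque()
        self.spill_offset = 0  # byte offset of the next unread spilled row

    def _has_spilled(self):
        # True while the spill file still holds rows that were not pulled.
        return (os.path.exists(self.spill_path)
                and os.path.getsize(self.spill_path) > self.spill_offset)

    def push(self, row):
        """Producer side: called whenever a new export row arrives."""
        # While unread rows sit on disk, new rows must also go to disk,
        # otherwise they would overtake older rows and break FIFO order.
        if len(self.memory) < self.max_in_memory and not self._has_spilled():
            self.memory.append(row)
        else:
            with open(self.spill_path, "ab") as f:
                f.write(json.dumps(row).encode("utf-8") + b"\n")

    def pull(self, max_rows):
        """Consumer side: return up to `max_rows` rows, oldest first."""
        batch = []
        while self.memory and len(batch) < max_rows:
            batch.append(self.memory.popleft())
        if len(batch) < max_rows and self._has_spilled():
            with open(self.spill_path, "rb") as f:
                f.seek(self.spill_offset)
                while len(batch) < max_rows:
                    line = f.readline()
                    if not line:
                        break
                    batch.append(json.loads(line))
                self.spill_offset = f.tell()
        return batch


def demo():
    """Push faster than we pull, then drain; returns the pulled batches."""
    with tempfile.TemporaryDirectory() as tmp:
        buf = OverflowBuffer(max_in_memory=2,
                             spill_path=os.path.join(tmp, "overflow.jsonl"))
        for i in range(5):          # rows 2..4 overflow to disk
            buf.push({"id": i})
        first = buf.pull(3)         # two rows from memory, one from disk
        buf.push({"id": 5})         # queued behind the rows still on disk
        second = buf.pull(10)
        third = buf.pull(1)         # nothing left
        return first, second, third
```

In this sketch the spill file outlives the process, which loosely mirrors the idea of retaining export data across sessions, but the read offset is held only in memory; a real client would also need to persist its position and reclaim disk space, which is omitted here.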

Since VoltDB's sweet spot is in high-velocity data, Jarr says the company will be "somewhat agnostic" to the back-end infrastructures that customers want to use for the deep analytic workload and will be market- and customer-driven in terms of additional integrations going forward. "But I wouldn't be surprised to see more data warehouse integrations as well as additional Hadoop-type integrations," Jarr notes.

"We already had built and delivered a pretty well-heeled export technology for target data warehousing environments and we have strengthened and adapted it for exporting data into Hadoop," adds Fred Holahan, CMO, VoltDB.

VoltDB understands that it is not the only game in town, says Holahan.  Its role is to ingest very high-velocity data sources, organize the data, provide real-time analytics, and then spool the data out to target environments that permit very deep analysis.  "We do have the ability to generically support data warehousing targets as well right now, and over time as customers ask us for integrations with specific warehousing products, then we will adapt our generic export capability for each of those targets as we have done for Hadoop."

VoltDB has been developed under the leadership of Postgres and Ingres co-founder Mike Stonebraker. For more information, visit the VoltDB website at