Pentaho Delivers Pentaho Data Integration 3.2 for the Cloud

Pentaho Corporation, a provider of commercial open source business intelligence and data integration software, has announced Pentaho Data Integration 3.2, a new release of Pentaho's extract, transform, and load (ETL) tool with enhancements for cloud- and cluster-based ETL, as well as new product features such as additional transformation capabilities and connections to new data sources.

Pentaho Data Integration is already in production on the Amazon cloud supporting cloud-based data warehouses, and now with Pentaho Data Integration 3.2, a parallel computing environment can be set up automatically and dynamically, with no manual configuration required. Lance Walter, vice president of marketing for Pentaho, tells 5 Minute Briefing that "a parallel computing environment for ETL consists of a master server that defines and controls the ETL jobs, and multiple secondary, or slave, servers that do the actual work. Now, with Data Integration 3.2, administrators can input one command line to initiate an ETL job to run across a given number of secondary servers, and the software will dynamically create the secondary server instances and register them with the master server, so that manually configuring the connection to each cloud-based secondary server is no longer required. This saves time and simplifies cloud ETL deployment for administrators."
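The master and secondary arrangement Walter describes can be illustrated with a minimal Python sketch. It is not Pentaho's implementation or API: the `Master` class, the `start_cluster` function, the `worker-N` names, and the round-robin assignment are all hypothetical, chosen only to show secondary servers being created and registered with a master from a single call that specifies how many are wanted.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master server: tracks registered workers and assigns work."""
    workers: list = field(default_factory=list)

    def register(self, worker_name):
        # A newly created secondary server announces itself to the master.
        self.workers.append(worker_name)

    def plan(self, rows):
        """Assign rows to the registered workers in round-robin order."""
        assignment = {name: [] for name in self.workers}
        for i, row in enumerate(rows):
            assignment[self.workers[i % len(self.workers)]].append(row)
        return assignment


def start_cluster(master, worker_count):
    """Create worker_count workers and register each one with the master.

    Stands in for the dynamic provisioning described in the article;
    no real cloud instances are started here.
    """
    for n in range(worker_count):
        master.register(f"worker-{n}")
    return master


# One call sets up the whole (simulated) cluster for a given worker count.
master = start_cluster(Master(), worker_count=3)
plan = master.plan(list(range(7)))
```

In this toy version the administrator's single command corresponds to the one `start_cluster` call; nothing about each worker's connection is configured by hand.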

Cloud computing is well suited to ETL processes, and Pentaho Data Integration 3.2 has been designed to take advantage of that. ETL has elastic computing needs: weekly or monthly variability in data integration volumes can cause utilization spikes, and this maps well onto cloud computing's ability to scale up and down according to load.

In addition to the new cloud computing capabilities, Pentaho Data Integration 3.2 adds features for source data connectivity, data transformation commands, documentation of transformation steps, and in-flight monitoring of ETL processes. One key new data source is connectivity to applications, and in the area of transformations, users can now add their own self-defined computations and formulas.

The latest development build of Pentaho Data Integration 3.2 is available immediately as an Amazon Machine Image (AMI) for testing and benchmarking purposes, and is expected to be generally available in the near future. For more information on Pentaho and the new Data Integration 3.2 release, go here.