Pentaho Meets Big Data Challenges by Leveraging Amazon Web Services


Pentaho Corporation, a provider of open source business intelligence (BI) and data integration software, has announced plans to enable Pentaho Data Integration for Hadoop to integrate with Hadoop data stored in Amazon Elastic MapReduce. As a result, Amazon Web Services LLC (AWS) customers can use Pentaho's ETL capabilities to deploy a hybrid data model and move data between Amazon Elastic MapReduce and databases, data warehouses, and other cloud-based and on-premise data stores.

According to Pentaho, its offering for Amazon Elastic MapReduce is a tightly integrated report designer that will give AWS customers the option to build production or ad hoc reports from data spanning AWS and on-premise data sources. Expected to be generally available in November, it will be a pay-as-you-go utility offering that leverages the elastic nature of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon Simple Storage Service (Amazon S3).

"Typically, people are not going to use Hadoop simply as the only data source," Joe Nicholson, vice president of product marketing at Pentaho, tells 5 Minute Briefing. "It is more likely to be a hybrid of unstructured data stored in Hadoop, and structured data stored in traditional sources like a relational database or a data warehouse. With this partnership with Amazon Web Services, we basically now have an offering that allows people to store all this Hadoop data in Amazon EC2, and then we can actually treat that as simply another data source. It is a drag and drop environment."

Users can move data out of Hadoop on Amazon EC2, transform it, make it usable for end users, combine it with data in their existing data warehouse, and then also "be able to generate reports and analytics on what is going on in those various data sources," Nicholson explains.

Read the blog post "Data, Data, Data" by Pentaho founder and CEO Richard Daley on Business Intelligence from the Swamp.

Download the Pentaho Data Integration and BI Suite for Hadoop release candidate now.