EnterpriseDB Releases New Apache Spark Connector

EnterpriseDB (EDB) is launching a new version of the EDB Postgres Data Adapter for Hadoop that adds compatibility with the Apache Spark cluster computing framework.

The new version lets organizations combine analytic workloads based on the Hadoop Distributed File System (HDFS) with operational data in Postgres through an Apache Spark interface.

Apache Spark keeps data in memory across processing steps, which can make some analytics run much faster than disk-based approaches. Complex applications that chain multiple operations benefit from Spark, including analytics on streaming data, real-time marketing, cybersecurity analytics, machine log monitoring, and online recommendations.

The EDB Postgres Data Adapter for Hadoop, which EnterpriseDB developed and released to the Postgres user community, is a Foreign Data Wrapper (FDW) for Hadoop with Apache Spark compatibility. FDWs act as pipelines between Postgres databases and external data sources. They let PostgreSQL queries include structured or unstructured data from multiple sources, such as Postgres and NoSQL databases as well as HDFS, as if it all lived in a single database.

Enterprise data centers whose databases have evolved over the last decade will be able to use the adapter as a single access point, explained Jason Davis, senior director of product management at EnterpriseDB.

Enterprises that collect weblog information or store customer data for analytic purposes will benefit the most, according to Davis.

“This new announcement allows people to use Postgres to query and join that analytic information with information that they might be storing in Postgres that might be their system of record,” Davis said.

The company will continue to evolve its data adapters, improving their performance and expanding the set of data sources they connect to.

“Spark and Hadoop is a great win for us,” Davis said. “A lot of our customers that are running on Hadoop have started to use Spark over the last few years.”
