Data Management Companies Roll Out New Spark Integrations at Spark Summit

This week at Spark Summit, data management companies are rolling out new Spark integrations and support to help their users take advantage of the open source data processing framework. In addition, Databricks, the company founded by the team that created Apache Spark, has announced that the Databricks Community Edition (DCE) is now generally available.

Among the announcements:

Microsoft made an extensive commitment to using Spark to power its big data and analytics offerings, including Cortana Intelligence Suite, Power BI, and Microsoft R Server; MapR Technologies announced a new dedicated, enterprise-grade Apache Spark distribution; NoSQL database vendor Couchbase introduced a new Couchbase Spark Connector; Teradata unveiled the Teradata Aster Connector for Spark, which integrates Apache Spark analytics with Teradata Aster Analytics; and IBM announced a range of new initiatives that build on the $300 million investment in Apache Spark it made a year ago.

According to Qubole, a cloud-based big data provider whose Qubole Data Service (QDS) automates most open source data engines on the three largest public clouds (AWS, Azure, and Google Cloud), Spark has grown in popularity on QDS more dramatically than any other engine over the past year.

Qubole provides Spark as a service to customers including Pinterest, DataLogix, and TubeMogul, and says its statistics show particularly strong Spark adoption over the past six months (November 2015 to April 2016). During that period, Spark Qubole compute usage hours (QCUH) increased by 36%, the number of Spark commands issued on QDS rose by approximately 300%, the number of paying Spark customers grew by 50%, and the number of unique Spark clusters started grew by 49%.

The newly released DCE, a free version of the Databricks data platform built on top of open source Apache Spark, gives users access to a 6GB micro-cluster, a cluster manager, and a notebook environment for prototyping simple applications. As a learning tool, DCE comes with a portfolio of Apache Spark learning resources, including a set of MOOCs and sample notebooks.

“This year we’ve seen explosive growth for the Apache Spark project and all signs indicate the pace will only accelerate as the community expands even more,” said Matei Zaharia, co-founder and chief technology officer at Databricks. “Databricks Community Edition has created an ideal environment for learning Apache Spark. Developers of all backgrounds can now use Databricks Community Edition to learn Spark and mitigate the acute Spark skills gap.”

Since the beta launch of Databricks Community Edition in February 2016, the company says, more than 8,000 users have registered for the free platform. Users have created over 61,000 notebooks in four languages: Python, Scala, SQL, and R. A Databricks survey of DCE beta users found that 60% were neither data scientists nor engineers, which the company says shows DCE is enabling a new category of people to learn Apache Spark, data science, and data engineering skills.

“More than 2,200 students have already taken courses using Databricks Community Edition since its beta release, and with its general availability, we expect widespread adoption by universities across the world,” said Ion Stoica, co-founder and executive chairman at Databricks.

To sign up for Databricks Community Edition, visit