IBM to Fuel Spread of Apache Spark

Describing it as potentially the most important new open source project in a decade, IBM announced a major commitment to Apache Spark.

IBM plans to embed Spark into its Analytics and Commerce platforms, and to offer Spark as a service on IBM Cloud. IBM is committing the efforts of more than 3,500 IBM researchers and developers who will work on Spark-related projects at more than a dozen labs worldwide; donating its IBM SystemML machine learning technology to the Spark open source ecosystem; and educating more than one million data scientists and data engineers on Spark.

"IBM has been a decades-long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way," said Beth Smith, general manager, Analytics Platform, IBM Analytics. "Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation."

IBM will present a webcast on Thursday, June 25, 2015, at 11 am PT / 2 pm ET to discuss common use cases for Spark and the challenges IBM has encountered in using Spark and integrating it into its portfolio, and to share recommendations for others embarking on a similar path. The DBTA webcast will be presented by Trent Gray-Donald, distinguished engineer, IBM Analytics – Cloud Data Services, IBM; and Luis Arellano, program director, IBM.

As data and analytics become embedded in the fabric of business and society, from popular apps to the Internet of Things (IoT), Spark brings essential advances to large-scale data processing. First, it dramatically improves the performance of data-dependent apps. Second, it radically simplifies the development of intelligent, data-driven apps.

To further accelerate open source innovation for the Spark ecosystem, IBM will:

  • Build Spark into the core of the company's analytics and commerce platforms.
  • Have Watson Health Cloud leverage Spark as a key underpinning for its insight platform, helping to deliver faster time to value for medical providers and researchers as they access new analytics around population health data.
  • Open source its breakthrough IBM SystemML machine learning technology and collaborate with Databricks to advance Spark's machine learning capabilities.
  • Offer Spark as a cloud service on IBM Bluemix, making it possible for app developers to quickly load data, model it, and derive the predictive artifact to use in their apps.
  • Commit more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide, and open a Spark Technology Center in San Francisco for the Data Science and Developer community to foster design-led innovation in intelligent applications.
  • Educate more than 1 million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC.

As one of four founding members of the UC Berkeley AMPLab, where Spark was created in 2009, IBM participates in multi-day research retreats, provides advice and real-world insight, and works closely with AMPLab researchers on projects of mutual interest.




Subscribe to Big Data Quarterly E-Edition