Today at Spark Summit, MapR Technologies is announcing a new enterprise-grade Apache Spark Distribution. According to MapR, the advantages for users are that the distribution includes the complete Spark stack engineered to support advanced analytics applications, patented innovations in the MapR Platform, and key open source projects that complement Spark.
The new Spark Distribution option for the MapR Converged Data Platform enables advanced analytics, including batch processing, machine learning, procedural SQL, and graph computation, and provides a production-ready platform for Spark workloads on-premises and in the cloud.
“This is a Spark-focused distribution that combines Apache Spark with the real time, persistent, web-scale data layer of MapR,” said Jack Norris, SVP, Data and Applications, MapR.
Because Spark runs seamlessly on the MapR platform, it benefits from the platform’s patented enterprise-grade features, such as web-scale storage, high availability, mirroring, snapshots, NFS, integrated security, and global namespace, according to MapR. Product extensions of the Spark Distribution could include real-time streaming and operational analytic capabilities, with MapR-Streams, MapR-DB, and Hadoop as add-ons.
The new Spark Distribution option for the MapR Converged Data Platform is significant for those organizations that are getting into big data and starting with Spark, said Norris. “The distribution simplifies the deployment and the management of the data, and improves the analysis.”
With this new distribution optimized for Spark, MapR says it is continuing to expand its commitment to the open source community with offerings tailored toward specific compute processing engines.
The new distribution includes the latest Spark version, which delivers in-memory processing for big data, enables faster application development, and allows code reuse across batch, interactive, and streaming applications.
“With the other Spark options that are out there, you really have to separate the data in motion from the data at rest, and now, the ease of development and the power of Spark is available across all of your data because it can be persisted with MapR and available,” said Norris. “You can have long-term trends, as well as freshly arriving second-by-second data.”
MapR will also use its Spark Distribution in its Quick Start Solution offerings, which include pre-built templates, configuration, and installation. The most popular use cases for Spark, according to MapR, include building data pipelines and developing advanced analytical applications that leverage machine learning.
Spark is one of MapR’s fastest-growing curriculum areas for online courses, said Norris. “They are all free and available on-demand and location-independent to provide flexibility for people who are interested in acquiring skills as developers, administrators, or data analysts.”
The new Spark Distribution is available now in the MapR Converged Community Edition and the MapR Converged Enterprise Edition. MapR will be showcasing its product offerings, free online Spark courses, and Spark Certification this week at Spark Summit.