Databricks Simplifies Management of Spark Workloads

Databricks has introduced a new offering to simplify the management of Apache Spark workloads in the cloud. “Databricks Serverless” is a managed computing platform for Apache Spark that allows teams to share a pool of computing resources while automatically isolating users and managing costs. The new offering aims to remove the complexity and cost of requiring users to manage their own Spark clusters.

Traditional cloud and on-premises platforms require teams or individuals to manage their own Spark clusters in order to enforce data security, isolate workloads, and configure resource allocation, an approach that is costly and highly complex, as every team must learn to manage its own clusters, said Ali Ghodsi, cofounder and chief executive officer at Databricks. With Databricks Serverless, organizations can use a single, automatically managed pool of resources and get best-in-class performance for all users at lower costs, Ghodsi said.

Additional benefits of Databricks’ Serverless offering include auto-managed configuration of clusters; scaling of local storage; adaptation to multiple users sharing the cluster; and greater security.

In a related announcement, Databricks has also launched Deep Learning Pipelines, a new library that integrates deep learning libraries, such as TensorFlow, with Apache Spark. This helps make deep learning more scalable, as Spark can process much larger datasets across multiple nodes, further democratizing AI and data science.

Databricks has also announced the general availability of Structured Streaming, enabling up to 5x higher throughput and allowing customers to get best-in-class latency while benefiting from Spark’s much simpler streaming APIs and lowering the operational cost of their streaming applications.

For more information, go to