Databricks, founded by the original creators of Apache Spark, has launched new capabilities to lower the barrier for enterprises to innovate with AI. The new capabilities include MLflow for developing an end-to-end machine learning workflow; Databricks Runtime for ML to simplify distributed machine learning; and Databricks Delta for data reliability and performance at scale.
“To derive value from AI, enterprises are dependent on their existing data and ability to iteratively do machine learning on massive datasets. Today’s data engineers and data scientists use numerous, disconnected tools to accomplish this, including a zoo of machine learning frameworks,” said Ali Ghodsi, co-founder and CEO at Databricks. “Both organizational and technology silos create friction and slow down projects, becoming an impediment to the highly iterative nature of AI projects. Unified Analytics is the way to increase collaboration between data engineers and data scientists and unify data processing and AI technologies.”
According to Databricks, data is vital to both training and productionizing machine learning. However, using machine learning in production is difficult because the development process is ad hoc, lacking tools to reproduce results, track experiments, and manage models. Databricks says it is alleviating this problem with MLflow, a new open source, cross-cloud framework that simplifies the machine learning workflow. With MLflow, organizations can package their code for reproducible runs, execute and compare hundreds of parallel experiments, leverage any hardware or software platform, and deploy models to production on a variety of serving platforms. MLflow integrates with Apache Spark, scikit-learn, TensorFlow, and other open source machine learning frameworks.
In addition, many organizations adopting distributed deep learning must juggle a variety of frameworks, such as TensorFlow, Keras, and Horovod, along with the complexity of managing distributed computing. The new Databricks Runtime for ML aims to eliminate this complexity with pre-configured environments tightly integrated with the most popular machine learning frameworks, including TensorFlow, Keras, XGBoost, and scikit-learn. Databricks is also addressing the need to scale deep learning by introducing GPU support on both AWS and Microsoft Azure. Data scientists can now feed data sets to models, then evaluate and deploy cutting-edge AI models on one unified engine.
As a key component of Databricks’ Unified Analytics Platform, the new Databricks Delta extends Apache Spark to simplify data engineering by providing high performance at scale, data reliability through transactional integrity, and the low latency of streaming systems. With Delta, the company says, organizations no longer have to trade off storage system properties against one another or spend resources moving data across systems. Hundreds of applications can now reliably upload, query, and update data at massive scale and low cost, ultimately making datasets ready for machine learning.
For more information, visit www.databricks.com.