ClearML, the open source, unified MLOps platform, is unveiling its recent certification to run NVIDIA AI Enterprise, the end-to-end platform for building accelerated production AI. Following ClearML’s integration with the NVIDIA AI Enterprise software suite, this certification improves ClearML’s efficiency and GPU utilization, as well as ensuring compatibility with NVIDIA DGX systems and NVIDIA-Certified Systems.
As machine learning (ML) becomes increasingly critical to the modern enterprise, the ability to maximize its effectiveness and value will determine how efficiently enterprises can leverage it.
ClearML requires little to no overhead for computing infrastructure management, enabling enterprises to develop, orchestrate, and scale ML workflows, according to the vendor. Enterprises seeking frictionless MLOps solutions can rely on ClearML to easily embrace AI while allowing its users to focus on ML code and automation.
The NVIDIA certification will strengthen ClearML’s ability to optimize across ML workflows, while ensuring that using ClearML does not degrade the performance of the underlying NVIDIA hardware. The certification also covers use of the NVIDIA AI Enterprise software suite, including the NVIDIA TAO Toolkit, which allows users to create custom, production-ready AI models.
“The ability to deploy ClearML control planes on top of their on-prem Kubernetes clusters, backed by NVIDIA hardware, and get the best usage from the hardware that you bought, is very exciting,” said Moses Guttmann, CEO and co-founder of ClearML. “It does this in a way that expands the access to all the different stakeholders in the company, which means not only just data scientists but also product managers—allowing them to automate processes that data science people created for machine learning engineers to effectively build pipelines on top of all the new hardware that was purchased—and for the DevOps people to very easily manage this entire cluster without having to constantly support all the different users.”
To further increase the platform’s efficiency, ClearML utilizes NVIDIA Multi-Instance GPU (MIG) technology, which can partition a GPU into as many as seven instances, each equipped with high-bandwidth memory, cache, and compute cores. This dramatically optimizes GPU usage, enabling enterprises to match the appropriate computing power to each AI workload, according to the vendor. A variety of workloads can run simultaneously within a single GPU, with instances accessible through containers.
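On MIG-capable GPUs (such as the NVIDIA A100), this partitioning is typically administered with the `nvidia-smi` tool. A minimal sketch of the workflow described above follows; the device index and the `3g.20gb` instance profile are illustrative assumptions, since available profiles depend on the specific GPU model, and these commands require root access on MIG-capable hardware:

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset to take effect)
nvidia-smi -i 0 -mig 1

# Create two GPU instances with the (hypothetical for this example)
# 3g.20gb profile; -C also creates the matching compute instances
nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C

# List the resulting devices; each MIG instance appears separately and
# can be targeted by an individual container or workload
nvidia-smi -L
```

Each listed MIG device can then be assigned to a different user or job, which is how several smaller workloads share one physical card at the same time.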
“NVIDIA's latest capability allows users to slice a single GPU card into multiple instances, which means the same card basically serves multiple users. And this is important, especially when you're developing the code itself, where you have to have access to the hardware itself. But at the same time, the goal is not to fully utilize it, but to basically debug and test your code on top of it,” explained Guttmann. “So, you can have multiple users sharing the same card, allowing the other cards to be utilized for more of the complicated aspects of things.”
“It allows you to actually train the models or run them in production and still have the ability for the developers to have access to this high-end GPU. On the other hand, it also allows customers to better utilize smaller models running at the same time on the same card for the exact same reasons,” said Guttmann.
ClearML offers a free tier of its platform, as well as a self-hosted version. The company also provides in-depth tutorials on its YouTube channel and maintains a Slack channel for additional support.
To learn more about ClearML’s platform and its recent NVIDIA certification, please visit https://clear.ml/.