NVIDIA Unveils AI Platform to Minimize Downtime in Supercomputing Data Centers

NVIDIA has introduced the NVIDIA Mellanox UFM Cyber-AI platform, which aims to minimize downtime in InfiniBand data centers by harnessing AI-powered analytics to detect security threats and operational issues, as well as predict network failures.

According to NVIDIA, the platform extends the UFM product portfolio, which has managed InfiniBand systems for nearly a decade, and applies AI to learn a data center’s operational cadence and network workload patterns, drawing on both real-time and historical telemetry and workload data. Against this baseline, it tracks the system’s health and network modifications, and detects performance degradation and changes in usage and profile.
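To make the baseline idea concrete, the following is a minimal sketch of baseline-driven anomaly detection in Python. It is not NVIDIA's method or the UFM API: the function names, the hypothetical link-utilization figures, and the simple z-score threshold are all assumptions chosen for illustration, standing in for whatever models the product actually uses.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize historical telemetry samples as (mean, sample standard deviation)."""
    return mean(history), stdev(history)

def find_anomalies(samples, baseline, threshold=3.0):
    """Return indices of samples lying more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any deviation at all is treated as anomalous.
        return [i for i, x in enumerate(samples) if x != mu]
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]

# Hypothetical link-utilization history, in percent (illustrative values only).
history = [40, 42, 41, 39, 40, 43, 41, 40, 42, 41]
baseline = build_baseline(history)

# Hypothetical live samples: a sudden spike and a sudden drop.
live = [41, 40, 95, 42, 5]
print(find_anomalies(live, baseline))  # → [2, 4]
```

A fixed statistical threshold like this only flags sharp deviations in a single metric; a production system would presumably correlate many signals over time, which is where learned models come in.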

The new platform issues alerts on abnormal system and application behavior and on potential system failures and threats, and it can perform corrective actions. It is also intended to deliver security alerts when attackers attempt to hijack systems to host undesired applications, such as cryptocurrency mining.

“The UFM Cyber-AI platform determines a data center’s unique vital signs and uses them to identify performance degradation, component failures and abnormal usage patterns,” said Gilad Shainer, senior vice president of marketing for Mellanox networking at NVIDIA. “It allows system administrators to quickly detect and respond to potential security threats and address upcoming failures, saving cost and ensuring consistent service to customers.”

The UFM Cyber-AI platform complements the UFM Enterprise platform, which provides network monitoring, management, performance optimization, configuration checks and secure cable management.

NVIDIA has also added a third member to the UFM family, the UFM Telemetry platform, which captures real-time network telemetry data and streams it to an on-premises or cloud-based database for monitoring network performance and validating the network configuration.

More information is available about the UFM Appliance product line.