IBM Unveils On-Chip Accelerated AI Processor

IBM unveiled details of its new Telum Processor, designed to bring deep learning inference to enterprise workloads and help address fraud in real time. Introduced at the annual Hot Chips conference, Telum is, IBM said, its first processor that contains on-chip acceleration for AI inferencing while a transaction is taking place.

Three years in development, the new on-chip hardware acceleration is designed to help customers achieve business insights at scale across banking, finance, trading, and insurance applications, as well as customer interactions. A Telum-based system is planned for the first half of 2022.

The chip contains eight processor cores with a deep superscalar, out-of-order instruction pipeline running at a clock frequency of more than 5GHz, optimized for the demands of heterogeneous, enterprise-class workloads. The completely redesigned cache and chip-interconnection infrastructure provides 32MB of cache per core and can scale to 32 Telum chips. The dual-chip module design contains 22 billion transistors and 19 miles of wire across 17 metal layers.

Telum is the first IBM chip with technology created by the IBM Research AI Hardware Center. In addition, Samsung is IBM's technology development partner for the Telum processor, which is developed in the 7nm EUV technology node.

"Telum will be the central processor chip for the next generation IBM Z and LinuxONE systems. Organizations that want help in preventing fraud in real time, or other use cases, will welcome these new IBM Z innovations designed to deliver in-transaction inference in real time and at scale," said Christian Jacobi, IBM distinguished engineer, IBM Z Hardware Development Systems, and Elpida Tzortzatos, IBM fellow, IBM Z AI Strategy & Architecture and CTO of z/OS, in an IBM blog post.

In another IBM blog post, Kailash Gopalakrishnan explained that AI is enabling automation across a wide spectrum of industries but requires very high computational horsepower.

Roughly six years ago, IBM started looking into building purpose-built AI hardware, anticipating future challenges that would require dedicated processing power for AI systems, said Gopalakrishnan.

"Over that time, we’ve built three generations of AI cores, and in 2019 launched the AI Hardware Center in Albany, New York to foster a wider AI hardware-software ecosystem. Since 2017, we’ve been consistently improving the performance efficiency of our AI chips, boosting power performance by 2.5 times each year."

Gopalakrishnan said IBM's goal is to continue improving AI hardware compute efficiency by 2.5 times every year for a decade, achieving 1,000 times better performance by 2029.
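The compound-growth arithmetic behind that target can be sketched with a few lines of plain Python (this is just a sanity check of the stated rate, not an IBM tool):

```python
import math

# Annual efficiency improvement factor stated by IBM.
annual_gain = 2.5

def cumulative_gain(years: float) -> float:
    """Cumulative improvement after the given number of years of compounding."""
    return annual_gain ** years

# Years of 2.5x-per-year compounding needed to reach a 1,000x gain.
years_to_1000x = math.log(1000) / math.log(annual_gain)

print(f"Gain after 5 years: {cumulative_gain(5):.0f}x")   # ~98x
print(f"Years to reach 1000x: {years_to_1000x:.1f}")      # ~7.5 years
```

At 2.5× per year, a 1,000× cumulative gain is reached in roughly seven and a half years of compounding, which is consistent with a 2029 target counted from the Telum era.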

"Our most recent AI core design was presented at the 2021 International Solid-State Circuits Conference (ISSCC) as the world’s first energy-efficient AI chip, which is at the forefront of low-precision training and inference AI, built atop 7nm chip technology. Over the past few years, we’ve been working with the IBM Systems teams to integrate the AI core technology from this chip into IBM Z. This work eventually became part of a Telum-based system, which we expect in the first half of next year. We see Telum as the next major step on a path for our processor technology, like the inventions of the mainframe and servers before."

IBM says that in a recent survey, 90% of respondents said that being able to build and run AI projects wherever their data resides is important. IBM Telum is designed to enable applications to run efficiently where the data resides, helping to overcome traditional enterprise AI approaches that tend to require significant memory and data movement capabilities to handle inferencing.

With Telum, the accelerator's close proximity to mission-critical data and applications means that enterprises can conduct high-volume inferencing for real-time-sensitive transactions without invoking off-platform AI solutions, which can impact performance. Clients can also build and train AI models off-platform, then deploy and infer on a Telum-enabled IBM system for analysis.

According to IBM, businesses typically apply detection techniques to catch fraud after it occurs, a process that can be time-consuming and compute-intensive due to the limitations of today's technology, particularly when fraud analysis and detection is conducted far away from mission-critical transactions and data. Due to latency requirements, complex fraud detection often cannot be completed in real time, meaning a bad actor could have already successfully purchased goods with a stolen credit card before the retailer is aware fraud has taken place.
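The difference between after-the-fact detection and in-transaction inference is essentially where the model is scored. A minimal, hypothetical sketch of the in-transaction pattern follows; the model, weights, and field names are invented for illustration and are not IBM's API:

```python
import math

# Toy pre-trained logistic-regression fraud model (placeholder weights).
WEIGHTS = {"amount": 0.002, "foreign": 1.5, "night": 0.8}
BIAS = -4.0

def fraud_score(txn: dict) -> float:
    """Return a fraud probability for a transaction via logistic regression."""
    z = BIAS + sum(WEIGHTS[k] * txn[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def process_transaction(txn: dict, threshold: float = 0.5) -> str:
    # Inference happens inline, on the transaction path, so the decision
    # is available before the purchase completes -- rather than in a
    # separate batch pass that runs after the goods have changed hands.
    if fraud_score(txn) >= threshold:
        return "declined"
    return "approved"

print(process_transaction({"amount": 40.0, "foreign": 0, "night": 0}))
print(process_transaction({"amount": 2500.0, "foreign": 1, "night": 1}))
```

The design point Telum targets is keeping this inline scoring step fast and consistent enough that it fits inside the latency budget of the transaction itself.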

IBM says the new chip's centralized design allows clients to leverage the full power of the AI processor for AI-specific workloads, making it ideal for financial services workloads like fraud detection, loan processing, clearing and settlement of trades, anti-money laundering and risk analysis. With these new innovations, clients will be positioned to enhance existing rules-based fraud detection or use machine learning, accelerate credit approval processes, improve customer service and profitability, identify which trades or transactions may fail, and propose solutions to create a more efficient settlement process.

"Keeping data on IBM Z offers many latency and data protection advantages," said Jacobi and Tzortzatos. "The IBM Telum processor is designed to help clients maximize these benefits, providing low and consistent latency for embedding AI into response time-sensitive transactions. This can enable customers to leverage the results of AI inference to better control the outcome of transactions before they complete."
