IBM Demonstrates Extreme Scale for Content-Aware Storage


IBM is launching its CAS offering, making it faster, easier, and more secure to perform RAG under the same roof as the rest of your data.

Content-aware storage (CAS) represents a new value-add paradigm for traditional storage systems. CAS, which aligns storage solutions with the needs of new AI workloads, is centered on the pushdown of data processing functions into the storage layer. Specifically, CAS handles document vectorization using LLM-based embedding models, a process normally performed outside the storage system, to support the retrieval-augmented generation (RAG) pipeline.
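IBM has not published CAS internals, but the vectorization step it describes pushing down into the storage system looks, in generic form, like the sketch below. The sentence-transformers library, the model name, and the input file are illustrative assumptions, not IBM's stack:

```python
# Illustrative sketch of the generic vectorization step that CAS pushes down
# into storage: chunk a document, then embed each chunk. The library, model,
# and file name are assumptions for illustration, not IBM components.
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

document = open("report.txt").read()  # hypothetical input document
chunks = chunk_text(document)
embeddings = model.encode(chunks)     # one dense vector per chunk
print(embeddings.shape)               # e.g., (num_chunks, 384)
```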

This new paradigm is a key element of IBM’s vision to integrate AI capabilities directly into enterprise storage systems, enabling businesses to extract untapped value from their proprietary assets without costly infrastructure expansion.

“Enterprises can derive unprecedented insights from all of their documents in storage systems,” said Sam Werner, GM of IBM Storage. “It really opens the door to the next chapter in leveraging AI technology to drive business outcomes.”

According to IBM, at the core of the CAS solution is the vector database. Vector databases are designed to accelerate semantic searches of data, finding related documents to leverage in AI applications. In collaboration with Samsung and NVIDIA, IBM Research has successfully scaled its prototype platform to serve 100 billion vectors on a single server while maintaining recall of over 90% at a query latency of less than 700 milliseconds.

The vector database organizes the data so that an approximate nearest neighbor (ANN) search can be performed, making it possible to find semantically similar chunks during a RAG search.

To retrieve relevant chunks, a user’s query is converted into a vector using the same embedding model that was used to vectorize the stored documents. The vector database is then used to identify neighboring vectors according to a vector distance metric, such as cosine similarity. The text chunks corresponding to the most relevant vectors are then passed to the LLM as part of the prompt.
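As a concrete illustration of this retrieval flow, a minimal end-to-end sketch might look like the following. FAISS, the embedding model, and the sample chunks are open-source stand-ins for illustration, not the components IBM uses in CAS:

```python
# Minimal RAG-retrieval sketch. FAISS, the embedding model, and the sample
# chunks are illustrative stand-ins; IBM has not disclosed CAS internals.
import faiss
from sentence_transformers import SentenceTransformer

# Hypothetical enterprise text chunks already extracted from documents.
chunks = [
    "Q3 revenue grew 12% year over year.",
    "The data center migration completed in June.",
    "Employee headcount held steady at 4,200.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")

# Index the stored chunks in an HNSW graph, a common ANN structure.
embeddings = model.encode(chunks, normalize_embeddings=True)
index = faiss.IndexHNSWFlat(embeddings.shape[1], 32)  # 32 = graph fan-out
index.add(embeddings)

# Query time: embed the question with the SAME model used for the documents,
# then run an approximate nearest neighbor search.
query = "How did revenue change last quarter?"
query_vec = model.encode([query], normalize_embeddings=True)
distances, ids = index.search(query_vec, 2)  # two nearest neighbors

# Pass the text behind the most relevant vectors to the LLM in the prompt.
context = "\n".join(chunks[i] for i in ids[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```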

This approach ensures that responses are grounded in enterprise-specific knowledge, which reduces hallucinations and improves trust in AI outputs, said IBM.

IBM's CAS is available for both on-premises and cloud deployments. To reduce deployment cost and management complexity, IBM Research pursued a two-part strategy. The first part focuses on improving vector density and reindexing time, reducing the number of servers that need to be deployed to support a given number of documents and vectors.

The IBM Storage Scale System 6000 (ESS 6000) is a high-performance, all-flash storage appliance designed for AI, high-performance computing (HPC), and massive data workloads.

The second part focuses on leveraging enterprise solid-state drives (SSDs) to help achieve higher system-level storage performance. For this effort, IBM Research collaborated with Samsung, a global provider of advanced memory and storage technologies for AI and data center infrastructure.

Part of IBM’s strategy for AI is to remove artificial software barriers that prevent enterprises from exposing their data and applications to AI. With CAS, the company said, it is taking a crucial part of the RAG pipeline and handing that responsibility to the storage system. The new indexing capabilities are integrated into familiar file systems, making the entire system easy to deploy.

For more information about this news, visit www.ibm.com.
