Qdrant, the leading provider of high-performance, open-source vector search, is debuting Qdrant Cloud Inference, a new solution for generating text and image embeddings directly within managed Qdrant Cloud clusters. Born from community feedback that embedding pipelines were slowing teams down, Qdrant Cloud Inference enables users to transform unstructured text and images into search-ready vectors within a single environment, the company explained.
With a single API call, users can generate, store, and index embeddings, integrating model inference directly into Qdrant Cloud. This accelerates developer productivity and system efficiency by consolidating what previously required separate services for inference, storage, and indexing, simplifying workflows, expediting development cycles, and removing unnecessary network hops.
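The single-call workflow described above can be sketched with the qdrant-client Python library. The cluster URL, API key, collection name, and embedding model identifier below are illustrative placeholders, and the `cloud_inference` flag reflects recent client versions; treat this as a sketch under those assumptions, not official usage.

```python
# Sketch: embed, store, and search raw text against a Qdrant Cloud
# cluster, with inference running server-side rather than in a separate
# embedding service. URL, key, collection, and model names are assumed.

def index_and_search(url: str, api_key: str) -> None:
    from qdrant_client import QdrantClient, models

    # cloud_inference=True asks the managed cluster to run the embedding
    # model, so raw text travels in the request instead of vectors.
    client = QdrantClient(url=url, api_key=api_key, cloud_inference=True)

    # One upsert call: the cluster generates, stores, and indexes the
    # embedding for the supplied text.
    client.upsert(
        collection_name="docs",
        points=[
            models.PointStruct(
                id=1,
                vector=models.Document(
                    text="Qdrant adds managed inference to its cloud.",
                    model="sentence-transformers/all-minilm-l6-v2",
                ),
                payload={"source": "press"},
            )
        ],
    )

    # Query with raw text as well; the same model embeds it server-side.
    hits = client.query_points(
        collection_name="docs",
        query=models.Document(
            text="managed embedding inference",
            model="sentence-transformers/all-minilm-l6-v2",
        ),
        limit=3,
    )
    for point in hits.points:
        print(point.id, point.score)
```

Because inference happens inside the cluster, the request carries only raw text, which is what eliminates the extra network hop between an external embedding service and the database.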
“Developers no longer need to stitch together a brittle stack—instead, they get an integrated, fast, and production-grade workflow out-of-the-box,” said Bastian Hofmann, director of enterprise solutions, Qdrant.
As real-time AI workloads continue to dictate many aspects of business, the release of Qdrant Cloud Inference is a timely one, according to Hofmann.
“Real-time AI systems, especially agentic and multimodal, are becoming the norm. These systems don’t just retrieve from static memory, they generate new memory mid-task. That makes embedding speed and system cohesion critical,” noted Hofmann. “With inference now built natively into Qdrant Cloud, teams can go from raw input to indexed vector very fast, with zero external dependencies.”
Qdrant Cloud Inference is the only managed vector database offering multimodal inference, using separate image and text embedding models, natively integrated in its cloud, according to Qdrant. This support reflects the inherently multimodal nature of modern data, which extends beyond text into a diverse range of types such as images, PDFs, logs, and screenshots.
“This unlocks use cases like visual document retrieval, screenshot-augmented support agents, and agents with both textual and visual memory. As multimodal RAG [retrieval-augmented generation] and agent systems gain traction, the ability to seamlessly search across diverse inputs becomes essential for modern AI applications,” Hofmann explained.
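A use case like the screenshot-augmented support agent Hofmann mentions might look like the following sketch: an image is embedded server-side by an image model, and the same collection is then queried with plain text through a paired text model. The collection name, image URL, and CLIP-style model identifiers are assumptions for illustration, not values from the announcement.

```python
# Sketch: store a screenshot and retrieve it with a text query, with
# both embeddings computed inside the Qdrant Cloud cluster. The paired
# vision/text model names and other identifiers are assumed.

def index_image_search_text(url: str, api_key: str) -> None:
    from qdrant_client import QdrantClient, models

    client = QdrantClient(url=url, api_key=api_key, cloud_inference=True)

    # Upsert a screenshot; the cluster's image model embeds it.
    client.upsert(
        collection_name="support_memory",
        points=[
            models.PointStruct(
                id=1,
                vector=models.Image(
                    image="https://example.com/error-dialog.png",
                    model="qdrant/clip-vit-b-32-vision",
                ),
                payload={"kind": "screenshot"},
            )
        ],
    )

    # Search the same collection with text via the paired text model,
    # which shares an embedding space with the vision model.
    hits = client.query_points(
        collection_name="support_memory",
        query=models.Document(
            text="error dialog about an expired license",
            model="qdrant/clip-vit-b-32-text",
        ),
        limit=5,
    )
    for point in hits.points:
        print(point.id, point.score)
```

The key design point is that the image and text models must project into a shared embedding space; that is what lets a textual query retrieve visual memory.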
Fundamentally, this update streamlines developer workflows, enabling teams to build advanced AI applications, such as multimodal search, RAG, and hybrid search, more quickly and with less complexity.
“With Qdrant Cloud Inference, you can go from raw text or images to search-ready vectors in one API call—no separate service or data pipeline is needed. It’s a big step toward making real-time AI easier to build and faster to ship,” said Hofmann. “We’ve also included free monthly tokens, so teams can start embedding and searching right away without extra setup or added cost.”
To learn more about Qdrant Cloud Inference, please visit https://qdrant.tech/.