Qdrant is releasing version 1.17.0 of its platform, reducing search latency, introducing a Relevance Feedback Query, and improving operational observability.
This release introduces a new Relevance Feedback Query as a scalable, vector-native approach to incorporating relevance feedback.
According to the company, the Relevance Feedback Query uses a small amount of model-generated feedback to guide the retriever through the entire vector space, effectively nudging search toward “more relevant” results without requiring expensive loops, expensive retrievers, or human labeling. This enables the engine to traverse billions of vectors with improved recall without having to retrain models.
This method works by collecting lightweight feedback on just a few top results, creating “context pairs” of more- and less-relevant examples. These pairs define a signal that adjusts the scoring function during the next retrieval pass.
Instead of rewriting queries or rescoring large batches of documents, Qdrant modifies how similarity is computed. Experiments demonstrate substantial gains, especially when pairing expressive retrievers with strong feedback models.
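To make the idea of context pairs adjusting the scoring function more concrete, here is a minimal sketch in Python. It is not Qdrant's actual formula or API: the scoring rule, the `weight` parameter, and the function names `feedback_score` and `search` are assumptions chosen for illustration, and the brute-force loop over an in-memory dictionary stands in for a real vector index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def feedback_score(query, candidate, context_pairs, weight=0.5):
    """Base similarity plus a hypothetical feedback adjustment.

    context_pairs is a list of (more_relevant, less_relevant) vectors.
    A candidate is rewarded for being closer to the more-relevant example
    than to the less-relevant one. This rule is an assumption, not Qdrant's.
    """
    base = cosine(query, candidate)
    if not context_pairs:
        return base
    adjustment = sum(
        cosine(candidate, pos) - cosine(candidate, neg)
        for pos, neg in context_pairs
    ) / len(context_pairs)
    return base + weight * adjustment


def search(query, points, context_pairs, weight=0.5):
    """Return point IDs ordered by the feedback-adjusted score (best first)."""
    return sorted(
        points,
        key=lambda pid: feedback_score(query, points[pid], context_pairs, weight),
        reverse=True,
    )


# Toy data: "a" is closest to the query, but feedback favors the direction of "b".
points = {"a": [1.0, 0.1], "b": [1.0, -0.3]}
query = [1.0, 0.0]
pairs = [([0.0, -1.0], [0.0, 1.0])]  # (more relevant, less relevant)
first_pass = search(query, points, [])
second_pass = search(query, points, pairs)
```

In this toy setup the first pass ranks "a" ahead of "b", while the second pass, which scores with the feedback signal, ranks "b" first. The query itself is never rewritten; only the scoring changes.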
Additionally, this release includes several changes that reduce search latency. To improve query response times in environments with high write loads, Qdrant can now be configured to avoid creating large unoptimized segments. Delayed fan-outs help reduce tail latency by querying a second replica if the first does not respond within a configurable latency threshold.
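The delayed fan-out behavior described above resembles a hedged request. The following Python sketch shows that general pattern under stated assumptions: `delayed_fan_out` and `make_replica` are hypothetical names, the replicas are simulated with sleeps, and nothing here reflects how Qdrant implements or configures the feature internally.

```python
import asyncio


def make_replica(name, latency):
    """Build a simulated replica that answers with its name after `latency` seconds."""
    async def query():
        await asyncio.sleep(latency)
        return name
    return query


async def delayed_fan_out(primary, secondary, delay):
    """Query `primary`; if it has not answered within `delay` seconds,
    also query `secondary` and return whichever answer arrives first."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=delay)
    if done:
        # The first replica answered in time, so no second request is sent.
        return first.result()
    second = asyncio.ensure_future(secondary())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return done.pop().result()


slow_case = asyncio.run(
    delayed_fan_out(make_replica("replica-1", 0.3), make_replica("replica-2", 0.01), 0.05)
)
fast_case = asyncio.run(
    delayed_fan_out(make_replica("replica-1", 0.01), make_replica("replica-2", 0.01), 0.05)
)
```

When the first replica is slow, the answer comes from the second one shortly after the threshold expires; when the first replica is fast, the second is never contacted, so the extra load is limited to slow requests.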
A new update queue tracks up to one million pending changes. When the queue fills, back pressure slows incoming writes, preventing runaway load and helping clusters stay stable even during large batch operations or recovery after downtime.
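A bounded queue is one common way to get this kind of back pressure. The sketch below uses Python's standard `queue.Queue` to show the principle only; the `UpdateQueue` class and its method names are hypothetical, and the one-million default simply mirrors the figure quoted above rather than any confirmed Qdrant setting.

```python
import queue


class UpdateQueue:
    """Toy bounded queue of pending updates (illustrative, not Qdrant code)."""

    def __init__(self, capacity=1_000_000):
        self._pending = queue.Queue(maxsize=capacity)

    def submit(self, update, timeout=None):
        """Enqueue an update, blocking while the queue is full.

        The blocking is the back pressure: a writer cannot get further
        ahead of the consumer than `capacity` pending updates.
        """
        self._pending.put(update, timeout=timeout)

    def try_submit(self, update):
        """Non-blocking variant: return False instead of waiting when full."""
        try:
            self._pending.put_nowait(update)
            return True
        except queue.Full:
            return False

    def apply_next(self, apply):
        """Take the oldest pending update and apply it."""
        update = self._pending.get()
        apply(update)
        self._pending.task_done()

    def pending(self):
        return self._pending.qsize()


updates = UpdateQueue(capacity=2)  # tiny capacity to make the effect visible
accepted = [updates.try_submit(i) for i in (1, 2, 3)]
applied = []
updates.apply_next(applied.append)
accepted_after_drain = updates.try_submit(3)
```

With a capacity of two, the third write is refused until one pending update has been applied, which is the same mechanism that keeps a write burst from outrunning the system at larger scale.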
For applications that demand consistently low-latency search, indexed-only mode ensures queries touch only fully indexed segments. A side effect of indexed-only queries is that they can temporarily hide the newest updates until those updates are indexed. A new prevent_unoptimized optimizer setting addresses this by throttling updates to match the indexing rate, reducing the creation of large unoptimized segments.
Together, these features give developers tighter control over write throughput, indexing behavior, and search performance, especially in high-volume environments, the company said.
This release also adds two observability features: a cluster-wide telemetry API and segment optimization monitoring.
The cluster-wide telemetry API provides information about all peers in a cluster, offering insights into cluster-wide operations such as leader elections, resharding, and shard transfers.
Optimization is a background process where Qdrant removes data marked for deletion, merges segments, and creates indexes. To improve visibility into this process, this release introduces segment optimization monitoring capabilities.
A new /collections/{collection_name}/optimizations API endpoint provides cluster-wide information about the current optimization status, as well as detailed information for current and past optimization operations.
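As a rough sketch of how a client might poll this endpoint, the Python below builds the URL from the path given above and fetches it with the standard library. Several details are assumptions not stated in the announcement: that the endpoint answers HTTP GET, that it returns JSON, that the instance listens at `http://localhost:6333`, and that no API key is required. The helper names are hypothetical, and the response schema is deliberately left uninterpreted.

```python
import json
import urllib.parse
import urllib.request


def optimizations_url(base_url, collection_name):
    """Build the optimizations URL for a collection, escaping the name."""
    name = urllib.parse.quote(collection_name, safe="")
    return f"{base_url.rstrip('/')}/collections/{name}/optimizations"


def fetch_optimizations(base_url, collection_name, timeout=5.0):
    """Fetch optimization status (assumes a GET request and a JSON body)."""
    url = optimizations_url(base_url, collection_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Assumed local instance address; adjust for a real deployment.
example_url = optimizations_url("http://localhost:6333/", "my docs")
```

Calling `fetch_optimizations("http://localhost:6333", "my_collection")` against a running instance would return the parsed response, which can then be inspected to see which fields the server actually reports.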
Point filtering in the Web UI, a frequently requested feature, returns in this release. Qdrant redesigned the point search interface in the Web UI to make exploring data and discovering relevant points easier and more intuitive. The new two-field layout enables searching for points similar to another point, filtering by payload values, and finding points by ID.
For more information about this news, visit https://qdrant.tech.