deepset Debuts Quantifiable LLM Response Accuracy with Latest Capability


deepset, the company behind the popular Haystack open source framework for building NLP services, is debuting a new capability in its cloud platform that offers insight into the precision and accuracy of LLM generative AI (GenAI) responses. deepset Cloud tackles the foremost challenge of LLM-based GenAI—hallucinations—so that enterprises can confidently deploy GenAI apps.

Hallucinations are a massive obstacle to adopting an LLM in the enterprise. Retrieval-augmented generation (RAG) is often used to combat them, yet even RAG-backed LLMs retain a proclivity to hallucinate, placing information in the wrong context or fabricating it outright.

“From GPT-4 to the smaller open source models, it's [hallucinations] a problem for all of them, even with RAG,” said Mathis Lucka, head of product at deepset. “And that prompted us to build the Groundedness Observability feature where it measures…how well the answer that was just generated by the large language model is grounded in your own data that you gave to the model.”

In response to this challenge, deepset developed the Groundedness Observability Dashboard, a capability of deepset Cloud that surfaces trend data on how well GenAI responses are grounded in the source documents.

By offering a quantifiable score that reveals the accuracy and factuality of an LLM’s output—gathering metrics on tone, the specific document sources used, how often each source is used, and more—Groundedness Observability enables developers to adjust their RAG systems, models, and prompts so that they produce reliable responses.
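deepset has not published the exact method behind the groundedness score, but the general idea can be sketched in a few lines: split the generated answer into sentences, check how well each sentence is supported by the retrieved source documents, and aggregate the results. The word-overlap heuristic below is purely illustrative and stands in for whatever model-based scoring deepset Cloud actually uses.

```python
# Illustrative sketch only: not deepset's implementation. Scores each answer
# sentence by its maximum word overlap with any source document, then averages.
import re


def sentence_groundedness(sentence: str, documents: list[str]) -> float:
    """Fraction of the sentence's words that appear in the best-matching document."""
    words = set(re.findall(r"\w+", sentence.lower()))
    if not words:
        return 0.0
    best = 0.0
    for doc in documents:
        doc_words = set(re.findall(r"\w+", doc.lower()))
        best = max(best, len(words & doc_words) / len(words))
    return best


def groundedness_score(answer: str, documents: list[str]) -> float:
    """Average per-sentence groundedness of an LLM answer against its sources."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    return sum(sentence_groundedness(s, documents) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sources = ["The warranty covers battery defects for 24 months from purchase."]
    answer = "The battery warranty lasts 24 months. It also covers accidental damage."
    # The second sentence is not supported by the source, so it drags the score down.
    print(round(groundedness_score(answer, sources), 2))
```

A low aggregate score on a dashboard like this signals that answers are drifting away from the underlying documents, which is the trend data developers would act on.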

Groundedness Observability also offers unique benefits in identifying the optimal hyperparameters for an organization’s retrieval step. This means enterprises can analyze which method of retrieval best suits their business’s needs. Additionally, this feature can help organizations optimize how much data is fed into the LLM, ultimately reducing overall costs.
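As an illustration of that tuning loop (not a deepset Cloud API), the hypothetical sketch below sweeps a retriever's top_k setting, scores each configuration with a placeholder groundedness evaluation, and prefers the smallest top_k that stays close to the best score, since fewer retrieved documents means fewer tokens sent to the LLM.

```python
# Hypothetical sweep over the retriever's top_k setting. evaluate_groundedness is a
# stand-in for running a RAG pipeline on a test set and averaging the groundedness
# scores of its answers; the numbers below are placeholder values, not real results.
def evaluate_groundedness(top_k: int) -> float:
    placeholder_scores = {2: 0.61, 5: 0.78, 10: 0.80, 20: 0.79}
    return placeholder_scores[top_k]


candidates = [2, 5, 10, 20]
scores = {k: evaluate_groundedness(k) for k in candidates}

# Prefer the smallest top_k within 0.02 of the best score: fewer retrieved
# documents means fewer tokens sent to the LLM, which lowers cost.
best = max(scores.values())
chosen = min(k for k, s in scores.items() if best - s <= 0.02)
print(f"chosen top_k={chosen}, scores={scores}")
```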

“Picking out of this big range of LLMs the LLM that works best [and] works most reliably for your individual use case [is a big challenge]. Different LLMs might have strengths and weaknesses depending on where you apply them, in which kind of data you apply them, for which use case you apply them,” said Milos Rusic, co-founder and CEO of deepset. “This is something you can check with [Groundedness Observability].”

deepset Cloud’s Groundedness Observability Dashboard is an LLM-agnostic capability, enabling its users to measure the accuracy and fidelity of the LLM and vendor of their choice.

deepset is also announcing Source Reference Prediction for its cloud platform, which further enhances confidence in LLM response quality. This feature adds academic-style citations to each generated answer, pointing back to the document from which the information was sourced.
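deepset has not detailed how Source Reference Prediction selects its citations; the sketch below only mimics the outcome, tagging each answer sentence with the index of the source document that shares the most words with it.

```python
# Illustrative sketch only: a crude stand-in for real source reference prediction.
import re


def cite_sources(answer: str, documents: list[str]) -> str:
    """Append a [n] citation to each answer sentence, pointing at the document
    that shares the most words with that sentence."""
    cited = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = set(re.findall(r"\w+", sentence.lower()))
        overlaps = [len(words & set(re.findall(r"\w+", doc.lower()))) for doc in documents]
        best = overlaps.index(max(overlaps)) + 1  # 1-based citation number
        cited.append(f"{sentence} [{best}]")
    return " ".join(cited)


if __name__ == "__main__":
    docs = [
        "Returns are accepted within 30 days of delivery.",
        "Refunds are issued to the original payment method within 5 business days.",
    ]
    answer = "You can return items within 30 days. Refunds go back to your original payment method."
    print(cite_sources(answer, docs))
```

The citation markers are what let end users jump from a generated claim to the passage that supports it, which is the verification step Lucka describes below.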

“[Source Reference Prediction is] user-facing reassurance or affordance, giving them trust in these answers and giving them the tools to independently verify that what was generated is actually what was in the text,” explained Lucka.

Regarding data privacy, customer data is secured by deepset’s robust security standards, which adhere to SOC 2 Type II requirements. Enterprises that want an additional layer of security can choose to run deepset within their own private cloud environment.

The release of Groundedness Observability and Source Reference Prediction reinforces deepset’s ongoing commitment to building a robust trust layer into GenAI apps.

“Readers should be most excited for reliably creating applications that are trustworthy,” concluded Lucka. “I think that that's the big takeaway—having the tools available to create trustworthy applications with large language models.”

To learn more about deepset Cloud and its latest features, please visit https://www.deepset.ai/.
