Search has gone through serious transformations over the years, all while maintaining its significance in even the most advanced eras of tech. With new iterations comes new methods of optimization, where large language models (LLMs) have a crucial role to play.
Sid Probstein, CEO, SWIRL, led the Data Summit session, “Uncovering Data You Know is There But Can’t Find,” exploring the ways in which LLMs can dramatically improve document retrieval.
The annual Data Summit conference returned to Boston, May 14-15, 2025, with pre-conference workshops on May 13.
Transforming search with LLMs is finding order amidst chaos, according to Probstein. Importantly, “It’s about getting search and LLMs to play nice together,” he added.
To drive that symbiotic reality, LLMs can optimize search, shifting queries from answer-centric to document-centric. Though many see LLMs merely as a front end for search, they can also dramatically improve the way search itself is conducted.
In a document-centric search, precise information is surfaced from the latest version of the data. Once a document is located, conversing with the LLM about it delivers even more relevant insights. After all, "LLMs are not just for search, they can translate, they can discuss," said Probstein, emphasizing how LLMs can extend search beyond text into structured data sources.
With GenAI augmentation, you can optimize search by improving both the queries and the documents themselves. A GenAI pipeline can rewrite the query or enrich the document, prompting the LLM to clean titles, extract metadata from unstructured data, and more.
“Put the LLM between you and the data and it can improve your documents,” Probstein noted.
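The enrichment pipeline described above can be sketched in a few lines. This is a minimal illustration, not SWIRL's implementation; `call_llm` is a hypothetical stand-in for any chat-completion client, stubbed here with a canned response so the sketch runs offline.

```python
# Sketch: put the LLM between the user and the data, prompting it to
# clean noisy document titles before they are indexed for search.

def call_llm(prompt: str) -> str:
    # Hypothetical LLM call; a real pipeline would invoke a model API here.
    # Stubbed with a canned cleaned title so the sketch is self-contained.
    return "Quarterly Sales Report - 2024 Q3"

def enrich_document(doc: dict) -> dict:
    """Ask the LLM to clean the title; keep the original fields intact."""
    prompt = (
        "Rewrite this document title in plain language, "
        "removing file-name noise:\n"
        f"{doc['title']}"
    )
    return {**doc, "clean_title": call_llm(prompt)}

doc = {"title": "q3_salesRPT_FINAL_v2 (1).docx"}
enriched = enrich_document(doc)
```

The same pattern extends to metadata extraction: swap the prompt and return additional fields on the enriched document.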
A popular way to improve search is fine-tuning, where LLMs are trained on petabytes of data. At runtime, however, the model is a compressed, smaller representation of that data, which inevitably loses information and induces hallucinations.
Retrieval-augmented generation (RAG) is the key to limiting hallucinations, according to Probstein: it fetches information that actually exists and restricts the LLM to the data provided.
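The retrieve-then-constrain pattern can be sketched as below. This is an assumption-laden toy: a keyword retriever over a two-document corpus stands in for a real search engine, and the grounded prompt would be sent to an actual model.

```python
# Sketch of retrieval-augmented generation: fetch matching passages
# first, then constrain the LLM to answer only from what was retrieved.

CORPUS = {
    "doc1": "SWIRL federates search across enterprise sources.",
    "doc2": "Retrieval-augmented generation grounds answers in fetched data.",
}

def retrieve(query: str) -> list[str]:
    """Naive keyword retriever standing in for a real search backend."""
    terms = query.lower().split()
    return [text for text in CORPUS.values()
            if any(t in text.lower() for t in terms)]

def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Limit the LLM to the retrieved data, per the RAG pattern."""
    context = "\n".join(passages)
    return (
        "Answer ONLY from the context below. "
        "If the answer is not present, say you don't know.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

passages = retrieve("retrieval-augmented generation")
prompt = build_grounded_prompt("What grounds RAG answers?", passages)
```

The "say you don't know" instruction is what keeps the model from answering beyond the fetched data, which is the whole point of the approach Probstein describes.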
However, Probstein noted that a hallucination is not when an LLM provides an answer grounded in the data you provided that happens to be wrong; that’s an issue with your data.
Furthermore, "the LLM doesn't know your business. In order for an LLM to know your business, you need to share the information," namely through taxonomies and ontologies. This improves output precision and query understanding, especially when some details haven't been released publicly.
Ultimately, Probstein suggests providing the LLM with:
- Database schema and profile
- Sample queries
- Query examples
- User context (role, department, topics, date)
- A useful SharePoint search endpoint
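The checklist above amounts to assembling a context preamble for the model. The sketch below shows one way to package those items into a prompt; the field names and section headings are illustrative assumptions, not a prescribed format.

```python
# Sketch: bundle schema, sample queries, and user context into a
# single prompt preamble so the LLM "knows your business."

def build_context_preamble(schema: str, sample_queries: list[str],
                           user: dict) -> str:
    """Assemble the business context Probstein lists into prompt text."""
    lines = [
        "## Database schema and profile",
        schema,
        "## Sample queries",
        *[f"- {q}" for q in sample_queries],
        "## User context",
        f"role={user['role']}, department={user['department']}",
    ]
    return "\n".join(lines)

preamble = build_context_preamble(
    "orders(id, customer_id, total, created_at)",
    ["SELECT * FROM orders WHERE total > 100"],
    {"role": "analyst", "department": "sales"},
)
```

Prepending a preamble like this to each request gives the model the taxonomy and user context it otherwise lacks, without retraining.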
Many Data Summit 2025 presentations are available for review at https://www.dbta.com/datasummit/2025/presentations.aspx.