With Gartner predicting that generative AI (GenAI) spending will exceed all other IT spending combined, knowing exactly when to use GenAI, and when not to, will be increasingly critical.
David Seuss, CEO of Northern Light, led the Data Summit session “Where, When, & Why to Use GenAI,” considering how text analytics may offer greater value than GenAI for many search use cases.
The annual Data Summit conference returned to Boston, May 14-15, 2025, with pre-conference workshops on May 13.
“Should we be putting all of our efforts into GenAI?” asked Seuss.
The reality, Seuss explained, is that large language models (LLMs) represent nothing more than the wisdom of the crowd, skillfully researched and summarized. They don’t know what things are; entities are just strings of tokens to them. They don’t know the difference between IBM and a cat, and they never have new ideas.
The options for creating GenAI solutions are as follows: train a model, fine-tune a pre-trained model, or use retrieval-augmented generation (RAG). The last of these methods is becoming the de facto standard, according to Seuss, and is capable of enhancing the accuracy and reliability of GenAI models.
Yet context windows pose a challenge: with RAG, only a small portion of a typical corpus can be sent to the LLM to generate a response.
“Users think results are coming back from the entire corpus, and it is not,” said Seuss. “Users think that all the content is being analyzed and summarized. In fact, only a very tiny fraction of the corpus is being processed.”
There are a variety of strategies to help fit into context windows, such as chunking documents, using natural language processing (NLP) to eliminate redundant text, or sending separate transactions for each document. However, while all of these techniques can improve GenAI results, none of them can truly eliminate the context window issue, Seuss explained.
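The chunking strategy mentioned above, and the limit it runs into, can be illustrated with a minimal Python sketch. It is not taken from the session: the function names are hypothetical, word counts stand in for tokens, and the chunks are assumed to arrive already ranked by some retrieval step. It shows that however the documents are split, only the chunks that fit the budget reach the model.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `size` words.

    Neighbouring chunks share `overlap` words. Words are a rough,
    assumed proxy for tokens; a real system would use the model's tokenizer.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def pack_into_budget(ranked_chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in ranked order until the word budget would be exceeded."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected


def corpus_coverage(selected: list[str], corpus: list[str]) -> float:
    """Fraction of the corpus (by word count) that was actually sent.

    With overlapping chunks this slightly overstates coverage.
    """
    total = sum(len(doc.split()) for doc in corpus)
    sent = sum(len(chunk.split()) for chunk in selected)
    return sent / total if total else 0.0
```

Calling `corpus_coverage` on the output of `pack_into_budget` makes the gap visible: as the corpus grows and the budget stays fixed, the fraction shrinks, which is the behavior Seuss described.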
There is an alternative for insight discovery, Seuss offered: deep, automatic tagging of the full text with taxonomies for a variety of areas, such as business strategy, company names, information technologies, life sciences, healthcare, and more. This tagging can be used to analyze, filter, group, and navigate search results, preserving information quality while maintaining search efficacy, and it even has the capacity to predict future trends.
Where GenAI relays the wisdom of the crowd (what people have already talked about), text analytics and tagged content bring about new concepts that might otherwise have been missed.
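As a rough illustration of taxonomy tagging, the Python sketch below matches a tiny, made-up taxonomy against the full text of every document and then groups documents by tag. The taxonomy entries, function names, and the simple phrase-matching rule are all assumptions made for the example; production text analytics would rely on far richer taxonomies and matching logic.

```python
import re
from collections import defaultdict

# Hypothetical mini-taxonomy: facet -> concept -> surface terms (lowercase).
TAXONOMY = {
    "company": {"IBM": ["ibm", "international business machines"]},
    "technology": {
        "Generative AI": ["generative ai", "genai"],
        "Text analytics": ["text analytics", "text mining"],
    },
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return {facet: set of concepts} for every taxonomy term found in the text."""
    lowered = text.lower()
    tags = defaultdict(set)
    for facet, concepts in taxonomy.items():
        for concept, terms in concepts.items():
            for term in terms:
                # Whole-word (or whole-phrase) match anywhere in the full text.
                if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                    tags[facet].add(concept)
                    break  # one matching term is enough for this concept
    return dict(tags)


def group_by_concept(docs, facet, taxonomy=TAXONOMY):
    """Map each concept in `facet` to the ids of the documents tagged with it."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for concept in sorted(tag_document(text, taxonomy).get(facet, ())):
            groups[concept].append(doc_id)
    return dict(groups)
```

Because the tagger visits every document rather than a retrieved subset, the resulting groups reflect the whole corpus, which is the contrast with RAG that the session drew.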
“Text analytics applied with taxonomies against a corpus…[drives] insight discovery; you get new knowledge that you would not have had with GenAI,” said Seuss.
Ultimately, while GenAI has dramatically reduced the time needed to accomplish research tasks and improved the quality of work, it can only summarize the preponderance of insights that human beings have written about, according to Seuss. For many tasks, text analytics and machine learning are still the more powerful solutions, especially when the corpus is large and needs to be analyzed in its entirety. They are also valuable when:
- Trends over many different time periods are relevant
- Visualization instead of text is the most powerful output
- The output depends on analysis rather than summarization
- New insights are needed
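The first and third cases in the list above, trends over time and analysis rather than summarization, can be sketched in a few lines of Python once documents carry tags. The input format (pairs of a period label and a set of concept tags) and both function names are assumptions made for this illustration, not a description of any vendor's product.

```python
from collections import Counter, defaultdict


def concept_trends(tagged_docs):
    """Count documents per concept per period.

    tagged_docs: iterable of (period, concepts) pairs, one per document.
    Returns {concept: {period: count}} with periods in sorted order.
    """
    counts = defaultdict(Counter)
    for period, concepts in tagged_docs:
        for concept in set(concepts):  # count each document once per concept
            counts[concept][period] += 1
    return {c: dict(sorted(p.items())) for c, p in counts.items()}


def rising_concepts(trends, earlier, later):
    """Concepts tagged in more documents in `later` than in `earlier`."""
    return sorted(
        concept
        for concept, per_period in trends.items()
        if per_period.get(later, 0) > per_period.get(earlier, 0)
    )
```

The per-period counts are computed over every tagged document, so they can be charted directly or compared across periods, neither of which a summary of a few retrieved passages would support.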
Many Data Summit 2025 presentations are available for review at https://www.dbta.com/datasummit/2025/presentations.aspx.