GenAI Governance and Ethics at Data Summit 2024

Hallucinations, or the tendency of generative AI (GenAI) to produce false though seemingly correct information, are the boogeyman under the bed of the GenAI boom. Applications that depend on the output of a GenAI system risk being at the whim of a model that produces incorrect results because of poor LLM development or bad data, a risk that becomes increasingly dangerous in accuracy-dependent industries like healthcare.

Ranjeeta Bhattacharya, senior data scientist, BNY Mellon, and Yetkin Ozkucur, director for professional services and presales, Quest, led the annual Data Summit session, “Putting Generative AI to Work,” assessing the ways in which GenAI-based enterprise applications can be set up for success based on both development and data best practices.

The annual Data Summit conference returned to Boston, May 8-9, 2024, with pre-conference workshops on May 7.

Bhattacharya explained that GenAI and large language model (LLM) hallucinations originate both during the initial development of the model and during the subsequent refinement of model responses via prompt engineering.

But why do LLMs hallucinate?

According to Bhattacharya, some high-level reasons for hallucination include:

  • Source-reference divergence
  • Exploitation through jail-break prompts
  • Reliance on incomplete or contradictory datasets
  • Overfitting and lack of novelty
  • Guesswork from vague or insufficiently detailed prompts
  • Missing content or a retrieval strategy that did not work
  • An answer that is located within the database but did not rank high enough
  • Correct documents that are retrieved but do not end up in the LLM context window
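
The last three reasons in the list above are retrieval failures, and they can be told apart mechanically. The following is a minimal sketch, not a method presented in the session: the function name `diagnose_retrieval`, the `top_k` cutoff, and the token-based `context_budget` are all assumptions made for illustration.

```python
from typing import Dict, List


def diagnose_retrieval(ranked_ids: List[str], doc_tokens: Dict[str, int],
                       answer_id: str, top_k: int, context_budget: int) -> str:
    """Classify whether the answer-bearing document reaches the LLM, and if not, why."""
    if answer_id not in ranked_ids:
        return "missing content"            # the retriever never surfaced it
    if ranked_ids.index(answer_id) >= top_k:
        return "ranked too low"             # found, but below the top_k cutoff

    # Pack the top_k documents in rank order until the token budget is exhausted.
    packed, used = [], 0
    for doc_id in ranked_ids[:top_k]:
        if used + doc_tokens[doc_id] > context_budget:
            break
        packed.append(doc_id)
        used += doc_tokens[doc_id]
    return "in context" if answer_id in packed else "outside context window"


# Hypothetical ranking of four 300-token documents.
ranked = ["a", "b", "c", "d"]
tokens = {"a": 300, "b": 300, "c": 300, "d": 300}
```

With `top_k=3` and a 700-token budget, document `c` is retrieved but dropped when the context is packed, while document `d` never makes the cutoff at all.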

To help mitigate the generation of factually inaccurate information, Bhattacharya pointed to several methods applicable to both the user interaction and the model's underlying technology.

Prompt engineering and tuning are the first line of defense against hallucinations, according to Bhattacharya. Injecting more context into the prompt increases the likelihood of an answer grounded in truth.

“It is always recommended to give clear instructions to the LLM,” said Bhattacharya. “In response, the LLM knows how to frame its output.”

Though LLMs do not excel at arithmetic, Bhattacharya suggested chain-of-thought prompting, which asks the model to work through intermediate steps before answering, to guide their reasoning. One example: “Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?”
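
As a rough illustration of the tennis ball example, the sketch below wraps the question in an instruction to reason step by step and checks the intended chain of arithmetic in plain Python. The helper `build_cot_prompt` and the wording of the instruction are hypothetical, and no model is actually called.

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question with an instruction to show intermediate reasoning steps."""
    return (
        "Answer the question below. Show each intermediate step, "
        "then give the final answer on its own line as 'Answer: <number>'.\n\n"
        f"Question: {question}\n"
        "Reasoning:"
    )


QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# The reasoning chain the prompt is meant to elicit, verified directly.
starting_balls = 5
new_balls = 2 * 3                              # 2 cans x 3 balls per can = 6
expected_total = starting_balls + new_balls    # 5 + 6 = 11

prompt = build_cot_prompt(QUESTION)
```

The string in `prompt` would then be sent to whichever LLM an application uses; a response that skips the intermediate step of 6 new balls is a signal to tighten the instruction.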

Bhattacharya further explained that Retrieval Augmented Generation (RAG) combines retrieval and generation to equip an LLM with domain-specific knowledge, allowing for more accurate and context-aware responses.

“[RAG] feeds an external knowledgebase to an existing LLM…[framing] the final response with both the external data source and the data that the LLM is trained on,” said Bhattacharya.
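
The retrieval half of that description can be sketched in a few lines. This is a toy under stated assumptions, not the architecture discussed in the session: it ranks documents with bag-of-words cosine similarity rather than learned embeddings, and the names `retrieve`, `build_rag_prompt`, and `KNOWLEDGE_BASE` are invented for the example.

```python
import re
from collections import Counter
from math import sqrt
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(query: str, documents: List[str], top_k: int = 2) -> str:
    """Place the retrieved passages ahead of the question in the prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


# Stand-in external knowledge base (illustrative).
KNOWLEDGE_BASE = [
    "Data Summit 2024 was held in Boston on May 8-9, 2024.",
    "Pre-conference workshops took place on May 7.",
    "Knowledge graphs are structured representations of knowledge.",
]

rag_prompt = build_rag_prompt("When was Data Summit 2024 held?", KNOWLEDGE_BASE, top_k=1)
```

The LLM then frames its response from both the retrieved passage in the prompt and what it learned in training, which is the combination Bhattacharya described.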

Another way to mitigate hallucinations is through knowledge graphs, structured representations of knowledge that provide an explicit and standardized vocabulary of concepts. A knowledge graph offers a rich source of domain-specific information that links user queries to standardized concepts, reduces linguistic variation in queries, and provides detailed information, explained Bhattacharya.
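
To make the idea of linking queries to standardized concepts concrete, here is a small sketch. The vocabulary in `SYNONYMS`, the facts in `TRIPLES`, and both function names are toy assumptions chosen for illustration; a production knowledge graph would live in a dedicated store.

```python
import re
from typing import List, Tuple

# Hypothetical vocabulary: surface phrases mapped to standardized concepts.
SYNONYMS = {
    "account holder": "customer",
    "customer": "customer",
    "client": "customer",
    "account": "account",
    "acct": "account",
}

# Hypothetical graph facts as (subject, relation, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("customer", "owns", "account"),
    ("account", "held_in", "currency"),
]


def normalize(query: str) -> List[str]:
    """Return the standardized concepts mentioned in a query, longest phrase first."""
    text = query.lower()
    found: List[str] = []
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            if SYNONYMS[phrase] not in found:
                found.append(SYNONYMS[phrase])
            text = re.sub(pattern, " ", text)   # avoid re-matching shorter sub-phrases
    return found


def facts_for(concept: str) -> List[Tuple[str, str, str]]:
    """Return every triple in which the concept is the subject or the object."""
    return [t for t in TRIPLES if concept in (t[0], t[2])]
```

Under this toy vocabulary, "Show every acct for this client" and "Show every account for this customer" normalize to the same two concepts, so both queries pull the same facts into the prompt.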

Human-in-the-loop (HITL) review also serves to narrow LLM output: human oversight, preferably from subject matter experts (SMEs), validates LLM-generated outputs. Human annotators can assign scores and evaluate generated content against a baseline, ultimately upholding LLM ethics, accuracy, and reliability. General user education and awareness about the limitations of LLMs is another fundamental approach to creating a culture of responsible AI usage.
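
The scoring step might look something like the sketch below. The 1-to-5 scale, the baseline of 3.5, and the name `review_outputs` are assumptions for illustration only; the session did not specify a scoring scheme.

```python
from statistics import mean
from typing import Dict, List


def review_outputs(scores: Dict[str, List[int]], baseline: float) -> Dict[str, str]:
    """Accept an output when its mean annotator score meets the baseline; otherwise escalate it."""
    return {
        output_id: ("accept" if mean(ratings) >= baseline else "escalate")
        for output_id, ratings in scores.items()
    }


# Hypothetical ratings from three annotators on a 1-to-5 scale.
annotator_scores = {
    "resp-1": [5, 4, 5],
    "resp-2": [2, 3, 2],
}
decisions = review_outputs(annotator_scores, baseline=3.5)
```

Here `resp-1` is accepted and `resp-2` is escalated, which in practice would route it back to an SME for correction.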

Ozkucur highlighted that while AI can help enterprises disrupt, innovate, generate faster insights, cut costs, and increase productivity, implementing this technology both responsibly and successfully depends on high-quality, trusted data.

Whether an organization is fit for GenAI, in terms of both the datasets used as input and the governance of the input and output, is a critical evaluation that any enterprise planning to adopt GenAI must make.

A model-to-marketplace approach for managing and using proprietary data, Ozkucur argued, is a critical component of achieving a robust data foundation to support GenAI initiatives.

Such a marketplace invites the cultivation of data products (combinations of data assets, including models, reports, and more, assembled to solve a specific business problem) that afford greater agility in supplying data to the necessary consumers, explained Ozkucur. Consolidated within a data catalog, a data product delivers a wealth of information to create a foundation that drives successful GenAI.

Many Data Summit 2024 presentations are available for review at