Patterns to mitigate LLM hallucinations

October 1, 2024 | 02:15 AM

Hallucinations in a language model refer to situations where the model generates responses that, while grammatically correct and contextually coherent, are factually incorrect or entirely made up. Unlike obvious errors, hallucinations can be more challenging to detect because the generated text appears reliable and truthful, which can mislead both users and systems that rely on the model’s accuracy.

Hallucinations in large language models represent a significant challenge to the reliability and safety of artificial intelligence systems. Understanding the fundamental causes and types of hallucinations is essential for developing effective mitigation strategies. Through a combination of improvements in training data, model adjustments, implementation of verification mechanisms, and adoption of advanced patterns, it is possible to significantly reduce the incidence of hallucinations.


Mitigating hallucinations in LLMs with Retrieval-Augmented Generation

It is now widely accepted that the implementation of Retrieval-Augmented Generation (RAG) over Large Language Models (LLMs) significantly enhances their output quality. By integrating external knowledge retrieval mechanisms during the generation process, RAG mitigates the inherent limitations of LLMs, such as hallucination and outdated information. Rather than relying solely on the pre-trained model’s knowledge, RAG dynamically queries relevant data sources in real-time, providing the LLM with up-to-date and contextually accurate information.

While RAG significantly reduces hallucinations by integrating real-time, relevant data during the generation process, it does not completely eliminate them. Factors such as incomplete or biased retrieval sources, errors in query formulation, or limitations in the LLM’s comprehension of retrieved information can still lead to hallucinations.
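The overall RAG flow described above can be sketched in a few lines. This is a minimal illustration, not a specific library's API: the lexical retriever is a toy stand-in for a real embedding-based search, and the prompt wording is only one possible way to instruct the model to stay grounded in the retrieved context.

```python
# Minimal RAG flow sketch: retrieve supporting passages for a query, then
# ground the model's prompt in them. The retriever is a toy word-overlap
# ranker; a real system would use an embedding model and a vector index.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query and keep the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Paris is the capital of France.",
    "Photosynthesis converts light into chemical energy.",
]
query = "How tall is the Eiffel Tower?"
prompt = build_grounded_prompt(query, retrieve(query, corpus))
```

The key point is the explicit grounding instruction: by constraining the model to the retrieved context and giving it an "insufficient context" escape hatch, the prompt itself discourages fabricated answers.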

Patterns to mitigate LLM hallucinations

Contextual vector retrieval and generation pattern

This pattern is used in systems where information retrieval is augmented by AI-generated responses. It combines a search process (retrieving relevant chunks of information based on a query) with natural language generation by an AI model. It is commonly applied in intelligent assistants, advanced chatbots, and search engines.

Problem

In large datasets, simply retrieving documents or information fragments based on a query is not enough to generate contextually rich and accurate answers. Users require generated responses that are both informed by the data and synthesized into coherent, human-like text.

Solution

(Diagram: pattern-contextual-vector-retrieval-and-generation)
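A sketch of the retrieval half of this pattern: embed the query, score every chunk by cosine similarity against its stored embedding, and pass the top-k chunks to the generator as context. The embeddings below are hand-made toy vectors purely for illustration; a real system would produce them with an embedding model.

```python
# Vector retrieval sketch: cosine similarity between a query embedding and
# pre-computed chunk embeddings selects the generation context.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query_vec, chunks, k=2):
    """chunks: (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Invoices are due within 30 days.",  [0.9, 0.1, 0.0]),
    ("Refunds require a receipt.",        [0.1, 0.9, 0.0]),
    ("Shipping takes 3-5 business days.", [0.0, 0.2, 0.9]),
]
context = top_k_chunks([0.8, 0.2, 0.1], chunks, k=1)
```

The retrieved `context` is then concatenated into the model's prompt, so the generated answer is synthesized from data rather than from the model's parametric memory alone.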

Hierarchical chunk retrieval pattern

This pattern is ideal for systems where hierarchical relationships exist between data chunks, such as parent-child relationships within documents. It applies to contexts where retrieving related sub-chunks (children) and linking them to their parent chunks provides a more complete and contextually accurate answer.

Problem

In large, hierarchical data sets, a single query might return fragmented information without proper context. Retrieving only the most relevant small chunks (children) may lead to incomplete or misleading responses. There’s a need to return both relevant sub-chunks and their parent context to generate more accurate and comprehensive responses.

Solution

(Diagram: pattern-parent-child)
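The parent-child mechanics can be sketched as follows: small child chunks are matched against the query (they make precise retrieval targets), and the hits are then expanded to their parent sections so the generator sees complete context. The data layout and the word-overlap scoring are illustrative stand-ins for a real index and similarity model.

```python
# Hierarchical retrieval sketch: match small child chunks, return their
# (deduplicated) parent sections as the generation context.

def retrieve_with_parents(query: str, children: list[dict], parents: dict, k: int = 2):
    """children: [{'text', 'parent_id'}]; parents: {parent_id: section_text}.
    Returns the parent sections of the top-k matching children."""
    q = set(query.lower().split())
    ranked = sorted(
        children,
        key=lambda c: len(q & set(c["text"].lower().split())),
        reverse=True,
    )
    seen, sections = set(), []
    for child in ranked[:k]:
        pid = child["parent_id"]
        if pid not in seen:          # several children may share one parent
            seen.add(pid)
            sections.append(parents[pid])
    return sections

parents = {
    "sec1": "Section 1: Billing. Invoices are due in 30 days. Late fees apply after 60 days.",
    "sec2": "Section 2: Shipping. Orders ship within 48 hours.",
}
children = [
    {"text": "invoices are due in 30 days",   "parent_id": "sec1"},
    {"text": "late fees apply after 60 days", "parent_id": "sec1"},
    {"text": "orders ship within 48 hours",   "parent_id": "sec2"},
]
context = retrieve_with_parents("when are invoices due", children, parents, k=2)
```

Note how the answer context includes the late-fee clause even though only the due-date child matched: linking back to the parent is what supplies the surrounding information a child chunk alone would miss.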

Hybrid search with fusion ranking pattern

This pattern is designed for search systems that combine dense (vector-based) and sparse (traditional keyword-based) retrieval techniques. Hybrid search approaches are particularly effective when attempting to balance the advantages of semantic understanding (from dense embeddings) with the precision of keyword matches (from sparse indices like BM25).

Problem

In information retrieval, relying solely on keyword-based (sparse) methods like BM25 may fail to capture the semantic relationships in queries and documents. On the other hand, dense vector retrieval can provide semantically rich results but might miss exact keyword matches, leading to poor precision in some cases.

Solution

(Diagram: pattern-hybrid-search)
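One common fusion-ranking technique is Reciprocal Rank Fusion (RRF): each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in, which rewards documents that both retrievers rank highly without needing to calibrate their raw scores against each other. The rankings below are hard-coded for illustration; in practice they would come from a BM25 index and a vector index.

```python
# Reciprocal Rank Fusion sketch: merge a sparse (keyword) ranking and a
# dense (vector) ranking into one order without comparing raw scores.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """rankings: ranked doc-id lists (best first). Returns the fused order."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse_ranking = ["d3", "d1", "d2"]   # e.g. BM25 order
dense_ranking  = ["d1", "d4", "d3"]   # e.g. cosine-similarity order
fused = rrf_fuse([sparse_ranking, dense_ranking])
```

Here `d1` wins the fused ranking because both retrievers place it near the top, even though neither ranks it first; the constant `k` (60 is a conventional default) damps the influence of any single list.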

Sentence window retrieval pattern

The pattern is used in systems where retrieving relevant chunks of text based on a query benefits from contextual extension. This pattern is commonly applied in scenarios where the meaning of a sentence or chunk is enhanced or clarified by the surrounding sentences.

Problem

In some cases, retrieving single sentences or small chunks of information based solely on query similarity may result in responses that lack full context. Without surrounding sentences, the retrieved information can be incomplete or ambiguous. There’s a need for a pattern that retrieves not just the exact relevant chunks but also their neighboring sentences, providing additional context to improve the response’s accuracy.

Solution

(Diagram: pattern-sentence-window)
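The window mechanics can be sketched as follows: find the sentence most similar to the query, then return it joined with its neighbours. The word-overlap scoring is a toy stand-in for a real similarity model, and the window size of one sentence on each side is just an example parameter.

```python
# Sentence-window retrieval sketch: the best-matching sentence is returned
# together with `window` sentences on each side for added context.

def sentence_window(query: str, sentences: list[str], window: int = 1) -> str:
    q = set(query.lower().split())
    best = max(
        range(len(sentences)),
        key=lambda i: len(q & set(sentences[i].lower().split())),
    )
    lo = max(0, best - window)               # clamp at document boundaries
    hi = min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

sentences = [
    "The device has two modes.",
    "Press the button twice to reset it.",
    "A reset clears all saved settings.",
    "Settings can be exported beforehand.",
]
context = sentence_window("press the button to reset", sentences, window=1)
```

Only the second sentence matches the query, but the returned window also carries the warning that a reset clears saved settings: exactly the kind of neighbouring information that disambiguates a chunk retrieved in isolation.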

Distributed agent-based query routing pattern

The pattern is designed for complex information retrieval systems where multiple documents or knowledge bases need to be queried independently, and results must be aggregated. Each document or data source is handled by a dedicated agent, and a top-level agent coordinates the query routing and result synthesis.

Problem

In systems with large, distributed datasets, a single retrieval system may not be able to process queries efficiently or return accurate results due to the diversity of data sources. Moreover, querying all data sources simultaneously may not be efficient or necessary. There is a need for a system that intelligently routes queries to the most relevant documents or knowledge bases and aggregates the results into a coherent response.

Solution

(Diagram: pattern-agent-query)
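A minimal sketch of the routing layer: each data source is wrapped in an agent that reports a relevance score for a query, and the top-level router forwards the query only to the best-scoring agents before aggregating their answers. The keyword-based agents and their canned answers are hypothetical; real agents would run retrieval over their own document or knowledge base.

```python
# Query-routing sketch: a router scores per-source agents against the query,
# dispatches to the best matches, and aggregates the answers.

def make_agent(name: str, keywords: set[str], answer: str):
    """Build a toy agent that scores queries by keyword overlap."""
    def agent(query: str):
        score = len(keywords & set(query.lower().split()))
        return score, f"[{name}] {answer}"
    return agent

def route_and_aggregate(query: str, agents: list, top_n: int = 1) -> list[str]:
    """Ask every agent for a relevance score; keep the top_n useful answers."""
    results = sorted((agent(query) for agent in agents),
                     key=lambda r: r[0], reverse=True)
    return [answer for score, answer in results[:top_n] if score > 0]

agents = [
    make_agent("hr-docs", {"vacation", "leave", "payroll"},
               "Vacation policy: 25 days."),
    make_agent("it-docs", {"vpn", "password", "laptop"},
               "Reset passwords via the portal."),
]
answers = route_and_aggregate("how many vacation days do I get", agents)
```

Because sources that score zero are filtered out, irrelevant knowledge bases are never consulted for the final answer, which both saves work and keeps unrelated material from polluting the generation context.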