15 RAG Strategies Every AI Engineer Should Know

Why RAG Matters

Large Language Models are powerful, but they hallucinate, go stale, and lack access to private data. Retrieval-Augmented Generation (RAG) solves this by grounding LLM responses in retrieved evidence — pulling relevant documents from a knowledge base before generating an answer.

But not all RAG is created equal. Naive “retrieve-and-generate” pipelines break down when queries are ambiguous, documents are long, or the knowledge base is noisy. This repository explores 15 distinct RAG strategies, each addressing a specific weakness in the basic pattern.

Every strategy includes a Python implementation under src/ and an architecture diagram under architecture/.

Quick Start

git clone https://github.com/vchirrav/ml-rag-strategies.git
cd ml-rag-strategies
pip install -r requirements.txt
export OPENAI_API_KEY="your-api-key-here"   # Never hardcode API keys

See requirements.txt for the full dependency list. All samples read credentials from environment variables — never hardcode secrets.

The 15 Strategies

Re-Ranking

Use case: Customer Support

Combines fast vector search with a cross-encoder scoring pass to surface the most relevant results from an initial candidate set. The retriever casts a wide net, then the re-ranker applies fine-grained relevance scoring to reorder results before they reach the LLM.

src/reranking/reranking_rag.py architecture/01_reranking.svg

Agentic RAG

Use case: Research Assistant

An LLM agent autonomously decides when and what to retrieve. Rather than always retrieving on every query, the agent reasons about whether retrieval is needed, formulates targeted queries, and can perform multi-step lookups with adaptive planning.

src/agentic_rag/agentic_rag.py architecture/02_agentic_rag.svg

Knowledge Graph RAG

Use case: Medical Q&A

Structures documents as entity-relationship graphs and retrieves via graph traversal rather than vector similarity alone. Entities and their relationships are extracted during indexing, enabling multi-hop reasoning across connected concepts.

src/knowledge_graph_rag/knowledge_graph_rag.py architecture/03_knowledge_graph_rag.svg

Contextual Retrieval

Use case: Legal Document Search

Prepends LLM-generated context summaries to each chunk before embedding. This preserves document-level meaning that would otherwise be lost during chunking, dramatically improving retrieval accuracy for context-dependent content.

src/contextual_retrieval/contextual_retrieval.py architecture/04_contextual_retrieval.svg

Query Expansion

Use case: E-Commerce Search

Rewrites the user query into multiple phrasings and retrieves for each variant, then deduplicates results. This bridges the vocabulary gap between how users ask questions and how information is stored in the knowledge base.

src/query_expansion/query_expansion.py architecture/05_query_expansion.svg

Multi-Query RAG

Use case: Comparative Analysis

Decomposes complex questions into independent sub-questions, retrieves context for each sub-question separately, and synthesizes a unified answer. Ideal for questions that span multiple topics or require comparison.

src/multi_query_rag/multi_query_rag.py architecture/06_multi_query_rag.svg

Context-Aware Chunking

Use case: Technical Manuals

Splits documents along semantic boundaries (headings, paragraphs, sections) rather than fixed token counts. Preserves the logical structure of documents, ensuring each chunk contains a complete, coherent unit of information.

src/context_aware_chunking/context_aware_chunking.py architecture/07_context_aware_chunking.svg

Late Chunking

Use case: Academic Papers

Processes full documents through the embedding model first, then splits the resulting token embeddings into chunks. Each chunk retains awareness of the full document context since embeddings were computed holistically before splitting.

src/late_chunking/late_chunking.py architecture/08_late_chunking.svg

Hierarchical RAG

Use case: Large Codebase Navigation

Uses two-level indexing: first retrieve high-level summaries to identify relevant documents, then drill into detailed chunks within those documents. Enables efficient coarse-to-fine retrieval over very large corpora.

src/hierarchical_rag/hierarchical_rag.py architecture/09_hierarchical_rag.svg

Fine-Tuned RAG

Use case: Biomedical Literature

Fine-tunes the embedding model or the generation model on domain-specific question-answer pairs. Adapts the retrieval and generation components to the vocabulary, concepts, and reasoning patterns of a specific domain.

src/fine_tuned_rag/fine_tuned_rag.py architecture/10_fine_tuned_rag.svg

HyDE (Hypothetical Document Embeddings)

Use case: Vague Query Resolution

Generates a hypothetical answer to the query, then uses that answer as the retrieval query instead of the original question. By embedding a plausible answer rather than a short question, retrieval quality improves significantly for underspecified queries.

src/hyde_rag/hyde_rag.py architecture/11_hyde_rag.svg

RAG-Fusion

Use case: Multi-Rephrase Combination

Generates multiple query variants, retrieves independently for each, and combines results using Reciprocal Rank Fusion (RRF). This reduces the impact of any single query formulation and produces more robust retrieval results.

src/fusion_rag/fusion_rag.py architecture/12_fusion_rag.svg

Self-RAG

Use case: Adaptive Chatbot

The model self-reflects at each generation step, deciding whether retrieval is needed, evaluating the relevance of retrieved documents, and verifying that its response is properly grounded. Introduces learned reflection tokens for autonomous quality control.

src/self_rag/self_rag.py architecture/13_self_rag.svg

Corrective RAG (CRAG)

Use case: Internal Wiki with Web Fallback

Grades retrieved documents for relevance and falls back to web search if the internal knowledge base returns insufficient results. Includes a knowledge refinement step that strips irrelevant content from retrieved passages before generation.

src/corrective_rag/corrective_rag.py architecture/14_corrective_rag.svg

Adaptive RAG

Use case: Help Desk Query Routing

Classifies incoming query complexity and routes to the appropriate retrieval strategy: simple queries use direct retrieval, moderate queries use query expansion, and complex queries are decomposed into sub-questions. Optimizes cost and latency by matching strategy to need.

src/adaptive_rag/adaptive_rag.py architecture/15_adaptive_rag.svg

When to Use Which Strategy

Improving Retrieval Quality

Re-Ranking (post-retrieval scoring), Contextual Retrieval (chunk-level context), Late Chunking (holistic embeddings), Fine-Tuned RAG (domain adaptation)

Handling Complex Queries

Query Expansion (vocabulary gap), Multi-Query RAG (decomposition), HyDE (vague queries), RAG-Fusion (multi-rephrase RRF)

Scaling to Large Corpora

Hierarchical RAG (two-level indexing), Context-Aware Chunking (semantic splitting), Knowledge Graph RAG (entity-relationship traversal)

Self-Correcting & Adaptive Systems

Self-RAG (reflection tokens), Corrective RAG (web fallback), Adaptive RAG (complexity-based routing), Agentic RAG (autonomous retrieval decisions)

Security Considerations for RAG Systems

RAG pipelines introduce unique attack surfaces that traditional applications don't have. Every strategy in this repository follows defensive practices:

No hardcoded secrets — All API keys are read from environment variables, never committed to source control
Context grounding — Responses are grounded in retrieved content to prevent hallucination and reduce prompt injection risk
No arbitrary code execution — None of the strategies execute dynamically generated code, eliminating code injection vectors
Minimal dependencies — Only well-maintained, widely-used libraries are included to reduce supply chain risk
Input validation — Review retrieved content before passing to the LLM to prevent indirect prompt injection attacks

Repository Structure

ml-rag-strategies/
├── README.md
├── RAG_LIBRARIES.md           # Companion guide to RAG libraries
├── requirements.txt           # Shared Python dependencies
├── architecture/              # SVG architecture diagrams
│   ├── 01_reranking.svg
│   ├── 02_agentic_rag.svg
│   ├── ...
│   └── 15_adaptive_rag.svg
└── src/                       # Python implementations
    ├── reranking/             # Strategy 01
    ├── agentic_rag/           # Strategy 02
    ├── knowledge_graph_rag/   # Strategy 03
    ├── contextual_retrieval/  # Strategy 04
    ├── query_expansion/       # Strategy 05
    ├── multi_query_rag/       # Strategy 06
    ├── context_aware_chunking/# Strategy 07
    ├── late_chunking/         # Strategy 08
    ├── hierarchical_rag/      # Strategy 09
    ├── fine_tuned_rag/        # Strategy 10
    ├── hyde_rag/              # Strategy 11
    ├── fusion_rag/            # Strategy 12
    ├── self_rag/              # Strategy 13
    ├── corrective_rag/        # Strategy 14
    └── adaptive_rag/          # Strategy 15

Also see RAG_LIBRARIES.md for a companion guide to the libraries and frameworks used across strategies.

Explore the Code

All 15 strategies are open-source with runnable implementations and architecture diagrams. Clone the repository, install dependencies, and start experimenting with RAG patterns.

View on GitHub Need help implementing RAG securely? Book a session