Agentic RAG

Move beyond static retrieval. Agentic RAG puts autonomous agents in control of the retrieval pipeline — dynamically planning queries, reflecting on results, and orchestrating multi-source knowledge.

SDK Focus: MCP + subagents · outputFormat · multi-step tool calls

The Evolution of RAG

RAG has evolved through three distinct stages:

Stage 1: Static RAG (Naive)
  Query → Retrieve → Generate → Done
  Single retrieval, single generation. No feedback loop.

Stage 2: Iterative RAG (Advanced)
  Query → Retrieve → Check → Re-retrieve → Generate
  Multi-step retrieval with predefined pipelines.

Stage 3: Agentic RAG (Autonomous)
  Query → Agent decides strategy → Retrieve/Search/Ask
       → Reflect on quality → Adjust → Generate
  Agent controls the entire workflow dynamically.
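
For reference, Stage 1 in code is just a straight line; everything else on this page adds branches and loops around it. A minimal sketch, with retriever and llm as hypothetical components passed in by the caller:

def static_rag(query: str, retriever, llm) -> str:
    """Stage 1: one retrieval, one generation, no feedback loop."""
    contexts = retriever.search(query, top_k=5)
    return llm.generate(query=query, contexts=contexts)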

Why Agentic RAG?

Traditional RAG fails on complex queries because it treats retrieval as a one-shot black box. Agentic RAG solves this by adding four key capabilities:

Capability  | What It Does                           | Example
Reflection  | Self-assess retrieval quality          | "These results don't answer the question — I need to reformulate"
Planning    | Decompose complex queries              | Break "compare X vs Y" into sub-queries for each
Tool Use    | Orchestrate multiple retrieval sources | Vector DB + Web search + SQL query + API call
Multi-Agent | Specialized agents collaborate         | Retriever agent + Fact-checker agent + Synthesizer agent
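
Of these capabilities, Planning is the easiest to picture in code: the agent turns one comparative question into retrievable sub-questions. A minimal sketch, assuming only a generic llm callable (any chat-completion wrapper); the prompt wording is illustrative, not a fixed API.

from typing import Callable

def decompose_comparative_query(query: str, llm: Callable[[str], str]) -> list[str]:
    """Ask the model to split 'compare X vs Y' into focused sub-queries."""
    prompt = (
        "Split the following question into independent sub-questions, "
        "one per line, each answerable by a single retrieval:\n" + query
    )
    # One sub-query per non-empty line of the model's reply
    return [line.lstrip("-• ").strip() for line in llm(prompt).splitlines() if line.strip()]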

Architecture Patterns

Single-Agent Agentic RAG

One agent controls the full retrieval-generation cycle with dynamic routing:

┌──────────────────────────────────────────────────────────┐
│                    Agentic RAG Agent                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│   User Query                                              │
│       ↓                                                   │
│   ┌──────────────┐                                        │
│   │ Query Router │ → Which source(s) to use?              │
│   └──────┬───────┘                                        │
│     ┌────┼────┬────────┐                                  │
│     ↓    ↓    ↓        ↓                                  │
│   [Vec] [Web] [SQL] [API]                                 │
│     ↓    ↓    ↓        ↓                                  │
│   ┌─────────────────────────┐                             │
│   │   Result Evaluator      │ ← Good enough?              │
│   └──────────┬──────────────┘                             │
│         Yes? │ No? → Reformulate query → retry            │
│              ↓                                            │
│   ┌─────────────────────────┐                             │
│   │   Response Generator    │                             │
│   └─────────────────────────┘                             │
└──────────────────────────────────────────────────────────┘
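
Read top to bottom, the diagram is a single loop: route, retrieve, evaluate, then either retry with a better query or generate. A compact sketch of that control flow; router, retrievers, evaluator, and generator are hypothetical components that a real system would back with model or tool calls.

def agentic_rag(query: str, router, retrievers: dict, evaluator, generator,
                max_retries: int = 3) -> str:
    """Single-agent loop: route -> retrieve -> evaluate -> (retry | generate)."""
    contexts = []
    for _ in range(max_retries):
        sources = router.choose_sources(query)       # e.g. ["vector", "web"]
        contexts = [doc for name in sources for doc in retrievers[name].search(query)]

        verdict = evaluator.assess(query, contexts)  # "good", or feedback on what's missing
        if verdict == "good":
            break
        query = router.reformulate(query, feedback=verdict)

    return generator.answer(query, contexts)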

Multi-Agent Agentic RAG

Specialized agents handle different aspects of retrieval:

Orchestrator Agent
  ├── Query Planner Agent    → Decomposes complex queries
  ├── Retrieval Agent        → Fetches from vector stores
  ├── Web Search Agent       → Searches the internet
  ├── Fact-Check Agent       → Validates retrieved information
  └── Synthesis Agent        → Combines and generates final answer
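
The orchestrator itself can stay plain control flow while each specialist owns one step. A rough sketch with hypothetical agent objects; every plan, search, check, and combine call would be its own model or tool invocation.

def orchestrate(question: str, planner, retriever, web_search,
                fact_checker, synthesizer) -> str:
    """Orchestrator: plan -> gather -> verify -> synthesize."""
    sub_queries = planner.plan(question)                           # Query Planner Agent

    evidence = []
    for sq in sub_queries:
        docs = retriever.search(sq) or web_search.search(sq)       # Retrieval, then Web Search Agent
        verified = [d for d in docs if fact_checker.check(sq, d)]  # Fact-Check Agent
        evidence.append({"sub_query": sq, "docs": verified})

    return synthesizer.combine(question, evidence)                 # Synthesis Agent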

Intelligent Query Routing

The agent decides the best retrieval strategy based on query type:

def route_query(query: str, agent: Agent) -> Strategy:
    """Agent dynamically selects retrieval strategy."""
    
    analysis = agent.analyze(query)
    
    if analysis.is_factual:
        # Direct lookup from knowledge base
        return VectorSearchStrategy(top_k=5)
    
    elif analysis.is_comparative:
        # Decompose into sub-queries
        sub_queries = agent.decompose(query)
        return MultiQueryStrategy(sub_queries)
    
    elif analysis.needs_current_info:
        # Use web search + knowledge base
        return HybridStrategy(
            sources=[WebSearch(), VectorDB()],
            merge="interleave"
        )
    
    elif analysis.needs_structured_data:
        # Generate and execute SQL/API queries
        return StructuredQueryStrategy(
            schema=get_schema(),
            query_type="sql"
        )

    # Fallback: default to plain vector search if no rule matches
    return VectorSearchStrategy(top_k=5)

Self-Reflection Loop

The defining feature of Agentic RAG: the agent evaluates its own retrieval results before generating a response.

max_iterations = 3

for i in range(max_iterations):
    # Retrieve
    contexts = retriever.search(query, top_k=10)
    
    # Reflect: are these results sufficient?
    evaluation = agent.evaluate(
        query=query,
        contexts=contexts,
        criteria=["relevance", "completeness", "freshness"]
    )
    
    if evaluation.score >= 0.8:
        break  # Good enough, proceed to generation
    
    # Not good enough — adapt
    if evaluation.issue == "irrelevant":
        query = agent.reformulate(query, feedback=evaluation.feedback)
    elif evaluation.issue == "incomplete":
        query = agent.expand(query, missing=evaluation.gaps)
    elif evaluation.issue == "outdated":
        retriever = WebSearchRetriever()  # Switch to web search

# Generate with validated context
response = agent.generate(query=query, contexts=contexts)

Agentic RAG with MCP

MCP makes Agentic RAG modular and composable. Each data source becomes an MCP server:

// MCP Configuration for Agentic RAG
{
  "mcpServers": {
    "vector-store": {
      "command": "python",
      "args": ["mcp_server_chroma.py", "--collection", "docs"]
    },
    "web-search": {
      "command": "npx",
      "args": ["-y", "@mcp/server-brave-search"]
    },
    "database": {
      "command": "npx",
      "args": ["-y", "@mcp/server-postgres", "postgresql://..."]
    },
    "confluence": {
      "command": "python",
      "args": ["mcp_server_confluence.py"]
    }
  }
}

// The agent now has unified access to ALL retrieval sources
// via standard MCP tool calls — no custom integration code needed

agentic-rag.ts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

const RelevanceSchema = z.object({
  relevant: z.boolean(),
  confidence: z.number().min(0).max(1),
  needsMoreContext: z.boolean(),
  suggestedQuery: z.string().optional()
});

// Agent decides: retrieve, evaluate, re-retrieve
const response = query({
  prompt: "What is our refund policy for enterprise customers?",
  options: {
    mcpServers: {
      "docs": { command: "npx", args: ["mcp-server-docs"] }
    },
    allowedTools: ["mcp__docs__search", "mcp__docs__get_page"],
    outputFormat: {
      type: "json_schema",
      json_schema: {
        name: "RelevanceCheck",
        strict: true,
        schema: zodToJsonSchema(RelevanceSchema)
      }
    }
  }
});

// Consume the streamed messages and print the final structured result
for await (const message of response) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result);
  }
}

Agentic RAG vs Traditional RAG

Aspect           | Traditional RAG           | Agentic RAG
Retrieval        | Single-shot, static       | Iterative, adaptive
Query handling   | Direct embedding          | Decomposition + routing
Quality control  | None (hope for the best)  | Self-reflection + validation
Data sources     | Single vector store       | Multi-source orchestration
Failure handling | Garbage in, garbage out   | Retry with new strategy

Best Practices

  • Set iteration limits — Cap reflection loops (3-5 iterations) to prevent infinite retrieval
  • Define quality thresholds — Clear criteria for when retrieval is "good enough"
  • Cache aggressively — Store retrieval results to avoid redundant queries
  • Monitor retrieval cost — Each iteration adds latency and token usage
  • Fallback gracefully — If all retrieval fails, tell the user rather than hallucinate
  • Use MCP for modularity — Standardize data source integration for easy swapping
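
The first three practices compose into a small wrapper. A sketch that reuses the agent and retriever objects from the self-reflection loop above; caching on the raw query string is a simplification.

from functools import lru_cache

MAX_ITERATIONS = 3        # cap the reflection loop
QUALITY_THRESHOLD = 0.8   # "good enough" retrieval score

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple:
    # Return a tuple so cached results cannot be mutated by callers
    return tuple(retriever.search(query, top_k=10))

def retrieve_with_budget(query: str) -> list:
    contexts = []
    for _ in range(MAX_ITERATIONS):
        contexts = list(cached_search(query))
        evaluation = agent.evaluate(query=query, contexts=contexts)
        if evaluation.score >= QUALITY_THRESHOLD:
            return contexts
        query = agent.reformulate(query, feedback=evaluation.feedback)
    return contexts  # best effort; tell the user rather than pretend it is complete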

Next Steps

Try It: Self-deciding Search Agent

Build an agent that judges whether retrieval is needed or can answer directly.

  1. Create an MCP search tool that queries a document store
  2. Use outputFormat for structured relevance scoring
  3. If relevance < 0.7, the agent re-queries with a refined search
  4. Track re-query count and final relevance score

Gate: P8 Complete — Multi-step retrieval works, relevance scores above threshold, re-query count tracked.
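
If you want a language-agnostic starting point for steps 3 and 4, the control flow fits in a few lines. search_docs and score_relevance below are placeholders for your MCP search tool and structured relevance check, not SDK calls.

RELEVANCE_THRESHOLD = 0.7

def answer_with_requeries(question: str, agent, search_docs, score_relevance,
                          max_requeries: int = 3) -> dict:
    """Search, score relevance, re-query below threshold, and track the counts."""
    query, requeries = question, 0
    results = search_docs(query)                       # step 1: MCP search tool
    relevance = score_relevance(question, results)     # step 2: structured relevance score

    while relevance < RELEVANCE_THRESHOLD and requeries < max_requeries:
        query = agent.refine_query(question, results)  # step 3: refine and re-query
        results = search_docs(query)
        relevance = score_relevance(question, results)
        requeries += 1

    return {                                           # step 4: metrics for the gate
        "answer": agent.generate(question, results),
        "requeries": requeries,
        "relevance": relevance,
    }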