Agentic RAG
Move beyond static retrieval. Agentic RAG puts autonomous agents in control of the retrieval pipeline — dynamically planning queries, reflecting on results, and orchestrating multi-source knowledge.
The Evolution of RAG
RAG has evolved through three distinct stages:
Stage 1: Static RAG (Naive)
Query → Retrieve → Generate → Done
Single retrieval, single generation. No feedback loop.
Stage 2: Iterative RAG (Advanced)
Query → Retrieve → Check → Re-retrieve → Generate
Multi-step retrieval with predefined pipelines.
Stage 3: Agentic RAG (Autonomous)
Query → Agent decides strategy → Retrieve/Search/Ask
→ Reflect on quality → Adjust → Generate
Agent controls the entire workflow dynamically.
Why Agentic RAG?
Traditional RAG fails on complex queries because it treats retrieval as a one-shot black box. Agentic RAG solves this by adding four key capabilities:
| Capability | What It Does | Example |
|---|---|---|
| Reflection | Self-assess retrieval quality | "These results don't answer the question — I need to reformulate" |
| Planning | Decompose complex queries | Break "compare X vs Y" into sub-queries for each |
| Tool Use | Orchestrate multiple retrieval sources | Vector DB + Web search + SQL query + API call |
| Multi-Agent | Specialized agents collaborate | Retriever agent + Fact-checker agent + Synthesizer agent |
Architecture Patterns
Single-Agent Agentic RAG
One agent controls the full retrieval-generation cycle with dynamic routing:
┌─────────────────────────────────────────────────────────┐
│ Agentic RAG Agent │
├─────────────────────────────────────────────────────────┤
│ │
│ User Query │
│ ↓ │
│ ┌─────────────┐ │
│ │ Query Router │ → Which source(s) to use? │
│ └──────┬──────┘ │
│ ┌────┼────┬────────┐ │
│ ↓ ↓ ↓ ↓ │
│ [Vec] [Web] [SQL] [API] │
│ ↓ ↓ ↓ ↓ │
│ ┌─────────────────────────┐ │
│ │ Result Evaluator │ ← Good enough? │
│ └──────────┬──────────────┘ │
│ Yes? │ No? → Reformulate query → retry │
│ ↓ │
│ ┌─────────────────────────┐ │
│ │ Response Generator │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
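The control flow in this diagram can be expressed as one loop. A minimal sketch, assuming placeholder helpers (route_query, strategy.retrieve, agent.evaluate, agent.generate) rather than any specific framework:

def agentic_rag(agent, user_query: str, max_retries: int = 3) -> str:
    """Single agent: route, retrieve, evaluate, retry, then generate."""
    query = user_query
    for _ in range(max_retries):
        strategy = route_query(query, agent)                       # Query Router
        contexts = strategy.retrieve(query)                        # Vec / Web / SQL / API
        verdict = agent.evaluate(query=query, contexts=contexts)   # Result Evaluator
        if verdict.score >= 0.8:                                   # Good enough?
            break
        query = agent.reformulate(query, feedback=verdict.feedback)  # No? Reformulate and retry
    return agent.generate(query=query, contexts=contexts)          # Response Generator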
Multi-Agent Agentic RAG
Specialized agents handle different aspects of retrieval:
Orchestrator Agent
├── Query Planner Agent → Decomposes complex queries
├── Retrieval Agent → Fetches from vector stores
├── Web Search Agent → Searches the internet
├── Fact-Check Agent → Validates retrieved information
└── Synthesis Agent → Combines and generates final answer
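A minimal sketch of the orchestrator's delegation logic, with each specialist agent represented as a plain object; all names here (planner, retriever, web, fact_checker, synthesizer) are illustrative:

def orchestrate(planner, retriever, web, fact_checker, synthesizer, query: str) -> str:
    """Orchestrator: plan, gather evidence from specialists, validate, synthesize."""
    sub_queries = planner.decompose(query)                  # Query Planner Agent
    evidence = []
    for sq in sub_queries:
        docs = retriever.search(sq)                         # Retrieval Agent
        if not docs:
            docs = web.search(sq)                           # Web Search Agent as fallback
        evidence.extend(fact_checker.validate(sq, docs))    # Fact-Check Agent
    return synthesizer.answer(query, evidence)              # Synthesis Agent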
Intelligent Query Routing
The agent decides the best retrieval strategy based on query type:
def route_query(query: str, agent: Agent) -> Strategy:
    """Agent dynamically selects a retrieval strategy based on query analysis."""
    analysis = agent.analyze(query)

    if analysis.is_factual:
        # Direct lookup from the knowledge base
        return VectorSearchStrategy(top_k=5)
    elif analysis.is_comparative:
        # Decompose into sub-queries, one per entity being compared
        sub_queries = agent.decompose(query)
        return MultiQueryStrategy(sub_queries)
    elif analysis.needs_current_info:
        # Combine web search with the knowledge base
        return HybridStrategy(
            sources=[WebSearch(), VectorDB()],
            merge="interleave",
        )
    elif analysis.needs_structured_data:
        # Generate and execute SQL/API queries
        return StructuredQueryStrategy(
            schema=get_schema(),
            query_type="sql",
        )
    else:
        # Fallback: plain vector search when no specialized route applies
        return VectorSearchStrategy(top_k=5)
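The router relies on the agent returning a structured analysis of the query. A hypothetical shape for that result, plus a usage example (QueryAnalysis and my_agent are illustrative, not part of any library):

from dataclasses import dataclass

@dataclass
class QueryAnalysis:
    # Hypothetical flags the router above relies on; in practice these
    # would be produced by an LLM classification prompt.
    is_factual: bool = False
    is_comparative: bool = False
    needs_current_info: bool = False
    needs_structured_data: bool = False

# Example: a comparative query triggers decomposition + multi-query retrieval
strategy = route_query("Compare PostgreSQL and MySQL for analytics", agent=my_agent)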
Self-Reflection Loop
The defining feature of Agentic RAG: the agent evaluates its own retrieval results before generating a response.
max_iterations = 3

for i in range(max_iterations):
    # Retrieve candidate contexts
    contexts = retriever.search(query, top_k=10)

    # Reflect: are these results sufficient to answer the query?
    evaluation = agent.evaluate(
        query=query,
        contexts=contexts,
        criteria=["relevance", "completeness", "freshness"],
    )
    if evaluation.score >= 0.8:
        break  # Good enough, proceed to generation

    # Not good enough — adapt before the next attempt
    if evaluation.issue == "irrelevant":
        query = agent.reformulate(query, feedback=evaluation.feedback)
    elif evaluation.issue == "incomplete":
        query = agent.expand(query, missing=evaluation.gaps)
    elif evaluation.issue == "outdated":
        retriever = WebSearchRetriever()  # Switch to web search

# Generate with validated context (or the best found if the loop hit its cap)
response = agent.generate(query=query, contexts=contexts)
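What evaluate does is the crux of the loop. One way to sketch it is an LLM-as-judge call that returns a JSON verdict; the prompt wording and the llm.complete call are placeholders, not a specific SDK:

import json
from types import SimpleNamespace

def evaluate(llm, query: str, contexts: list[str], criteria: list[str]) -> SimpleNamespace:
    """Ask an LLM judge to grade retrieved contexts against the query."""
    prompt = (
        f"Query: {query}\n\n"
        "Retrieved contexts:\n" + "\n---\n".join(contexts) + "\n\n"
        f"Grade the contexts on {', '.join(criteria)}. Reply as JSON with keys "
        '"score" (0-1), "issue" ("irrelevant"|"incomplete"|"outdated"|"none"), '
        '"feedback", and "gaps" (a list of missing sub-topics).'
    )
    # llm.complete is a placeholder for whatever completion call you use
    verdict = json.loads(llm.complete(prompt))
    return SimpleNamespace(**verdict)  # exposes .score, .issue, .feedback, .gaps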
Agentic RAG with MCP
MCP makes Agentic RAG modular and composable. Each data source becomes an MCP server:
// MCP Configuration for Agentic RAG
{
"mcpServers": {
"vector-store": {
"command": "python",
"args": ["mcp_server_chroma.py", "--collection", "docs"]
},
"web-search": {
"command": "npx",
"args": ["-y", "@mcp/server-brave-search"]
},
"database": {
"command": "npx",
"args": ["-y", "@mcp/server-postgres", "postgresql://..."]
},
"confluence": {
"command": "python",
"args": ["mcp_server_confluence.py"]
}
}
}
// The agent now has unified access to ALL retrieval sources
// via standard MCP tool calls — no custom integration code needed
import { query } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const RelevanceSchema = z.object({
relevant: z.boolean(),
confidence: z.number().min(0).max(1),
needsMoreContext: z.boolean(),
suggestedQuery: z.string().optional()
});
// Agent decides: retrieve, evaluate, re-retrieve
const response = query({
prompt: "What is our refund policy for enterprise customers?",
options: {
mcpServers: {
"docs": { command: "npx", args: ["mcp-server-docs"] }
},
allowedTools: ["mcp__docs__search", "mcp__docs__get_page"],
outputFormat: {
type: "json_schema",
json_schema: {
name: "RelevanceCheck",
strict: true,
schema: zodToJsonSchema(RelevanceSchema)
}
}
}
});
Agentic RAG vs Traditional RAG
| Aspect | Traditional RAG | Agentic RAG |
|---|---|---|
| Retrieval | Single-shot, static | Iterative, adaptive |
| Query handling | Direct embedding | Decomposition + routing |
| Quality control | None (hope for the best) | Self-reflection + validation |
| Data sources | Single vector store | Multi-source orchestration |
| Failure handling | Garbage in, garbage out | Retry with new strategy |
Best Practices
- Set iteration limits — Cap reflection loops (3-5 iterations) to prevent infinite retrieval (see the sketch after this list)
- Define quality thresholds — Clear criteria for when retrieval is "good enough"
- Cache aggressively — Store retrieval results to avoid redundant queries
- Monitor retrieval cost — Each iteration adds latency and token usage
- Fallback gracefully — If all retrieval fails, tell the user rather than hallucinate
- Use MCP for modularity — Standardize data source integration for easy swapping
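A minimal sketch that combines iteration limits, caching, quality thresholds, and a graceful fallback; retriever, agent, and the threshold values are stand-ins you would tune for your own stack:

from functools import lru_cache

MAX_ITERATIONS = 3
QUALITY_THRESHOLD = 0.8

@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple:
    # Cache retrieval results to avoid redundant queries across iterations
    return tuple(retriever.search(query, top_k=10))

def answer_or_decline(agent, query: str) -> str:
    for _ in range(MAX_ITERATIONS):                  # Set iteration limits
        contexts = list(cached_search(query))        # Cache aggressively
        verdict = agent.evaluate(query=query, contexts=contexts)
        if verdict.score >= QUALITY_THRESHOLD:       # Define quality thresholds
            return agent.generate(query=query, contexts=contexts)
        query = agent.reformulate(query, feedback=verdict.feedback)
    # Fallback gracefully: admit failure instead of hallucinating
    return "I couldn't find reliable sources to answer this question."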
Next Steps
- Multi-Agent Systems — Multi-agent patterns for complex RAG
- Code Generation — RAG applied to code understanding
- Memory Systems — Long-term knowledge storage for RAG
Try It: Self-deciding Search Agent
Build an agent that decides whether it needs to retrieve or can answer directly from its own knowledge.
- Create an MCP search tool that queries a document store
- Use outputFormat for structured relevance scoring
- If relevance < 0.7, the agent re-queries with a refined search
- Track re-query count and final relevance score
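A possible skeleton for the exercise, with the MCP search call and scoring abstracted behind placeholder functions (search_docs, score_relevance, refine_search, and can_answer_directly are hypothetical names you would wire to your own tools):

def self_deciding_answer(agent, question: str, max_requeries: int = 3) -> dict:
    """Decide whether retrieval is needed, then search and re-query until relevant."""
    if agent.can_answer_directly(question):            # No retrieval needed
        return {"answer": agent.generate(question), "requeries": 0, "relevance": None}

    query, contexts, relevance, requeries = question, [], 0.0, 0
    for attempt in range(max_requeries + 1):
        contexts = search_docs(query)                  # MCP search tool (placeholder)
        relevance = agent.score_relevance(question, contexts)  # structured relevance score
        requeries = attempt
        if relevance >= 0.7:                           # threshold from the exercise
            break
        query = agent.refine_search(question, contexts)  # re-query with a refined search

    return {
        "answer": agent.generate(question, contexts=contexts),
        "requeries": requeries,                        # re-query count
        "relevance": relevance,                        # final relevance score
    }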