Agentic RAG
Move beyond static retrieval. Agentic RAG puts autonomous agents in control of the retrieval pipeline — dynamically planning queries, reflecting on results, and orchestrating multi-source knowledge.
The Evolution of RAG
RAG has evolved through three distinct stages:
Stage 1: Static RAG (Naive)
Query → Retrieve → Generate → Done
Single retrieval, single generation. No feedback loop.
Stage 2: Iterative RAG (Advanced)
Query → Retrieve → Check → Re-retrieve → Generate
Multi-step retrieval with predefined pipelines.
Stage 3: Agentic RAG (Autonomous)
Query → Agent decides strategy → Retrieve/Search/Ask
→ Reflect on quality → Adjust → Generate
Agent controls the entire workflow dynamically.
Why Agentic RAG?
Traditional RAG fails on complex queries because it treats retrieval as a one-shot black box. Agentic RAG solves this by adding four key capabilities:
| Capability | What It Does | Example |
|---|---|---|
| Reflection | Self-assess retrieval quality | "These results don't answer the question — I need to reformulate" |
| Planning | Decompose complex queries | Break "compare X vs Y" into sub-queries for each |
| Tool Use | Orchestrate multiple retrieval sources | Vector DB + Web search + SQL query + API call |
| Multi-Agent | Specialized agents collaborate | Retriever agent + Fact-checker agent + Synthesizer agent |
Architecture Patterns
Single-Agent Agentic RAG
One agent controls the full retrieval-generation cycle with dynamic routing:
┌─────────────────────────────────────────────────────────┐
│ Agentic RAG Agent │
├─────────────────────────────────────────────────────────┤
│ │
│ User Query │
│ ↓ │
│ ┌─────────────┐ │
│ │ Query Router │ → Which source(s) to use? │
│ └──────┬──────┘ │
│ ┌────┼────┬────────┐ │
│ ↓ ↓ ↓ ↓ │
│ [Vec] [Web] [SQL] [API] │
│ ↓ ↓ ↓ ↓ │
│ ┌─────────────────────────┐ │
│ │ Result Evaluator │ ← Good enough? │
│ └──────────┬──────────────┘ │
│ Yes? │ No? → Reformulate query → retry │
│ ↓ │
│ ┌─────────────────────────┐ │
│ │ Response Generator │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
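The control flow in this diagram can be expressed as one loop. A minimal sketch, assuming placeholder helpers (route_query, strategy.retrieve, agent.evaluate, agent.generate) rather than any specific framework:

def agentic_rag(agent, user_query: str, max_retries: int = 3) -> str:
    """Single agent: route, retrieve, evaluate, retry, then generate."""
    query = user_query
    for _ in range(max_retries):
        strategy = route_query(query, agent)                       # Query Router
        contexts = strategy.retrieve(query)                        # Vec / Web / SQL / API
        verdict = agent.evaluate(query=query, contexts=contexts)   # Result Evaluator
        if verdict.score >= 0.8:                                   # Good enough?
            break
        query = agent.reformulate(query, feedback=verdict.feedback)  # No? Reformulate and retry
    return agent.generate(query=query, contexts=contexts)          # Response Generator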
Multi-Agent Agentic RAG
Specialized agents handle different aspects of retrieval:
Orchestrator Agent
├── Query Planner Agent → Decomposes complex queries
├── Retrieval Agent → Fetches from vector stores
├── Web Search Agent → Searches the internet
├── Fact-Check Agent → Validates retrieved information
└── Synthesis Agent → Combines and generates final answer
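A minimal sketch of the orchestrator's delegation logic, with each specialist agent represented as a plain object; all names here (planner, retriever, web, fact_checker, synthesizer) are illustrative:

def orchestrate(planner, retriever, web, fact_checker, synthesizer, query: str) -> str:
    """Orchestrator: plan, gather evidence from specialists, validate, synthesize."""
    sub_queries = planner.decompose(query)                  # Query Planner Agent
    evidence = []
    for sq in sub_queries:
        docs = retriever.search(sq)                         # Retrieval Agent
        if not docs:
            docs = web.search(sq)                           # Web Search Agent as fallback
        evidence.extend(fact_checker.validate(sq, docs))    # Fact-Check Agent
    return synthesizer.answer(query, evidence)              # Synthesis Agent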
Intelligent Query Routing
The agent decides the best retrieval strategy based on query type:
def route_query(query: str, agent: Agent) -> Strategy:
    """Agent dynamically selects a retrieval strategy based on query analysis."""
    analysis = agent.analyze(query)

    if analysis.is_factual:
        # Direct lookup from the knowledge base
        return VectorSearchStrategy(top_k=5)
    elif analysis.is_comparative:
        # Decompose into sub-queries, one per entity being compared
        sub_queries = agent.decompose(query)
        return MultiQueryStrategy(sub_queries)
    elif analysis.needs_current_info:
        # Combine web search with the knowledge base
        return HybridStrategy(
            sources=[WebSearch(), VectorDB()],
            merge="interleave",
        )
    elif analysis.needs_structured_data:
        # Generate and execute SQL/API queries
        return StructuredQueryStrategy(
            schema=get_schema(),
            query_type="sql",
        )
    else:
        # Fallback: plain vector search when no specialized route applies
        return VectorSearchStrategy(top_k=5)
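The router relies on the agent returning a structured analysis of the query. A hypothetical shape for that result, plus a usage example (QueryAnalysis and my_agent are illustrative, not part of any library):

from dataclasses import dataclass

@dataclass
class QueryAnalysis:
    # Hypothetical flags the router above relies on; in practice these
    # would be produced by an LLM classification prompt.
    is_factual: bool = False
    is_comparative: bool = False
    needs_current_info: bool = False
    needs_structured_data: bool = False

# Example: a comparative query triggers decomposition + multi-query retrieval
strategy = route_query("Compare PostgreSQL and MySQL for analytics", agent=my_agent)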
Self-Reflection Loop
The defining feature of Agentic RAG: the agent evaluates its own retrieval results before generating a response.
max_iterations = 3

for i in range(max_iterations):
    # Retrieve candidate contexts
    contexts = retriever.search(query, top_k=10)

    # Reflect: are these results sufficient to answer the query?
    evaluation = agent.evaluate(
        query=query,
        contexts=contexts,
        criteria=["relevance", "completeness", "freshness"],
    )
    if evaluation.score >= 0.8:
        break  # Good enough, proceed to generation

    # Not good enough — adapt before the next attempt
    if evaluation.issue == "irrelevant":
        query = agent.reformulate(query, feedback=evaluation.feedback)
    elif evaluation.issue == "incomplete":
        query = agent.expand(query, missing=evaluation.gaps)
    elif evaluation.issue == "outdated":
        retriever = WebSearchRetriever()  # Switch to web search

# Generate with validated context (or the best found if the loop hit its cap)
response = agent.generate(query=query, contexts=contexts)
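What evaluate does is the crux of the loop. One way to sketch it is an LLM-as-judge call that returns a JSON verdict; the prompt wording and the llm.complete call are placeholders, not a specific SDK:

import json
from types import SimpleNamespace

def evaluate(llm, query: str, contexts: list[str], criteria: list[str]) -> SimpleNamespace:
    """Ask an LLM judge to grade retrieved contexts against the query."""
    prompt = (
        f"Query: {query}\n\n"
        "Retrieved contexts:\n" + "\n---\n".join(contexts) + "\n\n"
        f"Grade the contexts on {', '.join(criteria)}. Reply as JSON with keys "
        '"score" (0-1), "issue" ("irrelevant"|"incomplete"|"outdated"|"none"), '
        '"feedback", and "gaps" (a list of missing sub-topics).'
    )
    # llm.complete is a placeholder for whatever completion call you use
    verdict = json.loads(llm.complete(prompt))
    return SimpleNamespace(**verdict)  # exposes .score, .issue, .feedback, .gaps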
Agentic RAG with MCP
MCP makes Agentic RAG modular and composable. Each data source becomes an MCP server:
// MCP Configuration for Agentic RAG
{
"mcpServers": {
"vector-store": {
"command": "python",
"args": ["mcp_server_chroma.py", "--collection", "docs"]
},
"web-search": {
"command": "npx",
"args": ["-y", "@mcp/server-brave-search"]
},
"database": {
"command": "npx",
"args": ["-y", "@mcp/server-postgres", "postgresql://..."]
},
"confluence": {
"command": "python",
"args": ["mcp_server_confluence.py"]
}
}
}
// The agent now has unified access to ALL retrieval sources
// via standard MCP tool calls — no custom integration code needed
import { query } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
const RelevanceSchema = z.object({
relevant: z.boolean(),
confidence: z.number().min(0).max(1),
needsMoreContext: z.boolean(),
suggestedQuery: z.string().optional()
});
// Agent decides: retrieve, evaluate, re-retrieve
const response = query({
prompt: "What is our refund policy for enterprise customers?",
options: {
mcpServers: {
"docs": { command: "npx", args: ["mcp-server-docs"] }
},
allowedTools: ["mcp__docs__search", "mcp__docs__get_page"],
outputFormat: {
type: "json_schema",
json_schema: {
name: "RelevanceCheck",
strict: true,
schema: zodToJsonSchema(RelevanceSchema)
}
}
}
});
Agentic RAG vs Traditional RAG
| Aspect | Traditional RAG | Agentic RAG |
|---|---|---|
| Retrieval | Single-shot, static | Iterative, adaptive |
| Query handling | Direct embedding | Decomposition + routing |
| Quality control | None (hope for the best) | Self-reflection + validation |
| Data sources | Single vector store | Multi-source orchestration |
| Failure handling | Garbage in, garbage out | Retry with new strategy |
Best Practices
- Set iteration limits — Cap reflection loops (3-5 iterations) to prevent infinite retrieval (see the sketch after this list)
- Define quality thresholds — Clear criteria for when retrieval is "good enough"
- Cache aggressively — Store retrieval results to avoid redundant queries
- Monitor retrieval cost — Each iteration adds latency and token usage
- Fallback gracefully — If all retrieval fails, tell the user rather than hallucinate
- Use MCP for modularity — Standardize data source integration for easy swapping
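A minimal sketch that combines iteration limits, caching, quality thresholds, and a graceful fallback; retriever, agent, and the threshold values are stand-ins you would tune for your own stack:

from functools import lru_cache

MAX_ITERATIONS = 3
QUALITY_THRESHOLD = 0.8

@lru_cache(maxsize=1024)
def cached_search(query: str) -> tuple:
    # Cache retrieval results to avoid redundant queries across iterations
    return tuple(retriever.search(query, top_k=10))

def answer_or_decline(agent, query: str) -> str:
    for _ in range(MAX_ITERATIONS):                  # Set iteration limits
        contexts = list(cached_search(query))        # Cache aggressively
        verdict = agent.evaluate(query=query, contexts=contexts)
        if verdict.score >= QUALITY_THRESHOLD:       # Define quality thresholds
            return agent.generate(query=query, contexts=contexts)
        query = agent.reformulate(query, feedback=verdict.feedback)
    # Fallback gracefully: admit failure instead of hallucinating
    return "I couldn't find reliable sources to answer this question."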
Next Steps
- Multi-Agent Systems — Multi-agent patterns for complex RAG
- Code Generation — RAG applied to code understanding
- Memory Systems — Long-term knowledge storage for RAG
Try It: Self-deciding Search Agent
Build an agent that decides whether it needs to retrieve or can answer directly from its own knowledge.
- Create an MCP search tool that queries a document store
- Use outputFormat for structured relevance scoring
- If relevance < 0.7, the agent re-queries with a refined search
- Track re-query count and final relevance score
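A possible skeleton for the exercise, with the MCP search call and scoring abstracted behind placeholder functions (search_docs, score_relevance, refine_search, and can_answer_directly are hypothetical names you would wire to your own tools):

def self_deciding_answer(agent, question: str, max_requeries: int = 3) -> dict:
    """Decide whether retrieval is needed, then search and re-query until relevant."""
    if agent.can_answer_directly(question):            # No retrieval needed
        return {"answer": agent.generate(question), "requeries": 0, "relevance": None}

    query, contexts, relevance, requeries = question, [], 0.0, 0
    for attempt in range(max_requeries + 1):
        contexts = search_docs(query)                  # MCP search tool (placeholder)
        relevance = agent.score_relevance(question, contexts)  # structured relevance score
        requeries = attempt
        if relevance >= 0.7:                           # threshold from the exercise
            break
        query = agent.refine_search(question, contexts)  # re-query with a refined search

    return {
        "answer": agent.generate(question, contexts=contexts),
        "requeries": requeries,                        # re-query count
        "relevance": relevance,                        # final relevance score
    }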