Memory Systems
Memory systems let agents retain context, learn from experience, and stay consistent across sessions.
Memory Types
Like human memory, an agent's memory system can be divided into several types:
┌─────────────────────────────────────────────────────────┐
│                   Memory Architecture                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │   Working   │  │  Episodic   │  │  Semantic   │      │
│  │   Memory    │  │   Memory    │  │   Memory    │      │
│  │ (short-term │  │    (past    │  │ (knowledge  │      │
│  │  / buffer)  │  │ experiences)│  │  and facts) │      │
│  └─────────────┘  └─────────────┘  └─────────────┘      │
│         ↓                ↓                ↓             │
│  ┌─────────────────────────────────────────────────┐    │
│  │               Long-term Storage                 │    │
│  │           (Vector DB / File System)             │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
Working Memory
The current conversation context, held directly in the LLM's context window:
- The user's current request
- Recent conversation history
- Current task state
- Recently read file contents
import { query } from "@anthropic-ai/claude-agent-sdk";

// 1. Start a session and capture its session_id
let sessionId = "";
const initial = query({ prompt: "Analyze src/ directory structure" });
for await (const msg of initial) {
  if (msg.type === "system" && msg.subtype === "init") {
    sessionId = msg.session_id; // Save for later
  }
}

// 2. Resume the same session: the agent remembers earlier context
const resumed = query({
  prompt: "Now refactor the largest file you found",
  options: { resume: sessionId }
});

// 3. Fork the session: explore an alternative without modifying the original
const forked = query({
  prompt: "Try a different approach using GraphQL",
  options: {
    resume: sessionId,
    forkSession: true // Original session stays unchanged
  }
});

// 4. Continue: pick up the most recent conversation (no session ID needed)
const continued = query({
  prompt: "Also add error handling",
  options: { continue: true }
});
Episodic Memory
Past experiences and events, used to:
- Recall similar problems solved before
- Learn from failures
- Maintain conversational continuity
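A minimal sketch of such a store, assuming episodes are appended as plain records and recalled by naive keyword overlap (the Episode shape and scoring below are illustrative, not an SDK API):
// episodic-memory.ts - illustrative sketch, not part of the SDK
interface Episode {
  timestamp: string;
  task: string;                      // what the agent was asked to do
  outcome: "success" | "failure";
  lesson: string;                    // what to remember next time
}

const episodes: Episode[] = [];

export function recordEpisode(task: string, outcome: Episode["outcome"], lesson: string): void {
  episodes.push({ timestamp: new Date().toISOString(), task, outcome, lesson });
}

// Naive keyword-overlap recall; a real system would use embeddings (see Vector Storage below)
export function recallSimilar(task: string, limit = 3): Episode[] {
  const words = new Set(task.toLowerCase().split(/\s+/));
  return episodes
    .map(e => ({ e, overlap: e.task.toLowerCase().split(/\s+/).filter(w => words.has(w)).length }))
    .filter(x => x.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, limit)
    .map(x => x.e);
}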
Semantic Memory
Knowledge and facts, including:
- Project code structure
- API documentation
- User preferences
- Domain knowledge
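A minimal sketch of how such facts might be represented and rendered into a prompt preamble (the Fact shape and helper are illustrative):
// semantic-memory.ts - illustrative sketch
interface Fact {
  key: string;                          // e.g. "preferred_language"
  value: string;                        // e.g. "TypeScript"
  source: "user" | "codebase" | "docs";
}

const facts: Fact[] = [
  { key: "preferred_language", value: "TypeScript", source: "user" },
  { key: "styling", value: "Tailwind CSS", source: "codebase" }
];

// Render selected facts as text the agent can be given up front (e.g. in its system prompt)
export function renderFacts(selected: Fact[]): string {
  return selected.map(f => `- ${f.key}: ${f.value} (from ${f.source})`).join("\n");
}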
Implementation Methods
Context Window
The simplest approach is to keep everything directly in the LLM's context window:
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Create a function..."},
{"role": "assistant", "content": "Here's the function..."},
{"role": "user", "content": "Now add error handling"},
# ... more conversation
]
Limitation: constrained by context window size; you must truncate or summarize when it overflows.
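One common way to handle overflow is a rolling buffer: keep recent turns verbatim and compress older ones into a summary once a rough budget is exceeded. A sketch, using a placeholder summarize() where you would normally ask the model itself to condense the dropped turns:
// rolling-buffer.ts - illustrative sketch
interface Turn { role: "user" | "assistant"; content: string; }

const MAX_CHARS = 32_000;   // rough budget (~4 characters per token is a common heuristic)
let summary = "";           // compressed record of older turns
const turns: Turn[] = [];

// Placeholder summarizer: in practice, call the model; here we simply truncate
async function summarize(text: string): Promise<string> {
  return text.length > 2000 ? text.slice(0, 2000) + " ..." : text;
}

export async function addTurn(turn: Turn): Promise<void> {
  turns.push(turn);
  let size = turns.reduce((n, t) => n + t.content.length, 0);
  const dropped: Turn[] = [];
  while (size > MAX_CHARS && turns.length > 2) {   // always keep the two most recent turns verbatim
    const old = turns.shift()!;
    dropped.push(old);
    size -= old.content.length;
  }
  if (dropped.length > 0) {
    summary = await summarize(
      summary + "\n" + dropped.map(t => `${t.role}: ${t.content}`).join("\n")
    );
  }
}

// What actually goes into the context window: the summary first, then the recent turns
export function buildContext(): string {
  return [
    summary ? `Summary of earlier conversation:\n${summary}` : "",
    ...turns.map(t => `${t.role}: ${t.content}`)
  ].filter(Boolean).join("\n\n");
}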
import {
  unstable_v2_createSession,
  unstable_v2_prompt
} from "@anthropic-ai/claude-agent-sdk";

// Simpler multi-turn pattern with the V2 APIs
const session = await unstable_v2_createSession({
  model: "claude-sonnet-4-5",
  workingDirectory: process.cwd(),
  allowedTools: ["Read", "Grep", "Glob"]
});

// Turn 1
const stream1 = unstable_v2_prompt(session, "Analyze the codebase");
for await (const msg of stream1) { console.log(msg); }

// Turn 2: same session, full context preserved
const stream2 = unstable_v2_prompt(session, "Suggest improvements");
for await (const msg of stream2) { console.log(msg); }

// Enable the 1M-token context window for large codebases
const largeCtx = query({
  prompt: "Analyze this entire monorepo",
  options: {
    betas: ["context-1m-2025-08-07"],
    model: "claude-sonnet-4-5"
  }
});
SDK Insight: resume vs continue vs fork
resume continues a specific session by ID, so you can return to any earlier conversation. continue picks up the most recent conversation without needing a session ID. forkSession branches a resumed session into a new one, leaving the original unchanged. Use forking to explore alternatives safely.
Vector Storage
Convert information into vectors to enable semantic search:
# Store (pseudocode: embed() and vector_db stand in for your embedding model and vector database)
embedding = embed("The user prefers TypeScript over JavaScript")
vector_db.insert(embedding, metadata={"type": "preference"})

# Retrieve
query = embed("What language should I use?")
results = vector_db.search(query, top_k=5)
# → Returns: "The user prefers TypeScript..."
Common vector databases:
- Pinecone - Managed service, easy to use
- Chroma - Open source, good for local development
- Weaviate - Feature-rich open-source option
- pgvector - PostgreSQL extension
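The same pattern in self-contained form: a tiny in-memory store with cosine similarity standing in for a real vector database, and a deliberately crude embed() where you would call an embedding model (every name here is illustrative):
// vector-store.ts - illustrative sketch; use a real vector DB and embedding model in practice
type Vector = number[];
interface Entry { vector: Vector; text: string; metadata?: Record<string, string>; }

const store: Entry[] = [];

// Stand-in embedding: a character-frequency vector. Real systems call an embedding model here.
function embed(text: string): Vector {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i]++;
  }
  return v;
}

function cosine(a: Vector, b: Vector): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: Vector) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b) || 1);
}

export function insert(text: string, metadata?: Record<string, string>): void {
  store.push({ vector: embed(text), text, metadata });
}

export function search(query: string, topK = 5): Entry[] {
  const q = embed(query);
  return [...store]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, topK);
}

// Usage mirrors the pseudocode above
insert("The user prefers TypeScript over JavaScript", { type: "preference" });
console.log(search("What language should I use?", 1)[0]?.text);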
File System
Persist memory to files:
.agent/
├── memory/
│ ├── user_preferences.json
│ ├── project_context.md
│ └── conversation_history/
│ ├── 2024-01-15.json
│ └── 2024-01-16.json
└── knowledge/
├── codebase_index.json
└── api_docs.md
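A sketch of reading and writing one file from this layout, assuming the .agent/memory/user_preferences.json path in the tree above (the preference shape is an assumption):
// file-memory.ts - illustrative sketch
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const PREFS_PATH = ".agent/memory/user_preferences.json";

type Preferences = Record<string, string>;

export function loadPreferences(): Preferences {
  return existsSync(PREFS_PATH)
    ? JSON.parse(readFileSync(PREFS_PATH, "utf8"))
    : {};
}

export function savePreference(key: string, value: string): void {
  const prefs = loadPreferences();
  prefs[key] = value;
  mkdirSync(dirname(PREFS_PATH), { recursive: true });        // ensure .agent/memory/ exists
  writeFileSync(PREFS_PATH, JSON.stringify(prefs, null, 2));  // human-readable and diff-friendly
}

// Usage
savePreference("language", "TypeScript");
console.log(loadPreferences());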
RAG (Retrieval-Augmented Generation)
RAG combines retrieval and generation so agents can use external knowledge:
┌──────────────────────────────────────────────────────┐
│                     RAG Pipeline                     │
├──────────────────────────────────────────────────────┤
│                                                      │
│   User Query                                         │
│       ↓                                              │
│  ┌─────────────┐      ┌─────────────┐                │
│  │    Embed    │  →   │   Search    │                │
│  │    Query    │      │  Vector DB  │                │
│  └─────────────┘      └─────────────┘                │
│                              ↓                       │
│                   Retrieved Contexts                 │
│                              ↓                       │
│  ┌─────────────────────────────────────────────┐     │
│  │     Prompt = Query + Retrieved Contexts     │     │
│  └─────────────────────────────────────────────┘     │
│                      ↓                               │
│                     LLM                              │
│                      ↓                               │
│                   Response                           │
└──────────────────────────────────────────────────────┘
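In code, the pipeline reduces to three steps: embed and search, build an augmented prompt, then generate. A sketch that reuses the illustrative in-memory search() from the Vector Storage section, with the SDK's query() producing the final answer:
import { query } from "@anthropic-ai/claude-agent-sdk";
import { search } from "./vector-store"; // the illustrative in-memory store from the Vector Storage section

async function answerWithRag(userQuery: string): Promise<void> {
  // 1. Retrieve: embed the query and fetch the most similar stored contexts
  const contexts = search(userQuery, 3).map(entry => entry.text);

  // 2. Augment: build a prompt that contains both the retrieved contexts and the question
  const prompt = [
    "Use the following retrieved context to answer.",
    ...contexts.map((c, i) => `Context ${i + 1}: ${c}`),
    `Question: ${userQuery}`
  ].join("\n\n");

  // 3. Generate: let the agent answer the augmented prompt
  for await (const msg of query({ prompt })) {
    if (msg.type === "result" && msg.subtype === "success") console.log(msg.result);
  }
}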
Claude Code Memory Mechanism
Claude Code uses a CLAUDE.md file as project memory:
# CLAUDE.md
## Project Overview
This is a Next.js e-commerce application.
## Key Conventions
- Use TypeScript for all new files
- Follow the existing folder structure
- Use Tailwind CSS for styling
## Important Context
- Database: PostgreSQL with Prisma
- Auth: NextAuth.js
- State: Zustand
Advantages of this approach:
- Human-readable and editable
- Version-control friendly
- Zero dependencies
- Persistent across sessions
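When driving Claude Code through the Agent SDK, note that recent SDK versions do not load filesystem settings by default; if your SDK version supports the settingSources option, you opt in to project memory such as CLAUDE.md like this (check your version's documentation):
import { query } from "@anthropic-ai/claude-agent-sdk";

// Opt in to project-level settings so CLAUDE.md is loaded as project memory
const withProjectMemory = query({
  prompt: "Add a product detail page",
  options: {
    settingSources: ["project"] // also picks up .claude/ project settings
  }
});

for await (const msg of withProjectMemory) {
  if (msg.type === "result") console.log(msg.subtype);
}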
Best Practices
- Tiered storage - Hot data in context, cold data in vector storage (see the sketch after this list)
- Regular summaries - Compress long chats into summaries
- Relevance filtering - Retrieve only relevant info to avoid noise
- Freshness handling - Consider information recency
- Privacy protection - Do not store sensitive information
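A sketch of the tiered-storage idea: the most recent items stay verbatim in the context, while older ones are demoted to cold storage and come back only through retrieval. It reuses the illustrative insert() from the Vector Storage section; the limit and shapes are assumptions:
// tiered-memory.ts - illustrative sketch
import { insert } from "./vector-store"; // illustrative in-memory store from the Vector Storage section

interface MemoryItem { text: string; lastUsed: number; }

const HOT_LIMIT = 20;              // how many items stay directly in the context (assumption)
const hot: MemoryItem[] = [];      // hot tier: included verbatim in the prompt

export function remember(text: string): void {
  hot.push({ text, lastUsed: Date.now() });
  if (hot.length > HOT_LIMIT) {
    // Demote the least recently used item to the cold tier (the vector store)
    hot.sort((a, b) => b.lastUsed - a.lastUsed);
    const cold = hot.pop()!;
    insert(cold.text, { tier: "cold" });
  }
}

// Hot items are cheap to include verbatim; cold items are retrieved only when relevant
export function hotContext(): string {
  return hot.map(item => `- ${item.text}`).join("\n");
}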
Next Steps
- Planning & Reasoning - How memory supports planning
- Multi-Agent - Memory sharing between agents
Try It: Session Lifecycle
Build a multi-turn agent that demonstrates all session management patterns.
- Start a session and capture session_id
- Resume with a follow-up question; verify the agent remembers context
- Fork the session and take a different approach
- Compare the forked result with the original; both should be valid
- Bonus: try the V2 session APIs for a cleaner multi-turn pattern