
Memory Systems

Memory systems let agents retain context, learn from experience, and stay consistent across sessions.

SDK Focus: resume · forkSession · continue · V2 Session APIs · betas: context-1m

Memory Types

Like human memory, an agent's memory system can be divided into several types:

┌─────────────────────────────────────────────────────────┐
│                    Memory Architecture                   │
├─────────────────────────────────────────────────────────┤
│                                                          │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│   │  Working    │  │  Episodic   │  │  Semantic   │    │
│   │  Memory     │  │  Memory     │  │  Memory     │    │
│   │ (Short-term │  │ (Past       │  │ (Knowledge  │    │
│   │  buffer)    │  │  events)    │  │  & facts)   │    │
│   └─────────────┘  └─────────────┘  └─────────────┘    │
│          ↓                ↓                ↓            │
│   ┌─────────────────────────────────────────────────┐  │
│   │              Long-term Storage                   │  │
│   │           (Vector DB / File System)              │  │
│   └─────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Working Memory

The current conversation context, held directly in the LLM's context window:

  • The user's current request
  • Recent conversation history
  • Current task state
  • Recently read file contents

Session Management (sessions.ts)

import { query } from "@anthropic-ai/claude-agent-sdk";

// 1. Start a session — capture session_id
let sessionId = "";  // filled in from the init message below
const initial = query({ prompt: "Analyze src/ directory structure" });
for await (const msg of initial) {
  if (msg.type === "system" && msg.subtype === "init") {
    sessionId = msg.session_id;  // Save for later
  }
}

// 2. Resume the same session — agent remembers context
const resumed = query({
  prompt: "Now refactor the largest file you found",
  options: { resume: sessionId }
});

// 3. Fork session — explore alternative without modifying original
const forked = query({
  prompt: "Try a different approach using GraphQL",
  options: {
    resume: sessionId,
    forkSession: true  // Original session stays unchanged
  }
});

// 4. Continue: pick up the most recent conversation (boolean, no ID needed)
const continued = query({
  prompt: "Also add error handling",
  options: { continue: true }
});

Episodic Memory

Past experiences and events, used to:

  • Recall similar problems solved before
  • Learn from failures
  • Maintain conversational continuity
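
The bullets above can be made concrete with a minimal episodic store. This is an illustrative sketch, not an SDK API: the `Episode` shape and the keyword-based `recall` are assumptions, and a real system would use embeddings (see Vector Storage below).

```typescript
// Illustrative episodic-memory store (not an SDK API): each entry
// records what was attempted, how it went, and what was learned.
interface Episode {
  task: string;
  outcome: "success" | "failure";
  lesson: string;
  timestamp: number;
}

class EpisodicMemory {
  private episodes: Episode[] = [];

  record(task: string, outcome: "success" | "failure", lesson: string): void {
    this.episodes.push({ task, outcome, lesson, timestamp: Date.now() });
  }

  // Naive keyword recall; a production system would search embeddings.
  recall(query: string): Episode[] {
    const words = query.toLowerCase().split(/\s+/);
    return this.episodes.filter((e) =>
      words.some((w) => e.task.toLowerCase().includes(w))
    );
  }
}
```

Recalling past failures before retrying a similar task is what lets the agent "learn from failures" rather than repeat them.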

Semantic Memory

Knowledge and facts, including:

  • Project code structure
  • API documentation
  • User preferences
  • Domain knowledge

Implementation Methods

Context Window

The simplest memory method: keep the entire conversation directly in the LLM's context:

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Create a function..."},
    {"role": "assistant", "content": "Here's the function..."},
    {"role": "user", "content": "Now add error handling"},
    # ... more conversation
]

Limitation: constrained by context window size; you must truncate or summarize when it overflows.
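
The truncation step can be sketched as a sliding window that always keeps the system prompt and drops the oldest turns first. This is a simplified illustration; the character budget stands in for a real token count.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string; }

// Sliding-window truncation: keep the system prompt plus the most recent
// turns that fit under a budget (characters here, tokens in practice).
function truncate(messages: Message[], budget: number): Message[] {
  const [system, ...rest] = messages;
  const kept: Message[] = [];
  let used = system.content.length;
  // Walk backwards from the newest message, keeping turns until full.
  for (let i = rest.length - 1; i >= 0; i--) {
    used += rest[i].content.length;
    if (used > budget) break;
    kept.unshift(rest[i]);
  }
  return [system, ...kept];
}
```

A common refinement is to summarize the dropped turns into a single message instead of discarding them outright.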

V2 Session APIs + 1M Context (v2-sessions.ts)

import {
  query,
  unstable_v2_createSession,
  unstable_v2_prompt
} from "@anthropic-ai/claude-agent-sdk";

// Simpler multi-turn pattern with V2 APIs
const session = await unstable_v2_createSession({
  model: "claude-sonnet-4-5",
  workingDirectory: process.cwd(),
  allowedTools: ["Read", "Grep", "Glob"]
});

// Turn 1
const stream1 = unstable_v2_prompt(session, "Analyze the codebase");
for await (const msg of stream1) { console.log(msg); }

// Turn 2 — same session, full context preserved
const stream2 = unstable_v2_prompt(session, "Suggest improvements");
for await (const msg of stream2) { console.log(msg); }

// Enable 1M token context for large codebases
const largeCtx = query({
  prompt: "Analyze this entire monorepo",
  options: {
    betas: ["context-1m-2025-08-07"],
    model: "claude-sonnet-4-5"
  }
});

SDK Insight: resume vs continue vs fork

resume continues a specific session identified by its session_id. continue picks up the most recent conversation without needing an ID. forkSession: true combined with resume creates a branch, leaving the original session unchanged. Use forking to explore alternatives safely.

Vector Storage

Convert information into vectors to enable semantic search:

# Pseudocode — embed() and vector_db stand in for your embedding
# model and vector database client.

# Store
embedding = embed("The user prefers TypeScript over JavaScript")
vector_db.insert(embedding, metadata={"type": "preference"})

# Retrieve
query_vec = embed("What language should I use?")
results = vector_db.search(query_vec, top_k=5)
# → Returns: "The user prefers TypeScript..."

Common vector databases:

  • Pinecone - Managed service, easy to use
  • Chroma - Open source, good for local development
  • Weaviate - Feature-rich open-source option
  • pgvector - PostgreSQL extension
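
The store/search mechanic behind all of these can be sketched without an external service. The brute-force cosine search below is an illustrative stand-in for what Pinecone, Chroma, Weaviate, or pgvector do at scale; embeddings are passed in directly rather than computed by a model.

```typescript
// Minimal in-memory vector store illustrating semantic search.
type Entry = { vector: number[]; text: string };

// Cosine similarity: 1 means same direction, 0 means orthogonal.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force top-k search; real vector DBs use approximate indexes.
function search(store: Entry[], query: number[], topK: number): Entry[] {
  return [...store]
    .sort((x, y) => cosine(y.vector, query) - cosine(x.vector, query))
    .slice(0, topK);
}
```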

File System

Persist memory to files:

.agent/
├── memory/
│   ├── user_preferences.json
│   ├── project_context.md
│   └── conversation_history/
│       ├── 2024-01-15.json
│       └── 2024-01-16.json
└── knowledge/
    ├── codebase_index.json
    └── api_docs.md
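
A load-merge-save cycle for a JSON memory file such as user_preferences.json in the layout above can be sketched with Node's fs module. The merge-on-write behavior is one reasonable design choice, not the only one.

```typescript
import { mkdirSync, readFileSync, writeFileSync, existsSync } from "node:fs";
import { dirname } from "node:path";

// Load a JSON memory file, returning an empty object if it doesn't exist yet.
function loadMemory(path: string): Record<string, string> {
  if (!existsSync(path)) return {};
  return JSON.parse(readFileSync(path, "utf8"));
}

// Merge new entries into the file so earlier memories survive each write.
function saveMemory(
  path: string,
  updates: Record<string, string>
): Record<string, string> {
  const merged = { ...loadMemory(path), ...updates };
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, JSON.stringify(merged, null, 2));
  return merged;
}
```

Because the file is plain JSON, it stays human-readable and diff-friendly, the same properties that make CLAUDE.md (below) effective.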

RAG (Retrieval-Augmented Generation)

RAG combines retrieval and generation so agents can use external knowledge:

┌──────────────────────────────────────────────────────┐
│                    RAG Pipeline                       │
├──────────────────────────────────────────────────────┤
│                                                       │
│   User Query                                          │
│       ↓                                               │
│   ┌─────────────┐    ┌─────────────┐                 │
│   │   Embed     │ →  │   Search    │                 │
│   │   Query     │    │  Vector DB  │                 │
│   └─────────────┘    └─────────────┘                 │
│                            ↓                          │
│                    Retrieved Contexts                 │
│                            ↓                          │
│   ┌─────────────────────────────────────────────┐    │
│   │  Prompt = Query + Retrieved Contexts        │    │
│   └─────────────────────────────────────────────┘    │
│                            ↓                          │
│                         LLM                           │
│                            ↓                          │
│                       Response                        │
└──────────────────────────────────────────────────────┘
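
The final prompt-assembly stage of the pipeline above can be sketched as a pure function. The instruction wording and [n] citation format are illustrative choices.

```typescript
// Assemble the augmented prompt: query + retrieved contexts,
// as in the last box of the RAG pipeline above.
function buildRagPrompt(query: string, contexts: string[]): string {
  const numbered = contexts.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return [
    "Answer using only the context below. Cite sources as [n].",
    "",
    "Context:",
    numbered,
    "",
    `Question: ${query}`,
  ].join("\n");
}
```

Numbering the contexts lets the model cite which retrieved snippet supports each claim, which makes hallucinated answers easier to spot.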

Claude Code Memory Mechanism

Claude Code uses a CLAUDE.md file as project memory:

# CLAUDE.md

## Project Overview
This is a Next.js e-commerce application.

## Key Conventions
- Use TypeScript for all new files
- Follow the existing folder structure
- Use Tailwind CSS for styling

## Important Context
- Database: PostgreSQL with Prisma
- Auth: NextAuth.js
- State: Zustand

Advantages of this approach:

  • Human-readable and editable
  • Version-control friendly
  • Zero dependencies
  • Persistent across sessions

Best Practices

  • Tiered storage - Hot data in context, cold data in vector storage
  • Regular summaries - Compress long chats into summaries
  • Relevance filtering - Retrieve only relevant info to avoid noise
  • Freshness handling - Consider information recency
  • Privacy protection - Do not store sensitive information
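
Freshness handling and relevance filtering can be combined by decaying retrieval scores with age, so stale memories rank lower. The exponential decay and the one-week half-life below are arbitrary illustrative choices.

```typescript
// Recency-weighted relevance: halve a memory's score every HALF_LIFE_MS.
const HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // one week (arbitrary choice)

function freshnessScore(relevance: number, ageMs: number): number {
  return relevance * Math.pow(0.5, ageMs / HALF_LIFE_MS);
}
```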

Next Steps

Try It: Session Lifecycle

Build a multi-turn agent that demonstrates all session management patterns.

  1. Start a session, capture session_id
  2. Resume with a follow-up question — verify the agent remembers context
  3. Fork the session and take a different approach
  4. Compare the forked result with the original — both should be valid
  5. Bonus: Try V2 Session APIs for a cleaner multi-turn pattern
Gate: P4 Complete — resume, fork, continue sessions demonstrated. V2 vs V1 session model explained.