Code Generation Agents

Coding agents are the most visible application of AI agents today. Understand how they work — from context management to multi-file editing to self-correcting workflows.

SDK Focus: Read / Write / Edit / Bash · enableFileCheckpointing · rewindFiles() · outputFormat

The 2026 Landscape

AI coding agents have evolved from autocomplete into autonomous software engineers. The market has grown past $11B and split into several distinct approaches:

Tool                 Interface           Context Window   Strength
Claude Code          Terminal CLI        200K tokens      Deep reasoning, autonomous tasks
Cursor               AI-first IDE        120K tokens      IDE integration, tab prediction
GitHub Copilot       VS Code extension   128K tokens      Enterprise compliance, ecosystem
Google Antigravity   Multi-agent         1M tokens        Parallel workflows, visual verification

Coding Agent Architecture

┌────────────────────────────────────────────────────────┐
│                  Coding Agent Runtime                  │
├────────────────────────────────────────────────────────┤
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │                Context Management                │  │
│  │  Project files │ Conversation │ CLAUDE.md/Rules  │  │
│  └──────────────────────────────────────────────────┘  │
│                           ↓                            │
│  ┌──────────────────────────────────────────────────┐  │
│  │               LLM Decision Engine                │  │
│  │         Understand → Plan → Decide → Act         │  │
│  └──────────────────────────────────────────────────┘  │
│                           ↓                            │
│    ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐    │
│    │Read File│ │Edit File│ │  Shell  │ │  Search  │    │
│    └─────────┘ └─────────┘ └─────────┘ └──────────┘    │
│                           ↓                            │
│  ┌──────────────────────────────────────────────────┐  │
│  │                Verification Loop                 │  │
│  │    Run tests → Check lints → Validate output     │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
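
A minimal sketch of this runtime loop in TypeScript. The model call and tool dispatch are injected rather than implemented, so llmDecide and runTool here are hypothetical stand-ins:

type ToolCall = { name: "Read" | "Edit" | "Shell" | "Search"; args: Record<string, string> };
type Decision = { done: boolean; toolCalls: ToolCall[]; summary?: string };

async function codingAgentLoop(
  task: string,
  projectContext: string[],
  llmDecide: (context: string[]) => Promise<Decision>, // model call (stand-in)
  runTool: (call: ToolCall) => Promise<string>,        // tool dispatch (stand-in)
): Promise<string> {
  const context = [...projectContext, `Task: ${task}`];
  for (let turn = 0; turn < 50; turn++) { // hard cap so the loop always terminates
    // Understand, Plan, and Decide all happen inside the model call
    const decision = await llmDecide(context);
    if (decision.done) return decision.summary ?? "Done";
    // Act: run each requested tool and feed the results back into context
    for (const call of decision.toolCalls) {
      context.push(`${call.name} → ${await runTool(call)}`);
    }
    // Verify: the model closes the loop by requesting Shell calls
    // (tests, lints) and reading their output on the next turn
  }
  throw new Error("Turn limit reached without completion");
}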

Context Management

Context management is the central challenge for coding agents: a typical codebase runs to millions of tokens, but the context window holds only a fraction of that. Effective strategies include:

Layered Context

Priority 1 (Always loaded):
  ├── System prompt + tool definitions
  ├── CLAUDE.md / project rules
  └── Current conversation

Priority 2 (On-demand):
  ├── Files being edited
  ├── Relevant test files
  └── Import dependencies

Priority 3 (Search-retrieved):
  ├── Semantic search results
  ├── Grep / glob matches
  └── Documentation snippets
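
A sketch of this layered assembly in code, assuming a rough four-characters-per-token estimate; the layer inputs and labels are illustrative:

type ContextItem = { label: string; text: string };

// Crude heuristic: ~4 characters per token
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

function assembleContext(
  always: ContextItem[],    // Priority 1: system prompt, CLAUDE.md, conversation
  onDemand: ContextItem[],  // Priority 2: files being edited, tests, imports
  retrieved: ContextItem[], // Priority 3: search results, grep matches, docs
  budget: number,           // token budget for the whole prompt
): string {
  const sections: string[] = [];
  let used = 0;
  // Fill strictly in priority order; skip anything that no longer fits
  for (const item of [...always, ...onDemand, ...retrieved]) {
    const cost = estimateTokens(item.text);
    if (used + cost > budget) continue;
    sections.push(`## ${item.label}\n${item.text}`);
    used += cost;
  }
  return sections.join("\n\n");
}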

Project Memory: CLAUDE.md

Claude Code pioneered using a markdown file as persistent project memory:

# CLAUDE.md — Project context for the coding agent

## Architecture
- Next.js 15 app router, TypeScript strict mode
- Database: PostgreSQL with Drizzle ORM
- Auth: NextAuth.js v5 with GitHub OAuth

## Conventions
- Use `pnpm` not npm
- Components in `src/components/`, co-located tests
- All API routes return { data, error } shape

## Known Issues
- Migration #47 has a race condition (see PR #312)
- Don't modify `legacy/` folder — scheduled for removal
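
One practical note: the Claude Code CLI picks CLAUDE.md up automatically, but SDK callers opt in. A sketch assuming the SDK's settingSources option gates this loading:

import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Add a logout endpoint to the auth API",
  options: {
    allowedTools: ["Read", "Write", "Edit", "Bash"],
    // At the time of writing, CLAUDE.md is only loaded when
    // settingSources includes "project"
    settingSources: ["project"],
  },
});

for await (const msg of response) {
  if (msg.type === "result") console.log(msg);
}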

The Coding Agent Workflow

A typical coding task follows this pattern:

Task: "Fix the login timeout bug"

Step 1: Understand
  → Read relevant files (auth.ts, middleware.ts)
  → Search for "timeout" references
  → Check recent git history

Step 2: Plan  
  → "The issue is in session.ts line 45 — 
     the token refresh is using setTimeout 
     instead of the session TTL config"

Step 3: Implement
  → Edit session.ts: replace hardcoded timeout
  → Update config.ts: add SESSION_TTL env var
  → Edit .env.example: document new variable

Step 4: Verify
  → Run existing tests: `pnpm test auth`
  → Run linter: `pnpm lint`
  → Add new test for timeout behavior

Step 5: Report
  → Summarize changes to the user
  → Suggest follow-up actions
File Checkpointing (code-gen.ts)
import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Refactor src/auth.ts to use JWT instead of sessions",
  options: {
    allowedTools: ["Read", "Write", "Edit", "Bash"],
    enableFileCheckpointing: true  // Enable rollback
  }
});

let checkpointId: string | undefined;
for await (const msg of response) {
  // Save a checkpoint ID before risky changes
  if (msg.type === "user" && msg.uuid) {
    checkpointId = msg.uuid;
  }

  // If tests fail, rewind to the last checkpoint
  if (msg.type === "tool_result" && msg.content?.includes("FAIL") && checkpointId) {
    console.log("Tests failed — rewinding files...");
    await response.rewindFiles(checkpointId);
    // Files restored to pre-refactor state
  }
}

Sub-Agent Patterns

Complex coding tasks benefit from specialized sub-agents:

Main Agent (Orchestrator)
  │
  ├── Explorer Agent (fast model)
  │   "Search the codebase for all usages of AuthService"
  │   → Returns: file list + relevant snippets
  │
  ├── Implementer Agent (capable model)
  │   "Refactor AuthService to support OAuth2"
  │   → Returns: edited files + explanation
  │
  └── Reviewer Agent (capable model)
      "Review these changes for security issues"
      → Returns: review comments + approval
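
A sketch of this orchestrator pattern using the SDK's programmatic subagent definitions; the agent names, prompts, and model choices here are illustrative, not prescribed:

import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Refactor AuthService to support OAuth2, then review the changes",
  options: {
    // The orchestrator delegates to subagents via the Task tool
    allowedTools: ["Read", "Edit", "Bash", "Grep", "Glob", "Task"],
    agents: {
      explorer: {
        description: "Finds relevant files and usages in the codebase",
        prompt: "Search the codebase and report file paths plus relevant snippets.",
        tools: ["Read", "Grep", "Glob"],
        model: "haiku",  // fast model for exploration
      },
      reviewer: {
        description: "Reviews diffs for security issues",
        prompt: "Review the proposed changes and report security findings.",
        tools: ["Read", "Grep"],
        model: "sonnet", // capable model for review
      },
    },
  },
});

for await (const msg of response) {
  if (msg.type === "result") console.log(msg);
}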

What Coding Agents Do Well (and Don't)

Strong fit:

  • Bug fixes with clear reproduction steps
  • Adding test coverage for existing code
  • UI feature changes (styling, layout, copy)
  • Documentation updates and generation
  • Routine refactoring (rename, extract, move)
  • Accessibility improvements
  • Technical debt reduction

Poor fit (needs human guidance):

  • Complex cross-system architecture changes
  • Security-critical authentication flows
  • Ambiguous requirements without clear specs
  • Performance optimization requiring profiling
  • Business logic with deep domain knowledge

Cost & Token Economics

Understanding token costs is crucial for production use:

  • A single coding task typically consumes 100K-400K tokens
  • At $3/$15 per million tokens (input/output), a complex task costs $1-6
  • Power users can exceed $200/month subscription limits mid-month
  • Strategy: use fast models for exploration, capable models for implementation
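
A quick back-of-the-envelope check on those numbers, with the token split assumed for illustration:

const INPUT_PER_MTOK = 3;   // $ per million input tokens
const OUTPUT_PER_MTOK = 15; // $ per million output tokens

function taskCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * INPUT_PER_MTOK + (outputTokens / 1e6) * OUTPUT_PER_MTOK;
}

// e.g. 280K input (files read, search results) + 20K output (edits, summary)
console.log(taskCostUSD(280_000, 20_000)); // 1.14 → a ~$1 task

// Multi-turn tasks re-send context each turn, so billed input grows quickly:
console.log(taskCostUSD(1_200_000, 80_000)); // 4.8 → toward the $6 end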

Best Practices

  • Write clear task descriptions — Treat issue descriptions as prompts with acceptance criteria
  • Maintain CLAUDE.md — Keep project context fresh and accurate
  • Read before edit — Always let the agent understand context before making changes
  • Verify with tests — Require the agent to run tests after changes
  • Iterate in PR comments — Review agent PRs the way you'd review a junior developer's
  • Use sub-agents — Split complex tasks into focused, parallelizable work
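
For example, an issue description written as an agent-ready prompt (contents illustrative):

Title: Fix login timeout bug

Context:
- Sessions expire after ~5 minutes regardless of configuration
- Suspected area: token refresh logic in src/auth/session.ts

Acceptance criteria:
- Session lifetime respects the SESSION_TTL env var
- `pnpm test auth` passes, including a new test for the TTL path
- No changes under the legacy/ folder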

Next Steps

Try It: 3-Layer Self-Healing Loop

Build a code generation agent with automatic error recovery.

  1. Set enableFileCheckpointing: true in the query options
  2. Layer 1: TypeScript compile check (Bash: tsc --noEmit)
  3. Layer 2: Runtime test (npx tsx test.ts)
  4. Layer 3: Visual/output assertion
  5. On failure: rewindFiles() and retry with error context
  6. Generate 10 scripts, target ≥ 70% first-pass success rate
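
A sketch of the loop under these constraints; the assert-output.ts script and the retry policy are placeholders to adapt:

import { execSync } from "node:child_process";
import { query } from "@anthropic-ai/claude-agent-sdk";

// Run a check command; return null on success, captured output on failure
function check(cmd: string): string | null {
  try {
    execSync(cmd, { stdio: "pipe" });
    return null;
  } catch (err: any) {
    // tsc reports to stdout, most runtimes to stderr; capture both
    return `${err.stdout ?? ""}${err.stderr ?? ""}` || String(err.message);
  }
}

async function generateWithRepair(task: string, maxAttempts = 3): Promise<boolean> {
  let errorContext = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = query({
      prompt: errorContext
        ? `${task}\n\nThe previous attempt failed with:\n${errorContext}`
        : task,
      options: {
        allowedTools: ["Read", "Write", "Edit", "Bash"],
        enableFileCheckpointing: true,
      },
    });
    let checkpointId: string | undefined;
    for await (const msg of response) {
      if (msg.type === "user" && msg.uuid) checkpointId = msg.uuid;
    }

    // Layer 1: compile check · Layer 2: runtime test · Layer 3: output assertion
    const failure =
      check("tsc --noEmit") ??
      check("npx tsx test.ts") ??
      check("npx tsx assert-output.ts"); // hypothetical assertion script

    if (failure === null) return true; // pass on this attempt
    errorContext = failure;            // carry the error into the retry
    if (checkpointId) await response.rewindFiles(checkpointId); // restore files
  }
  return false;
}

Run this across the 10 generated scripts and count attempt-1 passes against the ≥ 70% target.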
Gate: P10 Complete — 10 scripts generated, ≥ 70% success, repair logs with diffs, file checkpointing demonstrated.