Code Generation Agents

Coding agents are the most visible application of AI agents today. Understand how they work — from context management to multi-file editing to self-correcting workflows.

SDK Focus: Read / Write / Edit / Bash · enableFileCheckpointing · rewindFiles() · outputFormat

The 2026 Landscape

AI coding agents have evolved from autocomplete into autonomous software engineers. The market has grown past $11B and split into several distinct approaches:

Tool                 Interface           Context Window   Strength
Claude Code          Terminal CLI        200K tokens      Deep reasoning, autonomous tasks
Cursor               AI-first IDE        120K tokens      IDE integration, tab prediction
GitHub Copilot       VS Code extension   128K tokens      Enterprise compliance, ecosystem
Google Antigravity   Multi-agent         1M tokens        Parallel workflows, visual verification

Coding Agent Architecture

┌────────────────────────────────────────────────────────┐
│                  Coding Agent Runtime                  │
├────────────────────────────────────────────────────────┤
│                                                        │
│  ┌──────────────────────────────────────────────────┐  │
│  │                Context Management                │  │
│  │  Project files │ Conversation │ CLAUDE.md/Rules  │  │
│  └──────────────────────────────────────────────────┘  │
│                           ↓                            │
│  ┌──────────────────────────────────────────────────┐  │
│  │               LLM Decision Engine                │  │
│  │         Understand → Plan → Decide → Act         │  │
│  └──────────────────────────────────────────────────┘  │
│                           ↓                            │
│    ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐    │
│    │Read File│ │Edit File│ │  Shell  │ │  Search  │    │
│    └─────────┘ └─────────┘ └─────────┘ └──────────┘    │
│                           ↓                            │
│  ┌──────────────────────────────────────────────────┐  │
│  │                Verification Loop                 │  │
│  │    Run tests → Check lints → Validate output     │  │
│  └──────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────┘
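
A minimal sketch of this runtime loop in TypeScript. The model call and tool dispatch are injected rather than implemented, so llmDecide and runTool here are hypothetical stand-ins:

type ToolCall = { name: "Read" | "Edit" | "Shell" | "Search"; args: Record<string, string> };
type Decision = { done: boolean; toolCalls: ToolCall[]; summary?: string };

async function codingAgentLoop(
  task: string,
  projectContext: string[],
  llmDecide: (context: string[]) => Promise<Decision>, // model call (stand-in)
  runTool: (call: ToolCall) => Promise<string>,        // tool dispatch (stand-in)
): Promise<string> {
  const context = [...projectContext, `Task: ${task}`];
  for (let turn = 0; turn < 50; turn++) { // hard cap so the loop always terminates
    // Understand, Plan, and Decide all happen inside the model call
    const decision = await llmDecide(context);
    if (decision.done) return decision.summary ?? "Done";
    // Act: run each requested tool and feed the results back into context
    for (const call of decision.toolCalls) {
      context.push(`${call.name} → ${await runTool(call)}`);
    }
    // Verify: the model closes the loop by requesting Shell calls
    // (tests, lints) and reading their output on the next turn
  }
  throw new Error("Turn limit reached without completion");
}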

Context Management

Context management is the central challenge for coding agents: a typical codebase runs to millions of tokens, but the context window holds only a fraction of that. Effective strategies include:

Layered Context

Priority 1 (Always loaded):
  ├── System prompt + tool definitions
  ├── CLAUDE.md / project rules
  └── Current conversation

Priority 2 (On-demand):
  ├── Files being edited
  ├── Relevant test files
  └── Import dependencies

Priority 3 (Search-retrieved):
  ├── Semantic search results
  ├── Grep / glob matches
  └── Documentation snippets
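
A sketch of this layered assembly in code, assuming a rough four-characters-per-token estimate; the layer inputs and labels are illustrative:

type ContextItem = { label: string; text: string };

// Crude heuristic: ~4 characters per token
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

function assembleContext(
  always: ContextItem[],    // Priority 1: system prompt, CLAUDE.md, conversation
  onDemand: ContextItem[],  // Priority 2: files being edited, tests, imports
  retrieved: ContextItem[], // Priority 3: search results, grep matches, docs
  budget: number,           // token budget for the whole prompt
): string {
  const sections: string[] = [];
  let used = 0;
  // Fill strictly in priority order; skip anything that no longer fits
  for (const item of [...always, ...onDemand, ...retrieved]) {
    const cost = estimateTokens(item.text);
    if (used + cost > budget) continue;
    sections.push(`## ${item.label}\n${item.text}`);
    used += cost;
  }
  return sections.join("\n\n");
}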

Project Memory: CLAUDE.md

Claude Code pioneered using a markdown file as persistent project memory:

# CLAUDE.md — Project context for the coding agent

## Architecture
- Next.js 15 app router, TypeScript strict mode
- Database: PostgreSQL with Drizzle ORM
- Auth: NextAuth.js v5 with GitHub OAuth

## Conventions
- Use `pnpm` not npm
- Components in `src/components/`, co-located tests
- All API routes return { data, error } shape

## Known Issues
- Migration #47 has a race condition (see PR #312)
- Don't modify `legacy/` folder — scheduled for removal
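
One practical note: the Claude Code CLI picks CLAUDE.md up automatically, but SDK callers opt in. A sketch assuming the SDK's settingSources option gates this loading:

import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Add a logout endpoint to the auth API",
  options: {
    allowedTools: ["Read", "Write", "Edit", "Bash"],
    // At the time of writing, CLAUDE.md is only loaded when
    // settingSources includes "project"
    settingSources: ["project"],
  },
});

for await (const msg of response) {
  if (msg.type === "result") console.log(msg);
}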

The Coding Agent Workflow

A typical coding task follows this pattern:

Task: "Fix the login timeout bug"

Step 1: Understand
  → Read relevant files (auth.ts, middleware.ts)
  → Search for "timeout" references
  → Check recent git history

Step 2: Plan  
  → "The issue is in session.ts line 45 — 
     the token refresh is using setTimeout 
     instead of the session TTL config"

Step 3: Implement
  → Edit session.ts: replace hardcoded timeout
  → Update config.ts: add SESSION_TTL env var
  → Edit .env.example: document new variable

Step 4: Verify
  → Run existing tests: `pnpm test auth`
  → Run linter: `pnpm lint`
  → Add new test for timeout behavior

Step 5: Report
  → Summarize changes to the user
  → Suggest follow-up actions
File Checkpointing (code-gen.ts)
import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Refactor src/auth.ts to use JWT instead of sessions",
  options: {
    allowedTools: ["Read", "Write", "Edit", "Bash"],
    enableFileCheckpointing: true  // Enable rollback
  }
});

let checkpointId: string | undefined;
for await (const msg of response) {
  // Save a checkpoint ID before risky changes
  if (msg.type === "user" && msg.uuid) {
    checkpointId = msg.uuid;
  }

  // If tests fail, rewind to the last checkpoint
  if (msg.type === "tool_result" && msg.content?.includes("FAIL") && checkpointId) {
    console.log("Tests failed — rewinding files...");
    await response.rewindFiles(checkpointId);
    // Files restored to pre-refactor state
  }
}

Sub-Agent Patterns

Complex coding tasks benefit from specialized sub-agents:

Main Agent (Orchestrator)
  │
  ├── Explorer Agent (fast model)
  │   "Search the codebase for all usages of AuthService"
  │   → Returns: file list + relevant snippets
  │
  ├── Implementer Agent (capable model)
  │   "Refactor AuthService to support OAuth2"
  │   → Returns: edited files + explanation
  │
  └── Reviewer Agent (capable model)
      "Review these changes for security issues"
      → Returns: review comments + approval
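
A sketch of this orchestrator pattern using the SDK's programmatic subagent definitions; the agent names, prompts, and model choices here are illustrative, not prescribed:

import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Refactor AuthService to support OAuth2, then review the changes",
  options: {
    // The orchestrator delegates to subagents via the Task tool
    allowedTools: ["Read", "Edit", "Bash", "Grep", "Glob", "Task"],
    agents: {
      explorer: {
        description: "Finds relevant files and usages in the codebase",
        prompt: "Search the codebase and report file paths plus relevant snippets.",
        tools: ["Read", "Grep", "Glob"],
        model: "haiku",  // fast model for exploration
      },
      reviewer: {
        description: "Reviews diffs for security issues",
        prompt: "Review the proposed changes and report security findings.",
        tools: ["Read", "Grep"],
        model: "sonnet", // capable model for review
      },
    },
  },
});

for await (const msg of response) {
  if (msg.type === "result") console.log(msg);
}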

What Coding Agents Do Well (and Don't)

Strong fit:

  • Bug fixes with clear reproduction steps
  • Adding test coverage for existing code
  • UI feature changes (styling, layout, copy)
  • Documentation updates and generation
  • Routine refactoring (rename, extract, move)
  • Accessibility improvements
  • Technical debt reduction

Poor fit (needs human guidance):

  • Complex cross-system architecture changes
  • Security-critical authentication flows
  • Ambiguous requirements without clear specs
  • Performance optimization requiring profiling
  • Business logic with deep domain knowledge

Cost & Token Economics

Understanding token costs is crucial for production use:

  • A single coding task typically consumes 100K-400K tokens
  • At $3/$15 per million tokens (input/output), a complex task costs $1-6
  • Power users can exceed $200/month subscription limits mid-month
  • Strategy: use fast models for exploration, capable models for implementation
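
A quick back-of-the-envelope check on those numbers, with the token split assumed for illustration:

const INPUT_PER_MTOK = 3;   // $ per million input tokens
const OUTPUT_PER_MTOK = 15; // $ per million output tokens

function taskCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * INPUT_PER_MTOK + (outputTokens / 1e6) * OUTPUT_PER_MTOK;
}

// e.g. 280K input (files read, search results) + 20K output (edits, summary)
console.log(taskCostUSD(280_000, 20_000)); // 1.14 → a ~$1 task

// Multi-turn tasks re-send context each turn, so billed input grows quickly:
console.log(taskCostUSD(1_200_000, 80_000)); // 4.8 → toward the $6 end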

Best Practices

  • Write clear task descriptions — Treat issue descriptions as prompts with acceptance criteria
  • Maintain CLAUDE.md — Keep project context fresh and accurate
  • Read before edit — Always let the agent understand context before making changes
  • Verify with tests — Require the agent to run tests after changes
  • Iterate in PR comments — Review agent PRs the way you'd review a junior developer's
  • Use sub-agents — Split complex tasks into focused, parallelizable work
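
For example, an issue description written as an agent-ready prompt (contents illustrative):

Title: Fix login timeout bug

Context:
- Sessions expire after ~5 minutes regardless of configuration
- Suspected area: token refresh logic in src/auth/session.ts

Acceptance criteria:
- Session lifetime respects the SESSION_TTL env var
- `pnpm test auth` passes, including a new test for the TTL path
- No changes under the legacy/ folder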

Next Steps

Try It: 3-Layer Self-Healing Loop

Build a code generation agent with automatic error recovery.

  1. Set enableFileCheckpointing: true in the query options
  2. Layer 1: TypeScript compile check (Bash: tsc --noEmit)
  3. Layer 2: Runtime test (npx tsx test.ts)
  4. Layer 3: Visual/output assertion
  5. On failure: rewindFiles() and retry with error context
  6. Generate 10 scripts, target ≥ 70% first-pass success rate
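
A sketch of the loop under these constraints; the assert-output.ts script and the retry policy are placeholders to adapt:

import { execSync } from "node:child_process";
import { query } from "@anthropic-ai/claude-agent-sdk";

// Run a check command; return null on success, captured output on failure
function check(cmd: string): string | null {
  try {
    execSync(cmd, { stdio: "pipe" });
    return null;
  } catch (err: any) {
    // tsc reports to stdout, most runtimes to stderr; capture both
    return `${err.stdout ?? ""}${err.stderr ?? ""}` || String(err.message);
  }
}

async function generateWithRepair(task: string, maxAttempts = 3): Promise<boolean> {
  let errorContext = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = query({
      prompt: errorContext
        ? `${task}\n\nThe previous attempt failed with:\n${errorContext}`
        : task,
      options: {
        allowedTools: ["Read", "Write", "Edit", "Bash"],
        enableFileCheckpointing: true,
      },
    });
    let checkpointId: string | undefined;
    for await (const msg of response) {
      if (msg.type === "user" && msg.uuid) checkpointId = msg.uuid;
    }

    // Layer 1: compile check · Layer 2: runtime test · Layer 3: output assertion
    const failure =
      check("tsc --noEmit") ??
      check("npx tsx test.ts") ??
      check("npx tsx assert-output.ts"); // hypothetical assertion script

    if (failure === null) return true; // pass on this attempt
    errorContext = failure;            // carry the error into the retry
    if (checkpointId) await response.rewindFiles(checkpointId); // restore files
  }
  return false;
}

Run this across the 10 generated scripts and count attempt-1 passes against the ≥ 70% target.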
Gate: P10 Complete — 10 scripts generated, ≥ 70% success, repair logs with diffs, file checkpointing demonstrated.