Code Generation Agents
Coding agents are the most visible application of AI agents today. Understand how they work — from context management to multi-file editing to self-correcting workflows.
The 2026 Landscape
AI coding agents have evolved from autocomplete into autonomous software engineers. The market has passed $11B, and several distinct approaches have emerged:
| Tool | Interface | Context Window | Strength |
|---|---|---|---|
| Claude Code | Terminal CLI | 200K tokens | Deep reasoning, autonomous tasks |
| Cursor | AI-first IDE | 120K tokens | IDE integration, tab prediction |
| GitHub Copilot | VS Code extension | 128K tokens | Enterprise compliance, ecosystem |
| Google Antigravity | Multi-agent | 1M tokens | Parallel workflows, visual verification |
Coding Agent Architecture
┌─────────────────────────────────────────────────────────┐
│ Coding Agent Runtime │
├─────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Context Management │ │
│ │ Project files │ Conversation │ CLAUDE.md/Rules │ │
│ └──────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ LLM Decision Engine │ │
│ │ Understand → Plan → Decide → Act │ │
│ └──────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌──────────┐ │
│ │Read File│ │Edit File│ │ Shell │ │ Search │ │
│ └─────────┘ └─────────┘ └─────────┘ └──────────┘ │
│ ↓ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Verification Loop │ │
│ │ Run tests → Check lints → Validate output │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
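At its core, this runtime is a loop: build context, ask the model what to do next, execute the requested tools, and feed the results (including test output) back in. The sketch below is a minimal, illustrative version of that loop using the Anthropic Messages API with tool use; the tool names, the `runTool` stub, the model alias, and the prompts are assumptions, not any particular product's implementation.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Stub tool executor: a real agent would dispatch to fs reads/writes,
// patch application, and child_process for shell commands.
async function runTool(name: string, input: unknown): Promise<string> {
  return `(${name} executed with ${JSON.stringify(input)})`;
}

const fileArg = {
  type: "object" as const,
  properties: { path: { type: "string" }, content: { type: "string" } },
  required: ["path"],
};

async function codingAgent(task: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: task }];

  while (true) {
    // LLM decision engine: understand -> plan -> decide which tool to call
    const response = await client.messages.create({
      model: "claude-sonnet-4-5", // illustrative model choice
      max_tokens: 4096,
      system: "You are a coding agent. Inspect the project, make edits, then verify.",
      tools: [
        { name: "read_file", description: "Read a project file", input_schema: fileArg },
        { name: "edit_file", description: "Overwrite a project file", input_schema: fileArg },
        {
          name: "shell",
          description: "Run a shell command (tests, linters)",
          input_schema: {
            type: "object" as const,
            properties: { command: { type: "string" } },
            required: ["command"],
          },
        },
      ],
      messages,
    });
    messages.push({ role: "assistant", content: response.content });

    // No tool call requested: the agent is reporting back, so stop.
    if (response.stop_reason !== "tool_use") return response;

    // Act: run each requested tool and feed results into the next turn.
    // This is also where verification output (test failures) re-enters context.
    const toolResults = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      toolResults.push({
        type: "tool_result" as const,
        tool_use_id: block.id,
        content: await runTool(block.name, block.input),
      });
    }
    messages.push({ role: "user", content: toolResults });
  }
}
```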
Context Management
Context management is the central challenge for coding agents: a typical codebase spans millions of tokens, while the model's context window holds only a fraction of that. Effective strategies include:
Layered Context
Priority 1 (Always loaded):
├── System prompt + tool definitions
├── CLAUDE.md / project rules
└── Current conversation
Priority 2 (On-demand):
├── Files being edited
├── Relevant test files
└── Import dependencies
Priority 3 (Search-retrieved):
├── Semantic search results
├── Grep / glob matches
└── Documentation snippets
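One way to implement this layering is to fill a fixed token budget in priority order and let search-retrieved snippets take whatever space remains. The sketch below is illustrative only; the `ContextItem` shape, the characters-per-token heuristic, and the budget number are assumptions.

```typescript
// Illustrative types and budget; real agents use tokenizer-accurate counts.
interface ContextItem {
  label: string;       // e.g. "CLAUDE.md", "src/auth.ts", a grep result
  content: string;
  priority: 1 | 2 | 3; // 1 = always loaded, 2 = on-demand, 3 = search-retrieved
}

// Rough heuristic: ~4 characters per token.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function assembleContext(items: ContextItem[], budget = 150_000): string {
  let used = 0;
  const included: string[] = [];

  // Fill the window in priority order; skip whatever no longer fits.
  for (const item of [...items].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(item.content);
    if (used + cost > budget) continue;
    used += cost;
    included.push(`--- ${item.label} ---\n${item.content}`);
  }
  return included.join("\n\n");
}
```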
Project Memory: CLAUDE.md
Claude Code pioneered using a markdown file as persistent project memory:
# CLAUDE.md — Project context for the coding agent
## Architecture
- Next.js 15 app router, TypeScript strict mode
- Database: PostgreSQL with Drizzle ORM
- Auth: NextAuth.js v5 with GitHub OAuth
## Conventions
- Use `pnpm` not npm
- Components in `src/components/`, co-located tests
- All API routes return { data, error } shape
## Known Issues
- Migration #47 has a race condition (see PR #312)
- Don't modify `legacy/` folder — scheduled for removal
The Coding Agent Workflow
A typical coding task follows this pattern:
Task: "Fix the login timeout bug"
Step 1: Understand
→ Read relevant files (auth.ts, middleware.ts)
→ Search for "timeout" references
→ Check recent git history
Step 2: Plan
→ "The issue is in session.ts line 45 —
the token refresh is using setTimeout
instead of the session TTL config"
Step 3: Implement
→ Edit session.ts: replace hardcoded timeout
→ Update config.ts: add SESSION_TTL env var
→ Edit .env.example: document new variable
Step 4: Verify
→ Run existing tests: `pnpm test auth`
→ Run linter: `pnpm lint`
→ Add new test for timeout behavior
Step 5: Report
→ Summarize changes to the user
→ Suggest follow-up actions
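Verification failures are where this loop pays off: with file checkpointing, the agent can roll the working tree back and retry instead of stacking new edits on top of broken ones. The Claude Agent SDK example below shows the pattern — save a checkpoint ID before risky changes, then rewind if tests fail.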
import { query } from "@anthropic-ai/claude-agent-sdk";

const response = query({
  prompt: "Refactor src/auth.ts to use JWT instead of sessions",
  options: {
    allowedTools: ["Read", "Write", "Edit", "Bash"],
    enableFileCheckpointing: true // enable rollback support
  }
});

let checkpointId: string | undefined;

for await (const msg of response) {
  // Save a checkpoint before risky changes
  if (msg.type === "user" && msg.uuid) {
    checkpointId = msg.uuid;
  }
  // If tests fail, rewind files to the last checkpoint
  if (msg.type === "tool_result" && msg.content?.includes("FAIL") && checkpointId) {
    console.log("Tests failed — rewinding files...");
    await response.rewindFiles(checkpointId);
    // Files restored to their pre-refactor state
  }
}
Sub-Agent Patterns
Complex coding tasks benefit from specialized sub-agents:
Main Agent (Orchestrator)
│
├── Explorer Agent (fast model)
│ "Search the codebase for all usages of AuthService"
│ → Returns: file list + relevant snippets
│
├── Implementer Agent (capable model)
│ "Refactor AuthService to support OAuth2"
│ → Returns: edited files + explanation
│
└── Reviewer Agent (capable model)
"Review these changes for security issues"
→ Returns: review comments + approval
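Sub-agents can be declared programmatically and routed to cheaper or stronger models. The sketch below assumes the Claude Agent SDK's `agents` option for defining sub-agents; the agent names, prompts, tool lists, and model choices are illustrative, not fixed values.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Illustrative sub-agent definitions: a fast explorer and a capable reviewer.
const response = query({
  prompt: "Refactor AuthService to support OAuth2, then review the changes",
  options: {
    allowedTools: ["Read", "Edit", "Bash", "Grep", "Glob", "Task"],
    agents: {
      explorer: {
        description: "Finds relevant files and usages across the codebase",
        prompt: "Search the codebase and report file paths plus relevant snippets. Do not edit.",
        tools: ["Read", "Grep", "Glob"],
        model: "haiku" // fast model for exploration
      },
      reviewer: {
        description: "Reviews diffs for security and correctness issues",
        prompt: "Review the proposed changes and list issues before approving.",
        tools: ["Read", "Grep"],
        model: "opus" // capable model for review
      }
    }
  }
});

for await (const msg of response) {
  // Stream progress; the main agent delegates to explorer/reviewer as needed.
}
```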
What Coding Agents Do Well (and Don't)
Strong fit:
- Bug fixes with clear reproduction steps
- Adding test coverage for existing code
- UI feature changes (styling, layout, copy)
- Documentation updates and generation
- Routine refactoring (rename, extract, move)
- Accessibility improvements
- Technical debt reduction
Poor fit (needs human guidance):
- Complex cross-system architecture changes
- Security-critical authentication flows
- Ambiguous requirements without clear specs
- Performance optimization requiring profiling
- Business logic with deep domain knowledge
Cost & Token Economics
Understanding token costs is crucial for production use:
- A single coding task typically consumes 100K-400K tokens
- At $3/$15 per million tokens (input/output), a complex task costs $1-6
- Power users can exceed $200/month subscription limits mid-month
- Strategy: use fast models for exploration, capable models for implementation
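As a back-of-the-envelope check on those numbers, the helper below estimates a task's cost from its token counts. The prices and the 80/20 input/output split are assumptions for illustration; check current rates for your model.

```typescript
// Illustrative pricing (USD per million tokens).
const PRICE = { inputPerM: 3, outputPerM: 15 };

function estimateTaskCost(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * PRICE.inputPerM + (outputTokens / 1e6) * PRICE.outputPerM;
}

// A 400K-token task that is ~80% input / 20% output:
// 320K * $3/M + 80K * $15/M = $0.96 + $1.20 ≈ $2.16
console.log(estimateTaskCost(320_000, 80_000).toFixed(2));
```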
Best Practices
- Write clear task descriptions — Treat issue descriptions as prompts with acceptance criteria
- Maintain CLAUDE.md — Keep project context fresh and accurate
- Read before edit — Always let the agent understand context before making changes
- Verify with tests — Require the agent to run tests after changes
- Iterate in PR comments — Review agent PRs like you'd review a junior developer
- Use sub-agents — Split complex tasks into focused, parallelizable work
Next Steps
- Human-in-the-Loop — When to keep humans in the coding loop
- Production Agents — Deploying coding agents at scale
- Claude Code Deep Dive — Hands-on with a production coding agent
Try It: 3-Layer Self-Healing Loop
Build a code generation agent with automatic error recovery.
- Enable `enableFileCheckpointing`
- Layer 1: TypeScript compile check (Bash: `tsc --noEmit`)
- Layer 2: Runtime test (`npx tsx test.ts`)
- Layer 3: Visual/output assertion
- On failure: `rewindFiles()` and retry with error context
- Generate 10 scripts, target ≥ 70% first-pass success rate
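A possible starting point for the verification layers, using Node's `child_process` to run each check; the file paths and layer wiring are illustrative, and the failure branch is where you would call `rewindFiles()` and retry with the error as added context.

```typescript
import { execSync } from "node:child_process";

// Run a shell command and capture pass/fail plus its output.
function check(command: string): { ok: boolean; output: string } {
  try {
    return { ok: true, output: execSync(command, { encoding: "utf8" }) };
  } catch (err: any) {
    return { ok: false, output: `${err.stdout ?? ""}${err.stderr ?? ""}` };
  }
}

// Illustrative 3-layer verification for one generated script.
function verifyGenerated(scriptPath: string): { ok: boolean; error?: string } {
  const layers = [
    { name: "compile", cmd: `npx tsc --noEmit ${scriptPath}` }, // Layer 1
    { name: "runtime", cmd: `npx tsx ${scriptPath}` },          // Layer 2
  ];
  for (const layer of layers) {
    const result = check(layer.cmd);
    if (!result.ok) return { ok: false, error: `${layer.name}: ${result.output}` };
  }
  // Layer 3: assert on the script's observable output (left as the exercise).
  return { ok: true };
}

// On { ok: false }: rewind files to the last checkpoint and retry,
// feeding `error` back into the next prompt as context.
```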