LLM & Prompts

Large language models are the agent's "brain." Prompt engineering is a key skill for building effective agents.

SDK Focus: systemPrompt · outputFormat (Zod schema) · structured_output · maxThinkingTokens

The LLM's Role in an Agent

The LLM serves as the core decision engine in an agent system:

  • Understanding - Parse user intent and context
  • Reasoning - Analyze problems and devise solutions
  • Decision-making - Choose which tools to use and how to act
  • Generation - Produce code, documents, replies, and more

System Prompt Design

The system prompt defines the agent's "persona" and behavioral boundaries. The SDK accepts it in two forms, shown below. A good system prompt should include the four components that follow the example; a sketch combining them appears after component 4.

System Prompt Config (system-prompt.ts)
import { query } from "@anthropic-ai/claude-agent-sdk";

// Form 1: Simple string
const agent1 = query({
  prompt: "Review this PR",
  options: {
    systemPrompt: "You are a senior code reviewer. Be thorough but constructive."
  }
});

// Form 2: Preset with append (preserves Claude Code defaults)
const agent2 = query({
  prompt: "Review this PR",
  options: {
    systemPrompt: {
      type: "preset",
      preset: "claude_code",
      append: "\n\nFocus on security vulnerabilities and performance."
    }
  }
});

1. Identity Definition

You are an expert software engineer assistant. 
You help users write, debug, and improve code.
You are precise, helpful, and safety-conscious.

2. Capability Description

You have access to the following tools:
- read_file: Read contents of a file
- write_file: Create or modify files
- run_command: Execute shell commands
- search_code: Search codebase for patterns

3. Behavioral Guidelines

Guidelines:
- Always read files before modifying them
- Explain your reasoning before taking actions
- Ask for clarification when requirements are unclear
- Never execute destructive commands without confirmation

4. Output Format

Response Format:
1. First, analyze the request
2. Then, explain your approach
3. Execute necessary actions
4. Summarize what was done
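
Putting the four components together: the sketch below condenses the component text from the snippets above into a single systemPrompt string. The task prompt and file name are illustrative.

import { query } from "@anthropic-ai/claude-agent-sdk";

// Assemble the four components into one system prompt.
// Each block is condensed from the snippets above; expand
// them for a real agent.
const systemPrompt = [
  // 1. Identity
  "You are an expert software engineer assistant. You are precise, helpful, and safety-conscious.",
  // 2. Capabilities
  "You have access to tools: read_file, write_file, run_command, search_code.",
  // 3. Behavioral guidelines
  "Always read files before modifying them. Never execute destructive commands without confirmation.",
  // 4. Output format
  "Structure responses as: analysis, approach, actions taken, summary."
].join("\n\n");

const agent = query({
  prompt: "Refactor src/utils.ts for readability",
  options: { systemPrompt }
});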

Prompting Techniques

Chain of Thought (CoT)

Ask the model to show its reasoning to improve accuracy on complex tasks:

Think step by step:
1. What is the user asking for?
2. What information do I need?
3. What's the best approach?
4. What tools should I use?
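
One way to wire this into the SDK is to append the checklist to the Claude Code preset (Form 2 from the system prompt example above); the task prompt here is illustrative:

import { query } from "@anthropic-ai/claude-agent-sdk";

// Append the CoT checklist to the preset system prompt so the
// agent reasons through the steps before acting.
const agent = query({
  prompt: "Why does the login test fail intermittently?",
  options: {
    systemPrompt: {
      type: "preset",
      preset: "claude_code",
      append:
        "\n\nThink step by step:\n" +
        "1. What is the user asking for?\n" +
        "2. What information do I need?\n" +
        "3. What's the best approach?\n" +
        "4. What tools should I use?"
    }
  }
});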

Few-shot Learning

Provide examples to guide the model's behavior:

Example 1:
User: "Create a Python function to calculate fibonacci"
Action: write_file("fib.py", "def fibonacci(n):...")

Example 2:
User: "Fix the bug in app.js"
Action: read_file("app.js")
Action: write_file("app.js", "// fixed version...")
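
A minimal sketch that embeds these examples in a plain-string systemPrompt (Form 1 above), so every turn is conditioned on them; the task prompt is illustrative:

import { query } from "@anthropic-ai/claude-agent-sdk";

// Few-shot examples baked into the system prompt.
const fewShot = `Example 1:
User: "Create a Python function to calculate fibonacci"
Action: write_file("fib.py", "def fibonacci(n):...")

Example 2:
User: "Fix the bug in app.js"
Action: read_file("app.js")
Action: write_file("app.js", "// fixed version...")`;

const agent = query({
  prompt: "Add input validation to server.ts",
  options: {
    systemPrompt: `You are a coding agent. Follow the action style shown below.\n\n${fewShot}`
  }
});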

Structured Output

Use JSON or a specific format to keep outputs parsable:

Respond in this JSON format:
{
  "thought": "your reasoning",
  "action": "tool_name",
  "action_input": { ... }
}
Structured Output (structured-output.ts)
import { query } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Define output schema with Zod
const ReviewSchema = z.object({
  summary: z.string().describe("One-line summary"),
  issues: z.array(z.object({
    severity: z.enum(["critical", "warning", "info"]),
    file: z.string(),
    line: z.number(),
    message: z.string()
  })),
  approved: z.boolean()
});

const response = query({
  prompt: "Review the code in src/auth.ts",
  options: {
    model: "claude-sonnet-4-5",
    outputFormat: {
      type: "json_schema",
      json_schema: {
        name: "CodeReview",
        strict: true,
        schema: zodToJsonSchema(ReviewSchema)
      }
    }
  }
});

for await (const message of response) {
  if (message.type === "result" && message.structured_output) {
    // Guaranteed to match schema — type-safe!
    const review = ReviewSchema.parse(message.structured_output);
    console.log(`Approved: ${review.approved}`);
    review.issues.forEach(i =>
      console.log(`[${i.severity}] ${i.file}:${i.line} — ${i.message}`)
    );
  }
}

SDK Insight: Schema Guarantees

outputFormat with strict: true guarantees the response conforms to the JSON schema you supply, so the ReviewSchema.parse() call in the example validates the result rather than guarding against failure. Access the result via message.structured_output; the SDK eliminates manual JSON parsing and error handling.

Context Management

LLMs have context length limits, so you need to manage context strategically:

Context Window

Model          Context length   Notes
Claude 3.5     200K tokens      Great for large codebases
GPT-4 Turbo    128K tokens      Strong general capability
Gemini 1.5     1M tokens        Ultra-long context

Management Strategies

  • Sliding window - Keep recent conversations, drop older ones (see the sketch after this list)
  • Summary compression - Compress history into summaries
  • Retrieval augmentation - Fetch relevant context on demand
  • Tiered storage - Store important info in long-term memory
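
To make the sliding-window strategy concrete, here is a minimal sketch. It assumes messages[0] is the system prompt and uses a rough 4-characters-per-token estimate; use a real tokenizer in production.

// Keep the newest messages that fit a token budget,
// always preserving the system prompt.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function slideWindow(messages: ChatMessage[], budget: number): ChatMessage[] {
  const [system, ...rest] = messages;
  let used = estimateTokens(system.content);
  const kept: ChatMessage[] = [];
  // Walk backward from the newest message; stop once the budget is spent.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [system, ...kept];
}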

Model Selection

Use different models for different tasks; a routing sketch follows the list:

  • Complex reasoning - Claude 3.5 Sonnet, GPT-4
  • Code generation - Claude 3.5 Sonnet, Codex
  • Fast responses - Claude 3 Haiku, GPT-3.5
  • Local deployment - Llama 3, CodeLlama
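
A hedged routing sketch using the SDK's model option; the model identifiers below are illustrative, so check the current model list for exact names:

import { query } from "@anthropic-ai/claude-agent-sdk";

// Route each task type to an appropriate model.
const MODEL_BY_TASK = {
  reasoning: "claude-sonnet-4-5",    // complex reasoning, code generation
  quick: "claude-3-5-haiku-latest"   // fast, low-cost responses
} as const;

function ask(task: keyof typeof MODEL_BY_TASK, prompt: string) {
  return query({ prompt, options: { model: MODEL_BY_TASK[task] } });
}

const review = ask("reasoning", "Audit src/auth.ts for injection risks");
const triage = ask("quick", "Classify this issue title: 'build fails on CI'");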

Extended Thinking Control

Use maxThinkingTokens to control the reasoning budget. Higher values improve quality on complex tasks but increase cost and latency. Start with the default and increase only when tasks require deep reasoning.
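
A minimal sketch; the 16_000 budget and the prompt are illustrative, not recommendations:

import { query } from "@anthropic-ai/claude-agent-sdk";

// Raise the thinking budget only for tasks that need deep reasoning;
// higher budgets cost more and respond slower.
const deepAgent = query({
  prompt: "Find the race condition in src/scheduler.ts and propose a fix",
  options: {
    model: "claude-sonnet-4-5",
    maxThinkingTokens: 16_000
  }
});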

Best Practices

  • Keep the system prompt concise but complete
  • Use concrete examples instead of abstract descriptions
  • Define the output format clearly
  • Set a reasonable temperature (usually 0-0.3)
  • Test edge cases and error handling

Next Steps

Try It: Schema-Constrained Output

Build an agent that always returns validated, type-safe JSON.

  1. Define a Zod schema for a task analysis: { task, complexity, estimatedMinutes, requiredTools } (a starter sketch follows this list)
  2. Use outputFormat to enforce the schema
  3. Send 5 different task descriptions and verify all outputs validate
  4. Compare: run the same prompts without outputFormat — how often does the output break?
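
A possible starting point for step 1; the field types are assumptions, so adapt them as needed:

import { z } from "zod";

// Starter schema for the task-analysis exercise.
const TaskAnalysisSchema = z.object({
  task: z.string().describe("Restatement of the task"),
  complexity: z.enum(["low", "medium", "high"]),
  estimatedMinutes: z.number().int().positive(),
  requiredTools: z.array(z.string())
});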
Gate: P3 Complete — Structured output validates 100%, system prompt preset understood, thinking budget tradeoff measured.