# LLM & Prompts
Large language models are the agent's "brain." Prompt engineering is a key skill for building effective agents.
## The LLM's Role in an Agent
The LLM serves as the core decision engine in an agent system:
- Understanding - Parse user intent and context
- Reasoning - Analyze problems and devise solutions
- Decision-making - Choose which tools to use and how to act
- Generation - Produce code, documents, replies, and more
## System Prompt Design

The system prompt defines the agent's "persona" and behavioral boundaries. A good system prompt should include the components described below. In the Claude Agent SDK, it can be passed in one of two forms:
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Form 1: Simple string
const agent1 = query({
  prompt: "Review this PR",
  options: {
    systemPrompt: "You are a senior code reviewer. Be thorough but constructive."
  }
});

// Form 2: Preset with append (preserves Claude Code defaults)
const agent2 = query({
  prompt: "Review this PR",
  options: {
    systemPrompt: {
      type: "preset",
      preset: "claude_code",
      append: "\n\nFocus on security vulnerabilities and performance."
    }
  }
});
```
### 1. Identity Definition

```
You are an expert software engineer assistant.
You help users write, debug, and improve code.
You are precise, helpful, and safety-conscious.
```
### 2. Capability Description

```
You have access to the following tools:
- read_file: Read contents of a file
- write_file: Create or modify files
- run_command: Execute shell commands
- search_code: Search codebase for patterns
```
### 3. Behavioral Guidelines

```
Guidelines:
- Always read files before modifying them
- Explain your reasoning before taking actions
- Ask for clarification when requirements are unclear
- Never execute destructive commands without confirmation
```
### 4. Output Format

```
Response Format:
1. First, analyze the request
2. Then, explain your approach
3. Execute necessary actions
4. Summarize what was done
```
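Taken together, the four components can be assembled into a single string before being passed as `systemPrompt`. A minimal sketch; `buildSystemPrompt` and the `SystemPromptParts` shape are illustrative helpers, not part of the SDK:

```typescript
// Illustrative helper: concatenates the four system-prompt components
// (identity, capabilities, guidelines, output format) into one string.
interface SystemPromptParts {
  identity: string;
  capabilities: string[];
  guidelines: string[];
  outputFormat: string;
}

function buildSystemPrompt(parts: SystemPromptParts): string {
  return [
    parts.identity,
    "",
    "You have access to the following tools:",
    ...parts.capabilities.map((c) => `- ${c}`),
    "",
    "Guidelines:",
    ...parts.guidelines.map((g) => `- ${g}`),
    "",
    "Response Format:",
    parts.outputFormat,
  ].join("\n");
}

const prompt = buildSystemPrompt({
  identity: "You are an expert software engineer assistant.",
  capabilities: [
    "read_file: Read contents of a file",
    "write_file: Create or modify files",
  ],
  guidelines: ["Always read files before modifying them"],
  outputFormat: "1. Analyze the request\n2. Explain your approach",
});
```

The resulting string can be passed directly as the `systemPrompt` option, or as the `append` field of the preset form shown above.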
## Prompting Techniques

### Chain of Thought (CoT)

Ask the model to show its reasoning to improve accuracy on complex tasks:

```
Think step by step:
1. What is the user asking for?
2. What information do I need?
3. What's the best approach?
4. What tools should I use?
```
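One simple way to apply this is to prepend the scaffold to each task prompt. A sketch; the helper name is illustrative:

```typescript
// Reusable step-by-step scaffold, matching the prompt text above.
const COT_SCAFFOLD = [
  "Think step by step:",
  "1. What is the user asking for?",
  "2. What information do I need?",
  "3. What's the best approach?",
  "4. What tools should I use?",
].join("\n");

// Illustrative helper: wraps a user task with the reasoning scaffold.
function withChainOfThought(task: string): string {
  return `${COT_SCAFFOLD}\n\nTask: ${task}`;
}
```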
### Few-shot Learning

Provide examples to guide the model's behavior:

```
Example 1:
User: "Create a Python function to calculate fibonacci"
Action: write_file("fib.py", "def fibonacci(n):...")

Example 2:
User: "Fix the bug in app.js"
Action: read_file("app.js")
Action: write_file("app.js", "// fixed version...")
```
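Few-shot blocks like this are often generated from structured data rather than hand-written. A sketch, assuming a simple (user request, expected actions) shape; `formatFewShot` is an illustrative helper:

```typescript
// Illustrative shape for one few-shot example: a user request and the
// tool actions the model should take in response.
interface FewShotExample {
  user: string;
  actions: string[];
}

// Renders examples into the numbered "Example N:" format shown above.
function formatFewShot(examples: FewShotExample[]): string {
  return examples
    .map((ex, i) =>
      [
        `Example ${i + 1}:`,
        `User: "${ex.user}"`,
        ...ex.actions.map((a) => `Action: ${a}`),
      ].join("\n")
    )
    .join("\n\n");
}
```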
### Structured Output

Use JSON or a specific format to keep outputs parsable:

```
Respond in this JSON format:
{
  "thought": "your reasoning",
  "action": "tool_name",
  "action_input": { ... }
}
```
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Define output schema with Zod
const ReviewSchema = z.object({
  summary: z.string().describe("One-line summary"),
  issues: z.array(z.object({
    severity: z.enum(["critical", "warning", "info"]),
    file: z.string(),
    line: z.number(),
    message: z.string()
  })),
  approved: z.boolean()
});

const response = query({
  prompt: "Review the code in src/auth.ts",
  options: {
    model: "claude-sonnet-4-5",
    outputFormat: {
      type: "json_schema",
      json_schema: {
        name: "CodeReview",
        strict: true,
        schema: zodToJsonSchema(ReviewSchema)
      }
    }
  }
});

for await (const message of response) {
  if (message.type === "result" && message.structured_output) {
    // Guaranteed to match schema — type-safe!
    const review = ReviewSchema.parse(message.structured_output);
    console.log(`Approved: ${review.approved}`);
    review.issues.forEach(i =>
      console.log(`[${i.severity}] ${i.file}:${i.line} — ${i.message}`)
    );
  }
}
```
### SDK Insight: Schema Guarantees

`outputFormat` with `strict: true` guarantees the response matches your Zod schema exactly. Access the validated result via `message.structured_output`. This eliminates the need for manual JSON parsing and error handling — the SDK handles it.
## Context Management
LLMs have context length limits, so you need to manage context strategically:
### Context Window
| Model | Context length | Notes |
|---|---|---|
| Claude 3.5 | 200K tokens | Great for large codebases |
| GPT-4 Turbo | 128K tokens | Strong general capability |
| Gemini 1.5 | 1M tokens | Ultra-long context |
### Management Strategies
- Sliding window - Keep recent conversations, drop older ones
- Summary compression - Compress history into summaries
- Retrieval augmentation - Fetch relevant context on demand
- Tiered storage - Store important info in long-term memory
## Model Selection
Use different models for different tasks:
- Complex reasoning - Claude 3.5 Sonnet, GPT-4
- Code generation - Claude 3.5 Sonnet, Codex
- Fast responses - Claude 3 Haiku, GPT-3.5
- Local deployment - Llama 3, CodeLlama
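A simple router over these tiers might look like the sketch below. The task categories and model identifiers mirror the list above, but treat the mapping (and the exact model ID strings) as illustrative assumptions:

```typescript
type TaskKind = "reasoning" | "codegen" | "fast" | "local";

// Illustrative task-to-model routing table based on the tiers above;
// the model ID strings are placeholders, not verified API identifiers.
const MODEL_FOR_TASK: Record<TaskKind, string> = {
  reasoning: "claude-3-5-sonnet",
  codegen: "claude-3-5-sonnet",
  fast: "claude-3-haiku",
  local: "llama-3",
};

function pickModel(kind: TaskKind): string {
  return MODEL_FOR_TASK[kind];
}
```

The chosen ID would then be passed as the `model` option of `query`, as in the structured-output example above.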
## Extended Thinking Control

Use `maxThinkingTokens` to control the reasoning budget. Higher values improve quality on complex tasks but increase cost and latency. Start with the default and increase only when tasks require deep reasoning.
## Best Practices
- Keep the system prompt concise but complete
- Use concrete examples instead of abstract descriptions
- Define the output format clearly
- Set a reasonable temperature (usually 0-0.3)
- Test edge cases and error handling
## Next Steps
- Tools & Actions - Learn how agents call tools
- Memory Systems - Understand agent memory mechanisms
- Claude Code Prompts - See real system prompt examples
## Try It: Schema-Constrained Output
Build an agent that always returns validated, type-safe JSON.
- Define a Zod schema for a task analysis: `{ task, complexity, estimatedMinutes, requiredTools }`
- Use `outputFormat` to enforce the schema
- Send 5 different task descriptions and verify all outputs validate
- Compare: run the same prompts without `outputFormat` — how often does the output break?