prepareStep Semantics
Full signature of the per-step hook, 7 overridable fields, 4 typical patterns, and a deep analysis of the mutate-vs-push trap
Why prepareStep is the most critical hook
Almost all context engineering at agent runtime happens inside prepareStep:
- Compacting long contexts (trigger summary / tiered trimming once a token threshold is hit)
- Injecting system-reminders (ephemeral constraints, not in the conversation history)
- Dynamically filtering tool visibility (`activeTools` filtering — show the model only a small candidate subset each step)
- Forcing tool usage (`toolChoice: "required"` to prevent premature exit)
- Dynamically switching models (cheap model for planning, strong model for execution)
- Applying cache control breakpoints (Anthropic prompt cache)
It’s also the easiest hook to misuse — get the reference semantics wrong and you pollute the whole conversation. This page systematically covers the full API signature and common patterns, closing with a breakdown of a real mutate-vs-push trap.
Prerequisite: read Message Reference Model first. This page assumes you know the difference between `stepInputMessages` and `prepareStepResult.messages`.
Full signature
type PrepareStepCallback = (options: {
model: LanguageModel; // Model for this step (overridable per invocation)
steps: StepResult[]; // Completed historical steps (doesn't include current)
stepNumber: number; // Current step index (0-based)
messages: ModelMessage[]; // stepInputMessages — see Message Reference Model
experimental_context: unknown; // Context passed through layers (business-layer-defined shape)
}) => Promise<{
model?: LanguageModel; // Switch model for this step
system?: string; // Override system prompt for this step
messages?: ModelMessage[]; // Messages sent to model for this step (only)
toolChoice?: ToolChoice; // Tool selection strategy for this step
activeTools?: string[]; // Tool visibility list for this step
providerOptions?: ProviderOptions;// Per-step provider options (merged with L1/L2's providerOptions)
experimental_context?: unknown; // Override context for this step (rare)
}> | undefined;
- Return `undefined` (or nothing): the SDK uses defaults for this invocation (`stepInputMessages`, the settings-level `model`, `activeTools`, `toolChoice`, etc.).
- Return partial fields: the SDK overrides only those fields; the others keep their defaults (streamText path `dist/index.js:7195-7220`; generateText equivalent at `4322-4340`).
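This resolution can be sketched as a field-wise merge. The following is an illustration of the behavior described above under that assumption, not the SDK's actual code; the type and function names are made up for the sketch:

```typescript
// Illustrative per-step defaults; names mirror the hook's overridable fields.
type StepConfig = {
  model: string;
  system: string;
  toolChoice: string;
};

// What a prepareStep return looks like: any subset of the fields.
type PrepareStepResult = Partial<StepConfig>;

// undefined → all defaults; a partial object → only the present fields override.
function resolveStepConfig(
  defaults: StepConfig,
  result: PrepareStepResult | undefined,
): StepConfig {
  if (result === undefined) return defaults;
  return {
    model: result.model ?? defaults.model,
    system: result.system ?? defaults.system,
    toolChoice: result.toolChoice ?? defaults.toolChoice,
  };
}
```

Returning `{ toolChoice: 'required' }` thus touches nothing but the tool-choice policy; the settings-level model and system prompt flow through untouched.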
7 overridable fields
| Field | Overrides | Typical use | Scope of override |
|---|---|---|---|
| `model` | This step’s LLM | Light/heavy split, fallback | This step only |
| `system` | This step’s system prompt | A/B prompts, context-based persona | This step only |
| `messages` | Messages sent to the model | Compaction, reminder injection, filtering | This step only |
| `toolChoice` | Tool selection policy | `"required"` to force a tool call, `"none"` to disable, `{ type: "tool", toolName }` to force a specific tool | This step only |
| `activeTools` | Visible tools for this step | Tool search pool narrowing, hiding dangerous tools by context | This step only |
| `providerOptions` | Provider-specific options (merged with L1/L2’s `providerOptions`, not replaced — dist:7219) | Dynamically shift Anthropic `cache_control` breakpoints, adjust reasoning/thinking budget per step, swap OpenAI `seed` per step | This step only |
| `experimental_context` | Context seen by downstream `tool.execute` | Per-step state passing (rare — usually set once at L1) | This step only |
| (no `tools`) | — | Tool set is fixed at L1/L2; prepareStep can filter but not add | — |
Critical constraint: prepareStep cannot add new tools — it can only filter the fixed ToolSet registered at L1/L2 via `activeTools`. If you need a “per-step dynamic tool set”, either register everything at L1 and filter with `activeTools`, or use prepareCall to swap the entire `tools` object (see Lifecycle → prepareCall).
4 typical patterns
Pattern 1: Reminder injection (ephemeral context)
You want to give the model a temporary “system hint” every step (e.g. “no emojis”, “prefer internal tools”) without polluting the conversation history:
prepareStep: async ({ messages, experimental_context }) => {
const ctx = experimental_context as MyContext;
if (ctx.reminders.length === 0) return undefined;
const remindersText = ctx.reminders
.map(r => `<system-reminder>\n${r}\n</system-reminder>`)
.join('\n');
// OK — append a new message; visible this step, gone next step
return {
messages: [
...messages,
{ role: 'user', content: `[system directive — not user input]\n${remindersText}` },
],
};
},
Why this works: see Message Reference Model row 3 — push is the only “auto-disappears next step” pattern.
Pattern 2: Tool narrowing (big toolset → small candidate)
The “tool search pool” pattern — when an agent is wired to MCP or a dynamic tool-discovery source, the available tools may number in the hundreds, but each step the model should only see a small subset relevant to the current context (lower token cost, better selection accuracy):
prepareStep: async ({ experimental_context }) => {
const ctx = experimental_context as MyContext;
const pool = ctx.toolDiscovery;
if (!pool?.active) return undefined;
return {
activeTools: pool.getActiveToolNames(), // e.g. ["read_file", "grep", "tool_search"]
};
},
Why not use `tools`: `tools` is registered at L1 (the full set); `activeTools` is a name filter — the SDK screens by name in `prepareToolsAndToolChoice` (dist/index.js:4330-4334), at zero cost.
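What name-based filtering amounts to can be sketched as follows. This illustrates the observable behavior, not the SDK's internal code, and the value shape of the ToolSet is simplified:

```typescript
// A ToolSet keyed by tool name; the value shape is simplified for illustration.
type ToolSet = Record<string, { description: string }>;

// activeTools is a pure name filter over the fixed L1 ToolSet:
// no list → full set; a list → only the named tools survive.
function applyActiveTools(tools: ToolSet, activeTools?: string[]): ToolSet {
  if (!activeTools) return tools;
  return Object.fromEntries(
    Object.entries(tools).filter(([name]) => activeTools.includes(name)),
  );
}
```

Note that names absent from the ToolSet simply match nothing — filtering can never add a tool, which is exactly the constraint stated above.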
Pattern 3: Force tool usage (prevent premature exit)
Scenario: the agent has a todo list, and must absolutely not produce a final answer while items are pending:
prepareStep: async ({ experimental_context }) => {
const ctx = experimental_context as MyContext;
const hasPendingTodos = ctx.todos?.some(t => t.status !== 'done') ?? false;
if (hasPendingTodos) {
return { toolChoice: 'required' }; // Force a tool call this step
}
return undefined;
},
When to use the `toolChoice` alternatives:
- `'auto'` (default): the model chooses freely
- `'required'`: must call a tool (which tool is up to the model)
- `'none'`: disable tools (pure generation)
- `{ type: 'tool', toolName: 'finish' }`: force calling the specified tool

Typical uses: `'required'` for “pending todos block exit”; `{ type: 'tool', toolName: 'complete' }` at a subagent’s finalize phase to force result delivery; `'none'` for a final summarization step that must not call tools anymore.
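These choices often reduce to one small decision function inside prepareStep. A sketch, where the phase values and the `'complete'` tool name are illustrative assumptions, not SDK concepts:

```typescript
// The toolChoice shapes the SDK accepts for a step.
type ToolChoice =
  | 'auto'
  | 'required'
  | 'none'
  | { type: 'tool'; toolName: string };

// Maps hypothetical agent state to the per-step toolChoice described above.
function pickToolChoice(state: {
  pendingTodos: number;
  phase: 'work' | 'finalize' | 'summarize';
}): ToolChoice {
  if (state.phase === 'finalize') {
    return { type: 'tool', toolName: 'complete' }; // force result delivery
  }
  if (state.phase === 'summarize') {
    return 'none'; // pure generation, no more tool calls
  }
  return state.pendingTodos > 0 ? 'required' : 'auto';
}
```

Keeping the policy in a pure function like this makes it trivially unit-testable, independent of any model call.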
Pattern 4: Dynamic model switching
prepareStep: async ({ stepNumber }) => {
  // First 3 steps (0, 1, 2) use a cheap model for planning, then switch to a strong model
  if (stepNumber < 3) {
    return { model: haikuModel };
  }
  return { model: sonnetModel };
},
Caveat: switching models does NOT switch tool schemas — tool definitions live at L1. Some providers are sensitive to tool schema format (OpenAI vs Anthropic). Before switching models, ensure the tool schema is compatible on both sides.
The mutate-vs-push trap (the canonical anti-pattern)
This is the direct application of Message Reference Model to prepareStep. Below is
a real production case — the original version carried the pollution bug, and was later refactored to the push pattern:
Anti-example (polluting version)
// Anti-pattern: mutating objects inside initialMessages
prepareStep: async ({ messages }) => {
const remindersText = ctx.getRemindersText();
const lastMessage = messages[messages.length - 1];
if (lastMessage.role === 'user') {
// WRONG — direct mutation permanently pollutes this message in initialMessages
lastMessage.content += `\n${remindersText}`;
} else {
// OK — push a new message, gone next step
messages.push({ role: 'user', content: remindersText });
}
return { messages };
},
What’s wrong:
- `lastMessage` is a reference to `stepInputMessages[k]` — the same object as the user message in `initialMessages`.
- `lastMessage.content += ...` creates a new string and assigns it to `.content` — the field is overwritten on the original object, which still sits in `initialMessages`.
- After step 1 pollutes it, steps 2/3/4 each rebuild `stepInputMessages = [...initialMessages, ...responseMessages]` and spread in the polluted message — the reminder is always there.
- Worse, each step re-appends the reminder — it accumulates with every step.
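The pollution can be reproduced without the SDK. The sketch below simulates the per-step rebuild of `stepInputMessages` (the rebuild helper is illustrative) and shows that mutating a shared message object leaks into every later step, while appending a new object does not:

```typescript
type Msg = { role: string; content: string };

// Simulated per-step rebuild: a NEW array each step, the SAME message objects.
const initialMessages: Msg[] = [{ role: 'user', content: 'hello' }];
const responseMessages: Msg[] = [];
const buildStepInput = (): Msg[] => [...initialMessages, ...responseMessages];

// Anti-pattern: mutate the shared last message in step 1...
const step1 = buildStepInput();
step1[step1.length - 1].content += '\n<system-reminder>be terse</system-reminder>';

// ...and the pollution survives into step 2's "fresh" rebuild.
const step2 = buildStepInput();
const polluted = step2[0].content.includes('<system-reminder>'); // true

// Push pattern: appending a new object to the step's array leaves the
// shared history untouched, so the next rebuild starts pristine.
const step3 = [...buildStepInput(), { role: 'user', content: 'reminder' }];
const historyUntouched = initialMessages.length === 1; // true
```

The spread copies the array, not its elements — that single fact explains both the trap and the fix.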
Correct version (uniform push)
prepareStep: async ({ messages }) => {
const remindersText = ctx.getRemindersText();
if (!remindersText) return undefined;
// OK — regardless of whether last is user, push a new message
return {
messages: [
...messages,
{ role: 'user', content: `[system directive — not user input]\n${remindersText}` },
],
};
},
Why this works:
- A new user message object is created each step; it is visible this step only.
- The next step’s `stepInputMessages` rebuild is a pristine `[...initialMessages, ...responseMessages]`; the injected message vanishes.
- No accumulation risk, no shared-reference pollution.
One-sentence verdict
Never mutate the fields of any message object you receive from prepareStep’s messages parameter.
| Operation | Verdict |
|---|---|
| `msg.content += 'x'` | Pollutes |
| `msg.content.push(part)` | Pollutes |
| `msg.metadata = {...}` | Pollutes |
| `msg.role = 'system'` | Pollutes |
| `return { messages: [...messages, newMsg] }` | Safe |
| `return { messages: messages.filter(...) }` | Safe (new array, no element mutation) |
| `return { messages: messages.map((m, i) => i === N ? {...m, content: 'new'} : m) }` | Safe (new object replaces the target object) |
Step-transition timeline (focused view)
Performance notes
prepareStep runs before every step’s model call and is blocking — the model call won’t start until your Promise
resolves.
Common performance pitfalls:
- Sync I/O (`fs.readFileSync`, sync DB queries) — directly blocks the event loop.
- Token counting (`@anthropic-ai/tokenizer` or tiktoken) — the first load takes ~100ms for model weights, and counting every step adds up quickly. Practical fix: cache the prior round’s token estimate and only recount the message segments that changed.
- LLM calls (for summarization during compaction) — a single LLM compaction call takes seconds and doubles your per-step wait time. Practical fix: separate “decide whether to compact” (a cheap sync token estimate, checked against a threshold) from “actually compact” (only then call the LLM).
Rule of thumb: keep prepareStep’s total time (sync + async) under 200ms. Any higher and each step adds
perceptible latency — a 20-step task adds 4+ seconds.
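The "cache and recount only what changed" fix can be sketched with a WeakMap keyed by message object identity. The chars-per-token ratio and helper names below are assumptions for illustration, not a real tokenizer:

```typescript
type ChatMsg = { role: string; content: string };

// Rough heuristic: ~4 characters per token (assumption; a real tokenizer
// would replace estimatePerMessage, leaving the caching scheme unchanged).
const estimatePerMessage = (m: ChatMsg): number => Math.ceil(m.content.length / 4);

// Cache keyed by message object identity. Because the rebuild reuses the
// same message objects across steps (and they must never be mutated),
// unchanged messages are never recounted.
const tokenCache = new WeakMap<ChatMsg, number>();
let recountCalls = 0;

function estimateTokens(messages: ChatMsg[]): number {
  let total = 0;
  for (const m of messages) {
    let cached = tokenCache.get(m);
    if (cached === undefined) {
      cached = estimatePerMessage(m);
      tokenCache.set(m, cached);
      recountCalls++; // only new message objects pay the counting cost
    }
    total += cached;
  }
  return total;
}
```

Identity-keyed caching is only sound under the no-mutation rule from the previous section: a mutated message would keep its stale cached count.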
Division of labor vs prepareCall
| Scenario | Use prepareCall (once per call) | Use prepareStep (once per step) |
|---|---|---|
| Pick model by user tier | Use: decide once at call start | Overkill: recomputes each step |
| Dynamically switch model by context | Can’t: don’t know at call start | Use: decide each step from steps / messages |
| Inject system-reminder | Can’t: reminder is runtime state | Use: correct scenario |
| Compact messages | Can’t: one-shot at call start, no reaction to growth | Use: compact only when token threshold hit |
| Replace entire tool set | Use: prepareStep can’t add tools | Can’t: only filters, doesn’t add |
Further reading
Related SDK chapters
- Message Reference Model — the theoretical foundation of this page
- Runtime Lifecycle — prepareStep’s position in the full lifecycle
- UI Stream Orchestration — the other face of L3
Zapvol landing reference
- Context Compaction — three-tier compaction built on prepareStep
- Tool Search — dynamic `activeTools` filtering via prepareStep
- `packages/backend/src/agent/agent-stream.ts` — assembly location for the prepareStep implementation