Agent = Model + Harness
The core thesis behind Anthropic's harness engineering — why agent performance is determined by the harness around the model, not just the model itself
The Core Equation
An AI agent is not just a language model. It is a language model embedded in a harness — the surrounding infrastructure that determines what the model can see, do, and remember.
Agent = Model + Harness
Think of the model as an engine. The harness is the car — steering, brakes, suspension, fuel system. The best engine in the world, without a chassis to direct it, goes nowhere useful. Conversely, a well-engineered harness can make a mid-range model outperform a frontier model running with poor infrastructure.
The + denotes structural composition, not independence. Model and harness are co-evolutionary: the harness encodes assumptions about model capabilities, and those assumptions go stale as models improve. A more precise reading is Agent = f(Model, Harness) — the two constrain and shape each other. The additive formula captures the structural insight; the dynamic interplay is explored in the sections that follow.
This insight, crystallized across a series of Anthropic engineering publications from late 2024 through early 2026, represents a fundamental shift in how the industry thinks about building AI agents. The bottleneck is no longer model intelligence alone — it’s the engineering of the systems around the model.
What Is a Harness?
A harness is everything that wraps an LLM to turn it into a functioning agent:
| Component | Role |
|---|---|
| System prompt | Instructions, persona, constraints — the model’s “job brief” |
| Tools | Functions the model can invoke to interact with external systems |
| Context management | What information enters the context window, and when |
| Session state | Durable memory that persists beyond a single context window |
| Orchestration loop | The control flow that calls the model, routes tool calls, handles errors |
| Evaluation | Quality checks on the model’s output before acting on it |
| Sandbox | The execution environment where generated code runs safely |
The model provides intelligence. The harness provides structure, safety, and persistence. Neither is sufficient alone.
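A minimal sketch of how these components compose into an agent loop. The `Harness` class, message shapes, and `echo` tool below are invented for illustration; a real harness would call an LLM API where `call_model` is stubbed.

```python
from dataclasses import dataclass, field

@dataclass
class Harness:
    system_prompt: str                           # the model's "job brief"
    tools: dict = field(default_factory=dict)    # name -> callable
    session: list = field(default_factory=list)  # durable event log

    def call_model(self, messages):
        """Stub standing in for a real LLM API call."""
        if messages[-1]["role"] == "tool":       # after a tool result, finish
            return {"type": "final", "content": messages[-1]["content"]}
        return {"type": "tool_call", "name": "echo", "args": {"text": "hi"}}

    def run(self, task, max_steps=5):
        messages = [{"role": "system", "content": self.system_prompt},
                    {"role": "user", "content": task}]
        for _ in range(max_steps):               # the orchestration loop
            action = self.call_model(messages)
            self.session.append(action)          # every event is logged
            if action["type"] == "tool_call":
                result = self.tools[action["name"]](**action["args"])
                messages.append({"role": "tool", "content": result})
            else:
                return action["content"]         # final answer
        return None                              # step budget exhausted
```

Even at this scale, the table's components are visible: the prompt, the tool registry, the session log, and the loop that routes between them.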
From Broad Harness to Three-Component Decomposition
The table above is the broad definition of harness — everything outside the model. This definition comes from the 2024–2025 era of practice and captures the key insight that “the bottleneck is not the model.”
But the April 2026 Managed Agents publication pushed this thinking a critical step further: the broad harness itself needs to be decoupled. Bundling all non-model components together creates irreplaceable “pet” systems (see the “Don’t Adopt a Pet” section in that chapter). The more precise decomposition is:
- Session — An independent, durable event log (“session state” from the table above, elevated to a standalone component)
- Harness (narrow) — The orchestration core only (“orchestration loop + context management + evaluation” from the table above)
- Sandbox — A disposable execution environment (“sandbox + tools” from above)
The three components can fail and be replaced independently. This decomposition is the core topic of the final chapter in this section.
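The decoupling can be sketched as three objects that meet only at narrow interfaces: an append-only log, a disposable executor, and a loop that connects them. The class names and file format below are illustrative assumptions, not Anthropic's API.

```python
import json

class Session:
    """Durable, append-only event log; survives harness and sandbox restarts."""
    def __init__(self, path):
        self.path = path
    def append(self, event):
        with open(self.path, "a") as f:
            f.write(json.dumps(event) + "\n")
    def replay(self):
        with open(self.path) as f:
            return [json.loads(line) for line in f]

class Sandbox:
    """Disposable execution environment; here reduced to a dict of tools."""
    def __init__(self, tools):
        self.tools = tools
    def execute(self, name, args):
        return self.tools[name](**args)

def narrow_harness(session, sandbox, plan):
    """The orchestration core only: route each step, log every event."""
    for name, args in plan:
        result = sandbox.execute(name, args)
        session.append({"tool": name, "result": result})
    return session.replay()
```

Because the session is a plain file and the sandbox a plain executor, either can be discarded and replaced without touching the other two, which is exactly the failure-isolation property the decomposition is after.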
Throughout this section, “harness” refers to the broad definition — everything outside the model — unless otherwise noted. When the distinction matters, “harness (narrow)” is used explicitly.
One dimension the formula omits: the human participant. Production agents operate within human-in-the-loop checkpoints, permission boundaries, and feedback cycles. The human provides intent, oversight, and course correction — a participant in the agent system, not a component of the harness.
Why Harness Engineering Matters
Models improve; harnesses must evolve
A recurring theme in Anthropic’s research: harnesses encode assumptions about model capabilities, and those assumptions go stale as models improve. For example:
- Claude Sonnet 4.5 exhibited “context anxiety” — prematurely wrapping up tasks near context limits. Harnesses added context resets to compensate. Claude Opus 4.5 no longer exhibited this behavior, making those resets dead weight.
- Early harnesses decomposed work into small “sprints” because models couldn’t maintain coherence over long tasks. Claude Opus 4.6 improved long-horizon planning enough that sprint decomposition became unnecessary scaffolding.
The lesson: every harness component encodes an assumption worth stress-testing. When new models arrive, re-examine your harness and strip components that are no longer load-bearing.
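One way to keep such assumptions strippable is to record, for each compensating component, which model it was added for and which model made it obsolete. The registry below is a sketch: the model names come from the examples above, while the flag mechanism itself is an invented illustration.

```python
# Each harness compensation is tagged with the model whose limitation
# motivated it and the model release that made it unnecessary.
COMPENSATIONS = {
    "context_reset":        {"added_for": "sonnet-4.5", "obsolete_after": "opus-4.5"},
    "sprint_decomposition": {"added_for": "sonnet-4.5", "obsolete_after": "opus-4.6"},
}

# Release order, used to decide which compensations are still load-bearing.
MODEL_ORDER = ["sonnet-4.5", "opus-4.5", "opus-4.6"]

def active_compensations(model):
    """Return the compensations still needed for the given model."""
    released = MODEL_ORDER[:MODEL_ORDER.index(model) + 1]
    return [name for name, meta in COMPENSATIONS.items()
            if meta["obsolete_after"] not in released]
```

When a new model ships, re-running this audit makes "strip what is no longer load-bearing" a routine check rather than an act of archaeology.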
The model is a commodity; the harness is the product
As foundation models converge in capability, the differentiator shifts to the harness. Two teams using the same model will produce wildly different agent quality depending on their harness engineering:
- How they manage context across long sessions
- How they decompose tasks across multiple agents
- How they validate and evaluate model outputs
- How they recover from failures
This is why Anthropic increasingly frames agent development as harness engineering — the craft of building the right infrastructure around the model.
The Evolution of Agent Engineering
Anthropic’s thinking on agent development has evolved through four distinct publications, each building on the last:
| Publication | Date | Key Contribution |
|---|---|---|
| Building Effective Agents | Dec 2024 | Taxonomy of agent patterns: workflows vs. autonomous agents |
| Effective Harnesses for Long-Running Agents | Nov 2025 | Two-agent harness for multi-session tasks |
| Harness Design for Long-Running Apps | Mar 2026 | GAN-inspired three-agent architecture |
| Scaling Managed Agents | Apr 2026 | Infrastructure: decoupling brain from hands |
A fifth cross-cutting theme — Context Engineering — runs through all four publications and represents the shift from optimizing individual prompts to managing the entire information lifecycle of an agent. It is significant enough to be a separate section in its own right.
Three Principles
From these publications, three principles emerge:
1. Start simple, add complexity only when needed
“The most successful implementations use simple, composable patterns rather than complex frameworks.” — Building Effective Agents
Don’t start with a multi-agent orchestration system. Start with a single model call and a good prompt. Add tools. Add evaluation. Add multi-agent coordination. Each layer of complexity should be justified by a measurable improvement.
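That progression can be sketched as layers, each wrapping the last. The `call_model` stub and the crude keyword routing are placeholders invented for illustration; in practice each added layer would be justified by a measured improvement on an eval.

```python
def call_model(prompt):
    """Stand-in for a real LLM API call."""
    return f"response to: {prompt!r}"

# Layer 1: a single model call with a good prompt.
def layer1(task):
    return call_model(f"You are a careful assistant. Task: {task}")

# Layer 2: add tools, routing to one when the task calls for it.
def layer2(task, tools):
    for name, fn in tools.items():
        if name in task:              # crude routing, for illustration only
            return fn(task)
    return layer1(task)

# Layer 3: add evaluation, retrying with the bare model if the check fails.
def layer3(task, tools, check):
    out = layer2(task, tools)
    return out if check(out) else layer1(task)
```

Multi-agent coordination would be layer 4, and the point of the principle is that many systems never need to get there.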
2. Separate generation from evaluation
“Agents asked to evaluate their own work confidently praise mediocre outputs.” — Harness Design for Long-Running Apps
Self-evaluation bias is a fundamental limitation. The agent doing the work should not be the agent judging the work. Separate generators from evaluators — it’s easier to tune a standalone evaluator to be skeptical than to make a generator self-critical.
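A sketch of the separation, with both model calls stubbed: the generator and evaluator are distinct functions (in practice, distinct prompts or distinct agents), and evaluator feedback flows back into the next generation round. All names and heuristics here are invented for illustration.

```python
def generate(task, feedback=()):
    """Stand-in for a model call with a 'builder' system prompt."""
    code = "def add(a, b): return a + b"
    if "no tests included" in feedback:
        code += "\nassert add(1, 2) == 3  # test"
    return {"task": task, "code": code}

def evaluate(artifact):
    """Stand-in for a *separate* model call with a skeptical 'reviewer' prompt."""
    issues = []
    if "test" not in artifact["code"]:
        issues.append("no tests included")
    return issues

def generate_until_accepted(task, max_rounds=3):
    feedback = []
    for _ in range(max_rounds):
        artifact = generate(task, feedback)
        feedback = evaluate(artifact)
        if not feedback:
            return artifact           # evaluator signed off
    raise RuntimeError("evaluator never accepted the output")
```

The asymmetry the principle exploits is visible here: `evaluate` never sees its output as something to defend, so it can be tuned toward skepticism independently of the generator.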
3. Design for model improvement
“Every harness component encodes assumptions that warrant stress testing — they may be incorrect and can quickly become obsolete as models improve.” — Harness Design for Long-Running Apps
Better models need less scaffolding, but simultaneously create space for more ambitious harnesses. The work of a harness engineer is to continuously find the right level of structure for the current model’s capabilities — neither over-constraining a capable model nor under-supporting a limited one.
Reading Guide
This section traces Anthropic’s harness engineering publications in detail:
- Agent Patterns — The building blocks: from augmented LLMs to autonomous agents. The taxonomy that grounds everything else.
- Long-Running Harness — How to make agents work across multiple context windows. The initializer/coder pattern.
- Multi-Agent Harness — The GAN-inspired planner/generator/evaluator architecture for complex applications.
- Managed Agents — Infrastructure at scale: decoupling the brain from the hands, sessions as durable logs, and interfaces that outlast implementations.
The cross-cutting theme — Context Engineering — has its own section. Compaction, memory design, prompt layering, and progressive disclosure are treated there as a unified design discipline.
Sources
- Building Effective Agents — Anthropic, December 2024
- Effective Harnesses for Long-Running Agents — Anthropic, November 2025
- Harness Design for Long-Running Application Development — Anthropic, March 2026
- Scaling Managed Agents: Decoupling the Brain from the Hands — Anthropic, April 2026
- Effective Context Engineering for AI Agents — Anthropic, 2026