Agent = Model + Harness

The core thesis behind Anthropic's harness engineering — why agent performance is determined by the harness around the model, not just the model itself

The Core Equation

An AI agent is not just a language model. It is a language model embedded in a harness — the surrounding infrastructure that determines what the model can see, do, and remember.

Agent = Model + Harness

Think of the model as an engine. The harness is the car — steering, brakes, suspension, fuel system. The best engine in the world, without a chassis to direct it, goes nowhere useful. Conversely, a well-engineered harness can make a mid-range model outperform a frontier model running with poor infrastructure.

The + denotes structural composition, not independence. Model and harness are co-evolutionary: the harness encodes assumptions about model capabilities, and those assumptions go stale as models improve. A more precise reading is Agent = f(Model, Harness) — the two constrain and shape each other. The additive formula captures the structural insight; the dynamic interplay is explored in the sections that follow.

This insight, crystallized across a series of Anthropic engineering publications from late 2024 through early 2026, represents a fundamental shift in how the industry thinks about building AI agents. The bottleneck is no longer model intelligence alone — it’s the engineering of the systems around the model.


What Is a Harness?

A harness is everything that wraps an LLM to turn it into a functioning agent:

  • System prompt: Instructions, persona, constraints — the model’s “job brief”
  • Tools: Functions the model can invoke to interact with external systems
  • Context management: What information enters the context window, and when
  • Session state: Durable memory that persists beyond a single context window
  • Orchestration loop: The control flow that calls the model, routes tool calls, handles errors
  • Evaluation: Quality checks on the model’s output before acting on it
  • Sandbox: The execution environment where generated code runs safely

The model provides intelligence. The harness provides structure, safety, and persistence. Neither is sufficient alone.
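To make the components concrete, here is a minimal sketch of a harness wrapping a model callable. All names (`Harness`, `run`, the message shapes) are hypothetical illustrations, not Anthropic's actual implementation; the point is only how system prompt, tools, session state, and the orchestration loop fit together.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Harness:
    system_prompt: str                                  # the model's "job brief"
    tools: dict[str, Callable[[str], str]]              # invocable functions
    history: list[dict] = field(default_factory=list)   # session state

    def run(self, model: Callable[[list[dict]], dict], user_msg: str) -> str:
        """Orchestration loop: call the model, route tool calls, return text."""
        self.history.append({"role": "user", "content": user_msg})
        while True:
            reply = model([{"role": "system", "content": self.system_prompt}]
                          + self.history)
            self.history.append(reply)
            if reply.get("tool") is None:     # plain answer: we're done
                return reply["content"]
            # route the tool call and feed the result back into context
            result = self.tools[reply["tool"]](reply["content"])
            self.history.append({"role": "tool", "content": result})
```

A real harness adds the remaining rows of the table (context management, evaluation, sandboxing), but the shape is the same: the model decides, the harness mediates every observation and action.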

From Broad Harness to Three-Component Decomposition

[Diagram: Harness Decomposition — from a monolithic harness (everything outside the model, all coupled in one process, a single point of failure) to three independently replaceable components with stable interfaces: Session (durable, append-only event log), Harness (narrow: model + orchestration, stateless), and Sandbox (disposable execution environment).]

The components above constitute the broad definition of harness — everything outside the model. This definition comes from the 2024–2025 era of practice and captures the key insight that “the bottleneck is not the model.”

But the April 2026 Managed Agents publication pushed this thinking a critical step further: the broad harness itself needs to be decoupled. Bundling all non-model components together creates irreplaceable “pet” systems (see the “Don’t Adopt a Pet” section in that chapter). The more precise decomposition is:

  • Session — An independent, durable event log (“session state” from the table above, elevated to a standalone component)
  • Harness (narrow) — The orchestration loop only (“orchestration loop + context management + evaluation” from above)
  • Sandbox — A disposable execution environment (“sandbox + tools” from above)

The three components can fail and be replaced independently. This decomposition is the core topic of the final chapter in this section.
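The decomposition can be sketched as three small interfaces. This is a hypothetical illustration (the class and function names are invented here, not taken from the Managed Agents publication): the session is an append-only log, the narrow harness is stateless orchestration, and the sandbox is disposable.

```python
class Session:
    """Durable, append-only event log. Survives harness and sandbox restarts."""
    def __init__(self):
        self._events: list[dict] = []
    def append(self, event: dict) -> None:
        self._events.append(event)        # append-only: no update, no delete
    def replay(self) -> list[dict]:
        return list(self._events)         # state is rebuilt by replaying events

class Sandbox:
    """Disposable execution environment; discard and recreate freely."""
    def run(self, code: str) -> str:
        return f"ran: {code}"             # stand-in for an isolated runtime

def harness_step(model, session: Session, sandbox: Sandbox, task: str) -> str:
    """Narrow harness: stateless orchestration over session + sandbox."""
    session.append({"type": "task", "task": task})
    code = model(session.replay())        # the model sees the replayed log
    output = sandbox.run(code)
    session.append({"type": "result", "output": output})
    return output
```

Because `harness_step` holds no state of its own, a crashed harness can be restarted against the same session, and a fresh sandbox can be swapped in mid-task — which is exactly what makes each component independently replaceable.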

Throughout this section, “harness” refers to the broad definition — everything outside the model — unless otherwise noted. When the distinction matters, “harness (narrow)” is used explicitly.

One dimension the formula omits: the human participant. Production agents operate within human-in-the-loop checkpoints, permission boundaries, and feedback cycles. The human provides intent, oversight, and course correction — a participant in the agent system, not a component of the harness.
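A permission boundary is the simplest place to see this. The sketch below is hypothetical (the `RISKY` set and function names are invented for illustration): risky actions route through a human checkpoint before the harness executes them.

```python
# Actions that require human approval before execution (illustrative set).
RISKY = {"delete_file", "send_email", "deploy"}

def execute(action: str, do_it, ask_human) -> str:
    """Run an action, gating risky ones behind a human-in-the-loop checkpoint."""
    if action in RISKY:
        if not ask_human(f"Allow '{action}'?"):
            return "blocked by human reviewer"
    return do_it()
```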


Why Harness Engineering Matters

Models improve; harnesses must evolve

A recurring theme in Anthropic’s research: harnesses encode assumptions about model capabilities, and those assumptions go stale as models improve. For example:

  • Claude Sonnet 4.5 exhibited “context anxiety” — prematurely wrapping up tasks near context limits. Harnesses added context resets to compensate. Claude Opus 4.5 no longer exhibited this behavior, making those resets dead weight.
  • Early harnesses decomposed work into small “sprints” because models couldn’t maintain coherence over long tasks. Claude Opus 4.6 improved long-horizon planning enough that sprint decomposition became unnecessary scaffolding.

The lesson: every harness component encodes an assumption worth stress-testing. When new models arrive, re-examine your harness and strip components that are no longer load-bearing.
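One way to keep those assumptions visible is to encode each piece of scaffolding as an explicit, per-model flag, so stale compensations are easy to find and strip. This is a hypothetical sketch (the flag table and `plan_task` are invented here); the model behaviors it encodes are the two examples above.

```python
# Which compensations each model still needs (illustrative, per the examples).
SCAFFOLDING = {
    "sonnet-4.5": {"context_reset": True,  "sprint_decomposition": True},
    "opus-4.5":   {"context_reset": False, "sprint_decomposition": True},
    "opus-4.6":   {"context_reset": False, "sprint_decomposition": False},
}

def plan_task(model_name: str, task: str) -> list[str]:
    flags = SCAFFOLDING[model_name]
    steps = [task]
    if flags["sprint_decomposition"]:
        # older models lose coherence on long tasks: split into sprints
        steps = [f"{task} (sprint {i + 1})" for i in range(3)]
    if flags["context_reset"]:
        steps.append("reset context before continuing")
    return steps
```

When a new model arrives, the stress test is mechanical: flip each flag off, rerun your evals, and delete whatever is no longer load-bearing.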

The model is a commodity; the harness is the product

As foundation models converge in capability, the differentiator shifts to the harness. Two teams using the same model will produce wildly different agent quality depending on their harness engineering:

  • How they manage context across long sessions
  • How they decompose tasks across multiple agents
  • How they validate and evaluate model outputs
  • How they recover from failures

This is why Anthropic increasingly frames agent development as harness engineering — the craft of building the right infrastructure around the model.


The Evolution of Agent Engineering

Anthropic’s thinking on agent development has evolved through four distinct publications, each building on the last:

  • Building Effective Agents, Dec 2024: Taxonomy of agent patterns (workflows vs. autonomous agents)
  • Effective Harnesses for Long-Running Agents, Nov 2025: Two-agent harness for multi-session tasks
  • Harness Design for Long-Running Apps, Mar 2026: GAN-inspired three-agent architecture
  • Scaling Managed Agents, Apr 2026: Infrastructure for decoupling the brain from the hands

A fifth cross-cutting theme — Context Engineering — runs through all four publications and represents the shift from optimizing individual prompts to managing the entire information lifecycle of an agent. It is significant enough to be a separate section in its own right.


Three Principles

From these publications, three principles emerge:

1. Start simple, add complexity only when needed

“The most successful implementations use simple, composable patterns rather than complex frameworks.” — Building Effective Agents

Don’t start with a multi-agent orchestration system. Start with a single model call and a good prompt. Add tools. Add evaluation. Add multi-agent coordination. Each layer of complexity should be justified by a measurable improvement.
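The layering above can be sketched as functions that each wrap the previous one; every layer should pay for itself on your evals before you keep it. All names here are hypothetical illustrations of the principle, not a prescribed API.

```python
def single_call(model, prompt: str) -> str:
    """Layer 0: one model call and a good prompt."""
    return model(prompt)

def with_tools(model, tools: dict, prompt: str) -> str:
    """Layer 1: let the model request a tool (toy convention: 'use:<name>')."""
    answer = model(prompt)
    for name, fn in tools.items():
        if f"use:{name}" in answer:
            return fn(answer)
    return answer

def with_evaluation(model, evaluate, prompt: str, retries: int = 2) -> str:
    """Layer 2: check outputs and retry before acting on them."""
    for _ in range(retries + 1):
        answer = model(prompt)
        if evaluate(answer):
            return answer
    return answer  # give up gracefully after exhausting retries
```

Multi-agent coordination would be layer 3 — and the principle says you only build it once layer 2 measurably falls short.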

2. Separate generation from evaluation

“Agents asked to evaluate their own work confidently praise mediocre outputs.” — Harness Design for Long-Running Apps

Self-evaluation bias is a fundamental limitation. The agent doing the work should not be the agent judging the work. Separate generators from evaluators — it’s easier to tune a standalone evaluator to be skeptical than to make a generator self-critical.
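In code, the separation is a second model call with its own skeptical prompt gating each output. This is a minimal sketch under invented names (`generate_with_review`, the prompt strings, the pass/fail convention), not the architecture from the publication.

```python
def generate(model, task: str) -> str:
    return model(f"Produce a solution for: {task}")

def evaluate(critic, task: str, candidate: str) -> bool:
    """A standalone evaluator, tuned to be skeptical, judges the output."""
    verdict = critic(f"Be skeptical. Does this solve '{task}'?\n{candidate}")
    return verdict.strip().lower() == "pass"

def generate_with_review(model, critic, task: str, max_rounds: int = 3) -> str:
    """The generator never judges its own work; the critic gates each round."""
    candidate = generate(model, task)
    for _ in range(max_rounds):
        if evaluate(critic, task, candidate):
            return candidate
        candidate = generate(model, task + " (revise)")
    return candidate
```

Note that `model` and `critic` can even be the same underlying LLM: what matters is that the evaluation role has its own prompt and context, insulated from the generator's confidence.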

3. Design for model improvement

“Every harness component encodes assumptions that warrant stress testing — they may be incorrect and can quickly become obsolete as models improve.” — Harness Design for Long-Running Apps

Better models need less scaffolding, but simultaneously create space for more ambitious harnesses. The work of a harness engineer is to continuously find the right level of structure for the current model’s capabilities — neither over-constraining a capable model nor under-supporting a limited one.


Reading Guide

This section traces Anthropic’s harness engineering publications in detail:

  1. Agent Patterns — The building blocks: from augmented LLMs to autonomous agents. The taxonomy that grounds everything else.

  2. Long-Running Harness — How to make agents work across multiple context windows. The initializer/coder pattern.

  3. Multi-Agent Harness — The GAN-inspired planner/generator/evaluator architecture for complex applications.

  4. Managed Agents — Infrastructure at scale: decoupling the brain from the hands, sessions as durable logs, and interfaces that outlast implementations.

The cross-cutting theme — Context Engineering — has its own section. Compaction, memory design, prompt layering, and progressive disclosure are treated there as a unified design discipline.

