What is the difference between a harness and scaffolding in an AI agent?

Scaffolding is the behavior-defining layer around an LLM—system prompts, tool descriptions, context management. The harness is the execution layer that actually calls tools and runs the agent loop. Products like Claude Code use 'harness' to mean the entire non-model wrapper.

Why does this terminology matter?

Inconsistent definitions slow team communication and make it harder to evaluate agent designs. Shared vocabulary helps practitioners distinguish between architectural choices (scaffold design) and infrastructure choices (harness implementation).

Did this post provide new frameworks or just clarify existing terms?

Hugging Face clarified existing terms that had drifted in meaning across the industry. The post acknowledges that many terms lack universally accepted definitions and aims to provide practical mental models rather than enforce one correct vocabulary.

Harness vs. Scaffold: Why AI Agent Terminology Matters for Builders

The Terminology Crisis at AI Agent Conference Season

Hugging Face published a glossary on May 25 tackling a confusion that surfaced at ICLR 2026: AI agent concepts—especially “harness” and “scaffold”—lack consistent, agreed-upon definitions across frameworks and products. According to the Hugging Face Blog, researcher Ari Goldberg highlighted the problem directly at the conference: attendees offered multiple conflicting explanations for the same terms, suggesting the field is moving faster than its vocabulary can standardize. This post aims to ground definitions for practitioners building, deploying, or using agent systems.

Scaffolding and Harness: The Core Distinction

According to the Hugging Face Blog, scaffolding refers to the behavior-defining layer surrounding an LLM—system prompts, tool descriptions, parsing logic, and context management across steps. This layer shapes how the model perceives and interacts with the environment, both during training and inference.

The harness, by contrast, is the execution infrastructure: the component that actually runs the agent loop and invokes tools when the model signals intent to do so. A raw language model can express intent to call a tool, but it cannot execute the tool without a harness.

However, product naming muddies this distinction. The Hugging Face Blog notes that Claude Code, Codex, and other agentic products use “harness” to describe the entire wrapper around the model—collapsing scaffolding and execution into one label. Claude Code’s own documentation, according to Hugging Face, states: “Claude Code serves as the agentic harness around Claude,” using a broad definition of harness to mean everything except the base model.

Why This Matters for Agent Architecture Decisions

The blurred terminology creates a practical problem: teams cannot easily discuss whether a design issue stems from scaffold choices (how prompts are structured, what context is available) or harness choices (how tool calls are routed, how the loop is managed). Practitioners working with tools like Hermes Agent or building custom agent systems risk talking past each other when using the same word to mean different things.

Hugging Face’s glossary acknowledges that many agent-related terms—policy, skills, sub-agents, rollout, reward—remain undefined across the field. The post frames its role not as enforcing a single vocabulary, but as providing mental models that make cross-team and cross-framework discussions tractable. The field is evolving too quickly for consensus; clarifying the conceptual boundaries is a practical intermediate step.

Why This Matters

As AI agents move from research prototypes to production deployment, shared terminology becomes a prerequisite for collaboration. Teams choosing between frameworks, building custom harnesses, or evaluating agent designs need to distinguish scaffolding (behavior and reasoning layer) from harness (execution and integration layer). Hugging Face’s glossary lowers the entry barrier for newcomers and gives practitioners a reference point when vendor terminology diverges—a common problem as products race to claim the “agent” category without settling on what that means technically.