What Are These “Harnesses” Everyone Keeps Talking About?

For centuries, ships in the English Channel were wrecked on the rocks at Eddystone.

They were low, treacherous reefs, often invisible in fog and storms. They were not far from civilization. They were not in the middle of some unknown ocean. They were right there, in plain sight.

And that is exactly what made them so dangerous.

At the end of the seventeenth century, someone had a bold idea: build a lighthouse directly on top of the rocks.

Not nearby.

Not on the coast.

Right on top of the danger.

The first attempt failed catastrophically. In 1703, a storm wiped out the lighthouse and killed its designer, Henry Winstanley.

But from that failure came a new question.

Not: “How do we build a lighthouse?”

But: “How do we build a lighthouse that keeps working during the storm?”

That question changed maritime engineering forever.

Because a lighthouse does not move ships.
It does not control the sea.
It does not create wind.
It does not power the engines.

And yet, for centuries, it was one of the most important technologies in navigation.

Its job was simple: keep showing sailors where the rocks were.

LLMs are powerful ships.

Harnesses are lighthouses.

Intelligence Is Not Enough

Over the past few years, we have talked endlessly about language models: GPT, Claude, Gemini, Llama, and many others.

We have treated them as the center of the story. And in part, they are. A language model is an extraordinary machine. It can write code, summarize documents, interpret requests, generate plans, analyze errors, and suggest solutions.

But there is a misunderstanding.

An LLM, by itself, is not a product.

It is an engine.

And a powerful engine, without a steering wheel, brakes, dashboard, fuel system, safety rules, and a road to drive on, will not get us very far. In fact, it may take us very quickly to the wrong place.

This is where harnesses come in.

A harness is the layer that makes an LLM usable in the real world.

It does not create intelligence.

It channels it.

It does not replace the model.

It connects the model to the world.

It is not the brain.

It is the nervous system, the cockpit, the lighthouse, and sometimes the emergency brake.

Why Everyone Is Talking About Them

You may have heard of Claude Code, Codex, OpenCode, Pi Agent, and many other similar tools.

They are often described as “agents,” “coding agents,” “AI developers,” “AI assistants,” or “autonomous workflows.” But under the surface, much of the difference between these tools is not just the model they use.

It is the harness.

A harness decides how to connect the model to external tools.
It decides when to run a command.
It decides which files to read.
It decides how much context to pass to the model.
It decides what to remember and what to forget.
It decides how to use MCP (Model Context Protocol) servers.
It decides how to orchestrate multiple agents.
It decides when to ask the user for permission.
It decides when to stop.

These decisions may sound like implementation details.

They are not.

They are the product.

Two tools can use the same model and behave in completely different ways. One may feel fast, disciplined, and smart. The other may feel confused, slow, and expensive.

Not because the model is different.

Because the lighthouse is different.
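To make the decisions listed in this section concrete, here is a minimal sketch of a harness loop. It is not how any of the tools named above work internally. Every name in it (Harness, ToolCall, Final, scripted_model, max_steps) is invented for illustration, and the "model" is a plain function standing in for a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class ToolCall:
    """Hypothetical: the model asks the harness to run a tool."""
    name: str
    argument: str


@dataclass
class Final:
    """Hypothetical: the model says it is done."""
    text: str


Action = Union[ToolCall, Final]
# Assumption: a model is any function from the history so far to an action.
Model = Callable[[list[str]], Action]


@dataclass
class Harness:
    model: Model
    tools: dict[str, Callable[[str], str]]
    max_steps: int = 5  # the "when to stop" decision
    history: list[str] = field(default_factory=list)

    def run(self, request: str) -> str:
        self.history.append(f"user: {request}")
        for _ in range(self.max_steps):
            action = self.model(self.history)
            if isinstance(action, Final):
                return action.text
            tool = self.tools.get(action.name)
            if tool is None:
                # Unknown tools are reported back instead of crashing the loop.
                result = f"unknown tool {action.name!r}"
            else:
                result = tool(action.argument)
            self.history.append(f"tool {action.name}: {result}")
        return "stopped: step limit reached"


def scripted_model(history: list[str]) -> Action:
    """Stand-in for an LLM: read one file, then finish."""
    if any(line.startswith("tool read_file") for line in history):
        return Final("done")
    return ToolCall("read_file", "README.md")
```

Swap scripted_model for a function that always asks for another tool call, and the same loop ends with "stopped: step limit reached". The model did not change. The harness decided.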

The Context Problem

The most important part of a harness is probably context management.

As humans, we tend to assume that “more information” means “more intelligence.” But often, the opposite is true.

A model with too much context can become less effective. It has to navigate irrelevant files, old instructions, unnecessary details, duplicated fragments, noisy logs, previous requests, and information that no longer matters.

It is like asking a captain to sail through a storm while someone dumps every nautical map ever created onto the desk.

The problem is not having many maps.

The problem is knowing which map matters right now.

A good harness must do exactly that: choose.

It must keep the context small, clean, and relevant. A smaller context often means a more focused, cheaper, and faster model.

But there is a risk.

If the context is too small, the model loses essential information. It forgets constraints, goals, prior decisions, and errors that have already been fixed. It starts doing the same work twice. Or worse, it makes decisions that look correct but are based on incomplete memory.

The real art is not giving the model everything.

The real art is giving it only what matters.
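One way to picture "choosing" is a token budget. The sketch below is an assumption about how a harness might do it, not a description of any real tool. The relevance score, the pinned flag, and the word-count token estimate are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    text: str
    relevance: float      # hypothetical score in [0, 1], computed elsewhere
    pinned: bool = False  # constraints, goals, prior decisions


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def select_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Keep every pinned item, then fill the rest of the budget by relevance."""
    # Pinned items are always kept, even if they alone exceed the budget.
    keep = {i for i, item in enumerate(items) if item.pinned}
    used = sum(estimate_tokens(items[i].text) for i in keep)

    optional = (i for i in range(len(items)) if i not in keep)
    ranked = sorted(optional, key=lambda i: items[i].relevance, reverse=True)
    for i in ranked:
        cost = estimate_tokens(items[i].text)
        if used + cost <= budget:
            keep.add(i)
            used += cost

    # Return the survivors in their original order.
    return [items[i] for i in sorted(keep)]
```

The pinned flag is the guard against the "too small" failure: constraints and prior decisions survive no matter how tight the budget gets. Everything else has to earn its place.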

Efficiency Is a Form of Intelligence

Today, many comparisons between harnesses focus on features.

This tool supports more commands.
That one supports more agents.
This one has more integrations.
That one handles MCPs better.
This one can edit files.
That one can run tests.

All true.

But there is another dimension, less flashy and perhaps more important: efficiency.

To build the same feature, some harnesses take less time and fewer tokens. Others are more capable in complex situations but consume far more resources.

There is no single right answer.

A very powerful harness may be perfect for a complex refactoring, but absurdly heavy for a simple task. A lightweight harness may be excellent for quick operations, but fragile when it has to coordinate multiple tools, maintain memory, or reason across a large codebase.

Sometimes we see systems that take tens of seconds and thousands of tokens just to reply “hi.”

Of course, saying hello to an LLM is not useful. And neither is thanking it.

But we are human.

For thousands of years, we have treated objects as if they had souls. We named ships, talked to cars, cursed at printers, and thanked elevators when they arrived quickly.

So it is not surprising that we also say “thanks” to a statistical machine.

The problem is not our “thanks.”

The problem is a harness that takes it too seriously.
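A harness that does not take "thanks" too seriously could check for small talk before it assembles any context or loads any tools. This is only a sketch under that assumption. The route function, the SMALL_TALK pattern, and the "fast" and "full" labels are made up for this example.

```python
import re

# Hypothetical list of messages that need no tools, files, or memory.
SMALL_TALK = re.compile(
    r"^\s*(hi|hello|hey|thanks|thank you|ok|okay)[\s.!]*$",
    re.IGNORECASE,
)


def route(message: str) -> str:
    """Return 'fast' for pure small talk and 'full' for everything else."""
    return "fast" if SMALL_TALK.match(message) else "full"
```

With this pattern, "Thanks!" takes the fast path, while "thanks, now fix the login bug" still gets the full pipeline, because the pattern only matches when the greeting is the whole message.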

Agent Swarms and the Return of Bureaucracy

Some harnesses can orchestrate teams of agents, often called agent swarms.

The idea is fascinating: instead of having one agent do everything, we can have multiple specialized agents. One analyzes requirements. One writes code. One runs tests. One checks security. One reviews the architecture. One produces documentation.

It sounds like a small artificial organization.

And that is exactly why it brings back an old human problem: bureaucracy.

When one agent makes a mistake, the problem is simple. When ten agents collaborate badly, the problem becomes political.

Who decides?
Who verifies?
Who has the final say?
Who prevents one agent from endlessly correcting another agent’s work?
Who makes sure the team does not burn a million tokens to produce three lines of code?

Agent swarms will be extremely useful in some situations.

But they are not magic.

They are organizations.

And like all organizations, they work only when they have clear roles, clear constraints, clear goals, and a strong control system.

Here too, the value is not just in the model.

It is in the harness.
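Clear roles and a strong control system can be surprisingly small in code. The sketch below assumes a two-agent team, a writer and a reviewer, with a hard limit on rounds and a shared token budget. All names (Budget, run_team, the writer and reviewer signatures) are hypothetical, and the agents are plain functions rather than real model calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent signatures for this sketch.
# Writer: (task, feedback) -> (draft, tokens used)
Writer = Callable[[str, str], tuple[str, int]]
# Reviewer: draft -> (approved, feedback, tokens used)
Reviewer = Callable[[str], tuple[bool, str, int]]


@dataclass
class Budget:
    tokens_left: int

    def charge(self, tokens: int) -> bool:
        """Deduct tokens; return False once the budget is overdrawn."""
        self.tokens_left -= tokens
        return self.tokens_left >= 0


def run_team(
    task: str,
    writer: Writer,
    reviewer: Reviewer,
    budget: Budget,
    max_rounds: int = 3,
) -> tuple[str, str]:
    """Return (last draft, reason the loop ended)."""
    draft, feedback = "", ""
    for round_number in range(1, max_rounds + 1):
        draft, cost = writer(task, feedback)
        if not budget.charge(cost):
            return draft, "budget exhausted"
        approved, feedback, cost = reviewer(draft)
        if not budget.charge(cost):
            return draft, "budget exhausted"
        if approved:  # the reviewer has the final say
            return draft, f"approved in round {round_number}"
    return draft, "round limit reached"
```

The round limit answers "who prevents one agent from endlessly correcting another?" The budget answers the million-token question. Neither lives in the model.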

Security Is the Real Reef

In an interview, the creator of Claude Code was asked what the hardest part to build was.

His answer was: security.

That is a much less glamorous answer than “reasoning,” “agentic workflows,” or “multi-step planning.”

But it is probably the right one.

A harness can run commands on a user’s machine. It can read files. It can modify them. It can delete them. It can install packages. It can call APIs. It can access repositories, tickets, documents, databases, secrets, logs, and infrastructure.

At that point, the question is no longer just: “Does the model understand?”

The question becomes: “What are we allowing it to do?”

The naive solution is to ask the user for permission every time.

But that does not work.

Because after the third prompt, the user will just click yes.

It is the same reason we have spent years accepting cookies, permissions, software licenses, and privacy policies without reading them. Not because we are stupid. Because no human being can live a normal life while reading every contract, every warning, and every authorization request.

If a harness asks for permission for everything, it is not making the system safer.

It is training the user to ignore security.

On the other hand, if it never asks for permission, it becomes dangerous.

The challenge is finding the right point between two opposite failures: blocking everything and allowing everything.

That is where a mature harness separates itself from a toy.
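The middle ground between "ask every time" and "never ask" is usually a policy with tiers. Here is a minimal sketch, assuming three outcomes: allow, ask, deny. The command lists, the sensitive-path hints, and the decide function are illustrative assumptions, not the rules of any real harness, and they are far too short for real use.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy tables for this sketch.
READ_ONLY = {"ls", "cat", "grep", "pwd"}
ALWAYS_DENY = {"sudo", "mkfs", "dd"}
SENSITIVE_HINTS = (".ssh", ".env", "secret")
SHELL_OPERATORS = set(";|&<>`$")


def decide(command: str) -> Decision:
    try:
        argv = shlex.split(command)
    except ValueError:
        return Decision.ASK  # unparseable input: let a human look
    if not argv:
        return Decision.DENY  # nothing to run
    if argv[0] in ALWAYS_DENY:
        return Decision.DENY
    # Pipes, redirects, and substitutions can hide a second command.
    if any(ch in SHELL_OPERATORS for ch in command):
        return Decision.ASK
    touches_secret = any(
        hint in arg for arg in argv[1:] for hint in SENSITIVE_HINTS
    )
    if argv[0] in READ_ONLY and not touches_secret:
        return Decision.ALLOW
    return Decision.ASK
```

Under this policy, "ls -la" runs without a prompt, "cat ~/.ssh/id_rsa" asks, and "sudo apt install x" is refused. The user sees fewer prompts, so each prompt means something.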

Building on Top of Harnesses

For a long time, many companies assumed they had to build their own harness.

Their own workflow.
Their own agent framework.
Their own interface.
Their own orchestration layer.
Their own memory system.
Their own integration with internal tools.

In some cases, that still makes sense.

But more and more, the better strategy will be different: build on top of existing harnesses.

Instead of building everything from scratch, companies will create commands, hooks, SDK calls, MCP tools, policies, templates, tests, guardrails, and internal integrations. They will let harnesses evolve, improve, compete, and optimize.

In other words, they will take advantage of the free innovation produced by competition between tools.

This is an important shift.

When a technology is immature, everyone builds everything.
When a technology matures, platforms, standards, extensions, and conventions emerge.

In the early days of the web, every website felt like a standalone experiment. Then came browsers, frameworks, CMSs, APIs, libraries, CDNs, and standards.

With harnesses, we are still in the phase where many people are building their own wooden lighthouse on the rocks.

Some will be wiped out by the first storm.

Others will become part of the invisible infrastructure of the future.

The Tip of the Iceberg

Today, we mostly talk about harnesses in the context of software development.

But it would be a mistake to think it will stop there.

Every time an LLM has to operate in a real environment, it will need a harness.

In finance.
In healthcare.
In manufacturing.
In customer support.
In cybersecurity.
In logistics.
In product design.
In education.
In scientific research.

Wherever there are tools to use, data to select, permissions to control, memory to manage, actions to verify, and risks to contain, there will be a need for a harness.

The model will be the visible part.

The harness will be the part that decides whether the model can actually work.

It is possible that a few years from now, we will talk less and less about models themselves. Not because they will become unimportant, but because they will become increasingly interchangeable. Like electric motors. Like databases. Like containers.

The question will not only be: “Which model are you using?”

The question will be: “What system are you letting it operate inside?”

The Lighthouse in the Storm

The Eddystone rocks were not dangerous because no one knew they existed.

They were dangerous because, in the critical moment, in fog, darkness, and storm, knowing in theory was not enough.

Sailors needed a continuous signal.

They needed infrastructure.

They needed a lighthouse.

We are in the same situation with LLMs.

We know they are powerful.
We know they can be wrong.
We know they can waste resources.
We know they can take risky actions.
We know they can get lost in context.
We know they can sound intelligent while sailing straight toward the rocks.

The point is no longer proving that the ship can move.

The point is getting it safely to its destination during the storm.

That is why harnesses exist.

They are not the most spectacular part of artificial intelligence.

But they may become one of the most important.

So far, we have only seen the tip of the iceberg.

The best part is still ahead.

Grab some popcorn.

And come watch with me.
