Loop engineering: the skill that quietly replaced prompt engineering

Written by

Alicia Kirana Utomo

Reviewed by

Katelin Teen

Last edited June 25, 2026

Expert Verified

[Figure: a person facing a closed perceive-reason-act-observe-verify loop with control dials around it]

TL;DR

A year ago, the way to get more out of an AI model was to write a better prompt. That lever has quietly moved. The new craft is loop engineering: designing the loop an AI agent runs inside (perceive, reason, act, observe, repeat) instead of perfecting a single instruction. The unit of work shifted from "write a good prompt" to "design a good loop," and the levers are the tools the agent can call, when it's allowed to stop, what it remembers across turns, how it verifies its own work, and the guardrails that keep a runaway loop from doing damage.

This started in the coding-agent world (Simon Willison named the skill in September 2025, and Anthropic, LangChain, and Thoughtworks have all formalized it since), but the most useful place to see it is customer support, where a loop that resolves a ticket end-to-end has to triage, look things up, take an action, and check itself before it replies. Every property that makes a support AI trustworthy in production is a loop-engineering decision, not a prompt.

The practical takeaway: you can engineer that loop yourself, or use a platform where it's already built for support. That's exactly what eesel does: simulation for verification, confidence-based routing as the stopping condition, and scoped actions as the guardrails, so you get the engineered loop without maintaining one yourself.

So what is loop engineering, exactly?

Start with the thing being engineered. An AI agent, stripped to its essence, is "an LLM in a while loop with tools." It takes in some input, the model reasons about what to do, it calls a tool, it looks at the result, and it goes around again until the task is done or it hits a limit. That cycle is the AI agent loop, and it's the one feature that separates an agent from a chatbot: a chatbot answers in a single pass, while an agent persists and adapts across many steps.
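To make "an LLM in a while loop with tools" concrete, here is a minimal sketch in Python. It assumes no particular SDK: `scripted_model` is a hypothetical stand-in for a real LLM call so the example runs offline, and `Step`, `run_agent`, and `lookup_order` are illustrative names, not any library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One decision from the model: call a tool, or finish with an answer."""
    tool: Optional[str] = None  # None means "the goal is met"
    arg: str = ""
    answer: str = ""


def run_agent(
    goal: str,
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 10,
) -> Tuple[str, List[str]]:
    """An LLM in a while loop with tools: reason, act, observe, repeat."""
    history = [f"goal: {goal}"]  # perceive: everything the model sees
    iterations = 0
    while iterations < max_iterations:  # hard cap so the loop always ends
        iterations += 1
        step = model(history)  # reason: pick the next move
        if step.tool is None:  # the model says it is done
            return step.answer, history
        result = tools[step.tool](step.arg)  # act: run the chosen tool
        history.append(f"{step.tool}({step.arg}) -> {result}")  # observe
    return "stopped: hit max_iterations", history


# Hypothetical scripted "model" so the example runs without an API key.
def scripted_model(history: List[str]) -> Step:
    if len(history) == 1:  # nothing observed yet: look the order up
        return Step(tool="lookup_order", arg="A123")
    return Step(answer=f"Final answer based on: {history[-1]}")


tools = {"lookup_order": lambda order_id: f"order {order_id} shipped"}
answer, trace = run_agent("Where is order A123?", scripted_model, tools)
print(answer)  # Final answer based on: lookup_order(A123) -> order A123 shipped
```

Everything this article calls loop engineering is about what surrounds that `while`: which tools are in the dictionary, what ends the loop, what goes into `history`, and how the result is checked.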

Loop engineering is the discipline of making that loop reliable. As Simon Willison put it when he named the practice:

"My preferred definition of an LLM agent is something that runs tools in a loop to achieve a goal. The art of using them well is to carefully design the tools and loop for them to use."
Simon Willison, Designing agentic loops (September 2025)

The clearest way to see where it fits is to look at how the craft has grown in layers. Prompt engineering came first: write one good instruction. Then context engineering: curate the whole set of tokens the model sees on each turn, not just the prompt. Loop engineering sits on top of both: it designs the runtime system around the model.

[Figure: three nested layers, with prompt engineering inside context engineering inside loop engineering, and the unit of work getting bigger at each layer]

Anthropic frames the middle layer as "the natural progression of prompt engineering," and the same logic carries up one more step:

"An agent running in a loop generates more and more data that could be relevant for the next turn of inference, and this information must be cyclically refined."
Anthropic, Effective context engineering for AI agents

So the relationship isn't competitive, it's nested. Prompt engineering optimizes a single call. Context engineering optimizes the state the model sees each turn. Loop engineering optimizes the machinery that decides whether the agent ever reaches a good state at all. If you've read our prompt engineering explainer, this is the next floor up.

Why the loop became the lever, not the prompt

For most of 2023 and 2024, the smartest thing you could do was learn to talk to the model. That worked because models answered in one shot. The moment they started running in loops, calling tools, and acting over many steps, the prompt stopped being the bottleneck. The thing most likely to break an agent now isn't a badly worded instruction; it's a loop with no off-switch, no memory strategy, or no way to check its own work.

Solomon Hykes, the founder of Docker, captured the danger in a line that has since been widely quoted:

"An AI agent is an LLM wrecking its environment in a loop."
Solomon Hykes, via Simon Willison (AI Engineer World's Fair, June 2025)

That's the reframe. A more powerful model in a badly designed loop is more dangerous, not less, because it executes its bad ideas more competently. The practitioner crowd worked this out fast. On Hacker News, one of the most-upvoted submissions on the topic is titled, flatly, "The canonical agent architecture: a while loop with tools," and another popular thread on the unreasonable effectiveness of an LLM agent loop is full of people surprised by how well a simple loop performs once the scaffolding around it is right.

LangChain put a clean equation on it: Agent = Model + Harness.

"Harness engineering is how we build systems around models to turn them into work engines. The model contains the intelligence and the harness makes that intelligence useful."
LangChain, The anatomy of an agent harness

Call it loop engineering, harness engineering, or agentic coding (the labels are still settling), the idea is the same: if you're not the model, you're the loop, and the loop is where the engineering now lives.

The five levers of a well-engineered loop

Read across the people defining this field (Willison, Anthropic, LangChain, Thoughtworks) and the loop decomposes into the same handful of levers every time. These are the dials you actually turn.

[Figure: a central perceive-reason-act-observe loop with five control dials around it, labeled tools, stopping conditions, context, verification, and guardrails]

Tools (the agent-computer interface). What the agent can actually do. Anthropic spent more time optimizing tools than the prompt on their SWE-bench work, and coined the term ACI (agent-computer interface) as the agent's equivalent of a UI. Willison prefers plain shell commands for coding agents because "coding agents are really good at running shell commands."
Stopping conditions. When the loop is allowed to quit: on success, on a max-iteration cap, on a budget limit, or when it detects it's making no progress. A loop with no termination logic either declares victory early or never stops. Anthropic: "it's also common to include stopping conditions (such as a maximum number of iterations) to maintain control."
Context management. What survives across turns. The techniques here are compaction (summarize and restart near the window limit), note-taking (a NOTES.md the agent keeps outside its context), and sub-agents that each burn tens of thousands of tokens but return a tight summary of 1,000 to 2,000 tokens. The reason it matters: recall degrades as the token count grows, so you have to actively curate (see our note on context window size).
Verification. How the loop proves it actually did the thing. This is the single most-repeated lever. Willison says a coding agent's value "is massively amplified by a good, cleanly passing test suite." Anthropic's harness for long-running agents uses a JSON feature list of 200+ end-to-end features, each initially marked passes: false, so the agent can't mark a feature done without proving it.
Guardrails. What stops the loop from doing harm: sandboxes, tightly scoped credentials, and budgets. Willison gave Claude Code its own Fly.io org with a $5 budget so a runaway loop couldn't spend real money.
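The feature-list idea from the verification lever can be sketched in a few lines of Python. This is an illustration under assumptions, not Anthropic's actual harness: the feature entries, `verify_features`, and the check functions are hypothetical. The point it shows is that the `passes` flag is set by a real check, never by the agent simply asserting it is done.

```python
import json
from typing import Callable, Dict, List

# Hypothetical feature list in the spirit of the harness described above:
# every feature starts out failing, and only a real check can flip it.
FEATURES_JSON = """
[
  {"id": "login", "description": "User can log in with email", "passes": false},
  {"id": "logout", "description": "User can log out from any page", "passes": false}
]
"""


def verify_features(features: List[dict], checks: Dict[str, Callable[[], bool]]) -> List[dict]:
    """Set each feature's passes flag from its end-to-end check, not the agent's say-so."""
    for feature in features:
        check = checks.get(feature["id"])
        feature["passes"] = bool(check is not None and check())  # no check, no pass
    return features


def task_done(features: List[dict]) -> bool:
    """The loop may only declare victory when every feature is proven."""
    return all(f["passes"] for f in features)


features = json.loads(FEATURES_JSON)
# Hypothetical checks; real ones would drive a browser or run a test suite.
checks = {"login": lambda: True, "logout": lambda: False}
verify_features(features, checks)
print(task_done(features))  # False: "logout" has not been proven yet
```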

When an agent misbehaves, the fix is almost always one of these five, not a reworded prompt. Here's the quick diagnostic I reach for:

Which lever is your flaky agent missing?

Pick the symptom you're seeing.

It confidently gives wrong answers

Verification + guardrails. The loop never checks itself. Add a confidence threshold (only act when sure) and a verification step that tests answers against real, known-good cases before they go live.
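One way to pick that confidence threshold is to run the agent over known-good cases and choose the lowest threshold that still meets a precision target, since a lower threshold means more tickets handled. The sketch below assumes the agent reports a confidence between 0 and 1; `EvalCase`, `pick_threshold`, the 0.95 target, and the example cases are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class EvalCase(NamedTuple):
    predicted: str     # what the agent answered on a known-good case
    expected: str      # the verified correct answer
    confidence: float  # the agent's self-reported confidence, 0 to 1


def precision_at(cases: List[EvalCase], threshold: float) -> Tuple[float, int]:
    """Accuracy on just the cases the agent would answer at this threshold."""
    answered = [c for c in cases if c.confidence >= threshold]
    if not answered:
        return 0.0, 0
    correct = sum(c.predicted == c.expected for c in answered)
    return correct / len(answered), len(answered)


def pick_threshold(cases: List[EvalCase], target: float = 0.95) -> Optional[float]:
    """Lowest threshold (most coverage) whose answered cases meet the target precision."""
    for threshold in sorted({c.confidence for c in cases}):
        precision, answered = precision_at(cases, threshold)
        if answered and precision >= target:
            return threshold
    return None  # nothing is safe enough: don't go live yet


# Hypothetical known-good cases, for illustration only.
cases = [
    EvalCase("refund", "refund", 0.95),
    EvalCase("reset", "reset", 0.90),
    EvalCase("cancel", "cancel", 0.80),
    EvalCase("refund", "exchange", 0.60),
]
print(pick_threshold(cases))  # 0.8
```

A `None` result is the useful outcome here: it tells you the loop is not ready to act unsupervised on anything.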

It loops forever and burns tokens

Stopping conditions. There's no off-switch. Add a max-iteration cap, a token or cost budget, and no-progress detection so the loop quits instead of spinning.
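Here is a minimal sketch of those three off-switches in Python, checked once per iteration. The limits, the `StopPolicy` and `should_stop` names, and the "same action N times in a row" rule for no-progress detection are assumptions for illustration; real systems often use richer progress signals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StopPolicy:
    max_iterations: int = 20
    max_cost_usd: float = 1.00
    max_repeats: int = 3  # this many identical actions in a row counts as no progress


def should_stop(policy: StopPolicy, iteration: int, cost_usd: float,
                recent_actions: List[str]) -> Optional[str]:
    """Return why the loop must stop, or None to keep going."""
    if iteration >= policy.max_iterations:
        return "max iterations reached"
    if cost_usd >= policy.max_cost_usd:
        return "budget exhausted"
    tail = recent_actions[-policy.max_repeats:]
    if len(tail) == policy.max_repeats and len(set(tail)) == 1:
        return "no progress: same action repeated"
    return None


# A stuck agent that keeps issuing the same search.
policy = StopPolicy()
actions: List[str] = []
cost = 0.0
iteration = 0
while (reason := should_stop(policy, iteration, cost, actions)) is None:
    actions.append("search('refund policy')")
    cost += 0.05
    iteration += 1
print(iteration, reason)  # 3 no progress: same action repeated
```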

It forgets what it was doing on long tasks

Context management. The window filled up and recall decayed. Use compaction, external notes, or sub-agents that return short summaries instead of dumping everything back into context.
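Compaction, the first of those techniques, can be sketched as "once the context is over budget, fold the older messages into one summary and keep the recent ones verbatim." Two assumptions to flag: token counting is approximated by word count (real systems use the model's tokenizer), and `placeholder_summary` is a hypothetical stand-in for what would normally be another LLM call.

```python
from typing import Callable, List, Optional


def count_tokens(text: str) -> int:
    # Rough stand-in: real systems count with the model's own tokenizer.
    return len(text.split())


def placeholder_summary(messages: List[str]) -> str:
    # Hypothetical: in practice an LLM is asked to summarize these messages.
    return f"SUMMARY of {len(messages)} earlier messages"


def compact(messages: List[str], limit: int, keep_recent: int = 2,
            summarize: Optional[Callable[[List[str]], str]] = None) -> List[str]:
    """If the context is over `limit` tokens, fold older messages into one summary."""
    total = sum(count_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [(summarize or placeholder_summary)(older)] + recent


msgs = [
    "user: my order A123 is late",
    "tool: order A123 shipped Monday",
    "agent: it shipped Monday",
    "user: can I get a refund?",
]
print(compact(msgs, limit=15))
```

Keeping the most recent turns verbatim is a common design choice, because those are the messages the next step most likely depends on.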

It can talk about the task but can't do it

Tools. The agent has no way to act. Give it real, well-documented action tools (lookups, writes, API calls) and design that interface as carefully as you would a UI.
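A small illustration of treating the tool interface like a UI: each tool carries a description written for the model, and the dispatcher returns problems as readable text the model can correct on its next turn instead of crashing the loop. `Tool`, `TOOLS`, `call_tool`, and `lookup_order` are hypothetical names. Real model APIs have their own tool-definition formats, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str  # written for the model, the way UI copy is written for a human
    required_params: List[str]
    fn: Callable[..., str]


def lookup_order(order_id: str) -> str:
    # Hypothetical: a real tool would query your order system.
    return f"Order {order_id}: shipped"


TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in [
        Tool(
            name="lookup_order",
            description="Look up an order's shipping status by its ID, e.g. 'A123'.",
            required_params=["order_id"],
            fn=lookup_order,
        ),
    ]
}


def call_tool(name: str, args: Dict[str, str]) -> str:
    """Dispatch a tool call; problems come back as text the model can read and fix."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool '{name}'. Available: {', '.join(TOOLS)}"
    missing = [p for p in tool.required_params if p not in args]
    if missing:
        return f"error: missing parameters {missing} for '{name}'"
    return tool.fn(**args)


print(call_tool("lookup_order", {"order_id": "A123"}))  # Order A123: shipped
```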

It did something destructive

Guardrails. It had too much freedom. Run it in a sandbox, issue scoped credentials limited to test or staging, and set hard budget limits on anything that can spend money or change records.
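As a sketch, an in-process guardrail can wrap every side-effecting action in an environment allowlist and a hard spend cap. The `GuardedActions` class and the $5 figure (borrowed from Willison's example) are illustrative assumptions. A check like this complements real sandboxes and scoped credentials issued by the provider; it does not replace them.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class GuardrailError(Exception):
    """Raised when an action falls outside the agent's allowed scope or budget."""


class GuardedActions:
    """Wraps side-effecting actions with an environment allowlist and a hard spend cap."""

    def __init__(self, allowed_envs: Iterable[str], budget_usd: float) -> None:
        self.allowed_envs = set(allowed_envs)
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def run(self, action: Callable[[], T], env: str, cost_usd: float = 0.0) -> T:
        if env not in self.allowed_envs:
            raise GuardrailError(f"environment '{env}' is outside the agent's scope")
        if self.spent_usd + cost_usd > self.budget_usd:
            raise GuardrailError("action would exceed the budget")
        self.spent_usd += cost_usd  # charged before running, so failed actions still count
        return action()


guard = GuardedActions(allowed_envs={"staging"}, budget_usd=5.00)
print(guard.run(lambda: "test refund issued", env="staging", cost_usd=2.00))
```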

Loop engineering, but for customer support

Here's the part most coding-focused coverage misses. The clearest real-world version of a loop engineer's job isn't building a coding CLI; it's running an AI agent on a live support queue, and I say that as someone who builds these agents for a living.

A support ticket is a near-perfect loop. The agent perceives the incoming message, reasons about intent, retrieves what it needs (order history, docs, account state), takes an action (a refund, a reset, a ticket update), verifies the result, and then either resolves or hands off. Anthropic singles out support as "a natural fit for more open-ended agents" precisely because the work needs both conversation and action, with success that "can be clearly measured through user-defined resolutions."
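The ticket loop just described can be sketched end to end in Python. Every helper is a hypothetical stand-in: `triage` would be an LLM classifier, `retrieve` a search over docs and order history, and `draft_reply` a helpdesk action. The structure is what matters here. A confidence gate decides whether to proceed, and a self-check runs before anything is sent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Ticket:
    id: str
    text: str


# Hypothetical stand-ins for an LLM triage call, a lookup, and a helpdesk action.
def triage(ticket: Ticket) -> Tuple[str, float]:
    if "order" in ticket.text.lower():
        return "order_status", 0.92
    return "other", 0.40


def retrieve(intent: str) -> Optional[str]:
    return {"order_status": "Order A123 shipped on Monday"}.get(intent)


def draft_reply(facts: Optional[str]) -> Optional[str]:
    return f"Hi! {facts}." if facts else None


def verify(reply: Optional[str], facts: Optional[str]) -> bool:
    # Cheap self-check: the reply must contain what was actually retrieved.
    return reply is not None and facts is not None and facts in reply


def handle(ticket: Ticket, threshold: float = 0.8) -> Tuple[str, str]:
    intent, confidence = triage(ticket)   # perceive + reason
    if confidence < threshold:            # confidence gate
        return "handoff", "low confidence"
    facts = retrieve(intent)              # look things up
    reply = draft_reply(facts)            # act (here: draft a reply)
    if reply is None or not verify(reply, facts):  # check itself before replying
        return "handoff", "failed verification"
    return "resolve", reply


print(handle(Ticket("1", "Where is my order?")))      # ('resolve', 'Hi! Order A123 shipped on Monday.')
print(handle(Ticket("2", "Can I change my email?")))  # ('handoff', 'low confidence')
```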

[Figure: a support ticket flowing through a triage, retrieve, act, verify loop, with a confidence gate that either resolves it or hands it to a human]

And every one of the five levers maps onto a support decision that buyers care about more than they realize. We've spent the last three-plus years putting AI agents on live support queues, and the thing that decides whether a rollout works is never the cleverness of the prompt. It's the loop. The sharpest version of this I've heard came from a CX lead at a DTC supplements brand on Gorgias and Shopify, running about 7,000 tickets a month, who told us on a call:

"The AI will never be able to answer 100% of the questions... I need an AI who is only handling the tickets that it's confident to handle and all the other ones, leave them alone."

That is a loop-engineering requirement in plain English. "Only handle what you're confident about" is a stopping condition and a guardrail fused together, and it was the single feature that decided their purchase. A bot that tries to answer everything looks impressive in a demo and quietly torches trust in production. This is the same failure mode behind most AI chatbot problems: no confidence gate, so it answers when it shouldn't. It's also the line between a real agent and a rule-based chatbot, which can't make that judgment at all.
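Here is what "only handle what you're confident about" can look like once it is written down as a routing policy. Excluded ticket types act as the guardrail, and per-type confidence minimums act as the stopping condition. The ticket types, thresholds, and the `route` function are hypothetical values chosen for illustration, not eesel's configuration.

```python
from typing import Dict, Set

# Hypothetical policy: ticket types the AI must never touch (guardrail) ...
EXCLUDED_TYPES: Set[str] = {"legal", "chargeback"}
# ... and the confidence it needs before answering each type (stopping condition).
MIN_CONFIDENCE: Dict[str, float] = {"order_status": 0.80, "returns": 0.90}
DEFAULT_MIN_CONFIDENCE = 0.95  # unfamiliar types need near-certainty


def route(ticket_type: str, confidence: float) -> str:
    """Return 'ai' only when the ticket is in scope and the agent is confident enough."""
    if ticket_type in EXCLUDED_TYPES:
        return "human"  # out of scope: leave it alone
    needed = MIN_CONFIDENCE.get(ticket_type, DEFAULT_MIN_CONFIDENCE)
    return "ai" if confidence >= needed else "human"  # stop rather than guess


print(route("order_status", 0.85))  # ai
print(route("returns", 0.85))       # human
```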

The verification lever has an exact support analogue too. In coding, you verify with a test suite. In support, you verify by simulating the agent against your own past tickets before it ever touches a customer, watching what it would have said, where coverage is thin, and what it gets wrong. That's the support-world equivalent of Anthropic's "200 features, all passes: false" discipline, and it's why we built simulation into eesel's helpdesk agent rather than asking teams to find out live.
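A minimal sketch of that kind of simulation: replay historical tickets through the agent offline and report, per category, how often it would have answered (coverage) and how often it matched the answer that actually resolved the ticket. Exact string matching is a deliberate simplification; a real review would use a grader or a human. `simulate`, `toy_agent`, and the sample history are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple


def simulate(
    past_tickets: Iterable[Tuple[str, str, str]],
    agent: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,
) -> Dict[str, Dict[str, Optional[float]]]:
    """Replay historical tickets offline and report coverage and accuracy per category.

    past_tickets: (category, customer_message, answer_that_resolved_it) tuples.
    agent: message -> (answer, confidence). Nothing here is sent to a customer.
    """
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total": 0, "answered": 0, "matched": 0}
    )
    for category, message, resolved_answer in past_tickets:
        answer, confidence = agent(message)
        s = stats[category]
        s["total"] += 1
        if confidence >= threshold:  # the agent would have answered this one
            s["answered"] += 1
            # Exact match is a crude stand-in for a grader or human review.
            s["matched"] += int(answer == resolved_answer)
    return {
        category: {
            "coverage": s["answered"] / s["total"],
            "accuracy": s["matched"] / s["answered"] if s["answered"] else None,
        }
        for category, s in stats.items()
    }


# Hypothetical history and agent, for illustration only.
history = [
    ("refunds", "refund status?", "Refunds take 5 days"),
    ("refunds", "refund policy?", "Refunds within 30 days"),
    ("shipping", "where is my parcel?", "Tracking link sent"),
]


def toy_agent(message: str) -> Tuple[str, float]:
    return ("Refunds take 5 days", 0.9) if "refund" in message else ("", 0.2)


report = simulate(history, toy_agent)
print(report)  # "shipping" shows coverage 0.0: that's where knowledge is thin
```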

[Screenshot: eesel AI working inside Zendesk, drafting and triaging tickets in the helpdesk]

The numbers back up why the engineering matters. In a 2026 benchmark report, Notch puts legacy chatbots at 10 to 25% resolution and agentic platforms (the ones that "connect directly to CRM, billing, and claims systems and execute") at 70 to 85% end-to-end resolution. The gap isn't model quality: every tier can call the same frontier models. The gap is whether someone engineered the loop around them. The report's sharpest line is a buying tip: the honest question to ask a vendor is "not what their resolution rate is, but what they count as resolved."

Loop lever	In a coding agent	In a support agent
Tools	Shell, file edits, tests	Refunds, lookups, ticket updates, KB search
Stopping condition	Task done / max iterations	Confidence threshold, then hand off
Context	Compaction, `NOTES.md`, sub-agents	Past tickets, help docs, account state
Verification	Cleanly passing test suite	Simulation against historical tickets
Guardrails	Sandbox, scoped creds, $5 budget	Ticket-type exclusions, action scoping, human-in-the-loop

You can also tune that loop in plain language instead of code, which is the part that makes loop engineering accessible to a support team rather than only to engineers.

[Screenshot: updating an eesel agent's behavior through a natural-language instruction in the dashboard chat]

Build the loop yourself, or use one that's already built

Once you see support automation as loop engineering, the build-versus-buy question gets clearer. You can build the loop yourself on the raw Claude or OpenAI API (plenty of technical teams do), and the Claude Code best practices write-ups are a genuinely good place to learn the craft. But the harness is the hard part, and it's the part you have to keep maintaining. Anthropic's own long-running-agent work involved a 200-plus feature verification list, sub-agent orchestration, and compaction logic, and that's just to keep a coding agent on track. A production support loop adds confidence routing, multilingual handling, per-ticket-type rules, and clean human handoff on top.

One engineering lead at a crypto-hardware company, running a 300-plus article knowledge base, summed up the calculus after choosing to buy:

"We could try to write our own LLM application but we didn't want to invest our time into that. We wanted something that we would not have to maintain."
Karel, engineering lead at GENERAL BYTES (eesel customer)

That's the real trade. The loop is now the product, so the question is whether you want to be the one engineering and maintaining it, or whether you'd rather buy a loop that's already engineered for the support case. If you're evaluating the latter, our roundup of the best AI helpdesk software, our collection of AI agent examples, and our pick of the best AI agents are a good map of who's done that engineering and how well.

Try eesel

eesel AI is, in the framing of this whole post, a loop that's already engineered for support. You plug it into your helpdesk (Zendesk, Freshdesk, Gorgias, Help Scout, and 100+ more), and it learns from your past tickets and docs on day one. The five levers come pre-built for the support case: simulation against your historical tickets is the verification step, confidence-based routing is the stopping condition, and scoped actions plus ticket-type exclusions are the guardrails, all configurable in plain language rather than code.

[Screenshot: eesel AI helpdesk dashboard overview showing connected sources and agent activity]

That's why a team like Gridwise saw eesel resolve 73% of tier-1 requests in the first month, with the loop running supervised first and earning autonomy as the simulation proved it safe. You get the engineered loop, and the verification to trust it, without standing up and maintaining your own. You can start free and simulate it against your own ticket history before a single customer sees it.

Frequently asked questions

What is loop engineering in AI?

Loop engineering is the practice of designing, tuning, and supervising the loop an autonomous AI agent runs inside (the perceive, reason, act, observe cycle it repeats until a goal is proven done) rather than hand-writing each prompt. It covers the tools the agent can call, when it stops, what it keeps in context, how it verifies its work, and the guardrails around it. It's the runtime layer on top of the basic AI agent loop.

How is loop engineering different from prompt engineering?

Prompt engineering optimizes a single instruction to the model. Loop engineering optimizes the whole machinery around the model across many turns. Anthropic frames context engineering as "the natural progression of prompt engineering," and loop engineering is the layer above that. If you're new to the basics, our prompt engineering guide is a good starting point before you move up the stack.

What are the parts of an AI agent loop?

A loop runs four stages on repeat: perceive (take in input), reason (decide what to do), act (call a tool), and observe (check the result), looping until it hits a stopping condition. The design levers around it are tools, stopping conditions, context management, verification, and guardrails. See how this differs from a one-shot bot in AI agents vs AI chatbots.

Do I need to learn loop engineering to use an AI support agent?

No. Loop engineering is what a platform like eesel's AI helpdesk agent does for you: confidence-based routing is the stopping condition, simulation against past tickets is the verification step, and scoped actions are the guardrails. You configure the behavior in plain language; the loop itself is already built. Compare the alternatives in our best AI helpdesk software roundup.

Is loop engineering the same as vibe coding?

Not quite. Vibe coding is letting an AI write code from loose natural-language prompts. Loop engineering is the disciplined craft of designing the runtime an agent operates in, which is what makes that coding actually reliable. Simon Willison even coined "vibe engineering" for the serious version. See it in action with agentic coding CLIs.

Put a well-engineered loop on your support queue

Simulate against past tickets, route by confidence, go live with control.



Article by

Alicia Kirana Utomo

Kira is a writer at eesel AI with a Computer Science background and over a year of hands-on experience evaluating AI-powered customer service tools. She focuses on breaking down how helpdesk platforms and AI agents actually work so that support teams can make better buying decisions.