Loop engineering: the skill that quietly replaced prompt engineering

Alicia Kirana Utomo
Written by

Alicia Kirana Utomo

Katelin Teen
Reviewed by

Katelin Teen

Last edited June 25, 2026

Expert Verified
A person facing a closed perceive-reason-act-observe-verify loop with control dials around it

So what is loop engineering, exactly?

Start with the thing being engineered. An AI agent, stripped to its essence, is "an LLM in a while loop with tools." It takes in some input, the model reasons about what to do, it calls a tool, it looks at the result, and it goes around again until the task is done or it hits a limit. That cycle is the AI agent loop, and it's the one feature that separates an agent from a chatbot: a chatbot answers in a single pass, an agent persists and adapts across many steps.

Loop engineering is the discipline of making that loop reliable. As Simon Willison put it when he named the practice:

"My preferred definition of an LLM agent is something that runs tools in a loop to achieve a goal. The art of using them well is to carefully design the tools and loop for them to use."

The clearest way to see where it fits is to look at how the craft has grown in layers. Prompt engineering came first: write one good instruction. Then context engineering: curate the whole set of tokens the model sees on each turn, not just the prompt. Loop engineering sits on top of both, it designs the runtime system around the model.

Three nested layers showing prompt engineering inside context engineering inside loop engineering, the unit of work getting bigger each time
Three nested layers showing prompt engineering inside context engineering inside loop engineering, the unit of work getting bigger each time

Anthropic frames the middle layer as "the natural progression of prompt engineering," and the same logic carries up one more step:

"An agent running in a loop generates more and more data that could be relevant for the next turn of inference, and this information must be cyclically refined."

So the relationship isn't competitive, it's nested. Prompt engineering optimizes a single call. Context engineering optimizes the state the model sees each turn. Loop engineering optimizes the machinery that decides whether the agent ever reaches a good state at all. If you've read our prompt engineering explainer, this is the next floor up.

Why the loop became the lever, not the prompt

For most of 2023 and 2024, the smartest thing you could do was learn to talk to the model. That worked because models answered in one shot. The moment they started running in loops, calling tools, and acting over many steps, the prompt stopped being the bottleneck. The thing most likely to break an agent now isn't a badly-worded instruction, it's a loop with no off-switch, no memory strategy, or no way to check its own work.

Solomon Hykes, the founder of Docker, captured the danger in one line that the whole field now quotes:

"An AI agent is an LLM wrecking its environment in a loop."

Solomon Hykes, via Simon Willison (AI Engineer World's Fair, June 2025)

That's the reframe. A more powerful model in a badly-designed loop is more dangerous, not less, because it executes its bad ideas more competently. The practitioner crowd worked this out fast. On Hacker News, one of the most-upvoted submissions on the topic is titled, flatly, "The canonical agent architecture: a while loop with tools," and another popular thread on the unreasonable effectiveness of an LLM agent loop is full of people surprised by how well a simple loop performs once the scaffolding around it is right.

LangChain put a clean equation on it: Agent = Model + Harness.

"Harness engineering is how we build systems around models to turn them into work engines. The model contains the intelligence and the harness makes that intelligence useful."

Call it loop engineering, harness engineering, or agentic coding (the labels are still settling), the idea is the same: if you're not the model, you're the loop, and the loop is where the engineering now lives.

The five levers of a well-engineered loop

Read across the people defining this field, Willison, Anthropic, LangChain, Thoughtworks, and the loop decomposes into the same handful of levers every time. These are the dials you actually turn.

A central perceive-reason-act-observe loop with five control dials around it labelled tools, stopping conditions, context, verification, and guardrails
A central perceive-reason-act-observe loop with five control dials around it labelled tools, stopping conditions, context, verification, and guardrails
  1. Tools (the agent-computer interface). What the agent can actually do. Anthropic spent more time optimizing tools than the prompt on their SWE-bench work, and coined the term ACI (agent-computer interface) as the agent's equivalent of a UI. Willison prefers plain shell commands for coding agents because "coding agents are really good at running shell commands."
  2. Stopping conditions. When the loop is allowed to quit, on success, on a max-iteration cap, on a budget limit, or when it detects it's making no progress. A loop with no termination logic either declares victory early or never stops. Anthropic: "it's also common to include stopping conditions (such as a maximum number of iterations) to maintain control."
  3. Context management. What survives across turns. The techniques here are compaction (summarize and restart near the window limit), note-taking (a NOTES.md the agent keeps outside its context), and sub-agents that each burn tens of thousands of tokens but return a tight 1,000 to 2,000 token summary. The reason it matters: recall degrades as the token count grows, so you have to actively curate (see our note on context window size).
  4. Verification. How the loop proves it actually did the thing. This is the single most-repeated lever. Willison says a coding agent's value "is massively amplified by a good, cleanly passing test suite." Anthropic's harness for long-running agents uses a JSON feature list of 200+ end-to-end features, each marked passes: false, so the agent can't mark a feature done without proving it.
  5. Guardrails. What stops the loop doing harm. Sandboxes, tightly scoped credentials, and budgets. Willison gave Claude Code its own Fly.io org with a $5 budget so a runaway loop couldn't spend real money.

When an agent misbehaves, the fix is almost always one of these five, not a reworded prompt. Here's the quick diagnostic I reach for:

Which lever is your flaky agent missing?
Pick the symptom you're seeing.
Verification + guardrails. The loop never checks itself. Add a confidence threshold (only act when sure) and a verification step that tests answers against real, known-good cases before they go live.
Stopping conditions. There's no off-switch. Add a max-iteration cap, a token or cost budget, and no-progress detection so the loop quits instead of spinning.
Context management. The window filled up and recall decayed. Use compaction, external notes, or sub-agents that return short summaries instead of dumping everything back into context.
Tools. The agent has no way to act. Give it real, well-documented action tools (lookups, writes, API calls) and design that interface as carefully as you would a UI.
Guardrails. It had too much freedom. Run it in a sandbox, issue scoped credentials limited to test or staging, and set hard budget limits on anything that can spend money or change records.

Loop engineering, but for customer support

Here's the part most coding-focused coverage misses. The clearest real-world version of a loop engineer's job isn't building a coding CLI, it's running an AI agent on a live support queue, and I say that as someone who builds these agents for a living.

A support ticket is a near-perfect loop. The agent perceives the incoming message, reasons about intent, retrieves what it needs (order history, docs, account state), takes an action (a refund, a reset, a ticket update), verifies the result, and then either resolves or hands off. Anthropic singles out support as "a natural fit for more open-ended agents" precisely because the work needs both conversation and action, with success that "can be clearly measured through user-defined resolutions."

A support ticket flowing through a triage, retrieve, act, verify loop with a confidence gate that resolves or hands to a human
A support ticket flowing through a triage, retrieve, act, verify loop with a confidence gate that resolves or hands to a human

And every one of the five levers maps onto a support decision that buyers care about more than they realise. We've spent the last three-plus years putting AI agents on live support queues, and the thing that decides whether a rollout works is never the cleverness of the prompt. It's the loop. The sharpest version of this I've heard came from a CX lead at a DTC supplements brand on Gorgias and Shopify, running about 7,000 tickets a month, who told us on a call:

"The AI will never be able to answer 100% of the questions... I need an AI who is only handling the tickets that it's confident to handle and all the other ones, leave them alone."

That is a loop-engineering requirement in plain English. "Only handle what you're confident about" is a stopping condition and a guardrail fused together, and it was the single feature that decided their buying decision. A bot that tries to answer everything looks impressive in a demo and quietly torches trust in production. This is the same failure mode behind most AI chatbot problems: no confidence gate, so it answers when it shouldn't. It's also the line between a real agent and a rule-based chatbot, which can't make that judgment at all.

The verification lever has an exact support analogue too. In coding, you verify with a test suite. In support, you verify by simulating the agent against your own past tickets before it ever touches a customer, watching what it would have said, where coverage is thin, and what it gets wrong. That's the support-world equivalent of Anthropic's "200 features, all passes: false" discipline, and it's why we built simulation into eesel's helpdesk agent rather than asking teams to find out live.

eesel AI working inside Zendesk, drafting and triaging tickets in the helpdesk

The numbers back up why the engineering matters. In a 2026 benchmark report, Notch puts legacy chatbots at 10 to 25% resolution and agentic platforms (the ones that "connect directly to CRM, billing, and claims systems and execute") at 70 to 85% end-to-end resolution. The gap isn't model quality, every tier can call the same frontier models. The gap is whether someone engineered the loop around them. The report's sharpest line is a buying tip: the honest question to ask a vendor is "not what their resolution rate is, but what they count as resolved."

Loop leverIn a coding agentIn a support agent
ToolsShell, file edits, testsRefunds, lookups, ticket updates, KB search
Stopping conditionTask done / max iterationsConfidence threshold, then hand off
ContextCompaction, NOTES.md, sub-agentsPast tickets, help docs, account state
VerificationCleanly passing test suiteSimulation against historical tickets
GuardrailsSandbox, scoped creds, $5 budgetTicket-type exclusions, action scoping, human-in-the-loop

You can also tune that loop in plain language instead of code, which is the part that makes loop engineering accessible to a support team rather than only to engineers.

Updating an eesel agent's behavior through a natural-language instruction in the dashboard chat
Updating an eesel agent's behavior through a natural-language instruction in the dashboard chat

Build the loop yourself, or use one that's already built

Once you see support automation as loop engineering, the build-versus-buy question gets clearer. You can build the loop yourself on the raw Claude or OpenAI API, plenty of technical teams do, and the Claude Code best practices write-ups are a genuinely good place to learn the craft. But the harness is the hard part, and it's the part you have to keep maintaining. Anthropic's own long-running-agent work involved a 200-plus feature verification list, sub-agent orchestration, and compaction logic, and that's just to keep a coding agent on track. A production support loop adds confidence routing, multilingual handling, per-ticket-type rules, and clean human handoff on top.

One engineering lead at a crypto-hardware company, running a 300-plus article knowledge base, summed up the calculus after choosing to buy:

"We could try to write our own LLM application but we didn't want to invest our time into that. We wanted something that we would not have to maintain."

Karel, engineering lead at GENERAL BYTES (eesel customer)

That's the real trade. The loop is now the product, so the question is whether you want to be the one engineering and maintaining it, or whether you'd rather buy a loop that's already engineered for the support case. If you're evaluating the latter, our best AI helpdesk software roundup, the field of AI agent examples, and our pick of the best AI agents are a good map of who's done that engineering and how well.

Try eesel

eesel AI is, in the framing of this whole post, a loop that's already engineered for support. You plug it into your helpdesk (Zendesk, Freshdesk, Gorgias, Help Scout, and 100+ more), and it learns from your past tickets and docs on day one. The five levers come pre-built for the support case: simulation against your historical tickets is the verification step, confidence-based routing is the stopping condition, and scoped actions plus ticket-type exclusions are the guardrails, all configurable in plain language rather than code.

eesel AI helpdesk dashboard overview showing connected sources and agent activity
eesel AI helpdesk dashboard overview showing connected sources and agent activity

That's why a team like Gridwise saw eesel resolve 73% of tier-1 requests in the first month, with the loop running supervised first and earning autonomy as the simulation proved it safe. You get the engineered loop, and the verification to trust it, without standing up and maintaining your own. You can start free and simulate it against your own ticket history before a single customer sees it.

Frequently asked questions

What is loop engineering in AI?
Loop engineering is the practice of designing, tuning, and supervising the loop an autonomous AI agent runs inside (the perceive, reason, act, observe cycle it repeats until a goal is proven done) rather than hand-writing each prompt. It covers the tools the agent can call, when it stops, what it keeps in context, how it verifies its work, and the guardrails around it. It's the runtime layer on top of the basic AI agent loop.
How is loop engineering different from prompt engineering?
Prompt engineering optimizes a single instruction to the model. Loop engineering optimizes the whole machinery around the model across many turns. Anthropic frames context engineering as "the natural progression of prompt engineering," and loop engineering is the layer above that. If you're new to the basics, our prompt engineering guide is a good starting point before you move up the stack.
What are the parts of an AI agent loop?
A loop runs four stages on repeat: perceive (take in input), reason (decide what to do), act (call a tool), and observe (check the result), looping until it hits a stopping condition. The design levers around it are tools, stopping conditions, context management, verification, and guardrails. See how this differs from a one-shot bot in AI agents vs AI chatbots.
Do I need to learn loop engineering to use an AI support agent?
No. Loop engineering is what a platform like eesel's AI helpdesk agent does for you: confidence-based routing is the stopping condition, simulation against past tickets is the verification step, and scoped actions are the guardrails. You configure the behavior in plain language; the loop itself is already built. Compare the alternatives in our best AI helpdesk software roundup.
Is loop engineering the same as vibe coding?
Not quite. Vibe coding is letting an AI write code from loose natural-language prompts. Loop engineering is the disciplined craft of designing the runtime an agent operates in, which is what makes that coding actually reliable. Simon Willison even coined "vibe engineering" for the serious version. See it in action with agentic coding CLIs.

Share this article

Alicia Kirana Utomo

Article by

Alicia Kirana Utomo

Kira is a writer at eesel AI with a Computer Science background and over a year of hands-on experience evaluating AI-powered customer service tools. She focuses on breaking down how helpdesk platforms and AI agents actually work so that support teams can make better buying decisions.

Related Posts

All posts →
Abstract illustration of an AI agent loop: a perceive, reason, act, observe cycle with arrows looping around
AI agents

What is an AI agent loop? A plain-English guide

An AI agent loop is the perceive, reason, act, observe cycle that turns an LLM into an agent. Here is how it works, and why it matters for support.

Alicia Kirana UtomoAlicia Kirana UtomoJun 17, 2026
Claude Tag, Anthropic's AI teammate for Slack, illustrated banner
AI agents

What is Claude Tag? Anthropic's @Claude AI teammate for Slack

Claude Tag is Anthropic's AI teammate for Slack: tag @Claude in, hand off the work, and it runs async. Here's how it works, what it costs, and where it fits.

Rama Adi NugrahaRama Adi NugrahaJun 25, 2026
Best Gradient Labs alternatives 2026 hero banner
helpdesk

The 8 best Gradient Labs alternatives in 2026

Gradient Labs is excellent for regulated finance, but it's enterprise-only with no public price. Here are the 8 best Gradient Labs alternatives, ranked, with real pricing.

Kurnia Kharisma Agung SamiadjieKurnia Kharisma Agung SamiadjieJun 25, 2026
Gradient Labs alternatives 2026 hero banner
helpdesk

8 best Gradient Labs alternatives in 2026

Gradient Labs is a strong AI agent for regulated finance, but it isn't built for everyone. Here are 8 Gradient Labs alternatives, who each one is for, and real pricing.

Rama Adi NugrahaRama Adi NugrahaJun 25, 2026
Gradient Labs hero banner, AI customer operations for financial services
customer-support

Gradient Labs: what it is, how Otto works, and who it's for

A hands-on look at Gradient Labs, the AI customer operations agent built for financial services: how Otto works, pricing, real results, and who should actually use it.

Alicia Kirana UtomoAlicia Kirana UtomoJun 25, 2026
Illustration of a phased AI customer support rollout
Customer Support

How to implement AI in customer support: a step-by-step guide

A practical, phased guide to implementing AI in customer support: audit your knowledge, start in copilot mode, simulate on past tickets, then automate with confidence routing.

Riellvriany IndriawanRiellvriany IndriawanJun 25, 2026
Illustration of a roundup comparing the best CoSupport AI alternatives for 2026
Customer Support

The 8 best CoSupport AI alternatives for 2026

A hands-on look at the best CoSupport AI alternatives for 2026, from self-serve AI agents to enterprise platforms, with real pricing, pros, and cons.

Kurnia Kharisma Agung SamiadjieKurnia Kharisma Agung SamiadjieJun 24, 2026
Illustration of an AI customer support quality assurance review: a scorecard and a magnifying glass over support conversations
Customer Support

AI customer support quality assurance: how to actually trust your AI agent

AI support quality assurance is how you prove your AI agent answers well, not just often. Here's what to measure and how to QA before and after launch.

Riellvriany IndriawanRiellvriany IndriawanJun 19, 2026
Illustration for a roundup of the best AI for ecommerce customer service in 2026
helpdesk

What is the best AI for ecommerce customer service? (2026)

I ranked the best AI for ecommerce customer service in 2026, from Shopify-native helpdesks to AI you can layer on your existing stack, with real pricing and picks.

Riellvriany IndriawanRiellvriany IndriawanJun 19, 2026

Ready to hire your AI teammate?

Set up in minutes. No credit card required.

Get started free