Engineering Notes

Engineering Notes

Thoughts and Ideas on AI by Muthukrishnan

Agent Loops Explained: The Simple Pattern Behind Useful AI Agents

24 Jun 2026

Every few months, AI gets a new phrase that suddenly shows up everywhere.

Lately, that phrase is agent loop.

You will see people describe agent loops as the future of software, the thing that unlocks autonomy, or the missing ingredient behind useful AI agents. The idea is much less exotic than that.

An agent loop is a repeating cycle:

flowchart LR
    Think[Think] --> Act[Act]
    Act --> Observe[Observe]
    Observe --> Think
  

Think. Act. Observe. Repeat.

That is most of it. The important change is not that loops are new. The change is that language models can now take part in the reasoning step inside the loop.

The difference between an LLM and an agent

A normal LLM call is usually one pass.

A user asks a question. The model writes an answer. The interaction ends.

flowchart TD
    UserQuestion[Question] --> LLM[LLM]
    LLM --> Answer[Answer]
  

That works when the model already has enough context. If you ask it to summarize text, draft an email, explain a concept, or transform something already in the prompt, one pass may be fine.

It starts to break down when the task needs outside information or intermediate decisions.

For example:

Those tasks cannot usually be solved by answering from the prompt alone. The model has to inspect something, use the result, decide what to do next, and repeat.

That is where the agent loop appears.

flowchart TD
    Goal[User goal] --> Reason1[Reason about what is needed]
    Reason1 --> Tool1[Use a tool]
    Tool1 --> Observe1[Observe result]
    Observe1 --> Reason2[Reason again]
    Reason2 --> Tool2[Use another tool]
    Tool2 --> Observe2[Observe result]
    Observe2 --> Decision{Enough information?}
    Decision -- No --> Reason2
    Decision -- Yes --> Final[Final answer]
  

The system keeps cycling until the work is done, blocked, or stopped by a limit.

A real example

Suppose a user asks:

What are the top five open Jira bugs affecting enterprise customers?

A plain LLM cannot know that from training data. The answer lives across Jira, customer records, support tickets, escalation notes, and maybe recent incident reports.

An agent might handle it like this:

sequenceDiagram
    participant User
    participant Agent
    participant Jira
    participant CRM
    participant Reports

    User->>Agent: Find the top five open bugs affecting enterprise customers
    Agent->>Jira: Search open bugs
    Jira-->>Agent: Matching tickets
    Agent->>Jira: Read candidate tickets
    Jira-->>Agent: Ticket details and severity
    Agent->>CRM: Match affected accounts to enterprise customers
    CRM-->>Agent: Customer tier and account impact
    Agent->>Reports: Check incidents and escalation history
    Reports-->>Agent: Business impact signals
    Agent->>Agent: Rank by severity and impact
    Agent-->>User: Summary report with top five bugs
  

Each step changes what the agent should do next. The Jira search tells it which tickets to read. The ticket details tell it which customers to check. The customer data changes the ranking. The escalation history changes the final answer.

That is the loop doing useful work.

Why people care now

The pattern itself is old.

What changed is that LLMs can now do some of the reasoning that humans used to do between tool calls.

Older workflows often looked like this:

flowchart LR
    H1[Human decides] --> T1[Tool]
    T1 --> H2[Human interprets]
    H2 --> T2[Tool]
    T2 --> H3[Human decides again]
  

After every result, a person had to decide what to do next.

With an AI agent, the loop can look more like this:

flowchart LR
    L1[LLM reasons] --> T1[Tool]
    T1 --> L2[LLM interprets result]
    L2 --> T2[Tool]
    T2 --> L3[LLM synthesizes answer]
  

The reasoning is not fully automated, and it should not be treated as if it is. But enough of it can be automated that the system can handle longer tasks than a single prompt ever could.

That still does not make the agent magic. It needs tools, permissions, context, stopping conditions, and a way to check whether it succeeded. The loop is just the skeleton. Most of the real engineering lives around it.

Agent loops are not new

Software engineers have seen this pattern for years.

Debugging is a loop:

flowchart TD
    ReadCode[Read code] --> RunProgram[Run program]
    RunProgram --> ObserveError[Observe error]
    ObserveError --> ModifyCode[Modify code]
    ModifyCode --> RunProgram
  

Quality assurance is a loop:

flowchart TD
    TestFeature[Test feature] --> ObserveResult[Observe result]
    ObserveResult --> Decision{Bug found?}
    Decision -- Yes --> ReportDefect[Report defect]
    ReportDefect --> Retest[Retest]
    Retest --> ObserveResult
    Decision -- No --> Done[Accept result]
  

Customer support is a loop:

flowchart TD
    ReadTicket[Read ticket] --> NeedInfo{Need more information?}
    NeedInfo -- Yes --> AskQuestion[Ask customer a question]
    AskQuestion --> ReceiveResponse[Receive response]
    ReceiveResponse --> Investigate[Investigate]
    NeedInfo -- No --> Investigate
    Investigate --> Reply[Reply]
  

The pattern was already everywhere. The new part is that software can now perform some of the reasoning step instead of waiting for a human at every turn.

The smallest possible agent loop

Most agent frameworks reduce to something almost boring:

while not task_complete:
    think()
    act()
    observe()

Whether you use LangGraph, the OpenAI Agents SDK, Claude Code, or your own loop, the basic shape is the same.

A practical version might look like this:

for step in range(max_steps):
    response = llm(messages)

    if response.type == "final":
        return response.answer

    if response.type == "tool_call":
        result = run_tool(response)
        messages.append(result)

The model decides whether it needs more information. If it does, it asks for a tool call. The application runs the tool and feeds the result back into the model. The cycle continues until the model can answer or the system stops it.

stateDiagram-v2
    [*] --> CallModel
    CallModel --> FinalAnswer: final response
    CallModel --> ExecuteTool: tool call
    ExecuteTool --> AppendObservation: tool result
    AppendObservation --> CallModel: continue loop
    FinalAnswer --> [*]
  

Frameworks add memory, retries, tracing, approvals, streaming, parallel tool calls, and graph-based state transitions. Those features matter. Underneath them, the same loop is still there.

A project you can build in an afternoon

A Jira investigator is a good first agent because the loop is easy to see.

A user asks:

Why is customer login failing?

The agent might work through the problem like this:

flowchart TD
    Start[User asks why login is failing] --> SearchJira[Search Jira]
    SearchJira --> ReadIncidents[Read matching incidents]
    ReadIncidents --> SearchDeployments[Search recent deployments]
    SearchDeployments --> ReadLogs[Read error logs]
    ReadLogs --> Explain[Generate explanation]
    Explain --> Done[Return answer with evidence]
  

Each step narrows the investigation:

  1. Search Jira for related incidents.
  2. Read matching tickets.
  3. Search recent deployments.
  4. Read error logs.
  5. Generate an explanation.

The code can be small. The hard parts are connecting reliable tools, handling permissions, and deciding what evidence is enough for the final answer.

The hard part

Social media often treats the loop itself as the breakthrough.

In practice, the loop is the easy part. The difficult engineering starts once the loop exists.

When should it stop?

Without limits, an agent can keep going forever. A production agent needs stopping rules: maximum steps, timeouts, confidence thresholds, completion criteria, or explicit handoff rules.

How many iterations are allowed?

Every pass through the loop costs time and money. A ten-step investigation may be useful. A hundred-step investigation may be slow, expensive, and hard to audit.

What happens when a tool fails?

External systems are messy. APIs time out. Search results are incomplete. Permissions fail. Agents need recovery strategies, not just happy-path tool calls.

How do you prevent bad actions?

An agent can call the wrong tool, pass invalid arguments, or misunderstand a result. Tool schemas, validation, constrained action spaces, and human approval gates reduce that risk.

How do you verify success?

Many tasks need proof, not just a confident final message. It is not enough for an agent to say it completed a migration, updated a record, or found the root cause. The system needs checks that verify the outcome.

How do multiple agents coordinate?

As systems grow, specialized agents may need to collaborate. Without clear ownership, shared state, and communication rules, multi-agent systems can create more confusion than leverage.

This is where most production engineering time goes.

mindmap
  root((Production agent loop))
    Stopping
      Max steps
      Timeouts
      Completion criteria
    Reliability
      Tool failures
      Retries
      Fallbacks
    Safety
      Validation
      Permissions
      Human approval
    Verification
      Tests
      Evidence
      Success checks
    Coordination
      Shared state
      Ownership
      Handoffs
  

Why agent loops matter

Agent loops matter because language models can now participate in them.

For decades, humans supplied the reasoning between actions. Now software can handle part of that reasoning, which lets machines run workflows that used to need constant human supervision.

The loop is still simple:

flowchart TD
    Goal[Goal] --> Reason[Reason]
    Reason --> Act[Act]
    Act --> Observe[Observe]
    Observe --> Continue{Complete?}
    Continue -- No --> Reason
    Continue -- Yes --> Result[Result]
  

The opportunity is figuring out which useful work fits that shape.

Once you notice the pattern, you start seeing it everywhere: business processes, engineering workflows, support operations, research tasks.

The future of agents probably will not come from inventing a brand-new loop. It will come from mapping more real work onto the loops we already understand, then doing the unglamorous engineering that makes those loops reliable.