2026-05-09

AI Agent Loop Architecture: When to Use ReAct vs. Plan-Execute vs. Reflexion

Three loop shapes power most production agents in 2026. Picking the wrong one wastes 5x the tokens. Here is how to choose.

#ai
#agents
#architecture

TL;DR

Three loop shapes dominate production agent code in 2026: ReAct (think-act-observe), Plan-Execute (split planner + worker), and Reflexion (self-critique on failure). They are not interchangeable. Pick the wrong one for your task and you burn 5x the tokens to land in the same place — or worse, never land at all.

This post is the cheat sheet we use at Qyndex when wiring a new agent into the supervisor graph. There is a free PDF version too; the link is at the end.

Section 1 — The three shapes, side-by-side

ReAct (Reason + Act)

The default. The model alternates between thoughts and tool calls, with each tool's result fed back as a new observation.

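# Conceptual ReAct loop. Real implementations live in
# packages/agents/qyndex_agents/_react.py.
def react_loop(task: str, tools: list[Tool], max_steps: int = 10) -> str:
    # Index tools by name so an action can look up its tool directly
    # (assumes each Tool exposes a unique .name).
    tools_by_name = {tool.name: tool for tool in tools}
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action = llm.next_step(task=task, history=history)
        if action.kind == "final":
            return action.output
        observation = tools_by_name[action.name].run(action.args)
        history.append(Step(thought, action, observation))
    # The budget ran out before the model produced a final answer.
    raise StepBudgetExceeded()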

Properties:

One model call per step. Cheap to start, expensive at depth.
Linear thinking. No backtracking; if step 4 was wrong, the whole tail is wrong.
Easy to debug. The history is a flat trace.

Use ReAct when:

The task fits comfortably in 5–10 tool calls.
The tools are well-typed and the model rarely needs to "rethink".
Latency matters: ReAct stops the moment it has an answer.
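
To watch the loop above run end to end, here is a minimal, self-contained sketch. Everything in it is hypothetical: ScriptedLLM replays canned steps instead of calling a model, and Tool, Action, and Step are stand-in types, not the classes in qyndex_agents. It also returns the trace alongside the answer so the "flat trace" property is visible.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    kind: str                      # "tool" or "final"
    name: str = ""                 # tool name when kind == "tool"
    args: dict = field(default_factory=dict)
    output: str = ""               # final answer when kind == "final"


@dataclass
class Tool:
    name: str
    run: Callable[[dict], str]


@dataclass
class Step:
    thought: str
    action: Action
    observation: str


class StepBudgetExceeded(Exception):
    pass


class ScriptedLLM:
    """Stand-in for a model: replays a fixed list of (thought, action) pairs."""

    def __init__(self, script: list[tuple[str, Action]]):
        self._script = script

    def next_step(self, task: str, history: list[Step]) -> tuple[str, Action]:
        # One "model call" per step; the history length picks the next reply.
        return self._script[len(history)]


def react_loop(llm: ScriptedLLM, task: str, tools: list[Tool],
               max_steps: int = 10) -> tuple[str, list[Step]]:
    tools_by_name = {t.name: t for t in tools}
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action = llm.next_step(task=task, history=history)
        if action.kind == "final":
            return action.output, history
        observation = tools_by_name[action.name].run(action.args)
        history.append(Step(thought, action, observation))
    raise StepBudgetExceeded()


word_count = Tool("word_count", lambda args: str(len(args["text"].split())))
llm = ScriptedLLM([
    ("I should count the words.",
     Action("tool", "word_count", {"text": "think act observe"})),
    ("I have the count.", Action("final", output="3 words")),
])
answer, trace = react_loop(llm, "How many words?", [word_count])
```

The trace holds one Step per tool call, which is what makes a ReAct run easy to replay when debugging.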

Plan-Execute

A two-phase shape. A planner model produces a structured DAG of steps; a (cheaper) executor model runs each step in turn, only escalating to the planner if a step fails.

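# Conceptual Plan-Execute loop.
# max_replans is an added safety cap (an assumption, not part of the original sketch).
def plan_execute(task: str, tools: list[Tool], max_replans: int = 3) -> str:
    plan = planner_llm.plan(task)         # one big call (Sonnet/Opus)
    results: dict[str, str] = {}
    replans = 0
    while True:
        # Take the next step that has no result yet; stop when none remain.
        step = next((s for s in plan.steps if s.id not in results), None)
        if step is None:
            return results[plan.terminal_step_id]
        try:
            results[step.id] = executor_llm.run(step, tools, results)  # cheap (local Llama)
        except StepFailed as e:
            if replans == max_replans:
                raise  # give up rather than replan forever
            replans += 1
            # Continue from the revised plan; results of completed steps are kept.
            plan = planner_llm.replan(plan, step.id, e)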

Properties:

Costs are bimodal. One expensive plan + N cheap executions.
Backtracking is structural — replan is a first-class operation.
The plan is auditable. A human can read the DAG before execution.

Use Plan-Execute when:

The task is multi-stage and the stages are mostly known up front (research → outline → draft → review).
You can afford one expensive call (Sonnet, Opus) but want the bulk to run on a local 70B.
An audit trail of "here is what we are about to do" is required before tool calls land.
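
The "structured DAG of steps" can be as simple as a mapping from each step id to the ids it depends on. A minimal sketch, assuming hypothetical step names borrowed from the research → outline → draft → review example; the standard library's graphlib.TopologicalSorter yields an execution order and rejects cycles, which is a cheap validity check to run before a human audits the plan.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical plan: each step id maps to the set of step ids it depends on.
plan_deps: dict[str, set[str]] = {
    "research": set(),
    "outline": {"research"},
    "draft": {"outline"},
    "review": {"draft"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_plan(deps: dict[str, set[str]]) -> bool:
    """A plan is only executable if it is a real DAG (no cycles)."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(plan_deps)
```

A planner that emits a cycle (step A waits on B, B waits on A) fails is_valid_plan, so it can be sent back for a replan before any tool call lands.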

Reflexion

A meta-loop on top of either ReAct or Plan-Execute. The agent runs, self-critiques the output against a rubric, and re-runs if the critique fails — capped at N iterations.

# Conceptual Reflexion loop, wrapping any inner agent.
def reflexion(task: str, inner: Agent, rubric: Rubric, max_iters: int = 3) -> str:
    history: list[str] = []
    for attempt_no in range(max_iters):  # avoid shadowing the builtin iter()
        attempt = inner.run(task, prior_attempts=history)
        critique = critic_llm.score(attempt, rubric)
        if critique.passed:
            return attempt
        history.append(f"Iter {attempt_no}: {critique.failure_reason}")
    raise QualityGateFailed(history=history)

Properties:

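Cost grows with the quality bar. Each retry adds roughly one more inner run plus a critic call, so hitting the 3-iteration cap costs about 3x a single pass.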
Failures are visible. The history is a debugging gold mine.
Best for offline workloads — Reflexion is too slow for user-facing chat.

Use Reflexion when:

Quality is non-negotiable (publishing content, code that ships, legal text).
You have a judgeable rubric — vague "is it good?" rubrics make the critic LLM a coin-flip.
The 3-iteration cap matches your latency budget.
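
A "judgeable" rubric is one where each criterion can be answered yes or no by inspecting the output. A minimal sketch with made-up criteria (illustrative only, not Qyndex's rubric); in practice each check would more likely be a critic-LLM call than a Python predicate, and the weights and pass bar here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[str], bool]   # concrete yes/no test on the output
    weight: int                    # points awarded when the check passes


@dataclass(frozen=True)
class Critique:
    score: int
    passed: bool
    failure_reason: str


def score_attempt(attempt: str, rubric: list[Criterion], pass_bar: int) -> Critique:
    """Sum the weights of passing criteria and name the ones that failed."""
    failed = [c.name for c in rubric if not c.check(attempt)]
    score = sum(c.weight for c in rubric if c.name not in failed)
    return Critique(score, score >= pass_bar, "; ".join(failed))


# Hypothetical rubric: every criterion is checkable, none is "is it good?".
rubric = [
    Criterion("has a citation", lambda text: "http" in text, 50),
    Criterion("under 280 characters", lambda text: len(text) <= 280, 30),
    Criterion("no placeholder text", lambda text: "TODO" not in text, 20),
]

good = score_attempt("Agents loop. Source: https://example.com", rubric, pass_bar=85)
bad = score_attempt("Agents loop. TODO add source", rubric, pass_bar=85)
```

Because failure_reason names the exact criteria that failed, the next attempt gets actionable feedback instead of a bare "try again".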

Section 2 — The decision tree

flowchart TD
    A[New agent task] --> B{Latency-sensitive?}
    B -->|Yes < 5s| C[ReAct]
    B -->|No| D{Multi-stage with known structure?}
    D -->|Yes| E[Plan-Execute]
    D -->|No| C
    C --> F{Quality bar above 'best effort'?}
    E --> F
    F -->|Yes| G[Wrap in Reflexion]
    F -->|No| H[Ship as-is]
    G --> H

The branches we care about:

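Latency budget. Reflexion is dead on arrival for a chat UI; ReAct + Reflexion would make a "send" button take 12 seconds. For anything user-blocking, pick ReAct and stop: treat the quality-bar question in the flowchart as applying only when the latency budget leaves room for a retry loop.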
Task structure. If the task naturally decomposes into a DAG (research → outline → draft → publish), Plan-Execute saves money by routing the bulk to a cheap model. If the task is exploratory (debug this stack trace, find the relevant docs), ReAct's flat history is a better fit.
Quality bar. If the output is going to a human reviewer, you probably do not need Reflexion — humans are the critic. If the output ships unchecked (auto-published content, code merged on green CI), Reflexion is the gate that earns its keep.
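
The three branches can be folded into a small function. This is one reading of the flowchart and the notes above, not a canonical rule: the function name and its boolean parameters are hypothetical, and it follows the "pick ReAct and stop" advice by skipping Reflexion on latency-sensitive paths.

```python
def choose_loop(latency_sensitive: bool, known_structure: bool,
                needs_quality_gate: bool) -> str:
    """Pick a loop shape: latency first, then task structure, then quality bar."""
    if latency_sensitive or not known_structure:
        base = "ReAct"
    else:
        base = "Plan-Execute"
    # User-blocking paths skip Reflexion even when quality matters,
    # because the retry loop would not fit the latency budget.
    if needs_quality_gate and not latency_sensitive:
        return f"{base} + Reflexion"
    return base


chat_reply = choose_loop(latency_sensitive=True, known_structure=True, needs_quality_gate=True)
auto_published = choose_loop(latency_sensitive=False, known_structure=True, needs_quality_gate=True)
```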

Section 3 — How Qyndex wires this in production

Our supervisor graph (Architecture §8) is a hybrid:

Researcher is ReAct. Fast; the tools are 5–6 search/fetch calls and the depth is bounded.
Strategist + Writer are Plan-Execute. The Strategist outputs a campaign DAG (one Sonnet call); the Writer fans it out across pillars using a local 70B.
QC is Reflexion. The 7-rubric gate has a hard "score >= 85" pass bar, with a 3-iteration cap and model-tier escalation (Sonnet -> Opus on iter 3). On iter-3 fail the bundle becomes a human review item.

The implementation lives in packages/agents/qyndex_agents/. The Reflexion wrapper is in _qc_loop.py; it is dispatcher-agnostic so swapping QC's inner agent from ReAct to Plan-Execute is a one-line change.
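
The QC gate described above (pass bar of 85, three iterations, Sonnet -> Opus on the third, human review on failure) could be sketched as follows. This is not the code in _qc_loop.py: run_inner and score are hypothetical callables standing in for the inner agent and the critic, the tier names are plain strings, and applying the escalation to the model that produces the attempt (rather than to the critic) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

PASS_BAR = 85
MAX_ITERS = 3


@dataclass
class QCResult:
    status: str                       # "passed" or "human_review"
    output: str
    scores: list[int] = field(default_factory=list)


def tier_for(iteration: int) -> str:
    """Escalate to the stronger tier on the final (third) iteration."""
    return "opus" if iteration == MAX_ITERS else "sonnet"


def qc_loop(
    task: str,
    run_inner: Callable[[str, str, list[str]], str],   # (task, tier, feedback) -> output
    score: Callable[[str], tuple[int, str]],           # output -> (score, reason)
) -> QCResult:
    feedback: list[str] = []
    scores: list[int] = []
    output = ""
    for iteration in range(1, MAX_ITERS + 1):
        output = run_inner(task, tier_for(iteration), feedback)
        points, reason = score(output)
        scores.append(points)
        if points >= PASS_BAR:
            return QCResult("passed", output, scores)
        feedback.append(f"iter {iteration}: {reason}")
    # Cap reached without passing: hand the bundle to a human reviewer.
    return QCResult("human_review", output, scores)
```

Because the inner agent is passed in as a callable, swapping ReAct for Plan-Execute does not touch the loop itself, which is the property the dispatcher-agnostic wrapper relies on.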

Cost numbers (real, not synthetic)

For a single Qyndex content campaign (one researched insight -> ~12 platform variants):

Loop shape	Avg tokens	Avg cost (USD)	p95 latency
ReAct only	480k	$1.40	28s
Plan-Execute	410k	$0.65	22s
Plan-Execute + Reflexion (QC)	510k	$0.95	41s

Plan-Execute saves 54% on cost vs. ReAct ($1.40 -> $0.65) because the bulk runs on a local Llama-3.3-70B; adding Reflexion raises the Plan-Execute cost by 46% ($0.65 -> $0.95) but catches the ~8% of campaigns that would have shipped with brand-voice drift.
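
The percentages follow directly from the table. A quick check of the arithmetic, assuming the "+ Reflexion (QC)" row is Plan-Execute with the QC wrapper on top (the dictionary keys are just labels for this sketch):

```python
# Average cost per campaign in USD, copied from the table above.
cost = {"react": 1.40, "plan_execute": 0.65, "plan_execute_reflexion": 0.95}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100


saving = -pct_change(cost["react"], cost["plan_execute"])
reflexion_overhead = pct_change(cost["plan_execute"], cost["plan_execute_reflexion"])
net_saving = -pct_change(cost["react"], cost["plan_execute_reflexion"])

print(f"Plan-Execute vs. ReAct: {saving:.0f}% cheaper")
print(f"Reflexion on top of Plan-Execute: +{reflexion_overhead:.0f}%")
print(f"Plan-Execute + Reflexion vs. ReAct: {net_saving:.0f}% cheaper")
```

Even with the quality gate added, the combined shape still comes out roughly 32% cheaper than ReAct alone on these numbers.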

Without the Source Verification Playbook backstopping the QC critic, Reflexion's gains evaporate: a critic that scores "is this true?" without HEAD-checking the citations will pass hallucinated bundles every time.
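
HEAD-checking a citation means asking the server whether the URL resolves without downloading the body. A minimal sketch using only the standard library; the function names are hypothetical and this is not the Source Verification Playbook itself. Note the limits: some servers reject HEAD requests for pages that do exist, and a live URL does not prove the page supports the claim.

```python
import http.client
import urllib.request
from typing import Callable


def head_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url gets a non-error response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # HTTP errors (4xx/5xx), DNS failures, timeouts, and malformed
        # URLs all count as a dead citation.
        return False


def dead_citations(urls: list[str],
                   is_live: Callable[[str], bool] = head_ok) -> list[str]:
    """List cited URLs that fail the liveness check, preserving order."""
    return [url for url in urls if not is_live(url)]
```

Passing is_live as a parameter keeps the check testable offline: a critic can refuse to pass any bundle for which dead_citations returns a non-empty list.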

Closing

Pick the loop that matches your latency budget first, then your task structure, then your quality bar. ReAct for speed, Plan-Execute for structure, Reflexion for quality. Mixing all three in the same graph is fine — the supervisor decides which agent gets which shape.

If you want the side-by-side decision tree as a printable PDF, grab the gated download — same content, formatted for the engineering-doc-folder shelf.

Drafted by AI agents, reviewed by Shravan.