LLMs and the Control Flow Problem: Why AI Agents Fail at the Unexpected

Dec 18, 2025 · 3 min read


OpenAI-style transformers hit 99.7% accuracy on a problem harder than most benchmarks, and the reason why exposes a flaw in every AI agent today.

The Math Misconception

There’s a common narrative: “LLMs are bad at math.” People point to failures on arithmetic and conclude that language models lack numerical reasoning.

But here’s what’s strange: LLMs can solve incredibly complex mathematical reasoning problems. They can prove theorems, derive equations, and work through multi-step proofs.

So what’s actually going on?

It’s Not Math. It’s Control Flow.

The insight from recent research is that LLMs don’t struggle with math. They struggle with control flow.

Control flow means:

  • Loops that run an arbitrary number of times
  • Branches that depend on runtime values
  • Recursion with variable depth
  • State that accumulates across steps

These are things that traditional computers are perfect at. You write a loop, and it executes exactly as many times as needed, whether that’s 3 times or 3 million.

Transformers don’t work this way. A forward pass runs through a fixed number of layers, and there’s no native loop construct. Every “iteration” has to somehow be encoded within that fixed-depth computation.
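A minimal illustration of what a fixed-depth pass can’t natively express: a loop whose iteration count is only knowable at runtime. The Collatz step count is a stand-in example, not from any specific benchmark.

```python
def collatz_steps(n: int) -> int:
    """Count the steps for n to reach 1 under the Collatz rule.

    The loop runs a number of times determined only at runtime --
    exactly the kind of data-dependent iteration that a fixed-depth
    forward pass has no built-in construct for.
    """
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(6))   # 8 steps
print(collatz_steps(27))  # 111 steps, from a barely larger input
```

Two nearby inputs need wildly different amounts of computation, and nothing in the input’s surface form tells you how much in advance.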

Why This Matters for AI Agents

AI agents (Claude Code, Cursor, and other agentic systems) are essentially LLMs trying to do control flow.

Think about what an agent does:

  1. Look at the current state
  2. Decide what action to take
  3. Execute the action
  4. Check if the goal is achieved
  5. If not, go to step 1

This is a loop. And loops are exactly what transformers struggle with.
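The five steps above can be sketched as code. The interfaces here (`decide`, `execute`, `goal_met`) are hypothetical stand-ins for the model and its tools; the loop structure is the point.

```python
def run_agent(goal_met, decide, execute, max_steps=50):
    """Sketch of the observe-decide-act loop every agent runs.

    decide(state) -> action   # the LLM's job: choose the next action
    execute(action) -> result # tools do the actual work
    goal_met(state) -> bool   # the exit condition agents often miss
    """
    state = []
    for _ in range(max_steps):       # bounded, so a bad loop can't spin forever
        if goal_met(state):          # step 4: check if the goal is achieved
            return state, True
        action = decide(state)       # steps 1-2: look at state, pick an action
        state.append((action, execute(action)))  # step 3: execute
    return state, False              # ran out of budget

# Toy run: "take 3 actions" stands in for a real goal.
state, done = run_agent(
    goal_met=lambda s: len(s) >= 3,
    decide=lambda s: f"step-{len(s) + 1}",
    execute=lambda a: "ok",
)
```

Notice that the loop itself lives in ordinary code; what the model must get right each turn is the decision and the exit check, the two things the failure modes below trace back to.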

That’s why you see agents:

  • Going in circles (loop that doesn’t terminate)
  • Missing exit conditions (loop runs too long)
  • Giving up too early (loop doesn’t run enough)
  • Losing track of progress (state management across iterations)

The 99.7% Problem

The research I referenced showed transformers achieving 99.7% accuracy on certain numerical tasks, specifically tasks that could be solved with pattern matching and a fixed number of reasoning steps.

But change the problem slightly (make it require a variable number of steps based on the input) and accuracy collapses.

It’s not that the model can’t do the reasoning. It’s that it can’t reliably determine how much reasoning to do.
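The fixed-versus-variable distinction is easy to see in two toy tasks (illustrative examples, not the tasks from the research): adding two 3-digit numbers always takes exactly three digit-steps, while Euclid’s algorithm takes a number of steps that depends on the inputs.

```python
def add_3digit(a: int, b: int) -> int:
    # Fixed control flow: always exactly three digit-by-digit steps,
    # regardless of the inputs. This shape suits pattern matching.
    carry, out = 0, 0
    for i in range(3):
        s = (a // 10**i) % 10 + (b // 10**i) % 10 + carry
        out += (s % 10) * 10**i
        carry = s // 10
    return out + carry * 1000

def gcd_steps(a: int, b: int) -> int:
    # Variable control flow: the iteration count depends on the inputs
    # and can't be read off their surface form in advance.
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps

print(add_3digit(123, 456))  # 579, in exactly 3 steps every time
print(gcd_steps(21, 13))     # 6 steps; other inputs need more or fewer
```

A model can memorize the fixed-step pattern to near-perfect accuracy; the variable-step task forces it to decide, per input, how long to keep going.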

What This Means Practically

When working with AI agents, keep in mind:

Tasks that work well:

  • Fixed-step procedures
  • Pattern matching
  • Transformation with known structure
  • Analysis with bounded scope

Tasks that struggle:

  • “Keep going until done”
  • Variable-depth recursion
  • Finding something that might not exist
  • Complex state management across many steps

Mitigations That Help

There are ways to work around the control flow problem:

  1. Bounded iterations. Instead of “fix all bugs,” try “check for and fix up to 3 bugs.”

  2. Explicit state management. Give the agent a scratchpad and ask it to track progress explicitly.

  3. Checkpoints. Break long tasks into phases with human verification between them.

  4. Structured output. Ask for explicit loop tracking: “iteration 1 of N, current state: X, next action: Y.”

  5. Early termination signals. Give clear criteria for when to stop.
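Several of these mitigations compose naturally. Below is a sketch of mitigations 1, 2, and 5 together: a bounded loop, an explicit scratchpad, and a clear stop signal. `find_bug` and `fix` are hypothetical stand-ins for agent calls.

```python
def bounded_fix(find_bug, fix, limit=3):
    """Instead of "fix all bugs": a bounded loop (mitigation 1) with an
    explicit scratchpad (mitigation 2) and a stop criterion (mitigation 5).
    """
    scratchpad = []  # explicit state the agent can be shown each turn
    for i in range(1, limit + 1):
        bug = find_bug()
        if bug is None:  # early-termination signal: nothing left to do
            scratchpad.append(f"iteration {i} of {limit}: done, no bugs left")
            break
        fix(bug)
        scratchpad.append(f"iteration {i} of {limit}: fixed {bug}")
    return scratchpad

# Toy run: two known "bugs", a budget of three iterations.
bugs = ["off-by-one", "null deref"]
log = bounded_fix(find_bug=lambda: bugs.pop(0) if bugs else None,
                  fix=lambda b: None)
```

The scratchpad doubles as the structured output from mitigation 4: each entry records the iteration count and what changed, so progress is never implicit.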

The Architectural Limitation

This isn’t something that can be fixed with more training or better prompts. It’s a fundamental architectural limitation of transformers.

Some research directions that might help:

  • Looped transformers that can iterate internally
  • Hybrid architectures with explicit control flow primitives
  • Tool use for algorithmic tasks (let the LLM decide WHAT to do, let code handle HOW MANY times)
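The tool-use direction can be sketched concretely. The registry and call format below are illustrative, not any specific agent framework’s API: the model emits WHAT to run (a tool name and arguments), and ordinary code handles HOW MANY iterations.

```python
# Deterministic tools: iteration lives in plain code, where it is reliable.
TOOLS = {
    "count_matches": lambda lines, needle: sum(l.count(needle) for l in lines),
    "find_first": lambda lines, needle: next(
        (i for i, l in enumerate(lines) if needle in l), -1),
}

def run_tool_call(call: dict, lines: list[str]):
    """Execute a model-chosen tool call against some source lines."""
    return TOOLS[call["tool"]](lines, call["needle"])

src = ["x = 1  # TODO", "y = 2", "z = 3  # TODO"]
n = run_tool_call({"tool": "count_matches", "needle": "TODO"}, src)  # 2
i = run_tool_call({"tool": "find_first", "needle": "TODO"}, src)     # 0
```

The model never has to loop over the file itself; it only has to pick the right tool and the right `needle`.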

But for now, the limitation is real, and understanding it helps you work with AI agents more effectively.

The Practical Takeaway

When an AI agent fails, ask yourself: “Is this fundamentally a control flow problem?”

If the task requires:

  • Determining how many iterations to perform
  • Managing state across an unknown number of steps
  • Searching until a condition is met

Then you’re fighting the architecture. Consider restructuring the task, adding human checkpoints, or using tools for the parts that require reliable iteration.

The AI isn’t stupid. It’s just not built for loops.

Understanding this distinction will save you hours of frustration and help you design better human-AI workflows.
