How It Actually Works
One Dev, Five Sessions, Zero Chaos
The best mental model: an AI agent is like a very smart junior developer who never sleeps. Agents amplify skilled engineers; they don't replace them. An agent without a good engineer directing it produces mediocre code. An agent with a good engineer produces exceptional output at unprecedented speed.

On a typical build day, I run 3-5 concurrent Claude Code sessions, each on a separate screen working on unrelated tasks. I'm not reviewing every line. I'm watching whether the agent's architectural decisions make sense, whether files are organized correctly, and whether the code matches the intent. It's supervision, not pair programming.

One practical constraint: each session has a finite memory (the "context window"), so I scope every task to fit a single session and start fresh when one runs too long, like shift changes at a hospital.
Principal Engineer, Google · Jan 2026
The Control Loop
Every agentic coding session follows the same cycle. The developer defines the goal, the agent executes, and the human reviews. This is not “vibe coding.” Professionals maintain control at every step.
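To make the cycle concrete, here is a minimal TypeScript sketch of the loop. The `Agent` interface, the `Plan` and `Review` types, and the reviewer callbacks are illustrative, not a real Claude Code API; the point is the shape of the workflow: plan, approve, execute, review, and repeat with feedback until the result is right.

```typescript
// Illustrative only: a toy model of the agentic control loop, not a real agent API.
type Plan = { summary: string; filesToTouch: string[] };
type Review = { approved: boolean; feedback?: string };

interface Agent {
  propose(goal: string): Promise<Plan>;   // plan mode: no changes yet
  execute(plan: Plan): Promise<string>;   // returns a diff or change summary
}

async function controlLoop(
  agent: Agent,
  goal: string,
  reviewPlan: (plan: Plan) => Promise<Review>,      // human approves the approach
  reviewResult: (result: string) => Promise<Review> // human reviews the output
): Promise<void> {
  let feedback: string | undefined;
  while (true) {
    const prompt = feedback ? `${goal}\n\nFeedback from last review: ${feedback}` : goal;
    const plan = await agent.propose(prompt);

    const planReview = await reviewPlan(plan);
    if (!planReview.approved) {
      feedback = planReview.feedback; // redirect before any code is written
      continue;
    }

    const result = await agent.execute(plan);
    const resultReview = await reviewResult(result);
    if (resultReview.approved) return; // done: the human signed off
    feedback = resultReview.feedback;  // correct through prompting, not by hand
  }
}
```

The human appears at exactly two points in every iteration: approving the plan and reviewing the result. Nothing ships without that sign-off.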
You've seen the control loop. Now you face a choice.
Vibe Coding vs. Agentic Coding
“Vibe coding” has become a buzzword, but people conflate it with agentic coding. They solve different problems. Vibe coding means tools like Lovable or Bolt: pure prompting, without ever reading the code. Agentic coding means commanding AI agents the way you’d manage a dev team: you set the objective, define the constraints, review the output, and maintain oversight at every step.

I learned this distinction the hard way. We once used Lovable to prototype an American Idol-style voting platform for a client in Jamaica. The prototype came together so fast that we charged only $3,000, assuming the remaining 20% would be easy. It was a disaster. The last 20% took far longer than the first 80%. Lovable gave us a beautiful demo, but production-grade code required an engineer in the loop at every step. That project taught me exactly where vibe coding ends and real engineering begins.
The Oversight Problem, and How to Scale It
The obvious question: if one engineer is running 3-5 parallel AI sessions, how can they actually review all that output? This is where the tooling matters. Agents can be configured to verify their own work (running tests, linting code, checking consistency) without asking for permission on every routine operation. Sub-agents can review each other’s output. CI pipelines run tests before anything merges. The robots help you watch the robots. My job is to design the system of checks that keeps quality high, not to read every line. That’s the same skill that makes a great engineering manager.
Write the Test First
Test-Driven Development is the single most effective workflow for agentic coding. Write the test first, let the agent code to pass it, and failures tell you exactly what’s wrong. The feedback loop is as tight as it gets. And the test-first constraint has a structural side effect: it pushes agents toward cleaner architecture, because code that’s easy to test tends to have clear boundaries and minimal coupling. I’ve tried every workflow variation with agents. TDD produces correct code faster than anything else.
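Here is what test-first looks like in practice. This is a sketch assuming a TypeScript project with Vitest; `applyDiscount` is a hypothetical function that does not exist yet, and its rules are invented for the example. The agent's job is to write the implementation that makes these assertions pass, without touching the test.

```typescript
// applyDiscount.test.ts — written before applyDiscount exists.
// Hypothetical example: the function name, file, and rules are illustrative.
import { describe, it, expect } from "vitest";
import { applyDiscount } from "./applyDiscount";

describe("applyDiscount", () => {
  it("applies a percentage discount to the subtotal", () => {
    expect(applyDiscount(100, 0.2)).toBe(80);
  });

  it("rounds the result to two decimal places", () => {
    expect(applyDiscount(19.99, 0.1)).toBe(17.99);
  });

  it("rejects rates outside the 0-1 range", () => {
    expect(() => applyDiscount(100, 1.5)).toThrow(RangeError);
  });
});
```

A failing assertion tells you, and the agent, exactly which behavior is wrong. That is a far tighter signal than "the code looks off."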
AI Makes Weird Mistakes. So Do Humans.
A common objection: “LLM mistakes are bizarre and shocking in ways human mistakes aren’t.” True. Agents make thousands of small stupid mistakes, and you have to watch them. Even Opus 4.6, which is dramatically smarter than earlier models, still does confidently dumb things. But compare an agent’s output to the average junior developer’s first draft, and the agent is often better: more consistent style, fewer typos, better test coverage. The difference is that human mistakes feel familiar while AI mistakes feel alien. Both need code review. That’s where the control loop earns its keep. I rarely fix things by hand anymore. I correct through prompting and the agent fixes itself. The control loop exists precisely because nobody, human or AI, ships perfect code on the first try.

For all the cautionary tales, vibe coding tools still have their place. We use Lovable to build free proof-of-concept prototypes for clients before writing production code.

There's also a powerful workflow that bridges both worlds. I call it the "moonwalk": vibe code a quick prototype to explore the problem space, then extract a detailed spec from it and throw the prototype away entirely. Rebuild against the spec using proper agentic coding with tests, architecture, and review. The prototype was never the product; it was research. It's often faster to build something twice (once quick and dirty, once properly) than to fix a messy prototype into production code.
Project Context: How the Agent Learns Your Standards
The agent reads a project instruction file (CLAUDE.md) at the start of every session. This file describes your stack, your conventions, your test commands, and your guardrails. No training required, no fine-tuning, no vendor lock-in. You update a text file and the agent follows your rules from the first line of code. A good CLAUDE.md is the single highest-leverage investment in agent productivity. Without one, the agent guesses. With one, it follows your team's patterns consistently.
Plan Mode: Human Approval Before Any Changes
Before writing code, a well-configured agent enters plan mode. It reads the relevant files, proposes an approach, and waits for the developer to approve before making any changes. The developer can redirect, ask questions, or reject the plan entirely. This is the "Human Review" step from the control loop, applied before the first line is written. It means the agent never goes off on its own for twenty minutes and comes back with something unusable.
Automation: Hooks, Subagents, and Quality Gates
Three features turn a chatty agent into a smooth workflow. Hooks run automatically on every agent action: linting after edits, running tests after changes, checking for security issues. Subagents handle focused subtasks like searching the codebase or analyzing a module. Pre-approved permissions let the agent run routine commands (tests, builds, formatting) without stopping to ask for permission on every step. Combined, these are the "robots watching robots" from the oversight section: automated checks that catch problems before they ever reach human review.
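As a concrete example, here is a sketch of the kind of quality-gate script a hook or CI step might invoke after every change. It assumes a Node/TypeScript project with npm scripts like those in the template below; the file name and the exact list of checks are illustrative, not a fixed convention.

```typescript
// check.ts — a simple quality gate for a hook or CI step to run after edits.
// Assumes npm scripts like those in the CLAUDE.md template below; adjust for your stack.
import { spawnSync } from "node:child_process";

const checks: Array<{ name: string; command: string; args: string[] }> = [
  { name: "lint", command: "npm", args: ["run", "lint"] },
  { name: "typecheck", command: "npx", args: ["tsc", "--noEmit"] },
  { name: "tests", command: "npm", args: ["test"] },
];

let failed = false;
for (const check of checks) {
  const result = spawnSync(check.command, check.args, { stdio: "inherit" });
  if (result.status !== 0) {
    console.error(`FAIL: ${check.name}`);
    failed = true; // keep going so the agent sees every failure at once
  } else {
    console.log(`PASS: ${check.name}`);
  }
}

// A non-zero exit blocks the merge and hands the failures straight back to the agent.
process.exit(failed ? 1 : 0);
```

The starter CLAUDE.md below ties the rest together; the bracketed values are placeholders to replace with your own stack and commands.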
# Project: [Your App Name]
## Commands
- `[npm run dev]` — start local dev server on port 3000
- `[npm test]` — run full test suite
- `[npm run test -- path/to/file.test.ts]` — run a single test
- `[npm run lint]` — check code style
- `[npm run build]` — production build
- Check if dev server is already running before starting a new one
## Architecture
- Framework: [Next.js / Rails / Django / etc.]
- Language: [TypeScript / Python / etc.]
- Database: [PostgreSQL / MongoDB / etc.]
- Auth: [NextAuth / Clerk / custom JWT]
- Hosting: [Vercel / AWS / Railway]
## Project Structure
- `src/services/` — business logic (never in route handlers)
- `src/routes/` — API endpoints
- `src/components/` — UI components
- `src/lib/` — shared utilities
- `tests/` — mirrors src/ directory structure
[Customize paths for your stack]
## Conventions
- File names: kebab-case. Components: PascalCase
- All API routes return { data, error } response shape
- Never use `any` — define types for all data shapes
- Keep functions under 50 lines; extract when they grow
- New features require tests before merging
## Testing
- Run tests before every commit
- Unit tests for all service functions
- Integration tests for all API endpoints
- 80% minimum coverage on critical paths (auth, payments, data mutations)
## Security
- No hardcoded secrets — use environment variables
- Validate all user input at the API boundary
- Parameterized queries only — no SQL string concatenation
- All endpoints require auth unless explicitly listed as public
## Code Review Standards
- Every PR must pass CI before merge
- AI-generated code gets the same review rigor as human code
- Flag any new dependencies for team review
## Guardrails
- Don't modify CI/CD config without asking
- Don't add new dependencies without discussing alternatives
- Don't refactor code outside the scope of the current task
- Don't delete tests, even if they seem redundant
Want the full reference on Skills, Agents, Hooks, and MCP Servers?
See the Claude Code Cheat Sheet →