Welcome to the Agentic Coding Setup Wizard
One engineer, multiple AI sessions, 2x the output. This is how it works, from someone who ships this way every day.
I've been building software for 15+ years, from SaaS platforms used by thousands to leading a product through acquisition. When agentic coding arrived, I was skeptical. Then it changed how I work entirely. This guide is the playbook I wish I'd had: real stories, real mistakes, and the system I use daily to ship 2x faster with AI agents.
- Feature requests pile up faster than the team can ship
- 2-week sprint cycles feel slow
- Your highest-paid engineers are buried in implementation, not strategy
- AI handles the grunt work autonomously
- Features ship in days, not sprints
- Your team focuses on architecture & product thinking
- Learn where AI prototyping ends and real engineering begins (with a real project disaster)
- Calculate what AI-augmented output is worth for your team with the ROI calculator
- Score your codebase readiness with an interactive assessment
- Get the 90-day rollout plan I use with clients, not a generic template
- See the honest limitations, including what agentic coding still can’t do
What Is Agentic Coding?
Autocomplete Is Obsolete
Most teams using AI today are still in the autocomplete era. Their developers get single-line suggestions while typing. Useful, but not transformational. Agentic AI is a different category entirely: it writes entire features, reviews pull requests, and lets a single developer run 10-15 concurrent coding sessions.

I was skeptical when I first tried these tools. I studied Computer Engineering at UPR Mayagüez, earned an MS from Georgia Tech, and had shipped everything from SaaS platforms used by thousands of students to ad tech tools to a podcasting platform. I'd seen plenty of "this changes everything" tools come and go. But when I moved from Cursor's early agents to Claude Code, something actually shifted. For the first time, I wasn't babysitting a context window or prompting file by file. The agent understood my codebase and made real architectural decisions. Not perfect, but no longer a toy.
- Suggests the next line of code as you type
- Developer does 95% of the work, AI fills gaps
- One developer = one task at a time
- Speeds up typing, not thinking
- No context beyond the current file
- Writes entire features from a description
- AI does 70-80% of implementation, developer guides and reviews
- One developer manages 5-15 parallel AI sessions
- Handles the 'how' so developers focus on 'what' and 'why'
- Understands your entire codebase, tests, and conventions
Signs Your Team Is Still in the Autocomplete Era
- Your developers use Copilot only for code completion
- AI tools aren't part of your code review process
- Nobody on the team has tried Claude Code, Cursor Agent, or similar tools
- Your CI/CD pipeline wasn't designed for AI-generated code
- Developers still context-switch between one task at a time
- You measure productivity by lines of code, not features shipped
How It Actually Works
It's Not Magic
The best mental model: AI agents are like a very smart junior developer who never sleeps. They amplify skilled engineers. They don't replace them. An agent without a good engineer directing it produces mediocre code. An agent with a good engineer produces exceptional output at unprecedented speed.

On a typical build day, I run 5-6 concurrent Claude Code sessions, each on a separate screen working on unrelated tasks. I'm not reviewing every line. I'm watching whether the agent's architectural decisions make sense, whether files are organized correctly, and whether the code matches the intent. It's supervision, not pair programming.

One practical constraint: each session has a finite memory (called a "context window"), so I scope every task to fit a single session and start fresh when one runs too long, like shift changes at a hospital.
The Control Loop
Every agentic coding session follows the same cycle. The developer defines the goal, the agent executes, and the human reviews. This is not "vibe coding." Professionals maintain control at every step.
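If it helps to see the rhythm as code, here is a minimal sketch of that cycle in TypeScript. Nothing in it is a real API; controlLoop, runAgent, and humanReview are hypothetical names that only exist to make the define, execute, review loop concrete.

```typescript
// Hypothetical sketch of the control loop. None of these names are a real
// Claude Code or Cursor API; they just model the three steps.
interface Review {
  approved: boolean;
  feedback?: string;
}

type RunAgent = (instructions: string) => Promise<string>;   // returns a diff or PR
type HumanReview = (output: string) => Promise<Review>;      // the engineer's judgment

async function controlLoop(
  goal: string,
  runAgent: RunAgent,
  humanReview: HumanReview,
  maxRounds = 3,
): Promise<string> {
  let instructions = goal;                          // 1. Developer defines the goal
  for (let round = 0; round < maxRounds; round++) {
    const output = await runAgent(instructions);    // 2. Agent executes
    const review = await humanReview(output);       // 3. Human reviews
    if (review.approved) return output;             // merge and move on
    // Correct through prompting rather than hand-editing the output
    instructions = `${goal}\n\nReviewer feedback: ${review.feedback ?? ""}`;
  }
  throw new Error("Did not converge: take the task over manually");
}
```

The detail that matters is the feedback path: corrections go back into the next prompt, not into hand edits.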
Vibe Coding vs. Agentic Coding
"Vibe coding" has become a buzzword, but people conflate it with agentic coding. They solve different problems. Vibe coding means tools like Lovable or Bolt, pure prompting without ever reading the code. Agentic coding means commanding AI agents the way you'd manage a dev team: you set the objective, define the constraints, review the output, and maintain oversight at every step. I learned this distinction the hard way. We once used Lovable to prototype an American Idol-style voting platform for a client in Jamaica. The prototype came together so fast that we charged only $3,000, assuming the remaining 20% would be easy. It was a disaster. The last 20% took far longer than the first 80%. Lovable gave us a beautiful demo, but production-grade code required an engineer in the loop at every step. That project taught me exactly where vibe coding ends and real engineering begins.
The Oversight Problem, And How to Scale It
The obvious question: if one engineer is running 5-6 parallel AI sessions, how can they actually review all that output? This is where the tooling matters. Agents can be configured to verify their own work (running tests, linting code, checking consistency) without asking for permission on every routine operation. Sub-agents can review each other's output. CI pipelines run tests before anything merges. The robots help you watch the robots. My job is to design the system of checks that keeps quality high, not to read every line. That's the same skill that makes a great engineering manager.
AI Makes Weird Mistakes. So Do Humans.
A common objection: "LLM mistakes are bizarre and shocking in ways human mistakes aren't." True. Agents make thousands of small stupid mistakes, and you have to watch them. Even Opus 4.5, which is dramatically smarter than earlier models, still does confidently dumb things. But compare an agent's output to the average junior developer's first draft, and the agent is often better: more consistent style, fewer typos, better test coverage. The difference is that human mistakes feel familiar while AI mistakes feel alien. Both need code review. I rarely fix things by hand anymore. I correct through prompting and the agent fixes itself. The control loop exists precisely because nobody, human or AI, ships perfect code on the first try.

That said, vibe coding tools have their place. We use Lovable to build free proof-of-concept prototypes for clients before writing production code. There's actually a powerful workflow that bridges both worlds. I call it the "moonwalk": vibe code a quick prototype to explore the problem space, then extract a detailed spec from it and throw the prototype away entirely. Rebuild against the spec using proper agentic coding with tests, architecture, and review. The prototype was never the product. It was research. It's often faster to build something twice (once quick and dirty, once properly) than to fix a messy prototype into production code.
Project Context and Planning
Agents read instruction files (like CLAUDE.md) that describe your codebase architecture, conventions, and testing requirements. This is how the agent "learns" your standards without training. Before writing code, a well-configured agent enters plan mode: it reads relevant files, proposes an approach, and waits for the developer to approve before touching anything. These two features, project context and plan mode, are what separate a useful agent from a loose cannon. For the full technical setup, see the cheat sheet linked below.
Subagents, Hooks, and Automation
Developers can delegate focused tasks to lightweight subagents, smaller and cheaper models with restricted permissions that can analyze code but can't modify it. Meanwhile, hooks automatically lint, test, or validate code every time the agent writes a file. Combined with pre-approved permissions for routine commands, these features turn a chatty agent into a smooth automated workflow. This is the "robots watching robots" principle from the oversight section, made concrete.
# Project: [Your App Name]
## Architecture
- Framework: [Next.js / Rails / Django / etc.]
- Language: [TypeScript / Python / Go]
- Database: [PostgreSQL / MongoDB]
- Hosting: [AWS / Vercel / GCP]
## Conventions
- Use kebab-case for file names, PascalCase for components
- All API routes return { data, error } shape
- Business logic lives in /src/services/, not in route handlers
- Never use `any` type. Define interfaces for all data shapes
## Testing
- Run `npm test` before committing
- Unit tests required for all service functions
- Integration tests required for all API endpoints
- Minimum 80% coverage on critical paths
## Security
- Never hardcode secrets. Use environment variables
- Validate all user input at the API boundary
- Use parameterized queries. No string concatenation for SQL
- All endpoints require authentication unless explicitly public
## Code Review Standards
- Every PR must pass CI before merge
- AI-generated code gets the same review rigor as human code
- Flag any new dependencies for team review
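To make the hooks idea concrete: the command a hook (or a CI step) runs can be as simple as a small quality-gate script like the sketch below. This is illustrative rather than Claude Code's actual hook configuration, and it assumes a Node project with npm scripts named lint, typecheck, and test; the cheat sheet below covers the real setup.

```typescript
// quality-gate.ts: an illustrative script a hook or CI job could run after
// every agent edit. The npm script names are assumptions about your project.
import { execSync } from "node:child_process";

const checks = [
  "npm run lint --silent",
  "npm run typecheck --silent",
  "npm test --silent",
];

for (const cmd of checks) {
  try {
    execSync(cmd, { stdio: "inherit" });
  } catch {
    console.error(`Quality gate failed on: ${cmd}`);
    process.exit(2); // a non-zero exit tells the agent (or the pipeline) to stop and fix
  }
}
console.log("Quality gate passed");
```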
Want the full reference on Skills, Agents, Hooks, and MCP Servers?
See the Claude Code Cheat Sheet →
What Your Team Needs to Change
The Infrastructure Tax
Agentic coding doesn't just work on any codebase. I've assessed teams where the codebase simply wasn't ready: no tests, no typing, no conventions. In those cases, I start by creating basic CLAUDE.md files and setting up automated PR reviews and security scanning with Claude. These entry points help the team benefit from AI agents without requiring developers to change their entire workflow overnight. The investment you make in code quality directly multiplies your AI productivity.
Why Good Practices Are Now Infrastructure
Before AI agents, test coverage was a best practice. Now it's infrastructure. An agent that can run tests after every change catches its own mistakes and iterates to a correct solution. Without tests, the agent writes code that looks right but may not work, and you won't know until production.
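To make that concrete, here's the kind of test that becomes the agent's feedback loop. It's written in Vitest style; calculateOrderTotal and its path are hypothetical stand-ins for one of your own service functions.

```typescript
// order-service.test.ts: illustrative unit test for a hypothetical service
// function. With tests like this, the agent can run `npm test`, see the
// failure, and iterate instead of shipping code that merely looks right.
import { describe, it, expect } from "vitest";
import { calculateOrderTotal } from "../src/services/order-service";

describe("calculateOrderTotal", () => {
  it("sums line items and applies tax", () => {
    const total = calculateOrderTotal(
      [{ price: 10, quantity: 2 }, { price: 5, quantity: 1 }],
      0.08, // 8% tax rate
    );
    expect(total).toBeCloseTo(27); // (20 + 5) * 1.08
  });

  it("returns 0 for an empty order", () => {
    expect(calculateOrderTotal([], 0.08)).toBe(0);
  });
});
```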
Code Review Needs Redesign
When AI generates 70-80% of the code, your review process needs to change. Developers shift from reviewing each other's line-by-line changes to evaluating AI output for correctness, security, and architectural fit. That's a different skill, and your team needs to develop it intentionally.
Security Is Non-Negotiable
AI-generated code needs both automated security scanning and human oversight. Agents can inadvertently introduce vulnerabilities that pass tests but create attack surfaces. Automated SAST/DAST tools plus human security review create the right safety net.
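Here's a small example of what both the scanner and the human reviewer are looking for, using the parameterized-query rule from the CLAUDE.md template. It assumes node-postgres (pg); the table and column names are hypothetical.

```typescript
// Illustrative example with node-postgres (pg). Table and column names are
// hypothetical; connection settings come from environment variables.
import { Pool } from "pg";

const pool = new Pool(); // reads PG* env vars, never hardcoded credentials

// UNSAFE: string concatenation invites SQL injection; a SAST scan should flag it.
// await pool.query(`SELECT * FROM users WHERE email = '${email}'`);

// SAFE: parameterized query; the driver handles escaping.
export async function findUserByEmail(email: string) {
  const result = await pool.query(
    "SELECT id, email FROM users WHERE email = $1",
    [email],
  );
  return result.rows[0] ?? null;
}
```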
Skill Composition Creates Emergent Capabilities
When you combine strong typing, comprehensive tests, clear architecture docs, and fast CI, the agent becomes dramatically more capable than with any single practice. These improvements are multiplicative, not additive. A well-prepared codebase makes the agent 5-10x more effective than a messy one.
Is Your Codebase AI-Ready?
- Test coverage above 80% on critical paths
- Type system in use (TypeScript, Python type hints, etc.)
- CI pipeline runs in under 10 minutes
- Code follows consistent naming conventions
- Architecture is documented (even briefly)
- Linting and formatting are automated
- Dependencies are up to date (within 6 months)
- No hardcoded secrets in the repository
- Clear separation of concerns (not a monolithic tangle)
- API contracts are well-defined
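On the last item: "well-defined" doesn't have to mean heavyweight. One lightweight version, sketched below, pairs a zod schema that validates input at the boundary with the { data, error } envelope from the CLAUDE.md template. Using zod is an assumption about your stack, and the field names are hypothetical.

```typescript
// Illustrative API contract: a zod schema validates input at the boundary,
// and every route returns the same { data, error } envelope.
import { z } from "zod";

export const CreateUserRequest = z.object({
  email: z.string().email(),
  name: z.string().min(1),
});
export type CreateUserRequest = z.infer<typeof CreateUserRequest>;

export type ApiResponse<T> =
  | { data: T; error: null }
  | { data: null; error: { code: string; message: string } };

// Hypothetical handler logic showing the contract in use.
export function parseCreateUser(body: unknown): ApiResponse<CreateUserRequest> {
  const parsed = CreateUserRequest.safeParse(body);
  if (!parsed.success) {
    return { data: null, error: { code: "INVALID_INPUT", message: parsed.error.message } };
  }
  return { data: parsed.data, error: null };
}
```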
## PR Review Checklist — AI-Generated Code
### Correctness
- [ ] Does the code actually solve the stated problem?
- [ ] Are edge cases handled (null, empty, overflow)?
- [ ] Do all tests pass, including new ones?
- [ ] Is the logic correct, not just plausible-looking?
### Security
- [ ] No hardcoded secrets or credentials
- [ ] User input is validated and sanitized
- [ ] No SQL injection, XSS, or CSRF vulnerabilities
- [ ] Authentication/authorization checks in place
- [ ] SAST scan passes clean
### Architecture
- [ ] Follows existing patterns in the codebase
- [ ] No unnecessary abstractions or over-engineering
- [ ] Changes are in the right layer (service/controller/model)
- [ ] No circular dependencies introduced
### Performance
- [ ] No N+1 queries or unbounded loops
- [ ] Database queries are indexed appropriately
- [ ] No memory leaks (event listeners, subscriptions cleaned up)
### Maintainability
- [ ] Code is readable without AI-generated comments
- [ ] No dead code or unused imports
- [ ] Variable names are meaningful, not generic
- [ ] Complex logic has tests, not just comments
Scored lower than you'd like? Most teams do. I'll audit your actual codebase and show you exactly what to fix first, for free.
Get your free codebase audit →
The Economics
Has the Cost of Software Dropped 90%?
The cost of implementing software has collapsed. What used to take a team of five two weeks can now be done by one developer with AI agents in two days. But this doesn't mean you need fewer engineers. It means you need different ones. Programming is still hard, even with these tools. It still takes a lot of effort. Anyone telling you otherwise is selling something.
Implementation Time Collapsed. Thinking Time Didn't.
AI agents eliminated the tedious part of coding: the typing, the boilerplate, the repetitive patterns. What they didn't change is the hard part. Understanding the problem, choosing the right architecture, making tradeoff decisions, and knowing what NOT to build. The ratio of thinking to typing shifted from 30/70 to 70/30.
The Jevons Paradox: Cheaper Software Means MORE Demand
In economics, when a resource gets cheaper, demand for it increases. When coal became cheaper to burn, the world didn't use less coal. It used dramatically more. The same applies to software. When building features costs 80% less, companies don't build the same features cheaper. They build 5x more features. Your backlog won't shrink. It'll explode with things that were previously "not worth building."
You Need Fewer Devs But BETTER Devs
A senior developer with agentic AI tools can produce the output of 3-5 junior developers. But you still need that senior developer's judgment, architectural knowledge, and ability to direct the AI effectively. The uncomfortable part: juniors working unsupervised with these tools will ship bugs they can't catch. They don't yet have the experience to spot when the agent is confidently wrong. That doesn't mean you ban juniors from the tools. It means you structure their access. Pair them with seniors for the first few weeks. Limit them to AI-ASSIST tasks where a senior reviews every PR. Have them write the prompts and review plans before the agent executes. They'll learn faster this way than they ever did reading Stack Overflow. The winning formula: give your best engineers AI superpowers, ramp juniors with guardrails, and add technical leadership to wire it all together.
What Works Best, And Where It Falls Short
AI agents perform best with statically-typed languages (TypeScript, Go, Rust) and established codebases with clear patterns. They're excellent for CRUD apps, web applications, and SaaS products.

I'll be honest about the limits: for very large codebases with delicate, security-critical code (think the Linux kernel, low-level C++ systems), I'd be skeptical of the gains. Agentic coding shines brightest on the kind of software most companies actually build: web apps, APIs, internal tools.

Greenfield projects with no conventions give the agent nothing to learn from, unless you front-load the spec work. Open-ended features and new architectures absolutely work with agentic coding, but the investment shifts. Instead of spending time on implementation, you spend it on super detailed PRDs, specs, and architecture docs. The agent's output quality is directly proportional to the spec quality. Vague prompt, vague code. Detailed spec with clear acceptance criteria, constraints, and edge cases? The agent nails it.

Legacy codebases with good test coverage are surprisingly productive because the tests guide the agent.
Estimate Your Team's AI Productivity Gain
See how agentic coding multiplies your team's output, and what that's worth in equivalent headcount.
Example: (60% of time on coding tasks × 2.4x) + (40% on everything else × 1.0x) ≈ 1.8x net output per developer
Coding task speedup (2.0-2.5x) is based on the range reported for experienced developers using agentic tools (GitHub research, McKinsey). Smaller teams are assumed to be more senior, so they get a higher per-dev boost. "Equivalent annual value" is what you'd pay to hire the extra headcount your AI-augmented team now replaces.
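If you'd rather check the arithmetic than trust a widget, the sketch below mirrors the calculator's formula. Every input is an assumption; swap in your own team size, coding share, speedup, and loaded cost per developer.

```typescript
// Illustrative version of the calculator's arithmetic.
interface RoiInputs {
  teamSize: number;          // developers on the team
  codingShare: number;       // fraction of time on coding tasks (e.g. 0.6)
  codingSpeedup: number;     // speedup on those tasks (e.g. 2.4)
  loadedCostPerDev: number;  // fully loaded annual cost per developer, USD
}

function estimateRoi({ teamSize, codingShare, codingSpeedup, loadedCostPerDev }: RoiInputs) {
  // Weighted output per developer: sped-up share plus unchanged share.
  const outputPerDev = codingShare * codingSpeedup + (1 - codingShare) * 1.0;
  const extraHeadcountEquivalent = teamSize * (outputPerDev - 1);
  return {
    outputPerDev,                                      // e.g. 0.6 * 2.4 + 0.4 = 1.84x
    extraHeadcountEquivalent,                          // extra "developers" of output
    equivalentAnnualValue: extraHeadcountEquivalent * loadedCostPerDev,
  };
}

// Example with hypothetical numbers: a 5-person team at $160k loaded cost per dev
// => ~1.84x output, ~4.2 extra-headcount equivalent, ~$672k/year of value.
console.log(estimateRoi({ teamSize: 5, codingShare: 0.6, codingSpeedup: 2.4, loadedCostPerDev: 160_000 }));
```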
Your 90-Day Action Plan
From Zero to AI-Powered Team
Here's a concrete, phase-by-phase plan to bring agentic coding to your team. Print this out (Cmd+P) and use it as your roadmap. Each phase has specific deliverables and success metrics.
Phase 1: Foundation (Days 1-30)
Audit your codebase, set up the tooling, and pilot with one engineer.
- Audit test coverage and identify critical paths below 80%
- Set up a CLAUDE.md (or equivalent) project instruction file
- Choose one senior developer as the AI pilot
- Install and configure agentic tools (Claude Code, Cursor, etc.)
- Pre-approve safe commands (git, npm test, linting) so the agent doesn't block on routine permissions
- Pick one well-defined feature for the first AI-assisted build
- Measure baseline: how long does a typical feature take today?
- Establish AI code review guidelines (what to look for in AI output)
Success metrics: Pilot engineer completes one feature with AI assistance. You have baseline metrics to compare against.
Phase 2: Expansion (Days 31-60)
Expand to the full team and establish AI-assisted workflows.
- Roll out agentic tools to the full development team
- Train team on effective prompting and agent oversight
- Integrate AI code review into your PR process
- Set up automated security scanning for AI-generated code
- Set up hooks for automatic quality checks (lint on every edit, type-check on save)
- Start tracking AI-assisted vs. manual feature completion times
- Identify codebase gaps that slow down AI (missing tests, types, docs)
- Begin filling those gaps as part of regular sprint work
Success metrics: Full team using agentic tools daily. Feature completion time reduced by 30-50% compared to baseline.
Phase 3: Optimization (Days 61-90)
Optimize workflows, scale parallel sessions, and measure ROI.
- Train senior devs on running parallel AI sessions (5-10 concurrent)
- Implement custom project-specific AI skills and conventions
- Set up automated AI-assisted security reviews
- Redesign sprint planning to account for AI-boosted velocity
- Document your team's AI playbook (what works, what doesn't)
- Calculate ROI: effective output vs. tool costs vs. baseline
- Plan next quarter's roadmap based on new velocity
Success metrics: Features that took two sprints now ship in one. AI playbook documented. ROI quantified for leadership.
## Sprint Planning — AI-Augmented Workflow
### Story Classification
Tag each story before estimation:
**AI-SOLO** - Agent can complete with minimal oversight
Examples: CRUD endpoints, unit tests, data migrations,
boilerplate components, documentation updates
Estimate: 10-20% of pre-AI estimate
**AI-ASSIST** — Developer leads, agent accelerates
Examples: New features, refactoring, integrations,
bug fixes with clear reproduction steps
Estimate: 30-50% of pre-AI estimate
Note: Open-ended or greenfield work fits here, but
requires detailed PRDs/specs upfront. The harder the
problem, the more the effort shifts from coding to
specifying. Budget time for spec writing accordingly.
**HUMAN-ONLY** — Requires human judgment throughout
Examples: Architecture decisions, security-critical code,
performance optimization, vendor evaluations
Estimate: 80-100% of pre-AI estimate
### Capacity Planning Adjustments
- Senior dev with AI: 2-3x previous velocity on AI-SOLO/ASSIST
- Junior dev with AI: 1.2-1.5x (still learning to review)
- AI-SOLO stories can run in parallel (batch 3-5 per dev)
- Budget 20% of sprint for AI infrastructure improvements
### Sprint Review Additions
- Track AI-assisted vs. manual completion times
- Review AI-generated code quality (bugs found post-merge)
- Update CLAUDE.md with new patterns discovered
(e.g., if the agent keeps starting the dev server when
it's already running, add one line to CLAUDE.md and it
never happens again. Encode the fix, not the complaint.)
- Share effective prompts across the team
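If it helps to see the mechanics, here's a small sketch of how the story tags above translate into adjusted estimates. The multipliers are midpoints of the template's ranges and the example stories are hypothetical; treat them as starting assumptions, not benchmarks.

```typescript
// Illustrative re-estimation helper for the sprint template above.
type StoryTag = "AI-SOLO" | "AI-ASSIST" | "HUMAN-ONLY";

const multipliers: Record<StoryTag, number> = {
  "AI-SOLO": 0.15,   // 10-20% of pre-AI estimate
  "AI-ASSIST": 0.4,  // 30-50% of pre-AI estimate
  "HUMAN-ONLY": 0.9, // 80-100% of pre-AI estimate
};

interface Story {
  title: string;
  tag: StoryTag;
  preAiPoints: number; // your original, pre-AI estimate
}

function adjustedPoints(stories: Story[]): number {
  return stories.reduce((sum, s) => sum + s.preAiPoints * multipliers[s.tag], 0);
}

// Hypothetical example: 21 pre-AI points shrink to roughly 11 adjusted points,
// leaving room to pull more from the backlog or to fund the 20% infrastructure
// allowance the template recommends.
const sprint: Story[] = [
  { title: "CRUD endpoints for invoices", tag: "AI-SOLO", preAiPoints: 5 },
  { title: "Payment provider integration", tag: "AI-ASSIST", preAiPoints: 8 },
  { title: "Choose event-bus architecture", tag: "HUMAN-ONLY", preAiPoints: 8 },
];
console.log(adjustedPoints(sprint)); // 5*0.15 + 8*0.4 + 8*0.9 = 11.15
```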
Common Mistakes
What Goes Wrong Without Guidance
I've seen every one of these mistakes firsthand, and made a few myself. The biggest lesson I learned early on: don't just prompt endlessly. Systematize from day one. Build your CLAUDE.md files, set up your hooks, encode your conventions. I wasted weeks early on just prompting and prompting without building the infrastructure to make the agent consistently good. Every mistake below is avoidable with the right leadership.
"Just give everyone Copilot and call it done"
Without workflow redesign, teams get a 10-20% boost instead of the dramatic gains they expected. Developers use AI as fancy autocomplete. Money spent, minimal return.
"Skip the tests, AI will get it right"
AI code without tests is AI liability. Tests aren't just for humans. They're the agent's feedback loop. Without them, the agent can't verify its own work and you can't verify it either.
"Let the junior devs figure out the AI tools on their own"
Unsupervised juniors with agentic tools ship plausible-looking code they can't actually evaluate. The fix is structure, not gatekeeping. Start juniors on AI-ASSIST tasks with mandatory senior review. Have them write prompts and approve plans before the agent codes. They'll build judgment fast, but they need guardrails while they do. I've seen this go both ways: juniors left alone produce a mess, juniors paired with a senior who coaches them through the review process level up remarkably quickly.
"We'll figure out security later"
AI generates subtle vulnerabilities that pass tests. Without automated scanning from day one, you accumulate security debt at the same rate you're shipping features, which is much faster than before.
"Rewrite everything from scratch with AI"
AI works best with existing codebases that have patterns, tests, and conventions. Starting from zero gives the agent nothing to learn from. You lose the institutional knowledge embedded in your code. When I port a system, I put the original source code in a reference folder and set up hooks so the agent checks its implementation against the original for consistency. The old code IS the specification.
"Measure productivity by lines of code"
AI makes LOC meaningless. Teams tracking LOC reward verbose AI output. Measure features shipped, bug rates, and cycle time instead.
How Many of These Mistakes Is Your Team Making?
- Your team adopted AI tools without redesigning workflows
- Test coverage isn't treated as critical AI infrastructure
- Junior developers use AI coding tools without structured senior oversight
- Security scanning isn't automated for AI-generated code
- You've considered rewriting major systems from scratch with AI
- Productivity is measured by lines of code or commit volume
Recognizing the mistakes is step one. I'll audit your team's workflow and show you which fixes will have the biggest impact, for free.
Get your free audit →
The Leadership Gap
Why This Requires More Than Just Tools
You can hand every developer on your team the best AI tools available. Without technical leadership to wire it all together (the architecture, the workflows, the code quality infrastructure, the review processes), you'll get a 20% improvement instead of a 200% one.

The teams investing in AI infrastructure today (the tests, the types, the CLAUDE.md files, the review processes) are building a compounding advantage. Every quarter, agent capabilities take a leap. The teams with the infrastructure absorb each leap immediately. The teams without it start over every time. I've watched this pattern play out across three technology shifts in my career. The winners weren't the ones who adopted first. They were the ones who built the foundation to absorb what came after. That's the real argument for starting now: be ready for what these tools become in 18 months.
Tools Without Strategy Is Just Expensive Autocomplete
The teams seeing the biggest productivity gains aren't the ones with the best tools. They're the ones with technical leaders who redesigned their entire development workflow around AI capabilities. They restructured code review, rethought sprint planning, invested in test infrastructure, and created AI-specific conventions. This is CTO-level work.
Programmers Aren't Going Anywhere
The idea that programmers are going away is a fantasy. Until we have AGI, we're going to keep programming. Non-technical people can now build simple tools, and that's real. But large, complex systems still need experienced engineers who know what questions to ask. What IS real: AI agents are becoming more autonomous every quarter, and the teams that build the infrastructure now (the tests, the types, the conventions) will be positioned to absorb each new leap. Agentic coding is a new programming language, a higher-level one. The engineers who learn it now will have a massive advantage.
The Opportunity Is Now
Right now, adopting agentic coding is a genuine competitive advantage, but it won't stay that way. Within 12-18 months, this will be table stakes. The teams that start now get to learn, iterate, and build their AI infrastructure while the stakes are low. The real advantage is compounding your team's skills so they're ready for whatever comes next, not just shipping faster today. I've been through enough technology shifts to know that early movers don't win because they rush. They win because they learn first.
## AI-Powered Development: Leadership Brief
### The Opportunity
AI coding agents now let one engineer do the work of three.
Our team can adopt this paradigm and unlock significant
capacity, but it requires intentional technical leadership.
### The Shift
Agentic AI coding (Claude Code, Cursor, GitHub Copilot agents)
has moved beyond autocomplete. AI agents now write entire
features, run tests, and handle 70-80% of implementation.
### The Evidence
- Anthropic writes 70-90% of its code with Claude Code (Boris Cherny, Jan 2026)
- 25% of YC startups have 95% AI-written code (Garry Tan, YC)
- Experienced devs see 2-5x productivity gains (Simon Willison)
- Google engineer replicated a year of work in one hour (Jaana Dogan)
### What We Need
1. Technical leadership to architect AI-ready workflows
2. Codebase improvements: test coverage, typing, CI speed
3. 90-day phased rollout (audit → pilot → full team)
4. Redesigned code review process for AI-generated output
### The Ask
Approve a 90-day pilot program with fractional CTO oversight
to assess our codebase, configure tooling, train the team,
and measure results against current baselines.
### Expected ROI
- Features that took sprints now ship in days
- Reduced time-to-market on backlog items
- Team upskilled on AI-augmented development
- Infrastructure improvements that compound over time
Leadership Readiness Check
- A designated technical leader owns your AI adoption strategy
- Code review processes have been redesigned for AI-generated output
- Sprint planning accounts for AI-augmented velocity
- Test infrastructure is a funded, first-class priority
- Your team has a documented AI playbook (what works, what doesn't)
- Security scanning is automated for all code, human and AI
This guide gave you the knowledge. The 90-day plan gave you the roadmap. I use agentic coding daily at Nexrizen to ship client projects. I recently built a complete voice-based medical assistant prototype for a client using Claude Code and a detailed PRD, and the result blew my mind with how close it was to the final product. I've also helped teams who weren't AI-ready get started with just CLAUDE.md files and automated PR reviews, without requiring their developers to change how they work. The next step is a 30-minute audit where I assess your codebase, score your AI readiness, and give you a prioritized list of what to fix first. No pitch, no strings. Just a technical review from someone who's been building software for 15+ years and ships this way every day.
30 minutes. Your codebase. Actionable next steps. No pitch.
Get your free audit →
Setup Complete!
Installation log
6 components installed. 0 errors. Ready to deploy.
You've got the knowledge, the plan, and the templates. Now let's see how they apply to your actual codebase.