The Middle Path: Why I Stopped Letting AI Write All My Code

Lars Faye's Agentic Coding is a Trap finally pushed me to write about something I've been thinking about and experimenting with for a while. He covers a lot of ground: skill atrophy, “supervision” fatigue and, most importantly for me, the growing distance between you and the system you are deploying. I’ve landed on a framework that balances the nature of the task at hand, agent involvement and cost (because I can’t pretend that all of us have unlimited token budgets).

The Trigger

December was when we aligned on Cursor as our editor at Komplai. Data till April.
Chart created by our agent Larry on Komplai.

In December, we went all-in on Cursor as our tool of choice at Komplai. Over the last 5 months we have shipped fast. Features have been flying out, some good and some that should probably have stayed ideas a little longer. My code base was growing at a pace I’d never managed before.

Part of what got me thinking about the value of our previous workflow is cost. We are a bootstrapped business, so we by no means have unlimited budgets (but I don’t think we’ve let engineering suffer).

But the bigger factor was that I could not explain critical workflows without looking at the code or asking the model. It didn’t happen immediately, but after some time: when expanding on something, in the middle of discussions, or while debugging some hairy issue.

This was a system I had built: bills, payments, RCM, FX, withholding tax and their implications for accounting. Before agents, if you’d asked me for a mental model, I’d have given you a concrete answer, enough to serve as a baseline, enough to debug from when things break late at night. After a few months of heavy agentic coding, the math and the side-effects had become something I had to read the code for every time. I had lost my gut instinct for it.

Non Negotiable - Think before you prompt

Before we get into the workflow, there’s one non-negotiable for me. I never go to the AI with a problem I haven’t thought about at all.

This is not about biasing the model towards my solution. In fact, I usually don’t share my initial solution with it, and sometimes the agent comes back with something better than what I had in mind. But thinking before prompting gives me something equally important: the ability to catch when the agent is going astray (and this has saved me countless times).

When you’ve thought about the problem even a little, you can call out blind spots. If you haven’t, you have nothing to measure the AI’s input against.

Anthropic's own AI Fluency Index found that when AI produces polished-looking artifacts (code, apps, documents), users become less likely to identify missing context, check facts, or question the model's reasoning. The output looks finished, so people treat it as finished.

Same workflow, Different hands

In the pre-AI era, when I sat down for a big task, I’d open a diary or a note-taking app and write out bullet points: a rough decomposition of the work. Then I’d code from one bullet to the next in a single sitting.

My workflow with agents is almost identical. If it’s a large task, I first think about it, then get the agent into plan mode (GPT 5.x FTW). If the work is dense enough, I ask the agent to break it into phases and then go from one phase to the next, reviewing and committing each step. One bullet point at a time.

This is counter to a lot of “give it a goal and let it rip” advice on the internet, but for critical work, this is the workflow that gives me a good balance of speed and sanity.
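
The phase-by-phase loop described above can be sketched roughly as follows. This is only an illustration, not how any particular tool works: `run_in_phases` and the `implement`, `review` and `commit` callbacks are hypothetical names, and stopping at the first rejected review is my own assumption about how to handle a phase that goes wrong.

```python
from typing import Callable, Iterable, List


def run_in_phases(
    phases: Iterable[str],
    implement: Callable[[str], str],
    review: Callable[[str, str], bool],
    commit: Callable[[str, str], None],
) -> List[str]:
    """Work through phases one at a time; stop at the first rejected review."""
    done: List[str] = []
    for phase in phases:
        change = implement(phase)      # the agent produces a change for this phase only
        if not review(phase, change):  # human review gate before anything is committed
            break                      # assumption: stop and rethink instead of pushing on
        commit(phase, change)          # each approved phase is committed on its own
        done.append(phase)
    return done
```

The point of the shape is that review sits inside the loop, not at the end of it, so no phase builds on work a human has not looked at.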

Having said that, it would be sub-optimal to do this for every kind of task. Agents and models have genuinely been improving, and it only makes sense to take advantage of their capabilities.

I have a small framework to decide the ratio of agent:human involvement.

The Human-Agent Collaboration Matrix

This framework splits work into 4 categories, each with a different Human:AI ratio.

1. Exploration & Prototyping: Agent Heavy

When I'm exploring an idea, testing a hypothesis, or building a throwaway prototype, I let the agent run. Speed matters here. Correctness doesn't, yet. This is code I'm prepared to delete, so the risk of not deeply understanding it is low.

The key word is throwaway. The moment a prototype graduates to something real, it moves to a different tier.

2. Fixes: Smell dependent

Minor, trivial fixes (a broken CSS rule, a typo in a validation message, updating a basic API to add new fields) are largely agent driven. We go from Slack -> Linear -> Cursor -> PR without writing any code.

But, when a fix smells bigger than it looks, I like to get more involved. A bandaid fix might get the test to pass, but the exploration around it can uncover structural issues. If we need more than a patch, it becomes collaborative again.

3. Refactors

Here's the thing about doing a lot of Tier 1 work with AI: you end up with a lot of code, fast. Code you may not fully understand. Code with redundancies, inconsistent patterns, and bloated files that an agent was too conservative to split.

When it's time to clean up, I use the agent to understand - map the branches, trace the usages, find the dead code. But the actual restructuring decisions are mine. What to merge, what to split, what to throw away. For business-critical code, I drive this entirely. For everything else, it becomes a plan-implement loop with the agent.

This is also where I rebuild understanding of code I delegated too aggressively in Tier 1. Refactoring is learning. You can't clean up code you don't understand, and by the time you're done cleaning it up, you do.

4. Business Critical and Domain Modelling

This is where I'm most hands-on. Autocomplete over agents. I write the core routines, the domain models, the business logic. The agent fills in utilities, helper functions, boilerplate around what I've written.

I do this because when I write a new routine and try to use it, I get a feel for it. Is it doing too much? Too little? Do I need an orchestrator around it? Does the API surface feel right? That tactile feedback doesn't happen when an agent writes the code and you review it, at least for me.

This is where the payment story comes in. RCM calculations, FX conversions, withholding tax: the math and the side-effects have real financial consequences. I need to be the person who wrote that code, not the person who approved a PR of that code. When something breaks in production, the difference between "I wrote this" and "I reviewed this" is the difference between a 10-minute fix and a 2-hour investigation.
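
The four tiers above can be summarised as a small decision function. Treat it as a sketch of my own heuristics and nothing more: the `Tier`, `Task` and `who_drives` names, the boolean flags and the returned labels are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    EXPLORATION = 1        # exploration and prototyping
    FIX = 2
    REFACTOR = 3
    BUSINESS_CRITICAL = 4  # business-critical code and domain modelling


@dataclass(frozen=True)
class Task:
    tier: Tier
    throwaway: bool = True           # prototype code I'm prepared to delete
    smells_bigger: bool = False      # a fix that hints at a structural issue
    business_critical: bool = False  # touches payments, tax, accounting and the like


def who_drives(task: Task) -> str:
    """Return a rough label for who leads the work (labels are illustrative)."""
    if task.tier is Tier.BUSINESS_CRITICAL:
        return "human writes core, agent fills in helpers"
    if task.tier is Tier.REFACTOR:
        # The agent helps map the code; restructuring decisions stay with the human.
        if task.business_critical:
            return "human drives"
        return "plan-implement loop with agent"
    if task.tier is Tier.FIX:
        return "collaborative" if task.smells_bigger else "agent driven"
    # Exploration: agent heavy only while the code is truly throwaway.
    return "agent heavy" if task.throwaway else "move to another tier"
```

For example, `who_drives(Task(Tier.FIX, smells_bigger=True))` comes back as collaborative, while the same call without the flag is agent driven.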

Skill Decay - The non-token cost

Lars frames the supervision paradox well: you need strong skills to oversee agents, but agents erode the very skills you need to oversee them. I'd add a second paradox specific to startups and small teams.

The promise of agentic coding is that a small team can move like a large one. The reality is that you're generating a large team's worth of code with a small team's capacity to maintain it. And maintenance is where the bill comes due.

I'm already seeing this shift. In past roles, engineers who'd maintained a system for a while had a grip on their mental model. You could have a conversation about what the system does, how it fails, what the tradeoffs were, and they could answer from memory. That grip is loosening. The code is getting written, but the understanding isn't being written alongside it.

And that's with experienced engineers who at least have their pre-AI instincts to fall back on. Juniors and freshers entering the field now don't have that safety net. They're building their foundational skills through the agent. When Claude goes down, so does their ability to function, and that's something I'm not comfortable with.

This isn't just about individual skill atrophy. It's about your team's collective ability to hold production. AI is writing more code than ever, but humans are still the ones woken up when it breaks.

Your Mileage May Vary

I want to be clear: I'm not arguing against using AI for coding. I use it every day. It's made me faster and helped us build a lot more without having to expand the team. It's helped me explore ideas I wouldn't have had the patience to prototype manually, and it's genuinely good at a lot of things I don't enjoy doing.

If you’re an IC - I urge you to find a way to maintain those mental models. Don’t compromise on your thinking or on your understanding of how to build scalable systems. We used to write code for humans to maintain, to keep complexity in check, and I don’t think agents are very different.

If you’re a leader - try to ensure that your team still has a handle on what’s running in production. You are responsible for your services and, as such, you should guide your team to find the right balance between speed and sanity.

My approach is not about slowing down, it’s about being deliberate. And for the parts of your system that your (and your customers’) business runs on, deliberate is what I prefer.
