Red-Green-Wut?

Jan Vermeir

May 29, 2026
4 minutes

I have always found red-green-refactor to be one of the best practices to come out of extreme programming and test-driven development. Red means you write a test that proves the behavior you need is absent, i.e., the test fails and therefore turns red. Green means the test passes, so you know you're done when the test that used to fail now succeeds.

Nothing has changed in this regard. When vibe-coding at breakneck speed, you can still do test-driven development. It actually works as well for AIs as it does for humans. First ask the AI for a plan. Then, instead of asking for an implementation, ask for a failing test that shows the behavior you need is absent. Next, ask for an implementation that makes the test succeed, and there we are. Test green -> ship it.
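As a rough sketch of what that loop could look like in Python: the `apply_discount` function and its rules are made up for illustration and not taken from any real project. The tests are written first. Run them before the function exists and they fail (red); add the simplest implementation and they pass (green).

```python
# Step 1 (red): the tests come first. If you run them before apply_discount
# exists, they fail with a NameError, which proves the behavior is absent.
def test_ten_percent_off():
    assert apply_discount(200.0, 10) == 180.0


def test_rejects_negative_percentage():
    try:
        apply_discount(200.0, -5)
    except ValueError:
        return  # the expected outcome
    raise AssertionError("expected a ValueError for a negative percentage")


# Step 2 (green): the simplest implementation that makes both tests pass.
# apply_discount is a hypothetical example function, not a real API.
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100
```

With a test runner such as pytest, the two `test_` functions would be picked up automatically; calling them by hand works just as well for a quick check.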

I can imagine that if a team works this way, a lot of software can be produced in a very short time. If AI makes all of us into 10X developers, then 1 month with a team of 4 developers will produce 40 developer-months of software, right? Doing some back-of-the-envelope, handwavy computations, we might end up with 20,000 lines of code very quickly. Maybe in a month, or a week? The numbers don't really matter, actually.

Think back to the projects you've worked on in the past. What usually happened after the first couple of thousand lines of code? At that point, two or three months of snail-paced pre-AI coding may have passed. Over those months, you would have learned what the problem you're solving is really about, and you may have found that some of the decisions you made early on were starting to work against you. Test coverage might be missing (demos, deadlines, we'll fix it later), the structure of the code base might need an overhaul, and Sonar is yelling at you to fix all those code duplications and whatnot.

This is where the refactor comes in: the vital third practice in red-green-refactor. When a piece of software works, you improve it by looking at the solution you made in the context of the code you already have, and you make changes. On a micro level, you change the code you just added to make it better and cleaner. AI is really useful for this: reset your Claude Code session and ask it to comment on the changes on the current branch and make improvements. That should take care of the little errors, duplications, inconsistencies and inefficiencies that tend to sneak into any code base.
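To make the micro level concrete, here is a small Python sketch of such a cleanup. Everything in it is hypothetical: the function names, the item layout and the tax rates are invented for the example. The "before" version works but duplicates the tax calculation; the "after" version removes the duplication, and the same test guards both so you know the behavior didn't change.

```python
# Before: it works and the test is green, but the tax logic is duplicated.
def invoice_total_before(items):
    total = 0.0
    for item in items:
        if item["kind"] == "book":
            total += item["price"] + item["price"] * 0.09
        else:
            total += item["price"] + item["price"] * 0.21
    return round(total, 2)


# After: the rates live in one place and the calculation in one function.
# The rates are made-up example values, not real tax rules.
TAX_RATES = {"book": 0.09}
DEFAULT_TAX_RATE = 0.21


def price_with_tax(item):
    rate = TAX_RATES.get(item["kind"], DEFAULT_TAX_RATE)
    return item["price"] * (1 + rate)


def invoice_total_after(items):
    return round(sum(price_with_tax(item) for item in items), 2)


# The safety net: the refactored code must give the same answer as before.
def test_refactor_keeps_behavior():
    items = [
        {"kind": "book", "price": 10.0},
        {"kind": "laptop", "price": 100.0},
    ]
    assert invoice_total_before(items) == invoice_total_after(items)
```

In a real code base you wouldn't keep both versions around; the existing tests play the role of `test_refactor_keeps_behavior` and stay green while the implementation changes underneath them.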

There's also the macro level, however. When you make enough small-scale changes, at some point the overall structure needs an overhaul to accommodate new developments and insights. AI-generated code is no different: a lot of small changes eventually turn into a big mess, only sooner, because now we can produce more code faster. At this point we need to refactor on a larger scale: look at the architecture, decide what needs to change to accommodate new features, and then change the code to make it future-proof.

These refactorings take time, especially if you don't want to look at your code as a black box, but instead take responsibility for what was generated.

I'm still unsure about the best solution to this problem. For now, in the team I'm part of, we just do our code reviews of each pull request. We can produce the pull request faster than before, and then we basically slow down: we look at what we've got and make all the code better, not only what we just vibed. Or we write a new story to make it better. Or we learn where we might have to improve in the near future.

Refactor on a macro scale.

And about those Sonar warnings: just give them to AI and watch 'em disappear. Magic.

P.S. No AIs got hurt writing this post.

Written by

Jan Vermeir

Developing software and infrastructure in teams, doing whatever it takes to get stable, safe and efficient systems in production.