If you’re reading this article, you’ve probably already had at least your first taste of vibe coding. You open your editor and start asking for feature after feature. Whether you did it with a simple agent or in a more structured way, following a BMAD flow or some other Spec-Driven Development (SDD) approach, the end result won’t be that different: you’ve got an application that more or less works, but now it’s time to get your hands in there and clean things up.
If you’re just getting started, the code is probably a disaster. What began as an experiment is now something you want to make more stable and readable: reorganize the code, bring consistency to what you’ve built.
And on top of that, you got carried away, and in the excitement you ended up with a hundred REST endpoints that now need fixing.
It’s refactoring time.
The scenario above is probably very common: the speed at which you can build things pushes you to always ask for “just one more little piece.” The desire to see a complete product is huge, especially at the beginning, and the classic “I’ll fix it later” takes over.
So I’ll lay out a set of steps for running a repeatable refactor across a large number of similar targets. The REST endpoints case is just an example, but the same approach applies easily in other contexts.
Catalog the work before touching the code
First, you need to avoid wandering through the chaos while you work. Start by creating a system to catalog the endpoints and keep track of progress.
Without reinventing the wheel, the first step is very simple: ask your agent to create a JSON file listing the endpoints, each with a work status set to your starting point.
Something like:
```
Create a JSON file in the DOCS directory called API_STATUS containing a list
of all routes present in the <routes-folder> directory.
Each item in the file should be:
{ "url": "/organizations", "method": "GET", "status": "to do" }
```
This operation will produce the file as requested, and you can start tracking what needs to be done.
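For a handful of endpoints, the generated file might look like this (the routes are illustrative; only the structure matters):

```json
[
  { "url": "/organizations", "method": "GET", "status": "to do" },
  { "url": "/organizations", "method": "POST", "status": "to do" },
  { "url": "/organizations/:id", "method": "GET", "status": "to do" }
]
```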
Commit the file to the repo.
Refactor one endpoint by hand (set the standard)
Do the first refactor yourself. You already trusted the agent in the first iteration, and this time you want the code reorganized according to your style guide.
One important thing to keep in mind: you’ll need to do more than one iteration, so there’s no need to make this refactor perfect and final. In fact, splitting the problem helps agents work more accurately.
Here’s a list of changes you’ll probably want to apply:
- normalization of type names
- fixing file names
- moving logic that was crammed into the handler out into pre/post-handler hooks
- extracting business logic into separate services (agents tend to create big “do everything” functions); see the sketch after this list
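As a sketch of the last two points, assuming an Express-style app (the `organizationService` name and shape are illustrative, not taken from any generated code):

```typescript
// Hypothetical "before": the agent crammed validation, data access,
// and response shaping into a single route handler.
// app.get("/organizations", async (req, res) => { /* ...everything... */ });

// "After": the handler only translates HTTP into domain calls;
// the business logic lives in a small, testable service.
import express from "express";

interface Organization {
  id: string;
  name: string;
}

// Illustrative service extracted from the fat handler.
const organizationService = {
  async list(): Promise<Organization[]> {
    // Real data access (DB, ORM, external API) would live here.
    return [{ id: "1", name: "Acme" }];
  },
};

const app = express();

app.get("/organizations", async (_req, res) => {
  const orgs = await organizationService.list();
  res.json(orgs);
});

app.listen(3000);
```

The handler shrinks to HTTP translation, and the service becomes the unit your spec will later tell the agent to extract everywhere else.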
Manually update the status of the endpoint you fixed, and commit everything.
Turn the changes into a refactoring spec
At this point you have:
- one endpoint refactored manually by you
- a status file with the list of endpoints
Creating the first specs document is extremely easy. Ask the agent something like:
```
Create a markdown document to apply the refactor I performed in the last
commit to other endpoints.
Name the document refactoring.md inside the DOCS directory.
Do not apply any of these changes yet.
```
The result will probably be a document describing all the micro-changes you made in detail.
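As an illustration, a spec derived from the changes listed above might look something like this (everything here is hypothetical):

```markdown
# Endpoint refactoring rules

For each endpoint:

1. Rename types to follow the project naming convention.
2. Rename handler files to match the route path.
3. Move validation and response shaping into pre/post-handler hooks.
4. Extract business logic into a dedicated service module.
5. Run the test suite and the type checker before marking the endpoint done.
```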
Reviewing the document and adding a few missing parts helps the very first runs go correctly, but it’s not strictly necessary: the document will improve on its own as you notice gaps.
Commit the created document.
Run the playbook: refactor endpoints in a loop
At this point the project contains everything you need:
- the status file that tells you which endpoint to refactor next
- the specs document with the refactoring rules
Load DOCS/refactoring.md into your agent’s context and write something like:
```
Take from DOCS/API_STATUS.json the next endpoint to change and apply the
rules from the context to it.
```
The agent will perform the refactor for that endpoint.
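If you’d rather script the selection than eyeball the status file before each prompt, here’s a minimal Node/TypeScript sketch, assuming the file format and path used earlier:

```typescript
// Minimal sketch: pick the next endpoint to hand to the agent.
// Assumes the status file format shown earlier in this article.
import { readFileSync } from "node:fs";

interface EndpointStatus {
  url: string;
  method: string;
  status: string; // "to do" | "done", as used in API_STATUS.json
}

const endpoints: EndpointStatus[] = JSON.parse(
  readFileSync("DOCS/API_STATUS.json", "utf8"),
);

const next = endpoints.find((e) => e.status === "to do");
console.log(next ? `${next.method} ${next.url}` : "All endpoints done");
```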
Review, correct, and evolve the spec
A first review of the generated code will almost certainly reveal missing pieces or things you didn’t anticipate.
No problem: prompt the agent to fix the issue and to update the refactoring file with the required changes. Along with applying the fix, it will update the spec for future iterations.
During the automatic refactoring process you’ll also notice some unwanted behaviors, like exotic ways of running tests or type checking. Same as above: don’t hesitate to interrupt the agent, explain the correct method, and always ask it to keep the document updated.
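Such a correction prompt can be blunt (the test command here is just an example, not from the article):

```
Run the tests with `npm test` only, never by invoking node on individual
files. Fix the failing endpoint, and update DOCS/refactoring.md so this
rule becomes part of the spec.
```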
At the end of every iteration: review, commit the code, request an update of the status file.
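The status update itself can be a one-line prompt (the endpoint shown is illustrative):

```
Update DOCS/API_STATUS.json: set the status of GET /organizations to "done".
```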
After a few iterations you’ll see you no longer need to update the refactoring document.
Extra tactics for staying in control
If during the iterations you realize you skipped an important part of the refactor, and you’re not at the very beginning, set it aside: create a new document and run an additional refactoring pass once you finish the first one.
Some models are more “obedient” than others. I alternate between Gemini 3, ChatGPT 5, and Claude Opus 4.5. All three work very well for problem-solving, but:
- Gemini makes its own decisions and takes paths that aren’t the ones you asked for
- ChatGPT sometimes loses the thread and forces me to restart
- Claude, at the moment, is the one that best adapts to the method described
You can also ask the agent to rewrite the context file in a more compact, agent-oriented language. In my experiments this saved around 50% of the tokens in context while the operations were still performed correctly, so it’s worth trying.
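The request can be as simple as this (wording is illustrative):

```
Rewrite DOCS/refactoring.md in a compact form optimized for an LLM agent:
keep every rule and constraint, drop the prose written for humans.
```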
Don’t rush. You don’t have a magic wand, you just have a very powerful tool that needs controlled use. Too many changes means very large reviews.
Closing thoughts: process beats panic
What I’ve described is a process I now use regularly, and it has been a real turning point in my activity as an AI developer.
The interesting part is that this approach doesn’t “automate refactoring”: it automates repeatability. And when the work is repetitive (100 endpoints, inconsistent naming, monolithic handlers, services that grew randomly), repeatability is everything: it reduces friction, lowers error rates, keeps reviews smaller and more frequent, and most importantly prevents every endpoint from becoming a special case.
In practice, you’re doing something simple but powerful: you move intelligence from the code to the process. Code changes constantly; the process is what lets you keep direction. The status file forces you to be honest about “what’s missing.” The refactoring document becomes your operational style guide. And the agent is no longer the thing that “writes code for you,” but the thing that executes a playbook.
And in the end, the most useful thing happens: it stops being “vibe coding + final panic” and becomes a sustainable workflow. Not because AI is magic, but because you’ve channeled it into a system that produces quality by construction.
If I had to summarize everything in one sentence: don’t ask the agent to be good, put it in a position where it can’t do otherwise.
Related Reading
Now that you have a playbook for turning prototypes into maintainable code, here’s where to go deeper:
- Vibe Coding is not a production strategy - Understanding why casual prompting creates the mess this article helps you fix
- 8 Patterns for Spec-Driven Development - The architectural patterns that prevent the chaos from happening in the first place
- From prompts to contracts: an intro to Spec Driven Development - How to shift from ad-hoc prompts to versioned specs that anchor intent
- 5 mistakes when starting with Agents - Common pitfalls when working with AI agents and how to avoid them
- Working Patterns for AI Development Teams (Part 1) - Structured approaches for spec-driven context and evaluation-driven quality