The rise of vibe coding: Why architecture still matters in the age of AI agents

Michael Chiaramonte

July 23, 2025


Most of us were excited about coding with AI assistants not too long ago. Even when the suggestions didn't exactly match what we were looking for, tools like GitHub Copilot gave us entire blocks of code to help us complete tasks more rapidly. Going well beyond traditional auto-completion, these assistants helped developers write code at what seemed, at the time, like immense speed.

Fast forward less than two years, and the developer experience is undergoing another radical transformation built on that initial wave. Instead of AI making partial suggestions to help us build functions and small pieces of an application one suggestion at a time, agents can now write almost every line of code and make every decision about the framework the app is built on. This style of development comes in many flavors, but the one that stands out is “vibe coding”: agentic coding taken to the extreme, where the agent works to create an application with minimal human intervention. Most of the human interaction consists of repeatedly pressing the Tab key or telling the agent to proceed.

Whether using vibe coding or cautiously guiding agents to generate or refactor large fragments of a codebase, the developer’s workflow has undergone significant changes. Instead of manually writing boilerplate, scaffolding architectures, or even debating framework choices, developers can let the coding agent make these choices. Tools like Replit, Lovable, and GitHub Copilot’s agentic mode make it easier to go from idea to code.

But are these AI agents actually building good software?

Post from Vas Moza @vasumanmoza, engineer at Meta

More specifically, are these generated applications:

  • Scalable and maintainable for enterprise applications?
  • Following best practices for code and architecture?
  • Able to be guided towards better architectural outcomes?

These questions become more complex when we consider that most enterprise development isn’t building new applications from scratch; it’s working with existing codebases that may be years or decades old. While much of the excitement and trends around AI coding focus on greenfield projects and newly built applications, the reality is that most developers spend their time maintaining, extending, and modernizing legacy systems.

The answers to these questions will likely determine the longevity of this trend and the apps that emerge from it. In this blog, we’ll explore how AI agents implicitly make architecture decisions and the risks that come with them. We’ll demonstrate how developers and architects can craft more effective prompts that integrate architectural thinking into agent workflows. We’ll touch on another rapidly emerging adjacent trend — vibe speccing. And finally, we’ll examine how tools like vFunction can validate, guide, and enhance agent-generated code by leveraging real-world architectural insights. Let’s start by exploring how AI agents make decisions about application architecture.

How do AI agents make architectural decisions?

Computers are much faster at making decisions than humans in many situations, but when it comes to designing an application, there are many decisions to be made. Generally, before developers start coding, they’ve already thought about many high-level questions:

  • Which frameworks to use?
  • How to organize files and folders?
  • How to handle state, caching, or error boundaries?
  • Where to define business logic?
  • How to structure the data flow and external dependencies?

These same questions are also “considered” by the agent, which answers them far more quickly than any human could. This means AI agents don’t just generate code; they generate architecture by default.

Even if the prompt doesn’t include explicit architectural instructions, the agent still makes architectural decisions and implements them as part of code generation. These architectural decisions are baked into the codebase, and they can have significant implications for scalability, maintainability, and performance.

Implicit vs. explicit architecture

AI agents, especially general-purpose LLMs, make decisions based on patterns in training data. An agent is only as good as the data it is trained on, and many agents follow a self-learning cycle, improving over time based on the feedback they receive. Because the agent’s logic is shaped by that feedback, poor feedback leads to poor outcomes. For the architecture of a generated application, this means an agent may:

  • Adopt patterns that were popular but are now considered outdated
  • Favor simplicity or familiarity over best practices or modularity
  • Skip important layers of abstraction (e.g., services, repositories, data transfer objects) unless specifically instructed

To fully understand the good and bad of agents, you need to use them. Here’s an example of what a coding agent may do when given a simple prompt without much direction around the architecture. Let’s assume you feed the following prompt to the agent:

“Build a REST API in Node.js that allows users to create and view blog posts.”

From this prompt, the agent might generate something like this:

  • A single server.js file
  • Inline route handlers
  • MongoDB connection hardcoded into the route logic
  • No service layer, input validation, or test coverage
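
To make that shape concrete, here’s a minimal single-file sketch of the same anti-pattern. It’s written in Python with Flask (to match the prompt examples later in this post) rather than Node.js, and the file, route, and table names are purely illustrative:

```python
# app.py - everything in one file: inline route handlers, hardcoded database access,
# no service layer, no input validation, no tests (illustrative sketch only)
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/posts", methods=["POST"])
def create_post():
    # Persistence details hardcoded directly into the route handler
    conn = sqlite3.connect("blog.db")
    conn.execute("CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    data = request.get_json()  # assumes well-formed input; nothing is validated
    conn.execute("INSERT INTO posts (title, body) VALUES (?, ?)", (data["title"], data["body"]))
    conn.commit()
    conn.close()
    return jsonify({"status": "created"}), 201

@app.route("/posts", methods=["GET"])
def list_posts():
    conn = sqlite3.connect("blog.db")
    rows = conn.execute("SELECT id, title, body FROM posts").fetchall()
    conn.close()
    return jsonify([{"id": r[0], "title": r[1], "body": r[2]} for r in rows])

if __name__ == "__main__":
    app.run()
```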

Most likely, this app will function as intended. If you don’t look under the hood, everything might seem fine until you hit the limits of this simplistic design later, when you expand on features or scale the app. The architectural implications of this implementation mean the application:

  • Has no separation of concerns
  • Is hard to scale or test
  • Contains tightly coupled logic that makes refactoring painful
  • Has no domain boundaries or layered architecture

Although the application works, you’re almost guaranteed to run up against its limits quickly. You’re essentially baking technical debt into the application from the start.

Why this matters

If you’re experimenting with AI-driven coding, chances are your agents are making quiet decisions that set your project on a fixed trajectory. Once the base code is generated, with various patterns already woven into it, decisions with major impact have been made before you write a single line yourself. This becomes a bigger issue as the app grows or gets handed off to other teams. The challenge is amplified when working with existing codebases, where agents must navigate not only architectural decisions but also legacy constraints, existing integrations, and business logic that may not be apparent from the code structure alone and that may require additional context to change accurately.

When it comes to modern software architecture, it is generally a best practice to focus on specific elements from the design phase through to implementation. Key points usually include:

  • Loose coupling and high cohesion
  • Resilience and fault tolerance
  • Modularity and testability
  • Observability and performance under load

If these concerns aren’t addressed through the initial and subsequent prompts when using the agent to create the application, they are easily overlooked. That makes sense: the agent, by default, aims to create something that works rather than something that adheres to the principles of sound system design.

Vibe coding falls short when an agent is fed unrefined prompts.

Can AI agents build scalable, resilient systems for the enterprise?

At first glance, AI-generated applications look impressive. Within seconds, potentially minutes for larger apps, a working application is spun up and ready to go. The syntax is (generally) clean. The app works. The API responds as expected. However, beneath the surface, there is often a lack of architectural rigor. Experienced developers and architects can peek under the hood and prompt the agent to make changes for the better. But what about less experienced developers or those with no technical background using these systems to build mission-critical applications? And what happens when these same agents are turned loose on complex, business-critical enterprise systems with years of accumulated logic and technical debt? Unless someone makes sure the application’s code is architecturally sound, brittle, hard-to-scale systems develop over time.

Much of this may not matter for the typical vibe-coded application with just a few users. But are vibe and agent-led coding techniques ready for the enterprise? It all comes down to the architecture they generate and their ability to align with proven best practices. Vibe coding has begun to permeate the enterprise, where flaws related to scale and security (among other factors) are more detrimental and may not be easily identified. Whether written by a developer or an agent, code is only as good as the architectural foundation it’s built on.

So are AI agents building scalable and resilient systems? Let’s break this down across three key architectural qualities: scalability, resilience, and best practices.

Scalability: Will it grow with you?

Application scalability isn’t about how fast the code runs on your laptop — it’s about how well the app handles increased users, traffic, and complexity. When AI agents create code, there are a few common shortcomings, including:

  • No separation between compute and storage
  • Missing pagination or rate limiting on API endpoints
  • Business logic tightly coupled with facade layers
  • Synchronous request handling that blocks under load

Infusing these anti-patterns into an application may be acceptable for a toy app or a quick proof-of-concept, but applications with any level of usage will likely struggle under load. Unlike well-architected applications, agents rarely incorporate strategies such as asynchronous processing, caching layers, or horizontal scaling considerations.
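
As one small example, pagination is cheap to add once you ask for it. Here’s a minimal Flask sketch of a paginated list endpoint with a capped page size; the in-memory task list is just a stand-in for a real data source:

```python
# A paginated list endpoint with a hard cap on page size (illustrative sketch)
from flask import Flask, jsonify, request

app = Flask(__name__)
TASKS = [{"id": i, "title": f"task {i}"} for i in range(1, 501)]  # stand-in for real storage

@app.route("/tasks")
def list_tasks():
    # Cap the page size so a single request can't pull the whole table into memory
    page = max(int(request.args.get("page", 1)), 1)
    size = min(int(request.args.get("size", 20)), 100)
    start = (page - 1) * size
    return jsonify({
        "page": page,
        "size": size,
        "total": len(TASKS),
        "items": TASKS[start:start + size],
    })
```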

Resilience: Can it recover gracefully?

Application resiliency goes beyond having an app that works. With resilient systems, when things break, they recover. Unless very specifically prompted, most AI-generated code doesn’t account for:

  • Transient network failures
  • Rate-limited external APIs
  • Unexpected database outages

Most agents write code that overlooks components that would be included in many production-ready applications. If a developer created the application, they would likely include features such as retry logic, circuit breakers, graceful fallbacks, timeouts, and structured error propagation. With the AI-generated code, you’ll usually get a happy-path implementation that assumes every service is always up and every request succeeds. There’s nothing wrong with this in the prototyping stages; it’s even fine for a demo. However, it’s a risk in anything that’s expected to run in production.
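
If you do prompt for resilience, even a simple retry helper goes a long way. Here’s a minimal, hand-rolled sketch of retry-with-backoff logic; in production you’d more likely reach for a battle-tested library or a full circuit breaker, and the wrapped call in the usage comment is hypothetical:

```python
# Retry with exponential backoff and jitter for transient failures (illustrative sketch)
import random
import time

def call_with_retries(operation, attempts=3, base_delay=0.5):
    """Run `operation`, retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # retries exhausted; let the caller log and handle the failure
            # Back off exponentially, with jitter so callers don't retry in lockstep
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))

# Usage (hypothetical call): result = call_with_retries(lambda: fetch_inventory_from_api())
```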

Best practices: Is it built to last?

Even when an AI-generated app “works,” it might not age well. That’s not to say every application built directly by a developer is perfect, but with pull requests and the team scanning for best practices as the application develops, major issues are less likely to fall through the cracks.

Things that senior developers and architects look for in a well-architected app are often overlooked by agents. These include:

  • Clear domain boundaries (domain-driven design, modular monoliths, or microservices)
  • Test coverage (especially integration and contract tests)
  • Observability (structured logging, tracing, metrics)
  • Secure defaults (input validation, authentication, authorization)

Here’s a comparison of well-known best practices and the typical output from agents building applications:

| Aspect | Typical Agent Output | Best Practice |
| --- | --- | --- |
| File structure | Flat or minimal | Modular with clear boundaries |
| Error handling | Try/catch or nothing | Centralized with typed error responses |
| Input validation | Often skipped | Required for every field |
| Business logic location | In route handlers | In services or domain layers |
| Observability | Console logs | Structured logs + tracing |
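
To make the “centralized with typed error responses” row concrete, here’s a minimal sketch of what that can look like in Flask; the error classes and route are illustrative, and an agent won’t produce this structure unless asked:

```python
# Centralized error handling with typed, structured JSON responses (illustrative sketch)
from flask import Flask, jsonify

app = Flask(__name__)

class ApiError(Exception):
    """Base class for errors that map to a structured JSON response."""
    status_code = 400
    error_type = "bad_request"

class NotFoundError(ApiError):
    status_code = 404
    error_type = "not_found"

@app.errorhandler(ApiError)
def handle_api_error(err):
    # Every ApiError subclass produces the same response shape
    return jsonify({"error": err.error_type, "message": str(err)}), err.status_code

@app.route("/tasks/<int:task_id>")
def get_task(task_id):
    raise NotFoundError(f"task {task_id} does not exist")  # handled centrally above
```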

It’s nothing against AI agents. They are great at building code that runs, but they don’t inherently build systems that last. Without guardrails and solid prompting skills (driven by an engineer/architect’s expert skills and experience), agents tend to overfit to short-term utility rather than long-term architecture and sustainability.

If you want to move fast and build well, AI agents are definitely part of the equation. From all the points above, the answer is clear: you need to guide the agent with architectural intent and inspect the results critically. One of the best ways to do this is to work with the agent on a plan before implementation. Some newer platforms and workflows already do this through an approach known as vibe speccing, outlining an implementation plan and key details for the developer before the agent flies off and does its thing. Creating a specification for the piece of software before kicking off an endless loop of code generation cycles tends to be more effective for agents and leads to a cleaner initial codebase. This optimized flow is akin to how you’d work on the code with a team of humans, ensuring that the generated application meets the standards you’d hold developers and architects to if they built it by hand.

Vibe coding with existing and legacy codebases: the enterprise reality

While the promise of vibe coding is exciting when you’re starting a net-new application, the reality in most enterprise environments is far more complex. The majority of enterprise development isn’t greenfield work on shiny new projects but instead revolves around working with existing codebases that are five, ten, or even 20 years old. Within these systems are layers of technical debt, undocumented business logic, and complex interdependencies that make agent-driven development significantly more challenging to do well.

Tangled, monolithic application underscores complexity of legacy codebases.

Unlike greenfield scenarios, where agents can make architectural decisions from a blank slate, working with existing codebases requires understanding existing systems before making changes. This creates a fundamentally different risk profile that most discussions around AI coding agents don’t adequately address, but one that will become a top priority as these tools infiltrate the enterprise.

The legacy code challenge

Legacy systems present unique obstacles that agents aren’t naturally equipped to handle. Years of quick fixes and workarounds may appear as “bad code” to an agent, but they serve critical functions. Business rules are embedded in code without clear documentation. Systems have grown organically with tight coupling between components that isn’t immediately obvious. Architectural decisions that made sense at the time may appear outdated without understanding the original constraints and context that extend beyond the code itself. It’s not that agents won’t attempt to make changes; they will. The problem is that the change may not be in the overall best interest of the system.

Unique risks with existing codebases

When agents work with existing codebases, several specific risks emerge. Agents may not understand why certain “bad” patterns exist. Again, what appears to be poor architecture might actually be a workaround for deeper issues encountered many years before or integration constraints that the agent can’t see. They might optimize a database query in one service without realizing it breaks a process within another service. Or they might “clean up” what appears to be redundant validation logic, not realizing it handles edge cases that only occur with specific legacy data. To add insult to injury, many of these legacy codebases have poor unit and regression test coverage, which means the agent’s changes often can’t be validated for overall compatibility.

The promise vs. reality

The promise of using AI agents with existing codebases is seductive: “Just point the agent at this legacy code and modernize it.” The reality is far more nuanced. Without proper architectural context, agents often make changes that may work locally or in the scope of a single service but break the system globally.

This is the fundamental challenge of using AI agents with existing codebases: agents excel at local optimization but struggle with understanding the system’s overall architecture. They can improve code in isolation, but this may make the overall system worse.

Prompting for better architecture: Tips & examples

Many of us have used ChatGPT and similar models over the last few years to answer our questions. The result we get is largely a direct product of the prompt we use. For general questions this is usually fine, even if hallucinations still crop up; but when you’re using the output of agents and their underlying LLMs to create critical infrastructure, it’s a much bigger issue. So it goes without saying that most architectural flaws in AI-generated code don’t come from the model being “wrong”; they come from vague prompts.

If you ask an AI agent to “build an app,” it will do just that: build an app. However, it won’t necessarily build one that is testable, modular, observable, or future-proof unless you explicitly request those qualities. 

The good news? You can prompt your way to better architecture. Let’s look at how this can be done for a greenfield app you’re building.

Tip #1: Be specific about layers and responsibilities

Agents tend to collapse everything into route handlers unless told otherwise. Prevent this by breaking out the expected architecture. This requires expanding the prompt to steer it with explicit commands on how you want the application to be structured. For example, here is a poor prompt that would leave a lot of decisions up to the agent to arbitrarily make:

Build a REST API in Flask to manage tasks.

To enhance this prompt, we can then add a few more pieces based on how we want the specific application to be built (which in this case is a Flask app). Here is an example of the improved prompt:

Build a Flask API to manage tasks, using a layered architecture with:

a controller for routing,

a service layer for business logic,

and a repository layer for data access using SQLAlchemy.

By prompting like this, the agent should structure the code into separate modules and layers. This will more closely align the output with the architecture and structure that you would want to see if you were coding this by hand.
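
For illustration, here’s a condensed sketch of the structure that improved prompt should push the agent toward. Everything is collapsed into one file for brevity (a real project would split controllers, services, and repositories into separate modules), it assumes Flask with Flask-SQLAlchemy, and all names are illustrative:

```python
# Layered structure in miniature: controller -> service -> repository (illustrative sketch)
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///tasks.db"
db = SQLAlchemy(app)

class Task(db.Model):  # data model
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(200), nullable=False)

class TaskRepository:  # repository layer: data access only
    def add(self, title):
        task = Task(title=title)
        db.session.add(task)
        db.session.commit()
        return task

class TaskService:  # service layer: business logic, no HTTP or SQL details
    def __init__(self, repository):
        self.repository = repository

    def create_task(self, title):
        if not title or not title.strip():
            raise ValueError("title must not be empty")
        return self.repository.add(title.strip())

service = TaskService(TaskRepository())

@app.route("/tasks", methods=["POST"])  # controller: routing and HTTP concerns only
def create_task():
    task = service.create_task((request.get_json() or {}).get("title", ""))
    return jsonify({"id": task.id, "title": task.title}), 201

with app.app_context():
    db.create_all()
```

The point isn’t this exact code; it’s that an explicit prompt about layers gives the agent a structure to fill in instead of letting everything collapse into route handlers.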

Tip #2: Mention non-functional requirements

Building on this initial prompt, we can also incorporate non-functional requirements related to observability, security, and resilience. These won’t magically appear in the AI output if they’re not explicitly asked for, so it’s best to assume these implementation details will be missing unless you request them. Here’s an example of some further prompting we could add to the previous prompt:

Add basic logging using the logging module, input validation for all endpoints using Pydantic, and retry logic for database operations.

Even better, we can add in more explicit instructions on what to expect and how the application should handle it to the prompt, like so:

The API should log each request, track execution time, and return a structured JSON error if something fails.

This will further improve the structure and functionality of our application. Again, prompting in an extremely explicit way is the best way to ensure the app is built according to requirements. Where the agent lacks information, it will fill in the blanks itself, and not always in a good way.
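
Here’s a minimal sketch of what those non-functional additions can look like, assuming Pydantic v2 for validation; the endpoint and field names are illustrative:

```python
# Request logging, execution-time tracking, and validated input with structured errors (sketch)
import logging
import time
from flask import Flask, g, jsonify, request
from pydantic import BaseModel, Field, ValidationError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("task-api")
app = Flask(__name__)

class CreateTaskRequest(BaseModel):
    title: str = Field(min_length=1, max_length=200)

@app.before_request
def start_timer():
    g.start = time.perf_counter()

@app.after_request
def log_request(response):
    # Log each request along with how long it took to handle
    elapsed_ms = (time.perf_counter() - g.start) * 1000
    logger.info("%s %s -> %s (%.1f ms)", request.method, request.path, response.status_code, elapsed_ms)
    return response

@app.route("/tasks", methods=["POST"])
def create_task():
    try:
        payload = CreateTaskRequest(**(request.get_json() or {}))
    except ValidationError as err:
        # Structured JSON error instead of an unhandled 500
        return jsonify({"error": "validation_failed", "details": err.errors()}), 400
    return jsonify({"title": payload.title}), 201
```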

Tip #3: Think like a system designer, not a feature dev

Lastly, when you prompt, don’t just describe features; describe the architectural goals of the system. Agents are quite adept at understanding design principles, but they are not always effective at incorporating them into the generated code from the outset. Once again, being extremely explicit is the way to go. The beauty of modern agents and LLMs is that their context windows are massive, so you can add an extensive list of specifications and design instructions without worrying about overwhelming the underlying LLM. To build further on the previous prompt, let’s look at additional text we could incorporate to ensure the output aligns with our true needs and design requirements. For example, we could add this to the above prompt (a brief sketch of the resulting structure follows the list):

Build a modular task management API that can scale horizontally and supports future transition to a microservice architecture. Prioritize:

  • Clear separation of concerns
  • Statelessness
  • Dependency injection, where appropriate
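
As a brief sketch of what that guidance means in practice, here’s a stateless service that receives its repository through its constructor rather than creating it internally; both classes are illustrative stand-ins:

```python
# Constructor-based dependency injection with a stateless service (illustrative sketch)
class TaskRepository:
    """Owns persistence details; a test could swap in an in-memory fake."""
    def save(self, title: str) -> dict:
        return {"title": title}  # stand-in for a real INSERT

class TaskService:
    """Stateless: no per-user or per-request data lives on the instance."""
    def __init__(self, repository: TaskRepository):
        self.repository = repository  # injected, not constructed inside the service

    def create_task(self, title: str) -> dict:
        return self.repository.save(title.strip())

# Wiring happens at the application boundary, which keeps services easy to test,
# easy to scale horizontally, and easier to carve out into separate services later.
service = TaskService(TaskRepository())
print(service.create_task("write the design doc"))
```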

Key takeaways for prompting with architecture in mind

Prompts are the new architecture documents. By prompting with structure, constraints, and intent, you can get significantly better agent output. Combine this with architecture-aware tools (like vFunction, which we’ll cover next), and you can shift AI from just generating code to creating sound, scalable architectures for both new and existing complex systems, in a fraction of the time it takes to hand-code such applications.

The best approach to prompting is to scaffold your system incrementally, rather than using a single massive prompt. Similar to building an app from scratch, use the agent to build a base application you are happy with, then add enhancements through targeted iterations. And never be afraid to be overly specific. The agent knows a lot more than any human developer could possibly know. This means that you can easily feed a prompt to the agent like this:

  • “Create a modular [app type] using a layered architecture: controllers, services, repositories. Use [framework] and [ORM].”
  • “Ensure the API has structured logging, input validation, and retry logic for all external calls.”
  • “Design the system to support horizontal scaling, with stateless services and no shared session state.”

And see results that should align very closely with the expected output. Of course, iterating on an application with AI while ensuring that the architecture is aligned is a significant task. Luckily, vFunction provides a platform that can be integrated into these flows to ensure everything is architecturally sound, eliminating the need for time-intensive manual audits. Let’s look at how vFunction fits into the workflow next!

Using vFunction to ground agent output in real architecture

While AI agents can quickly generate large volumes of code, evaluating the architectural quality of that code, especially in existing applications, remains a major challenge. That’s where tools like vFunction come in. Beyond architectural observability, vFunction actively guides and validates modernization efforts by providing the architectural context that agents lack. This is especially critical in legacy systems, where understanding what the code does is only part of the picture. 

Agents also need to understand how that code fits and works in the broader architecture, something vFunction’s deep static and dynamic analysis delivers, enabling more informed and reliable modernization decisions.

What vFunction does

 vFunction combines static and dynamic analysis with data science to uncover architectural technical debt, provides relevant context to code assistants for automated refactoring, and breaks monoliths into scalable, cloud-ready services for faster service transformation.

As part of this process, vFunction analyzes your application and identifies critical architectural issues, including:

  • Domain boundaries and entanglement
  • Dead code and god classes
  • Technical debt hotspots and anti-patterns
  • Metrics like modularity and complexity

vFunction combines static and dynamic analysis with data science to uncover and fix architectural technical debt.

Essentially, it provides a baseline architecture, based on the static and dynamic analysis of your app’s current structure, and identifies areas for improvement. When working with existing codebases, vFunction goes beyond surface-level code analysis. It understands the actual runtime behavior of your application, including how different components interact, which code paths are actually used, and where the real architectural boundaries exist, as opposed to what the code structure suggests. This dynamic understanding is crucial when guiding agents to make changes to existing systems.

Pairing AI agents with vFunction’s insights

Once you have a baseline, you can use that information to feed context into your agent and guide its next steps. These insights are turned into specific, structured TODOs (tasks), each paired with a refined GenAI prompt optimized for code assistants like Amazon Q. Rather than relying on guesswork, you can now instruct agents with architectural context. This transforms the agent from a raw code generator into an architecture-aware co-pilot. 

vFunction’s architectural insights are turned into specific, structured TODOs (tasks), each paired with a refined GenAI prompt optimized for code assistants.

For legacy systems, this process is particularly powerful because vFunction can identify which parts of the system are safe to modify and which require extreme caution. It can detect dead code that can be safely removed, identify god classes that should be split, and highlight areas where refactoring will have the most impact on architectural quality. Here is a high-level example workflow of how this would work:

  1. Run vFunction on your existing monolith.

This will allow vFunction to understand the underlying architecture and dependencies. With this understanding, vFunction will generate TODOs and corresponding prompts that will help the agent refactor the application towards the target state.

  2. Feed this into your AI prompt:

Based on the most pressing TODOs, you’ll select a prompt and inform the agent of the changes you’d like it to implement. For example, here is a prompt that vFunction might generate to improve dynamic class exclusivity (a rough sketch of the resulting split follows this workflow):

“We want to split the class com.oms.service.InventoryService into two variants, which we’ll refer to as local and global. The local variant should be used in the execution paths below, and the global one in all other cases. In order to minimise code duplication, the local variant can inherit from the global variant. The execution paths are:

Path 1:

1. controller.InventoryController.fetchInventory()

2. service.InventoryService$$CGLIB.fetchInventory()”

  3. Have the agent implement changes based on vFunction’s guidance.
  4. Re-run vFunction to compare the new state to the original baseline.
  • Are modularity scores improving?
  • Is the dependency graph simpler?
  • Has domain entanglement decreased?
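
Coming back to the example prompt above, here’s a rough sketch of what the requested local/global split could look like. The actual target is a Java class (com.oms.service.InventoryService); this Python version, with illustrative names and data, only shows the inheritance pattern:

```python
# "Global" variant used everywhere by default; "local" variant used only on the listed paths
class GlobalInventoryService:
    def fetch_inventory(self, sku: str) -> dict:
        return {"sku": sku, "source": "global", "quantity": 100}  # stand-in for the real lookup

class LocalInventoryService(GlobalInventoryService):
    """Inherits from the global variant to minimize duplication; overrides only what differs."""
    def fetch_inventory(self, sku: str) -> dict:
        result = super().fetch_inventory(sku)
        result["source"] = "local"  # behavior specific to the listed execution paths
        return result

# The controller on the listed path (InventoryController.fetchInventory) would be wired to
# LocalInventoryService; every other caller keeps using GlobalInventoryService.
```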

Ensuring agent-generated changes align with architectural goals is critical—especially in large, complex codebases where multiple iterations may be needed to get it right. A strong feedback loop helps ensure those changes enhance architectural quality rather than introduce new issues. vFunction supports this by detecting when agent changes add coupling or violate architectural boundaries, providing immediate feedback to guide the next iteration.

AI + architecture: Augmentation, not automation

AI agents aren’t ready to replace senior architects and engineers, but they can augment their workflow when paired with strong tools like vFunction. By combining architectural observability with data science and GenAI, and grounding agent actions in real data, you shift from vibe coding to intentional, architecture-first development. This is the future of production-ready, agent-based application development.

Conclusion

We’re entering a new era of software development, one where AI agents write more and more of the code, but humans still hold the architectural vision. With the right prompting strategies and the right tools to measure what matters, you can build faster without sacrificing structure.

Whether you’re modernizing a monolith or starting a new app from scratch, the combination of AI and architecture tooling like vFunction gives you a scalable path forward: one prompt, one refactor, and one architectural improvement at a time.

Want to see how vFunction brings architectural intelligence to AI-driven development? Get in touch—we’d love to show you how it works inside modern developer environments and help you bring structure to speed.

Michael Chiaramonte

Principal Architect

Starting from a young age, Michael has loved building things in code and working with tech. His career has spanned many industries, platforms, and projects over the 20+ years he has worked as a software engineer, architect, and leader.

Get started with vFunction

Discover how vFunction accelerates modernization and boosts application resiliency and scalability.