AI Doesn't Write Good Software: The Environment Does

Are we building better software, or just more software?

Generative AI has dramatically reduced the cost of producing code. We can ask it to implement a feature, write a service, generate tests, refactor a module, or scaffold an entire architecture in minutes.

And that sounds incredible.

But there’s an uncomfortable question we should be asking ourselves as an industry: are we building better software, or just more software?

Because if the bottleneck used to be writing code, now it’s something else entirely: understanding it, validating it, maintaining it, and knowing whether it actually fits the system we’re building.

The problem isn’t new

For years, the software industry has been repeating the same mistakes: rushing before understanding, building before validating, leaving tests for later, accepting technical debt as if it were inevitable, confusing speed with progress, designing accidental architectures, and reviewing too late.

The difference is that AI now lets us make those same mistakes at staggering speed.

Before AI, the industry already had a serious problem. The Consortium for Information & Software Quality (CISQ) estimated that in 2022 the cost of poor software quality in the United States was approximately $2.41 trillion, and that accumulated technical debt was around $1.52 trillion. AI doesn’t arrive in a healthy, perfectly controlled industry. It arrives in one that was already weighed down by structural problems.

The point isn’t that AI creates a completely new problem. It’s that it accelerates problems that already existed.

AI doesn’t create new technical debt. It accelerates the technical debt we already knew how to create.

Verification debt

There’s a concept that deserves attention: verification debt. AI can generate code that looks correct, but that code must be verified. If teams accept changes without reviewing them properly, the debt isn’t just in the code — it’s in the validation process itself.

We’ve reduced the cost of producing code, but not necessarily the cost of understanding it, validating it, deploying it, operating it, and maintaining it. The bottleneck is no longer always writing code. Increasingly, it is knowing whether that code is correct.

And perhaps the real risk isn’t that AI replaces developers, but that it allows us to automate our lack of judgment.

The thesis

With the problem laid out, the thesis is straightforward:

AI doesn’t write good software on its own. It does so when it works within an environment that provides clear boundaries, reliable feedback, and spaces for human intervention.

Quality doesn’t depend solely on the model. It depends on the workflow into which we integrate it. This isn’t an argument against AI. It’s an argument against its naive, uncontrolled, or purely speed-driven use.

A workflow with judgment

If AI accelerates our bad practices, we need a way to introduce judgment before, during, and after code generation. The solution isn’t a specific tool or a closed framework — it’s a workflow:

Discovery → Plan → Review → Implement → Verify

It’s not Scrum for agents. It’s a minimal control structure: adaptable to every team, designed to separate when we want AI to reason, when we want it to propose, when it should execute, and when we need to validate what it has done.
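As a rough illustration of that separation, the five phases can be written down as data, with each phase stating what we ask of AI and whether it may touch code. This is a minimal sketch, not a prescribed implementation: the names `Phase`, `AI_ROLE`, and `may_modify_code`, and the role assigned to each phase, are assumptions made for the example.

```python
from enum import Enum


class Phase(Enum):
    DISCOVERY = "discovery"
    PLAN = "plan"
    REVIEW = "review"
    IMPLEMENT = "implement"
    VERIFY = "verify"


# What we ask of AI in each phase. The mapping is an assumption of this
# sketch; a team may draw the lines differently.
AI_ROLE = {
    Phase.DISCOVERY: "reason",
    Phase.PLAN: "propose",
    Phase.REVIEW: "propose",
    Phase.IMPLEMENT: "execute",
    Phase.VERIFY: "validate",
}


def may_modify_code(phase: Phase) -> bool:
    """Only the phase in which AI executes is allowed to change code."""
    return AI_ROLE[phase] == "execute"
```

The point of the table is that code changes are permitted in exactly one phase, so everything before it stays cheap to discard.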

Discovery: are we understanding the problem correctly?

Before implementing, the system should think about the problem. In this phase, we don’t want code. We want AI to help understand the problem, explore alternatives, identify risks, ask questions, surface assumptions, and detect ambiguities.

The human contributes what AI cannot reliably supply: business context, real constraints, historical decisions, product knowledge, and sensitivity to risk.

The guiding question: are we understanding the problem well before touching the code?

Plan: do we know what we’re going to change and why?

Turn the reasoning into a concrete sequence of steps. The plan should clarify what will change, which files or modules might be affected, what won’t be touched, what tests are needed, and what risks need to be controlled.

The human can accept, correct, or reject the plan before the agent modifies anything.

The guiding question: does this change make sense before we execute it?
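One way to make a plan reviewable is to give it a fixed shape, so that a missing section is visible before anything is executed. The sketch below assumes a hypothetical `ChangePlan` record whose fields mirror the items a plan should clarify; the field names and the example values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangePlan:
    what_changes: str
    affected_paths: list[str] = field(default_factory=list)
    untouched_paths: list[str] = field(default_factory=list)
    required_tests: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the names of the sections the plan leaves empty."""
        sections = {
            "what_changes": self.what_changes.strip(),
            "affected_paths": self.affected_paths,
            "untouched_paths": self.untouched_paths,
            "required_tests": self.required_tests,
            "risks": self.risks,
        }
        return [name for name, value in sections.items() if not value]


# Hypothetical plan that names the change and its tests but not its limits.
plan = ChangePlan(
    what_changes="Add retry to the payment client",
    affected_paths=["payments/client.py"],
    required_tests=["test_retry_on_timeout"],
)
print(plan.gaps())  # ['untouched_paths', 'risks']
```

An empty section is not necessarily an error, but it is something the human should notice before accepting the plan.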

Review: does the solution make sense before we build it?

Review before implementing. This isn’t about reviewing code — it’s about reviewing the intent, the approach, the trade-offs, the impact on the system, the architectural coherence, and the maintainability of the solution.

This phase keeps us from accepting plausible but misdirected solutions. It also keeps us from reviewing too late, when the code has already been written and discarding it is emotionally harder.

The guiding question: do we agree with the solution before paying the cost of building it?

Implement: is it executing the plan or improvising?

AI implements the change, but no longer blindly. It does so after going through discovery, planning, and review. It stops being a code generation machine and becomes an executor within a decision framework.

The human checks that AI follows the plan, avoids out-of-scope changes, respects the decisions already made, and doesn’t introduce unnecessary complexity.

The guiding question: is it executing the plan or improvising?
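Part of that question can be answered mechanically by comparing the files an agent changed with the paths the plan allowed. The function below is a small sketch under that assumption; `out_of_scope` and the example paths are hypothetical, and a real check would take its input from the team's version control tooling.

```python
def out_of_scope(changed_files: list[str], planned_paths: list[str]) -> list[str]:
    """Return the changed files that fall outside every planned path.

    A planned path covers itself and, if it is a directory, everything
    beneath it.
    """
    def covered(path: str) -> bool:
        return any(
            path == planned or path.startswith(planned.rstrip("/") + "/")
            for planned in planned_paths
        )

    return [path for path in changed_files if not covered(path)]


changed = [
    "payments/client.py",
    "payments/tests/test_client.py",
    "billing/invoice.py",
]
print(out_of_scope(changed, ["payments"]))  # ['billing/invoice.py']
```

A non-empty result doesn't prove the agent was wrong, only that it improvised, which is the moment to bring the human back in.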

Verify: what signals do we have that this is correct?

Validate that the change is correct. Verifying doesn’t just mean checking that the code compiles. It means gathering enough signals to decide whether the change can be accepted: automated tests, type checking, linters, static analysis, continuous integration, diff review, manual testing, functional validation, architecture rules, contracts, observability, security, and updated documentation.

The human decides whether the signals are sufficient and whether the risk is acceptable.

The guiding question: what signals do we have that this is correct?
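To make that decision easier, the signals can be collected into one summary that separates what blocks acceptance from what merely deserves attention. This is a sketch under stated assumptions: `Signal`, its `required` flag, and the status strings are invented here, and the function deliberately never accepts a change on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    passed: bool
    required: bool = True  # hypothetical flag: does a failure block acceptance?


def summarize(signals: list[Signal]) -> dict:
    """Group failed signals for a human reader; never auto-accepts a change."""
    blocking = [s.name for s in signals if s.required and not s.passed]
    warnings = [s.name for s in signals if not s.required and not s.passed]
    return {
        "status": "blocked" if blocking else "awaiting human decision",
        "blocking": blocking,
        "warnings": warnings,
    }


signals = [
    Signal("unit tests", passed=True),
    Signal("type check", passed=False),
    Signal("docs updated", passed=False, required=False),
]
print(summarize(signals)["status"])  # blocked
```

Even when every signal passes, the best status the sketch reports is "awaiting human decision", because judging whether the signals are sufficient remains a human call.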

Human in the Loop: deciding at the right moment

One of the central ideas is that Human in the Loop doesn’t mean simply looking at what AI has done at the end. It means introducing human judgment at the points where it has the most impact: before building, before accepting a solution, before integrating a change, and before taking on technical risk.

The human isn’t there to review every line like an emotional compiler. They’re there to contribute what AI cannot reliably supply: intention, context, judgment, experience, responsibility, and sensitivity to future impact.

Human in the Loop isn’t reviewing at the end. It’s deciding at the right moment.

The orchestrator, the harnesses, and the human

This workflow can be the foundation of an orchestrator. Not because the orchestrator automates everything, but because it separates responsibilities: when to reason, when to decide, when to execute, when to verify, and when to request human intervention.

Harnesses are mechanisms that allow AI to operate with boundaries, signals, and feedback: tests, types, linters, static analysis, architecture rules, team conventions, living documentation, CI pipelines, security reviews, observability, human validation, API contracts, metrics, feature flags, and staging environments.

An agent without harnesses is speed without direction.
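Many of these harnesses are ultimately commands that either succeed or fail, so a thin wrapper is enough to turn them into feedback an agent or a human can read. The sketch below assumes each harness is a command whose exit code 0 means "pass"; `run_harness` is a hypothetical helper, and the two commands are stand-ins for a team's real test runner or linter.

```python
import subprocess
import sys


def run_harness(name: str, command: list[str]) -> dict:
    """Run one check as a subprocess; exit code 0 counts as a pass."""
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "name": name,
        "passed": result.returncode == 0,
        "output": (result.stdout + result.stderr).strip(),
    }


# Stand-in commands so the sketch runs anywhere; replace them with the
# team's own tools.
passing = run_harness("always passes", [sys.executable, "-c", "print('fine')"])
failing = run_harness("always fails", [sys.executable, "-c", "import sys; sys.exit(1)"])
print(passing["passed"], failing["passed"])  # True False
```

Keeping the captured output matters as much as the pass or fail flag, because it is the output that tells the agent, or the reviewer, what to do next.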

The equation is simple:

The orchestrator provides structure.
The harnesses provide feedback.
The human provides judgment.

All three elements are necessary. None replaces the others.
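The three roles can be shown in a few lines: a loop that fixes the order of the phases, a feedback function that stands in for the harnesses, and a human decision that gates every step. This is a minimal sketch with hypothetical names (`orchestrate`, `run_phase`, `collect_feedback`, `human_approves`); the stubs at the end replace the real agent, checks, and reviewer.

```python
from typing import Callable

PHASES = ["discovery", "plan", "review", "implement", "verify"]


def orchestrate(
    run_phase: Callable[[str], str],
    collect_feedback: Callable[[str], list[str]],
    human_approves: Callable[[str, str, list[str]], bool],
) -> list[str]:
    """Run the phases in order and stop at the first one the human rejects.

    Returns the phases that were completed and approved.
    """
    completed = []
    for phase in PHASES:                      # structure: fixed order of phases
        output = run_phase(phase)             # the agent does the phase's work
        problems = collect_feedback(phase)    # feedback: what the harnesses report
        if not human_approves(phase, output, problems):  # judgment: human gate
            break
        completed.append(phase)
    return completed


# Stubs for illustration: verification reports a failure and the human,
# seeing it, declines to approve.
done = orchestrate(
    run_phase=lambda phase: f"{phase} output",
    collect_feedback=lambda phase: ["test_x failed"] if phase == "verify" else [],
    human_approves=lambda phase, output, problems: not problems,
)
print(done)  # ['discovery', 'plan', 'review', 'implement']
```

Removing any one of the three arguments leaves the loop unable to run, which is the same claim in code form.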

Flexibility is the key

The workflow must adapt to each team and situation. For a small bug, discovery might last thirty seconds, the plan might be very simple, and verification might be limited to one test and a quick review. For an architecture change, discovery might take a full session, the plan might require several steps, the review might involve multiple people, and verification might include tests, formal review, security, performance, and observability.

The key isn’t to always apply the same weight to each phase, but to consciously decide how much control each change needs.
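That conscious decision can be written down as a small table of control profiles, one per kind of change. The sketch below encodes the two examples above; the keys, the name `profile_for`, and the choice to treat unknown kinds as the heavier case are assumptions of the example rather than part of the workflow.

```python
CONTROL_PROFILES = {
    "small_bug": {
        "discovery": "about thirty seconds",
        "plan": "very simple",
        # No separate review entry: the example above doesn't specify one.
        "verify": ["one test", "quick review"],
    },
    "architecture_change": {
        "discovery": "a full session",
        "plan": "several steps",
        "review": "multiple people",
        "verify": ["tests", "formal review", "security", "performance", "observability"],
    },
}


def profile_for(change_kind: str) -> dict:
    """Look up how much control a change gets.

    Unknown kinds fall back to the heavier profile, which is this sketch's
    assumption: when in doubt, apply more control rather than less.
    """
    return CONTROL_PROFILES.get(change_kind, CONTROL_PROFILES["architecture_change"])


print(profile_for("small_bug")["verify"])  # ['one test', 'quick review']
```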

The beauty of the flow is that it doesn’t impose a process. It imposes a question: at what point do we let AI act, and with what signals do we decide that what it did is acceptable?

Where we’re heading

We don’t want AI to simply go faster. We want it to go faster in the right direction.

The goal isn’t to ask AI to do more things. The goal is to improve how we collaborate with it: not just asking for code, but for reasoning; not accepting changes without a plan; not implementing without reviewing intent; not integrating without signals; not delegating judgment.

AI doesn’t free us from technical responsibility. It makes that responsibility more important.

The real challenge isn’t deciding whether we use AI or not. That conversation, in many teams, is already behind us. The challenge is deciding how we use it, with what boundaries, with what quality signals, and with what spaces for human intervention.

Let’s define our harnesses. Let’s make our rules explicit. Let’s protect our standards. Let’s review before accepting. Let’s verify before integrating. And above all, let’s keep technical judgment at the center.

Because the future of software development won’t belong only to those who know how to use a tool better. It will belong to those who know how to create better systems for using it responsibly.

AI doesn’t write good software on its own. The environment does. And that environment is designed by us.

This article is based on the talk “AI Doesn’t Write Good Software: The Environment Does” that I presented at CommitConf 2026.

Resources

Slides

CommitConf 2026 slide deck