How Far Can Generative AI Reduce the Cost of Pure Replacement Modernization?

I often hear people talk about how to use generative AI in development.

Still, something always snags for me right at the start of those conversations. There is plenty of talk about how “generative AI is amazing,” but it is hard to tell which parts are first-hand evidence, which are expectations, and which are real practical constraints.

In this session, I originally set out to organize development methodologies that make use of generative AI. Terms that sound promising keep appearing: Eval-driven development, Spec-first, Context Engineering, and so on.

But as the conversation continued, my interest gradually shifted to a different question.

Can generative AI make “pure replacement” modernization of legacy systems realistic, even though it has long been hard to justify on cost-effectiveness grounds?

What I Mean by Pure Replacement

The word modernization covers a broad range of work.

In real projects, it often includes not only replacing a programming language, but also consolidating systems, changing operational rules, revisiting business processes, and improving the UI. In that case, the investment value is easier to explain, because the project is not merely making something old new again; it is also changing the business value.

This time, though, I was thinking under much narrower assumptions.

  • Do not change the system scope

  • Do not change the external environment either

  • Keep screens, logic, and timing as close as possible to the original

  • Migrate even latent bugs and implicit rules

  • Reimplement a system written in COBOL or a similar language in another language

In other words, business improvement is not the main objective. The goal is to stay focused on “running the same thing on a different technical foundation.”

Once I set those assumptions, the usual explanations of why modernization is hard start to feel a little weaker.

For example, arguments such as “it is hard because external integration partners change” or “it is hard because operational rules change” are set aside under this premise. The external environment is assumed not to change, and operations are kept as close as possible to the current ones.

Then the question becomes quite simple.

Could generative AI make pure replacement cheaper, even though spending money on it has long been hard to justify?

Where Generative AI Seems Likely to Help

There are certainly parts of this question where expectations seem reasonable.

Generative AI seems likely to be quite useful for code conversion itself. It can read old code, propose an implementation in another language, and create surrounding test scaffolds and documentation. When differences appear, it can also list and sort the candidate causes.

If work that humans used to translate piece by piece can be changed into a workflow where AI drafts and humans review, the cost of the conversion phase may drop significantly.

Up to this point, I think some optimism is justified.

But that does not mean we can say pure replacement has become easy. What came up repeatedly in the session was that building something and explaining that it is the same are two different things.

What Remains at the End Is Equivalence Verification

The hard part of pure replacement is not building a correct new system. It is confirming that it is correct in the same way as the old system, and wrong in the same way as well.

For example, overflow, rounding, character encodings, initial values, sort order, NULL handling, date boundaries, batch execution order, and return values on exceptions can all shift subtly when the language or runtime foundation changes.
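To make the rounding and overflow cases concrete, here is a minimal Python sketch using the standard `decimal` module. It assumes, hypothetically, that the legacy program truncated fractions and silently dropped high-order digits when a three-digit field overflowed. Real behavior depends on the compiler, its options, and whether `ROUNDED` or `ON SIZE ERROR` was written, so `legacy_store` is an illustration, not a model of any particular COBOL runtime.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

ONE = Decimal("1")


def legacy_store(value: Decimal, int_digits: int = 3) -> Decimal:
    """Hypothetical legacy behavior: truncate the fraction, then keep only the
    low-order `int_digits` digits (silent overflow). Assumed, not verified."""
    truncated = value.quantize(ONE, rounding=ROUND_DOWN)
    return truncated % (Decimal(10) ** int_digits)


def naive_port(value: Decimal) -> int:
    """A 'natural' rewrite: float arithmetic plus Python's built-in round(),
    which rounds half to even and never overflows."""
    return round(float(value))


for raw in ["2.5", "3.5", "999.6", "1234.4"]:
    v = Decimal(raw)
    print(
        raw,
        "legacy:", legacy_store(v),
        "port:", naive_port(v),
        "half_up:", v.quantize(ONE, rounding=ROUND_HALF_UP),
        "half_even:", v.quantize(ONE, rounding=ROUND_HALF_EVEN),
    )
```

Neither the naive port nor either standard rounding mode matches the assumed legacy result for every input, which is exactly the kind of drift an equivalence check has to catch.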

In ordinary new development, such differences might be organized as “new specifications.” In pure replacement, however, the differences themselves become the problem. If old behavior has become a business assumption, then even behavior that is not elegant becomes part of the compatibility target.
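Sort order is a good example of behavior that quietly turns into a business assumption. Mainframe systems often stored text in EBCDIC, where lowercase letters sort before uppercase and digits sort after letters, the reverse of ASCII order. The sketch below assumes code page 037 via Python's built-in `cp037` codec; which code page a given legacy system actually used is something to confirm, not guess. The customer codes are made-up sample data.

```python
def unicode_order(keys: list[str]) -> list[str]:
    """Default ordering in a typical modern runtime (code point order)."""
    return sorted(keys)


def ebcdic_order(keys: list[str], codepage: str = "cp037") -> list[str]:
    """Order keys by their EBCDIC byte values to emulate a legacy collating
    sequence. Assumes the legacy system used `codepage` (here cp037)."""
    return sorted(keys, key=lambda k: k.encode(codepage))


customer_codes = ["A100", "a100", "1000", "B200"]
print("modern:", unicode_order(customer_codes))
print("legacy:", ebcdic_order(customer_codes))
```

If a downstream report or control-break routine relied on the legacy order, the new system has to reproduce it explicitly, even though nobody would design it that way today.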

This is where equivalence verification matters.

Give the same input to the old and new systems, then compare outputs, logs, side effects, and state transitions. Use historical data or golden data, and when differences appear, investigate their causes. If necessary, decide whether to allow the difference or move the new behavior closer to the old one.
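The comparison loop itself is simple to sketch. In this minimal Python version, `legacy_system` and `new_system` are hypothetical stand-ins; in practice one side would be a wrapper that drives the old batch or reads its captured output. Each returns an output plus a record of side effects, and the harness reports every case where either one differs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RunResult:
    output: Any
    # Observable side effects, e.g. {"ledger_rows": 1, "files": ["A.DAT"]}
    side_effects: dict[str, Any] = field(default_factory=dict)


@dataclass
class Difference:
    case_id: str
    aspect: str  # "output" or the name of a side effect
    legacy: Any
    new: Any


def compare_systems(
    cases: dict[str, Any],
    legacy: Callable[[Any], RunResult],
    new: Callable[[Any], RunResult],
) -> list[Difference]:
    """Feed identical input to both systems and collect every divergence."""
    diffs: list[Difference] = []
    for case_id, payload in cases.items():
        old_r, new_r = legacy(payload), new(payload)
        if old_r.output != new_r.output:
            diffs.append(Difference(case_id, "output", old_r.output, new_r.output))
        # A side effect present on only one side also counts as a difference.
        for name in sorted(old_r.side_effects.keys() | new_r.side_effects.keys()):
            old_value = old_r.side_effects.get(name)
            new_value = new_r.side_effects.get(name)
            if old_value != new_value:
                diffs.append(Difference(case_id, name, old_value, new_value))
    return diffs


# Hypothetical stand-ins: the old fee calculation truncates, the port rounds.
def legacy_system(amount: int) -> RunResult:
    return RunResult(amount * 3 // 100, {"ledger_rows": 1})


def new_system(amount: int) -> RunResult:
    return RunResult(round(amount * 3 / 100), {"ledger_rows": 1})


diffs = compare_systems({"c1": 100, "c2": 120}, legacy_system, new_system)
for d in diffs:
    print(d)
```

The harness deliberately says nothing about whether a difference is acceptable; it only makes every divergence visible so that someone can decide.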

Generative AI can help with this difference analysis. It can add test cases, classify differences, and point out suspicious conversion points.
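Before AI or humans look at the differences, a first mechanical pass can shrink the pile. This sketch uses a few hypothetical rules: trailing-space padding typical of fixed-length records, numeric gaps within a tolerance, and lists with the same content in a different order. The category names and the tolerance are illustrative assumptions, and anything the rules do not recognize is left as "unexplained" for a person to judge.

```python
from decimal import Decimal
from typing import Any


def classify(legacy: Any, new: Any, tolerance: Decimal = Decimal("0.01")) -> str:
    """Suggest a category for one legacy/new value pair.

    The categories and the tolerance are illustrative assumptions; the label is a
    candidate for review, not a verdict.
    """
    if legacy == new:
        return "identical"
    if isinstance(legacy, str) and isinstance(new, str) and legacy.rstrip() == new.rstrip():
        return "padding"  # fixed-length record fields often carry trailing spaces
    # Only int and Decimal are treated as numeric in this sketch.
    if isinstance(legacy, (int, Decimal)) and isinstance(new, (int, Decimal)):
        if abs(Decimal(legacy) - Decimal(new)) <= tolerance:
            return "rounding"
        return "numeric_mismatch"
    if isinstance(legacy, list) and isinstance(new, list):
        try:
            if sorted(legacy) == sorted(new):
                return "ordering"  # same content, different order: may still matter downstream
        except TypeError:
            pass  # elements cannot be compared with each other; fall through
    return "unexplained"


samples = [
    ("ABC   ", "ABC"),
    (Decimal("10.00"), Decimal("10.01")),
    (["B", "A"], ["A", "B"]),
    ("Y", "N"),
]
for old_value, new_value in samples:
    print(repr(old_value), repr(new_value), "->", classify(old_value, new_value))
```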

Even so, humans ultimately decide, “within this range, we consider them the same.” This is not a place where AI can take responsibility on our behalf.

Less “Cannot Reproduce” Than “Cannot Say We Reproduced It”

This is where my understanding shifted a little through the conversation.

At first, examples such as timing dependencies, human dependencies, external dependencies, and implicit rules came up as things that block modernization. But if we strongly assume pure replacement, we can also think of them as things that should simply be included in the migration target.

If serial execution is required, migrate that mechanism too. If there are latent bugs, include them in the premise. If there are off-screen operations, assume the external environment does not change. Treat integration partners as unchanged as well.

Once we do that, the explanation “it is impossible because we cannot reproduce it” becomes weaker.

The real issue is how far we can prove that we have reproduced it, how much that proof costs, and who accepts accountability for the explanation.

With the arrival of generative AI, the conversion cost of pure replacement may decrease. But responsibility for verification does not disappear automatically.

This framing feels more convincing to me right now.

What Remains When Viewed as a Development Methodology

This also connects to broader methodologies for using generative AI.

Eval-driven development and Spec-first are said to be important because AI generation has become powerful enough that we need to decide in advance what counts as passing.

In pure replacement modernization, Spec-first is not only about writing new specifications. Rather, it means writing the criteria for judging compatibility.

  • Which inputs and outputs to compare

  • Which side effects to compare

  • Which differences are not acceptable

  • Which differences should be treated as specification updates

  • At what point the replacement can be judged acceptable
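One way to pin these criteria down before any conversion starts is to write them as data that both people and tooling can read. The structure below is a hypothetical sketch: the field names, the sample entries, and the rule that the gate passes only when no forbidden or unresolved differences remain are assumptions chosen to mirror the five questions above, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class CompatibilitySpec:
    compared_outputs: set[str]       # which inputs and outputs to compare
    compared_side_effects: set[str]  # which side effects to compare
    forbidden: set[str]              # difference categories that are never acceptable
    spec_updates: set[str]           # categories agreed to be specification updates
    max_unresolved: int = 0          # acceptance: differences allowed to remain unresolved


@dataclass
class GateResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def acceptance_gate(spec: CompatibilitySpec, diff_categories: list[str]) -> GateResult:
    """Decide whether the observed differences (one category per difference) are acceptable.

    compared_outputs and compared_side_effects drive the comparison step; the gate
    only looks at the categories of the differences that the comparison produced.
    """
    reasons: list[str] = []
    blocked = sorted({c for c in diff_categories if c in spec.forbidden})
    if blocked:
        reasons.append(f"forbidden differences present: {blocked}")
    agreed_or_forbidden = spec.forbidden | spec.spec_updates
    unresolved = [c for c in diff_categories if c not in agreed_or_forbidden]
    if len(unresolved) > spec.max_unresolved:
        reasons.append(f"{len(unresolved)} differences are neither forbidden nor agreed updates")
    return GateResult(passed=not reasons, reasons=reasons)


spec = CompatibilitySpec(
    compared_outputs={"invoice_total", "report_lines"},
    compared_side_effects={"ledger_rows", "output_files"},
    forbidden={"numeric_mismatch"},
    spec_updates={"padding"},
)
print(acceptance_gate(spec, ["padding", "padding"]))
print(acceptance_gate(spec, ["padding", "ordering"]))
```

Writing the spec first means that when AI produces converted code quickly, there is already an agreed answer to "did it pass?"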

If we let AI perform the conversion without these criteria, code appears quickly. But we cannot judge whether it succeeded.

The faster code is generated, the more any weakness in the judgment criteria costs later. This is one of the risks of using generative AI.

My Current Understanding

Through this session, my current view is as follows.

Generative AI expands the possibilities for pure replacement modernization. In particular, it can reduce the cost of code conversion, test scaffolding, difference analysis, and documentation.

On the other hand, generative AI does not remove the accountability for explaining that two systems are the same. What remains heavy at the end of pure replacement is not the conversion work itself, but equivalence verification, acceptance gates, and responsibility boundaries.

That is why generative AI feels less like a silver bullet and more like a powerful compressor. It compresses work, increases the material available for verification, and makes the places where humans must judge more visible.

But if we have not decided what counts as the same, that power can become dangerous instead.

If I continue this line of thought, I would like to create a checklist specifically for pure replacement.

  • Compatibility judgment items

  • How to create golden data

  • Rules for classifying differences

  • Acceptance gates

  • A cost-effectiveness explanation for management
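For the golden-data item, one common pattern is record-and-replay: capture the legacy system's outputs once for a fixed set of inputs, store them, and compare every new build against that snapshot. This is a minimal sketch with a hypothetical JSON file layout and hypothetical function names; real golden data would also need to cover side effects and the legacy character encoding.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def record_golden(path: Path, cases: dict[str, Any], legacy: Callable[[Any], Any]) -> None:
    """Run the legacy side once per case and store its results as the reference.
    Results must be JSON-serializable (tuples come back as lists)."""
    golden = {case_id: legacy(payload) for case_id, payload in cases.items()}
    path.write_text(json.dumps(golden, indent=2, sort_keys=True), encoding="utf-8")


def replay_against_golden(
    path: Path, cases: dict[str, Any], new: Callable[[Any], Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {case_id: (expected, actual)} for every case where the new side disagrees.
    A case missing from the golden file shows up with expected=None."""
    golden = json.loads(path.read_text(encoding="utf-8"))
    mismatches: dict[str, tuple[Any, Any]] = {}
    for case_id, payload in cases.items():
        expected, actual = golden.get(case_id), new(payload)
        if expected != actual:
            mismatches[case_id] = (expected, actual)
    return mismatches


# Hypothetical fee calculations: the legacy side truncates, the new side rounds.
with tempfile.TemporaryDirectory() as tmp:
    golden_file = Path(tmp) / "golden.json"
    cases = {"c1": 100, "c2": 120}
    record_golden(golden_file, cases, lambda amount: amount * 3 // 100)
    result = replay_against_golden(golden_file, cases, lambda amount: round(amount * 3 / 100))
print(result)
```

Recording once and replaying many times keeps the expensive part (running the old system) out of every iteration of the new one.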

Modernization in the age of generative AI cannot be judged only by asking whether AI can perform the conversion.

The real question is how far we can use AI to improve verifiability.

Article Information

Author: mtakagishi

Publication Date: 2026-06-21